More recently, the former MIT professor teamed up with some students to create BABEL, a computer program that can create gibberish essays that other computer programs score as outstanding pieces of writing.

Robo-scoring fans like to reference a 2012 study by Mark Shermis (University of Akron) and Ben Hamner, in which computers and human scorers produced near-identical scores for a batch of essays. The full dismantling is here, but the basic problem, beyond the methodology itself, was that the testing industry has its own definition of what the task of writing should be, one that is more about a performance task than an actual expression of thought and meaning.

Computer scoring is cheap, it's quick, and it makes it easy to hoover up a ton of data about each student.

But that sort of automated scoring only works reliably for bubble tests, assessments based on superficial objective questions.

Automation has always broken down when it comes to machine-scored writing. Pearson released a white paper entitled "Pearson's Automated Scoring of Writing, Speaking, and Mathematics" way back in 2011, and hardly a year goes by that some media outlet doesn't publish an article along the lines of "Whizbang Corporation Announces Computers Can Grade Essays." It's never true, but the dream is so beautiful that test manufacturing companies can't stop trying.

Says the senior research scientist at ETS, "In other words, rather than trying to make software recognize good writing, we'll simply redefine good writing as what the software can recognize." In states like Utah and Ohio, where robo-scoring is being used, we can expect to see more bad writing and more time wasted on teaching students how to satisfy a computer algorithm rather than develop their own writing skills and voice to become better communicators with other members of the human race.

Fans of robo-graders, like the one in the NPR piece, talk about how the AI can "learn" what a good essay looks like by being fed a hundred or so "good" essays. There are two problems with that. The first is that somebody has to pick the 100 exemplars, so hello again, human bias. The second is that this narrows the AI's view by saying that a good essay is one that looks a lot like these other essays.

The point is not that robo-graders can't recognize gibberish. The point is that their inability to distinguish between good writing and baloney makes them easy to game. That's underlined by a horrifying quote in the NPR piece.


