If you listen to the executives at Pearson and the other big edtech companies who are using AI to assess student writing, you begin to believe that we are at the dawn of a new era. We have reached a point where we can feed our students’ essays into software like Pearson’s Intelligent Essay Assessor and they can get feedback that is not only instantaneous, but substantive as well. The gradebots are here and they want to help!
Most of my English-teaching colleagues, as I mentioned in part one of this post, are absolutely open to this kind of technological assistance. Sure, some have expressed the fear that algorithm-based assessment may one day eliminate the need for teachers, and I understand their concerns, but I don’t necessarily share them.
I see the AI that is being developed as something that savvy educators will use to take some of the pressure off (as would be the case with automated essay assessment), and something that would enhance student engagement. I’m just one of those who believe that the role of the organic, flesh-and-blood teacher could never really be replaced.
I don’t fear AI assessment. I’m just skeptical about whether it really works. I’ve assessed and provided feedback on tens of thousands of papers in my 25 years as an English teacher, and I know that there are infinite ways to express oneself in writing. Every writer has a different approach, a different take, a different voice. Writing is a strange, organic, often nonlinear thing. Sure, there are grammar, spelling and syntax conventions, which I imagine a computer could identify and, if necessary, correct. And certainly, paragraph structure and overall coherence of thought are critical to good writing, but so much more goes into it.
I like what the people at Hubert.ai, an edtech company out of Sweden, say about this:
The big question is just how much of a poet a computer is capable of becoming in order to recognize small but significant nuances that can mean the difference between a good essay and a great essay. Can it capture the essentials of written communication: reasoning, ethical stance, argumentation, clarity?
This is precisely it for me, and why, at my core, I am skeptical when it comes to the claims made by those who announce that the assessment bots have arrived, and that they’re just as good as, if not better than, human graders. Which brings me to the views of an even greater skeptic of AI assessment, Les Perelman, former director of undergraduate writing at MIT.
Perelman, in his studies, has found that AI writing assessment software is light years away from being an effective or reliable means of judging student work. He even demonstrated this in a very humorous way by developing software, called the Babel Generator, which constructs complete essays within seconds based on keywords. The essays use rich vocabulary, complex sentence construction and flawless syntax, but, when read, are completely meaningless. Check out this nugget I generated from the Babel Generator, using the keywords hybrid, climate and future.
“As I have learned in my theory of knowledge class, humanity will always incarcerate time to come. The same brain may emit two different neutrinoes to reproduce. Despite the fact that gravity counteracts plasmas, the same brain may receive two different neurons of accessions.”
Wow! What a mess, and yet, the kind of written expression that software like Pearson’s Intelligent Essay Assessor might score at a proficient or beyond-proficient level. Perelman has also made intelligible but totally outrageous claims and been rewarded by automated grading systems. He submitted an essay to e-Rater, the automated grader developed by ETS (Educational Testing Service) and very similar to Pearson’s technology, in response to a prompt asking respondents to discuss some of the challenges facing education. In it, he argued that the main challenge is that teaching assistants are paid six times as much as college presidents and receive extra benefits such as South Seas vacations and trips on private jets, thus causing financial stress on universities. He received a 6 out of 6 on the essay. He has gamed and fooled a number of systems, and now companies that produce such software won’t even talk to him. I can’t blame them.
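The trick Perelman exploits can be sketched in a few lines of code. What follows is purely illustrative — it is not his actual Babel Generator, nor any vendor’s real scoring model. It pairs a hypothetical gibberish generator, which strings impressive-sounding words around a keyword, with a naive scorer that rewards only surface features like word length and sentence length (the arbitrary 0–6 scale simply echoes e-Rater’s score band):

```python
import random

# Toy "Babel-style" generator: grammatical, vocabulary-rich, meaningless.
VOCAB = {
    "nouns": ["paradigm", "assessment", "neutrino", "epistemology", "curriculum"],
    "verbs": ["postulates", "counteracts", "engenders", "elucidates"],
    "adjectives": ["quintessential", "multifaceted", "inexorable", "salient"],
}

def babble(keyword: str, sentences: int = 3, seed: int = 0) -> str:
    """Build syntactically fine but meaningless sentences around a keyword."""
    rng = random.Random(seed)
    out = []
    for _ in range(sentences):
        adj = rng.choice(VOCAB["adjectives"])
        noun = rng.choice(VOCAB["nouns"])
        verb = rng.choice(VOCAB["verbs"])
        out.append(f"The {adj} {keyword} {verb} the {noun}.")
    return " ".join(out)

def surface_score(essay: str) -> float:
    """Naive scorer: rewards long words and long sentences, the kinds of
    proxies automated graders have been criticized for leaning on."""
    words = essay.replace(".", "").split()
    avg_word_len = sum(len(w) for w in words) / len(words)
    sentence_count = len([s for s in essay.split(".") if s.strip()])
    avg_sent_len = len(words) / sentence_count
    # Arbitrary scaling into a 0-6 band, echoing e-Rater's scale.
    return min(6.0, avg_word_len / 2 + avg_sent_len / 10)

essay = babble("climate", sentences=4, seed=42)
print(essay)
print(f"score: {surface_score(essay):.1f} / 6")
```

Fed to this toy scorer, the generated nonsense outscores a short, clear sentence like “The cat sat on the mat.” — exactly the failure mode Perelman demonstrated, just in miniature.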
Perelman’s work supports what I’ve sort of suspected all along. Any experienced English teacher knows that assessing student style, reasoning, critical thought, voice, and the like is a uniquely human ability. An essay is, at its core, a dialogue between a person with something to say and a human audience that cares to spend some time listening. It is not an interaction between a person and a machine. The machine doesn’t care. It can’t care. As I say this, I can’t help but be reminded of the movie Her, in which Joaquin Phoenix forges an “intimate” relationship with the voice on his phone, which seems somewhat human but clearly falls short in the end. I see the automated grading software as something “like” a real teacher, but not quite the real thing.
This is not to say that computer scientists won’t someday crack the essay-assessment code. AI is changing the way we interact with the world in breathtaking ways, and, as I’ve said (and I believe I speak for many an overburdened English teacher), I would welcome an essay bot that could provide meaningful and substantive feedback to my students quickly. I would love to dedicate all of my time to the fun part, the teaching part. I just don’t think that we’re there yet.
Despite what companies like Pearson and ETS claim, real assessment and real feedback still lie in the hands of real teachers. What remains is for us to determine ways to take the pressure off them by smartly leveraging technology in a way that maximizes teachers’ ability to bring their expertise, experience and human judgement into the equation.