Sunday, June 13, 2010

Merit Pay for Teachers

Should K–12 teachers be paid according to how well their students learn? This is one of those ideas that sounds like a no-brainer. After all, aren’t other professionals routinely evaluated and compensated in some way that relates to their accomplishments? Why shouldn’t teachers, too, be rated on their effectiveness and paid more when they do a better job?

The problem, it seems to me, is how their effectiveness can be fairly evaluated. Looking at student outcomes seems reasonable, but that is not simple. We tend to oversimplify it by judging what children learned on the basis of achievement test scores. Most would agree that absolute score differences (just higher scores) are no measure of teacher effectiveness. Obviously, not all students of the same grade begin the year at the same level or possess the same skills. If we used that measure, everyone would fight to teach only the most gifted and motivated students.

So, should we use a growth or value-added model, to see whether individual children show a year’s progress in a year? I ran across a very entertaining video in which cognitive psychologist and University of Virginia Prof. Daniel Willingham describes — in less than three minutes! — six problems (some conceptual, some statistical) with evaluating teachers by comparing student achievement in the fall and in the spring. Among them are biased data when some children move away during the year, the effect of others on how teachers do (such as the help or hindrance of a building administrator), and the effect of peers — is the classroom cohort rowdy or well-behaved? I’ve spent enough time in classrooms to know that sometimes there is a student who seems to suck all the oxygen out of the room, making life difficult for everyone else. I might add that some years (when going through a messy divorce, or when grieving the loss of a child or a spouse) will find a typically great teacher unable to think or function as well as usual. Should she be fired? Are her students permanently harmed? I think of the lessons in compassion and in dealing with difficult people that I and my children learned from such sub-optimal school years — which reminds me of all we learn that is useful in life but not evaluated on tests.

A related conceptual problem with using test scores is that they focus on short-term gains. Most of us, in higher education and then in adult life, did not find that only discrete bits of information were important. Broader knowledge and skills — and especially the abilities to research, evaluate, learn, and apply things on our own — were much more vital to our success. Yet these skills are not evaluated on most tests.

Portfolios of student work give a superior picture of student abilities, but they would be very expensive and time-consuming to evaluate in a fair way. If we cannot find the time or money for essays in our achievement tests, then, realistically, we’ll never use portfolios.

This hints at the major problem with test results as a measure of student learning: in order to make massive testing programs at all affordable, we have made them much less valid over the years. Our MEAP writing test, for example, used to require legions of low-paid evaluators. Not only was this very expensive, and so time-consuming that Michigan was penalized by the federal government for being late with the scores (under No Child Left Behind rules), but it proved nearly impossible to get consistent grading from so many evaluators. So, the writing test today involves very little actual writing.

You get what you measure, especially if the tests have high stakes, and we are not measuring the complex skills that our children need for life success. Even the gains we do measure can be evanescent, disappearing soon after the tests are scored.

Nicholas Meier, a professor of education at California State University, Monterey Bay, rightly fears that basing teacher pay on test scores will encourage even more focus on test fodder and test preparation alone, at the expense of more worthwhile endeavors. I recently heard an impassioned speech about education outreach plans for the Yankee Air Museum that involved teaching children forgotten history and engaging them in exciting activities, since schools no longer do. My reaction was a sad acknowledgment that, yes, if it’s not on the MEAP, our teachers don’t have time for it anymore. We trade hundreds of students (and their indispensable funding!) back and forth every year with charters and schools of choice, and those decisions are most likely made on the basis of test scores. The draconian penalties levied by NCLB are based largely on test scores. And now we want to raise the stakes by basing teacher pay and job security on them too?

Dr. Meier says this better than I could: “There is an axiom in the social sciences known as Campbell’s Law that says that the higher the stakes on a particular social indicator (e.g., a single test score), the more the use of that indicator corrupts the original intent, as it encourages people to manipulate the system to look good on that indicator regardless of other effects. We see that happening already—retaining students so they take the easier test; pushing kids to disappear from the system. There is the focus on the kids that show the most promise of moving from one category to the next, while ignoring others. Not to mention the examples of out and out cheating….”

So, no, I have little hope that pay for performance will improve public education.

There are other ways to evaluate teachers, of course, with classroom observation being a time-honored one. This tends to be more of a good idea than a good practice, though, as it is done too rarely for validity. As a young teacher in my family put it: “Who will be doing my evaluation? Will it be an administrator who has classroom experience? If so, will that be experience in my field or something totally unrelated? Are those responsible for my evaluation familiar with the standards the students are expected to achieve? If not I would question, seriously, the reliability of their judgment.”

When you think about it, there is a good reason many teachers are suspicious and fearful about the performance-based evaluation required by the federal Race to the Top education funding program. If they are unconvinced of the fairness or validity of evaluation by their own administrators, think how much less confident they must be about the judgment of state legislators (who have changed laws to comply with RttT) in this matter.


  1. What a thoughtful and insightful post. And discouraging, too. I especially appreciate your comment about all the things we learn in school that are not on tests.

    You don't offer any alternatives to measuring teacher competence, so here's mine. Many real-world educational programs have a component where the students evaluate the teacher at the end of the course. This is something that happens in most professional training courses, and even in the docent training course at the public garden where I volunteer.

    Now, I know, many students, particularly in K-12, would use an opportunity to evaluate their teacher as an opportunity for revenge. But if such evaluations were reviewed not only by parents but also by other teachers and administrators in the same school, I think you might end up with some valid information.

    How such findings are used - e.g. to adjust teacher salaries - is another question. But surely there are poor teachers, and there are others in the same school who know who the poor teachers are.

  2. There's also research showing that such student evaluations essentially come down to two factors, at least in colleges: did the raters think the professor was (1) nice and (2) organized? While I believe people generally can learn more when they are comfortable and know what's going on, I also recall the "hard" teachers that I did not like much at first but really came to appreciate later. I'm not sure whether that "later" was by semester's end....