Now please understand, Dr. Popham has worked in measurement for many years and describes himself as a "reformed test builder," presumably implying some sort of 12-step program. Despite this, or at least as a prelude to this, Dr. Popham has been very influential in assessment. He was an expert witness in the landmark "Debra P." case in Florida, and was involved in the early days of teacher certification in Texas and elsewhere. He is also the author of numerous publications.
Over the years I have listened to Jim say some outrageous things. For those of you who know Jim, this is no surprise. He is quite a presenter and, I suspect, basks a little too much in the glow of his own outrageousness. However, many of the things I have heard him say (at the Florida Educational Research Association [FERA] meeting, for example) were just plain incorrect. I won't bother you with the specifics as I am sure Dr. Popham would claim he is correct. Yet, it does put me in a quandary. Despite his recent statements, I actually have to agree with what Dr. Popham said at the ACT-CASMA conference back in November.
Jim's theme, one he has articulated in multiple venues, concerned what he calls "instructional sensitivity." Here are the basic tenets of his argument:
"A cornerstone of test-based educational accountability:
Higher scores indicate effective instruction; lower scores indicate the opposite."
"Almost all of today's accountability tests are unable to ascertain instructional quality. That is, they are instructionally insensitive."
"If accountability tests can't distinguish among varied levels of instructional quality, then schools and districts are inaccurately evaluated, and bad educational things happen in classrooms."
I keep returning to this theme. While I make a living building assessments of all types, recently most of my efforts and those of my colleagues have been with assessments supporting NCLB, which are "instructionally insensitive" according to Dr. Popham. It is hard to believe that any assessment that asks three or four questions regarding a specific aspect of the content standards or benchmarks (and by the way does so only once a year) can be very sensitive to changes in student behavior due to instruction on that content. At the same time, having some experience teaching, testing, and improving student learning, I have seen the power that measures just like these have for teachers who know what to do with the data and have a plan to improve instruction.
Hence my dilemma: why do I keep returning to Dr. Popham's argument? While I am not ready to admit I might have been wrong to dismiss Jim as a "reformed test builder" and to ignore his rants, I do admit he has a valid point to some extent regarding instructional sensitivity. I suppose I would have called his argument "the instructional insensitivity of large-scale assessments," but who am I to quibble with vocabulary?
Dr. W. James Popham, Professor Emeritus from UCLA, welcomes all "suggestions, observations, or castigations regarding this topic...." Contact him at wpopham@ucla.edu. Or send an email to TrueScores@Pearson.com, and I will forward it to him.