Monday, August 01, 2005

Why not make standard setting scientific?

Standard setting methods have proliferated over the years. Research has focused on evaluating current standard-setting methods, improving these methods, and discovering which methods are suitable for different situations. See the 1996 book, Setting Performance Standards by Greg Cizek, for a review of most of these methods.

In the fourteenth century, William of Ockham noted: "Pluralitas non est ponenda sine necessitate," which translates as "entities should not be multiplied unnecessarily." Any casual observer would certainly note the multiplication of standard setting methods, though the judgment of whether it is unnecessary remains to be made.

I have noticed that both proponents and opponents of standard setting approaches have used intuitive explanatory models of how standard setting judges think as an Ockham's razor for appraising standard setting methods. Critics have used intuitive models of judges' thinking to argue against the use of some methods. For example, the National Academy of Education (NAE) used intuitive models of judges' thinking to argue against the use of the modified Angoff method. The NAE made the claim, one that can only be based on a model of how judges think during standard setting, that estimating the probability that a borderline test taker will answer an item correctly is too difficult a task for judges to do effectively. This claim was one source of support for the conclusion that the modified Angoff method was fundamentally flawed.

Alternatively, proponents have used intuitive explanatory models of judges' thinking to argue in favor of other standard setting methods. For example, Impara and Plake, writing in the Journal of Educational Measurement, made two claims: 1) judges may have difficulty conceptualizing hypothetical test takers, and 2) judges may have difficulty estimating proportion correct. Like the NAE's claims, these can only be based on assumptions about how judges think. These claims served as the rationale for proposing and testing two variations on the way the Angoff method is typically applied.

Arguments around standard setting methods seem always to depend on intuition because no formal explanatory model of judges' thinking exists. So, someone arguing for or against any standard setting approach has no shared, public criterion that might serve as a foundation for criticism or praise. Standard setting research and practice has no Ockham's razor against which to judge standard setting methods. Why not?

An earlier post to this blog (Those Pesky Performance Standards, Friday, May 20, 2005) noted that standard setting results in arbitrary, but not capricious, judgments, and mused that all of the research and rhetoric using the results or outcomes of such judgmental procedures may not be worth the effort. But the validity of standard setting results lies in the procedure, not the results. We should understand how judges think during these procedures and not just go on hunches.

Why do educational researchers keep making these claims about how standard setting judges think, yet fail to do the research needed to support a scientifically based model of how judges really think? Models of how people think have been proposed and tested in many other areas; why not in standard setting?
