TrueScores: Abandoning the Pejorative

Psychometricians often speak of error in educational and psychological measurement. But error seems like such a pejorative term. According to the Merriam-Webster online dictionary, an error is "an act involving an unintentional deviation from truth or accuracy." In ordinary language, error carries unflattering connotations and invites alarming inferences.

Under classical test theory, error is the difference between observed student performance and an underlying "true score." But who has special insight into what is true? As Nichols, Twing, Mueller and O’Malley point out in their recent standard setting article*, human judgment is involved in every step of test development. Typically, a construct or content framework describing what we are trying to test is constructed by experts. But the experts routinely disagree among themselves and certainly disagree with outside experts. The same expert may later disagree with a description they themselves constructed earlier in their career! So what psychometricians call error is just the difference between how one group of experts expects students to perform and how students actually perform.
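For readers who want the notation, here is a rough sketch of the textbook decomposition behind that sentence. The symbols X, T, E and the subscript g are my own labels for illustration and do not come from the article cited above; the variance line also assumes that true scores and errors are uncorrelated, which is the usual classical test theory assumption.

```latex
% Observed score X, true score T, error E (labels assumed for illustration)
X = T + E
\quad\Longrightarrow\quad
E = X - T

% Assuming T and E are uncorrelated, the variances add
\sigma_X^2 = \sigma_T^2 + \sigma_E^2

% The argument of this post: "true" is whatever expert group g expects,
% so the "error" depends on which group is consulted
E_g = X - T_g
```

Read this way, changing the expert group changes T_g, and therefore changes E_g, without any change in what the students actually did.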

Psychometricians might even be able to control the amount of "error" by careful selection of what they declare is true. After all, the size of the difference between the experts' description and student performance, or error, depends on which experts you listen to and when you catch them. A strategy for decreasing error might be to gather a set of experts' descriptions and declare as “true” the description that shows the smallest difference between how the experts expect students to perform and how students actually perform. Who is to say this group of experts is right and that group of experts is wrong?
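The strategy above can be sketched in a few lines of Python. Everything here is hypothetical: the panel names, the scores, and the choice of mean squared difference as the measure of "error" are assumptions made for illustration, not a method taken from any published procedure.

```python
from statistics import mean


def mean_squared_difference(expected, observed):
    """Average squared gap between expected and observed scores."""
    return mean((o - e) ** 2 for e, o in zip(expected, observed))


def pick_smallest_error_panel(panels, observed):
    """Return the name of the panel whose expectations sit closest to the observed scores."""
    return min(panels, key=lambda name: mean_squared_difference(panels[name], observed))


# Hypothetical observed scores for five students (invented for illustration).
observed = [12, 15, 9, 18, 14]

# Hypothetical expected scores from three expert panels (also invented).
panels = {
    "panel_a": [14, 16, 12, 19, 15],
    "panel_b": [12, 14, 10, 17, 14],
    "panel_c": [10, 18, 7, 20, 11],
}

# "Error" for each panel, then the panel we would conveniently declare "true".
errors = {name: mean_squared_difference(exp, observed) for name, exp in panels.items()}
chosen = pick_smallest_error_panel(panels, observed)

print(errors)
print(chosen)
```

With these invented numbers the sketch selects panel_b. The students' performance is identical in all three comparisons; only the choice of whose expectations count as "true" changes the size of the "error."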

So let's agree to abandon the pejorative label "error" variance. The variance between how one group of experts expects students to perform and how students actually perform might be more appropriately referred to as "unexpected" or "irrelevant." Let's recognize that this variance is irrelevant to the description constructed by the experts, i.e., it is construct-irrelevant variance. Certainly this unexpected variance threatens the interpretation of student performance using the experts' description. But does the unexpected deserve the pejorative label "error"?

*Nichols, P. D., Twing, J., Mueller, C. D., & O’Malley, K. (2010). Standard setting methods as measurement processes. Educational Measurement: Issues and Practice, 29, 14-24.

Paul Nichols, PhD
Vice President
Psychometric & Research Services
Assessment & Information
Pearson

Wednesday, June 30, 2010

Copyright © 2010 Pearson Education, Inc. or its affiliate(s). All rights reserved.