Monday, June 20, 2005

Formative Assessments Link Measurement and Instruction

Many presentations at the annual CCSSO conference on Large-Scale Assessment, taking place here in sunny San Antonio, reference the need to expand testing to include more formative assessments.

Pearson Educational Measurement has our own set of formative assessments known as PASeries. A white paper describing how PASeries was developed and how it might be used to improve student learning, along with other information regarding PASeries, is available here at the conference and has proven quite popular. Check out PASeries and see for yourself.

Wednesday, June 15, 2005

Models, Measurement and Learning

For years, the measurement community has debated which of a virtually limitless number of mathematical models is most appropriate for a given measurement activity. I remember the very heated discussion between Ben Wright and Ron Hambleton at an AERA/NCME conference not too long ago. Ben spoke of "objective measurement." Ron spoke of "representing the integrity" of measurement practitioners. Both sides had their points and the debate still continues in some circles.

In this age of "standards referenced" assessment, the selection of a measurement model is not only an academic debate. Think for a minute about a standards referenced test where six of the 60 items come from a particular content domain (say Mathematical Operations). For the teacher, student, and accountability folks, this means that 10% of the emphasis of the test is on operations (6/60 = 10 percent). However, measurement practitioners know that by selecting an IRT measurement model other than Rasch, each of these items is weighted by its slope parameter. Pattern scoring then essentially guarantees that the effective weight of mathematical operations on the test will not be 10%. This is because the contribution of each item to the total measure (the resulting theta value) is weighted by its discrimination (slope) parameter. So, if the operations items do not do a good job of discriminating between high and low overall test scorers, these items are likely to contribute far less than expected to the total ability measure. While this might not be as big a deal when number-correct scoring is used, the effects of this weighting are still present in activities such as equating and scaling.
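
To make that weighting concrete, here is a minimal sketch in Python. It assumes a two-parameter logistic (2PL) model, where maximum-likelihood pattern scoring weights each item response by its slope, so an item's effective contribution to theta is roughly proportional to its discrimination parameter. The slope values below are made up purely for illustration.

```python
import numpy as np

# Hypothetical 60-item test; items 0-5 are the "Mathematical Operations" items.
# Under a 2PL model, the maximum-likelihood estimating equation for theta is
#   sum_i a_i * (u_i - P_i(theta)) = 0,
# so each item's response is weighted by its discrimination (slope) a_i.

rng = np.random.default_rng(42)

n_items = 60
a = rng.uniform(0.8, 2.0, size=n_items)  # assumed slopes for the other 54 items
a[:6] = 0.5                              # suppose the 6 operations items discriminate poorly

nominal_weight = 6 / n_items             # 6/60 = 10% by item count
effective_weight = a[:6].sum() / a.sum() # operations items' share of total slope weight

print(f"Nominal weight of operations items:         {nominal_weight:.1%}")
print(f"Effective weight under 2PL pattern scoring: {effective_weight:.1%}")
```

With these made-up slopes, the operations items carry roughly 4% of the effective weight rather than the nominal 10%. Under the Rasch model, where all slopes are equal, the effective weight would match the nominal 10% by construction.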

I think we as measurement experts need to be cognizant of some distinctions when debating psychometric issues. First, there is the psychometric or mathematical aspect: Which model fits the data better? Which is most defensible and practical? And which is most parsimonious? Often, I fear, psychometricians decide before seeing the data, with almost religious zeal, which model is "correct." The second aspect is instructional: Are we measuring students in the way their cognitive processes function? Are we controlling for construct-irrelevant variance? And do our measures make sense in context? Often, I think, psychometricians are too quick to compromise on measures without fully understanding the constructs. Finally, we need to consider the learning aspect: Are we measuring what is being taught, in the way it is being taught, or are we measuring something else entirely (speededness, for example)? Without considering these aspects, at a minimum, we are likely to argue for mathematical models that do not serve our mission of improved student learning.

Just one man's opinion...