Thursday, October 27, 2005

True/False, Three of a Kind, Four or More?

I seldom publish my thoughts or reactions to research until after I have lived with, wrestled with, and clearly organized them. I have found that this protects my “incredibly average” intellect and keeps me from looking foolish. However, when it comes to the recent research of my good friend Dr. Michael Rodriguez (Three Options Are Optimal for Multiple-Choice Items: A Meta-Analysis of 80 Years of Research. EM:IP, Summer 2005), I can’t help but share some incomplete thoughts.

Michael’s research is well conducted, well documented, well thought out, and clearly explained. Yet I still can’t help thinking his conclusions are stronger than his evidence supports. Forget what you believe or don’t believe about meta-analytic research. Forget the sorry test questions you have encountered in the past. Forget that Dr. Rodriguez is an expert in research on the construction of achievement tests. Instead, focus on the research presented, the evidence offered, and the conclusions drawn.

For example, consider: “…the potential improvements in tests through the use of 3-option items enable the test developer and user to strengthen several aspects of validity-related arguments” (p. 4). This is strong stuff, and it carries several hidden assumptions. First is the assumption that a fourth or fifth option must not be contributing much to the overall validity of the assessment. Perhaps, but how is this controlled for in the research? Second is the assumption that the time saved by moving to three options will allow more test questions in the allotted time, and therefore broader content coverage. I suspect the time savings may be far smaller than the author expects. Third is the assumption that all distractors must function in an anticipated manner. Dr. Rodriguez reviewed the literature and found that most definitions of a functional distractor require a certain level of item-to-total correlation and a minimum level of endorsement (often at least five percent of the examinee population), among other attributes. It is unlikely in a standards-referenced assessment that all of the content will support such distractor definitions: as more of the standards and benchmarks are mastered, the proportion of examinees choosing the incorrect options (regardless of how many there are) should shrink, essentially undermining such operational definitions of “good distractors.” A rough sketch of how those criteria might be checked appears below.
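To make that operational definition concrete, here is a minimal sketch of how one might flag functional distractors for a single item. The function name, the data, and the five-percent threshold are illustrative assumptions on my part, not anything taken from the paper.

```python
import numpy as np

def distractor_report(choices, totals, key, min_endorsement=0.05):
    """Per-option endorsement rate and option-to-total correlation for
    one item. An incorrect option is flagged 'functional' when it draws
    at least min_endorsement of examinees and correlates negatively
    with total score -- the two criteria most definitions share."""
    choices = np.asarray(choices)
    totals = np.asarray(totals, dtype=float)
    report = {}
    for opt in np.unique(choices):
        chose = (choices == opt).astype(float)  # 1 if examinee picked opt
        p = chose.mean()                        # endorsement rate
        r = np.corrcoef(chose, totals)[0, 1]    # point-biserial with total
        report[opt] = {
            "endorsement": round(float(p), 3),
            "r_with_total": round(float(r), 3),
            "functional": bool(opt != key and p >= min_endorsement and r < 0),
        }
    return report

# Made-up responses to one item keyed "B":
choices = ["B", "A", "B", "C", "B", "D", "B", "A", "C", "B"]
totals = [34, 21, 40, 18, 37, 15, 42, 25, 22, 38]
for opt, stats in sorted(distractor_report(choices, totals, key="B").items()):
    print(opt, stats)
```

Notice that the flag depends on the population as much as on the item: raise everyone’s mastery and the distractors’ endorsement rates fall below the threshold, which is precisely my point above.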

Finally, another concern I have with the strong conclusion that three options are optimal is that this was a meta-analytic study: all of the data came from existing assessments. I agree with the cited point from Haladyna and Downing (Validity of a Taxonomy of Multiple-Choice Item-Writing Rules. Applied Measurement in Education, 2, 51-78) that the key is not the number of options but the quality of the options. So does the current research mean to imply that, because few items have more than two functional distractors, three options are best? Or does it mean that, given four or five equally valuable options, three are still best? If the former, would it not make more sense to write better test questions? If the latter, isn’t controlled experimentation required? A toy sketch of what such an experiment would have to hold constant follows.
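For what it’s worth, here is a toy sketch, entirely my own and not anything from the paper, of the kind of controlled comparison I have in mind: vary the number of options while forcing every distractor to be equally attractive, then watch what guessing does to the scores.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_scores(n_examinees, n_items, n_options, p_know=0.6):
    """Toy knowledge-or-guess model: an examinee answers an item
    correctly if they know it (probability p_know) or, failing that,
    guesses uniformly among the n_options. Equal distractor quality
    is imposed by construction."""
    know = rng.random((n_examinees, n_items)) < p_know
    lucky = rng.random((n_examinees, n_items)) < 1.0 / n_options
    return (know | (~know & lucky)).sum(axis=1)

for k in (3, 4, 5):
    s = simulate_scores(n_examinees=2000, n_items=40, n_options=k)
    print(f"{k} options: mean score {s.mean():5.1f}, sd {s.std():5.2f}")
```

Under this deliberately crude model, dropping from five options to three raises the chance-score floor (the expected score is n_items × (p_know + (1 − p_know)/n_options)). Whether real three-option items pay for that with better distractors or more items per hour is exactly the question an experiment, rather than a reanalysis of existing tests, would settle.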

Please do not mistake my discussion for a criticism of the research. On the contrary, this research has motivated me to pay more attention to things I learned long ago and put into practice almost daily. That is exactly what research should do: generate discussion. I will continue to read and study the research, and perhaps, in the near future, you will see another post from me on this topic.
