While most people argue that you have to earn the respect you are given, this is not always the case. Take, for example, the hard-working, informative, creative and open-ended test item type commonly known as the "griddable item." This item type gets no respect. In fact, my guess is you don't even know what I mean when I refer to a griddable item. Let me elaborate.
When criterion-referenced and mastery testing were all the rage back in the late '60s and early '70s, most bashers of multiple-choice or supply-only assessment items came crawling out of the woodwork. Now remember, this was prior to high-stakes assessment, so most tests were loved by all! In response, assessment developers looked to "enhance" objective measures by making them more "authentic." One way to do this while keeping the advantages of machine scoring was to pose an open-ended item (say, a multiple-step mathematics problem) and to place a grid on the response document, similar to the grid you might use to fill in your name or date of birth. Once the student solved the math problem and presumably reached the one correct answer in the one accepted format, he or she could grid that answer on the document. What a great idea! Boy, did people hate it and, as far as I can tell, people still hate it today.
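To make the machine-scoring advantage concrete, here is a minimal sketch of how a scanned grid might be decoded and scored. The grid layout (one filled bubble per column, drawn from `/`, `.` and the digits 0 to 9) and the function names are hypothetical, chosen only for illustration. The sketch also assumes that numerically equivalent entries such as `1/2` and `.5` both earn credit; a program that requires one exact format would compare the decoded strings instead.

```python
from fractions import Fraction

# Hypothetical bubble rows for each grid column (an assumption, not a real answer-document spec).
GRID_SYMBOLS = ["/", ".", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]


def decode_grid(columns):
    """Turn scanned columns into the gridded string.

    Each column is the index of the filled bubble in GRID_SYMBOLS,
    or None if the student left that column blank.
    """
    return "".join(GRID_SYMBOLS[i] for i in columns if i is not None)


def to_number(text):
    """Parse a gridded entry such as '12', '.5' or '3/4' as an exact Fraction.

    Returns None for entries that cannot be read as a number (blank, '//', '1/0').
    """
    try:
        return Fraction(text)
    except (ValueError, ZeroDivisionError):
        return None


def score_griddable(columns, key):
    """Return 1 if the gridded entry equals the key numerically, else 0."""
    value = to_number(decode_grid(columns))
    return int(value is not None and value == Fraction(key))


if __name__ == "__main__":
    # Key is 1/2; the student gridded ".5" in the first two of four columns.
    print(score_griddable([1, 7, None, None], "1/2"))
```

Because the answer is an exact value rather than a choice among options, the scanner needs no rubric or human reader, which is the economy the griddable format was meant to preserve.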
Pearson has investigated the griddable item in all manner of ways (see Pearson Research Bulletin #3), very little of which has generated much interest. For example, when Pearson was advising the Florida Department of Education on this issue, the program's critics perceived the griddable item as an "ineffective" attempt to "legitimize" a large-scale objective assessment as measuring "authentic" and meaningful content (i.e., including performance tasks) when it did not. This really seemed to be a policy and/or political battle that pitted the proponents of performance tasks, who wanted rich embedded assessments, against the policy makers, who wanted economical and psychometrically defensible measures. It is too bad griddable research did not carry the day.
Another issue with griddables seems to be their content classification. Multiple-step mathematics problems, for example, are likely to match more than one cell of a content classification. Furthermore, depending on how they are classified, the substeps are not likely to reach a Depth of Knowledge (DOK) of 3 even if the item as a whole does. Finally, psychometricians using IRT to calibrate griddable items have raised some concerns. The argument goes like this: unless you are using the Rasch model, traditional multiple-choice items will require a 3PL model, but because there is no guessing associated with a griddable item, these items will require a 2PL model with no pseudo-guessing parameter. (We will save the argument about forcing the c-parameter to zero rather than moving to a mixed model for another blog.) Add the inevitable sprinkling of two- and three-category open-response items, and the mixed model becomes a burden that might not be justified given the relatively few gridded items. Other attributes of the griddable item are delineated in Pearson Research Bulletin #3.
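The calibration argument is easier to see with the item response functions written out. The sketch below uses the 3PL for multiple-choice items, the 2PL (the 3PL with c fixed at zero) for griddables, and, as an assumption since the discussion above does not name a polytomous model, the generalized partial credit model (GPCM) for the two- and three-category open-response items. It uses the logistic metric without the 1.7 scaling constant, and every item parameter in the usage line is hypothetical.

```python
import math


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response; c is the pseudo-guessing lower asymptote."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def p_2pl(theta, a, b):
    """2PL: the 3PL with the pseudo-guessing parameter fixed at zero, as for a griddable item."""
    return p_3pl(theta, a, b, c=0.0)


def p_gpcm(theta, a, steps):
    """GPCM category probabilities for a polytomous item.

    steps holds the step difficulties b_1..b_m: a two-category item has one
    step, a three-category item has two. Returns probabilities for scores 0..m.
    """
    logits = [0.0]  # score 0 has an empty sum in its numerator
    for b_v in steps:
        logits.append(logits[-1] + a * (theta - b_v))
    total = sum(math.exp(z) for z in logits)
    return [math.exp(z) / total for z in logits]


def expected_score(theta, items):
    """Expected raw score on a mixed-format form at ability theta."""
    total = 0.0
    for kind, params in items:
        if kind == "mc":  # multiple-choice: 3PL
            total += p_3pl(theta, *params)
        elif kind == "grid":  # griddable: 2PL, no guessing
            total += p_2pl(theta, *params)
        elif kind == "open":  # open response: GPCM (an assumed model choice)
            a, steps = params
            total += sum(k * p for k, p in enumerate(p_gpcm(theta, a, steps)))
        else:
            raise ValueError(f"unknown item type: {kind}")
    return total


if __name__ == "__main__":
    # Hypothetical three-item form: one MC, one griddable, one three-category open response.
    form = [("mc", (1.0, 0.0, 0.2)), ("grid", (1.0, 0.0)), ("open", (1.0, [0.0, 0.0]))]
    print(expected_score(0.0, form))
```

Note that a low-ability examinee's success probability on the multiple-choice item never drops below c, while on the griddable item it approaches zero. That difference is why the two item types call for different models, and why adding a third (polytomous) model for a handful of open-response items turns calibration into a mixed-model exercise.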
The point of this blog (clearly a failure, given that I feel the need to remind you of the point I was making) is to get assessment specialists, psychometricians, policy makers and teachers to objectively evaluate the merits of this item type. Another goal is to have my readers consider how griddable items might help assessment become more of a driving force for good instruction. These remain the goals of this blog, despite the fact that griddable items get no respect.