Thursday, August 25, 2005

Designing Assessment Information to Support Teachers

A few years ago at a CASMA conference, Bob Linn made a presentation in which he noted that states faced a challenge in meeting their NCLB goals. The states have accepted that challenge, and they are rolling up their sleeves and getting to work. And they are looking to the educational measurement community to help them in this task. States are asking the educational measurement community to provide teachers the assessment information they need to achieve NCLB targets. In response, PEM has offered the PASeries to help states and districts meet their NCLB goals.

Before we can better help states meet their AYP goals, we must ourselves be able to answer two questions. What is the information teachers can use to meet NCLB achievement targets? And how do we effectively communicate that information? The design and configuration of assessment information that works well for teachers and helps support their work in the classroom, rather than make it more complicated, should be tackled systematically. But design ideas for assessment information are neither obvious nor effective when they are based on psychometric considerations alone. The design and configuration of assessment information that works well for teachers requires understanding how teachers work and what kind of results instructional practices obtain. Neither classical test theory nor IRT addresses teaching and instructional practices. But socio-cultural theory may provide a framework to work out answers to these questions.

What is socio-cultural theory? Typically, socio-cultural theory is associated with Vygotsky and individual learning. Vygotsky maintained the child follows the adult's example and gradually develops the ability to do certain tasks without help or assistance. He called the difference between what a child can do with help and what he or she can do without guidance the "zone of proximal development." However, socio-cultural theory has been transferred and extended to industrial design and product development. For example, the paper by Aula, Pekkala, and Romppainen outlines a research approach to designing successful products by recognizing the end users’ needs and expectations. This approach is part of the National Science Foundation’s research into the implementation of design theory to advance the product realization process. Some of this research is being funded by the Division of Design, Manufacture and Industrial Innovation.

In educational measurement, we can extend socio-cultural theory to the design of assessment information that supports teachers’ work in the classroom and helps teachers meet NCLB achievement targets. Our charge would be to develop theories and produce findings that are pertinent to understanding the design, development and implementation of usable assessment information systems. Such theories and findings would answer questions such as:

  • What kind of instructional practices are best able to take advantage of what kind of assessment information? For example, teachers whose only instructional strategy is to reteach a unit are able to use a different kind of assessment information than teachers who have available different instructional strategies for students with different misconceptions.
  • What kind of assessment information is best suited to inform instruction on what kind of learning? For example, different assessment information might be better suited to inform instruction of a procedure, such as the subtraction of multi-digit numbers, than the assessment information that is better suited to inform the instruction of conceptual understanding, such as the structure of the U.S. government.

We as educational measurement professionals have much work to do before we can identify a teacher’s “zone of instructional development” for assessment information. But we cannot give educators the same kind of response as Henry Ford gave to car buyers when asked what color cars were available: “Any color – so long as it’s black.” If we hope to help educators improve children’s learning, we must be able to design assessment information by recognizing teachers’ needs and expectations. And educators are pleading for, even demanding, this kind of information.

Tuesday, August 23, 2005

Reading IS Fundamental

The Second Annual Lexile© National Reading Conference is in the books, pun intended. I attended, as I did last year, and was once again impressed by what I discovered. Sure, Greg Cizek's presentation about "Testing Myths" was enjoyable and informative. My presentation criticizing NCLB not allowing "off level" reading assessment was novel if not interesting (though it was well attended). Quality Quinn, Lou Fabrizio and Malbert Smith all provided very informative and instructional presentations. All of these were worth the price of admission alone. However, what impressed me the most was the desire of the attendees to read! Teachers were buying books, with their own money, to give to "troubled readers" in their classrooms. Malbert Smith talked about that "parasite" we have in our homes, the television, that robs us of intellect. Reading teachers agreed that the best way to teach reading was to get children to read. Assessment developers understood the needs of the reading specialists! In short, it was utopia. Well, short of utopia, it was very exciting to see people paying attention to reading and reading instruction. I hope you can attend next year, but in the meantime, pay attention to reading.

Monday, August 08, 2005

Is AERA Too Big to be Useful?

Recently, I attended the CCSSO Large-Scale Assessment (LSA) Conference held this June in San Antonio, Texas. On the airplane ride home, I had a moment to compare the experiences of the two conferences I have attended in 2005: the CCSSO LSA conference and the AERA conference held this April in Montreal, Canada.

Take, for example, the access to sessions. The CCSSO LSA conference was held in one hotel, and all the presentation rooms were on one floor. I was able to leisurely stroll from session to session, and rooms were easy to find. I was able to attend nearly all the sessions that attracted my interest. And, I didn’t once have to leave the air-conditioned comfort of the hotel.

The AERA conference was spread across a handful of hotels located blocks away from one another, each with multiple conference rooms on different floors. I had to run from hotel to hotel, and struggle to learn multiple layouts of floors and rooms. I missed many of the sessions I wanted to attend because they were too far from the last session I attended, or because they were scheduled at the same time as another session. On top of that, I walked blocks and blocks in the cold and the rain.

As another example, consider the interaction with colleagues. The CCSSO LSA conference had a number of opportunities to talk with colleagues. Because the conference was held in one hotel, I constantly crossed paths with colleagues between sessions. Furthermore, I could easily arrange to meet friends and colleagues in the mornings or evenings because nearly all of us were staying in the conference hotel. In addition, at least one reception was held every night, providing a relaxing atmosphere in which to meet and talk.

The AERA conference was attended by more of my colleagues, but I crossed paths with fewer of them. When I did, it was often in the middle of a crosswalk as I ran from session to session. Friends and colleagues were scattered across the city at different hotels, sometimes miles apart. It was difficult to find people and more difficult to arrange meetings.

Not surprisingly, I enjoyed the CCSSO LSA conference more than I did the AERA conference. I saw more of who and what I wanted to see. I did so in a relaxed and comfortable environment. I don’t intend this as a rap against either Montreal or the AERA staff. Montreal is a great city, and the AERA staff are always pleasant and hard working. But the AERA conference has grown so large that it has outgrown being a meeting for professional growth and exchange. An alternative should be considered that is more intimate, perhaps more like the CCSSO LSA conference in size and format.

Thursday, August 04, 2005

Is It Fact or Process?

A friend of mine and I were recently discussing some aspect of mathematics instruction. I wanted to talk about the "number line" and he wanted to talk about "math facts". Perhaps I was being a bit ornery (quite contrary actually), but I looked at him with a blank stare and asked what he meant by "facts." He said, "You know...facts, like two times two is equal to four." Since I started down this path, I continued. So I replied, "Well, actually, two times two is really a concept. The concept of the successive addition of two for a total of two cycles." He became quite agitated and said, "No! It is a fact, you either know it or you don't." That is when I drew a matrix of 1-9 across the top of a piece of paper and 1-9 down the side and showed him how this matrix provided the "facts" he claimed without really "knowing" anything (other than how to draw the matrix). My friend then noticed that I was trying to teach him about process and he was trying to teach me about facts, and that we were getting nowhere fast. As such, he changed the topic to history, which I am sure he thought was a safe subject. "Math is no different than history," he said. "It's all about knowing the facts and sequencing them correctly." I said, "Really? Then if you list the League of Nations before the United Nations on some timeline you have demonstrated knowledge of history?" My friend was skeptical (and annoyed) and did not answer. I told him that in reality, it might very well be important to understand the impact the League of Nations had on the development of the United Nations if you were going to use history to help understand current events and/or future events.

At this point we decided to end the conversation before anyone got really mad. In departing, he did take one last shot. He said, "It's just like with testing...all you have to do is figure out if the kids know the facts." And I asked him, "Perhaps, but what process do you want to use? Multiple-choice, short answer, essay...?"

Perhaps the next time I see my friend we will talk will likely lead to a simpler discussion!

Monday, August 01, 2005

Why not make standard setting scientific?

The number of different standard setting methods has proliferated over the years. Research has focused on evaluating current standard-setting methods, improving these methods, and discovering which methods are suitable for different situations. See the 1996 book, Setting Performance Standards by Greg Cizek for a review of most of these methods.

In the fourteenth century, William of Ockham noted: "Pluralitas non est ponenda sine necessitate", which translates as "entities should not be multiplied unnecessarily." Any casual observer would certainly note the multiplication of standard setting methods, though the judgment of being unnecessary remains to be made.

I have noticed that both proponents and opponents of standard setting approaches have used intuitive explanatory models of how standard setting judges think as an Ockham’s razor to appraise standard setting methods. Critics have used intuitive explanatory models of judges’ thinking to argue against the use of some standard setting methods. For example, the National Academy of Education used intuitive models of judges’ thinking to argue against the use of the modified Angoff method. The NAE claimed that estimating the probability that a borderline test taker will answer an item correctly is a task too difficult for judges to do effectively, a claim that can only be based on a model of how judges think during standard setting. This claim was one source of support for the conclusion that the modified Angoff method was fundamentally flawed.

Alternatively, proponents have used intuitive explanatory models of judges’ thinking to argue in favor of the use of other standard setting methods. For example, Impara and Plake in the Journal of Educational Measurement made the claim that 1) judges may have difficulty conceptualizing hypothetical test takers, and 2) judges may have difficulty estimating proportion correct. Like the NAE’s claims, these claims can only be based on assumptions of how judges are thinking. These claims were used as rationale for proposing and testing two variations in the way the Angoff method is typically applied.

Arguments around standard setting methods seem always to depend on intuition because no formal explanatory model of judges’ thinking exists. So, someone arguing for or against any standard setting approach has no shared, public criterion that might serve as a foundation for criticisms or acclamations. Standard setting research and practice have no Ockham’s razor against which to judge standard setting methods. Why not?

An earlier post to this blog (Those Pesky Performance Standards, Friday, May 20, 2005) noted that standard setting resulted in arbitrary, but not capricious, judgments, and mused that all of the research and rhetoric using the results or outcomes of such judgmental procedures may not be worth the effort. But the validity of standard setting results lies in the procedure, not the results. We should understand how judges think during these procedures and not just go on hunches.

Why do educational researchers keep making these claims about how standard setting judges think, but fail to do the research to support a scientifically based model of how standard setting judges really think? Models of how people think have been proposed and tested in many other areas. Why not standard setting?