Friday, February 25, 2011

Old Dogs Need New Tricks


As soon as I finished graduate school my husband and I got two puppies, small Italian Greyhounds named Cyclone and Thunder. We named them after the mascots from our undergraduate universities. We took them everywhere with us and spoiled them rotten until the kids arrived. Once we had a toddler walking around the house, it was obvious that these old dogs needed to learn some new tricks. We had to teach them not to jump up on the kids or lick their faces. We had gotten small dogs because we weren’t going to have a lot of time to really train them, but now we knew we were going to have to be creative in teaching them that it really wasn’t okay to treat a toddler like a lollipop. It was going to be a challenge, but we were motivated.

Recently I was able to attend an event related to one of my challenges at work. The Center for K-12 Assessment and Performance Management at ETS hosted a research symposium on Through-Course Summative Assessments. Attendance was by invitation only, and each organization could send only one or two participants. I was excited to represent Pearson among such a distinguished group of researchers and thinkers. Through-course summative assessment poses some incredible challenges for traditional psychometrics, and I was eager to hear recommendations from leaders in the field on issues such as how to handle reliability, validity, scoring, scaling, and growth measures in these types of assessments. Instead, the eight papers generally focused on identifying issues and challenges, and most of their recommendations called for further research. Very few solutions were proposed, and many of those did not seem viable for large-scale testing. It was clear to me that, just like my Italian Greyhounds, we old psychometricians will need some new tricks.

Although most of the papers focused on fairly technical measurement topics, the audience at the symposium was a mix of technical and policy experts, and the tension between those viewpoints was evident throughout the conference. As laid out in the opening presentations on the policy context, the next-generation assessment designs proposed by the Partnership for Assessment of Readiness for College and Careers (PARCC) and by the SMARTER Balanced Assessment Consortium (SBAC), complete with formative, interim, through-course, summative, and performance components, will be used to:

  • signal and model good instruction,
  • evaluate teachers and schools,
  • show student growth and status toward college and career readiness,
  • diagnose student strengths and weaknesses to aid instructional decision making,
  • be accessible to all student populations regardless of language proficiency or disability status, and
  • allow the United States to compete with other nations in a global economy.
That’s a tall order! It’s like trying to teach a puppy to sit, stay, roll over, and fetch at the same time.

The policy goals and several of the desired uses of the assessments are clear. What is less clear is which psychometric models can support these claims. It was mentioned more than once at the conference that if a test has too many purposes, it is unlikely that any of them will be well served. It is clear to me, however, that the new assessments will be used for all of those purposes, and the assessment community must find a way to support them.

Too often the psychometric mantra has been “Just say no.” If you recall, that was the slogan of the anti-drug campaign of the 1980s and 1990s. It’s time to move into the 21st century. Assessments will be used for more than identifying how much grade-level content a student has mastered. We may not have originally developed assessments for evaluating teachers, but they are used for this and will continue to be. In the same way, high school assessments will be used to predict readiness for college and careers. Policy makers are asking for our help to design assessments that serve a variety of purposes and to provide validity evidence for those uses. No, the assessments may not have the same level of standardization and tight controls, but they can still be better than an alternative design that excludes psychometrics entirely.

There is already mistrust of testing and an overload of data. Moving forward, we need to work with teachers, campus leaders, parents, and the community to involve them more fully in the testing process, particularly in reporting and interpreting test results. Tests should not be administered simply to check off a requirement. The data they produce should inform instruction, student progress, campus development, and more. The assessments are not isolated events but part of a larger educational system of instruction and assessment whose goal is preparing students for college and careers. This is a worthy goal. As a trained psychometrician, I also struggle with how far we can push the boundaries in pursuit of it before we’ve stepped over the line. If I bathe my kids in lemon juice to keep the dogs from licking them, have I gone too far? It may seem like a crazy idea, but I can’t ignore the need to think differently.

Indeed, the next-generation assessments, including through-course summative assessments, will bring new challenges and opportunities for psychometrics and research. That research, however, must focus on solving the practical challenges the assessment consortia will face. States are looking to us to be creative and propose solutions, not to compile a laundry list of problems. There is no perfect solution. Instead, psychometrics must step forward with innovative assessment solutions that balance competing priorities and bring us closer to the goal of improving education for all students. We must continue to do research and use it to refine and update the assessment systems.

As Stan Heffner, Associate Superintendent for the Center for Curriculum and Assessment at the Ohio Department of Education, said in his presentation, “This is a time to be propelled by ‘what should be’ instead of limited by ‘what is’.”

He was too polite to really say it, but I think he meant that old dogs need new tricks.

Katie McClarty
Senior Research Scientist
Test, Measurement & Research Services
Assessment and Information
Pearson
