
Wednesday, July 09, 2008

Why I Stopped Reading Editorials

I gave up reading editorials quite a long time ago. Not because they are too often misleading or inaccurate (many of them are), or that they are too often purposely written to be controversial and sensational (again, many of them are). Rather, I quit reading because the whole purpose of editorials seems rather futile to me.

Let me explain. People who write editorials usually have a strong position with reasons and rationales why they feel that particular way. Informed readers of editorials either agree with that position and its reasons and rationales, or they disagree, usually from a strong position with a direct opposite viewpoint from that of the editorial writer. In either case, the editorial does little to change someone’s opinion; it just stirs up a lot of emotion. Therefore, the only people who might benefit from reading editorials are those who have not yet made up their minds. However, if the topic is well enough defined to cause a debate in the editorial pages, I wonder how many people really have no position or opinion. Hence, the futility. So I just quit reading them.

Occasionally, friends, family, colleagues or even readers of TrueScores send me editorials and ask for a reaction or an opinion. Not too long ago this happened regarding Dr. Chris Domaleski's op-ed “Tests: Some good news that you may have missed,” from May 29, 2008, in the Atlanta Journal-Constitution. Chris is a colleague, customer, and friend of mine; and I found his comments to be very well written and well supported, and his message very helpful for all those impacted by testing in Georgia. His message, simplified and summarized, was: testing is complicated, necessary, and beneficial, and ill-informed rhetoric does not help improve learning. (This is my summary of his message and not his own words.)

Unfortunately, it would seem the ill-informed rhetoric continues. I am referring to Michael Moore's post on SavannahNOW, called "Politics of school testing." It is too bad, but apparently Mr. Moore did not read Dr. Domaleski's comments. First, Mr. Moore claims that the state "blindsided" the schools regarding the poor results on the state’s CRCT. I don't know how this can be, as the law of the land has required that states move to "rigorous" content standards and, further, this law expects that no child is left behind in attaining those standards. Georgia has implemented a new curriculum with teacher and educator input. Field testing, data review, content review, and alignment reviews have been conducted by educators across Georgia. All of this was conducted under the "Peer Review" requirements of the federal NCLB legislation. Passing standards were established with impact data and sanctioned by the State Board of Education. How can anyone be blindsided by such an open and public action?

Mr. Moore also states that he has seen no analysis of the assessment and no discussion of how "...a curriculum and test can be so far out of line." Hmm… I wonder if Mr. Moore is not more upset with the poor performance of the students. It could be the curriculum and the assessment fit very well together. In fact, the required alignment studies, as well as educators working with the Department to review the items, should ensure they are aligned. Since the curriculum is new, perhaps the students have not learned it as well as they should.

Mr. Moore then mistakenly claims that the CRCT test in Georgia is constructed out of a huge bank of questions the test service provider (in this case CTB/McGraw-Hill) owns and is part of a "...larger national agenda." I am not much into conspiracy theories, but a quick review of the solicitation seeking contractor help would reveal that the test questions are to be created for use and ownership in Georgia only. Mr. Moore also claims that the multiple-choice format "...seldom reflect the actual goals of the standards." I admit, some things are difficult to measure with multiple-choice test questions—for example, direct student writing—yet many aspects of the learning system do lend themselves to objective assessment via multiple-choice and other objective test questions.

I don't want to get into a debate with Mr. Moore about how the State of Georgia manages the trade-offs between budget pressures (multiple-choice questions are much less expensive in total than subjective but rich open-ended responses) and coverage of the more difficult aspects of the curriculum he outlines, such as inquiry-based activities. It is an oversimplification, however, to dismiss the issues and suggest or imply that all would be well if Georgia abandoned objective measures.

At the end of the day, I disagree with Mr. Moore and agree with Dr. Domaleski that less rhetoric and more fact-based discussions are needed. If we build the test to measure the curriculum, and the curriculum is new and rigorous, it is unlikely that students will perform well at first. If we build a test where all students perform well, what good does a new and rigorous curriculum get us? Students will receive credit without learning.

Wednesday, June 04, 2008

The Academic Debate about Formative Assessments

There are some things in educational measurement that are not debated. Foremost, the purpose of instruction is to improve learning. The purpose of assessment is to improve instruction, which in turn improves learning. In other words, it’s all about the learning—debate over.

Some researchers (myself included) have become sloppy with our language, labeling assessments "for learning" to be formative and assessments "of learning" to be summative. So, under this lax jargon, a multiple-choice quiz used by the teacher in the classroom at the end of instruction for the purpose of tailoring additional instruction would be deemed "formative." If you follow the rhetoric from national "experts," technical advisory committees, or other learned people, then I have just offended many!

Currently, there is much discussion regarding formative assessments and the need to balance the multitude of assessments that might be used during a school year. A good place to start might be with the paper by Perie et al. (2006) posted to the CCSSO SCASS website. You and I might not agree with the classifications or the terminology, but the classification scheme used by these authors helps to contextualize the debate quite well and may even allow you to make up your own mind.

What does, however, put peanut butter into my cognitive gears is all the arguing and wasted effort I hear regarding what exactly does or does not constitute a "real" formative assessment. I even heard one nationally recognized measurement expert comment that, by definition, no assessment constructed by anyone other than a teacher can be called a formative assessment. I try to remind myself (and others) that at the end of the day only one thing matters: What have you done to improve learning? I doubt that arguing about definitions of formative, benchmark, or interim assessments helps with this.

Tuesday, April 01, 2008

Len Swanson: Pearson Visiting Scholar

Dr. Len Swanson from ETS was the most recent keynote speaker during the Pearson Visiting Scholars program in Iowa City. Dr. Swanson talked about Computer Adaptive Testing (CAT) and the history of how we got to where we are. Dr. Swanson was a particularly good choice for this presentation as he has worked in CAT since its inception and was "on the floor" when most of the groundwork was laid for what we take for granted today. Pearson was very lucky to have Dr. Swanson’s expertise, which he also shared with students and faculty from the University of Iowa.

Len pointed out that the desire to tailor testing toward individuals was really enabled by the proliferation of IRT methodology, as well as the continued improvements in technology. Early research was provided by think tanks like ETS with funding coming from the Office of Naval Research.

Len provided the following timeline to anchor the history of CAT development:

1980-1984: Computerized college placement tests
1987-1988: Computerized mastery testing (NCARB)
1990-1993: Praxis exam operational
1990-1993: Graduate Record Examinations (GRE) CAT version
1993-1994: NCLEX (National Council Licensure Examination) CAT operational
1994-2008: Statewide CATs implemented at both the district and state levels

When asked about the challenges encountered on the road to operational CAT exams, Dr. Swanson responded that quality item pools, infrastructure, and exam security were the big issues of the day. Funny, isn’t it? Almost fifty years later the same issues are still roadblocks to fully realizing the potential of both computer-based and computer-adaptive testing.

Wednesday, March 19, 2008

More Pearson at AERA/NCME!

Sometimes I forget how big Pearson really is. Here are additional presentations at both the AERA and NCME national conventions.

NCME Papers and presentations

Chu, Kwang-lee, & Lin, Serena Jie
Distracter Rationale Taxonomy: A Formative Evaluation Utilizing Multiple-Choice Distracters

Jirka, Stephen
Test Accommodations and Item-Level Analyses: Mixture DIF Models to Establish Valid Test Score Inferences

Lau, Allen
Evaluating Equivalence of Test Forms in Test Equating With the Random Group Design

Lin, Serena Jie
Examining the Impact of Omitted Responses on Equating

Seo, Daeryong
Exploring the Structure of Achievement Goal Orientations Using Multidimensional Rasch Models

Stephenson, Agnes
Examining Individual Students’ Growth on Two States’ English Language Learners Proficiency Assessments

Using HLM to Examine Growth of English Abilities for ELL Students and Group Differences

Wang, Jane
Modeling Growth: A Longitudinal Study Based on a Vertical Scaled English-Language Proficiency Test

Wang, Shudong
Vertical Scaling: Design and Interpretation

The Sensitivity of Yen’s Q3 Statistics in Detecting Local Item Dependence

NCME Papers and presentations

Arce-Ferrer, Alvaro & Diaz, Ileana
An Experimental Investigation of Rating Scale Construction Guidelines: Do They Work with Spanish-Speaking Populations

Yi, Qing
Item Pool Characteristics and Test Security Control in CAT

Wang, Shudong; Zhang, Liru; Kersteter, Patsy; Bolig, Darlene; Yi, Qing
An Investigation of Linking a State Assessment to the 2003 National Assessment of Educational Progress (NAEP) for 4th and 8th Grade Reading

Arce-Ferrer, Alvaro & Shin, Seon-Hi
Three Approaches to Measuring Individual Growth


Wang, Shudong; Jiao, Hong; & Hi, Wei
Parameter Estimation of One-Parameter Testlet Model

Wang, Shudong & Jiao, Hong
Empirical Evidences of Construct Equivalence of Vertical Scale Across Grades in K-12 Large-Scale Standardized Reading Assessments

Tuesday, March 18, 2008

Pearson Presentations at AERA & NCME

The contingent of Pearson researchers has, once again, done an admirable job of representing our industry at the annual meeting of the American Educational Research Association (AERA) and the National Council on Measurement in Education (NCME) the week of March 24th in New York City.

The following are the AERA paper and symposium submissions:

Jason Meyers & Xiaojin Kong
An Investigation of the Changes in Item Parameter Estimates for Items Re-field Tested

Leslie Keng, Walter L. Leite, & Natasha Beretvas
Comparing Growth Mixture Models when Measuring Latent Constructs with Multiple Indicators

Leslie Keng, Edward Miller, Kimberly O'Malley, & Ahmet Turhan
Composite Score Reliability Given Correlated Measurement Errors between Subtests and Unknown Reliability for Some Subtests

Ye Tong, Sz-Shyan Wu, & Ming Xu
A Comparison of Pre-Equating and Post-Equating Using Large-Scale Assessment Data

Rob Kirkpatrick & Denny Way
Field Testing and Equating Designs for State Educational Assessments

Lei Wan & Brad Ching-Chow Wu
Person-fit of English Language Learner (ELL) Students in High-Stakes Assessments

Ellen Strain-Seymour
A User-Centered Design Approach for the Refinement of a Computer-Based Testing Interface

Jeff Wilson
A User-Centered Design Approach to Developing an Assessment Management System

Paul Nichols
The Role of User-Centered Design in Building Better Assessments

Michael Harms
An Introduction to User-Centered Design in Large-Scale Assessment

The following are the NCME paper and symposium submissions:

Paul Nichols & Natash Williams
Evidence of Test Score Use In Validity: Roles And Responsibility

Denny Way, Chow-Hong Lin, Katie McClarty, & Jadie Kong
Maintaining Score Equivalence as Tests Transition Online: Issues, Approaches and Trends

Denny Way, Paul Nichols, & Daisy Vickers
Influences of Training and Scorer Characteristics on Human Constructed Response Scoring

Ye Tong & Michael Kolen
Maintenance of Vertical Scales

Leslie Keng, Tusng-Han Ho, Tzu-An Chen, & Barbara Dodd
A Comparison of Item and Testlet Selection Procedures In Computerized Adaptive Testing

Jon S. Twing
Off-the-Shelf Tests and NCLB: Score Reporting Issues

Erika Hall & Timothy Ansley
Exploring the Use of Item Bank Information to Improve IRT Item Parameter Estimation

Canda Mueller
Response Probability Criterion and Subgroup Performance

Tony Thompson
Using CAT To Increase Precision In Growth Scores

Come see us in action. You are bound to go away smarter!

Thursday, February 21, 2008

International Objective Measurement Workshop in NYC!

In the olden days, Raschites and 3PL researchers fought with so much vigor that they parted ways. I recall one AERA/NCME conference with Ron Hambleton on the right, Ben Wright on the left, and nothing but a "DMZ" in between. Well, times have changed, and more moderate heads have prevailed. Hence, those of you attending the AERA/NCME national conference in New York City should consider coming early and checking out the International Objective Measurement Workshop (IOMW). The workshop is held the two days prior to AERA and provides an excellent opportunity to hear about the latest developments in measurement.

The preliminary program for IOMW 2008 is now available. The conference will be held March 22 and 23, 2008 at New York University (NYU) in New York City. There are 48 paper presentations and 2 computer demonstrations scheduled. The preliminary program and the conference registration form can be found on the Journal of Applied Measurement (JAM) web site.

Early registration is currently open and in effect until March 14, 2008. Register now and save $10 on the registration fee. Late and onsite registration will also be available.

Hotel information can be found on the NYU web site. But hey, this is NYC, and it is easy to get anywhere from anywhere.

So check it out. If you have to travel all the way to NYC, you should at least take this opportunity to extend your stay over the front-end weekend.

Send your comments on this entry to TrueScores@Pearson.com.

Monday, February 11, 2008

Standard Setting Workshop at ATP in Dallas

The Pearson psychometric and research services team will be presenting a workshop at the annual conference of the Association of Test Publishers (ATP), held in Dallas, on Monday, March 3, 2008. The title of the workshop is "Setting Performance Standards on High Stakes Tests."


Pearson has arguably more experience setting performance standards under NCLB than anyone. Most of this research is not published in peer-reviewed journals, but rather becomes part of statewide technical reports. This workshop will be a great opportunity for customers, researchers and other practitioners to see what standard setting is all about for large-scale, high-stakes assessments required under NCLB.

The conference this year is being held at the Gaylord Texan resort near Grapevine. The workshop will take place on Monday, March 3, 2:00-4:30 p.m.


The presenters from Pearson include:

Dr. Scott Davies
Dr. Erika Hall
Dr. Paul Nichols
Dr. Kimberly O’Malley

The team will describe basic activities used under common standard-setting methodology, including:

-item mapping
-modified Angoff
-body of work
-ID matching
-contrasting/borderline groups
-judgmental procedures

Facilitators will describe the roles played by psychometricians, meeting coordinators, and data analysts. Then, attendees will participate in a sample item mapping standard setting in which they will set a cut point and describe reasons for their judgments. Throughout the workshop, seasoned facilitators will share lessons learned and will distinguish what should happen in theory from what does happen in practice. Attendees will leave the workshop with a set of practical materials that will help them plan a future standard-setting meeting.

If you had no reason to attend this conference, this workshop should cause you not only to register, but to show up and participate!

Send your comments to TrueScores@Pearson.com.

Copyright © 2007 Pearson Education, Inc. or its affiliate(s). All rights reserved.