Tuesday, October 23, 2007

The 2007 Coffman Lecture Series: Lindquist was Right!

Living in Iowa City, arguably the mecca of achievement testing, I am privileged and fortunate to be able to attend professional development activities that many of my colleagues located elsewhere cannot. Take for instance the 2007 William E. Coffman Lecture Series sponsored by the University of Iowa and the Iowa Testing Programs. This year the guest lecturer was Dr. Daniel Koretz, Harvard professor and noted measurement scholar. I have disagreed with Professor Koretz in the past, but I disagree with everyone on virtually everything, so this should come as no surprise. However, the good doctor's lecture this year, while nothing new, reminded me and my staff of why we got into measurement in the first place. The title of Dr. Koretz's presentation was:

Test-based Monitoring and Accountability: Time to Take Lindquist's Warning Seriously
Dr. Koretz started his lecture by recalling the words E.F. Lindquist composed for the 1951 edition of Educational Measurement, of which Dr. Lindquist was the editor:

"The widespread and continued use of a test will, in itself, tend to reduce the correlation between the test series and the criterion series…. Because of the…potency of the rewards and penalties associated…with high and low achievement test scores of students, the behavior measured by a widely used test tends in itself to become the real objective of instruction, to the neglect of the (different) behavior with which the ultimate objective is concerned." (Lindquist, 1951, 152–153)
What this states, simply, is that even in 1951, well before the NCLB rage for "education reform," "high-stakes testing" and "accountability," Dr. Lindquist anticipated the consequence so many have since warned about as we moved to high-stakes testing: a focus on improving test scores without the corresponding improvement in learning.

As Dr. Koretz points out in his lecture, this does not have to be as blatant as an increased motivation to cheat or otherwise subvert the system, or even the result of teaching to the test. It can be something much more subtle; one example of which he calls "Reallocation." Reallocation means, at its simplest, the shifting of educational resources based on assessment results or the lack thereof. On its face this makes sense: if the measures say we need help in area X, we should look at improving instruction and learning in area X. The complication lies in how we pay attention to this area of needed improvement. Or, to quote Dr. Koretz:


"Individual elements of the domain may be measured well, but representation of the domain is undermined."
An example of this—for those of us who have to make sure our corners are square—is the use of the "3:4:5" triangle as a substitute for a clear understanding and application of the Pythagorean theorem. If test builders are not careful about how they pose the questions, students who have merely memorized this simple rule are likely to get the items correct without knowing the Pythagorean theorem at all. In this case, incorrect inferences will be made about the Pythagorean theorem, and possibly about the more general domain of geometric shapes, when in reality rote learning of a "trick" led to the correct answer.
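
For the concretely minded, here is a minimal sketch of why the trick can masquerade as understanding (my own illustration, not Dr. Koretz's; the function names are hypothetical):

```python
import math

def pythagorean(leg_a, leg_b):
    """Apply the Pythagorean theorem: c = sqrt(a^2 + b^2)."""
    return math.sqrt(leg_a ** 2 + leg_b ** 2)

def three_four_five_trick(leg_a, leg_b):
    """The rote shortcut: assume the legs fit the 3:4 ratio and
    scale the '5' accordingly. No theorem required."""
    return 5 * (leg_a / 3)

# Legs 6 and 8: the trick and the theorem agree (both give 10),
# so a correct answer reveals nothing about understanding.
print(pythagorean(6, 8), three_four_five_trick(6, 8))    # 10.0 10.0

# Legs 5 and 12: the theorem gives 13; the trick falls apart.
print(pythagorean(5, 12), three_four_five_trick(5, 12))  # 13.0 8.33...
```

A careful item writer varies the side lengths precisely so the shortcut stops paying off.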

Professor Koretz ended his lecture by showing that efforts to reduce the "reallocation" effect NCLB has brought are failing, mainly because educators are not heeding the warning Dr. Lindquist posed years ago: namely, instruction should focus not on the sample of skills tested but on the domain being measured. My staff and I argue that instruction is what the dialog should be about, but clearly we need to ensure the measures themselves are of value first.

Monday, October 08, 2007

CASMA-ACT Conference Looks Great, Despite Its Name!

The CASMA-ACT Invitational Conference on Current Challenges in Educational Testing seems to offer a very interesting slate of speakers and topics, despite its rather academic name.
Dr. Dan Koretz, a national assessment expert from Harvard, will speak about higher-education accountability.

Joe Crick, from the National Board of Medical Examiners, has been solving assessment problems almost as long as there have been assessments and will give his perspectives on performance testing simulations.

Dr. Doug Christensen, the current Commissioner of Education in Nebraska, is an outspoken critic of NCLB and, to his credit, has managed to implement education the way he sees fit for his native Nebraska, all despite NCLB being the "law of the land."
These speakers alone would make it worth the trip to Iowa City on a warm and sunny November afternoon (Saturday, November 3rd to be exact). I promise that the fall colors will still be...well, that the leaves will have color...well, some leaves will still be on the trees!

Another interesting part of the conference will be the participation of "national media" as they provide their perspectives. I hope to leave before this part. Oh, yeah, then there is Popham.

Regardless, it looks to be a great conference, and I hope you can find the time to attend. The Hawkeyes will be out of town that weekend, so hotel rooms will be easy to find and inexpensive.

Monday, September 24, 2007

IEREA Conference Scheduled for November 30th in Iowa City

While the presidential election may seem years away, it is right around the corner here in Iowa, as evidenced by the many political pundits vying for sound bites. An outsider may well consider this a time of confusion and turmoil prior to a more focused direction once the party candidates are chosen. This is also true on the education front. The reauthorization of NCLB is yet to be determined. Recent federal guidelines for the assessment of students with special needs (the "two-percent population") have raised more questions than they have answered. College readiness, school-to-work transition and high-school reform are still topics of the day. As such, your conference planning committee worked hard to put together a conference this year that would provide information helpful in getting your jobs done—namely, educating Iowa's youth—in a time of transition with much unknown about the future.

Our conference theme this year is “Success for All: Access, Connections and Transitions” and focuses on the strong commitment Iowa educators have to ensuring that all students, regardless of background and circumstances, have an opportunity for success whatever the endeavor: finding a job, going to college, starting a business or pursuing a trade.

We are very fortunate this year to have Dr. Judy Jeffrey, Director of the Iowa Department of Education, as our keynote speaker. Dr. Jeffrey will discuss how Iowa is prepared to meet the challenge of providing success for all students as they transition across the developmental curriculum. She will speak about course-taking patterns, particularly as they apply to students at risk, will speak specifically about closing the equity gap for such students, and will highlight the community college as a key component of the transitional plan.

Other presentations will include “Project Lead the Way.” This project provides technical and engineering career pathways and outlines coursework and critical experiences that are needed to help students fully explore and prepare for the world of work. Additionally, Project Lead the Way helps provide a consistent curriculum process that is recognized across the country.

Plan ahead and register early! Registration information can be found at the IEREA website.

Tuesday, September 11, 2007

Why is Linking College Readiness to High School Skills so Difficult?

In a previous NCME newsletter, Dr. Michael Kirst argued that our post-secondary and secondary education systems are disconnected. He argued that:
"A big issue is the proliferation of tests in grades 9 through 11 caused by the combination of post-secondary admissions assessments, and the new statewide tests created by the K–12 standards movement."
Dr. Kirst also suggested that there are very few synergies between the college and high school space:
"Education standards and tests are created in different K–12 and post-secondary orbits that only intersect for students in Advanced Placement courses."
Now, in all fairness to Dr. Kirst, these quotes are taken out of context, and I suggest my readers review his thesis in its entirety. When you are finished, I hope you check out my reaction:
"The 'Intrinsic Rational Validity' of an Integrated Education System"
As well as my colleagues' reaction:
"Assessing College Readiness: A Continuation of Kirst"
by Scott Marion and Brian Gong of the National Center for the Improvement of Educational Assessment. Both of these commentaries can be found in the June 2007 NCME Newsletter.

My summary conclusions were oversimplified, as usual. Namely, it should not be that hard to align the enabling skills taught in elementary school with what will eventually be needed in high school, or for high school to teach the enabling skills that will ultimately be needed in college. The premise of the entire argument, however, is somewhat like the one about "the chicken and the egg." Dr. Kirst claims that post-secondary institutions need to remediate college students because they arrive unprepared. State testing directors claim they measure the curriculum required by their state standards. We all know, or should, that often the state content standards do not prepare students for success in college. (Where are the Algebra II standards?) So, instead of pitying the very successful colleges who earn lots-o-dollars remediating students, or lamenting the terrible job high schools do preparing students for college, maybe we should step back and ask what the purpose of high school is, because I am not at all sure it is to make students college ready.

Friday, August 31, 2007

IEREA Poster Submissions Deadline Soon Approaching

A reminder that our annual conference for the Iowa Educational Research and Evaluation Association (IEREA) is fast approaching and will be upon us before we know it. Among the popular features of the conference are our poster presentations and our paper contest. We need lots of poster and paper submissions to make this part of our conference a success. This conference is also a great opportunity for people to get involved in IEREA, support Iowa, and get feedback on timely topics in education.

Please mark your calendars and see conference information below, including the call for proposals.

Theme: Success For All: Access, Connections, and Transitions
Date: Friday, November 30, 2007
Location: Sheraton Hotel, Iowa City

Iowa Educational Research and Evaluation Association: 2007 Call for Proposals

Iowa educators are invited to present their research at IEREA's annual conference in Iowa City, IA. Faculty members, graduate students, and education professionals conducting research related to education, and specifically to this year's theme, are invited to submit proposals. Individuals involved in school-based or university-school collaborative action research studies, innovative program evaluations, and work related to technical issues of assessment are also encouraged to submit. IEREA utilizes a poster presentation format, designed to foster dialogue among presenters and conference attendees. To maximize interaction during the poster sessions, posters will be displayed in an open space with sufficient room to congregate, browse, and discuss.

Refreshments will also be provided during poster sessions.
Instructions for displaying research in a poster format will be sent to presenters of all accepted posters. At least one presenter per poster must register to attend the IEREA Conference, and all poster presenters qualify for reduced conference registration fees. Details are provided upon acceptance of the proposal.

The deadline to submit poster proposals is 5:00 pm, Friday, September 14, 2007. Submissions must include two copies of the proposal. One copy should contain author name(s), institutional affiliation(s), and complete contact information for the coordinating presenter, all on a separate cover sheet. The second copy should contain no author names, titles, or contact information, in order to facilitate blind review of all proposals. The poster proposal itself should be no more than three (3) double-spaced pages (excluding references), with reasonable margins and minimum 11-point type. Each proposal must include the following:

Title of Poster
Abstract (maximum 50 words)
Goals/Objectives
Design and Methods
Results
Significance/Impact
References

E-mail submissions are strongly encouraged (please type IEREA Proposal in the email Subject line), and receipt of proposals will be acknowledged via return e-mail. Send all poster proposals to

Jan Walker
jan.walker@drake.edu

or

IEREA Conference Planning Committee
ATTN: Dr. Jan Walker
3206 University Av
Des Moines, IA 50311
(515) 271-3719

Tuesday, August 28, 2007

Griddable Items Get No Respect...No Respect at All!

While most people argue that you have to earn the respect you are given, this is not always the case. Take for example the hard-working, informative, creative and open-ended test item type commonly known as the "griddable item." This item type gets no respect. In fact, my guess is you don't even know what I mean when I refer to a griddable item. Let me elaborate.

When criterion-referenced and mastery testing were all the rage back in the late 60's and early 70's, the bashers of multiple-choice and supply-only assessment items came crawling out of the woodwork. Now remember, this was prior to high-stakes assessment, so most tests were loved by all! In response, assessment developers looked to "enhance" objective measures by making them more "authentic." One way to do this and still keep the advantages of machine scoring was to pose an open-ended item (say, a multiple-step mathematics problem) and place a grid on the response document similar to the one you might use to grid your name or date of birth. Once the student solved the math problem and presumably reached one correct answer, he or she could grid that answer on the document. What a great idea! Boy, did people hate it, and as far as I can tell, people still hate it today.
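
A design wrinkle worth noting: the same value can often be gridded in more than one surface form, so scoring by numeric value rather than by literal string is one way a machine can handle it. Here is a minimal sketch (my own illustration; actual programs define their own acceptable-format rules):

```python
from fractions import Fraction

def score_gridded(response: str, key: str) -> bool:
    """Score a gridded response by value, not surface form, so that
    '.5', '0.5', and '1/2' all match a key of '1/2'. A sketch only."""
    def to_value(text: str) -> Fraction:
        text = text.strip()
        if "/" in text:
            num, den = text.split("/")
            return Fraction(int(num), int(den))
        return Fraction(text)  # Fraction parses decimal strings like '.5'
    try:
        return to_value(response) == to_value(key)
    except (ValueError, ZeroDivisionError):
        return False  # a blank or ungriddable response scores incorrect

print(score_gridded("0.5", "1/2"))   # True
print(score_gridded("3/6", "1/2"))   # True
print(score_gridded("0.49", "1/2"))  # False
```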

Pearson has conducted all manner of investigations regarding the griddable item (see Pearson Research Bulletin #3), very little of which has generated much interest. For example, when Pearson was advising the Florida Department of Education in this regard, the griddable item was perceived by the program's critics as an "ineffective" attempt to "legitimize" a large-scale objective assessment as measuring "authentic" and meaningful content (i.e., including performance tasks) when it did not. This really seemed to be a policy and/or political battle that positioned the proponents of performance tasks, who wanted rich embedded assessments, against the policy makers, who wanted economical and psychometrically defensible measures. It is too bad griddable research did not carry the day.

Another issue with griddables seems to be their content classification. Multiple-step mathematics problems, for example, are likely to match more than one cell of a content classification. Furthermore, depending on how they are classified, substeps are not likely to reach a Depth of Knowledge (DOK) of 3 even if the total item does. Finally, some concerns have been raised by psychometricians using IRT to calibrate griddable items. Under IRT, the argument goes, unless you are using the Rasch model, a 3PL model will be required for traditional multiple-choice items, but there is essentially no guessing associated with a griddable item; hence, a 2PL model, with no pseudo-guessing parameter, will be required to calibrate these items. (We will save the argument of forcing the c-parameter to zero versus going to a mixed model for another blog.) Add to this the inevitable sprinkling of two- and three-category open-response items, and the mixed model becomes a burden that might not be justified given the relatively few gridded items. Other attributes of the griddable item are delineated in the Pearson Research Bulletin #3.
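
For readers who want the modeling point spelled out, a minimal sketch follows (the parameter values are invented for illustration): fix the pseudo-guessing parameter c at zero and the 3PL collapses to the 2PL.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL IRT model: P(theta) = c + (1 - c) / (1 + exp(-1.7a(theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Multiple-choice item: a nonzero c reflects pseudo-guessing.
print(p_3pl(theta=0.0, a=1.0, b=0.0, c=0.20))  # 0.60

# Griddable item: guessing the one right value in a large grid space is
# negligible, so c is fixed at 0 and the model collapses to the 2PL.
print(p_3pl(theta=0.0, a=1.0, b=0.0, c=0.0))   # 0.50
```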

The point of this blog (clearly a failure, given that I feel the need to remind you of it) is to get assessment specialists, psychometricians, policy makers and teachers to objectively evaluate the merits of this item type. Another goal is to have my readers consider how the use of griddable items might help assessment become more of a driving force for good instruction. These are the goals of the blog, despite the fact that griddable items get no respect.

Monday, July 23, 2007

Summertime and the Power of Peppermint

It's lazy summertime here in Iowa City. No wait—it's more like, "Here I am, being lazy during the summer, in Iowa City." Regardless, it seems to be time for another blog post, but most of my topics are cynical, political, or otherwise educationally argumentative, and these are just not the moods I want to portray for the summer. Hence, I pulled out a dated Washington Post article on testing entitled: The Power of Peppermint is Getting Put to the Test.

This article is about a principal in Silver Spring, Maryland, (gas prices have struck the Post just as they have the rest of us) who purchased peppermint candies for her students because, "Along with smart teaching, careful preparation, a good night's sleep, and a full stomach, peppermint candies are said to improve test performance." Who would have thought? My guess is that if such information gets out, we are likely to have a shortage of peppermint, just as we often have to order extra large-print booklets because someone started a rumor that students did better on a test when it was presented in large print.

Principal Boucher went on to say:
"...millions of sites claimed that peppermint were the perfect midpoint snack for things like testing."
Now, before you start making fun of me for making fun of Principal Boucher, note that the Post cites evidence that this might not be as far-fetched as it sounds. According to the Post article, research from the University of Cincinnati in the 1990's found that a whiff of peppermint helped test subjects concentrate and do better on tests! The psychology professor who helped conduct the research claims that there is more than a little bit of truth in the "peppermint theory." Dr. Joel Warm claims:
"Not only do you get an improvement (in focus) with peppermint, you get a change in response that affects alertness in target direction."
I am not sure what Principal Boucher found out after her test results came back. My guess is that if the scores came back up, she will make claims about her wonderful teachers and their great school, but if they came back down, it will be because the peppermint did not work. I might call her to follow up, but as a colleague pointed out, perhaps a nationally funded, double-blind research study would be better!

Have a happy, safe, and productive summer!

Tuesday, May 22, 2007

Pearson is Nashville Bound!

Pearson is excited about the opportunity to share aspects of our research agenda with participants at the annual CCSSO Conference on Large Scale Assessment. Each year, CCSSO sponsors many important events supporting education, but the main opportunity for all of us to get together is the annual assessment conference. This year, the conference is venturing back across the Mississippi River to the "Volunteer State." I, for one, look forward to exploring Nashville and anticipate another wonderfully informative conference.

Based on just the presentations sponsored by or accepted from Pearson, the conference is likely to be another big success.

CCSSO Sessions for 2007
(Dates and Times are tentative. Please consult the conference program for final arrangements.)

Sunday, June 17, 3:45 - 5:15 pm
Assessments for English Language Learners: Validity Evidence from California and Texas
-Kimberly O'Malley

Leveraging Technology for State Assessments: Testing Directors Share Current Initiatives and Future Visions
-Denny Way

Monday, June 18, 8:00 - 10:00 am
Vertical Integration of Benchmarks and Standards: Including Alternate Assessments in Evaluating Growth
-Scott Davies

Gridded Response Items: Should They Be Used in High-Stakes Testing?
-Kimberly O'Malley, Rob Kirkpatrick, Ahmet Turhan

1:30 - 3:00 pm
Anticipated/Unanticipated Consequences of NCLB: An Applied Psychometric Perspective from Testing Industry Leaders
-Jon Twing

Can Statewide Assessments Identify Students Ready for College without ACT or SAT Scores?
-Jon Twing (Organizer)

3:30 - 5:00 pm
Comparability of Two Common Test Variations
-Steve Fitzpatrick

Comprehensive Integration of Paper/Pencil and Online Testing - Making It Happen: Program Management and Operational Perspective
-Kim Carson

Tuesday, June 19, 8:15 - 9:45 am
Innovative Science Assessment Supports for Students with Disabilities
-Michael Harms

Improving the Working Relationships Between States and Their Contractors - Steps Each can Take
-Michael Hussey

1:45 - 3:35 pm
Portfolio- and Events-Based Approaches to Alternate Assessments: Improvements on Assessing the 1% Student Population
-Scott Davies, Karen Squires, and Linda Zimmerman (Organizers)

Challenges and Opportunities in Designing Innovative Computer-based Test Items
-Ellen Strain-Seymore

Using New Automated Technologies to Score Writing Assessments
-Paul Nichols, Karen Lochbaum

4:15 - 5:50 pm
Field Testing to Support Assessment Programs: Options, Pitfalls, and Technical Considerations
-Rob Kirkpatrick

Wednesday, June 20, 9:00 - 10:30 am
Standard Setting Approaches for Alternate Assessments: Experiences and Research
-Scott Davies

Tuesday, May 08, 2007

Bullies Grow Up to be Bullies

When I was in school countless years ago, I had my run-ins with bullies, too. They were not bigots, racists, or terrorists. They were your average "tough thugs" who would take my lunch money, pin my arms behind my back, and occasionally beat me up. These experiences were not fun, but they did not cause me undue mental trauma (at least none that I know of). While such experiences might have been hard for some, I had long since forgotten them. That is, until I ran into one of these bullies at the Ontario airport. This thug was hogging the space for bags on the shuttle bus and was being abusive to me and the others as we tried to board. The same feelings, which I thought had faded away long ago, came rushing back quicker than that embarrassed feeling us 70's graduates get when we hear the Bee Gees.

This story is relevant to education because the national debate on what to do with bullies at school has recently had some air play. Iowa, for instance, recently passed a bill "to ban bullying in all Iowa schools," so says the Des Moines Register. According to the article, at the time of signing, Iowa was one of only ten states in the nation to enact such a comprehensive anti-bullying act. While this act does include provisions covering the kind of bullying I might otherwise call discrimination (i.e., the bullying of gays), it also covers the old-fashioned kind I am familiar with.

A school bully does indeed grow up to be a bully in life, as I am sure was the case with my encounter at the Ontario airport. Similarly, according to Urbandale Superintendent Greg Robinson:
"I think our kids see adults bully each other all the time. Before I comment on someone else's behavior, I've got to take care of my own."

So don't scoff at the debate on bullying. Eliminating bullying in our schools will allow us the time and energy to focus on education—and may very well keep me from getting beat up in airports.

Monday, April 23, 2007

What I Did on my Spring Break

Psychometricians are peculiar. While most people look forward to spring break by planning family outings, going to the beach to drink beer, or simply forgetting about the grind of existence, we (psychometricians) typically spend them at our annual conference for the National Council on Measurement in Education (NCME) and the American Educational Research Association (AERA). This year was no different as I drove to Chicago on Easter Sunday (April 8th for those of you with different persuasions) looking forward to an invigorating conference with winds in excess of 20 mph and temperatures below freezing. (It even snowed, and the Cubbies were cancelled because of sleet!)

Make fun as I might (and do), I am genuinely recharged at such meetings. I am reminded, in this political world where so little really matters, that what teachers do daily is very important. As such, what psychometricians and measurement professionals do daily is also important. As my batteries consume the flow of research energy, I am also reminded that we are scientists and that our standards of best practice, measurement design and our profession in general MUST BE guided by research.

Perhaps the quality of the research in general does not seem to be at the levels it once was. Perhaps there were too many sessions lamenting the terribleness of NCLB. Perhaps some members of NCME still refuse to embrace their AERA brethren. Despite all of these, there is much to be learned from each research paper presented—if you are wise enough to understand it.

Here are some of the things I learned:

  1. Graduate students, despite how sophisticated they might seem, are very poor presenters. They need coaching on the simplest aspects of presenting: text size for overheads, how to articulate without the dreaded "umhs..." and "...ahs..." typical of nervous presenters, and, most of all, how much information they can really present in the 10 or 12 minutes they have. I don't recall struggling so much with these when I was a gradual student, but I'm sure my memory is as sharp as my presentations were.
  2. Calling the front desk or speaking to the cleaning staff is not likely to bring the elevators to the 29th floor any faster than sacrificing chickens would.
  3. While walking down 29 flights of stairs might be easier than walking up, it is still a really long journey and certainly enough to make you break a sweat.
  4. I would rather buy dinner for a large group of people than listen to ten Ph.D.'s figure out how best to split the bill.
  5. If you bring a printer along, you can actually be quite productive while you work out of your hotel room.
  6. The One-Parameter Logistic Model (OPLM) is really either a Rasch model with two item parameters or a 2PL model whose integer a-parameter values are fixed in advance (estimated elsewhere); see the sketch after this list.
  7. Kansas, of all places, has an assessment program, and it seems very rigorous and robust.
  8. You don't need to be an alum to attend the Iowa, North Carolina or Michigan State Alumni parties.
  9. Walking home after three alumni parties in a town like Chicago is quite a challenge.
  10. There are more things in common between Thurstone, Guttman, Rasch, and Mokken than there are differences.
  11. "Just Noticeable Differences" or JNDs are alive and well when comparing self-parking at $26.00 a night to valet parking at $35.00 a night.
  12. Pearson sponsored the NATD Breakfast, the NATD Dinner, the Division H Breakfast, the RASCH SIG Dinner, and a graduate student reception, to name a few.
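
And since item 6 deserves more than one sentence, here is the promised sketch of the OPLM item response function (my own illustration; the fixed integer a-values would come from a separate imputation step, not from this code):

```python
import math

def p_oplm(theta, a_int, b):
    """OPLM: structurally a 2PL, P = exp(a(theta - b)) / (1 + exp(a(theta - b))),
    except the discrimination 'a' is a fixed integer supplied in advance
    (estimated elsewhere), leaving only 'b' to be estimated per item."""
    z = a_int * (theta - b)
    return math.exp(z) / (1.0 + math.exp(z))

# With every a fixed at 1, the OPLM is just the Rasch model.
print(p_oplm(theta=0.5, a_int=1, b=0.0))  # Rasch-like item
print(p_oplm(theta=0.5, a_int=3, b=0.0))  # steeper, more discriminating item
```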

OK, so maybe I should have learned more, but it was spring break after all.

Tuesday, April 03, 2007

My 15 Minutes of Fame

Education News columnist Robert Oliphant has in the past agreed with some of the notions, comments or papers written by me or others referenced in this blog. (See for example one of his earlier posts.) While this is quite flattering and I appreciate his perspective, our mission (his and mine) seems to continually miss the mark—or at least policy makers still don't seem to understand what we are trying to say.

One of Mr. Oliphant's more recent columns continues my call that assessment be "transparent, verifiable and not too complex." As a psychometrician, this is a "no brainer": the scientific side of psychometrics is rooted in mathematical statistics, where proofs and reproducibility are paramount. (Most mathematical statisticians I know are still working on the communication and complexity aspects.) Mr. Oliphant applies this principle to national standards—which is just fine by me—and others have applied it to instruction, to education in general, and to the definition of what the "product" of our school systems needs to be.

While this last aspect sounds simple, the current debate about college readiness and workplace readiness, the rigor of high school (particularly the senior year), and the recent lack of mandated achievement standards for the accreditation of institutions of higher learning speak volumes: namely, we are still thinking about education in far too complicated ways and are missing the "forest for the trees."

Here is how I think about education:
First, we need to link the curriculum (content standards) in a progressive manner that delineates what we want students to learn from pre-kindergarten to college—the old-fashioned notion of PreK–16 or K–20. This will allow the "compound interest" of learning to accrue across the grades.

Second, we have to measure what it is we expect children to learn across the linked system. We can then use these measures to not only improve our instruction but also to manage our intervention. As novel as it sounds, the measurement data could actually inform teachers regarding what is working and what isn't.

Finally (and this is arguably the most controversial), we have to stop denying individual differences and stop allowing students without prerequisite skills to advance. I don't mean that failing students should repeat the grade; rather, the system should have a continuous feedback/intervention loop such that students master the prerequisite skills before moving to the next level of content. Notice I said content level and not necessarily grade level. Students who move on in the current system—many of whom struggle with the mastery of basic skills—are destined to fail at later grades without mastering those skills.
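
To make the loop concrete, here is a minimal sketch (the content-level names and the 0.80 mastery cut are assumptions for illustration, not a real system):

```python
CONTENT_LEVELS = ["level_1", "level_2", "level_3"]  # hypothetical sequence
MASTERY_CUT = 0.80  # assumed mastery threshold on a 0-1 score scale

def next_placement(current_level: str, mastery_score: float) -> str:
    """Advance a student only when prerequisite skills are mastered;
    otherwise keep the student at the current content level for
    intervention and reassessment."""
    if mastery_score >= MASTERY_CUT:
        i = CONTENT_LEVELS.index(current_level)
        if i + 1 < len(CONTENT_LEVELS):
            return CONTENT_LEVELS[i + 1]  # prerequisites mastered: advance
    return current_level                  # intervene, reteach, reassess

print(next_placement("level_1", 0.85))  # level_2
print(next_placement("level_1", 0.60))  # level_1 (intervention loop)
```
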
Some people would call these ideas naive, and some would label them another example of the failed "ungraded systems" that were the rage in education in other decades. I call them a transparent, verifiable and not too complex system of education, and a simple way to focus our attention on what is important: instruction, learning, measuring, and the feedback/intervention loop.

Tuesday, March 20, 2007

PEM Pre-Conference Workshop at NCME

With no small amount of pride, I would like to point loyal blog readers to the NCME Pre-Conference Workshops, where our very own Dr. Ye Tong will be conducting a workshop on vertical scaling. Dr. Tong will be supporting her major advisor, Dr. Michael Kolen, and the session is sure to tell you more about vertical scaling than you ever hoped to know! Just check out the abstract taken from the NCME online program:

Vertical Scaling

Presenters: Michael Kolen, University of Iowa; Ye Tong, Pearson Educational Measurement

The potential need for constructing a vertical scale arises whenever a testing program has multiple grade levels and wishes to have a common scale to compare test scores across those grade levels. Vertical scaling uses statistical processes to place test scores that measure a similar content domain, but at different educational levels, onto a common scale. The goals of the session are for attendees to understand the principles of vertical scaling, to conduct vertical scaling, and to interpret the results of vertical scaling in reasonable ways. Vertical scaling will be contrasted with related equating and linking processes. Traditional and IRT vertical linking methodologies will be described and practical issues will be discussed. The focus is on developing a conceptual understanding of vertical scaling through numerical examples and discussion of practical issues. The importance of and challenges related to vertical scaling will be included. The text for the session is a chapter in Kolen and Brennan’s (2004) “Test Equating, Scaling, and Linking: Methods and Practices (Second Edition).”
Register Now!

Tuesday, March 06, 2007

Pearson Leads The Way

A key strategic objective of Pearson is to promote measurement best practices through research. The annual meeting of the American Educational Research Association (AERA) and the National Council on Measurement in Education (NCME) is one venue in this regard. I am happy to report that, once again, Pearson is providing several relevant paper presentations at the conference.

This year, the conference will be held in the "Windy City," Chicago, April 9-13. Here are some of our accepted topics to be presented. (Check the official online program for dates, times, and locations.)

NCME 2007

Training session: Vertical Scaling Methodology, Application and Research
- Tong, Ye

Understanding Correlates of Rapid-Guessing Behavior in Low-Stakes Testing: Implications for Test Development and Measurement Practice
- Xiaojing (Jadie) Kong

Estimating Classification Consistency for Complex Assessments
- Wan, Lei

Imputation Methods for Handling Null Categories in Polytomous Items
- Keng, Leslie & Turhan, Ahmet

An Investigation of the Accuracy of the Estimates of Standard Errors for the Kernel Equating Functions
- Mao, Xia

Some Issues in Computing Conditional Standard Errors of Measurement for State Testing Programs
- Thompson, Tony

Effects of Anchor Item Properties and Dimensionality of Test on Vertical Scaling
- Turhan, Ahmet; Tong, Ye & Um, Kay

Individual Growth and School Performance Indices
- Li, Dongmei & Shin, David

Priors in Vertical Scaling
- Tong, Ye

AERA 2007

Models of Raters' Cognition During Essay Scoring: Theory and Framework
-Nichols, Paul

Investigating the Effects of Training and Rater Variables on Reliability Measures: A Comparison of Standup Local Scoring, Online Distributed Scoring, and Online Local Scoring
-Kreimen, Cindi

Effects of Scoring Environment on Rater Reliability, Score Validity and Generalizability: A Comparison of Standup Local Scoring, Online Distributed Scoring and Online Local Scoring
-Kanada, Mayuko

Cognitive Task Analysis of Raters’ Evaluation Strategies for Scoring Constructed Response Items
-Harms, Mike

Establishing Measurement Equivalence of Transadapted Reading and Mathematics Tests
-Davies, Scott, O'Malley, Kimberly & Wu, Brad

Reliability estimates in an alternative scoring procedure of constructed-response items in large scale standardized tests
-Kanada, Mayuko & Nichols, Paul

An Investigation of College Performance of Advanced Placement (AP) and Non-AP Student Groups
-Keng, Leslie

Comparisons of the Kernel Equating Method with the Traditional Equating Methods in Random Groups Design-A Simulation Study
-Mao, Xia

Using Collateral Information to Improve the Estimation of NAEP Subscale Scores
-Shin, David

An Exploration of Methods for Evaluation of Individual and School Progress at the Subscale Level
-Shin, David, & Li, Dongmei

Evaluation of calibration/linking approaches to mixed format writing assessments
-Um, Kay; Kim, Dong-In & Turhan, Ahmet

Monday, February 19, 2007

Real Standards get Booted...er...Boost On the Hill

I read with great pleasure the recent Ed Week article from Lynn Olson regarding legislative intent to set national standards. Actually, it was more like how I read National Lampoon or Mad Magazine when I was a kid.

Anyway, it seems the politicians are hard at work looking to see what they can do next after NCLB. The Dodd-Ehlers bill—called the Standards to Provide Educational Achievement for Kids, or SPEAK, Act—is really an attempt to push states to adopt NAEP-like standards for national comparative purposes. According to Olson, the bill would provide up to $4 million in grants to each state adopting the new and voluntary "American education content standards" in math and science. These new content standards would be developed by the National Assessment Governing Board (NAGB), the overseers of NAEP. In addition, the SPEAK Act would allow the Secretary of Education to extend the 2014 No Child Left Behind deadline for states that adopt the new standards.

Let the rhetoric begin! I have commented in this blog and elsewhere about the questions I have regarding NAEP and its standards. Why would we expect states with content standards different from those measured on NAEP to perform comparably to NAEP? The Olson article also raises other good questions. Suppose for a minute that we (you and I) actually wanted national standards. Would we start by giving the power to create those standards to a national committee like NAGB, composed of just a handful of people? I doubt it! Furthermore, I hear everyone claiming that our goal is to have all graduating high school students ready for college. I am not against such a goal, per se, but where was I when we debated that this was the purpose of high school? I am not at all sure that our high schools are ready to teach the current standards, regardless of what people think of them, let alone the Algebra II, Physics and Calculus required to prepare students for college.

One additional interesting comment came from the Olson article, namely a quote from noted researcher Daniel Koretz:
"If we want common content standards, we need to do some work, and it's not clear to me that a small, federally appointed board is the right place to do that."
Finally, I can agree with Koretz and be accepted into the Borg.

Monday, February 05, 2007

Chaucer, Beowulf, Literature and...Assessment

I went to a poetry reading last Saturday night at a local bookstore (Northside Book Market) here in Iowa City. So, you thought all us psychometrician types were accountants, engineers or mathematics majors who could not find real jobs? Well, I will have you know that I am trained in Middle English, studied Beowulf, and can recite the Prologue to The Canterbury Tales. But I digress. During this poetry reading, provided by our local anarchist who happens to be living and teaching in China now (go figure), it occurred to me that much of my school experience, both in high school and as an undergraduate, was filled with classic literature and poetry. Yet seldom do we measure the classics or poetry on either the NCLB assessments or the end-of-course assessments we are all so familiar with. Why is that? We have reading passages that are expository, narrative, informative, and technical, yet few, if any, poems. Is it because we don't value such readings? I doubt it. Just search online for references to Edgar Allan Poe (1.2 million hits) or e. e. cummings (also 1.2 million hits) and see what you find.

I fear that our failure to measure poetry may be tied to many things. First, it is likely harder to teach than simple reading—which is not simple at all. Certainly it would be harder to measure, because the construct of poetry is at least one part art, one part text and one part interpretation. Second, it may not be valued as much as an academic subject now as it was when I was in school. Like Latin, it may have fallen into that "don't really need it anymore" category. Third, it could be because some people don't like schools fooling around with areas close to emotion and passion. A politically insensitive poem is like rap or heavy metal music—something to avoid if possible. Yet such creativity, expression and passion are just what most English Language Arts instructors talk about when describing what they want their students to achieve. The rest of us talk about spelling, grammar, and punctuation.

Regardless of the reasons, poetry, and in many ways the other arts, are not being assessed and, I speculate, are not being taught as much anymore. 'Tis a pity, to quote Poe, that "...when his strength failed him at length he met a pilgrim shadow. 'Shadow', said he, 'where can it be, this land of Eldorado? Over the mountains of the moon and down the valley of shadow, ride boldly ride', the shadow replied, 'if you seek for Eldorado.'" Poetry may now be relegated to Eldorado.

Thursday, January 25, 2007

Spring is in the Air!

Spring is in the air, and I don't mean the spring assessment season. What I mean is our annual junket to a sunny place known as the Association of Test Publishers (ATP) annual vacation... er... uhh... conference! This year the conference is in sunny Palm Springs—land of heat strokes near the 17th green.

Seriously, this conference has become the place to learn about measurement from a multidisciplinary point of view. With the four divisions of ATP represented (Education, Industrial/Organizational, Certification and Licensure, and Clinical), there should be research and learning available for all. As the distinguished professor Dr. Ron Hambleton put it:

"If you want to be a player in credentialing, this is the conference for you!"

A quick look at the program suggests several relevant presentations, even one provided by yours truly:

Like a Bridge Over Troubled Waters: Collaborating with Clients When Things Go Wrong

Tuesday, February 6, 2007 1:30—2:30 pm

Denny Way, Vice President - PEM
George Powell, Associate Vice President - ETS
Bill Hogan, Vice President-Marketing - AMP
Jon Twing, Executive Vice President - Pearson
Linda Waters, Vice President - Thomson Prometric

So, enjoy your time away from the snow and cold winter months and learn something to boot!