Monday, November 03, 2008

Policy Wonks We Are—Implications for NCME Members

This is the "full text" of a contribution I made to the NCME newsletter. I thought you might like to get the full inside story!

Mark Reckase’s call for NCME members to become more involved in educational policy is timely and relevant, while perhaps also a little misleading. For example, some of my colleagues and I have been working with states, local schools, and the USDOE on implementing policy decisions for many years. Testifying at legislative hearings, making presentations to Boards of Education, reviewing documents like the Technical Standards, and advising policy makers are all examples of how psychometricians and measurement experts already help formulate and guide policy. Nonetheless, I still hear many members of technical advisory committees (experts in psychometrics and applied measurement) “cop out” when asked to apply their experience, wisdom, and expertise to issues related to education policy, often on the grounds that they are technical experts and the question at hand is “a matter of policy.”

I have commented before, and still believe, that we no longer live in a world where the policy and technical aspects of measurement can remain independent. In fact, some good arguments can be made that when such independence (perhaps bordering on isolation) between policy and good measurement practice exists, poor decisions can result. When researchers generate policy governing the implementation of ideas, they must carefully consider a variety of measurement issues (e.g., validity, student motivation, remediation, retesting, and standard setting) to avoid disconnects between what is arguably good purpose (e.g., the rigorous standards of NCLB) and desired outcomes (e.g., all students meeting standards).

In this brief text, I will entertain the three primary questions asked by Dr. Reckase: (1) Should NCME become more involved in education policy? Why or why not? (2) How should other groups and individuals in the measurement community be involved in education policy? (3) What resources and supports are necessary to engage measurement professionals in education policy conversations? In what ways should NCME be involved in providing these?

I think I have already answered the first question, but let me elaborate. I maintain that we measurement professionals are already involved in policy making. Some of us influence policy directly (as in testifying before legislatures developing new laws governing education). Some of us influence policy in more subtle ways, by researching aspects of current or planned policy we do not like or endorse. We often seek out the venue of conference presentations to voice our opinions regarding what we think is wrong with education and how to fix it, which inevitably means we make a policy recommendation.

Not only do I believe that NCME and its members are involved in policy making, but I also believe it is critically important for all researchers and practitioners in the measurement community to seek out opportunities to influence relevant policy. I recall recently being involved in some litigation regarding the fulfillment of education policy and the defensibility of the service provider’s methods. After countless hours of preparation, debate, deposition, and essentially legal confrontation, I asked my colleague (also a measurement practitioner) why we bother defending best practice when there are so many agendas, so many different ways to interpret policy, so many points of view regarding the “correct way” to implement a measure. Her response was surprising—she said we do it because it is the “right thing to do” and that if we stop defending the right way to do things, policy makers will make policy that is convenient but not necessarily correct. Her argument was not about defining right from wrong; her argument was that if we were not there instigating debate there would be none, and resulting decisions would most likely be poorly informed.

So, my simple answer to the second question is to get involved. If you don’t like NCLB, what did you do to inform the policy debate before it became the law of the land? If you think current ESL, ELL, or bilingual education is insufficient to meet the demands of our ever-increasing population in these areas, what are you doing to help shape policy affecting them? Across the country debate rages regarding the need for “national standards” or state-by-state comparability. Why aren’t NCME, AERA, and all other organizations seemingly affected by such issues banding together to drive the national debate? Do we not all claim to be researchers? If so, is not an open debate what we want and need? When was the debate where it was decided that the purpose of a high school diploma was college readiness? When did we agree to switch the rhetoric from getting everyone "proficient" by 2014 to getting everyone “on grade level” by 2014? The input of measurement experts was sorely missing in state legislation regarding these issues. It is still desperately needed.

For the purpose of this discussion, let’s assume that all measurement and research practitioners agree that we need to take part in policy discussions directly. What resources, tools, and/or procedures can we use to implement these discussions, and how can NCME help? I stipulate that there is a feeling of uneasiness surrounding the engagement of researchers and measurement practitioners in policy debates or decisions.

Perhaps this is an unfounded concern, but there seems to be an air, forgive me, of such debates being below our standards of scientific research. Policy research is very difficult (to generate and to read), so why leave the comforts of a safe “counter-balanced academic research design” to mingle with such “squishy” issues as the efficacy of policy implementation? Perhaps NCME could establish a division or subgroup on Federal and State Policy that would focus on measurement research as it applies to education policy (policy, law making, and rule implementation) to lend more credibility to such a scientific endeavor. Maybe NCME could work with other groups with similar interests (like AERA, ATP, CCSSO) and maybe even get a spot in the cabinet of the next Secretary of Education for the purpose of promoting the credibility of measurement research and application for informing policy. Perhaps less ambitious steps, like including more policy research in measurement publications, sponsoring more policy discussions at national conventions, and encouraging more policy-related coursework in measurement-related Ph.D. programs, would be a good place for NCME (and other organizations) to start.

Let me close with a simple example of why this interaction between applied measurement and education policy is so important. Many of you are firm believers in the quality of the NAEP assessments. Some of you have even referred to NAEP as the “gold standard” for assessment. NAEP is arguably the most researched and highest quality assessment system around. Yet, to this day many of my customers (typically the educational policy makers and policy implementers in their states) ask me simple questions: Why is NAEP the standard of comparison for our NCLB assessments? NAEP does not measure our content standards very well; why are our NAEP scores being scrutinized? What research exists demonstrating that NAEP is a good vehicle to judge education policy—both statewide and for NCLB?

Don’t get me wrong. My argument here is not against NAEP or the concept of using NAEP as a standard for statewide comparability. My question is why my customers—the very people making educational policy at the state level—were not at the table when such issues were being debated and adopted. Did such a debate even take place? As measurement experts, when our customers come to us for advice or guidance, or with a request for research regarding the implementation of some new policy, I believe it is our obligation to know and understand the implications of such a request from a policy point of view, not just a measurement point of view. Otherwise, we will be acting in isolation and increasing the divide between sound measurement practice and viable educational policy.

Wednesday, July 09, 2008

Why I Stopped Reading Editorials

I gave up reading editorials quite a long time ago. Not because they are too often misleading or inaccurate (many of them are), or because they are too often purposely written to be controversial and sensational (again, many of them are). Rather, I quit reading because the whole purpose of editorials seems rather futile to me.

Let me explain. People who write editorials usually have a strong position with reasons and rationales why they feel that particular way. Informed readers of editorials either agree with that position and its reasons and rationales, or they disagree, usually from an equally strong position directly opposite that of the editorial writer. In either case, the editorial does little to change someone’s opinion; it just stirs up a lot of emotion. Therefore, the only people who might benefit from reading editorials are those who have not yet made up their minds. However, if the topic is well enough defined to cause a debate in the editorial pages, I wonder how many people really have no position or opinion. Hence, the futility. So I just quit reading them.

Occasionally, friends, family, colleagues or even readers of TrueScores send me editorials and ask for a reaction or an opinion. Not too long ago this happened regarding Dr. Chris Domaleski's op-ed “Tests: Some good news that you may have missed,” from May 29, 2008, in the Atlanta Journal-Constitution. Chris is a colleague, customer, and friend of mine; and I found his comments to be very well written, well supported, and his message very helpful for all those impacted by testing in Georgia. His message, simplified and summarized, was: testing is complicated, necessary and beneficial and ill-informed rhetoric does not help improve learning. (This is my summary of his message and not his own words.)

Unfortunately, it would seem the ill-informed rhetoric continues. I am referring to Michael Moore's post on SavannahNOW, called "Politics of school testing." It is too bad, but apparently Mr. Moore did not read Dr. Domaleski's comments. First, Mr. Moore claims that the state "blindsided" the schools regarding the poor results on the state’s CRCT. I don't know how this can be, as the law of the land has required that states move to "rigorous" content standards and, further, this law expects that no child is left behind in attaining those standards. Georgia has implemented a new curriculum with teacher and educator input. Field testing, data review, content review, and alignment reviews have been conducted by educators across Georgia, all under the "Peer Review" requirements of the federal NCLB legislation. Passing standards were established with impact data and sanctioned by the State Board of Education. How can anyone be blindsided by such an open and public action?

Mr. Moore also states that he has seen no analysis of the assessment and no discussion of how "...a curriculum and test can be so far out of line." Hmm… I wonder if Mr. Moore is not really more upset with the poor performance of the students. It could be that the curriculum and the assessment fit together very well. In fact, the required alignment studies, as well as educators working with the Department to review the items, should ensure they are aligned. Since the curriculum is new, perhaps the students have not yet learned it as well as they should.

Mr. Moore then mistakenly claims that the CRCT test in Georgia is constructed out of a huge bank of questions the test service provider (in this case CTB/McGraw-Hill) owns and is part of a "...larger national agenda." I am not much into conspiracy theories, but a quick review of the solicitation seeking contractor help would reveal that the test questions are to be created for use and ownership in Georgia only. Mr. Moore also claims that the multiple-choice format "...seldom reflect the actual goals of the standards." I admit, some things are difficult to measure with multiple-choice test questions—for example, direct student writing—yet many aspects of the learning system do lend themselves to objective assessment via multiple-choice and other objective test questions.

I don't want to get into a debate with Mr. Moore about how the State of Georgia manages the trade-offs between budget pressures (multiple-choice questions are much less expensive in total than rich but subjectively scored open-ended responses) and coverage of the more difficult aspects of the curriculum he outlines, such as inquiry-based activities. It is an oversimplification, however, to dismiss these issues and suggest or imply that all would be well if Georgia abandoned objective measures.

At the end of the day, I disagree with Mr. Moore and agree with Dr. Domaleski that less rhetoric and more fact-based discussions are needed. If we build the test to measure the curriculum, and the curriculum is new and rigorous, it is unlikely that students will perform well at first. If we build a test on which all students perform well, what good does a new and rigorous curriculum do us? Students will receive credit without learning.

Wednesday, June 04, 2008

The Academic Debate about Formative Assessments

There are some things in educational measurement that are not debated. Foremost, the purpose of instruction is to improve learning. The purpose of assessment is to improve instruction, which in turn improves learning. In other words, it’s all about the learning—debate over.

Some researchers (myself included) have become sloppy with our language, labeling assessments "for learning" to be formative and assessments "of learning" to be summative. So, under this lax jargon, a multiple-choice quiz used by the teacher in the classroom at the end of instruction for the purpose of tailoring additional instruction would be deemed "formative." If you follow the rhetoric from national "experts," technical advisory committees, or other learned people, then I have just offended many!

Currently, there is much discussion regarding formative assessments and the need to balance the multitude of assessments that might be used during a school year. A good place to start might be with the paper by Perie et al. (2006) posted to the CCSSO SCASS website. You and I might not agree with the classifications or the terminology, but the classification scheme used by these authors helps to contextualize the debate quite well and may even allow you to make up your own mind.

What does, however, put peanut butter into my cognitive gears is all the arguing and wasted effort I hear regarding what exactly does or does not constitute a "real" formative assessment. I even heard one nationally recognized measurement expert comment that, by definition, no assessment constructed by anyone other than a teacher can be called a formative assessment. I try to remind myself (and others) that at the end of the day only one thing matters: What have you done to improve learning? I doubt that arguing about definitions of formative, benchmark, or interim assessments helps with this.

Tuesday, April 01, 2008

Len Swanson: Pearson Visiting Scholar

Dr. Len Swanson from ETS was the most recent keynote speaker during the Pearson Visiting Scholars program in Iowa City. Dr. Swanson talked about Computer Adaptive Testing (CAT) and the history of how we got to where we are. Dr. Swanson was a particularly good choice for this presentation as he has worked in CAT since its inception and was "on the floor" when most of the groundwork was laid for what we take for granted today. Pearson was very lucky to have Dr. Swanson’s expertise, which he also shared with students and faculty from the University of Iowa.
Len pointed out that the desire to tailor testing toward individuals was really enabled by the proliferation of IRT methodology, as well as the continued improvements in technology. Early research was provided by think tanks like ETS with funding coming from the Office of Naval Research.

Len provided the following timeline as a backbone to anchor CAT development to:

1980-1984: Computerized college placement tests
1987-1988: Computerized mastery testing (NCARB)
1990-1993: Praxis exam operational
1990-1993: Graduate Record Exam (GRE) CAT version
1993-1994: NCLEX (Nurses' Certification and Licensing Exam) CAT operational
1994-2008: Statewide CATs implemented at both the district and state levels
When asked about the challenges encountered on the road to operational CAT exams, Dr. Swanson responded that quality item pools, infrastructure, and exam security were the big issues of the day. Funny, isn’t it? Almost fifty years later, the same issues are still roadblocks to fully realizing the potential of both computer-based and computer-adaptive testing.
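Len's focus was history rather than mechanics, but the basic loop behind all of these CAT programs is easy to sketch. The following is a minimal illustration of my own, not any operational program's algorithm: it assumes a Rasch (1PL) item pool, maximum-information item selection, and a crude maximum-likelihood ability update.

```python
import math
import random

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def next_item(theta_hat, bank, used):
    """Pick the unused item whose difficulty is closest to the current
    ability estimate; for the Rasch model this maximizes item information."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return min(candidates, key=lambda i: abs(bank[i] - theta_hat))

def update_theta(theta_hat, responses, bank, lr=0.5, steps=25):
    """Crude maximum-likelihood update: gradient ascent on the Rasch
    log-likelihood of the responses observed so far."""
    for _ in range(steps):
        grad = sum(u - rasch_p(theta_hat, bank[i]) for i, u in responses)
        theta_hat += lr * grad / max(len(responses), 1)
    return theta_hat

def run_cat(true_theta, bank, test_length, rng):
    """Administer a short simulated CAT and return the final ability estimate."""
    theta_hat, used, responses = 0.0, set(), []
    for _ in range(test_length):
        i = next_item(theta_hat, bank, used)
        used.add(i)
        u = 1 if rng.random() < rasch_p(true_theta, bank[i]) else 0
        responses.append((i, u))
        theta_hat = update_theta(theta_hat, responses, bank)
    return theta_hat
```

Even this toy version makes Len's three roadblocks concrete: the selection step is only as good as the item pool's coverage of difficulty, the loop assumes reliable delivery infrastructure, and repeatedly exposing the most informative items is exactly where security problems begin.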

Wednesday, March 19, 2008

More Pearson at AERA/NCME!

Sometimes I forget how big Pearson really is. Here are additional presentations at both the AERA and NCME national conventions.

NCME Papers and presentations
Chu, Kwang-lee, & Lin, Serena Jie
Distracter Rationale Taxonomy: A Formative Evaluation Utilizing Multiple-Choice Distracters

Jirka, Stephen
Test Accommodations and Item-Level Analyses: Mixture DIF Models to Establish Valid Test Score Inferences

Lau, Allen
Evaluating Equivalence of Test Forms in Test Equating With the Random Group Design

Lin, Serena Jie
Examining the Impact of Omitted Responses on Equating

Seo, Daeryong
Exploring the Structure of Achievement Goal Orientations Using Multidimensional Rasch Models

Stephenson, Agnes
Examining Individual Students’ Growth on Two States’ English Language Learners Proficiency Assessments

Using HLM to Examine Growth of English Abilities for ELL Students and Group Differences

Wang, Jane
Modeling Growth: A Longitudinal Study Based on a Vertical Scaled English-Language Proficiency Test

Wang, Shudong
Vertical Scaling: Design and Interpretation

The Sensitivity of Yen’s Q3 Statistics in Detecting Local Item Dependence

AERA Papers and presentations

Arce-Ferrer, Alvaro & Diaz, Ileana
An Experimental Investigation of Rating Scale Construction Guidelines: Do They Work with Spanish-Speaking Populations

Yi, Qing
Item Pool Characteristics and Test Security Control in CAT

Wang, Shudong; Zhang, Liru; Kersteter, Patsy; Bolig, Darlene; Yi, Qing
An Investigation of Linking a State Assessment to the 2003 National Assessment of Educational Progress (NAEP) for 4th and 8th Grade Reading

Arce-Ferrer, Alvaro & Shin, Seon-Hi
Three Approaches to Measuring Individual Growth

Wang, Shudong; Jiao, Hong; & Hi, Wei
Parameter Estimation of One-Parameter Testlet Model

Wang, Shudong & Jiao, Hong
Empirical Evidence of Construct Equivalence of Vertical Scale Across Grades in K-12 Large-Scale Standardized Reading Assessments

Tuesday, March 18, 2008

Pearson Presentations at AERA & NCME

The contingent of Pearson researchers will, once again, do an admirable job of representing our industry at the annual meetings of the American Educational Research Association (AERA) and the National Council on Measurement in Education (NCME) the week of March 24th in New York City.

The following are the AERA paper and symposium submissions:
Jason Meyers & Xiaojin Kong
An Investigation of the Changes in Item Parameter Estimates for Items Re-field Tested

Leslie Keng, Walter L. Leite, & Natasha Beretvas
Comparing Growth Mixture Models when Measuring Latent Constructs with Multiple Indicators

Leslie Keng, Edward Miller, Kimberly O'Malley, & Ahmet Turhan
Composite Score Reliability Given Correlated Measurement Errors between Subtests and Unknown Reliability for Some Subtests

Ye Tong, Sz-Shyan Wu, & Ming Xu
A Comparison of Pre-Equating and Post-Equating Using Large-Scale Assessment Data

Rob Kirkpatrick & Denny Way
Field Testing and Equating Designs for State Educational Assessments

Lei Wan & Brad Ching-Chow Wu
Person-fit of English Language Learner (ELL) Students in High-Stakes Assessments
Ellen Strain-Seymour
A User-Centered Design Approach for the Refinement of a Computer-Based Testing Interface

Jeff Wilson
A User-Centered Design Approach to Developing an Assessment Management System

Paul Nichols
The Role of User-Centered Design in Building Better Assessments

Michael Harms
An Introduction to User-Centered Design in Large-Scale Assessment

The following are NCME paper and symposium submissions:
Paul Nichols & Natasha Williams
Evidence of Test Score Use In Validity: Roles And Responsibility

Denny Way, Chow-Hong Lin, Katie McClarty, & Jadie Kong
Maintaining Score Equivalence as Tests Transition Online: Issues, Approaches and Trends

Denny Way, Paul Nichols, & Daisy Vickers
Influences of Training and Scorer Characteristics on Human Constructed Response Scoring

Ye Tong & Michael Kolen
Maintenance of Vertical Scales

Leslie Keng, Tsung-Han Ho, Tzu-An Chen, & Barbara Dodd
A Comparison of Item and Testlet Selection Procedures In Computerized Adaptive Testing

Jon S. Twing
Off-the-Shelf Tests and NCLB: Score Reporting Issues

Erika Hall & Timothy Ansley
Exploring the Use of Item Bank Information to Improve IRT Item Parameter Estimation

Canda Mueller
Response Probability Criterion and Subgroup Performance

Tony Thompson
Using CAT To Increase Precision In Growth Scores

Come see us in action. You are bound to go away smarter!

Thursday, February 21, 2008

International Objective Measurement Workshop in NYC!

In the olden days, Raschites and 3PL researchers fought with so much vigor that they parted ways. I recall one AERA/NCME conference with Ron Hambleton on the right, Ben Wright on the left, and nothing but a "DMZ" in between. Well, times have changed, and more moderate heads have prevailed. Hence, those of you attending the AERA/NCME national conference in New York City should consider coming early and checking out the International Objective Measurement Workshop (IOMW). The workshop is held during the two days prior to AERA and provides an excellent opportunity to hear about the latest developments in measurement.

The preliminary program for IOMW 2008 is now available. The conference will be held March 22 and 23, 2008 at New York University (NYU) in New York City. There are 48 paper presentations and 2 computer demonstrations scheduled. The preliminary program and the conference registration form can be found on the Journal of Applied Measurement (JAM) web site.

Early registration is currently open and in effect until March 14, 2008. Register now and save $10 on the registration fee. Late and onsite registration will also be available.

Hotel information can be found on the NYU web site. But hey, this is NYC, and it is easy to get anywhere from anywhere.

So check it out. If you have to travel all the way to NYC, you should at least take this opportunity to extend your stay over the front-end weekend.


Monday, February 11, 2008

Standard Setting Workshop at ATP in Dallas

The Pearson psychometric and research services team will be presenting a workshop at the annual conference of the Association of Test Publishers (ATP), held in Dallas on Monday, March 3, 2008. The title of the workshop is "Setting Performance Standards on High Stakes Tests."

Pearson arguably has more experience setting performance standards under NCLB than anyone. Most of this research is not published in peer-reviewed journals but rather becomes part of statewide technical reports. This workshop will be a great opportunity for customers, researchers, and other practitioners to see what standard setting is all about for the large-scale, high-stakes assessments required under NCLB.

The conference this year is being held at the Gaylord Texan resort near Grapevine. The workshop will take place on Monday, March 3, 2:00-4:30 p.m.

The presenters from Pearson include:

Dr. Scott Davies
Dr. Erika Hall
Dr. Paul Nichols
Dr. Kimberly O’Malley
The team will describe basic activities used under common standard-setting methodology, including:

-item mapping
-modified Angoff
-body of work
-ID matching
-contrasting/borderline groups
-judgmental procedures
Facilitators will describe the roles played by psychometricians, meeting coordinators, and data analysts. Then, attendees will participate in a sample item mapping standard setting in which they will set a cut point and describe reasons for their judgments. Throughout the workshop, seasoned facilitators will share lessons learned and will distinguish what should happen in theory from what does happen in practice. Attendees will leave the workshop with a set of practical materials that will help them plan a future standard-setting meeting.
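For readers who have never sat through an item mapping exercise, the arithmetic behind the ordered item booklet is straightforward. Here is a minimal sketch of my own (not the workshop's materials), assuming a Rasch calibration and the commonly used RP67 response-probability criterion; operational programs vary in both the model and the criterion, and the function names are purely illustrative.

```python
import math

RP = 0.67  # response-probability criterion often used in item mapping

def rp_location(b, rp=RP):
    """Ability (theta) at which a Rasch item of difficulty b is answered
    correctly with probability rp; this is the item's mapped location."""
    return b + math.log(rp / (1.0 - rp))

def ordered_item_map(difficulties, rp=RP):
    """Map each item to its RP location and order easy-to-hard,
    as in the ordered item booklet panelists page through."""
    return sorted(rp_location(b, rp) for b in difficulties)

def bookmark_cut(difficulties, bookmark_page, rp=RP):
    """Cut score implied by placing the bookmark on page `bookmark_page`
    (1-indexed) of the ordered item booklet."""
    return ordered_item_map(difficulties, rp)[bookmark_page - 1]
```

The panelist's judgment (where the bookmark goes) is the hard part; once it is made, the cut point on the theta scale falls directly out of the calibration, and impact data then translate that theta into a percentage of students at or above the standard.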

If you had no reason to attend this conference, this workshop should cause you to not only register, but show up and participate!


Monday, February 04, 2008

Together We Can Change the World...

Well, the long anticipated integration between Pearson and former test publishing giant Harcourt Assessment, Inc. (also known at times in its history as The Psychological Corporation and the testing division of Harcourt, Brace & Jovanovich) is complete! Senior Pearson leadership was in San Antonio this week to meet with the new Pearson employees and to celebrate the conclusion of the lengthy DOJ approval process.

In a previous press release, Pearson CEO, Marjorie Scardino, said:

"We have long admired these businesses. They bring new intellectual property, capabilities and skills to Pearson, and will enable us to accelerate our strategy of leading the personalisation of learning, both in the US and around the world. We know that their people share our commitment to education, and we look forward to welcoming them as colleagues."
For me personally, it was like "old home week" seeing many of the faces I've gotten to know over the years and reviewing the wonderful products, services and staff from Pearson's new "crown jewel."

Make no mistake, while challenges lie ahead in the education and assessment arenas, Pearson's goal to educate, inform and entertain our customers remains our primary motivator and this latest acquisition is another step toward accomplishing that goal.

Welcome aboard!


Monday, January 28, 2008

If He is Correct, Then I Must Have Been Wrrrrrrrong(?)

I have had the occasion to know Dr. James Popham for many years and in many contexts. You might recall that Dr. Popham was the keynote speaker at the last ACT-CASMA conference and that I had devoted some space to the conference in a previous post.

Now please understand, Dr. Popham has worked in measurement for many years and describes himself as a "reformed test builder," presumably implying some sort of 12-step program. Despite this, or at least as a prelude to this, Dr. Popham has been very influential in assessment. He was an expert witness in the landmark "Debra P." case in Florida, and was involved in the early days of teacher certification in Texas and elsewhere. He is also the author of numerous publications.

Over the years I have listened to Jim say some outrageous things. For those of you who know Jim, this is no surprise. He is quite a presenter and, I suspect, basks a little too much in the glow of his own outrageousness. However, many of the things I have heard him say (at the Florida Educational Research Association-FERA meeting, for example) were just plain incorrect. I won't bother you with the specifics as I am sure Dr. Popham would claim he is correct. Yet, it does put me in a quandary. Despite his recent statements, I actually have to agree with what Dr. Popham said at the ACT-CASMA conference back in November.

Jim's theme—one he has articulated in multiple venues—regarded what he calls "instructional sensitivity." Here are the basic tenets of his argument:

"A cornerstone of test-based educational accountability:
Higher scores indicate effective instruction; lower scores indicate the opposite."

"Almost all of today's accountability tests are unable to ascertain instructional quality. That is, they are instructionally insensitive."

"If accountability tests can't distinguish among varied levels of instructional quality, then schools and districts are inaccurately evaluated, and bad educational things happen in classrooms."

I keep returning to this theme. While I make a living building assessments of all types, recently most of my efforts and those of my colleagues have been with assessments supporting NCLB, which are "instructionally insensitive" according to Dr. Popham. It is hard to believe that any assessment that asks three or four questions regarding a specific aspect of the content standards or benchmarks (and by the way does so only once a year) can be very sensitive to changes in student behavior due to instruction on that content. At the same time, having some experience teaching, testing, and improving student learning, I have seen the power that measures just like these have for teachers who know what to do with the data and have a plan to improve instruction.

Hence my dilemma: why do I keep returning to Dr. Popham's argument? While I am not ready to admit I might have been wrong to dismiss Jim as a "reformed test builder" and to ignore his rants, I do admit he has a valid point to some extent regarding instructional sensitivity. I suppose I would have called his argument "the instructional insensitivity of large-scale assessments," but who am I to quibble over vocabulary?

Dr. W. James Popham, Professor Emeritus at UCLA, welcomes all "suggestions, observations, or castigations regarding this topic...." Contact him directly, or send me an email and I will forward it to him.

Friday, January 18, 2008

IQ and the Flynn Effect

Back in the 1980s when I worked on the development of the Wechsler Intelligence Scale for Children, Third Edition (WISC-III), I was fascinated with a process commonly referred to at the time as "continuous norming." Applied by Dr. Gale Roid as developed by Professor Richard Gorsuch, continuous norming was a slick way to improve the precision of empirical norms. Although I stayed in occasional contact with Professor Gorsuch, other things got in the way of any in-depth analysis of the procedure, and I simply moved on without understanding or applying the process anew.

Over the winter holidays, I was reading The New Yorker (yes, even people who live in Iowa read The New Yorker) and discovered, much to my surprise, a story about IQ written by Malcolm Gladwell, titled "None of the Above: What IQ doesn’t tell you about race" (December 17, 2007, pp. 92-96). As you may recall, Malcolm Gladwell is the author of both The Tipping Point and Blink. Both books interested me, so I read what he had to say about IQ.

Gladwell references something he (and apparently others) call “The Flynn Effect.” The Flynn Effect comes from James Flynn, author of What is Intelligence?, and is essentially the term used to describe what Flynn claims to have discovered—that all humans are getting smarter. As Gladwell points out, Flynn looked at years of IQ assessment data from all over the world and concluded that humans gain three IQ points per decade. Gladwell then tries to put this in context. For example, if Americans' average IQ in 2000 was 100, then in 1990 it was 97, in 1980 it was 94, in 1970 it was 91, and so on. If true, this implies that my grandfather (and yours) were “dull normals” at best, but were most likely mentally retarded. Flynn claims that this is due more to the way we measure intelligence than anything else. He states, as Gladwell points out:

“An IQ, in other words, measures not so much how smart we are as how modern we are.”
For example, when members of the Kpelle tribe in Liberia were asked to associate objects such as a potato and a knife, they linked them together according to function. As Gladwell points out, after all, you use a knife to cut a potato. Most IQ assessments would expect the potato to be linked to other vegetables and the knife to be linked to other tools. Flynn claims modern culture has “taught” us to think in the way the IQ assessment measures and, while this is different from how the Kpelle thought, there is no reason to believe that their thinking represents anything less intelligent.

Gladwell then tries to articulate the issue that Flynn raises regarding intelligence test norms. He observes that if the center of each new edition of the WISC is 100, and everyone is getting smarter by three IQ points per decade, then each subsequent form of the WISC (the first WISC was standardized in the 1940s) must be getting harder. Very interesting—I need to dig up references on the “continuous norming” process used for the WISC and see what impact, if any, such a process might have on "The Flynn Effect."
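The decade-by-decade arithmetic, and the renorming implication, fit in a few lines. This is a toy calculation assuming the steady three-points-per-decade gain Gladwell attributes to Flynn; the function names are mine and purely illustrative.

```python
FLYNN_GAIN_PER_DECADE = 3.0  # IQ points per decade, per Flynn as summarized by Gladwell

def implied_mean_iq(year, anchor_year=2000, anchor_iq=100.0):
    """Mean IQ in `year`, expressed on the anchor year's scale,
    assuming a steady Flynn gain."""
    return anchor_iq + FLYNN_GAIN_PER_DECADE * (year - anchor_year) / 10.0

def new_form_shift(years_since_last_norming):
    """How many points 'harder' a renormed test must effectively be
    if it is recentered to 100 after the population has gained ground."""
    return FLYNN_GAIN_PER_DECADE * years_since_last_norming / 10.0
```

Run backward, this reproduces Gladwell's figures (97 in 1990, 94 in 1980, 91 in 1970) and implies a 1940s mean in the low 80s on the 2000 scale, which is exactly why each recentered WISC edition must be harder than the last.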
