Monday, June 29, 2009

Pearson is Fulfilling the Goal to be the Nation’s Thought Leader in Assessment

One of the primary objectives of Pearson, as the leading provider of educational measurement research, is to lead the discussion on effective educational policy. Sometimes these efforts are clearly articulated in customer-facing actions (such as legally defensible setting of student performance standards), academic research publications, or conference presentations. Other times, policy and/or position papers are prepared to inform our customers and others about the direction Pearson is steering education. I was recently involved in the development of such a paper and wanted to share it with you in this post.

“Using Assessments to Improve Student Learning and Progress” is a very interesting paper that clarifies the roles of large-scale, high-stakes assessments as contrasted with classroom assessments. While I have made such comparisons in other TrueScores posts, this paper is much more comprehensive.

Here is a brief excerpt of the distinctions made in the paper:
“Assessments for learning provide the continuous feedback in the teach-and-learn cycle, which is not the intended mission of summative assessment systems. Teachers teach and often worry if they connected with their students. Students learn, but often misunderstand subtle points in the text or in the material presented. Without ongoing feedback, teachers lack qualitative insight to personalize learning for both advanced and struggling students, in some cases with students left to ponder whether they have or haven’t mastered the assigned content.”
This paper also contains links to other Pearson-related efforts to inform and shape public policy and opinion, as evidenced by the following excerpt:

“Assessments for learning are part of formative systems, where they not only provide information on gaps in learning, but inform actions that can be taken to personalize learning or differentiate instruction to help close those gaps. The feedback loop continues by assessing student progress after an instructional unit or intervention to verify that learning has taken place, and to guide next steps. As described by (Pearson authors) Nichols, Meyers, and Burling:
‘Assessments labeled as formative have been offered as a means to customize instruction to narrow the gap between students’ current state of achievement and the targeted state of achievement. The label formative is applied incorrectly when used as a label for an assessment instrument; reference to an assessment as formative is shorthand for the particular use of assessment information, whether coming from a formal assessment or teachers’ observations, to improve student achievement. As Wiliam and Black (1996) note: ‘To sum up, in order to serve a formative function, an assessment must yield evidence that…indicates the existence of a gap between actual and desired levels of performance, and suggests actions that are in fact successful in closing the gap.’”

This quote also shows that the Pearson themes are indeed consistent: personalized learning is supported through the Pearson "teach and learn" cycle as informed by assessment, one of Pearson's primary goals. So, go check it out!

Monday, June 22, 2009

Universal Design for Computer-Based Testing Guidelines

This was just posted on the Pearson website. Pearson’s Universal Design for Computer-Based Testing Guidelines examines the specific student challenges related to each test question construct and pinpoints question design solutions that can make test questions more accessible to all students. The study touts the value of digital technology and its ability to incorporate multiple representations, such as text, video and audio, into computer-based testing.

Thursday, June 04, 2009

Pearson Sessions at CCSSO (Updated)

Assessing Writing Online: The Benefits and Challenges

The transition to assessing student writing online presents both benefits and challenges to states and their students. This session will discuss the logistical, content, scoring, and political issues states face while implementing the transition to assessing student writing online. Louisiana will provide insight on implementing an online writing assessment and Minnesota will discuss the challenges it faced while attempting to develop an online writing component. In addition, research and content-based perspectives will be offered from two testing companies, Pearson and Pacific Metrics.

Presenters:
Jennifer Isaacs, Pacific Metrics Corporation
Denny Way, Pearson Education
Claudia Davis, Louisiana Department of Education
Dirk Mattson, Minnesota Department of Education

Three States’ Experiences in Implementing a Vertical Scale

In this session we describe the methods used by three states to implement vertical scales in reading and mathematics across grades 3 through 8. A vertical scale has become a desirable component of a state’s assessment program in recent years because schools that fall short of AYP requirements in terms of the number of students meeting standards may still be counted as making AYP if they can show that acceptable progress has been made. In Virginia, tests are administered online and on paper, so a vertical scale had to be created that would be applicable to both modes. In Texas, tests are administered in English and Spanish, so two different vertical scales were developed. Mississippi is implementing a multi-year plan which includes developing vertical linking items that reflect the progression of content through the curriculum and monitoring student performance prior to reporting results on the vertical scale.

Presenters:
Steve Fitzpatrick, Pearson
Ahmet Turhan, Pearson
Kay Um, Pearson

Accommodations in a Computer-Based Testing Environment

States increasingly are delivering or considering delivery of assessments via computer. Some states are working towards (or have fulfilled) a dual administration model while others are exploring using only computer-based testing in the future. Use of the computer opens the door to technological solutions for accommodations. However, the use of technology for accommodations also raises a number of questions, such as ease of use and score comparability. This session will discuss the research and development of computer-based accommodations, such as text-to-speech and onscreen magnification, and discuss the policy and practical issues/potential solutions surrounding their development and use in a secure testing environment.

Presenters:
John Poggio, University of Kansas
Bob Dolan, Pearson
Shelley Loving-Ryder, Virginia Department of Education
Todd Nielsen, Iowa Testing Programs
Discussant:
Sue Rigney, U.S. Department of Education

Ensuring Technical Quality of Formative Assessments

Reference to an assessment as formative is shorthand for the formative use of assessment data—whether coming from standardized tests, teacher observations, or intelligent tutoring systems—with the explicit goal of providing focused interventions to improve student learning. The technical quality of an assessment indicates the extent to which interpretations and decisions derived from assessment results are reasonable and appropriate. However, familiar technical requirements such as validity and reliability have been developed with a focus on summative assessment and have not considered a coordinated system of instruction and assessment. For example, reliability has traditionally indicated the degree of consistency of test scores over replications, a definition that does not adequately capture whether a formative assessment consistently prescribes appropriate targeted instruction over time and across conditions. This session will discuss different approaches to defining new indicators of technical quality appropriate for ensuring the effectiveness of formative assessment systems.
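
As an aside for readers who want to make that distinction concrete, here is a minimal sketch contrasting the traditional score-consistency view of reliability with a decision-consistency view that is arguably more relevant to formative use. All names and data below are hypothetical, invented purely for illustration; this is not a description of any Pearson methodology.

```python
# Hypothetical illustration: two "replications" of the same assessment for five students.
# scores_* are scale scores; prescriptions_* are the instructional next steps each
# replication would recommend. All values are invented for this sketch.

def score_correlation(xs, ys):
    """Traditional reliability evidence: correlation of scores over replications."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def decision_consistency(a, b):
    """Formative-flavored evidence: how often the replications prescribe the same next step."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

scores_1 = [410, 455, 470, 520, 505]
scores_2 = [415, 450, 480, 515, 500]
prescriptions_1 = ["reteach", "practice", "practice", "enrich", "enrich"]
prescriptions_2 = ["reteach", "practice", "enrich", "enrich", "enrich"]

print(score_correlation(scores_1, scores_2))                    # high score consistency
print(decision_consistency(prescriptions_1, prescriptions_2))   # weaker agreement on next steps
```

The point of the sketch is simply that scores can replicate well while the instructional decisions built on them do not, which is why the session's question about new indicators of technical quality matters.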

Presenters:
Bob Dolan, Pearson
Meg Litts, Onamia Public Schools, MN
John Poggio, University of Kansas
Jerry Tindal, University of Oregon
Discussant:
Tim Peters, New Jersey Department of Education

Application of Validity Studies for College Readiness: the American Diploma Project Algebra II End-of-Course Exam

Too many students graduate from high school unprepared for college—nearly one-fourth of first-year college students must take remedial courses in mathematics. An intended use of the American Diploma Project (ADP) Algebra II End-of-Course Exam is to serve as an indicator of readiness for first-year college credit-bearing courses—to ensure that students receive the preparation they need while they are still in high school. This session will discuss how the multistate ADP consortium gathered validity evidence for this use of the exam through studies involving: 1) content judgments of college instructors; 2) comparisons of college students’ ADP Algebra II End-of-Course Exam performance with subsequent grades in college level courses; and 3) empirical relationships between exam performance and ACT or SAT scores. Representatives from two ADP states, Achieve, and the exam vendor will share their experiences with collecting the data and how results of the exam will be used.

Presenters:
Julie A. Miles, Pearson
Nevin C. Brown, Achieve
Stan Heffner, Ohio Department of Education
Rich Maraschiello, Pennsylvania Department of Education

Keeping All Those Balls in the Air: Challenges and Approaches for Linking Test Scores Across Years in Multiple Format Environments

In this era of requiring annual improvements in student test scores, valid test score linking is one of the most important components of a state testing program. To make things more challenging, many states are implementing computer-based testing programs, but few if any are able to completely shift to CBT platforms. Therefore, in addition to the usual challenges with test score linking, states are trying to ensure that comparable inferences are drawn from the same scale scores no matter the format of the test. This session brings together equating contractors and testing directors from two states that have successfully addressed these challenges to share their lessons learned and offer recommendations to other state leaders.

Presenters:
Scott Marion, NCIEA
Matt Trippe, HumRRO
Deborah Swensen, Utah State Office of Education
Tony Thompson, Pearson
Dirk Mattson, Minnesota Department of Education
Discussant:
Rich Hill, NCIEA


Revising the Standards for Educational and Psychological Testing

AERA, NCME, and APA have launched an effort to revise the Standards for Educational and Psychological Testing. Several members of the committee drafting this revision will describe the issues being addressed and the process and timeline for completing the work. The charge to this committee specifies areas of focus including: (a) the increased use of technology in testing, (b) the increased use of tests for educational accountability, (c) access for all examinee populations, and (d) issues associated with workplace testing. The committee will also review the scope and formatting of the Standards. A state testing director will be invited to serve as discussant describing implications of the Test Standards for state assessments. A significant part of the session will be devoted to questions and comments from the audience.

Presenters:
Lauress Wise, HumRRO
Brian Gong, NCIEA
Linda Cook, ETS
Joan Herman, CRESST
Denny Way, Pearson
Discussant:
John Lawrence, California Department of Education


Lessons Learned and the Road Ahead for the ADP Algebra II Consortium

Students across 12 states took the American Diploma Project Algebra II End-of-Course Exam for the first time in Spring 2008. In August 2008, the first annual report of the results and findings was released. This session will focus on the lessons learned from the first administration, progress toward the three common goals set for this endeavor by participating states, and the challenges ahead for the multi-state consortium. Among topics to be discussed are the validity studies being conducted to help establish the exam as an indicator of student readiness for first-year college credit-bearing courses, as well as how the states are using the assessment to improve high school Algebra II curriculum and instruction.

Presenters:
Laura Slover, Achieve
Shilpi Niyogi, Pearson
Tim Peters, New Jersey Department of Education
Gayle Potter, Arkansas Department of Education
Bernie Sandruck, Howard Community College

The Role of Technology in Improving Turnaround Time and Quality in Large Scale Assessments

To meet ever-increasing demands for faster turnaround of test scores and more defensible scores, states are searching for ways to satisfy their constituents and to meet the demands of high-stakes testing. This panel will discuss implementations of programs that leverage technology. One test that is administered frequently and requires immediate turnaround is the ACCUPLACER, which is given to incoming freshmen at community colleges. The College Board uses an online testing environment to administer the tests and an automated intelligence engine to score them. North Carolina uses a distributed scoring model that engages large groups of scorers for its writing assessment and local district scoring of multiple-choice tests for its EOC and EOG tests. Virginia administers all multiple-choice tests online with rapid turnaround to score and deliver results. These three models meet the demands of later or on-demand testing and the demands for faster or immediate test results.

Presenters:
Daisy Vickers, Pearson
Jim Kroenig, North Carolina Department of Education
Ed Hardin, The College Board
Shelley Loving-Ryder, Virginia Department of Education

Accurate and Time-Saving: Online Assessment of Oral Reading Fluency Using Advanced Speech Processing Technology

This panel will describe four studies conducted across eight states investigating the usability and impact of using an online, automated test delivery and scoring system to measure and track students’ oral reading fluency (ORF) performance. The ORF system produced words correct per minute (WCPM) scores for oral reading samples from hundreds of 1st through 5th graders. The session includes discussion of: 1) technical and practical challenges involved in large-scale test delivery, scoring, and data management; 2) the reliability of the automated scoring system, which produces scores that correlate highly (0.95-0.99) with teachers scoring manually; 3) policy-related impacts of reliability data comparing machine scores with scores from expert test administrators and classroom teachers; 4) innovative methods for scoring other aspects of ORF (e.g., expressiveness, accuracy); 5) teacher feedback on the value of the automated system, including how automated scoring enables reallocation of teacher time from test administration and scoring to instruction.
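
For readers unfamiliar with the WCPM metric mentioned above, here is a minimal sketch of how a words-correct-per-minute score can be computed from a reading sample. The function and variable names are illustrative assumptions for this post only, not part of the ORF system described in the session.

```python
# Illustrative sketch only: WCPM = words read correctly / elapsed minutes.
# The names below (score_wcpm, words_correct, seconds_elapsed) are assumptions
# made for this example, not identifiers from any Pearson product.

def score_wcpm(words_correct: int, seconds_elapsed: float) -> float:
    """Return a words-correct-per-minute (WCPM) score for one oral reading sample."""
    return words_correct / (seconds_elapsed / 60.0)

# Example: a student reads 94 words correctly in a 60-second sample -> 94.0 WCPM.
print(score_wcpm(94, 60))
```

The reported 0.95-0.99 correlations in the abstract refer to agreement between such machine-produced scores and teachers’ manual scores on the same samples.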

Presenters:
Ryan Downey, Pearson
David Rubin, Pearson
Jack Shaw, National DIBELS
DeAnna Pursai, San Jose Unified School District, CA

Developing Valid Alternate Assessments with Modified Achievement Standards: Three States' Approaches

Implicit in the design of an alternate assessment based on modified achievement standards (AA-MAS) is a validity argument that the assessment appropriately and accurately measures the grade-level academic achievement of students in a targeted sub-population of students with disabilities. In this session three states at different stages in the development process will present their approaches to developing valid AA-MAS. The states will discuss their test development models and factors influencing their choice of model. Presentations will cover the rationales and processes used for item development, research used to support test development, activities involving stakeholders, and the process of collecting validity evidence. A discussant will respond to the development models presented focusing on threats to validity, documenting validity evidence, and employing ongoing validity evaluations. The discussant’s presentation will be relevant to the test designs used in the three states and to AA-MAS test development more generally.

Presenters:
Kelly Burling, Pearson
Shelley Loving-Ryder, Virginia Department of Education
Cari Wieland, Texas Education Agency
Elizabeth Hanna, Pearson
Malissa Cook, Oklahoma Department of Education
Discussant:
Stuart Kahl, Measured Progress

Legislation in the One Corner, Implementation in the Other: And, It’s a Knock Out

Assessment legislation often gets passed before an implementation plan is fully vetted. In this session, Minnesota, Nebraska, and Texas square off against challenging assessment legislation. They will describe how they bob and weave as policy is thrown their way to develop implementation plans that are reasonable and in the best interest of their students. Presenters will share legislation that turned the local assessment system upside down, such as Texas legislation limiting field-testing at a time when 12 end-of-course assessments are to be rolled out. Join this session and listen as these three states explain how they knocked out the legislative challenges early instead of going the full 12 rounds.

Presenters:
Kimberly O'Malley, Pearson
Christy Hovanetz Lassila, Consultant
Pat Roschewski, Nebraska Department of Education
Gloria Zyskowski, Texas Education Agency
Discussant:
Roger Trent, Executive Director Emeritus for the Ohio Department of Education

Cognitive Interviews Applied to Test and Item Design and Development for AA-MAS (2 percent)

The session focuses on applying cognitive interviews (CI) in the development of alternate assessments judged against modified achievement standards (AA-MAS) and presents principles for using CI with AA-MAS that were formulated during a research symposium and published in a 2009 white paper. The symposium built on the work of the Designing Accessible Reading Assessments and Partnership for Accessible Reading Assessment projects, addressing think-aloud methods used in recent research with students eligible for the AA-MAS. White paper principles and four recent CI AA-MAS studies will be presented. Studies will be discussed in the context of the principles and how results can be used to inform AA-MAS test design and development. Copies of the white paper will be available and audience interaction will be encouraged. Participants will be invited to ask questions, offer experiences, and discuss methods for interviewing students with limited communication, gathering reliable data, and applying CI to AA-MAS test development in their own settings.

Presenters:
Patricia Almond, University of Oregon
Caroline E. Parker, EDC
Chris Johnstone, NCEO
Shelley Loving-Ryder, Virginia Department of Education
Jennifer Stegman, Oklahoma Department of Education
Kelly Burling, Pearson
Discussant:
Phoebe Winter, Pacific Metrics Corporation