Tuesday, September 28, 2010

Trends in Alternate Assessments

In our work at Pearson, we have seen a shift in the assessment of students receiving special education services that, for the most part, seems to be in the right direction. The movement away from off-grade-level testing was at first met with resistance and disbelief from state departments of education, contractors, and teachers, but it now seems to be reluctantly accepted. The introduction of the 1% (alternate achievement standards) and 2% (modified achievement standards) assessments forced states to look at the number of students taking special education tests. Federal accountability allows states to count only 3% of their student population as proficient if they test outside the general assessment (more than 3% of students may take the 1% and 2% assessments, but proficient students beyond that 3% cap do not count as proficient for school, district, and state Adequate Yearly Progress reporting). Some states had been assessing a larger percentage of students with special education tests, so participation requirements or guidelines had to be developed to reduce the number of students taking these assessments and to help ensure that students were being assessed appropriately. There is still considerable controversy about whether students are being assessed with the right test.

The 1% assessment (designed for students with the most significant cognitive disabilities) allows academic content standards to be assessed through content that has been linked to grade-level standards. Common approaches to the 1% assessment have included checklists, portfolios, and structured observation, all requiring substantial teacher involvement. Assessing these students on prerequisite skills linked to the grade-level curriculum was not popular with teachers. Prior to the No Child Left Behind Act (NCLB), many of these students had been taught life skills, and assessing academic content was criticized as unimportant for this student population. However, once teachers started teaching the prerequisite skills associated with the grade-level curriculum, we heard many positive reports. Teachers were surprised to find that their students could handle the academic content. One of the most common remarks at educator meetings was that teachers never knew their students could do some of the things they were now teaching.

Providing psychometric evidence to support the validity and reliability of the 1% assessment has been challenging. The students taking the 1% assessment are a unique and varied group. Creating a standardized test that meets the needs of this population requires assessment techniques (checklists, portfolios, and structured observations) that depend heavily on the teacher and that fall outside the more traditional psychometric analyses used for multiple-choice assessments. That high level of teacher involvement is itself a source of controversy. When a standardized test requires this much input from the teacher, additional research studies to evaluate reliability and validity should be built into the test development process. Approaches we have seen used include interrater reliability studies and validity audits. These studies provide evidence that the assessments are being administered as intended, that teachers are evaluating student performance appropriately, and that the results of the 1% assessments reflect what the student can actually do rather than what the teacher says the student can do.
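As a rough illustration of what an interrater reliability check can look like, the sketch below computes Cohen's kappa between a teacher's rubric scores and an independent second rater's scores on the same set of tasks. The scores and the choice of kappa here are hypothetical examples for illustration, not the specific studies or statistics any state program has used.

# Illustrative sketch (hypothetical data): Cohen's kappa as one way to quantify
# agreement between a teacher's scores and an independent second rater's scores
# on the same set of alternate-assessment tasks.

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items on a common rubric."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)

    # Observed agreement: proportion of items on which the raters match exactly.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement, based on each rater's marginal score distribution.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

    return (observed - expected) / (1 - expected)

# Hypothetical rubric scores (0-3) from the teacher and a second, independent rater.
teacher = [3, 2, 2, 1, 0, 3, 2, 1, 2, 3]
auditor = [3, 2, 1, 1, 0, 3, 2, 2, 2, 3]
print(f"kappa = {cohens_kappa(teacher, auditor):.2f}")

Agreement statistics such as this are only one piece of the evidence; validity audits of the administration and scoring process address whether the scores mean what they are intended to mean.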

The federal regulations allowing an optional 2% assessment (for students who are not able to progress at the same rate as their non-disabled peers) have been met with varying levels of acceptance. At this time only two states have received approval for their 2% assessments, and 15 states are in the process of developing one. However, there is some talk that with the reauthorization of the Elementary and Secondary Education Act (ESEA) and the movement toward common core, the 2% assessment will go away. The communication from the Department of Education in the context of Race to the Top and Common Core has been that students participating in 2% assessments should be able to participate in the new general assessments developed by the state consortia. However, actual changes to the NCLB assessment and reporting requirements will have to be legislated, most likely during ESEA reauthorization.

Many states simply did not bother to develop a 2% test. It’s an expensive endeavor for an optional assessment. Those states that have developed (or are in the process of developing) a 2% test have struggled to find a cost-effective approach to setting modified achievement standards and to modifying or developing grade-level items that are accessible to this group of students.

There seems to be a need for an assessment that falls between the 1% test and the general test offered with accommodations, but there are differences of opinion about how students should perform on the 2% test. If students perform poorly, is that to be expected because they should not be able to do well on grade-level material, or does it indicate that the test has not been modified enough for these students to show what they know? If students perform well, does that mean the modifications were done well, or that the wrong students are taking the test? We would like to think that the intent of the legislation was for states to develop a test that assesses grade-level content in a way that allows students to be successful. Even so, we have heard the argument that even if students taking the 2% test are not doing well, they are still performing better than they would have on the general test.

Most 2% assessments that have been developed or are in development use a multiple-choice format, and the traditional psychometric analyses associated with multiple-choice items work well here. But there have been discussions about what the data for a 2% test should look like. Of particular interest is whether a vertical scale should, or could, be developed for a 2% assessment. Recent investigations show that the vertical scale data do not look like the vertical scale data seen on general assessments, but it is unclear whether this is a problem or is to be expected. Our initial recommendation to one state was not to develop a vertical scale, since a vertical scale focuses on a year's worth of student growth, and a year's worth of growth for a 2% student may be very different from what we see in the general population. But after collecting vertical scale data for that 2% assessment, the data looked better than expected, though not close enough to a general assessment vertical scale to recommend implementation. Further research is being conducted.
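For readers unfamiliar with the mechanics, the sketch below shows one common building block of a vertical scale: a mean/sigma linking that places ability estimates from a lower-grade form onto the scale of the adjacent upper-grade form, using items that appear on both forms. The item difficulties and the student estimate are invented for illustration; this is not the scaling procedure used for any particular state's 2% assessment.

# Illustrative sketch (hypothetical values): mean/sigma linking of adjacent
# grade-level forms using common (anchor) items, one step in building a
# vertical scale.

import statistics

# IRT difficulty estimates for the same anchor items, calibrated separately
# on the lower-grade and upper-grade forms (hypothetical values).
anchor_lower = [-1.20, -0.40, 0.10, 0.65, 1.30]
anchor_upper = [-0.70, 0.05, 0.55, 1.10, 1.80]

# Mean/sigma linking constants that place lower-grade results on the upper-grade scale.
slope = statistics.stdev(anchor_upper) / statistics.stdev(anchor_lower)
intercept = statistics.mean(anchor_upper) - slope * statistics.mean(anchor_lower)

def to_upper_scale(theta_lower):
    """Transform a lower-grade ability estimate onto the upper-grade scale."""
    return slope * theta_lower + intercept

# A hypothetical lower-grade student ability estimate, expressed on the upper-grade scale.
theta = 0.25
print(f"linked theta = {to_upper_scale(theta):.2f}  (A = {slope:.2f}, B = {intercept:.2f})")

Whether linking constants estimated from 2% assessment data behave as well as they do for general assessments is exactly the kind of question the ongoing research is meant to answer.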

Growth models for both the 1% and 2% assessments are also being developed. Again, the type of growth to expect from the students taking these assessments is an open question, especially for the 1% population. The challenge is how to capture the kinds of growth these students do show. Models are being implemented now, and we are curious to see what the evaluation of these models will show. Are we able to capture growth for these students?
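To make the general idea concrete, here is a minimal sketch of one simple growth index, a residual-gain measure that regresses this year's scores on last year's scores and treats each student's residual as growth beyond what prior achievement would predict. The data are hypothetical, and this is offered only as an illustration of the family of approaches, not as the model being implemented for the 1% or 2% assessments.

# Illustrative sketch (hypothetical data): a residual-gain growth index.
# Regress current-year scale scores on prior-year scale scores, then treat
# each student's residual as growth beyond what prior achievement predicts.

prior   = [310, 322, 298, 340, 305, 330, 315, 300]   # hypothetical prior-year scores
current = [318, 335, 300, 352, 307, 345, 322, 310]   # hypothetical current-year scores

n = len(prior)
mean_x = sum(prior) / n
mean_y = sum(current) / n

# Ordinary least squares slope and intercept for current ~ prior.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prior, current))
sxx = sum((x - mean_x) ** 2 for x in prior)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual gain: observed current-year score minus the predicted score.
for student, (x, y) in enumerate(zip(prior, current), start=1):
    gain = y - (intercept + slope * x)
    print(f"student {student}: residual gain = {gain:+.1f}")

For the 1% population in particular, growth may show up in ways a score regression cannot see, which is why the evaluation of these models will be so informative.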

As alternate assessments under NCLB have matured, they have come under increasing scrutiny in peer review and from assessment experts. The tension between creating flexible, individualized assessments and meeting technical requirements for validity and reliability has led to increased structure, and often increased standardization, in the development of alternate assessments. Yet, for myriad reasons, alternate assessments do not, and should not, look like the current, primarily multiple-choice format of general assessments. The unique nature of alternate assessments has pushed psychometricians and other researchers to better understand non-traditional measurement. Providing reliability and validity evidence for assessments with alternate and modified achievement standards has required innovative thinking, thinking that is already informing assessment design for the common core assessment systems, which are expected to be innovative, flexible, and, to some extent, performance based.

Natasha J. Williams, Ph.D.
Senior Research Scientist
Psychometric and Research Services
Pearson
