Wednesday, November 22, 2006

What Are We Trying to Measure? IPC and Compensatory Models

Several debates were of note at the Texas State Board of Education (SBOE) meeting this week, but I only want to talk about two.

The first was the discussion about curriculum and course sequencing. The SBOE increased the rigor of both the mathematics and science curricula by requiring four years of math (including Algebra II) and advanced science courses for graduation. I applaud these efforts, as research is quite clear that college readiness depends upon taking a rigorous set of high school courses. (For example, see the ACT Policy Paper: Courses Count.) Go figure! What I don’t understand is a science course called “Integrated Physics and Chemistry” (IPC). This “survey” or “introduction” course teaches both physics and chemistry in a single year, the idea being that students take biology, then the integrated course, and then either chemistry or physics. This sounded odd to me, so I asked an expert—a teacher. She told me that physics and chemistry are really difficult (not surprising; I remember them being difficult when I was in school) and that IPC is an instructional “warm-up,” a bridge that leaves students better prepared to take physics and/or chemistry later on. This logic makes a degree of sense, but I wonder: would it not be better to spend that time giving students the prerequisite skills needed to succeed in physics and chemistry throughout their school career, rather than defining a “prep course” to take beforehand? Perhaps, but I am smart enough to listen to a teacher when she tells me about instruction, so I will take a wait-and-see approach. Stay tuned.

The second debate was really an old issue regarding compensatory models. The argument is simple from a “passing rates” perspective: students are required to pass four subjects by performing at or above the proficiency standard in each one. If you allow a really good score in one area to compensate for a low (but not disastrous) score in another, more students will pass. There were passionate and sound (and not so sound) arguments for using a compensatory model for this purpose (i.e., to increase passing rates). Being a psychometrician, however, I find this logic hard to reconcile. On a standards-referenced assessment, why should good performance in, say, social studies compensate for poor performance in, say, mathematics? I claim that if increased passing rates are the goal, we should do something to enhance learning, or lower the standards, rather than muddy the waters with respect to the constructs being measured. I am sure this last claim would get me run out of town on a rail. Yet the notion of compensating for poor math performance via one of the other subject areas was a legitimate agenda item at the SBOE meeting. Again, go figure!
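
For readers who have not run into the distinction, here is a minimal sketch, in Python, contrasting a conjunctive rule (pass every subject) with a total-score compensatory rule. The subjects, scores, and cut points are hypothetical and are not the Texas standards; the point is only how the two decision rules differ.

    # Hypothetical illustration: a conjunctive passing rule vs. a compensatory one.
    # The subjects, scores, and cut points below are invented for this example only.

    SUBJECTS = ["math", "science", "social_studies", "ela"]
    CUT = 2100            # assumed proficiency cut score, the same for every subject
    TOTAL_CUT = 4 * CUT   # compensatory rule: the total must clear the summed cuts

    def passes_conjunctive(scores):
        # Pass only if every subject is at or above its cut score.
        return all(scores[s] >= CUT for s in SUBJECTS)

    def passes_compensatory(scores):
        # Pass if the total clears the total cut, so a strong subject can
        # offset a weak (but not disastrous) one.
        return sum(scores[s] for s in SUBJECTS) >= TOTAL_CUT

    student = {"math": 2050, "science": 2150, "social_studies": 2250, "ela": 2100}
    print(passes_conjunctive(student))   # False: math is below the cut
    print(passes_compensatory(student))  # True: the social studies surplus covers the math deficit

Under the conjunctive rule this hypothetical student fails; under the compensatory rule the same student passes. That is exactly why a compensatory model raises passing rates, and exactly why it muddies what the mathematics score is supposed to certify.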

Thursday, November 09, 2006

Educational "Case Control" Studies?

I attended the American Evaluation Association conference last week in Portland, Oregon. The weather was typical—raining again—but the conference inspired some creative thoughts, particularly on large-scale data research. My colleague from Pearson Education, Dr. Seth Reichlin, presented on efficacy research regarding instructional programs (e.g., online homework tutorials) and suggested that, perhaps, a better way to conduct such research was via "epidemiological studies." He went on to reference the usefulness of the massive database of educational data at the University System of Georgia:

Integrated data warehouses have the potential to support epidemiological studies to evaluate education programs. The University System of Georgia has the best of these education data warehouses, combining all relevant instructional and administrative data on all 240,000 students in all public colleges in the state for the past five years. These massive data sets will allow researchers to control statistically for instructor variability, and for differences in student background, preparation, and effort. Even better from a research standpoint, the data warehouse is archived so that researchers can study the impact of programs on individuals and institutions over time.
The easiest way to think about this is via the "case-control" methodology often used in cancer research. Think about a huge database full of information. Start with all the records and match them on values of relevant background factors, dropping the records where no match occurs. Match on things like age, course grades, course sequences, achievement, etc. Match these records on every variable possible, dropping more and more records in the process, until the only variables left unmatched are the ones you are interested in researching. For this example, let's say the use of graphing vs. non-graphing calculators in Algebra. Perhaps, after all this matching and dropping of records is done, there are as many as 100 cases left (if you are lucky enough to have that many). These cases are matched on all the variables in the database, with some number (let's say 45) using calculators and the others (the remaining 55) taking Algebra without calculators. The difference in performance on some criterion measure—like an achievement test score—should be a direct indicator of the "calculator effect" for these 100 cases.
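
As a rough illustration of that matching step, here is a minimal sketch assuming a hypothetical extract from such a warehouse. The file name, column names, and matching variables are all invented for illustration; a real extract would carry many more background variables.

    # Keep only records that agree exactly on every background variable
    # but differ on the variable of interest (graphing calculator use),
    # then compare mean achievement across the two groups.
    import pandas as pd

    df = pd.read_csv("algebra_records.csv")   # hypothetical student-level extract

    match_vars = ["age", "prior_course_grade", "course_sequence", "prior_achievement_band"]

    matched = df.groupby(match_vars).filter(
        lambda g: g["graphing_calculator"].nunique() == 2
    )

    # The crude "calculator effect" for the matched cases is the difference
    # in mean achievement between the two groups.
    effect = matched.groupby("graphing_calculator")["algebra_test_score"].mean()
    print(effect)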

The power of such research is clear. First, it involves no expensive additional testing, data collection, or messy experimental designs. Second, it can be done quickly and efficiently because the data already exist in an electronic database and are easily accessible. Third, it can be repeated with different matches and different variables at essentially no additional cost, year after year, so that true longitudinal research can be conducted.

In the past, databases may not have been in place to support such intense data manipulation. With improvements in technology, we see data warehouses and integrated data systems becoming more and more sophisticated and much larger. Perhaps now is the time to re-evaluate the usefulness of such studies.

Thursday, November 02, 2006

Growth Models Are Still the Rage

I was fortunate enough to attend a Senate Education meeting this past month, where I got to hear Bill Sanders articulate the virtues of value-added models and how they differ from growth models. While I did not agree with everything the good Dr. Sanders reported—I found his arguments over-simplified—I have no issues with either growth models or value-added models. I do worry, though, that few people, particularly politicians, lobbyists, and legislators, have had a chance to really think about and understand the differences and/or the implications of selecting and using such models, particularly in large-scale accountability programs. For example, Lynn Olson reported recently in Education Week that growth models, via the USDOE pilot program in 2006, are not helping much:

“But so far, said Louis M. Fabrizio, the director of the division of accountability services in the state department of education, ‘it's not helping much at all.’

‘I think many felt that this was going to be the magic bullet to make this whole thing better, and it doesn't,’ he said of using a growth model, ‘or at least it doesn't from what we have seen so far.’”
In Tennessee, the home of value-added models, the picture is not much different according to Olson's report:

“In Tennessee, only eight schools' achievement of AYP was attributable to the growth model, said Connie J. Smith, the director of accountability for the state education department. Tennessee uses individual student data to project whether students will be proficient three years into the future.

‘I was not surprised,’ Ms. Smith said. ‘It's a stringent application of the projection model.’

Despite the few schools affected, she said, ‘it's always worth doing and using a growth model, even if it helps one school.’”

Understanding complicated mathematical or measurement models is a long row to hoe. General confusion is only compounded by the great amount of research, reviews, and press coverage discussing gain scores, growth models, vertical scales, value-added models, etc. Hence, PEM has been trying to clear up some of the misunderstandings regarding such models via our own research and publications. Check out a recent addition to our website, An Empirical Investigation of Growth Models, for a simple empirical comparison of some common models.

“This paper empirically compared five growth models in an attempt to inform practitioners about the relative strengths and weaknesses of the models. Using simulated data where the true growth is assumed known a priori, the research question was to investigate whether the various growth models were able to recover the true ranking of schools.”
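
To give a flavor of what "recovering the true ranking" means, here is a toy sketch of the general idea: simulate schools with known true growth, add measurement noise, estimate growth with a simple gain-score model, and check the rank correlation with the truth. This is not the simulation design used in the paper, just an illustration of the logic.

    # Toy rank-recovery check, not the PEM paper's simulation design.
    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    n_schools, n_students = 50, 100

    true_growth = rng.normal(10, 3, n_schools)              # known true school growth
    pre = rng.normal(500, 20, (n_schools, n_students))      # year-1 scores
    post = (pre + true_growth[:, None]
            + rng.normal(0, 15, (n_schools, n_students)))   # year-2 scores with noise

    gain_estimate = (post - pre).mean(axis=1)                # simple gain-score model per school

    rho, _ = spearmanr(true_growth, gain_estimate)
    print(f"Rank correlation with the true school ranking: {rho:.2f}")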