Thursday, November 09, 2006

Educational "Case Control" Studies?

I attended the American Evaluation Association conference last week in Portland, Oregon. The weather was typical—raining again—but the conference inspired some creative thoughts, particularly on large-scale data research. My colleague from Pearson Education, Dr. Seth Reichlin, presented on efficacy research regarding instructional programs (e.g., online homework tutorials) and suggested that, perhaps, a better way to conduct such research was via "epidemiological studies." He went on to reference the usefulness of the massive database of educational data at the University of Georgia:

Integrated data warehouses have the potential to support epidemiological studies to evaluate education programs. The University System of Georgia has the best of these education data warehouses, combining all relevant instructional and administrative data on all 240,000 students in all public colleges in the state for the past five years. These massive data sets will allow researchers to control statistically for instructor variability, and for differences in student background, preparation, and effort. Even better from a research standpoint, the data warehouse is archived so that researchers can study the impact of programs on individuals and institutions over time.
The easiest way to think about this is via the "case-control" methodology often used in cancer research. Think of a huge database full of information. Start with all the records and match them on values of relevant background factors, dropping the records where no match occurs. Match on things like age, course grades, course sequences, achievement, etc. Match these records on every variable possible, dropping more and more records in the process, until the only variables left unmatched in the database are the ones you are interested in researching. For this example, let's say it is the use of graphing vs. non-graphing calculators in Algebra. Perhaps, after all this matching and dropping of records is done, there are as many as 100 cases left (if you are lucky enough to have that many). These cases are matched on all the other variables in the database, with some number (let's say 45) using calculators and the others (the remaining 55) taking Algebra without calculators. The difference in performance on some criterion measure, such as an achievement test score, should be a direct indicator of the "calculator effect" for these 100 cases.
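To make the matching-and-dropping procedure concrete, here is a minimal Python sketch. All of the field names, records, and values are invented for illustration; a real study would match on many more background variables and use far more sophisticated matching than exact grouping.

```python
# Hypothetical sketch of exact-match case-control analysis.
# Field names ("age", "prior", "graphing_calc", "score") are invented.
from collections import defaultdict

def case_control_effect(records, match_keys, treatment_key, outcome_key):
    """Match records on every variable in match_keys, drop records
    with no counterpart on the other side of the treatment variable,
    and return (mean outcome difference, number of retained cases)."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in match_keys)].append(rec)

    treated, control = [], []
    for group in groups.values():
        t = [r for r in group if r[treatment_key]]
        c = [r for r in group if not r[treatment_key]]
        if t and c:  # keep only fully matched groups
            treated.extend(t)
            control.extend(c)

    if not treated or not control:
        return None, 0
    effect = (sum(r[outcome_key] for r in treated) / len(treated)
              - sum(r[outcome_key] for r in control) / len(control))
    return effect, len(treated) + len(control)

# Toy data: age and prior grade are the background variables;
# graphing_calc is the variable of interest; score is the criterion.
records = [
    {"age": 18, "prior": "B", "graphing_calc": True,  "score": 82},
    {"age": 18, "prior": "B", "graphing_calc": False, "score": 75},
    {"age": 19, "prior": "A", "graphing_calc": True,  "score": 91},
    {"age": 19, "prior": "A", "graphing_calc": False, "score": 88},
    {"age": 20, "prior": "C", "graphing_calc": True,  "score": 70},  # unmatched, dropped
]

effect, n = case_control_effect(records, ["age", "prior"],
                                "graphing_calc", "score")
# effect is the "calculator effect" estimate for the n retained cases
```

Note how the record with no counterpart is silently dropped, just as described above; the final comparison uses only the cases that survived every match.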

The power of such research is clear. First, it involves no expensive additional testing, data collection, or messy experimental designs. Second, it can be done quickly and efficiently because the data already exist in an electronic database and are easily accessible. Third, it can be repeated with different matches and different variables at essentially no additional cost, year after year, so that true longitudinal research can be conducted.

In the past, databases may not have been in place to support such intense data manipulation. With improvements in technology, we see data warehouses and integrated data systems becoming more and more sophisticated and much larger. Perhaps now is the time to re-evaluate the usefulness of such studies.
