Tuesday, August 31, 2010

Lessons Learned from a Psychometric Internship

“Meetings. Meetings. Meetings. How do these people get any work done?” This was one of my first impressions of Pearson. Well, as I quickly learned, much of the work gets done through the meetings. By including everyone with a need to know, and by meeting frequently, projects keep moving forward smoothly, with nothing overlooked and no one left out. Not only does this contribute to successful projects, but it helps build an esprit de corps in which everyone realizes the value of their own and each other’s contributions to getting the job done.

Most images of a psychometric internship center on doing nothing but research. Mine was not like that. It was balanced between production work and research. As a result, I developed a better (and more favorable) understanding of a large-scale psychometric testing process than I had ever imagined. My first week I started coding the open-ended answers from a survey administered to teachers to get feedback on how to improve the testing process for students with limited English who have language accommodations on tests. This was followed in later weeks by observing meetings with the Texas Technical Advisory Committee (invited scholars who provide insight into questions that Pearson and the Texas Education Agency have about the testing process), with panels of teachers invited to review new questions before they are put into the test bank, and with other panels that performed peer review of alternate assessments. Some of what goes into creating these panels and getting them together to do their work I picked up from listening to the people in adjacent cubes: there was a continuous conversation with school administrators about panelist selection, and with panelists about the mechanics of coming to Austin to work for a few days. It was really amazing.

I was not able to complete the research I started. (I needed another month.) That was a disappointment, but I came away with topics to work on in the future, an appreciation of the breadth and depth of psychometrics that I did not have before, and professional and personal contacts with some of the sharpest people in the business.

C. Vincent Hunter
Psychometric Intern
Test, Measurement & Research Services
Pearson

Tuesday, August 24, 2010

Reflecting on a Psychometric Internship

The summer of 2010 was an unusual one in my career: I was fortunate to be selected as a summer psychometric intern working in Iowa City. The short eight weeks turned out to be the most productive and memorable period of my years of graduate study. Not only did I develop two AERA proposals out of the projects I participated in, but I also got to know a group of wonderful researchers in the educational measurement field. The hearty hospitality, thoughtful consideration, and gracious support that I received from the group exceeded what I had expected.

I very much appreciate what I learned from the weekly seminars, which covered a broad range of current hot topics and techniques that leading testing companies continue to work on. One-on-one meetings with experienced psychometricians about hands-on practices in K-12 testing projects gradually broadened my view of both academic research and industry operations. Seeing the practical needs that emerge from testing operations made me realize how much potential exists in educational measurement research, and I consider that realization my most valuable gain.

Jessica Yue
Psychometric Intern
Test, Measurement & Research Services
Pearson

Wednesday, August 04, 2010

Educational Data Mining Conference -- Part Two

Exploratory learning environments, by their nature, allow students to freely explore a target domain. Nonetheless, the art of teaching compels us to provide appropriate, timely supports for students, especially for those who don’t learn well in such independent situations. While intelligent tutoring systems are designed to identify and deliver pedagogical and cognitive supports adaptively for students during learning, they have largely been geared to more structured learning environments, where it is fairly straightforward to identify student behaviors that likely are or aren’t indicative of learning. But what about providing adaptive supports during exploratory learning?

One of the keynote lectures at the Third International Conference on Educational Data Mining in Pittsburgh in June (discussed in a previous blog posting) covered this topic. Cristina Conati, Associate Professor of Computer Science at the University of British Columbia, described her research using data-based approaches to identify student interaction behaviors that are conducive to learning vs. behaviors that indicate student confusion while using online exploratory learning environments (ELEs). The long-term goal of her work is to enable ELEs to monitor a student’s progress and provide adaptive support when needed, while maintaining the student’s sense of freedom and control. The big challenge here is that there are no objective definitions of either correct or effective student behaviors. Thus the initial effort of her team’s work is to uncover student interaction patterns that correlate with, and thus can be used to distinguish, effective vs. ineffective learning.

Core to this “bootstrapping” process is the technique of k-means clustering employed by Conati and her team. K-means clustering is a cluster analysis method for identifying groups (e.g., of students) that exhibit similar characteristics (e.g., interaction behaviors), and it is commonly used in educational data mining research. Data from student use of two different college-level ELEs were used in the study: AIspace, a tool for teaching and learning artificial intelligence (AI), and the Adaptive Coach for Exploration (ACE), an interactive open learning environment for exploring math functions. The data sets consisted of either interface actions alone or interface actions combined with student eye-tracking. Students were identified as high learners (presumed to exhibit largely effective behaviors) or low learners (presumed to exhibit largely ineffective behaviors) either by comparing their pre- and post-test scores or through expert judgment. Formal statistical tests were used to compare clusters in terms of learning and feature similarity.
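To make the technique concrete, here is a minimal sketch of applying k-means clustering to student interaction features with scikit-learn. The feature names and values are purely illustrative assumptions, not data from Conati’s studies.

```python
# Illustrative k-means clustering of students by interaction behaviors.
# The features and values below are hypothetical, not from the studies described.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per student; example features might be average pause length (seconds),
# number of exploratory actions, and eye-gaze shifts per exercise.
features = np.array([
    [12.0, 45, 30],
    [ 3.5, 80,  8],
    [14.2, 40, 35],
    [ 4.1, 75, 10],
    [13.1, 50, 28],
    [ 2.9, 90,  6],
])

# Standardize so no single feature dominates the distance computation.
scaled = StandardScaler().fit_transform(features)

# Partition students into k groups; k=2 and k=3 were the solutions reported below.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

print("Cluster assignment per student:", labels)
print("Cluster centers (standardized units):", kmeans.cluster_centers_)
```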

In the end, the data supported distinguishing two groups (one high-learner and one low-learner) and three groups (one high-learner and two low-learner), i.e., k=2 and k=3, as a function of student behaviors. Differential behavior patterns included (a sketch of how such between-cluster differences might be tested follows the list):

Low learners moved more quickly through exercises, apparently allowing less time for understanding to emerge.
High learners paused longer with more eye-gaze movements during some activities.
Low learners paused longer after navigating to a help page.
Low learners ignored the coach’s suggestion to continue exploring the current exercise more frequently than high learners did.
Low learners appeared to move impulsively back and forth through the curriculum.
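The posting notes that formal statistical tests were used to compare the clusters, without naming the specific tests. As a hedged illustration, the sketch below uses an independent-samples (Welch’s) t-test on one hypothetical feature, pause length after opening a help page, to check whether two clusters differ.

```python
# Hedged illustration: testing whether an interaction feature differs between
# two clusters. Data and choice of test are assumptions for demonstration only.
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical per-student pause lengths (seconds) after opening a help page,
# grouped by cluster label.
pauses_high_learners = np.array([4.2, 3.8, 5.1, 4.6, 3.9])
pauses_low_learners = np.array([9.7, 8.4, 11.2, 10.1, 9.0])

# Welch's t-test (does not assume equal variances).
t_stat, p_value = ttest_ind(pauses_high_learners, pauses_low_learners, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```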

In summary, the research shows promise for k-means clustering as a technique for distinguishing effective from ineffective learning behaviors, even during unstructured, exploratory learning. Of course, this work is just a start. For example, additional research with larger numbers of students (24 and 36 students were used in the current studies) might support distinguishing additional groups, should such groups exist. In the end, the hope is that by identifying patterns of behavior that can serve as indicators of effective vs. ineffective learning, targeted, adaptive interventions can be applied in real time to support students' productive learning while maintaining the freedom that defines ELE learning.


Bob Dolan, Ph.D.
Senior Research Scientist, Assessment & Information
Pearson

Monday, August 02, 2010

Educational Data Mining Conference -- Part One

The Third International Conference on Educational Data Mining was held in Pittsburgh in June. The conference began in Montreal two years ago as an offshoot of the Intelligent Tutoring Systems Conference (held this year in Pittsburgh immediately afterward). It is a small (approximately 100 participants), single-track conference whose participants are mostly academics in the fields of cognitive science, computer science, and artificial intelligence (AI), most or all of whom have dedicated their efforts to education research.

Educational data mining (EDM) is the process of analyzing student (and in some cases even educator) behaviors for the purpose of understanding their learning processes and improving instructional approaches. It is a critical component of intelligent tutoring systems, since there is an implicit realization in this field that unidimensional models of student knowledge and skills are generally insufficient for providing adaptive supports. That said, the results from EDM go beyond informing Intelligent Tutoring Systems (ITS) on how to do their job. For example, they can be a cornerstone of formative assessment practices, in which we provide teachers with actionable data on which to base instructional decisions. In fact, few would dispute that the most successful ITS is one that not only provides individualized opportunities and supports to students in real time but also keeps the teacher actively in the loop.

Examples of the types of data used in EDM include the following (a sketch of one way such data might be recorded per student follows the list):

Correctness of student responses (of course!)
Types of errors made by students
Number of incorrect attempts made
Use of hints and scaffolds
Level of engagement / frequency of off-task behaviors (as measured through eye-tracking, student/computer interaction analysis, etc.)
Student affect (as measured through physiological sensors, student self-reports, etc.)
The list goes on ...
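As a hedged illustration of how these kinds of data might be organized per student, the sketch below defines a simple record type; the field names are hypothetical and not drawn from any particular EDM system.

```python
# Illustrative per-student record of the kinds of data listed above.
# Field names are hypothetical, not taken from any specific EDM system.
from dataclasses import dataclass, field
from typing import List

@dataclass
class StudentInteractionRecord:
    student_id: str
    responses_correct: List[bool]      # correctness of each student response
    error_types: List[str]             # types of errors made
    incorrect_attempts: int            # number of incorrect attempts
    hints_used: int                    # use of hints and scaffolds
    off_task_events: int               # engagement / off-task behaviors
    affect_self_reports: List[str] = field(default_factory=list)  # student affect

record = StudentInteractionRecord(
    student_id="s001",
    responses_correct=[True, False, True],
    error_types=["sign error"],
    incorrect_attempts=1,
    hints_used=2,
    off_task_events=0,
)
print(record)
```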

Much of EDM research focuses on identifying how students cluster into groups based upon their behaviors (k-means is particularly popular, though by no means the only method). For example, it might be found that a population of students working in an online tutoring system divides into three groups -- high-performing, low-performing/high-motivation, and low-performing/low-motivation -- with each group exhibiting distinct patterns of interaction and hence learning. The types of supports offered to students in each of these groups can, and should, vary as a function of this clustering.
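These posts do not say how the number of clusters is chosen; one common, generic approach is to compare silhouette scores across candidate values of k, as in the assumed, illustrative sketch below.

```python
# Generic sketch of choosing the number of clusters via silhouette scores.
# The data are synthetic; this is not the procedure used in any specific study.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Hypothetical feature matrix: rows are students, columns are behavior features.
X = rng.normal(size=(60, 4))

for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette = {silhouette_score(X, labels):.3f}")
```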

As efforts to bridge the divide between instruction and assessment get underway, such as the federal Race to the Top Assessment program, it is important that the educational testing research community stay on top of developments from the EDM research community, to best understand the types of data to collect, the techniques for analysis, and their potential for improving educational opportunities for students.


Bob Dolan, Ph.D.
Senior Research Scientist, Assessment & Information
Pearson