January 14, 2016

Study Calls into Question the Effectiveness and Fairness of Many High-Stakes Teacher Evaluations

Matthew Steinberg's research shows evaluations based on observing teachers in the classroom often fail to meaningfully assess teacher performance.

Media Contact: 

Jeff Frantz, Penn GSE Associate Director of Communications | 215-898-3269

Teachers may not have a level playing field when it comes to one of the most commonly used tools for high-stakes job performance evaluations — classroom observation of teachers’ instructional practice.

New research from the University of Pennsylvania Graduate School of Education (Penn GSE) and the American Institutes for Research (AIR) shows evaluations based on observing teachers in the classroom often fail to meaningfully assess teacher performance. The study, published this week in Educational Evaluation and Policy Analysis, significantly contributes to the ongoing policy debate over when and how teachers should be evaluated.

Researchers Matthew Steinberg, from Penn GSE, and Rachel Garrett, from AIR, found that the achievement of students entering a teacher’s classroom accounts for much of a teacher’s evaluated performance. This means that prior academic achievement of students is a significant predictor of teacher success in the high-stakes evaluation system.

“When information about teacher performance does not reflect a teacher’s practice, but rather the students to whom the teacher is assigned, such systems are at risk of misidentifying and mislabeling teacher performance,” Steinberg and Garrett wrote.

Among Steinberg and Garrett’s findings:

  • Math teachers were six times more likely to be among the top performers when assigned students who were the highest achievers the previous year. English language arts (ELA) teachers with high achievers in their classroom were twice as likely to be among top performers.
  • Only 37 percent of ELA teachers and 18 percent of math teachers assigned the lowest-performing students were highly rated based on classroom observation scores.
  • When teachers were assigned a class with higher incoming achievement, they were more likely to see an increase in their measured performance.
  • Teachers with higher-achieving students were rated higher in “communicating with students” and “engaging students in learning.” These areas reflect teacher interaction with students, so they tend to depend on which students are in the classroom.
  • However, measures that depend more on the instructional strategies teachers bring to the classroom — “using questioning techniques” and “assessment to drive instruction” — were largely uncorrelated with student achievement.

Based on their results, Steinberg and Garrett caution that using observation-based measures for high-stakes teacher accountability without understanding and accounting for classroom composition will skew results, with potentially significant consequences.

“The misidentification of teachers’ performance levels has real implications for personnel decisions, and fundamentally calls into question an evaluation system’s ability to effectively and equitably improve, reward, and sanction teachers,” wrote Steinberg and Garrett.

Steinberg and Garrett reviewed data from a previous study that looked at six school districts over two years, including the New York City Department of Education. That study, the Measures of Effective Teaching (MET) study, randomly matched teachers to classrooms, differentiating by grade and by whether a teacher was a subject-matter generalist or specialist.

The study comes as schools across the country have stepped up efforts to evaluate teachers. By the start of the 2014–2015 school year, 78 percent of states and 85 percent of the largest school districts and the District of Columbia had implemented teacher evaluation reform.