The Protocol for Language Arts Teaching Observations (PLATO) focuses on classroom instructional interactions and teacher practices that most centrally contribute to student achievement. PLATO was developed by Pam Grossman and colleagues at Stanford University to reliably measure instructional work that teachers do with students. PLATO was designed for use in English/Language Arts classrooms and validated with students in grades 3 through 9, but it has since been used in various content areas. It has been used to evaluate curricular interventions and teacher professional development programming, to diagnose teacher practice in schools and systems, to investigate the relationship between teacher practice and student outcomes, and as the basis of teacher professional development.
PLATO is a classroom observation protocol focused on middle and high school English/Language Arts (ELA) instruction. The PLATO protocol includes twelve elements that encompass a number of key areas of ELA classroom instruction. The protocol was developed as a part of a classroom practices research study that differentiates more effective teachers, as measured by their impact on student achievement.
PLATO was developed by University of Pennsylvania Graduate School of Education (Penn GSE) Professor and former dean Pam Grossman and colleagues at Stanford University to measure reliably several aspects of instruction that teachers use with students in English/Language Arts (ELA) classrooms. PLATO is research-based and was developed after an extensive review of effective instruction in middle-grades ELA and literacy more generally.
In recent years, PLATO has been used successfully by researchers and practitioners beyond the content area of ELA. Work on adolescent literacy has focused on the importance of developing academic literacy skills across content areas (cf. Snow & Biancarosa, 2003); a variety of reforms are targeting literacy instruction across content areas, such as science, social studies, and mathematics.
Prior research has suggested at least three dimensions that characterize classroom interaction: the relationships among teacher and students (cf. Hamre & Pianta, 2001); the organization and use of classroom resources, including time and materials (cf. Denham & Lieberman; La Paro, Pianta, & Stuhlman, 2004); and the instructional interactions that occur around content, including: the intellectual challenge of tasks assigned to students (cf. Newmann, Lopez, & Bryk, 1998); the quality of instructional conversation, including teachers’ uptake and elaboration of student ideas (cf. Nystrand, Gamoran, Kachur, & Prendergast, 1996; O’Connor & Michaels, 1993); and representations of content, including instructional explanations and the use of analogies or examples (cf. Leinhardt, 2004).
PLATO focuses on instructional interactions that occur in classrooms and the practices that most centrally contribute to student achievement. While there is a growing consensus on effective approaches to early literacy instruction (cf. Snow, Burns & Griffin, 1998), there is much less consensus about effective literacy instruction for secondary school students.
Much less is known about practices associated with English/Language Arts (ELA) achievement at the secondary level, although some research suggests that explicit instruction of comprehension and metacognitive strategies is effective at the middle and high school levels as well (Beck & McKeown, 2002; Greenleaf, Schoenbach, Cziko, & Mueller, 2001).
Alston, C. L. (2012). Examining instructional practices, intellectual challenge, and supports for African American student writers. Research in the Teaching of English, 112-144.
Blikstad-Balas, M., Roe, A., & Klette, K. (2018). Opportunities to Write: An Exploration of Student Writing During Language Arts Lessons in Norwegian Lower Secondary Classrooms. Written Communication, 35(2), 119–154. https://doi.org/10.1177/0741088317751123
Brevik, L.M. (2019). Explicit reading strategy instruction or daily use of strategies? Studying the teaching of reading comprehension through naturalistic classroom observation in English L2. Reading and Writing, 32, 2281–2310. https://doi.org/10.1007/s11145-019-09951-w
Cohen, J. & Brown, M. (2016). Teaching quality across school settings. The New Educator, 12(2), 191-218. https://doi.org/10.1080/1547688X.2016.1156459
Cor, M. K. (2011). Investigating the reliability of classroom observation protocols: The case of PLATO. Paper presented at the 2011 AERA annual meeting in New Orleans.
Cohen, J., & Grossman, P. (2016). Respecting complexity in measures of teaching: Keeping students and schools in focus. Teaching and Teacher Education, 55, 308-317. https://doi.org/10.1016/j.tate.2016.01.017
Grossman, P., Cohen, J., Ronfeldt, M., & Brown, L. (2014). The Test Matters: The Relationship Between Classroom Observation Scores and Teacher Value Added on Multiple Types of Assessment. Educational Researcher, 43(6), 293–303. https://doi.org/10.3102/0013189X14544542
Grossman, P., Loeb, S., Cohen, J., & Wyckoff, J. (2013). Measure for Measure: The Relationship between Measures of Instructional Practice in Middle School English Language Arts and Teachers’ Value-Added Scores. American Journal of Education, 119(3), 445-470. https://doi.org/10.1086/669901
Grossman, P., Cohen, J., & Brown, L. (2015). Understanding instructional quality in English Language Arts: Variations in PLATO scores by content and context. Designing teacher evaluation systems: New guidance from the Measures of Effective Teaching Project, 303-331. https://doi.org/10.1002/9781119210856.ch10
Joyce, J., Gitomer, D.H., & Iaconangelo, C.J. (2018). Classroom assignments as measures of teaching quality. Learning and Instruction, 54, 48-61. https://doi.org/10.1016/j.learninstruc.2017.08.001
Ivancevic, M. (2018). Vague feedback in English L2 classrooms-A study of feedback practices in seven video recorded classrooms in lower secondary school (Master’s thesis).
Kane, T. J., & Staiger, D. O. (2012). Gathering Feedback for Teaching: Combining High-Quality Observations with Student Surveys and Achievement Gains. Research Paper. MET Project. Bill & Melinda Gates Foundation. https://files.eric.ed.gov/fulltext/ED540960.pdf
Klette, K., & Blikstad-Balas, M. (2018). Observation manuals as lenses to classroom teaching: Pitfalls and possibilities. European Educational Research Journal, 17(1), 129-146. https://doi.org/10.1177/1474904117703228
Lockwood, J. R., Savitsky, T. D., & McCaffrey, D. F. (2015). Inferring constructs of effective teaching from classroom observations: An application of Bayesian exploratory factor analysis without restrictions. The Annals of Applied Statistics, 9(3), 1484-1509. https://doi.org/10.48550/arXiv.1511.05360
Magnusson, C.G., Roe, A., & Blikstad-Balas, M. (2018). To What Extent and How Are Reading Comprehension Strategies Part of Language Arts Instruction? A Study of Lower Secondary Classrooms. Reading Research Quarterly, 54(2), 187-212. https://doi.org/10.1002/rrq.231
Mahan, K.R. (2020). The comprehending teacher: scaffolding in content and language integrated learning (CLIL). The Language Learning Journal. https://doi.org/10.1080/09571736.2019.1705879
Mahan, K.R., Brevik, L.M., & Ødegaard, M. (2018). Characterizing CLIL teaching: new insights from a lower secondary classroom. International Journal of Bilingual Education and Bilingualism, 1-18. https://doi.org/10.1080/13670050.2018.1472206
Mihaly, K., & McCaffrey, D. F. (2015). Grade‐Level Variation in Observational Measures of Teacher Effectiveness. In Designing Teacher Evaluation Systems (eds T.J. Kane, K.A. Kerr and R.C. Pianta). https://doi.org/10.1002/9781119210856.ch2
Mihaly, K., McCaffrey, D. F., Staiger, D. O., & Lockwood, J. R. (2013). A composite estimator of effective teaching. Seattle, WA: Bill & Melinda Gates Foundation. https://www.rand.org/pubs/external_publications/EP50155.html
Park, Y. S., Chen, J., & Holtzman, S. L. (2015). Evaluating efforts to minimize rater bias in scoring classroom observations. Designing teacher evaluation systems: New guidance from the Measures of Effective Teaching Project, 381-414. https://doi.org/10.1002/9781119210856.ch12
Polikoff, M.S. (2015). The stability of observational and student survey measures of teaching effectiveness. American Journal of Education, 121(2), 183-212. https://doi.org/10.1086/679390
Tengberg, M., van Bommel, J., Nilsberth, M., Walkert, M., & Nissen, A. (2022). The quality of instruction in Swedish lower secondary language arts and mathematics. Scandinavian Journal of Educational Research, 66(5), 760-777. https://doi.org/10.1080/00313831.2021.1910564
Dr. Brown has been working with the PLATO tool for nearly 15 years. She is a strong believer in the potential of teachers, and she investigates flexible and scalable strategies for teacher development.
Dr. Grossman is a leading expert in teacher preparation, teacher quality, and teacher professional development. Her work to identify and leverage high-impact teacher practices resulted in the development of the Protocol for Language Arts Teaching Observations.
In the early 2000s, research was consistently finding little to no link between some measures of teacher quality—SAT scores, prestige of certifying institution, whether the teacher had a graduate degree—and student achievement. This led to the question: if teacher quality was not a characteristic of teachers themselves, what were they doing in the classroom to produce variation in student learning? PLATO was developed as part of a grant to develop answers to this question, with support from the W. T. Grant/Spencer.
Early studies using PLATO investigated the quality of English/Language Arts (ELA) teaching in New York City schools. We found that certain ELA practices were more closely related to achievement but were infrequently enacted, and rarely at high levels of quality. The ELA content domain being taught also mattered, with teachers exhibiting higher-quality instructional practice in reading lessons than in writing lessons. Research with the protocol also suggested important equity concerns: some practices seemed more effective at raising achievement for Hispanic and for African American students than for the overall student population. Although these practices were closely related to achievement, Hispanic and African American students were less likely to receive such instruction.
PLATO has been used since in several research studies, including the Measures of Effective Teaching study by the Gates Foundation, the Understanding Teacher Quality study by the Educational Testing Service (ETS), and several studies in the Quality in Nordic Teaching (QUINT) center at the University of Oslo. Data from each research and rater training experience was used to modify and refine the measurement of PLATO practices and its accompanying training. PLATO is now in its fifth and final iteration.
In addition to its usage in research, the PLATO tool and its practices have been used as the basis of several teacher professional development projects at Stanford University, New York University, the University of Virginia, and the University of Michigan, among others.
PLATO includes a rubric with which to score twelve elements of English/Language Arts (ELA) instruction, as well as a checklist to capture activity structures and content focus. Each element was crafted to be as independent as possible from the others and to capture different aspects of classroom instruction. Observers score each element separately.
The twelve elements are divided into four main categories, based on theoretical and empirical factor analyses, and include:
Representation of Content is the element that focuses on the teacher’s ability and accuracy in representing ELA, science, and/or history content to students through effective and meaningful explanations, examples, and analogies, along with the conceptual richness of the teacher’s instructional explanations. At the lowest level, the teacher may introduce ideas (e.g., close reading, editing, symbolism), but either does not provide any examples or explanations or provides incorrect examples or explanations. At the highest level, the teacher provides clear and nuanced explanations and helps students distinguish between different but related ideas, and the instruction focuses on conceptual understanding of ELA, science, and/or history content.
Purpose attempts to capture both the coherence of the lesson around a communicated objective (internal learning goal) and the position of the lesson within a larger context (situated learning goal). The internal learning goal speaks to lesson structure and the relevance of classroom activities toward meeting a learning goal identified by the teacher. Situated purpose speaks to the lesson’s future relevance, which can motivate students to engage with the task at hand. The element focuses on whether the purpose of the lesson is made explicit by the teacher, is tied to the goals of ELA instruction, and is reflected in the activities undertaken by the class. At the highest level, an ELA-related purpose is clearly articulated, the lesson activities directly address and make progress toward the stated purpose, and the teacher or students check their progress toward achieving the purpose during and at the end of the lesson.
The element of Connections to Prior Academic Knowledge focuses on the extent to which new material is connected to students’ previous academic knowledge. At the high end, new material explicitly builds on prior academic knowledge to develop skills, strategies, and conceptual understandings within a knowledge domain in order to meet the lesson’s goals. At the lower end, connections may be made occasionally, but they do not advance student learning.
The element of Intellectual Challenge focuses on the intellectual rigor of the activities in which students engage during the instructional segment. Activities with high intellectual challenge ask students to engage in analytic or inferential thinking. Activities with low challenge, in contrast, only require students to engage in recall or rote thinking. Intellectual Challenge also depends on the level of analytic or inferential thinking demanded by the questions asked by the teacher during class activities.
Classroom Discourse focuses on the opportunities students have for extended ELA, science, and/or history-related talk with the teacher or among peers, and the extent to which the teacher and other students pick up on, build on, and clarify each other’s ideas. At the low end, the teacher does the majority of the talking and, if student talk is present, the teacher and students do not build on previous responses; rather, the talk is disconnected. At the highest level, students engage in elaborated, coherent, and focused discussions, in which the teacher and other students build on each other’s contributions and prompt each other to clarify and specify their ideas.
The element of Text-Based Instruction assesses the degree to which students engage in activities and discourse that are grounded in authentic texts. The element captures both the degree to which students use authentic texts and engage in the production of them. At the highest level, the teacher is using the text in the service of a larger goal: the development of readers and writers. Students actively use authentic texts for a sustained period of time to deepen their understanding of the text and wider genre and/or engage in writing authentic texts for a sustained period of time with attention to specific features of style and genre.
The element of Modeling and Use of Models focuses on the degree to which a teacher visibly enacts strategies, skills, and processes targeted in the lesson to guide students’ work before or while they complete the task, the extent to which models are analyzed, and whether models are used to illustrate for students what constitutes good work on a given task. The teacher might model metacognitive or discussion strategies, think aloud about how to identify a theme, demonstrate how to support a statement with textual evidence, and so on. Modeling often includes think-alouds and role-plays. This element also includes the use of models to support students in completing the task at hand. At the high end, the teacher decomposes specific features of the process by using modeling or models to provide detailed instruction. At the low end, the teacher may simply refer to a model, without using it to provide instruction in the task at hand or visibly enacting the strategies, skills, or processes that are targeted.
The element of Strategy Use and Instruction focuses on the teacher’s ability to teach strategies and skills that support students in reading, writing, speaking, listening, and engaging with literature. ELA, science, and/or history strategies may help students complete such tasks as reading for meaning, generating ideas for writing, or figuring out the meaning of unfamiliar words. Strategy instruction does not include the teaching of rules (e.g., grammar/spelling rules, definitions of parts of a story). The teacher can use a variety of methods for teaching explicit strategies, including modeling strategies, providing opportunities for guided practice, etc. At the high end, teachers provide the opportunity for students to develop a repertoire of strategies and skills that they can use flexibly and independently, depending on their purpose. At the low end, where strategy instruction is minimal or insufficient, teachers may repeat definitions and rules when students are stuck.
The element of Feedback focuses on the quality of feedback provided in response to student application of ELA, science, and/or history skills, concepts, or strategies. Feedback includes comments on the quality or nature of student work as well as suggestions for how students can improve the quality of their work. At the high end, feedback is specific and targets the skills at the heart of the activity. The feedback helps students understand the quality of their work and helps students better perform the task at hand by addressing substantive elements of the task. At the low end, feedback consists of vague comments that are not clearly anchored in student work and suggestions for improvement tend to be procedural (i.e. focused on the instructions for the activity rather than the skills or knowledge that students are applying).
The element of Accommodations for Language Learning focuses on the range of strategies and supports that a teacher might use to make a lesson accessible to non-native English speakers or native speakers struggling to develop ELA, science, and/or history skills. These accommodations consider individual students’ levels of language proficiency and can include a strategic use of primary language, differentiated materials (pictures, other visuals, or hands-on materials), as well as graphic organizers and visual displays to make texts and instruction accessible to all students. At the high end, teachers effectively modify assignments and assessments so that all students successfully meet the ELA, science, and/or history goals for the lesson, despite their level of language proficiency.
The element of Behavior Management focuses on the degree to which behavior management facilitates academic work and is concerned with behavioral norms and consequences. This component does not presume that an ideal classroom is a quiet and controlled one. The key question is whether student behavior is appropriate for the task at hand; an “orderly” classroom will look different during a lecture than it would during small group work.
The element of Time Management focuses on the amount of time students are engaged in ELA, science, and/or history-focused activity. It looks at the teacher’s efficient organization of classroom routines and materials to ensure that little class time is lost and that instructional time is maximized. Periods of downtime may occur because of a lack of procedures for routines like getting into groups, passing out papers, or collecting work. In addition, behavior management issues may impact time management. For example, a teacher who spends a significant amount of whole-class activity addressing student misbehavior would be scored down on time management.
Observation cycles for PLATO include a 15-minute segment of observation, followed by approximately 10 minutes to score the segment.
Scores are given on a 4-point scale.
1 = Provides almost no evidence
2 = Provides limited evidence
3 = Provides evidence with some weaknesses
4 = Provides consistent, strong evidence
During an observation cycle, the PLATO scoring is designed to capture the average experience of a student in the classroom. Each element is made up of several components that combine to give the overall score for that element.
The observation begins at the start of a class period when the bell rings or something else happens to signal the start of class. Each observation cycle takes roughly 25 minutes. For 15 minutes, the observer takes notes on all aspects of the classroom, focusing especially on classroom instruction that matches the PLATO elements. After fifteen minutes, the observer stops observing the classroom and completes the PLATO scoring sheet that includes all twelve elements as well as the descriptors of content domains covered during that 15-minute period. The scoring sheet also captures other aspects of the classroom such as grouping of students and specific practices related to English Learners. During video observation, this may occur when the video is paused.
Scoring should take roughly ten minutes, depending on whether the observation is live or on video and the experience of the observer. After the scoring is complete, the observer starts fresh with a new observation cycle. Each segment of instruction captured during an observation cycle should be scored independently, even though it is likely to be occurring during the same class period. The observer will give a completely new score for each element based only on the current 15 minutes of instruction. Depending on the design of the study, it may be necessary to set a concrete amount of time for scoring (e.g., ten minutes). This will then allow for a constant number of observations for each class period and for consistency of segments for checking on inter-rater reliability.
In general, more observations will result in a more reliable estimate of instruction. We recommend observing at least three class periods (with two PLATO entries per period) to get a stable teacher-level score on all twelve elements. In order to obtain a more complete picture of the instruction and content covered in class, the observations should occur over one to two weeks.
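One way to turn segment-level scores into the teacher-level scores described above is sketched below. The source does not say how segments are aggregated, so the simple mean, the function name `teacher_level_scores`, and the input layout are assumptions made for illustration; only the three-period minimum and the 1 to 4 scale come from the text.

```python
from collections import defaultdict
from statistics import mean

MIN_CLASS_PERIODS = 3        # minimum recommended in the text
VALID_SCORES = {1, 2, 3, 4}  # PLATO 4-point scale


def teacher_level_scores(segments):
    """Average segment scores per element (the simple mean is an assumption).

    `segments` is a list of (class_period_id, {element_name: score}) pairs,
    one pair per 15-minute segment. This layout is hypothetical.
    """
    periods = {period for period, _ in segments}
    if len(periods) < MIN_CLASS_PERIODS:
        raise ValueError(
            f"need at least {MIN_CLASS_PERIODS} class periods, got {len(periods)}"
        )
    by_element = defaultdict(list)
    for _, scores in segments:
        for element, score in scores.items():
            if score not in VALID_SCORES:
                raise ValueError(f"{element}: score {score} is not on the 1-4 scale")
            by_element[element].append(score)
    return {element: mean(values) for element, values in by_element.items()}


# Illustrative data only: three class periods, two segments each.
example = [
    ("period_1", {"Purpose": 1, "Feedback": 3}),
    ("period_1", {"Purpose": 2, "Feedback": 3}),
    ("period_2", {"Purpose": 3, "Feedback": 4}),
    ("period_2", {"Purpose": 4, "Feedback": 4}),
    ("period_3", {"Purpose": 2, "Feedback": 3}),
    ("period_3", {"Purpose": 3, "Feedback": 4}),
]
print(teacher_level_scores(example))
```

A study with its own aggregation rules (for example, averaging within a class period first) would replace the mean with that rule.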
To achieve reliable and accurate scoring, all observers must be certified to score in PLATO. Users are certified after they have completed the PLATO training and have achieved a 70% exact-match score on at least five samples of English/Language Arts instruction. More information on PLATO training is available from the PLATO team at PLATObservation@gmail.com.
PLATO training consists of self-paced online modules combined with one-on-one or small group remediation calls and independent practice.
The training package includes:
Cost of Training:
Training Structure, Schedule, and Time Commitment:
PLATO trainees will have access to training support for a total of 3 weeks. All training activities must be completed during this 3-week window.
Request Access to PLATO Training
If you would like access to PLATO training or additional information, please contact us at PLATObservation@gmail.com about your training needs and a member of our team will contact you.
As with many observation tools, it is possible for trained raters to experience drift from the core PLATO constructs over time. Calibration exercises help ensure fidelity to the PLATO rubric and validate coding. Calibration is recommended especially for teams who will score over longer timelines (i.e. more than three months).
What is the calibration process?
The calibration process includes:
Cost of Calibration
PLATO training is valid for 18 months from certification. Raters who wish to use the protocol beyond this period must go through a recertification process.
What is recertification?
Recertification requires raters to rewatch the online training modules and take a recertification test. Remediation is available for raters who do not recertify on all elements.
Cost of Recertification
The cost of recertification is $300/rater to include:
Our training window is three weeks long. We recommend reserving the following amounts of time in each week of training:
Many PLATO trainees become reliable after this training window. However, some individuals require additional calibration. Additional remediation is offered at the discretion of the trainer and principal investigator/project lead and usually involves a time commitment of 2 hours per week.
PLATO scoring is based on classroom observation or class video review. Observation begins at the start of a class period when the bell rings or something else happens to signal the start of class. Each observation cycle takes roughly 25 minutes. For 15 minutes, the observer takes notes on all aspects of the class, focusing especially on instruction that matches the PLATO elements. After fifteen minutes, the observer stops observing and completes the PLATO scoring sheet that includes all 12 elements as well as the descriptors of content domains covered during that 15-minute period. The scoring sheet also captures other aspects of the instruction such as grouping of students and specific practices related to English learners. During video observation, this may occur when the video is paused.
Scoring should take roughly 10 to 15 minutes, although the length of time it takes to complete the scoring sheet may vary based on the observer. After the scoring is complete, the observer starts fresh with a new observation cycle. Each segment of instruction captured during an observation cycle should be scored independently, even though both cycles occurred during the same class period. The observer will give a completely new score for each element based only on the current 15 minutes of instruction. Depending on the design of the study, it may be necessary to set a concrete amount of time for scoring (e.g., ten minutes). This will then allow for a constant number of observations for each class period and for consistency of segments for checking on inter-rater reliability.
Analyses suggest that stable estimates can be achieved with a minimum of four distinct lesson observations per teacher. There should be at least one week between the first and last observations. Under average scoring conditions, a 45-minute class results in two segments for live observations and up to three for video observations.
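The segment counts quoted above can be reproduced with simple cycle arithmetic. The sketch below assumes a 10-minute scoring window, that a live observer cannot observe while scoring, and that a video can be paused so that scoring consumes no lesson time. The function name and these modeling choices are illustrative assumptions, not part of the protocol.

```python
OBSERVE_MIN = 15  # length of one observed segment, per the text
SCORE_MIN = 10    # assumed scoring window between live segments


def segments_per_class(class_minutes, live, score_minutes=SCORE_MIN):
    """Count the full 15-minute segments that fit in one class period.

    Live: each segment after the first must wait for the previous scoring
    window, so segment k starts at (k - 1) * (15 + score_minutes).
    Video: the recording is paused while scoring, so segments are back to back.
    """
    if class_minutes < OBSERVE_MIN:
        return 0
    if live:
        cycle = OBSERVE_MIN + score_minutes
        return (class_minutes - OBSERVE_MIN) // cycle + 1
    return class_minutes // OBSERVE_MIN


print(segments_per_class(45, live=True))   # live observation of a 45-minute class
print(segments_per_class(45, live=False))  # video of the same class
```

Under these assumptions a 45-minute class yields two live segments and three video segments, and the live count stays at two even with a 15-minute scoring window.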
PLATO certification is valid for 18 months from the successful completion of the reliability process. At this time, raters will be issued a certificate of reliability. To avoid rater drift and inaccurate scoring, project leads are asked to reach out to the PLATO training team when raters’ certifications expire. Trainers and project leads will develop a plan for recertification of raters depending on the project needs and timeline. Recertification typically consists of a series of practice videos and small-group debrief calls followed by a reliability test, but may also involve review of the original training materials.
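For project leads tracking when certifications lapse, the 18-month window can be computed as in the sketch below. It assumes "18 months" means calendar months counted from the certification date, with the day clamped to the end of shorter months; the helper names are hypothetical, and the PLATO team may count the window differently.

```python
import calendar
from datetime import date

CERTIFICATION_MONTHS = 18  # validity period stated in the text


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day of month is clamped when the target month is shorter
    (an assumption, e.g. 31 August + 18 months -> 28 February).
    """
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def certification_expiry(certified_on):
    """Date on which a certification issued on `certified_on` would lapse."""
    return add_months(certified_on, CERTIFICATION_MONTHS)


print(certification_expiry(date(2024, 1, 15)))
```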
If a team’s scoring period is to extend beyond six months after raters complete training, we recommend that raters engage in periodic calibration exercises to ensure fidelity to the PLATO tool. Project leads are asked to discuss potential calibration needs with the PLATO training team at the initiation of training.
No, we do not currently operate on a training-of-trainers model. If you would like to use the tool extensively or over a long period of time, please reach out to PLATObservation@gmail.com to discuss your particular needs.
Yes. While the tool was developed for use in English/Language Arts classrooms, many researchers and practitioners have found the tool applicable to their discipline. However, some elements or processes may not apply or may require adaptation. We request that you make these in partnership with our team and clearly write up any adaptations when you report your results.
Thank you for your interest! The PLATO training was developed for researchers and is designed to focus on rater reliability rather than professional development. It’s not an ideal resource for teacher development in its current state. However, many teachers and instructional leaders have found the tool useful for their professional development and we hope to develop additional virtual resources for this purpose in the future.
Trainees who do not pass the second reliability test may engage in ongoing remediation at the discretion of the project lead and trainers. If you have completed PLATO training but have not passed the second reliability test, please contact your project lead to clarify your status and potential next steps.
Reliability is measured element by element on a total of nine reliability segments over two tests. Raters must achieve a total score of 70% exact matches with master-scored segments in order to pass each element.
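The element-by-element check described above can be illustrated with a short sketch. It assumes that the nine segments from both tests are pooled before the 70% threshold is applied and that scores are stored as one dictionary per segment; both the pooling and the data layout are assumptions, and the function names are hypothetical.

```python
PASS_THRESHOLD = 0.70  # 70% exact matches, per the text


def exact_match_rates(rater_segments, master_segments):
    """Return {element: share of segments where the rater equals the master}.

    Both arguments are lists of {element_name: score} dicts aligned by
    segment; this layout is an assumption for illustration.
    """
    if len(rater_segments) != len(master_segments):
        raise ValueError("rater and master must cover the same segments")
    matches = {}
    for rater, master in zip(rater_segments, master_segments):
        for element, master_score in master.items():
            hit = rater.get(element) == master_score
            matches.setdefault(element, []).append(hit)
    return {element: sum(hits) / len(hits) for element, hits in matches.items()}


def passed_elements(rates, threshold=PASS_THRESHOLD):
    """Mark each element as passed when its match rate meets the threshold."""
    return {element: rate >= threshold for element, rate in rates.items()}


# Illustrative data only: nine master-scored segments and one rater's scores.
master = [{"Feedback": 3, "Purpose": 2}] * 9
rater = (
    [{"Feedback": 3, "Purpose": 2}] * 6
    + [{"Feedback": 3, "Purpose": 1}]
    + [{"Feedback": 2, "Purpose": 1}] * 2
)
print(passed_elements(exact_match_rates(rater, master)))
```

With nine pooled segments, six exact matches (about 67%) fall short of the threshold, so a rater needs at least seven matches on an element to pass it.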
You may use selected elements for your project. However, it is mandatory that raters are trained on the entire tool regardless of the number of elements they will ultimately use for their project. This allows raters to have a full picture of the tool and to properly categorize evidence into the elements used and not used.
There are affordances and constraints to each modality. However, we do find that in-person observations provide slightly more access to information than observations by video. Our internal analyses have shown that overall PLATO scores for video data are approximately one-tenth of a PLATO point lower than live rater scores for the same lesson.
Our most complete validation data is from a generalizability study conducted in 2010 (Cor, M. K. (2011). Investigating the reliability of classroom observation protocols: The case of PLATO. Paper presented at the 2011 AERA annual meeting in New Orleans). These data were generated primarily by experienced raters, so the scores may differ from what you would see with new raters.
There are a few ways that the protocol has changed since this generalizability study:
Explicit Strategy Instruction is now titled Strategy Use and Instruction. The element still captures the same concept, but we found that people were consistently calling it "Explicit Instruction" which de-emphasized the idea of strategy that is integral to the element.
PLATO requires a relatively high level of training and accuracy to achieve the desired reliability results. Raters should be good note takers and analytic thinkers, and should be very attentive to details. While familiarity with classrooms is useful, a long career in education can sometimes inhibit adoption of the PLATO “lens”. In general, we suggest prioritizing raters who are persistent, analytic, and detail-oriented.
In order to encourage its reliable and valid use, we currently do not share the full tool publicly. You are welcome to see certain elements within the tool by emailing PLATObservation@gmail.com.