FALL 2006

In Practice:
Encouraging the Flight of Error

By Bob Boruch

Thomas Jefferson once observed of the increasing strength of evidence for Isaac Newton’s theory of gravitation that Newton “indulged in reason and experimentation, and error fled before them.”

At this late date, few would dispute the primacy of “reason and experimentation” in scientific inquiry. When we board an airplane, for instance, we do so with the expectation that the particular model of plane we’re flying has passed rigorous safety tests as required by the Federal Aviation Administration. When the doctor writes us a new prescription, we assume that the pharmaceutical company that developed the drug put it through the scientific tests required by the Food and Drug Administration.

The FAA and FDA are imperfect. Nonetheless, our safety in their hands raises the question: Why don’t we in the social and education sciences hold ourselves to the same standards?

Done properly, randomized field trials (RFTs) can guarantee fair evaluations of the effectiveness of scientific and social interventions. Non-randomized trials, such as surveys, provide no such assurances. In fact, using conventional statistical methods based on surveys or other kinds of non-randomized trials can do damage, making programs that are merely useless appear harmful or, conversely, making so-so programs look better than they really are.

Examples of such wrong-headedness abound. Take, for example, hormone replacement therapy. The earliest, non-randomized studies of such therapy for women suggested strongly that the therapy had positive effects. Millions of women signed on for the treatment. More recent, well-conducted trials have shown us that the effects of such therapy can be harmful.

In the realm of social interventions, Scared Straight programs in the United States had been thought, for at least a decade, to be an effective means of discouraging at-risk children from criminal careers. Indeed, a dozen non-randomized studies and lots of local newspaper reports provided “evidence” and testimonials to support that notion. A Campbell Collaboration review of eight of the best randomized trials of Scared Straight programs showed that, in fact, the program’s effects were other than promised. In particular, Scared Straight was demonstrated to have no effect on kids and, in some trials, to have increased the probability that juveniles would commit a crime. For all the popularity of the idea and its good intentions, Scared Straight either had no effect or created bad kids.

Closer to home for educators is the case of Head Start. The earliest federally sponsored analysis—a non-randomized trial done by Ohio University and Westinghouse Learning Corporation in the late 1960s—produced evidence that the program’s impact was negative, not positive. But subsequent studies have found the original analysis was flawed, miscalculating what would have happened to Head Start children in the absence of the program.

So powerful are these examples that, in the arenas of medicine, social work, and elsewhere, some eminent scholars have come to believe that it is unethical to rely on non-randomized trials when the evidence gleaned will be used to inform policy and practice. Governments, too, increasingly subscribe to this world view, and as a consequence, a number of countries have initiated serious efforts to improve the evidence used in constructing evidence-based policy.

Mounting evidence
In this country, the U.S. Department of Education’s What Works Clearinghouse (WWC) was established in 2002 to review evidence on the effectiveness of branded education programs and curricula. Standards at the Clearinghouse are scientific and high. The WWC rounds up all the studies that have focused on a particular program—say, a much-heralded math curriculum—and conducts a systematic review of those studies to determine what we can claim with confidence about the curriculum’s effectiveness. For teachers, school administrators, policymakers, researchers, and the public, it provides a central source of dependable scientific evidence of what works in education, as opposed to the fragmentary and untrustworthy evidence we have settled for all too often.

Governments in other countries are also investing substantially in organizations that screen studies for their quality. They then disseminate the results of those assessments to people with a need to know—including politicians, teachers, and parents. Among others, the United Kingdom, Canada, the Netherlands, Belgium, Israel, the Nordic countries, and China have explored efforts to create such organizations and moreover to learn about how to design and execute RFTs.

Less industrialized countries are also in the game. In Mexico, Kenya, and India, for instance, high-quality randomized field trials are being mounted to understand how to reduce school drop-out rates. For such countries—where resources are scarce and needs abundant—accurate information is vital and a reliable understanding of what works critical to human capital development.

Mexico’s Progresa program stands out as one of the great milestones in evaluating international development. Launched in 1997, the program pays a subsidy to poor mothers in rural villages to send their children to school and take them to the doctor. The idea is to stem the country’s shockingly high drop-out rate, and all indications are that the strategy works. An interesting idea, certainly, but what may be the program’s most lasting contribution to the field is that, unlike so many international aid efforts, it has undergone rigorous scientific testing.

The Progresa program was designed in response to the large number of poor Mexican children—some 45 percent of rural students—abandoning school to contribute to the family income by working in the fields. Those drop-outs represented an enormous loss in human capital over many decades. In 2000, the program cost the Mexican government $1 billion to reach 2.5 million rural families, about a ninth of all families in the country.

At those prices, the Mexicans couldn’t really afford to cross their fingers and hope that Progresa worked. Its phased-in design gave the Mexicans an opportunity to conduct a controlled experiment, randomly assigning 320 villages to the program 20 months earlier than the control group of 186 villages. The results were impressive: transitions to secondary school increased by nearly 20 percent, with child employment rates declining by about 15 percent.

With high-quality evidence of Progresa’s effectiveness to bolster their cause, advocates of the program succeeded in preserving it after a change in administration. Indeed, when he came into office, Vicente Fox expanded the program. Renamed Oportunidades, it retained its main elements but was exported into urban Mexico—all with the help of a $1 billion loan from the Inter-American Development Bank.

Thinking globally
In the United States, the Institute of Education Sciences has been remarkable in providing funding for high-quality randomized controlled trials, with well over 50 universities and non-profit research organizations conducting them. Again, other industrialized countries are following suit, as are a number of the less developed nations—partly at the urging of the World Bank, which has only recently developed the courage to press for such trials.

In a climate increasingly hospitable to randomized controlled trials, the time is ripe to create an organization aimed at fostering trials of social interventions on a global stage and, in the process, advancing the state of the art. There are, of course, voluntary organizations like the Campbell Collaboration, addressing the social, behavioral and educational arenas, and the Cochrane Collaboration, in the field of health care. Important as their work is, their reach is limited.

At the University of Pennsylvania, conversations have begun about establishing just such a global network on randomized trials—one that would take advantage of this institution’s considerable strengths in the field. Housed at Penn, that global network could bring together many talented faculty members who have contributed substantially to testing interventions with randomized trials. This university and especially its Graduate School of Education are beautifully positioned to undertake such an initiative. GSE’s strengths in this area are reflected by a strong roster—John Fantuzzo, Rebecca Maynard, Margaret Beale Spencer, Paul McDermott, Vivian Gadsden, Jon Supovitz, among them. Around campus, Penn faculty in criminology, economics, nursing, sociology, social policy, psychology, and communications have made contributions at least as remarkable as those made by GSE colleagues.

An international register of trialists, based at Penn, could go a long way toward consolidating common interests across the university. Part of the aim here is to reach well beyond our West Philadelphia neighborhood to enlist universities at home and abroad. The aim is also to reach other entities already engaged in this work: for example, the Nordic Campbell Collaboration Center, the Swedish Board on Health and Welfare, and Canada’s Social Research and Demonstration Corporation, among others, as well as ministries of countries served by the World Bank, International Monetary Fund, the Organization for Economic Cooperation and Development, and other multi-national organizations.

Dependable evidence and democracies
When we launch programs to redress poverty and inequity—in this country and in the developing world—we owe it to the people we are theoretically serving to get it right. No less than those of us taking our daily dose of aspirin, they deserve assurance that the interventions they’re subject to are effective in improving their life chances.

The world of domestic and international aid is littered with well-intentioned, failed programs, and there is no shortage of projects that are thought to do good but whose value remains uncertain. Randomized trials are essential to establish that these interventions do indeed work.

All of this is in the spirit of fostering thoughtful, democratic societies. The goal here is to assist governments and the people they represent in making sound decisions about where to expend social and financial capital.

With Thomas Jefferson, we would prefer error to flee more briskly. Its flight at any pace is an exciting prospect that, for an informed democracy, depends on reason and a society that experiments conscientiously, making honest failures and honing genuine successes.

Bob Boruch is the University Trustee Chair Professor at Penn GSE and the Statistics Department, Wharton School. He also co-chairs the steering committee of the Campbell Collaboration.

Research Notes

Penn GSE faculty and researchers explore the issues at the forefront of American education today—urban education, equity and diversity, educational opportunity and educational excellence, and the management of complex organizations. They engage in high-impact research, innovation, and training in public education, as well as in literacy, psychology, social policy, and higher and adult education. The following pages present a sampling of recent studies and findings from Penn GSE faculty and researchers.

“Academic Disaster Areas” Redux
In 1967, when the Harvard Educational Review published “The American Negro College,” by Christopher Jencks and David Riesman, the article dealt a stinging blow to Black colleges—labeling them “academic disaster areas.” Nearly 40 years later, Marybeth Gasman outlines the ways in which Black leaders defended the reputation of these institutions.
The response of the Black college presidents was coordinated and carefully structured. Among them, the presidents of the United Negro College Fund, Morehouse, Hampton Institute, and Dillard crafted a response that charged Jencks and Riesman with a variety of sins: a failure to understand the Black college community, questionable methodology and an over-reliance on anecdotal evidence, misleading institutional comparisons, and implicit racist assumptions.
Nearly a decade later, Charles Willie, an African American professor at Harvard GSE, tried another strategy: with several Harvard colleagues, Willie hosted a conference that was an explicit rejoinder to Jencks and Riesman. “A clever handler of the media,” as Gasman describes him, Willie saw the conference “as an opportunity to set the record straight” and used it to elicit apologies both from Harvard and Riesman.
Nonetheless, Jencks and Riesman’s long-ago commentary continues to shape the view of Black colleges today, and in the recent spate of press stories about the financial woes of Morris Brown, Bennett, and Texas Southern University, Gasman hears echoes of the 1967 coverage of Black colleges as “academic disaster areas.”
Gasman concludes, “The exaggerated claims of these news articles have gained national attention, jeopardizing the fundraising programs and, in some cases, the existence of the institutions in question. The historical efforts of the Black college leaders and of Black intellectuals to deflect Jencks and Riesman’s criticisms may point the way for current efforts to avert crisis. Charles Willie’s actions, on the other hand, were a good example of how scholars can use the media....”
Salvaging “Academic Disaster Areas”: The Black College Response to Christopher Jencks and David Riesman’s 1967 Harvard Educational Review Article appears in The Journal of Higher Education, 77(2).

The Punishment Doesn’t Fit the Crime
Two high school students—call them Susie and Sarah—have violated their school’s discipline code, and both have received the same punishment: a one-day internal suspension. But few would argue that their respective offenses, eating outside the cafeteria for the second time and forgery, are of the same gravity.
Writing in School Discipline in Moral Disarray, Joan Goodman presents the hypothetical case of Susie and Sarah to underscore her critique of current school disciplinary policies.
To construct her argument, Goodman drew both on disciplinary theory and on a study of 50 codes of conduct that are striking in their similarities. Goodman found that “discipline policies are weakly linked to the moral and educational purposes of schooling.... When all peccadilloes are perceived as morally offensive and responded to with punishments or, contrariwise, no behaviour is deemed morally offensive, worthy of no more than a corrective then, either way, discipline codes become trivial, losing potential moral clout.”
In concluding her critique of the ways schools think about discipline, Goodman argues for more transparency: “Going public with the moral goals of education would afford students with the opportunity to align themselves with moral purposes now obscure. Without such an alignment, students are likely to perceive much of school authority as illegitimate, punishment as undeserved, and obedience as involuntary.”
With goals clearly articulated, schools then need to ensure that the punishment fits the crime, if you will. Or, as Goodman explains, “Offences against school rules must be distinguished from moral wrongs ... [for] the blurring of ethical distinctions is extremely unhelpful to children’s moral development.”
Goodman argues for a restrained disciplinary system that minimizes the importance of school (as opposed to moral) rules and maximizes student participation in the process (i.e., through class meetings, student government and disciplinary bodies, alternative dispute resolutions). For Goodman, that restraint is all the more appropriate in light of the complexity of children’s moral development—a process years in the unfolding. She writes, “Over a long development period children are not fully independent moral agents.... They make mistakes because they are young, not bad. Usually our interventions should offer support and guidance. Sometimes, however, they are culpable and a just, if mild, punishment is in order.”
This article appears in Journal of Moral Education, 35(2).

Black Youth and Depression
For urban Black adolescents, depression is on the rise. With recent research demonstrating the influence of racism stress on the mental and emotional health of young African Americans, Gwendolyn Davis and Howard Stevenson wanted to understand more about the relationship between depression and racial socialization.
Suspecting that adaptive racial socialization experiences can serve as a buffer against emotional distress for these young people, they studied 160 urban African-American adolescents enrolled in a summer job preparation program.
They found, among other things, that cultural pride socialization helped protect against low self-esteem and lethargy, that those especially alert to discrimination experienced a relatively high sense of helplessness, and that—as with so many things—gender made a difference.
But in what seemed at first a counterintuitive finding, Davis and Stevenson discovered that students encouraged to fit into the mainstream culture reported a greater number of depressive symptoms.
“It is our view,” the authors write, “that youth who primarily receive mainstream-fit socialization will be at a loss to emotionally manage the inherent contradictions of the American dream because of its illusory connections to Black culture, life, expression, and history. Many Black youth dream like mainstream America, but they can’t always live like mainstream America.”
This study is described in Racial Socialization Experiences and Symptoms of Depression among Black Youth, which appears in Journal of Child and Family Studies, 15(3), June 2006.

Is Aggression Catching?
Prior research—not to mention common sense—suggests that young children exposed to classrooms with high levels of student aggression may themselves adopt aggressive behaviors.
To explore this process in more detail, Duane Thomas and colleagues followed a sample of 4,907 children as they progressed from first through third grade. The researchers looked at demographic factors associated with exposure to high-aggression classrooms—school context (size, student poverty levels, and rural vs. urban location) and student ethnicity. They also examined whether exposure to aggressive behaviors in the critical first grade set kids up for lasting problems or whether repeated exposure had a more powerful impact.
What did they find? African-American children in large, urban schools serving disadvantaged students were more likely than other students to land in high-aggression classrooms. And, controlling for initial levels of aggression, the research showed that kids with multiple years of exposure showed higher levels of aggressive behavior than did children with less chronic exposure.
An article describing this research, titled The Impact of Classroom Aggression on the Development of Aggressive Behavior Problems in Children, appears in Development and Psychopathology, 18(2).

The Promise of Bilingual Education
Writing in Nichols to NCLB: Local and Global Perspectives on U.S. Language Education Policy, Nancy Hornberger examines the impact of two landmark cases, Brown v. Board and Lau v. Nichols, on language education policy. Particularly since Lau and the passage of the Bilingual Education Act in 1968, the history of bilingual education in America has been characterized by periods of contraction and expansion that followed closely on shifts in the political climate—with the enactment of the No Child Left Behind legislation in 2002 marking a near-complete retrenchment.
But even in an NCLB era, Hornberger sees hope for keeping alive the promise of Nichols. She urges the bilingual educational community to redefine what may seem like “stop-gap implementational measures” as creative strategies for the future and to enlist scientifically based research to help advance multilingual education policies.
Finally, she writes, “it is high time for us in the U.S. and in other parts of the developed world to accept foreign aid from the developing world.... We can no longer afford to ignore the accumulating inspiration and insight available to us from the concrete experiences and experiments in multilingual education and multiliteracies pedagogy that are increasingly in evidence around the world.”
This article appears in Working Papers in Educational Linguistics, 19(2).

Teaching English in Asia
Three Asian countries—South Korea, Japan, and Taiwan—have all introduced English language studies at the elementary school level in recent years. But implementation has varied widely. Writing in English in the Elementary School: Current English Language Education Policies in South Korea, Japan, and Taiwan, Yuko Goto Butler has combed data from government documents, research and news articles, and field observations to outline the different approaches taken in these three countries.   
In Korea, policies are largely centralized, with one national textbook, prescribed limits on the number of vocabulary words and the length of sentences used, and a variety of teacher training programs. But in Japan, English is an optional part of a “period of integrated study” rather than a mandatory subject, policies are created mostly at the local level, and a significant portion of instruction employs native English speakers. Taiwan takes a middle path, making English an official subject, with governmental curricular and professional development guidelines that can be adapted by local institutions.
Butler notes that all three countries share certain goals—and certain problems. None provides a clear argument for the benefits of beginning English instruction at earlier ages, policies have been developed mostly through trial and error, and research is only rarely shared among countries.
This article appears in Pleiades Journal of TYLE (December 2005).

When It Doesn’t Add Up
These days, parents sitting down to help their kids with their math homework may end up baffled—particularly if their children’s school has adopted a reform curriculum. Janine Remillard and Kara Jackson were particularly interested in how low-income parents negotiated that terrain and, as part of a larger study of parent-child numeracy connections, examined the experience of ten parents.
For all those interviewed, their efforts to help their children went beyond helping with homework—especially in teaching everyday mathematical tasks. But many found the formal school curriculum unrelated either to everyday math or to the math they themselves had learned in school. According to the researchers, these parents were hampered, in part, by the commonly accepted equation of mathematical knowledge with computational proficiency. What is more, they were given only limited opportunities to learn about the curriculum (Everyday Math) and the real-life connections that are one of its main features.
The irony, the researchers point out, is that while parents are expected to support their children’s learning, curricular reforms can all too often block their efforts. Remillard and Jackson conclude, “Excluding parents from the discourse of educational reforms will likely lead to the failure of those reforms.... They need opportunities to learn about the ideas behind the reforms as well as particular approaches taken by reform-inspired curricula.”
Old Math, New Math: Parents’ Experiences with Standards-based Reform appears in Mathematical Thinking and Learning, 8(3).

Stress Management
To learn how to reduce minority children’s vulnerability to stress, Margaret Beale Spencer, Suzanne Fegley, and Davido Dupree surveyed 699 students—primarily minority children ages 9 to 16—about the stress, risk, and protective factors they face every day.
Their findings paint a nuanced picture of these children’s experiences. Girls reported more emotional and physical distress than did boys, but benefited considerably from involvement in sports and academics—suggesting that parents might help by encouraging their daughters to participate in sports and to excel in the classroom.
Maladaptive coping, defined here as aggressive tendencies, increased with stress. For the boys surveyed, that dynamic meant becoming “tougher”—a concept that manifested as “callousness toward women, the sense that danger is exciting, and the notion that violence is manly.” Again, the authors point to the implications of these findings, suggesting that “stressors can be reduced by training boys to redefine their ideas of manhood, which can also reduce stressors for women who are part of their context.”
Investigating and Linking Social Conditions of Minority Children and Adolescents with Emotional Well-Being appears in Ethnicity & Disease, 16, Spring 2006. For a copy, contact marges@gse.upenn.edu.

Mission Statements
Following a trend that originated in the corporate world, most colleges and universities now craft mission statements—with significant resources committed to the effort. But are they worth it?
In Mission Statements: A Thematic Analysis of Rhetoric Across Institutional Type, Christopher Morphew and Matthew Hartley investigate whether these documents articulate a shared purpose crucial to institutional structure or whether they’re merely “rhetorical pyrotechnics” with little relation to the real-life work of an institution.
After examining nearly 300 college and university mission statements, Morphew and Hartley looked at the frequency with which specific elements appeared across different types of institutions. While some ideas—“diversity,” “liberal arts,” “service”—were popular across institution type, institutional control (public or private) proved more important than Carnegie Classification in predicting which elements appeared.
For example, “serves local area” was one of the most frequent elements for public institutions, but rarely appeared for private institutions, while “religious affiliation” was very common in private mission statements but not public ones. Differences emerged even when institutions used similar terminology. Although most schools stressed service, public institutions emphasized civic service to the region and preparing students to be citizens, while private institutions focused more on personal development to prepare students to “transform the world.”
These differences suggest that mission statements tend to express what an institution’s benefactors value, serving less as aspirational pronouncements or planning tools than as a means of communicating specific messages to specific audiences. Thus, these documents serve important legitimizing roles and their complex signaling reflects the realities of institutions’ environments.
This article appears in The Journal of Higher Education, 77(3).

“Still Separate and Unequal”
More than 50 years after Brown v. Board of Education, the 19 southern and southern-border states account for 41 percent of all college students nationwide—but 59 percent of all African-American students. Curious to look deeper into those statistics, Laura Perna and colleagues used the Integrated Postsecondary Education Data System to examine the status of equity for Blacks in enrollment and bachelor’s degree attainment at public higher education institutions in the South.
Their findings show that, although there has been progress, public higher education in the South remains highly inequitable for African Americans, with race continuing to define access and opportunity. Depending on the kind of institution, prospects vary, with relatively greater opportunity at public four-year HBCUs and public two-year colleges and substantially less at flagship institutions. In all 19 states, public four-year HBCUs are the only sector in which African Americans consistently approach or achieve equity.
This article, The Status of Equity for Black Undergraduates in Public Higher Education in the South: Still Separate and Unequal, appears in Research in Higher Education, 47(2).

Inside a Teacher Community of Inquiry
Since founding the Philadelphia Writing Project (PhilWP) in the late 1980s, Susan Lytle has conducted research about school-university teacher communities of inquiry, one of which—Project SOULL—serves as the basis for “The Literacies of Teaching Urban Adolescents in These Times.”
An acronym for a Study of Urban Learning and Leading, Project SOULL was founded in the late 1990s by women who had been teaching in the School District of Philadelphia since the 1970s. Their purpose, explains Lytle, was “to investigate how teacher leaders in urban secondary schools define, enact, and assess leadership in relation to school change.” But the group’s focus expanded to include discussions not only about teacher leadership, but also about strategies for intervention in the service of social justice and equity for their students.
The project saw two interwoven themes emerge about teaching urban adolescents: advocacy for students and what Lytle describes as “the pursuit of ‘professional intimacy.’” Judging from the stories of the SOULL teachers, the impetus behind their advocacy for students arose from the memory of their own experience as students and their curiosity about their students’ own stories. Interventions on behalf of students ranged from the very personal (helping a student whose sister had committed suicide) to the administrative (challenging the inequitable administration of rules on lateness) to the curricular (participating in district-sponsored test construction).
The term “professional intimacy” was coined by one of the SOULL participants to describe the connectedness to colleagues that enabled these teachers to deepen their practice. By and large, these connections centered around student-focused professional collaboration to help foster experimentation, to identify weaknesses in one’s practice, and to analyze events and issues about students and school culture.
For Lytle, these efforts are especially significant in the test-driven school culture spawned by NCLB. She writes, “Teachers like those in Project SOULL do not oppose standards, the need for highly qualified teachers, the assessment of outcomes, or policies that seek to rectify long-standing inequities.... What they resist is the gross oversimplification of the complexity of the task at hand, the proliferation of policies and high-stakes tests that fail to take into account that teaching is not fundamentally technical work, but rather ... a highly complex, deliberative, and adaptive process.”
This piece appears in Reconceptualizing the Literacies in Adolescents’ Lives, edited by Donna Alvermann, Kathleen Hinchman, David Moore, Stephen Phelps, and Diane Waff (Mahwah, NJ: Lawrence Erlbaum Associates, 2006).

Making Evidence Matter
Researchers may welcome the trend toward evidence-based policy and practice but, says Rebecca Maynard, they “are far from a world in which evidence is routinely ... integrated into decision making.” Policymakers often ignore or misapply research because the evidence it offers can be contradictory or confusing.
Writing in a presidential address for the Association for Public Policy Analysis and Management, Maynard urges researchers to keep in mind what evidence is necessary and useful. Producing reliable evidence that is of use to policymakers means including multiple forms of evidence and research that crosses disciplinary boundaries. Broad and numerous perspectives ensure that results fit together, saving time and sparing the need for further studies.
As policy research expands and improves, the issue of synthesizing evidence becomes vital. While comprehensive aggregation of research is common practice in medicine, it’s relatively new to education. Still, Maynard sees a future in which public policy research will be of great use to practitioners: “If the available evidence has been smartly synthesized, decision makers will at least understand the extent to which they are operating in uncharted territory, or a territory with equivocal, moderate, or strong support for the choices they are making.”
Evidence-Based Decision Making: What Will It Take for the Decision Makers to Care? appears in the Journal of Policy Analysis and Management, 25(2).