By Jack Schneider
For the past two decades, as a result of No Child Left Behind and its successor legislation, the Every Student Succeeds Act, unprecedented data have been made available to the public. These data include much that might inform interested parties about the performance of schools. Yet, at present, measurement and accountability systems—formal systems run by states, as well as the informal systems run by third-party groups—discourage all but the shallowest forms of engagement with data. This is largely due to the narrow tailoring of such systems and to the manner in which information is compressed. Before addressing those matters, however, it is important to consider why measurement and accountability systems look the way they look.
While there have been some notable exceptions—the calls for “data-driven decision making” being chief among them—the dominant theories of change behind the use of performance data have not emphasized the internal dynamics of schools. Arguably the most pervasive theory of change, often referred to as “consequential accountability,” posits that the nation’s schools will be strengthened through an increase in pressure (Kress, Zechmann, & Schmitten, 2011). That is, if educators and local leaders have reasonable fear of sanctions, they will improve their performance in response. A second theory of change, frequently referred to via the phrase “school choice,” emphasizes the role of markets rather than the state. Specifically, neoliberals and market-oriented conservatives have argued that the public education system will be strengthened by exposing schools to competition and encouraging families to consume education as they would any other good (Urquiola, 2016). In keeping with these theories, policy leaders have pushed for relatively simple measurement and accountability systems that rate and rank schools, and which tend toward algorithmically determined results rather than human judgment (Schneider & Gottlieb, 2021).
Operating within a policy paradigm shaped by “consequential accountability” and “school choice,” state and federal leaders have sought to identify a limited number of quantitative measures for use in comparing schools against one another. Although this approach can be explained in a number of ways, including by pointing to the role of history in shaping present structures and cultures (Hutt & Schneider, 2018), such narrow tailoring raises serious questions about measurement validity. Validity, in this sense, describes the degree to which a particular approach or instrument measures the construct of concern—in this case, school quality (American Educational Research Association, 2014). School quality, as scholars have both theorized (Eisner, 2001) and substantiated through studies of the values held by the American public (Rothstein, Jacobsen, & Wilder, 2008; Schneider, 2017), is a multidimensional construct. Among other things, good schools promote critical thinking, are characterized by positive cultures, help students develop social and emotional competencies, foster appreciation for literature and the arts, and prepare students for civic life. Yet most accountability systems are composed of a small number of measures, relying chiefly on student standardized test scores in two subject areas—math and English (Education Commission of the States, 2018). What is measured, in short, does not align with what it purports to describe. Moreover, research demonstrates that school quality is not a uniform construct; measuring one tile does not adequately capture the entire mosaic (Gagnon & Schneider, 2019; Schneider et al., 2021).
The narrowness of measurement and accountability systems would be problematic enough if they were not plagued by additional validity challenges. However, research has established that performance on standardized tests is better predicted by student demographic variables than by school-based variables. Roughly two-thirds of the variation in achievement outcomes is explained by student and family background variables, with less than one-third explained by school-based factors (Di Carlo, 2010; Haertel, 2013; Koretz, 2017). This does not mean that test scores are meaningless; measured differences between lower-scoring students and higher-scoring students do reflect differences in knowledge and skill. Yet it also does not mean that the schools attended by lower-scoring students are underperforming.
To make matters worse, state measurement and accountability systems tend to compress information into a single summative rating like an A-F grade (Education Commission of the States, 2018). Supporters have argued that such practices are necessary due to limited public understanding of data and the importance of facilitating easy interpretation. Current summative ratings, however, only exacerbate existing validity challenges. By taking just a few of the many measures required to assemble a portrait of “school quality”—measures that typically correlate with demography (Schneider et al., 2021)—and then collapsing them into a single rating, such systems present a partial and demographically skewed picture as if it were a complete judgment. Even if they were to include a broader range of data that correlated less strongly with student demography, they would still conceal the various strengths and weaknesses within each school.
It is important to note that states are not the only actors in this field. In fact, they are bit players in comparison with third-party providers like GreatSchools.org, Niche.com, and now U.S. News & World Report, which recently announced a plan to begin rating elementary and middle schools. Because these companies rely on states for their information, their privately run rating systems in many respects mirror their public counterparts, though with more design savvy and a stronger orientation toward consumers. Though it could be argued that these systems are essentially no different from those run by states, there is one major difference in practice: the embedding of data in real estate websites. GreatSchools.org is the primary actor in this regard, and its ratings are embedded in the websites of Trulia, Zillow, and Redfin. Although GreatSchools.org has revamped its algorithm in response to public criticism, it still relies on a narrow range of data provided by the states, and its ratings still correlate strongly with demographic variables (Barnum, 2020).
Official state measurement and accountability systems can exacerbate school segregation by relying on measures that indicate more about demography than school quality—effectively steering families to Whiter and more affluent schools. By actually embedding its ratings into real estate websites, however, GreatSchools.org has the potential to do far more harm. Users of sites like Trulia, Zillow, and Redfin are offered a school rating filter, which, when set to a user-selected threshold, will systematically remove from the map of available homes any home assigned to a school that scores below that rating. In many cases, setting the GreatSchools filter to 5 (out of 10) can eliminate an entire community from consideration, suggesting to prospective homebuyers that there are no schools of reasonable quality in that area (Barnum & LeMee, 2019; Schneider, 2017). Given the strong correlation between demographic variables (including race) and test scores, and given the heavy reliance of such ratings on test scores, the use of such ratings in real estate websites is tantamount to the kind of steering prohibited by federal law (Humber, 2020).
Not all families use these websites; anecdotal evidence suggests that they tend to be used more by White and middle-class families, which is in keeping with higher home-ownership rates among that population (e.g. Haughwout, et al., 2020). Yet, such limited usage does not reduce the risk that school ratings, particularly those embedded in real estate websites, will exacerbate segregation; in fact, it may increase that risk. Research demonstrates that families act on this information in ways that accelerate divergence in housing values, income distributions, education levels, and racial and ethnic composition across communities (Hasan & Kumar, 2019). If White and middle-class families are using information that they mistakenly believe is about school quality to instead choose homes in segregated neighborhoods, tremendous harm is being done. This harm is done chiefly by third parties, but it is made possible by state governments and federal law.
Two decades after the passage of No Child Left Behind, we have enough evidence to act. Yet, we have been stubbornly wedded not only to failed theories of change, but also to a set of cultural beliefs around testing (Hutt & Schneider, 2018). This is not to suggest that we must reject either measurement as a practice or accountability as a process. But, at present, neither fulfills its promise, and both are plagued by serious unintended consequences.
Alternatives exist. In states like California, Colorado, Georgia, Texas, and Massachusetts, efforts have been made to more fairly and more comprehensively measure school quality. In some instances, this has been coupled with experiments in visualizing and reporting on school performance—experiments designed to promote inquiry and dialogue rather than rating and ranking, and which also seek to advance racial and economic equity. Such efforts, however, must be supported by federal law, which presently constrains the nature of educational measurement and accountability. Moreover, they must be coupled with intentional efforts to advance the integration of our public schools, as well as of the neighborhoods in which those schools are located. There are many compelling reasons to improve our measurement and accountability systems—to restore the full mission of public education, to reduce incentives for gaming, to address the disproportionate closure of schools serving racially and economically marginalized populations, etc. But, if we are serious about racial and economic integration, we must address the steering mechanisms that presently direct families to Whiter and more affluent schools, regardless of quality. ▀
Jack Schneider (firstname.lastname@example.org) is an Associate Professor of Education at the University of Massachusetts Lowell.
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association. https://www.testingstandards.net/uploads/7/6/6/4/76643089/standards_2014edition.pdf
Barnum, M. & LeMee, G. L. (2019, Dec. 5). Looking for a home? You’ve seen GreatSchools ratings. Here’s how they nudge families toward schools with fewer Black and Hispanic students. Chalkbeat. https://www.chalkbeat.org/2019/12/5/21121858/looking-for-a-home-you-ve-seen-greatschools-ratings-here-s-how-they-nudge-families-toward-schools-wi
Barnum, M. (2020, Sept. 24). GreatSchools overhauls ratings in bid to reduce link with race and poverty. Chalkbeat. https://www.chalkbeat.org/2020/9/24/21453357/greatschools-overhauls-ratings-reduce-link-race-poverty
Di Carlo, M. (2010, July 14). Teachers matter, but so do words. Albert Shanker Institute. https://www.shankerinstitute.org/blog/teachers-matter-so-do-words
Education Commission of the States. (2018). States’ school accountability systems: State profiles. https://www.ecs.org/states-school-accountability-systems-state-profiles/
Eisner, E. W. (2001). What does it mean to say a school is doing well? Phi Delta Kappan 82(5): 367-372.
Gagnon, D. J. & Schneider, J. (2019). Holistic school quality measurement and the future of accountability: Pilot-test results. Educational Policy 33(5): 734-760.
Haertel, E. H. (2013). Reliability and validity of inferences about teachers based on student scores. Princeton, NJ: Educational Testing Service.
Hasan, S. & Kumar, A. (2019). Digitization and divergence: Online school ratings and segregation in America. SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3265316
Haughwout, A., Lee, D., Scally, J., & van der Klaauw, W. (2020). Inequality in U.S. homeownership by race and ethnicity. Federal Reserve Bank of New York. https://libertystreeteconomics.newyorkfed.org/2020/07/inequality-in-us-homeownership-rates-by-race-and-ethnicity/
Humber, N. J. (2020). In west Philadelphia born and raised or moving to Bel-Air: Racial steering as a consequence of using race data on real estate websites. Hastings Race & Poverty Law Journal 17: 133.
Hutt, E. L., & Schneider, J. (2018). A history of achievement testing in the United States or: Explaining the persistence of inadequacy. Teachers College Record 120(11): 1-34.
Koretz, D. (2017). The testing charade. University of Chicago Press.
Kress, S., Zechmann, S., & Schmitten, J. M. (2011). When performance matters: The past, present, and future of consequential accountability in public education. Harvard Journal on Legislation 48.
Rothstein, R., Jacobsen, R. & Wilder, T. (2008). Grading education: Getting accountability right. Washington, DC, Economic Policy Institute & New York, Teachers College Press.
Schneider, J. (2017). Beyond test scores: A better way to measure school quality. Cambridge, MA: Harvard University Press.
Schneider, J. (2017, June 14). What to know before using school ratings tools from real estate companies. Washington Post. https://www.washingtonpost.com/news/answer-sheet/wp/2017/06/14/what-to-know-before-using-school-ratings-tools-from-real-estate-companies/
Schneider, J., Noonan, J., White, R. S., Gagnon, D. J., & Carey, A. (2021). Adding “student voice” to the mix: Perception surveys and state accountability systems. AERA Open 7.
Schneider, J. & Gottlieb, D. (2021). In praise of ordinary measures: The present limits and future possibilities of educational accountability. Educational Theory 71(4).
Urquiola, M. (2016). Competition among schools: Traditional public and private schools. In E. Hanushek, S. Machin, & L. Woessmann (Eds.), Handbook of the Economics of Education 5, pp. 209-237. Elsevier.