By Jay P. Heubert
The stated objective of the “standards” movement in American public education is to hold all schools, teachers and students to high standards of teaching and learning. Accountability can take many forms. One is testing: tests are called “high-stakes” when they are used in making decisions about which students will be promoted or retained in grade and which will receive high-school diplomas. While many agree that high-stakes testing will especially affect minority students, English-language learners and students with disabilities, there are disputes over whether the consequences will be beneficial or harmful.
Graduation and Promotion Testing
Graduation testing has gone through several stages of development in the U.S. and varies considerably from state to state. In the 1970s and 1980s, a number of states adopted requirements under which students had to pass “minimum competency tests” as a condition of getting high-school diplomas, even if the students had satisfied all other requirements for graduation. In the late 1980s and 1990s (responding in part to A Nation At Risk, a report that warned of “a rising tide of mediocrity” in American public education, and to the rise of today’s “standards” movement), some states replaced minimum competency tests with graduation exams measuring knowledge and skills at the 10th grade level or higher. At present, about 23 states, up from 18 in 1998, require students to pass graduation tests, and the number is expected to increase to 29 by 2003. Of the 23, 14 now set graduation-test standards at the 10th grade level or higher.
Further, in response to concerns about “social promotion,” a growing number of states — 13, about twice as many as a year ago — now require students to pass standardized tests as a condition of grade-to-grade promotion. Moreover, many school districts, particularly in urban areas, have also adopted promotion-test policies. Thus, large numbers of the nation’s minority students and English-language learners are now subject to these state or local promotion-test programs.
Moreover, under current federal law, students with disabilities and English-language learners — whom many states and school districts have traditionally exempted from large-scale assessments — must now be included in state and local testing programs, with accommodation and alternative assessment where necessary. To serve this objective, states and school districts must not only assess such students but also publish disaggregated data on their performance. Significantly, federal law takes no position on whether states and districts should use test results to determine whether individual students will receive high-school diplomas or be promoted to the next grade.
Effects of High-Stakes Testing
Proponents of standards-based reform and high-stakes testing point out that blacks, Latinos, English-language learners, students with disabilities and poor children are among those who most often receive low-quality instruction, and who therefore have the most to gain from any movement that attempts to hold all schools, teachers and students to high standards of teaching and learning. Meanwhile, critics of high-stakes testing fear that such children will be disproportionately retained in grade or denied high-school diplomas because their schools do not expose them to the knowledge and skills students need to pass the tests. There is support for both positions, but the story is complex and the evidence incomplete.
Even on graduation tests that measure basic skills, minority students and students with disabilities usually fail at higher rates than other students, especially in the years after such tests are first introduced. In the 1970s, for example, when minimum competency tests gained popularity, 20% of black students, compared with 2% of white students (a discrepancy of ten to one), initially failed Florida’s graduation tests and were denied high-school diplomas. And while many students with disabilities were excluded from state graduation-test programs, those who did participate failed at rates over 50%.
For a variety of reasons, failure rates typically decline among all groups in the years after a new graduation test is introduced. This was true of the early minimum competency tests; after a few years, for example, black failure rates were far lower than 20%, and failure rates for students with disabilities also declined. This also appears to be true for graduation tests adopted more recently. Texas, for example, which has a graduation test set at the 7th or 8th grade level, reports that pass rates of blacks and Latinos roughly doubled between 1994 and 1998, and that the gap in failure rates between white students and black and Latino students narrowed considerably during that time. Even so, 1998 data from the Texas graduation tests show continuing disparities of nearly three to one: cumulative failure rates of 17.6% for black students, 17.4% for Hispanic students, and 6.7% for white students.
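The disparities quoted above can be checked with a few lines of arithmetic. The sketch below uses only the failure rates given in the text; the function name and the choice to report both a ratio and a percentage-point gap are illustrative assumptions, not part of the original analysis.

```python
# Failure rates (percent) exactly as quoted in the text; no new data.
FLORIDA_1970S = {"black": 20.0, "white": 2.0}
TEXAS_1998 = {"black": 17.6, "hispanic": 17.4, "white": 6.7}


def disparity(rates, group, reference="white"):
    """Return (ratio, percentage-point gap) between a group's failure
    rate and the reference group's failure rate."""
    ratio = rates[group] / rates[reference]
    gap = rates[group] - rates[reference]
    return round(ratio, 2), round(gap, 1)


florida_black = disparity(FLORIDA_1970S, "black")
texas_black = disparity(TEXAS_1998, "black")
texas_hispanic = disparity(TEXAS_1998, "hispanic")

for label, (ratio, gap) in [
    ("Florida, 1970s, black vs. white", florida_black),
    ("Texas, 1998, black vs. white", texas_black),
    ("Texas, 1998, Hispanic vs. white", texas_hispanic),
]:
    print(f"{label}: {ratio} to one, {gap} percentage points")
```

On these figures the Texas ratios come out at roughly 2.6 to one, well below Florida's ten to one. Reporting the ratio alongside the percentage-point gap is useful because the two can move differently as overall failure rates fall.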
Data for students with disabilities are harder to find, but they show a similar pattern. On the one hand, there is evidence that many students with disabilities do pass state tests in higher numbers over time: New York reports, for example, that the number of students with disabilities who passed the state’s English Regents exam in 1998-99 was nearly twice as high as the number who took the exam two years earlier. On the other hand, 1998 data from 14 states show gaps that remain quite high: students with disabilities consistently fail state graduation tests at rates 35 to 40 percentage points higher than those for nondisabled students.
An important, largely unanswered question concerns the extent to which improved pass rates on graduation tests actually reflect improved teaching and learning on the part of teachers and students. Such improvements are plainly one explanation, and the most desirable one. During the 1980s, however, when many states reported sharply improved pass rates on graduation tests, scores on the National Assessment of Educational Progress (NAEP) — a highly regarded nationally administered examination — showed little or no gain in student learning. Indeed, evidence that minimum competency tests were not producing improved student performance on the NAEP is one reason why the current standards movement emphasizes higher standards, and why some states have been raising graduation test standards. More recent 4th and 8th grade NAEP scores suggest improvements in student mathematics performance — especially for minority students and low-SES students — during the period 1990-96, particularly in some states (including Texas and North Carolina) that invested heavily in smaller class sizes, preschool programs and better resources for teachers. Gains reported on state tests continued to exceed the improvements measured by NAEP, however, and it is unclear to what extent improved 4th and 8th grade NAEP scores are due to high-stakes graduation testing rather than to such specific educational interventions.
What factors other than improved achievement may explain increased pass rates on state tests? First, test scores often increase, especially during the years after a test is first introduced, because teachers increasingly “teach to the test,” i.e., focus on subject matter and formats that appear on the test, and students become familiar with that test’s format. Second, some states may reduce initially high failure rates by making the state graduation tests easier or by setting lower cutoff scores that students must achieve to pass. Third, if low-achieving students are not part of the test-taking population, then the pass rates of those who remain will be higher — even if the achievement of those who actually take the test has not improved.
Thus, reported pass rates should be viewed in the context of such factors as: (a) dropout rates; (b) whether states count among dropouts (or include in test results) students who choose — or are encouraged — to leave school to pursue general equivalency diplomas; (c) exemption of students with disabilities or English-language learners from the test-taking population; and (d) excessive testing accommodations that may artificially inflate some students’ scores.
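The effect of a shrinking test-taking population can be illustrated with simple arithmetic. The following is a minimal sketch using hypothetical numbers (a cohort of 1,000 students, 700 of whom would pass, and 150 likely failers who leave the tested population); none of these figures comes from any state's data.

```python
def pass_rate(passed, tested):
    """Pass rate as a percentage of the students who actually took the test."""
    return 100.0 * passed / tested


# Hypothetical cohort: all numbers below are illustrative assumptions.
cohort_size = 1000
would_pass = 700      # students whose achievement is sufficient to pass
left_untested = 150   # low achievers who drop out, pursue a GED, or are exempted

# Everyone is tested.
rate_full_cohort = pass_rate(would_pass, cohort_size)

# The same 700 students pass, but 150 likely failers are no longer counted.
rate_reduced_cohort = pass_rate(would_pass, cohort_size - left_untested)

print(f"Full cohort tested:    {rate_full_cohort:.1f}%")
print(f"Reduced cohort tested: {rate_reduced_cohort:.1f}%")
```

In this sketch the reported pass rate rises from 70.0% to about 82.4% although no student has learned anything more, which is why pass rates need to be read together with dropout and exemption figures.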
Not surprisingly, there is also a spirited debate about whether graduation testing causes increased dropout rates. On the one hand, it appears that many low-achievers start to disengage from school well before graduation tests loom. On the other hand, there are reputable scholars who argue, credibly, that fear of failing a graduation test increases the likelihood that low-achievers will leave school. (Such fears presumably are greater in states where graduation test standards are high.) Also, the current climate of accountability places new pressures on schools to increase student pass rates, which may lead to increased and/or understated dropout rates. Unfortunately, this critical issue is complicated by a lack of uniformity among the states in defining and counting dropouts.
Given these complexities, it is difficult to draw firm, general conclusions — even regarding minimum competency tests — about the effects of graduation testing on minority students, students with disabilities and English-language learners.
In any event, the consequences of basic-skills graduation tests are becoming less relevant in the face of two important developments. One such development, already noted, is that more states are raising the bar: setting higher standards on state graduation exams. The most ambitious states are adopting graduation tests that reflect “world-class” standards, such as those embodied in NAEP.
Based on national NAEP data, about 38% of all students would fail tests that reflect such standards if the tests were administered today. For minority students and English-language learners, moreover, there is clear evidence that failure rates on tests embodying “world-class” standards would be extremely high (about 80%), at least at first. These predictions are consistent with recent data from Massachusetts, where students have begun taking graduation tests that reflect “world-class” standards. For students with disabilities, it is reasonable to assume that initial failure rates on such tests would also be very high: in the 75-80% range.
Equally important, the proliferation of large-scale promotion testing, which is most pronounced in large, urban school districts, has led to sharply higher rates of retention in grade, especially for black students, Latino students and English-language learners. In New York City, Chicago and other cities, hundreds of thousands of students, the vast majority black, Latino, and/or English-language learners, have failed promotion tests and been retained in grade, and it is reasonable to expect that students with disabilities would also be retained in large numbers.
The single strongest predictor of whether students will drop out of school is whether they have been retained in grade. The rapid growth of promotion testing, particularly in our large cities, is therefore likely to create an increasingly large class of students (composed disproportionately of blacks, Latinos, English-language learners, students with disabilities and low-SES students) who are at increased risk of dropout by virtue of having been retained in grade one or more times. Those retained in grade even once are much likelier to drop out later than are students not retained, and the effects are even greater for students retained more than once.
Promotion testing is thus likely to reduce, perhaps significantly, the numbers of students who remain in school long enough to take graduation tests, and to increase the numbers of students who suffer the serious consequences of dropping out. The effects of retention, moreover, may not be felt until years later. These potential consequences warrant more attention than they have received thus far.
Promotion and graduation testing may also have unintended consequences for teachers. High-stakes testing is intended to raise teacher motivation and effectiveness, and there is evidence that, with appropriate professional development, support, resources and time, teaching effectiveness can improve significantly. There is also evidence, however, that the negative publicity associated with poor test scores can lead experienced teachers to leave urban schools for the suburbs. Plainly, efforts to improve low-performing urban schools — and to educate all children effectively — will be undermined if those schools lose strong teachers.
Policies that lead to improved teaching and learning are likely to benefit minority students, English-language learners and students with disabilities even more than they do other students. New York State Education Commissioner Richard Mills, for example, has defended stringent graduation-test requirements partly because he hopes they will bring an end to low-track classes, in which students — most of them black students, Latino students and/or English-language learners — typically receive poor quality, low-level instruction from less-qualified teachers. There is very strong evidence that placement in typical low-track classes is educationally harmful for students, and that students in low-track classes would learn more if they were placed in more demanding classes. Disability rights groups likewise hope that state standards and tests will drive teachers to upgrade the individualized education programs (IEPs) of students with disabilities, so that IEPs reflect more of the knowledge and skills that nondisabled students are expected to acquire — and here, too, there is evidence that higher expectations and improved instruction lead to improved achievement.
Advocates for minority children and low-SES children hope that high standards will provide the political and legal leverage needed to improve resources and school effectiveness so that all children receive the high-quality instruction they need to be able to meet demanding academic standards. Moreover, some proponents of high-stakes testing argue that fear of negative consequences — retention or diploma denial for students, negative publicity and (in rare instances) adverse personnel action for educators — can be a positive force, serving to increase the motivation to teach and learn effectively.
Standards of Appropriate Test Use: Widely Accepted, Often Ignored
Whether graduation testing helps or hurts low achievers depends largely on whether such tests are used to promote high-quality education for all children — the stated objective of standards-based reform — or to penalize students for not having knowledge and skills they have not been taught in school. If high-stakes tests are used properly — a very big “if” — they can help leverage improved instruction.
Norms of appropriate test use have been articulated by the testing profession, the National Research Council (NRC) and the American Educational Research Association (AERA). For example, the December 1999 Standards for Educational and Psychological Testing (by the AERA, the American Psychological Association and the National Council on Measurement in Education) asserts that promotion and graduation tests should cover only the “content and skills that students have had an opportunity to learn.” The Congressionally-mandated NRC study, High Stakes: Testing for Tracking, Promotion, and Graduation, reached a similar conclusion in 1999: “Tests should be used for high-stakes decisions . . . only after schools have implemented changes in teaching and curriculum that ensure that students have been taught the knowledge and skills on which they will be tested.” So did the AERA in its July 2000 Policy Statement Concerning High Stakes Testing.
Unfortunately, there often are discrepancies between what high-stakes tests measure and what students have been taught. Results of a recent ten-state study suggest that there is surprisingly little overlap between a state’s standards and what teachers in the state say they are actually teaching students. The actual overlap ranged from a low of 5% to a high of 46%, depending on the subject, grade level and state. Such discrepancies are likely to be high for minority students, English-language learners and students with disabilities, if only because such students often lack access to high-quality instruction.
Similarly, as noted above, increasing numbers of states and school districts automatically deny promotion or high-school diplomas to students who fail state tests, regardless of how well the students have performed on other measures of achievement, such as course grades. The NRC study emphasizes that educators should buttress test score information with “other relevant information about the student’s knowledge and skills, such as grades, teacher recommendations, and extenuating circumstances” when making high-stakes decisions about individual students. This is consistent with the Standards for Educational and Psychological Testing, which states that “in elementary or secondary education, a decision or characterization that will have a major impact on a test taker should not automatically be made on the basis of a single test score. Other relevant information … should be taken into account if it will enhance the overall validity of the decision.” Similarly, the July 2000 AERA Policy Statement provides that “[d]ecisions that affect individual students’ life chances or educational opportunities should not be made on the basis of test scores alone….”
Why is it so important to use multiple measures in making important decisions about individuals? The answer is that any single measure is inevitably imprecise and limited in the information it provides. Proponents of high-stakes testing sometimes point out the problems associated with exclusive reliance on student grades in making promotion and graduation decisions: there has been considerable grade inflation during the last three decades, and there is considerable variation among teachers, schools and school districts in what particular grades mean. Their points are well taken.
At the same time, large-scale tests also are limited in what they measure. It is well known, for example, that standardized tests do not measure student motivation over time, even though such motivation is important to later success. Moreover, even the best standardized tests are far less precise than most people realize. Given the imprecision of grades and test scores, judgments based on combinations of both are more accurate and reliable than those based on either by itself.
To complicate matters, there is at present no satisfactory mechanism for ensuring that test developers, states and school districts respect even widely accepted norms of appropriate, nondiscriminatory test use. The two existing mechanisms — professional discipline through the organizations that develop test-use standards and legal enforcement through courts and administrative agencies — have complementary shortcomings. The professional associations that define appropriate test use have detailed standards, but they lack mechanisms for monitoring or enforcing compliance with those standards. For courts and federal civil rights agencies, the reverse is true: they have complaint procedures and enforcement power, but lack specific, legally enforceable standards on the appropriate use of high-stakes tests. Recognizing the problem, the U.S. Department of Education’s Office for Civil Rights has produced a draft resource guide that, while not legally binding, aims to promote appropriate use of high-stakes tests.
In conclusion, the standards movement and high-stakes testing present both opportunities and risks to students of color, English-language learners and students with disabilities. These students are among those who stand to benefit most if states and school districts insist that all schools and teachers provide high-quality instruction to all students. Such students are also at great risk, however, especially in states that administer high-stakes promotion and graduation tests before having made the improvements in instruction that will enable all students to meet the standards. As noted above, if graduation tests embodying “world-class” standards were implemented today — when far too many students do not receive “world-class” instruction — students of color, English-language learners and students with disabilities would be denied high-school diplomas at rates of 75-80%, rates that are plainly unacceptable, for those students and for our entire society.
The key, then, is for students to have an opportunity to acquire the relevant knowledge and skills before individuals suffer high-stakes consequences such as retention in grade or denial of a regular high-school diploma. On this point, there is agreement among the authors of Standards for Educational and Psychological Testing, the Congressionally-mandated NRC study and the July 2000 AERA Policy Statement. This has important implications for all low-achieving students, and special consequences for some. For students with disabilities, it will be necessary to revisit IEPs to make sure that all students subject to high-stakes tests are taught the relevant knowledge and skills. English-language learners should get the opportunity to acquire high levels of English proficiency as well as the other knowledge and skills that high-stakes tests measure.
Unfortunately, there are some test developers, states and school districts that do not appear to be observing these and other norms of appropriate test use. Is this problem due to insufficient knowledge about norms of appropriate test use, or are states under political pressure to disregard such norms as a national debate over educational accountability rages? Will states also face political pressure to back away from tests that lead to very large numbers of students being retained in grade or denied regular high-school diplomas? The prospect of high failure rates has already produced a backlash against high-stakes testing programs in some states. Lawsuits are beginning, if only because there exists no viable alternative by which to ensure appropriate use of graduation and promotion tests.
All these questions call for additional research. There also remains a need for significantly improved data on the effects of high-stakes testing on student achievement and dropout rates, for students generally, and for such important groups as students of color, English-language learners and students with disabilities.
The stakes are high indeed.
Jay P. Heubert is Associate Professor at Teachers College, Columbia University and adjunct professor at Columbia Law School. He directed a congressionally-mandated study of high-stakes testing for the National Academy of Sciences, and for five years was a trial attorney in the education section of the Civil Rights Division, U.S. Department of Justice. jay.heubert@columbia.edu.
Notes:
This is an edited version of an extensively documented article, titled “High-Stakes Testing: Opportunities and Risks for Students of Color, English-Language Learners, and Students with Disabilities,” to appear in M. Pines, ed., The Continuing Challenge: Moving the Youth Agenda Forward, Policy Issues Monograph 00-02, Sar Levitan Center for Social Policy Studies, Johns Hopkins University.