Do small classes make a difference in the
academic achievement of elementary school students? From the
attention given this subject by politicians, it would be reasonable
to assume that class size has been shown to be essential to good
academic outcomes. Congress, for example, allocated $1.3 billion
for the "Class Size Reduction" provision of the Elementary and
Secondary Education Act (ESEA) in fiscal year 2000. The Clinton
Administration has requested even more funding for FY
2001.1 And there are proposals to pump large sums of
money into efforts to increase the number of teachers in public
elementary schools in order to decrease the ratio of students to
teachers.2
This
report uses data from the 1998 National Assessment of Educational
Progress (NAEP) reading examination to analyze the effect of class
size on academic achievement. The NAEP provides the most
comprehensive database on educational outcomes available to
researchers. Among the major findings of this analysis of NAEP data
are that:
Background
Most
Americans believe that educating children in smaller classes would
improve educational outcomes. Indeed, according to an NBC
News/Wall Street Journal poll taken in March 1997, some 70
percent of adults believe that reducing class size would lead to
significant academic improvements in public
schools.3
But
elementary and secondary school class sizes have fallen steadily
over the past few decades. In 1970, public schools averaged 22.3
students per teacher nationwide. By the late 1990s, however, public
schools averaged about 17 students per teacher, due to a
combination of demographic trends and conscious policy decisions to
lower the ratios.4
Over
the same period, however, academic achievement, as measured by the
NAEP exam, stayed relatively constant. Achievement for all three
grades (fourth, eighth, and twelfth) that take the NAEP tests may
vary slightly from year to year, but as shown in Chart 1, the
average score on the reading test has changed very little over the
past 25 years. At face value, this record of "stability" may not be
sufficient evidence to conclude that the decline in class size has
had no influence on test scores. It does, however, illustrate the
trend in academic achievement over time in America's
schools.5

The academic literature on the impact of small class
size on academic achievement has been decidedly mixed. One of the
most frequently cited reports on class size is Frederick
Mosteller's study of young elementary school students in Tennessee.
6 Mosteller found a significant difference in
achievement between the students in classes of 15 students per
teacher and those in classes of 23. Recently, however, University
of Rochester economist Eric Hanushek has questioned the results of
this study, noting that "the bulk of evidence...points to no
systematic effects of class size reductions within the relevant
policy range." 7 Of the studies that do demonstrate some
statistically significant gains in achievement, most generally
involve substantial reductions in class size. 8 However,
none of the current national policy proposals would massively
shrink class size. 9 Clearly, more research is needed on
this subject.
How to Interpret These Findings
This report contains the results of statistical tests that use
NAEP data to explain differences in reading test scores. These
statistical tests isolate the independent effects of a number of
factors on reading scores (such as the education of parents) in
order to determine whether class size matters to these test scores.
The statistical tests (or correlations) cover data on a wide array
of school children, as defined by their race, income, and other
socioeconomic characteristics. Because the statistical model used
here includes these socioeconomic characteristics, the reader can
interpret these findings as applicable to each of these groups of
students. Thus, the findings about class size and reading scores
apply as much to upper-income as to lower-income students, to
blacks as to whites, to girls as to boys, and so forth.
These correlations suggest that there is a statistical
relationship between the factor and achievement in reading, but
they do not suggest that these independent factors cause
differences in academic achievement.
The variables in the model came from the NAEP database and do
not include everything that might have an effect on academic
achievement, such as the methods used to teach reading. These
factors may be much more important in general, or for a particular
child, than the factors recorded in the NAEP data. Moreover:
- Some variables, such as
participation in the federal free and reduced-price lunch program,
are proxies (substitutes) for other unobserved factors. For
example, eligibility for the free and reduced-price lunch program
is determined by income; only children from low-income families may
participate. Although not all low-income children will participate
in the free and reduced-price lunch program, many will. Such
information may be used, then, to analyze the effect of different
characteristics on achievement.
- Some variables also may be used to
determine the effect of some unobservable "third factor." For
example, this model does not suggest that poor families have
children who do worse on the NAEP because they are poor.
Rather, poor families may have some unobservable characteristics or
challenges that make it more difficult to succeed in school.
Similarly, the categories of black and Hispanic students cover
children whose characteristics other than their race may make it
more difficult for them to score well.
- "Statistically insignificant" means that the estimated effect of the
variable/factor, if any, cannot be distinguished statistically from
zero. For example, if the relationship between small class size and
academic achievement is statistically insignificant, the data show no
detectable difference in achievement between students in small classes
and those in large ones.
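To make the idea of statistical insignificance concrete, the following minimal sketch in Python checks whether an estimated effect can be distinguished from zero. The numbers are invented for illustration and are not taken from the NAEP results. The function names and the large-sample cutoff of 1.96 (a two-sided 5 percent test) are assumptions, since the report does not state which threshold it applies.

```python
def t_ratio(coefficient, standard_error):
    """Ratio of an estimated effect to its standard error."""
    return coefficient / standard_error


def is_significant(coefficient, standard_error, critical_value=1.96):
    """True when the estimate is statistically distinguishable from zero.

    critical_value=1.96 is the large-sample, two-sided 5 percent cutoff
    (an assumption made for this illustration).
    """
    return abs(t_ratio(coefficient, standard_error)) > critical_value


# Hypothetical numbers for illustration only.
print(is_significant(-1.5, 1.2))  # small effect, large uncertainty
print(is_significant(4.0, 1.0))   # large effect relative to its uncertainty
```

An estimate can be negative or positive and still be insignificant; what matters is its size relative to its standard error.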
Characteristics of the NAEP Data
The
author used the 1998 NAEP database of reading to measure the
influence of class size on academic achievement. The NAEP, first
administered in 1969, is an examination that measures academic
achievement in a variety of fields, such as reading, writing,
mathematics, science, geography, civics, and the arts. Currently,
the NAEP is administered to fourth, eighth, and twelfth grade
students, with the main tests in math and reading given alternately
every two years. For example, reading was tested in 1998; math was
assessed in 1996 and 2000.
The
NAEP is actually two tests, a nationally administered test and
state-administered tests. Over 40 states participate in the
separate state samples used to gauge achievement within those
individual jurisdictions. For the purposes of this study, only 1998
national reading data were used.
The
most significant benefit of using the NAEP data is that, in
addition to test scores in the subject area, it includes an
assortment of background information for the students taking the
exam, their main subject-area teacher, and their school
administrator. Responses from the teachers and school
administrators are linked to the student's information, which
yields a rich database of information. The background questions
include:
- TV viewing habits,
- Computer usage at home and school,
- Teacher tenure and certification,
- Socioeconomic status,
- Basic demographics, and
- School characteristics.
By
incorporating this information with their assessment of the NAEP
data, researchers can glean a great deal of insight into the
factors that explain the differences found in NAEP scores among
children.
The Heritage Analysis
This
analysis looked at academic achievement by analyzing six factors:
class size, race and ethnicity, parents' educational attainment,
number of reading materials in the home, free or reduced-price
lunch participation, and gender. Using regression analysis,
Heritage analysts can isolate the effect of each factor. The
Heritage analysis uses a jackknifed ordinary least squares model
10 and looks at the effects of these factors on the NAEP
1998 nationwide sample of public school children. 11
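As a rough illustration of what a jackknifed least squares estimate involves, the sketch below fits a weighted regression on synthetic data and then re-fits it with one group of observations dropped at a time to obtain standard errors. It is not the Heritage model: the data, the weights, the 20 jackknife groups, and the generic delete-one-group variance formula are all assumptions made for illustration, and the NAEP files supply their own replicate weights for this purpose.

```python
import numpy as np


def weighted_ols(X, y, w):
    """Weighted least squares coefficients for design matrix X."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def jackknife_ols(X, y, w, groups):
    """Coefficients plus delete-one-group jackknife standard errors."""
    full = weighted_ols(X, y, w)
    labels = np.unique(groups)
    G = len(labels)
    # Re-estimate the model once per group, leaving that group out.
    replicates = np.array([
        weighted_ols(X[groups != g], y[groups != g], w[groups != g])
        for g in labels
    ])
    deviations = replicates - replicates.mean(axis=0)
    variance = (G - 1) / G * (deviations ** 2).sum(axis=0)
    return full, np.sqrt(variance)


# Synthetic data for illustration only (not NAEP data).
rng = np.random.default_rng(0)
n = 400
small_class = rng.integers(0, 2, n)      # 1 = class of 20 or fewer
parent_college = rng.integers(0, 2, n)   # 1 = a parent attended college
X = np.column_stack([np.ones(n), small_class, parent_college])
y = 210 + 0.0 * small_class + 5.0 * parent_college + rng.normal(0, 10, n)
w = rng.uniform(0.5, 2.0, n)             # hypothetical sampling weights
groups = np.arange(n) % 20               # hypothetical jackknife groups

coef, se = jackknife_ols(X, y, w, groups)
print(coef, se)
```

Each coefficient estimates the effect of one factor with the others held constant, and the jackknife standard errors indicate how much that estimate moves when part of the sample is removed.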
The Independent Variables
- Class Size
Frederick Mosteller explains why small classes boost
achievement: "Having fewer children in class reduces the
distractions in the room and gives the teacher more time to devote
to each child." The average time a teacher can spend with each
child, then, appears to be important in the learning process. To
address class size, this analysis studies the NAEP data in two
different ways (statistical models). The first compares the
academic outcomes of children in the smallest classes (20 or fewer
students per teacher) with those of all other students. The second
compares only the children in these small classes with those in
large classes (at least 31 students per teacher).
- Race and Ethnicity
Many studies and reports have demonstrated that over time,
African-American and Latino students tend to perform more poorly on
standardized tests than do white students (although the gap has
generally narrowed over the past 25 years). 13 There are
a number of potential explanations for this trend. 14
Because strong differences exist in academic achievement among the
races, the variables of race and ethnicity are included in the
analysis.
- Parents' Education
Many researchers have noted that the educational attainment of
a child's parents is a good predictor of the child's academic
achievement. Parents who, for instance, are college educated could
be better equipped to help their children with homework and
understanding concepts than are those who have less than a high
school education, all other things being equal. Because the
education level of one parent is often highly correlated with the
other's, only a single variable is included in the analysis.
- Number of Reading Materials in the Home
The presence of books, magazines, encyclopedias, and
newspapers generally indicates a dedication to learning in the
household. Researchers have determined that these reading materials
are important aspects of the home environment. 15 The analysis thus
includes a variable controlling for the number of these four types
of reading materials found at home.
- Free/Reduced-Price Lunch Participation
Income is often a key predictor of academic achievement
because low-income families seldom have the resources to purchase
extra study materials or tutorial classes that may help their
children perform better in school. While the NAEP does not collect
data on household income, it does collect data on participation in
the school free and reduced-price lunch program that are used here.
16
- Gender
Empirical research has suggested that girls tend to perform
better on reading and writing subjects while boys perform better on
the more analytical subjects of math and science. 17
Many authors have expounded on this idea, 18 yet the
data on the male-female achievement gaps are often inconsistent. In
1998, for example, young men scored higher than young women on both
the verbal and quantitative sections of the Scholastic Assessment
Test (SAT). Some writers have noted that this may be because of a
fundamental bias against females in the educational system.
19 Another explanation, however, is that the test
results reflect a selection bias in which more "at-risk" females
opt to take the SAT relative to males who take it. 20 In
order to account for this factor, the analysis includes a variable
for gender.
- Omitted variables
Previous research 21 has included more family
background variables in the model specification. In the 1998 NAEP
database, however, the only information available on children's
parents is their educational attainment. The NAEP does not ask
whether the child lives with both parents (or parental figures),
one parent, or no parents (i.e., in a group home). Future
administrations of the NAEP test should include this type of
question since a great deal of research has found that having both
parents in the home can improve a child's academic
achievement.
Results of the Analysis
These six factors formed the basis of two
statistical models 22 that were applied to the NAEP's
1998 nationwide sample of public school children who took the
reading test. 23 As noted above, the first model
compares the data for children in small class sizes (20 or fewer
students per teacher) to all other students. The second model
compares only data for students reported to be in either small or large
classes (classes with 31 or more students per teacher). By
determining whether or not an achievement difference exists between
the smallest and largest classes in America, the second model
addresses the contention that there may be differences in
achievement as the class size gap widens.
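The difference between the two models lies in which students are kept and how the class size indicator is coded. The sketch below illustrates this with a handful of invented class sizes. The function names are hypothetical, and treating class size as an exact number is an assumption; the NAEP background data may record class size in categories instead.

```python
def small_class_indicator(class_size):
    """1 if the class has 20 or fewer students per teacher, else 0."""
    return 1 if class_size <= 20 else 0


def model_one_sample(class_sizes):
    """First model: keep every student; small classes versus all others."""
    return [(size, small_class_indicator(size)) for size in class_sizes]


def model_two_sample(class_sizes):
    """Second model: keep only small (<= 20) or large (>= 31) classes."""
    return [
        (size, small_class_indicator(size))
        for size in class_sizes
        if size <= 20 or size >= 31
    ]


# Hypothetical class sizes for illustration only.
sizes = [18, 20, 24, 30, 31, 35]
print(model_one_sample(sizes))
print(model_two_sample(sizes))
```

In the second model the mid-sized classes (21 to 30 students) drop out of the sample entirely, so the indicator contrasts only the two extremes.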


Chart 2 and
Chart 3 show
the percent change in fourth and eighth grade reading scores
attributable to the factors in the first model, compared with a
base case, while Chart 4 and Chart 5 show the percent change in the
second model. 24 Here, the base case is defined as a
child with the following characteristics:
- White;
- Female;
- Non-poor (that is, not participating in
the free and reduced price lunch program);
- Parents who did not attend college;
- Has two out of the four possible reading
materials in the home; and
- Has a reading class size of over 20
students to one teacher.
The
estimates of the base case are reported in Table 1 for both models.
These are the scores that a hypothetical individual would earn out
of a maximum possible NAEP score of 500. Chart 2 through Chart 5
show the positive or negative percent changes for each variable,
holding constant all other variables in the model.
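A percent change relative to the base case can be read as a simple ratio. The sketch below assumes that the percent change for a variable equals its regression coefficient divided by the base-case score; the report does not spell out the calculation, so this is an interpretation rather than the documented method. The base score and coefficient shown are invented; the actual base-case scores are those reported in Table 1.

```python
def percent_change(coefficient, base_score):
    """Percent change in the predicted score relative to the base case."""
    return 100.0 * coefficient / base_score


# Hypothetical values for illustration only.
base_score = 200.0   # assumed base-case NAEP score (scale maximum is 500)
coefficient = 4.0    # assumed regression coefficient, in NAEP points
print(round(percent_change(coefficient, base_score), 1))
```

Under this reading, a negative coefficient produces a negative percent change, which is how a factor can appear below the base case in the charts.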
In
the first model, the analysis of the data on children in all class
sizes shows no significant difference in reading test scores
attributable to class size, holding all other variables constant.
25 As seen in Chart 2, NAEP scores of fourth grade
children whose parents attended some college are 2.2 percent higher
than scores for children whose parents have a high school education
or less. Most important, moving from a class size above 20 down to
20 or fewer reduces NAEP scores by 0.8 percent, but this effect is
statistically indistinguishable from no influence. While it may
seem logical that lower class sizes would have a positive influence
on achievement, the NAEP data do not support that conclusion. The
second model, comparing children in small classes to those in large
classes, reaches a similar conclusion. Again, class size does not
have a meaningful impact on academic achievement.
For fourth graders, the model results do not change
appreciably when comparing only those in small or large classes.
One exception is the variable that controls for having at least one
parent who attended college. The importance of this variable
increases when the model compares children in small and large class
sizes.
For
eighth graders, the class size variable is significant when
comparing children in small and large classes. The results of the
comparison are counterintuitive since the coefficient has a
negative sign. Holding other variables constant, this means that
eighth grade children in small class sizes do worse on the NAEP
reading exam than do those in large classes. The magnitude of the
effect is notable; in the base model, a child in a small class would
score 1.7 percent less than the base case child. The variable is barely
significant statistically, 26 however, and should be
treated with suspicion.
Both
fourth and eighth grade girls score slightly higher than do boys on
the reading exam, which bolsters recent evidence on gender
differences in academic achievement. Girls on average, notes
American Enterprise Institute W. H. Brady Fellow Christina Hoff
Sommers, "get better grades, are more engaged academically, and are
now in the majority in higher education." 27 The results
here support the contention that schools are not shortchanging
girls. 28
Conclusion
Class size has little or no effect on
academic achievement, according to this analysis of 1998 NAEP data.
It is quite likely, in fact, that class size as a variable pales in
comparison with the effects of many factors not included in the
NAEP data, such as teacher quality and teaching methods. As
Irwin Kurz, principal of the highly successful P.S. 161, a public
school in Brooklyn, New York, that serves poor children and has an
average class size of 35, observes, it is "[b]etter to have one good
teacher, than two crummy teachers any day." 29
Kirk A. Johnson,
Ph.D., is a Policy Analyst in the Center for Data Analysis at
The Heritage Foundation.
Appendix A: Results of the Statistical Models
Table 2 and Table 3 report the results of
an analysis of NAEP data using two statistical models. Table 2
shows the coefficients and significance tests for the first model,
which compares data for all public school children in the analysis,
while Table 3 reports the results for the model that compares only
small classes (20 students or fewer per teacher) to large classes
(31 students or more per teacher). As shown in these tables, most
variables are statistically significant. 30 Contrary to
conventional wisdom, the class size variable is not significant or
has the wrong sign on the coefficient.
In
this analysis, there are two statistical issues to consider. First,
the NAEP exam is a long test and therefore is not administered in
its entirety to all children. Rather, different parts are given to
different children. Certain students will do better on certain
portions of the test than others. Consequently, a "true" score must
be estimated, or imputed, from the incomplete information. The NAEP
estimates five plausible composite reading scores and recommends
that researchers use all five in any analysis. The Heritage
analysis described here follows the guidelines specified by the
Educational Testing Service (which works closely with the National
Center for Education Statistics in developing the data file) for
incorporating all five reading scores into the analysis.
31
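One common way to incorporate all five plausible scores is to run the same analysis once per score and then pool the five results. The sketch below uses the standard multiple-imputation combining rule: the pooled estimate is the average of the five estimates, and its variance adds the average sampling variance to the variance between estimates, inflated by (1 + 1/5). Whether this matches the cited guidelines in every detail is an assumption, and the numbers are invented.

```python
import statistics


def combine_plausible_values(estimates, sampling_variances):
    """Pool one estimate per plausible value into an estimate and its SE."""
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(sampling_variances)  # average sampling variance
    between = statistics.variance(estimates)      # variance across the m runs
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical coefficient estimates, one per plausible value.
estimates = [-1.0, -1.4, -0.8, -1.2, -1.1]
sampling_variances = [0.90, 1.00, 0.95, 1.05, 1.10]
print(combine_plausible_values(estimates, sampling_variances))
```

Using a single plausible score would ignore the between-score term and therefore understate the uncertainty of the estimate.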
Second, the NAEP utilizes a complex sample
design, oversampling children with certain characteristics.
32 Each child, then, is given a unique weight, which is
calculated from the probability of being selected from the
population at large (in this case, from the U.S. population of
fourth or eighth graders in public schools). The NAEP's sample
design requires a complex modeling technique, which the Heritage
model employs. 33
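The role of the sampling weights can be shown with a two-child example. The sketch assumes the simplest case, in which a child's weight is the inverse of the probability of selection; actual NAEP weights also carry further adjustments that are not modeled here. The scores and probabilities are invented.

```python
def sampling_weight(selection_probability):
    """Number of children in the population that one sampled child represents."""
    return 1.0 / selection_probability


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical example: the second child comes from an oversampled group.
scores = [220.0, 200.0]
probabilities = [0.001, 0.004]
weights = [sampling_weight(p) for p in probabilities]
print(weighted_mean(scores, weights))
```

Because the oversampled child has a higher chance of selection, that child receives a smaller weight, which keeps oversampled groups from dominating national estimates.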
U.S. Department of Education, "Total
Appropriations for ESEA, 1990-2001," unpublished tables available
upon request from the author.