Do federal social programs work? The question is simple, but finding an answer is complicated. To answer in the affirmative, federal social programs must ameliorate the social problems they target. In essence, social programs seek to improve human behavior in ways that will make people better off. For example, the social programs of the Great Society sought to eradicate the fundamental causes of poverty by providing opportunity for the poor to join other Americans in prosperity.
As used in this paper and the book by the same name, the term “social program” refers to efforts by the federal government that attempt to improve human behavior by increasing skills or awareness, chiefly through noncompensatory services. These programs engage in social engineering that attempts to enhance the well-being of citizens. Social programs are intended to fix social problems that individuals are assumed to be unable to solve themselves. Head Start is a classic example of a social program. Created as part of the War on Poverty in 1965, Head Start is a preschool community-based program that is intended to provide a boost to disadvantaged children before they enter elementary school.
In the federal budget, social programs are considered discretionary and grouped with “other mandatory” domestic programs. This classification includes numerous education, welfare, housing, and employment programs.
Determining the effectiveness of federal social programs is particularly relevant given the current political debate over the federal government’s persistent deficits and debt. Many of the budget plans in Congress reduce the rate of spending increases on federal social programs. Very few plans actually propose real spending reductions.
Opponents of spending reductions assert that spending any less on social programs will have disastrous effects on society. In April 2012, for example, President Barack Obama called the budget plan passed by the Republican-controlled U.S. House of Representatives a “Trojan Horse” and “thinly veiled social Darwinism.” In particular, the President said, “If this budget becomes law and the cuts were applied evenly, starting in 2014, over 200,000 children would lose their chance to get an early education in the Head Start program.” The clear implication is that over 200,000 children will somehow be harmed by not attending Head Start. This would be true only if Head Start is an effective program that actually benefits the children it serves.
Using evidence from scientifically rigorous multisite experimental evaluations of national programs published since 1990, the book Do Federal Social Programs Work? demonstrates that federal social programs such as Head Start are ineffective. The American people have nothing to fear from the elimination of ineffective programs.
The findings of multisite experimental evaluations are reliable because they assess the performance and effectiveness of federal social programs in multiple locations. While individual programs operating in single locations may undergo experimental evaluations, these small-scale, single-site evaluations do not inform federal policymakers of the general effectiveness of national programs. The success of a single program that serves a particular jurisdiction or population does not necessarily mean that the same program will achieve similar success on a national scale. Thus, small-scale evaluations are poor substitutes for large-scale evaluations. Yet many advocates of social engineering make grandiose claims about the potential effectiveness of small-scale programs implemented on the national scale.
On December 31, 2011, the gross debt racked up by the federal government reached $15.2 trillion, the legal limit authorized by Congress. In response, President Obama formally notified Congress on January 12, 2012, of his intent to raise the nation’s debt ceiling by $1.2 trillion, from $15.2 trillion to $16.4 trillion. At the end of fiscal year (FY) 2013 on September 30, 2013, the federal government’s gross debt was expected to reach $17.5 trillion, or 107.4 percent of gross domestic product (GDP). As of February 14, 2014, the gross debt was $17.3 trillion. This is a staggering sum that is difficult for Americans to grasp. If we could grasp it, we would be truly frightened at the prospect of paying it off.
While entitlement spending is the primary driver of the federal debt, spending on social programs is not trivial. Obtaining exact data on how much the federal government spends on social programs is difficult. The Office of Management and Budget (OMB) started classifying spending by subfunction categories in 1962. The Education, Training, Employment, and Social Services function and the Housing Assistance, Food and Nutrition Assistance, Other Income Security, and Criminal Justice Assistance subfunctions are used to estimate the amount of taxpayer dollars spent on federal social programs. Within these categories is a range of social programs intended to change human behavior for the better. While this measure is imperfect, it is a practical estimate of social program spending. Social programs funded under these categories include Temporary Assistance for Needy Families (TANF), job-training and education programs, juvenile delinquency prevention programs, and many other programs. The figures presented include only federal outlays and ignore spending by state and local governments.
A growing population means more spending on social programs. To account for population growth, Chart 1 presents federal social program spending on a per capita basis in 2010 dollars. In 1962, social program spending was $125.67 per capita. In 2011, the figure reached $1,421.02 per capita, a 1,031 percent increase. Total U.S. population grew from 186,537,737 people in 1962 to 311,591,917 people in 2011, an annual growth rate of 1.1 percent. At that rate, the total population doubles in size roughly every 63 years. In contrast, the annual growth rate of 5.1 percent in social program spending per capita means that spending in this category doubles roughly every 14 years.
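The growth arithmetic in this paragraph can be checked with a short calculation. The sketch below is illustrative only: it assumes constant compound annual growth between 1962 and 2011, and the function names are hypothetical helpers rather than part of any published analysis. The doubling times are computed from the rounded rates quoted in the text (1.1 percent and 5.1 percent).

```python
import math


def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by a start value, an end value, and a span in years."""
    return (end / start) ** (1 / years) - 1


def doubling_time(rate):
    """Years needed for a quantity to double at a constant compound annual rate."""
    return math.log(2) / math.log(1 + rate)


YEARS = 2011 - 1962  # 49 years between the two data points

# Figures quoted in the text
pop_rate = annual_growth_rate(186_537_737, 311_591_917, YEARS)
spend_rate = annual_growth_rate(125.67, 1_421.02, YEARS)
pct_increase = (1_421.02 / 125.67 - 1) * 100

print(f"Population growth per year:          {pop_rate:.1%}")
print(f"Per capita spending growth per year: {spend_rate:.1%}")
print(f"Total per capita spending increase:  {pct_increase:,.0f} percent")

# Doubling times at the rounded rates used in the text
print(f"Doubling time at 1.1 percent: {doubling_time(0.011):.0f} years")
print(f"Doubling time at 5.1 percent: {doubling_time(0.051):.0f} years")
```

Run as written, the calculation should reproduce the 1.1 percent and 5.1 percent annual rates and the 1,031 percent increase, and it gives doubling times of about 63 years for population and about 14 years for per capita spending.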
Why does the federal government overspend so much? To arrive at an answer, we need to understand how greatly the federal government’s scope, power, and responsibilities have expanded. Starting with the Progressive Era, running through the New Deal, and ending with the Great Society, the original understanding of the government’s role in protecting our freedoms that was established during the American Founding was redefined with the call for a much more activist federal government. Each of these political waves sought to transform America into something very different from what the Founding Fathers envisioned.
Based on the Founders’ conception, once government secures our natural rights or formal freedom as expressed in the Declaration of Independence, the responsibility for obtaining our hopes, desires, and economic security—while not guaranteed—is up to us. The Progressives replaced the Founders’ notion of natural rights with a new positive or effective freedom that required government to assist individuals in achieving their full potential as human beings. Securing formal freedoms was not enough to allow individuals to be truly free. Individuals must be given the capacity and resources to achieve “effective” freedom. Instead of protecting our formal freedoms as understood by the Founders, Woodrow Wilson asserted that “the individual must be assured the best means, the best and fullest opportunities, for complete self-development.” Federal social programs must be created to help individuals acquire the necessary resources to be effectively free.
This fundamental transformation of the notion of freedom allowed Franklin D. Roosevelt to assert that “[e]very man has a right to life; and this means that he has also a right to make a comfortable living.” Instead of requiring that Americans not harm each other, we are, according to Roosevelt, obligated to ensure that every American has a comfortable living.
Unlike the New Dealers, the advocates of the Great Society sought not to provide relief and social insurance programs, but rather to socially engineer a better society. In May 1964, President Lyndon B. Johnson envisioned the Great Society as
a place where every child can find knowledge to enrich his mind and to enlarge his talents. It is a place where leisure is a welcome chance to build and reflect, not a feared cause of boredom and restlessness. It is a place where the city of man serves not only the needs of the body and the demands of commerce but the desire for beauty and the hunger for community.
Johnson expressed freedom in terms of reaching the highest ideals. From the Progressive Era through the New Deal to the Great Society, the role of the federal government was greatly expanded. Instead of securing natural rights based on the principles of the Founding, the mission of the federal government now revolves around the Great Society’s objective of providing effective freedom that alleviates material hardship and promotes the realization of individual fulfillment.
Democrats and Republicans have embraced the progressive cause of providing social programs to assist individuals in achieving their full potential as human beings. Further, support for federal social engineering programs crosses ideological lines. Despite the transformation of freedom coming from the Left, many on the Right have embraced federal social engineering.
Holding Social Programs Accountable
Given the fiscal crises that the federal government is facing, holding federal social programs accountable for their performance is necessary to regain control of excessive spending. Why should Congress routinely spend taxpayer dollars on social programs that do not work?
Operating with increasingly scarce resources, federal policymakers need to start denying funds to ineffective programs. Calls for more spending on social programs may seem morally compelling, but continuing to spend taxpayer dollars on programs that do not produce their intended results is morally indefensible.
Social programs should be carefully evaluated to determine whether they do in fact work. Determining whether these programs work requires reliably sorting out the effect of a social program from confounding factors, which is a difficult task. While large-scale experimental evaluations are the best method for assessing cause-and-effect relationships, there are additional methods to judge the merits of social programs. Most notable is the legal matter of whether or not the U.S. Constitution provides Congress the authority to create social programs in the first place. In addition, the worthiness of spending hundreds of billions of dollars on social programs should be judged in light of the fact that the federal government’s gross debt is $17.3 trillion.
Science Versus Anecdotal Observations. There are numerous methods available for making sense of the world around us. We frequently make personal observations of events around us to bring order to our lives. We often assign cause-and-effect relationships to events we personally experience. For instance, learning that touching a hot stove will burn one’s hand is an easy cause-and-effect association that does not need to be tested more than once. We can easily correlate the act of touching the stove with the pain felt. Firsthand experience is often instrumental to developing knowledge. Every day, we make personal observations that guide us in our activities. We often seek the advice of others based on their personal experiences.
However, the usefulness of personal observations or experiences can be undermined when assessing complex social interactions that can have multiple causes. This problem is particularly acute when assessing the effectiveness of social programs where multiple factors can cause the outcomes of interest. The best way to determine whether federal social programs do in fact work is to conduct large-scale, multisite experimental evaluations that attempt to isolate the direct effects of social programs from other factors that affect the outcomes of interest.
Impact evaluations often assess impacts by comparing treatment or intervention groups to control or comparison groups. Determining the impact of social programs requires comparing the conditions of those who received assistance with the conditions of an equivalent group that did not experience the intervention. However, evaluations differ by the quality of methodology used to separate the net impact of programs from other factors that may explain differences in outcomes between comparison and intervention groups.
Experimental evaluations are the “gold standard” of evaluation designs. Randomized experiments attempt to demonstrate causality by (1) holding all other possible causes of the outcome constant, (2) deliberately altering only the possible cause of interest, and (3) observing whether the outcome differs between the intervention and control groups. In reality, no evaluation method, including experimental designs, can establish with 100 percent certainty that all of the potential causes (confounding factors) were held constant.
When conducting an impact evaluation of a social program, identifying and controlling for all of the possible factors that influence the outcomes of interest is impossible. We simply do not have enough knowledge to accomplish this task. Even if we could identify all possible causal factors, collecting complete and reliable data on all of these factors would likely still be beyond our abilities. For example, it is impossible to isolate a person participating in a social program from his family in order to “remove” the influences of family.
This is where the benefits of random assignment become clear. Because we do not know enough about all possible causal factors to identify and hold them constant, randomly assigning test subjects to intervention and control groups allows us to have a high degree of confidence that these unidentified factors will not confound our estimate of the intervention’s impact. Random assignments should evenly distribute these unidentified factors between the intervention and control groups of an experimental evaluation.
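The logic of random assignment can be illustrated with a small simulation. The sketch below is a toy model under stated assumptions, not a reconstruction of any evaluation discussed in this paper: the "hidden" factor stands in for an unidentified influence such as family support, the true program effect of 2.0 and all other numbers are invented, and the function names are hypothetical. It compares random assignment with self-selection, in which people with more of the hidden factor are more likely to enroll.

```python
import random
import statistics


def difference_in_means(values, in_program):
    """Mean of `values` for participants minus the mean for non-participants."""
    treated = [v for v, t in zip(values, in_program) if t]
    control = [v for v, t in zip(values, in_program) if not t]
    return statistics.fmean(treated) - statistics.fmean(control)


def simulate(n=20_000, true_effect=2.0, seed=0):
    """Compare random assignment with self-selection in a made-up population."""
    rng = random.Random(seed)

    # Hypothetical unidentified factor that the evaluator never observes
    hidden = [rng.gauss(0, 1) for _ in range(n)]

    # Random assignment: a coin flip decides who receives the program
    random_assign = [rng.random() < 0.5 for _ in range(n)]
    # Self-selection: people with more of the hidden factor enroll more often
    self_select = [h + rng.gauss(0, 1) > 0 for h in hidden]

    results = {}
    for label, in_program in (("random", random_assign),
                              ("self-selected", self_select)):
        # The outcome depends on the hidden factor, the program, and noise
        outcome = [
            10 + 3 * h + (true_effect if t else 0.0) + rng.gauss(0, 1)
            for h, t in zip(hidden, in_program)
        ]
        results[label] = {
            "hidden_gap": difference_in_means(hidden, in_program),
            "estimated_effect": difference_in_means(outcome, in_program),
        }
    return results


results = simulate()
for label, r in results.items():
    print(f"{label:>13}: hidden-factor gap = {r['hidden_gap']:+.2f}, "
          f"estimated effect = {r['estimated_effect']:+.2f}")
```

Under random assignment the hidden factor is nearly balanced between the two groups, so the simple difference in average outcomes lands close to the assumed true effect. Under self-selection the groups differ on the hidden factor, and the same comparison overstates the program's impact.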
Standards for Assessing the Effectiveness of Federal Social Programs
Congress can take several steps to ensure that federal social programs are properly assessed using experimental evaluations. These experimental evaluations should be large in scale and based on multiple sites to avoid the problems of simplistic generalizations. Given the multitude of confounding factors that may influence the performance of social programs, the larger the size of the evaluation, the more likely that the federal social program will be assessed under all of the conditions under which it operates.
When Congress creates social programs, especially state and local grant programs, the funded activities are implemented in multiple cities or towns. While individual social programs operating in a single location and funded by the federal government may undergo experimental evaluations, these small-scale, single-site evaluations do not inform policymakers of the general effectiveness of national social programs. Small-scale evaluations assess only the impact on a small fraction of the people served by federal social programs. The success of a single program that serves a particular jurisdiction or population does not necessarily mean that the same program will achieve similar success in other jurisdictions or among different populations. Put simply, small-scale evaluations are poor substitutes for large-scale evaluations.
Thus, federal social programs should be evaluated in multiple sites so that social programs can be tested in the various conditions in which they operate and in the numerous types of populations that they serve. In addition, a multisite experimental evaluation that examines the performance of a particular program in numerous and diverse settings can potentially produce results that are more persuasive to policymakers than results from a single locality.
The case of police departments performing mandatory arrests in domestic violence incidents is a poignant example of why policymakers should exercise caution when generalizing findings from a single evaluation. During the 1980s, criminologists Lawrence W. Sherman and Richard A. Berk analyzed the impact of mandatory arrests for domestic violence incidents on future domestic violence incidents in Minneapolis, Minnesota. Compared to less severe police responses, the Minneapolis experiment found that mandatory arrests lead to significantly lower rates of domestic violence. Sherman and Berk urged caution, but police departments across the nation adopted the mandatory arrest policy based on the results of one evaluation conducted in one city.
However, what worked in Minneapolis did not always work in other locations. Experiments conducted by Sherman and others in Omaha, Nebraska; Milwaukee, Wisconsin; Charlotte, North Carolina; Colorado Springs, Colorado; and Dade County, Florida, found mixed results. Experiments in Omaha, Milwaukee, and Charlotte found that mandatory arrests lead to long-term increases in domestic violence. Apparently, knowing that they would automatically be arrested prompted repeat offenders to become more abusive. It seems the offender followed the sick logic that if he were going to be automatically arrested and spend the night in jail, he might as well beat his wife even more. In a subsequent analysis of the disparate findings, Sherman postulated that arrested individuals who lacked a stake in conformity within their communities were significantly more likely to engage in domestic violence after arrest, while arrested individuals who were married and employed were significantly less likely to commit further domestic violence infractions.
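The risk of generalizing from one site can also be sketched in code. The example below is purely hypothetical: the site names and true effects are invented for illustration and are not results from the domestic violence experiments or from any other study. It assumes equally sized sites, each running its own randomized experiment, so a simple average of site estimates serves as the multisite figure.

```python
import random
import statistics

# Hypothetical true program effects by site (made-up values, not study results)
TRUE_SITE_EFFECTS = {
    "site_a": 1.5,   # the one site where the program works well
    "site_b": 0.0,
    "site_c": -0.5,
    "site_d": 0.0,
    "site_e": -1.0,
    "site_f": 0.0,
}


def run_site_experiment(true_effect, n, rng):
    """Randomized experiment at one site; returns the estimated effect (difference in means)."""
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(0, 1)           # outcome without the program
        if rng.random() < 0.5:               # coin-flip assignment
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    return statistics.fmean(treated) - statistics.fmean(control)


def evaluate_all_sites(n_per_site=5_000, seed=1):
    """Run the same experiment at every site and collect the estimates."""
    rng = random.Random(seed)
    return {
        site: run_site_experiment(effect, n_per_site, rng)
        for site, effect in TRUE_SITE_EFFECTS.items()
    }


estimates = evaluate_all_sites()
single_site = estimates["site_a"]
multisite_average = statistics.fmean(list(estimates.values()))

for site, estimate in estimates.items():
    print(f"{site}: estimated effect = {estimate:+.2f}")
print(f"Single-site conclusion (site_a only): {single_site:+.2f}")
print(f"Multisite average across all sites:   {multisite_average:+.2f}")
```

In this toy setup, a policymaker who saw only the first site would conclude that the program works, while the multisite average is close to zero and some sites show harm. That is the pattern the text describes for mandatory arrest, reproduced here only as an illustration of the logic.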
Large-Scale Multisite Experimental Evaluations
Despite the trillions of dollars that Congress has spent on federal social programs, only a few have undergone large-scale experimental impact evaluations. We have done our best to include all of the relevant multisite experimental evaluations of federal social programs that have been published since 1990. These 20 evaluations assessed the impact of 21 federal social programs:
- Early Head Start;
- Enhanced Early Head Start with Employment Services;
- Head Start;
- Even Start Family Literacy Program;
- 21st Century Community Learning Centers;
- Abstinence Education;
- Upward Bound;
- Food Stamp (renamed Supplemental Nutrition Assistance Program, or SNAP) Employment and Training Program;
- Employment Retention and Advancement (ERA) Project;
- Building Strong Families (BSF);
- Supporting Healthy Marriage;
- Moving to Opportunity;
- Section 8 Housing Vouchers;
- Job Training Partnership Act (JTPA) programs;
- Unemployment Insurance Self-Employment Demonstrations;
- Project GATE (Growing America Through Entrepreneurship);
- Job Corps;
- Center for Employment Training (CET) Replication; and
- Quantum Opportunity Program Demonstration.
Federal Social Programs for Children and Families
This section covers federal social programs that are intended to benefit a wide range of clients—from infants and toddlers to adult heads of household. Federal programs target a host of social problems, including low academic skills, poverty, personal relations, hard-to-employ workers, and low wages.
Young Children. Commenting on the lack of demonstrated effectiveness of federal social programs over the past 20 years, Isabel V. Sawhill of the Brookings Institution and Jon Baron of the Coalition for Evidence-Based Policy wrote, “Only one program—Early Head Start (a sister program to Head Start, for younger children)—was found to produce meaningful, though modest, positive effects.” While the short-term results of the evaluation were publicly available at the time that Sawhill and Baron made this statement, a subsequent long-term follow-up study released in 2012 found that the short-term benefits of Early Head Start quickly faded away—not an uncommon finding for social programs.
Early Head Start, created during the 1990s, is a federally funded community-based program that serves low-income families with pregnant women, infants, and toddlers up to age three. While the short-term findings indicated modest positive impacts, almost all of the positive findings for Early Head Start participants overall were driven by the positive findings for black participants. The program had little to no effect on white and Hispanic participants, who make up the majority of program participants.
For the long-term findings, the overall initial effects of Early Head Start at age three clearly faded away by the fifth grade. For the 11 child-social-emotional outcomes, none of the results were found to have statistically meaningful impacts. Further, Early Head Start failed to have statistically measurable effects on the 10 measures of child academic outcomes, including reading, vocabulary, and math skills.
Despite Head Start’s long life, the program never underwent a thorough, scientifically rigorous evaluation of its effectiveness until Congress mandated an evaluation in 1998. The Head Start Impact Study began in 2002, and the immediate-term, short-term, and long-term results released in 2005, 2010, and 2012, respectively, are disappointing. Overall, the evaluation found that the program largely failed to improve the cognitive, social-emotional, health, and parenting outcomes of children who participated compared with the outcomes of similar children who did not participate. According to the report, “[T]he benefits of access to Head Start at age four are largely absent by 1st grade for the program population as a whole.” Alarmingly, Head Start actually had a harmful effect on three-year-old participants once they entered kindergarten. Teachers reported that non-participating children were more prepared in math skills than the children who participated in Head Start.
The third-grade follow-up to the Head Start Impact Study followed students’ performance through the end of third grade. The results shed further light on the ineffectiveness of Head Start. By third grade, Head Start had little to no effect on cognitive, social-emotional, health, or parenting outcomes of participating children.
In addition to these failures, evaluations of the Enhanced Early Head Start with Employment Services, which provides early childhood care and employment training services to families, and the Even Start Family Literacy Program, which is intended to meet the basic educational needs of parents and children, failed to produce statistically meaningful impacts.
School-Age Children. Federal social programs targeting school-age children do not fare any better. An evaluation of the 21st Century Community Learning Centers, an after-school program intended to improve the academic performance of students, found that the program actually decreased the academic performance and increased the behavioral problems of participating students.
Created in 1965, Upward Bound is an original War on Poverty social program intended to help economically disadvantaged students successfully complete high school and attend college. However, an evaluation found that, in general, the program has almost no effect on these goals. Other school-age social programs have been found to be ineffective too.
Families. Of all the social programs reviewed, welfare-to-work strategies have had the most positive results. In 1989, the federal government funded the National Evaluation of Welfare-to-Work Strategies (NEWWS), which assessed the long-term effects of 11 mandatory welfare-to-work programs operating in seven sites during the late 1980s and 1990s. According to Lawrence M. Mead, a professor of politics at New York University,
NEWWS aimed mainly to determine whether programs that emphasized training or work first were more effective and whether work tests for mothers would harm children. A clear verdict emerged that work first was best and that children were little affected. The congressional drafters of PRWORA [Personal Responsibility and Work Opportunity Reconciliation Act] in 1996 had already assumed that a radical, work-first reform was best, but the NEWWS results ratified that judgment.
Overall, NEWWS found that welfare-to-work strategies were effective at increasing the earnings of participants. In particular, strategies that focused on quick entry into the labor force moved welfare recipients into jobs more quickly and were more effective than strategies that focused on job training.
While the 1989 welfare-to-work evaluation showed consistent positive impacts on earnings, welfare reform experts argue that the spread of work-focused strategies encouraged many individuals to find employment in the first place rather than seeking welfare assistance. This effect should surely be considered a benefit of welfare reform, yet it was not part of the NEWWS evaluation. The NEWWS evaluation, like similar randomized experiments, used a control group whose members participated in some form of welfare assistance. As a result, these experimental evaluations cannot assess the potential benefit of encouraging individuals to obtain employment instead of seeking welfare assistance, and they are an inadequate method for estimating changes in welfare caseloads.
Despite the success of welfare-to-work strategies, other social programs focusing on families that have undergone multisite experimental evaluations have been found to be ineffective. For example:
- The Food Stamp (renamed Supplemental Nutrition Assistance Program) Employment and Training Program failed to affect earnings and employment outcomes;
- The Employment Retention and Advancement Project, a program designed to provide additional employment and training services to Temporary Assistance for Needy Families participants and others with employment difficulties, was largely ineffective in improving earnings and employment outcomes; and
- The evaluation of Moving to Opportunity/Section 8 housing vouchers found that housing subsidies intended to improve the lives of parents and children consistently failed to produce statistically meaningful results.
Federal Social Programs for Workers
This section covers federal social programs that are intended to boost the job skills and employability of workers. The federal government has spent decades trying to improve the earnings of low-income individuals through various employment and training programs, but the Government Accountability Office has concluded that there is little evidence to show that these programs are effective.
Adults and Youth. Conducted in 16 sites across the nation during the late 1980s and early 1990s, the Job Training Partnership Act evaluation tracked program effects for more than 20,000 adult men, adult women, and out-of-school youths over the course of 30 months. Overall, the performance of JTPA programs is widely considered to be a failure. While adult females had several positive outcomes, the results were generally not large enough to be considered very meaningful. Further, JTPA programs were largely ineffective in raising the incomes of adult males or male and female youths.
Adults. To assess the effectiveness of self-employment training programs for those receiving Unemployment Insurance benefits, the federal government sponsored two experimental impact evaluations of programs in Washington State and Massachusetts. Both programs had some very small and fleeting effects on increasing self-employment, but they were largely ineffective at raising the earnings of participants.
To help Americans start new businesses, the federal government established an employment program to assist people in creating or expanding their own business enterprises. Begun in 2003, Project GATE operated in Pennsylvania, Minnesota, and Maine. Overall, Project GATE appears to have had an initial impact on business ownership and self-employment, but that impact quickly faded away. Most important, Project GATE failed to increase the self-employment earnings of participants while temporarily reducing their total earnings.
Youth. Three multisite experimental evaluations of employment and training programs that specifically target youth strongly suggest that these programs are poorly serving this population.
Created in 1964, the Job Corps is a residential job-training program that serves disadvantaged youth in 125 sites across the nation. An experimental evaluation found the program to be ineffective. Over the course of the 48-month study, Job Corps participants actually worked less than the control group. The study also revealed that the Job Corps had little impact on increasing the number of hours worked per week. If the Job Corps is effective in improving the skills of its participants, then it should have substantially raised the hourly wages they received. Initially, Job Corps participants earned an average of $0.24 more per hour than non-participants, but this difference decreased to $0.22 per hour after one year.
The JOBSTART Demonstration evaluated the impact of 13 job-training programs that were offered by community-based organizations, schools, and the Job Corps. Overall, the programs failed to increase the earnings of participants. Of the 13 sites, 12 were found to be ineffective at raising the incomes of participants. However, one site—the Center for Employment Training (CET) in San Jose, California—had a positive impact on earnings. Thus, at a single site, the CET appears to have been effective at raising the incomes of participants. For policymakers, the important question is whether these results can be replicated at different sites and for different populations.
Based on the JOBSTART evaluation results for the CET, the U.S. Department of Labor sought to replicate the program at 16 other sites. However, only 12 of these sites were evaluated. The CET model had little to no effect on short-term and long-term employment and earnings outcomes at these other locations. The multisite experimental evaluation of the CET, according to its authors, “shows, that even in sites that best implemented the model, CET had no overall employment and earnings effects for youth in the program, even though it increased participants’ hours of training and receipt of credentials.”
Similarly, the Quantum Opportunity Program demonstration, which offered intensive and comprehensive services with the intention of helping at-risk youth graduate from high school and enroll in postsecondary education or training, failed to increase the employability or earnings of youth.
Do federal social programs work? Based on the scientifically rigorous multisite experimental evaluations published since 1990, the answer certainly cannot be in the affirmative. Despite the best social engineering efforts, overwhelming evidence points to the conclusion that federal social programs are ineffective.
Ameliorating such problems as low academic achievement, poor cognitive ability, poverty, joblessness, low wages, and personal relations appears to be out of reach for federal social programs. The most notable exception is welfare-to-work programs, which increased earnings, but participants still received some government assistance.
The evidence clearly shows that federal social programs are ineffective. It cannot be just a coincidence that the many multisite evaluations published since 1990 overwhelmingly find that this is true. Our nation faces a severe debt crisis that threatens our very future. Americans should not fear eliminating social programs. Now is the time for deep budget cuts in federal social programs.
The social programs that Congress continues to fund need to undergo large-scale experimental evaluations. Multisite experimental evaluations are the best method for assessing the effectiveness of federal social programs. Yet to date, this method has been used to evaluate only a handful of federal social programs. Congress needs to reverse this trend.
David B. Muhlhausen, PhD, is Research Fellow in Empirical Policy Analysis in the Center for Data Analysis at The Heritage Foundation.