Given that the federal government’s debt is over $19.4 trillion ($14.0 trillion in debt held by the public and nearly $5.4 trillion in intragovernmental holdings), every American should be concerned about the nation’s extraordinary level of debt. Congress, which in recent years has seemed incapable of curbing spending and allocating resources effectively, needs to relearn how to be a wise steward of the federal purse. Through leadership, the next President can help restore fiscal discipline in the federal government. Such leadership does not mean merely releasing statements about funding the programs that work and cavalierly demanding results. It does not mean calling for the creation of new “evidence-based” programs, while leaving the vast majority of current federal programs untouched. Real leadership requires articulation of a clear and persuasive message that is backed by concrete actions that instill a culture of fiscal discipline in the nation’s capital.
At times, the Office of Management and Budget (OMB) has been labeled the most powerful naysayer in government. The OMB has not always lived up to this stingy reputation and its influence has fluctuated over the years. In addition to formulating the President’s budget recommendation to Congress, the OMB “operates as a clearinghouse for legislative proposals that departments and agencies wish to see introduced into and passed by the Congress. Such initiatives must receive OMB’s approval as conforming with presidential policy guidelines.” The OMB is expected to provide the President with objective information and analysis, while White House staff may be less willing to deliver bad news. Presidents need to hear the complete case before making a decision. The OMB has the advantage of longer-term institutional memory than White House staff.
During the George W. Bush Administration, the OMB created the Program Assessment Rating Tool (PART) to help inform budget decisions by holding federal government programs accountable. Debuting in President Bush’s fiscal year (FY) 2004 budget recommendation, PART was an attempt to assess every federal program’s purpose, management, and results to determine its overall effectiveness. The extremely ambitious PART was a first-of-its-kind attempt to link federal budgetary decisions to performance. Such accountability had never been attempted by a President. PART placed “unprecedented focus and sustained pressure on executive agencies to improve performance.” Unfortunately, President Barack Obama terminated the original PART.
Instituting an improved PART (PART 2.0) will help the next President pressure Congress to eliminate wasteful and ineffective programs, no matter how politically popular they may be, and to make remaining federal programs operate as efficiently as possible to save taxpayer money.
Government’s Lack of Fiscal Discipline Threatens America’s Future
America’s debt is out of control, and Congress and recent Presidents have done little to decrease spending and reduce the debt. The current fiscal path will debilitate the economy, substantially weaken prosperity, and lead to massive tax burdens for future generations.
The United States has four basic options to prevent out-of-control debt from devastating the economy. The first is to raise taxes. The second is to cut spending. The third is to print more money to pay down the debt, while increasing inflation. The last is to default. While the “correct” option is often based on one’s ideology, there is a body of empirical research that indicates that the best option is to cut spending.
Several studies strongly suggest that cutting spending and reducing debt—instead of increasing taxes and spending—can help to boost the economy. This body of literature suggests a clear path for America: cutting spending to boost the economy and reduce debt. A good place to start is with the elimination of funding for ineffective programs.
Reducing spending and debt is an ambitious agenda. However, ambition must be matched with persistence and momentum. One obvious tool missing from the budget-cutters’ toolbox is strongly linking evidence-based policymaking to budgetary decisions. When practiced correctly, evidence-based policymaking is a tool that allows policymakers, especially at the OMB, to base funding decisions on scientifically rigorous impact evaluations of programs. Given scarce federal resources, federal policymakers should fund only those programs that have been proven to work and defund programs that do not work.
In the free market, businesses that do not produce profits either innovate to become successful, or they go out of business. In the government sector, there is no such profit-loss mechanism. In essence, an evidence-based policymaking agenda that is strongly linked to performance budgeting will bring something similar to the accountability seen in the free market to the federal government. Government programs that fail to produce verifiable results should lose funding, while truly effective programs should retain their budget.
The Appalling Lack of Accountability. The effectiveness of federal programs is often unknown. Many programs operate for decades without undergoing thorough scientific evaluations. The federal government needs to prioritize government functions by intelligently targeting resources. Federal bureaucrats should be expected to make a credible case that the programs they manage deliver evidence-based results. Objective, reliable evidence of program effectiveness or ineffectiveness should encourage Congress to be a wiser steward of the federal purse.
The potential of performance budgeting and management is degraded when agencies turn the system into “make work” and compliance exercises. In order for performance management to lead to a leaner, more effective government, performance needs to be strongly linked to budget decisions. Without a serious commitment from the executive and legislative branches to funding programs that work and defunding programs that do not work, performance management will never live up to its potential. Encouraging policymakers to utilize performance information in budget decision making is the key task at hand.
Learning from Experience. Performance management and budgeting help policymakers learn from experience. By systematically analyzing what works and what does not, and then employing what is learned, government resources can be allocated more effectively. The federal government needs to develop the capacity not only to assess its successes and failures honestly, but also to translate this information directly into budget decisions. Even the most scientific evaluation is ultimately meaningless if its results are not incorporated into budgets.
As a management tool, performance-monitoring systems monitor the implementation (not the effectiveness) of programs. Monitoring systems that rely on “outputs” and “outcomes” without a clear counterfactual too often fail to provide reliable evidence of effectiveness or lack thereof. This “attribution problem” can be solved by the use of large-scale experimental (random assignment) evaluations.
The performance reports released by Cabinet-level departments of the federal government are a case in point. The FY 2015 performance report by the U.S. Department of Labor, for example, relies on outputs and outcomes to assess performance, while ignoring multi-site experimental evaluations that have found its programs to be ineffective. As for performance measures of job-training programs, the Department of Labor relies on outputs, such as before-and-after participation changes in employment and earnings, that have no reliable counterfactual to estimate effectiveness accurately.
As the example of federal job-training programs strongly suggests, performance monitoring has serious limitations in assessing effectiveness. While the U.S. Department of Labor’s job-training performance-monitoring system collects some useful information, “it suffers from shortcomings,” according to Diane Blank of the Government Accountability Office (GAO) and her coauthors, “that may limit its usefulness in understanding the full reach of the system and may lead to disincentives to serve those who may most need services.”
First, performance monitoring does not measure program “impact.” Instead, it measures outcome or output. Program impact is assessed by comparing outcomes for program participants with estimates of what the outcomes would have been had the participants not partaken in the program. Without a valid comparison, performance monitoring based on “output” or “outcome” cannot provide valid estimates of program effectiveness.
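The distinction between an outcome and an impact can be illustrated with a minimal sketch. The earnings figures below are hypothetical, invented purely for illustration, and the function names are assumptions rather than any agency's actual method. The sketch compares a before-and-after change among participants with a comparison against a randomly assigned control group.

```python
from statistics import mean


def before_after_change(before, after):
    """Outcome-style measure: average change among participants only."""
    return mean(after) - mean(before)


def experimental_impact(treatment_outcomes, control_outcomes):
    """Impact estimate: participants' mean outcome minus the control group's mean."""
    return mean(treatment_outcomes) - mean(control_outcomes)


# Hypothetical quarterly earnings in dollars (not real program data).
participants_before = [2000, 2500, 1800, 2200]
participants_after = [3000, 3400, 2900, 3100]
control_after = [2900, 3300, 2800, 3000]  # similar people, randomly assigned to no services

change = before_after_change(participants_before, participants_after)
impact = experimental_impact(participants_after, control_after)

print(f"Before-and-after change: ${change}")
print(f"Estimated impact versus control group: ${impact}")
```

In this made-up example, most of the before-and-after gain would have occurred without the program, which is exactly what a measure lacking a counterfactual cannot reveal.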
Second, the effect of cream skimming can make the results of the performance-monitoring system overstate the effectiveness of programs. Through gaming the system, administrators can engage in strategic decision making by selectively including certain performance data that misrepresent program effectiveness. Professor Burt Barnow at George Washington University and Professor Jeffrey Smith at the University of Michigan found that local job-training administrators engaged in strategic behavior by manipulating whether participants were formally enrolled and thus recorded in the performance monitoring system. Under the Department of Labor’s performance-monitoring system, only individuals officially enrolled in job-training programs were counted toward performance standards. For instance, some local administrators increased reported performance by only including participants in the monitoring system if those individuals gained employment, thus counting them as successes. Alternatively, those job-training participants who never obtained employment were not officially counted in the performance-monitoring system. Thus, these failures were never recorded as part of the program’s performance.
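A small numerical sketch shows how selective enrollment of this kind can inflate reported performance. The records below are hypothetical, and the function name is an assumption made for illustration only.

```python
def employment_rate(records):
    """Share of records marked as employed (1 = employed, 0 = not employed)."""
    return sum(records) / len(records)


# Hypothetical employment results for everyone who actually received services.
all_served = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]

# Strategic behavior: formally enroll only those who found jobs.
formally_enrolled = [result for result in all_served if result == 1]

true_rate = employment_rate(all_served)
reported_rate = employment_rate(formally_enrolled)

print(f"Employment rate among all served: {true_rate:.0%}")
print(f"Employment rate reported to the monitoring system: {reported_rate:.0%}")
```

Because the failures never enter the monitoring system, the reported rate says nothing about how the program performed for the people it served.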
Another case is the Department of Labor’s performance measures for the Reintegration of Ex-Offenders (RExO) program. The Department of Labor uses the percentage of re-entry program participants who are employed a year after program exit and their recidivism rate. As a measure of success, the department notes that six grantees have managed to each place more than 100 enrollees into jobs as an indication of “What Worked.” Further, the department reports that the recidivism rate of participants for FY 2014—12.33 percent—as reported by grantees, is a success because it is lower than the target of 22 percent. While this outcome is counted as a success, the department notes that “[t]here is a problem with the recidivism rates grantees have been reporting as Social Policy Research has found much higher recidivism rates of past enrollees based on state criminal records data.” This clearly indicates that the performance data reported by grantees is unreliable.
The Department of Labor’s report also fails to mention the large-scale experimental evaluation of the RExO program. According to that evaluation, RExO is ineffective. The services provided by the RExO grantees had a small effect on the employment and earnings of participants. One year after random assignment, participants in the re-entry programs were slightly more likely to be employed, but they were no more likely to be employed during the following year, compared to similar former prisoners not receiving services. Over two years, the program had no effect on the average number of days worked. In fact, on average, the program participants earned only $883 more in income than the control group over the two-year period.
Did RExO reduce recidivism? Based on administrative data, the services provided by the RExO grantees failed to improve the recidivism, convictions, and re-incarceration rates of participants. The evaluation’s authors conclude that the criminal justice results based on administrative data provide “no evidence whatsoever of any impacts of RExO.” Yet, this scientifically rigorous evaluation is omitted from the Labor Department’s performance report.
Due to the limitations of performance monitoring, budget decisions, wherever possible, need to be based on large-scale experimental evaluations.
Anemic Performance. While the supply of performance information has increased, there is less evidence of use of performance information by government in decision making. Previous reform efforts, such as Planning-Programming-Budgeting Systems, Management by Objectives, and Zero-Based Budgeting, have failed because performance budgeting systems did not account for the political process of Congress.
Low use of performance information in budgeting decisions occurs not only in America. A review of the relevant research concluded that legislatures in Organization for Economic Co-operation and Development countries frequently failed to use performance information meaningfully in budgetary decisions.
The political process that ultimately decides budgetary decisions is heavily influenced by ideological biases, special interest groups, and protective bureaucracies. These factors are all too often in conflict with performance budgeting and evidence-based policymaking.
Third parties, like special interests, are often dependent on continued funding, even if the programs are ineffective. Legislators, too, want to dole out taxpayer dollars with little regard to credible evidence that such funding will work. Thus, they have strong incentives to confuse the public about the effectiveness of programs.
For example, Yasmina Vinci, the executive director of the National Head Start Association—an organization that represents Head Start grantees in Washington, DC—spun the dismal effects of the large-scale experimental Head Start Impact Study to appear as if the program had a much more substantial impact than found by the evaluation. According to Vinci, “the study documented children’s significant gains at the end of the Head Start experience and the flattening benefits of Head Start attendance at the end of third grade.” This assessment is entirely wrong. Almost all of the benefits of participating in Head Start disappeared by the time students were re-assessed in kindergarten. A “flattening” of benefits would suggest that beneficial impacts were retained through the third grade.
Overall, the Head Start Impact Study found that the program largely failed to improve the cognitive, socio-emotional, health, and parenting outcomes of children in kindergarten and first grade who participated, compared with the outcomes of similar children who did not participate. According to the report, “[T]he benefits of access to Head Start at age four are largely absent by 1st grade for the program population as a whole.” The few beneficial effects of the program disappeared in kindergarten. The third-grade follow-up to the Head Start Impact Study followed students’ performance through the end of third grade. The results shed further light on the ineffectiveness of Head Start. By third grade, Head Start had little to no effect on cognitive, social-emotional, health, or parenting outcomes of participating children.
While Members of Congress have passed legislation that requires many programs to be evidence-based, Congress continually funds programs, such as Head Start and Early Head Start, which are known to be ineffective based on the results of large-scale experimental impact evaluations. This dilemma arises because intentions and symbolism are sometimes more important to Congress than the performance of the programs it funds.
Claiming to hold government programs accountable for their performance is popular among politicians of all stripes. According to Professor Donald P. Moynihan of the University of Wisconsin-Madison, “Performance management is attractive because it communicates to the public that elected officials share their frustration with inefficient bureaucracies and are holding them accountable, saving taxpayer money and fostering better performance.” Further, “Performance management reforms rest on the assumption that once performance information is made available, it will be widely used and result in better decisions because it will foster consensus and make decision making more objective. But the limited evidence of use does not match that model.” The use of performance information, especially claims of supporting evidence-based policymaking, is often merely symbolic. If politicians do not base their funding decisions on rigorous evidence, they are only making symbolic gestures.
In many cases, adequately defining which outcomes should constitute success is difficult. However, this difficulty cannot be used as an excuse for failing to assess performance adequately. The solution to the problems of performance management is to strongly incorporate evidence-based policymaking into budget decisions.
Evidence-Based Policymaking. Unless rigorous evaluation results are strongly linked to budget decisions, all the proclamations about the benefits of evidence-based policymaking are meaningless. Promising to create new evidence-based paradigms, such as the Social Innovation Fund for funding additional programs, while continuing to fund (and in some cases expand) federal programs amply demonstrated to be ineffective is fiscally irresponsible.
The term “evidence-based” should mean that experimental evaluations of a program model have found consistent statistically significant effects that meaningfully ameliorate a targeted social problem in at least three different settings. Once a program model has been found to produce meaningful results in multiple settings, the likelihood of its successful replication elsewhere should increase greatly.
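The definition above can be expressed as a simple decision rule. The sketch below is illustrative only: the `Evaluation` fields, the `is_evidence_based` function, and the example sites are all hypothetical, and judging whether an effect is "meaningful" would in practice require substantive expertise rather than a true/false flag.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    setting: str        # site or location where the program model was evaluated
    experimental: bool  # True if the evaluation used random assignment
    significant: bool   # True if effects were consistently statistically significant
    meaningful: bool    # True if effects meaningfully ameliorated the targeted problem


def is_evidence_based(evaluations, min_settings=3):
    """Apply the proposed rule: qualifying results in at least `min_settings` distinct settings."""
    qualifying_settings = {
        e.setting
        for e in evaluations
        if e.experimental and e.significant and e.meaningful
    }
    return len(qualifying_settings) >= min_settings


# Hypothetical evaluations of a single program model.
single_site = [Evaluation("Site A", True, True, True)]
three_sites = single_site + [
    Evaluation("Site B", True, True, True),
    Evaluation("Site C", True, True, True),
    Evaluation("Site D", False, True, True),  # non-experimental, so it does not count
]

print(is_evidence_based(single_site))
print(is_evidence_based(three_sites))
```

Under this rule, a model supported by only one successful evaluation would not qualify, however strong that single result appears.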
Can Government Replicate Success? In practice, policymakers frequently assume that when something has been found effective in one setting, the same results will be repeated elsewhere. However, the history of social programs is replete with examples of programs effective in one location that simply failed to work elsewhere.
The federal government has a poor record of replicating effective social programs. An excellent example of a federal attempt to replicate an effective local program is the Center for Employment Training (CET) replication. Of 13 youth job-training programs evaluated, the JOBSTART demonstration found only one program to have a positive impact on earnings: the CET in San Jose, California. Based on the results for the CET, the U.S. Department of Labor replicated and evaluated the impact of CET in 12 other sites using random assignment. The CET model had little to no effect on short-term and long-term employment and earnings outcomes at these other locations. According to the evaluation’s authors, “[E]ven in sites that best implemented the model, CET had no overall employment and earnings effects for youth in the program, even though it increased participants’ hours of training and receipt of credentials.”
A more recent example is the Obama Administration’s funding of Teen Pregnancy Prevention (TPP) grants. The Department of Health and Human Services (HHS) “invests in the implementation of evidence-based TPP programs, and provides funding to develop and evaluate new and innovative approaches to prevent teen pregnancy.” In June 2016, Ron Haskins, a research fellow at the Brookings Institution and co-chair of the Evidence-Based Policymaking Commission, testified before Congress that HHS requires “high-quality evidence showing that the programs produced significant impacts on important measures of teen sexual activity or teen pregnancy for the TPP program.”
According to HHS, Tier 1 grants are awarded to grantees replicating programs that “have been shown, in at least one program evaluation, to have a positive impact on preventing teen pregnancies, sexually transmitted infections, or sexual risk behaviors.” Does this definition include methodologically weak evaluations that are likely to overstate the effectiveness of programs? The belief is that these grants will be effective because they are replicating programs labeled “evidence-based.” Is this assumption correct?
Each of the Tier 1 grantees is supposed to evaluate the impact of the evidence-based model they are replicating. So far in 2016, HHS has released five final reports based on experimental evaluations of these grant programs. All five evaluations of Tier 1 TPP grant-funded programs failed to affect all sexual outcome measures. Clearly, replicating an evidence-based program model does not guarantee similar results.
The other set of TPP grants (called Tier 2) funds demonstration programs that do not meet HHS’s evidence-based definition, but are considered by HHS to be innovative programs worthy of funding. To date, HHS has released five final reports based on experimental evaluations of Tier 2 grant programs. All five evaluations overwhelmingly find that these programs fail to affect the sexual outcome measures.
That an evidence-based program appears to have worked in one location does not mean that the program can be effectively implemented on a larger scale or in a different location. Proponents of evidence-based policymaking should not automatically assume that pumping taxpayer dollars toward programs attempting to replicate previously successful findings will yield the same results.
The faulty reasoning that drives such failed expansions of social programs is known as the “single-instance fallacy.” This fallacy occurs when a person believes that a small-scale social program that works in one instance will yield the same results when replicated elsewhere. Compounding the effects of this fallacy, one often does not truly know why a certain program worked in the first place. In particular, the dedication and entrepreneurial enthusiasm of a program’s founder are difficult to quantify or duplicate. HHS’s practice of labeling a program model “evidence-based” on the strength of a single evaluation is therefore faulty.
Benefits of Fiscally Disciplined Evidence-Based Policymaking. Evidence-based policymaking that is focused on fiscal discipline has several benefits. First, judging the performance of programs based on rigorous evidence leads to improved allocative efficiency. When programs that fail to produce results receive reduced funding or are terminated altogether, and programs that produce credible and meaningful results continue to receive funding, a better allocation of scarce resources is the result.
Second, a fiscally disciplined evidence-based policymaking process helps hold federal programs accountable to the public. For external accountability by the public to work, information on the performance of programs must be released on a timely basis and made widely available to the public. These requirements mean that the federal government will no longer withhold or delay the release of evaluations that find programs to be ineffective.
For example, a cost-benefit analysis of Job Corps—a Great Society–era job-training program for disadvantaged youth—that found that program costs outweighed the benefits was finalized in 2003, but the Department of Labor withheld it from the public until 2006. The GAO has criticized the Department of Labor for its history of delaying the release of its research findings.
Similarly, HHS has noticeably delayed the release of reports based on the Head Start Impact Study that reported underwhelming results. There appears to be a pattern of withholding the results of experimental evaluations at HHS. There is reason to believe that the 2010 study of kindergarten and first-grade students was neither completed nor published in a timely fashion. According to the report, data collection for the kindergarten and first-grade evaluation was completed in 2006—nearly four years before its results were made public. For the national impact evaluation of third-grade students, data collection was conducted during the springs of 2007 and 2008. On December 21, 2012, the Friday before Christmas, HHS released the findings of the Third-Grade Head Start Impact Study without a press release to notify the public. HHS withheld this study for about four and a half years after the final data were collected.
Third, a fiscally disciplined evidence-based policymaking process helps elected officials hold bureaucrats accountable for the performance of programs. Such internal accountability assists the President’s ability to hold administrators accountable and aids Congress in practicing oversight.
While political factors, such as values and judgments on the proper role of the federal government, will always influence budget decisions, programs funded by Congress should produce their intended results. Programs that fail to produce their intended results should not be continually funded by Congress. This is where evidence-based policymaking should matter.
Evidence-based policymaking can play a role in improving the deliberative process in Congress and lead to a better-informed public about the role of public policy. While emotions and beliefs will always strongly influence political decisions, the degree to which these decisions are based on rigorous evidence may be the difference between creating public policies that fail or succeed. The question is whether policymakers in the executive and legislative branches can create an environment where rigorous evidence informs political decisions.
Empty Promises. All too often, promises of making funding contingent upon evidence are merely rhetoric without substance. To date, evidence-based policymaking has yet to be systematically linked to budget decisions. For the FY 2011 budget, the OMB announced that the Obama Administration would invest in program evaluations so that federal agencies would “have the capacity to use evidence to invest more in what works and less in what does not.” The Administration made an important distinction between performance monitoring and program evaluation: “Performance measurement is a critical tool managers use to improve performance, but often cannot conclusively answer questions about how outcomes would differ in the absence of a program or if a program had been administered in a different way. That is where program evaluations play a critical role.”
President Obama’s first Director of the OMB, Peter R. Orszag, argued that empirical evidence is the foundation of policymaking in the Obama Administration. Orszag asserted that the Obama Administration “has been clear that it places a very significant emphasis on making policy conclusions based on what the evidence suggests.”
To demonstrate how the Obama Administration is using empirical evidence to guide decision making, Orszag used the examples of Head Start and Early Head Start:
Head Start and Early Head Start also both have documented very strong suggestive evidence that they pay off over the medium and long term, both in terms of narrow indicators and broader social indicators for society as a whole. These evaluations demonstrated progress against important program goals and provided documentation necessary to justify increases in funding in the president’s budget to…further expand access, in the cases of Head Start and Early Head Start.
Of particular interest are Orszag’s comments on Head Start and Early Head Start. Orszag cites the 2010 Head Start Impact Study as evidence that the number of children participating in Head Start needs to be expanded. Unwittingly, Orszag also justifies the proposed termination of the Even Start Family Literacy Program based on the findings of its first-year follow-up because the program “has been evaluated rigorously three times” and “out of forty-one measurable outcomes, the program demonstrated no measured difference between those enrolled in the program and those not on thirty-eight of the outcomes.” Because the program was a failure, the Obama Administration decided, as the previous Administration had proposed, that Even Start should be terminated.
However, Orszag’s logic does not hold for Head Start and Early Head Start. While the first-year follow-up evaluation found Even Start to have no effect on 38 of 41 outcome measures, Head Start’s performance was even worse. Overall, the 2010 Head Start Impact Study that assessed findings for kindergarten and first grade found that Head Start failed to have an effect on 110 of 112 outcome measures for the four-year-old group, with one harmful and one beneficial impact. For the three-year-old group, Head Start failed to have an impact on 106 of 112 measures, with five beneficial impacts and one harmful impact.
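The comparison is easier to see as percentages. The short calculation below uses only the counts quoted above. The dictionary labels and the helper name are assumptions added for illustration.

```python
# (measures with no detected effect, total measures), as quoted in the text above
null_findings = {
    "Even Start, first-year follow-up": (38, 41),
    "Head Start, four-year-old group": (110, 112),
    "Head Start, three-year-old group": (106, 112),
}


def null_share(no_effect, total):
    """Percentage of outcome measures on which no effect was found."""
    return 100 * no_effect / total


shares = {
    name: round(null_share(no_effect, total), 1)
    for name, (no_effect, total) in null_findings.items()
}

for name, pct in shares.items():
    print(f"{name}: no effect on {pct}% of measures")
```

By this measure, both Head Start cohorts show a higher share of null findings than the Even Start evaluation that was used to justify that program's termination.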
As for Early Head Start, the initial benefits produced by the program are limited to a minority of participants, and these benefits quickly fade. Early Head Start, created during the 1990s, is a federally funded community-based program that serves low-income families with pregnant women, infants, and toddlers up to age three. The results of the multisite experimental evaluation of Early Head Start are particularly important because the program was inspired by the findings of the Abecedarian Project, an early-childhood education program that is assumed by many to be effective. By the time participants reached age three, Early Head Start had beneficial impacts on two of six outcome measures for child cognitive and language development, while the program had beneficial effects on four of nine measures of child-social-emotional development. While the short-term (age three) findings indicated modest positive impacts, almost all of the positive findings for all Early Head Start participants were driven by the positive findings for black children. The program had little to no effect on white and Hispanic participants, who are the majority of program participants. For Hispanic children, the program failed to have a short-term impact on all six measures of child cognitive and language development, while the program had a beneficial effect on only one of nine measures of child-social-emotional development. For white children, the program failed to produce any beneficial impacts on these outcome measures.
For the long-term findings, the overall initial effects of Early Head Start at age three clearly faded away by the fifth grade. For the 11 child-social-emotional outcomes, none of the results were found to have statistically meaningful impacts. Further, Early Head Start failed to have statistically measurable effects on the 10 measures of child academic outcomes, including reading, vocabulary, and math.
What happened when the long-term results were analyzed by race and ethnicity? For black children, there were only two beneficial impacts among the 11 child-social-emotional outcomes. For Hispanic and white children, there was no beneficial effect for any outcome. For child academic outcomes, the long-term findings by race and ethnicity were consistent. Early Head Start failed to affect any of the 10 academic outcomes for each of the subgroups. Despite the dismal results of the scientifically rigorous evaluations of Head Start and Early Head Start, Orszag called for increased funding for these failed programs.
Orszag concluded that “the highest level of integrity must be maintained in the process of using science to inform public policy. Sound data are not sufficient to guarantee sound policy decisions, but they are necessary.” Indeed, sound data are not a sufficient guarantee for sound policy decisions. Dealing with the data forthrightly is necessary as well.
In no way does the 2010 Head Start Impact Study demonstrate “very strong suggestive evidence” that Head Start “pay[s] off over the medium and long term.” Placing more children into an already failed program does not represent placing “significant emphasis on making policy conclusions based on what the evidence suggests.” Instead of acknowledging failure and eliminating Head Start and Early Head Start, the Obama Administration has sought to expand the programs to serve more children for longer periods of time. It is as if the Administration never read the large-scale experimental evaluations of these programs.
While the Obama Administration is interested in evaluating federal programs, the link between results and budgets is less clear. In 2010, the Administration announced 128 “high priority performance goals” (HPPG) that define its priorities, but “it is unclear how HPPG performance review by OMB will be integrated into budget decision making, or whether it is intended to be integrated at all.” A safe expectation is that highly performing programs will receive requests for more funding. But will poorly performing programs receive declining budget requests? Professors L. R. Jones and Jerry McCaffery of the Graduate School of Business and Public Policy at the Naval Postgraduate School add that “[d]espite this interest in careful assessment of the performance of federal agencies, there is no evidence to suggest that the Obama administration will attempt to implement performance budgeting.” Their assessment is still relevant at the end of Obama’s presidency. The next President should implement a genuine evidence-based policymaking agenda that is truly focused on fiscal discipline.
A History of Failed or Incomplete Reform
The formulation of the President’s budget recommendation begins soon after the last recommendation is submitted to Congress. Each spring, the OMB starts the process of sending out planning guidance to agencies in the executive branch. Until the early 1980s, “spring review” entailed a detailed analysis of agencies by the OMB. During this review period, OMB career staff identified policy and budgetary issues that were anticipated to impact the upcoming budget. Afterwards, a series of planning review sessions were held for various departments and agencies. Once the review was complete, the findings were presented to the OMB Director and then to the President. The results of the review process provided “the foundation for a series of relatively in-depth programmatic guidelines and budgetary targets for agencies to use during preparation of their budgetary requests, which would be submitted to OMB in September.”
The spring reviews from 1981 and 1982 were successfully used to build support for President Ronald Reagan’s budget and policy priorities. However, the role of spring review was significantly reduced during the spring of 1983. For 12 years, a formal spring review was absent at the OMB. During the Clinton Administration in 1995, a formal spring review was re-established due to the opportunity presented by the passage of the Government Performance and Results Act (GPRA).
Enacted in 1993, the GPRA was intended to improve the public’s confidence in government, program effectiveness and accountability, administrative management, and congressional decision making. While the GPRA was an important development in gathering information about the performance of federal programs, the original act and its reauthorization in 2010 have serious limitations for assessing performance and holding bureaucracies accountable. First, the information collected through the GPRA cannot tell policymakers about the actual effectiveness of federal programs. The GPRA’s performance information requirements are weak on the counterfactuals needed to accurately assess effectiveness. Second, the GPRA requires that stakeholders in federal programs have influence over which outcome measures are used to assess performance. Allowing special interests dependent on funding to have a say in defining outcomes undercuts any objective determination of effectiveness. Instead, easy-to-achieve outputs are used as proof of effectiveness. Third, and most important, the GPRA is not adequately used by policymakers for budgetary decisions. Without performance information being strongly linked to budgetary decisions, agencies have little incentive to improve performance.
The creation of PART during the George W. Bush Administration was largely a response to the inadequacies of the GPRA’s weak connection to budget decision making. (The re-authorization of the GPRA through the Government Performance and Results Modernization Act of 2010 did not improve upon this situation.) In 2002, the Bush Administration created the PART scoring mechanism. The creation of PART represented a wager by the OMB that improving the information used for budget recommendations would change the decision-making process.
Under PART, based on answers to a series of questions, federal programs were rated in the following areas:
- Program purpose and design (20 percent);
- Strategic planning (10 percent);
- Program management (20 percent); and
- Program results (50 percent).
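The weighting above can be illustrated with a minimal sketch. It assumes each of the four areas has already been scored on a 0 to 100 scale; that scale, the function name, the dictionary keys, and the sample numbers are hypothetical and are not drawn from OMB guidance. Only the four percentage weights come from the list above.

```python
# Section weights, in percent, as listed above for the original PART.
PART_WEIGHTS = {
    "purpose_and_design": 20,
    "strategic_planning": 10,
    "program_management": 20,
    "program_results": 50,
}


def overall_part_score(section_scores):
    """Return the weighted overall score for one program.

    section_scores maps each key in PART_WEIGHTS to a section score.
    The 0-100 scale is an assumption made for this illustration.
    """
    missing = set(PART_WEIGHTS) - set(section_scores)
    if missing:
        raise ValueError(f"missing section scores: {sorted(missing)}")
    for name in PART_WEIGHTS:
        if not 0 <= section_scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    # Weighted sum of the four sections, divided by 100 because the
    # weights are stored as whole-number percentages.
    weighted = sum(PART_WEIGHTS[name] * section_scores[name] for name in PART_WEIGHTS)
    return weighted / 100


# Hypothetical program: strong on design and management, weak on results.
example = {
    "purpose_and_design": 80,
    "strategic_planning": 70,
    "program_management": 90,
    "program_results": 40,
}
print(overall_part_score(example))
```

Because results carry half the weight, the hypothetical program above ends up with a middling overall score even though it does well on design and management.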
With the goal of integrating performance and budget requests, the results-related questions were given the greatest weight in calculating the overall PART score. Recognizing the diverse array of federal programs, the PART questions were tailored by the following program classifications:
- Competitive grant programs (such as Head Start);
- Block/formula grant programs (such as vocational-education state grants);
- Regulatory-based programs (such as those of the Occupational Safety and Health Administration);
- Capital assets and service acquisition programs (such as the Youth Anti-Drug Media Campaign);
- Credit programs (such as Rural Electric Utility Loans and Guarantees);
- Direct federal programs (such as the National Weather Service); and
- Research and development (R&D) programs (such as Mars exploration).
Overall PART scores were divided into five categories:
- Effective;
- Moderately effective;
- Adequate;
- Ineffective; and
- Results not demonstrated.
Programs that received a “results not demonstrated” rating had no performance measures or data for OMB to assess.
A Paradigm Shift. PART represented an important shift in thinking about accountability and performance. According to David Frederickson and H. George Frederickson, the authors of Measuring the Performance of the Hollow State, “PART holds programs to high standards. Simple adequacy or compliance with the letter of the law is not enough; a program must show it is achieving its purpose and that it is well managed. PART requires a high level of evidence to justify a ‘yes’ response. Answers must be based on the most recent credible evidence.” The burden of proof was placed firmly on the agencies.
According to Professor Paul Posner of George Mason University and the late Denise Fantone of the GAO, PART had the potential “to link performance more directly with consequences for funding and program design.” One particular agency was informed by the OMB that if it did not reduce the number of programs rated as “results not demonstrated,” the OMB would consider reducing its administrative budget. U.S. Department of Justice Weed and Seed grants were rated as “results not demonstrated,” after being criticized by the GAO for failure to adequately assess the performance of the grant program.
An underlying rationale for PART was that through the use of an index assessing the performance of each federal program, all stakeholders in the budget process would be encouraged to respond to questions asking whether they agree or disagree with the assessments. If Congress, in particular, disagrees with an assessment, it should respond with evidence that is more credible than the evidence presented by the OMB. PART could have a profound impact on congressional appropriations if Congress were forced to examine the evidence more objectively, rather than relying on political rhetoric and anecdotal evidence.
According to Professor Moynihan, the OMB viewed PART as creating an evidence-based dialogue within the federal agencies in the following ways. First, PART was based on review by a third party, the OMB, that is sheltered from the unreliability of agency self-reporting on performance. Second, PART’s focus on performance during the budget preparation process would lead program managers within the executive branch to place greater attention on performance measures than would otherwise be the case. Third, the standard of proof for judging effectiveness required evidence of positive outcomes, rather than an absence of clear failure. Fourth, the burden of proof was placed on the agencies for demonstrating effectiveness. Fifth, PART required all programs to be reviewed over five-year intervals, thereby placing pressure on agencies to continually collect performance information throughout their programs’ existence. Sixth, the routine process of PART was intended to create an incentive for change in agencies.
Accountability. Did PART increase accountability? PART was an attempt to hold federal programs accountable through the executive branch’s role in the budget process. Budget cuts are rare; program terminations are even rarer. Of the 65 programs recommended for elimination in FY 2005 by President Bush, only one, the Small Business Administration Business Information Centers, was eliminated that year. During President Bush’s first term, several education programs were rated as “results not demonstrated” or “ineffective,” but the Administration did not propose any cuts until after the President’s re-election. Although the Bush Administration recommended eliminating the failed Even Start program, Congress did not eliminate it until FY 2011.
Research on PART. Several studies have assessed the association between PART ratings and both the President’s budget recommendations and congressional actions. PART scores have been found to have slight or modest positive associations with budget recommendations made by President Bush. While PART did have some influence over the President’s budget recommendations, once the recommendations moved across Pennsylvania Avenue, PART had little or no effect on congressional appropriations. In sum, PART was more useful for decision making in the executive branch than in the legislative branch. Legislators with a business background were likely to support PART, while those with longer terms in Congress or who had higher levels of campaign contributions from special interest groups were less likely to support PART.
Severing Performance from Budget Decisions. In any case, one of President Obama’s first budgetary actions was to terminate PART and replace it with a loosely structured “performance improvement and analysis framework” that is separated from annual budgetary decisions. According to President Obama, the “ideological performance goals” of PART were replaced “with goals Americans care about and that are based on congressional intent and feedback from the people served by government programs.” The new framework was introduced during President Obama’s first budget request, for FY 2010. Instead of being a formal tool used by the OMB, this new, vaguely defined framework switched “the focus from grading programs as successful or unsuccessful to requiring agency leaders to set priority goals, demonstrate progress in achieving goals, and explain performance trends.” However, this new framework lacked budget accountability.
According to Robert D. Lee Jr., professor emeritus at Pennsylvania State University, and his coauthors, “Although a new framework for performance improvement and analysis was forecast, what actually developed were specific initiatives to conduct more rigorous program evaluations and publish the results.” While increasing the number of evaluations that assess the effectiveness of federal programs is an important success of the Obama Administration, the elimination of PART delinked accountability from budget decisions. Continuing to fund programs, regardless of evaluation results, does not serve the interests of federal taxpayers.
“While the expressed interest of the Obama administration in assessing the performance of federal government agencies is evident,” according to Professors Jones and McCaffery, “PART as employed by the Bush administration has been abandoned and there is no evidence to suggest that the [Obama] administration will attempt to implement performance budgeting per se as a means to accomplish this end. Rather, performance is now reviewed by OMB less formally than under PART.” Thus, the Obama Administration diminished the OMB’s role in assessing the performance of federal programs by effectively separating budget recommendations from performance measurement and evaluation. If the information obtained from evaluations has no influence over budgetary decisions, the knowledge gained from the evaluations is of little use.
In addition, the Obama Administration created a website, performance.gov, to highlight its plans to improve the performance of the federal government. Commenting on the site, Sean Reilly of Federal Times reported that the “site also offers no comprehensive assessment of federal programs’ performance.” “Instead,” he continued,
the site, which is run by the Office of Management and Budget and the General Services Administration, offers anecdotal summaries of what various agencies have done to improve financial management, human resources, sustainability and other areas. It also includes performance reports and other information previously buried on individual agency websites.
To date, the information provided at performance.gov has not substantially changed since Reilly’s original assessment.
The use of rigorous evidence in policymaking is a crucial area where the next President can help to improve accountability and fiscal discipline in the federal budget process. The President’s budget recommendation is considered an opening maneuver in the budget process. As an opening maneuver, the President can encourage Congress to be more fiscally disciplined by incorporating rigorous evidence into budget recommendations. The next President must sell his or her vision to the American people to build support for presidential priorities, but the President must also devise tactics to sell budget recommendations to Congress. This is exactly where a revitalized and improved PART (PART 2.0) will play a vital role.
While the annual budget recommendation obviously serves the President’s political interests, it also provides useful information to Congress. Through PART 2.0, the OMB’s analytical skills will contribute to the budget information provided to Congress on how well federal agencies are performing.
Returning a reinvigorated PART to the President’s budget recommendations is crucial to setting an evidence-based agenda focused on fiscal discipline—but Congress will need to place a greater emphasis in its funding decisions on what rigorous evidence indicates about the performance of federal programs. If history is any guide, PART 2.0 will face congressional resistance.
Legislation that would have created a statutory obligation to implement some aspects of PART has been introduced in Congress, but not passed. However, the House of Representatives Appropriations Subcommittee on Financial Services and General Government, which has jurisdiction over the OMB, “put a limitation on OMB’s authority and approach to PART.” Further, the subcommittee “stipulated that if the committee did not agree with OMB’s plans for PART, it would prohibit OMB from using PART in its budget requests.”
Recommendations for PART 2.0
Under the assumption that government agencies seek to maximize their budgets, an instrument, like PART, is essential for applying an offsetting force against the ever-increasing need for more expenditures. Because appropriations by Congress largely occur on an incremental basis, targeting annual budgets based on performance is a reasonable method for counteracting bureaucratic behavior. While the effect of PART on congressional appropriations was largely nonexistent, the next Administration should bring the performance budgeting tool back. An Administration that clearly believes that rigorous evidence should be used in budgeting will encourage Congress to become more fiscally disciplined.
Revising and improving the original PART, essentially creating PART 2.0, will help the next President restore fiscal discipline and spend taxpayers’ hard-earned dollars wisely. In addition, PART 2.0 should help structure communication between the OMB and agencies by focusing administrators and managers on key presidential priorities. Creating PART 2.0 will send a clear message that fiscal responsibility dictates that funding allocations must be linked to performance.
How Would the Process Work? The emphasis on the use of rigorous evidence in the formulation of the President’s annual budget recommendation would logically begin with a reinvigorated spring planning review by the OMB.
A revitalized spring review should be focused on evidence-based policymaking. Federal agencies would be required to present the OMB with evidence on their performance for the OMB to review in the spring. Budget requests from agencies should be based on their performance, not just desired levels of funding.
PART 2.0 assessments should be performed during the spring and summer, before agency budget submissions occur in the fall. By strongly embedding PART during this time period, the OMB will increase its influence by defining what performance information is credible and relevant to more adequately classify programs as succeeding or failing. The reinvigoration of spring planning review is not a new idea. The next Administration should use PART and the OMB’s analytic capability to advance an evidence-based policy agenda that has real budgetary consequences. PART 2.0 should be used to tightly focus the OMB’s analytic resources on funding what works, and defunding what does not work.
Early Collaboration Is Essential. For PART 2.0 to be successful, the next Administration should facilitate a process for the OMB to collaborate with departments to ensure that federal programs are rigorously assessed for effectiveness. Collaboration will be especially important when the next Administration attempts to enact its policy agenda. Early collaboration between the OMB and agencies to embed rigorous evaluations during the development of the President’s agenda is often a missing ingredient. While the OMB is often perceived as blocking bad ideas from being developed by agencies, OMB staff also need to develop cooperative relationships with agencies to identify and encourage the evaluation of programs.
Not only do large-scale experimental evaluations need to be embedded into newly created programs and initiatives, existing programs need to undergo rigorous evaluations as well. Professor Dale Farran of Vanderbilt University has correctly criticized the Obama Administration for not requiring the rigorous evaluation of $226 million awarded to 18 states to fund new “high quality” state-level preschool programs. This is a mistake. The OMB needs to be involved early in the program-formulation stage. This means that the OMB program examiners should be trained in the benefits of random assignment and more involved in the development of the new initiatives. Not only will the OMB need to have enough expertise to recognize when an experimental evaluation can be applied, but the agency must be assertive in making these evaluations occur and in ensuring that the results are released without unnecessary delay.
Leadership Is Vital. Leadership is crucial to setting an evidence-based agenda. Effective leadership is more than offering mere political rhetoric. It has to have actual budget ramifications.
First, the next President needs to send a clear message to the OMB and the entire federal bureaucracy that the West Wing believes evidence-based policymaking should influence budget decisions and policy formulation. Evidence needs to be tied to budget decisions. Second, focusing the OMB on evidence-based policymaking will require the OMB Director and senior staff to develop clear expectations that program associate directors (PADs) and program examiners are to concentrate on rigorous evidence for justifying agency budgets.
Setting a clear evidence-based agenda that is tied to the performance evaluations of OMB personnel is a crucial element in shifting the federal government away from funding programs based on intentions, and toward results. A strong message from the White House and the OMB Director is needed to set expectations of OMB staff.
The OMB has five Resource Management Offices (RMOs)—(1) Natural Resources Programs; (2) Education, Income Maintenance and Labor Programs; (3) Health Programs; (4) General Government Programs; and (5) National Security Programs—that oversee the budgets and management of federal agencies. Each RMO is overseen by political appointees deemed PADs. The PADs oversee career civil servants. Under PADs are deputy associate directors (DADs, or “division chiefs”) that supervise branch chiefs.
Program examiners serve under branch chiefs and have been called “critical foot soldiers” responsible for reviewing all budgetary, legislative, and program issues under their review areas. Program examiners perform the bulk of the information gathering and analysis by the OMB. Program examiners “must be proficient as translators of broad presidential policies into specific programming applications in order to be able to explain presidential policies to agencies, and to be able to make analysis and recommendations they offer the President useful in light of his agenda.” These examiners assist in clearing legislative proposals before they are sent to Congress, help clear congressional testimony of executive branch appointees and career civil service staff, and occasionally participate in interagency task forces and commissions. OMB program examiners played crucial roles in implementing PART.
While there must be buy-in by career OMB staff in order for PART 2.0 to be successful, federal agencies may be resistant to any proposals to hold government accountable. What can the OMB do to encourage federal agencies to perform large-scale experimental evaluations of their programs?
First, the OMB’s apportionment powers need to be strategically exercised. Throughout the fiscal year, the OMB makes quarterly apportionments of funding appropriated by Congress. The OMB should encourage agencies that are reluctant to rigorously evaluate their programs by making apportionment contingent on performing such evaluations and releasing the results to the public on a timely basis.
In order to force agencies to overcome their reluctance in performing rigorous experimental evaluations, the OMB should withhold funds when agencies are determined to not be complying with directives to rigorously evaluate their programs. For example, the OMB should temporarily withhold apportioning funding directed toward the salaries of the leadership of a foot-dragging agency.
Second, the OMB has the authority to approve the “reprogramming” of funds—the “shifting of monies from one project to another within the same appropriations account.” If an agency is not devoting enough resources toward rigorous evaluation, the OMB should reprogram funds within an agency towards rigorous evaluations.
Assistance to Congress. What can the executive branch do to make the congressional process more inclined to adopt evidence-based policymaking? The executive branch should offer clear language on outcome expectations for authorization legislation. Further, it can propose performance measures that will be used to gauge progress toward the goals of the legislation.
To inform Congress and the policy community on the benefits of PART 2.0, senior OMB staff need to reach out to Congress, just like OMB officials did during the Bush Administration’s advocacy of PART. Bush Administration OMB Associate Director for Administration and Government Performance Robert Shea
made an admirable effort to win the attention and approval of key committee and agency staff, appearing and engaging all comers at a seemingly endless series of appointments on the Hill and around Washington for meetings sponsored by management advocacy organizations, think tanks, and consulting firms.
Not only should PART 2.0 make fiscal discipline a central aspect of the President’s budget requests, it can potentially improve fiscal discipline within Congress.
Encouraging a Fiscally Responsible Congress
The current link between performance and congressional appropriations is, at best, tenuous. Barriers to performance budgeting created by legislatures include uncertain or vague policy goals that impede deliberate goal setting, reliance on anecdotal information rather than rigorous evidence in making budget allocations, and dedicating inadequate attention to oversight of program performance. The following sections offer recommendations for how Congress, with the assistance of the executive branch, can become a wiser steward of the federal purse.
Appropriations. According to Philip Joyce, professor of public policy at the University of Maryland, “there is little evidence that appropriations committees consider performance information in any systematic way.” The appropriations committees tend to focus on marginal decisions, rather than on the effectiveness of spending. The process is too often a vehicle for doling out money to special interests and bureaucracies.
The best way to get Congress to adopt evidence-based policymaking is to cement reforms in the appropriations process. For example, passing legislation that would anchor PART 2.0 in the appropriations process will go a long way in making evidence-based policymaking influential in the federal government.
In 2009, Representative Henry Cuellar (D–TX) introduced the Government Efficiency, Effectiveness, and Performance Improvement Act of 2009 (H.R. 2142). The act would have codified the original PART into federal law. However, the legislation morphed into the GPRA Modernization Act of 2010, and the version that became law dropped PART entirely.
Oversight. During confirmation hearings of presidential nominees, committee members should ask detailed questions on how the next Administration can improve the application of evidence-based policymaking. For example, how can they improve upon what previous Administrations have done? By engaging political appointees on this topic, Congress will spur their interest in evidence-based policymaking.
Authorizations. Congress can take several steps to ensure that federal programs are properly assessed for effectiveness. First, when Congress authorizes existing or new programs, the legislation should set clear expectations for performance that are confirmed by large-scale, multisite experimental evaluations wherever possible. The expectations and evaluation of the program need to take hold during the authorization process. Second, the experimental evaluations should be large-scale, nationally representative, multisite studies. Third, Congress should specify the types of impact measures to be assessed. Fourth, Congress should institute procedures that encourage government agencies to carry out congressionally mandated evaluations, despite any entrenched biases against experimental evaluations. Fifth, Congress should require that congressionally mandated evaluations be submitted to the relevant congressional committees and released to the public in a timely manner after completion.
Commissions. Congress should consider creating commissions to help provide recommendations for creating a leaner, more effective federal government. The Bush Administration proposed the Government Reorganization and Program Performance Improvement Act of 2005, which would have empowered the President to create a commission to review the performance of federal agencies and recommend programs for termination. The termination recommendations would have to be approved through an expedited process by Congress. Similar legislation, the Government Reorganization and Program Performance Improvement Act of 2005 (S. 1399), which was also not passed, was proposed by Senator Craig Thomas (R–WY) during the 109th Congress.
The Bush Administration proposed two types of commissions to regularly assess the performance of federal programs. First, the Government Reorganization and Improvement of Performance Act would have created a bipartisan “sunset” commission to review the performance of federal programs over a 10-year period. It would have recommended ways to improve the performance of worthy programs and the abolishing of ineffective programs. Second, the Sunset Act would have created a “results” commission to evaluate the degree to which specific programs are producing their intended outcomes.
A similar commission proposal, the Commission on the Accountability and Review of Federal Agencies Act (H.R. 522), introduced during the 114th Congress by Representative Doug Collins (R–GA), would create a federal commission to evaluate federal agencies and their programs over a six-year period to identify duplicative programs for consolidation, and wasteful programs for termination. For the identified programs and agencies, the commission would recommend realignment or termination. Similar legislative proposals have been viewed as a potentially effective means to consolidate duplicative programs and eliminate wasteful spending.
America’s debt is out of control, and Congress and recent Presidents have done little to decrease spending. The federal government needs to prioritize government spending by targeting resources intelligently. The use of rigorous evidence is a crucial area where the next President can help improve accountability and fiscal discipline in the federal budget process. A genuine evidence-based agenda focused on fiscal discipline will help the next President re-assert control over runaway spending.
The next Administration should re-establish a modified and improved PART along with a fiscally disciplined evidence-based spring review by OMB. When programs that fail to produce results receive reduced funding or are terminated altogether, and programs that generate results continue to receive funding, a better allocation of scarce resources is the result. Unless rigorous evaluation results are strongly linked to budget decisions, any proclamations about the benefits of evidence-based policymaking are meaningless.
David B. Muhlhausen, PhD, is a Research Fellow for Empirical Policy Analysis in the Center for Data Analysis, of the Institute for Economic Freedom and Opportunity, at The Heritage Foundation.