We are a funder making about $18 million worth of grants annually in western New York State, and we are tremendously interested and invested in the issue of evaluation and outcome. So when I read Steven Lawry’s “When Too Much Rigor Leads to Rigor Mortis,” I wanted to share some thoughts. As someone who worked for over 18 years on the “seeker” side of the table, and has now spent 10 years on the “maker” side, I know that the claims, expectations, and understandings about evaluation are all over the map, and often misguided.
The critical point at which thinking is split is around expectations. What can one reasonably expect to be learned in short-term evaluation that is useful to guide further development? What exactly is reasonable to try to measure? All of us on both sides of the grant would love to be able to draw a straight line from the check being deposited in the non-profit’s account to a significant improvement in a burgeoning social issue 12 months later. But one response to the axiom that you can’t manage what you can’t measure is a paraphrase of Einstein’s observation: “Not everything that can be measured is important, and not everything that is important can be measured.”
Two critical factors seriously limit our ability to measure “impact and outcomes.” One is time. Change takes a very long time to achieve, especially in the behavior of humans, and measuring what happens to people within a 12- or 24-month period based on an intermittent (at best) intervention is not likely to reflect true or lasting change. The other is outside influence—how many other things going on in the lives of the targeted populations are likely to influence the outcome of an intervention? If an organization is doing an afterschool nutrition program for kids, and at the same time, the school is doing a major push toward good food in the cafeterias, how is that accounted for in the measurement of the program? Or a physical activity program offered by an organization during football season, when the NFL is heavily promoting its Play60 program on television? Or a poet in a classroom during National Poetry Month, as opposed to other times?
The time factor is the most elusive. When I ran an injury prevention program in schools, I would pre- and post-evaluate in three areas: knowledge, perception of susceptibility, and behavior change. It was an elaborate, 43-item tool given to the same kids at one and six months after they participated in the program. I could always “prove” an immediate leap in knowledge, an immediate though shallower leap in whether they perceived themselves to be susceptible to injury, and little or no leap in behavior change. Six months later, the movement more often than not slipped backward to varying degrees. At 12 and 24 months, absent additional interventions, I’m certain it would have been nearly back to baseline. But I didn’t have the time or funds to sustain the evaluation that long, and the funders were not interested in carrying it that long. What I most often reported to them, of course, was the result of the one-month evaluation that showed a great impact. And that was unequivocally true, but misleading.
In science, the outside influence factor is accounted for by use of the double-blind, placebo-controlled approach, where neither the subjects nor the researchers working with them know who’s getting the intervention and who isn’t. This is fortified by the creation of a control group. With pharmaceutical trials, the test subjects are often kept secluded primarily for safety, but also to limit as much as possible any outside influence. As good as all that might be, and as much as it may indicate a good or successful drug, we’ve all seen how the time factor comes into play: What looked fantastic in the short term turned out to be a disaster (or useless) in the long term. This also raises the issue of inappropriate extrapolation. For years, drugs were tested only on adult men, often white, and the results were then extrapolated to the rest of humanity. It didn’t go so well over the long term, most significantly in children, though that wasn’t understood until relatively recently.
It is similar in the social sector, and two interventions that come to mind quickly are DARE and abstinence education. Neither accomplished in the long term what it did in the short term, at least as far as the most recent evaluations indicate. Does that mean they shouldn’t have been done, or that all that effort was wasted? I would suggest that both programs undoubtedly had positive impacts that we either didn’t think to consider, or had no means to evaluate at the time.
“Not everything that is important can be measured.”
I believe that’s the core of the argument posed by Mr. Lawry. We are ignoring critical factors regarding how social change work is done by the people who do it in favor of measuring short-term improvements that may or may not be indications of long-term change.
What I believe can and should be measured by funders in the short term is not whether a program alleviated poverty or made more people literate, but whether the organization providing the programs to accomplish those things is in a position to do so effectively into the future. Ultimately, the funder’s primary area of influence is neither the client nor the targeted population, but the organizations that do the work. We are at a remove from the actual work on the street, however reluctantly we have to accept that, but we can and should be directly involved in working with (and funding) the organizations to do the work as best they can.
This is the heart of the decades-old argument of general operating vs. project support. Funders are as passionately interested in effecting change at the street level as nonprofits are, and therefore that’s where we want most to focus. And while I am not an unequivocal supporter of GOS (another day, another letter), I believe organizational development is where we should primarily, though not exclusively, be focused. That doesn’t mean that programs or interventions shouldn’t or can’t be measured for their impacts. It means that such measurement must be made—committed to—over the long term (3-5+ years), and no funder or nonprofit or evaluation consultant should expect a reliable understanding of “impact” otherwise.
The development of the nonprofit organization provides plenty of factors to evaluate and many outcomes to strive for. It can also satisfy the funder’s obligation to effectively steward resources insofar as an organization is being helped to last for the long term and have a much greater chance of effectively achieving its, and therefore the funders’, goals.
In the same way a drug can’t be evaluated while ignoring the attributes of the body it’s travelling through, an intervention (program) can’t be evaluated apart from the attributes of the organization and people offering it. It can’t be evaluated in the short term only, and it can’t be evaluated without accounting for the many other factors that affect the claimed outcome. The impatience of funders to prove a return on investment, the nonprofit’s desire to prove its value not only to the funder but to itself, and the evaluation consultant’s need to prove that such things can be “proved” all continually clash, making program evaluation and outcome measurement problematic at best, and outright misleading—and therefore dangerous—at worst. Changing this will require a serious effort at realigning expectations, and perhaps a significant shift of focus.