Are We There Yet? A Conversation on Performance Measures in the Third Sector

What’s on the horizon for nonprofit accountability? How can scholars help practitioners make sense of these changes? Last month, 1,000 researchers and pracademics gathered in Chicago to discuss these and other questions at the annual meeting of the Association for Research on Nonprofit Organizations and Voluntary Action (ARNOVA).

Among the most interesting sessions was the closing plenary led by Dr. Alnoor Ebrahim of Harvard University’s Social Enterprise Initiative. He facilitated a dialogue with Jacob Harold (President & CEO of GuideStar), Debra Natenshon (Principal, DBN & Associates), and Mari Kuraishi (President, GlobalGiving) on third sector performance metrics. The conversation opened with a brief description of the Performance Imperative, a collaborative project organized by Leap of Reason to develop a common understanding of what constitutes high performance in third sector organizations. The initiative’s working definition of performance is “the ability to deliver—over a prolonged period of time—meaningful, measurable, and financially sustainable results for the people or causes the organization is in existence to serve.” The discussion centered on three of the initiative’s seven performance indicators: a culture that values learning, internal monitoring for continuous improvement, and external evaluation for mission improvement. Drawing parallels with a growth mindset, the panel discussed how learning and evaluation have become core skills that organizations need to succeed in a complex and constantly changing world.

The discussion heated up as Jacob Harold explained GuideStar’s efforts to improve information flows through the National Taxonomy of Exempt Entities (NTEE) classification system. He described this approach as a valuable resource for researchers and funders who seek to make sense of the sector’s diversity and complexity. The NTEE code system allows for robust statistical analysis that improves understanding of funding patterns, organizational life cycles, the effects of economic downturns, and other third sector phenomena.

In contrast, Mari Kuraishi argued that rigid taxonomies may inhibit the process of discovery, create power divides, and lead to pseudo-solutions in search of a market. She suggested that organizations instead be characterized through tags: multiple descriptors that reflect the range of things organizations do to address complex problems. A tagging approach eliminates the need for mutually exclusive and exhaustive categories. It also gives researchers more options to compare organizations, recognize patterns, and produce a richer, more nuanced understanding of the sector and its communities.

Debra Natenshon observed that whatever system is used, practitioners should have the opportunity to help develop it, since they are the ones ultimately held accountable for performance. A common taxonomic framework promotes shared understanding. It can also serve as a preliminary template for local organizations to adapt to regional needs. Ideally this process would be crowdsourced, with practitioners taking the lead in shaping and reshaping the taxonomies. The mechanisms for capturing the data will be more practical if the categories are designed by the end users to meet their needs.

Dr. Ebrahim asked about the potential of these frameworks, for better or worse, to become de facto regulators of nonprofit behavior (i.e., imposing control through disciplinary technologies). If care is not taken with their development, performance metrics and incentives can inadvertently work against mission fulfillment by shifting attention from mission to conformance with bureaucratic norms. He noted that performance standards inherently reflect tensions between top-down and bottom-up, and between centralized and localized, approaches. Each perspective is needed to understand the multiple dimensions of performance.

In the midst of this rich discussion, Dr. Lester Salamon rose from the audience to ask about the role of values in performance measurement. He noted that, beyond their service function, nonprofits perform expressive functions that bring principles, values, and ideals to life. Ultimately, what counts as performance is determined by how we define purpose, a sentiment echoed by Dr. Chris Thompson of BoardSource.

The session closed with reflections on what scholars and leaders can do to improve nonprofit performance. Panelists mentioned Jim Collins’ observation that organizations don’t need to become more business-like. Rather, they should strive to become more disciplined. Mari encouraged more research on philanthropic funding and donor motivations as a way to better understand the sector, increase community engagement, and drive meaningful change. Jacob thanked the scholars for their research, urging them to translate their fancy language into usable formats that can improve practice. Debra advised leaders to spend 75 percent of their time facilitating shared understanding of their organizations’ theory of change, clarifying with stakeholders why the organization exists and defining what mission success looks like.

As an audience member who worked for many years at a natural history museum, I was struck by parallels between this conversation and natural science research. Taxonomy is a fundamental method used by environmental scientists to study, classify, and categorize organisms and their relationships. It feeds into the larger fields of systematics (understanding interrelationships among living things and the mechanisms for their evolution) and ecology (studying interactions of organisms and their environment). Collectively, these approaches produce a more comprehensive understanding of the natural world than any single approach can offer. In tandem, they also produce useful insights to inform policy and develop effective practical action (e.g., more successful conservation management practices).

How might this model be applied to the study and practice of third sector performance? Recognizing the potential usefulness of a nested approach is a good starting point. The “tags versus taxonomy” question boils down to standpoint. Just as mapping applications portray information at a variety of levels, so too should researchers and practitioners consider purpose when selecting a frame for performance measures. In MapQuest, the street view perspective is not superior to the satellite view. The “best” one depends on how far you are traveling and the value choices you make in route selection (e.g., toll avoidance, surface street vs. freeway). Similarly, what counts as performance looks very different for service delivery, integrative, and expressive functions. If we don’t recognize this, we risk hegemony, potentially imposing our metrics onto others whose purpose and values may be different.

Ebrahim closed the session with a quip he attributed to Woody Allen: “If you are going to aim low, you better not miss.” The Performance Imperative offers a useful launching pad. Like systematics and ecology, it shines a light on the vital role that processes play in making systems sustainable. Many of the Performance Imperative’s seven dimensions are process-based (beyond the three the panel discussed, the dimensions are effective executive and board leadership; disciplined, people-focused management; well-designed programs and strategies; and financial health and sustainability). Performance in complex environments is not just about getting to a destination. It also involves the capacity to create the mode of transport, navigate collectively, recalibrate as conditions change, and discern optimal paths guided by mission and values.