High Impact Evaluations: Exploring the Potential of Real-Time and Prospective Evaluations

Summary of a Workshop

A Workshop Conducted by the Independent Evaluation Group
Washington, DC
January 27, 2010

Contents

Acronyms and Abbreviations
Foreword
Welcoming Remarks – Daniela Gressani
Session 1: Conceptual Issues
  Utilization-Focused Evaluation: Real-Time and Prospective Aspects – Michael Quinn Patton
  Evaluating in Uncertain Environments: Prospective Evaluation and Scenario Building – Tom Ling
  Discussion
Session 2: Real-Time and Prospective Evaluation in Practice
  Real-Time and Prospective Evaluation in Practice: The Experience of the U.S. Government Accountability Office – Stephanie Shipman
  The United Kingdom Response to the Crisis: Evaluating in Real Time – Philip Airey
  Real-Time Evaluation in the Independent Evaluation Group: Assessing the World Bank Group's Response to the Global Crisis – Ismail Arslan, Dan Crabtree, Ali Khadr, Marvin Taylor-Dormond and Stoyan Tenev
  Discussion
Concluding Remarks – Daniela Gressani, Marvin Taylor-Dormond
Keynote Address
  Introduction – Patrick G. Grasso
  Complexity Theory and Evaluation – Michael Quinn Patton
  Discussion and Close
Annex: Utilization-Focused Evaluation: Real-Time and Prospective Aspects (paper)

© 2011 Independent Evaluation Group, The World Bank Group
1818 H St., NW
Washington, DC 20433

IEG: Improving Development Results Through Excellence in Evaluation

The Independent Evaluation Group is an independent unit within the World Bank Group; it reports directly to the Bank's Board of Executive Directors. IEG assesses what works, and what does not; how a borrower plans to run and maintain a project; and the lasting contribution of the Bank to a country's overall development. The goals of evaluation are to learn from experience, to provide an objective basis for assessing the results of the Bank's work, and to provide accountability in the achievement of its objectives. It also improves Bank work by identifying and disseminating the lessons learned from experience and by framing recommendations drawn from evaluation findings.

The findings, interpretations, and conclusions expressed here are those of the author(s) and do not necessarily reflect the views of the Board of Executive Directors of the World Bank or the governments they represent, or IEG management. The World Bank cannot guarantee the accuracy of the data included in this work. The boundaries, colors, denominations, and other information shown on any map in this work do not imply on the part of the World Bank any judgment of the legal status of any territory or the endorsement or acceptance of such boundaries.
ISBN-13: 978-1-60244-192-7
ISBN-10: 1-60244-192-8

Contact: IEG Communication, Learning and Strategies (IEGCS)
E-mail: ieg@worldbank.org
Telephone: 202-458-4497
Facsimile: 202-522-3125

ACRONYMS AND ABBREVIATIONS

CEO  Chief executive officer
IDRC  International Development Research Centre
IEG  Independent Evaluation Group
IFC  International Finance Corporation
IFI  International Financial Institution
INTEVAL  International Research Group on Evaluation
GAO  Government Accountability Office (United States)
GDP  Gross domestic product
MPL  Maximum probable loss
NAO  National Audit Office (United Kingdom)
NBC  National Broadcasting Company
OMB  Office of Management and Budget (United States)
RBS  Royal Bank of Scotland

All dollar amounts are U.S. dollars unless otherwise indicated.

Foreword

International development is undergoing a transformation driven by fundamental shifts in the global economic landscape. The 2008 financial crisis sent a shock wave across global markets and threatened to erase years of progress in development and poverty reduction in developing countries. It underscored the changing nature of the global architecture, a major aspect of which is the speed at which change occurs and the growing need for rapid and informed responses to potential and ongoing crises.

As evaluators, this means our approach to our work must undergo a sea change. As policy makers act on issues with very high stakes such as the global financial crisis and climate change, where the long-term impact of ongoing actions can benefit from early feedback, we must be ready to provide an assessment of the likely effectiveness of their responses – even as those responses are being formulated. To seize this opportunity, evaluators need to revisit existing evaluation frameworks, respond to the uncertainties of the time, and be willing to provide inputs that inform current and future directions. We need to work in real time so that our contribution is relevant, useful, and impactful. We need to generate findings that facilitate continuous learning and feed into a forward-looking perspective.

In January 2010, the Independent Evaluation Group (IEG) held a workshop that brought together academics and practitioners of real-time and prospective evaluation techniques to exchange ideas and experiences. IEG's subsequent work on the World Bank Group's response to the global economic crisis is informed by the discussion at the workshop. We hope that this report – which includes a complete transcript of the workshop – and IEG's evaluations will help with your work as an evaluator or as a consumer of evaluations.

Vinod Thomas

Welcoming Remarks

Daniela Gressani, Deputy Director-General, Evaluation, World Bank Group

We are here today to discuss a topic that is relatively new and on which we all, I think, have a lot to learn, but a topic that is really becoming important. As evaluators, we have long recognized the need to make sure that our work has an impact, the greatest possible impact, and providing analysis in a timely way is a fundamental prerequisite for us to have a greater impact.
I think that this is especially important now, when institutions such as the World Bank Group have grown in size and in scope, and at a time when all the international financial institutions (IFIs) are struggling to respond to a financial crisis that has become an economic crisis, which, in turn, will require that we distill lessons and evaluate in real time. This is what this day is about: trying to learn from one another how to improve the quality and the timeliness of our evaluation at a time when time is, in fact, of the essence. The value added of today's meeting is precisely to bring to the benefit of our own evaluation work on the World Bank Group the experience of important partners in other institutions.

Session 1: Conceptual Issues

UTILIZATION-FOCUSED EVALUATION: REAL-TIME AND PROSPECTIVE ASPECTS[1]

Michael Quinn Patton, Organizational Development and Evaluation Consultant

I am delighted to have the opportunity to be with you for this important discussion today, and am honored to kick it off. Let me remind everyone of some of the larger context. Tonight, President Obama will deliver the State of the Union Address, and had I known that when I prepared this, I would have called my presentation the State of Evaluation Address. But in that spirit, let me invite half the room to interrupt my presentation every three minutes with a standing ovation and the other half to boo and make rude remarks as I proceed, to get us warmed up for this evening's adventure.

[The book] Utilization-Focused Evaluation[2] covers a great deal of our history, and I want to use that to talk about the state of evaluation as context for this consideration. The first edition of that book came out in 1978 and was basically reporting our findings on a study of use in the federal government, and the importance of the personal factor in how evaluations get used. The second edition, in 1986, brought together, from a lot of the work being done on use, the importance of intended use by intended users, being very clear about the purpose of any given evaluation and who it is for. In the 1997 edition, I introduced, as a field that I was coming to be aware of, the idea of process use, which is the way in which how an evaluation is conducted has an impact quite apart from the findings: things like capacity building, what gets measured gets done, the creation of logical frameworks and logic models for evaluation that begin to have an impact before any data are collected. And that has become a major theme of the last decade, which I think is quite relevant to real-time and prospective evaluation. The major new direction of the latest edition, which came out just over a year ago, was the challenge of evaluating under conditions of complexity. In a sense, in a thumbnail, that is some of the learning about the way in which the profession has emerged. That means that this session, and the direction that the Independent Evaluation Group (IEG) is going, are very much on the cutting edge of the larger issues that the profession faces.

1. See Annex for full paper.
2. Michael Quinn Patton, Utilization-Focused Evaluation, 4th Ed. Thousand Oaks, CA: Sage, 2008.

Premises

Utilization-focused evaluation is a decision-making framework for enhancing the utility and actual use of evaluations. It begins with the premise that evaluations should be judged by their utility and actual use.
Therefore, evaluators should facilitate the evaluation process and design an evaluation with careful consideration of how everything that will be done, from beginning to end, will affect use. So a part of what I want to call to our attention is that real-time and prospective forms of evaluation have utilization implications: not just the timing of evaluation, but issues of credibility and quality and speed and all those things that are challenging the profession.

Some of what we have learned about use may be germane here. We have learned that use is a process, not an event, and that it needs to be facilitated. It involves an interaction, not just a report, to interpret findings and apply them. It involves training for use, not just the delivery of results. The intended users have to have some help in knowing what to do with findings. It is not apparent or natural to go from data to action and decision making, and use will mean different things for different evaluation purposes.

Evaluation is now part of an initial program design, including conceptualizing theories of change. Whether evaluators are present or not, the very notion of theories of change has become so prominent that evaluative thinking becomes built into the program design process, and complexity is itself a theory of change about how the world works. The evaluator's role is to help users clarify their purposes, hoped-for results, and model of change. Evaluators can and should offer conceptual and methodological options. Evaluators can help by questioning assumptions. We play a key role in facilitating evaluative thinking throughout implementation as well as evaluation, and designs can be emergent and flexible, which is one of the challenges we are going to be talking about today, one of the new directions in evaluation.

For me, the big context here, my own bias about this, is that we live in a world that is increasingly driven by and paying attention to various forms of evidence-based practice. I like to say that evaluation grew up in the projects, testing models under a theory of change that pilot testing would lead to proven models that could be disseminated and taken to scale. The search for best practices, for evidence-based practices, remains one of the dominant, if not the most dominant, approaches in much of philanthropy and in much of government and international agency funding. But what that comes up against is a fundamental debate, both intellectual and practical, about how the world has changed.

Whether it is through the top-down dissemination of "proven models" or bottom-up adaptive management, this is a fundamental issue that, at the macro level of theories of change, is what brings us to issues of complexity. These are competing views about how the world is changed. Evaluation is a part of that debate, because what we produce is going to be what informs both of these approaches: either the top-down dissemination of proven models or adaptive management, which is indeed real-time and prospective.

This also relates to an important distinction between dissemination of models and dissemination of principles. Best practice models yield recipes for exactly what to do, and the form of evaluation associated with that is fidelity evaluation. Is the model being replicated exactly as evaluated? Principles come out of bottom-up adaptive management, and when we generate principles and lessons learned, those are not recipes.
They have to be interpreted and adapted and applied within complex adaptive systems and contexts. That is a very different process from the high-fidelity replication of a proven model. This means that the conditions that challenge traditional model-testing evaluation, which I want to suggest has been and remains the dominant paradigm in the field, and the dominant paradigm as I interpret IEG's work, the conditions that now challenge us and lead us into this new direction, are high innovation, rapid change, high uncertainty, and dynamical, not just dynamic, systems.

Evaluation, complexity, and dynamical change

Dynamical is a word in the complexity language that means ups and downs, not simply increases. Dynamic systems are on a pattern of increase or decrease; dynamical systems fluctuate in unpredictable and uncontrollable ways, emerging from factors in situations and overall systems change, all of which require and respond to adaptive management rather than a top-down, evidence-based, fidelity-driven approach to either implementation or evaluation.

Reminders, which we hardly need, but which are part of conceptualizing our discussion of sudden change and massive uncertainty: 9/11, the Rwanda genocide, the SARS epidemic. When SARS hit Toronto, I happened to be working in Canada at the University of Toronto. There were ultimately about 40 people who died of SARS, and the economy of Toronto took a 25 percent hit from which it took two years to recover. The Wolfowitz scandal and resignation from the World Bank: I presume that was not an expected event. The global financial meltdown, which we are talking about today, the H1N1 virus, natural disasters like tsunamis and earthquakes. And closer to my own home, some of you will recall that on August 1, 2007, the bridge that was the main artery running through Minneapolis suddenly collapsed at five o'clock in the afternoon, the main freeway that was the link not only for the Twin Cities but for the entire state of Minnesota, and indeed the entire region. Ten days before the bridge collapsed, I was part of a group kayaking on the Mississippi River. We put our kayaks in underneath this bridge, hiked up the bank to a coffee shop, and came back down; we were cleaning up the river along that section. So when the bridge collapsed, I can assure you that it fell on clean ground. There was no trash to interfere with them later, but this has completely remade the transportation system in Minnesota, with huge reverberations that are still going on, an unexpected and uncertain event.

Evaluation's traditional comfort zone has been SMART goals, controlled interventions, and definitive findings: traditional social science methods rendering major judgments. The emergent realities outside of our comfort zone that we are here to talk about are where uncertainty rules, where control is an illusion, and where complexity is the norm. Part of the issue is how to know what that territory is and what its implications are.

Many of you, I suspect most, are familiar with Nassim Taleb's important book, The Black Swan[3], in which he argues that the kinds of events that I just went through, highly uncertain and unpredictable with big implications, are actually much more common and much more dominant than people acknowledge.
Indeed, one of the extraordinary things about his 2007 book is that he predicts in great detail the global financial crisis, regularly described by economists and financial managers and gurus as an outlier event, and the reasons that it would occur. He argued that the major reason the crisis would occur was that the entire economics and financial world was treating its likelihood as an outlier, outside of their probability estimates. He argues that black swans are common, they are definitive, and they are what control the world, not our normal activity. What goes on between black swan events is actually a temporary adjustment to the last black swan event.

Evaluation and strategy

Let me introduce into this discussion Henry Mintzberg's work on strategy. Mintzberg is one of the major writers on strategic management. He is at McGill University. The Wall Street Journal has identified him as one of the 10 most influential management consultants of the last 30 years. He came out with a book in 2007 called Tracking Strategies[4], which is actually an evaluation book, although Henry did not recognize it as such until I met with him and told him that was what it was. That book has 13 case studies of major multinational private sector organizations, and government and NGO organizations, that he has tracked over 20 to 30 years, looking at what has happened with their strategies.

The picture that emerges from Mintzberg's work is that any organization begins with an intended strategy, in a proposal or in a strategic plan about what they want to accomplish, and then they go into implementation; the implementation of that he calls deliberate strategy. But every organization ends up having a part of that strategy that is unrealized, and then, as they implement, there are new emergent strategies that end up as realized strategy.

So what he is saying is that high-performing organizations, in a five-year period, will begin expecting to go somewhere, and a part of that they will realize, but they will inevitably leave some things behind, and some new things will emerge, and where they end up in five years will not be where they thought they were going to be five years earlier. That is normality. That is also complexity. Now the implication of this is huge for evaluation, because our classic accountability model is to evaluate programs and projects on whether or not they ended up where they thought they were going to be five years earlier, and Mintzberg's work says no effective organization does that.

3. Nassim Nicholas Taleb, The Black Swan: The Impact of the Highly Improbable, 2nd Ed. New York: Random House, 2010.
4. Henry Mintzberg, Tracking Strategies: Toward a General Theory. Oxford: Oxford University Press, 2007.

The challenges, then, are situation recognition and appropriate evaluation designs, and I am going to take you very quickly through a definition of complexity, and then Tom and others will tell you what to do about it. So I am just going to try to help define the territory of what it is that we are talking about. The context for this is research that is going on about expertise, the nature of expertise, artificial intelligence work around trying to model expertise, and what that research finds is that expertise does not consist of answers to things. Expertise is actually defined as situation recognition. What great experts bring is a knack for being able to understand what situation they are in, and the answers and responses flow from situation recognition.
So we are talking about a contingency-based form of evaluation that is based on situation recognition, context sensitivity, clarity about who this is for, clarity about what it is for, matching methods to the situation, while maintaining criteria of credibility, meaningfulness, and timeliness.

To look at this through a complexity lens means that we are dealing with non-linearity, we are dealing with emergence, we are dealing with dynamical interactions, we are dealing with uncertainty, we are dealing with adaptation. What is this complex territory? Let me distinguish between simple, complicated, and complex, and I have got a full paper in your packet that goes into this in more detail [see Annex]. It is also in a chapter in the Utilization-Focused Evaluation book, and it is the basis of a new book I have coming out in June that is entirely devoted to complexity and evaluation. It is built around these distinctions, which I am going to run through very quickly.

We use a two-dimensional matrix that my colleague Brenda Zimmerman developed based on the work of Ralph Stacey out of organizational development. On the lower dimension is a continuum of how much we know about things, how to produce a desired result: a degree-of-certainty dimension. The vertical dimension is how much there is agreement on what to do and whether to do it. What we have here is a combination of these two dimensions that gives us a matrix of the interactions between degree of certainty and degree of agreement that defines different kinds of situations.

Where there is a higher degree of certainty that we can produce an outcome and a higher degree of agreement, that is called simple space. This is a descriptive term, not a pejorative term. It is not simplistic, it is simple. It means we know what to do; this is the realm of best practices, this is the appropriate realm of randomized controlled trials, this is the only place where that actually works, where you can do best practices. It is the realm of vaccines, it is the realm of polio eradication. The world has decided they want to eradicate polio, there is agreement about that, we actually know how to do it, and we are on the verge of doing it because it is in simple space.

Technically complicated things are things that have lots and lots of parts that you have to fit together and that require lots of features. Launching the space shuttle is technically complicated. Socially complicated things have lots of people involved, and the congressional analysis of the cause of the Space Shuttle disasters was partly technical, the O-ring and the foam, but was largely social: the culture of NASA, the interaction between the political people and the technical people. Socially complicated things are human rights agreements, environmental initiatives, and the global financial situation, and socially complicated situations pose a challenge of coordinating many players.

So we finally get to the zone of complexity. Complexity is characterized by high degrees of uncertainty, where we do not actually know what to do, and high degrees of disagreement about what the situation is, what ought to be done, and the politics of the situation.
The farthest outside is chaos, which is best to avoid but which sometimes inflicts itself upon us, and the description of the eight days after Lehman Brothers failed, if you read it in The New Yorker magazine[5], the 24/7 bringing together of the world's financial leaders and the world's bankers, is the best description of utter chaos that I have ever read. Absolutely nobody had any idea what was going on, and everyone was scared to death.

This framework is being used by David Snowden, the former Director of Knowledge Management at IBM, who now directs a major consulting business called Cognitive Edge, and who wrote a very widely disseminated article in the Harvard Business Review in November of 2007[6] about applying complexity to management, and given we are at the [International Finance Corporation] IFC and the World Bank, it is helpful to have a business kind of framing for this, which is why I am drawing upon Henry Mintzberg and people like David Snowden. Snowden's conclusion is that wise executives tailor their approach to fit the complexity of the circumstances they face, and what he is doing these days is training companies in how to deal with complexity, mainly through real-time kinds of evaluations. That is his approach.

Contingency-based developmental evaluation

This brings us into a contingency-based developmental evaluation, applying these kinds of complexity concepts, matching the evaluation process and design to the nature of the situation to achieve intended use by intended users. It is a contingency-based approach that goes beyond summative and formative, beyond static accountability models, to real-time, prospective, emergent action evaluation, adaptive evaluation, what I am calling developmental evaluation, as opposed to development evaluation, in the paper that is a part of my presentation. You will see that I make two distinctions: that not all real-time evaluation is complexity-adaptive, and that not all developmental evaluation is development evaluation. I have identified five issues that are not unlike the issues that Tom Ling is going to take you through.

Where I would leave you, based upon identifying and defining the realities of the world of complexity that we are going to be talking about, is the mantra for our time and for today: that wise evaluators tailor their approach to fit the complexity of the circumstances they face. Thank you.

5. James B. Stewart, A Reporter At Large, "Eight Days," The New Yorker, September 21, 2009, p. 59.
6. David J. Snowden and Mary E. Boone, "A Leader's Framework for Decision Making," Harvard Business Review, November 1, 2007.

EVALUATING IN UNCERTAIN ENVIRONMENTS: PROSPECTIVE EVALUATION AND SCENARIO BUILDING

Tom Ling, Head of Evaluation and Audit, RAND Europe

This paper builds very closely on Michael Patton's. But it does come from rather a different set of concerns and anxieties. The first is the experience of conducting real-time evaluation for the Department of Health, the European Commission, and others over the last five or six years, and realizing that some of the most important things that come out of it are connected to the learning that took place and the changes that took place during the life of the project, and how important it is for an evaluation to track and learn the lessons from the changes that took place. Evaluating whether or not the original objectives were achieved can sometimes be less revealing than evaluating how and why the delivery was adapted to meet changing circumstances.
The second is a longstanding interest in scenario thinking and in thinking about whether, when you are faced with the kinds of uncertainties that Michael has been talking about, you can construct potential scenarios, plausible images of the future, in which you can test your strategies. At the back of my mind is the thought that there must be a way of linking that approach to real-time evaluation or ex ante evaluation.

A further thought is "What should the role of the evaluator be in this process?" We are used to thinking that evaluations should be both summative and formative, but when and how should the evaluation itself become a driver of change? For example, the work we do for the European Commission on Impact Assessment will typically ask us to construct three different approaches and test their effectiveness in the future. This typically involves presenting the European Commission's preferred approach, a "do nothing" approach, and then the radical or extreme approach. We are then required to say which of these three options is the best. Many have anxieties about how this approach to impact assessment is constructed. However, my point is not to question the details of this approach but to ask whether we should adopt a radically different approach to such ex ante evaluations. The thing that always occurs to me is that I would like to take the preferred approach and see how robustly it holds up in different plausible futures, as opposed to taking different approaches and seeing how they thrive in exactly the same future. This paper plays to that issue as well.

The third thing that has influenced this paper is working with an organization called INTEVAL (the International Research Group on Evaluation), where for many years we've been arguing that we need to move evaluation as a discipline away from major studies, typically at the end of projects, towards streams of evaluative learning, where evaluation is wrapped into ongoing events in a way that can support effective learning and contribute to accountability. In this approach the evaluator is more immersed in the process of learning and improvement (but carrying the risk that they may lose their impartiality).

Then the final factor influencing this presentation is, of course, that we undoubtedly are in turbulent times. Not everything will be turbulent, but many of the things that we as evaluators are trying to engage with will be more turbulent than heretofore. The important point here is that many of our evaluation frameworks help us to examine the costs and benefits of incremental changes, and to compare one standardized intervention against another. In contrast, evaluating complex and dynamic interventions requires us to look past the overt features of intervention and context and try to understand the deeper, more systemic processes at work.

So this paper builds very much on Michael's substantial shoulders, but it does suggest one particular way forward: exploring the relationship between scenario thinking and real-time evaluation. It is not a solution to all the anxieties raised in this introduction, but it provides a pragmatic way of addressing at least some of them.
The case for introducing scenario thinking to real-time evaluation

I am going to look at three dimensions of the problem: deep uncertainty; evolving preferences and perceptions of utility that change during the lifetime of the project; and scenario planning. Mintzberg's questioning of old-style strategic planning, discussed earlier this morning, speaks to a world where projects and programs are purposive and include forward thinking and preparation, but evolve and adapt as practitioners learn and the world changes. In passing, we should also note that this makes identifying a single counterfactual even more complicated in this situation.

So with deep uncertainty, evolving preferences, and the absorption of strategic planning into learning organizations, there is a great need for a new approach to evaluation. Traditional monitoring and evaluation frameworks struggle to deal with projects that adapt or radically change their planned activities in order to achieve their original objectives. In other words, they may keep the same objectives, but they actually change how they are going to achieve them. Secondly, many programs and projects quite rightly respond to unexpected changes in their boundary partners, whom they may influence but cannot control. (Boundary partners are those organizations and groups who are a necessary part of the chain of causality linking the project to intended outcomes but who are not controlled by the project.) These organizations, whose behavior is crucial to the successful delivery of the project, may react in ways that were not anticipated, and, if so, the project may justifiably feel the need to adapt to these behaviors. And, thirdly, they seek to maximize utility: for example, you may have a program to reduce infectious diseases but divert resources to meet new needs resulting from natural disasters or civil war. Should the program manager be punished for that by your evaluators or rewarded for showing flexibility and initiative?

Furthermore, most interventions are, in practice, self-limiting, and delivering continued long-term benefits requires multifaceted and evolving strategies (or sunset clauses and exit strategies). In turn, this requires non-linear, complex, and emerging evaluation strategies. Since most evaluations don't do this, most evaluation information is weak and fails to convincingly deal with attribution or accountability.

That's easy to say. It is a bit like saying to the caterpillar with arthritis, "I've got the solution for you, my friend: you need to become a worm." And the caterpillar says, "Great, how do I go about doing it?" And you say, "Hey, I do the strategic thinking around here – your job is just implementation." It is quite difficult to actually absorb the lessons we are learning as practical evaluators. The difficulties are numerous, but non-linear evaluations can become simply arbitrary and as shifting as the thing they evaluate. In other words, you may not really say a great deal; you just track a lot of changes taking place and finish up with a final report that might be summarized as "a lot of things happened, nothing much worked as intended (but some benefits were delivered), and there are no transferable lessons." We need to identify a set of agreed methodologies instead of reinventing real-time evaluation every time. So we might think about real-time evaluation as a cycle of learning and accountability in the face of uncertain futures.
Instead of the classic evaluation questions (what were your objectives, and were they achieved?) we can ask a number of key questions about the capacity and skills demonstrated in dealing with complexity and change. We can ask periodically not only what has been done but also what is being learned. We can ask how the project equips itself to deal with uncertain futures. We might ask: have you got robust ideas that hold up in different futures? Have you identified the different risks that exist in the areas? And there are risks. Have you got the skills that you might need to deal with those different areas? Are you monitoring the right elements in your environment? Are you identifying the key boundary partners that you need to influence in order to deliver on the program?

There is a related set of questions concerning how decision-making is devolved to those who have the best information and greatest capacity to exercise effective judgments. Is the program sufficiently adaptable? Have you got that capacity to adapt? Have you got recognition of and responsiveness to environmental signals in your world? How are your incentives working to avoid a program carrying on doing the same thing long after it has become sub-optimal?

In this context, evaluation becomes locked into a cycle of learning, supporting decision making, and demonstrating to others the reasons supporting the changes made. Evaluation material might then begin to look very different from the ex post evaluations we are used to seeing. It may take the form of annotated learning logs, for example. It may not even be done by professional evaluators.

What is to be done?

For all the growth in evaluation in the past twenty years, it is not obvious either that organizations are better at learning or that we feel more able to hold organizations to account. In the changing world we are describing today, what would happen if we thought less about evaluation reports and more about a stream of evaluation products? Or, even more heretically, if we focused on evaluation activities rather than professional evaluators? The purpose would be to support well-founded judgments in the face of a changing world, and to apply lessons learned. It would also be to demonstrate to those holding them to account that this had been achieved. I am suggesting the production of evaluation products at key stages in the process of scenario-based learning. How has the project taken stock of its current situation, how has it identified the range of likely futures, how has it adjusted its understanding of the risks it faces, is it still influencing the things in its environment? With this approach you can finish up with an evaluation which rewards and explores and interrogates the capacity to act and respond at least as much as it addresses the extent to which you have achieved your initial objectives.

This would be one way – I do not at all want to argue it is the only way – of doing real-time evaluation in the face of uncertainty. It supports a creative response to the reality that there are multiple plausible futures that we face. It would, I believe, support good, helpful interim evaluations of progress that would be relevant both to the projects and to the wider community they serve. It can also be a way of including your boundary partners. It could help to build consensus about what those future challenges are.
It can develop your ex ante evaluations of capacities. It can provide an inclusive and supportive evaluation, and it looks at what you have achieved and how you might adapt, and evaluatees are not penalized for an inability to predict, but they are penalized for a failure to learn and adapt.

However, if you are going down that road, it does seem to me there are significant issues and problems. Some of those would resonate, I think, with IEG. When do you simply become implicated in strategic management? What do we really think about evaluations becoming agents of change? It is the independence of evaluations, the dispassionate voice, which is in danger of getting lost. You would get very rich narratives, but you might lose accountability for performance against agreed standards. At the end of the day, public money, charitable money, or private money has been put into achieving public objectives, and people are entitled to ask whether those objectives were delivered on or not.

So just to recap, I think that the things that Michael has identified are significant changes for evaluators, and we are looking for different ways of reacting to that world. I think that the conceptual framework that Michael offers is very valuable and useful. I have identified one, I think radical, way of working within that to try to build into our real-time evaluation something that takes at its heart the idea of the uncertain, the complex, and the need for adaptive, responsive but accountable organizations.

DISCUSSION

Daniela Gressani

One question that I have, and I think it is really for both speakers, perhaps more for Tom, is the question of risk. When we are looking at things in the middle, in real time, one of the things that we need to take into account is the possibility that things would not work as planned. I think that the testing of facts against alternative scenarios is part of this thinking, but how do we choose the right risks? How do we identify the downsides that we need to take into account when we, in fact, construct this scenario, or more generally when we ask the questions that we need to ask? I mean, part of real time is that we do not know how things are going to work out before the full implementation of what we evaluate.

Hans-Martin Boehmer, Manager, Communications, Learning and Strategy, IEG

These were two very interesting presentations. Before I came to IEG, I was the head of Corporate Strategy for the Bank, and we actually invited Mintzberg for a seminar with our Board members. He basically gave exactly the slide that you just showed, and the response from the Board members was, excuse my language, but the Marion Barry incident was still fresh: this is how management wiggles out of accountability, by saying, "Don't measure us against our articulated strategy, measure us against something else; things change."

So the Board did not buy it, and in part the Board did not buy it because public accountability is seen as a big thing, and the Bank operates in a realm that is socially complex, where quite often what you actually do about the development problem is not necessarily agreed on, and the Bank is seen as a rather contentious organization. I have a hard time figuring out what this means for independent evaluation. If you have some reflections on that, I would be very appreciative.

Gail Richardson, Lead Operations Officer, Europe and Central Asia Region, World Bank

You have kind of thrown my world into a different sphere, so I appreciate that.
It is a very compelling presentation. I have two thoughts that came to mind, and one was this fundamental challenge that we already face in country capacity. We are telling countries the Bank does not actually do the evaluations; we give them the technical support and the resources to have that be done. So we have had this paradigm where we set the strategy and identify indicators, and now we are saying, yes, but we also have to be able to operate in this fluid environment, which is very real and very true. It throws the IEG evaluation of that original strategy into question in terms of its relevance, if we are not supposed to be where we said we were going to be anyway.

The other thought that comes to mind, in addition to the challenge of country capacity, is that one of the drivers for change that I see as part of this complex environment is the demand by consumers and beneficiaries to provide feedback, and the mechanisms to do that. We have got the ability to get data through cell phones now, and much better channels of communication through email and faxes and so on. So it used to be, well, you go out mid-term and get that information, but now we are saying do not just accept that; have it be a more dynamic process.

Nidhi Khattri, Senior Evaluation Officer, IEG

My question is around this whole issue of mid-term or prospective evaluation. When projects do change, or public programs do change, do you have a set of questions you have actually used in assessing whether that strategy ought to have changed? In other words, not getting into any scenario planning or the actual content of the work itself, but some guiding questions as to whether strategy should have changed, on what basis it should have changed, and so forth. So it takes us a level higher than actually getting involved in the management of the issue.

The second question relates to whether in your own work you have come across programs or projects that have in fact changed rapidly, because public agencies take a long time to change and to deploy resources away from one set of options into something radically different; it takes a long time, and it is very difficult to do. So how do you judge that whole process? Thank you.

Stephen Pirozzi, Senior Evaluation Officer, IEG

I want to echo my colleagues: thank you very much for your presentations. I have a quick question about project-level evaluation. If in the beginning we have a set of criteria or expectations or benchmarks for a project, and five years later we realize that everything has shifted or changed due to unforeseen events, does it then become an iterative process with the transaction team or management to reset those benchmarks? If so, does that compromise independence? How do those get reset so we can properly evaluate a project?

Michael Quinn Patton

Well, you have raised a lot of stuff, and all of it very important. Part of this is about accountability and independence, so a couple of broad brush things. The notion that what gets measured gets done is the basis of a lot of performance results and performance management, the very kinds of things you are doing. That does make targets rigid, and it focuses accountability on where you ended up and where you wanted to end up when you started out.
Let us imagine that instead of making the target a fixed target, what programs are held accountable for is their adaptability and resilience, and their responsibility is to document the basis of that. Where this does become a way of wriggling out of accountability is when programs do not know how to document the changes that they are making, and to document the evidence, in a formal and systematic way, about what they are seeing that leads them to adapt to what is going on.

We are acting like real-time evaluation is some new creation. That is how most businesses run. Businesses run on real-time evaluation. I am amazed that the World Bank and IFC, coming out of a research paradigm, ignore the way that businesses operate. They change constantly based upon customer feedback, based upon what is working and not working. They do not do five-year reports to find out whether or not their new program worked. They get real-time customer feedback and adapt. They evaluate whether there was a sound empirical basis for making the adjustments, and so a part of the way you maintain your independence, and it is an important independent function, is to look at the paper trail, and the logic, and the data that inform those decisions. Are people just shooting from the hip, or in fact are they reasonably tracking what is going on, getting feedback, and making adjustments on the basis of what is happening, so there is a rationale for adjustments and it is not just willy-nilly? You, independently, can look at the basis for those adjustments and determine their reasonableness.

You cannot do it against a counterfactual. Tom and I may well disagree on this, but I think the whole notion of the counterfactual becomes irrelevant under conditions of complexity. There are a million counterfactuals, so there cannot be a counterfactual. That is a mechanistic kind of thinking. What you end up doing, in classic Herbert Simon terms, is a satisficing judgment. Were the adaptations made reasonable given the nature of the changes? Was there a rationale? And did people themselves readjust their targets, and do so on some reasonable basis? Can you track the path? A complexity-based evaluation is a map of decisions and alterations that shows you, à la Mintzberg, where you ended up and why, and a judgment about the soundness of those decisions.

One final quick comment. This stuff is one of the top-down bottom-up tensions I was describing. I introduced into utilization-focused evaluation a new form of use that seemed to me to have become dominant, driven by accountability concerns. It is what I call mechanistic use. Mechanistic use is the effort by policymakers to remove judgment from the system by creating artificial rules of action, like "three strikes and you're out." Like if you reach a certain test score in a school, the school goes on probation. No discussion of what that means, no discussion of context. Remove it from judges making judgments, put it in the law: you do certain things, certain results happen.

Now the evidence is that prosecutors know how to game "three strikes and you're out" and are doing it. The schools know how to game No Child Left Behind. This mechanistic kind of accountability, with policymakers setting artificial targets and then holding people accountable for them, is a direction that is a reflection of mistrust in our political economy.
It is very dangerous, it is very destructive, it is the opposite of adaptivity. Complexity requires judgment. It requires a fair judgment. It requires looking at the satisficing kinds of real-world conditions that go on. And therefore independence helps look at that and make judgments about that, but it will not be a mechanistic, performance-based, number-based judgment. It is going to require independent auditors and evaluators to actually own their judgments, and the criteria for judgments, which are the reasonableness of adaptability as people respond to complexity.

Tom Ling

The counterfactual issue is that if you lose the counterfactual, you lose one arm of traditional evaluation, and you then need to think about how you compensate for that. I agree, in complexity you have got an infinite number of counterfactuals, and so you need to think about how you manage that. To understand changes in strategy, and as strategies evolve, one of the devices that I've used fairly successfully is a project diary, in which the project managers are required to maintain a six-monthly diary that identifies key changes to strategy and the reasons why they were made, and that has been a very effective tool, I have found.

A small example would be an effort to improve the treatment of people who self-harm, particularly in accident and emergency departments. That project started off with one theory of change, and it was significantly transformed by the findings that emerged, but also by the fact that it involved users within the project itself, and it produced outcomes we had not anticipated that were really very interesting. So tracking those changes through the project diaries was one way in which we at least had some kind of written document that we could then point to and support our conclusion, which was that they did react and respond very effectively to new information as it became available.

I think also there is the question about risk and uncertainty and how we deal with the problem of risk, which is extremely important. I would make a distinction. Risk is a calculable thing, particularly in Anglo-Saxon approaches to risk: it is the chance of something happening multiplied by the impact that it would have, both of which are broadly quantifiable, or scalable if not quantifiable. A lot of what we have been talking about is uncertainty, which is not quantifiable or scalable in that way. The types of risks that you can address would be things like random behavior: try to model out what would happen with different forms of random behavior. Or if you have got inadequate information, which means that what you are doing is risky, you have got ways of managing that by collecting better information or analyzing the data you have got more effectively. But you still have got deeper uncertainties, which is really what we have been talking about, where conventional risk analysis will not actually help you to develop your strategy.

I would sharply demarcate cases where they are dealing with risks that they failed to identify but should have done, from uncertainty that was either accommodated or responded to but which they could not have predicted. There is a danger here of becoming very like Rumsfeld, but there were risks, and there were uncertainties, which they should have acknowledged as part of their program. So, separate out uncertainty from risk.

And then there was the issue of the counterfactual.
By and large, what I have tried to do is to evolve contribution stories using John Mayne's approach. My first act is to ask, why do you think what you are doing is going to make a difference, transforming that into a theory of change that is testable, and then trying to develop data around that. And the aim of the evaluation is not to get at certainty of effect; it is to reduce the uncertainty that the project manager and those holding them to account have. So there is a core of uncertainty where you can, by a series of evaluative activities, reduce the level of uncertainty about the effectiveness of the project or the program over time, but that core never reaches certainty. You are aiming to narrow down and reduce the areas of uncertainty and be quite explicit about what is still uncertain in the evaluation, which means judgment comes in. Thank you.

Session 2: Real-Time and Prospective Evaluation in Practice

REAL-TIME AND PROSPECTIVE EVALUATION IN PRACTICE: THE EXPERIENCE OF THE U.S. GOVERNMENT ACCOUNTABILITY OFFICE

Stephanie Shipman, Assistant Director, U.S. Government Accountability Office

Creating a new program or policy – like any change – involves risks and opportunities. Forging a new approach, creating new rules and procedures, and altering relationships between individuals and agencies create opportunities to fix problems with the old way of doing things, but also uncertainty about future success. Evaluation-based program planning provides an opportunity to improve the chances of program success by incorporating (1) program features associated with success in the past, and (2) oversight mechanisms to provide timely corrective feedback on program performance. A systematic approach to these tasks helps the program manager minimize risk by ensuring a balanced, comprehensive analysis of the new program or policy that identifies unmet assumptions, builds upon existing evidence, and anticipates and counters threats to program success.

The U.S. Government Accountability Office (GAO) is charged with providing objective information to assist congressional decision making. GAO conducts a wide array of studies of programs and policies, both prospective and retrospective. Today I will briefly describe our experience with two types of studies that directly aim to assist program and policy design: prospective evaluation and real-time evaluation.

The Prospective Evaluation Synthesis

Developed at the GAO in the 1980s, the prospective evaluation synthesis is a systematic method for assessing the likely success of a proposal by comparing a new program or policy's features and assumptions to existing evidence on similar approaches. It is intended for use when a new program or alternative approach is being considered; the most effective approach is not known; but similar approaches have been tried (and tested) in the past.

The method begins with an analysis of the proposal which articulates:

1. the nature of the problem the program is designed to address;
2. a conceptual "logic" model of the mechanisms by which program activities are expected to "fix" the problem; and
3. an operational model of what resources are required or assumed to be available.

After assessing the proposal's assumptions and internal consistency, data are collected, reviewed, and synthesized to assess the quality and extent of evidence supporting the proposal.
GAO's Assessment of Teenage Pregnancy Program Proposals

In the 1980s, births to unmarried teenagers were rising alongside concerns about the associated negative social and economic consequences for these teenagers and their children. In the absence of a federal program specifically targeted to this problem, several legislative proposals aimed to create new programs to prevent teenage pregnancy or its economic disadvantages for young parents. GAO was asked to provide information on:

1. the extent of the problem;
2. the effectiveness of programs for preventing teenage pregnancy and for providing related services to pregnant and parenting teenagers; and
3. the implications of this information for structuring new legislation.[7]

To provide structure to the analysis, GAO selected two maximally different legislative proposals from among a dozen being considered. Then, for each proposal, we categorized the strategies they took, including the types of services, locations, and populations they targeted. We then described each proposal with conceptual models that articulated the mechanisms by which program activities were expected to result in desired outcomes, and operational models that depicted the specified organizational arrangements.

To assess the promise of these conceptual and operational models, GAO reviewed research on the size and scope of the issue (to estimate the population eligible for each program), summaries of research on the antecedents and consequences of the problem (to compare to the conceptual models), and evaluations of similar service projects conducted at the state or local level. Evaluation studies were first screened for research quality, and then their results were summarized by program strategy and type of service for each desired health, education, and income-related outcome. To assess the operational models, we also reviewed the evaluation literature and a previous survey of program administrators to identify challenges to and solutions for operating these types of projects.

Lessons Learned

As you might imagine, we discovered that the success of the prospective evaluation synthesis method is highly dependent on the availability of good quality studies of practices that are similar to the target proposals and have been used with groups similar to the intended population. Although similar teenage pregnancy programs had been evaluated before, flaws in their research designs and lack of data on long-term benefits limited our ability to identify "what works" in reducing the negative consequences of unmarried teenage parenting. Thus, there was little direct "hard" evidence on which proposal's conceptual model – the comprehensive services or simpler approach – was more likely to be successful in achieving the desired outcomes.

On the other hand, evidence on the difficulties in implementing prior programs suggested one should keep the program's administrative procedures fairly simple. Here, a lack of evidence on effectiveness did, nevertheless, clearly lead to a policy recommendation.

7. GAO, "Teenage Pregnancy: 500,000 Births per Year but Few Tested Programs," GAO/PEMD-86-16BR, July 1986, p. 7.
Since there was no evidence that the more complicated comprehensive service model was more effective than the simpler model, there was no support for requiring adoption of the more complex model that would most assuredly be more difficult and expensive to implement.

Finally, the lack of clear evidence of effectiveness suggested that Congress might want to hold off on creating a new national program. Instead, they could consider creating a small demonstration program to carefully evaluate alternative service models in order to learn what works for future dissemination. That is, a small, targeted program with built-in feedback on performance can minimize current risk while also reducing uncertainty of success in the future.

Real-Time Evaluation

After frustrating efforts to evaluate the effectiveness of programs only to discover that they had not actually been carried out as designed, program evaluators now expect program implementation to be evaluated before – or as part of designing – an effectiveness evaluation. An implementation (or “process”) evaluation assesses the extent to which a program is operating as intended, that is, conforming to statutory and regulatory requirements, program design, professional standards, or customer expectations. It may address issues such as the appropriate and efficient use of resources, the quality of products or services, or the extent to which the targeted population is reached. While they could be undertaken at any time, implementation evaluations are typically conducted early on to identify and respond to emerging problems in a timely fashion.

Real-time evaluation in the foreign assistance field has been described as a typically rapid process evaluation of a relatively brief initiative (several months long), intended to provide feedback to guide corrective action.8 When interventions are this brief, it is probably especially important to draw on prior evaluations in program design and obtain rapid feedback.

8. Maurice Herson and John Mitchell, “Real-Time Evaluation: Where Does Its Value Lie?” Humanitarian Exchange Magazine 32 (December 2005), www.odihpn.org/report.asp?ID=2772

GAO’s Real-Time Assessment of Recovery Act Implementation

In early 2009, the American Recovery and Reinvestment Act9 authorized an estimated $787 billion in new federal spending and tax provisions to respond to what is believed to be the Nation’s most serious economic crisis since the Great Depression. The Act has an array of purposes: to create jobs and promote economic recovery; assist those most impacted by the recession; invest in transportation and other infrastructure to provide long-term benefits; and stabilize state and local government budgets. Experience with other large federal spending initiatives has found that risk for fraud and abuse grows when billions of dollars go out quickly, eligibility requirements are established or changed, or new programs are created. Thus, both Congress and the Administration desired to ensure transparency and accountability in the use of those funds to avoid waste, fraud, and abuse.

9. PL 111-5, Feb. 17, 2009.
As one piece of the built-in oversight framework, the Act mandated GAO to, among other things, conduct bimonthly reviews of states’ and localities’ use of Recovery Act funds and the approaches taken to ensure accountability for those funds; assess whether the funds are achieving the stated purposes of the Act; and comment on the estimates of the number of jobs created and retained by recipients of Recovery Act funds. Since March 2009, GAO has been collecting longitudinal data on the actual and planned use of Recovery Act funds in 16 states and D.C., which were selected to represent two-thirds of the U.S. population and two-thirds of the intergovernmental grant funds.10 GAO also collected data on grant making and monitoring activities from six federal agencies overseeing Recovery Act grant programs that have begun disbursing funds to states or have known or potential risks. GAO assessed the reliability of the estimates of jobs created and retained through review of federal guidance and federal and state quality review procedures, and analysis of recipient data submitted to Recovery.gov.

10. GAO, “Recovery Act: Status of States’ and Localities’ Use of Funds and Efforts to Ensure Accountability,” GAO-10-231, Dec. 10, 2009.

Lessons Learned

From the start, GAO’s reports (April 2009) provided valuable nationwide information on the uses and tangible benefits of Recovery Act funds at the state and local levels.11 For example, GAO clarified that much of this state and local spending would not occur until 2010, and a majority of the initial grants went to state Medicaid programs of health care for the poor, elderly, and persons with disabilities. States reported using these funds to maintain Medicaid eligibility and benefit levels and cover increased caseloads due to the recession, as well as to offset state general fund deficits, thereby avoiding layoffs.

11. GAO, “Recovery Act: As Initial Implementation Unfolds in States and Localities, Continued Attention to Accountability Issues Is Essential,” GAO-09-580, April 2009.

The bimonthly reports also provided insight into the interaction of federal and state rules and processes that could – at the least – delay achievement of program benefits. For example, in many states, legislative authorization is needed before the state can receive and/or expend funds or make changes to program rules. In some programs, the twin pressures for accountability and speed created difficulties. For example, by November, one-third of local public housing authorities were not on track to spend funds for capital improvements within the allotted 12 months. This was due, in part, to large grants that led to more, and more complex, projects that required additional design work and clearances; and, in part, to additional federal monitoring of a small number of local authorities with troubled procurement histories.

Early monitoring and reporting can – and did – identify important issues to correct while funding is still being disbursed. (As of late November, three-quarters of the approximately $280 billion for programs administered by states and localities had yet to be paid out.) Some GAO recommendations have already been acted upon.
To respond to states’ lack of funds for their new oversight responsibilities, the Office of Management and Budget (OMB) provided guidance on how to obtain some cost reimbursement while additional funds are sought from Congress. To help states coordinate the various Recovery Act funding streams, OMB now requires federal agencies to notify state recovery coordinators of any awards made in their jurisdiction. To improve the credibility of recipient reports of jobs created or retained, OMB and federal agencies have worked together to improve guidance and conduct outreach, and they have re-examined their quality assurance processes after the first round of recipient reporting.

In particular, GAO recommended modifying and leveraging an existing oversight mechanism – the Single Audit Act – in order to simplify and consolidate some of the separate federal agency oversight requirements. To reduce duplication and fragmentation in federal oversight of state and local execution of numerous federal programs, the Act encourages reliance on periodic consolidated audits of these agencies’ fiscal and program management. However, to ensure timely and efficient feedback on Recovery Act operations, GAO recommended accelerating the audit reporting timelines, applying audit requirements to some small but high-risk programs, and considering lifting these requirements for some low-risk programs. OMB is currently operating such a pilot project in several states.

Finally, this level of scrutiny of an unprecedentedly large, multi-agency initiative requires vast resources. GAO obtained special authorization for temporary hiring that allowed us to field audit teams across the country, in addition to our ongoing work. GAO also worked closely with federal agency Inspectors General, state auditors, and the Recovery Accountability and Transparency Board to share information and audit findings.12 Although GAO has reviewed internal controls in new programs before, the bimonthly reporting cycle has strained the audit agency’s capacity. Bimonthly reporting is highly unusual and burdensome for an audit organization that devotes significant resources to validating data, findings, and conclusions. Nevertheless, this type of comprehensive analysis – which draws on lessons learned over time in the areas of fraud prevention, contract management, and grants accountability – will help control risk and increase the Recovery Act’s chances of success.

12. The Board, including many agency IGs, reviews the processing of contracts and grants, reports quarterly to the President and Congress, and is charged with reporting any potential problems requiring immediate attention.

Program evaluation – unlike research – is primarily conducted as an aid to decision making, and oversight agencies, in particular, aim to help policy makers manage risk and opportunity. Thus, as evaluators we seek to marshal credible evidence on how well programs have been performing and draw inferences about what we can reasonably expect in the future, based on available information. Evaluators need not be forecasters to be able to recommend ways to limit risk and encourage program success.
THE UNITED KINGDOM RESPONSE TO THE CRISIS: EVALUATING IN REAL TIME

Philip Airey, National Audit Office, United Kingdom

The National Audit Office (NAO) is, first and foremost, the United Kingdom government’s financial auditor. We certify the accounts of many public bodies in the UK, but we also have a statutory responsibility to report on the value for money with which resources are used by the UK government. We produce around 60 reports a year. My presentation is about two of those reports, and a few more to come over the next few years, related to the financial crisis.

NAO Studies

We have published two reports so far. The first, on Northern Rock, which was a relatively small mortgage bank based in the north of England, was produced early last year. I think one of the presentations earlier talked about complexity and chaos. This was a case of the British public verging on chaos. Many people thought there was some danger of losing their deposits in this bank when it got into trouble, so they formed an orderly queue outside each branch to withdraw their money.

We published a second report towards the end of last year. This report deals with the program of projects that has been put in place since Northern Rock, and it is a “mapping” report. It is very much a non-evaluative report. It sets out what has happened and why, and positions the NAO for future evaluations over the next few years. For instance, we are conducting a program of work at the moment, looking at a large insurance scheme for one of our major banks. The scheme was being put in place when we did the last report, and now that it is up and running we will examine it. We will also report on the unwinding of the measures as and when share stakes are sold and guarantees lifted. So this is very much a real-time evaluation, a set of real-time evaluations for us.

Why is the NAO interested in this? It is an extremely complex situation, with enormous risks for the UK taxpayer. What are we doing? Well, two of our aims in doing this work are transparency and accountability. Many taxpayers in the UK are unsure of what’s being done and why. Up until these reports, there had been little accountability to the UK Parliament for what was done. So those two reports by the NAO are the beginnings of a process of transparency and accountability.

First we had to define the scope of our work. If you are going to do an evaluation, you have got to think carefully about what you are going to evaluate. The first problem we had when putting together this piece of work was the question: what are we going to look at? We had to be careful here because there was a lot going on. Financial regulation has not in the past been part of the NAO’s audit responsibilities, and we do not examine the conduct of monetary policy. What we did cover, though, was the development and implementation by the UK Treasury and others of a whole series of support schemes. We are not looking at other, wider policy areas. You need to be quite careful, as it’s a huge area and very complex. I am not saying that we will not look at that in the future, but these reports are all about implementation by the Treasury of a series of schemes to deal with the financial crisis.
We also had to have a clear message very early on, and in all our work the first question we ask is: if the government spends money, did it need to spend all of that money in the first place? It’s all very well having a fantastic project, but if you don’t need to do it, well, why bother? We had to have a clear message, especially in this second report, that “do nothing” was not an option. If nothing had been done, then chaos similar to that seen when Northern Rock got into difficulty would surely have ensued. So we had to get that question out of the way, but then comes a question that really does concern us: was all this value for money? £850 billion is a staggering sum of money. The NAO has never looked at a program of projects involving such a huge amount.

The program is made up of a whole series of schemes. Much of it is guarantees and insurance, both across the system and for individual banks, but there is some direct expenditure in the mix, some share purchases, especially in two of our largest banks, and some loans as well, to a whole series of smaller bodies and organizations. So in total there is just over £100 billion in direct net expenditure so far. We have got around £14 billion back, which is a start, and one of the things we will do is keep a scorecard as we go along, the cash out the door and the cash in the door, in our future reports.

One of the first things we do when we launch an evaluation is to ask an organization: what are your aims and objectives for this project? So what were the objectives? Primarily to protect the financial system and protect depositors’ money; those are the first two, and they are not mutually exclusive. The third, ensuring continued lending to creditworthy borrowers, is also about financial stability, perhaps a bit more of a long-term objective. Those top three are absolutely key, and the final objective -- the one that interested us most -- is: if you have got to protect financial stability, what are you doing to protect the taxpayers’ interest?

Findings

So let us start off. Did they get the basics right? Past crises around the world were examined and there were generic solutions, but there was very little time to work out detailed plans. The UK authorities did not have adequate legal powers when faced with the crisis at Northern Rock, and their contingency planning for such an event was not up to date.

Resources: When we looked at the Treasury, about 17 staff were responsible for overseeing financial stability in 2007, so very few staff were available -- very few staff with the skills that would be needed. What we’ve seen is a heavy reliance on external advisors. There’s a plentiful supply of investment banking advisors in London, as you’d expect, and they were brought in very quickly when needed.

Timing: Careful thought was given to the scope of what could and could not be done, from the do-nothing option right up to full nationalization of the banking system. The thinking was always around what would be a proportionate response. What’s happened in the market this week? What would be a proportionate response? So we were satisfied that the taxpayers’ interest had been protected in the sense that the schemes put in place were proportionate.
The details of the schemes were worked out quickly after they had been publicly announced, which helped in producing a direct effect on the markets, and we could see that happening from all the published market numbers. But this was never a simple cause-and-effect relationship.

Performance measures: Another question we always ask when conducting evaluations is: you’re doing this project, have you got a set of performance measures that will tell you when you’ve achieved your objectives? If there aren’t any, we will try and develop a set of measures. At the start, such performance measures were underdeveloped. They are developing some now. However, in looking at this and evaluating it all, we have to bear in mind that this was a crisis situation. Nobody expected this to happen. Ultimately, the UK authorities did a pretty good job, and our reports say that.

Financial stability: Was it maintained? It was. No disorderly failures, no losses of deposits. Commitments are now in place to encourage lending to creditworthy borrowers. In evaluation terms, the success in meeting lending commitments is on our agenda. Financial stability was maintained, but when we tried to put a bit more evaluation into this, when we looked at various market indicators of solvency and of liquidity, it was very difficult to isolate the effects of individual measures. For instance, the Bank of England has reduced interest rates to half a percent and is now buying high quality assets. That also had an effect. When we published this report we said that all the indicators that we could find were looking good; they were all heading in the right direction. We were not actually sure whether it was because of the individual schemes that we were looking at, or whether other things were happening, and whether actions taken in other countries -- in the U.S. in particular -- to deal with the financial crisis were having a direct impact on what was happening in the UK. So that is all very complex, and we have not been able at this stage to cover that in our report. I make no apologies for that; it is just too complex for one of these reports. Perhaps in some future work we will come back to it.

Taxpayer protection: A big question for us. So far, so good, no guarantees have been called. They have had the effect intended and there is fee income coming in, based on market prices. That is one thing we did look at. Were they charging these banks for this support? They are, and it’s based on market rates.

Insurance: Again, the government undertook extensive due diligence before getting into this. There is a question about pricing there, which we are looking at. For various reasons the scheme could not be priced at market rates, but we will explain that in the next report.

Share purchases: These were done after extensive stress testing, so were proportionate in that they only did the minimum needed. We will come back to that when the shares are eventually sold. We will do further reports; again, we will have that scorecard in the background as well.

Lending to various organizations: Over-collateralized and priced at market rates, so taxpayers are protected, and there is a lot of follow-up work for us over the next few years.

Have we been able to fully evaluate? Actually, I do not think we have in practice been able, in the fullest sense, to evaluate this program, and we will not be able to do so until about two or three years’ time, when the measures have wound up.
So far, so good. A bit of real-time evaluation for us, and it is a work in progress.

REAL-TIME EVALUATION IN THE INDEPENDENT EVALUATION GROUP: ASSESSING THE WORLD BANK GROUP’S RESPONSE TO THE GLOBAL CRISIS

Ismail Arslan, Senior Evaluation Officer; Daniel Crabtree, Evaluation Officer; Ali Khadr, Manager; Marvin Taylor-Dormond, Director; and Stoyan Tenev, Chief Evaluation Officer, IEG

The main catalyst for this work goes back to September 2008, and that was the collapse of Lehman Brothers. There was immediately a run on stocks globally. It spread very quickly to the developing world, not just financially, but also economically and socially. We have already seen quite an uptick in unemployment, and poverty levels are increasing. In short, it was quite an ugly scene that emerged, and we are still feeling a lot of the consequences of it.

The World Bank Group’s response to the crisis

What happened here on Pennsylvania Avenue or H Street, where the Bank headquarters are, I think we can categorize as an element of surprise when the crisis hit. There were some warning signs that the crisis was coming, but the Bank Group was initially focused elsewhere. The Bank was looking at the food crisis: food prices had increased rapidly in the previous 12 months, so it was handling that. And IFC was, in the first instance, concerned about ensuring that it had profits coming through in the current year so that it could make further investments. Its capital was rather constrained.

There was at that point a search for lessons and direction. Where do we go with responding to this crisis? This is where IEG comes in as part of a multifaceted story. We reviewed the lessons of previous crises very quickly. We looked back at 20-something crises from the 1980s and 1990s and reported to the Board on those lessons. In December, some new crisis initiatives were launched, covering a number of aspects, such as trade and infrastructure. There were announcements about new lending that would be carried out over the coming years, and some objectives finally crystallized in the spring of 2009. So there was a direction that IEG was helping to influence by looking at the lessons of the past.

These new realities—the doubling of lending, the fast tracking of lending, the possibility of greater impact, but also, on the other hand, the greater risks that come with the additional lending and speedy lending—are highly complex, highly uncertain prospects. That really makes the case for evaluation getting involved early on, on a real-time basis, so that we can promote learning from experience as the crisis response is being implemented.

Results are more important than ever, and resources, of course, are constrained, so there is less ability to carry out self-evaluation, and an independent perspective is also important. And if we wait, it is going to be too late to influence the direction, to provide learning, and to change course if things are not going the right way. Also, evaluation is important, of course, for accountability.

IEG’s evaluation work

Our approach covers the whole Bank Group, a joint effort including IEG-IFC, IEG-World Bank, and IEG-MIGA (Multilateral Investment Guarantee Agency).
It is a phased approach, looking first of all globally at what is happening with the response, and then drilling down into country cases, sequencing the outputs that we will deliver, and updating on a live basis. For example, we finished a report in November, which we submitted to the Committee on Development Effectiveness (CODE) of the Bank’s Board, and then we had an informal briefing with the Board of the Bank in January. The report had data until the end of the third quarter of 2009. The briefing contained data through to the end of 2009. So we used data as close to the current day as we could, and then, because of the need for speed, used some less formal processes for delivering interim products, to brief the Board, to elicit management feedback for internal clearance, and for quality assurance.

There are a few challenges that we have seen already in carrying out this work. First, arguing the case for doing real-time evaluation to the Board and management. The Board and management were accustomed to IEG doing ex-post evaluations. Conducting an evaluation in real time was in many ways precedent-setting. Second, data, given the timing, are of course incomplete, so we do not have all the data on outcomes or impacts. Third, results frameworks are lacking in many cases, and there is a lack of baseline data and monitoring. Fourth, the challenge of balancing speed and quality in a real-time evaluation. Fifth, relations with management, the point that Tom Ling was making before about playing a judicious independent role without getting into the kitchen. And, finally, timing our outputs so that they have the most utility, so that they will be well received.

How have we sought to address some of these challenges? Well, in making the case for real-time evaluation, we made the promise that feedback would be timely, that we would be able to offer learning. References to practice elsewhere, for example in the NAO and the GAO, were very helpful in that they were carrying out that work at that time. Furthermore, the uniqueness and magnitude of this event argued the case for new approaches. Regarding the incomplete data, we cannot deal with this fully given the timing, but we were able to pick off early aspects. So we could consider: how relevant is the response? How well designed is the response? How is it going in the first year, factoring in lessons of past crises, looking at interim indicators, focusing on what is coming out as well as what is going in, on actions and processes, and maintaining frequent contact with operations to get the latest data to be able to update on a live basis. We carried out country visits as soon as we could, with a prioritization on the region that was initially hardest hit by the crisis. To ensure quality, we held regular, high-frequency meetings of the team, but also had a steering committee, which cuts across IEG, to provide guidance to the core team. We kept close engagement with management, with a departure from business as usual in terms of the informal exchanges, and an understanding on the part of management that there was a need to do something a bit differently this time. On timing for impact, we are looking ahead at what is coming, particularly the spring meetings of the World Bank Group: when does the Board want briefings to inform their thinking and their decision making, and listening to what might be useful to management?
Where have we reached so far, and what have we delivered? We did the notes on the lessons of past crises at the end of last year. Within the last 12 months, we have delivered our notes on the first year of the response, and we briefed CODE just a couple of weeks ago. Feedback has been very positive that the work has been relevant and useful. We are victims of our own success at the moment: there is a demand for more, both deeper work and broader work, which poses some challenges, especially over the next couple of months, in being flexible in the use of our resources, paying special attention to interaction with management, and ensuring that we manage expectations. There are going to be some tradeoffs in that we have a relatively small amount of time to do the work, so we have to say no to some aspects.

Findings and insights

At this point let us emphasize that we have passed these on much more as descriptive insights than judgmental insights, because this is an ongoing study, and we have not as yet delivered any kind of formal report. By the spring of 2009, the Bank Group had clearly articulated its objectives regarding the crisis response, but there was not a clear sense of what would constitute success or otherwise. Now, whether it is reasonable or not to expect such clarity is another matter entirely. In terms of implementation, the last fiscal year has been a year of historically high lending, even though it is a modest amount relative to the financing gaps. Of course, the issue of to what extent you catalyze the flow of funds comes up there. There has been a stronger poverty focus in the response operations than had been the case, for example, in the East Asia crisis, but the issue is how to sustain that. And IFC made a quick response in terms of structuring initiatives, but the problem really has been in implementing some of these initiatives.

In terms of initial lessons and results, prior country engagement matters for both speed and quality. This is a time of historically low conditionality in World Bank operations and much more country ownership. So what does that tell us about the importance of results frameworks to structure things and ensure sustainability? The private sector platform, in terms of structuring initiatives, again, is great, but in terms of implementation it has been somewhat feeble in some areas, and that might have meant missed opportunities.

Issues going forward, and these are really things at this point that we identify as wanting to keep on the radar screen: results frameworks, the importance of trying to use those in World Bank operations; the issue of responding and channeling the financial flows to where they are most needed. Again, this is akin to the counterfactuals debate and has parallels with the issue of how long is a piece of string, but leveraging the unique World Bank Group reach, contingent capital arrangements, setting up delivery platforms and protocols ahead of time so you can structure a response in some established framework, or at least ensure that people know the rules. On quality of impact, you have got to worry about the fiscal sustainability of clients, strengthening the poverty focus, supporting growth reforms, delivering on the private sector response and, of course, not forgetting about long-term sustainability issues such as climate change and the environment.
Now, in closing, let us go back to something that Philip Airey said about the definition of crisis in the UK—having people lining up outside banks, and it really is so true that that happens. But you know, even in the stoic UK there is an almost daily event, because it rains almost every day in Britain, that throws off even the most stoic and determined British queuer, and that is the following. At a bus stop, when you are queuing and waiting for the number 28 bus, let us say, and you see a number 28 bus coming, and you think, “Great, here I am number 20 in line, and I’m probably going to get on the bus.” Then you look again and you see another 28 bus, and a third 28 bus, and sometimes even a fourth 28 bus, because of course all of them come at once, and then they don’t come for half an hour. So what happens with that is that immediately the British system of queuing breaks down. You can tell the little old lady next to you is wondering, “Should I make a run for the last bus, or should I try to get on the first one, which is closer?” Of course, the last bus is going to be the emptiest, and so there is a big reward in terms of getting there, but it is further away. The parallels between that and the complexity of what evaluation has to look at are not lost on us. Thank you.

DISCUSSION

Roland Michelitsch, Chief Evaluation Officer, IFC

I have a question for all of the speakers. You mentioned that it is very complex even to have a model of how the interventions are really going to feed through the system and how they interact with each other. So how did you actually address that in setting up the evaluations, where it is very difficult to attribute in complex situations? What is really attributable to a specific intervention? Then, secondly, we talked a little bit about the importance of what your without-project scenario is, and obviously the more dire your without-project scenario is, the better your with-project scenario performs—the financial sector did not collapse, or whatever you use as the without-project case.

For example, I know IFC best because I am from IFC, and when I look at past IEG results, and this is IEG data, not my own data, I see that those projects that were actually approved during crisis situations tended to perform very poorly. If we went in right afterwards, we actually got very good results. So how do you judge in that context the fact that IFC was focusing on the portfolio? Arguably, if you construct a without-project scenario, maybe all of these client companies would have gone under and our maximum probable losses (MPLs) would have shot up and really constrained our ability to do something in the future. And on top of that, if we had just pushed out money really fast, maybe we would have gotten, like we did in the past, really very poor results. So how do you factor that into the messaging that comes out of that? I really liked, in the NAO case, putting that in context while remembering this was a crisis situation; just being able to react very fast and so on means that sometimes you will have to make some trade-offs, speed versus quality.

Hans-Martin Boehmer

I would like to hear about the experiences as to how the real-time evaluations have been used by management to maybe make some modifications or changes in their programs.
I think GAO did mention that there were some views, but it would be useful to know how it gets used by management.

Marvin Taylor-Dormond

I just wanted to hear a little bit more from Stephanie and Phil about the way you are dealing with results. Attribution is truly an important issue, but before attribution comes measuring results. That is a fundamental, a key issue in our case, and it is precisely what has been behind one of the foundations of ex-post evaluation, because we argue that in development, results take time, and that only ex-post evaluation, five years after the project has been disbursed, is the right way to do it. That would argue against real-time evaluation. My argument has been that that is like saying that in a hospital the emergency department should not develop a results framework because that belongs to the long-term care section, which is obviously incorrect. There is a different results framework in the emergency unit from that of the rest of the hospital. The problem here has been that the units have not been used to developing this type of results framework for emergency or crisis situations. So I just wanted to hear a little bit more on that. By the way, I just saw, Phil, that you said that you did not venture much on results, but were you courageous enough that there is a section of your report in which you are determining these results?

Tom Ling

Just very quickly, before coming here I feared, or I thought, that very turbulent times with a high need for government action would lead to much less evaluation. I think what we are seeing is that it is leading to a different form of evaluation, where, with that pressing need to act on the edge of chaos, to use the earlier account, what the evaluators can seek to do is to map what is going on. We heard about that: look at the basis for a future evaluation, look at the timeliness of the response, look at the legality of the ways in which it was being implemented, and begin to develop a framework for the future. I think that earlier Michael and I were talking more about where there was greater uncertainty, but not on the edge of chaos, and that is where emergent approaches to real-time evaluation become quite appropriate.

Probably what we have not talked about is that in those areas where there are high levels of technical certainty and high levels of agreement, there is still a place for traditional, classical evaluation. So it does seem to me that there is an interesting sense of three different approaches that might become appropriate, depending on how far away you are along that axis from that earlier slide. We know that as you get towards chaos, actually even doing scenario planning is not going to help a great deal, but you begin to try to stabilize the future and stabilize the sense of understanding and build evaluation frameworks for the future.

Nidhi Khattri

Coming back to the issue of independence, when you conduct these evaluations and if there are specific recommendations, particularly around implementation or design, is there any thought or any issue in your minds about then recusing oneself from doing evaluations down the line of the same projects or programs? Are there any issues with respect to conflict of interest, because if recommendations get adopted, is that sort of stepping into the kitchen somewhat? So your thoughts on that would be helpful.

Stephanie Shipman

Those are great questions.
Independence from management is handled the way GAO has done it for, I was going to say centuries, but it is really just decades. We do not prescribe the specific management actions that should be taken, but rather recommend that appropriate actions should be taken to fix this problem. So management then has the responsibility of determining what that action is and putting it into place. We carefully specify the nature of the problem, what is lacking—guidance, procedures, review, whatever—and that it is management’s job. That is what we do day in, day out, year after year after year. So we do not get into that problem.

We are reporting at a much more rapid pace than we normally do, and this review of the Recovery Act is a way to allow mid-course corrections, essentially prompting more guidance, better clarification of the requirements, and the like, so that OMB can keep building and rebuilding guidance to make those changes. But that is their responsibility, not ours.

How is it used by management? It is so tailored, it is absolutely tailored, all the reporting and the discussions. Again, a lot of real-time briefings for managers at OMB and in the individual federal agencies about what is going on, which allows that process to keep going. So what you will see in each of those bimonthly reports is a little update on what happened to the recommendations that were made in the last report, and the like.

Structuring with reference to attribution: That is a big one, and then there is the fast action, poor results issue. The agencies already have a variety of processes and procedures in place to do a lot of these efforts. So with the transportation funding, they were encouraged to pick the projects that were top on their list to be funded. We know perfectly well there were projects waiting to be funded that already had the planning and the bids and the proposals, so they were ready to go, shovel ready, that is what that is all about. So you are reducing uncertainty tremendously, and you are able to improve the quality there by not having them try to make things up, because that is when you are creating tremendous risk. We also have encouraged them to use the same performance measures that they were using before, not create new measures just for the Recovery Act. Use the same measures for transportation, for schooling, for school improvement, for hunger assistance, etc., that you were using before, so you are making use of that knowledge base.

Attribution: We are not really dealing with results at this point. On the other hand, there are two ways to address this. One is, when you are as micro as we are getting with detailing the specific use of funds by state agencies and local agencies, you get out of some of those problems. You do not have to make it up. You are actually documenting. There is enough knowledge that states were already planning layoffs and other budget cuts in order to meet the shortfalls of funding from their state resources that, when the federal resources come in and fill that gap, you’re not creating the same attribution problems that you would have in other settings. So part of it is you get around that, and part of it is we have alerted people from the beginning.
They should never have said that they were going to create or save X number of jobs, because there is no way anybody is going to be able to provide a good estimate of that. It was dumb. We said that. What we have tried to do is to provide, improve, or encourage the development of appropriate guidance so that they can make better estimates at the local level of what these particular dollars paid for. Who was employed under those dollars?

Philip Airey

How do we maintain our independence but at the same time engage with management? We are quite clear that we are independent, we report directly to Parliament, but we are always open to informal discussions with the managements of government departments, and indeed we encourage them to approach us informally and chat things through before projects, during projects, and after projects. That works well, and everybody understands the rules. They will not use it against us if we get it wrong and come along and criticize them afterwards. I think it is good that we do that, and we should continue that way as long as we maintain that informal understanding between us.

I heard a question on how our reports have been used by management. We have had some impact. The primary impact was about accountability and transparency. There have been changes, or changes are now taking place, in the way that the Treasury will recruit professional advisors in that sort of situation. Some of the contracting of private investment banks in 2008, particularly, was not ideal, but it did have to be done in a crisis situation. We did recommend some changes there, and some changes are now being brought in. We also made a series of recommendations for the Treasury internally about how they organize projects and how they oversee projects like this. Again, this was a crisis, but their project management techniques are now being put in place on the insurance scheme they are now looking at, although that is a very, very different project from the early crisis response. So we have had some impact, but literally at the margins, because what we think they did was pretty good. So we are acting there at the margins.

Somebody asked about how we judge the success or otherwise of individual interventions. I think we realized pretty quickly that we were looking at a program of what were, at first view, a series of interventions of different types at different stages, but we quickly realized this was a program that was being managed by the Treasury. It was all done in response to market changes and risk in the financial sector, and I think, as I said in the presentation, you could pull out one of those schemes, for instance the guarantee of bank borrowing in the markets, where banks are allowed to borrow privately but with a government guarantee. There are measures of that, you can look at interest rates and other things, and you can say, yes, they are heading in this direction, that is good, bad, or indifferent. But ultimately, there certainly are other things that would have an impact on this. As I said, the monetary policy of the UK government, or interventions by other UK government departments that are not part of this program, and what is happening in other countries and sentiment in global markets [have an impact], and we realized that we could take it only so far.
We could have gone down the route of attempting a complete evaluation. It would probably be too early now, but I think that was way beyond the scope of what we wanted to do at this stage. I am not ruling out that we might do something along those lines at some future date, but I think it is unlikely. I think it is just too complex, and what purpose would it serve? If there were a prospect of being able to look back at what was done, and when it was done, and how it was done, and infer that some schemes were perhaps more effective than others and, therefore, if you get into that situation again, do this first and then do that later if need be, then there might be some value; but that is something that we will keep in our minds as we go along. I think it is a wider issue for when we begin to look at the changes that are now being made to the regulation of banks and to the way that government itself will organize that.

I think somebody asked about the “do-nothing” option and how we actually came to the opinion that doing nothing was not an option. I mean, it actually came down to the position, obviously, of banks in an advanced economy and the impact a major failure would have. I think one thing that struck us was when we looked at the government’s handling of the crisis at the Royal Bank of Scotland (RBS), which quickly ran out of money and had to be supported with a series of loans by the Bank of England. What if the government had allowed RBS to collapse? RBS had assets of, I think, two and a half trillion pounds Sterling, which is way more than the annual GDP of the UK, for a start. It is just gigantic, and it had 20 million customers. It was a counterparty to lots of other banks in the UK and internationally. And then the modeling of a disorderly failure—there is no way of letting it fail in an orderly way, unfortunately—said it would have been catastrophic, ultimately with contagion effects on other banks as well. The UK was looking at a potential breakdown in social order if they had allowed it to happen. So these were some of the most important decisions taken by UK governments since the Second World War, and I am not overplaying that. There really were difficult decisions. As an audit office, we felt that we needed to be quite clear when we were going into this that the government had to do this; we were not allocating blame as to why they had to do it, and we thought about this, but it was something that needed to be done.

Coming back to Tom’s point about our reaction to this: given the circumstances, it was quite a scary moment for us. What do we do? This is way outside our comfort zone. We look at lots of programs around government, in health and defense and all over the place, and most of them do tend to be at that bottom left-hand corner of the complexity graph. I think there are well-established benchmarks by which you can measure procurement of defense equipment, health programs, all that sort of stuff, and we have been developing those for many years. This, when it happened, took us out to the edge of the chaos frontier, and we were not prepared. The Treasury certainly was not prepared. We had a bit more time to think about it, and what we have come up with is certainly a first stab at it, that mapping report, just trying to set up for outreach to backbench members of Parliament, the media, and the taxpayer generally in the UK.
A lot of the time they were just scratching their heads, saying, “You didn’t understand this at all.” Why did they not just let these banks fail? Why were they protected? Why was loads of my money put towards protecting these people? We felt we were really on sort of a mission to explain, to bring a bit of transparency, to bring a bit of accountability to it. Ultimately, as the years go by, we will reach judgments on value for money. The UK government has said publicly that by the time all these schemes are wound down, there will be a return for the taxpayer. We are discussing with them what they mean by return, but we will actually come back around in a year’s time and ask, well, actually, did they get a return, not necessarily a profit but some sort of return? We will evaluate that. So, it is scary stuff, it is big, important stuff, and it is a pleasure to do.

Ali Khadr

Just to pick up on the questions that Roland raised. On attribution: a very good question, I think, which we always grapple with, and particularly in a complex situation it is incredibly challenging to try to attribute cause and effect. I think where we are at the moment is that we can say something directionally about what is happening. For example, where finance is being withdrawn and IFC has come in, we clearly see there was no alternative. Then we can see that IFC has played a role; it has shown some additionality. We look at the facts on market confidence. I mean, that is a directional thing, and the extent to which the financial sector was stabilized. That has been an issue in a number of countries in Europe and Central Asia, and an issue of course in the UK.

On IFC, specifically, you were asking about the findings on past crises and what we had found about which projects have had the most effect. The conclusion I think we have reached is that going right in, immediately when a crisis hits, can have an effect on ongoing operations, and it is not necessarily wise to invest right away; but as soon as you have hit rock bottom, which in this case was very quick, it is relevant and important to invest. In fact, IFC was estimating all sorts of demands for new investments just last December. For example, the potential equity investments in banks were estimated to be about $30 billion. It was a rationale for setting up the new initiatives. Incredible demand was out there. Other international financial institutions have managed to grow their operations in these difficult times. Of course, we will see results down the line, but directionally it seems that there have been some sensible interventions, and that includes IFC.

We are not saying there have not been some successful investments in banks. In Georgia, they have had some very good short-term effects. But what comes out of the analysis is a sense of missed opportunities. The demand has been out there. That was recognized. It was the rationale for going ahead with these initiatives. We see on the ground that first-class clients were needing support from IFC and were not getting it. So that is the overall flavor. In terms of managing the portfolio, it is useful to have a sound portfolio, and that will help IFC in the future. It will help future operations in two or three years, but it is not a response to the crisis, and that is really the fundamental point.
Daniel Crabtree

I would like to come back to one point that was raised, and a point that was addressed also by Stephanie Shipman, on the extent to which one should have precision in the recommendations as to the way ahead, based on evaluation, and the extent to which, in our language, that takes us into the kitchen and into a more management function that equally undermines independence. Stephanie laid it out very nicely: ideally, one would like to diagnose and say this needs to be addressed, and perhaps even outline the prioritized areas that need to be addressed. But I have found that, in reality, it is not that simple, and often one gets a very direct comment from Board members, who, after all, we report to, not to mention management and other stakeholders, that it is all very well identifying areas that need attention, but we also need some wisdom on the relative efficacy of different measures, given what we know. That is a pull that I have found a lot of my colleagues and I struggle with on almost a daily basis: how to be precise, yet without compromising our independence, because if you make a recommendation that is too precise, if you, to put it another way, get into the kitchen, you become compromised, and to what extent can you then evaluate future programs? It is a tough issue and, unfortunately, I do not think I have a response.

Marvin Taylor-Dormond

Every report that we produce contains recommendations, but in this case we have only indicated issues going forward. So we are not recommending anything specifically, very much in line with what Stephanie has mentioned is a normal practice, as a matter of fact, in GAO. We clearly understood that we could not recommend in a real-time evaluation context.

Mark Sundberg, Manager, IEG

We have ranged from the morning presentations, which were abstract and theoretical, to these applied cases, all of which deal with crisis and largely with budget transfers, or budget support loans in Bank parlance. Of course, much of the evaluation work that we do in the World Bank -- and we have three groups represented here, IFC, World Bank, and MIGA -- is on projects, be it infrastructure, with a long duration period, or a national education curriculum, or down to community-level practices. I think the real-time and adaptive evaluation issues are very pertinent here too. The World Bank model of evaluation could—this is too simplistic—hardly be further away from what has been said here, in the sense that we evaluate projects after closure, so it often has been years after they have been initiated, and then even a lag after their closure. And we use objective-based approaches, so you are confining the questions that are asked. If we move towards an adaptive evaluation model, I think it forces us to really get involved with posing evaluation interventions or analyses at the supervision and at the entry stages of projects, to build that into part of the process, which raises questions about independence.

So my question is twofold. One is, how would you characterize designing that across these very different sorts of projects, from national interventions with long lag periods to very local community or small interventions that are perhaps more easily addressed?
You have mentioned evaluations, Michael, where impact evaluation is more pertinent at a granular level in that certainty area; I think you used the word simple. But what about across these sorts of projects that we deal with? And secondly, given that there is a very attenuated causal chain, from the Bank that deals with donors, deals with national governments, down to local-level government, down to local implementing agencies, how do you build that in across these complex areas of evaluation that we deal with?

Keta Ruiz, Senior Operations Officer, IEG

I would like to bring a question pretty much in line with Mark’s question, for a more specific case. I am working on an evaluation of the information and communication technologies sector and the support of the World Bank Group for that sector. This is very sui generis, and I would categorize it as a complex sector because there is a lot of innovation and technological change. The markets are changing and reacting to these technological changes, and there is the role of the public sector, which is also quite different in different countries. So there is a lot of complexity, and one issue is the role of the World Bank Group and how well the World Bank Group has been supporting the client countries in this sector. There are the infrastructural kinds of projects that we support, but then there is this kind of project, or this kind of sector, that is very, very rapidly changing. Michael gave one suggestion of a methodology, benchmarking the private sector, for example. I would want to hear a little more about what methodologies could be used for this kind of innovative, rapidly changing sector.

Ismail Arslan

Actually, I am not going to ask questions; I would like to answer some of the questions raised by Mark and other colleagues, particularly on the World Bank side. What we are doing in this evaluation is looking at large design issues rather than results. For example, some of the infrastructure projects in response to the crisis are designed in such a way that they respond to the impact of the crisis, creating short-term employment, for example in Bangladesh or Ukraine. The other point I would like to make is that this evaluation in the World Bank Group has two stages. In the first phase, we are evaluating the World Bank Group’s response to the crisis. For example, the World Bank is investing heavily in economic and sector work; part of our fieldwork is on the timeliness of the World Bank’s reports, such as country economic memoranda or poverty assessments. The second dimension we are looking at is lending; as my colleagues mentioned, we are working on design issues. Very few loans have closed yet—they are still under implementation. In the second phase we will be looking at more interim results and impacts.

Michael Quinn Patton

Taking on Mark’s question about what would be a more real-time role for the Independent Evaluation Group, let us suppose a program design that is more complexity-based and emergent, and then look at IEG’s role in that, building on Stephanie’s warning, which is regularly ignored by politicians and program designers, about not picking big, hairy, audacious goals that you have no way of meeting, because that is actually more about politics than it is about delivering anything. So those things get set, and then you deal with them.
But one of the recommendations of people dealing with complexity on the management side, the management gurus, the Jim Collins and David Snowden types, Henry Mintzberg, is that when companies begin a strategic effort or project, they should not start by predetermining what the outcomes are, but should recognize that outcomes are one of the outputs of engagement, and that you do not set them before you have engaged. Under conditions of complexity, the goal setting is not done ahead of time; it is done as a part of the engagement, when you know enough.

The International Development Research Centre (IDRC) in Ottawa did what I think is a fascinating and instructive study of their big five- and ten-year programs, about the timing of evaluation feedback and reporting, and the question that they asked of their senior people was: when is the greatest learning and adaptation taking place in programs? Think about that for a moment, some of you who have been on the management side: when is the greatest learning and adaptation? You have done as they do: they spend 18 months to two years coming up with five- and ten-year proposals in big program areas, working with people around the world, and then they begin. What they found is that when the rubber actually hits the road, in the first six months of implementation, good projects change all their parameters, because now all of what they assumed, all of what they questioned—Are the resources there? Are the players there? Can we hire people? Do we have office space? Are the partners really going to engage?—all of those things that were in the assumptions column get real, and they redesign accordingly. That redesign can be well done or badly done, and one of its outcomes is typically a set of indicators closer to the real action. That is a key place for an independent view, because the original program has gone all the way to the Board, been approved by the Board at one time for five years, but in reality it all changes in six months. That is never approved again, hence the need for independent review.

IDRC is quasi-governmental; they are funded primarily by Canada, but they get foundation funds and other kinds of things. They are reviewed by the Treasury Board and by the audit authorities in Canada, and have had to develop procedures that have public accountability around them. So it is a quasi-governmental group.

So the first place where an independent set of eyes actually is needed, and that does not happen, is in the big adjustment period, when things get underway: the reasonableness of those adjustments, the reasonableness of new outcomes, the reasonableness of new program designs. What I was saying earlier, under the notion that what gets measured gets done, is that currently the performance measurement metric is having people preset outcomes and holding them accountable for those. But if we measure resilience and adaptability under conditions of complexity, we will get resilience and adaptability. If we measure conformity to preset but unrealistic and not very meaningful outcomes, in a narrow mechanistic accountability mode, we will get compliance, with the pretense that those outcomes are real and attempts to meet them. So there very much is an important role from an accountability perspective about what that means, and what it dominantly means is in the simple box.
The reaction of evaluators on the whole to complexity is to try to control it, to believe that the way you deal with complexity is to impose more control. Nothing could be more wrong or more damaging. Under a do-no-harm modality, I think evaluators do a lot of harm by imposing fixed designs, by requiring fixed indicators, by holding people accountable for fixed indicators, by actually interfering with adaptability and resilience because of narrow mechanistic accountability frameworks. So we bear responsibility here.

Independence is also rigid in some cases. We impose rigid models that keep people from adapting, and part of independence, then, is assuring the general public and taxpayers that the adaptations that are made are reasonable, that people do have reasons to change their outcomes and to adjust. That greatly cries out for an independent set of eyes making judgments about the reasonableness of it, because, as Tom pointed out, the danger here is that anything goes. I had a foundation president describe complexity to me as a program officer's wet dream, and it is precisely on this issue. So I think what it requires is faster, more flexible, and different rubrics of what independent accountability means in a real-time, unfolding, complex kind of scenario.

Tom Ling

You asked about how to characterize the kinds of evaluations that we are talking about. I just jotted down a few things. There is moving from doing studies to streams of evaluation; from ex post to real time; from objective outcomes to contribution stories; from fixed outcomes to emergent outcomes; from detached evaluators to embedded evaluators; and from proving what happened to understanding what happened. I have certainly done evaluations which have been on the right-hand side of that, the latter side. In each of them I have had to take clients with me through that journey who initially might have been a bit worried about where it might take them. But, on reflection, of course, I have never done that for the European Commission, nor have I done it for the NAO, nor indeed for the World Bank, as it happens, which have this public audit function. There is a certain institutional architecture that places a different set of constraints on an evaluation conducted within that framework than on evaluation more widely. I think there is an issue about how you, how we, think about the arguments that came out earlier this morning and compare them with the efforts by public audit bodies to make sense of an emerging and critical situation. And yet, you know how uncomfortable that might make you feel because of the institutional setting within which you are operating. So it might be quite important for us to think about how we can take the lessons learned and the discussion that we have had, but also understand how that could work within the particular settings of the World Bank, or the NAO, or the GAO, where I believe they have different requirements placed on them than I would as an evaluator working for RAND or a government department.

CONCLUDING REMARKS

Daniela Gressani

This has been very, very interesting. I do not think that I can say thank you fully to both the presenters and the participants, but I certainly have learned a lot, and I think everybody else has learned a lot.
We got a lot of food for thought, which has direct relevance to things we do and therefore is especially valuable. I have been wondering whether I could abuse my privilege here, not to try to summarize the discussion, but to tell you what my three top take-homes will be. I am sure that different people will have different ones, depending on where they sit and what their key priorities are, but I thought it might be worth mentioning.

The first take-home for me would be that real-time evaluation has, of necessity, become the norm for institutions like the World Bank Group, which are large and complex. Michael, I think, referred to the Black Swan phenomenon. I think what it means is that we do need to provide real-time feedback, or real-time learning. We have no choice but to be able to deal with it, get organized, and do our best.

The second take-home for me is that we need to live with risk, with uncertainty, and with interdependencies. So in my mind, that really requires a lot of clarity about the framework within which we are doing our evaluation. Whether we are in the simple corner, or the chaos corner, or somewhere in between, I think it is important for us to be very clear about that as we launch into an evaluation.

The third take-home for me is something that I think everybody has mentioned, directly or indirectly: we cannot just evaluate by objectives. We need, I think Mark used the word, adaptive models of evaluation, something that allows us to avoid mechanistic approaches and that requires that we use good judgment. In order to be able to use good judgment, we need real independence, we need enough resources, and we need something that someone, I am not sure who, referred to as trust: a constructive, engaged relationship with all of our stakeholders and, first of all, Management and the Board, which allows us to communicate directly and constructively and to trust one another, that we mean what we say and we say what we mean.

Clearly, this is a big challenge. I do not think that we at IEG have ready-made answers, ready-made solutions, for delivering on this kind of objective. As I thank everybody here again, I also would like to get a promise from everybody that this is a first engagement, but certainly not the last, and that we will keep learning from one another and keep a very open mind to learn from our own lessons, mistakes, and successes as we go forward.

Marvin Taylor-Dormond

Talking about instant real-time evaluation, I just want to say that, using the nomenclature of the NAO, I really got value for money here. That is my immediate evaluation of what we have done this morning. I think it has been a fascinating discussion, and the two initial presentations beautifully set the stage for what we had in mind, coming from the same context as was developed by both presenters and presenting compelling arguments to move ahead in the area of real-time and prospective evaluation. I think it was really interesting what we heard about prospective evaluation, using either scenarios, as was presented by Tom, or assumption testing, as has been the practice introduced in GAO. There are some ideas that we will have to explore more in the future, and I very much agree with Daniela that this is just the initiation of this conversation.
I am sure that we will meet again and try to compare notes with what we have been doing. I am really comforted by hearing that what we have done in impact evaluation seems simple, but it was a huge change here, and the way we have done things in IEG is a huge mental model change for everyone, for the Board, for management, and for our own team. It was not easy to start navigating in that direction, but I am comforted by what you have said. It was not an option in practical terms—we had to do something in the midst of this gigantic crisis. But it is clear that it is not an option from the conceptual point of view, either. So there are very strong conceptual arguments as to why we should continue embarking in this direction. 47 H I G H I MPA C T E V A L U A T I O N s Keynote Address INTRODUCTION Patrick G. Grasso, Management and Evaluation Consultant Our keynote speaker is Mr. Michael Quinn Patton. Michael spoke earlier today, of course, but let me give you just a little bit of background. Michael was on the Social Sci- ence Faculty at the University of Minnesota for some 18 years. For five years he served as Director of the Minnesota Center for Social Research. Most of us who have been active in the evaluation business for a long time have known Michael very well, in many capacities, one of which was president of the American Evaluation Association and, as the author of quite a number of books, and coauthor of many other books and articles. He is probably one of the most widely-read and recognizable names in the evaluation business. I very rarely see any major book on the topic of evaluation in which Michael Quinn Patton’s name is not somewhere in the references. So, it is with a great deal of appreciation for his coming to visit with us today, and a great deal of anticipation at his comments, that I would like to welcome Michael to please come up and give us our keynote speech. COMPLEXITY THEORY AND EVALUATION Michael Quinn Patton I think this is a tremendously important meeting and discussion to bring together peo- ple both within IEG and some of the other parts of the Bank, and the people who have been resources from outside in other organizations that are struggling with these issues of how to adapt evaluation to both our changing times and our changing understand- ings of our times, and that is really where I would like to begin, with how we under- stand what is going on and the importance of spending time on that. For the distinguished folks who have joined us at lunch, let me quickly review what we have been doing today in talking about real-time and prospective evaluation. Basi- cally, we have been looking at the implications of things like the global financial cri- sis for engaging in evaluation sooner rather than later, which is being called real-time evaluation, getting feedback about how these interventions under conditions of crisis are actually happening, and doing that in a way that provides both some public sense of accountability and internal guidance about improving those responses to crisis situ- ations. We began the morning with some overall conceptualizations of the problem and then heard from people who are actually doing this kind of work both outside and inside the Bank. 
What I want to do is push that discussion—being an author and researcher about evaluation, and myself trying to keep up with these new directions and writing about them—to share with you what I see going on here and, in so doing, to be provocative about what the issues are and some of the opportunities to respond.

Interpretive frameworks

Let me begin with the whole notion of the importance of the interpretive frameworks or interpretive mindsets that we have for whatever arena we are engaged in. There has been a huge amount of work in the social sciences in the last decade about how we program ourselves, through socialization and within our culture and our organizations, to see the world in certain ways. And within the world of interventions, the dominant way that we have come to see things is in linear, mechanistic ways, constituted and represented by things like a logical framework, logic models, and the notion that interventions are aimed at pilot testing something that is going to be replicated and taken to scale throughout the world. The notion that evaluation is about testing models is fairly dominant, and that is a mindset. It is an interpretive framework that constrains, and issues of accountability, independence, and performance measurement and results are all affected by and reside within that framework.

What we have been discussing today are the implications of an alternative framework, represented in simple language by the term complexity, or complex adaptive systems, where things are highly interactive, rapidly changing, not predictable, not controllable, nonlinear, and where our knowledge base about what to do and the agreement about what to do are fairly minimal. So high degrees of uncertainty, high degrees of disagreement about what to do, and situations where, in fact, any intervention within a system creates actions and reactions that are non-predictable, that are iterative, that come back, that go forward in unpredictable ways. Indeed, one of the graphic images of the morning was a knot all tangled up so that you could hardly tell where the ends were. What we typically do as evaluators is to think our task is to unravel that knot and find the straight lines rather than deal with the knot, and even to think that we are not changing the situation by unraveling it and trying to make it straight, instead of looking at the intrinsic and sometimes helpful characteristics of the knot itself, and dealing with the knot. So these metaphors are part of what we are going to play with.

Where I would like to begin is with some intriguing research done by two management and organizational development scholars at the University of Michigan -- Kathleen Sutcliffe and Klaus Weber -- in a 2003 Harvard Business Review article,13 in which they compared two sets of high-functioning organizations, each of which was going through major strategic planning processes. One group of organizations decided that the way to get better at what they were doing was to measure their performance more precisely and to use the best practices around performance measurement, and they set out to do that and put resources into getting better data, larger sample sizes, [and] understanding the knowledge base of their fields, across diverse industries, better. The other group, going through strategic thinking and looking at their situation, determined that they were probably in a highly dynamic environment, deluding themselves by thinking that they could get precise measurements of moving targets, and that what they needed to do was to spend more time at senior levels making sense of the data they already had, that they [had] access to, and that was coming in, indeed, in real time. They needed to spend more time interpreting and less time worrying about the precision of the data, because it is an imprecise world.

They followed these two sets of companies over time to look at how each was affected by their performance, and what they found is that the companies that defined the situation as understanding and responding to their environments did better over time than the companies that thought the issue was more precise measurement of their environments. The title of that article, "The High Cost of Accurate Knowledge," has to do with how we define the situation.

13. Kathleen M. Sutcliffe and Klaus Weber, "The High Cost of Accurate Knowledge," Harvard Business Review, 2003.

So part of what you are faced with at every level in the World Bank is how you are defining the situation, and much has already gotten defined into that situation. In IEG, what we heard constantly this morning is that it is about maintaining independence, maintaining accountability, being able to specify attributions. I am going to suggest to you that those are old-paradigm concepts; they are mechanical, they are largely outdated, and they are interpretive mindsets that actually become barriers to dealing with the complex realities of a rapidly changing world.

Let me give you an example of an interpretive mindset and why we need to dialogue about how different people can take the same data and reach different conclusions. There is a story about a man who was very, very ill, and after some time in the hospital he was getting well, and as he was about to leave the hospital his wife met with his doctor, and she said, "Doctor, tell me the truth, what's the real story here?" The doctor said, "Well, your husband's been really, really sick, but if you take really good care of him, give him the kind of food he wants, loving, and give him all the sex he wants, he'll really be okay." She said, "Well, thank you, and I appreciate your being frank with me." So she came out of the doctor's office, and as they started walking out of the hospital, he said, "So, what did the doctor say?" She said, "He said you're going to die."

Complexity and evaluation

Now, part of what interpretive mindsets mean is that the kind of data that come in, and the framework about data under classic and traditional evaluation, is about finding definitive answers to the question: did it work? In complexity situations, we do not get those kinds of data. We get patterns, we get feedback, we get possibilities. We are dealing with moving targets in a rapidly changing world.

I encountered this challenge to my own mindset some years ago, when I was doing a local evaluation of a leadership program in northern Minnesota by the Blandin Community Foundation that was trying to train rural leaders throughout the state of Minnesota.
They brought them together in an intensive retreat environment that they had never experienced before, gave them [training in] communications, strategic planning, dealing with indicators, [and] how to do community organizing, and sent them back to their communities. I had the evaluation contract to do two and a half years of formative evaluation followed by two and a half years of summative evaluation. Classic, probably the most classic form of evaluation contract, and they were a great group to work with. They were open to formative feedback. They kept changing their program. We followed people up, found out what was working for them and not working for them, they adapted the program curriculum; they were very open to feedback.

On a cold Minnesota morning in February, I met with them after two and a half years and said, "You folks have been a great group to work with, you've been open to feedback, you've made changes, you've really adapted, but now we're moving into the summative period, where we have to decide if the model that has been developed works, and so you can't make any more changes in the program, because if you keep changing, we can't answer the question of did it work. It's got to be stable, standardized, fixed, and that's now the challenge. So change is done; for the next two and a half years, everybody gets the same intervention, and then we'll follow up and see what's happened to participants, what they're doing in their communities, what kind of differences they're making, how their communities view [them]."

The director of their program looked at me and he said, "But we don't want to stop changing the program." I said, "No, I understand you've been really good about changing the program, but we're now doing what's called summative evaluation, and that means you can't keep changing the program; the formative piece is over. The Board has contracted me to do a summative evaluation to answer the question, does it work? There are a lot of people watching what you're doing. People want to know if they should emulate this model. That means summative evaluation." He said, "No, no, no, no, you don't understand. We understand that we can't keep the program the same; we need to keep changing the program because the world around us is changing." Then he looked at me, fairly hostilely, and he said, "Formative evaluation, summative evaluation, is that all you people have?"

Well, in truth, those have been the dominant paradigms, with an accountability version of summative evaluation, which is a lot of what IEG does. Quite taken aback, I said, "Well, I suppose, if you really wanted to, you know, we'd have to renegotiate the contract, but if you really wanted to, we could try doing developmental evaluation." And they said, "What's that?" I said, "That's where you keep developing and adapting." And they said, "That's what we want to do. How do we do that?" I said, "Well, we'll have to figure that out. I'll get back to you on that."

It is important to distinguish here that this is not ongoing formative evaluation. The purpose of formative evaluation is to work out the bugs of a model and stabilize the model so that it can be put to a formal test of whether or not it works.
Ongoing adaptation is a different animal, and what they understood was that technology was going to affect local leadership programs, and mobile phones were coming in, computers were just coming in, this was the beginning of the Internet age, and the ebbs and flows of the economy, changes in regionalization, and migration patterns were going to affect what they were doing. They wanted to get more young people involved. They wanted to take this to Native American communities. But they never expected to have a fixed model, and the role of an evaluation under those conditions, of ongoing adaptation to change, sometimes rapid change, sometimes slower change, but ongoing adaptation to change, is a different animal.

Complexity, the way I have defined it this morning, where we do not know what to do and there is not agreement about what to do, means that there is going to be ongoing adaptation, and we are not going to get fixed models to replicate. It also means that the indicators themselves may be emergent and [may] change. This is controversial, this whole issue about when you have indicators, when you have predetermined goals, [and] accountability against predetermined goals. But in looking for examples of where people have dealt with complexity and how they have come up against it, it is intriguing.

Performance indicators and time frames

Let me remind you of this, because it is something I suspect all of you will remember to some extent, although you may not have interpreted it quite the way I am going to, and I invite you to go back and check the record and see if my interpretive mindset meshes with yours. When Alan Greenspan was retiring in 2005, after nearly 20 years as chair of the Federal Reserve Board, he gave his final benediction speech at the annual meeting of the world's central bankers in Jackson Hole, Wyoming, which is basically the world's assembled central bankers and economists. This was his chance to tell the world, and that group of people, the most important message he had to leave with them about the future of the economy, the global economy, and how to manage it. He could have talked about anything.

What did he choose to talk about? You can go back and just Google Alan Greenspan's 2005 Jackson Hole, Wyoming, speech, and it will come up. It was fairly short, and what he chose to talk about was warning the central bankers not to pick indicators and goals as targets. He said, do not do it. For 20 years, Congress hassled him to set targets for inflation, targets for interest rates, and what he said was, any time you pick any singular targets, no matter how many and what subset, you will distort all the other indicators by trying to meet those targets.

What does the central bank do? What does the Federal Reserve do? As you know, they have staff all over the country, they have essentially unlimited resources in terms of data collection and supercomputers and all that, they monitor all kinds of things, and then once every three months they all come together and they argue about the data. What's going on? There's been a crisis in Mexico. There's a crisis in Thailand. Something's going on in China. There's a more or less global financial crisis. What do we do? How do we make sense of it?
They dialogue about that, they have very few, as you know, policy actions that they can take, but they are constantly looking at these moving targets, constantly looking at where bubbles and depressions are occurring, and managing this interactive system.

Greenspan's book, The Age of Turbulence,14 which characterizes complexity, came out after he retired and was written before the most recent global financial crisis. What he acknowledged was that there is no model of how the global economy works, and there will never be a model of how the global economy works, and the models we have, have not gotten any better in the last 50 years at predicting the future. That is the definition of complexity.

14. Alan Greenspan, The Age of Turbulence: Adventures in a New World, New York: The Penguin Press, 2007.

Complexity actually emerged out of meteorology, in studying hurricanes and trying to study weather patterns, and this allows us to think about issues like attribution and causality and management in a parallel, metaphoric way. I just spent some time with a group of meteorologists talking about their work in trying to inform the public of crises and issuing warnings. The big picture of meteorology and their long-range forecasts over five-year periods, not unlike global economic forecasts, is simply that the weather is going to get more and more turbulent, that there are going to be greater variations than there have been in the past, that there will be intense micro weather systems within larger macro systems of change, and that old patterns are not going to be the new patterns. That is not much of a prediction. It is almost identical to what we can predict about the global financial community. It would be absurd, I would suggest to you, on the face of it, to ask meteorologists to tell us the precise nature of the weather patterns five years from now, but that is precisely what we ask people running big projects to do. What are the outcomes you are going to accomplish in five years, in a turbulent economic, meteorological, social, and global context?

However, when the weather gets close they have two-week forecasts, they have one-month forecasts, they have one-week forecasts, and—no surprise—their best forecasts are their eight-hour forecasts, which are highly accurate and are used by people who need to know what is going to happen in the next eight hours: the people who clear snow from the roads in Minnesota and spread salt, the principals who have to decide whether or not to keep schools open, the community organizations that have to decide whether or not to hold their events. That is enormously useful, and they get feedback from those people in real time about whether or not their forecast was right within eight hours, because it has big implications. That is real-time feedback, and the quality and speed of those forecasts are getting better, but they will never be perfect.

So part of what we talked about is what kind of performance indicators are appropriate within what kind of time frame. The irony, and you heard this from Stephanie in her presentation about getting into the micro details, as well as in my experience with doing real-time evaluations, is that, interestingly, the attribution problems actually get fewer the shorter the time period, because you can connect the dots more easily between an action and a reaction.
It is with long, impact-laden time periods that it is hard to do attribution. It is not hard to do attribution in a short time frame, where there is a very direct and observable action-to-reaction connection, so the attribution picture actually changes under those conditions, when you are looking at how programs that are emergent are actually doing what they are doing. Part of what emerges from these kinds of retrospective evaluations is the question of what we learn from them. Your job here is not to manage those adaptations, but to assure that the changes that are made in real time by managers and by programs, in whatever area the Bank activity is going on, are well reasoned, that they are justified, that they are based upon data, and that can be done in real time as you look at how those activities are unfolding. We had examples of that being done in other arenas this morning, and in the work that the Bank has started doing.

Learning lessons

But we are also attempting to learn lessons about doing that, and one of the challenges in deciding what we take away from real-time evaluations that has any future use is that whole challenge of learning lessons. One of the things we know often happens is that people take lessons from some event and then end up fighting the last war, because future conditions have changed, but they are now trying to avoid the mistakes they made last time, and in fact creating new mistakes because they are not paying attention to how the unfolding world is different. And we just had a wonderful example of that, which, if you will indulge me, I will use.

I realize that in an international organization many of you may not pay attention to American television, but how many of you have been paying any attention at all to the late-night talk show host fiasco going on? The full story is that in 1992, when Johnny Carson, the most popular late-night host of all time, retired, there was a big fiasco in picking his successor, between Jay Leno and David Letterman. The network screwed it up. It was very political, very controversial, and so the lesson they took from that was, next time, plan the succession ahead. So in 2005, they went to Jay Leno, who had the top ratings at The Tonight Show, and said, "We want to plan the succession because we learned last time not to wait till the last moment to do this and do it on the fly. We're going to plan for you to retire in 2010, because what we've learned is you've got to plan ahead. We don't quite know yet what we're going to do with you, but you're going to retire and Conan O'Brien is going to replace you." Jay went along with that, assured that things would work out, and sure enough they planned the work and worked the plan.

So 2010 came, and they told Jay Leno he was going to move to 10 o'clock and that Conan O'Brien would take his position, because they were working their plan. They ignored the fact that Jay Leno had the highest ratings in history and had surpassed Johnny Carson's ratings in the meantime, and that this was a very high-risk proposition. They were working the plan, because what they had learned last time was plan your work and work your plan, and do not get distracted by any data.
But there was a lot of new data, and, in fact, the experiment was a colossal failure, moving Jay to 10 o'clock and bringing Conan O'Brien in, and so the soap opera of the last few weeks, for anybody paying attention to the entertainment news, has been all this personality stuff, and hurt feelings, and Jeff Zucker, the CEO of [the National Broadcasting Corporation] NBC, looking bad.

If you read the business press side of this story, which does not get the front page, they are all applauding Jeff Zucker for doing real-time evaluation, because Jay's ratings were instantaneously in the tank with no indications they were going to get better. Conan O'Brien's ratings were in the cellar with no indications they were going to get better, and so they did not set a target for what the ratings had to be. They vaguely said, "We know they're going to start out low and we're going to look for their beginning to attract some people and adapting to those new time slots." At the end of four months, the argument now is whether that was too soon or too late. They did not see any changes in the data, and given the consumer responses they were getting from focus groups and from surveys of people, they did not expect that to change, so they pulled the plug. Now Zucker has been hugely criticized because it has all been about the personalities, and Jay's feelings and Conan O'Brien's feelings, and contracts. But, in fact, he followed the data, and their new lesson is to follow the data in real time. Do not make big decisions five years ahead and stick to them come hell or high water.

Now part of the challenge of using real-time data is who is going to act on it, and the politics of action. This gets us to the World Bank Board and to senior managers. I am trying to pull in some different metaphors and analogies here for you to think about as you think about your own arenas of work. So what I want to invite you to do is not immediately dismiss these as not relevant, but think about what you can learn from these kinds of analogies.

In 2005, at the last International Evaluation Conference (we meet every 10 years as an international evaluation community from the professional associations), the American and Canadian co-hosts gave the first-ever international evaluation award for speaking truth to power to General Roméo Dallaire for his work during the Rwanda genocide. He was the Canadian general in charge of peacekeeping forces, but the evaluation side of that story is that the only lever Dallaire had was real-time reports on what was going on in Rwanda, and that is what he did. He filed daily reports about the numbers of deaths, who was killing whom, what the movements were. His troops were basically on-the-ground reporters of what was happening, and he sent those reports up through channels, and they were ignored. You may know the story of General Dallaire, who came back from Rwanda with huge guilt about not having been able to stop the genocide, had a nervous breakdown, was found drunk underneath a park bench in Montreal, has gone through an amazing rehabilitation, and is now dedicating himself to stopping genocide in places like Darfur and other parts of the world.
The lesson that he has taken from that, which he talked about at this international convention, was that he made the mistake of playing the good soldier and only sending those reports through channels, not looking for other ways to draw the world's attention to what was going on and not dealing with the politics of the situation. One of the challenges to real-time evaluation is the political capacity of large organizations to adapt their decision making to real-time contingencies.

I was involved with a federal department that I will not name, because it is confidential internal work that I was doing, but in anticipation of the change of administration a year ago, they brought me in to help them design a very rapid reconnaissance of the major issues that the new Secretary of that department would face. And we put together a real-time methodology to look at what was going on. We looked at the evaluation data and the management information system. We interviewed people in the field. The whole thing was done in three months' time, reaching out and asking the question, "What does a new Secretary in a new administration need to know about this department?" It was one of the fastest and, I think, best pieces of work that we had ever done. The group decided that they needed to narrow it down, and they identified five really high priorities, areas that needed rapid response and immediate attention, that had high value and, if not attended to, represented dangers for our country.

We got all the methodology right, we got the feedback right, we figured out how to reduce this to a communicable form, but what had not changed in this department was the approval process for getting something to the Secretary, which takes months, and it started going through that approval process. Some of the people carried over from the past administration had a vested interest in not seeing those findings passed on to a new administration. They did not suppress them, they did not censor them; they did what bureaucrats are good at: they sat on them; they asked questions about them; they sent them back for revision. I was faced with whether or not to become a whistleblower, because nine months after the change in administration, I learned that these findings had not yet gotten to the new Secretary. Everything was timed to be real-time evaluation, high-priority stuff, but the political process did not allow that to happen. What did happen eventually was that people within the department took it upon themselves to leak it, but the timeliness had been severely impaired.

So it is not enough just to do real-time evaluation. We have to look at real-time decision making. We have to look at the way in which we are organized at every level to engage with real-time data. It challenges what we have learned about every aspect of things.

Five methodological provocations

Let me very quickly, because we want some time to interact here, suggest five provocative methodological issues. I have been talking conceptually and politically about these issues; I am going to do these very quickly. Some of this is discussed in the paper that I wrote, and more of it in the book that I have coming out in June, but this will at least give you a flavor, moving from the conceptual to the methodological, which mirrors the morning.
56 E x p l o r i n g t h e P o t e n t i a l o f Re a l - T i m e a n d P r o s p e c t i v e E v a l u a t i o n s One of the challenges of dealing seriously with complex nonlinear dynamics is some- thing that is sacrosanct in evaluation and it is how we understand baselines and out- comes, which are mirror images of each other. Where’d we start? Where are we trying to get to? Then, evaluation basically measures where did we end up? Under condi- tions of complexity both baselines and outcomes become dynamical and emergent and changeable and unfixed. Now the very fact that that can occur, and in good organiza- tions ought to occur and does occur, increases the importance of having independent ways of verifying that those changes are appropriate and valid. What do we mean by dynamical baselines? Nothing is more sacred in evaluation than that you have a solid starting point, against which you measure everything else. But let me give you a program level example of this and extrapolate it to a country and international level of it very quickly. I work at all levels, and one of the places that I do a lot of work is community-based, anti-poverty programs and interventions with people in poverty around employment programs and mental health programs and housing programs. They do intake of clients who come into the system and find out what their job status is, what their drug abuse status is, what their family status is, what the history of their family is. What we have learned is that all those people have learned to lie sys- tematically in order to be eligible for the program. They know what they are supposed to say, they know what the eligibility requirements are, and that baseline intake data is absolutely fabricated. It takes six to nine months for a program to build a relationship with those clients before they actually know the realities of what their situation was when they entered the program. Under current evaluation norms, it would be considered both invalid and unethi- cal to go back and change those baselines. But, in fact, people have entered those programs with much more severe conditions than were originally expected, and that affects the comparison to the outcomes. In fact, in many cases, given the static nature of the baselines, people looked like they got worse during the program, and when experimental designs are done, neither the control group’s baseline has changed nor the treatment group’s baseline has changed. At the country level, I have talked to a lot of folks and been involved with projects nationally where it is only after you have engaged for about six months that you find out that a lot of the baseline statistics about the project were made up, that the data that was supposed to be there was not real and was not very good. You find out what is really going on in the dynamics between various departments, and most of the baseline assumptions have to be revisited and updated, if it is a good project. That is a dynamic baseline. The same thing happens with the targets. When you learn more about what is going on and you change what you think you actually can accomplish and under conditions of uncertainty, that is appropriate. Under conditions of high certainty, when those targets are meaningful, where there is a knowledge base to set them, it is appropriate to hold people accountable for them. Part of the very meaning of complexity is that we do not know enough to set targets because we do not know how to produce the outcomes. 
57 H I G H I MPA C T E V A L U A T I O N s That is what it means to be in a complex environment, so it makes no sense to set definitive targets when you do not know how to get to them. You need moving targets, updated targets, but you need to do those with authenticity and validity. A second issue that becomes very big in complexity is about unanticipated conse- quences and side effects. Virtually all log frames give token attention to unanticipated consequences and side effects and say they are important and we ought to pay attention to them, but I think it is one of evaluation’s biggest dirty little secrets that we do almost none of that in real ways. It is just not authentic. Performance measurement, measuring whether or not goals are attained, is so dominant that all the resources and the designs go into that. The only way to pick up unanticipated consequences is open-ended field work, where you go out to see what happened that you did not even think about might have happened. It is the only way to do it. It is not budgeted. It is not included in evalu- ation designs. We give the most token kind of attention to unanticipated consequences and side effects. What we know is that in complex nonlinear dynamics, those things are certain to be there, they are going to be important, and they are often more important than the anticipated and targeted outcomes. It means that evaluations, at any stage they are done, have to take seriously the fact that we do not know all of what is going to happen, and we need ways in real time to turn up what is emerging. Stuff is emerging, and it is important stuff, and we were not able to think about it. We often do not know the consequences. One of the best examples of that on the positive side that I have heard, and you probably have heard this but it made a big impression on me, is that when 9/11 hit, and the attack on the World Trade Center occurred, the world’s financial system was virtually unaffected. I remember not long after that, hearing an interview with Alice Rivlin, asking her why they had targeted the World Trade Center and why there had been virtually no ripple effects in the actual financial system. Alice Rivlin’s response was because of Y2K. We had just been doing a decade retrospection on Y2K, which became a big, speaking of late-night talk shows, joke. All of that work went into Y2K, the thing that was anticipated that never happened, millions, billions of dollars going in, but the effect of that was to make all the systems redundant, to go through scenarios of backup and what would happen if something happened, to decentralize databases, to run fire drills about what would happen if our systems went down, and 9/11 was the real Y2K, an unanticipated consequence of what went on. So what we learned about these things are the system interconnections over time about a globally interrelated system. The things that we thought were over may, in fact, reappear in other forms, and we need to understand both the implementation and interaction equivalent to those. How we go about doing this work affects what is done. I talked this morning about the mantra in performance measurement that what gets measured gets done. If we focus all our attention on measuring preset outcomes against preset baselines, that is what we will get. 
If we measure resiliency, adaptability, and the extent to which people are updating their understandings of situations, setting new, appropriate targets, and adapting to them, [then] that is what we will get. Our current system is largely aimed at creating static, mechanistic implementation of programs and initiatives. If we want programs to be able to deal with complexity and adapt, we need to have adaptive evaluation systems, because what gets measured gets done.

The fourth point around methods is that under complex, dynamic systems, the findings require interpretation and dialogue. They are not going to be definitive. They are not going to be black and white. We are not going to be able to say it worked or it did not work. We are going to be able to describe interdependent factors, the relationships among those factors, and the ways in which they move together. The discussions are going to be much more like the Federal Reserve discussions, when they look at the data and try to figure out what is happening now, and they decide in six-week increments, "Well, that is happening over there, we need to really pay attention to that for the next six weeks and see what happens, and monitor that and pay less attention over here." That is an evaluation process that is adaptive and responsive, and it still builds in accountability and can be done independently.

Finally, under complex, nonlinear dynamics and the realities of complexity, there can be no methodological gold standard. The language of the gold standard has done great harm to our capacity as evaluators to respond flexibly with appropriate evaluation designs under different conditions. The very notion that there is such a thing as a gold standard design—which then makes people want to meet it and creates incentives to use that design whether or not it is appropriate—creates disincentives for new, cutting-edge approaches.

The "Platinum Standard": methodological appropriateness

The real platinum standard is methodological appropriateness: adapting the evaluation approach to the degree and nature of the complexity that we face. And that seems to me to be the overarching theme of the morning, both conceptually and in the examples that we heard—that evaluation has to be done in different ways and has different dynamics under different conditions. Complexity represents a different condition, chaos represents a different condition, and the appropriate methods for those conditions are not going to be the traditional evaluation methods that have been more mechanistic and static.

With that, let me stop and invite both questions and comments from any of you about your own takeaways from the morning. Daniela very beautifully closed out our morning session with her takeaways. I would invite any of you to share with the Board and senior managers here your takeaways and disagreements with anything that I have said.

Discussion

Ali Khadr

Thank you, Michael, I thought that was great. I just had a couple of questions or observations based on a couple of the things you have said. One observation is that evaluative wisdom, or the view that evaluation can give, almost by definition becomes very time sensitive.
Let me give you a slightly different take on the Jay Leno and Conan O’Brien thing, which I heard from a man sitting next to me on a recent flight back from Cincinnati. The person said that the problem is exactly what you said about Johnny Carson and how Jay Leno replaced him, but it had taken a long time for the transition, and what he said was that was real guts. The executives of that time had a vision, they knew their vision was right, and they stuck to it. Nowadays what happens is you observe that ratings are not adjusting immediately, and so you give up on your entire vision, and you adapt. Now you said it is a good thing, but this poor guy seems to think it was a really bad thing. Michael Quinn Patton Very quickly, what got left out in the story is that in 1993 affiliates were not powerful. The pushback here was not actually the ratings, it was the affiliates’ pushback, the peo- ple whose news programs were being hurt, who were threatening real action and saying, “We’re monitoring the situation in real time, and we’re going to stop carrying your shows if you continue another month.” So there was a real threat. They had real-time feedback: “Do something now or we’re out of here.” That was not the previous condition, so the world had changed in terms of the power dynamics between the affiliates and NBC during that time. Ali Khadr Very good point. Second point, very quickly, is just on the issue of updating baselines. Again, you portrayed it very much as a virtue, as an issue of responding in real time and so on, and it is a great way to look at it. Think of it another way, though, which is that if I can say to somebody, “Well, you know, I’m going to have to update my baseline, that’s a virtue, right? I am going to be held accountable, but only for updated baselines.” And guess what, I can influence whether, at the country level, information gets gathered or not gathered, and I can keep updating my baseline and say, “Well, sorry, but we’re not getting good information, and so we have to keep updating this.” I can work my way out of accountability, and I think that is part of the explanation as to why the incentive system at country level is so slow on the results agenda. The point is that good data are not technologically difficult to gather, yet it does not hap- pen or it is not happening fast enough. I just sometimes wonder why and what the sort of incentive framework is that gives a result like that, where this is so technologi- cally simple to know what is going on out there. 60 E x p l o r i n g t h e P o t e n t i a l o f Re a l - T i m e a n d P r o s p e c t i v e E v a l u a t i o n s Christine Wallich, Director, IEG Michael actually put a finger on some of the challenges that we are facing, not just in evaluation but in development. On not being rigid on original goals: We have to keep our eye on something, and development takes time. You have to have some guidance so you do not lose your original purpose. But we design projects with the best inten- tions, with the best predictions at the time, and then it takes about four to seven years to implement, and sometimes it could go up to ten years in fragile states. So many things change during that time, and original goals often are not totally valid or achiev- able, or you can overshoot [or] undershoot. Anything could happen. 
But our current methodology for IEG, for the World Bank is sticking to original objectives, even if you restructure a project, because management is encouraging this responsiveness [and] adaptiveness, which means you have to restructure as you go along. But everybody knows that you are going to be evaluated by the original objectives, even if restructur- ing was done on valid grounds. So that is something for us to think about collectively. Second, I missed the morning discussion, but real-time evaluation implies that things are happening before you really get to your goal. It means that when you do real-time evaluation, you are looking at intermediate signs and early warning signals and more input-output measures than what is real impact. We have to keep choosing what to look at, and deciding whether it provides good predictive signs for the long term, for what is going to happen. It is very challenging to find things that are really predictive. It sort of changes the total evaluation concept because our evaluation concept is that you have to look at the impact, and if you do real-time evaluation, it means you do not wait until the impact. You look at intermediate things that are happening. So we have to think about that. Also, I would like to think collectively about the roles of independent evaluation versus management. Staff learning is important, and the whole objective of this seems to be learning. We all have to learn and adapt very quickly. So what should independent evaluation bring? What should management be doing about their own self-evaluation and adaptive implementation? We have impact evaluations, a little bit [of a] different animal, but they could be done at different stages. You can do it in the design stage, you can do it ex-post, it could be done in different stages that will teach you different things. So all these are useful things for us to consider. Roland Michelitsch Hello, Michael, a couple of observations. On some points, I would completely agree that you don’t want to spend too much time getting more and more and more data. You need to also take the time to evaluate that. But in our organizations it is the case that even basic information is often still lacking. So I just want to make a plea to not use that as an excuse to not collect basic information, which is a huge problem, and I think too many decisions actually are being made with lack of data rather than having too much data, at least when it comes to development results. 61 H I G H I MPA C T E V A L U A T I O N s Secondly, contrary to what you are saying, I still think it is a very good exercise to come up with clear projections, not only for five years out but also for the next year, and the year after that, and so on, because only then can you actually track whether this project is on track or not, and you actually can prioritize where you are going to put your resources: checking why is this off track, can we actually put it back on track, and then putting it back. In a sense, I do think that in IFC we have something like real- time monitoring or evaluation, or whatever you call it, with the Development Out- come Tracking System. 
There we do require people to make five-year projections and also annual targets, but what we then do is, in terms of evaluating the performance, we try to use absolute benchmarks rather than only the objective-based benchmarks, so that you can say, “Well this objective I overshot, this objective I undershot, but overall, using objective benchmarks, in economic terms does it generate above 10 percent return, does it meet the environmental standards, and so on?” So you need to have a combination of the two of them, and I would really shy away from a mes- sage going out of here that people should not be setting clear targets and objectives up front. Hans-Martin Boehmer I want to take advantage of the fact that we have some CODE Board members here as well. Two years ago President Zoellick had a long discussion with the Board on his stra- tegic vision and the quintessential diagnosis that he had was that the task of the World Bank Group is to help solve interconnected, complex, dynamic problems. It described exactly the world that Michael was just describing about complex systems and limited ability to really predict what is going to happen, and he gave lots of examples of why that is the case from the food crisis and so on. The Board still has discussions with the president on post-crisis strategy, and it strikes me as if the role of IEG is in some sense an integral part of the vision that emerges from that. If it is, in fact, one where the pur- pose of the organization is to deal with this complexity, and to accept that there is a world of unpredictability or a dynamical world, as we heard this morning, as opposed to dynamic, where everything goes perhaps in the same direction, then that would imply quite a different role for IEG. I would be interested to hear what kind of evaluation function is commensurate with that direction, whether that is something to be thinking about, and whether that is something that perhaps should be discussed. Giovanni Majnoni, Executive Director, World Bank I would like to thank IEG and Michael Patton for the extremely interesting conversation over lunch. I would like to just mention two little points which are basically my take, and the second is also a question for Michael. The first is that this complexity is in a way something that has always been built into social sciences, so in the way we study something our very understanding of what hap- pens affects the outside reality. Globalization is nothing but the magnification of this, so that to a certain extent we can expect what we do—our policies actually are affecting 62 E x p l o r i n g t h e P o t e n t i a l o f Re a l - T i m e a n d P r o s p e c t i v e E v a l u a t i o n s the way the process presents itself. I somewhat disagree on the Greenspan comments, as a former central banker, because there is a good clause which says that whatever you try to target by being targeted changes over time. So I would give more credit to central bankers. This brings me to the second point, which concerns baselines. We live in a world where baselines are often non-existent, as you mentioned, and therefore I find [it] hard to swallow that the little we have may disappear for a greater good. So the way I think, and this is a question, is that baselines should not disappear, but maybe we need a range of baselines, an upper end, a lower end, and this thing should move over time. 
This brings me back to the central bankers, who typically, when they project the monetary aggregates, all have this range, which moves and is adjusted every year. So in that sense, judging from your remarks, is the above conclusion widely out of line?

Konstantin Huber, Executive Director, World Bank

Thank you very much for this interesting lecture. I have the impression that probably the world is even more complex, insofar as there are highly complex parts and other less complex parts, and we are still to find out what is what. Now dealing with the financial crisis, of course, everything has been turned upside down and things are developing extremely fast. So I very much appreciate your points in this context. On the other side of the development context, the traditional development role of the Bank, we deal with an environment which is at times not developing very quickly. I am a development practitioner, and I have spent many years in developing countries. And looking at the situations there, I am sometimes surprised that things are still the same, and they still have to tackle the same issues and the same problems. In that context, if we do not have a good baseline, if we do not try to understand the initial situation, we never get anywhere. So I think it is probably at both ends—yes, trying to be adaptive and trying to grasp what is changing, but also going down to the baseline and getting the data. And I completely agree data are not there, but they are not that easy to get. It is a matter of the government, and it is also a question of telling the practitioners and operations people to do it, because they want to maintain the flexibility without living with baselines.

Stoyan Tenev

Three comments. One, the most important takeaway from this entire three-quarters-of-a-day session is that we must adapt, I would not say to complexity, because we have been used to dealing with complexity, that is what we do in evaluation, but to what you call fast-changing or dynamical conditions. It is not only complexity, but that events are moving very fast, and we should adapt to these changing conditions, and when I say "we" I believe that it is not only the evaluator that should adapt and change, it is also the users. And I think that it is very appropriate that here we have two important users of the work that we do in CODE and management. It is not only producing a more appropriate type of work, but it is also having the right audience to be able to use these more appropriate products. My second point is that I very much appreciate what you have said with the metaphor of weather forecasts. This just reaffirms what we have been saying in the context of our crisis management work: that you went in there to have short-term results, and in that context you should be able to predict better what your results were supposed to be. It is more difficult to predict what is going to happen with an intervention five years down the road; you should be able to predict better what the short-term results of your intervention will be, meaning that you should be able to put up an appropriate results framework for what you are doing to respond to the crisis. Finally, I very much sympathize with what you are saying about dynamic baselines. We have a very specific case here.
We have been trying to engage in an evaluation of [the] decentralization process in IFC, but the fact of the matter is that over the last three years IFC has been moving so rapidly and constantly changing the conditions in terms of decentralization and organization changes and so on that we cannot find that baseline. What is it that we are going to evaluate because right now we are in the middle of a substantive change again? My question to Michael is how do you evaluate in these circumstances, because the essence of evaluation is comparing against something? One thing is to measure, that is the first step of evaluation; the other very important task is to compare it so that you can judge. How are you going to judge in this change of circumstances? Michael Quinn Patton As predicted, this would be provocative to raise questions about the sacrosanct base- lines in evaluation and emergent goals. Let me try to emphasize the point that I was making, because I am not suggesting that one go out of here saying that you never have preset goals or that you are always updating baselines. The distinction that we built on from this morning was distinguishing simple, complicated, and complex situ- ations, where what is simple is where we know how to produce an outcome and we agree as a global community that that outcome is important, like the eradication of polio. It is perfectly appropriate, indeed it would be inappropriate, invalid, and unethical, not to have a clear specific smart goal that polio ought to be eradicated, and the definition of that is very clear. There is no polio anymore, it is gone, no kids are getting polio. That is as clear and specific an outcome as you can get. It is attainable. The world is spending more than a million dollars per case now to make sure that it is attainable, and we are very close, but that means we know how to produce that outcome because we have a technology that will do it, and the world has agreed that that is something we ought to do. That defines the conditions under which you have clear, specific, and measurable outcomes and hold people accountable for them. The World Health Organization has a campaign predicated on a theory of change that will attain that outcome. 64 E x p l o r i n g t h e P o t e n t i a l o f Re a l - T i m e a n d P r o s p e c t i v e E v a l u a t i o n s Poverty reduction is not polio. There is no vaccine for it. We do not know how to bring about poverty reduction. It takes many different forms. It manifests itself in many different contexts. It is constantly a changing phenomenon affected by very different kinds of contexts. So what I am suggesting, based upon complexity science, is that to act like we are in the simple situation that we know how to produce an outcome and set predetermined goals, when we do not know how to attain those, is fantasy life. It is doing what the complexity theorists are saying one ought not do in the face of com- plexity, and that is think that you can control it, use mechanisms of control. Setting predetermined goals and having rigid baselines is a command and control strategy. It’s not an adaptive strategy. So I am not saying you always update baselines, but that you do so in complex dynamic situations, and you have all experienced this in programs that you run. 
One of the common findings that I see in ex-post reports is when people look at why they changed, what they were changing—and I have been a program director at of a big USAID program and experienced this myself—is that when they look at the changes they made, what I hear all the time is, “We actually didn’t really understand the problem when we started.” That to me is saying we got the baseline wrong. It is not just a matter of understanding. We did not know what the right questions were. Not only did we not have good data, we were not even asking about the right data, we did not understand the situation. Situation analysis is often done in a fairly abstract way. It is done fairly removed from the situation, and the evidence of complex, unfolding dynamics is that when you get on the ground and start doing stuff, you actually find out what that baseline situation was. Now I take Ali’s point about it is manipulable and corruptible, and that is why one needs independent evidence about whether or not those updates are valid and appropriate, but the alternative is to continue to live in the fantasy world that those made up baselines had any meaning. So, we are between a rock and a hard place, we hold on to what we know are not real baselines or we update them and take the risk that that is done badly and without accountability. Find the sweet spot in the middle, which is doing valid and rigorous updating so that we understand what the situation was to some extent. That is what I’m talking about, as well as emergent goals. The key thing here, the overall message that I hope folks are taking away in conjunction with my colleagues throughout the morning, is situational appropriateness for evaluation itself: that we do different kinds of evaluation for different situations, different knowledge areas, different degrees of change, different kinds of problems. And a recognition that evaluation as currently practiced has been for one kind of situation, one kind of understanding about how the world has changed, and that complexity and crisis present different kinds of situations that present different challenges for evaluation, and that imposing our tra- ditional practices on those new situations is going to not only not work very well, but actually can do damage, can do harm, because it stops programs from adapting in ways 65 H I G H I MPA C T E V A L U A T I O N s that they need to do. We impose rigidities, we impose mechanistic models on programs that need to be adaptive and need to be changing. So this is not just an abstract kind of thing, it is not that stakes are not very high. What has gotten me impassioned about this is seeing evaluation do so much damage by keeping programs from being able to adapt and do a better job because of narrow, simpleminded kind of accountability constraints. So that is the message here: What is the real situation? How do we define that? That is the interpretive mindset. How do you define the situations you are getting into, and then do you have evaluation approaches that can be adaptive to those different situations? One size does not fit all. Patrick G. Grasso Michael, thank you very much. Michael once recited a little aphorism, which is that you do not need a randomized control trial to demonstrate that jumping out of a plane with- out a parachute is a very bad idea. I would say you do not need a randomized control trial to evaluate this day’s session as a real success. 
I think the people who organized it ought to be congratulated, and our speakers ought to be thanked. So, thank you. 66 E x p l o r i n g t h e P o t e n t i a l o f Re a l - T i m e a n d P r o s p e c t i v e E v a l u a t i o n s Annex UTILIZATION-FOCUSED EVALUATION: REAL-TIME AND PROSPECTIVE ASPECTS Michael Quinn Patton Utilization-focused evaluation is evaluation done for and with specific primary intended users for specific, intended uses. Utilization-focused evaluation begins with the premise that evaluations should be judged by their utility and actual use; therefore, evaluators should facilitate the evaluation process and design any evaluation with care- ful consideration for how everything that is done, from beginning to end, will affect use. Use concerns how real people in the real world apply evaluation findings and experi- ence the evaluation process. Therefore, the focus in utilization-focused evaluation is on achieving intended use by intended users. In responding to the challenges of the real- time and prospective aspects of evaluation, utilization-focused evaluation includes an option I call developmental evaluation, where the intended use is development under conditions of complexity. I shall argue that this is a distinct and important evaluation purpose. The primary intended users are social innovators and others working to bring about major systems change (Patton, 2008). An Overview of Utilization-Focused Evaluation In any evaluation there are many potential stakeholders and an array of possible uses. Utilization-focused evaluation requires moving from the general and abstract, i.e., pos- sible audiences and potential uses, to the real and specific: actual primary intended users and their explicit commitments to concrete, specific uses. The evaluator facili- tates judgment, decision making, and action by intended users. Developmental evalu- ation, conducted from a utilization-focused perspective, facilitates ongoing innovation by helping those engaged in innovation examine the effects of their actions, shape and formulate hypotheses about what will result from their actions, and test their hypoth- eses about how to foment change in the face of uncertainty in situations characterized by complexity. Utilization-focused evaluation is personal and situational. The evaluation facilitator develops a working relationship with intended users to help them determine what kind of evaluation they need. This requires negotiation in which the evaluator offers a menu of possibilities within the framework of established evaluation standards and principles. Thus, while concern about utility drives a utilization-focused evaluation, the evaluator must also attend to the evaluation’s accuracy, feasibility, and propriety (Joint Commit- tee on Standards, 1994). Moreover, as a professional, the evaluator has a responsibility 67 H I G H I MPA C T E V A L U A T I O N s to act in accordance with the profession’s adopted principles of conducting systematic, data-based inquiries; performing competently; ensuring the honesty and integrity of the entire evaluation process; respecting the people involved in and affected by the evalu- ation; and being sensitive to the diversity of interests and values that may be related to the general and public welfare (AEA, 2004). Utilization-focused evaluation does not advocate any particular evaluation content, model, method, theory or even use. 
Rather, it is a process for helping primary intended users select the most appropriate content, model, methods, theory and uses for their particular situation. Situational responsiveness guides the interactive process between evaluator and primary intended users. Developmental evaluation is one of the options now available in the feast that has become the field of evaluation. Utilization-focused evaluation can include any evaluative purpose (formative, summative, developmental), any kind of data (quantitative, qualitative, mixed), any kind of design (e.g., naturalistic, experimental) and any kind of focus (processes, outcomes, impacts, costs, and cost- benefit, among many possibilities). Utilization-focused evaluation is a process for mak- ing decisions about these issues in collaboration with an identified group of primary users focusing on their intended uses of evaluation. A psychology of use undergirds and informs utilization-focused evaluation. In essence, research and my own experience indicate that intended users are more likely to use evaluations if they understand and feel ownership of the evaluation process and findings; they are more likely to understand and feel ownership if they’ve been actively involved; and by actively involving primary intended users, the evaluator is training users in use, preparing the groundwork for use, and reinforcing the intended utility of the evaluation every step along the way. Developmental evaluation carries this user involvement farther than usual by creating a dynamic partnership between social inno- vators and the developmental evaluator. The language of “partnership” is not the norm in describing the relationship between an evaluator and those whose work is being evaluated. Thus, developmental evaluation invites both skepticism and controversy. Situation recognition Astute situation recognition is at the heart of utilization-focused evaluation. There is no one best way to conduct an evaluation. This insight is critical. The design of a particular evaluation depends on the people involved and their situation. The standards and prin- ciples of evaluation provide overall direction, a foundation of ethical guidance, and a commitment to professional competence and integrity, but there are no absolute rules an evaluator can follow to know exactly what to do with specific users in a particular situation. Recognizing this challenge, situation analysis is one of the essential compe- tencies for program evaluators. The idea – admittedly an ideal -- is to match the type of evaluation to the situation and needs of the intended users to achieve their intended uses. This means – and I want to emphasize this point – developmental evaluation is not appropriate for every 68 E x p l o r i n g t h e P o t e n t i a l o f Re a l - T i m e a n d P r o s p e c t i v e E v a l u a t i o n s situation. Not even close. Indeed, I shall argue that its niche is small and demanding. It will not work if the conditions and relationships are not right. I’ll be specifying what those conditions and relationships are as we proceed. The point here is that every evaluation involves the challenge of matching the evaluation process and approach to the circumstances, resources, timelines, data demands, politics, intended users, and purposes of a particular situation. Matching requires astute situation recognition. 
Distinguishing simple, complicated, and complex situations To facilitate situation recognition, it is useful to have a heuristic framework, some way of cutting to the chase by knowing what factors are important to consider when we encounter a new situation. Heuristics are short-cuts that tell us what’s important to pay attention to. We cannot look at everything. We never have perfect information. We can’t consider all possibilities. We need some way of focusing. Heuristics do that. Research on decision-making shows that heuristics “make us smart” – smart in the sense that we make intelligent decisions quickly. Heuristics direct us in making sense of things. They frame and inform decisions. Indeed, they make choices and action possible. Developmental evaluation is especially appropriate for complex situations and aims to inform fast action and quick reactions by social innovators. First, then, we have to decide if we’re in a situation that is appropriate for developmental evaluation, a com- plex situation, where the pace of actions, reactions, and interactions matter greatly. In writing the book Getting to Maybe: How the World Is Changed (Westley, Zimmerman, & Patton 2006) we looked at the implications of these distinctions for understanding social innovation. In this paper I want to apply these distinctions to illuminate evalua- tion situations and options. Remember, the focus here is on utility. These distinctions help with situation rec- ognition so that an evaluation approach can be selected that is appropriate to a par- ticular situation and intervention, thereby increasing the likely utility -- and actual use – of the evaluation. Using these distinctions involves mapping the territory and context within which an evaluation will take place to locate the evaluation within that territory. Moreover, these are relative and perspective-dependent distinctions, not absolute. A situation can be described as more or less simple, complicated, or com- plex. Utility resides in examining the implications and insights generated by asking to what extent a situation is usefully approached as simple, complicated, or complex, or some combination. The Degree of Uncertainty/Degree of Conflict Matrix The degree of uncertainty/degree of conflict matrix developed by Zimmerman (adapted from ideas of Ralph Stacey as published in Zimmerman, Lindberg, & Plsek, 1998, pp. 136-143) is the basis for the heuristic used here that distinguishes simple, complicated, 69 H I G H I MPA C T E V A L U A T I O N s and complex situations. To make these distinctions, the matrix maps the situation along two dimensions. One dimension scales the degree of certainty about what should be done to solve a problem. We know how to eradicate polio. Immunize all children. We don’t know how to reduce global warming. There are many competing ideas and plans, but, in fact, our knowledge is quite limited about both the causes of global warming and what interventions would work. Programs and interventions are close to certainty when the cause and effect relationship is highly predictable, as in the relationship between vaccination and preventing disease. At the other end of the certainty continuum are innovative programs where the outcomes are highly unpredictable. Comprehensive anti-poverty initiatives involve considerable uncertainty. Extrapolating from past experi- ence is problematic because each community is unique and there is no immunization against poverty. 
First heuristic dimension: degree of certainty and predictability about how to solve a problem, ranging from close to certainty at one end to far from certainty at the other.

The second dimension depicts the degree of agreement among various stakeholders about an intervention's desirability, or alternatively, the degree of conflict. There is universal agreement that preventing polio is a good thing and that children should be vaccinated to eradicate polio worldwide. On the other hand, there is substantial political conflict about almost all aspects of global warming. To what extent is global warming occurring? To what extent is it caused by human activity (as opposed to being a natural earthly cycle)? What are the primary causes of climate change? How much urgency is there about intervening? What interventions, if any, will make a difference? Are the economic costs of intervening worth the likely results? On these and other matters, there is great disagreement.

Second heuristic dimension: degree of agreement or conflict about how to solve a problem, ranging from close to agreeing (little conflict) at one end to far from agreeing (great conflict) at the other.

Combining these two dimensions creates the borders of a territory that can be mapped, or a matrix, as shown in Exhibit 1. The horizontal axis captures the degree of certainty and predictability about how to solve a problem. The vertical axis displays the degree of agreement about what to do.

Exhibit 1. Know When Your Challenges Are in the Zone of Complexity. [The matrix plots degree of certainty (horizontal axis, close to far from certainty) against degree of agreement (vertical axis, close to far from agreement). Close to certainty and close to agreement: Simple (plan, control). Close to agreement but far from certainty: Technically Complicated (experiment, coordinate expertise). Close to certainty but far from agreement: Socially Complicated (build relationships, create common ground). Far from both certainty and agreement: the Zone of Complexity.]

Simple situations

High levels of certainty and agreement make situations fairly simple. Simple, as used here, is a descriptive term, not meant to be judgmental or pejorative. Simple is not simplistic or simple-minded. A simple situation is, simply, one in which knowledge and experience tell you what to do and there is great agreement about what to do. In such a situation, it is both possible and appropriate to intervene from the top down, as in the worldwide campaign to eradicate polio. The high degree of predictability and agreement permits detailed planning, controlled execution, and precise measurement of the degree to which predetermined targets are reached. A best practice model can be generated and subjected to a summative test. A simple problem is how to bake a cake, a metaphor for capturing the characteristics of the simple originally offered by Zimmerman and Glouberman (2004). A good recipe, like a best practice, provides detailed guidance about the steps to follow to achieve a desired outcome. A recipe has clear cause and effect relationships and can be mastered through repetition and honing basic skills. Recipes present standard procedures and should provide sufficient detail that even someone who has never baked has a high probability of success. In simple situations, what needs to be done is known. Best practices for programs are like recipes in that they provide clear and high fidelity directions. The standard procedures that have worked to produce desired outcomes in the past are highly likely to work again in the future. Assembly lines in factories have a "recipe" quality, as do standardized school curricula.
Part of the attraction of the 12-Step program of Alcoholics Anonymous is its simple formulation (which doesn’t mean it is easy to do, even one day at a time). 71 H I G H I MPA C T E V A L U A T I O N s Complicated situations As situations become less predictable and producing desired outcomes becomes less certain, we are moving into complicated territory. It is useful to distinguish technical complications from social complications. Sending a rocket to the moon is technically complicated because there are thousands of elements that have to be coordinated for a successful launch. Technical knowledge and expertise is needed to solve com- plicated problems. More than one area of expertise is needed and must, therefore, be coordinated and integrated. In rocket science, formulae are used to predict the trajectory and path of the rocket. Calculations are required to ensure sufficient fuel based on current conditions. If all of the many technical calculations are done well, coordinated, and executed precisely, it is likely that the desired outcome – getting the rocket to the moon – will be accomplished. Like integrating the many areas of exper- tise needed to get a rocket into space, coordinating large-scale programs with many local sites throughout a country or region is a complicated problem. When the degree of uncertainty and agreement are such that what needs to be done is challenging and difficult, but knowable, the situation is complicated. That is, how all the parts will fit together is initially unknown but can be figured out, and is therefore knowable, in complicated situations. Socially complicated situations involve situations with many different stakeholders offering different perspectives, articulating competing values, and posing conflicting solutions. Whether resources should be spent sending rockets into space is more con- troversial than whether polio should be eradicated worldwide, thus rocket launches are more socially complicated than immunization campaigns (at least for purposes of illustrating the conceptual difference between simple and complicated). Abortion is an example of a socially complicated issue, as is what to do about the energy crisis. Every- one wants children to learn to read but there are intense disagreements about which reading approach produces the best result. Controversial issues like sex education are socially complicated. The more points of view there are and the greater the debate among different stakeholders, the more socially complicated the situation becomes. How diverse stakeholders will deal with their conflicts is initially unknown but know- able as the interactions unfold. Some of the disagreements may be about degree of technical complication (how much certainty there is about how to produce a desired outcome), but many disagreements are about fundamental value differences and how to even define the problem. Having distinguished the technically complicated from the socially complicated and given illustrations of each, we need to combine them to look at their interactions. A situation is complicated when there is either a high degree of uncertainty or a high degree of disagreement. If there is both high uncertainty and high disagreement (for 72 E x p l o r i n g t h e P o t e n t i a l o f Re a l - T i m e a n d P r o s p e c t i v e E v a l u a t i o n s instance, uncertainty is a primary source of disagreements and disagreements contrib- ute to the uncertainty), we have moved into the arena of complexity. 
Complex situations Complex situations are characterized by high uncertainty and high social conflict. In studying social innovations, we were impressed by the uncertainty and unpredictability of the innovative process, even looking back from a mountaintop of success, which is why we called the book Getting to Maybe (Westley, Zimmerman and Patton 2006). Evaluating social innovations is a complex challenge, as opposed to evaluating simple and compli- cated problems. The outcomes of interventions aimed at solving problems under condi- tions of complexity are unpredictable. So many factors and variables are interacting, many of them not only unknown but unknowable, that there can be no recipe for success. And even if something that looks like a recipe emerges from one or two successful attempts to do something, the likelihood that the same result can be attained in other and differ- ent contexts is low. There are simply too many dynamic variables and unknowns to make recipe-like replication (or supposed best practices) predictable. It’s worth reiterating the interactions between high uncertainty and high disagree- ment. These interaction are volatile, uncontrollable, unpredictable, and unknowable in advance: high uncertainty about how to produce a desired result fuels disagreement, and disagreements intensify and expand the parameters of uncertainty. Parenting is complex. Unlike the simple metaphor of a cooking recipe or the rocket launching metaphor for a complicated situation, parenting involves huge uncertainties and no clear rules guaranteeing success to follow. Oh, to be sure, there are many experts in parenting and many guides available to parents. But none can be treated like a cook book for a cake, or a set of formulae to send a rocket to the moon. In the case of the cake and the rocket, for the most part, we were intervening with inanimate objects. The flour does not suddenly decide to change its mind and gravity can be counted on to be con- sistent too. On the other hand, children, as we all know, have minds of their own. Hence our interventions are always in relationship with them. There are very few stand-alone parenting tasks. Almost always, the parents and child interact to create outcomes. Cause and effect relationships At the heart of the distinctions between simple, complicated, and complex is the extent to which cause and effect is or can be known. In simple situations cause and effect is known so interventions and their consequences are highly predictable and control- lable. In complicated situations cause and effect is knowable as patterns are established through research and observations over time, but the many variables involved make prediction and control more precarious. In complex situations, cause and effect is unknown and unknowable until after the effect has emerged, at which point some 73 H I G H I MPA C T E V A L U A T I O N s retrospective tracing and patterning may be possible. These different degrees of causal knowability actually define the uncertainty dimension of the degree of uncertainty/ degree of conflict matrix. Causal knowability is a distinguishing element distinguish- ing simple, complicated, and complex. Management and organizational development consultant David Snowden has emphasized these different degrees of causal clarity to distinguish simple, complicated, and complex, with special attention to their implica- tions for management planning and action (Snowden and Boone, 2007). 
The Cynefin Framework

"Wise executives tailor their approach to fit the complexity of the circumstances they face." Snowden and Boone (2007, p. 68)

This was the central message of "A Leader's Framework for Decision Making" by management consultants David Snowden and Mary Boone in their featured Harvard Business Review article. The article was designated the Best Practitioner-Oriented Paper in Organizational Behavior in 2007 by the Organizational Behavior Division of the Academy of Management. As Brenda Zimmerman was refining the distinctions between simple, complicated, and complex in the certainty and agreement matrix, David Snowden and colleagues in IBM's Institute of Knowledge Management were thinking in parallel terms that led to the Cynefin framework, making the same distinctions, an impressive exemplar of independent discoveries by creative minds following the same path. Snowden, of Welsh lineage, chose the Welsh word Cynefin (pronounced kun-ev'in) as the name of the framework distinguishing simple, complicated, complex, and chaotic. The Welsh dictionary translates cynefin as meaning haunt, habitat, acquainted, accustomed, or familiar, being both noun and adjective, and thus requiring context to understand its meaning in any given instance. Snowden resonated to this uncertainty, which evokes the sense that our understandings depend on our interactions with each other and our environment, which includes cultural traditions, organizational norms, and the geographical/ecological setting within which interactions occur. Snowden's Cynefin framework emphasizes variations in the nature of causality and the corresponding implications for decision-making and action (Snowden and Boone, 2007; Kurtz and Snowden, 2003).

Simple: linear, direct connection between cause and effect; easily observable, understandable, and verifiable. This is the arena where things are known, so best practices can be identified and applied. A leader's or manager's decision/action sequence is: Sense → Categorize → Respond

Complicated: determining cause and effect requires analysis and expert investigation, so things are not yet known, but are knowable. Good, effective practices can be identified (but not "best"). The decision/action sequence is: Sense → Analyze → Respond

Complex: cause and effect is contingent on contextual and dynamic conditions, and therefore unknowable; patterns are unpredictable in advance. Practice is emergent and contingent. A leader's or manager's decision/action sequence should be: Probe → Sense → Respond

Chaotic: no observable or predictable relationship between cause and effect because of rapidly changing and highly unstable/turbulent systems dynamics, but some kind of action is required. The appropriate decision/action sequence is: Act → Sense → Respond

New Zealand evaluator and leading systems thinker Bob Williams (cf. Williams & Iman, 2007) shared with me his experience using the Cynefin framework. I was exploring a new method of handling patients within a healthcare situation. I got people to group those aspects of the situation into Snowden's four categories (simple/known, complicated/knowable, complex/unknowable, chaotic), acknowledging that a given situation has elements of all four states (each of which implies a different response - including strategies that might move an aspect of the situation from one "state" to another and thus make it easier to manage).
This then leads to some very interesting conversations about whether they were assuming that a problem was "knowable" if only they worked hard enough, or that they were looking for "best practice" when actually "good practice" was what they should be considering. Some aspects of the situation were placed in more than one category. At this point all kinds of light bulbs lit up. People realized that part of the problem they were experiencing was that different people were imagining that aspect from two different understandings of what is going on. They suddenly understood why they were having difficulty resolving or managing the situation: "Oh, so you were managing it as if it were complicated and I was managing it as if it were complex - no wonder we were clashing over strategies."

Snowden's focus has been on teaching leaders and managers to make Cynefin framework distinctions as a guide to decision-making. My focus here is on its implications for evaluators. Exhibit 2 adapts his Leader's Guide to Decisions in Multiple Contexts to evaluation.

Exhibit 2. Decisions in Multiple Contexts: An Evaluator's Guide. Wise evaluators tailor their approach to fit the complexity of the circumstances they face. The Situation (Agreement/Certainty Matrix and Cynefin Framework) and The Leader's Job:

SIMPLE. The situation: High agreement about the problem and what to do; high certainty that the right action will produce the desired results: clear, direct, linear, predictable, and controllable cause-effect pattern. What needs to be done is known. The leader's job: Sense, categorize, respond. Know what is known. Manage based on facts. Advocate for and implement best practices.

COMPLICATED. The situation: Some disagreements about the problem and what to do. Expertise needed. The necessity of coordinating many areas of technical expertise and many actors introduces uncertainty about attaining desired outcomes. More than one effective way possible. Cause-effect linkages are context-contingent; discoverable with careful analysis, but neither obvious nor certain. Contingencies discernible (known unknowns). The leader's job: Sense, analyze, respond. Find needed expertise to identify good practices. Listen to and assess conflicting expert advice. Use monitoring and evaluation to track what unfolds as good practices are tried.

COMPLEX. The situation: High uncertainty about how to produce desired results and great disagreement among diverse stakeholders about the nature of the problem and what, if anything, to do. Results highly dependent on initial conditions; non-linear interactions within a dynamic system. No right answers; key variables and their interactions unknown in advance. Each situation is unique. The leader's job: Probe, sense, respond. Foster dialog, creativity, and innovation. Watch for and interpret emerging patterns. Be flexible and adaptive. Make time for and engage in reflective practice to capture, understand, and interpret what is emerging.

CHAOTIC. The situation: High conflict among stakeholders; extreme uncertainty about what to do. Turbulence and volatility make pattern detection unreliable, even undecipherable. Dynamic interactions hard to follow, not even sure what to pay attention to. Unreliable information. What to focus on is unknown and a matter of great debate. Tense, stressful decision environment. The leader's job: Act, sense, respond. Try things out and see what happens, watching for anything that works. Manage what is manageable to establish some degree of order. Don't yield to panic.

Source: Patton (2010).
The Evaluator's Job and Evaluation Challenges (Exhibit 2, continued):

The evaluator's job: Validate best practices (summative evaluation). Monitor implementation of best practices to assure high fidelity, adherence, and quality. Report departures from best practices and implications of those departures, especially implications for outcomes. Evaluation challenges: Assuring that best practices fit new contexts (different from where the practices were originated and validated). Detecting unanticipated consequences and context-specific implementation problems.

The evaluator's job: Validate effective practices and options with attention to context and system contingencies. Convert expert advice into a testable theory of change. Evaluate and report unfolding cause-effect complications and their implications. Evaluation challenges: Designing a reasonable test of the theory of change (summative evaluation). Understanding the system(s) and context(s) within which action unfolds. Detecting and measuring both outcomes and contingencies. Systems thinking. Facilitating interpretation of less-than-certain findings.

The evaluator's job: Identify and document initial conditions and monitor what emerges. Provide ongoing, timely, and rapid feedback about what is emerging. Track incremental actions and decisions that affect the paths taken (and not taken). High level of ongoing interaction and communication. Evaluation challenges: Keeping up with the rapid pace of change in turbulent and dynamic environments, and documenting developments. Managing a flexible, emergent design.

The evaluator's job: Facilitate regular reflective practice about what is developing. Embed evaluative thinking in the innovative process. Evaluation challenges: Combining creative and critical (evaluative) thinking in support of innovation. Facilitating interpretation of emergent findings for action. Staying developmentally focused.

The evaluator's job: Distinguish better and worse data; some information may be better than none, but interpret cautiously. Find those parts of the action where evaluation can make an immediate contribution to help survive chaos. Don't be a burden. Evaluation challenges: Acknowledging data inadequacies. Being open and opportunistic about finding data. Avoiding defaulting to the simple in an effort to exercise control and create the illusion of certainty where none exists. Helping to transition to stability in the face of chaos.

Applying Complexity Concepts to Real-Time and Prospective Aspects of Evaluation

The basic premise here is that evaluation in complex adaptive systems is more likely to be useful if the evaluation is informed by complexity concepts and understandings. Pretty straightforward premise -- derived from the importance of matching the evaluation to the nature of the situation. While complexity ideas raise doubts about linear, formulaic, and mechanical models of the world, controversies surround complexity constructs, raising doubts about whether agreement can ever be reached on core constructs. What is not in doubt is that complexity ideas are in vogue, have a lot of currency these days, and, thereby, have attracted ardent adherents and fervent critics. What brings me to complexity is its utility. It identifies a set of intervention circumstances that are amenable to a particular situationally appropriate evaluation response, what I am calling here developmental evaluation. Complexity is a defining characteristic of developmental evaluation's niche. Principles for operating in complex adaptive systems inform the practice of developmental evaluation.
The controversies and challenges that come with complexity ideas will also and inevitably afflict developmental evaluation. The insights and understand- ings of complexity thinking that have garnered enthusiasm from social innovators will also envelope developmental evaluation and open pathways for increasing the credibility, rel- evance, and utility of evaluation undertaken from a specifically developmental perspective. Ramalingam and Jones (2008), in a comprehensive review of the application of com- plexity theory to international humanitarian aid, distinguish three points of view about complexity theory: champions, critics, and pragmatists. Their description of pragmatists nicely summarizes my own perspective, so I cite it here: The pragmatists, for whom complexity provides interesting and potentially useful par- allels, are exploring the relevance of complexity science to social systems and organisa- tions, and working to assess the practical benefits that arise from its application outside the natural sciences…. This work suggests that complexity is a lens that helps us look at our world and shape our action but, importantly, that it is a set of concepts and tools that should not be treated as the ‘only way’ to look at and do things. The pragmatists tend to accept the work-in-progress nature of complexity sciences, and the challenges that arise from drawing on diverse and varied bodies of knowledge. These challenges create issues around definition, measurement, analysis and coherence, and lead to a general acknowledgement that there is a need for a deeper theoretical understanding and further practical applications. (Ramalingam & Jones, 2008, p.6) So, from a pragmatic perspective, what are some of the compelling complexity con- structs that inform developmental evaluation? I’ve focused on six central complexity ideas: nonlinearity, emergence, adaptation, uncertainty, dynamical systems change, and co-evolution. Exhibit 3 defines each of these concepts and suggests their implications for developmental evaluation. 78 E x p l o r i n g t h e P o t e n t i a l o f Re a l - T i m e a n d P r o s p e c t i v e E v a l u a t i o n s Five Developmental Evaluation Purposes and Uses In considering the relevance of systems thinking and complexity concepts for evalu- ation, I want to suggest that developmental evaluation is particularly appropriate for but needs to be matched to five different complex situations and developmental pur- poses: 1. Ongoing development in adapting a project, program, strategy, policy, or other innovative initiative to new conditions in complex dynamic systems. 2. Adapting effective general principles to a new context as ideas and innovations are taken from elsewhere and developed within a new setting, the work of developmen- tal evaluation in the dynamic middle between top-down and bottom-up forces of change. 3. Developing a rapid response in the face of a sudden major change or a crisis, like a natural disaster or financial melt-down, exploring real-time solutions and generating innovative and helpful interventions for those in need. 4. Pre-formative development of a potentially scalable innovation to the point where it is ready for traditional formative and summative evaluation; pre-forma- tive developmental evaluation works with emerging ideas and visionary hopes in a period of exploration to shape them into a potential model that is a more fully conceptualized, potentially scalable intervention. 
(As models emerge out of exploratory and innovative initiatives, some may move into more traditional formative and summative evaluation to determine scalability and generalizability, while others remain in developmental mode, either undergoing further development or continuous experimentation in the search for new models.)

5. Major systems change and cross-scale developmental evaluation, providing feedback about how major systems change is unfolding, evidence of emergent tipping points, and/or how an innovation is or may need to be changed and adapted as it is taken to scale, that is, as its principles are shared and disseminated in an effort to have broader impact. Horizontal scaling across systems or vertical scaling to broader systems may involve more than adaptation; these dissemination and scaling processes can evolve an essentially new development, the emergence of which can be documented and analyzed as part of a developmental evaluation.

Issues in Real-Time and Prospective Aspects of Utilization-Focused Evaluation

Real-Time versus Developmental Evaluation

Real time refers generally to rapid feedback and response, linking data and action as close together in time as possible. The ultimate in real-time data analysis is reporting on stock market transactions in microseconds. In hospitals, real time means getting blood analyses or other diagnostic tests back to a doctor within a short timeline that can range from minutes to an hour.

Exhibit 3. Characteristics of Complex Systems and Implications for Developmental Evaluation

1. Nonlinear: Sensitivity to initial conditions; small actions can stimulate large reactions, thus the butterfly wings metaphor (Gleick, 1987); black swans (Taleb, 2007), in which highly improbable, unpredictable, and unexpected events have huge impacts; and tipping points (Gladwell, 2002), when major shifts occur changing the whole landscape of action. Implications for developmental evaluation: Watch for, sample, and study critical incidences. Assess and map tipping points and other changes in the intervention landscape. Use mixed methods to capture when cumulative quantitative changes in key indicators become substantively significant qualitative shifts. Don't confuse linear logic models and strategic plans with what actually goes on in programs. Look for contextual changes that shift program patterns, forks in the road that move the program in new directions, and sudden (or gradual) responses to unexpected developments.

2. Emergence. Patterns emerge from self-organization among interacting agents. Each agent or element pursues its own path but as that path intersects with, and the agent interacts with others, also pursuing their own paths, patterns of interaction emerge and the whole of the interactions cohere, becoming greater than the separate parts. What emerges can be beyond, outside of, and oblivious to any notion of shared intentionality (Johnson, 2001). Implications for developmental evaluation: Be especially alert to formation of self-organizing subgroups who have different experiences of the program and, correspondingly, different outcomes. Anticipate and expect emergent issues and take seriously the search for unanticipated consequences, tracking interactions among key players, both formal and informal, planned and unplanned. Map networks, system relationships, and subgroups. Track information flows, communications, and emergent issues.
Emergence applies to both processes and outcomes. Watch for and assess not only what emerges, but what declines or even disappears. Disappearance is the other side of the phenomenon of emergence. The unplanned emerges; the planned disappears. Both are important, as is what unfolds as planned. The evaluation design is also emergent.

3. Adaptive: Interacting elements and agents respond and adapt to each other, and to their environment, so that what emerges is a function of ongoing adaptation both among interacting elements and the responsive relationships interacting agents have with their environment. Adaptive management is a systematic, iterative process for making decisions in the face of uncertainty, reduced control, and low predictability, through ongoing system monitoring and response to changes in context. The process essentially involves learning by doing and observing, then making adjustments based on what has been learned, and repeating this cycle of sensing, learning, and adapting over and over. Implications for developmental evaluation: Regularly capture perspectives from key actors in different but interacting systems about what's going on. Put these perspectives in dialogue with each other to capture and track adaptations and their significance. Both new processes and new outcomes may emerge, requiring new evaluation design elements and measures. The evaluation itself must be adaptive. An adaptive mindset essentially involves learning by doing and observing. This parallels the process recommended by knowledge management consultant David Snowden when facing complexity: probe, sense, respond (Snowden & Boone, 2007). Probing is the doing. Sensing is the observing (where chance ever favors the prepared mind). And responding is the adaptation. The feedback provided by the developmental evaluator informs the innovators' adaptive process, including heightening awareness of what incremental adaptations are occurring so that learnings can be identified and captured. The evaluator may also point out when innovators are not being adaptive despite what is emerging; or when there is increasing uncertainty within a system but the innovators are behaving as if they've figured things out and know what is happening.

4. Uncertainty. Under conditions of complexity, processes and outcomes are unpredictable, uncontrollable, and unknowable in advance. Emergent and adaptive self-organization can create idiosyncratic bumps in patterns that become mounds that sometimes go on to become idiosyncratic mountains, or at other times erode into nothingness, and it's impossible to know ahead of time which pattern, if either, will prevail. Not acknowledging and dealing with uncertainty and unexpected events can lead to a spiral of disruption with things getting worse (Weick & Sutcliffe, 2001, p. 2).
Uncertainty is a defining characteristic of complexity (Westley, Zimmerman, & Patton, 2006). Implications for developmental evaluation: Identify and acknowledge sources of uncertainty, including: inadequate knowledge about how to produce desired outcomes; disagreements among key actors about what to do, including value conflicts; and turbulence in the larger environment. Work with key stakeholders and primary intended users on an ongoing basis to understand the implications of uncertainty. Nurture tolerance for ambiguity and messiness. This means resisting the temptation to address uncertainty by imposing order and control through evaluation by forcing the complex into a simple linear evaluation logic model with predetermined clear, specific, and measureable outcomes. Provide rapid feedback about unexpected events and their implications. Early detection of and feedback about emergent patterns can be critical. In early stages of trouble or opportunity, the unexpected may give off weak signals. "The overwhelming tendency is to respond to weak signals with a weak response." Understanding the potential significance of weak signals and responding strongly "holds the key to managing the unexpected" (Weick & Sutcliffe, 2001, p. 4).

5. Dynamical: Interactions within, between, and among subsystems and parts within systems can be volatile, changing rapidly and unpredictably due to the interdependence of key factors and variables. The system may shift from rest to rhythmic oscillation to random thrashing. These changes seem to be spontaneous, but they are driven by the internal dynamics of the system itself as the constraining conditions interact with each other to influence the behaviors of agents in the system. Implications for developmental evaluation: Track and document not only whether change occurs, but how and why it occurs. Processes and outcomes can be both dynamic and dynamical; pay attention to both, and their interrelationship. Create a flexible and responsive data collection system that can mirror adaptive, emergent, and dynamic/dynamical developments, so that fieldwork can speed up and slow down in sync with the intervention's rhythms of change. Engage in ongoing monitoring of shifts in levels of activity to capture dynamic/dynamical transitions. Analyze and distinguish contextual factors and participation patterns that are static, dynamic, and dynamical, and the implications of these different patterns.

6. Co-evolutionary: As interacting and adaptive agents self-organize, ongoing connections emerge that become co-evolutionary as the agents evolve together (co-evolve) within and as part of the whole system, over time. Implications for developmental evaluation: Developmental evaluation will co-evolve with the innovation and intervention, both affecting innovation and being affected by it. This is a process of co-creation. The evaluation will not be independent and separate from the innovation but will be interdependent with it, and with those involved in it (as part of a team), as the evaluator provides feedback, facilitates conceptualization of the change process, and both captures and generates perspectives about what is happening, and why. Process use, in which evaluative thinking affects the intervention, will be as important as findings use.

In evaluation situations, real time typically means getting results to intended users in a day or two, or at most a couple of weeks, rather than in months or on a routine schedule of standard quarterly reports (a common information system reporting timeframe). Developmental evaluation aims for real-time feedback, but not all real-time data use and evaluation is developmental. Police departments use real-time data on increasing crime in a neighborhood to reallocate personnel from lower crime to higher crime areas. That is real-time evaluation and data use, but it is not developmental. This real-time use of data by police involves implementing a rapid response management approach, but the police are not developing that approach.
In contrast, if crime data indicated that a national gang was moving into the community, the police could develop a task force to fight gang recruitment, infiltration, and crime, and monitor emergent effects as the gang adapted to police attention so that the police could adapt accordingly. That would be developmental evaluation, because the intervention is emerging in real time and evaluation data are being used to adapt it to what emerges in real time.

Developmental Evaluation versus Development Evaluation

Developmental evaluation is easily confused with development evaluation. They are not the same, though developmental evaluation can be used in development evaluations. This has created some confusion, which I regret, and hereby address.

Development evaluation is a generic term for evaluations conducted in developing countries, usually focused on the effectiveness of international aid programs and agencies. The work of IEG is development evaluation. The Road to Results: Designing and Conducting Development Evaluations (Imas & Rist, 2009) is an exemplar of this genre; the book grew out of The World Bank’s highly successful International Program for Development Evaluation Training (IPDET), which the book’s authors founded and direct. Full disclosure: I have been on the IPDET faculty since the program began.

Developmental evaluation, as defined and described in the Encyclopedia of Evaluation (Mathison, 2005, p. 116), has the purpose of helping develop an innovation, intervention, or program. In developmental evaluation the evaluator typically becomes part of the program or innovation design team, fully participating in decisions and facilitating discussion about how to evaluate whatever happens. All team members, together, interpret evaluation findings, analyze implications, and apply results to the next stage of development. The evaluator becomes involved in improving the intervention and uses evaluative approaches to facilitate ongoing program, project, product, staff, and/or organizational development. The evaluator’s primary function in the team is to facilitate and elucidate team discussions by infusing evaluative questions, data, and logic, and to support data-based decision-making in the developmental process. In this regard, developmental evaluation is analogous to research and development (R & D) units in which the evaluative perspective is internalized in and integrated into the operating unit. In playing the role of developmental evaluator, the evaluator helps make an intervention’s development an R & D activity.

Part of the value of an experienced developmental evaluator to an innovation team is bringing a reservoir of knowledge (based on many years of practice and having read a great many evaluation reports) about what kinds of things tend to work and where to anticipate problems. Experienced evaluators have typically accumulated a great deal of knowledge and wisdom about what works and what doesn’t work. More generally, as a profession, the field of evaluation has generated a great deal of knowledge about patterns of effectiveness. That knowledge makes evaluators valuable partners in designing as well as evaluating social innovations.
An evaluation focused on development assistance in developing countries could use a developmental evaluation approach, especially if such development assistance is viewed as occurring under conditions of complexity with a focus on adaptation to local context. But developmental evaluations are by no means limited to projects in developing countries. Developmental evaluation can be used anywhere that social innovators are engaged in bringing about systems change under conditions of complexity. The al in developmental is easily missed, but it is critical in distinguishing development evaluation from developmental evaluation. Exhibit 4 portrays the relationship between development evaluation and developmental evaluation.

Exhibit 4. Development evaluation and developmental evaluation, with their overlap labeled DD², where DD² = developmental evaluation used for development evaluation.

When I first labeled and wrote about developmental evaluation 15 years ago (Patton, 1994), development evaluation was not a distinct and visible category of evaluation practice and scholarship. Evaluations in developing countries were certainly being conducted, but an identifiable body of literature focused on evaluating development assistance had not attracted general professional attention. One of the most important trends of the last decade has been the rapid diffusion of evaluation throughout the world, including especially the developing world, highlighted by the formation of the International Development Evaluation Association, which launched in Beijing, China, in 2002. IEG has been a leader in developing development evaluation as a field of professional practice in evaluation.

Confusion about the distinct and sometimes overlapping niches of development evaluation and developmental evaluation is now, I’m afraid, part of the complex landscape of international evaluation. I hope this paper helps sort out both the distinctions and the areas of overlap.

Ten other issues and controversies in prospective evaluation under conditions of complexity

Here are some of the issues and controversies in prospective evaluation under conditions of complexity:

1. Maintaining a results focus: Should there be, and can there be, pre-ordinate targeted outcomes? How can interventions be results-oriented under conditions of high uncertainty and dynamical complexity?
2. Comparative analysis: Can baselines be revised given dynamic and dynamical conditions? Getting beyond static and sacrosanct baselines.
3. Emergence: How do we take emergence seriously? Getting beyond token attention to “unanticipated consequences.”
4. Flexible designs: How do we adapt evaluation to complex circumstances with emergent and flexible designs and measures?
5. Evaluation budgeting: How do we engage in contingency-based evaluation budgeting?
6. Poverty focus: How can evaluation under conditions of complexity maintain a focus on poverty when relatively more developed countries may have more capacity for rapid adaptability?
7. Evaluation within a macro systems context: Climate change and the global economic crisis provide a context within which any particular evaluation will unfold for the foreseeable future. How does evaluation take this larger global context into consideration?
8. Sustainability concerns: Under conditions of complexity, sustainability means resilience rather than continuity, yet most traditional approaches to evaluation continue to treat continuity as the criterion for sustainability.
9. Forward-looking (prospective) uncertainties: Prospective evaluation will offer probability estimates under conditions of high uncertainty and little likelihood of being accurate. What form should such estimates take? For example, we will likely know more about factors to worry about than be able to offer actual estimates of results, but results estimates may be expected. Can we use scenario approaches instead of static future estimates? What caveats need to be included in prospective evaluation?
10. Rapid and ongoing updates: Traditional evaluation focuses the action on the beginning (baseline), middle (progress report), and end (accountability and summative evaluation). In M & E, monitoring has served program management purposes more than evaluation. How can ongoing evaluation and the updating of prospective evaluation scenarios be built into evaluation under conditions of complexity? (An illustrative sketch of scenario-style, updatable estimates appears near the end of this paper.)

The essence of utilization-focused developmental evaluation

So, bottom line: How can you tell if an evaluation is truly developmental? The answer lies in focusing on the evaluation’s primary purpose and outcomes: Is the purpose and focus of the evaluation helping develop something? Is something getting developed? Did something get developed? If so, what? How? With what implications? The focus of developmental evaluation is on developing and adapting innovations. To borrow an old saying, the proof of the pudding is in the eating.

Since I distinguish developments from improvements, and position developmental evaluation as different in important ways from formative and summative evaluation, let me offer this cooking metaphor. Distinguished evaluation theorist and practitioner Bob Stake has explained: When the cook tastes the soup, that’s formative; when the guests taste the soup, that’s summative. More generally, anything done to the soup during preparation in the kitchen is improvement-oriented; when the soup is served, summative judgment is rendered by the guests who consume the soup. And what of developmental evaluation in this metaphor? Developmental evaluation begins when, before cooking, the chef goes to the market to see what vegetables are freshest and what fish has just arrived, meanders through the market considering possibilities, thinking about who the guests will be, what they were served last time, and what the weather is like, and weighs how adventurous and innovative to be with the meal. If the chef decides to follow a standard recipe, the situation remains appropriate for formative and summative evaluations based on fidelity to the prescribed recipe. If the chef decides to attempt a new creation, innovate, and develop a new dish especially well-suited for these particular guests in the context of this particular evening, then the situation opens up the possibility for creativity and developmental evaluation. And when a guest and a cook create and concoct a soup together, that co-creation is developmental.

Situational Responsiveness and Developmental Evaluation

This entire paper has been about how we figure out what situation we face so we can engage appropriately. In particular, I have been delineating and refining the niche of developmental evaluation as especially appropriate for interventions and innovations being undertaken under conditions of complexity.
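Issues 9 and 10 above ask whether prospective estimates can take the form of scenarios rather than static point forecasts, and how such scenarios might be updated as conditions change. Purely as an illustration, and not part of the original paper, the sketch below represents a prospective estimate as a small set of weighted scenarios with outcome ranges and reweights them as monitoring information arrives; all scenario names and figures are hypothetical.

    # Illustrative sketch only: a prospective estimate expressed as weighted scenarios
    # with ranges, revised as monitoring data come in. All names and numbers are hypothetical.

    scenarios = {
        # scenario name: weight and (low, high) range for the outcome of interest,
        # e.g. percentage-point change in a poverty indicator
        "rapid recovery":   {"weight": 0.2, "range": (-1.0, 0.5)},
        "slow recovery":    {"weight": 0.5, "range": (0.5, 2.0)},
        "prolonged crisis": {"weight": 0.3, "range": (2.0, 5.0)},
    }

    def describe(scenarios):
        """Report the estimate as weighted scenarios, not a single number."""
        for name, s in scenarios.items():
            lo, hi = s["range"]
            print(f"{name:18s} weight {s['weight']:.0%}: outcome between {lo} and {hi}")

    def reweight(scenarios, new_weights):
        """Update scenario weights as monitoring data arrive (normalized to sum to 1)."""
        total = sum(new_weights.values())
        for name, w in new_weights.items():
            scenarios[name]["weight"] = w / total
        return scenarios

    describe(scenarios)
    # Suppose real-time monitoring suggests the crisis is lasting longer than expected:
    describe(reweight(scenarios, {"rapid recovery": 1, "slow recovery": 4, "prolonged crisis": 5}))

The design point is simply that the deliverable is a set of ranges and weights that can be revised as events unfold, rather than a single number that must be defended.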
To adapt David Snowden’s advice to leaders: the message of this paper has been that wise evaluators tailor their approach to fit the complexity of the circumstances they face.

References

American Evaluation Association. (2004). Guiding principles for evaluators. http://www.eval.org/Publications/GuidingPrinciples.asp
Gladwell, M. (2002). The tipping point: How little things can make a big difference. Boston: Little, Brown.
Gleick, J. (1987). Chaos: Making a new science. New York: Penguin.
Johnson, S. (2001). Emergence: The connected lives of ants, brains, cities, and software. New York: Scribner.
Joint Committee on Standards. (1994). The program evaluation standards. Thousand Oaks, CA: Sage. http://www.wmich.edu/evalctr/jc/
Kurtz, C. F. & Snowden, D. J. (2003). The new dynamics of strategy: Sense-making in a complex and complicated world. IBM Systems Journal, 48(3), 462–483.
Morra-Imas, L. G. & Rist, R. (2009). The road to results: Designing and conducting development evaluations. Washington, DC: The World Bank.
Patton, M. Q. (1994). Developmental evaluation. Evaluation Practice, 15(3), 311–20.
Patton, M. Q. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: Sage.
Patton, M. Q. (2010). Developmental evaluation: Applying complexity concepts to enhance use and innovation. New York: Guilford.
Ramalingam, B. & Jones, H., with Reba, T. & Young, J. (2008). Exploring the science of complexity: Ideas and implications for development and humanitarian efforts. Working Paper 285. London: Overseas Development Institute.
Snowden, D. J. & Boone, M. E. (2007). A leader’s framework for decision making. Harvard Business Review, 85(11), 68–77.
Taleb, N. N. (2007). The black swan: The impact of the highly improbable. New York: Random House.
Weick, K. E. & Sutcliffe, K. (2001). Managing the unexpected: Assuring high performance in an age of complexity. San Francisco: Jossey-Bass.
Westley, F., Zimmerman, B., & Patton, M. Q. (2006). Getting to maybe: How the world is changed. Toronto: Random House Canada.
Williams, B. & Imam, I. (Eds.) (2007). Systems concepts in evaluation: An expert anthology. American Evaluation Association monograph. Point Reyes, CA: EdgePress of Inverness.
Zimmerman, B. & Glouberman, S. (2004). Complicated and complex systems: What would successful reform of Medicare look like? In P. G. Forest, T. McIntosh, & G. Marchildon (Eds.), Health care services and the process of change (pp. 21–53). Toronto: University of Toronto Press. Originally published as Discussion Paper No. 8 (2002). Ottawa: Commission on the Future of Health Care in Canada.
Zimmerman, B., Lindberg, C. & Plsek, P. (1998). edgeware: insights from complexity ideas for health care leaders. Irving, Texas: VHA.