Wednesday, November 17, 2010

My Return

I think it has been over a year since my last post here. Let’s just say I’m not a prolific blogger and leave it at that, OK? What I have learned is that I blog when I think of something important that I would like to discuss with you, dear reader.

So, what got me to return here after 16 months of nothing but Twitter comments? The American Evaluation Association’s (AEA) annual meeting/conference/party. It is there that you will find the majority of the great thinkers in the field, their students, and the practitioners who attempt to make sense of the world. There is a level of intellectual discourse there that differs from that of any other conference I’ve attended. It is a homecoming of sorts for most of us, who struggle with and contemplate how best to evaluate and inform the various communities and organizations we serve.

It was there, last week, that I had a moment of crisis. Oh, it was coming; it had been building like a slowly evolving migraine. And on Wednesday, in a session I was attending, it exploded across my poor brain: evaluators proposing the notion that evaluation should inform decision making in as near real time as possible. At once I knew that I was in trouble…

You see, dear reader, my roots are in applied social psychology: empiricism reigned supreme in my thinking about anything having to do with research methods. Like many evaluators, I found this profession through another, and my past clearly colors my future. However, the shade of that tint has been changing over time. If you have read my previous blog posts, you probably know that I’ve also been colored by experience in quality management and a need to help programs succeed. That flavor has affected how I go about my practice as an evaluator as well.

The two views competed with each other a bit, and one could argue I was a “mixed method” evaluator in that I craved both the “certainty” (yes, I know it really isn’t certain) of empiricism and the impact on program and organizational improvement that more interactive forms of evaluation can provide. I would flip back and forth, and to be honest, I still oscillate between the two like a strange quark, vibrating between these “approaches”. But it wasn’t until that moment of panic that I noticed how quickly I quivered.

And so, dear reader, I come to you in confusion and, admittedly, some fear. You see, in my role as Director of Evaluation for a foundation, I want it all. I’m sure my fellow staff members want it all. My Board wants it all. And I think my grantees want it all too. We want to know what “conclusively” works so that we can generalize these learnings to other programs and projects we fund. We want the evaluation “results” (outcomes) to inform our future grantmaking. We want good programmatic ideas to spread. The empiricist in me argues that the evaluator needs to be the critical friend who watches the program get dressed, go out on its date, and succeed or fail without offering any advice.

But because we want to see the programs we fund succeed, we also want to be the critical friend who, after seeing your outfit, suggests that you change before going out; who observes how the date is going and offers ideas for different topics of “small talk”; or who notices that the place doesn’t suit the person you are with and suggests other places to go. We want that date to succeed. We want that program to succeed. But we also want to know at the end of the date whether the whole package works.

Peter York of TCC Group made an interesting observation in a session at AEA. It was in reference to a different, though somewhat related, issue. I am curious to hear more of his thoughts, but it got me thinking. What if we broke the program (or the date) into smaller parts instead of evaluating the whole thing? This approach allows for more interventional evaluation (stopping you from continuing to talk about your last significant other and suggesting other topics, such as the weather) while maintaining some of the possible strengths of empirical rigor. By chunking the program’s process into smaller parts, we get a more rapid cycle of reporting and more opportunities to improve the program.

This only gets us so far. Our evaluation questions would have to focus only on the components, and those questions would have to be time-specific. This might actually be good from a generalizability standpoint, as few programs are copied lock, stock, and barrel. Rather, components of a program are implemented based upon the context of the environment and the resources available.

There is another issue as well: the “intervention” of the evaluation itself (assisting with identifying issues with the program and providing suggestions for changes). One great argument against this is that the program has been “tainted” by the process of evaluation and is no longer pure. Here’s where I’ve landed on this topic this morning:

• Programs change over time with or without formal evaluation. They change for various reasons, one being that someone has observed that things aren’t working as expected. Why is it so wrong for that someone to be better informed by a more rigorous system?

• As I mentioned above, programs change over time. This is a problem faced by longer-term empirical designs, and frankly it is often ignored in these discussions. Live programs are not like the labs where much of social science is conducted; things happen.

Huey Chen made an interesting observation in a presentation this past week at AEA. At the time, he was discussing the idea that randomized controlled trials (RCTs), while appropriate at certain points in evaluation practice, are better conducted after issues of efficacy and validity have been addressed in earlier evaluations. Taking his comments further (and of course without his permission at this point), I would argue that evaluation focused on program generalizability should only be conducted after a meta-analysis (in the broadest sense, not the statistical method) indicates that the whole program might in fact work across multiple domains and contexts.

So, where does all this leave me in my crisis? I should tell you that I’m feeling better, much better. You see, it really comes down to the evaluation question and whether that question is appropriate. The appropriateness of the question is tied largely to timing and to the results of previous evaluations. If we are talking about a new program, it is important to conduct interventional evaluation; we are collaborating to build the best program possible. For more mature designs that have been implemented in a few places, assessing the programmatic model makes more sense, and a more empirical model of evaluation becomes appropriate. It is all about timing and maturity.

Funders still want it all, and so do I. How do we give a funder that is only interested in starting new programs the opportunity to say that its program idea should be replicated, while still allowing for interventional evaluation? I have three criteria here:

• Fund multiple programs (and no, not just 5).

• Fund internal interventional evaluations for each program.

• Fund a separate initiative-level evaluation that can look across all the programs, their contexts, and the organizational interventions (including the interventions of the internal evaluations).

This approach calls for a different view of programming. For as long as I’ve been an evaluator, there has been a constant complaint that organizations do not internalize evaluation, that they do not integrate evaluation into programming. Here is the opportunity to build that framework. Evaluators complain that evaluation is viewed as separate from programming, yet the whole empiricist view of evaluation would place evaluation outside the program: an observer looking through the glass, watching so as to remark later on what happened and perhaps why. “Evaluation” is being conducted by amateurs every day to run their programs, so why don’t we empower them to do it right? Why not empower them to integrate better evaluative practices into their programming? And then recognize that evaluation is an integral part of programming, seeing it as an operational activity that affects programming in the same way as staffing, networks of support, and environment, which we already consider appropriate evaluands?

Michael Scriven talks about it being time for a revolution in evaluation. Perhaps it is time to drive in the spike that connects the rails he and David Fetterman have been building. Perhaps it is time to agree that interventional evaluation and the more empirical forms of evaluation can coexist, much as we have found that qualitative and quantitative methods can coexist and, in fact, enhance one another through mixed-methods approaches to evaluation.

As always, I’m open to your comments, suggestions, and questions. Please feel free to post comments.

Best regards,
Charles Gasper
The Evaluation Evangelist