Section 1. A Framework for Program Evaluation: A Gateway to Tools
Learn how program evaluation makes it easier for everyone involved in community health and development work to evaluate their efforts.
This section is adapted from the article "Recommended Framework for Program Evaluation in Public Health Practice," by Bobby Milstein, Scott Wetterhall, and the CDC Evaluation Working Group.

Around the world, there exist many programs and interventions developed to improve conditions in local communities. Communities come together to reduce the level of violence that exists, to work for safe, affordable housing for everyone, or to help more students do well in school, to give just a few examples.

But how do we know whether these programs are working? If they are not effective, and even if they are, how can we improve them to make them better for local communities? And finally, how can an organization make intelligent choices about which promising programs are likely to work best in their community?

In recent years, there has been a growing trend toward the better use of evaluation to understand and improve practice. The systematic use of evaluation has solved many problems and helped countless community-based organizations do what they do better.

Despite an increased understanding of the need for - and the use of - evaluation, however, a basic agreed-upon framework for program evaluation has been lacking. In 1997, scientists at the United States Centers for Disease Control and Prevention (CDC) recognized the need to develop such a framework. As a result of this, the CDC assembled an Evaluation Working Group comprised of experts in the fields of public health and evaluation. Members were asked to develop a framework that summarizes and organizes the basic elements of program evaluation. This Community Tool Box section describes the framework resulting from the Working Group's efforts.

Before we begin, however, we'd like to offer some definitions of terms that we will use throughout this section.

By evaluation, we mean the systematic investigation of the merit, worth, or significance of an object or effort. Evaluation practice has changed dramatically during the past three decades - new methods and approaches have been developed and it is now used for increasingly diverse projects and audiences.

Throughout this section, the term program is used to describe the object or effort that is being evaluated. It may apply to any action with the goal of improving outcomes for whole communities, for more specific sectors (e.g., schools, work places), or for sub-groups (e.g., youth, people experiencing violence or HIV/AIDS). This definition is meant to be very broad.

Examples of different types of programs include:

  • Direct service interventions (e.g., a program that offers free breakfast to improve nutrition for grade school children)
  • Community mobilization efforts (e.g., organizing a boycott of California grapes to improve the economic well-being of farm workers)
  • Research initiatives (e.g., an effort to find out whether inequities in health outcomes based on race can be reduced)
  • Surveillance systems (e.g., whether early detection of school readiness improves educational outcomes)
  • Advocacy work (e.g., a campaign to influence the state legislature to pass legislation regarding tobacco control)
  • Social marketing campaigns (e.g., a campaign in the Third World encouraging mothers to breast-feed their babies to reduce infant mortality)
  • Infrastructure building projects (e.g., a program to build the capacity of state agencies to support community development initiatives)
  • Training programs (e.g., a job training program to reduce unemployment in urban neighborhoods)
  • Administrative systems (e.g., an incentive program to improve efficiency of health services)

Program evaluation - the type of evaluation discussed in this section - is an essential organizational practice for all types of community health and development work. It is a way to evaluate the specific projects and activities community groups may take part in, rather than to evaluate an entire organization or comprehensive community initiative.

Stakeholders are those who care about the program or effort. These may include those presumed to benefit (e.g., children and their parents or guardians), those with particular influence (e.g., elected or appointed officials), and those who might support the effort (i.e., potential allies) or oppose it (i.e., potential opponents). Key questions in thinking about stakeholders are: Who cares? What do they care about?

This section presents a framework that promotes a common understanding of program evaluation. The overall goal is to make it easier for everyone involved in community health and development work to evaluate their efforts.

Why evaluate community health and development programs?

The type of evaluation we talk about in this section can be closely tied to everyday program operations. Our emphasis is on practical, ongoing evaluation that involves program staff, community members, and other stakeholders, not just evaluation experts. This type of evaluation offers many advantages for community health and development professionals.

For example, it complements program management by:

  • Helping to clarify program plans
  • Improving communication among partners
  • Gathering the feedback needed to improve and be accountable for program effectiveness

It's important to remember, too, that evaluation is not a new activity for those of us working to improve our communities. In fact, we assess the merit of our work all the time when we ask questions, consult partners, make assessments based on feedback, and then use those judgments to improve our work. When the stakes are low, this type of informal evaluation might be enough. However, when the stakes are raised - when a good deal of time or money is involved, or when many people may be affected - then it may make sense for your organization to use evaluation procedures that are more formal, visible, and justifiable.

How do you evaluate a specific program?

Before your organization starts a program evaluation, your group should be very clear about the answers to the following questions:

  • What will be evaluated?
  • What criteria will be used to judge program performance?
  • What standards of performance on the criteria must be reached for the program to be considered successful?
  • What evidence will indicate performance on the criteria relative to the standards?
  • What conclusions about program performance are justified based on the available evidence?

To clarify the meaning of each, let's look at some of the answers for Drive Smart, a hypothetical program begun to stop drunk driving.

What will be evaluated?

  • Drive Smart, a program focused on reducing drunk driving through public education and intervention

What criteria will be used to judge program performance?

  • The number of community residents who are familiar with the program and its goals
  • The number of people who use "Safe Rides" volunteer taxis to get home
  • The percentage of people who report drinking and driving
  • The reported number of single-car nighttime crashes (a common way to try to determine whether the number of people who drive drunk is changing)

What standards of performance on the criteria must be reached for the program to be considered successful?

  • 80% of community residents will know about the program and its goals after the first year of the program
  • The number of people who use the "Safe Rides" taxis will increase by 20% in the first year
  • The percentage of people who report drinking and driving will decrease by 20% in the first year
  • The reported number of single-car nighttime crashes will decrease by 10% in the program's first two years

What evidence will indicate performance on the criteria relative to the standards?

  • A random telephone survey will demonstrate community residents' knowledge of the program and changes in reported behavior
  • Logs from "Safe Rides" will show how many people use their services
  • Information on single-car nighttime crashes will be gathered from police records

What conclusions about program performance are justified based on the available evidence?

  • Are the changes we have seen in the level of drunk driving due to our efforts, or to something else?
  • If there has been little or no change in behavior or outcomes, should Drive Smart change what it is doing, or have we simply not waited long enough to see results?
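
To make the relationship between criteria, standards, and evidence concrete, here is a minimal sketch in Python. The figures and source names are hypothetical, drawn loosely from the Drive Smart example above; they are illustrations, not part of the framework.

    # Hypothetical evaluation plan: each criterion is paired with its performance
    # standard, the evidence source, and the value actually observed.
    criteria = [
        {"criterion": "Residents familiar with the program (%)",
         "standard": 80, "direction": "higher", "observed": 72,
         "evidence": "random telephone survey"},
        {"criterion": "Change in Safe Rides taxi use (%)",
         "standard": 20, "direction": "higher", "observed": 26,
         "evidence": "Safe Rides logs"},
        {"criterion": "Change in reported drinking and driving (%)",
         "standard": -20, "direction": "lower", "observed": -12,
         "evidence": "random telephone survey"},
        {"criterion": "Change in single-car nighttime crashes (%)",
         "standard": -10, "direction": "lower", "observed": -9,
         "evidence": "police records"},
    ]

    for item in criteria:
        # For "higher" criteria the observed value must meet or exceed the standard;
        # for "lower" criteria it must fall at or below it.
        if item["direction"] == "higher":
            met = item["observed"] >= item["standard"]
        else:
            met = item["observed"] <= item["standard"]
        status = "met" if met else "not met"
        print(f'{item["criterion"]}: {status} '
              f'(standard {item["standard"]}, observed {item["observed"]}, '
              f'source: {item["evidence"]})')

Whether these particular thresholds are the right ones is itself a stakeholder decision; the sketch only records the agreement and checks the evidence against it.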

The following framework provides an organized approach to answer these questions.

A framework for program evaluation

Program evaluation offers a way to understand and improve community health and development practice using methods that are useful, feasible, proper, and accurate. The framework described below is a practical non-prescriptive tool that summarizes in a logical order the important elements of program evaluation.

The framework contains two related dimensions:

  • Steps in evaluation practice, and
  • Standards for "good" evaluation.

The six connected steps of the framework are actions that should be a part of any evaluation. Although in practice the steps may be encountered out of order, it will usually make sense to follow them in the recommended sequence. That's because earlier steps provide the foundation for subsequent progress. Thus, decisions about how to carry out a given step should not be finalized until prior steps have been thoroughly addressed.

However, these steps are meant to be adaptable, not rigid. Sensitivity to each program's unique context (for example, the program's history and organizational climate) is essential for sound evaluation. They are intended to serve as starting points around which community organizations can tailor an evaluation to best meet their needs.

  • Engage stakeholders
  • Describe the program
  • Focus the evaluation design
  • Gather credible evidence
  • Justify conclusions
  • Ensure use and share lessons learned

Understanding and adhering to these basic steps will improve most evaluation efforts.

The second part of the framework is a basic set of standards to assess the quality of evaluation activities. There are 30 specific standards, organized into the following four groups:

  • Utility
  • Feasibility
  • Propriety
  • Accuracy

These standards help answer the question, "Will this evaluation be a 'good' evaluation?" They are recommended as the initial criteria by which to judge the quality of the program evaluation efforts.

Engage Stakeholders

Stakeholders are people or organizations that have something to gain or lose from what will be learned from an evaluation, as well as from what will be done with that knowledge. Evaluation cannot be done in isolation. Almost everything done in community health and development work involves partnerships - alliances among different organizations, board members, those affected by the problem, and others. Therefore, any serious effort to evaluate a program must consider the different values held by the partners. Stakeholders must be part of the evaluation to ensure that their unique perspectives are understood. When stakeholders are not appropriately involved, evaluation findings are likely to be ignored, criticized, or resisted.

However, if they are part of the process, people are likely to feel a good deal of ownership for the evaluation process and results. They will probably want to develop it, defend it, and make sure that the evaluation really works.

That's why this evaluation cycle begins by engaging stakeholders. Once involved, these people will help to carry out each of the steps that follow.

Three principal groups of stakeholders are important to involve:

  • People or organizations involved in program operations may include community members, sponsors, collaborators, coalition partners, funding officials, administrators, managers, and staff.
  • People or organizations served or affected by the program may include clients, family members, neighborhood organizations, academic institutions, elected and appointed officials, advocacy groups, and community residents. Individuals who are openly skeptical of or antagonistic toward the program may also be important to involve. Opening an evaluation to opposing perspectives and enlisting the help of potential program opponents can strengthen the evaluation's credibility.

Likewise, individuals or groups who could be adversely or inadvertently affected by changes arising from the evaluation have a right to be engaged. For example, it is important to include those who would be affected if program services were expanded, altered, limited, or ended as a result of the evaluation.

  • Primary intended users of the evaluation are the specific individuals who are in a position to decide and/or do something with the results. They shouldn't be confused with primary intended users of the program, although some of them should be involved in this group. In fact, primary intended users should be a subset of all of the stakeholders who have been identified. A successful evaluation will designate primary intended users, such as program staff and funders, early in its development and maintain frequent interaction with them to be sure that the evaluation specifically addresses their values and needs.

The amount and type of stakeholder involvement will be different for each program evaluation. For instance, stakeholders can be directly involved in designing and conducting the evaluation. They can be kept informed about progress of the evaluation through periodic meetings, reports, and other means of communication.

It may be helpful, when working with a group such as this, to develop an explicit process to share power and resolve conflicts. This may help avoid overemphasis of the values held by any specific stakeholder.

Describe the Program

A program description is a summary of the intervention being evaluated. It should explain what the program is trying to accomplish and how it tries to bring about those changes. The description will also illustrate the program's core components and elements, its ability to make changes, its stage of development, and how the program fits into the larger organizational and community environment.

How a program is described sets the frame of reference for all future decisions about its evaluation. For example, if a program is described as, "attempting to strengthen enforcement of existing laws that discourage underage drinking," the evaluation might be very different than if it is described as, "a program to reduce drunk driving by teens." Also, the description allows members of the group to compare the program to other similar efforts, and it makes it easier to figure out what parts of the program brought about what effects.

Moreover, different stakeholders may have different ideas about what the program is supposed to achieve and why. For example, a program to reduce teen pregnancy may have some members who believe this means only increasing access to contraceptives, and other members who believe it means only focusing on abstinence.

Evaluations done without agreement on the program definition aren't likely to be very useful. In many cases, the process of working with stakeholders to develop a clear and logical program description will bring benefits long before data are available to measure program effectiveness.

There are several specific aspects that should be included when describing a program.

Statement of need

A statement of need describes the problem, goal, or opportunity that the program addresses; it also begins to imply what the program will do in response. Important features to note regarding a program's need are: the nature of the problem or goal, who is affected, how big it is, and whether (and how) it is changing.

Expectations

Expectations are the program's intended results. They describe what the program has to accomplish to be considered successful. For most programs, the accomplishments exist on a continuum (first, we want to accomplish X... then, we want to do Y...). Therefore, they should be organized by time, ranging from specific (and immediate) to broad (and longer-term) consequences. For example, a program's vision, mission, goals, and objectives all represent varying levels of specificity about a program's expectations.

Activities

Activities are everything the program does to bring about changes. Describing program components and elements permits specific strategies and actions to be listed in logical sequence. This also shows how different program activities, such as education and enforcement, relate to one another. Describing program activities also provides an opportunity to distinguish activities that are the direct responsibility of the program from those that are conducted by related programs or partner organizations. Things outside of the program that may affect its success, such as harsher laws punishing businesses that sell alcohol to minors, can also be noted.

Resources

Resources include the time, talent, equipment, information, money, and other assets available to conduct program activities. Reviewing the resources a program has tells a lot about the amount and intensity of its services. It may also point out situations where there is a mismatch between what the group wants to do and the resources available to carry out these activities. Understanding program costs is also necessary for assessing the cost-benefit ratio as part of the evaluation.

Stage of development

A program's stage of development reflects its maturity. All community health and development programs mature and change over time. People who conduct evaluations, as well as those who use their findings, need to consider the dynamic nature of programs. For example, a new program that just received its first grant may differ in many respects from one that has been running for over a decade.

At least three phases of development are commonly recognized: planning, implementation, and effects or outcomes. In the planning stage, program activities are untested and the goal of evaluation is to refine plans as much as possible. In the implementation phase, program activities are being field tested and modified; the goal of evaluation is to see what happens in the "real world" and to improve operations. In the effects stage, enough time has passed for the program's effects to emerge; the goal of evaluation is to identify and understand the program's results, including those that were unintentional.

Context

A description of the program's context considers the important features of the environment in which the program operates. This includes understanding the area's history, geography, politics, and social and economic conditions, and also what other organizations have done. A realistic and responsive evaluation is sensitive to a broad range of potential influences on the program. An understanding of the context lets users interpret findings accurately and assess their generalizability. For example, a program to improve housing in an inner-city neighborhood might have been a tremendous success, but would likely not work in a small town on the other side of the country without significant adaptation.

Logic model

A logic model synthesizes the main program elements into a picture of how the program is supposed to work. It makes explicit the sequence of events that are presumed to bring about change. Often this logic is displayed in a flow-chart, map, or table to portray the sequence of steps leading to program results.

Creating a logic model allows stakeholders to improve and focus program direction. It reveals assumptions about conditions for program effectiveness and provides a frame of reference for one or more evaluations of the program. A detailed logic model can also be a basis for estimating the program's effect on endpoints that are not directly measured. For example, it may be possible to estimate the rate of reduction in disease from a known number of persons experiencing the intervention if there is prior knowledge about its effectiveness.
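
The last point can be made concrete with a short worked example. The sketch below, in Python, chains a simple logic-model estimate from people reached to cases prevented; the reach, baseline risk, and effectiveness figures are assumptions for illustration only, the kind that would come from prior evaluations of similar programs.

    # Hypothetical logic-model chain: reach -> expected cases -> estimated effect.
    people_reached = 2_000          # persons experiencing the intervention
    baseline_risk = 0.05            # assumed yearly risk of the outcome without the program
    relative_risk_reduction = 0.30  # assumed effectiveness from prior evaluations

    expected_cases_without_program = people_reached * baseline_risk
    estimated_cases_prevented = expected_cases_without_program * relative_risk_reduction

    print(f"Expected cases without the program: {expected_cases_without_program:.0f}")
    print(f"Estimated cases prevented: {estimated_cases_prevented:.0f}")
    # Under these assumptions: 100 expected cases, roughly 30 prevented.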

The breadth and depth of a program description will vary for each program evaluation. And so, many different activities may be part of developing that description. For instance, multiple sources of information could be pulled together to construct a well-rounded description. The accuracy of an existing program description could be confirmed through discussion with stakeholders. Descriptions of what's going on could be checked against direct observation of activities in the field. A narrow program description could be fleshed out by addressing contextual factors (such as staff turnover, inadequate resources, political pressures, or strong community participation) that may affect program performance.

Focus the Evaluation Design

By focusing the evaluation design, we mean doing advance planning about where the evaluation is headed, and what steps it will take to get there. It isn't possible or useful for an evaluation to try to answer all questions for all stakeholders; there must be a focus. A well-focused plan is a safeguard against using time and resources inefficiently.

Depending on what you want to learn, some types of evaluation will be better suited than others. However, once data collection begins, it may be difficult or impossible to change what you are doing, even if it becomes obvious that other methods would work better. A thorough plan anticipates intended uses and creates an evaluation strategy with the greatest chance to be useful, feasible, proper, and accurate.

Among the issues to consider when focusing an evaluation are its purpose, users, uses, questions, methods, and agreements:

Purpose refers to the general intent of the evaluation. A clear purpose serves as the basis for the design, methods, and use of the evaluation. Taking time to articulate an overall purpose will stop your organization from making uninformed decisions about how the evaluation should be conducted and used.

There are at least four general purposes for which a community group might conduct an evaluation:

  • To gain insight. This happens, for example, when deciding whether to use a new approach (e.g., would a neighborhood watch program work for our community?). Knowledge from such an evaluation will provide information about its practicality. For a developing program, information from evaluations of similar programs can provide the insight needed to clarify how its activities should be designed.
  • To improve how things get done. This is appropriate in the implementation stage when an established program tries to describe what it has done. This information can be used to describe program processes, to improve how the program operates, and to fine-tune the overall strategy. Evaluations done for this purpose include efforts to improve the quality, effectiveness, or efficiency of program activities.
  • To determine what the effects of the program are. Evaluations done for this purpose examine the relationship between program activities and observed consequences. For example, are more students finishing high school as a result of the program? Programs most appropriate for this type of evaluation are mature programs that are able to state clearly what happened and who it happened to. Such evaluations should provide evidence about what the program's contribution was to reaching longer-term goals such as a decrease in child abuse or crime in the area. This type of evaluation helps establish the accountability, and thus the credibility, of a program to funders and to the community.
  • To affect those who participate in the program. The process of an evaluation, not just its findings, can benefit those involved. For example, an evaluation can:
      • Empower program participants (for example, being part of an evaluation can increase community members' sense of control over the program);
      • Supplement the program (for example, using a follow-up questionnaire can reinforce the main messages of the program);
      • Promote staff development (for example, by teaching staff how to collect, analyze, and interpret evidence); or
      • Contribute to organizational growth (for example, the evaluation may clarify how the program relates to the organization's mission).

Users are the specific individuals who will receive evaluation findings. They will directly experience the consequences of inevitable trade-offs in the evaluation process. For example, a trade-off might be having a relatively modest evaluation to fit the budget with the outcome that the evaluation results will be less certain than they would be for a full-scale evaluation. Because they will be affected by these tradeoffs, intended users have a right to participate in choosing a focus for the evaluation. An evaluation designed without adequate user involvement in selecting the focus can become a misguided and irrelevant exercise. By contrast, when users are encouraged to clarify intended uses, priority questions, and preferred methods, the evaluation is more likely to focus on things that will inform (and influence) future actions.

Uses describe what will be done with what is learned from the evaluation. There is a wide range of potential uses for program evaluation. Generally speaking, the uses fall in the same four categories as the purposes listed above: to gain insight, improve how things get done, determine what the effects of the program are, and affect participants. The following list gives examples of uses in each category.

Some specific examples of evaluation uses

To gain insight:

  • Assess needs and wants of community members
  • Identify barriers to use of the program
  • Learn how to best describe and measure program activities

To improve how things get done:

  • Refine plans for introducing a new practice
  • Determine the extent to which plans were implemented
  • Improve educational materials
  • Enhance cultural competence
  • Verify that participants' rights are protected
  • Set priorities for staff training
  • Make mid-course adjustments
  • Clarify communication
  • Determine if client satisfaction can be improved
  • Compare costs to benefits
  • Find out which participants benefit most from the program
  • Mobilize community support for the program

To determine what the effects of the program are:

  • Assess skills development by program participants
  • Compare changes in behavior over time
  • Decide where to allocate new resources
  • Document the level of success in accomplishing objectives
  • Demonstrate that accountability requirements are fulfilled
  • Use information from multiple evaluations to predict the likely effects of similar programs

To affect participants:

  • Reinforce messages of the program
  • Stimulate dialogue and raise awareness about community issues
  • Broaden consensus among partners about program goals
  • Teach evaluation skills to staff and other stakeholders
  • Gather success stories
  • Support organizational change and improvement

The evaluation needs to answer specific questions. Drafting questions encourages stakeholders to reveal what they believe the evaluation should answer. That is, which questions are most important to stakeholders? The process of developing evaluation questions further refines the focus of the evaluation.

The methods available for an evaluation are drawn from behavioral and social science research. Three types of methods are commonly recognized: experimental, quasi-experimental, and observational or case study designs. Experimental designs use random assignment to compare the effect of an intervention between otherwise equivalent groups (for example, comparing a randomly assigned group of students who took part in an after-school reading program with those who didn't). Quasi-experimental methods make comparisons between groups that aren't equivalent (e.g., program participants vs. those on a waiting list) or comparisons within a group over time, such as an interrupted time series in which the intervention may be introduced sequentially across different individuals, groups, or contexts. Observational or case study methods use comparisons within a group to describe and explain what happens (e.g., comparative case studies with multiple communities).
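
As a minimal illustration of the experimental option, the sketch below (Python, with made-up outcome scores) randomly assigns participants to a program group or a comparison group and contrasts their mean outcomes. A quasi-experimental version would use pre-existing groups, such as participants versus a waiting list, instead of random assignment.

    import random

    # Illustrative experimental comparison; the outcome scores are simulated.
    random.seed(1)

    participants = [f"person_{i}" for i in range(40)]
    random.shuffle(participants)
    program_group = participants[:20]     # randomly assigned to the intervention
    comparison_group = participants[20:]  # randomly assigned to no intervention

    # Pretend outcome scores collected after the program (e.g., reading test scores).
    outcome = {p: random.gauss(75 if p in program_group else 70, 10) for p in participants}

    def mean(group):
        return sum(outcome[p] for p in group) / len(group)

    print(f"Program group mean:    {mean(program_group):.1f}")
    print(f"Comparison group mean: {mean(comparison_group):.1f}")
    # A real evaluation would also test whether the difference could be due to chance.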

No design is necessarily better than another. Evaluation methods should be selected because they provide the appropriate information to answer stakeholders' questions, not because they are familiar, easy, or popular. The choice of methods has implications for what will count as evidence, how that evidence will be gathered, and what kind of claims can be made. Because each method option has its own biases and limitations, evaluations that mix methods are generally more robust.

Over the course of an evaluation, methods may need to be revised or modified. Circumstances that make a particular approach useful can change. For example, the intended use of the evaluation could shift from discovering how to improve the program to helping decide about whether the program should continue or not. Thus, methods may need to be adapted or redesigned to keep the evaluation on track.

Agreements summarize the evaluation procedures and clarify everyone's roles and responsibilities. An agreement describes how the evaluation activities will be implemented. Elements of an agreement include statements about the intended purpose, users, uses, and methods, as well as a summary of the deliverables, those responsible, a timeline, and budget.

The formality of the agreement depends upon the relationships that exist between those involved. For example, it may take the form of a legal contract, a detailed protocol, or a simple memorandum of understanding. Regardless of its formality, creating an explicit agreement provides an opportunity to verify the mutual understanding needed for a successful evaluation. It also provides a basis for modifying procedures if that turns out to be necessary.
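
For groups that want such an agreement in an explicit, reusable form, its elements can be captured in a simple structured record. The sketch below (Python) is one hypothetical way to hold the purpose, users, uses, methods, deliverables, responsibilities, timeline, and budget together; every name, date, and amount is illustrative.

    # Hypothetical evaluation agreement captured as a plain data structure.
    evaluation_agreement = {
        "program": "Drive Smart (hypothetical)",
        "purpose": "Improve how the program operates in its second year",
        "primary_intended_users": ["program coordinator", "county funder"],
        "uses": ["refine outreach plan", "report to funder"],
        "methods": ["telephone survey", "Safe Rides log review"],
        "deliverables": [
            {"item": "interim findings memo", "responsible": "evaluator", "due": "2025-06-30"},
            {"item": "final report and presentation", "responsible": "evaluator", "due": "2025-12-15"},
        ],
        "budget_usd": 12_000,
    }

    # Print the timeline of deliverables and who is responsible for each.
    for d in evaluation_agreement["deliverables"]:
        print(f'{d["due"]}: {d["item"]} ({d["responsible"]})')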

As you can see, focusing the evaluation design may involve many activities. For instance, both supporters and skeptics of the program could be consulted to ensure that the proposed evaluation questions are politically viable. A menu of potential evaluation uses appropriate for the program's stage of development could be circulated among stakeholders to determine which is most compelling. Interviews could be held with specific intended users to better understand their information needs and timeline for action. Resource requirements could be reduced when users are willing to employ more timely but less precise evaluation methods.

Gather Credible Evidence

Credible evidence is the raw material of a good evaluation. The information learned should be seen by stakeholders as believable, trustworthy, and relevant to answering their questions. This requires thinking broadly about what counts as "evidence." Such decisions are always situational; they depend on the question being posed and the motives for asking it. For some questions, a stakeholder's standard for credibility may demand the results of a randomized experiment. For other questions, a set of well-done, systematic observations - such as of interactions between an outreach worker and community residents - will have high credibility. The difference depends on what kind of information the stakeholders want and the situation in which it is gathered.

Context matters! In some situations, it may be necessary to consult evaluation specialists. This may be especially true if concern for data quality is high. In other circumstances, local people may offer the deepest insights. Regardless of their expertise, however, those involved in an evaluation should strive to collect information that will convey a credible, well-rounded picture of the program and its efforts.

Having credible evidence strengthens the evaluation results as well as the recommendations that follow from them. Although all types of data have limitations, it is possible to improve an evaluation's overall credibility. One way to do this is by using multiple procedures for gathering, analyzing, and interpreting data. Encouraging participation by stakeholders can also enhance perceived credibility. When stakeholders help define questions and gather data, they will be more likely to accept the evaluation's conclusions and to act on its recommendations.

The following features of evidence gathering typically affect how credible it is seen as being:

Indicators translate general concepts about the program and its expected effects into specific, measurable parts.

Examples of indicators include:

  • The program's capacity to deliver services
  • The participation rate
  • The level of client satisfaction
  • The amount of intervention exposure (how many people were exposed to the program, and for how long they were exposed)
  • Changes in participant behavior
  • Changes in community conditions or norms
  • Changes in the environment (e.g., new programs, policies, or practices)
  • Longer-term changes in population health status (e.g., estimated teen pregnancy rate in the county)

Indicators should address the criteria that will be used to judge the program. That is, they reflect the aspects of the program that are most meaningful to monitor. Several indicators are usually needed to track the implementation and effects of a complex program or intervention.

One way to develop multiple indicators is to create a "balanced scorecard," which contains indicators that are carefully selected to complement one another. According to this strategy, program processes and effects are viewed from multiple perspectives using small groups of related indicators. For instance, a balanced scorecard for a single program might include indicators of how the program is being delivered; what participants think of the program; what effects are observed; what goals were attained; and what changes are occurring in the environment around the program.
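
A scorecard like this can be sketched as small groups of complementary indicators. The example below (Python) groups illustrative indicators by the perspectives just mentioned; the indicator names are placeholders to be replaced with whatever the stakeholders decide is most meaningful to monitor.

    # Illustrative balanced scorecard: small groups of related indicators.
    scorecard = {
        "program delivery": ["sessions held vs. planned", "participation rate"],
        "participant views": ["client satisfaction rating"],
        "observed effects": ["change in participant behavior"],
        "goal attainment": ["objectives met on schedule"],
        "surrounding environment": ["new local policies or practices adopted"],
    }

    for perspective, indicators in scorecard.items():
        print(f"{perspective}:")
        for indicator in indicators:
            print(f"  - {indicator}")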

Another approach to using multiple indicators is based on a program logic model, such as we discussed earlier in the section. A logic model can be used as a template to define a full spectrum of indicators along the pathway that leads from program activities to expected effects. For each step in the model, qualitative and/or quantitative indicators could be developed.

Indicators can be broad-based and don't need to focus only on a program's long-term goals. They can also address intermediary factors that influence program effectiveness, including such intangible factors as service quality, community capacity, or inter-organizational relations. Indicators for these and similar concepts can be created by systematically identifying and then tracking markers of what is said or done when the concept is expressed.

In the course of an evaluation, indicators may need to be modified or new ones adopted. Also, measuring program performance by tracking indicators is only one part of evaluation, and shouldn't be mistaken for a sufficient basis for decision making in itself. There are definite perils to using performance indicators as a substitute for completing the evaluation process and reaching fully justified conclusions. For example, an indicator such as a rising rate of unemployment may be falsely assumed to reflect a failing program when it may actually be due to changing environmental conditions that are beyond the program's control.

Sources of evidence in an evaluation may be people, documents, or observations. More than one source may be used to gather evidence for each indicator. In fact, selecting multiple sources provides an opportunity to include different perspectives about the program and enhances the evaluation's credibility. For instance, an inside perspective may be reflected by internal documents and comments from staff or program managers; whereas clients and those who do not support the program may provide different, but equally relevant perspectives. Mixing these and other perspectives provides a more comprehensive view of the program or intervention.

The criteria used to select sources should be clearly stated so that users and other stakeholders can interpret the evidence accurately and assess if it may be biased. In addition, some sources provide information in narrative form (for example, a person's experience when taking part in the program) and others are numerical (for example, how many people were involved in the program). The integration of qualitative and quantitative information can yield evidence that is more complete and more useful, thus meeting the needs and expectations of a wider range of stakeholders.

Quality refers to the appropriateness and integrity of information gathered in an evaluation. High quality data are reliable and informative; they are easier to collect when the indicators have been well defined. Other factors that affect quality may include instrument design, data collection procedures, training of those involved in data collection, source selection, coding, data management, and routine error checking. Obtaining quality data will entail tradeoffs (e.g., breadth vs. depth); stakeholders should decide together what is most important to them. Because all data have limitations, the intent of a practical evaluation is to strive for a level of quality that meets the stakeholders' threshold for credibility.

Quantity refers to the amount of evidence gathered in an evaluation. It is necessary to estimate in advance the amount of information that will be required and to establish criteria to decide when to stop collecting data - to know when enough is enough. Quantity affects the level of confidence or precision users can have - how sure we are that what we've learned is true. It also partly determines whether the evaluation will be able to detect effects. All evidence collected should have a clear, anticipated use.
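
The link between quantity and precision can be made concrete with a standard margin-of-error calculation for a survey proportion. The sketch below (Python) shows how, under the usual simple-random-sampling assumptions, the margin of error shrinks as the number of completed interviews grows; the sample sizes are arbitrary planning figures.

    import math

    # Approximate 95% margin of error for an estimated proportion p
    # from a simple random sample of size n.
    def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
        return z * math.sqrt(p * (1 - p) / n)

    p_expected = 0.5  # the most conservative assumption about the true proportion
    for n in (100, 400, 1000):
        print(f"n = {n:4d}: +/- {margin_of_error(p_expected, n) * 100:.1f} percentage points")
    # n = 100 gives about +/- 9.8 points; n = 400 about +/- 4.9; n = 1000 about +/- 3.1.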

By logistics , we mean the methods, timing, and physical infrastructure for gathering and handling evidence. People and organizations also have cultural preferences that dictate acceptable ways of asking questions and collecting information, including who would be perceived as an appropriate person to ask the questions. For example, some participants may be unwilling to discuss their behavior with a stranger, whereas others are more at ease with someone they don't know. Therefore, the techniques for gathering evidence in an evaluation must be in keeping with the cultural norms of the community. Data collection procedures should also ensure that confidentiality is protected.

Justify Conclusions

The process of justifying conclusions recognizes that evidence in an evaluation does not necessarily speak for itself. Evidence must be carefully considered from a number of different stakeholders' perspectives to reach conclusions that are well-substantiated and justified. Conclusions become justified when they are linked to the evidence gathered and judged against agreed-upon values set by the stakeholders. Stakeholders must agree that conclusions are justified in order to use the evaluation results with confidence.

The principal elements involved in justifying conclusions based on evidence are:

Standards reflect the values held by stakeholders about the program. They provide the basis to make program judgments. The use of explicit standards for judgment is fundamental to sound evaluation. In practice, when stakeholders articulate and negotiate their values, these become the standards to judge whether a given program's performance will, for instance, be considered "successful," "adequate," or "unsuccessful."

Analysis and synthesis

Analysis and synthesis are methods to discover and summarize an evaluation's findings. They are designed to detect patterns in evidence, either by isolating important findings (analysis) or by combining different sources of information to reach a larger understanding (synthesis). Mixed method evaluations require the separate analysis of each evidence element, as well as a synthesis of all sources to examine patterns that emerge. Deciphering facts from a given body of evidence involves deciding how to organize, classify, compare, and display information. These decisions are guided by the questions being asked, the types of data available, and especially by input from stakeholders and primary intended users.

Interpretation

Interpretation is the effort to figure out what the findings mean. Uncovering facts about a program's performance isn't enough to draw conclusions. The facts must be interpreted to understand their practical significance. For example, the finding, "15% of the people in our area witnessed a violent act last year," may be interpreted differently depending on the situation. If 50% of community members had witnessed a violent act in the year before a survey conducted five years ago, the group can suggest that, while still a problem, things are getting better in the community. However, if five years ago only 7% of those surveyed said the same thing, community organizations may see this as a sign that they might want to change what they are doing. In short, interpretations draw on the information and perspectives that stakeholders bring to the evaluation. They can be strengthened through active participation or interaction with the data and preliminary explanations of what happened.

Judgments are statements about the merit, worth, or significance of the program. They are formed by comparing the findings and their interpretations against one or more selected standards. Because multiple standards can be applied to a given program, stakeholders may reach different or even conflicting judgments. For instance, a program that increases its outreach by 10% from the previous year may be judged positively by program managers, based on standards of improved performance over time. Community members, however, may feel that despite improvements, a minimum threshold of access to services has still not been reached. Their judgment, based on standards of social equity, would therefore be negative. Conflicting claims about a program's quality, value, or importance often indicate that stakeholders are using different standards or values in making judgments. This type of disagreement can be a catalyst to clarify values and to negotiate the appropriate basis (or bases) on which the program should be judged.

Recommendations

Recommendations are actions to consider as a result of the evaluation. Forming recommendations requires information beyond just what is necessary to form judgments. For example, knowing that a program is able to increase the services available to battered women doesn't necessarily translate into a recommendation to continue the effort, particularly when there are competing priorities or other effective alternatives. Thus, recommendations about what to do with a given intervention go beyond judgments about a specific program's effectiveness.

If recommendations aren't supported by enough evidence, or if they aren't in keeping with stakeholders' values, they can really undermine an evaluation's credibility. By contrast, an evaluation can be strengthened by recommendations that anticipate and react to what users will want to know.

Three things might increase the chances that recommendations will be relevant and well-received:

  • Sharing draft recommendations
  • Soliciting reactions from multiple stakeholders
  • Presenting options instead of directive advice

Justifying conclusions in an evaluation is a process that involves different possible steps. For instance, conclusions could be strengthened by searching for alternative explanations to the ones you have chosen, and then showing why they are unsupported by the evidence. When there are different but equally well supported conclusions, each could be presented with a summary of its strengths and weaknesses. Techniques to analyze, synthesize, and interpret findings might be agreed upon before data collection begins.

Ensure Use and Share Lessons Learned

It is naive to assume that lessons learned in an evaluation will necessarily be used in decision making and subsequent action. Deliberate effort on the part of evaluators is needed to ensure that the evaluation findings will be used appropriately. Preparing for their use involves strategic thinking and continued vigilance in looking for opportunities to communicate and influence. Both of these should begin in the earliest stages of the process and continue throughout the evaluation.

The key elements for making sure that the recommendations from an evaluation are used are:

Design refers to how the evaluation's questions, methods, and overall processes are constructed. As discussed in the third step of this framework (focusing the evaluation design), the evaluation should be organized from the start to achieve specific agreed-upon uses. Having a clear purpose that is focused on the use of what is learned helps those who will carry out the evaluation to know who will do what with the findings. Furthermore, the process of creating a clear design will highlight ways that stakeholders, through their many contributions, can improve the evaluation and facilitate the use of the results.

Preparation

Preparation refers to the steps taken to get ready for the future uses of the evaluation findings. The ability to translate new knowledge into appropriate action is a skill that can be strengthened through practice. In fact, building this skill can itself be a useful benefit of the evaluation. It is possible to prepare stakeholders for future use of the results by discussing how potential findings might affect decision making.

For example, primary intended users and other stakeholders could be given a set of hypothetical results and asked what decisions or actions they would make on the basis of this new knowledge. If they indicate that the evidence presented is incomplete or irrelevant and that no action would be taken, then this is an early warning sign that the planned evaluation should be modified. Preparing for use also gives stakeholders more time to explore both positive and negative implications of potential results and to identify different options for program improvement.

Feedback is the communication that occurs among everyone involved in the evaluation. Giving and receiving feedback creates an atmosphere of trust among stakeholders; it keeps an evaluation on track by keeping everyone informed about how the evaluation is proceeding. Primary intended users and other stakeholders have a right to comment on evaluation decisions. From a standpoint of ensuring use, stakeholder feedback is a necessary part of every step in the evaluation. Obtaining valuable feedback can be encouraged by holding discussions during each step of the evaluation and routinely sharing interim findings, provisional interpretations, and draft reports.

Follow-up refers to the support that many users need during the evaluation and after they receive evaluation findings. Because of the amount of effort required, reaching justified conclusions in an evaluation can seem like an end in itself. It is not. Active follow-up may be necessary to remind users of the intended uses of what has been learned. Follow-up may also be required to stop lessons learned from becoming lost or ignored in the process of making complex or political decisions. To guard against such oversight, it may be helpful to have someone involved in the evaluation serve as an advocate for the evaluation's findings during the decision-making phase.

Facilitating the use of evaluation findings also carries with it the responsibility to prevent misuse. Evaluation results are always bounded by the context in which the evaluation was conducted. Some stakeholders, however, may be tempted to take results out of context or to use them for different purposes than what they were developed for. For instance, over-generalizing the results from a single case study to make decisions that affect all sites in a national program is an example of misuse of a case study evaluation.

Similarly, program opponents may misuse results by overemphasizing negative findings without giving proper credit for what has worked. Active follow-up can help to prevent these and other forms of misuse by ensuring that evidence is only applied to the questions that were the central focus of the evaluation.

Dissemination

Dissemination is the process of communicating the procedures or the lessons learned from an evaluation to relevant audiences in a timely, unbiased, and consistent fashion. Like other elements of the evaluation, the reporting strategy should be discussed in advance with intended users and other stakeholders. Planning effective communications also requires considering the timing, style, tone, message source, vehicle, and format of information products. Regardless of how communications are constructed, the goal for dissemination is to achieve full disclosure and impartial reporting.

Along with the uses for evaluation findings, there are also uses that flow from the very process of evaluating. These "process uses" should be encouraged. The people who take part in an evaluation can experience profound changes in beliefs and behavior. For instance, an evaluation challenges staff members to act differently in what they are doing, and to question assumptions that connect program activities with intended effects.

Evaluation also prompts staff to clarify their understanding of the goals of the program. This greater clarity, in turn, helps staff members to better function as a team focused on a common end. In short, immersion in the logic, reasoning, and values of evaluation can have very positive effects, such as basing decisions on systematic judgments instead of on unfounded assumptions.

Additional process uses for evaluation include:

  • By defining indicators, what really matters to stakeholders becomes clear
  • It helps make outcomes matter by changing the reinforcements connected with achieving positive results. For example, a funder might offer "bonus grants" or "outcome dividends" to a program that has shown a significant amount of community change and improvement.

Standards for "good" evaluation

There are standards to assess whether all of the parts of an evaluation are well-designed and working to their greatest potential. The Joint Committee on Standards for Educational Evaluation developed "The Program Evaluation Standards" for this purpose. These standards, designed to assess evaluations of educational programs, are also relevant for programs and interventions related to community health and development.

The program evaluation standards make it practical to conduct sound and fair evaluations. They offer well-supported principles to follow when faced with having to make tradeoffs or compromises. Attending to the standards can guard against an imbalanced evaluation, such as one that is accurate and feasible, but isn't very useful or sensitive to the context. Another example of an imbalanced evaluation is one that would be genuinely useful, but is impossible to carry out.

The following standards can be applied while developing an evaluation design and throughout the course of its implementation. Remember, the standards are written as guiding principles, not as rigid rules to be followed in all situations.

The 30 specific standards are grouped into four categories: utility, feasibility, propriety, and accuracy.

Utility Standards

The utility standards ensure that the evaluation serves the information needs of its intended users.

The seven utility standards are:

  • Stakeholder Identification: People who are involved in (or will be affected by) the evaluation should be identified, so that their needs can be addressed.
  • Evaluator Credibility: The people conducting the evaluation should be both trustworthy and competent, so that the evaluation will be generally accepted as credible or believable.
  • Information Scope and Selection: Information collected should address pertinent questions about the program, and it should be responsive to the needs and interests of clients and other specified stakeholders.
  • Values Identification: The perspectives, procedures, and rationale used to interpret the findings should be carefully described, so that the bases for judgments about merit and value are clear.
  • Report Clarity: Evaluation reports should clearly describe the program being evaluated, including its context, and the purposes, procedures, and findings of the evaluation. This will help ensure that essential information is provided and easily understood.
  • Report Timeliness and Dissemination: Significant midcourse findings and evaluation reports should be shared with intended users so that they can be used in a timely fashion.
  • Evaluation Impact: Evaluations should be planned, conducted, and reported in ways that encourage follow-through by stakeholders, so that the evaluation will be used.

Feasibility Standards

The feasibility standards ensure that the evaluation makes sense - that the steps that are planned are both viable and pragmatic.

The three feasibility standards are:

  • Practical Procedures: The evaluation procedures should be practical, to keep disruption of everyday activities to a minimum while needed information is obtained.
  • Political Viability: The evaluation should be planned and conducted with anticipation of the different positions or interests of various groups. This should help in obtaining their cooperation so that possible attempts by these groups to curtail evaluation operations or to misuse the results can be avoided or counteracted.
  • Cost Effectiveness: The evaluation should be efficient and produce enough valuable information that the resources used can be justified.

Propriety Standards

The propriety standards ensure that the evaluation is an ethical one, conducted with regard for the rights and interests of those involved. The eight propriety standards follow.

  • Service Orientation: Evaluations should be designed to help organizations effectively serve the needs of all of the targeted participants.
  • Formal Agreements: The responsibilities in an evaluation (what is to be done, how, by whom, when) should be agreed to in writing, so that those involved are obligated to follow all conditions of the agreement, or to formally renegotiate it.
  • Rights of Human Subjects: Evaluation should be designed and conducted to respect and protect the rights and welfare of human subjects, that is, all participants in the study.
  • Human Interactions: Evaluators should respect basic human dignity and worth when working with other people in an evaluation, so that participants don't feel threatened or harmed.
  • Complete and Fair Assessment: The evaluation should be complete and fair in its examination, recording both strengths and weaknesses of the program being evaluated. This allows strengths to be built upon and problem areas addressed.
  • Disclosure of Findings: The people working on the evaluation should ensure that all of the evaluation findings, along with the limitations of the evaluation, are accessible to everyone affected by the evaluation, and any others with expressed legal rights to receive the results.
  • Conflict of Interest: Conflict of interest should be dealt with openly and honestly, so that it does not compromise the evaluation processes and results.
  • Fiscal Responsibility: The evaluator's use of resources should reflect sound accountability procedures and otherwise be prudent and ethically responsible, so that expenditures are accounted for and appropriate.

Accuracy Standards

The accuracy standards ensure that the evaluation findings are considered correct.

There are 12 accuracy standards:

  • Program Documentation: The program should be described and documented clearly and accurately, so that what is being evaluated is clearly identified.
  • Context Analysis: The context in which the program exists should be thoroughly examined so that likely influences on the program can be identified.
  • Described Purposes and Procedures: The purposes and procedures of the evaluation should be monitored and described in enough detail that they can be identified and assessed.
  • Defensible Information Sources: The sources of information used in a program evaluation should be described in enough detail that the adequacy of the information can be assessed.
  • Valid Information: The information gathering procedures should be chosen or developed and then implemented in such a way that they will assure that the interpretation arrived at is valid.
  • Reliable Information: The information gathering procedures should be chosen or developed and then implemented so that they will assure that the information obtained is sufficiently reliable.
  • Systematic Information: The information from an evaluation should be systematically reviewed and any errors found should be corrected.
  • Analysis of Quantitative Information: Quantitative information - data from observations or surveys - in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered.
  • Analysis of Qualitative Information: Qualitative information - descriptive information from interviews and other sources - in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered.
  • Justified Conclusions: The conclusions reached in an evaluation should be explicitly justified, so that stakeholders can understand their worth.
  • Impartial Reporting: Reporting procedures should guard against the distortion caused by personal feelings and biases of people involved in the evaluation, so that evaluation reports fairly reflect the evaluation findings.
  • Metaevaluation: The evaluation itself should be evaluated against these and other pertinent standards, so that it is appropriately guided and, on completion, stakeholders can closely examine its strengths and weaknesses.

Applying the framework: Conducting optimal evaluations

There is ever-increasing agreement on the worth of evaluation; in fact, evaluation is often required by funders and other constituents. So, community health and development professionals can no longer question whether or not to evaluate their programs. Instead, the appropriate questions are:

  • What is the best way to evaluate?
  • What are we learning from the evaluation?
  • How will we use what we learn to become more effective?

The framework for program evaluation helps answer these questions by guiding users to select evaluation strategies that are useful, feasible, proper, and accurate.

To use this framework requires quite a bit of skill in program evaluation. In most cases there are multiple stakeholders to consider, the political context may be divisive, steps don't always follow a logical order, and limited resources may make it difficult to take a preferred course of action. An evaluator's challenge is to devise an optimal strategy, given the conditions she is working under. An optimal strategy is one that accomplishes each step in the framework in a way that takes into account the program context and is able to meet or exceed the relevant standards.

This framework also makes it possible to respond to common concerns about program evaluation. For instance, many evaluations are not undertaken because they are seen as being too expensive. The cost of an evaluation, however, is relative; it depends upon the question being asked and the level of certainty desired for the answer. A simple, low-cost evaluation can deliver information valuable for understanding and improvement.

Rather than discounting evaluations as a time-consuming sideline, the framework encourages evaluations that are timed strategically to provide necessary feedback. This makes it possible to link evaluation closely with everyday practice.

Another concern centers on the perceived technical demands of designing and conducting an evaluation. However, the practical approach endorsed by this framework focuses on questions that can improve the program.

Finally, the prospect of evaluation troubles many staff members because they perceive evaluation methods as punishing ("They just want to show what we're doing wrong."), exclusionary ("Why aren't we part of it? We're the ones who know what's going on."), and adversarial ("It's us against them.") The framework instead encourages an evaluation approach that is designed to be helpful and engages all interested stakeholders in a process that welcomes their participation.

Evaluation is a powerful strategy for distinguishing programs and interventions that make a difference from those that don't. It is a driving force for developing and adapting sound strategies, improving existing programs, and demonstrating the results of investments in time and other resources. It also helps determine if what is being done is worth the cost.

This recommended framework for program evaluation is both a synthesis of existing best practices and a set of standards for further improvement. It supports a practical approach to evaluation based on steps and standards that can be applied in almost any setting. Because the framework is purposefully general, it provides a stable guide to design and conduct a wide range of evaluation efforts in a variety of specific program areas. The framework can be used as a template to create useful evaluation plans that contribute to understanding and improvement. For additional information on the requirements for good evaluation, and for some straightforward steps that make evaluating an intervention more feasible, read The Magenta Book - Guidance for Evaluation.

Online Resources

Are You Ready to Evaluate your Coalition? poses 15 questions to help your group decide whether the coalition is ready to evaluate itself and its work.

The  American Evaluation Association Guiding Principles for Evaluators  helps guide evaluators in their professional practice.

CDC Evaluation Resources  provides a list of resources for evaluation, as well as links to professional associations and journals.

Chapter 11: Community Interventions in the "Introduction to Community Psychology" explains professionally-led versus grassroots interventions, what it means for a community intervention to be effective, why a community needs to be ready for an intervention, and the steps to implementing community interventions.

The Comprehensive Cancer Control Branch Program Evaluation Toolkit is designed to help grantees plan and implement evaluations of their NCCCP-funded programs. The toolkit provides general guidance on evaluation principles and techniques, as well as practical templates and tools.

Developing an Effective Evaluation Plan  is a workbook provided by the CDC. In addition to information on designing an evaluation plan, this book also provides worksheets as a step-by-step guide.

EvaluACTION, from the CDC, is designed for people interested in learning about program evaluation and how to apply it to their work. Evaluation is a process, one dependent on what you’re currently doing and on the direction in which you’d like to go. In addition to providing helpful information, the site also features an interactive Evaluation Plan & Logic Model Builder, so you can create customized tools for your organization to use.

Evaluating Your Community-Based Program  is a handbook designed by the American Academy of Pediatrics covering a variety of topics related to evaluation.

GAO Designing Evaluations  is a handbook provided by the U.S. Government Accountability Office with copious information regarding program evaluations.

The CDC's  Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide  is a "how-to" guide for planning and implementing evaluation activities. The manual, based on CDC’s Framework for Program Evaluation in Public Health, is intended to assist with planning, designing, implementing and using comprehensive evaluations in a practical way.

McCormick Foundation Evaluation Guide  is a guide to planning an organization’s evaluation, with several chapters dedicated to gathering information and using it to improve the organization.

A Participatory Model for Evaluating Social Programs from the James Irvine Foundation.

Practical Evaluation for Public Managers  is a guide to evaluation written by the U.S. Department of Health and Human Services.

Penn State Program Evaluation  offers information on collecting different forms of data and how to measure different community markers.

Program Evaluation  information page from Implementation Matters.

The Program Manager's Guide to Evaluation  is a handbook provided by the Administration for Children and Families with detailed answers to nine big questions regarding program evaluation.

Program Planning and Evaluation  is a website created by the University of Arizona. It provides links to information on several topics including methods, funding, types of evaluation, and reporting impacts.

User-Friendly Handbook for Program Evaluation  is a guide to evaluations provided by the National Science Foundation.  This guide includes practical information on quantitative and qualitative methodologies in evaluations.

W.K. Kellogg Foundation Evaluation Handbook  provides a framework for thinking about evaluation as a relevant and useful program tool. It was originally written for program directors with direct responsibility for the ongoing evaluation of the W.K. Kellogg Foundation.

Print Resources

This Community Tool Box section is an edited version of:

CDC Evaluation Working Group. (1999). (Draft). Recommended framework for program evaluation in public health practice . Atlanta, GA: Author.

The article cites the following references:

Adler, M., & Ziglio, E. (1996). Gazing into the oracle: the Delphi method and its application to social policy and community health and development. London: Jessica Kingsley Publishers.

Barrett, F.   Program Evaluation: A Step-by-Step Guide.  Sunnycrest Press, 2013. This practical manual includes helpful tips to develop evaluations, tables illustrating evaluation approaches, evaluation planning and reporting templates, and resources if you want more information.

Basch, C., Sliepcevich, E., Gold, R., Duncan, D., & Kolbe, L. (1985). Avoiding type III errors in health education program evaluation: a case study. Health Education Quarterly, 12(4):315-31.

Bickman L, & Rog, D. (1998). Handbook of applied social research methods. Thousand Oaks, CA: Sage Publications.

Boruch, R.  (1998).  Randomized controlled experiments for evaluation and planning. In Handbook of applied social research methods, edited by Bickman L., & Rog. D. Thousand Oaks, CA: Sage Publications: 161-92.

Centers for Disease Control and Prevention DoHAP. Evaluating CDC HIV prevention programs: guidance and data system . Atlanta, GA: Centers for Disease Control and Prevention, Division of HIV/AIDS Prevention, 1999.

Centers for Disease Control and Prevention. Guidelines for evaluating surveillance systems. Morbidity and Mortality Weekly Report 1988;37(S-5):1-18.

Centers for Disease Control and Prevention. Handbook for evaluating HIV education . Atlanta, GA: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Adolescent and School Health, 1995.

Cook, T., & Campbell, D. (1979). Quasi-experimentation . Chicago, IL: Rand McNally.

Cook, T.,& Reichardt, C. (1979).  Qualitative and quantitative methods in evaluation research . Beverly Hills, CA: Sage Publications.

Cousins, J.,& Whitmore, E. (1998).   Framing participatory evaluation. In Understanding and practicing participatory evaluation , vol. 80, edited by E Whitmore. San Francisco, CA: Jossey-Bass: 5-24.

Chen, H. (1990).  Theory driven evaluations . Newbury Park, CA: Sage Publications.

de Vries, H., Weijts, W., Dijkstra, M., & Kok, G. (1992).  The utilization of qualitative and quantitative data for health education program planning, implementation, and evaluation: a spiral approach . Health Education Quarterly.1992; 19(1):101-15.

Dyal, W. (1995).  Ten organizational practices of community health and development: a historical perspective . American Journal of Preventive Medicine;11(6):6-8.

Eddy, D. (1998). Performance measurement: problems and solutions. Health Affairs;17(4):7-25.

Harvard Family Research Project. (1998). Performance measurement. In The Evaluation Exchange, vol. 4, pp. 1-15.

Eoyang, G., & Berkas, T. (1996). Evaluation in a complex adaptive system.

Taylor-Powell, E., Steele, S., & Douglah, M. Planning a program evaluation. Madison, Wisconsin: University of Wisconsin Cooperative Extension.

Fawcett, S.B., Paine-Andrews, A., Fancisco, V.T., Schultz, J.A., Richter, K.P, Berkley-Patton, J., Fisher, J., Lewis, R.K., Lopez, C.M., Russos, S., Williams, E.L., Harris, K.J., & Evensen, P. (2001). Evaluating community initiatives for health and development. In I. Rootman, D. McQueen, et al. (Eds.),  Evaluating health promotion approaches . (pp. 241-277). Copenhagen, Denmark: World Health Organization - Europe.

Fawcett, S., Sterling, T., Paine-Andrews, A., Harris, K., Francisco, V., et al. (1996). Evaluating community efforts to prevent cardiovascular diseases. Atlanta, GA: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion.

Fetterman, D., Kaftarian, S., & Wandersman, A. (1996). Empowerment evaluation: knowledge and tools for self-assessment and accountability. Thousand Oaks, CA: Sage Publications.

Frechtling, J.,& Sharp, L. (1997).  User-friendly handbook for mixed method evaluations . Washington, DC: National Science Foundation.

Goodman, R., Speers, M., McLeroy, K., Fawcett, S., Kegler M., et al. (1998).  Identifying and defining the dimensions of community capacity to provide a basis for measurement . Health Education and Behavior;25(3):258-78.

Greene, J.  (1994). Qualitative program evaluation: practice and promise . In Handbook of Qualitative Research, edited by NK Denzin and YS Lincoln. Thousand Oaks, CA: Sage Publications.

Haddix, A., Teutsch. S., Shaffer. P., & Dunet. D. (1996). Prevention effectiveness: a guide to decision analysis and economic evaluation . New York, NY: Oxford University Press.

Hennessy, M.  Evaluation. In Statistics in Community health and development , edited by Stroup. D.,& Teutsch. S. New York, NY: Oxford University Press, 1998: 193-219

Henry, G. (1998). Graphing data. In Handbook of applied social research methods , edited by Bickman. L., & Rog.  D.. Thousand Oaks, CA: Sage Publications: 527-56.

Henry, G. (1998).  Practical sampling. In Handbook of applied social research methods , edited by  Bickman. L., & Rog. D.. Thousand Oaks, CA: Sage Publications: 101-26.

Institute of Medicine. Improving health in the community: a role for performance monitoring . Washington, DC: National Academy Press, 1997.

Joint Committee on Educational Evaluation, James R. Sanders (Chair). The program evaluation standards: how to assess evaluations of educational programs . Thousand Oaks, CA: Sage Publications, 1994.

Kaplan, R., & Norton, D. The balanced scorecard: measures that drive performance. Harvard Business Review 1992; Jan-Feb: 71-79.

Kar, S. (1989). Health promotion indicators and actions . New York, NY: Springer Publications.

Knauft, E. (1993).   What independent sector learned from an evaluation of its own hard-to -measure programs . In A vision of evaluation, edited by ST Gray. Washington, DC: Independent Sector.

Koplan, J. (1999)  CDC sets millennium priorities . US Medicine 4-7.

Lipsey, M. (1998). Design sensitivity: statistical power for applied experimental research. In Handbook of applied social research methods, edited by Bickman, L., & Rog, D. Thousand Oaks, CA: Sage Publications: 39-68.

Lipsey, M. (1993). Theory as method: small theories of treatments . New Directions for Program Evaluation;(57):5-38.

Lipsey, M. (1997).  What can you build with thousands of bricks? Musings on the cumulation of knowledge in program evaluation . New Directions for Evaluation; (76): 7-23.

Love, A.  (1991).  Internal evaluation: building organizations from within . Newbury Park, CA: Sage Publications.

Miles, M., & Huberman, A. (1994).  Qualitative data analysis: a sourcebook of methods . Thousand Oaks, CA: Sage Publications, Inc.

National Quality Program. (1999). National Quality Program. National Institute of Standards and Technology.

National Quality Program. (1999). Baldrige index outperforms S&P 500 for fifth year.

National Quality Program. (1998). Health care criteria for performance excellence.

Newcomer, K.  Using statistics appropriately. In Handbook of Practical Program Evaluation, edited by Wholey,J.,  Hatry, H., & Newcomer. K. San Francisco, CA: Jossey-Bass, 1994: 389-416.

Patton, M. (1990).  Qualitative evaluation and research methods . Newbury Park, CA: Sage Publications.

Patton, M (1997).  Toward distinguishing empowerment evaluation and placing it in a larger context . Evaluation Practice;18(2):147-63.

Patton, M. (1997).  Utilization-focused evaluation . Thousand Oaks, CA: Sage Publications.

Perrin, B. Effective use and misuse of performance measurement . American Journal of Evaluation 1998;19(3):367-79.

Perrin, E, Koshel J. (1997).  Assessment of performance measures for community health and development, substance abuse, and mental health . Washington, DC: National Academy Press.

Phillips, J. (1997).  Handbook of training evaluation and measurement methods . Houston, TX: Gulf Publishing Company.

Porteous, N., Sheldrick, B., & Stewart, P. (1997). Program evaluation tool kit: a blueprint for community health and development management. Ottawa, Canada: Community health and development Research, Education, and Development Program, Ottawa-Carleton Health Department.

Posavac, E., & Carey R. (1980).  Program evaluation: methods and case studies . Prentice-Hall, Englewood Cliffs, NJ.

Preskill, H. & Torres R. (1998).  Evaluative inquiry for learning in organizations . Thousand Oaks, CA: Sage Publications.

Public Health Functions Project. (1996). The public health workforce: an agenda for the 21st century . Washington, DC: U.S. Department of Health and Human Services, Community health and development Service.

Public Health Training Network. (1998).  Practical evaluation of public health programs . CDC, Atlanta, GA.

Reichardt, C., & Mark M. (1998).  Quasi-experimentation . In Handbook of applied social research methods, edited by L Bickman and DJ Rog. Thousand Oaks, CA: Sage Publications, 193-228.

Rossi, P., & Freeman H.  (1993).  Evaluation: a systematic approach . Newbury Park, CA: Sage Publications.

Rush, B., & Ogborne, A. (1995). Program logic models: expanding their role and structure for program planning and evaluation. Canadian Journal of Program Evaluation; 6: 95-106.

Sanders, J. (1993).  Uses of evaluation as a means toward organizational effectiveness. In A vision of evaluation , edited by ST Gray. Washington, DC: Independent Sector.

Schorr, L. (1997).   Common purpose: strengthening families and neighborhoods to rebuild America . New York, NY: Anchor Books, Doubleday.

Scriven, M. (1998) . A minimalist theory of evaluation: the least theory that practice requires . American Journal of Evaluation.

Shadish, W., Cook, T., Leviton, L. (1991).  Foundations of program evaluation . Newbury Park, CA: Sage Publications.

Shadish, W. (1998).   Evaluation theory is who we are. American Journal of Evaluation:19(1):1-19.

Shulha, L., & Cousins, J. (1997).  Evaluation use: theory, research, and practice since 1986 . Evaluation Practice.18(3):195-208

Sieber, J. (1998).   Planning ethically responsible research . In Handbook of applied social research methods, edited by L Bickman and DJ Rog. Thousand Oaks, CA: Sage Publications: 127-56.

Steckler, A., McLeroy, K., Goodman, R., Bird, S., McCormick, L. (1992).  Toward integrating qualitative and quantitative methods: an introduction . Health Education Quarterly;191-8.

Taylor-Powell, E., Rossing, B., Geran, J. (1998). Evaluating collaboratives: reaching the potential. Madison, Wisconsin: University of Wisconsin Cooperative Extension.

Teutsch, S.  A framework for assessing the effectiveness of disease and injury prevention . Morbidity and Mortality Weekly Report: Recommendations and Reports Series 1992;41 (RR-3 (March 27, 1992):1-13.

Torres, R., Preskill, H., Piontek, M., (1996).   Evaluation strategies for communicating and reporting: enhancing learning in organizations . Thousand Oaks, CA: Sage Publications.

Trochim, W. (1999). Research methods knowledge base.

United Way of America. Measuring program outcomes: a practical approach . Alexandria, VA: United Way of America, 1996.

U.S. General Accounting Office. Case study evaluations . GAO/PEMD-91-10.1.9. Washington, DC: U.S. General Accounting Office, 1990.

U.S. General Accounting Office. Designing evaluations . GAO/PEMD-10.1.4. Washington, DC: U.S. General Accounting Office, 1991.

U.S. General Accounting Office. Managing for results: measuring program results that are under limited federal control . GAO/GGD-99-16. Washington, DC: 1998.

U.S. General Accounting Office. Prospective evaluation methods: the prospective evaluation synthesis. GAO/PEMD-10.1.10. Washington, DC: U.S. General Accounting Office, 1990.

U.S. General Accounting Office. The evaluation synthesis . Washington, DC: U.S. General Accounting Office, 1992.

U.S. General Accounting Office. Using statistical sampling . Washington, DC: U.S. General Accounting Office, 1992.

Wandersman, A., Morrissey, E., Davino, K., Seybolt, D., Crusto, C., et al. Comprehensive quality programming and accountability: eight essential strategies for implementing successful prevention programs . Journal of Primary Prevention 1998;19(1):3-30.

Weiss, C. (1995). Nothing as practical as a good theory: exploring theory-based evaluation for comprehensive community initiatives for families and children. In New Approaches to Evaluating Community Initiatives, edited by Connell, J., Kubisch, A., Schorr, L., & Weiss, C. New York, NY: Aspen Institute.

Weiss, C. (1998).  Have we learned anything new about the use of evaluation? American Journal of Evaluation;19(1):21-33.

Weiss, C. (1997).  How can theory-based evaluation make greater headway? Evaluation Review 1997;21(4):501-24.

W.K. Kellogg Foundation. (1998). The W.K. Kellogg Foundation Evaluation Handbook. Battle Creek, MI: W.K. Kellogg Foundation.

Wong-Reiger, D.,& David, L. (1995).  Using program logic models to plan and evaluate education and prevention programs. In Evaluation Methods Sourcebook II, edited by Love. A.J. Ottawa, Ontario: Canadian Evaluation Society.

Wholey, J., Hatry, H., & Newcomer, K. Handbook of Practical Program Evaluation. Jossey-Bass, 2010. This book serves as a comprehensive guide to the evaluation process and its practical applications for sponsors, program managers, and evaluators.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The Program Evaluation Standards: A Guide for Evaluators and Evaluation Users (3rd ed.). Sage Publications.

Yin, R. (1988).  Case study research: design and methods . Newbury Park, CA: Sage Publications.


Program Evaluation: a Plain English Guide

This 11-step guide defines program evaluation, what it is used for, the different types and when they should be used. Also covered is how to plan a program evaluation, monitor performance, communicate findings, deliver bad news, and put improvements into practice.

This resource and the following information were contributed to BetterEvaluation by Dana Cross, Grosvenor Management Consulting (year of publication: 2015; type of resource: guide).

Key features: an easy-to-read guide on tried-and-tested program evaluation practices, including the what and why of program evaluation; how to articulate the workings of your program using program theory and program logic; tools available for planning your program evaluation; how to monitor program performance; and ways to communicate your findings. The resource offers the practical, hands-on information that is particularly useful to program managers, with how-to guides, diagrams, and examples for understanding and implementing program evaluation in real life.

Who is this resource useful for? Advocates for evaluation, and commissioners and managers of evaluation.

How have evaluators used this resource? Program evaluation can be daunting for program managers approaching it for the first time. Program evaluators and managers have found this a particularly useful resource to share with peers and stakeholders who are new to evaluation; it provides a good introduction to what program evaluation might involve as part of the management and assessment of program performance.

Why recommend it to other people? This nuts-and-bolts guidance on the key components of program evaluation avoids jargon and provides a very practical way forward for implementing the evaluation.

The guide covers the following topics:
  • What is program evaluation?
  • Understanding which programs to evaluate
  • When is the best time for program evaluation?
  • Program theory and program logic: articulating how your program works
  • Types of program evaluation
  • Tools for planning program evaluation A: Evaluation framework
  • Tools for planning program evaluation B: Evaluation plan
  • How is your program going... really? Performance monitoring
  • Communicating your program evaluation findings effectively
  • Breaking bad news
  • Putting improvements into practice

Cross, D. (2015)  Program Evaluation: a Plain English Guide . Grosvenor Management Consulting

Related links

  • http://resources.grosvenor.com.au/program-evaluation-a-plain-english-guide




Understanding Evaluation Methodologies: M&E Methods and Techniques for Assessing Performance and Impact

This article provides an overview and comparison of the different types of evaluation methodologies used to assess the performance, effectiveness, quality, or impact of services, programs, and policies. There are several methodologies, both qualitative and quantitative, including surveys, interviews, observations, case studies, and focus groups. In this article, we discuss the most commonly used qualitative and quantitative evaluation methodologies in the M&E field.

Table of Contents

  • Introduction to Evaluation Methodologies: Definition and Importance
  • Types of Evaluation Methodologies: Overview and Comparison
  • Program Evaluation methodologies
  • Qualitative Methodologies in Monitoring and Evaluation (M&E)
  • Quantitative Methodologies in Monitoring and Evaluation (M&E)
  • What are the M&E Methods?
  • Difference Between Evaluation Methodologies and M&E Methods
  • Choosing the Right Evaluation Methodology: Factors and Criteria
  • Our Conclusion on Evaluation Methodologies

1. Introduction to Evaluation Methodologies: Definition and Importance

Evaluation methodologies are the methods and techniques used to measure the performance, effectiveness, quality, or impact of various interventions, services, programs, and policies. Evaluation is essential for decision-making, improvement, and innovation, as it helps stakeholders identify strengths, weaknesses, opportunities, and threats and make informed decisions to improve the effectiveness and efficiency of their operations.

Evaluation methodologies can be used in various fields and industries, such as healthcare, education, business, social services, and public policy. The choice of evaluation methodology depends on the specific goals of the evaluation, the type and level of data required, and the resources available for conducting the evaluation.

The importance of evaluation methodologies lies in their ability to provide evidence-based insights into the performance and impact of the subject being evaluated. This information can be used to guide decision-making, policy development, program improvement, and innovation. By using evaluation methodologies, stakeholders can assess the effectiveness of their operations and make data-driven decisions to improve their outcomes.

Overall, understanding evaluation methodologies is crucial for individuals and organizations seeking to enhance their performance, effectiveness, and impact. By selecting the appropriate evaluation methodology and conducting a thorough evaluation, stakeholders can gain valuable insights and make informed decisions to improve their operations and achieve their goals.

2. Types of Evaluation Methodologies: Overview and Comparison

Evaluation methodologies can be categorized into two main types based on the type of data they collect: qualitative and quantitative. Qualitative methodologies collect non-numerical data, such as words, images, or observations, while quantitative methodologies collect numerical data that can be analyzed statistically. Here is an overview and comparison of the main differences between qualitative and quantitative evaluation methodologies:

Qualitative Evaluation Methodologies:

  • Collect non-numerical data, such as words, images, or observations.
  • Focus on exploring complex phenomena, such as attitudes, perceptions, and behaviors, and understanding the meaning and context behind them.
  • Use techniques such as interviews, observations, case studies, and focus groups to collect data.
  • Emphasize the subjective nature of the data and the importance of the researcher’s interpretation and analysis.
  • Provide rich and detailed insights into people’s experiences and perspectives.
  • Limitations include potential bias from the researcher, limited generalizability of findings, and challenges in analyzing and synthesizing the data.

Quantitative Evaluation Methodologies:

  • Collect numerical data that can be analyzed statistically.
  • Focus on measuring specific variables and relationships between them, such as the effectiveness of an intervention or the correlation between two factors.
  • Use techniques such as surveys and experimental designs to collect data.
  • Emphasize the objectivity of the data and the importance of minimizing bias and variability.
  • Provide precise and measurable data that can be compared and analyzed statistically.
  • Limitations include potential oversimplification of complex phenomena, limited contextual information, and challenges in collecting and analyzing data.

Choosing between qualitative and quantitative evaluation methodologies depends on the specific goals of the evaluation, the type and level of data required, and the resources available for conducting the evaluation. Some evaluations may use a mixed-methods approach that combines both qualitative and quantitative data collection and analysis techniques to provide a more comprehensive understanding of the subject being evaluated.

3. Program evaluation methodologies

Program evaluation methodologies encompass a diverse set of approaches and techniques used to assess the effectiveness, efficiency, and impact of programs and interventions. These methodologies provide systematic frameworks for collecting, analyzing, and interpreting data to determine the extent to which program objectives are being met and to identify areas for improvement. Common program evaluation methodologies include quantitative methods such as experimental designs, quasi-experimental designs, and surveys, as well as qualitative approaches like interviews, focus groups, and case studies.

Each methodology offers unique advantages and limitations depending on the nature of the program being evaluated, the available resources, and the research questions at hand. By employing rigorous program evaluation methodologies, organizations can make informed decisions, enhance program effectiveness, and maximize the use of resources to achieve desired outcomes.


4. Qualitative Methodologies in Monitoring and Evaluation (M&E)

Qualitative methodologies are increasingly being used in monitoring and evaluation (M&E) to provide a more comprehensive understanding of the impact and effectiveness of programs and interventions. Qualitative methodologies can help to explore the underlying reasons and contexts that contribute to program outcomes and identify areas for improvement. Here are some common qualitative methodologies used in M&E:

Interviews

Interviews involve one-on-one or group discussions with stakeholders to collect data on their experiences, perspectives, and perceptions. Interviews can provide rich and detailed data on the effectiveness of a program, the factors that contribute to its success or failure, and the ways in which it can be improved.

Observations

Observations involve the systematic and objective recording of behaviors and interactions of stakeholders in a natural setting. Observations can help to identify patterns of behavior, the effectiveness of program interventions, and the ways in which they can be improved.

Document review

Document review involves the analysis of program documents, such as reports, policies, and procedures, to understand the program context, design, and implementation. Document review can help to identify gaps in program design or implementation and suggest ways in which they can be improved.

Participatory Rural Appraisal (PRA)

PRA is a participatory approach that involves working with communities to identify and analyze their own problems and challenges. It involves using participatory techniques such as mapping, focus group discussions, and transect walks to collect data on community perspectives, experiences, and priorities. PRA can help ensure that the evaluation is community-driven and culturally appropriate, and can provide valuable insights into the social and cultural factors that influence program outcomes.

Key Informant Interviews

Key informant interviews are in-depth, open-ended interviews with individuals who have expert knowledge or experience related to the program or issue being evaluated. Key informants can include program staff, community leaders, or other stakeholders. These interviews can provide valuable insights into program implementation and effectiveness, and can help identify areas for improvement.

Ethnography

Ethnography is a qualitative method that involves observing and immersing oneself in a community or culture to understand their perspectives, values, and behaviors. Ethnographic methods can include participant observation, interviews, and document analysis, among others. Ethnography can provide a more holistic understanding of program outcomes and impacts, as well as the broader social context in which the program operates.

Focus Group Discussions

Focus group discussions involve bringing together a small group of individuals to discuss a specific topic or issue related to the program. Focus group discussions can be used to gather qualitative data on program implementation, participant experiences, and program outcomes. They can also provide insights into the diversity of perspectives within a community or stakeholder group.

Photovoice

Photovoice is a qualitative method that involves using photography as a tool for community empowerment and self-expression. Participants are given cameras and asked to take photos that represent their experiences or perspectives on a program or issue. These photos can then be used to facilitate group discussions and generate qualitative data on program outcomes and impacts.

Case Studies

Case studies involve gathering detailed qualitative data through interviews, document analysis, and observation, and can provide a more in-depth understanding of a specific program component. They can be used to explore the experiences and perspectives of program participants or stakeholders and can provide insights into program outcomes and impacts.

Qualitative methodologies in M&E are useful for identifying complex and context-dependent factors that contribute to program outcomes, and for exploring stakeholder perspectives and experiences. Qualitative methodologies can provide valuable insights into the ways in which programs can be improved and can complement quantitative methodologies in providing a comprehensive understanding of program impact and effectiveness.
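
Qualitative analysis is interpretive work done by people, but once interviews or notes have been coded, the codes can be tallied and summarized. The sketch below is a deliberately crude illustration in Python using keyword matching against two assumed themes; the notes and theme definitions are invented, not drawn from any real study:

from collections import Counter

interview_notes = [
    "Participants said transport costs kept them from attending sessions.",
    "Staff felt the training improved their confidence with clients.",
    "Several parents mentioned transport and long waiting times.",
]
# Hypothetical codebook: theme name -> keywords treated as evidence of that theme.
themes = {
    "access barriers": ["transport", "waiting"],
    "staff capacity": ["training", "confidence"],
}

counts = Counter()
for note in interview_notes:
    text = note.lower()
    for theme, keywords in themes.items():
        counts[theme] += sum(text.count(word) for word in keywords)

print(counts)  # Counter({'access barriers': 3, 'staff capacity': 2})

A count like this only summarizes coded material; the analytic work of developing codes and interpreting what they mean remains a human task.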

5. Quantitative Methodologies in Monitoring and Evaluation (M&E)

Quantitative methodologies are commonly used in monitoring and evaluation (M&E) to measure program outcomes and impact in a systematic and objective manner. Quantitative methodologies involve collecting numerical data that can be analyzed statistically to provide insights into program effectiveness, efficiency, and impact. Here are some common quantitative methodologies used in M&E:

Surveys

Surveys involve collecting data from a large number of individuals using standardized questionnaires or surveys. Surveys can provide quantitative data on people’s attitudes, opinions, behaviors, and experiences, and can help to measure program outcomes and impact.

Baseline and Endline Surveys

Baseline and endline surveys are quantitative surveys conducted at the beginning and end of a program to measure changes in knowledge, attitudes, behaviors, or other outcomes. These surveys can provide a snapshot of program impact and allow for comparisons between pre- and post-program data.
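
A minimal sketch of the pre/post logic behind a baseline and endline comparison, written in Python; the respondents, scores, and the knowledge indicator are hypothetical, not drawn from any real survey:

from statistics import mean

# Hypothetical knowledge scores for the same eight respondents,
# collected before (baseline) and after (endline) the program.
baseline = [52, 61, 48, 70, 55, 63, 58, 49]
endline = [60, 66, 55, 74, 62, 70, 61, 57]

changes = [post - pre for pre, post in zip(baseline, endline)]
print(f"Mean baseline score: {mean(baseline):.1f}")
print(f"Mean endline score: {mean(endline):.1f}")
print(f"Mean change per respondent: {mean(changes):.1f}")

A simple difference like this shows whether scores moved, but attributing the change to the program still requires a design, such as a comparison group, that rules out other explanations.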

Randomized Controlled Trials (RCTs)

RCTs are a rigorous quantitative evaluation method that involves randomly assigning participants to a treatment group (receiving the program) and a control group (not receiving the program), and comparing outcomes between the two groups. RCTs are often used to assess the impact of a program.
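
The core RCT comparison can be sketched in a few lines of Python; the participant list, the assignment, and the outcome scores below are simulated for illustration only:

import random
from statistics import mean

random.seed(1)
participants = [f"P{i:02d}" for i in range(1, 41)]
random.shuffle(participants)
treatment, control = participants[:20], participants[20:]   # random assignment

# Simulated outcome scores observed at the end of the program period.
outcome = {p: random.gauss(70 if p in treatment else 63, 8) for p in participants}

effect = mean(outcome[p] for p in treatment) - mean(outcome[p] for p in control)
print(f"Estimated program effect (treatment mean - control mean): {effect:.1f}")

In a real trial the outcome data would come from measurement rather than simulation, and the difference in means would be reported with a significance test or confidence interval.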

Cost-Benefit Analysis

Cost-benefit analysis is a quantitative method used to assess the economic efficiency of a program or intervention. It involves comparing the costs of the program with the benefits or outcomes generated, and can help determine whether a program is cost-effective or not.
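
A minimal sketch of the arithmetic, with made-up figures; real cost-benefit analyses also handle discounting, time horizons, and uncertainty:

# Hypothetical totals for one program year.
program_costs = 250_000        # cost of delivering the program
monetised_benefits = 340_000   # estimated monetary value of the outcomes produced

net_benefit = monetised_benefits - program_costs
benefit_cost_ratio = monetised_benefits / program_costs
print(f"Net benefit: {net_benefit:,}")                  # 90,000
print(f"Benefit-cost ratio: {benefit_cost_ratio:.2f}")  # 1.36, i.e., benefits exceed costs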

Performance Indicators

Performance indicators are quantitative measures used to track progress toward program goals and objectives. These indicators can be used to assess program effectiveness, efficiency, and impact, and can provide regular feedback on program performance.
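
A minimal sketch of indicator tracking; the indicator names, targets, and actual values are hypothetical:

# Hypothetical indicators with annual targets and values achieved to date.
indicators = {
    "health workers trained": {"target": 200, "actual": 150},
    "facilities reporting monthly": {"target": 40, "actual": 37},
    "clients receiving follow-up": {"target": 1000, "actual": 640},
}

for name, v in indicators.items():
    progress = v["actual"] / v["target"] * 100
    print(f"{name}: {v['actual']}/{v['target']} ({progress:.0f}% of target)")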

Statistical Analysis

Statistical analysis involves using quantitative data and statistical methods to analyze data gathered from various evaluation methods, such as surveys or observations. Statistical analysis can provide a more rigorous assessment of program outcomes and impacts and help identify patterns or relationships between variables.
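
As one small illustration, the sketch below checks whether two quantitative variables from an evaluation dataset move together; the data are invented, and the statistics module's correlation function requires Python 3.10 or later:

from statistics import correlation  # available in Python 3.10+

# Hypothetical data: sessions attended and post-program knowledge score.
sessions_attended = [1, 3, 5, 2, 8, 6, 4, 7]
knowledge_score = [48, 55, 62, 50, 75, 68, 60, 71]

r = correlation(sessions_attended, knowledge_score)
print(f"Pearson correlation between attendance and knowledge: {r:.2f}")

Correlation alone does not establish that attendance caused the higher scores; that is why it is usually combined with the designs described above.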

Experimental designs

Experimental designs involve manipulating one or more variables and measuring the effects of the manipulation on the outcome of interest. Experimental designs are useful for establishing cause-and-effect relationships between variables, and can help to measure the effectiveness of program interventions.

Quantitative methodologies in M&E are useful for providing objective and measurable data on program outcomes and impact, and for identifying patterns and trends in program performance. Quantitative methodologies can provide valuable insights into the effectiveness, efficiency, and impact of programs, and can complement qualitative methodologies in providing a comprehensive understanding of program performance.

6. What are the M&E Methods?

Monitoring and Evaluation (M&E) methods encompass the tools, techniques, and processes used to assess the performance of projects, programs, or policies.

These methods are essential in determining whether the objectives are being met, understanding the impact of interventions, and guiding decision-making for future improvements. M&E methods fall into two broad categories: qualitative and quantitative, often used in combination for a comprehensive evaluation.

7. Choosing the Right Evaluation Methodology: Factors and Criteria

Choosing the right evaluation methodology is essential for conducting an effective and meaningful evaluation. Here are some factors and criteria to consider when selecting an appropriate evaluation methodology:

  • Evaluation goals and objectives: The evaluation goals and objectives should guide the selection of an appropriate methodology. For example, if the goal is to explore stakeholders’ perspectives and experiences, qualitative methodologies such as interviews or focus groups may be more appropriate. If the goal is to measure program outcomes and impact, quantitative methodologies such as surveys or experimental designs may be more appropriate.
  • Type of data required: The type of data required for the evaluation should also guide the selection of the methodology. Qualitative methodologies collect non-numerical data, such as words, images, or observations, while quantitative methodologies collect numerical data that can be analyzed statistically. The type of data required will depend on the evaluation goals and objectives.
  • Resources available: The resources available, such as time, budget, and expertise, can also influence the selection of an appropriate methodology. Some methodologies may require more resources, such as specialized expertise or equipment, while others may be more cost-effective and easier to implement.
  • Accessibility of the subject being evaluated: The accessibility of the subject being evaluated, such as the availability of stakeholders or data, can also influence the selection of an appropriate methodology. For example, if stakeholders are geographically dispersed, remote data collection methods such as online surveys or video conferencing may be more appropriate.
  • Ethical considerations: Ethical considerations, such as ensuring the privacy and confidentiality of stakeholders, should also be taken into account when selecting an appropriate methodology. Some methodologies, such as interviews or focus groups, may require more attention to ethical considerations than others.

Overall, choosing the right evaluation methodology depends on a variety of factors and criteria, including the evaluation goals and objectives, the type of data required, the resources available, the accessibility of the subject being evaluated, and ethical considerations. Selecting an appropriate methodology can ensure that the evaluation is effective, meaningful, and provides valuable insights into program performance and impact.
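
To make the factors above concrete, the sketch below turns them into a first-pass suggestion; the goal labels and decision rules are illustrative assumptions, not a standard procedure:

def suggest_methodology(goal: str, data_needed: str) -> str:
    """Rough starting point for an evaluation design (illustrative rules only)."""
    if goal == "measure impact" and data_needed == "numerical":
        return "quantitative (e.g., surveys, experimental or quasi-experimental designs)"
    if goal == "understand experiences" and data_needed == "non-numerical":
        return "qualitative (e.g., interviews, focus groups, case studies)"
    # Mixed goals or mixed data needs usually point toward a mixed-methods design.
    return "mixed methods (combine quantitative and qualitative techniques)"

print(suggest_methodology("measure impact", "numerical"))
print(suggest_methodology("understand experiences", "non-numerical"))

A real methodology choice also weighs resources, access to stakeholders, and ethics, as listed above, so a helper like this can only narrow the options, not decide them.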

8. Our Conclusion on Evaluation Methodologies

It’s worth noting that many evaluation methodologies use a combination of quantitative and qualitative methods to provide a more comprehensive understanding of program outcomes and impacts. Both qualitative and quantitative methodologies are essential in providing insights into program performance and effectiveness.

Qualitative methodologies focus on gathering data on the experiences, perspectives, and attitudes of individuals or communities involved in a program, providing a deeper understanding of the social and cultural factors that influence program outcomes. In contrast, quantitative methodologies focus on collecting numerical data on program performance and impact, providing more rigorous evidence of program effectiveness and efficiency.

Each methodology has its strengths and limitations, and a combination of both qualitative and quantitative approaches is often the most effective in providing a comprehensive understanding of program outcomes and impact. When designing an M&E plan, it is crucial to consider the program’s objectives, context, and stakeholders to select the most appropriate methodologies.

Overall, effective M&E practices require a systematic and continuous approach to data collection, analysis, and reporting. With the right combination of qualitative and quantitative methodologies, M&E can provide valuable insights into program performance, progress, and impact, enabling informed decision-making and resource allocation, ultimately leading to more successful and impactful programs.



Plan for Program Evaluation from the Start

National Institute of Justice Journal

No matter the source of funding for their program — government or private foundation — managers everywhere are feeling greater pressure to demonstrate their programs' effectiveness. And they must do so using scientific methods, not anecdotes about individuals who benefited. Funders want to see data and other hard evidence to justify continuing or expanding a program.

One of the first tasks in gathering evidence about a program's successes and limitations (or failures) is to initiate an evaluation, a systematic assessment of the program's design, activities or outcomes. Evaluations can help funders and program managers make better judgments, improve effectiveness or make programming decisions. [1] Evaluations can describe how a program is operating, show whether it is working as intended, determine whether it has achieved its objectives and identify areas for improvement.

Having a plan for the evaluation is critical, and having it ready when the program launches is best.

Evaluation Plans

An evaluation plan outlines the evaluation's goals and purpose, the research questions, and information to be gathered. Ideally, program staff and an evaluator should develop the plan before the program starts, using a process that involves all relevant program stakeholders.

The benefits of an evaluation plan

Having a plan helps ensure that future evaluations are feasible and instructive. Putting the plan in writing helps ensure that the process is transparent and that all stakeholders agree on the goals of both the program and the evaluation. It serves as a reference when questions arise about priorities, supports requests for program and evaluation funding, and informs new staff. An evaluation plan also can help stakeholders develop a realistic timeline for when the program will (or should) be ready for evaluation.

Creating an evaluation plan

Partners and stakeholders use evaluation plans to clarify a program's purpose, goals and objectives and to describe how program activities are linked to their intended effects. To this end, stakeholders should consider developing a logic model. A logic model visually depicts how a program is expected to work and achieve its goals, specifying the program's inputs, activities, outputs and outcomes.

Figure 1. Sample Logic Model (figure forthcoming)
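
Since the sample figure is not reproduced here, the sketch below expresses the same idea as a simple data structure; the program and its inputs, activities, outputs, and outcomes are hypothetical:

# A hypothetical program's logic model as a plain dictionary.
logic_model = {
    "inputs": ["funding", "trained facilitators", "curriculum materials"],
    "activities": ["recruit participants", "deliver weekly workshops"],
    "outputs": ["number of workshops held", "number of participants completing"],
    "outcomes": [
        "improved knowledge (short term)",
        "changed behavior (medium term)",
        "reduced incidence in the community (long term)",
    ],
}

for component, items in logic_model.items():
    print(f"{component}: {', '.join(items)}")

Writing the model down in any structured form, whether a diagram or a simple list like this, makes it easier to check that each planned activity is actually linked to an intended outcome.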

The evaluation plan should develop goals for future evaluations and questions these evaluations should answer. This information will drive decisions on what data will be needed and how to collect them. For example, stakeholders may be interested in the extent to which the program was implemented as planned. Determining that requires documentation on program design, program implementation, problems encountered, the targeted audience and actual participation. Or, stakeholders might want to know the program's impact on participants and whether it achieved its objectives. In this case, program staff should plan to collect data before implementing the program so an evaluator later can assess any changes attributable to the program.

Types of evaluations

A program can benefit from multiple evaluations over the course of its design and implementation.

The type and timing of evaluations are important. Evaluation is more difficult and less meaningful after the program ends, because stakeholders cannot use information gathered from the evaluation to alter the program's implementation or to justify continued funding. Conducting certain evaluations, like outcome evaluations, is difficult when a program is too new because program elements, strategies or procedures often still are being adjusted and finalized.

The three main types of evaluation - formative, process, and impact or outcome - differ in the questions they can answer and in when they should be used:

  • Formative evaluation: use during the planning stages or the beginning of the program's implementation, so revisions can be made before the program starts.
  • Process evaluation: use in the early stages of the program's implementation to provide initial feedback.
  • Impact or outcome evaluation: use at the end of the program's development, when the program is stable and unlikely to change in fundamental ways.

Plan for Evaluation from the Start

When designing a program, it is easy to focus only on the immediate decisions that must be made to implement the program and make it operational. But evaluating a program can be challenging or impossible if stakeholders do not plan for evaluation during initial program development. Having evaluation in mind when designing a program can help ensure the success of future evaluations.

Choose the questions you want to answer and know what information you need to answer them

Stakeholders need to know the questions they want an evaluation to answer and build the capacity to collect data to answer those questions. For example, if stakeholders want to know what changes resulted from the program, baseline data should be collected before the program begins. This is especially important if the evaluation will use surveys or interviews to assess baseline opinions or behaviors, because asking respondents later to recall prior opinions or behavior may produce biased results. By thinking this through in advance, stakeholders can ensure they conduct any necessary pre-tests before the program begins and establish a method to collect data over the course of the program. Furthermore, planning for a future outcome evaluation — even if the immediate goal is a process evaluation — can be beneficial because at some point many stakeholders will want or need to answer the question "Does it work?" Partnering with an experienced evaluator can help stakeholders identify potential evaluation designs and decide how to collect the required data.

Determine the timing and resources needed

Stakeholders should consider the time and cost of an evaluation effort and build them into the evaluation plan. A general rule of thumb is to budget 10 percent of the total program cost for evaluation. Although completing a process evaluation may require only a few months, a large-scale outcome evaluation may require years and a substantial financial outlay. If stakeholders want the evaluation's results to help improve the program or justify continued funding, they need to make sure the evaluation is completed before the program is slated to end. This is particularly critical for programs that rely on grant funding, which are usually active only for a set period of time.

Document critical information

To help ensure the evaluation is instructive and meaningful, program staff should document the program's design, purpose and objectives so an evaluator can compare them to the program's actual implementation. Without that documentation, an evaluation is unlikely to produce enough meaningful information to justify its cost and level of effort. Having an evaluation plan in place from the beginning with clear requirements for documentation can help ensure that the needed information is actually collected.

Remain flexible

Despite the best planning, stakeholders cannot anticipate all aspects of a program's operation before implementation, so an evaluation plan should be responsive to program changes and shifting priorities. As they get new information, stakeholders may find some goals unrealistic or some data impossible to collect, access or track. They should revise the evaluation plan as necessary and document each change, justification and decision point.

In turn, stakeholders should be aware that some evaluations, particularly outcome evaluations, might require staff to operate a program differently than usual to rigorously assess the program's effect. For example, evaluators might ask staff to refrain from altering the program's operation during the evaluation period or to select participants in a different manner, perhaps through a randomized process. Partnering with an evaluator in the early stages of program development and implementation can help program staff understand what may be required of them to successfully evaluate the program later.

Special Challenges in Evaluating Multisite Programs

Implementing and evaluating a multisite program can be challenging, especially when sites are given latitude to implement the program in ways that suit their specific needs, because goals and designs will vary by site.

When writing an evaluation plan, stakeholders must consider whether sites will be implementing the program uniformly or will have flexibility in their design. If each site has a different strategy, stakeholders need to take that diversity into consideration and note it in the evaluation plan. Each site should create its own documentation, including a timeline and list of goals and objectives, and sites may require different evaluation strategies. Addressing differences across sites in the evaluation plan and monitoring their progress over time helps ensure each site is fully operational and has the necessary data and functionality for future evaluations.

Evaluability Assessments

Programs without evaluation plans in place can experience significant challenges during evaluations. If a program does not have an evaluation plan, an evaluability assessment can help determine whether the program can be evaluated and whether an evaluation will produce useful results. A program with an evaluation plan also can benefit from an evaluability assessment, which can gauge how well the evaluation plan was put into action and its effectiveness in preparing the program for an evaluation.

An evaluability assessment analyzes a program's goals, state of implementation, data capacity and measurable outcomes. It can save valuable time and money if it shows the program cannot be evaluated because evaluability assessments cost significantly less than actual evaluations. The evaluability assessment also can provide stakeholders with valuable information on how to alter the program structure to support future evaluations.

Design It So It Can Be Evaluated

The key to developing a program that can be evaluated is to have the goal of future evaluation in mind when designing the program's documentation, goals and implementation. Stakeholders also must continually monitor the program's progress and verify that relevant data are being captured, particularly if the goal is to conduct an outcome evaluation. Although evaluation is not always easy and can sometimes be an imposition on program operations, having an evaluation plan is invaluable to making such efforts as feasible and successful as possible. Program staff should, whenever possible, partner with a university, experienced researcher or sister science agency to help construct the plan. Having an evaluation plan in place will help ensure that future program evaluation is feasible and financially viable and that its results are instructive to program staff and stakeholders.

For More Information

Read a chapter by Finn-Aage Esbensen and Kristy N. Matsuda in Changing Course: Preventing Gang Membership (pdf, 12 pages) to learn more about program evaluations and why having a well-designed evaluation is critical to determining a program's effectiveness.

About This Article

This article appeared in NIJ Journal No. 275, posted May 2015.


About the author

Alison Brooks Martin was a postdoctoral research associate in NIJ's Office of Research and Evaluation from November 2013 until January 2015.

MEASURE Evaluation

Sampling and Evaluation – A Guide to Sampling for Program Impact Evaluation

Author(s): Lance P, Hattori A

Program evaluation, or impact evaluation, is a way to get an accurate understanding of the extent to which a health program causes changes in the outcomes it aims to improve. Program impact studies are designed to tell us the extent to which a population's exposure to or participation in a program altered an outcome, compared to what would have happened in the absence of the program. Understanding whether a program produces intended changes allows society to focus scarce resources on those programs that most efficiently and effectively improve people's welfare and health.

The usual objective in program impact evaluation is to learn about how a population of interest is affected by the program. Programs are typically implemented in geographic areas where populations are large and beyond our resources to observe in their entirety. Therefore, we have to sample. Sampling is the process of selecting a set of observations from a population to estimate a chosen parameter—program impact, for example—for that population.
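The sketch below illustrates the basic idea in the simplest case: estimating a population mean outcome by simple random sampling. It is an illustration with simulated data, not one of the manual's worked designs, which cover clustering, stratification, and weighting.

```python
# Simple random sampling to estimate a population mean; simulated data for
# illustration only, not one of the manual's sampling designs.
import random
import statistics

random.seed(42)

# Hypothetical population of 100,000 outcome values, too large to observe fully.
population = [random.gauss(50, 10) for _ in range(100_000)]

# Draw a simple random sample without replacement and estimate the mean outcome.
sample = random.sample(population, k=400)
estimate = statistics.mean(sample)
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"Estimated mean outcome: {estimate:.2f} (standard error ~ {standard_error:.2f})")
```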

This manual explores the challenges of sampling for program impact evaluations—how to obtain a sample that is reliable for estimating impact of a program and how to obtain a sample that accurately reflects the population of interest.

The manual is divided into two sections: (1) basic sample selection and weighting and (2) sample size estimation. We anticipate that readers will get the most out of reading entire chapters rather than cherry-picking portions of the discussions within them, as one might with a reference manual; this manual is more like a textbook.
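To give a flavour of the sample size estimation topic, the sketch below applies a standard two-group formula for detecting a difference in means between a program and comparison group. The formula is a generic textbook choice, and the effect size and standard deviation are hypothetical; they are not drawn from the manual.

```python
# A standard two-group sample size calculation for detecting a difference in
# means. The effect size and standard deviation below are hypothetical.
import math
from statistics import NormalDist

def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants needed per group to detect a mean difference of `delta`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for a two-sided 5% test
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical: detect a 5-point difference on an outcome with a standard deviation of 15.
print(n_per_group(delta=5, sigma=15))  # about 142 per group
```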

Further, the manual is aimed at practitioners—in particular, those who design and implement samples for impact evaluation at their institution. Our discussions assume more than a basic understanding of sampling and some mathematical skill in applying sampling theory. That said, we are less interested in theory than in its practical application to solve sampling problems encountered in the field. We hope this manual will be a comprehensive and practical resource for that task.

  • Open access
  • Published: 02 September 2024

Using mixed methods and partnership to develop a program evaluation toolkit for organizations that provide physical activity programs for persons with disabilities

  • Sarah V. C. Lawrason 1,2,
  • Pinder DaSilva 3,
  • Emilie Michalovic 3,
  • Amy Latimer-Cheung 4,5,
  • Jennifer R. Tomasone 4,5,
  • Shane Sweet 6,7,
  • Tanya Forneris 1,
  • Jennifer Leo 8,
  • Matthew Greenwood 9,
  • Janine Giles 10,
  • Jane Arkell 11,
  • Jackie Patatas 12,
  • Nick Boyle 10,
  • Nathan Adams 1,2 &
  • Kathleen A. Martin Ginis 1,2,13

Research Involvement and Engagement, volume 10, Article number: 91 (2024)

The purpose of this paper is to report on the process for developing an online RE-AIM evaluation toolkit in partnership with organizations that provide physical activity programming for persons with disabilities.

A community-university partnership was established and guided by an integrated knowledge translation approach. The four-step development process included: (1) identify, review, and select knowledge (literature review and two rounds of Delphi consensus-building), (2) adapt knowledge to local context (rating feasibility of outcomes and integration into online platform), (3) assess barriers and facilitators (think-aloud interviews), and (4) select, tailor, implement (collaborative dissemination plan).

Step 1: Fifteen RE-AIM papers relevant to community programming were identified during the literature review. Two rounds of Delphi refined indicators for the toolkit related to reach, effectiveness, adoption, implementation, and maintenance. Step 2: At least one measure was linked with each indicator. Ten research and community partners participated in assessing the feasibility of measures, resulting in a total of 85 measures. Step 3: Interviews resulted in several recommendations for the online platform and toolkit. Step 4: Project partners developed a dissemination plan, including an information package, webinars, and publications.

This project demonstrates that community and university partners can collaborate to develop a useful, evidence-informed evaluation resource for both audiences. We identified several strategies for partnership when creating a toolkit, including using a set of expectations, engaging research users from the outset, using consensus methods, recruiting users through networks, and mentorship of trainees. The toolkit can be found at et.cdpp.ca. Next steps include disseminating (e.g., through webinars, conferences) and evaluating the toolkit to improve its use for diverse contexts (e.g., universal PA programming).

Plain English summary

Organizations that provide sport and exercise programming for people with disabilities need to evaluate their programs to understand what works, secure funding, and make improvements. However, these programs can be difficult to evaluate due to lack of evidence-informed tools, low capacity, and few resources (e.g., money, time). For this project, we aimed to close the evaluation gap by creating an online, evidence-informed toolkit that helps organizations evaluate physical activity programs for individuals with disabilities. The toolkit development process was guided by a community-university partnership and used a systematic four-step approach. Step one included reviewing the literature and building consensus among partners and potential users about indicators related to the success of community-based programs. Step two involved linking indicators with at least one measure for assessment. Step three involved interviews with partners who provided several recommendations for the online toolkit. Step four included the co-creation of a collaborative plan to distribute the toolkit for academic and non-academic audiences. Our comprehensive toolkit includes indicators for the reach, effectiveness, adoption, implementation, and maintenance of physical activity programs for individuals with disabilities. This paper provides a template for making toolkits in partnership with research users, offers strategies for community-university partnerships, and presents a co-created, evidence-informed evaluation resource for physical activity organizations. Users can find the toolkit at et.cdpp.ca.

Disability and physical activity

The United Nations Convention on the Rights of Persons with Disabilities protects the rights of people living with disabilities to access full and effective participation in all aspects of life, including sports and other recreational forms of physical activity (PA) such as exercise and active play. But because of countless environmental, attitudinal and policy barriers [ 1 ], children, youth and adults with disabilities are the most physically inactive segment of society [ 2 , 3 ]. Physical inactivity increases the risk that people with disabilities will experience physical and mental health conditions, social isolation, and stigma [ 4 ]. Systematic reviews have evaluated the effects of participation in PA programs among children, youth, and adults with physical, intellectual, mental, or sensory disabilities. Many, but not all, of these reviews have reported significant improvements in physical health, mental health, and social inclusion [ 2 ]. One reason for the inconsistent outcomes is that the PA participation experiences of people with disabilities are not universally positive [ 5 ].

Qualitative and quantitative research shows that people with disabilities often report negative PA experiences; for instance, being marginalized, excluded, and receiving sub-standard equipment, access, instruction, and opportunities to fully participate in PA [ 6 , 7 , 8 ]. Research and theorizing on quality PA participation and disability indicate that these low-quality PA experiences deter ongoing participation and undermine the potential physical and psychosocial benefits of PA for children and adults [ 5 , 9 ]. These findings attest to the need for evaluation of existing PA programs to identify what is working, and where improvements are needed to achieve optimal participation and impact.

Evaluating community-based programs

Persons with disabilities increasingly participate in disability sport to be physically active, and disability sport is often delivered by community organizations [ 2 ]. Like many community-based and non-profit organizations, organizations that provide PA programming for persons with disabilities (herein referred to as 'this sector') are often expected to conduct evaluations. These evaluations are done to secure and maintain external funding, demonstrate impact to board members and collaborators, and understand capacity for growth [ 10 ]. Even though program evaluations are often required, real-world programs are difficult to evaluate [ 11 ] and organizations often lack capacity and resources to conduct evaluations effectively [ 12 ]. Programs may be difficult to evaluate due to program complexity (e.g., setting, target population, intended outcomes [ 11 ]) and evaluation priorities (e.g., differing partner needs and resources [ 13 ]). Organizations may lack capacity in understanding and using appropriate evaluation methods and tools [ 14 ], determining what counts as evidence and its application [ 15 ], and the roles of researchers and practitioners in supporting real-world program evaluations [ 16 ].

Evaluation frameworks can be used to facilitate a guided, systematic approach to evaluation. A framework provides an overview or structure with descriptive categories, meaning it focuses on describing phenomena and how they fit into a set of categories rather than explaining how something is or is not working [ 17 ]. One evaluation framework that is commonly applied in PA and disability settings is the RE-AIM framework [ 18 ]. RE-AIM comprises five evaluation dimensions or categories: (a) Reach: the number, proportion, and representativeness of individuals who engage in a program, (b) Effectiveness: the positive and negative outcomes derived from a program, (c) Adoption: the number, proportion, and representativeness of possible settings and staff participating in the program, (d) Implementation: the cost and the extent to which the program was delivered as intended, and (e) Maintenance: the extent to which effects and program delivery are sustained beyond six months at the individual and organizational levels. The RE-AIM framework is appropriate in this sector because it aligns with organizations' need to understand factors that influence PA participation at both individual and organizational levels and for process (formative) and outcome (summative) evaluations [ 19 , 20 , 21 , 22 , 23 ]. Additionally, the RE-AIM framework has been shown to be feasible for evaluating programs in this sector [ 19 , 21 , 22 ]. The RE-AIM framework was developed to address the failures and delays of getting scientific research evidence into practice and policy [ 18 ].
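The small dictionary below restates the five RE-AIM dimensions as just described. The example indicators are hypothetical placeholders, not the Delphi-selected indicators reported later in the paper.

```python
# The five RE-AIM dimensions as described above. Example indicators are
# hypothetical placeholders, not the toolkit's Delphi-selected indicators.
RE_AIM = {
    "Reach": ("number, proportion, and representativeness of participants",
              ["number of registrants", "attendance rate"]),
    "Effectiveness": ("positive and negative outcomes derived from the program",
                      ["self-efficacy", "quality of participation"]),
    "Adoption": ("number, proportion, and representativeness of settings and staff",
                 ["sites offering the program", "instructors delivering it"]),
    "Implementation": ("cost and extent to which the program was delivered as intended",
                       ["session fidelity", "cost per participant"]),
    "Maintenance": ("sustainment beyond six months at individual and organizational levels",
                    ["participants still active at six months"]),
}

for dimension, (definition, indicators) in RE_AIM.items():
    print(f"{dimension}: {definition} (e.g., {', '.join(indicators)})")
```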

Gaps between evaluation research and practice

There has been a growing body of evidence to suggest that one of the most effective ways to bridge the gap between research and practice is through integrated knowledge translation (IKT; [ 24 ]). IKT means that the right research users are meaningfully engaged at the right time throughout the research process [ 25 ]. IKT involves a paradigmatic shift from recognizing researchers as ‘experts’ to valuing the expertise of individuals with lived experience, programmers, and policymakers through their inclusion in the development of the research questions, methods, execution, and dissemination to ensure that the research is relevant, useful, and usable [ 25 ]. A commitment to IKT aligns with the “nothing about us without us” philosophy of the disability rights movement [ 26 ] and is therefore ideal for a toolkit development process for this sector.

To address the lack of evidence-informed resources and the limited organizational capacity to conduct program evaluations [ 12 ], our community partners (leaders from seven Canadian organizations in this sector) identified that a toolkit is needed. An evaluation toolkit is a collection of tools that includes materials that may be used individually or collectively, such as educational material, timelines, and assessment tools, and the tools may often be customized based on context, thus helping to bridge the translation gap between evidence and practice [ 27 ]. Toolkit development can be a multi-step process including literature reviews, interviewing partners, and using a Delphi approach [ 27 ]. Previous research with community-based disability PA organizations suggests that digital platforms can be an efficient way to provide participants and staff with access to evaluation tools [ 19 , 23 ]. Together, this research culminated in our decision to (1) use RE-AIM for the toolkit's framework, meaning the toolkit was organized using the five evaluation dimensions, and (2) deliver the toolkit through interactive technology. The purpose of this paper is to report on a systematic, IKT-focused process for the design, development, and formulation of implementation considerations for an online RE-AIM evaluation toolkit for organizations that provide PA programming for persons with disabilities.

Research approach

A community-university partnership was established between seven Canadian disability PA organizations and three universities. A technology partner guided the back-end development of the online toolkit. Using an IKT approach [ 25 ], community partners were engaged before the research grant was written and submitted to ensure that the project was meaningful and focused on the appropriate tasks and outcomes. To guide our partnership, we agreed to adopt the IKT guiding principles for SCI research [ 25 ], which aim to provide a foundation for meaningful engagement between partners. An example of a guiding principle is that partners share in decision-making [ 25 ]. The principles were presented at each bi-monthly team meeting and participants had the opportunity to share concerns if certain principles were not upheld. Partners had regular opportunities for sharing in decision making, provided financial contributions to accelerate the project, and benefitted from developing a toolkit with indicators and measures tailored to disability PA organizations. Two community partner leaders also provided mentorship to academic trainees on community engagement in research, employment in non-academia, and project management, emphasizing the multi-directional nature of the partnership. The entire IKT process is presented in Appendix A in the supplemental file.

To maximize the likelihood that our toolkit is used in practice, our development process was guided by the Knowledge-to-Action (KTA) framework (see Fig.  1 ; [ 28 ]). The KTA framework was developed to help researchers with knowledge translation by identifying the steps in moving knowledge into action [ 28 ]. The KTA framework has two components: (a) knowledge creation and (b) action cycle. Our toolkit development process followed the steps of the action cycle, whereby existing knowledge is synthesized, applied, and mobilized. The problem to be addressed is a need for a program evaluation toolkit. To solve the problem, as shown with the yellow boxes in Fig.  1 , the steps for developing the RE-AIM evaluation toolkit included: (1) identify, review, and select knowledge; (2) adapt the knowledge to the local context and users; (3) assess the barriers and facilitators to knowledge use; and (4) select, tailor, and implement the toolkit.

Figure 1. Knowledge-to-action framework (adapted from [ 28 ])

To guide toolkit development, we ensured the methods aligned with recommendations from the COnsensus-based Standards for the selection of health Measurement Instruments/ Core Outcome Measures in Effectiveness Trials (COSMIN/COMET) groups for generating a set of core outcomes to be included in health intervention studies [ 29 ]. These guidelines state that developing a core outcome set requires finding existing outcome measurement instruments (see Step 1), quality assessment of instruments (see Step 2), and a consensus procedure to agree on the core outcome set (see Step 2) [ 29 ].

Step 1: Identify, review, and select knowledge

Literature review.

The first step in identifying, reviewing, and selecting knowledge was to conduct a literature review. The literature review examined research using the RE-AIM framework to evaluate community-based and health-related programs. This was completed through a search of www.re-aim.org (which lists all RE-AIM evaluations) to identify indicators for each RE-AIM dimension within community-based and health-related contexts. Studies were included if they: used the RE-AIM framework to evaluate a community-based health program or involved persons with disabilities, were published in English, and were peer reviewed. All study designs were included. The review also examined qualitative and quantitative studies of outcomes of community-based PA programs for people with disabilities (e.g., [ 9 ]) and outcomes our own partners have used in their own program evaluations. These papers and outcomes were discussed and chosen during early partnership meetings to initiate a list of indicators. Examples of community-based programs included peer support programs for individuals with spinal cord injuries in Quebec. Data extracted from papers included indicators (and their definitions) and associated measures used for evaluations.

Delphi process

The second part in identifying, reviewing, and selecting knowledge involves critically appraising the relevant literature identified, to determine its usefulness and validity for addressing the problem [ 28 ]. To determine usefulness and validity, a consensus-building outreach activity was used—an online Delphi method. Briefly, the Delphi method is used to arrive at a group decision by surveying a panel of experts [ 30 , 31 ]. The experts each respond to several rounds of surveys. Survey responses are synthesized and shared with the group after each round. The experts can adjust their responses in the next round based on their interpretations of the “group response.” The final response is considered a true consensus of the group’s opinion [ 30 , 31 ]. Delphi was ideal for our partnership approach because it eliminates power dynamics from the consensus-building process and ensures every expert’s opinion is heard and equally valued. Previous research has demonstrated the utility of Delphi methods to generate consensus among disability organizations regarding the most important outcomes to measure in a peer-support program evaluation tool [ 32 ].

Delphi methodologies are considered a reliable means for achieving consensus when a minimum of six experts are included [ 33 ]. Therefore, we aimed to recruit a minimum of six participants from each target group (i.e., members of disability PA organizations and researchers). Partners were encouraged to invite members who may qualify and be interested in completing the Delphi process. Participants completed a two-round Delphi process and were asked to rate each RE-AIM indicator on a scale of 1 (not at all important) to 10 (one of the most important). An indicator was included if at least 70% of participants agreed it was “very important” (8 or above) [ 31 ]. Indicators that did not meet these criteria were removed from the list.
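A minimal sketch of the retention rule just described (at least 70% of panelists rating an indicator 8 or higher), applied to hypothetical indicators and ratings:

```python
# Retention rule described above: keep an indicator when at least 70% of
# panelists rate it 8 or higher on the 1-10 importance scale.
# The indicators and ratings below are hypothetical.

def retained(ratings, threshold=8, agreement=0.70):
    """True if the share of ratings at or above `threshold` meets `agreement`."""
    return sum(r >= threshold for r in ratings) / len(ratings) >= agreement

round_one = {
    "attendance rate": [9, 8, 10, 8, 7, 9, 8, 9],    # 7/8 = 87.5% -> retained
    "volunteer turnover": [5, 6, 8, 7, 4, 6, 9, 5],  # 2/8 = 25%  -> removed
}

for indicator, ratings in round_one.items():
    print(indicator, "retained" if retained(ratings) else "removed")
```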

Retained indicators were then paired with at least one possible measure of that indicator (e.g., the ‘Positive Youth Development’ indicator was paired with the Out-of-School Time Observation instrument [ 34 ]). The partnership’s goal was to develop a toolkit comprised of valid and reliable measures. Therefore, the validity and reliability of each measure were critically appraised by the academic team-members using COSMIN/COMET criteria [ 29 ]. For some ‘Effectiveness’ indicators, published questionnaires were identified from the scientific literature. Measures were retained if they had high quality evidence of good content validity and internal consistency reliability [ 29 ] and were used in PA contexts and/or contexts involving participants with disabilities. The measures of all other indicators (where no published questionnaire measure was identified) were assessed by nine partners and modified to ensure that the measure was accurate and reliable for evaluation use in this sector.

Step 2: Adapt knowledge to local context

In the KTA framework, this phase involves groups making decisions about the value, usefulness, and appropriateness of knowledge for their settings and circumstances and customizing the knowledge to their particular situation [ 28 ]. Using Microsoft Excel, partners were sent a list of the selected indicators and measures in two phases (Phase 1: "RE" indicators and Phase 2: "AIM" indicators). Partners were asked to rate, on a scale of 0 to 2, the following categories for each measure: feasibility-time (not at all feasible to feasible), feasibility-complexity (not at all feasible to feasible), accuracy (not at all accurate to accurate), and unintended consequences (no, maybe, yes). They were also asked to provide additional feedback. This step only involved partners on the project with experience administering questionnaires (in research or evaluation settings) because the process required knowledge of how to administer measures to respondents. The median and mean of each category were calculated with community partner responses given double weighting/value relative to academic partner responses. Double weighting was given to community partner responses as the toolkit is anticipated to be used more frequently in community settings. The feedback was summarized. Results were presented to all partners during an online meeting, and team members discussed feedback to establish agreement on measures. The measures were sent out to partners again to provide any final feedback on included indicators and measures. The selected indicators and measures were compiled in an online program evaluation toolkit compliant with accessibility standards.
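One way to implement the double weighting described above is simply to count each community partner rating twice before taking the mean and median. The sketch below does this with hypothetical 0-to-2 feasibility ratings; it is an illustration of the idea, not the project's analysis code.

```python
# Double-weighting of community partner ratings relative to academic partner
# ratings, as described above. The 0-2 ratings below are hypothetical.
import statistics

def weighted_summary(community_ratings, academic_ratings, community_weight=2):
    """Mean and median with each community rating counted `community_weight` times."""
    pooled = community_ratings * community_weight + academic_ratings
    return statistics.mean(pooled), statistics.median(pooled)

community = [2, 1, 2, 2, 1]  # e.g., feasibility-time ratings for one measure
academic = [1, 2, 1]

mean, median = weighted_summary(community, academic)
print(f"weighted mean = {mean:.2f}, weighted median = {median}")
```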

Step 3: Assess barriers and facilitators

In the KTA framework, this step involves identifying potential barriers that may limit knowledge uptake and supports or facilitators that can be leveraged to enhance uptake [ 28 ]. In Step 3, partners were invited to participate in an unstructured, think-aloud interview while they used the online program evaluation toolkit [ 35 ]. Interviews were conducted to collect detailed data about how users reacted to different parts of the toolkit content, format, and structure. Each interview was conducted over Zoom with one participant and two interviewers. The two-to-one interview format [ 36 ] allowed the interviewers to take notes during the interview, ask questions from different perspectives, and reflect together on their shared experience of the website. Participants were also asked how they would use the toolkit and about any barriers to its use, and they identified features of the toolkit that might need to be changed. In a separate group meeting, team members were asked for ideas on how to overcome potential barriers to using the toolkit and tips for its implementation. Data were analyzed using a content analysis approach [ 37 ] and recommendations were prioritized by the lead and senior authors using the MoSCoW method [ 38 ]. The MoSCoW method is a prioritization technique that has authors categorize recommendations using the following criteria: (a) "Must Have" (Mo), (b) "Should Have" (S), (c) "Could Have" (Co), and (d) "Won't Have This Time" (W). These recommendations were presented to all partners for further discussion. Based on the feedback, the toolkit content and technology were further iterated as needed. Information from this step was used to write brief user guides for toolkit users.
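A small sketch of grouping recommendations into the MoSCoW categories named above; the recommendations and their labels are hypothetical, not the study's actual findings.

```python
# Grouping recommendations into the MoSCoW categories named above.
# The recommendations and labels below are hypothetical.
from collections import defaultdict

labelled = [
    ("Add plain-language descriptions of each measure", "Must Have"),
    ("Provide a printable summary of selected indicators", "Should Have"),
    ("Offer a French translation of the toolkit", "Could Have"),
    ("Add a site-wide search function", "Won't Have This Time"),
]

by_priority = defaultdict(list)
for recommendation, priority in labelled:
    by_priority[priority].append(recommendation)

for priority in ("Must Have", "Should Have", "Could Have", "Won't Have This Time"):
    print(f"{priority}: {len(by_priority[priority])} recommendation(s)")
```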

Step 4: Select, tailor, implement

In the KTA framework, this step involves planning and executing interventions to promote awareness and implementation of knowledge, and tailoring interventions to barriers and audiences [ 28 ]. In Step 4, during an online partnership meeting, a brainstorming activity was completed to discuss target audiences for the toolkit, barriers and facilitators to outreach, and dissemination ideas. Team members formulated a dissemination plan and identified promotional resources they need to tailor the dissemination of the toolkit to their sector networks.

Literature Review

The initial search of the re-aim.org database identified 15 papers with relevant indicators for a RE-AIM toolkit. These papers and their citations are in Appendix B in the supplemental file. Additional resources identified by partners included [ 2 , 9 , 39 , 40 ], along with partners' previous experiences with evaluations, which informed potential indicator choices. In total, 62 indicators were identified across all RE-AIM domains.

In round 1, 32 people participated in the exercise (two participants did not provide demographic information). In round 2, 28 people completed the questionnaire (four participants did not provide demographic information). Detailed participant demographics are presented in Table 1. The adaptation of indicators through the Delphi process can be found in Fig. 2. Given that nearly all indicators were deemed important in round 2, we agreed that a third round of the Delphi process was not needed. Based on the literature review, measures for each indicator were identified.

Figure 2. Adaptation process for indicators and measures from the Delphi process and partner feedback during COSMIN/COMET rating

Eight partners (n = 3 academic, n = 5 community) completed the rating process for the "RE" domains and 10 partners (n = 3 academic, n = 7 community) completed the rating process for the "AIM" domains (rating feasibility, complexity, accuracy, and unintended consequences; see Table 2). Respondent feedback was used to adapt and improve the measures to make them more feasible, less complex, and more accurate in reflecting the indicators. Respondents also suggested that each measure should include information boxes about the respondents, administrators, type of data collection, and time to complete data collection. The adaptation of indicators and measures from this process can be found in Fig. 2. The final list of indicators and measures can be found in Table 3.

Six partners (community and academic partners) participated in unstructured think-aloud interviews, one of which was conducted jointly with two partners (mean duration = 43.37 min, SD = 13.50 min). Across interviews, 45 unique recommendations were identified for improving the usability of the toolkit. These recommendations were sorted using the MoSCoW method, and prioritized based on budgetary constraints, team skillsets, and competing needs. Of the 45 recommendations, 30 were identified as 'Must haves', 6 as 'Should haves', 4 as 'Could haves', and 5 as 'Won't haves' (see Appendix C in the supplemental file). All 30 'Must have' recommendations were implemented in collaboration with the technology partner, along with 2 'Should have' recommendations.

After all recommendations were executed by the technology partner, a final project meeting was held to discuss project updates, barriers and facilitators to outreach, and ideas for dissemination. Barriers to outreach included lack of research or evaluation knowledge to use the toolkit, lack of funding to conduct evaluations, low conversion from reaching users (i.e., users becoming aware of the toolkit) to receiving (i.e., users browsing the toolkit website) to using the toolkit (i.e., users conducting an evaluation with it), and challenges connecting with hard-to-reach organizations. Facilitators to outreach included providing resources for evaluation support, connecting with trainees to support evaluations, having positive self-efficacy and attitudes toward conducting evaluation, building awareness of the benefits of the toolkit through a dissemination campaign, credibility in the toolkit development process, and reaching out to key funders who could recommend the toolkit as guidance.

The toolkit can be found at et.cdpp.ca and is intended to be used by community organizations and academic institutions that conduct program evaluations involving PA and disability (and inclusive integrated programming). This interactive toolkit allows users to customize it to their program evaluation situation by selecting (a) which RE-AIM dimensions they want to evaluate, and (b) which indicators they want to measure within a particular RE-AIM dimension (e.g., self-efficacy and quality participation within the Effectiveness dimension). Based on users' selections, the toolkit compiles the corresponding measures for each indicator into a customized, downloadable document that the user can then put in the format of their choosing (e.g., online survey, paper questionnaire) for their program evaluation. This design aligns with partner requests for a simple online interface that provides flexibility and tailoring to their program evaluation needs. The toolkit and user guides are made freely available (i.e., open access) to maximize accessibility for community organization and academic audiences.
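A minimal sketch of the selection behaviour described above: the user picks dimensions and indicators, and the matching measures are compiled into a single document. The indicator-to-measure mapping below is a hypothetical stand-in, not the toolkit's actual content.

```python
# Compiling measures for user-selected RE-AIM dimensions and indicators, as
# described above. The mapping below is a hypothetical stand-in for the
# toolkit's actual content.
TOOLKIT = {
    "Effectiveness": {
        "self-efficacy": ["general self-efficacy short form"],
        "quality participation": ["experiential aspects of participation measure"],
    },
    "Reach": {
        "attendance": ["program attendance log template"],
    },
}

def compile_measures(selections):
    """Gather the measures for the chosen {dimension: [indicator, ...]} selections."""
    lines = []
    for dimension, indicators in selections.items():
        for indicator in indicators:
            for measure in TOOLKIT.get(dimension, {}).get(indicator, []):
                lines.append(f"{dimension} / {indicator}: {measure}")
    return "\n".join(lines)

print(compile_measures({"Effectiveness": ["self-efficacy", "quality participation"]}))
```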

A plan with dissemination and capacity building activities was created to support uptake of the evaluation toolkit. Our priority was to create a knowledge translation and communications package (e.g., newsletter article, social media content) for community partner organizations to disseminate through their channels. This included disseminating information to other community organizations within their network and funding partners (e.g., Sport Canada, Canadian Tire Jumpstart, ParticipACTION, provincial ministries, and the Canadian Paralympic Committee). This package served as the official 'launch' of the evaluation toolkit on July 20, 2023. The package also listed other activities as potential 'services' interested parties can use. These services include bookable 'office hours', whereby a one-on-one meeting on how to use the toolkit and conduct program evaluation can be arranged, and a one-hour 'frequently asked questions' webinar/workshop. Other activities included publishing an open-access manuscript, writing knowledge translation and media blogs about the manuscript, and delivering academic and community conference presentations.

The purpose of this paper was to report on the process of developing an evaluation toolkit in partnership with organizations that provide PA programming for persons with disabilities. Informed by the RE-AIM framework [ 18 ] and the knowledge-to-action framework [ 28 ], the toolkit development process involved a literature review, Delphi process, and interviews to adapt indicators and measures. Recommendations from partners were implemented, and the final toolkit can be found at et.cdpp.ca. Partners collaborated to create a dissemination and capacity building plan to support the uptake of the toolkit across the target audience.

Community organizations struggle to conduct program evaluations and to use existing evaluation frameworks. A recent scoping review identified 71 frameworks used to evaluate PA and dietary change programs [ 41 ]. Despite access to many frameworks, Fynn et al. [ 41 ] found limited guidance and resources for using the frameworks. In response to these concerns, the toolkit acts as a resource for using the RE-AIM framework by facilitating the uptake of evidence-informed evaluation practices. The toolkit will help organizations overcome barriers to evaluation identified by previous research by increasing capacity to use appropriate methods and tools [ 14 ] and providing education on determining what counts as evidence and data [ 15 ]. This can facilitate better organizational direction, improved programming, and importantly, better quality PA experiences for individuals with disabilities. The toolkit also complied with accessibility standards, an important benchmark for our partnership and a necessary step when creating a product for organizations that serve persons with disabilities. Accessibility standards were relatively easy to achieve and should be customary in all IKT activities.

To the best of our ability, the toolkit was developed specifically for organizations that provide programming for people with disabilities by focussing the literature review, having program partners in the disability community participate in the Delphi process, and ensuring the validity and reliability of indicators in disability contexts. However, there is an enormous shortage of data related to PA and disability as most national health surveillance systems exclude or do not measure disability [ 2 ]. While this general limitation may affect the toolkit, it also means that the toolkit may be useful for universal PA organizations that are interested in evaluating programs with non-disabled individuals. Additional research is needed to examine the effectiveness of the toolkit in diverse contexts.

This project provides a template for developing open-access, online evidence-informed toolkits using an IKT approach with community partners. There are few resources on how to develop toolkits for the health and well-being field informed by knowledge translation frameworks or that include perspectives of end-users (e.g., [ 42 , 43 ]). The four-step mixed-methods approach was guided by the systematic use of frameworks to inform toolkit development. Our project utilized a rigorous, step-by-step process for creating toolkits and resources for this sector that centres the knowledge and expertise of research users. To centre the knowledge and expertise of research users, we employed several strategies identified by Hoekstra et al. [ 44 ] for building strong disability research partnerships. Important strategies for partnership when developing a toolkit include (1) using a set of norms, rules, and expectations, (2) engagement of research users in the planning of research, (3) using consensus methods (i.e., Delphi), and (4) recruiting research users via professional or community networks [ 44 ].

First, we used the IKT Guiding Principles [ 25 ] as the set of norms, rules, and expectations to guide our partnership. These principles were addressed throughout the partnership and provided criteria to understand the success of the partnership. Second, we engaged with community partners from the beginning of the research process. Working with community partners who were committed to developing a high-quality product was integral to the success of this project. Community partners were committed and highly engaged as the toolkit stemmed from a community-identified need, rather than solely a ‘research gap’. Third, using consensus methods is an excellent strategy to avoid decision-making that is dominated by certain voices or interests in the partnership [ 45 ]. One way that our project allowed for multiple voices to be heard was through our anonymous Delphi processes, which encouraged partners to share their input in a non-confrontational and data-driven manner. Fourth, in our partnership, many individuals and organizations had longstanding working relationships and aligned priorities for the project. Building our partnership based on previous trusting, respectful relationships was essential and using the IKT guiding principles [ 25 ] ensured that we maintained similar values and priorities throughout the partnership.

We used an additional strategy that has not been previously mentioned in the IKT literature: mentorship of research trainees by community partners. Through monthly meetings, two community partners provided mentorship sessions to three trainees. These sessions focused on how to close the research-to-practice gap and helped to facilitate strong relationships between researchers and research users. Mentorship was an important step for training the next generation of researchers to use IKT.

Limitations

This project has some limitations. First, an exhaustive systematic scoping review was not conducted to identify evaluation indicators. This may have limited the number of relevant evaluation indicators included in the Delphi surveys. However, given that only five indicators were removed, and none were added after two rounds of Delphi, we are confident that our search returned relevant indicators. In the future, it may be worthwhile to consider an in-person or video-conference-facilitated Delphi process to encourage discussion and differentiation of indicators. Second, we identified several barriers and facilitators for using the toolkit, but addressing these barriers meaningfully was beyond the scope of this paper. We are currently in the process of disseminating (e.g., social media campaigns, blogs, discussions with funders) and evaluating the toolkit (e.g., surveys, using data analytics). These data will be reported in a future paper. Third, the interviews revealed 45 unique recommendations for the website and toolkit, but only some of these recommendations could be implemented due to budgetary constraints (e.g., a search function and indicator filtering could not be added to the website).

Conclusions

In summary, this paper reports on the development of an online, open-access program evaluation toolkit for the disability and PA sector. The toolkit is informed by the RE-AIM framework [ 18 ] and available at et.cdpp.ca. Our paper describes a four-step process guided by the KTA framework [ 28 ] and IKT principles [ 25 ] to work with community partners to ensure the toolkit is relevant, useful, and usable. The process included reviewing the literature, building consensus through two rounds of Delphi surveys, rating the feasibility and complexity of measures, assessing barriers and facilitators through think-aloud interviews, and crafting a dissemination and capacity-building plan. This paper provides a template for creating toolkits in partnership with research users, demonstrates strategies to enable successful community-university partnerships, and offers an evidence-informed evaluation resource to organizations that provide PA programming for persons with disabilities.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

IKT: Integrated knowledge translation

KTA: Knowledge-to-action

PA: Physical activity

RE-AIM: Reach, effectiveness, adoption, implementation, maintenance

Martin Ginis KA, Ma JK, Latimer-Cheung AE, Rimmer JH. A systematic review of review articles addressing factors related to physical activity participation among children and adults with physical disabilities. Health Psychol Rev. 2016;10(4):478–94. https://doi.org/10.1080/17437199.2016.1198240 .

Martin Ginis KA, van der Ploeg HP, Foster C, Lai B, McBride CB, Ng K, et al. Participation of people living with disabilities in physical activity: a global perspective. Lancet. 2021;398(10298):443–55.

van den Berg-Emons RJ, Bussmann JB, Stam HJ. Accelerometry-based activity spectrum in persons with chronic physical conditions. Arch Phys Med Rehabil. 2010;91(12):1856–61.

World Health Organization. World report on disability. Geneva; 2011.

Evans MB, Shirazipour CH, Allan V, Zanhour M, Sweet SN, Martin Ginis KA, et al. Integrating insights from the parasport community to understand optimal experiences: the quality parasport participation framework. Psychol Sport Exerc. 2018;37:79–90.

Martin Ginis KA, Gee CM, Sinden AR, Tomasone JR, Latimer-Cheung AE. Relationships between sport and exercise participation and subjective well-being among adults with physical disabilities: Is participation quality more important than participation quantity? Psychol Sport Exerc. 2024;70:102535.

Allan V, Smith B, Côté J, Martin Ginis KA, Latimer-Cheung AE. Narratives of participation among individuals with physical disabilities: a life-course analysis of athletes’ experiences and development in parasport. Psychol Sport Exerc. 2018;37:170–8.

Orr K, Tamminen KA, Sweet SN, Tomasone JR, Arbour-Nicitopoulos KP. “I’ve had bad experiences with team sport”: sport participation, peer need-thwarting, and need-supporting behaviors among youth identifying with physical disability. Adapt Phys Activ Q. 2018;35(1):36–56.

Shirazipour CH, Latimer-Cheung AE. Understanding quality participation: exploring ideal physical activity outcomes for military veterans with a physical disability. Qual Res Sport Exerc Health. 2020;12(4):563–78. https://doi.org/10.1080/2159676X.2019.1645037 .

Patton M. Qualitative research and evaluation methods. London: Sage Publications; 2015.

Moore GF, Audrey S, Barker M, Bond L, Bonell C, Hardeman W, et al. Process evaluation of complex interventions: Medical Research Council guidance. BMJ. 2015 Mar 19 [cited 2022 Apr 18];350. Available from: https://www.bmj.com/content/350/bmj.h1258

Lawrason S, Turnnidge J, Tomasone J, Allan V, Côté J, Dawson K, et al. Employing the RE-AIM framework to evaluate multisport service organization initiatives. J Sport Psychol Action. 2021;12(2):87–100.

Habicht JP, Victora CG, Vaughan JP. Evaluation designs for adequacy, plausibility and probability of public health programme performance and impact. Int J Epidemiol. 1999;28(1):10–8. https://doi.org/10.1093/ije/28.1.10 .

Milstein B, Wetterhall S. A framework featuring steps and standards for program evaluation. Health Promot Pract. 2000;1(3):221–8. https://doi.org/10.1177/152483990000100304 .

Li V, Carter SM, Rychetnik L. Evidence valued and used by health promotion practitioners. Health Educ Res. 2015;30(2):193–205. https://doi.org/10.1093/her/cyu071 .

Lobo R, Petrich M, Burns SK. Supporting health promotion practitioners to undertake evaluation for program development. BMC Public Health. 2014;14(1):1315. https://doi.org/10.1186/1471-2458-14-1315 .

Nilsen P. Making sense of implementation theories, models and frameworks. Implement Sci. 2015;10:53. https://doi.org/10.1186/s13012-015-0242-0 .

Glasgow RE, Vogt TM, Boles SM. Evaluating the public health impact of health promotion interventions: the RE-AIM framework. Am J Public Health. 1999;89(9):1322–7.

Shaw RB, Sweet SN, McBride CB, Adair WK, Martin Ginis KA. Operationalizing the reach, effectiveness, adoption, implementation, maintenance (RE-AIM) framework to evaluate the collective impact of autonomous community programs that promote health and well-being. BMC Public Health. 2019;19(1):803. https://doi.org/10.1186/s12889-019-7131-4 .

Bean CN, Kendellen K, Halsall T, Forneris T. Putting program evaluation into practice: enhancing the Girls Just Wanna Have Fun program. Eval Program Plann. 2015;49:31–40.

Gainforth HL, Latimer-Cheung AE, Athanasopoulos P, Martin Ginis KA. Examining the feasibility and effectiveness of a community-based organization implementing an event-based knowledge mobilization initiative to promote physical activity guidelines for people with spinal cord injury among support personnel. Health Promot Pract. 2014;16(1):55–62. https://doi.org/10.1177/1524839914528210 .

Sweet SN, Ginis KAM, Estabrooks PA, Latimer-Cheung AE. Operationalizing the RE-AIM framework to evaluate the impact of multi-sector partnerships. Implement Sci. 2014;9(1):74. https://doi.org/10.1186/1748-5908-9-74 .

Whitley MA, Forneris T, Barker B. The reality of evaluating community-based sport and physical activity programs to enhance the development of underserved youth: challenges and potential strategies. Quest. 2014;66(2):218–32. https://doi.org/10.1080/00336297.2013.872043 .

Graham ID, Kothari A, McCutcheon C, Angus D, Banner D, Bucknall T, et al. Moving knowledge into action for more effective practice, programmes and policy: protocol for a research programme on integrated knowledge translation. Implement Sci. 2018;13(1):22. https://doi.org/10.1186/s13012-017-0700-y .

Gainforth HL, Hoekstra F, McKay R, McBride CB, Sweet SN, Martin Ginis KA, et al. Integrated knowledge translation guiding principles for conducting and disseminating spinal cord injury research in partnership. Arch Phys Med Rehabil. 2021;102(4):656–63.

Charlton JI. Nothing about us without us: disability oppression and empowerment. Berkeley: University of California Press; 1998.

Thoele K, Ferren M, Moffat L, Keen A, Newhouse R. Development and use of a toolkit to facilitate implementation of an evidence-based intervention: a descriptive case study. Implement Sci Commun. 2020;1(1):86. https://doi.org/10.1186/s43058-020-00081-x .

Graham ID, Logan J, Harrison MB, Straus SE, Tetroe J, Caswell W, et al. Lost in knowledge translation: time for a map? J Contin Educ Heal Prof. 2006;26(1):13–24. https://doi.org/10.1002/chp.47 .

Prinsen CAC, Vohra S, Rose MR, Boers M, Tugwell P, Clarke M, et al. How to select outcome measurement instruments for outcomes included in a “Core Outcome Set”—a practical guideline. Trials. 2016;17(1):449. https://doi.org/10.1186/s13063-016-1555-2 .

Hsu CC, Sandford BA. Minimizing non-response in the Delphi Process: how to respond to non-response. Practical Assessment, Research and Evaluation. 2007;12(17).

Hasson F, Keeney S, McKenna H. Research guidelines for the Delphi survey technique. J Adv Nurs. 2000;32(4):1008–15.

Shi Z, Michalovic E, McKay R, Gainforth HL, McBride CB, Clarke T, et al. Outcomes of spinal cord injury peer mentorship: a community-based Delphi consensus approach. Ann Phys Rehabil Med. 2023;66(1):101678.

Black N, Murphy M, Lamping D, McKee M, Sanderson C, Askham J, et al. Consensus development methods: a review of best practice in creating clinical guidelines. J Health Serv Res Policy. 1999;4(4):236–48. https://doi.org/10.1177/135581969900400410 .

Birmingham J, Pechman EM, Russell CA, Mielke M. Shared features of high-performing after-school programs: a follow-up to the TASC evaluation. Washington, DC; 2005.

van den Haak MJ, de Jong MDT, Schellens PJ. Evaluation of an informational Web site: three variants of the think-aloud method compared. Tech Commun. 2007;54(1):58–71.

Monforte J, Úbeda-Colomer J. Tinkering with the two-to-one interview: reflections on the use of two interviewers in qualitative constructionist inquiry. Methods Psychol. 2021;5:100082.

Hsieh HF, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15(9):1277–88. https://doi.org/10.1177/1049732305276687 .

Hatton S. Early Prioritisation of Goals. In: Hainaut JL, Rundensteiner EA, Kirchberg M, Bertolotto M, Brochhausen M, Chen YPP, et al., editors. Advances in conceptual modeling—foundations and applications. Berlin, Heidelberg: Springer; 2007. p. 235–44.

Williams TL, Ma JK, Martin Ginis KA. Participant experiences and perceptions of physical activity-enhancing interventions for people with physical impairments and mobility limitations: a meta-synthesis of qualitative research evidence. Health Psychol Rev. 2017;11(2):179–96. https://doi.org/10.1080/17437199.2017.1299027 .

Shirazipour CH, Tennant EM, Aiken AB, Latimer-Cheung AE. Psychosocial aspects of physical activity participation for military personnel with illness and injury: a scoping review. Mil Behav Health. 2019;7(4):459–76. https://doi.org/10.1080/21635781.2019.1611508 .

Fynn JF, Hardeman W, Milton K, Jones AP. A scoping review of evaluation frameworks and their applicability to real-world physical activity and dietary change programme evaluation. BMC Public Health. 2020;20(1):1000. https://doi.org/10.1186/s12889-020-09062-0 .

Hildebrand J, Lobo R, Hallett J, Brown G, Maycock B. My-peer toolkit [1.0]: developing an online resource for planning and evaluating peer-based youth programs. Youth Studies Australia. 2012;31(2):53–61.

Buckingham S, Anil K, Demain S, Gunn H, Jones RB, Kent B, et al. Telerehabilitation for people with physical disabilities and movement impairment: development and evaluation of an online toolkit for practitioners and patients. Disabil Rehabil. 2023;45(11):1885–92. https://doi.org/10.1080/09638288.2022.2074549 .

Hoekstra F, Trigo F, Sibley KM, Graham ID, Kennefick M, Mrklas KJ, et al. Systematic overviews of partnership principles and strategies identified from health research about spinal cord injury and related health conditions: a scoping review. J Spinal Cord Med. 2023;46(4):614–31. https://doi.org/10.1080/10790268.2022.2033578 .

Oliver K, Kothari A, Mays N. The dark side of coproduction: do the costs outweigh the benefits for health research? Health Res Policy Syst. 2019;17(1):33. https://doi.org/10.1186/s12961-019-0432-3 .

Salsman JM, Lai JS, Hendrie HC, Butt Z, Zill N, Pilkonis PA, et al. Assessing psychological well-being: self-report instruments for the NIH Toolbox. Qual Life Res. 2014;23(1):205–15. https://doi.org/10.1007/s11136-013-0452-3 .

Bandura A. Self-Efficacy Beliefs of Adolescents. Pajares F, Urdan T, editors. Greenwich: information age publishing; 2006.

Bandura A. Social foundations of thought and action: a social cognitive theory. Englewood Cliffs: Prentice Hall; 1986.

Salsman JM, Schalet BD, Merluzzi TV, Park CL, Hahn EA, Snyder MA, et al. Calibration and initial validation of a general self-efficacy item bank and short form for the NIH PROMIS®. Qual Life Res. 2019;28(9):2513–23. https://doi.org/10.1007/s11136-019-02198-6 .

Smith BW, Dalen J, Wiggins K, Tooley E, Christopher P, Bernard J. The brief resilience scale: assessing the ability to bounce back. Int J Behav Med. 2008;15(3):194–200. https://doi.org/10.1080/10705500802222972 .

The WHOQOL Group. Development of the World Health Organization WHOQOL-BREF quality of life assessment. Psychol Med. 1998;28(3):551–8.

Sallis JF, Grossman RM, Pinski RB, Patterson TL, Nader PR. The development of scales to measure social support for diet and exercise behaviors. Prev Med (Baltimore). 1987;16(6):825–36.

Asunta P, Rintala P, Pochstein F, Lyyra N, McConkey R. The development and initial validation of a short, self-report measure on social inclusion for people with intellectual disabilities: a transnational study. Int J Environ Res Public Health. 2021;18(5):2540.

Craig CL, Marshall AL, Sjöström M, Bauman AE, Booth ML, Ainsworth BE, et al. International physical activity questionnaire: 12-country reliability and validity. Med Sci Sports Exerc. 2003;35(8):1381–95.

Forrest CB, Bevans KB, Pratiwadi R, Moon J, Teneralli RE, Minton JM, et al. Development of the PROMIS® pediatric global health (PGH-7) measure. Qual Life Res. 2014;23(4):1221–31. https://doi.org/10.1007/s11136-013-0581-8 .

Hays RD, Bjorner JB, Revicki DA, Spritzer KL, Cella D. Development of physical and mental health summary scores from the patient-reported outcomes measurement information system (PROMIS) global items. Qual Life Res. 2009;18(7):873–80. https://doi.org/10.1007/s11136-009-9496-9 .

Marsh HW, Martin AJ, Jackson S. Introducing a short version of the physical self description questionnaire: new strategies, short-form evaluative criteria, and applications of factor analyses. J Sport Exerc Psychol. 2010;32(4):438–82.

Wicker P, Breuer C. Exploring the organizational capacity and organizational problems of disability sport clubs in Germany using matched pairs analysis. Sport Manag Rev. 2014;17(1):23–34.

Thrash TM, Elliot AJ. Inspiration: core characteristics, component processes, antecedents, and function. J Pers Soc Psychol. 2004;87(6):957–73.

Bakhsh JT, Lachance EL, Thompson A, Parent MM. Outcomes of the sport event volunteer experience: examining demonstration effects on first-time and returning volunteers. Int J Event Festiv Manag. 2021;12(2):168–83. https://doi.org/10.1108/IJEFM-09-2020-0057 .

Caron JG, Martin Ginis KA, Rocchi M, Sweet SN. Development of the measure of experiential aspects of participation for people with physical disabilities. Arch Phys Med Rehabil. 2019;100(1):67–77.

Bean C, Kramers S, Camiré M, Fraser-Thomas J, Forneris T. Development of an observational measure assessing program quality processes in youth sport. Cogent Soc Sci. 2018;4(1):1467304. https://doi.org/10.1080/23311886.2018.1467304 .

Acknowledgements

We would like to acknowledge Ava Neely and Kenedy Olsen for their assistance with this project. In memory of Jane Arkell, who played an important role in this project and dedicated herself to improving the lives of individuals with disabilities.

This work was supported by a Social Sciences and Humanities Research Council Connection Grant.

Author information

Authors and affiliations.

School of Health and Exercise Sciences, University of British Columbia, Kelowna, BC, Canada

Sarah V. C. Lawrason, Tanya Forneris, Nathan Adams & Kathleen A. Martin Ginis

International Collaboration on Repair Discoveries, Vancouver, BC, Canada

Sarah V. C. Lawrason, Nathan Adams & Kathleen A. Martin Ginis

Abilities Centre, Whitby, ON, Canada

Pinder DaSilva & Emilie Michalovic

School of Kinesiology and Health Studies, Queen’s University, Kingston, ON, Canada

Amy Latimer-Cheung & Jennifer R. Tomasone

Revved Up, Kingston, ON, Canada

Department of Kinesiology and Physical Education, McGill University, Montreal, QC, Canada

Shane Sweet

Center for Interdisciplinary Research in Rehabilitation of Greater Montreal (CRIR), Montreal, Canada

The Steadward Centre for Personal and Physical Achievement, University of Alberta, Edmonton, AB, Canada

Jennifer Leo

Pickering Football Club, Pickering, ON, Canada

Matthew Greenwood

Rocky Mountain Adaptive, Canmore, AB, Canada

Janine Giles & Nick Boyle

Active Living Alliance, Ottawa, ON, Canada

Jane Arkell

BC Wheelchair Sports Association, Vancouver, BC, Canada

Jackie Patatas

Department of Medicine, University of British Columbia, Vancouver, BC, Canada

Kathleen A. Martin Ginis


Contributions

KMG, PDS, EM, TF, JL, MG, JG, JA, JP, & NB made substantial contributions to the conception of the project. SVCL, PDS, EM, ALC, JRT, SS, TF, JL, and KMG designed the project. All authors were involved in acquiring the data through recruitment. SVCL, NA, and KMG analyzed the data. All authors were involved in interpreting the data. SVCL, NA, and KMG drafted the paper or substantively revised it. All authors have approved the submitted version of this paper. All authors have agreed both to be personally accountable for their own contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.

Corresponding author

Correspondence to Sarah V. C. Lawrason.

Ethics declarations

Ethics approval and consent to participate

Approval was waived because the project was conducted under the 'program evaluation' requirements of the University of British Columbia.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. Additional file 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Lawrason, S.V.C., DaSilva, P., Michalovic, E. et al. Using mixed methods and partnership to develop a program evaluation toolkit for organizations that provide physical activity programs for persons with disabilities. Res Involv Engagem 10, 91 (2024). https://doi.org/10.1186/s40900-024-00618-7


Received: 02 February 2024

Accepted: 23 July 2024

Published: 02 September 2024

DOI: https://doi.org/10.1186/s40900-024-00618-7


Keywords: Implementation science, Knowledge translation, Delphi technique



  • Open access
  • Published: 04 September 2024

Targeting emotion dysregulation in depression: an intervention mapping protocol augmented by participatory action research

Myungjoo Lee (ORCID: orcid.org/0000-0002-8301-7996), Han Choi (ORCID: orcid.org/0000-0003-0406-5605) & Young Tak Jo (ORCID: orcid.org/0000-0002-0561-2503)

BMC Psychiatry volume 24, Article number: 595 (2024)


Depression is a highly prevalent and often recurrent condition; however, treatment is not always accessible or effective in addressing abnormalities in emotional processing. Given the high prevalence of depression worldwide, identifying and mapping out effective and sustainable interventions is crucial. Emotion dysregulation in depression is not readily amenable to improvement due to the complex, time-dynamic nature of emotion; however, systematic planning frameworks for programs addressing behavioral change can provide guidelines for the development of a rational intervention that tackles these difficulties. This study proposes an empirically and theoretically grounded art-based emotion regulation (ER) intervention, developed using an integrated approach that combines intervention mapping (IM) with participatory action research (PAR).

We used the IM protocol to identify strategies and develop an intervention for patients with major depressive disorder (MDD). As applied in this study, IM comprises six steps: (a) determining the need for new treatments and determinants of risk; (b) identifying changeable determinants and assigning specific intervention targets; (c) selecting strategies to improve ER across relevant theories and research disciplines; (d) creating a treatment program and refining it based on consultations with an advisory group; (e) developing the implementation plan and conducting a PAR study to pilot-test it; and (f) planning evaluation strategies and conducting a PAR study for feedback on the initial testing.

Following the steps of IM, we developed two frameworks for an art-based ER intervention: an individual and an integrative framework. The programs include four theory- and evidence-based ER strategies aimed mainly at decreasing depressive symptoms and improving ER in patients with MDD. We also developed a plan for evaluating the proposed intervention. Based on our preliminary PAR studies, the intervention was feasible and acceptable for adoption and implementation in primary care settings.

The application of IM incorporated with PAR has resulted in an intervention for improving ER in depression. While changing behavior is perceived as a challenging and elaborate task, this method can be useful in offering a clear structure for developing rational interventions. Further refinement is necessary through rigorous research.


Depression is a highly prevalent and often recurrent condition that severely impairs psychological functioning and quality of life. According to the Global Health Data Exchange, depression affects 3.8% of the world’s population and, as “a major contributor to the overall global burden of disease,” is associated with substantial societal and personal costs [ 1 , 2 ]. Due to its enormous impact on public health, the World Health Organization (WHO) predicts that depression will rank first among all causes of the burden of disease by 2030 [ 3 ]. As depression is frequently comorbid with other mental and physical disorders, it is particularly challenging to identify risk factors and develop effective interventions.

Depression is a disorder of emotion. Disordered affect is a hallmark of depressive episodes, characterized by complex but apparent abnormalities of emotional functioning [ 4 , 5 ]. Many factors may be associated with the disorder; however, its symptoms evidently indicate failures in emotional self-regulation [ 6 ]. Emotion regulation (ER) refers to an individual’s ability to modulate the intensity, frequency, and duration of emotional responses [ 7 , 8 ]. Decades of empirical research have shown that depression is associated with increases in unpleasant emotions and decreases in positive emotions [ 9 , 10 ]. It has been proposed that difficulties in ER in depression significantly contribute to dysfunctional emotions [ 10 , 11 ].

The complexity and time-dynamic nature of emotion make emotion dysregulation in depression particularly challenging to tackle. Most situations in daily life that evoke emotions are ambiguous. It remains unclear how patients can enhance their ER abilities in treatment [ 12 ]. Dysfunctional ER is a fundamental risk factor for the onset of depression and a range of psychiatric disorders [ 13 , 14 ]; however, the evidence base is diffuse and broad, as its mechanisms remain poorly specified [ 12 , 15 , 16 ]. Although some studies have developed psychological interventions to improve ER, research in this area remains limited [ 12 , 17 , 18 ]. Some have argued that teaching a wide range of ER strategies might not be effective in enhancing patients’ emotional functioning [ 12 , 17 ]. Of note, there is a lack of research on the use of art psychotherapy in this context.

An intervention mapping (IM) study systematically rooted in the evidence and theories of basic affective science is required to increase the likelihood of changing behaviors in ER. To target emotion dysregulation, a systematic, participatory, and integrated approach that benefits from efficient behavior change methods is crucial [ 19 ]. Accordingly, this study identifies effective ways of enhancing patients' ER capacities and develops an optimized art-based psychotherapy intervention for depression. For this purpose, we followed the standard IM protocol [ 20 ]. While developing a treatment can be time-consuming and burdensome, this study provides a straightforward, stepwise decision-making procedure. Along with its use of participatory action research (PAR), this study aims to benefit from the engagement of patients and mental health professionals in a collaborative manner. This type of collaboration is a practical and powerful tool for developing specialized interventions.

Intervention mapping protocol

This study mapped out the development process based on IM, a program-planning framework. IM provides a step-by-step process for planning theory- and evidence-based interventions, from identifying needs to selecting potential methods that address those needs [ 20 , 21 ]. Since its development in the field of health promotion in 1998, IM has been widely used, and applications have emerged in other fields. It has been used to develop intervention programs that better target specific behaviors, including health, discrimination, and safety behaviors [ 22 ]. In particular, mental health researchers have widely applied the IM approach, either to create new interventions or to adapt existing ones: strategies have been developed for the treatment and prevention of depression through IM, such as an internet-based intervention for postpartum depression [ 23 ], an online-coaching blended program for depression in middle-aged couples [ 24 ], a return-to-work intervention for depression [ 25 ], music therapy for depressive symptoms in young adults [ 26 ], and life-review therapy for depressive symptoms in nursing home residents [ 27 ]. IM has proven to be a useful instrument for developing and optimizing treatments for depression that are tailored to different contexts and target populations.

Over the course of the development of the entire program, four perspectives characterizing IM are applied: (a) a participation approach that engages intended participants, allies, and implementers in program development, implementation, and evaluation; (b) a systems approach acknowledging that an intervention is an event occurring in a system that includes other factors influencing the intervention; (c) a multi-theory approach that stimulates the use of multiple theories; and (d) an ecological approach recognizing the relevance of social, physical, and political environmental conditions.

The IM protocol includes six core steps: (i) justifying the rationale for developing a new treatment; (ii) selecting targeted determinants and setting treatment goals; (iii) determining theoretical and empirical methods for behavior change; (iv) developing a treatment and program materials; (v) planning for adoption and implementation; and (vi) specifying the evaluation design [ 20 , 21 , 28 ]. The development process is cumulative: each subsequent step builds on the completed tasks of the previous step. Figure  1 presents the six steps of IM. This article details our study methods and results according to the six steps of the IM process.

Figure 1. Overview of the intervention mapping (IM) process [ 20 ]
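To make the cumulative structure of the protocol concrete, the following minimal sketch (ours, not part of the IM protocol or of this study's materials; the step names merely paraphrase the list above) models the six steps as an ordered pipeline in which each step starts from the products of the previous one.

# Illustrative sketch only: the six IM steps as a cumulative pipeline.
from dataclasses import dataclass, field

@dataclass
class IMStep:
    number: int
    name: str
    products: list = field(default_factory=list)  # outputs carried forward

IM_STEPS = [
    IMStep(1, "Justify the rationale for a new treatment"),
    IMStep(2, "Select targeted determinants and set treatment goals"),
    IMStep(3, "Determine theoretical and empirical methods for behavior change"),
    IMStep(4, "Develop the treatment and program materials"),
    IMStep(5, "Plan for adoption and implementation"),
    IMStep(6, "Specify the evaluation design"),
]

def run_protocol(steps):
    """Walk the steps in order; each step builds on everything produced so far."""
    carried = []
    for step in steps:
        step.products = carried + [f"products of step {step.number}"]
        carried = step.products
    return steps

for step in run_protocol(IM_STEPS):
    print(f"Step {step.number}: {step.name} ({len(step.products)} accumulated products)")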

Steps 1–3 of IM: Literature review

To address Steps 1, 2, and 3, we conducted a literature review using PubMed, ProQuest, Scopus, PsycArticles, and Google Scholar. Search strategies were devised using subject headings such as “emotion regulation,” “depression,” “emotional psychopathology,” “emotion regulation therapy,” and “art psychotherapy” as appropriate for each database. Furthermore, the program planners identified and included additional free text words. Due to the heterogeneity of emotion-related processes, the search strategies for Steps 1–3 were broad [ 15 ]. Additionally, we conducted an inclusive literature review of relevant databases to identify articles related to art-based interventions for ER, limited to published articles in English. This literature study identified effective ER strategies for improving regulatory capacities in depression. We describe the theoretical details related to ER and ER strategies in the Results section.
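As an illustration only, the short sketch below shows how subject headings like those named above could be combined into a boolean query for a single database. The exact field tags, free-text terms, and limits the authors used are not reported here, so every term grouping in this sketch is an assumption rather than the study's actual search strategy.

# Hypothetical example of combining the subject headings into one boolean query.
blocks = {
    "condition": ['"depression"', '"major depressive disorder"',
                  '"emotional psychopathology"'],
    "construct": ['"emotion regulation"', '"emotion regulation therapy"'],
    "modality": ['"art psychotherapy"', '"art therapy"'],
}

def build_query(blocks):
    """AND the concept blocks together; OR the synonyms within each block."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in blocks.values())

print(build_query(blocks))
# e.g. ("depression" OR ...) AND ("emotion regulation" OR ...) AND ("art psychotherapy" OR ...)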

Steps 4–6 of IM: Participatory action research combined

Steps 4–6 of IM occasionally incorporate further studies for pilot testing and refining the intervention under development. As such, our study added participatory components to the IM process. PAR is “a participatory and consensual approach towards investigating problems and developing plans to deal with them.” [ 29 ] PAR empowers research participants compared with other approaches, where study participants are often considered subjects who passively follow directions [ 30 ]. The involvement of patients, care providers, and health professionals in research design is increasingly recognized as an essential approach for improving the quality of primary care [ 31 ] and bridging the gap between research and health care [ 32 ]. Indeed, PAR has been applied in many fields and achieved successful results, particularly in the field of mental health [ 33 ].

In particular, patient involvement is a meaningful partnership with stakeholders, including patients, carers, and public stakeholders, who actively participate in improving healthcare practices [ 31 ]. Involvement can occur at different levels and commonly includes patient engagement and advisory boards [ 32 ]. We conducted participatory action studies to combine systematic studies with the development of practical treatments [ 33 ] and anticipated the benefits of experiential knowledge. Figure  2 elaborates on how we incorporated PAR in the IM framework. It also presents our strategies to address the IM protocol and the results from each step. As described in Fig.  2 , the PAR in the current study comprises three phases: (a) consultation with an advisory board; (b) initial testing of the intervention; and (c) mixed-methods feedback studies using focus group interviews and survey research.

Figure 2. Study procedure combined with PAR, strategies applied for each step, and results for each step. Note. The figure specifies the strategies adopted to address the six steps of the IM protocol and the actions taken at each step. It represents how IM can be applied and how its protocol can be augmented through PAR. In applying IM, this study relied on literature research and empirical studies: we conducted a literature review to address Steps 1–3 and combined the participatory action approach with the IM methodology to address Steps 4–6.

(a) PAR 1: Consultation with the advisory board

First, we established an advisory board that included a psychiatrist, an expert on methodology, a trained integrative medicine professional, and a professor in a graduate art psychotherapy program. The advisory board provided feedback at the individual level and comments during subsequent consultations. We engaged and managed the advisory board throughout Step 4, the intervention development process.

(b) PAR 2: Initial testing of the intervention being developed

In addition, we conducted a participatory action study to facilitate patient engagement and elicit their voices in a collaborative relationship with researchers. Based on voluntary participation, this study aimed to pretest art-based ER strategies and treatment designs. We conducted an art therapy program as part of routine inpatient therapeutic programs involving willing patients. The participants’ reports of their experiences during the sessions were obtained using structured questionnaires and unstructured interviews. For research purposes, we conducted a retrospective chart review for therapeutic sessions between February 2023 and February 2024. This review was approved by the Institutional Review Board of Kangdong Sacred Heart Hospital (IRB no. 2024–02-019) and exempted from requiring patients’ informed consent because it was part of a routine clinical practice.

(c) PAR 3: A mixed-methods approach

In this study, we employed a mixed-methods approach to plan evaluation strategies by combining a quantitative online survey with focus group interviews. The primary aim of this study is to ensure that the intervention developed in Step 4 can be adopted and maintained over time. For this purpose, we are gathering feedback regarding the initial interventions from clinic staff, consisting of nurses and psychiatrists. This PAR study is currently ongoing and will last for four months. At the environmental level of the organization, the process will be managed to best leverage the intervention in primary care settings. This study was approved by the Institutional Review Board of Kangdong Sacred Heart Hospital (IRB no. 2023–12-002). PAR 2 and PAR 3 are currently being conducted; the results of those studies will be available after their completion.

This section describes the outputs obtained through the IM protocol. The details of the theoretical and empirical bases, the designed frameworks, and the strategies for implementing and evaluating the program are organized into six steps:

Step 1. Needs and Logic for the Program

For the first step, we identified the target group and analyzed the determinants of their symptoms. This step included determining the rationale and need for a new art-based ER intervention for depression. The target population comprised patients diagnosed with major depressive disorder (MDD). The predefined target behaviors were the core symptoms of major depression, namely, persistent depressed mood and anhedonia [ 6 ].

Theoretical evidence

Prior research has highlighted difficulties in ER contributing to the etiology and maintenance of numerous psychiatric symptoms, such as depression, chronic anxiety, post-traumatic stress disorder, eating disorders, and worry [ 15 , 34 , 35 , 36 , 37 , 38 , 39 , 40 ]. In particular, research on depression has emphasized that apparent failure to modulate emotions is a hallmark of this disorder [ 6 ] and has attempted to link it to emotional abnormalities in depression [ 10 , 11 ]. ER, which influences the onset, magnitude, and duration of emotional responses [ 41 ], is a higher-order construct that is distinct and differentiated from emotion itself (e.g., fear, anxiety, and depression) at different levels of analysis (e.g., behavioral or neural) [ 42 , 43 ]. From this perspective, ER is an important determinant affecting lower-order factor variability, whereas emotion determines variance downward in the lower-order indicators [ 42 ].

A literature review revealed that ER difficulties play a role in understanding psychological health in major depression. This suggests the importance of altering problematic patterns of emotional reactivity in depression and identifies emotion dysregulation as a determinant of the predefined target behaviors [ 17 , 44 , 45 , 46 , 47 ]. Imaging studies using functional magnetic resonance imaging (fMRI) have found functional abnormalities in the specific neural systems that support emotion processing and ER in patients with depressive disorders [ 6 ]. Moreover, decades of empirical evidence support the notion that depressive symptoms, characterized by consistently elevated depressed mood and relatively low positive mood, are associated with difficulties in ER [ 9 , 10 , 16 ]. Our review allowed us to analyze and specify the determinants of depressive symptoms (Fig.  3 ). Without this analysis, it would be challenging for psychological treatments to address emotion dysregulation in MDD.

Figure 3. Summary of the determinants influencing symptoms of major depression

Needs assessment for a new intervention

Although emotion dysregulation is a critical target in psychological treatments, intervention research examining ER is limited [ 18 , 48 ]. Psychotherapeutic approaches, including cognitive-behavioral and acceptance-based behavioral treatments, have positive effects on overall ER, and studies suggest that these improvements may mediate further improvements in psychiatric outcomes [ 18 , 48 ]: examples include cognitive behavioral therapy (CBT) approaches [ 49 , 50 ], acceptance and commitment therapy (ACT) [ 51 ], dialectical behavioral therapy (DBT) [ 52 , 53 ], and acceptance-based behavioral therapy (ABBT) [ 54 ]. However, most research assessing treatment efficacy precludes drawing conclusions about the clinical mechanisms essential for improving ER, because such studies examine the impact of non-ER-focused interventions or of interventions that target ER as part of a comprehensive program [ 18 , 48 ]. Due to the multi-component nature of these interventions, the specific components contributing to changes in ER remain unclear, and whether those changes underlie improvements in other distressing symptoms has not yet been clarified. Thus, efforts to identify and inform the development of interventions leading to adaptive ER based on these studies are limited.

At present, patients with distress disorders, such as generalized anxiety disorder (GAD), MDD, and particularly GAD with comorbid depression, often fail to respond well or to experience sufficient gains from treatment; however, the reason for this lack of response is unknown [ 17 , 55 ]. Between 50 and 80% of patients receiving interventions for emotional disorders achieve the status of "responder" [ 17 ], and between 50 and 60% of patients with GAD showed meaningful improvement in response to treatment with traditional CBT [ 55 ]. While ER-focused interventions, such as the Unified Protocol (UP) [ 56 ], Emotion Regulation Group Therapy (ERGT) [ 57 ], and enhanced CBT emphasizing ER [ 58 ], have been found effective in improving ER, research investigating them remains limited [ 18 , 59 , 60 ]. No substantial changes were found in essential dimensions of ER after the application of several ER-focused interventions, implying that these components were not delivered in a sufficient dose to promote ER [ 53 , 61 , 62 ]. Further, recent research identifying treatment response predictors for ERGT found relatively few significant predictors [ 63 ]. In particular, findings from a study that examined a treatment designed to enhance inpatient CBT for depression suggest that adding ER skills to CBT may not sufficiently change ER, although improvements were noted in ER strategies and depressive symptoms [ 58 ]. Another problem arises from manualized CBT protocols, which are distinct and complex to use [ 17 , 64 ], making CBT difficult to access and deliver.

The limitations of current interventions suggest the need to develop an ER-specific treatment. Designing more effective and targeted interventions requires a specific understanding of affective science to provide a broad framework for ER treatments. For example, it has recently been identified that emotions can be generated and regulated not only through a top-down process but also through a bottom-up process [ 65 ]: current models of emotion generation and its regulation are based on these two processes, which are opposed but interactive [ 66 ]. The top-down mechanism is based on a view that focuses on cognition, in which either individuals' goal states or cognitive evaluations are thought to influence variations in their emotional responses [ 67 ]. These processes are mapped to prefrontal cortical areas. Meanwhile, bottom-up mechanisms refer to processes based on a stimulus-focused view: in this mode of processing, emotions are mostly elicited by perceptions [ 68 ]. In everyday life, emotion can be processed through interactions between the bottom-up and top-down mechanisms [ 69 ].

Most research to date, however, has focused on top-down ER strategies, and few studies have focused on bottom-up regulation procedures [ 65 ]. In particular, CBT-based treatments, which are mainstream psychotherapies, focus on instruction in an array of cognitive means of coping with emotions; CBT traditionally tends to deal more directly with cognitive rather than emotional processes. One top-down strategy is cognitive reappraisal, an active component of most CBT-based treatments [ 70 ]. However, studies suggest that relying primarily on this strategy may be less effective for certain disorders, including depression, than treatments employing a flexible approach [ 65 ]. Such an approach would be straightforward and essential for researchers as they synthesize different research results, such as findings concerning bottom-up ER and its clinical implications for the investigation of interventions.

One intervention approach to bottom-up, experiential ER is art psychotherapy. This type of treatment, which targets emotion dysregulation, may hold promise for improving ER in depression. Patients with depression can benefit from experiential ER that emphasizes bottom-up means of coping with their emotional experiences over the course of an art-based ER intervention. This perspective is supported by behavioral and neurocognitive findings indicating difficulties in top-down regulatory processes in individuals with depression [ 71 , 72 , 73 , 74 ]. Research comparing neural activity in individuals with and without depression has indicated different patterns: when downregulating negative emotions, individuals with depression show bilateral prefrontal cortex (PFC) activation, whereas individuals without depression show left-lateralized activation [ 74 ]. Moreover, when given an effortful reappraisal task, individuals with depression show a different pattern of coupling between activation in the left ventrolateral PFC and the amygdala than individuals without depression. These findings indicate that the pathophysiology of depression underlies difficulties with downregulation [ 74 ].

Thus, it is vital to design a new intervention for depression that focuses not only on top-down ER but also on bottom-up ER. In particular, this study examines art-based ER as a client-centered, experiential psychotherapeutic approach that allows patients to attempt both top-down and bottom-up regulation. While pursuing active engagement in art-based ER practices, patients can process their emotional experiences in a way that produces greater fine-tuning and depth. Art-based treatment is open and non-interventional as well as less cognitively demanding, enabling it to reach a diverse population with depressive symptoms. More promisingly, art-based ER primarily deals with visible and tangible works leading to visual representations. Emotional memory is perceptual [ 75 ], implying that art-based practices can influence its retrieval and manipulation: the artworks that patients make in treatment are visual representations that closely mirror their emotional experiences. Importantly, creation involves colors, images, and spaces acting as new stimuli, allowing patients to manipulate and generate new emotions through a bottom-up process. As processes of emotion generation interact with those of ER [ 67 ], an art-based experiential approach can facilitate adaptive ER, potentially benefiting individuals with emotional dysfunction.

However, few studies have explored ER in depression within the field of art psychotherapy [ 76 ]. The therapeutic strategies applied in relevant studies [ 77 , 78 , 79 ] are not explicitly identified or targeted with respect to the mechanisms of ER. For instance, earlier literature tested the effects of art therapy on ER in psychiatric disorders; most of these approaches focused on improving psychopathological symptoms related to specific disorders and considered ER to be a secondary therapeutic outcome. Thus, we identified a need to develop an effective art-based intervention specifically targeting emotion dysregulation in major depression.

Step 2: Formulation of change objectives

The second step required the specification of intervention goals, which involved moving from understanding what influences depressive symptoms, especially in terms of emotional abnormalities in depression, to clarifying what needs to be changed. Based on the needs assessment, the overall expected outcome was "a decrease in depressive symptoms and an improvement in ER." In this process, the analysis of the determinants in Step 1 led to the selection of key determinants to target, informed by a comprehensive review of the empirical literature and research evidence. It is difficult to understand generative and regulatory emotion processes that are enacted internally without the instigation of extrinsic stimuli [ 80 ]. Thus, it can be challenging to identify the right determinants to target and to design an effective treatment that addresses problems related to ER. Based on our review, we selected four important and changeable determinants and further divided them into five key determinants (see Table  1 ).

In IM, the construction of matrices of change, consisting of performance and change objectives, forms the basis for program development [ 20 , 81 ]. The overall program objectives were subdivided into performance objectives expected to be accomplished by the target group in the proposed intervention. Drawing on the key determinants and performance objectives, change objectives were then formulated. The result of Step 2 is this change matrix, which provides the basic building blocks for designing the intervention for major depression.
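As a rough illustration of what such a matrix looks like, the sketch below crosses performance objectives with key determinants so that each cell yields one change objective. The objectives and determinants shown are invented placeholders, not the actual contents of Table 1.

# Hypothetical IM matrix of change: (performance objective x determinant) -> change objective.
from itertools import product

performance_objectives = [
    "PO1: Notice and label emotional responses",
    "PO2: Apply an adaptive ER strategy when distressed",
]
key_determinants = [
    "Awareness of emotions",
    "Repertoire of ER strategies",
]

change_matrix = {
    (po, det): f"Patient demonstrates '{det}' while working toward '{po}'"
    for po, det in product(performance_objectives, key_determinants)
}

for (po, det), change_objective in change_matrix.items():
    print(f"{po} x {det}\n  -> {change_objective}")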

Step 3: Theory- and evidence-based strategies selection

In IM, Step 3 entails selecting theoretically grounded and evidence-based methods and strategies. For this process, we first conducted a comprehensive review of theories and empirical studies to identify therapeutic strategies meeting the following criteria: (i) the strategy is confirmed as an efficient ER strategy by empirical research evidence; (ii) it is effective not only in decreasing depressive symptoms but also in improving patients' ER capacities; and (iii) it can be translated into art-based practices. Through iterative review of theories and research evidence on emotion-regulatory strategies, we identified appropriate, theoretically sound therapeutic strategies for at least one program target.

Once an ER method was selected, we translated this method into art-based emotion regulation (ABER) strategies for practical applications. Practical applications refer to the practical translation of the chosen behavior change methods [ 19 , 20 , 21 , 81 ]. The end product of Step 3 is an initial set of theory- and evidence-based strategies selected and translated to address emotion dysregulation in major depression. Table 2  lists the strategies with supporting evidence and applications: art-based distraction, art-based positive rumination, art-based self-distancing (SD), and art-based acceptance. Based on an integrative view of emotional processing, which posits interactions between top-down and bottom-up systems [ 67 , 69 , 82 , 83 ], these strategies aim to modulate emotions through the use of top-down and bottom-up mechanisms.

In particular, as art-based ER involves visual-spatial processing that could exert influence as new sets of stimuli, this approach could lead to a more experiential bottom-up ER. For instance, distraction and cognitive defusion are usually considered cognitive forms of ER; however, both are translated and applied to art-based strategies. Individuals’ performance in art-based ER would differ from that on a given cognitive task, as their immersion experiences in the artistic and creative process involve the generation of colors, images, and spatial features, which may elicit new bottom-up processing. This may be associated with the superior ER effects of art-based distraction, as shown in some studies that compared the ER effects of artistic activities with those of non-artistic activities, such as completing verbal puzzles [ 98 , 99 , 100 ].

In addition, art-based SD promotes intuitive and experiential ER. In some treatments, such as mindfulness-based stress reduction (MBSR) and emotion regulation therapy (ERT), individuals are trained to adopt a self-distanced perspective while reflecting on their emotions; they meditate to take a decentered stance. Art-based SD may help those who have difficulty creating such internal distance. As individuals create visual forms of their inner feelings and thoughts, the spatially generated distance from the artworks representing their experiences allows them to adopt and maintain a more self-distanced perspective. As such, art-based SD is more intuitive and requires less mental energy. Importantly, this art-based experiential distancing may reconstrue individuals' appraisals by facilitating a bottom-up mechanism.

Step 4: Program development

Step 4 concerns creating an actual program plan, leading to the ABER intervention model proposed in the current study. The intervention's elementary components, organization, and structure were created based on the findings of the preparation steps (Steps 1–3). Once the list of therapeutic strategies and their practical applications was generated, we designed a structured intervention framework that would be feasible and realistic to deliver in primary care settings.

The intervention framework developed in Step 4 is based on the process model of ER [ 7 ], which is supported by considerable empirical research [ 101 , 102 , 103 ]. Based on the extended model, the series of steps involved in the process of regulation with different ER strategies were considered while designing the conceptual framework. Accordingly, the primary areas of the intervention involve emotion perception, attention, and cognition. We developed specific art-based ER strategies, focusing primarily on antecedent- rather than response-focused regulation. Further, this intervention is meant to complement the process model with a framework designed to apply one or more strategies in a single session: this would be ideal for improving ER in real life, as current research on ER has found that people generally try multiple strategies simultaneously [ 104 ], whereas the process model examines a within-situation context, within which a single ER strategy is utilized [ 12 ]. In addition, we expect this treatment to be effective in improving ER because it attempts both top-down and bottom-up ER: by actively engaging with artworks through the use of the body, a patient can apply an experiential self-focus [ 64 ]. In treatment with art-making, patients can be given sufficient time and space to find personal meaning in their experiences and process emotions, which enables them to achieve change.

Table 3 presents an overview of the proposed intervention frameworks. As shown, we designed two frameworks to guide the intervention: an individual framework for short-term intervention and an integrative framework for long-term intervention. Each style of the ABER model draws on a different implementation design to build the framework, and each model has slightly different aims. In Step 4, the advisory board reviewed the draft frameworks, including the determinants, performance and change objectives, and therapeutic strategies. The advisory board acted as a support group throughout the review process, helping tailor the program to the target population. In response to the board’s reviews, supplementary resources were added.

Individual framework

First, a plan for an individual framework was devised that accounted for the scope and phase of a short-term intervention. As shown in Fig.  4 , this framework focuses on producing initial or short-term behavioral changes pertaining to short-term clinical efficacy. That is, the individual model does not aim at emotion-specific changes in patients, such as increases or decreases in particular emotions; rather, its therapeutic aim is defined in terms of the effective use of regulatory strategies [ 105 ]. Accordingly, an expected outcome is an increase in the quantity and frequency of adaptive ER strategies. Patients are trained in rudimentary ER skills, including one or a combination of several ABER strategies, as designed in the previous step. These practices aim to enhance attentional, and subsequently cognitive, control. The expected duration of individual sessions is around 1–1.5 h.

Figure 4. Individual intervention model diagram. Note. The panel shows the individual intervention model in an inpatient setting as an example: each patient (patient_i) has a different time of admission (t_0) and inpatient discharge (t_d). Thus, the number of participating patients can differ per session. During the hospital stay, patients are trained in rudimentary emotion regulation (ER) skills, including one or a combination of several art-based ER strategies (aber_i). The application of the therapeutic strategies is flexible: it depends on the patient's cognitive functions, depressive symptoms, and symptom severity. The time of inpatient discharge (t_d) affects each patient's treatment duration.

Integrative framework

While an individual framework comprises a single phase, an integrative framework includes stepwise sequential phases. In addition to skill development in the individual treatment, three phases of the integrative model are designed to foster adaptive motivational responses and cognitive-behavioral flexibility, which enables patients to achieve greater emotional clarity [ 106 ]. In the integrative treatment, all three phases are performed for 6–12 weeks.

The first phase of the integrative model begins with psychoeducation, in which the patient is taught the concept of ER and the importance of identifying the habitual reactions, such as rumination and dampening [ 91 ], that have characterized his or her life. This therapeutic process is important because ER is an automatic process that requires consideration of motivation [ 107 ]. Psychoeducation regarding ER and monitoring of patients' responses to emotional experiences precede the skill development procedure. For patients' self-monitoring, retrospective self-report questionnaires can capture data on ER skill use. While these methodologies are easy to use and cost-efficient [ 108 ], they are demanding to use for capturing naturally fluctuating patterns in ER [ 109 ]. As an alternative, ecological momentary assessment can be used in treatment to capture the situational context and adaptiveness of skill use [ 108 ]. In addition to patients' self-monitoring, a psychotherapist should monitor their emotional responses during and between therapy sessions: psychotherapists function as human raters. Because self-monitoring may not be feasible for all patients, assessing the typical patterns with which patients use maladaptive emotion-regulatory strategies is important. Specifically, therapists need to assess a patient's ER repertoire: the quantity of ER strategies, the frequency of strategy use, and how the patient's strategy use changes over time.
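To illustrate the kind of repertoire summary described above, the sketch below computes the quantity of distinct strategies, the frequency of each, and a crude early-versus-late shift from a toy self-monitoring log. The record format and the entries are invented for illustration; they are not a measure used in this study.

# Hypothetical summary of a patient's ER repertoire from self-monitoring or EMA records.
from collections import Counter

# Each entry: (session_index, strategy reported by the patient) -- invented data.
ema_log = [
    (1, "rumination"), (1, "distraction"), (2, "rumination"),
    (3, "positive rumination"), (4, "self-distancing"), (4, "acceptance"),
]

def repertoire_summary(log):
    strategies = [s for _, s in log]
    counts = Counter(strategies)
    midpoint = (max(i for i, _ in log) + min(i for i, _ in log)) / 2
    early = Counter(s for i, s in log if i <= midpoint)
    late = Counter(s for i, s in log if i > midpoint)
    return {
        "quantity": len(counts),          # number of distinct strategies
        "frequency": dict(counts),        # how often each strategy is used
        "shift": {s: late.get(s, 0) - early.get(s, 0) for s in counts},  # change over time
    }

print(repertoire_summary(ema_log))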

The second phase entails adopting and implementing ER strategies with processes resembling those of the individual model. These processes entail the selection and repetition of adaptive strategies. They differ from the individual model in that the duration of Phase II can vary from one patient to another depending on the severity of depressive symptoms and the frequency of maladaptive strategies used. The ER practices delivered in Phase II are art-based tasks through which therapists and patients explore and try adaptive strategies. As shown in Fig.  5 , the intervention program includes four ABER strategies selected and translated in Step 3: art-based distraction, art-based SD, art-based positive rumination, and art-based acceptance. The patients work with therapists in 4–8 1.5-h sessions to engage in art-based practices.

Figure 5. Summary of steps and components for the integrative intervention model

Finally, the integrative framework includes a third phase for evaluation. While the sessions in Phase II focus on skill development, the sessions in Phase III focus on assessing changes in patients. All individual progress in ER is tracked and monitored. In this task, therapists help patients assess changes in their emotion-regulatory skill use and their achievements in terms of self-perception, effectiveness, and adaptiveness. Patients are given opportunities to take a broad view of the changes in their artworks across all treatment phases. Furthermore, patients receive a few homework tasks to briefly review their strategy use in daily life from the beginning of treatment until the current moment. This review process helps them assess their progress and consolidate their gains. Completing the integrative treatment course takes 6–12 weeks, depending on the clinical impression; the duration of Phase II, for instance, is expected to be 4–8 weeks. A therapist or clinician forms this impression based on the severity of the patient's depressive symptoms, use of maladaptive ER strategies, willingness to participate in the intervention, and insight into the treatment.

Step 5: Adoption and implementation

Implementation is an essential aspect of program development. In Step 5 of IM, the focus is on planning the adoption and implementation of the proposed intervention. This process is required at the environmental level [ 21 ] and ensures successful adoption and sustainable use in collaborating organizations. Thus, pilot tests can be conducted to gain practical insights into implementation decisions and refine the intervention. Using a PAR framework, we pilot-tested the individual model to ensure that the intervention is appropriate and helpful for patients. This PAR pilot study was performed to inform future practices while connecting intervention research with actual action in a primary care setting.

The advisory group's feedback, which indicated that the intervention needed to be sufficiently pliable for use in a variety of primary care settings, informed and supported the pretesting step. Implementation was prepared in a primary care setting, in which the program was pretested with a steering group of psychiatrists, nurses, and an art psychotherapist. Two clinicians were responsible for informing patients about the intervention program and facilitating their involvement. The therapist, who had received appropriate training and instruction, was responsible for delivering the intervention and supporting all practical aspects of patient engagement. With support from the therapist, the patients were responsible for applying one strategy or a combination of two strategies in therapeutic sessions.

We performed this initial testing in a psychiatric ward in Seoul. Between February 2023 and February 2024, during the first two phases of the pilot testing, approximately 24 sessions were conducted, and 45 inpatients, including 16 patients with depressive disorders, voluntarily participated in the program. At the end of each session, the participants were asked to report their experiences through free narratives and complete a short questionnaire survey (quantitative and free-text comments) that provided additional information regarding their involvement. The mean time expenditure for the patients was 1.1 h (SD: 18.0; range: 0.5–2). Patients’ emotional experiences were reflected in their artworks, and Fig.  6 shows a short overview of their art products. The detailed findings from these pilot trials are outside the scope of the IM protocol and will be available in a future publication.

Figure 6. Examples of the art products of the participating patients with depression. Note. Figure 6 briefly outlines patient engagement through the artworks made during treatment sessions in the first pilot phase: (a) shows an artwork a patient made in a session applying art-based acceptance; (b) shows an artwork reflecting a patient's experience, applying art-based self-distancing and acceptance; (c) and (d) show artworks in which patients applied art-based positive rumination and distraction. Different art materials were provided in each session depending on the ER strategies used. The art-based ER practices promoted relaxation and the expression of patients' inner feelings and thoughts.

Step 6. Evaluation plan

The sixth step of IM is the planning of evaluation strategies to assess the potential impacts of the proposed intervention [ 20 ]. For this purpose, we designed two phases based on a PAR framework: patient feedback and expert feedback. The rationale for this plan was that comprehensive evaluations could investigate the necessity of refinement and what is needed to produce a more feasible and effective intervention. In particular, we expected that the engagement of patients as well as health professionals in the evaluation process would integrate the organizational perspective into patient-oriented quality improvements. From these two phases, we developed questions and measures for evaluation, conducting preliminary PAR studies to determine the feasibility and efficacy of the complete program. Table 4 presents the evaluation strategies for gaining patient and expert feedback. Meanwhile, Table  5 presents an overview and timeline of PAR 2 and PAR 3.

First, we developed a set of patient-reported outcome measures to obtain patient feedback. Quantitative assessments of treatment satisfaction, perceived helpfulness of treatment, and perceived difficulty were conducted following the end of a therapeutic session. Patient evaluations must be carried out regularly during treatment to assess the efficacy of the integrative model. At the end of the program, unstructured or semi-structured interviews are recommended to explore patients’ experiences of the treatment process. In addition, we planned a two-phase mixed-methods study to obtain feedback from participating healthcare professionals using an online survey and focus group interviews. The assessments included process measures, such as perceived difficulty, program appropriateness, and recommendations for improvements to its implementation on a professional level. A web-based survey was disseminated among clinicians and nurses to assess the feasibility of the intervention. Together, this enabled us to increase the time efficiency and cost-effectiveness of the evaluation process.

Feasibility was assessed in five ways. First, the feasibility of enrolling patients in the program was described: in our preliminary study, for instance, we calculated the percentage of approached patients who took part in the program relative to those who did not. Second, the feasibility of retaining patients in a treatment session was reported: to capture retention in treatment, we calculated the percentage of patients who failed to complete treatment compared with the percentage of those who completed it. Third, the feasibility of administering treatment was measured with a self-reported survey of patients' perceived difficulty in participation and a survey of healthcare professionals' perceived difficulty in implementation. To report the feasibility of administering treatment, we also calculated the mean hours a patient spent completing treatment. In addition to feasibility, acceptability was operationalized in three ways: a quantitative self-report survey of patient satisfaction, patient perceptions of the helpfulness of treatment, and patient willingness to recommend program participation. In our preliminary study, we developed responses for the patient survey and calculated the means and standard deviations for each item.
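The following minimal sketch, using invented numbers, illustrates the feasibility and acceptability summaries described above: participation and retention percentages, mean hours per patient, and the mean and standard deviation of each acceptability item. It does not reproduce the study's actual data or survey instruments.

# Hypothetical feasibility/acceptability summary with invented values.
from statistics import mean, stdev

approached, participated, completed = 60, 45, 42
hours_per_patient = [0.5, 1.0, 1.5, 1.2, 0.8]            # hypothetical session durations (h)
acceptability = {
    "treatment_satisfaction": [5, 5, 4, 5, 5],            # hypothetical 1-5 ratings
    "perceived_helpfulness": [4, 5, 4, 5, 4],
}

participation_rate = 100 * participated / approached       # feasibility of enrollment
retention_rate = 100 * completed / participated            # feasibility of retention
mean_hours = mean(hours_per_patient)                       # feasibility of administration

print(f"participation {participation_rate:.1f}%, retention {retention_rate:.1f}%, "
      f"mean time {mean_hours:.2f} h")
for item, ratings in acceptability.items():
    print(f"{item}: M = {mean(ratings):.2f}, SD = {stdev(ratings):.2f}")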

We received patient feedback in the first two pilot phases (PAR 2), and the results showed that the intervention program was feasible and acceptable for implementation in the primary care setting (the mean scores were as follows: treatment satisfaction = 4.82, perceived helpfulness of treatment = 4.57, perceived difficulty = 4.45). The patients provided further recommendations for improving the intervention in free-text comments. In addition to this patient feedback, we began conducting PAR 3 in February 2024. The feedback research is being conducted through an online questionnaire that includes multiple-choice and open-ended questions, with focus group interviews conducted virtually through Zoom. The results of PAR 2 and PAR 3 will be reported in separate articles.

In this paper, we proposed conceptual frameworks for an intervention that targets emotion dysregulation in depression. IM was used as the conceptual protocol to develop the intervention. To the best of our knowledge, this is the first art-based ER intervention incorporating previous theories, research evidence, and review data from affective science and intervention research, and combining PAR components with IM. We developed the intervention following the rationale and stepwise process of IM, which identifies theory- and evidence-based strategies to address key barriers to ER. In addition, to evaluate the developed intervention, preliminary PAR studies were conducted, assessing the acceptability of the trials and of the ABER intervention to patients; the rates of recruitment, attendance, and attrition; perceived difficulties in intervention implementation; and psychological outcomes. Consequently, the intervention is theoretically underpinned and supported by empirical evidence regarding ER and by the results of our pilot studies.

The current study benefits from integrating the PAR approach into the IM framework in two ways. First, using PAR studies within the IM process resulted in the cogeneration of knowledge among academic researchers, implementers, and the intended participants. PAR secured experiential knowledge for delivering content that addressed difficulties in ER within collaborative partnerships. Another contribution was enhancing the feasibility and acceptability of the proposed intervention. In particular, the preliminary PAR studies helped investigate whether modifications were needed before the intervention's adoption. Even though IM is a time-consuming process, the use of PAR made it more cost-effective and time-efficient.

In addition to these strengths, it is important to acknowledge the study's limitations. First, the current study offers only preliminary evidence for the given conceptual framework. Although the proposed intervention may precisely target emotional dysfunction in depression, such as the restrictive use of adaptive ER skills alongside repetitive use of maladaptive strategies, the integrative and individual frameworks of ABER have not been evaluated through randomized clinical trials. As the current study pilot-tested the intervention in an inpatient setting that served an acute, transdiagnostic population, implementers could extend the use of these frameworks by performing a fine-grained analysis of treatment contexts (e.g., by adapting the model for depressed outpatients in primary care). As such, the intervention must be examined and refined on the basis of empirical studies with multidisciplinary designs. In addition, this article did not examine therapists' capability to deliver the treatment, the fidelity of implementation, or the feasibility of the measurement tools. Intervention researchers interested in these variables are encouraged to extend our models by testing the broad contextual variables that influence the intervention process. Similarly, further research is required to investigate standardized forms of assessment in treatment (e.g., a measurable rating scale for patient monitoring) to increase the efficiency of the intervention.

Conclusions

This article proposes empirically and theoretically grounded intervention frameworks that can improve ER in depression. This IM study is unique in that the development process incorporates PAR components. Moreover, the intervention consists of four art-based regulatory strategies that enrich the present literature on intervention research targeting dysfunctional ER in major depression. Our participatory action studies demonstrate that, in a primary care setting, the individual protocol is feasible and acceptable for implementation. This represents a potential step toward filling a gap in current mental health treatments for patients with MDD. Despite the laborious and time-consuming process of intervention development, the application of IM augmented by PAR is helpful in optimizing the chances of effective behavior change. Further testing is required to assess the impact of the therapeutic program proposed in this study.

Availability of data and materials

The authors confirm that the data generated or analysed during this study are included in this published article; however, the raw datasets are not publicly available due to local legal restrictions. The data generated by PAR 2 and PAR 3 are outside the scope of the current intervention mapping study and are available elsewhere.

Abbreviations

ABER: Art-based emotion regulation

CBT: Cognitive-behavioral therapy

ER: Emotion regulation

ERGT: Emotion Regulation Group Therapy

GAD: Generalized anxiety disorder

IM: Intervention mapping

MBSR: Mindfulness-based stress reduction

MDD: Major depressive disorder

PAR: Participatory action research

SAM: The Self-Assessment Manikin

SD: Self-distancing

WHO: World Health Organization


Acknowledgements

The present researchers express their gratitude to the Kangdong Sacred Heart Hospital for its help and support in this research. Appreciation is also extended to all participating patients, clinicians, health care professionals, and the advisory board involved in all steps of the development. There are no individuals or funding organizations, other than the co-authors, who contributed directly or indirectly to this article.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Psychiatry, Kangdong Sacred Heart Hospital, 150, Seongan-Ro, Gangdong-Gu, Seoul, Republic of Korea

Myungjoo Lee & Young Tak Jo

Department of Bio-Medical Engineering, Ajou University, 206, World Cup-Ro, Yeongtong-Gu, Gyeonggi-do, Republic of Korea


Contributions

ML planned and designed the study with support from the rest of the study team. YT registered the trial. ML collected and analyzed participant data. ML drafted and edited the manuscript. All authors reviewed and/or approved the final manuscript for submission.

Corresponding author

Correspondence to Young Tak Jo.

Ethics declarations

Ethics approval and consent to participate

Ethical approval for the current research was obtained from the Institutional Review Board of Kangdong Sacred Heart Hospital for PAR2 (IRB no. 2023–12-002) and PAR3 (IRB no. 2024–02-019). In PAR2, the requirement for informed consent was waived due to its retrospective nature. Nevertheless, all patients who participated in the therapeutic sessions were asked to sign a consent form for later use of their artwork for educational and research purposes. In PAR3, informed consent was obtained from the healthcare professionals, including physicians and nurses, who were involved in the program.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

Reprints and permissions

About this article

Cite this article

Lee, M., Choi, H. & Jo, Y.T. Targeting emotion dysregulation in depression: an intervention mapping protocol augmented by participatory action research. BMC Psychiatry 24, 595 (2024). https://doi.org/10.1186/s12888-024-06045-y


Received: 05 April 2024

Accepted: 23 August 2024

Published: 04 September 2024

DOI: https://doi.org/10.1186/s12888-024-06045-y


  • Psychotherapy

BMC Psychiatry

ISSN: 1471-244X


Step 3: Focus the Evaluation Design

  • Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide


  • Types of Evaluations
  • Exhibit 3.1 and Exhibit 3.2
  • Determining the Evaluation Focus
  • Are You Ready to Evaluate Outcomes?
  • Illustrating Evaluation Focus Decisions
  • Defining the Specific Evaluation Questions
  • Deciding on the Evaluation Design
  • Standards for Step 3: Focus the Evaluation Design
  • Checklist for Step 3: Focusing the Evaluation Design
  • Worksheet 3A - Focusing the Evaluation in the Logic Model
  • Worksheet 3B - “Reality Checking” the Evaluation Focus

After completing Steps 1 and 2, you and your stakeholders should have a clear understanding of the program and have reached consensus. Now your evaluation team will need to focus the evaluation. This includes determining the most important evaluation questions and the appropriate design for the evaluation. Focusing the evaluation assumes that the entire program does not need to be evaluated at any point in time. Rather, the right evaluation of the program depends on what question is being asked, who is asking the question, and what will be done with the information.

Since resources for evaluation are always limited, this chapter provides a series of decision criteria to help you determine the best evaluation focus at any point in time. These criteria are inspired by the evaluation standards: specifically, utility (who will use the results and what information will be most useful to them) and feasibility (how much time and resources are available for the evaluation).

The logic models developed in Step 2 set the stage for determining the best evaluation focus. The approach to evaluation focus in the CDC Evaluation Framework differs slightly from traditional evaluation approaches. Rather than a summative evaluation, conducted after the program has run its course to ask "Did the program work?", the CDC framework views evaluation as an ongoing activity over the life of a program that asks, "Is the program working?"

Hence, a program is always ready for some evaluation. Because the logic model displays the program from inputs through activities/outputs through to the sequence of outcomes from short-term to most distal, it can guide a discussion of what you can expect to achieve at a given point in the life of your project. Should you focus on distal outcomes, or only on short- or mid-term ones? Or conversely, does a process evaluation make the most sense right now?


Many different questions can be part of a program evaluation, depending on how long the program has been in existence, who is asking the question, and why the evaluation information is needed. In general, evaluation questions for an existing program [17] fall into one of the following groups:

Implementation/Process

Implementation evaluations (process evaluations) document whether a program has been implemented as intended, and why or why not. In process evaluations, you might examine whether the activities are taking place, who is conducting the activities, who is reached through the activities, and whether sufficient inputs have been allocated or mobilized. Process evaluation is important for distinguishing the causes of poor program performance: was the program a bad idea, or was it a good idea that could not reach the standard of implementation you set? In all cases, process evaluations measure whether actual program performance was faithful to the initial plan. Such measurements might include contrasting actual and planned performance along all or some of the following:

  • The locale where services or programs are provided (e.g., rural, urban)
  • The number of people receiving services
  • The economic status and racial/ethnic background of people receiving services
  • The quality of services
  • The actual events that occur while the services are delivered
  • The amount of money the project is using
  • The direct and in-kind funding for services
  • The staffing for services or programs
  • The number of activities and meetings
  • The number of training sessions conducted

When evaluation resources are limited, only the most important issues of implementation can be included. Here are some “usual suspects” that compromise implementation and might be considered for inclusion in the process evaluation focus:

  • Transfers of Accountability: When a program’s activities cannot produce the intended outcomes unless some other person or organization takes appropriate action, there is a transfer of accountability.
  • Dosage: The intended outcomes of program activities (e.g., training, case management, counseling) may presume a threshold level of participation or exposure to the intervention.
  • Access: When intended outcomes require not only an increase in consumer demand but also an increase in supply of services to meet it, then the process evaluation might include measures of access.
  • Staff Competency: The intended outcomes may presume well-designed program activities delivered by staff that are not only technically competent but also matched appropriately to the target audience. Measures of the match of staff and target audience might be included in the process evaluation.

Our childhood lead poisoning logic model illustrates such potential process issues. Reducing elevated blood lead levels (EBLL) presumes the house will be cleaned, medical care referrals will be fulfilled, and specialty medical care will be provided. These are transfers of accountability beyond the program to the housing authority, the parent, and the provider, respectively. For provider training to achieve its outcomes, it may presume completion of a three-session curriculum, which is a dosage issue. Case management results in medical referrals, but it presumes adequate access to specialty medical providers. And because lead poisoning tends to disproportionately affect children in low-income urban neighborhoods, many program activities presume cultural competence of the caregiving staff. Each of these components might be included in a process evaluation of a childhood lead poisoning prevention program.
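To make the idea of contrasting planned and actual performance concrete, here is a minimal sketch that compares a few hypothetical process indicators for a childhood lead poisoning prevention program; the indicator names and values are invented for illustration, not drawn from the guide.

```python
# A minimal sketch of a process (implementation) evaluation check: contrast
# planned performance with actual performance for a few hypothetical
# indicators. All names and numbers are illustrative.

planned = {
    "children_screened": 500,         # number of people receiving services
    "training_sessions_held": 12,     # number of training sessions conducted
    "homes_inspected": 200,           # actual events while services are delivered
    "budget_spent_usd": 150_000,      # amount of money the project is using
}

actual = {
    "children_screened": 430,
    "training_sessions_held": 9,
    "homes_inspected": 210,
    "budget_spent_usd": 162_000,
}

def process_report(planned: dict, actual: dict) -> None:
    """Print how faithful actual implementation was to the plan."""
    for indicator, target in planned.items():
        observed = actual.get(indicator, 0)
        pct = 100 * observed / target if target else float("nan")
        print(f"{indicator}: {observed} of {target} planned ({pct:.0f}%)")

process_report(planned, actual)
```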

Effectiveness/Outcome

Outcome evaluations assess progress on the sequence of outcomes the program is to address. Programs often describe this sequence using terms like short-term, intermediate, and long-term outcomes, or proximal (close to the intervention) or distal (distant from the intervention). Depending on the stage of development of the program and the purpose of the evaluation, outcome evaluations may include any or all of the outcomes in the sequence, including

  • Changes in people’s attitudes and beliefs
  • Changes in risk or protective behaviors
  • Changes in the environment, including public and private policies, formal and informal enforcement of regulations, and influence of social norms and other societal forces
  • Changes in trends in morbidity and mortality

While process and outcome evaluations are the most common, there are several other types of evaluation questions that are central to a specific program evaluation. These include the following:

  • Efficiency: Are your program’s activities being produced with minimal use of resources such as budget and staff time? What is the volume of outputs produced by the resources devoted to your program?
  • Cost-Effectiveness: Does the value or benefit of your program’s outcomes exceed the cost of producing them?
  • Attribution: Can the outcomes be related to your program, as opposed to other things going on at the same time?

All of these types of evaluation questions relate to part, but not all, of the logic model. Exhibits 3.1 and 3.2 show where in the logic model each type of evaluation would focus. Implementation evaluations would focus on the inputs, activities, and outputs boxes and not be concerned with performance on outcomes. Effectiveness evaluations would do the opposite—focusing on some or all outcome boxes , but not necessarily on the activities that produced them. Efficiency evaluations care about the arrows linking inputs to activities/outputs —how much output is produced for a given level of inputs/resources. Attribution would focus on the arrows between specific activities/outputs and specific outcomes —whether progress on the outcome is related to the specific activity/output.
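One convenient way to keep these distinctions straight is to record, for each type of evaluation question, the logic-model components it examines. The lookup table below simply restates the focus described above in a small data structure; the component labels are generic and the function name is our own.

```python
# The parts of the logic model that each type of evaluation question examines,
# restated as a lookup table (compare Exhibits 3.1 and 3.2).

EVALUATION_FOCUS = {
    "implementation": ["inputs", "activities", "outputs"],
    "effectiveness": ["short-term outcomes", "intermediate outcomes",
                      "long-term outcomes"],
    "efficiency": ["arrows: inputs -> activities/outputs"],
    "attribution": ["arrows: activities/outputs -> outcomes"],
}

def components_for(question_type: str) -> list:
    """Return the logic-model components relevant to a question type."""
    return EVALUATION_FOCUS[question_type.lower()]

print(components_for("Efficiency"))   # ['arrows: inputs -> activities/outputs']
```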

Determining the correct evaluation focus is a case-by-case decision. Several guidelines inspired by the “utility” and “feasibility” evaluation standards can help determine the best focus.

Utility Considerations

1) What is the purpose of the evaluation?

Purpose refers to the general intent of the evaluation. A clear purpose serves as the basis for the evaluation questions, design, and methods. Some common purposes:

  • Gain new knowledge about program activities
  • Improve or fine-tune existing program operations (e.g., program processes or strategies)
  • Determine the effects of a program by providing evidence concerning the program’s contributions to a long-term goal
  • Affect program participants by acting as a catalyst for self-directed change (e.g., teaching)

2) Who will use the evaluation results?

Users are the individuals or organizations that will employ the evaluation findings. The users will likely have been identified during Step 1 in the process of engaging stakeholders. In this step, you need to secure their input into the design of the evaluation and the selection of evaluation questions. Support from the intended users will increase the likelihood that the evaluation results will be used for program improvement.

3) How will they use the evaluation results?

Many insights on use will have been identified in Step 1. Information collected may have varying uses, which should be described in detail when designing the evaluation. Some examples of uses of evaluation information:

  • To document the level of success in achieving objectives
  • To identify areas of the program that need improvement
  • To decide how to allocate resources
  • To mobilize community support
  • To redistribute or expand the locations where the intervention is carried out
  • To improve the content of the program’s materials
  • To focus program resources on a specific population
  • To solicit more funds or additional partners

4) What do other key stakeholders need from the evaluation?

Of course, the most important stakeholders are those who request or who will use the evaluation results. Nevertheless, in Step 1, you may also have identified stakeholders who, while not using the findings of the current evaluation, have key questions that may need to be addressed in the evaluation to keep them engaged. For example, a particular stakeholder may always be concerned about costs, disparities, or attribution. If so, you may need to add those questions to your evaluation focus.

Feasibility Considerations

The first four questions help identify the most useful focus of the evaluation, but you must also determine whether it is a realistic/feasible one. Three questions provide a reality check on your desired focus:

5) What is the stage of development of the program?

During Step 2, you will have identified the program's stage of development. There are roughly three stages in program development (planning, implementation, and maintenance) that suggest different focuses. In the planning stage, a truly formative evaluation (who is your target, how do you reach them, how much will it cost) may be the most appropriate focus. An evaluation that included outcomes would make little sense at this stage. Conversely, an evaluation of a program in the maintenance stage would need to include some measurement of progress on outcomes, even if it also included measurement of implementation.

Here are some handy rules to decide whether it is time to shift the evaluation focus toward an emphasis on program outcomes (a simple readiness check based on these rules is sketched after the list):

  • Sustainability: Political and financial will exists to sustain the intervention while the evaluation is conducted.
  • Fidelity: Actual intervention implementation matches intended implementation. Erratic implementation makes it difficult to know what “version” of the intervention was implemented and, therefore, which version produced the outcomes.
  • Stability: Intervention is not likely to change during the evaluation. Changes to the intervention over time will confound understanding of which aspects of the intervention caused the outcomes.
  • Reach: Intervention reaches a sufficiently large number of clients (sample size) to employ the proposed data analysis. For example, the number of clients needed may vary with the magnitude of the change expected in the variables of interest (i.e., effect size) and the power needed for statistical purposes.
  • Dosage: Clients have sufficient exposure to the intervention to result in the intended outcomes. Interventions with limited client contact are less likely to result in measurable outcomes, compared to interventions that provide more in-depth intervention.
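If it helps to operationalize these rules of thumb, the sketch below turns them into a simple readiness check. The reach criterion is illustrated with a standard two-proportion sample-size calculation; the thresholds, proportions, and variable names are assumptions chosen only for illustration.

```python
# A minimal sketch, assuming illustrative thresholds, of an "are you ready to
# evaluate outcomes?" check based on the five rules of thumb above. The
# required-sample-size calculation is the standard two-proportion formula;
# all inputs are hypothetical.

from scipy.stats import norm

def required_n_per_group(p0: float, p1: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group to detect a change from p0 to p1."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p0 + p1) / 2
    n = ((z_a * (2 * p_bar * (1 - p_bar)) ** 0.5 +
          z_b * (p0 * (1 - p0) + p1 * (1 - p1)) ** 0.5) ** 2) / (p1 - p0) ** 2
    return int(n) + 1

def ready_for_outcome_evaluation(sustainable: bool, faithful: bool,
                                 stable: bool, clients_reached: int,
                                 required_n: int, adequate_dose: bool) -> bool:
    """Apply the sustainability, fidelity, stability, reach, and dosage rules."""
    return all([sustainable, faithful, stable,
                clients_reached >= required_n, adequate_dose])

n_needed = required_n_per_group(p0=0.40, p1=0.55)   # hypothetical effect size
print("Required per group:", n_needed)
print("Ready for outcome evaluation:",
      ready_for_outcome_evaluation(sustainable=True, faithful=True, stable=True,
                                   clients_reached=180, required_n=n_needed,
                                   adequate_dose=True))
```

In a real evaluation, the reach check would use the effect size, significance level, and power that your analysis plan actually requires.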

6) How intensive is the program?

Some programs are wide-ranging and multifaceted. Others may use only one approach to address a large problem. Some programs provide extensive exposure (“dose”) of the program, while others involve participants quickly and superficially. Simple or superficial programs, while potentially useful, cannot realistically be expected to make significant contributions to distal outcomes of a larger program, even when they are fully operational.

7) What are relevant resource and logistical considerations?

Resources and logistics may influence decisions about evaluation focus. Some outcomes are quicker, easier, and cheaper to measure, while others may not be measurable at all. These facts may tilt the decision about evaluation focus toward some outcomes as opposed to others.

Early identification of inconsistencies between utility and feasibility is an important part of the evaluation focus step. But we must also ensure a “meeting of the minds” on what is a realistic focus for program evaluation at any point in time.

The affordable housing example shows how the desired focus might be constrained by reality. The elaborated logic model was important in this case. It clarified that, while program staff were focused on the production of new houses, important stakeholders like community-based organizations and faith-based donors were committed to more distal outcomes, such as changes in the life outcomes of families or the outcomes of outside investment in the community. The model led to a discussion of reasonable expectations and, in the end, to expanded evaluation indicators that included some of the more distal outcomes, which gave stakeholders a greater appreciation of the intermediate milestones on the way to their preferred outcomes.

Because the appropriate evaluation focus is case-specific, let’s apply these focus issues to a few different evaluation scenarios for the CLPP program.

At the 1-year mark, a neighboring community would like to adopt your program but wonders, “What are we in for?” Here you might determine that questions of efficiency and implementation are central to the evaluation. You would likely conclude this is a realistic focus, given the stage of development and the intensity of the program. Questions about outcomes would be premature.

At the 5-year mark, the auditing branch of your government funder wants to know, "Did you spend our money well?" Clearly, this requires a much more comprehensive evaluation, and would entail consideration of efficiency, effectiveness, possibly implementation, and cost-effectiveness. It is not clear, without more discussion with the stakeholder, whether research studies to determine causal attribution are also implied. Is this a realistic focus? At year 5, probably yes. The program represents a significant investment of resources and has been in existence for enough time to expect some more distal outcomes to have occurred.

Note that in either scenario, you must also consider questions of interest to key stakeholders who are not necessarily intended users of the results of the current evaluation. Here those would be advocates, concerned that families not be blamed for lead poisoning in their children, and housing authority staff, concerned that amelioration include estimates of costs and identification of less costly methods of lead reduction in homes. By year 5, these look like reasonable questions to include in the evaluation focus. At year 1, stakeholders might need assurance that you care about their questions, even if you cannot address them yet.

These focus criteria identify the components of the logic model to be included in the evaluation focus, i.e., these activities, but not these; these outcomes, but not these. At this point, you convert the components of your focus into specific questions, i.e., implementation, effectiveness, efficiency, and attribution. Were my activities implemented as planned? Did my intended outcomes occur? Were the outcomes due to my activities as opposed to something else? If the outcomes occurred at some but not all sites, what barriers existed at less successful locations and what factors were related to success? At what cost were my activities implemented and my outcomes achieved?

Besides determining the evaluation focus and specific evaluation questions, at this point you also need to determine the appropriate evaluation design. Of chief interest in choosing the evaluation design is whether you are being asked to monitor progress on outcomes or whether you are also asked to show attribution—that progress on outcomes is related to your program efforts. Attribution questions may more appropriately be viewed as research as opposed to program evaluation, depending on the level of scrutiny with which they are being asked.

Three general types of research designs are commonly recognized: experimental, quasi-experimental, and non-experimental/observational. Traditional program evaluation typically uses the third type, but all three are presented here because, over the life of the program, traditional evaluation approaches may need to be supplemented with other studies that look more like research.

Experimental designs use random assignment to compare the outcome of an intervention in one or more groups with an equivalent group or groups that did not receive the intervention. For example, you could select a group of similar schools and then randomly assign some schools to receive a prevention curriculum and other schools to serve as controls. All schools have the same chance of being selected as an intervention or control school. Random assignment reduces the chances that the control and intervention schools vary in any way that could influence differences in program outcomes. This allows you to attribute change in outcomes to your program. For example, if the students in the intervention schools delayed onset of risk behavior longer than students in the control schools, you could attribute the success to your program. However, in community settings it is hard, or sometimes even unethical, to have a true control group.
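As a concrete illustration of the school example, the short sketch below randomly assigns a set of similar schools to intervention and control groups; the school names and seed are placeholders.

```python
# Random assignment of similar schools to intervention and control groups,
# as in the prevention-curriculum example above. School names are placeholders.

import random

schools = [f"School {letter}" for letter in "ABCDEFGHIJ"]

random.seed(42)        # fixed seed so the assignment is reproducible
random.shuffle(schools)
half = len(schools) // 2
intervention, control = schools[:half], schools[half:]

print("Intervention:", intervention)
print("Control:", control)
```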

While there are some solutions that preserve the integrity of experimental design, another option is to use a quasi-experimental design. These designs make comparisons between nonequivalent groups and do not involve random assignment to intervention and control groups.

An example would be to assess adults’ beliefs about the harmful outcomes of environmental tobacco smoke (ETS) in two communities, then conduct a media campaign in one of the communities. After the campaign, you would reassess the adults and expect to find a higher percentage of adults believing ETS is harmful in the community that received the media campaign. Critics could argue that other differences between the two communities caused the changes in beliefs, so it is important to document that the intervention and comparison groups are similar on key factors such as population demographics and related current or historical events.
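For the two-community media campaign example, one common way to summarize such a pre/post comparison is a simple difference-in-differences of the survey percentages; the figures below are made up for illustration, and the approach is a generic one, not something prescribed by this guide.

```python
# Pre/post comparison of two nonequivalent communities (a quasi-experimental
# design), summarized as a difference-in-differences. All figures are
# hypothetical percentages of adults who believe ETS is harmful.

pre  = {"campaign_community": 55.0, "comparison_community": 54.0}
post = {"campaign_community": 68.0, "comparison_community": 57.0}

change_campaign = post["campaign_community"] - pre["campaign_community"]
change_comparison = post["comparison_community"] - pre["comparison_community"]
did = change_campaign - change_comparison

print(f"Change in campaign community:   {change_campaign:+.1f} points")
print(f"Change in comparison community: {change_comparison:+.1f} points")
print(f"Difference-in-differences:      {did:+.1f} points")
```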

Related to quasi-experimental design, comparison of outcome data among states, and between one state and the nation as a whole, is a common way to evaluate public health efforts. Such comparisons will help you establish meaningful benchmarks for progress. States can compare their progress with that of states with a similar investment in their area of public health, or they can contrast their outcomes with the results they could expect if their programs were similar to those of states with a larger investment.

Comparison data are also useful for measuring indicators in anticipation of new or expanding programs. For example, noting a lack of change in key indicators over time prior to program implementation helps demonstrate the need for your program and highlights the comparative progress of states with comprehensive public health programs already in place. A lack of change in indicators can be useful as a justification for greater investment in evidence-based, well-funded, and more comprehensive programs. Between-state comparisons can be highlighted with time–series analyses. For example, questions on many of the larger national surveillance systems have not changed in several years, so you can make comparisons with other states over time, using specific indicators. Collaborate with state epidemiologists, surveillance coordinators, and statisticians to make state and national comparisons an important component of your evaluation.
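A minimal sketch of this kind of over-time comparison against a benchmark follows; the indicator, years, and values are invented.

```python
# Comparing a surveillance indicator over time for one state against a
# benchmark (e.g., the nation or states with similar investment).
# Indicator, years, and values are invented.

years = [2018, 2019, 2020, 2021, 2022]
your_state = [22.1, 21.4, 20.9, 19.8, 18.7]   # e.g., adult smoking prevalence (%)
benchmark = [21.5, 21.2, 20.8, 20.5, 20.1]

for year, state_val, bench_val in zip(years, your_state, benchmark):
    gap = state_val - bench_val
    print(f"{year}: state {state_val:.1f} vs benchmark {bench_val:.1f} (gap {gap:+.1f})")
```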

Observational designs include, but are not limited to, time–series analysis, cross-sectional surveys, and case studies. Periodic cross-sectional surveys (e.g., the YTS or BRFSS) can inform your evaluation. Case studies may be particularly appropriate for assessing changes in public health capacity in disparate population groups. Case studies are applicable when the program is unique, when an existing program is used in a different setting, when a unique outcome is being assessed, or when an environment is especially unpredictable. Case studies can also allow for an exploration of community characteristics and how these may influence program implementation, as well as identifying barriers to and facilitators of change.

This issue of “causal attribution,” while often a central research question, may or may not need to supplement traditional program evaluation. The field of public health is under increasing pressure to demonstrate that programs are worthwhile, effective, and efficient. During the last two decades, knowledge and understanding about how to evaluate complex programs have increased significantly. Nevertheless, because programs are so complex, the traditional research designs described here may not be a good choice. As the World Health Organization notes, “the use of randomized control trials to evaluate health promotion initiatives is, in most cases, inappropriate, misleading, and unnecessarily expensive.” [18]

Consider the appropriateness and feasibility of less traditional designs (e.g., simple before–after [pretest–posttest] or posttest-only designs). Depending on your program’s objectives and the intended use(s) for the evaluation findings, these designs may be more suitable for measuring progress toward achieving program goals. Even when there is a need to prove that the program was responsible for progress on outcomes, traditional research designs may not be the only or best alternative. Depending on how rigorous the proof needs to be, proximity in time between program implementation and progress on outcomes, or systematic elimination of alternative explanations, may be enough to persuade key stakeholders that the program is making a contribution. While these design alternatives often cost less and require less time, keep in mind that saving time and money should not be the main criteria for selecting an evaluation design. It is important to choose a design that will measure what you need to measure and that will meet both your immediate and long-term needs.
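Where a simple pretest-posttest design fits, the analysis can be as small as a paired comparison of the same participants before and after the program. The sketch below uses a paired t-test on made-up scores; it is an illustration of the general idea, not a recommended analysis plan.

```python
# A simple before-after (pretest-posttest) analysis: paired comparison of the
# same participants' scores before and after the program. Scores are made up.

from scipy import stats

pre_scores = [12, 15, 14, 10, 18, 16, 13, 11, 17, 14]
post_scores = [16, 18, 15, 14, 21, 17, 15, 14, 20, 16]

result = stats.ttest_rel(post_scores, pre_scores)
mean_change = sum(b - a for a, b in zip(pre_scores, post_scores)) / len(pre_scores)

print(f"Mean change: {mean_change:.1f} points "
      f"(t = {result.statistic:.2f}, p = {result.pvalue:.3f})")
```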

Another alternative to experimental and quasi-experimental models is a goal-based evaluation model, which uses predetermined program goals and the underlying program theory as the standards for evaluation, thus holding the program accountable to prior expectations. The CDC Framework’s emphasis on program description and the construction of a logic model sets the stage for strong goal-based evaluations of programs. In such cases, evaluation planning focuses on the activities, outputs, and short-term, intermediate, and long-term outcomes outlined in a program logic model to direct the measurement activities.
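A goal-based evaluation can be as plain as holding each logic-model outcome to the target set in advance. The sketch below compares measured values against predetermined goals; all outcome names and numbers are hypothetical.

```python
# A minimal goal-based evaluation check: compare measured progress on each
# logic-model outcome against the goal set in advance. All names and numbers
# are hypothetical.

goals = {
    "children screened for EBLL": 500,
    "homes remediated": 120,
    "caregivers completing education series": 80,
}

measured = {
    "children screened for EBLL": 460,
    "homes remediated": 130,
    "caregivers completing education series": 55,
}

for outcome, goal in goals.items():
    value = measured.get(outcome, 0)
    status = "met" if value >= goal else "not met"
    print(f"{outcome}: {value}/{goal} ({status})")
```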

The design you select influences the timing of data collection, how you analyze the data, and the types of conclusions you can make from your findings. A collaborative approach to focusing the evaluation provides a practical way to better ensure the appropriateness and utility of your evaluation design.

Standards for Step 3: Focus the Evaluation Design

The evaluation standards most relevant to this step are:

  • Utility
  • Feasibility
  • Propriety
  • Accuracy

Checklist for Step 3: Focusing the Evaluation Design
  • Define the purpose(s) and user(s) of your evaluation.
  • Identify the use(s) of the evaluation results.
  • Consider stage of development, program intensity, and logistics and resources.
  • Determine the components of your logic model that should be part of the focus given these utility and feasibility considerations.
  • Formulate the evaluation questions to be asked of the program components in your focus, i.e., implementation, effectiveness, efficiency, and attribution questions.
  • Review evaluation questions with stakeholders, program managers, and program staff.
  • Review options for the evaluation design, making sure that the design fits the evaluation questions.

Worksheet 3A – Focusing the Evaluation in the Logic Model

If this is the situation… Then these are the parts of the logic model I would include in my evaluation focus:
1
2
3

Worksheet 3B – “Reality Checking” the Evaluation Focus

If this is my answer to these questions… Then I would conclude the questions in my evaluation focus are/are not reasonable ones to ask right now.
1
2
3

[17] There is another type of evaluation—“formative” evaluation—where the purpose of the evaluation is to gain insight into the nature of the problem so that you can “formulate” a program or intervention to address it. While many steps of the Framework will be helpful for formative evaluation, the emphasis in this manual is on instances wherein the details of the program/intervention are already known even though it may not yet have been implemented.

[18] WHO European Working Group on Health Promotion Evaluation. op cit.

Pages in this Report

  • Acknowledgments
  • Guide Contents
  • Executive Summary
  • Introduction
  • Step 1: Engage Stakeholders
  • Step 2: Describe the Program
  • Step 3: Focus the Evaluation Design
  • Step 4: Gather Credible Evidence
  • Step 5: Justify Conclusions
  • Step 6: Ensure Use of Evaluation Findings and Share Lessons Learned
  • Program Evaluation Resources



Innovation Fund Round 1 (2023) Research and Development, Testing and Evaluation

This section provides more information on the Innovation Fund’s grant program. Explore the full list of grant recipients and learn how their projects are driving wireless innovation. The Innovation Fund’s first NOFO focused on two areas: testing and evaluation (T&E) and research and development (R&D) into testing methods.

Research and Development

The Research & Development (R&D) focus area invests in the development of new/improved testing methods. These testing methods will assess the interoperability, performance, and/or security of networks.

Testing and Evaluation

The T&E focus area awards grants to proposals that streamline testing and evaluation across the U.S. Access to affordable T&E lowers the barriers to entry for new and emerging entities, such as small companies, start-ups, and SEDI businesses.

Public Wireless Supply Chain Innovation Fund Grant Program Awards: $140.4 million awarded to date.

Innovation Fund Program Snapshot

You can visit the Innovation Fund Program Snapshot page or download a one-page summary of the Public Wireless Supply Chain Innovation Fund.

Program Documentation

The Program Documentation section has details about the Innovation Fund Grant Program and how to apply. Visit that page to find out more.

See Program Awardees

Visit the Awardees page for awardee funding allocations and related information.

Related content

International Briefing on the Innovation Fund


  • Open access
  • Published: 05 September 2024

Evaluation of deterioration degree and consolidation effectiveness in sandstone and clay brick materials based on the micro-drilling resistance method

Qiong Zhang, Guoxiang Yang, Zhongjian Zhang & Feiyue Wang

Scientific Reports volume 14, Article number: 20693 (2024)


  • Civil engineering
  • Environmental impact

The quick and accurate measurement and evaluation of the deterioration degree and consolidation effectiveness on the surface of masonry relics is valuable for disease investigation and restoration work. However, there is still a lack of quantitative indices for evaluating the deterioration degree and consolidation effectiveness of masonry relics in situ. Based on the micro-drilling resistance method, new quantitative evaluation indices for the deterioration degree and consolidation effectiveness of masonry materials were proposed. Five types of masonry samples with different deterioration degrees were prepared by artificially accelerated deterioration tests involving sandstone and clay brick as research objects. Three types of consolidants were used to consolidate the deteriorated samples. Drilling resistance tests were conducted for deteriorated and consolidated samples. The variations in deterioration depth and average drilling resistance for samples with different numbers of deterioration cycles were analysed, while the differences in consolidation depth and average drilling resistance for samples with different consolidant types and dosages were compared. Finally, the deterioration degree index (\(K\)) and consolidation effectiveness index (\(R_c\)), which are based on the average drilling resistance, are proposed. The results can be applied to quick on-site investigations of immovable masonry relics.

Introduction

As carriers of historical and cultural information, masonry relics present great historical, artistic, and economic value. However, under long-term natural deterioration, most masonry relics suffer from different degrees of damage, which can even threaten their structural stability. Accurately evaluating the deterioration degree and consolidation effectiveness of masonry relics is therefore highly important for disease investigation and restoration work.

To evaluate the deterioration degree of masonry relics, visual assessment is the most intuitive method. Some researchers have proposed methods for evaluating the deterioration degree based on the clarity and legibility of inscriptions [1, 2]. The generalized visual assessment method enables comprehensive evaluation of numerous masonry relics through a simple and efficient process, but there is still room for improvement in terms of accuracy and precision. In addition, researchers have proposed semiquantitative evaluation indices for the deterioration degree of masonry relics. For instance, Fitzner et al. proposed a damage index to assess limestone deterioration that uses planimetric data in conjunction with weathering forms and damage categories [3]. Warke et al. proposed a unit, area, and spread (UAS) staging system model to assess the deterioration degree, which involves controlling factors including structural and mineralogical properties, inheritance effects, contaminant loading, and natural change [4]. Based on photo-based and site-specific weathering forms, Thornbush proposed a weathering index (S-E index) to assess the deterioration degree [5]. However, because such schemes involve detailed surveying, they may place considerable demands on operator time and expertise.

In addition, the mechanical and physical properties of masonry relics, including microfracture and porosity [6] as well as compressive and flexural strength [7], can also reflect the deterioration degree. Therefore, mechanical and physical indices obtained from laboratory accelerated deterioration processes can be used in quantitative evaluation. Because masonry relics are precious and unique, many researchers have suggested the use of nondestructive testing methods to assess the deterioration degree in situ. Many such studies are available, for example, deterioration assessments based on ultrasonic wave velocities [8, 9], Schmidt hammer rebound [10, 11], hardness testers [12, 13], penetration resistance testers [14, 15], ultrasonic CT [16], and laser scanners [17]. However, ultrasonic, rebound, and hardness methods require the surface of the measured material to be as flat as possible. The applicable strength range of the penetration resistance method is 0.4 MPa to 16 MPa, so it is not recommended for hard rock [18]. Ultrasonic CT and laser scanners may be cumbersome in data processing and place considerable demands on operator time and expertise.

Most studies have concentrated on the variations in the physical and mechanical properties of masonry materials before and after consolidation to evaluate the consolidation effectiveness. These properties include pore size distributions, dynamic elastic moduli, and tensile strengths [19, 20, 21]. In addition, nondestructive test methods have been used to evaluate the consolidation effectiveness of masonry relics, the most common being the comparison of ultrasonic wave velocity before and after consolidation [22, 23]. However, in most field situations, material properties tend to vary with depth in deteriorated and consolidated masonry relics. The above methods therefore have difficulty directly and accurately reflecting how the mechanical properties of the material surface layer vary with depth before and after consolidation.

The drilling resistance measurement system (DRMS) is an instrument that continuously measures the resistance of a material to a drill bit under constant drilling conditions. In contrast to other nondestructive instruments, the DRMS is highly sensitive and can directly and accurately capture the variation in material properties from the surface to the interior 24 . The DRMS has therefore been applied to evaluate the deterioration degree of masonry relics. By analysing the variation in drilling resistance with drilling depth, the surface deterioration depth and the thickness of the deterioration layer can be obtained 25,26 . Fonseca et al. proposed a classification scheme for marble deterioration based on drilling resistance values to quantitatively classify the deterioration degree 27,28 . The DRMS is also commonly used to evaluate the range and magnitude of variations in drilling resistance-depth profiles before and after consolidation and is one of the most suitable methods for assessing the consolidation effectiveness of masonry relics. In soft rocks in particular, the difference in drilling resistance before and after consolidation is especially pronounced 29,30 . The consolidation depth achieved by different types and dosages of consolidants can be evaluated from the change in drilling resistance 31,32 . By testing and comparing rock before and after consolidation with a scanning electron microscope and the DRMS, Ban et al. confirmed the reliability of assessing consolidation effectiveness from drilling resistance-depth profiles 33 . The DRMS has also been used to evaluate the consolidation effectiveness of microbially induced carbonate precipitation techniques 34,35 . However, current applications of the DRMS to masonry relics are largely limited to qualitative or semiquantitative measurements of the deterioration layer thickness and deterioration depth, and to qualitative comparisons of the drilling resistance before and after consolidation. Quantitative indices for evaluating the deterioration degree and consolidation effectiveness of masonry relics with nondestructive methods are still lacking.

To develop a nondestructive, quantitative method for evaluating the deterioration degree and consolidation effectiveness of masonry relics, sandstone and clay brick, two materials common among masonry relics, were used as study objects. Five groups of samples with different deterioration degrees were prepared by artificially accelerated deterioration tests for both sandstone and clay brick, and three types of consolidants were used to consolidate the deteriorated samples. Drilling resistance tests were conducted on the deteriorated and consolidated samples, and the calculation method for the average drilling resistance was determined from the range and magnitude of the variations in the drilling resistance-depth profiles. The variations in deterioration depth and average drilling resistance for samples with different numbers of deterioration cycles were analysed, and the differences in consolidation depth and average drilling resistance for samples treated with different consolidant types and dosages were compared. On this basis, a deterioration degree index ( \(K\) ) and a consolidation effectiveness index ( \({R}_{c}\) ), both derived from the average drilling resistance, are proposed. Finally, the results were compared with the evaluation indices in the relevant standards (BS EN 12371:2010; WW/T 0063–2015) 36,37 to verify the accuracy and reliability of \(K\) and \({R}_{c}\) .

Materials and methods

Sandstone and clay brick samples

Sandstone samples were purchased from Yuze Stone Industry Co., Ltd. (Jining, China). The lithology is red fine-grained feldspar sandstone with blocky formations. According to the results of the rock thin-section analysis and identification (as shown in Fig.  1 a), the sandstone is composed mainly of quartz (70–75%), potassium feldspar (5–10%), plagioclase (less than 5%), clasts (10–15%), and filler material (5–10%). The clasts are predominantly chlorite and white mica. The filler material contains reddish-brown ferruginous cement, which is commonly found in thin films and banded structures. The sandstone grains are mostly rounded and subangular in shape and consist mostly of fine sand (0.06–0.25 mm) and a small amount of medium sand (0.25–0.5 mm), with good sorting and rounding and a haphazard distribution. The sandstone samples were sliced from the same fine-grained sandstone. These samples have almost the same dimensions and mass. Samples with similar wave velocities were selected by ultrasonic wave velocity tests to ensure that there were no significant fissures within the experimental samples. A total of 8 sandstone samples (S1-S8) were obtained and each sample was a cylinder with a diameter of 50 mm and a height of 100 mm. Table 1 shows the bulk density, particle density, total porosity, free water absorption, forced water absorption, and uniaxial compressive strength of the sandstone.

Figure 1. The results of petrographical examination and X-ray diffraction analysis: (a) microstructure of the sandstone sample in thin section; (b) X-ray diffraction analysis result of the clay brick sample.

Clay brick samples (traditional blue bricks) were purchased from Dukai Ancient Brick Industry Co., Ltd. (Handan, China). The manufacturing process of the blue bricks is as follows: the clay was first soaked and cleaned with water and then dried to a constant mass. The clay was then mashed and sieved through a 1 mm sieve, and the sieved particles were mixed with water and placed into moulds. The shaped clay blocks were removed from the moulds and left to dry naturally indoors for 15 days. The clay blocks were then fired in a high-temperature furnace for 10 days at 1100 ℃. Finally, the fired clay bricks were cooled by the addition of water in a confined space. A clay brick sample was pulverized into powder for X-ray diffraction analysis (as shown in Fig. 1 b). The X-ray diffraction pattern calculations were performed with the Clayquan program (version 2020) using Rietveld refinement, and the contents of the different minerals were calculated from the cumulative peak areas. The results show that the main mineral components of the clay brick are quartz (62.0%), dolomite (7.8%), clay minerals (19.4%), potassium feldspar (4.5%), plagioclase (3.7%), and clasts (1.2%). A total of 8 clay brick samples (B1-B8) with similar ultrasonic wave velocities were obtained from the same batch of bricks, all of which were cubes with an edge length of 40 mm. Table 1 shows the bulk density, particle density, total porosity, free water absorption, forced water absorption, and uniaxial compressive strength of the clay brick.

Materials for deterioration and consolidation experiments

Deteriorated sandstone and clay brick samples were obtained through laboratory accelerated dry and wet cycling. The samples were dried in an electrothermal blast drying oven (Shanghai Meiyu Instrument Co., Ltd., Shanghai, China). Sodium sulfate (Na 2 SO 4 ) is one of the most frequently encountered salts and the most damaging to masonry artifacts 38,39 ; hence, a Na 2 SO 4 solution with a mass fraction of 14% was selected as the immersion fluid. Three consolidants commonly used for masonry relics were applied to the sandstone samples after two dry and wet cycles and to the clay brick samples after three dry and wet cycles. The three consolidants were Paraloid B-72 (B-72), tetraethyl orthosilicate (TEOS), and PS solution (PS). B-72 and TEOS are widely used in the restoration of architectural and cultural heritage, and their performance in this application is well documented 40,41 . PS is one of the most widely used consolidants for natural stone in the restoration of cultural heritage in China, and the literature concerning its performance is abundant 42,43 . This work builds on previous studies that examined the optimum ratios of these consolidants 44 ; their properties are shown in Table 2 .

Deterioration method

The literature indicates that salts can cause irreversible damage to masonry artifacts 45 . In this work, accelerated salt weathering tests were therefore performed on the sandstone and clay brick samples to study the drilling resistance at different deterioration degrees.

Deteriorated sandstone and clay brick samples were obtained through laboratory accelerated dry and wet cycling, and the samples within each group had approximately the same ultrasonic velocity. The dry and wet cycle experiments were carried out according to BS EN 12370:2020 46 . The specific steps for one cycle are as follows (as shown in Fig. 2 ): (1) All the samples were first dried at 105°C to a constant mass (until the change in mass within 24 h did not exceed 0.1% of the initial mass). (2) After drying, the samples were cooled at room temperature for 2 h and then immersed in a Na 2 SO 4 solution at 20°C. The distance between samples was at least 10 mm, the distance between each sample and the container wall was at least 20 mm, and the liquid level was at least 8 mm above the upper surface of the samples. The container was sealed with parafilm to reduce evaporation of the solution. (3) After immersion in the Na 2 SO 4 solution for 2 h, the samples were removed and placed in a drying oven for 16 h. Before drying, an evaporating dish containing water was placed in the drying oven and heated for 30 min to maintain high humidity.

Figure 2. Dry and wet cycling process for the sandstone and clay brick samples.

A total of 8 sandstone samples (S1-S8) and 8 clay brick samples (B1-B8) were subjected to dry and wet cycle experiments. After the specified number of dry and wet cycles was reached, the sandstone and clay brick samples were removed, washed with distilled water, and dried. The maximum number of dry and wet cycles was 8 for the sandstone samples and 15 for the clay brick samples.

Consolidation method

After two dry and wet cycles, three sandstone samples (S6, S7, and S8) were taken for consolidation tests, and three clay brick samples (B6, B7, and B8) were taken for consolidation tests after three dry and wet cycles. Because the laboratory samples were quite small (the sandstone samples were 50 mm in diameter and the clay brick samples had an edge length of 40 mm), the consolidant was applied by dropwise infiltration with a dropper so that the dosage could be distributed evenly over the sample surface. The consolidation steps were as follows: (1) A dropper was used to apply 1 ml of consolidant uniformly to the sample surface, and a further 1 ml was applied after all the consolidant had penetrated into the sample. The first consolidation was complete after the samples had been kept at room temperature for 3 days. (2) Subsequently, 2 ml of consolidant was applied to the same surface in the same way as in the first round. The second consolidation was likewise complete after the samples had been kept at room temperature for 3 days. The drilling resistance was tested before consolidation and after the completion of each consolidation, as shown in Fig. 3 .

Figure 3. Deteriorated sample consolidation process and drilling resistance test procedure.

Testing methods

Micro-drilling resistance testing method

The operating principle of the DRMS (SINT Technology Co. Ltd., Italy) used in this experiment is shown in Fig. 4 . Before drilling resistance testing starts, the instrument is connected to a computer via a data cable, and the penetration rate ( \(v\) ), revolution speed ( \(\omega\) ), and drilling depth ( \(h\) ) are set in the accompanying "DRMS Cordless" software. During drilling, the instrument maintains a constant penetration rate and revolution speed while continuously measuring the drilling resistance. The DRMS outputs the drilling resistance data in real time and visualizes the drilling resistance-depth profile.

Figure 4. The components, operating principle, and operating process of the DRMS.

A carbide drill bit (BOSCH, CYL-2, produced by BOSCH Co. Ltd., Germany) was used in this experiment; its structure is shown in Fig. 5 . The DRMS is a very sensitive instrument, and its measurements are affected by the drilling parameter settings, drill bit diameter, and similar factors 47,48,49,50 . To control these variables, and based on studies correlating drilling resistance values with drilling and bit parameters 24,44 , carbide drill bits with a diameter of 5 mm were selected, and the instrument settings were \(v\) = 10 mm/min, \(\omega\) = 600 rpm, and \(h\) = 10 mm. The drilling resistance data were acquired every 0.1 s. To avoid the influence of drill bit wear on the drilling resistance, a new carbide drill bit was used for each drill hole in all the experiments. The samples were dried before drilling resistance testing.
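
For readers who post-process DRMS exports numerically, the run configuration and a measured profile can be represented along the following lines. This is a minimal sketch, not the documented interface of the "DRMS Cordless" software: the two-column CSV layout (depth in mm, force in N) and all names are assumptions for illustration.

```python
import csv
from dataclasses import dataclass

@dataclass
class DrillRun:
    """Drilling settings used for every hole in this study (see text)."""
    penetration_rate_mm_min: float = 10.0   # v
    revolution_speed_rpm: float = 600.0      # omega
    drilling_depth_mm: float = 10.0          # h
    bit_diameter_mm: float = 5.0             # BOSCH CYL-2 carbide bit

def load_profile(path):
    """Read one drilling resistance-depth profile from a CSV export.

    Assumes two columns: drilling depth (mm) and drilling resistance (N).
    Adapt the column indices to the actual export format of the software.
    """
    depths, forces = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            depths.append(float(row[0]))
            forces.append(float(row[1]))
    return depths, forces
```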

Figure 5. DRMS and carbide drill bit used in the experiment: (a) schematic of the DRMS instrument; (b) carbide drill bit; (c) schematic structure of the carbide drill bit.

Moreover, to avoid the influence of neighboring drill holes and sample edges, drill holes were located more than 1 cm from the sample edge, and the distance between neighboring drill holes was not less than 1 cm. In addition, to assess the variability between samples, drilling resistance tests were performed before the deterioration and consolidation experiments. Each deteriorated sample was tested only twice: before deterioration and after the specified number of deterioration cycles. Each consolidated sample was tested only three times: before consolidation, after the first consolidation, and after the second consolidation. Three parallel drillings were performed for each test, and each drilling yielded 100 data points of drilling resistance versus drilling depth. The drilling resistance values of the three parallel drillings were averaged at each drilling depth; hence, each point of a drilling resistance-depth profile is the mean value of three parallel drillings obtained with the same drilling parameters.
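
A minimal sketch of the averaging step described above, assuming the three parallel drillings have already been aligned on the same depth grid (e.g., 100 depths over the 10 mm drilling depth); the function name and the synthetic example values are illustrative only.

```python
import numpy as np

def mean_profile(profiles):
    """Average several parallel drillings point by point.

    `profiles` is a list of 1-D arrays of drilling resistance (N),
    all sampled at the same drilling depths.
    """
    stacked = np.vstack(profiles)   # shape: (n_drillings, n_depths)
    return stacked.mean(axis=0)     # mean resistance at each depth

# Example: three parallel drillings with 100 points each (synthetic data).
rng = np.random.default_rng(0)
drillings = [20.0 + rng.normal(0.0, 1.0, 100) for _ in range(3)]
profile = mean_profile(drillings)   # one averaged 100-point profile
```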

Ultrasonic wave velocity testing method

The literature indicates that there is a correlation between ultrasonic wave velocity and drilling resistance 8,9 . Accordingly, primary wave and shear wave velocities were measured with an ultrasonic detector (Proceq Pundit PL-200, Proceq Trading Shanghai Co. Ltd., Shanghai, China) using input signals at frequencies of 54 and 250 kHz, respectively. The sandstone (S1-S8) and clay brick (B1-B8) samples were tested for primary and shear wave velocities in the direction parallel to the drilling, and the ultrasonic wave velocity was measured before each drilling resistance test. The testing steps are as follows: the transducers were uniformly coated with couplant and pressed tightly against both ends of the sample. The transmission time of the ultrasonic wave was read from the waveform on the ultrasonic detector and recorded as t , accurate to 0.1 \(\mu s\) ; the measurement was repeated 5 times and the average value was taken. The ultrasonic wave velocity was then calculated as the ratio of the measured sample length ( \(l\) ) to the transmission time ( t ).
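
The velocity calculation described above reduces to dividing the travel path by the averaged transit time. A short sketch, assuming the length is recorded in millimetres and the transit times in microseconds (mm/µs equals km/s, hence the factor of 1000 to obtain m/s); the numerical values are placeholders, not measured data.

```python
def wave_velocity_m_s(length_mm, transit_times_us):
    """Ultrasonic wave velocity from sample length and transit times.

    The five parallel readings are averaged before dividing, as in the
    procedure described above.
    """
    t_mean_us = sum(transit_times_us) / len(transit_times_us)
    return length_mm / t_mean_us * 1000.0   # mm/us -> m/s

# Example: a 100 mm long sample with five P-wave transit-time readings.
vp = wave_velocity_m_s(100.0, [30.1, 30.3, 30.2, 30.4, 30.2])  # about 3307 m/s
```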

Deterioration experiment

Figures 6 and 7 show the drilling resistance-depth profiles for the sandstone samples after 2, 4, 6, 7, and 8 dry and wet cycles and the changes in the appearance of the samples with increasing numbers of deterioration cycles. The experiment was initially planned for a maximum of 10 cycles for the sandstone samples, with data collected every two cycles. However, at the end of the 7th cycle, the S-4 sample was visibly cracked (Fig. 6 d), and by the end of the 8th cycle, the S-5 sample exhibited severe surface exfoliation (Fig. 6 e). The deterioration experiment was therefore terminated after 8 dry and wet cycles to avoid deterioration so severe that the resulting irregular surfaces would affect the drilling resistance tests.

Figure 6. Sandstone samples after dry and wet cycling experiments: (a) S-1 sample after 2 cycles; (b) S-2 sample after 4 cycles; (c) S-3 sample after 6 cycles; (d) S-4 sample after 7 cycles; and (e) S-5 sample after 8 cycles.

Figure 7. Drilling resistance-depth profiles of sandstone samples after different numbers of dry and wet cycles: (a) S-1 sample after 2 cycles; (b) S-2 sample after 4 cycles; (c) S-3 sample after 6 cycles; (d) S-4 sample after 7 cycles; (e) S-5 sample after 8 cycles; and (f) comparison of samples after different numbers of deterioration cycles.

As shown in Figs.  6 and 7 , after 2 cycles, the drilling resistance within 0–4.5 mm is slightly lower than that of the undeteriorated sandstone sample, and the drilling resistance in the 4.5–10 mm range is approximately the same as that of the undeteriorated sandstone sample. After 4 cycles, the drilling resistance is significantly lower within the 0–4 mm region than that for the undeteriorated samples. After 6 cycles, the drilling resistance significantly decreased as the range increased to 0–6 mm, and the drilling resistance within 0–0.6 mm was only 0.63 N. After 7 cycles, the depth range of complete deterioration is further extended, with only 1.04 N of drilling resistance in the 0–0.8 mm range. After 8 cycles, the drilling resistance-depth profile clearly changes, and the drilling resistance within 0–1.2 mm is only 1.26 N, indicating that this range has completely deteriorated.

Figures 8 and 9 show the drilling resistance-depth profiles for the clay brick samples after 3, 6, 9, 12, and 15 dry and wet cycles and the changes in the appearance of the samples with increasing numbers of deterioration cycles. At the end of the 3rd cycle, there was no clear change in the appearance of the B-1 sample. At the end of the 6th and 9th cycles, slight granular exfoliation occurred at the corners of the clay bricks. By the end of the 12th cycle, the B-4 sample exhibited more severe granular exfoliation. After 15 deterioration cycles, the B-5 sample showed a large area of material loss.

Figure 8. Clay brick samples after dry and wet cycling experiments: (a) B-1 sample after 3 cycles; (b) B-2 sample after 6 cycles; (c) B-3 sample after 9 cycles; (d) B-4 sample after 12 cycles; and (e) B-5 sample after 15 cycles.

Figure 9. Drilling resistance-depth profiles of clay brick samples after different numbers of dry and wet cycles: (a) B-1 sample after 3 cycles; (b) B-2 sample after 6 cycles; (c) B-3 sample after 9 cycles; (d) B-4 sample after 12 cycles; (e) B-5 sample after 15 cycles; and (f) comparison of samples after different numbers of deterioration cycles.

As shown in Figs. 8 and 9 , after 3 cycles the drilling resistance-depth profile did not change significantly: the drilling resistance decreased slightly within 0–4 mm, and the drilling resistance in the 4–10 mm range was approximately the same as that of the undeteriorated clay brick sample. As the number of deterioration cycles increased, the drilling resistance in the 0–4 mm depth range continued to decrease, but the depth range of deterioration did not change significantly.

Quantitative evaluation of deterioration degree

To quantitatively analyse and evaluate the deterioration degree of the sandstone and clay brick samples, a deterioration degree index ( \(K\) ) is proposed based on the drilling resistance measured on the samples before and after deterioration. \(K\) represents the rate of decrease in the average drilling resistance over the deterioration depth range. The drilling depth at which the drilling resistance-depth profile begins to stabilize is defined as the deterioration depth, as shown in Fig. 10 . The initial data affected by disturbances at drilling depths of 0–1 mm are excluded from the calculation, and \(K\) is calculated according to Eq. ( 1 ):

$$K=\frac{{DR}_{UD}-{DR}_{D}}{{DR}_{UD}}\times 100\%$$ (1)

where \({DR}_{UD}\) is the average drilling resistance for undeteriorated samples (within the deterioration depth range) and \({DR}_{D}\) is the average drilling resistance for deteriorated samples (within the deterioration depth range).

Figure 10. Schematic of the deterioration depth and the calculation depth for the deterioration degree index ( \(K\) ): \({f}_{UD}(x)\) is the drilling resistance-depth profile of undeteriorated samples; \({f}_{D}(x)\) is the drilling resistance-depth profile of deteriorated samples; i is the drilling depth at which the drilling resistance begins to stabilize.
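
In the paper, the deterioration depth i is read off the profile as the point where the drilling resistance begins to stabilize (Fig. 10). One possible way to automate that reading is sketched below: take the first depth from which the deteriorated profile stays within a small tolerance of the undeteriorated baseline. Both the tolerance value and the use of the undeteriorated profile as the reference are assumptions of this sketch, not the authors' stated criterion.

```python
import numpy as np

def deterioration_depth(depths, f_ud, f_d, rel_tol=0.05):
    """Estimate the deterioration depth i (mm).

    Returns the first depth at which the deteriorated profile f_d remains
    within `rel_tol` (relative) of the undeteriorated profile f_ud for all
    deeper points - one heuristic reading of "the point at which the
    drilling resistance begins to stabilize".
    """
    depths, f_ud, f_d = map(np.asarray, (depths, f_ud, f_d))
    close = np.abs(f_d - f_ud) <= rel_tol * f_ud
    for k in range(len(depths)):
        if close[k:].all():
            return float(depths[k])
    return float(depths[-1])   # no stabilization found within the profile
```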

The drilling resistance data from the deteriorated and undeteriorated samples yield the drilling resistance-depth profiles \({f}_{UD}(x)\) and \({f}_{D}(x)\) . \({DR}_{UD}\) and \({DR}_{D}\) are the arithmetic means of \({f}_{UD}(x)\) and \({f}_{D}(x)\) , respectively, over the depth range from 1 mm to i mm. Table 3 shows the calculated \(K\) values for the sandstone and clay brick samples after different numbers of deterioration cycles. The average drilling resistance of the undeteriorated sandstone samples ranged from 26.87 to 28.66 N, and that of the undeteriorated clay brick samples ranged from 15.50 to 19.11 N. The drilling resistance of the fine-grained sandstone samples was highly uniform, with a maximum difference of only 6.7%, whereas the maximum difference for the clay brick samples was up to 23.29%, indicating considerable scatter. The clay brick is non-homogeneous: it contains approximately 20% soft clay minerals alongside hard minerals (such as SiO 2 ), which can produce locally high strength at a drill hole, and the presence of minerals with different hardnesses increases the fluctuation of the drilling resistance. In addition, \(K\) gradually increases with the number of deterioration cycles, reflecting the gradually increasing deterioration degree of the samples. For the sandstone samples, significant decreases in the drilling resistance occurred at the 4th and 7th cycles, whereas the clay brick samples exhibited a visible decrease in drilling resistance after every three deterioration cycles. The rate of decrease in drilling resistance with the number of deterioration cycles was significantly greater for the sandstone samples than for the clay brick samples.
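
The \(K\) calculation described above can be written directly as code: take the arithmetic mean of each profile over the 1 mm to i mm window (the disturbed 0–1 mm data are discarded) and form the relative decrease of Eq. (1). A minimal sketch with illustrative function names:

```python
import numpy as np

def average_dr(depths, forces, start_mm, end_mm):
    """Arithmetic mean of the drilling resistance between start_mm and end_mm."""
    depths, forces = np.asarray(depths), np.asarray(forces)
    mask = (depths >= start_mm) & (depths <= end_mm)
    return float(forces[mask].mean())

def deterioration_index_K(depths, f_ud, f_d, i_mm):
    """K = (DR_UD - DR_D) / DR_UD * 100 %, evaluated over 1 mm .. i mm."""
    dr_ud = average_dr(depths, f_ud, 1.0, i_mm)   # undeteriorated profile
    dr_d = average_dr(depths, f_d, 1.0, i_mm)     # deteriorated profile
    return (dr_ud - dr_d) / dr_ud * 100.0
```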

In addition, Table 3 shows that the deterioration depth of the sandstone samples increases with the number of deterioration cycles, with the thickness of the deteriorated layer increasing from 3.9 to 7.4 mm; at the 7th and 8th cycles, however, the thickness of the deteriorated layer was only approximately 5.5 mm. In the clay brick samples the thickness of the deteriorated layer fluctuates between 3 and 4 mm, so the deterioration degree cannot be accurately determined from the deterioration depth data alone.

Consolidation experiment

Figure  11 shows the experimental process for determining the consolidation effectiveness of the three types of consolidants (PS, B-72, and TEOS) for consolidating sandstone and clay brick samples. There is a clear difference in the penetration consolidation depth of the different types of consolidants. Figure  12 shows the drilling resistance-depth profiles for the sandstone and clay brick samples before and after consolidation for the three types of consolidants. The drilling resistance of the sandstone samples increased within 0–4.1 mm after consolidation with 2 ml of PS solution and further increased within 0–5.4 mm after consolidation with 4 ml of PS solution. Similarly, the drilling resistance of the sandstone samples increased within 0–3.6 mm after consolidation with 2 ml of B-72 solution and further increased within 0–5.4 mm after consolidation with 4 ml of B-72 solution. However, the drilling resistance-depth profiles of the sandstone samples exhibited little change after consolidation with 2 ml and 4 ml of TEOS solution. The drilling resistance of the clay brick samples increased within 0–1.8 mm after consolidation with 2 ml of PS solution and further increased within 0–3.1 mm after consolidation with 4 ml of PS solution. The clay brick samples exhibited a continuous increase in drilling resistance within 0–1.8 mm after consolidation with 2 and 4 ml of B-72 solution, while the increase in the second consolidation was greater. The drilling resistance-depth profiles of the clay brick samples also exhibited little change after consolidation with 2 and 4 ml of TEOS solution.

Figure 11. Sandstone and clay brick samples after consolidation with the three types of consolidants.

Figure 12. Drilling resistance-depth profiles for sandstone and clay brick samples before and after consolidation with three types of consolidants: (a) S-6 sample consolidated with PS; (b) B-6 sample consolidated with PS; (c) S-7 sample consolidated with B-72; (d) B-7 sample consolidated with B-72; (e) S-8 sample consolidated with TEOS; (f) B-8 sample consolidated with TEOS.

Quantitative evaluation of consolidation effectiveness

To quantitatively analyse and evaluate the consolidation effectiveness for the sandstone and clay brick samples, a consolidation effectiveness index ( \({R}_{c}\) ) is proposed based on the drilling resistance measured on the samples before and after consolidation. \({R}_{c}\) represents the rate of increase in the average drilling resistance over the consolidation depth range. The drilling depth at which the drilling resistance-depth profiles before and after consolidation begin to coincide is defined as the consolidation depth, as shown in Fig. 13 . The initial data affected by disturbances at drilling depths of 0–1 mm are excluded from the calculation, and \({R}_{c}\) is calculated according to Eq. ( 2 ):

$${R}_{c}=\frac{{DR}_{C}-{DR}_{UC}}{{DR}_{UC}}\times 100\%$$ (2)

where \({DR}_{UC}\) is the average drilling resistance of unconsolidated samples (within the consolidation depth range) and \({DR}_{C}\) is the average drilling resistance of consolidated samples (within the consolidation depth range).

Figure 13. Schematic of the consolidation depth and the calculation depth for the consolidation effectiveness index ( \({R}_{c}\) ): \({f}_{UC}(x)\) is the drilling resistance-depth profile of unconsolidated samples; \({f}_{C}(x)\) is the drilling resistance-depth profile of consolidated samples; j is the drilling depth at which the drilling resistance before and after consolidation begins to coincide.

The drilling resistance data from the consolidated and unconsolidated samples yield the drilling resistance-depth profiles \({f}_{C}(x)\) and \({f}_{UC}(x)\) . \({DR}_{C}\) and \({DR}_{UC}\) are the arithmetic means of \({f}_{C}(x)\) and \({f}_{UC}(x)\) , respectively, over the depth range from 1 mm to j mm. Table 4 shows the calculated \({R}_{c}\) values for the sandstone and clay brick samples with different consolidant types and dosages. After the first and second consolidations with the PS solution, the \({R}_{c}\) values of the sandstone samples were 12.51% and 30.12%, respectively, while those of the clay brick samples were 15.66% and 33.33%, respectively. Similarly, after the first and second consolidations with the B-72 solution, the \({R}_{c}\) values of the sandstone samples were 33.42% and 32.54%, respectively, while those of the clay brick samples were 14.29% and 45.24%, respectively. Both the PS and B-72 solutions therefore reinforce the sandstone and clay brick samples; in general, the greater the consolidant dosage, the greater \(R_{c}\) and the consolidation effectiveness. However, after the first and second consolidations with the TEOS solution, the \(R_{c}\) values of the sandstone samples were 9.42% and -8.64%, respectively, while those of the clay brick samples were 6.18% and -11.17%, respectively. Increasing the dosage of TEOS instead decreased \(R_{c}\) , and the consolidation effectiveness was not satisfactory.
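
\(R_{c}\) mirrors \(K\) with the sign reversed (a relative increase rather than a decrease over the 1 mm to j mm window), so the averaging helper from the earlier sketch can be reused; again, the names are illustrative.

```python
def consolidation_index_Rc(depths, f_uc, f_c, j_mm):
    """R_c = (DR_C - DR_UC) / DR_UC * 100 %, evaluated over 1 mm .. j mm."""
    dr_uc = average_dr(depths, f_uc, 1.0, j_mm)   # before consolidation
    dr_c = average_dr(depths, f_c, 1.0, j_mm)     # after consolidation
    return (dr_c - dr_uc) / dr_uc * 100.0
```

A negative value, as obtained for the second TEOS consolidation, simply indicates that the average drilling resistance within the consolidation depth decreased.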

In addition, Table 4 and Fig. 14 show that the consolidation depth increases with increasing consolidant dosage. However, the consolidation depth does not directly reflect the consolidation effectiveness: the consolidation depth was almost the same for the clay brick samples after the first and second consolidations with the B-72 solution, yet \(R_{c}\) increased from 14.29% to 45.24%.

Figure 14. Consolidation depth and \(R_{c}\) of the sandstone and clay brick samples after the first and second consolidations.

  • Deterioration degree

The deterioration depth can be determined by the drilling resistance values over a range of drilling depths 29 , 30 . This is also confirmed in the drilling resistance-depth profiles for sandstone and clay bricks in Figs.  7 and 9 . The drilling resistance-depth profile shows a continuously increasing tendency in the surface deterioration layer and stabilizes when the drill bit enters the fresh layer. However, the deterioration degree cannot be determined accurately from deterioration depth data alone (Table 3 ). The undeteriorated sandstone and clay brick samples also showed a continuously increasing trend within the 0–1 mm range, even though the surface of the samples had been polished. Similar observations have been reported in other studies, where the drilling resistance-depth profile always involves some initial data interference, and the drilling resistance data are meaningful only within the depth range after the drill bit has completely entered the material 51 , 52 . A carbide drill (BOSCH, CYL-2) with a V-shaped cross-section was used in the experiments, as shown in Fig.  5 . Before the front end of the drill bit enters the sample completely, the drilling resistance increases as the cross-sectional area of the drill bit increases, resulting in an increase within the 0–1 mm range of the undeteriorated samples. The inclusion of the data from 0 to 1 mm in the calculation will result in a lower calculated average drilling resistance than the true value. Therefore, when calculating \(K\) and \(R_{c}\) , the initial data with disturbances at drilling depths of 0–1 mm are removed from the calculation.

Regarding the calculation method of the average drilling resistance value, there is no uniform standard for the depth range chosen. Rodrigues and Costa proposed an average drilling resistance calculation method for low-strength mortars 53 . Based on a series of processes of segmenting, sorting, selecting, and averaging the data, the smallest 5 or 10 drilling resistance data points in each segment are ultimately selected to calculate the average value. Fernandes and Lourenço excluded the maximum or minimum drilling resistance data and then averaged the drilling resistance values 54 . Benavente et al. calculated average drilling resistances with data in a depth range of 0.5–25 mm 55 . Several researchers have directly calculated average drilling resistance values with data from the whole drilling depth range 56 . In this experiment, the drilling depth corresponding to the point at which the drilling resistance begins to stabilize is defined as the deterioration depth, and the initial data corresponding to disturbances at a drilling depth of 0–1 mm are removed from the calculation. In addition, the effect of deterioration depth and consolidation depth was taken into account when calculating the average drilling resistance value. The data within the deterioration depth ( i mm, Fig.  10 ) or consolidation depth ( j mm, Fig.  13 ) were selected for the calculation of the average drilling resistance value. Based on the variation in drilling resistance values, the deterioration degree index ( \(K\) ) is defined and calculated. The deterioration degree index ( \(K\) ) was compared with the weathering index ( F s ) proposed by WW/T 0063–2015 37 (shown in Eq.  3 ) and the dynamic elastic modulus loss rate ( \(\Delta E_{d}\) ) proposed by BS EN 12,371:2010 36 (shown in Eq.  4 ).

where \(V_{p0}\) is the primary wave velocity of the undeteriorated samples (m/s), \(\rho_{d}\) is the density of the samples (kg/m 3 ), \(V_{s}\) is the shear wave velocity of the deteriorated samples (m/s), and \(V_{p}\) is the primary wave velocity of the deteriorated samples (m/s).

Tables 5 and 6 show the primary wave velocities ( \(V_{p0}\) , \(V_{p}\) ), shear wave velocities ( \(V_{s0}\) , \(V_{s}\) ), average drilling resistances ( \(DR_{UD}\) , \(DR_{D}\) ) and dynamic elastic moduli ( \(E_{d0}\) , \(E_{d}\) ) of the samples before and after deterioration, as well as the loss rate of the dynamic elastic modulus ( \(\Delta E_{d}\) ), the weathering index ( F s ) and the deterioration degree index ( \(K\) ) of the samples after deterioration. The primary wave velocity, shear wave velocity, and average drilling resistance gradually decrease with increasing numbers of deterioration cycles, while \(\Delta E_{d}\) , F s , and \(K\) gradually increase, reflecting the gradually increasing deterioration degree of the sandstone and clay brick samples. The \(\Delta E_{d}\) of the sandstone samples reached 38.39% after the 8th deterioration cycle, and the \(\Delta E_{d}\) of the clay brick samples reached 47.07% after the 15th deterioration cycle; both samples were therefore in a state of extremely serious deterioration according to BS EN 12371:2010 36 (a sample is considered extremely seriously deteriorated when \(\Delta E_{d}\) exceeds 30%).
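
The loss rate \(\Delta E_{d}\) compares the dynamic elastic modulus before and after deterioration. The sketch below assumes the standard elastodynamic expression for the dynamic Young's modulus in terms of density and P- and S-wave velocities and assumes the density is unchanged by deterioration; whether Eq. (4) uses exactly this expression is not shown in the excerpt, so treat the formula as an assumption rather than the authors' definition.

```python
def dynamic_youngs_modulus(rho_kg_m3, vp_m_s, vs_m_s):
    """Dynamic Young's modulus E_d (Pa) from the standard elastodynamic
    relation E_d = rho * Vs^2 * (3*Vp^2 - 4*Vs^2) / (Vp^2 - Vs^2)."""
    vp2, vs2 = vp_m_s ** 2, vs_m_s ** 2
    return rho_kg_m3 * vs2 * (3.0 * vp2 - 4.0 * vs2) / (vp2 - vs2)

def modulus_loss_rate(rho_kg_m3, vp0, vs0, vp, vs):
    """Delta E_d = (E_d0 - E_d) / E_d0 * 100 %.

    The density is assumed to be the same before and after deterioration.
    """
    e_d0 = dynamic_youngs_modulus(rho_kg_m3, vp0, vs0)
    e_d = dynamic_youngs_modulus(rho_kg_m3, vp, vs)
    return (e_d0 - e_d) / e_d0 * 100.0
```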

Figure  15 shows the correlation between the deterioration degree index ( \(K\) ) and the dynamic elastic modulus loss rate ( \(\Delta E_{d}\) (%)) as well as the weathering indices ( F s ) of the sandstone and brick samples. \(K\) is linearly and positively correlated with both \(\Delta E_{d}\) and F s , with correlation coefficients for sandstone samples of 0.95 and 0.83, respectively, while the correlation coefficients for clay brick samples are 0.89 and 0.91, respectively, which further verifies the accuracy and reliability of \(K\) . Therefore, the deterioration degree of sandstone and clay brick samples can be evaluated by using the deterioration degree index ( \(K\) ) based on the average drilling resistance.
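
The reported correlation coefficients are linear correlations between paired index values across the samples. A short sketch of how such coefficients can be computed; the arrays below are placeholders rather than the paper's data, and whether the paper reports Pearson's r or the R² of a linear fit is not specified here (square r to obtain R²).

```python
import numpy as np

# Placeholder pairs: one K value and one corresponding dE_d value per sample.
K = np.array([5.0, 12.0, 20.0, 31.0, 40.0])
dEd = np.array([6.0, 14.0, 19.0, 33.0, 38.0])

r = np.corrcoef(K, dEd)[0, 1]   # Pearson linear correlation coefficient
print(f"r = {r:.2f}")
```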

Figure 15. Correlations of the deterioration degree index ( \(K\) ) with the dynamic elastic modulus loss rate ( \(\Delta E_{d}\) ) and the weathering index ( F s ) for the sandstone and clay brick samples: (a) correlation between \(K\) and \(\Delta E_{d}\) ; (b) correlation between \(K\) and F s .

The above results show that the dynamic elastic modulus loss rate ( \(\Delta E_{d}\) ) and the weathering index ( F s ) are strongly correlated with the deterioration degree index ( \(K\) ). For the clay brick samples in particular, \(K\) and \(\Delta E_{d}\) vary at similar rates: after the 15th deterioration cycle, the \(K\) and \(\Delta E_{d}\) of the clay bricks were 43.75% and 47.07%, respectively, a relative difference of less than 10%. Compared with obtaining the dynamic elastic modulus loss rate ( \(\Delta E_{d}\) ) from ultrasonic wave velocity measurements, the deterioration degree index ( \(K\) ), which is based on the average drilling resistance and accounts for both the deterioration depth and the degradation of the mechanical properties of the material, reflects the deterioration degree of the samples more directly and accurately.

In addition, the rate of decrease in drilling resistance with the number of deterioration cycles was significantly greater for the sandstone samples than for the clay brick samples (Table 3 ). The clay brick has a high quartz content (more than 60%) and exhibits a high uniaxial compressive strength. Its low content of calcium minerals, such as calcite and dolomite, indicates good resistance toward sulfate attack 57 . The clay brick has relatively high free water absorption (15.56%) and forced water absorption (19.05%); however, its saturation coefficient (the ratio of free to forced water absorption) is 0.82, smaller than the critical value of 0.9, suggesting good resistance to water-related swelling damage 58 . Greater vitrification at higher firing temperatures implies the formation of relatively larger pores, and the clay bricks used in this paper were fired at a high temperature (1100 ℃). Crystallisation pressure is much lower in larger pores, where crystal growth is unrestrained, which indicates good resistance towards salt crystallisation damage 57 . In contrast, the sandstone is composed mainly of fine sand (0.06–0.25 mm), which should dissolve faster than a coarse-grained rock because of its higher reactive surface area 59 . In addition, the sandstone has a high content of calcium minerals such as calcite (> 10%), which accelerates the sulfate erosion process. These factors account for the differences in deterioration rates between the sandstone and the clay brick.

The deterioration depth and the deterioration degree index ( \(K\) ) obtained from drilling resistance tests can be used to determine the optimal consolidation depth and consolidant dosage on-site, enabling more precise conservation and restoration. It is also feasible to investigate more targeted consolidation methods for differently deteriorated parts of the same material; further investigation of the optimal consolidation parameters for different materials at varying deterioration depths and degrees is still needed.

  • Consolidation effectiveness

The drilling resistance-depth profile for the consolidated samples increases in the shallow surface depth range and then converges to coincide with the drilling resistance-depth profile for the unconsolidated samples (as shown in Fig.  12 ).

The drilling depth at which the drilling resistance-depth profiles before and after consolidation begin to coincide is defined as the consolidation depth. The consolidation effectiveness index ( \(R_{c}\) ) is based on comparing the average drilling resistance over the consolidation depth range before and after consolidation. The consolidation depth itself does not directly reflect the consolidation effectiveness (Fig. 14 ). The penetration of the consolidant was not uniform (Fig. 11 ), even though dropwise infiltration with a dropper was used to maximize the uniformity of penetration. The change in material permeability caused by consolidation is a significant factor influencing the consolidation depth, and the mechanisms by which different consolidants alter the permeability of different materials need further investigation.

In addition, after the first and second consolidations with the TEOS solution, the increase in drilling resistance is concentrated in the 1–2 mm surface layer (as shown in Fig. 12 e and f). Similar observations of consolidants concentrating at the surface have been reported by Valentini et al. 52 and may be attributed to insufficient penetration of the consolidant as well as to the evaporation and capillary transport of its volatile components 60 . Furthermore, the porosity of the consolidated material can significantly limit the consolidation depth achieved by a consolidant. If the first consolidation does not fill the majority of the surface pores, the second consolidation will preferentially fill the remaining surface pores, which may prevent any further increase in the consolidation depth. This can be observed for the clay bricks consolidated with the B-72 solution, as illustrated in Fig. 14 . Comparing the variations in \(R_{c}\) , there is only a 2.6% difference between the first and second consolidations when the sandstone samples are consolidated with the B-72 solution, whereas the second consolidation produced a 216.6% increase in \(R_{c}\) over the first when the clay brick samples were consolidated with the B-72 solution. Because the total porosity of the sandstone sample (11.12%) is much lower than that of the clay brick sample (32.35%), the solute filled most of the pores after the first consolidation with 2 ml of B-72 solution, so the drilling resistance varied little in the second consolidation. The clay brick samples, with their higher porosity, exhibited a significant increase in \(R_{c}\) after the first and second consolidations, but their consolidation depths varied minimally.

The drilling resistance increased within the shallow surface layer of the samples consolidated with the PS and B-72 solutions, and the range and magnitude of the increase grew as the dosage of consolidant increased. In contrast, there was no visible change in the drilling resistance-depth profiles of either the sandstone or clay brick samples after consolidation with the TEOS solution; after the second consolidation, a decrease in the drilling resistance was even observed.

The dissociation products of the PS solution lead to electrostatic adsorption of metal cations on the clay particles of the sandstone and clay brick, which can alter the structure of the clay particles and form silico-aluminate reticulated colloids. In addition, the potassium ions of the PS solution exchange and adsorb with particle debris in the sandstone and clay brick, causing the dispersed particles to aggregate into larger agglomerates and form an overall linkage 61 . These processes improve the drilling resistance of the material. Moreover, the PS solution has little effect on the permeability of the consolidated material 42,43 ; hence, the consolidation depth of the PS solution increased significantly after the second consolidation. B-72 is a synthetic resin and polymer material with high strength and a fast curing rate that is widely used to conserve cultural relics 40 . Among the three consolidants, the sandstone and clay brick consolidated with the B-72 solution exhibited the most significant increase in drilling resistance.

It is widely recognized that the siloxane polymer generated by TEOS solutions can strengthen the consolidated material 41 . Through the hydrolysis of alkoxyl groups, TEOS solutions can connect dispersed particles with siloxane chains to consolidate and strengthen deteriorated sandstone and clay brick. However, the TEOS solution in this experiment used anhydrous ethanol as the solvent (Table 2 ). Ethanol is highly volatile at room temperature, and its rapid volatilisation is not conducive to the homogeneous dispersion and infiltration of the TEOS solution 61 . This may be a significant factor in the limited increase in drilling resistance observed after the first consolidation with the TEOS solution. Furthermore, during the second consolidation with the TEOS solution, the drilling resistance decreased and \(R_{c}\) became negative. This may be attributed to the siloxane polymer generated during the first consolidation, which obstructed the pore channels and prevented the second application of TEOS from penetrating further (as evidenced by the almost identical consolidation depths of the two consolidations in Table 4 ). Meanwhile, as the ethanol evaporated, the siloxane polymers were transported toward the material surface, forming a crust weaker than the underlying sandstone and clay brick. This ultimately results in a decrease in drilling resistance at drilling depths of 0–3 mm after the second consolidation and a negative value of \(R_{c}\) .

These results suggest that the \(R_{c}\) based on the average drilling resistance could directly and accurately reflect the difference in consolidation effectiveness between the sandstone and clay brick samples with different consolidant types and dosages, which can provide an empirical reference for masonry relic reinforcement and restoration work.

Based on the micro-drilling resistance method, the drilling resistance of the sandstone and clay brick samples was tested and analysed before and after deterioration, as well as before and after consolidation. A deterioration degree index ( \(K\) ) and a consolidation effectiveness index ( \(R_{c}\) ), both based on the average drilling resistance, were proposed. The following conclusions can be drawn.

In comparison to the undeteriorated samples, a decrease in the drilling resistance was observed in the surface layer of the deteriorated samples, and the range and magnitude of the decrease increased with the number of dry and wet cycles. The deterioration depth can be identified from drilling resistance-depth profiles.

The deterioration degree index ( \(K\) ), based on the average drilling resistance within the deterioration depth, can accurately evaluate the deterioration degree of sandstone and clay brick samples. The deterioration degree index ( \(K\) ) was strongly correlated with the dynamic elastic modulus loss rate ( \(\Delta E_{d}\) ) and the weathering index ( F s ).

The consolidation effectiveness index ( \(R_{c}\) ) can directly and accurately evaluate the consolidation effectiveness of sandstone and clay brick samples with different consolidant types and dosages. The greater the amount of consolidant used, the greater the increase in drilling resistance, although this increase can also be limited by the porosity of the consolidated material.

However, some challenges remain for field applications. For non-homogeneous materials (e.g., mortar, or materials with a heterogeneous constitution containing hard constituents), drilling resistance-depth profiles fluctuate widely, which makes it difficult to define the deterioration depth and the consolidation depth. The relationships between deterioration depth and deterioration degree, and between consolidation depth and consolidation effectiveness, are also not easily quantified. The application of the deterioration degree index ( \(K\) ) and the consolidation effectiveness index ( \(R_{c}\) ) therefore requires further optimization.

Data availability 

Data is provided within the manuscript.

Rahn, P. H. The weathering of tombstones and its relationship to the topography of New England. J. Geol. Educ. 19 , 112–118. https://doi.org/10.5408/0022-1368-XIX.3.112 (1971).

Meierding, T. C. Inscription legibility method for estimating rock weathering rates. Geomorphology 6 , 273–286. https://doi.org/10.1016/0169-555X(93)90051-3 (1993).

Fitzner, B., Heinrichs, K. & Bouchardiere, D. L. Limestone weathering of historical monuments in Cairo Egypt. Geol. Soc. Lond. Spec. Publ. 205 , 217–239. https://doi.org/10.1144/GSL.SP.2002.205.01.17 (2002).

Warke, P. A., Curran, J. M., Turkington, A. V. & Smith, B. J. Condition assessment for building stone conservation: a staging system approach. Build. Environ. 38 , 1113–1123. https://doi.org/10.1016/S0360-1323(03)00085-4 (2003).

Thornbush, M. J. A site-specific index based on weathering forms visible in central Oxford UK. Geosciences 2 , 277–297. https://doi.org/10.3390/geosciences2040277 (2012).

Sousa, L. M. O., del Río, L. M. S., Calleja, L., de Argandoña, V. G. R. & Rey, A. R. Influence of microfractures and porosity on the physico-mechanical properties and weathering of ornamental granites. Eng. Geol. 77 , 153–168. https://doi.org/10.1016/j.enggeo.2004.10.001 (2005).

Ruedrich, J., Knell, C., Enseleit, J., Rieffel, Y. & Siegesmund, S. Stability assessment of marble statuaries of the Schlossbrücke (Berlin, Germany) based on rock strength measurements and ultrasonic wave velocities. Environ. Earth. Sci. 69 , 1451–1469. https://doi.org/10.1007/s12665-013-2246-x (2013).

Fort, R., Buergo, M. A. D. & Perez-Monserrat, E. M. Non-destructive testing for the assessment of granite decay in heritage structures compared to quarry stone. Int. J. Rock Mech. Min. 61 , 296–305. https://doi.org/10.1016/j.ijrmms.2012.12.048 (2013).

Martínez-Martínez, J., Benavente, D., Gomez-Heras, M., Marco-Castaño, L. & García-del-Cura, M. A. Non-linear decay of building stones during freeze-thaw weathering processes. Constr. Build. Mater. 38 , 443–454. https://doi.org/10.1016/j.conbuildmat.2012.07.059 (2013).

Çobanoğlu, İ & Çelik, S. B. Estimation of uniaxial compressive strength from point load strength, Schmidt hardness and P-wave velocity. Bull. Eng. Geol. Environ. 67 , 491–498. https://doi.org/10.1007/s10064-008-0158-x (2008).

Wilhelm, K., Viles, H. & Burke, Ó. Low impact surface hardness testing (Equotip) on porous surfaces - advances in methodology with implications for rock weathering and stone deterioration research. Earth Surf. Proc. Land. 41 , 1027–1038. https://doi.org/10.1002/esp.3882 (2016).

Aoki, H. & Matsukura, Y. Estimating the unconfined compressive strength of intact rocks from Equotip hardness. Bull. Eng. Geol. Environ. 67 , 23–29. https://doi.org/10.1007/s10064-007-0116-z (2008).

Yılmaz, N. G. The influence of testing procedures on uniaxial compressive strength prediction of carbonate rocks from equotip hardness tester (EHT) and proposal of a new testing methodology: hybrid dynamic hardness (HDH). Rock Mech. Rock Eng. 46 , 95–106. https://doi.org/10.1007/s00603-012-0261-y (2012).

Felicetti, R. & Gattesco, N. A penetration test to study the mechanical response of mortar in ancient masonry buildings. Mater. Struct. 31 , 350–356. https://doi.org/10.1007/BF02480678 (1998).

Sun, C. G., Kim, B. H., Park, K. H. & Chung, C. K. Geotechnical comparison of weathering degree and shear wave velocity in the decomposed granite layer in Hongseong. South Korea. Environ. Earth Sci. 74 , 6901–6917. https://doi.org/10.1007/s12665-015-4692-0 (2015).

Chen, X., Qi, X. B. & Xu, Z. Y. Determination of weathered degree and mechanical properties of stone relics with ultrasonic CT: A case study of an ancient stone bridge in China. J. Cult. Herit. 42 , 131–138. https://doi.org/10.1016/j.culher.2019.08.007 (2020).

Ercoli, L., Megna, B., Nocilla, A. & Zimbardo, M. Measure of a limestone weathering degree using laser scanner. Int. J. Archit. Herit. 7 , 591–607. https://doi.org/10.1080/15583058.2012.654893 (2013).

Ministry of Housing and Urban-Rural Development of the People's Republic of China. Technical specification for testing compressive strength of masonry mortar by penetration resistance method (JGJ/T 136–2017). China Architecture Publishing House, Beijing (2017).

Mosquera, M. J., Pozo, J. & Esquivias, L. Stress during drying of two stone consolidants applied in monumental conservation. J. Sol-Gel. Sci. Techn. 26 , 1227–1231. https://doi.org/10.1023/A:1020776622689 (2003).

Karatasios, I., Theoulakis, P., Kalagri, A., Sapalidis, A. & Kilikoglou, V. Evaluation of consolidation treatments of marly limestones used in archaeological monuments. Constr. Build. Mater. 23 , 2803–2812. https://doi.org/10.1016/j.conbuildmat.2009.03.001 (2009).

Sassoni, E., Naidu, S. & Scherer, G. W. The use of hydroxyapatite as a new inorganic consolidant for damaged carbonate stones. J. Cult. Herit. 12 , 346–355. https://doi.org/10.1016/j.culher.2011.02.005 (2011).

Martinho, E., Mendes, M. & Dionísio, A. 3D imaging of P-waves velocity as a tool for evaluation of heat induced limestone decay. Constr. Build. Mater. 135 , 119–128. https://doi.org/10.1016/j.conbuildmat.2016.12.192 (2017).

Molina, E., Fiol, C. & Cultrone, G. Assessment of the efficacy of ethyl silicate and dibasic ammonium phosphate consolidants in improving the durability of two building sandstones from Andalusia (Spain). Environ. Earth. Sci. 77 , 302. https://doi.org/10.1007/s12665-018-7491-6 (2018).

Pamplona, M., Kocher, M., Snethlage, R. & Barros, L. A. Drilling resistance: overview and outlook. Z. Dtsch. Ges. Geowiss. 158 , 665–679. https://doi.org/10.1127/1860-1804/2007/0158-0665 (2007).

Theodoridou, M. & Török, Á. In situ investigation of stone heritage sites for conservation purposes: a case study of the Székesfehérvár Ruin Garden in Hungary. Prog. Earth Planet Sci. 6 , 1–14. https://doi.org/10.1186/s40645-019-0268-z (2019).

Zhang, J. K. et al. Study on weathering characteristic and degree through non/little-destructive methods in site for sandstone cliffside figures. J. Northwest Univ. Nat. Sci. Ed. 51 , 379–389 (2021).

Fonseca, B. S. D., Pinto, A. P. F., Piçarra, S. & Montemor, M. F. Artificial aging route for assessing the potential efficacy of consolidation treatments applied to porous carbonate stones. Mater. Des. 120 , 10–21. https://doi.org/10.1016/j.matdes.2017.02.001 (2017).

Fonseca, B. S. D. et al. On the estimation of marbles weathering by thermal action using drilling resistance. J. Build. Eng. 42 , 102494. https://doi.org/10.1016/j.jobe.2021.102494 (2021).

Pinto, A. P. F. & Rodrigues, J. D. Stone consolidation: the role of treatment procedures. J. Cult. Herit. 9 , 38–53. https://doi.org/10.1016/j.culher.2007.06.004 (2008).

Pinto, A. P. F. & Rodrigues, J. D. Consolidation of carbonate stones: influence of treatment procedures on the strengthening action of consolidants. J. Cult. Herit. 13 , 154–166. https://doi.org/10.1016/j.culher.2011.07.003 (2012).

Raneri, S. et al. Efficiency assessment of hybrid coatings for natural building stones: Advanced and multi-scale laboratory investigation. Constr. Build. Mater. 180 , 412–424. https://doi.org/10.1016/j.conbuildmat.2018.05.289 (2018).

Wang, S. L., Bai, C. B., Xie, L. N., Wang, J. L. & Li, Y. H. A new procedure for desalination and reinforcement of brick cultural relics in typical sulfate environments. Sci. Conserv. Archaeol. 31 , 22–29 (2019).

Ban, M. et al. Distribution depth of stone consolidants applied on-site: Analytical modelling with field and lab cross-validation. Constr. Build. Mater. 259 , 120394. https://doi.org/10.1016/j.conbuildmat.2020.120394 (2020).

Rodrigues, J. D. & Pinto, A. P. F. Laboratory and onsite study of barium hydroxide as a consolidant for high porosity limestones. J. Cult. Herit. 19 , 467–476. https://doi.org/10.1016/j.culher.2015.10.002 (2016).

Rodrigues, J. D. & Pinto, A. P. F. Stone consolidation by biomineralisation. Contribution for a new conceptual and practical approach to consolidate soft decayed limestones. J. Cult. Herit. 39 , 82–92 (2019).

British Standards Institution. Natural stone test methods - Determination of frost resistance (BS EN 12371:2010) . BSI Standards Publication, London, UK, https://doi.org/10.3403/30163225 (2010).

National Cultural Heritage Administration. Code for investigation of the protection engineering of the stone monument (WW/T 0063–2015) (Cultural Relics Publishing House, 2015).

Benavente, D., del Cura, M. A. G., Bernabeu, A. & Ordonez, S. Quantification of salt weathering in porous stones using an experimental continuous partial immersion method. Eng. Geol. 59 , 313–325. https://doi.org/10.1016/S0013-7952(01)00020-5 (2001).

Al-Omari, A., Beck, K., Brunetaud, X. & Al-Mukhtar, M. Weathering of limestone on Al-Ziggurat walls in the ancient Al-Nimrud city (Iraq). Environ. Earth. Sci. 74 , 609–620. https://doi.org/10.1007/s12665-015-4064-9 (2019).

Vaz, M. F., Pires, J. & Carvalho, A. P. Effect of the impregnation treatment with Paraloid B-72 on the properties of old Portuguese ceramic tiles. J. Cult. Herit. 9 , 269–276. https://doi.org/10.1016/j.culher.2008.01.003 (2008).

Franzoni, E., Pigino, B., Leemann, A. & Lura, P. Use of TEOS for fired-clay bricks consolidation. Mater. Struct. 47 , 1175–1184. https://doi.org/10.1617/s11527-013-0120-7 (2014).

Zhao, H. Y. et al. Impact of modulus and concentration of potassium silicate material on consolidating earthen architecture sites in arid region. Chin. J. Rock Mech. Eng. 25 , 557–562 (2006).

Li, Z. X., Zhao, L. Y. & Guo, Q. L. Deterioration of earthen sites and consolidation with PS material along silk road of China. Chin. J. Rock Mech. Eng. 28 , 1047–1054 (2009).

Wang, F.Y. Research on the evaluation of mechanical performance and reinforcement effect of masonry materials based on Drilling Resistance Measurement System (DRMS). Master′s Thesis, China University of Geosciences Beijing, Beijing, China (2023).

Al-Omari, A., Beck, K., Brunetaud, X. & Al-Mukhtar, M. Assessment the stones compatibility based on salt weathering tests. Zanco J. Pure. Appl. Sci. 31 , 75–83 (2019).

British Standards Institution. Natural stone test methods. Determination of resistance to salt crystallisation (BS EN 12370:2020). BSI Standards Publication, London https://doi.org/10.3403/30386983 (2020).

Tiano, P., Rodrigues, J. D., Witte, E. D. & Vergès-Belmin, V. The conservation of monuments: A new method to evaluate consolidating treatments. Restor. Build. Monum. 6 , 133–150. https://doi.org/10.1515/rbm-2000-5461 (2000).

Exadaktylos, G., Tiano, P. & Filareto, C. Validation of a model of rotary drilling of rocks with the drilling force measurement system. Restor. Build. Monum. 6 , 307–340. https://doi.org/10.1515/rbm-2000-5478 (2000).

Dumitrescu, T. F., Pesce, G. L. A. & Ball, R. J. Optimization of drilling resistance measurement (DRM) user-controlled variables. Mater. Struct. 50 , 243. https://doi.org/10.1617/s11527-017-1113-8 (2017).

Mudhukrishnan, M., Hariharan, P. & Palanikumar, K. Measurement and analysis of thrust force and delamination in drilling glass fiber reinforced polypropylene composites using different drills. Measurement 149 , 106973. https://doi.org/10.1016/j.measurement.2019.106973 (2020).

Pinto, A.P.F. Conservação de pedras carbonatadas: Estudo e selecção de tratamentos. Tese apresentada para obtenção do grau de doutoramento, IST, Lisboa (2002).

Valentini, E., Benincasa, A., Tiano, P., Fratini, F., Rescic, S. On site drilling resistance profiles of natural stones , ICVBC: Florence, Italy, https://api.semanticscholar.org/CorpusID:216125772 (2008).

Rodrigues, J. D. & Costa, D. A new interpretation methodology for microdrilling data from soft mortars. J. Cult. Herit. 22 , 951–955. https://doi.org/10.1016/j.culher.2016.06.010 (2016).

Fernandes, F. & Lourenço, P. B. Evaluation of the compressive strength of ancient clay bricks using microdrilling. J. Mater. Civil. Eng. 19 , 791–800. https://doi.org/10.1061/(ASCE)0899-1561(2007)19:9(791) (2007).

Benavente, D., Fort, R. & Gomez-Heras, M. Improving uniaxial compressive strength estimation of carbonate sedimentary rocks by combining minimally invasive and non-destructive techniques. Int. J. Rock Mech. Min. 147 , 104915. https://doi.org/10.1016/j.ijrmms.2021.104915 (2021).

Costa, D., Magalhães, A. & do Rosário Veiga, M. Characterisation of mortars using drilling resistance measurement system (DRMS): Tests on field panels samples. Springer 7 , 413–423 (2012).

Beckingham, L. E. et al. Evaluation of mineral reactive surface area estimates for prediction of reactivity of a multi-mineral sediment. Geochim. Cosmochim. Ac. 188 , 310–329. https://doi.org/10.1016/j.gca.2016.05.040 (2016).

Liu, J. B. & Zhang, Z. J. Characteristics and weathering mechanisms of the traditional Chinese blue brick from the ancient city of Ping Yao. R. Soc. Open Sci. 7 , 200058. https://doi.org/10.1098/rsos.200058 (2020).

Article   ADS   CAS   PubMed   PubMed Central   Google Scholar  

Manohar, S. & Santhanam, M. Correlation between physical-mineralogical properties and weathering resistance using characterisation case studies in historic Indian bricks. Int. J. Archit. Herit. 16 , 667–680. https://doi.org/10.1080/15583058.2020.1833108 (2022).

Franzoni, E., Graziani, G. & Sassoni, E. TEOS-based treatments for stone consolidation: acceleration of hydrolysis–condensation reactions by poulticing. J. Sol-Gel Sci. Technol. 74 , 398–405. https://doi.org/10.1023/B:SILC.0000025602.64965.e7 (2015).

Shao, M.S. Impact of PS on permeability of unsaturated ruins clay. PhD Thesis, Lanzhou University, Lanzhou, China (2010).

Download references

Acknowledgements

This research was funded by the National Natural Science Foundation of China (No. 42272336).

Author information

Authors and affiliations

School of Engineering and Technology, China University of Geosciences (Beijing), Beijing, 100083, China

Qiong Zhang, Guoxiang Yang, Zhongjian Zhang & Feiyue Wang


Contributions

Z.Z.J., Y.G.X., Z.Q., and W.F.Y. conceived and designed the study and carried out the field investigation. Z.Q. and W.F.Y. carried out the laboratory work and analysed the data. Z.Q. and W.F.Y. wrote the original manuscript. Z.Z.J. and Y.G.X. critically revised the manuscript.

Corresponding authors

Correspondence to Guoxiang Yang or Zhongjian Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article

Zhang, Q., Yang, G., Zhang, Z. et al. Evaluation of deterioration degree and consolidation effectiveness in sandstone and clay brick materials based on the micro-drilling resistance method. Sci Rep 14, 20693 (2024). https://doi.org/10.1038/s41598-024-71820-6


Received: 01 July 2024

Accepted: 30 August 2024

Published: 05 September 2024

DOI: https://doi.org/10.1038/s41598-024-71820-6


Keywords

  • Drilling resistance
  • Quantitative evaluation
  • Sandstone and clay brick




COMMENTS

  1. PDF A Guide to Writing a Program Evaluation Plan

    recommended evaluation plan components: 1. Program Description: Setting context for the evaluation plan including, program mission, vision, listing of program goals and objectives, network history and members. 2. Evaluation Design: Describing the purpose and method of evaluation. 3. Plan to Measure Key Data: Selecting key process and outcome ...

  2. PDF Program Evaluation Toolkit: Quick Start Guide

    Program Evaluation Toolkit: Quick Start Guide. Joshua Stewart, Jeanete Joyce, Mckenzie Haines, David Yanoski, Douglas Gagnon, Kyle Luke, Christopher Rhoads, and Carrie Germeroth, October 2021. Program evaluation is important for assessing the implementation and outcomes of local, state, and federal programs.

  3. PDF What is program evaluation?

    How does program evaluation answer questions about whether a program works, or how to improve it? Basically, program evaluations systematically collect and analyze data about program activities and outcomes. The purpose of this guide is to briefly describe the methods used in the systematic collection and use of data.

  4. Section 1. A Framework for Program Evaluation: A Gateway to Tools

    Program evaluation - the type of evaluation discussed in this section - is an essential organizational practice for all types of community health and development work. It is a way to evaluate the specific projects and activities community groups may take part in, rather than to evaluate an entire organization or comprehensive community ...

  5. Program Evaluation Guide

    Evaluation should be practical and feasible and conducted within the confines of resources, time, and political context. Moreover, it should serve a useful purpose, be conducted in an ethical manner, and produce accurate findings. Evaluation findings should be used both to make decisions about program implementation and to improve program ...

  6. Framework for Program Evaluation

    The Framework for Evaluation in Public Health guides public health professionals in their use of program evaluation. It is a practical, nonprescriptive tool, designed to summarize and organize essential elements of program evaluation. Adhering to the steps and standards of this framework will allow an understanding of each program's context ...

  7. Evaluation Development Tools

    Source: Centers for Disease Control and Prevention, Office of Policy, Performance, and Evaluation. Program evaluation is an essential organizational practice in public health. At CDC, program evaluation supports our agency priorities.

  8. PDF Key Concepts and Issues in Program Evaluation and Performance Measurement

    ...implementing, and publicly reporting the results of performance measurement. Information from program evaluations and performance measurement systems is expected to play a role in the way managers manage their programs. Changes to improve program operations and efficiency and effectiveness are expected to be driven by ...

  9. Program Evaluation: a Plain English Guide

    Sources. Cross, D. (2015) Program Evaluation: a Plain English Guide. Grosvenor Management Consulting. This 11-step guide defines program evaluation, what it is used for, the different types and when they should be used. Also covered is how to plan a program evaluation, monitor performance, communicate findings, deliver bad news, and put imp.

  10. PDF Basic Guide to Program Evaluation

    2. Improve delivery mechanisms to be more efficient and less costly - Over time, product or service delivery ends up being an inefficient collection of activities that are less efficient and more costly than need be. Evaluations can identify program strengths and weaknesses to improve the program. ...

  11. The Comprehensive Guide to Program Evaluation

    Here are the key steps to designing a program evaluation plan: Define the program: Clearly define the program being evaluated, including its goals, objectives, activities, inputs, outputs, and outcomes. Develop a logic model that visually represents how the program is intended to work. (A minimal code sketch of such a logic model appears after this list.)

  12. REL Resource

    Overview. The Program Evaluation Toolkit presents a step-by-step process for conducting your own program evaluation. The Quick Start Guide will help you decide if you are ready to use this toolkit and where to start. Program evaluation is important for assessing the implementation and outcomes of local, state, and federal programs.

  13. Understanding Evaluation Methodologies: M&E Methods and ...

    Program evaluation methodologies encompass a diverse set of approaches and techniques used to assess the effectiveness, efficiency, and impact of programs and interventions. These methodologies provide systematic frameworks for collecting, analyzing, and interpreting data to determine the extent to which program objectives are being met and to ...

  14. PDF PROGRAM EVALUATION PLAN

    A. Program Description
    B. Evaluation Needs
    III. Evaluation Purpose & Focus
      A. Program Logic Model
      B. Stakeholder Identification
    IV. Evaluation Design
      A. Evaluation Measures
      B. Evaluation Management Plan
        i. Evaluation Timeline
        ii. Evaluation Budget
    V. Findings and Recommendations
    VI. References
    VII. Appendices
      A. Samples of Evaluation Measures
        A1. ...

  15. Program Evaluation for Health Professionals: What It Is, What It Isn't

    We propose that health professionals are ideally positioned to both contribute to and lead program evaluation activities. In order for health professionals to be effective in this role, they require appropriate understanding, knowledge, skills, and confidence around program evaluation (Dickinson & Adams, 2012; Taylor-Powell & Boyd, 2008). However, in our experience program evaluation is often ...

  16. Plan for Program Evaluation from the Start

    Sample Logic Model (figure forthcoming). The evaluation plan should develop goals for future evaluations and questions these evaluations should answer. This information will drive decisions on what data will be needed and how to collect them. For example, stakeholders may be interested in the extent to which the program was implemented as planned.

  17. PDF Program Evaluation Methods

    From the point of view of evaluation methods, two groups of evaluation issues can be usefully distinguished. First, there are issues related to the theory and structure of the program, the program's rationale and possible alternatives. Consider, for example, an industrial assistance program where the government gives grants on a

  18. PDF Qualitative Research Methods in Program Evaluation ...

    Typically gathered in the field, that is, the setting being studied, qualitative data used for program evaluation are obtained from three sources (Patton, 2002): In-depth interviews that use open-ended questions: "Interviews" include both one-on-one interviews and focus groups.

  19. Program Evaluation Guide

    Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide. Now that you have developed a logic model, chosen an evaluation focus, and selected your evaluation questions, your next task is to gather the evidence. Gathering evidence for an evaluation resembles gathering evidence for any research or data-oriented project ...

  20. PDF Qualitative Approaches to Program Evaluation

    Qualitative research gathers participants' experiences, perceptions, and behaviors through open-ended research questions. In evaluations, qualitative research can provide participant-informed insights into the implementation, outcomes, and impacts of programs. Using qualitative approaches, evaluators can incorporate the unique perspectives of ...

  21. Sampling and Evaluation

    Sampling and Evaluation - A Guide to Sampling for Program Impact Evaluation. Author(s): Lance P, Hattori A. Year: 2016. Abstract: Program evaluation, or impact evaluation, is a way to get an accurate understanding of the extent to which a health program causes ... (A hedged sample-size sketch for impact evaluation appears after this list.)

  22. Using mixed methods and partnership to develop a program evaluation

    Background: The purpose of this paper is to report on the process for developing an online RE-AIM evaluation toolkit in partnership with organizations that provide physical activity programming for persons with disabilities. Methods: A community-university partnership was established and guided by an integrated knowledge translation approach. The four-step development process included: (1 ...

  23. Quantitative classification evaluation model for tight sandstone

    The first is the traditional classification and evaluation method, which directly uses indicators such as the lithology, physical properties, pore structure, sedimentary facies, and oil and ...

  24. A pathology foundation model for cancer diagnosis and prognosis

    Histopathology image evaluation is indispensable for cancer diagnoses and subtype classification. Standard artificial intelligence methods for histopathology image analyses have focused on ...

  25. Evaluating coupling coordination between urban smart performance and

    Evaluation of smart city: contents, methods, and subjects. The evaluation of smart cities is a central research area within the smart city development field.

  26. Targeting emotion dysregulation in depression: an intervention mapping

    Intervention mapping protocol. This study mapped out the process of development based on IM, a program-planning framework. IM provides a step-by-step process for planning theory/evidence-based interventions from the needs to potential methods addressing those needs [20, 21].Since its development in the healthcare field in 1998, IM has been widely used and applications have emerged in other ...

  27. Program Evaluation Guide

    Step 3: Focus the Evaluation Design. Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide. After completing Steps 1 and 2, you and your stakeholders should have a clear understanding of the program and have reached consensus. Now your evaluation team will need to focus the evaluation.

  28. "Reference sample comparison method": A new voltammetric electronic

    • Reference sample comparison method for voltammetric electronic tongue was designed.
    • A "one-to-one" shelf life model was established based on the Dd values.
    • Dd algorithm was introduced for data compression of voltammetric electronic tongue.
    • "One-to-one" model enables the evaluation of microbial and sensory shelf life.

  29. Innovation Fund Round 1 (2023) Research and Development, Testing and

    These testing methods will assess the interoperability, performance, and/or security of networks. Testing and Evaluation: The T&E focus area awards grants to proposals that streamline testing and evaluation across the U.S. Advanced access to affordable T&E lowers the barriers to entry for new and emerging entities, like small companies, start-ups ...

  30. Evaluation of deterioration degree and consolidation effectiveness in

    The quick and accurate measurement and evaluation of the deterioration degree and consolidation effectiveness on the surface of masonry relics is valuable for disease investigation and restoration ...
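As referenced in item 11 above, here is a minimal sketch of how the "define the program and develop a logic model" step might be captured in code. It is purely illustrative: the class name, field names, and the after-school tutoring example are hypothetical and are not taken from any of the resources listed above.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class LogicModel:
        """A hypothetical, minimal representation of a program logic model."""
        program: str
        inputs: List[str] = field(default_factory=list)      # resources invested in the program
        activities: List[str] = field(default_factory=list)  # what the program actually does
        outputs: List[str] = field(default_factory=list)     # direct, countable products of activities
        outcomes: List[str] = field(default_factory=list)    # short- and medium-term changes
        impacts: List[str] = field(default_factory=list)     # long-term changes the program aims for

        def summary(self) -> str:
            """Render the model as a simple chain, one component per line."""
            parts = [
                ("Inputs", self.inputs),
                ("Activities", self.activities),
                ("Outputs", self.outputs),
                ("Outcomes", self.outcomes),
                ("Impacts", self.impacts),
            ]
            return "\n".join(f"{name}: {', '.join(items)}" for name, items in parts)

    # Hypothetical example: an after-school tutoring program.
    model = LogicModel(
        program="After-school tutoring",
        inputs=["two staff tutors", "classroom space", "grant funding"],
        activities=["weekly tutoring sessions", "monthly parent check-ins"],
        outputs=["120 sessions delivered", "40 students served"],
        outcomes=["improved homework completion", "higher reading scores"],
        impacts=["improved graduation rates"],
    )
    print(model.summary())

Representing the logic model as data rather than only as a diagram makes it easy to check later that every listed outcome has at least one planned measure in the evaluation design.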
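As a companion to the sampling guide in item 21 above, the following sketch shows one common calculation made when planning an impact evaluation sample: the approximate number of respondents needed per group to detect a difference between two proportions. The 30% versus 40% prevalences, 95% confidence level, and 80% power are made-up planning assumptions for illustration, not values from the cited guide.

    # Illustrative sample-size calculation for comparing two proportions
    # (e.g., outcome prevalence in program vs. comparison communities).
    # All parameter values are hypothetical planning assumptions.
    import math
    from statistics import NormalDist

    def sample_size_two_proportions(p1: float, p2: float,
                                    alpha: float = 0.05, power: float = 0.80) -> int:
        """Approximate respondents per group for a two-sided test of p1 vs. p2."""
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
        z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        n = ((z_alpha + z_beta) ** 2) * variance / (p1 - p2) ** 2
        return math.ceil(n)

    # Hypothetical example: 30% prevalence expected without the program,
    # 40% expected with it.
    print(sample_size_two_proportions(0.30, 0.40))  # about 354 respondents per group

Sampling guides typically go on to adjust this kind of calculation for design effects from cluster sampling and for expected non-response, both of which can raise the required sample considerably.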