
What Is a Case Study? | Definition, Examples & Methods

Published on May 8, 2019 by Shona McCombes. Revised on November 20, 2023.

A case study is a detailed study of a specific subject, such as a person, group, place, event, organization, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research.

A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used. Case studies are good for describing, comparing, evaluating, and understanding different aspects of a research problem.

When to do a case study

A case study is an appropriate research design when you want to gain concrete, contextual, in-depth knowledge about a specific real-world subject. It allows you to explore the key characteristics, meanings, and implications of the case.

Case studies are often a good choice in a thesis or dissertation. They keep your project focused and manageable when you don’t have the time or resources to do large-scale research.

You might use just one complex case study where you explore a single subject in depth, or conduct multiple case studies to compare and illuminate different aspects of your research problem.


Step 1: Select a case

Once you have developed your problem statement and research questions, you should be ready to choose the specific case that you want to focus on. A good case study should have the potential to:

  • Provide new or unexpected insights into the subject
  • Challenge or complicate existing assumptions and theories
  • Propose practical courses of action to resolve a problem
  • Open up new directions for future research

Tip: If your research is more practical in nature and aims to simultaneously investigate an issue as you solve it, consider conducting action research instead.

Unlike quantitative or experimental research, a strong case study does not require a random or representative sample. In fact, case studies often deliberately focus on unusual, neglected, or outlying cases that may shed new light on the research problem.

Example of an outlying case study: In the 1960s, the town of Roseto, Pennsylvania, was discovered to have extremely low rates of heart disease compared to the US average. It became an important case study for understanding previously neglected causes of heart disease.

However, you can also choose a more common or representative case to exemplify a particular category, experience or phenomenon.

Example of a representative case study: In the 1920s, two sociologists used Muncie, Indiana, as a case study of a typical American city that supposedly exemplified the changing culture of the US at the time.

Step 2: Build a theoretical framework

While case studies focus more on concrete details than general theories, they should usually have some connection with theory in the field. This way the case study is not just an isolated description, but is integrated into existing knowledge about the topic. It might aim to:

  • Exemplify a theory by showing how it explains the case under investigation
  • Expand on a theory by uncovering new concepts and ideas that need to be incorporated
  • Challenge a theory by exploring an outlier case that doesn’t fit with established assumptions

To ensure that your analysis of the case has a solid academic grounding, you should conduct a literature review of sources related to the topic and develop a theoretical framework. This means identifying key concepts and theories to guide your analysis and interpretation.

Step 3: Collect your data

There are many different research methods you can use to collect data on your subject. Case studies tend to focus on qualitative data using methods such as interviews, observations, and analysis of primary and secondary sources (e.g., newspaper articles, photographs, official records). Sometimes a case study will also collect quantitative data.

Example of a mixed methods case study: For a case study of a wind farm development in a rural area, you could collect quantitative data on employment rates and business revenue, collect qualitative data on local people’s perceptions and experiences, and analyze local and national media coverage of the development.

The aim is to gain as thorough an understanding as possible of the case and its context.

Step 4: Describe and analyze the case

In writing up the case study, you need to bring together all the relevant aspects to give as complete a picture as possible of the subject.

How you report your findings depends on the type of research you are doing. Some case studies are structured like a standard scientific paper or thesis, with separate sections or chapters for the methods, results, and discussion.

Others are written in a more narrative style, aiming to explore the case from various angles and analyze its meanings and implications (for example, by using textual analysis or discourse analysis).

In all cases, though, make sure to give contextual details about the case, connect it back to the literature and theory, and discuss how it fits into wider patterns or debates.


McCombes, S. (2023, November 20). What Is a Case Study? | Definition, Examples & Methods. Scribbr. Retrieved April 17, 2024, from https://www.scribbr.com/methodology/case-study/


Research Method


Case Study – Methods, Examples and Guide


Case Study Research

A case study is a research method that involves an in-depth examination and analysis of a particular phenomenon or case, such as an individual, organization, community, event, or situation.

It is a qualitative research approach that aims to provide a detailed and comprehensive understanding of the case being studied. Case studies typically involve multiple sources of data, including interviews, observations, documents, and artifacts, which are analyzed using various techniques, such as content analysis, thematic analysis, and grounded theory. The findings of a case study are often used to develop theories, inform policy or practice, or generate new research questions.

Types of Case Study

The main types of case study are as follows:

Single-Case Study

A single-case study is an in-depth analysis of a single case. This type of case study is useful when the researcher wants to understand a specific phenomenon in detail.

For example, a researcher might conduct a single-case study on a particular individual to understand their experiences with a particular health condition, or on a specific organization to explore its management practices. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of a single-case study are often used to generate new research questions, develop theories, or inform policy or practice.

Multiple-Case Study

A multiple-case study involves the analysis of several cases that are similar in nature. This type of case study is useful when the researcher wants to identify similarities and differences between the cases.

For example, a researcher might conduct a multiple-case study on several companies to explore the factors that contribute to their success or failure. The researcher collects data from each case, compares and contrasts the findings, and uses various techniques to analyze the data, such as comparative analysis or pattern-matching. The findings of a multiple-case study can be used to develop theories, inform policy or practice, or generate new research questions.

Exploratory Case Study

An exploratory case study is used to explore a new or understudied phenomenon. This type of case study is useful when the researcher wants to generate hypotheses or theories about the phenomenon.

For example, a researcher might conduct an exploratory case study on a new technology to understand its potential impact on society. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as grounded theory or content analysis. The findings of an exploratory case study can be used to generate new research questions, develop theories, or inform policy or practice.

Descriptive Case Study

A descriptive case study is used to describe a particular phenomenon in detail. This type of case study is useful when the researcher wants to provide a comprehensive account of the phenomenon.

For example, a researcher might conduct a descriptive case study on a particular community to understand its social and economic characteristics. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of a descriptive case study can be used to inform policy or practice or generate new research questions.

Instrumental Case Study

An instrumental case study examines a particular case mainly to gain insight into a broader issue or to refine a theory. The case itself is of secondary interest: it serves as the instrument for understanding something beyond it.

For example, a researcher might conduct an instrumental case study on a particular policy to understand how policies of that kind contribute to a broader goal, such as reducing poverty. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of an instrumental case study can be used to inform policy or practice or generate new research questions.

Case Study Data Collection Methods

Here are some common data collection methods for case studies:

Interviews

Interviews involve asking questions to individuals who have knowledge or experience relevant to the case study. Interviews can be structured (where the same questions are asked to all participants) or unstructured (where the interviewer follows up on the responses with further questions). Interviews can be conducted in person, over the phone, or through video conferencing.

Observations

Observations involve watching and recording the behavior and activities of individuals or groups relevant to the case study. Observations can be participant (where the researcher actively participates in the activities) or non-participant (where the researcher observes from a distance). Observations can be recorded using notes, audio or video recordings, or photographs.

Documents

Documents can be used as a source of information for case studies. Documents can include reports, memos, emails, letters, and other written materials related to the case study. Documents can be collected from the case study participants or from public sources.

Surveys

Surveys involve asking a set of questions to a sample of individuals relevant to the case study. Surveys can be administered in person, over the phone, through mail or email, or online. Surveys can be used to gather information on attitudes, opinions, or behaviors related to the case study.

Artifacts

Artifacts are physical objects relevant to the case study. Artifacts can include tools, equipment, products, or other objects that provide insights into the case study phenomenon.

How to Conduct Case Study Research

Conducting case study research involves several steps that help ensure the quality and rigor of the study:

  • Define the research questions: The first step is to define the research questions. They should be specific, measurable, and relevant to the phenomenon under investigation.
  • Select the case: The next step is to select the case or cases to be studied. The case should be relevant to the research questions and should provide rich and diverse data that can be used to answer the research questions.
  • Collect data: Data can be collected using various methods, such as interviews, observations, documents, surveys, and artifacts. The data collection method should be selected based on the research questions and the nature of the case study phenomenon.
  • Analyze the data: The data collected from the case study should be analyzed using various techniques, such as content analysis, thematic analysis, or grounded theory. The analysis should be guided by the research questions and should aim to provide insights and conclusions relevant to the research questions.
  • Draw conclusions: The conclusions drawn from the case study should be based on the data analysis and should be relevant to the research questions. The conclusions should be supported by evidence and should be clearly stated.
  • Validate the findings: The findings of the case study should be validated by reviewing the data and the analysis with participants or other experts in the field. This helps to ensure the validity and reliability of the findings.
  • Write the report: The final step is to write the report of the case study research. The report should provide a clear description of the case study phenomenon, the research questions, the data collection methods, the data analysis, the findings, and the conclusions. The report should be written in a clear and concise manner and should follow the guidelines for academic writing.

Examples of Case Study

Here are some examples of case study research:

  • The Hawthorne Studies: Conducted between 1924 and 1932, the Hawthorne Studies were a series of workplace studies associated with Elton Mayo and his colleagues that examined the impact of the work environment on employee productivity. The studies took place at the Western Electric Company’s Hawthorne Works near Chicago and included interviews, observations, and experiments.
  • The Stanford Prison Experiment: Conducted in 1971 by Philip Zimbardo, the Stanford Prison Experiment examined the psychological effects of power and authority. The study simulated a prison environment and assigned participants to the role of guard or prisoner. Although designed as an experiment, it is often analyzed as a case, and it remains controversial because of the ethical issues it raised.
  • The Challenger Disaster: The explosion of the Space Shuttle Challenger in 1986 has been the subject of case studies examining its causes. These studies drew on interviews, observations, and analysis of data to identify the technical, organizational, and cultural factors that contributed to the disaster.
  • The Enron Scandal: The Enron Corporation’s bankruptcy in 2001 has been the subject of case studies examining its causes. These studies drew on interviews, analysis of financial data, and review of documents to identify the accounting practices, corporate culture, and ethical issues that led to the company’s downfall.
  • The Fukushima Nuclear Disaster: The 2011 accident at the Fukushima Daiichi Nuclear Power Plant in Japan has been the subject of case studies examining its causes. These studies drew on interviews, analysis of data, and review of documents to identify the technical, organizational, and cultural factors that contributed to the disaster.

Application of Case Study

Case studies have a wide range of applications across various fields and industries. Here are some examples:

Business and Management

Case studies are widely used in business and management to examine real-life situations and develop problem-solving skills. Case studies can help students and professionals to develop a deep understanding of business concepts, theories, and best practices.

Healthcare

Case studies are used in healthcare to examine patient care, treatment options, and outcomes. Case studies can help healthcare professionals to develop critical thinking skills, diagnose complex medical conditions, and develop effective treatment plans.

Education

Case studies are used in education to examine teaching and learning practices. Case studies can help educators to develop effective teaching strategies, evaluate student progress, and identify areas for improvement.

Social Sciences

Case studies are widely used in social sciences to examine human behavior, social phenomena, and cultural practices. Case studies can help researchers to develop theories, test hypotheses, and gain insights into complex social issues.

Law and Ethics

Case studies are used in law and ethics to examine legal and ethical dilemmas. Case studies can help lawyers, policymakers, and ethical professionals to develop critical thinking skills, analyze complex cases, and make informed decisions.

Purpose of Case Study

The purpose of a case study is to provide a detailed analysis of a specific phenomenon, issue, or problem in its real-life context. A case study is a qualitative research method that involves the in-depth exploration and analysis of a particular case, which can be an individual, group, organization, event, or community.

The primary purpose of a case study is to generate a comprehensive and nuanced understanding of the case, including its history, context, and dynamics. Case studies can help researchers to identify and examine the underlying factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and detailed understanding of the case, which can inform future research, practice, or policy.

Case studies can also serve other purposes, including:

  • Illustrating a theory or concept: Case studies can be used to illustrate and explain theoretical concepts and frameworks, providing concrete examples of how they can be applied in real-life situations.
  • Developing hypotheses: Case studies can help to generate hypotheses about the causal relationships between different factors and outcomes, which can be tested through further research.
  • Providing insight into complex issues: Case studies can provide insights into complex and multifaceted issues, which may be difficult to understand through other research methods.
  • Informing practice or policy: Case studies can be used to inform practice or policy by identifying best practices, lessons learned, or areas for improvement.

Advantages of Case Study Research

There are several advantages of case study research, including:

  • In-depth exploration: Case study research allows for a detailed exploration and analysis of a specific phenomenon, issue, or problem in its real-life context. This can provide a comprehensive understanding of the case and its dynamics, which may not be possible through other research methods.
  • Rich data: Case study research can generate rich and detailed data, including qualitative data such as interviews, observations, and documents. This can provide a nuanced understanding of the case and its complexity.
  • Holistic perspective: Case study research allows for a holistic perspective of the case, taking into account the various factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and comprehensive understanding of the case.
  • Theory development: Case study research can help to develop and refine theories and concepts by providing empirical evidence and concrete examples of how they can be applied in real-life situations.
  • Practical application: Case study research can inform practice or policy by identifying best practices, lessons learned, or areas for improvement.
  • Contextualization: Case study research takes into account the specific context in which the case is situated, which can help to understand how the case is influenced by the social, cultural, and historical factors of its environment.

Limitations of Case Study Research

There are several limitations of case study research, including:

  • Limited generalizability: Case studies are typically focused on a single case or a small number of cases, which limits the generalizability of the findings. The unique characteristics of the case may not be applicable to other contexts or populations, which may limit the external validity of the research.
  • Biased sampling: Case studies may rely on purposive or convenience sampling, which can introduce bias into the sample selection process. This may limit the representativeness of the sample and the generalizability of the findings.
  • Subjectivity: Case studies rely on the interpretation of the researcher, which can introduce subjectivity into the analysis. The researcher’s own biases, assumptions, and perspectives may influence the findings, which may limit the objectivity of the research.
  • Limited control: Case studies are typically conducted in naturalistic settings, which limits the control that the researcher has over the environment and the variables being studied. This may limit the ability to establish causal relationships between variables.
  • Time-consuming: Case studies can be time-consuming to conduct, as they typically involve a detailed exploration and analysis of a specific case. This may limit the feasibility of conducting multiple case studies or conducting case studies in a timely manner.
  • Resource-intensive: Case studies may require significant resources, including time, funding, and expertise. This may limit the ability of researchers to conduct case studies in resource-constrained settings.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


What the Case Study Method Really Teaches

  • Nitin Nohria


Seven meta-skills that stick even if the cases fade from memory.

It’s been 100 years since Harvard Business School began using the case study method. Beyond teaching specific subject matter, the case study method excels at instilling meta-skills in students. This article explains the importance of seven such skills: preparation, discernment, bias recognition, judgement, collaboration, curiosity, and self-confidence.

During my decade as dean of Harvard Business School, I spent hundreds of hours talking with our alumni. To enliven these conversations, I relied on a favorite question: “What was the most important thing you learned from your time in our MBA program?”

  • Nitin Nohria is the George F. Baker Jr. Professor at Harvard Business School and the former dean of HBS.


  • Research article
  • Open access
  • Published: 25 October 2021

Evaluating complex interventions in context: systematic, meta-narrative review of case study approaches

  • Sara Paparini 1,
  • Chrysanthi Papoutsi 1,
  • Jamie Murdoch 2,
  • Judith Green 3,
  • Mark Petticrew 4,
  • Trisha Greenhalgh 1 (ORCID: orcid.org/0000-0003-2369-8088) &
  • Sara E. Shaw 1 (ORCID: orcid.org/0000-0002-7014-4793)

BMC Medical Research Methodology volume 21, Article number: 225 (2021)


Background

There is a growing need for methods that acknowledge and successfully capture the dynamic interaction between context and implementation of complex interventions. Case study research has the potential to provide such understanding, enabling in-depth investigation of the particularities of phenomena. However, there is limited guidance on how and when to best use different case study research approaches when evaluating complex interventions. This study aimed to review and synthesise the literature on case study research across relevant disciplines, and determine relevance to the study of contextual influences on complex interventions in health systems and public health research.

Methods

Systematic meta-narrative review of the literature comprising (i) a scoping review of seminal texts (n = 60) on case study methodology and on context, complexity and interventions, (ii) detailed review of empirical literature on case study, context and complex interventions (n = 71), and (iii) identifying and reviewing ‘hybrid papers’ (n = 8) focused on the merits and challenges of case study in the evaluation of complex interventions.

Results

We identified four broad (and to some extent overlapping) research traditions, all using case study in a slightly different way and with different goals: 1) developing and testing complex interventions in healthcare; 2) analysing change in organisations; 3) undertaking realist evaluations; 4) studying complex change naturalistically. Each tradition conceptualised context differently: respectively as the backdrop to, or factors impacting on, the intervention; sets of interacting conditions and relationships; circumstances triggering intervention mechanisms; and socially structured practices. Overall, these traditions drew on a small number of case study methodologists and disciplines. Few studies problematised the nature and boundaries of ‘the case’ and ‘context’ or considered the implications of such conceptualisations for methods and knowledge production.

Conclusions

Case study research on complex interventions in healthcare draws on a number of different research traditions, each with different epistemological and methodological preferences. The approach used and consequences for knowledge produced often remains implicit. This has implications for how researchers, practitioners and decision makers understand, implement and evaluate complex interventions in different settings. Deeper engagement with case study research as a methodology is strongly recommended.


There is growing interest in methodological approaches that support meaningful evaluation of complex interventions in health care [1, 2, 3], offer to address issues of causality in complex systems [4, 5] and grapple with the thorny issue of what counts as ‘context’ and what as ‘intervention’. Case study research focuses on in-depth explorations of complex phenomena in their natural, or real-life, settings [6], enabling dynamic understanding of complexity, and surfacing the different logics underpinning causal inferences. While there is wide variation in case study research and its implementation, this approach can provide vital evidence for those concerned with internal and external validity and the likely effects of complex interventions across different settings. However, there is currently limited information about how the diversity of available case study research approaches can support implementation and evaluation of complex interventions in health care [7].

To address a recognised lack of clarity on how researchers should conduct and report empirical case studies [7], and especially to address the knotty problem of how context should be understood and operationalised in such studies [6], we undertook a systematic meta-narrative literature review. This was part of the Triple C (Case study, Context and Complex interventions) study that aims to develop guidance and standards for reporting case study research into the influence of context in complex health interventions. We begin by summarising approaches used in evaluating complex interventions, and setting out the principles and methods of meta-narrative review. We then present four research traditions, each comprising a meta-narrative (that is, an unfolding story of empirical research and the underpinning assumptions and theory), relating to case study research on context and complex interventions, arguing that those involved in intervention evaluation need to make explicit and transparent choices about the type/s of case study on which their research draws. Doing so will increase understanding of the knowledge produced and potential for transferability of findings.

Approaches to understanding and evaluating complex interventions

The current interest in case study research represents a shift away from studies of complex interventions that involve a standardised sequence of developing a structured, multi-component intervention, testing it in a RCT [8] and following a somewhat prescriptive approach to implementation. This well-established approach conceptualised complexity as residing in interventions that consisted of multiple components acting independently and inter-dependently, making it difficult to identify the ‘active ingredient’ [9] leading to intervention effects. In the UK, this approach formed the basis of the Medical Research Council’s (MRC) 2000 framework for the development and testing of complex interventions [9] and, later, guidance on conducting process evaluations [10].

Ways of conceptualising, developing, implementing and evaluating complex interventions have since shifted significantly, in terms of where the complexity is assumed to lie (from the intervention to the system to the interaction between the two [11, 12]), and how best to study it (from the RCT to a more pluralistic approach that gives appropriate methodological weight to real-world case studies [4, 6, 13]). In public health and health services research, it is now widely accepted that evaluating complex interventions requires a wide range of evaluative evidence, particularly where RCTs and quasi-experimental studies are either not feasible or inappropriate. Many of the critiques of established research designs are linked to the challenge of ‘context’, which is crucial to understanding intervention effects in particular settings [14] but often brings ‘noise’ and uncertainty and so is often controlled for and excluded a priori.

Evaluation frameworks and guidance have adapted to account for the necessary behavioural change and organisational involvement required to implement the intervention, the level of variability of outcomes and the degree of intervention adaptability needed, the importance of non-linearity and iterative local tailoring, and the need to pay attention to the social, political or geographical context within which interventions take place [10, 15, 16]. Recently, the MRC and National Institute for Health Research (NIHR) commissioned an update of guidance on complex interventions [17]. Much uncertainty remains about the best methods for evaluating and implementing complex interventions. For instance, there is a need for better designs that can address questions of causation in natural experiments and questions of complex causation [18]. This includes more consideration of the potential of non-experimental, mixed methods and process-based approaches, appreciation of the different logics of causality, and use of case study research to understand context [13, 19, 20, 21].

Case study research is sometimes regarded as providing ‘poor evidence’ for causality [ 7 , 22 ]. But empirical case studies can enable dynamic understanding of complex challenges, help strengthen causal inferences (particularly when pathways between intervention and effects are non-linear) and provide evidence about the necessary conditions for intervention implementation and effects [ 23 , 24 ]. This is because they ‘ generally address multiple variables in numerous real-life contexts, often where there is no clear, single set of outcomes ’ ([ 25 ] p775), making case study an important methodology for studying complexity and an invaluable resource for understanding the influence of context on complex system-level interventions.

There are many ways to conceive and operationalise context [ 26 ]. An influential definition from the MRC guidance refers to context as ‘ anything external to the intervention which impedes or strengthens its effects ’ ([ 10 ] p2). This intervention-centred approach reflects concerns (e.g. of researchers, funders) to prepare the grounds for an intervention, plan implementation and assess transferability across settings. Another approach sees context as relational and dynamic, and as emerging over time in multiple different levels of the wider system [ 27 ]. Rather than an external environment into which an intervention is introduced, context is seen as the numerous opportunities, constraints, issues and happenings that become salient as the intervention unfolds. In the latter view, context cannot be conceptualised and ‘measured’ separately from the intervention.

Most health-related interventions happen in complex systems made up of multiple evolving interactions [ 4 ]. As complex interventions typically depend on elements of context for their effectiveness and there is limited control over such context (it cannot be measured or isolated), challenges arise for a priori hypotheses, evaluation and translation beyond a specific study setting [ 28 , 29 ]. Case study research offers a much-needed resource for understanding the evolving influence of context and for enabling users to know what the likely effects of complex programmes or interventions will be in those settings [ 30 , 31 , 32 , 33 , 34 ].

Objectives and focus of the review

The Triple C study was funded via a commissioned call from the UK MRC Better Methods, Better Research panel, focused on improving the quality of case study research into the influences of context on complex system-level interventions. Research questions were as follows:

Which research (or epistemic) traditions have considered case study research, and how does each conceptualise and operationalise case study and context?

What theoretical and methodological assumptions underpin these research traditions?

What insights can be drawn about the use of case study research to understand context by combining and comparing findings from studies coming from different traditions?

What are the main methodological insights and/or empirical findings, particularly in relation to context, and the relationship between context and intervention in health research?

How do these findings relate to how case study research has been used in studies of complex health interventions? What, if anything, is missing?

The work reported here aimed to: (i) review and synthesise the literature on case study research methods across relevant disciplines, and (ii) determine relevance to the study of contextual influences on complex interventions in health systems and public health research. A subsequent phase involves development and testing of guidance and publication standards using a Delphi panel, workshop, and pilot testing on real-world case studies.

Methodological approach

We conducted a meta-narrative review [ 35 , 36 ]. Originally developed by Greenhalgh and colleagues to explain the disparate data encountered in their review of diffusion of innovation in health care organisations [ 32 ], the meta-narrative review process is guided by six principles (Table  1 ) and involves looking beyond the content of literature to the way it is framed.

Search strategy and selection of documents

Our review was carried out in three linked cycles (Fig. 1 ). Cycle 1 comprised a scoping review of seminal texts on case study methodology and on context, complexity and interventions. In addition to sources known to the research team, we located papers through database searches, expert recommendations (e.g. via social media), citation tracking and snowballing. This informed our detailed search strategy in Cycle 2, developed with an information specialist and using multiple search terms to capture the empirical literature in which case study, context and complex interventions overlapped (see Additional file 1 : Appendix 1). We searched 11 databases: Medline, Embase, PsycINFO, CAB Abstracts, Science Citation Index, Social Sciences Citation Index and Arts & Humanities Citation Index, ERIC, CINAHL, ASSIA, Sociological Abstracts and PAIS Index. We searched the databases from November 2019 back to when their respective records began (the earliest record returned was from 1971). After removal of duplicates, results were imported into EndNote for screening and classification. A sample of 50 papers was screened by SP, CP and SS and discussed in team meetings to progressively refine screening criteria and develop consensus within the team. SP recorded the reasoning behind inclusion and exclusion decisions. SP independently screened and sorted all records from 2009 to 2019, initially by reviewing titles and abstracts and then by reviewing full papers in line with the criteria in Table 2 . Results from between 1971 and 2008 were screened by SP via title/abstract, looking for high relevance papers only. Included papers were then sorted into groups (dates published; relevance category) and labelled for further analysis. A sample of 10 papers in each category was discussed with other reviewers (SS, CP, JM) until consensus was achieved.

Fig. 1 Overview of search results across cycles

Guided by the principles of meta-narrative review (Table 1 ), we classified papers as high, partial or low relevance based on the criteria in Table 3 . High relevance papers explicitly focussed on all of the 3Cs, i.e. case study, context and complexity.
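As a rough illustration of this classification logic, the sketch below tags a paper by how many of the 3Cs it explicitly addresses. Only the rule for high relevance (all three Cs present) comes from the text; the cut-offs for partial and low relevance, and the names `Paper` and `classify_relevance`, are assumptions made for illustration and do not reproduce the criteria in Table 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical screening record; field names are illustrative only."""
    title: str
    case_study: bool   # explicitly framed as case study research
    context: bool      # explicitly addresses context
    complexity: bool   # explicitly addresses complexity


def classify_relevance(paper: Paper) -> str:
    """Return 'high', 'partial' or 'low'.

    'high' follows the text (all 3Cs present). The 'partial' and 'low'
    cut-offs are assumed here and are not taken from Table 3.
    """
    covered = sum([paper.case_study, paper.context, paper.complexity])
    if covered == 3:
        return "high"
    if covered == 2:
        return "partial"  # assumption: two of the three Cs
    return "low"          # assumption: one or none of the three Cs


example = Paper("Hypothetical paper", case_study=True, context=True, complexity=False)
print(classify_relevance(example))
```

In practice such a rule would only support, not replace, the reviewers' judgement, since the review describes screening decisions being discussed until consensus was reached.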

Cycle 3 involved identifying and reviewing: (a) additional seminal, methodological texts that were cited in high relevance empirical papers; and (b) ‘hybrid papers’ focused on the merits and challenges of case study in the evaluation of complex interventions that were identified during Cycle 1 and/or were cited in high relevance empirical studies (signalling common logics) and made reference to context and complexity. Hybrid papers provided methodological touchpoints enabling researchers to connect with key issues in case study methodology. Written by scholars from a particular discipline who advocate for the use of case study for evaluating interventions in their own field, hybrid papers frequently condensed original methodological texts (e.g. Stake, Yin), using (mostly their own) empirical examples to illustrate case study research. They sometimes (but not always) included a set of quality and/or reporting criteria.

Data extraction and analysis

Our primary focus was on empirical studies (Cycle 2), with methodological texts (Cycles 1 and 3) enabling us to question how researchers had operationalised case study approaches.

Giving more weight to high relevance papers, we analysed papers in reverse chronological order: starting with 2015–2019, working backwards to 1971 and monitoring how case study research for complex interventions was reported within traditions over time. We sampled 8–10 papers from each time period (71 empirical papers and 8 hybrid papers = 79 in total). The review team (SP, SS, CP, JM) discussed these in detail (e.g. focusing on epistemological underpinnings, how case study methodology was invoked, relevance of context and complexity). We summarised key aspects of each study in a data extraction spreadsheet (e.g. disciplines; key and additional methodologists cited; definition of context; definition of case; discussion on complexity; data collection methods; findings on context). The spreadsheet was modified as we read more papers, with the process then repeated for earlier papers. Adopting a hermeneutic approach, enabling ‘dialogue between the reader and the text, between readers and between texts […and…] translation in a concrete socio-historical and cultural context’ ([ 38 ] p262), we explored key concepts, epistemologies and methods within and across papers, both within particular traditions over time and across traditions.

Guided by a further set of analytical questions inspired by methodological texts (e.g. what does the case study do in this instance? what is this case a case of? how is context operationalised? how is context discussed in relation to the intervention? where does complexity lie according to the author?) we then deliberately placed papers in dialogue with one another. We did this by each reading sets of papers (with SP also reading all papers), sharing analytical notes and meeting regularly to discuss and refine, paying attention to the periods the sets came from and to other connections (e.g. cross-citations) amongst them. We focussed on narrative threads (i.e. the ways in which authors tell stories about case study, context and complexity to the reader and to each other) to sensitise us to authors’ discussions that ran across groups of studies, allowing us to summarise the assumptions and values driving the empirical research. These narrative threads (e.g. about generalisability of findings; or what made an intervention complex) showed both commonalities and contradictions across research traditions. Indeed, in some cases contradictory narrative threads were evident in the same paper. For example, some studies did not fit neatly into a recognisable methodological paradigm, while others recognised context in a specific way (e.g. as emergent) but then failed to operationalise it that way. This process led us to build a set of descriptive statements about the use of case study research and understandings about context and complexity that, in turn, helped us to obtain a picture of the different meta-narratives present in the literature.

We presented emerging narrative threads and meta-narratives to the wider team (JG, MP and TG) and to colleagues (e.g. in seminars), with their feedback informing further analytic work (e.g. returning to methodological texts to appreciate threads).

Summary of search results and overview of the literature

Search results are presented in Fig. 1 . The total number of texts informing the review was 139 (71 empirical, 8 hybrid and 60 focusing on case study methodology). Most research teams reporting empirical studies were based in the United Kingdom, followed by the United States, and conducted research in those same countries. Fewer study teams were based in Canada, Australia, and in sub-Saharan African and European countries. Authors typically worked in health services, health systems, population health, public health and primary care research teams.

Case study research spans several fields and encompasses multiple perspectives that are grounded in different assumptions about the nature of reality and lead to different combinations of methods applied in different ways [ 22 , 39 ]. This epistemological and methodological diversity was reflected in the empirical case studies reviewed, which covered a wide range of case study designs, from naturalistic approaches (typically employing qualitative methods and focused on one or a small number of cases) to more quasi-experimental studies (typically employing mixed methods across a larger number of cases and with some attempt to standardise aspects of the design across cases). In almost all papers in our dataset, authors placed more emphasis on procedural aspects of the methods and tools used (e.g. data collection, sampling) than on discussions of epistemology or methodology or on the nature, selection, definition or boundaries (if any) of ‘the case’.

Four meta-narratives reflecting four distinct research traditions

We identified four broad meta-narratives on case study research, context and complexity (Table 4 ). We summarise each below before examining commonalities, debates and tensions across these traditions. It should be noted that whereas two of the meta-narratives (1 and 3) were fairly distinct, meta-narratives 2 and 4 showed some overlap, which reflected cross-fertilisation of ideas between them. Of note, as well as the four meta-narratives, we identified an additional set of papers that were classified as ‘case study’ (e.g. in the title or abstract), but on closer reading appeared to be qualitative or mixed-methods studies addressing context and complex interventions that were not designed to be case study research and did not engage with case study methodology. We highlight this set of papers in our Discussion as they reveal an important issue of classification and reporting of case study research.

Meta-narrative #1: Case studies develop and test complex interventions

This first research tradition presents the case study as a way of testing complex interventions, comprising a set of specific instructions on how to design, conduct and report on a case study (Table 4 ). Building on the Medical Research Council’s widely cited Framework for Developing and Testing Complex Interventions [ 9 ], this tradition favours a qualitative development phase followed by a comparative case study testing phase, as illustrated by the early SHIP study, which formed a model which others followed and refined [ 47 ]. Case study research in this tradition is based on broadly positivist assumptions and the ‘theoretical replication’ methodology of Robert Yin. The focus is on technical research approaches and methods designed to test hypotheses about the impact of an intervention (and what mediates or moderates it) in real-life contexts. Researchers identify a pre-existing case or series of cases (e.g. one or more hospitals) and a specific intervention (e.g. an improvement effort to reduce waiting times), then set out to identify relevant contextual factors that are pre-existing and independent of the intervention (e.g. case mix, technological innovativeness) that can explain variations observed between the stated intervention objectives and outcomes in different settings. Complexity is viewed as an inherent property of the intervention or the context in which it is implemented.

In this tradition, case study research is regarded as an appropriate research design because it offers a robust and transparent research procedure for answering questions about ‘how’ and ‘why’ the intervention works in a specific setting, community or population. Take the example of a case study of an equity-enhancement intervention in primary care in Canada – researchers deliberately used case study as ‘ a comprehensive research strategy useful in exploring, describing, explaining, and evaluating causal links in real world interventions that are too complex to be assessed by survey or experimental strategies alone ’ ([ 44 ] p7).

Case study research that focuses on testing complex interventions often claims to use mixed (qualitative and quantitative) methods. However, data collection methods are predominantly qualitative (e.g. interviews, focus groups, documentary review), and quantitative data tend to be used as an illustration (e.g. describing a reduction in waiting time as part of a wider narrative of improved efficiency) [ 41 ]. The use of multiple data sources is frequently given as evidence of a case study approach, and such “data triangulation” is greatly valued as a way to increase the reliability of case study findings. The analytic process (e.g. framework analysis) is usually deductive, based on aggregation and commonly synthesised in a ‘case description’. For instance, a study of improvement efforts to ameliorate hip-fracture care at a Swedish acute care hospital used a stepwise approach in which data were ‘ organized and coded to characterize a) process problems (before and during the redesign), b) the actual changes carried out, and c) the effects of changes as reported by staff members ’ ([ 41 ] p3). Other studies used structured analytic techniques e.g. framework (see, e.g. [ 40 ]). Guided by Yin [ 30 ], multiple case studies typically present within-case followed by cross-case analysis, seeking generalisation of findings through a process of aggregation.

In this research tradition, the term ‘context’ is not defined but usually features in the commonly cited definition of what a case study is: ‘an empirical inquiry that investigates a contemporary phenomenon within its real-life context’ ([ 66 ] p13). The focus is on defined and tangible contextual features external to the intervention. For example, in a study of the implementation of public health policy in two Swedish municipalities, researchers examined ‘ the contextual steering mechanisms that are practiced in local government ’ ([ 43 ] p220) and local government implementation of national policies – both were held as conceptually separate, with opportunities to act and implement national targets ‘restricted by surrounding structures ’ (p221).

Context is often referred to in terms of specific ‘contextual factors’, which are typically framed as background to the implementation of an intervention and articulated in terms of a (heterogeneous) list of pre-existing features. For example, a case study of an equity-enhancement intervention in primary care listed five ‘contextual factors’ that shape the intervention: ‘(a) the characteristics of the population; (b) the characteristics of the staff; (c) the organizational milieu, including formal and informal power structures, policies, and funding; (d) the political, policy and economic context (…); and (e) the historical and geographic context, specifically, the physical location of organizations in varied rural and urban locations, and the social conditions linked to those locations’ ([ 44 ] p7).

In this tradition, study descriptions of context are often narrower than the contextual factors considered, and are provided as rationales for case selection or sampling. For example, in the above-mentioned Swedish study, two different municipalities were selected as cases to illustrate different local contexts, with specific characteristics of each municipal organisation then described as ‘contextual conditions’ [ 43 ].

Complexity (e.g. of the system, setting or intervention) is sometimes invoked in studies in this tradition as a rationale for selecting a case study design (e.g. [ 44 ]) or is presented as a characteristic of the case. But whilst complexity is often mentioned by name, the narrative thread on complexity is typically thin and scarce, often appearing as fleeting statements that reveal little as to how complexity is understood. Complexity, it seems, is just there but does not have to be theorised. The linear nature of the case study methodology in this tradition constrains opportunities to engage with complexity. Hence, while an interest in underlying intervention mechanisms is evident in the decision to adopt case study methodology, this typically plays out with researchers then deconstructing the phenomena under study into factors, components or levels in order to describe associations between context and impact on interventions.

The knowledge produced from designs in this tradition is mostly descriptive, presented in technical accounts detailing contextual factors affecting the intervention. Where theory is drawn on, this is for the specific purpose of disentangling the mechanisms through which the intervention operates in the case study. For example, in the study of hip-fracture care, authors do not cite a particular theoretical approach. They simply state that intervention complexity and the heterogeneity of intervention ‘ application ’ in different contexts ‘ constrain generalizations about which method works, when, and how. To gain a deeper understanding of what works, research needs to better disentangle what is actually being implemented and how the multiple components of improvement interventions contribute, or do not, to improved operational performance ’ ([ 41 ] p2).

The relationship between context and intervention (where addressed) tends to be fixed, with intervention success or failure explained as a matter of ‘fit’ between the relevant theory or hypothesis behind the intervention and the context of implementation. Variation between the local contexts of cases accounts for differences in implementation processes and outcomes. For example, a study of the introduction of an electronic audit and feedback system to improve maternal-newborn care practices and outcomes found that a ‘ one size fits all approach ’ was not feasible because ‘ the diversity in context within our case hospitals and in the facilitators and barriers they experienced demonstrates the challenges of implementing one audit and feedback system across an entire sector (all maternal-newborn hospitals) ’ ([ 45 ] p641).

Meta-narrative #2: Case studies analyse change in organisations

Theory-informed case studies of organisational and institutional change, including quality improvement efforts, seek to understand and evaluate the practices, processes and relationships relevant to the development, implementation and adoption of an intervention within specific organisations (Table 4 ). This tradition is more heterogeneous than the one described in meta-narrative 1, having more dispersed origins and wider influence from outside health services research. Researchers in this tradition share a commitment to bringing theoretically-informed rigour to the empirical study of organisational change and quality improvement. Whilst the case study methodologists cited vary, many studies in this tradition are inspired by the work of Yin [ 30 ] and adopt a positivist or critical realist perspective in which the ‘real world’ is external to the intervention. Evaluation involves testing relevant theory (sometimes referred to as a programme theory – that is, overarching theory of how an intervention is expected to work and its anticipated impacts). The unit of analysis is almost always the organisation (or department), and researchers use multiple data collection strategies to study, for instance, how staff variably perceive and carry out change-related activities (e.g. use a new computer system [ 52 ] or create partnerships to sustain organisational innovation [ 50 ]). Theoretical constructs such as agency (which varies amongst actors) are explored in evaluating planned changes, enabling researchers to account for power and resistance in organisations.

Within this broad tradition, there are some differences with regards to what research is valued by scientists in the tradition and how it should be done. One is a system approach to patient safety, in which medical error is theorised as emerging not primarily from individual failings but from features of the system (which is seen as complex and dynamic) [ 52 , 54 ]. Another approach, in which patient safety is also a prominent theme, considers how technologies (used and creatively adapted by humans) subtly alter both front-line work practices and the behaviour of the wider system (e.g. creating panopticon-like surveillance of staff) [ 51 ]. In each of these ‘sub-traditions’, successive studies seek to test and refine theoretical explanations of organisational change generated by previous authors.

Narrative threads about case study research in this tradition portray the case study as an opportunity to study organisational practices and relationships in depth and to develop theories about how these change over time. The case, and what it is a case of, is rarely defined and is sometimes conflated with the setting or organisation of interest (e.g. a hospital). The selection, rather than definition, of case or setting is sometimes explained. For example, in a study of automation in drug-dispensation, the authors illustrate an ‘archetypal case (…of…) failure’ of such an innovation carried out in an ‘ongoing field of activity’, i.e. a busy emergency department in a US hospital ([ 52 ] p1494).

Qualitative data are usually collected via interviews, observation and documents, and analysed using the constant comparative method, with authors sometimes reporting the use of a priori themes from a pilot/exploratory phase or literature review. The analysis reveals differentiated intervention effects through the interpretations of the different actors involved, along with the consensus and tension inherent in those interpretations.

Context is not defined, but is operationalised through detailed description of the organisation and macro-level changes that frame the intervention. For example, in a study of transformational change of multiple healthcare services into a single regional service in Australia, the context for evaluating ‘ the micro detail of healthcare reform processes ’ was made up of the ‘ forces that influence [the] nature of change efforts in the healthcare sector ’ ([ 53 ] p33).

The focus on specific organisations has led in this tradition to a heterogeneous approach to context. Researchers frequently equate case study setting (e.g. hospital) not only with ‘case’ but also with context, and/or focus on local and national policy contexts (commonly funding issues) as ‘external’ conditions shaping ‘internal’ change efforts. For example, returning to the above-cited study, authors continue: ‘the change in policy direction (...) was an event that occurred outside the control of the project and an example of the way in which public sector agencies are subject to change caused by the political context within which they operate’ ([ 53 ] p39).

In addition to description of external conditions, context was also operationalised through detailed description of the characteristics of actors (e.g. level of buy-in; power differences), organisations (e.g. management structure, organisational culture), and relationships amongst staff in organisations, as well as of the intervention and its origins (e.g. a study of the use of secondary data analysis from an electronic patient record system to improve safety and quality of care in a UK hospital offered information on a decade-long timeline of the introduction of the e-database itself [ 51 ]). Case study thereby enabled in-depth analysis of one or more units and description of ‘internal’ contextual differences, tensions and contradictions. In contrast with meta-narrative 1, some aspects of context that were internal to the case study organisation were shown as dynamic. This was due to relationships between staff or stakeholders being altered by the intervention.

In this research tradition, there is a noticeable interest in complexity, particularly relating to issues of diffusion, adaptation, implementation stages or cycles of the intervention, as well as sustainability, change and contextual shifts over time. Complexity is captured in iterative methodological approaches to case study evaluation. In a paper reflecting on a case study of healthcare reform in Australia, the authors adopt multiple methods across multiple levels of the system, citing the need ‘to ensure that the evaluation has the flexibility and breadth to accommodate a changing and complex context’ ([ 67 ] p492). The form of data, the sample and the overall structure of the research designs within specific organisations still tend to be determined a priori, but there is room for adaptation over time.

The knowledge produced draws out positive and negative effects of interventions, often through the lens of the different actors involved. Intervention and context remain separate. The intervention is framed as a set of prespecified activities and processes that are re-interpreted by staff at different levels in the organisation.

Accounts provide the detail of intervention effects through actors’ eyes and intrinsic aspects of change in study sites. Empirical generalisation is not a primary objective, hence there is often limited exploration of how findings might be relevant to other settings. For example, in one study of safety improvement programs in US hospitals, the authors set out how their methods were intended ‘to capture a snapshot of the key accomplishments of leading organizations and to synthesize the self-perceived learning of their internal change leaders ’, rather than being ‘ meant to be representative of all health care organizations ’ ([ 54 ] p166). However, theoretical generalisation through development of middle-range theory is often an explicit objective, allowing transferability of theoretical findings (see, e.g. [ 51 ]).

Meta-narrative #3: Case studies are appropriate for undertaking realist evaluation

Case studies in this research tradition apply the theories and methods of realist evaluation [ 68 ], which interrogates how intervention outcomes are achieved by mechanisms triggered in specific contexts by systematically formulating CMO (context-mechanism-outcome) configurations (Table 4 ). The realist evaluation tradition drew explicitly on social realist philosophy and the foundational work of Pawson and Tilley, who originally developed the approach within social policy research [ 68 ]. A seminal paper in 2005 made this work accessible and appealing to health services researchers [ 69 ]. A leading research funder, the UK National Institute for Health Research, was attracted to its systematic approach to exploring why interventions work well in some contexts but less well in others, and supported the development of guidance and standards (‘RAMESES’) for both empirical studies and theory-driven systematic reviews of such studies [ 36 , 70 ]. Many though not all studies in our sample followed the RAMESES methods and reporting structure and were ‘realist’ in the sense meant by Pawson and Tilley: surfacing policymakers’ theories about why a programme was thought to work, then testing these theories by collecting and analysing (mostly qualitative) data. The main empirical phase maps context-mechanism-outcome configurations as emerging from data analysis, and identifies (generative) causal relationships in order to develop middle-range or programme theory that can account for how and why an intervention works (or not) and under what conditions. Some studies that cited Pawson’s work appeared to be realist in name only or to be based on a different conception of realism known as critical realism (developed and popularised by Bhaskar). Within this meta-narrative, therefore, not all studies followed the methods that have been endorsed by scholars in the tradition.

Realist case study evaluation is seen as a theory-testing approach because the case study can ‘illuminate mechanism in relation to outcome’ ([ 60 ] p5). Case study methodology is advocated due to the focus on phenomena (e.g. interventions) in context, linking closely with the emphasis in realist evaluations on ‘how causal mechanisms are shaped and constrained by social, political, economic (and so on) contexts’ ([ 70 ] p9). Case study is often chosen because it allows multiple and emergent data collection methods (e.g. [ 58 , 60 ]). For example, one paper reporting a trial of a breastfeeding support group in Scotland describes how realist evaluation ‘examines baseline contexts, how organisations, structures and interrelationships shape both implementation and outcomes over time’ ([ 58 ] p771). Authors reflect that it is ‘detailed case studies, employing quantitative and qualitative data’, that are useful in order ‘to test our propositions about the importance of context, organisation and professional relationships for outcome’ (p771).

Case studies in the realist evaluation tradition commonly employed qualitative data collection methods (e.g. interviews, focus groups, observation) supplemented by targeted quantitative methods (e.g. structured surveys, retrospective cohort data). Different data were generally analysed separately first, using deductive and inductive approaches. Findings were then synthesised to map context-mechanism-outcome configurations using retroductive logic, i.e. asking “what could explain this?” and building and testing hypotheses about mechanisms that produce what are known as ‘demi-regularities’ (things that tend to happen, though they do not always happen, under particular circumstances). Counterfactual thinking (“what if this were not the case?” or “what if this happened instead of that?”) is used to test alternative explanations that can confirm or disprove the context-mechanism-outcome hypotheses obtained.

The notion of context features strongly in the realist evaluation tradition. It is central to the theoretical core of realism and viewed as a set of circumstances where mechanisms are triggered to produce specific outcomes. However, as noted above, broad definitions of context included in theoretical and methodological papers did not always match how context was understood or operationalised in the studies reported. The meaning of context was wide-ranging, capturing the characteristics of organisations or local area, relationships between staff or broader regional or national policies. Context was also linked to ‘space and place’ with, for instance ‘ the public–private interface and tensions between a mother’s choice and societal pressures’ ([ 58 ] p769) used as a starting point for theoretical development.

Some authors discussed the challenge of differentiating context from phenomena when seeking to distinguish mechanisms. This is illustrated in a case study of a mental health intervention to improve links between primary care and specialist services in England [ 55 ]. Here prevalence of mental health conditions, GPs’ professional background, or the relationships between staff could all count as context, with authors reflecting that: ‘ the same phenomenon could be coded as an outcome or context, or as an outcome or a mechanism. For example a disease register was an outcome of service development and could then act as a mechanism for improving care ’ (p.78).

Realist case study evaluations tended to include two key narrative threads about complexity, both shaped by an understanding of the interaction between context and mechanisms in the production of intervention outcomes. First, that realist evaluation is an appropriate approach to understanding complex interventions (i.e., the intervention is complex in and of itself and realist evaluation can unpack its differential outcomes). Second, that the complexity of the intervention is surfaced in the implementation context or system (i.e. complexity can be observed through the evaluation as it emerges from the interaction between intervention and context). For example, in a case study of academic/practice collaborations in England, authors suggest that: ‘ Realist evaluation is particularly appropriate for developing explanations about how programmes, which by their nature are complex, work contingently within the context of implementation ’ ([ 59 ] p3). The realist case study approach is presented by authors as ideal for exploring this complexity, allowing researchers to study ‘ [collaborations] that are complex, in the sense that their behaviour can be explained with reference to the properties of a whole (adaptive) system rather than its individual components’ (p. 13). The authors conclude that such an approach ‘enables a complexity theory lens that views outcomes as emerging from interactions amongst individuals within a system’ (p.13).

In terms of the knowledge produced, outputs of realist evaluations tend to be presented as technical reports focused on how and why the intervention did (or perhaps didn’t) work. In-depth, thick description and rich information on context are required in order to obtain ‘ insights into the attributes necessary within complex health systems for a policy to work ’ ([ 58 ] p777). As Byng et al. [ 55 ] report: ‘ in some cases potential contingent mechanisms or contexts could not be identified to explain why a mechanism was associated with an outcome in some situations but not others. This could be due to the paucity of data regarding potential contingent contexts or due to inconsistency of the data and lack of clear associations’ (p.79). There is a sense, however, that in such cases the researchers felt that they had looked exhaustively for CMO configurations and identified all the demi-regularities there were to find.

Meta-narrative #4: Case studies enable naturalistic study of complex change

This research tradition, inspired by the work of Robert Stake [ 71 , 72 ], is oriented to achieving hermeneutic understanding and is characterised by a deliberately open-ended approach to the case, complexity and context (Table 4 ). Grounded in interpretivism (an orientation to inquiry that sees social reality as shaped by human experiences and social contexts), this kind of case study research involves granular, naturalistic and often longitudinal observation of events and relationships. The detail of the case is built iteratively, with context understood as an emergent property of ongoing interactions between the complex system and intervention. The researchers’ task is to interpret these interactions through a process of sense-making. Some researchers in this tradition (but not all) seek to engage with, and extend, social science theory.

Whilst ‘thick description’ (that is, a very detailed presentation of real-world events and settings using the narrative form, illustrated with extensive extracts from field notes and on-the-job interviews) is valued to some extent by all case study researchers, in this meta-narrative such description is a goal in its own right. In this sense the naturalistic case study can trace its origins to seminal work in anthropology, where thick description was advocated to provide a picture of what human behaviour and symbols meant in different cultures so they could be understood [ 73 ].

Unlike the traditions described in meta-narratives 1–3, the design of naturalistic case study research is not prescribed. The effects of interventions are seen as nonlinear, explained by narrative causality, as in the events in an unfolding story. Instead of seeking predictable and generalisable relationships between variables (as in meta-narrative 1), transferable theoretical models about change (as in meta-narrative 2) or demi-regularities (as in meta-narrative 3), this tradition is oriented to describing ‘ interacting processes [and the] extent of reciprocal adaptation and embedding ’ ([ 61 ] p539). The focus is on an emic (i.e. from the participants’ perspective) analysis of the case (as opposed to an external, “etic” analysis from the researchers’ perspective), with selection of data sources guided by the principle of ‘the opportunity to learn’ [ 74 ]. Researchers are interested in reflexivity, granularity and preserving ‘multiple realities’—that is, the perspectival view of different individuals and interest groups, which may conflict but which, taken together, contribute to a rich picture of what is going on ([ 72 ] p12). In sum, naturalistic case study research is understood as building a rich, detailed picture in context. In this tradition, the understanding of a case is not specified up front and emerges through the process of conducting the case study.

Data tend to be gathered via qualitative, especially ethnographic, methods (e.g. observation), sometimes with additional quantitative data such as clinic audits relating to patient outcomes and demographics, and patient surveys (e.g. [ 58 ]). Longitudinal approaches are favoured. Another important data source is the reflexive experiences of the researcher, which may include accounts of events ‘from the field’ and an analysis of the researcher’s reactions to these (what John Van Maanen calls ‘confessional tales’ [ 75 ]). In contrast with the research tradition described in meta-narrative 1, where the researcher is seen, more or less, as a dispassionate observer in the case, in this tradition he or she may, in some cases, be a character in the story of the case study. Analysis of the dataset in naturalistic case study involves iteration, comparison and integration of multiple data sources using narrative as the key synthesising device, an emphasis on stakeholders’ perspectives, input from those involved in the research and (in most cases) an ongoing dialogue with relevant theory. Reflexivity is seen by some authors as aiding transparency about subject position and relational dynamics between researcher and researched.

Context is rarely defined in this research tradition, but represented as emerging from a set of relationships and in interaction with wider social forces (e.g. the economy). Such interaction is situated as ‘sense-making’. To put it another way, the essential goal of naturalistic case study is to tell a story, and the story form presents actions and their contexts as interwoven. Operationalisation of context emerges through a narrative iteration between micro and macro contexts and through reflexivity (which brings in the context of the researcher as well as the research). Whilst naturalistic case study has traditionally placed little emphasis on theory (emphasising what Stake has called “the intrinsic study of the valued particular” ([ 76 ], p448)), those who have applied this approach in a healthcare setting have often brought in theoretical models to move back and forth from the particular of the case to the general lessons that might be drawn from it (e.g. [ 62 , 65 ]).

Naturalistic cases are usually singular, with the knowledge produced revealing complexity through thick description of complex processes and systems. Meta-level accounts of problems and solutions read as accounts of the particular, with close analysis of the specificities of each case instrumental in generating in-depth understandings about wider structural relations and the unfolding of complex change that are potentially illuminating for (though not predictive of) other situations and settings. Cases do not need to be representative to learn from and generate knowledge [ 73 , 77 ]. The basis for transferability is primarily naturalistic generalisation (in which the researcher and those who immerse themselves in the detail of the case acquire a richer vocabulary and imaginative capabilities which they can then apply to other cases), and, to a lesser extent, theoretical generalisation, where the rich description of the case enables the application of theory, potentially increasing the explanatory power of the case [ 76 ]. Most of the naturalistic case studies in our dataset favoured rich description without extensive theorising. For example, a community HIV project in South Africa [ 63 ] was presented without use of the word ‘theory’; a study of hospital mergers in the UK [ 62 ] states that a ‘preliminary theoretical framework’ was selected to guide data collection, but this framework is not mentioned further in the paper. In a study of the sustainability of whole-system change in healthcare in London, a theoretical framework based on system dynamics ‘was developed after completion of the data collection’ and used to inform analysis ([ 61 ] p542). In all three cases, however, the primary focus of the paper is on presenting an authentic descriptive account.

Commonalities, debates and tensions across meta-narratives

Engagement with methodological literature

Two key methodologists – Yin and Stake (Table 5 ) – were repeatedly cited across empirical papers. Those adopting a ‘Stakian’ approach differed, often significantly, from those drawing mainly on Yin, though it should be noted that many studies cited these methodologists without following the actual methods they advocated (some studies in meta-narrative 1, for example, cited Stake but approached case study research from a technical and largely positivist stance). Both Yin and Stake (and also Pawson, who draws broadly on Yin) emphasise detail, depth and contextualisation; however, while Stake’s method aims to build a naturalistic and evolving picture in context through immersion and interpretation, Yin’s pays more attention to design choices at the outset (e.g. case selection, sampling), a priori theoretical frameworks, and the description of step-wise processes (e.g. to develop chains of evidence). This distinction was reflected in our review. Yin-influenced studies (meta-narratives 1 and 3 along with most studies in 2) tended to describe and justify certain elements of case study design (e.g. the type of case study; data collection methods) more than others (e.g. definition of the case). Studies inspired by Stake (meta-narrative 4) tended to emphasise knowledge as emergent.

As we synthesised findings across the literature in Cycles 1–3, we were struck by many authors’ limited reflexivity as to how case study methodology was taken up and modified in empirical application, leading to multiple, contradictory and confusing narrative threads about the philosophical foundations and methodological requirements of case study research. For example, empirical case studies in meta-narratives 1 and 2 tended to approach case study research as a set of procedures or tools. Few of the studies citing Yin (especially those in meta-narrative 1) included an explicit theoretical and methodological aim, despite Yin’s emphasis on setting out theoretical propositions a priori. Use of Yin’s original methodology was therefore inconsistent.

Some studies used ‘hybrid papers’ (Table 6) as a source of methodological input. Table 6 shows key observations of relevance in these papers with regard to the study of context in complex interventions, and the meta-narratives they relate to.

This body of hybrid literature was important in providing a ‘bridge’ from the empirical work to methodological sources. However, these papers frequently provided only selected methodological detail (likely due to the space constraints of journal articles) and tended to draw predominantly on Yin and Stake. There was limited reference, in empirical and hybrid papers, to how other disciplines have engaged with context and with the case, and how this has informed the researchers’ understanding of complexity in their study. This risks narrowing the scope and potential of the methodology.

Overall, leading case study methodology experts from outside the healthcare field (e.g. Mitchell [ 83 ], Gerring [ 84 , 85 ], Flyvbjerg [ 22 ], Burawoy [ 85 ]) were conspicuously absent in the review of empirical case studies. Even where Yin and Stake were cited, empirical papers were not always faithful to the methodological principles of the original, nor did they always provide a rationale for divergence.

Defining the case

Case definition is central to case study research and consequential for the knowledge produced. Moreover, case selection and its intended relationship to a broader class of phenomena form the basis for causal inference. According to Gerring, ‘what differentiates the case study from cross-unit study is its way of defining cases, not its analysis of those cases or its method of modelling causal relations’ ([ 84 ] p353).

Many papers offered a description of how a case was selected but not of how the case under study (regardless of whether it was ‘arrived at’ a priori or with an open-ended approach) was defined. This is important as, for example, defining a case by mentioning ‘the organisation’ (as several papers in our dataset did) to the exclusion of, for instance, how policy, discourse and wider structural relations shape organisational practices inevitably limits the choice of methods, analytical approach and findings to the boundaries of that organisation. Take the example of a study of mergers between different healthcare institutions in England, based on ‘four in-depth case studies’ [ 62 ]. Each case study focused on the integration of two institutions, and the authors purposively selected four community trusts in which such integration was taking place (to ensure ‘range of trust types and geographical spread in London’). Case selection thus appears difficult to distinguish from the sampling of units of analysis. The authors then discuss how a ‘cross-case comparison’ produces a set of themes for the paper. Their detailed account is rich and offers a sense of the different processes of integration. However, it remained unclear whether there was a specific case (of the process of integration) or whether the four ‘case studies’ were rather examples of what might happen during mergers. We reflected that examples such as these raised questions about the extent to which research teams had to make discipline-related choices regarding giving much detail about case selection whilst presenting the cases themselves as having unproblematic boundaries. It may be that in the empirical reality, as the research unfolded, what was ‘in frame’ and what was ‘out of frame’ changed, but these key decisions did not make it into the paper.

Connecting with context

Across papers it was for the most part unclear how authors understood, approached or defined context as a concept. There were varied meanings and uses of the term in empirical case studies, with implications for how evaluations of complex interventions are designed and conducted, the knowledge produced and potential transferability.

That said, case studies in all four research traditions clearly included an intention to contextualise. This was evident in: (i) the choice of a case study approach (e.g. citing Yin’s definition of a case as a phenomenon in ‘real life context’, e.g. [ 40 ]), (ii) use of context-mechanism-outcome frameworks, (iii) details provided about organisations or settings ‘for’ an intervention, and (iv) discussion about the importance of context more broadly. Papers attempted to operationalise context in different ways, e.g. describing study settings, offering contextualised justifications for case selection, reviewing national and local policies linked to the intervention, or recounting the history of an intervention or improvement activity.

In meta-narratives 1 and 2 (where case study is often procedure-driven and context external to the intervention), papers typically engaged with context in the ‘findings’ sections by offering lists of ‘contextual factors’ to be taken into consideration when assessing the intervention (e.g. [ 45 ]). In meta-narrative 3, realist case study evaluations included ‘context’ in the construction of context-mechanism-outcome hypotheses. In meta-narrative 4, naturalistic case studies situated context as emergent, relational and in dialogue with the intervention, offering rich or ‘thick’ descriptions for the reader to gain a ‘vicarious experience’ ([ 72 ] p86) of the case and relevant context (e.g. [ 63 ]).

Strikingly, even where authors discussed the importance of context and pointed at the contexts of relevance to their study, what was meant by ‘context’ and how this applied in the empirical studies reviewed remained unspecified. This lack of clarity made it difficult to appreciate how different kinds of contexts were conceptualised, how they compare (e.g. the ‘context’ of a specific hospital versus the policy ‘context’) or the relationship between context and intervention. A handful of papers (e.g. [ 63 ]) explicitly engaged with context in ways that were ontologically coherent with the methods they adopted and had a clear level of analysis to focus on (e.g. language, social action). In the absence of a conceptual definition, this was helpful to make sense of contextual dimensions of the case.

Some empirical papers cited (e.g. [ 50 , 59 ]) or made use of (e.g. [ 86 ]) the Consolidated Framework for Implementation Research (CFIR) [ 87 ], a meta-theoretical framework combining previous implementation research theories and models, to aid the assessment of different ‘dimensions of context’ (e.g. outer setting; inner setting) and linked sub-dimensional constructs (e.g. cost; implementation climate; planning). The papers using the CFIR largely (though not exclusively) aligned with meta-narrative 1, as the framework’s emphasis on contextual factors as ‘ surrounding the implementation efforts ’ ([ 87 ], p4) maintains a clear division between intervention, implementation and context.

Finally, a common characteristic across papers in meta-narratives 1–3 was a view of ‘changing contexts’ as an unexpected source of complexity, rather than change and dynamism being inherent qualities of context. Researchers frequently invoked the use of case study as a way to address complexity in and of context, but then revealed change as a finding. In some cases, attempts to integrate this in results sections relied on abstract phrases about ‘dynamic relationships’ that were supported by limited empirical evidence of how this happens.

Transferability of findings from case study research

The question of transferability is central to much health services and public health research, whose goal might be said to be generating lessons from one setting that can be applied in other settings. Whilst case study research outside the healthcare field includes much discussion of this topic, we found limited engagement with the question of transferability (what some researchers call ‘external validity’) of case study evidence.

In the empirical studies we analysed, researchers rarely stated how findings could be generalised theoretically or applied to other settings. In meta-narrative 1, narratives focused on the need to aggregate and standardise datasets resulting from multiple data collection activities and provide lists of ‘contextual factors’ (typically high level and with limited contextual nuance) to explain variation in intervention outcomes. In meta-narrative 2, the focus was on how being ‘ rooted in specific context ’ means that generalisability of findings to other contexts was ‘limited by the extent to which contexts are similar ’ ([ 88 ] p9). Meta-narrative 3 used the concept of demi-regularities to convey the idea of partial transferability, and meta-narrative 4, as noted above, construed transferability mostly in terms of understanding and capacity to imagine, produced by immersion in the narrative detail of a single case. Overall, case studies provided insights into the organisation or other unit of analysis under study, while the choice of data collection methods, analytical approach and form of reporting meant they could easily be represented as too context specific to have wider relevance.

In meta-narratives 1–3, study findings were sometimes seen as informing middle-range theories—that is, theories that are sufficiently detailed to help explain some regularities in empirical findings but which do not account for every eventuality. For instance, studies in meta-narrative 3 sought to explain how an intervention works, why, for whom and under what conditions. This gave researchers a structured theoretical framework of reference and a way to express generalisability through middle-range theory development.

In meta-narratives 1 and 2 especially, authors frequently placed limitations on the explanatory power of single case studies (presented as offering useful points of transferability rather than stronger claims to theoretical generalisability) and appeared somewhat defensive vis-à-vis potential critiques rooted in statistical generalisation. In contrast, in meta-narrative 4 generalisation and transferability were based partly in the confidence researchers had in the naturalistic generalisability of a richly-described ‘n of 1’ case and partly on the development and refinement of substantive theory, noticing patterns, differences, commonalities or exceptions in instances or events in context and keeping these in dialogue with the theoretical approach adopted coming into the research study.

To our knowledge, this review is the first to focus on empirical and methodological literature relating to the intersection between case study research, context and complex interventions. Findings demonstrate the array of applications and the potential of case study research for evaluations of complex interventions in health care and public health research in key areas, including the use and refinement of theory, design flexibility and adaptability to emerging issues, breadth and depth of data sources and use of multiple methods, appreciating different kinds of causal mechanisms and complex causation, pragmatic advantages when experimenting is not feasible, and potential for transferability and generalisability from single and multiple case studies.

Summary of key findings and links to wider literature

The review has identified four broad research traditions in which case study was used to study complex interventions and the role of context: developing and testing complex interventions; developing theoretical models of change in organisations; undertaking realist evaluations; and producing thick descriptions of the change process. In these different traditions, case study, context and complex intervention, along with the interaction between them, were operationalised differently.

In the wider methodological literature, case study research is widely recognised as an overall approach or strategy encompassing a range of methods, and places much emphasis on the understanding and definition of the case, especially whether it has boundaries and the question of what it is a case of [ 22 , 71 , 84 , 85 , 89 , 90 ]. This was rarely reflected in the empirical literature we reviewed, with many papers emphasising case study methods but taking the case itself as given. The selection of the case (e.g. a particular intervention, a theory behind an intervention) and of the relevant units of analysis (e.g. an organisation through which the intervention is implemented) needs to be integrated with how a case study is understood and intended [ 81 ]. The lack of detail across meta-narratives about case definition and case selection (rather than simply a sampling strategy) makes it difficult to assess the value of case study evaluations. This, in turn, raised challenges for appreciating the context in which the case is situated and the evolving relationship between case, intervention and context.

Findings show that case study research offers useful avenues for analysing the nature of the relationship between context and intervention (e.g. whether distinct, interdependent, or indistinguishable). However, most empirical papers were limited in the extent to which the relationship between context and intervention was expressed and explored, the ways in which context can be understood and evaluated and the potential of case study research to aid this.

The current definition of context in MRC guidance as ‘anything external to the intervention which impedes or strengthens its effects’ ([ 10 ] p10) has been broadened to ‘any feature of the circumstances in which an intervention is implemented that may interact with the intervention to produce variation in outcomes’ in more recent Canadian Institute of Health Research/ National Institute of Health Research guidance for population health interventions studies ([ 13 ] p6). Taken together, these definitions reflect a predominant approach to thinking about context (e.g. meta-narratives 1–3), focusing on specific features that can aid understanding of any changes brought about by interaction with an intervention. Such research rarely sets out to study context, but is faced with challenges in intervention variation and understanding outcomes that lead researchers to then develop taxonomies and lists of contextual factors. This approach may risk de-contextualising context as high level ‘factors’, potentially losing the contextual nuance offered to the reader that is one of the strengths of case study research [ 6 ].

The dominance of this approach is tied up with the historical roots of health research (e.g. the biomedical institutionalisation of research and the RCT, need to generate probabilistic evidence of causality and generalisation), which has led to context being a distinct object of inquiry. By attempting to compartmentalise context/s, studies (e.g. meta-narrative 1) often move away from the stated aim of addressing complexity through case study, and return to a notion of discrete components (of the system, of the context) of a fixed reality, rather than engaging with the relational and processual nature of context and intervention.

Alongside the previously mentioned Consolidated Framework for Implementation Research [ 87 ], a number of other models of context, detailing different domains, constructs and attributes of context, have become available in recent years [ 91 ]. For example, the 2017 Context and Implementation of Complex Interventions (CICI) framework is aimed at ‘simplifying and structuring complexity’ ([ 92 ] p1) in order to provide a lens for understanding interacting dimensions of context, implementation and setting. It presents a multi-level understanding of both context and implementation and sees context as ‘an overarching concept, comprising not only a physical location but also roles, interactions and relationships at multiple levels’ ([ 92 ] p6). Although the framework was not taken up by the empirical papers reviewed here, our scoping review showed that it has recently been gaining popularity in the field.

Broader ways of thinking about and operationalising context are emerging in the healthcare field [ 13 , 17 , 27 , 93 ], emphasising, for instance, interventions in complex social systems and even interventions as system changes in themselves [ 94 , 95 ]. Methodological literature in the wider social sciences goes further, describing an array of ways in which context can be conceptualised, with implications for the design, conduct and impact of case study research. There are connections across diverse disciplines between the idea of context as an event, as relational and/or socially structured action, or context as a process. For instance, Hawe’s idea of interventions as ‘events in systems’ [ 11 ] links with Rhodes and Lancaster’s idea of entanglements [ 96 ] and Meier and Dopson’s relational view of context as social action/process [ 97 ]. From this perspective, context is something that happens dynamically or is performed (usually to make sense of what is taking place). This view of context was visible in meta-narratives 3 and 4, though not always articulated or operationalised explicitly in this way. We plan to examine cross-disciplinary contributions to understanding context in further detail in a future paper.

There are clearly opportunities for case study research to offer explanation and transfer of findings to other contexts. Empirical papers often held back from stating potential transferability and generalisability. This may be due to a historical tendency to understate and critique the potential (i) of case study to offer explanation and to test or build theory (compounded by the historical relegation of case study to the bottom of a methodological hierarchy of effectiveness) [ 6 ], and (ii) for abstraction from ‘the particular’ (e.g. specific case or context) to the general [ 22 ]. In-depth case studies, particularly in meta-narratives 3 and 4, tended to include lack of ‘representativeness’ as one of the limitations. This might be due to peer review in publication and a misunderstanding of the nature of inferences and how they are made (e.g. with authors starting with strong claims about theoretical generalisation from a single case, reviewers requesting acknowledgement of the case as limited in generalising to other cases characterised by different contextual factors, and authors accepting this point and downplaying explanatory power in order to get the paper accepted). Further work is needed to appreciate how and why transferability and inference are downplayed, how this manifests across meta-narratives and case study approaches, and how to correct it.

Strengths and limitations of this study

Reviewing a vast and disciplinarily diverse literature focused on methodology and its operationalisation was challenging. The use of meta-narrative review was critical, enabling breadth to identify meta-narratives and depth to unpick pertinent methodological, ontological and epistemological concerns. Using three hermeneutic cycles allowed us to engage with the different layers of the literature – complexity, context and case study research – in iterative ways that a standard systematic review would not have enabled.

The aim of a meta-narrative review is to connect with seminal papers and key threads running through the literature. This allowed us to foreground different disciplines and paradigmatic approaches to case study and the ways in which these shape conceptual and empirical use. As the four meta-narratives demonstrate, case study research is a broad and contested terrain, with a lack of consensus on design and implementation and variation across disciplines. Given this breadth, we were aware that some sections of the literature may have used case study methodology but different terminology (i.e. they would likely meet our definitions for relevance but not use the 3C terms, perhaps describing their work as ‘ethnographies’, ‘mixed-methods’ or ‘qualitative studies’). These articles were unlikely to have been identified through our literature search; identifying them would need a much-expanded review.

One limitation was that our search strategy was neither sensitive nor specific. On the one hand, the final dataset was relatively small, making it difficult, for example, to demonstrate historicity in the different meta-narratives. It is likely that many relevant studies were missed. On the other hand, in an attempt to avoid an over-specific search (allowing ‘2Cs’ rather than insisting on ‘3Cs’), we turned up a significant body of work that used the term ‘case study’ but did not appear to adopt the methodology and was perhaps of marginal relevance to our research question. Nevertheless, these studies comprised a significant part of the dataset and allowed us to explore the reasoning behind the reporting of evaluative research as ‘case study’.

Our review focussed on ‘intervention-dependent context’, guiding us to literature in which the primary drive to study context came from the need to understand the effects of interventions. As set out above, other ways of conceiving context are possible.

This review also focused on reports of single case studies. A key contribution of case study designs to the literature on causal inference in complex systems is through comparative analysis of series of cases, such as through formal Qualitative Comparative Analysis (QCA) methods [98]. These use set theory to systematically compare ‘configurations of conditions’ (such as elements of context and intervention features) to identify patterns associated with the presence and absence of an outcome. As the findings of a QCA study are unlikely to be reported as ‘case studies’, they would not have been in scope for our search: the ‘cases’ here are the data for analysis, rather than a description of the study design. QCA approaches can be applied to primary or secondary data. A recent review of these methods in evaluative health research [99] identified lack of empirical diversity as a challenge: in short, better reporting of primary case studies would also bring benefits for researchers using QCA and similar methods to improve causal inferences from reviews of those studies.
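To make the idea of comparing ‘configurations of conditions’ more concrete, the following is a minimal sketch of the first step of a crisp-set QCA: building a truth table and checking how consistently each configuration is associated with the outcome. It is an illustration only, not the procedure used in any study cited here. The case data, the condition names (`strong_leadership`, `local_adaptation`) and the function names are all hypothetical, and the Boolean minimisation step that a full QCA would perform is omitted.

```python
from collections import defaultdict

# Hypothetical cases (not drawn from the review): each condition is coded
# 1 (present) or 0 (absent), together with a binary outcome.
CASES = [
    {"strong_leadership": 1, "local_adaptation": 1, "outcome": 1},
    {"strong_leadership": 1, "local_adaptation": 1, "outcome": 1},
    {"strong_leadership": 1, "local_adaptation": 0, "outcome": 0},
    {"strong_leadership": 0, "local_adaptation": 1, "outcome": 1},
    {"strong_leadership": 0, "local_adaptation": 1, "outcome": 0},
    {"strong_leadership": 0, "local_adaptation": 0, "outcome": 0},
]
CONDITIONS = ["strong_leadership", "local_adaptation"]


def truth_table(cases, conditions, outcome="outcome"):
    """Group cases by their configuration of conditions.

    Returns a dict mapping each observed configuration (a tuple of 0/1
    values, in the order given by `conditions`) to the number of cases
    and the share of those cases in which the outcome is present.
    """
    rows = defaultdict(list)
    for case in cases:
        config = tuple(case[name] for name in conditions)
        rows[config].append(case[outcome])
    return {
        config: {"n": len(outcomes), "consistency": sum(outcomes) / len(outcomes)}
        for config, outcomes in rows.items()
    }


def sufficient_configurations(table, threshold=1.0):
    """Return configurations whose consistency meets the threshold."""
    return sorted(
        config for config, row in table.items() if row["consistency"] >= threshold
    )


if __name__ == "__main__":
    table = truth_table(CASES, CONDITIONS)
    for config, row in sorted(table.items()):
        print(dict(zip(CONDITIONS, config)), row)
    print("Consistently linked to the outcome:", sufficient_configurations(table))
```

In this toy dataset only the configuration with both conditions present is always accompanied by the outcome. With so few cases, several logically possible configurations may never be observed, which is one way of picturing the ‘lack of empirical diversity’ noted above.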

The need for better methods for evaluation of complex interventions is now widely recognised [13, 21, 28, 29]. The ‘complexity turn’ has drawn attention to the limitations of relying on causal inference alone for understanding whether, and under which conditions, interventions in complex systems improve health services or public health, and what mechanisms might link interventions and outcomes [4]. The four meta-narratives identified in our review are rooted in different ways of seeing the world, leading to different case study designs and methods and the production of different types of knowledge. Clearly, there are choices to be made about the exact approach to be taken in light of the focus of any evaluation, the research questions that users of evaluation evidence need answered, the nature of the complex intervention and the extent of complexity within the intervention and system. If evaluative health research is to move beyond the current impasse on methods for understanding interventions as interruptions in complex systems, then there is a need to exploit more fully the potential learning from this breadth of case study research in the evaluation of complex interventions. To do so, researchers, funders and users need to address five challenges.

First, health research draws on multiple, and arguably incommensurable, conceptualisations of case study research: what a case study is, and the diversity in how empirical case studies are conducted and reported. Whilst we do not believe that consistency is needed across different approaches (indeed, each has important strengths), work is needed to appreciate the range of relevant meta-narratives shaping case study research, the scope and use of different methodologies and the types of knowledge produced.

Second, misconceptions remain that case study research can only provide exploratory or descriptive evidence. Yet evidence from one case can be all that is needed for causal claims (e.g. claims that X CAN lead to Y; claims that X doesn’t necessarily lead to Y). This point relates to the value given to case study research generally. However, it has a particular salience in health sciences, where the evidence-based medicine movement has instilled a hierarchy of evidence in which case study is firmly relegated to the bottom. While there has been challenge to this hierarchy, and some movement, resituating and internalising case study research within health sciences requires significant change, not only in the way in which researchers, funders and publishers within the field conceive and rank different kinds of evidence and different kinds of causal inference [5] and their integration (where possible), but also in the ways in which the community puts that into practice (e.g. via peer review, in scaling interventions).

Third, case study researchers, especially in some traditions, typically focus on ‘thick description’ of findings as a means of contextualising detail. This can make it challenging for those more familiar with RCT and quasi-experimental approaches to evaluation to identify the key messages related to intervention evaluation. It likely requires both a readiness on the part of users (e.g. policymakers, journal editors) to engage with the detail of case study research and better skills on the part of case study authors in distilling what are likely to be key issues for decision makers.

Fourth, the relationship between context and intervention needs to be conceptualised along a spectrum, from being separate through to being in interaction. This is not simply a matter of definition, but also relates to evaluation perspective and intended utility. For instance, from a realist perspective (meta-narrative 3) one might not differentiate between context and intervention (and indeed, outcome and mechanisms), but from an intervention-centred perspective (e.g. meta-narrative 1) it might be different. This approach represents a significant challenge to current approaches that tend to construct context as a set of factors/characteristics.

Finally, papers are variably framed as case study research. This is fostered by institutionalised conventions and checklists in which case study methodology either does not feature (and so almost inevitably becomes lost) or in which it is misinterpreted and misunderstood. There is scope for developing guidance and publication standards to help those reporting, publishing and using evidence from case studies.

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Abbreviations

MRC: Medical Research Council

RCT: Randomised Controlled Trial

References

Lamont T, Barber N, de Pury J, Fulop N, Garfield-Birkbeck S, Lilford R, et al. New approaches to evaluating complex health and care systems. BMJ. 2016;352:i154.

Cohn S, Clinch M, Bunn C, Stronge P. Entangled complexity: why complex interventions are just not complicated enough. J Health Serv Res Policy. 2013;18(1):40–3.

Raine R, Fitzpatrick R, Barratt H, Bevan G, Black N, Boaden R, et al. Challenges, solutions and future directions in the evaluation of service innovations in health care and public health. Southampton: Health Services and Delivery Research; 2016. https://doi.org/10.3310/hsdr04160 .

Greenhalgh T, Papoutsi C. Studying complexity in health services research: desperately seeking an overdue paradigm shift. BMC Med. 2018;16:95.

Kneale D, Thomas J, Bangpan M, Waddington H, Gough D. Conceptualising causal pathways in systematic reviews of international development interventions through adopting a causal chain analysis approach. J Dev Effectiveness. 2018;10(4):422–37.

Paparini S, Green J, Papoutsi C, Murdoch J, Petticrew M, Greenhalgh T, et al. Case study research for better evaluations of complex interventions: rationale and challenges. BMC Med. 2020;18(1):301.

Carolan CM, Forbat L, Smith A. Developing the DESCARTE model: the Design of Case Study Research in health care. Qual Health Res. 2016;26(5):626–39.

Rutter H, Savona N, Glonti K. The need for a complex systems model of evidence for public health. Lancet. 2017;390:2602–4.

Campbell M, Fitzpatrick R, Haines A, Kinmonth AL, Sandercock P, Spiegelhalter D, et al. Framework for design and evaluation of complex interventions to improve health. BMJ. 2000;321(7262):694–6.

Moore GF, Audrey S, Barker M, Bond L, Bonell C, Hardeman W, et al. Process evaluation of complex interventions: Medical Research Council guidance. BMJ. 2015;350:h1258.

Hawe P, Shiell A, Riley T. Theorising interventions as events in systems. Am J Community Psychol. 2009;43(3–4):267–76.

Shiell A, Hawe P, Gold L. Complex interventions or complex systems? Implications for health economic evaluation. BMJ. 2008;336(7656):1281–3.

Craig P, Di Ruggiero E, Frohlich KL, E M, White M, Context Guidance Authors Group. Taking account of context in population health intervention research: guidance for producers, users and funders of research: NIHR; 2018.

Hansen ABG, Jones A. Advancing 'real-world' trials that take account of social context and human volition. Trials. 2017;18(1):531.

Evans RE, Craig P, Hoddinott P, Littlecott H, Moore L, Murphy S, et al. When and how do 'effective' interventions need to be adapted and/or re-evaluated in new contexts? The need for guidance. J Epidemiol Community Health. 2019;73(6):481–2.

Craig P, Dieppe P, Macintyre S, Michie S, Nazareth I, Petticrew M, et al. Developing and evaluating complex interventions: the new Medical Research Council guidance. BMJ. 2008;337:a1655.

Skivington K, Matthew L, Simpson SA, Craig P, Baird J, Blazeby JM, et al. Updating the framework for developing and evaluating complex interventions. Public Health Research. In Press.

Craig P, Cooper C, Gunnell D, Haw S, Lawson K, Macintyre S, et al. Using natural experiments to evaluate population health interventions: new Medical Research Council guidance. J Epidemiol Community Health. 2012;66(12):1182–6.

Booth A, Harris J, Croot E, Springett J, Campbell F, Wilkins E. Towards a methodology for cluster searching to provide conceptual and contextual "richness" for systematic reviews of complex interventions: case study (CLUSTER). BMC Med Res Methodol. 2013;13:118.

Pfadenhauer LM, Gerhardus A, Mozygemba K, Lysdahl KB, Booth A, Hofmann B, et al. Making sense of complexity in context and implementation: the context and implementation of complex interventions (CICI) framework. Implement Sci. 2017;12(1):21.

Ogilvie D, Adams J, Bauman A, Gregg EW, Panter J, Siegel KR, et al. Using natural experimental studies to guide public health action: turning the evidence-based medicine paradigm on its head. J Epidemiol Community Health. 2020;74(2):203–8.

Flyvbjerg B. Five misunderstandings about case-study research. Qual Inq. 2006;12:219–45.

Byrne D. Evaluating complex social interventions in a complex world. Evaluation. 2013;19(3):217–28.

Kœnig G. Realistic evaluation and case studies: stretching the potential. Evaluation. 2009;15(1):9–30.

Walshe C. The evaluation of complex interventions in palliative care: an exploration of the potential of case study research strategies. Palliat Med. 2011;25(8):774–81.

Meier N, Dopson S. Context in action and how to study it: illustrations from health care. 1st ed. Oxford: Oxford University Press; 2019.

Greenhalgh J, Manzano A. Understanding ‘context’ in realist evaluation and synthesis. Int J Soc Res Methodol. 2021. https://doi.org/10.1080/13645579.2021.1918484 .

Wells M, Williams B, Treweek S, Coyle J, Taylor J. Intervention description is not enough: evidence from an in-depth multiple case study on the untold role and impact of context in randomised controlled trials of seven complex interventions. Trials. 2012;13(1):95.

Craig P, Katikireddi SV, Leyland A, Popham F. Natural experiments: an overview of methods, approaches, and contributions to public health intervention research. Annu Rev Public Health. 2017;38:39–56.

Yin RK. Case study research and applications: design and methods. Sage 2017.

King G, Keohane RO, Verba S. Designing social inquiry: Scientific inference in qualitative research 1994.

Greenhalgh T, Robert G, Macfarlane F, Bate P, Kyriakidou O. Diffusion of innovations in service organizations: systematic review and recommendations. The Milbank Quarterly. 2004;82(4):581–629.

Denzin NK, Lincoln YS. The SAGE handbook of qualitative research. 3rd ed. Thousand Oaks: Sage; 2005.

Raine R, Fitzpatrick R, Barratt H, Bevan G, Black N, Boaden R, et al. Challenges, solutions and future directions in the evaluation of service innovations in health care and public health. Southampton (UK): NIHR Journals Library; 2016. https://doi.org/10.3310/hsdr04160 .

Greenhalgh T, Robert G, Macfarlane F, Bate P, Kyriakidou O, Peacock R. Storylines of research in diffusion of innovation: a meta-narrative approach to systematic review. Soc Sci Med. 2005;61(2):417–30.

Wong G, Greenhalgh T, Westhorp G, Pawson R, NIHR Journals Library, National Institute for Health Research (Great Britain). Development of methodological guidance, publication standards and training materials for realist and meta-narrative reviews: the RAMESES (Realist And Meta-narrative Evidence Syntheses – Evolving Standards) project. Southampton: NIHR Journals Library; 2014.

Wong G, Greenhalgh T, Westhorp G, Buckingham J, Pawson R. RAMESES publication standards: meta-narrative reviews. BMC Med. 2013;11:20.

Boell SK, Cecez-Kecmanovic D. A hermeneutic approach for conducting literature reviews and literature searches. CAIS. 2014;34:12.

Harrison H, Birks M, Franklin R, Mills J, editors. Case study research: foundations and methodological orientations. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research; 2017.

McDonnell A, Lloyd Jones M, Read S. Practical considerations in case study research: the relationship between methodology and process. J Adv Nurs. 2000;32(2):383–90.

Mazzocato P, Unbeck M, Elg M, Skoldenberg OG, Thor J. Unpacking the key components of a programme to improve the timeliness of hip-fracture care: a mixed-methods case study. Scand J Trauma Resuscitation Emerg Med. 2015;23:93.

Chandler CIR, Burchett H, Boyle L, Achonduh O, Mbonye A, DiLiberto D, et al. Examining intervention design: lessons from the development of eight related malaria health care intervention studies. Health Systems Reform. 2016;2(4):373–88.

Jansson E, Fosse E, Tillgren P. National public health policy in a local context-implementation in two Swedish municipalities. Health Policy. 2011;103(2/3):219–27.

Browne AJ, Varcoe C, Ford-Gilboe M, Wathen CN, Team ER. EQUIP healthcare: an overview of a multi-component intervention to enhance equity-oriented care in primary health care settings. Int J Equity Health. 2015;14:152.

Reszel J, Dunn SI, Sprague AE, Graham ID, Grimshaw JM, Peterson WE, et al. Use of a maternal newborn audit and feedback system in Ontario: a collective case study. BMJ Qual Saf. 2019;28(8):635–44.

Bradley EH, Webster TR, Baker D, Schlesinger M, Inouye SK. After adoption: sustaining the innovation. A case study of disseminating the hospital elder life program. J Am Geriatr Soc. 2005;53(9):1455–61.

Bradley F, Wiles R, Kinmonth AL, Mant D, Gantley M. Development and evaluation of complex interventions in health services research: case study of the Southampton heart integrated care project (SHIP). The SHIP Collaborative Group. BMJ. 1999;318(7185):711–5.

Power R, Langhaug LF, Nyamurera T, Wilson D, Bassett MT, Cowan FM. Developing complex interventions for rigorous evaluation--a case study from rural Zimbabwe. Health Educ Res. 2004;19(5):570–5.

Lamb J, Dowrick C, Burroughs H, Beatty S, Edwards S, Bristow K, et al. Community engagement in a complex intervention to improve access to primary mental health care for hard-to-reach groups. Health Expect. 2015;18(6):2865–79.

Martin GP, Weaver S, Currie G, Finn R, McDonald R. Innovation sustainability in challenging health-care contexts: embedding clinically led change in routine practice. Health Serv Manag Res. 2012;25(4):190–9.

Dixon-Woods M, Redwood S, Leslie M, Minion J, Martin GP, Coleman JJ. Improving quality and safety of care using "technovigilance": an ethnographic case study of secondary use of data from an electronic prescribing and decision support system. Milbank Q. 2013;91(3):424–54.

Wears RL, Cook RI, Perry SJ. Automation, interaction, complexity, and failure: a case study. Reliability Eng Syst Safety. 2006;91(12):1494–501.

Hurley C, Baum F, Eyk H. 'Designing better health care in the South': a case study of unsuccessful transformational change in public sector health service reform. Aust J Public Adm. 2004;63(2):31–41.

McCarthy D, Blumenthal D. Stories from the sharp end: case studies in safety improvement. Milbank Q. 2006;84(1):165–200.

Byng R, Norman I, Redfern S, Jones R. Exposing the key functions of a complex intervention for shared care in mental health: case study of a process evaluation. BMC Health Serv Res. 2008;8:274.

Byng R, Norman I, Redfern S. Using realistic evaluation to evaluate a practice-level intervention to improve primary healthcare for patients with long-term mental illness. Evaluation. 2005;11(1):69–93.

Goicolea I, Hurtig AK, San Sebastian M, Marchal B, Vives-Cases C. Using realist evaluation to assess primary healthcare teams' responses to intimate partner violence in Spain. Gac Sanit. 2015;29(6):431–6.

Hoddinott P, Britten J, Pill R. Why do interventions work in some places and not others: a breastfeeding support group trial. Soc Sci Med. 2010;70(5):769–78.

Rycroft-Malone J, Burton CR, Wilkinson J, Harvey G, McCormack B, Baker R, et al. Collective action for implementation: a realist evaluation of organisational collaboration in healthcare. Implement Sci. 2016;11:17.

Mukumbang FC, van Wyk B, Van Belle S, Marchal B. Unravelling how and why the antiretroviral adherence Club intervention works (or not) in a public health facility: a realist explanatory theory-building case study. PLoS One. 2019;14(1):e0210565.

Greenhalgh T, Macfarlane F, Barton-Sweeney C, Woodard F. “If we build it, will it stay?” Case study of sustainability of whole system transformation. Milbank Quarterly. 2012;90:516–47.

Fulop N, Protopsaltis G, King A, Allen P, Hutchings A, Normand C. Changing organisations: a study of the context and processes of mergers of health care providers in England. Soc Sci Med. 2005;60(1):119–30.

Campbell C, Nair Y, Maimane S. Building contexts that support effective community responses to HIV/AIDS: a south African case study. Am J Community Psychol. 2007;39(3–4):347–63.

Greenhalgh T, Shaw S, Wherton J, Vijayaraghavan S, Morris J, Bhattacharya S, et al. Real-world implementation of video outpatient consultations at macro, Meso, and Micro levels: mixed-method study. J Med Internet Res. 2018;20(4):e150.

Maguire S. Discourse and adoption of innovations: a study of HIV/AIDS treatments. Health Care Manag Rev. 2002;27(3):74–88.

Yin RK. Design and methods. Case study Research. 2003;3(9.2).

Van Eyk H, Baum F, Blandford J. Evaluating healthcare reform: the challenge of evaluating changing policy environments. Evaluation. 2001;7(4):487–503.

Pawson R, Tilley N. Realistic evaluation. London; Thousand Oaks: Sage; 1997.

Pawson R, Greenhalgh T, Harvey G, Walshe K. Realist review-a new method of systematic review designed for complex policy interventions. J Health Serv Res Policy. 2005;10(1_suppl):21–34.

Wong G, Westhorp G, Manzano A, Greenhalgh J, Jagosh J, Greenhalgh T. RAMESES II reporting standards for realist evaluations. BMC Med. 2016;14(1):96.

Stake R. Case studies. Handbook of qualitative research, 2nd ed, Thousand Oaks: Sage Publications. 2000 435–454.

Stake RE. The art of case study research. London: Sage Publications Ltd; 1995.

Geertz C. Thick description: toward an interpretive theory of culture. Turning Points in Qualitative Research: Tying knots in a handkerchief. 1973;3:143–68.

Abma TA, Stake RE. Science of the particular: an advocacy of naturalistic case study in health research. Qual Health Res. 2014;24(8):1150–61.

Van Maanen J. Tales of the Field: on writing ethnography. Chicago and London: University of Chicago Press; 1988.

Stake RE. Qualitative case studies. In: Denzin N, Lincoln Y, editors. The Sage handbook of qualitative research. Thousand Oaks: Sage; 2005. p. 443–466.

Flyvbjerg B. Case study. In: Denzin N, Lincoln Y, editors. The Sage handbook of qualitative research. 4th ed. Thousand Oaks: Sage; 2011. p. 301–316.

Bergen A, While A. A case for case studies: exploring the use of case study design in community nursing research. J Adv Nurs. 2000;31(4):926–34.

Crowe S, Cresswell K, Robertson A, Huby G, Avery A, Sheikh A. The case study approach. BMC Med Res Methodol. 2011;11:100.

Payne S, Field D, Rolls L, Hawker S, Kerr C. Case study research methods in end-of-life care: reflections on three studies. J Adv Nurs. 2007;58(3):236–45.

Segar J, Checkland K, Coleman A, McDermott I. Thinking about Case Studies in 3-D: Researching the NHS Clinical Commissioning Landscape in England. Case Study Evaluation: Past, Present and Future Challenges. Advances in Program Evaluation. 15: Emerald Group Publishing Limited; 2015. p. 85–105.

Sharp K. The case for case studies in nursing research: the problem of generalization. J Adv Nurs. 1998;27(4):785–9.

Mitchell JC. Case studies. In: Ellen RF, editor. Ethnographic research: a guide to general conduct. London: Academic Press; 1984. p. 237–41.

Gerring J. What is a case study and what is it good for? Am Political Sci Review. 2004;98(2):341–54.

Burawoy M. The extended case method. Sociol Theory. 1998;16.

Strehlenert H, Hansson J, Nyström ME, Hasson H. Implementation of a national policy for improving health and social care: a comparative case study using the consolidated framework for implementation research. BMC Health Serv Res 2019;19(1):1–0.

Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implement Sci. 2009;4(1):1–5.

Ellis B, Howard J. Clinical governance, education and learning to manage health information. Clinical Governance: International Journal. 2011;16(4):337–52.

Yin RK. Case study research: design and methods. London: Sage publications; 2013.

Hyett N, Kenny A, Dickson-Swift V. Methodology or method? A critical review of qualitative case study reports. Int J Qual Stud Health Well Being. 2014;9(1):23606.

Squires JE, Graham I, Bashir K, Nadalin-Penno L, Lavis J, Francis J, et al. Understanding context: a concept analysis. J Adv Nurs. 2019 Dec;75(12):3448–70.

Pfadenhauer LM, Gerhardus A, Mozygemba K, Lysdahl KB, Booth A, Hofmann B, et al. Making sense of complexity in context and implementation: the context and implementation of complex interventions (CICI) framework. Implement Sci. 2017;12(1):1–7.

Skivington K, Matthews L, Craig P, Simpson S, Moore L. Developing and evaluating complex interventions: updating Medical Research Council guidance to take account of new methodological and theoretical approaches. Lancet. 2018;392:S2.

Moore GF, Evans RE, Hawkins J, Littlecott H, Melendez-Torres GJ, Bonell C, et al. From complex social interventions to interventions in complex social systems: future directions and unresolved questions for intervention development and evaluation. Evaluation. 2018;25(1):23–45.

Shoveller J, Viehbeck S, Di Ruggiero E, Greyson D, Thomson K, Knight R. A critical examination of representations of context within research on population health interventions. Crit Public Health. 2016;26(5):487–500.

Rhodes T, Lancaster K. Evidence-making interventions in health: a conceptual framing. Soc Sci Med. 2019;238:112488.

Meier N, Dopson S. Context in action and how to study it: illustrations from health care. Oxford: Oxford University Press; 2019. Available from: Oxford Scholarship Online. https://doi.org/10.1093/oso/9780198805304.001.0001 .

Ragin CC. The comparative method: moving beyond qualitative and quantitative strategies: Univ of California press; 2014.

Hanckel B, Petticrew M, Thomas J, Green J. The use of qualitative comparative analysis (QCA) to address causality in complex systems: a systematic review of research on public health interventions. BMC Public Health. 2021;21(1):1–22.

Acknowledgements

Thanks go to Nia Roberts, librarian at the Bodleian Health Care Library of the University of Oxford, for her help and support in developing search strategies and carrying out searches.

The research was funded by the Medical Research Council (MR/S014632/1).

Additional funding for SP, TG, CP and SS salaries over the course of the study was provided by the UK National Institute for Health Research Oxford Biomedical Research Centre (BRC-1215-20008), Wellcome Trust (WT104830MA; 221457/Z/20/Z) and the University of Oxford’s Higher Education Innovation Fund. JG was additionally supported by a Wellcome Trust Centre Grant (203109/Z/16/Z).

Funding bodies had no input to the design of the study and collection, analysis, and interpretation of data or preparation of this paper.

Author information

Authors and affiliations

Nuffield Department of Primary Care Health Sciences, University of Oxford, Radcliffe Observatory Quarter, Oxford, OX2 6GG, UK

Sara Paparini, Chrysanthi Papoutsi, Trisha Greenhalgh & Sara E. Shaw

School of Population Health & Environmental Sciences, King’s College London, London, UK

Jamie Murdoch

Wellcome Centre for Cultures & Environments of Health, University of Exeter, Exeter, UK

Judith Green

Public Health, Environments & Society, London School of Hygiene & Tropical Medicine, London, UK

Mark Petticrew

Contributions

JM had the initial idea for the research and, with SS, developed the initial study design. SS led the funding application and was principal investigator for this study and (as such) its guarantor. She was involved in all aspects of study design, review, analysis and writing; she drafted the first version of the paper. CP, JM and TG are senior academics with expertise in evidence review and synthesis and were involved in refining study design. CP led on oversight of the meta-narrative review, was part of the review team (with JM, SP and SS) and contributed to all other aspects of the study. SP is a senior academic and was involved in all aspects of the research. She led on searching and meta-narrative review of papers. JG and MP are senior academics with expertise in public health. Along with TG they formed the wider study steering group, reviewing study progress, inputting to the meta-narrative review and identifying wider literature on case study, context and complex interventions. All three contributed theoretical and methodological perspectives on case study research relating to complex interventions in health systems and public health. All authors have reviewed and approved the final manuscript.

Corresponding author

Correspondence to Sara E. Shaw .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Paparini, S., Papoutsi, C., Murdoch, J. et al. Evaluating complex interventions in context: systematic, meta-narrative review of case study approaches. BMC Med Res Methodol 21 , 225 (2021). https://doi.org/10.1186/s12874-021-01418-3

Received : 16 June 2021

Accepted : 29 September 2021

Published : 25 October 2021

DOI : https://doi.org/10.1186/s12874-021-01418-3

  • Complex interventions
  • Case study methods
  • Qualitative research
  • Mixed methods research
  • Literature review
  • Meta-narrative

BMC Medical Research Methodology

ISSN: 1471-2288

A methodological review of qualitative case study methodology in midwifery research

Affiliations.

  • 1 Centre for Midwifery, Child and Family Health, University of Technology Sydney, New South Wales, Australia.
  • 2 Faculty of Health, ACT Health Directorate and University of Canberra, University of Canberra, Australian Capital Territory, Australia.
  • PMID: 26909766
  • DOI: 10.1111/jan.12946

Aim: To explore the use and application of case study research in midwifery.

Background: Case study research provides rich data for the analysis of complex issues and interventions in the healthcare disciplines; however, a gap in the midwifery research literature was identified.

Design: A methodological review of midwifery case study research using recognized templates, frameworks and reporting guidelines facilitated comprehensive analysis.

Data sources: An electronic database search using the date range January 2005-December 2014: Maternal and Infant Care, CINAHL Plus, Academic Search Complete, Web of Knowledge, SCOPUS, Medline, Health Collection (Informit), Cochrane Library, Health Source: Nursing/Academic Edition, Wiley online and ProQuest Central.

Review methods: Narrative evaluation was undertaken. Clearly worded questions reflected the problem and purpose. The application, strengths and limitations of case study methods were identified through a quality appraisal process.

Results: The review identified both case study research's applicability to midwifery and its low uptake, especially in clinical studies. Many papers included the necessary criteria to achieve rigour. The included measures of authenticity and methodology were varied. A high standard of authenticity was observed, suggesting authors considered these elements to be routine inclusions. Technical aspects were lacking in many papers, namely a lack of reflexivity and incomplete transparency of processes.

Conclusion: This review raises the profile of case study research in midwifery. Midwives will be encouraged to explore whether case study research is suitable for their investigation. The raised profile will demonstrate further applicability and encourage support and wider adoption in the midwifery setting.

Keywords: case study research; maternity; methodological review; methodology; midwifery; midwives; qualitative research.

© 2016 John Wiley & Sons Ltd.

Publication types

  • Qualitative Research*

Business Insights

Harvard Business School Online's Business Insights Blog provides the career insights you need to achieve your goals and gain confidence in your business skills.

5 Benefits of Learning Through the Case Study Method

  • 28 Nov 2023

While several factors make HBS Online unique, including a global Community and real-world outcomes, active learning through the case study method rises to the top.

In a 2023 City Square Associates survey, 74 percent of HBS Online learners who also took a course from another provider said HBS Online’s case method and real-world examples were better by comparison.

Here’s a primer on the case method, five benefits you could gain, and how to experience it for yourself.

What Is the Harvard Business School Case Study Method?

The case study method, or case method, is a learning technique in which you’re presented with a real-world business challenge and asked how you’d solve it. After working through it yourself and with peers, you’re told how the scenario played out.

HBS pioneered the case method in 1922. Shortly before, in 1921, the first case was written.

“How do you go into an ambiguous situation and get to the bottom of it?” says HBS Professor Jan Rivkin, former senior associate dean and chair of HBS's master of business administration (MBA) program, in a video about the case method. “That skill, the skill of figuring out a course of inquiry to choose a course of action, that skill is as relevant today as it was in 1921.”

Originally developed for the in-person MBA classroom, HBS Online adapted the case method into an engaging, interactive online learning experience in 2014.

In HBS Online courses, you learn about each case from the business professional who experienced it. After reviewing their videos, you’re prompted to take their perspective and explain how you’d handle their situation.

You then get to read peers’ responses, “star” them, and comment to further the discussion. Afterward, you learn how the professional handled it and their key takeaways.

HBS Online’s adaptation of the case method incorporates the famed HBS “cold call,” in which you’re called on at random to make a decision without time to prepare.

“Learning came to life!” said Sheneka Balogun, chief administration officer and chief of staff at LeMoyne-Owen College, of her experience taking the Credential of Readiness (CORe) program. “The videos from the professors, the interactive cold calls where you were randomly selected to participate, and the case studies that enhanced and often captured the essence of objectives and learning goals were all embedded in each module. This made learning fun, engaging, and student-friendly.”

If you’re considering taking a course that leverages the case study method, here are five benefits you could experience.

5 Benefits of Learning Through Case Studies

1. Take New Perspectives

The case method prompts you to consider a scenario from another person’s perspective. To work through the situation and come up with a solution, you must consider their circumstances, limitations, risk tolerance, stakeholders, resources, and potential consequences to assess how to respond.

Taking on new perspectives not only can help you navigate your own challenges but also others’. Putting yourself in someone else’s situation to understand their motivations and needs can go a long way when collaborating with stakeholders.

2. Hone Your Decision-Making Skills

Another skill you can build is the ability to make decisions effectively. The case study method forces you to use limited information to decide how to handle a problem, just like in the real world.

Throughout your career, you’ll need to make difficult decisions with incomplete or imperfect information—and sometimes, you won’t feel qualified to do so. Learning through the case method allows you to practice this skill in a low-stakes environment. When facing a real challenge, you’ll be better prepared to think quickly, collaborate with others, and present and defend your solution.

3. Become More Open-Minded

As you collaborate with peers on responses, it becomes clear that not everyone solves problems the same way. Exposing yourself to various approaches and perspectives can help you become a more open-minded professional.

When you’re part of a diverse group of learners from around the world, your experiences, cultures, and backgrounds contribute to a range of opinions on each case.

On the HBS Online course platform, you’re prompted to view and comment on others’ responses, and discussion is encouraged. This practice of considering others’ perspectives can make you more receptive in your career.

“You’d be surprised at how much you can learn from your peers,” said Ratnaditya Jonnalagadda, a software engineer who took CORe.

In addition to interacting with peers in the course platform, Jonnalagadda was part of the HBS Online Community, where he networked with other professionals and continued discussions sparked by course content.

“You get to understand your peers better, and students share examples of businesses implementing a concept from a module you just learned,” Jonnalagadda said. “It’s a very good way to cement the concepts in one's mind.”

4. Enhance Your Curiosity

One byproduct of taking on different perspectives is that it enables you to picture yourself in various roles, industries, and business functions.

“Each case offers an opportunity for students to see what resonates with them, what excites them, what bores them, which role they could imagine inhabiting in their careers,” says former HBS Dean Nitin Nohria in the Harvard Business Review. “Cases stimulate curiosity about the range of opportunities in the world and the many ways that students can make a difference as leaders.”

Through the case method, you can “try on” roles you may not have considered and feel more prepared to change or advance your career.

5. Build Your Self-Confidence

Finally, learning through the case study method can build your confidence. Each time you assume a business leader’s perspective, aim to solve a new challenge, and express and defend your opinions and decisions to peers, you prepare to do the same in your career.

According to a 2022 City Square Associates survey, 84 percent of HBS Online learners report feeling more confident making business decisions after taking a course.

“Self-confidence is difficult to teach or coach, but the case study method seems to instill it in people,” Nohria says in the Harvard Business Review. “There may well be other ways of learning these meta-skills, such as the repeated experience gained through practice or guidance from a gifted coach. However, under the direction of a masterful teacher, the case method can engage students and help them develop powerful meta-skills like no other form of teaching.”

How to Experience the Case Study Method

If the case method seems like a good fit for your learning style, experience it for yourself by taking an HBS Online course. Offerings span seven subject areas, including:

  • Business essentials
  • Leadership and management
  • Entrepreneurship and innovation
  • Finance and accounting
  • Business in society

No matter which course or credential program you choose, you’ll examine case studies from real business professionals, work through their challenges alongside peers, and gain valuable insights to apply to your career.

Are you interested in discovering how HBS Online can help advance your career? Explore our course catalog and download our free guide, complete with interactive workbook sections, to determine if online learning is right for you and which course to take.

  • Open access
  • Published: 14 October 2023

A scoping review of ‘Pacing’ for management of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS): lessons learned for the long COVID pandemic

  • Nilihan E. M. Sanal-Hayes 1,7,
  • Marie Mclaughlin 1,8,
  • Lawrence D. Hayes 1,
  • Jacqueline L. Mair (ORCID: orcid.org/0000-0002-1466-8680) 2,3,
  • Jane Ormerod 4,
  • David Carless 1,
  • Natalie Hilliard 5,
  • Rachel Meach 1,
  • Joanne Ingram 6 &
  • Nicholas F. Sculthorpe 1

Journal of Translational Medicine volume 21, Article number: 720 (2023)

Controversy over treatment for people with myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a barrier to appropriate treatment. Energy management or pacing is a prominent coping strategy for people with ME/CFS. Whilst there is no unanimous definition of pacing in the literature or among healthcare providers, it typically comprises regulating activity to avoid post exertional malaise (PEM), the worsening of symptoms after an activity. Until now, the characteristics of pacing and its effects on patients’ symptoms had not been systematically reviewed. This is problematic as the most common approach to pacing, pacing prescription, and the pooled efficacy of pacing were unknown. Collating evidence may help advise those suffering with similar symptoms, including long COVID, as practitioners would be better informed on methodological approaches to adopt, pacing implementation, and expected outcomes.

In this scoping review of the literature, we aggregated type of, and outcomes of, pacing in people with ME/CFS.

Eligibility criteria

Original investigations concerning pacing in participants with ME/CFS were considered.

Sources of evidence

Six electronic databases (PubMed, Scholar, ScienceDirect, Scopus, Web of Science and the Cochrane Central Register of Controlled Trials [CENTRAL]) were searched; and websites MEPedia, Action for ME, and ME Action were also searched for grey literature, to fully capture patient surveys not published in academic journals.

A scoping review was conducted. Review selection and characterisation were performed by two independent reviewers using pretested forms.

Authors reviewed 177 titles and abstracts, resulting in 17 included studies: three randomised control trials (RCTs); one uncontrolled trial; one interventional case series; one retrospective observational study; two prospective observational studies; four cross-sectional observational studies; and five cross-sectional analytical studies. Studies included variable designs, durations, and outcome measures. In terms of pacing administration, studies used educational sessions and diaries for activity monitoring. Eleven studies reported benefits of pacing, four studies reported no effect, and two studies reported a detrimental effect in comparison to the control group.

Conclusions

Highly variable study designs and outcome measures, allied to poor to fair methodological quality, resulted in heterogeneous findings and highlight the requirement for more research examining pacing. Looking to the long COVID pandemic, our results suggest future studies should be RCTs utilising objectively quantified digitised pacing, over a longer duration of examination (i.e. longitudinal studies), using the core outcome set for patient reported outcome measures. Until these are completed, the literature base is insufficient to inform treatment practices for people with ME/CFS and long COVID.

Introduction

Post-viral illness occurs when individuals experience an extended period of feeling unwell after a viral infection [ 1 , 2 , 3 , 4 , 5 , 6 ]. While post-viral illness is generally a non-specific condition with a constellation of symptoms that may be experienced, fatigue is amongst the most commonly reported [ 7 , 8 , 9 ]. For example, our recent systematic review found there was up to 94% prevalence of fatigue in people following acute COVID-19 infection [ 3 ]. The increasing prevalence of long COVID has generated renewed interest in the symptomatology and time-course of post-viral fatigue, with PubMed reporting 72 articles related to “post-viral fatigue” between 2020 and 2022, but fewer than five for every year since 1990.

As the coronavirus pandemic developed, it became clear that a significant proportion of the population experienced symptoms which persisted beyond the initial viral infection, meeting the definition of a post-viral illness. Current estimates suggest one in eight people develop long COVID [ 10 ] and its symptomatology has repeatedly been suggested to overlap with clinical demonstrations of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). In a study by Wong and Weitzer [ 11 ], long COVID symptoms from 21 studies were compared to a list of ME/CFS symptoms. Of the 29 known ME/CFS symptoms the authors reported that 25 (86%) were reported in at least one long COVID study suggesting significant similarities. Sukocheva et al. [ 12 ] reported that long COVID included changes in immune, cardiovascular, metabolic, gastrointestinal, nervous and autonomic systems. When observed from a pathological stance, this list of symptoms is shared with, or is similar to, the symptoms patients with ME/CFS describe [ 13 ]. In fact, a recent article reported 43% of people with long COVID are diagnosed with ME/CFS [ 13 ], evidencing the analogous symptom loads.

A striking commonality between long COVID and similar conditions such as ME/CFS is the worsening of symptoms including fatigue, pain, cognitive difficulties, sore throat, and/or swollen lymph nodes following exertion. Termed post exertional malaise (PEM) [ 14 , 15 , 16 , 17 ] and lasting from hours to several days, it is arguably one of the most debilitating side effects experienced by those with ME/CFS [ 16 , 17 , 18 ]. PEM is associated with considerably reduced quality of life amongst those with ME/CFS, with reduced ability to perform activities of daily living, leading to restraints on social and family life, mental health comorbidities such as depression and anxiety, and devastating employment and financial consequences [ 19 , 20 , 21 , 22 ]. At present, there are no cures or pharmacological treatments for PEM, and therefore, effective symptom management strategies are required. This may be in part because the triggers of PEM are poorly understood, and there is little evidence for what causes PEM, beyond anecdotal evidence. The most common approach to manage PEM is to incorporate activity pacing into the day-to-day lives of those with ME/CFS with the intention of reducing the frequency and severity of bouts of PEM [ 23 ]. Pacing is defined as an approach where patients are encouraged to be as active as possible within the limits imposed by the illness [ 23 , 24 , 25 ]. In practice, pacing requires individuals to determine a level at which they can function, but which does not lead to a marked increase in fatigue and other symptoms [ 26 , 27 ].

Although long COVID is a new condition [ 3 , 14 ], the available evidence suggests substantial overlap with the symptoms of conditions such as ME/CFS and it is therefore pragmatic to consider the utility of management strategies (such as pacing) used in ME/CFS for people with long COVID. In fact, a recent Delphi study recommended that management of long COVID should incorporate careful pacing to avoid PEM relapse [ 28 ]. This position was reinforced by a multidisciplinary consensus statement considering treatment of fatigue in long COVID, recommending energy conservation strategies (including pacing) for people with long COVID [ 29 ]. Given the estimated > 2 million individuals who have experienced long COVID in the UK alone [ 30 , 31 , 32 ], there is an urgent need for evidence-based public health strategies. In this context, it seems pragmatic to borrow from the ME/CFS literature.

From a historical perspective, the 2007 NICE guidelines for people with ME/CFS advised both cognitive behavioural therapy (CBT) and graded exercise therapy (GET) should be offered to people with ME/CFS [ 33 ]. As of the 2021 update, NICE guidelines for people with ME/CFS do not advise CBT or GET, and the only recommended management strategy is pacing [ 34 ]. In the years between changes to these guidelines, the landmark PACE trial [ 35 ] was published in 2011. This large, randomised control trial (RCT; n = 639) compared pacing with CBT and GET, and reported that GET and CBT were more effective than pacing for improving symptoms. Yet, this study has come under considerable criticism from patient groups and clinicians alike [ 36 , 37 , 38 , 39 ]. This may partly explain why NICE do not advise CBT or GET as of 2021, and only recommend pacing for symptom management in people with ME/CFS [ 34 ]. There has been some controversy over best treatment for people with ME/CFS in the literature and support groups, potentially amplified by the ambiguity of evidence for pacing efficacy and how pacing should be implemented. As such, before pacing can be advised for people with long COVID, it is imperative previous literature concerning pacing is systematically reviewed. This is because a consensus is needed within the literature for implementing pacing so practitioners treating people with ME/CFS or long COVID can do so effectively. A lack of agreement in pacing implementation is a barrier to adoption for both practitioners and patients. Despite several systematic reviews concerning pharmacological interventions or cognitive behavioural therapy in people with ME/CFS [ 36 , 40 , 41 ], to date, there are no systematic reviews concerning pacing.

Despite the widespread use of pacing, the literature base is limited and includes clinical commentaries, case studies, case series, and few randomised control trials. Consequently, while a comprehensive review of the effects of pacing in ME/CFS is an essential tool to guide symptom management advice, the available literature means that effective pooling of data is not feasible [ 42 ] and therefore, a traditional systematic review and meta-analysis, with a tightly focussed research question, would be premature [ 43 ]. We therefore elected to undertake a scoping review. This approach retains the systematic approach to literature searching but aims to map out the current state of the research [ 43 ]. Using the framework of Arksey and O'Malley [ 44 ], a scoping review aims to use a broad set of search terms and include a wide range of study designs and methods (in contrast to a systematic review [ 44 ]). This approach has the benefit of clarifying key concepts, surveying current data collection approaches, and identifying critical knowledge gaps.

We aimed to provide an overview of existing literature concerning pacing in ME/CFS. Our three specific objectives of this scoping review were to (1) conduct a systematic search of the published literature concerning ME/CFS and pacing, (2) map characteristics and methodologies used, and (3) provide recommendations for the advancement of the research area.

Protocol and registration

The review was conducted and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for scoping reviews (PRISMA-ScR) guidelines [ 45 ] and the five-stage framework outlined in Arksey and O’Malley [ 44 ]. Registration is not recommended for scoping reviews.

Studies that met the following criteria were included in this review: (1) published as a full-text manuscript; (2) not a review; (3) participants with ME/CFS; (4) studies employed a pacing intervention or retrospective analysis of pacing or a case study of pacing. Studies utilising sub-analysis of the pacing, graded activity, and cognitive behaviour therapy: a randomised evaluation (PACE) trial were included as these have different outcome measures and, as this is not a meta-analysis, this will not influence effect size estimates. Additionally, due to the paucity of evidence, grey literature has also been included in this review.

Search strategy

The search strategy consisted of a combination of free-text and MeSH terms relating to ME/CFS and pacing, which were developed through an examination of published original literature and review articles. Example search terms for PubMed included: ‘ME/CFS’ OR ‘ME’ OR ‘CFS’ OR ‘chronic fatigue syndrome’ OR ‘PEM’ OR ‘post exertional malaise’ OR ‘pene’ OR ‘post-exertion neurogenic exhaust’ AND ‘pacing’ OR ‘adaptive pacing’. The search was performed within title/abstract. Full search terms can be found in Additional file 1 .
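The Boolean structure of the example search can be sketched in Python. This is only an illustration and not the authors' search script: the helper names `or_group` and `build_query` are hypothetical, the use of PubMed's `[tiab]` (title/abstract) field tag is an assumption based on the statement that the search was performed within title/abstract, and the parentheses reflect an assumed grouping of the condition terms and the pacing terms. The terms are copied from the example above, not from Additional file 1.

```python
# Hypothetical helpers that assemble a Boolean query from two term groups.
CONDITION_TERMS = [
    "ME/CFS", "ME", "CFS", "chronic fatigue syndrome", "PEM",
    "post exertional malaise", "pene", "post-exertion neurogenic exhaust",
]
PACING_TERMS = ["pacing", "adaptive pacing"]


def or_group(terms, field="tiab"):
    """Quote each term, tag it with a search field, and join with OR."""
    tagged = [f'"{term}"[{field}]' for term in terms]
    return "(" + " OR ".join(tagged) + ")"


def build_query(condition_terms, pacing_terms, field="tiab"):
    """Combine the two OR groups with a single AND (assumed grouping)."""
    return f"{or_group(condition_terms, field)} AND {or_group(pacing_terms, field)}"


query = build_query(CONDITION_TERMS, PACING_TERMS)
print(query)
```

Writing the grouping out explicitly avoids relying on a database's default operator precedence, which may differ between the six sources searched.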

Information sources

Six electronic databases [PubMed, Scholar, ScienceDirect, Scopus, Web of Science, and the Cochrane Central Register of Controlled Trials (CENTRAL)] were searched to identify original research articles published from the earliest available date up until 02/02/2022. Additional records were identified through reference lists of included studies. ‘Grey literature’ repositories including MEPedia, Action for ME, and ME Action were also searched with the same terms.

Study selection and data items

Once each database search was completed and manuscripts were sourced, all studies were downloaded into a single reference list (Zotero, version 6.0.23) and duplicates were removed. Titles and abstracts were screened for eligibility by two reviewers independently and discrepancies were resolved through discussion between reviewers. Subsequently, full text papers of potentially relevant studies were retrieved and assessed for eligibility by the same two reviewers independently. Any uncertainty by reviewers was discussed in consensus meetings and resolved by agreement. Data extracted from each study included sample size, participant characteristics, study design, trial registration details, study location, pacing description (type), intervention duration, intervention adherence, outcome variables, and main outcome data. Descriptions were extracted with as much detail as was provided by the authors. Study quality was assessed using the Physiotherapy Evidence Database (PEDro) scale [ 46 , 47 ].

Role of the funding source

The study sponsors had no role in study design, data collection, analysis, or interpretation, nor writing the report, nor submitting the paper for publication.

Study selection

After the initial database search, 281 records were identified (see Fig.  1 ). Once duplicates were removed, 177 titles and abstracts were screened for inclusion resulting in 22 studies being retrieved as full text and assessed for eligibility. Of those, five were excluded, and 17 articles remained and were used in the final qualitative synthesis.
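The selection counts above can be cross-checked with simple arithmetic. The sketch below uses only the figures reported in this paragraph; the two derived quantities (duplicates removed and records excluded at title/abstract screening) are not stated in the text and are computed here under the assumption that no records left the flow at any other stage.

```python
# Counts reported in the text.
identified = 281          # records from the initial database search
screened = 177            # titles and abstracts screened after de-duplication
full_text = 22            # studies retrieved as full text
excluded_full_text = 5    # full-text studies excluded
included = 17             # studies in the final qualitative synthesis

# Derived counts (assumption: no other losses between stages).
duplicates_removed = identified - screened
excluded_at_screening = screened - full_text

# The reported full-text numbers should reconcile with the final total.
assert full_text - excluded_full_text == included

print(duplicates_removed, excluded_at_screening)  # prints: 104 155
```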

Fig. 1 Schematic flow diagram describing exclusions of potential studies and final number of studies. RCT = randomised control trial. CT = controlled trial. UCT = uncontrolled trial

Study characteristics

Study characteristics are summarised in Table 1 . Of the 17 studies included, three were randomised control trials (RCTs [ 35 , 48 , 49 ]); one was an uncontrolled trial [ 50 ]; one was a case series [ 51 ]; one was a retrospective observational study [ 52 ]; two were prospective observational studies [ 53 , 54 ]; four were cross-sectional observational studies [ 25 , 55 , 56 ]; and five were cross-sectional analytical studies [ 57 , 58 , 59 , 60 , 61 ] including sub-analysis of the PACE trial [ 35 , 56 , 59 , 61 ]. Seven of the studies were registered trials [ 35 , 48 , 49 , 50 , 56 , 57 , 58 ]. Diagnostic criteria for ME/CFS are summarised in Table 2 .
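As a quick consistency check, the design categories listed above can be tallied against the stated total of 17. The dictionary below is an illustrative encoding of the counts in this paragraph; the key names are shorthand chosen for this sketch.

```python
# Number of included studies per design, as reported in the paragraph above.
design_counts = {
    "randomised control trial": 3,
    "uncontrolled trial": 1,
    "case series": 1,
    "retrospective observational": 1,
    "prospective observational": 2,
    "cross-sectional observational": 4,
    "cross-sectional analytical": 5,
}

total_studies = sum(design_counts.values())
print(total_studies)  # prints: 17
```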

Types of pacing

Pacing interventions

Of the 17 studies included, five implemented their own pacing interventions and will be discussed in this section. Sample sizes ranged from n = 7 in an interventional case series [ 51 ] to n = 641 participants in the largest RCT [ 35 ]. The first of these five studies considered an education session on pacing and self-management as the ‘pacing’ group, and a ‘pain physiology education’ group as the control group [ 49 ]. Two studies included educational sessions provided by a therapist plus activity monitoring via ActiGraph accelerometers [ 51 ] and diaries [ 48 ] at baseline and follow-up. In the first of these two studies, Nijs and colleagues [ 51 ] implemented a ‘self-management program’ which asked patients to estimate their current physical capabilities prior to commencing an activity and then complete 25–50% less than their perceived energy envelope. They [ 51 ] did not include a control group and had a sample size of only n = 7. Six years later, the same research group [ 48 ] conducted another pacing study which utilised relaxation as a comparator group (n = 12 and n = 14 in the pacing and relaxation groups, respectively). The pacing group underwent a pacing phase whereby participants again aimed to complete 25–50% less than their perceived energy envelope, followed by a gradual increase in exercise after the pacing phase (the total intervention spanned three weeks, and it is unclear how much was allocated to pacing, and how much to activity increase). Therefore, it could be argued that Kos et al. [ 48 ] really assessed pacing followed by a gradual exercise increase, as outcome measures were assessed following the graded activity phase. Another pacing intervention delivered weekly educational sessions for six weeks and utilised a standardised rehabilitation programme using the ‘activity pacing framework’ [ 50 ] in a single-arm feasibility study with no comparator group.
Finally, the PACE trial adopted an adaptive pacing therapy intervention consisting of occupational therapists helping patients to plan and pace activities utilising activity diaries to identify activities associated with fatigue and staying within their energy envelope [ 35 ]. This study incorporated standard medical care, cognitive behavioural therapy (CBT) and graded exercise therapy (GET) as comparator groups [ 35 ]. It is worth noting that the pacing group and the CBT group were both ‘encouraged’ to increase physical activity levels as long as participants did not exceed their energy envelope. Although not all five intervention studies explicitly mentioned the “Energy Envelope Theory”, which dictates that people with ME/CFS should not necessarily increase or decrease their activity levels, but moderate activity and practice energy conservation [ 62 ], all intervention studies used language analogous to this theory, such as participants staying within limits, within capacity, or similar.
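The "25–50% less than the perceived energy envelope" rule described for the Nijs and Kos interventions [ 48 , 51 ] amounts to a simple range calculation. The sketch below is a purely illustrative worked example: the function name `pacing_target`, the use of minutes as the unit, and the input value of 40 are assumptions made here and do not come from the included studies.

```python
def pacing_target(perceived_capacity, low_cut=0.25, high_cut=0.50):
    """Return the (lower, upper) activity range after reducing the
    self-estimated capacity by 25-50% (hypothetical helper)."""
    if perceived_capacity < 0:
        raise ValueError("perceived_capacity must be non-negative")
    lower = perceived_capacity * (1 - high_cut)  # largest reduction
    upper = perceived_capacity * (1 - low_cut)   # smallest reduction
    return lower, upper


# Made-up example: a participant estimates they could manage 40 minutes.
lower, upper = pacing_target(40)
print(lower, upper)  # prints: 20.0 30.0
```

Under these assumptions, a participant who estimates 40 minutes of capacity would aim for 20 to 30 minutes of activity.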

The interventions included in this review were of varying durations, from a single 30-min education session [ 49 ], a 3-week (one session a week) educational programme [ 51 ], a 3-week (3 × 60–90 min sessions/week) educational programme [ 48 ], a 6-week rehabilitation programme [ 50 ], to a 24-week programme [ 35 ]. Intervention follow-up durations also varied across studies from immediately after [ 49 ], 1-week [ 51 ], 3-weeks [ 48 ], 3-months [ 50 ], and 1-year post-intervention [ 35 ].

Observational studies of pacing

Eight studies were observational and, therefore, included no intervention. Observational study sample sizes ranged from 16 in a cross-sectional interview study [ 25 ] to 1428 in a cross-sectional survey [ 52 ]. One study involved a retrospective analysis of participants’ own pacing strategies, varying from self-guided pacing to pacing administered by a therapist, compared with implementation of CBT and GET [ 52 ]. Five involved a cross-sectional analysis of participants’ own pacing strategies, which varied from activity adjustment, planning and acceptance [ 50 , 55 ] to the Energy Envelope method [ 58 , 60 ]. Two studies were prospective observational studies investigating the Energy Envelope theory [ 53 , 54 ]. Four studies [ 56 , 57 , 59 , 61 ] included in this review involved sub-analysis of results of the PACE trial [ 35 ].

Outcome measures

Quantitative health outcomes

ME/CFS severity and general health status were the most common outcome measures across studies (16/17) [ 35 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 63 ]. Studies utilised different instruments, including the Short-Form 36 (SF-36; 8/16) [ 35 , 51 , 53 , 54 , 56 , 57 , 58 , 60 ], SF-12 (2/16) [ 50 , 63 ], ME symptom and illness severity (2/16) [ 52 , 55 ], Patient health (PHQ-15; 1/16) [ 59 ], DePaul symptom questionnaire (DSQ; 1/16) [ 58 ], and the Patient health questionnaire-9 (1/16) [ 50 ]. Additionally, some studies used diagnostic criteria for ME/CFS as an outcome measure to determine recovery [ 57 , 59 , 61 ].

Pain was assessed by most included studies (11/17) [ 35 , 49 , 50 , 51 , 53 , 54 , 55 , 57 , 59 , 60 , 61 , 63 ]. Two studies [ 59 , 61 ] included the international CDC criteria for CFS which contain five painful symptoms central to a diagnosis of CFS: muscle pain and joint pain. Other methods of assessment included the Brief Pain Inventory (1/11) [ 53 ], Chronic Pain Coping Inventory (CPCI; 1/11) [ 49 ], Pain Self Efficacy Questionnaire (PSEQ; 1/11) [ 50 ], Tampa Scale for Kinesiophobia–version CFS (1/11) [ 49 ], algometry (1/11) [ 49 ], Knowledge of Neurophysiology of Pain Test (1/11) [ 49 ], Pain Catastrophizing Scale (1/11) [ 49 ], Pain Anxiety Symptoms Scale short version (PASS-20; 1/11) [ 50 ], and Pain Numerical Rating Scale (NRS; 1/11) [ 63 ].

Fatigue or post-exertional malaise was assessed by 11 of the 17 studies [ 35 , 48 , 50 , 51 , 53 , 54 , 56 , 57 , 60 , 61 , 63 ]. Again, measurement instruments were divergent between studies and included the Chalder Fatigue Questionnaire (CFQ; 4/11) [ 35 , 50 , 57 , 63 ], Fatigue Severity Scale (2/11) [ 53 , 60 ], the Chronic Fatigue Syndrome Medical Questionnaire (1/11) [ 60 ], and Checklist Individual Strength (CIS; 2/11) [ 48 , 51 ].

Anxiety and depression were also common outcome measures, utilised by four studies (4/17) [ 50 , 53 , 59 , 63 ]. These were also assessed using different instruments including Hospital Anxiety and Depression Scale (HADS; 2/4) [ 59 , 63 ], Generalised Anxiety Disorder Assessment (1/4) [ 50 ], Beck Depression Inventory (BDI-II; 1/4) [ 53 ], Beck Anxiety Inventory (BAI; 1/4) [ 53 ], and Perceived Stress Scale (PSS; 1/4) [ 53 ].

Outcome measures also included sleep (2/17) [ 53 , 59 ], assessed by The Pittsburgh Sleep Quality Index (1/2) [ 53 ] and Jenkins sleep scale (1/2) [ 59 ]; and quality of life (2/17) [ 50 , 53 ] as assessed by the EuroQol five-dimensions, five-levels (EQ-5D-5L; 1/2) [ 50 ] and The Quality-of-Life Scale (1/2) [ 53 ]. Self-Efficacy was measured in four studies [ 50 , 53 , 59 , 60 ], assessed by the Brief Coping Orientation to Problems Experienced Scale (bCOPE; 1/4) [ 60 ] and the Chronic Disease Self-Efficacy measure (3/4) [ 50 , 53 , 59 ].

Quantitative evaluation of pacing

Some studies (4/17) [ 25 , 50 , 52 , 63 ] included assessments of the participants’ experiences of pacing, using the Activity Pacing Questionnaire (APQ-28; 1/4) [ 50 ], the APQ-38 (2/4) [ 25 , 63 ], a re-analysis of the 228-question survey regarding treatment (1/4) [ 52 ] originally produced by the ME Association [ 55 ], and qualitative semi-structured telephone interviews regarding appropriateness of courses in relation to individual patient needs (1/4) [ 25 ]. The APQ-28 and APQ-38 have been previously validated, but the 228-question survey has not. When outcome measures included physical activity levels (4/17), the Canadian Occupational Performance Measure (COPM) was used in two studies [ 48 , 51 ], and two studies used accelerometers to record physical activity [ 51 , 54 ]. Of these two studies, Nijs [ 51 ] examined accelerometry after a 3-week intervention based on the Energy Envelope Theory and Brown et al. [ 54 ] evaluated the Energy Envelope Theory of pacing over 12 months.

Other outcomes

Two [ 53 , 59 ] of the 17 studies included structured clinical interviews for the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) to assess psychiatric comorbidity and psychiatric exclusions. One study included a disability benefits questionnaire [ 55 ], and one study included an employment and education questionnaire [ 55 ]. Additionally, satisfaction with primary care was used as an outcome measure (2/17) [ 25 , 55 ], assessed using the Chronic Pain Coping Inventory (CPCI).

Efficacy of pacing interventions

The majority of studies (12/17) [ 25 , 48 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 58 , 60 , 63 ] highlighted improvements in at least one outcome following pacing (Fig.  2 ). When the effect of pacing was assessed by ME symptomology and general health outcomes, studies reported pacing to be beneficial [ 25 , 50 , 51 , 53 , 54 , 55 , 56 , 58 ]. It is worth noting however that pacing reportedly worsened ME symptoms in 14% of survey respondents, whilst improving symptoms in 44% of respondents [ 52 ]. Most studies using fatigue as an outcome measure reported pacing to be efficacious (7/10) [ 50 , 51 , 53 , 54 , 56 , 60 , 63 ]. However, one study reported no change in fatigue with a pacing intervention (1/10) [ 35 ], and 2/10 studies [ 53 , 63 ] reported a worsening of fatigue with pacing. Physical function was used to determine the efficacy of pacing in 11 studies [ 35 , 48 , 50 , 51 , 53 , 54 , 56 , 58 , 59 , 60 , 63 ]. Of these, the majority found pacing improved physical functioning (8/10) [ 48 , 50 , 51 , 53 , 54 , 56 , 58 , 60 ], with 1/10 [ 35 ] studies reporting no change in physical functioning, and 1/10 [ 59 ] reporting a worsening of physical functioning from pre- to post-pacing. Of the seven studies [ 35 , 49 , 50 , 51 , 53 , 54 , 60 ] which used pain to assess pacing efficacy, 4/7 [ 50 , 51 , 53 , 60 ] reported improvements in pain and 3/7 [ 35 , 51 , 53 ] reported no change in pain scores with pacing. All studies reporting quality of life (1/1) [ 53 ], self-efficacy (3/3) [ 50 , 53 , 59 ], sleep (2/2) [ 53 , 59 ], and depression and anxiety (4/4) [ 50 , 53 , 59 , 63 ], found pacing to be efficacious for ME/CFS participants.

Fig. 2. Bubble plot displaying number of studies reporting each domain (x-axis) and the percentage of studies reporting improvement with pacing (y-axis), including a coloured scale of improvement from 0–100%. PEM = post-exertional malaise, 6MWT = 6-min walk time, CFS = chronic fatigue syndrome, DSQ = DePaul Symptom Questionnaire, PA = Physical Activity, HRQOL = Health-related quality of life, COPM = The Canadian Occupational Performance Measure
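The per-domain proportions quoted in the efficacy paragraph can be tallied with a few lines of code. The following is a minimal sketch, not the authors' analysis: it only converts the fractions printed in the text (for example 7/10 for fatigue) into percentages of the kind a bubble plot would display. The function and variable names are illustrative assumptions, and the denominators are those given in the fractions themselves.

```python
# Counts transcribed from the efficacy paragraph:
# domain -> (studies reporting improvement, studies assessing the domain).
DOMAIN_COUNTS = {
    "fatigue": (7, 10),
    "physical function": (8, 10),
    "pain": (4, 7),
    "quality of life": (1, 1),
    "self-efficacy": (3, 3),
    "sleep": (2, 2),
    "depression and anxiety": (4, 4),
}


def percent_improved(improved: int, assessed: int) -> float:
    """Return the percentage of assessing studies that reported improvement."""
    if assessed <= 0:
        raise ValueError("at least one study must have assessed the domain")
    return round(100 * improved / assessed, 1)


def summarise(counts: dict) -> dict:
    """Map each domain to (number of studies, percent reporting improvement)."""
    return {
        domain: (assessed, percent_improved(improved, assessed))
        for domain, (improved, assessed) in counts.items()
    }


if __name__ == "__main__":
    for domain, (n, pct) in summarise(DOMAIN_COUNTS).items():
        print(f"{domain:<24} n={n:>2}  improved={pct:5.1f}%")
```

A simple vote count like this ignores study quality and sample size, so it should be read as a description of the reporting pattern rather than as an effect estimate.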

Participant characteristics

The majority of studies (10/17) [ 25 , 50 , 52 , 53 , 54 , 58 , 59 , 60 , 61 , 63 ] did not report age of the participants. For those which did report age, this ranged from 32 ± 14 to 43 ± 13 years. Where studies reported sex (11/17) [ 35 , 48 , 49 , 50 , 51 , 54 , 55 , 56 , 57 , 58 , 60 ], this was predominantly female, ranging from 75 to 100% female. Only six studies [ 35 , 54 , 56 , 57 , 58 , 60 ] reported ethnicity, with cohorts predominantly Caucasian (94–98%). Time since diagnosis was mostly unreported (12/17) [ 25 , 48 , 49 , 50 , 52 , 53 , 54 , 58 , 59 , 60 , 61 , 63 ] but ranged from 32 to 96 months, with a cross-sectional survey reporting 2% of the participants were diagnosed 1–2 years previously; 6% 3–4 years since diagnosis; 13% 3–4 years since diagnosis; 12% 5–6 years since diagnosis; 20% 7–10 years since diagnosis; 29% 11–21 years since diagnosis; 13% 21–30 years since diagnosis; and 5% > 30 years since diagnosis. Of the studies which reported comorbidities of the participants (6/17) [ 25 , 35 , 50 , 56 , 57 , 63 ], the comorbidities were chronic pain, depressive disorder, psychiatric disorder.

Study location

Of the 17 studies, 14 were from Europe [ 25 , 35 , 48 , 49 , 50 , 51 , 52 , 55 , 56 , 57 , 58 , 59 , 61 , 63 ], and three from North America [ 53 , 54 , 60 ]. Of the 14 studies [ 25 , 35 , 48 , 49 , 50 , 51 , 52 , 55 , 56 , 57 , 58 , 59 , 61 , 63 ] from Europe, ten [ 25 , 35 , 50 , 52 , 55 , 56 , 57 , 58 , 59 , 61 , 63 ] were conducted in the United Kingdom, three in Belgium [ 48 , 49 , 51 ], and one was a multicentred study between the United Kingdom and Norway [ 58 ].

Recruitment strategy

Of the 17 studies, three [ 53 , 54 , 60 ] used announcements in a newspaper and physician referrals to recruit participants, two [ 50 , 63 ] recruited patients referred by a consultant from a National Health Service (NHS) Trust following a pain diagnosis, two [ 52 , 55 ] recruited via online platforms, two [ 59 , 61 ] recruited from secondary care clinics, and two used the PACE trial databases [ 56 , 57 ]. Moreover, one study recruited from the hospital [ 58 ], one from physiotherapist referrals [ 25 ], two from specialist clinic centres [ 35 , 64 ], one from the waiting list of a rehabilitation centre [ 48 ], and one from medical files [ 49 ].

Study settings

Ten studies were carried out in hospital and clinic settings [ 25 , 35 , 48 , 49 , 50 , 51 , 58 , 59 , 61 , 63 ]. Two studies were performed on online platforms [ 52 , 55 ]. Three studies did not report study setting [ 53 , 54 , 60 ]. Two studies generated output from PACE trial databases [ 56 , 57 ].

Adherence and feasibility

All five intervention studies reported adherence rates (which they defined as number of sessions attended), which ranged from 4 to 44% (4% [ 49 ], 8% [ 35 ], 25% [ 48 ], 29% [ 51 ], and 44% [ 50 ]). One study reported the median number of rehabilitation programme sessions attended was five out of six possible sessions, with 58.9% of participants attending ≥ 5 sessions, 83.2% attending at least one educational session on activity pacing, and 56.1% attending both activity pacing sessions [ 50 ].

This scoping review summarises the existing literature, with a view to helping physicians and healthcare practitioners summarise the evidence for pacing in ME/CFS and apply this knowledge to other post-viral fatiguing conditions. Overall, studies generally reported pacing to be beneficial for people with ME/CFS. The exception to this trend is the controversial PACE trial [ 36 , 37 , 38 , 39 ], which we will expand on in subsequent sections. We believe information generated within this review can facilitate discussion of research opportunities and issues that need to be addressed in future studies concerning pacing, particularly given the immediate public health issue of the long COVID pandemic. As mentioned, we found some preliminary evidence for improved symptoms following pacing interventions or strategies. However, we wish to caution the reader that the current evidence base is extremely limited and hampered by several weaknesses which preclude clear conclusions on the efficacy of pacing. Firstly, studies were of poor to fair methodological quality (indicated by the PEDro scores), often with small sample sizes, and therefore unknown power to detect change. Moreover, very few studies implemented pacing, with most studies merely consulting on people’s views on pacing. This may of course lead to multiple biases, such as reporting, recruitment, survivorship, confirmation, and availability heuristic biases, to name but a few. Thus, there is a pressing need for more high-quality intervention studies. Secondly, the reporting of pacing strategies used was inconsistent and lacked detail, making it difficult to describe current approaches, or implement them in future research or symptom management strategies. Furthermore, outcome evaluations varied greatly between studies. This prevents any appropriate synthesis of research findings.

The lack of evidence on pacing is concerning given that pacing is the only NICE-recommended management strategy for ME/CFS following the 2021 update [ 34 ]. Given the analogous nature of long COVID and ME/CFS, patients and practitioners will be looking to the ME/CFS literature for guidance on symptom management. There is an urgent need for high-quality studies (such as RCTs) investigating the effectiveness of pacing, and for better reporting of pacing intervention strategies, so that clear recommendations can be made to patients. If this does not happen soon, there will be serious healthcare and economic implications for years to come [ 65 , 66 ].

Efficacy of pacing

Most studies (12/17) highlighted improvements in at least one outcome measure following pacing. Pacing was self-reported to be the most efficacious, safe, acceptable, and preferred form of activity management for people with ME/CFS [ 55 ]. Pacing was reported to improve symptoms and improve general health outcomes [ 25 , 50 , 52 , 58 , 63 ], fatigue and PEM [ 48 , 50 , 51 , 53 , 54 , 55 , 56 , 60 , 63 ], physical functioning [ 48 , 50 , 51 , 53 , 56 , 58 , 60 , 63 ], pain [ 25 , 50 , 55 , 63 ], quality of life [ 50 ], self-efficacy [ 50 , 53 ], sleep [ 53 , 55 ], and depression and anxiety [ 50 , 53 , 63 ]. These positive findings provide hope for those with ME/CFS, and other chronic fatiguing conditions such as long COVID, to improve quality of life through symptom management.

Conversely, some studies reported no effects of pacing on ME/CFS symptoms [ 52 ], fatigue, physical functioning [ 35 ], or pain scores [ 49 , 61 ]. Some studies even found pacing to have detrimental effects in those with ME/CFS, including a worsening of symptoms in 14% of survey participants recalling previous pacing experiences [ 52 ]. Furthermore, a worsening of fatigue [ 35 , 59 ], and physical functioning from pre- to post-pacing [ 35 , 57 , 59 , 61 ] was reported by the PACE trial and sub-analysis of the PACE trial [ 56 , 57 , 61 ]. The PACE trial [ 35 ], a large RCT (n = 639) comparing pacing with CBT and GET, reported GET and CBT were more effective for reducing ME/CFS-related fatigue and improving physical functioning than pacing. However, the methodology and conclusions from the PACE trial have been heavily criticised, mainly due to the authors lowering the thresholds they used to determine improvement [ 36 , 37 , 38 , 67 ]. With this in mind, Sharpe et al. [ 56 ] surveyed 75% of the participants from the PACE trial 1-year post-intervention and reported pacing improved fatigue and physical functioning, with effects similar to CBT and GET.

Lessons for pacing implementation

All pacing intervention studies (5/5) implemented educational or coaching sessions. These educational components were poorly reported in terms of the specific content and how and where they had been developed, with unclear pedagogical approaches. Consequently, even where interventions reported reduction in PEM or improved symptoms, it would be impossible to transfer that research into practice, future studies, or clinical guidance, given the ambiguity of reporting. Sessions typically contained themes of pacing such as activity adjustment (decrease, break-up, and reschedule activities based on energy levels), activity consistency (maintaining a consistently low level of activity to prevent PEM), activity planning (planning activities and rest around available energy levels), and activity progression (slowly progressing activity once maintaining a steady baseline) [ 35 , 48 , 49 , 50 , 51 ]. We feel it is pertinent to note here that although activity progression has been incorporated as a pacing strategy in these included studies, some view activity progression as a form of GET. The NICE definition of GET is “first establishing an individual's baseline of achievable exercise or physical activity, then making fixed incremental increases in the time spent being physically active” [ 34 ]. Thus, this form of pacing can also be considered a type of ‘long-term GET’ in which physical activity progression is performed over weeks or months with fixed incremental increases in time spent being physically active.

Intervention studies attempted to create behaviour change through educational programmes to modify physical activity and plan behaviours. However, none of these studies detailed integrating any evidence-based theories of behaviour change [ 68 ] or reported using any frameworks to support behaviour change objectives. This is unfortunate since there is good evidence that theory-driven behaviour change interventions result in greater intervention effects [ 69 ]. Indeed, there is a large body of work regarding methods of behaviour change covering public health messaging, education, and intervention design, which has largely been ignored by the pacing literature. Interventions relied on subjective pacing (5/5 studies), with strategies including keeping an activity diary (3/5 studies) to identify links between activity and fatigue [ 35 , 48 , 50 ]. Given the high prevalence of ‘brain fog’ within ME/CFS [ 70 , 71 , 72 , 73 ], recall may be extremely difficult and there is significant potential for under-reporting. Other strategies included simply asking participants to estimate energy levels available for daily activities (2/5 studies [ 48 , 51 ]). Again, this is subjective and relies on participants’ ability to recall previous consequences of the activity. Other methods of activity tracking and measuring energy availability, such as wearable technology [ 74 , 75 , 76 , 77 , 78 ], could provide a more objective measure of adherence and pacing strategy fidelity in future studies. Despite technology such as accelerometers being widely accessible since well before the earliest interventional study included in this review (which was published in 2009), none of the interventional studies utilised objective activity tracking to track pacing and provide feedback to participants. One study considered accelerometry alongside an activity diary [ 51 ]. However, accelerometry was considered the outcome variable, to assess change in activity levels from pre- to post-intervention, and was not part of the intervention itself (which was one pacing coaching session per week for 3 weeks). Moreover, most research-grade accelerometers cannot be used as part of the intervention since they have no ability to provide continuous feedback and must be retrieved by the research team in order to access any data. Consequently, their use is mostly limited to outcome assessments only. As pacing comprises a limit to physical activity to prevent push-crash cycles, it is an astonishing observation from this scoping review that only two studies objectively measured physical activity to quantify changes to activity as a result of pacing [ 51 , 54 ]. If the aim of pacing is to reduce physical activity, or reduce variations in physical activity (i.e., push-crash cycles), it is therefore unclear whether pacing was successfully implemented in any of the other studies.
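To illustrate what objectively quantifying "variations in physical activity" could look like, the sketch below computes the coefficient of variation of daily step counts, one simple way to express how uneven (push-crash) an activity pattern is. This is an illustrative assumption and not a method used by any included study: the function name, the choice of metric, and the step counts are all hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(daily_steps: list) -> float:
    """Return SD / mean of daily step counts (0.0 means perfectly even activity)."""
    if len(daily_steps) < 2:
        raise ValueError("need at least two days of data")
    average = mean(daily_steps)
    if average == 0:
        raise ValueError("mean activity is zero; variability is undefined")
    return pstdev(daily_steps) / average


# Hypothetical step counts for one week, NOT data from any included study.
push_crash_week = [9000, 1500, 8500, 1000, 9500, 1200, 8800]
paced_week = [4200, 4000, 4500, 3900, 4300, 4100, 4400]

if __name__ == "__main__":
    print(f"push-crash week CV: {coefficient_of_variation(push_crash_week):.2f}")
    print(f"paced week CV:      {coefficient_of_variation(paced_week):.2f}")
```

A lower value after an intervention would be consistent with more even activity, although a single summary statistic cannot show whether a person stayed within their own energy limits.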

By exploring the pacing strategies previously used, in both intervention studies and more exploratory studies, we can identify and recommend approaches to improve symptoms of ME/CFS. These approaches can be categorised as follows: activity planning, activity consistency, activity progression, activity adjustment and staying within the Energy Envelope [ 50 , 53 , 60 , 63 ]. Activity planning was identified as a particularly effective therapeutic strategy, resulting in improved mean scores for all symptoms included in the APQ-28, reduced current pain, and improvements in physical fatigue, mental fatigue, self-efficacy, quality of life, and mental and physical functioning [ 50 ]. Activity planning aligns with the self-regulatory behaviour change technique ‘Action Planning’ [ 79 ] which is commonly used to increase physical activity behaviour. In the case of ME/CFS, activity planning is successfully used to minimise rather than increase physical activity bouts to prevent expending too much energy and avoid PEM. Activity consistency, meaning undertaking similar amounts of activity each day, was also associated with reduced levels of depression and exercise avoidance, and higher levels of physical function [ 63 ]. Activity progression was associated with higher levels of current pain. Activity adjustment was associated with depression and avoidance, and lower levels of physical function [ 63 ]. Staying within the Energy Envelope was reported to reduce PEM severity [ 53 , 60 ], improve physical functioning [ 53 , 60 ] and ME/CFS symptom scores [ 53 ], and was associated with more hours engaged in activity compared with individuals with lower available energy [ 53 ]. These results suggest that effective pacing strategies would include activity planning, consistency, and energy management techniques while avoiding progression. These data are, of course, limited by the small number of mostly low-quality studies and should be interpreted with some caution.
Nevertheless, these are considerations that repeatedly appear in the literature and, as such, warrant deeper investigation. In addition, and as outlined earlier, most studies are relatively old, and we urgently need better insight into how modern technologies, particularly longitudinal activity tracking and contemporaneous heart-rate feedback, might improve (or otherwise) adaptive pacing. Such longitudinal tracking would also enable activities and other behaviours (sleep, diet, stress) to be linked to bouts of PEM. Linking would enable a deeper insight into potential PEM triggers and mitigations that might be possible.
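The idea of linking tracked activity to later bouts of PEM can be sketched as a simple lagged comparison. The example below is a hypothetical illustration, not an analysis from the review: it compares how often PEM is reported on the day after a high-activity day versus the day after any other day. The threshold, the one-day lag, the function name, and the diary data are all assumptions.

```python
def pem_rate_after(days: list, threshold: int, lag: int = 1) -> tuple:
    """Compare PEM frequency `lag` days after high- versus lower-activity days.

    `days` is a chronological list of (steps, pem_reported) tuples.
    Returns (rate_after_high, rate_after_other); a rate is None if no days qualify.
    """
    after_high, after_other = [], []
    for index in range(len(days) - lag):
        steps, _ = days[index]
        _, pem_later = days[index + lag]
        (after_high if steps > threshold else after_other).append(pem_later)

    def rate(flags: list):
        return sum(flags) / len(flags) if flags else None

    return rate(after_high), rate(after_other)


# Hypothetical diary: (daily steps, PEM reported that day). Not study data.
diary = [
    (3000, False), (8000, False), (2000, True), (3500, False),
    (9000, False), (1500, True), (3000, False), (3200, False),
]

if __name__ == "__main__":
    high, other = pem_rate_after(diary, threshold=6000)
    print(f"PEM rate the day after high activity: {high}")
    print(f"PEM rate the day after other days:    {other}")
```

A comparison like this is descriptive only. With real data, PEM can be delayed by more than a day and is influenced by sleep, stress, and other factors, so the lag and threshold would need to be justified for each individual.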

The PACE trial

We feel it would be remiss of us to not specifically address the PACE trial within this manuscript, as five of the 17 included studies resulted from the PACE trial [ 35 , 56 , 57 , 59 , 61 ]. There has been considerable discussion around the PACE trial, which has been particularly divisive and controversial [ 37 , 38 , 39 , 59 , 67 , 80 , 81 ]. In the PACE trial, GET and CBT were deemed superior to pacing by the authors. Despite its size and funding, the PACE trial has received several published criticisms and rebuttals. Notably, NICE's most recent ME/CFS guideline update removed GET and CBT as suggested treatment options, which hitherto had been underpinned by the PACE findings. While we will not restate the criticisms and rebuttals here, what is not in doubt, is that the PACE trial has dominated discussions of pacing, representing almost a third of all the studies in this review. However, the trial results were published over a decade ago, with the study protocol devised almost two decades ago [ 82 ]. The intervening time has seen a revolution in the development of mobile and wearable technology and an ability to remotely track activity and provide real-time feedback in a way which was not available at that time. Furthermore, there has been no substantive research since the PACE trial that has attempted such work. Indeed, possibly driven by the reported lack of effect of pacing in the PACE trial, this review has demonstrated the dearth of progress and innovation in pacing research since its publication. Therefore, regardless of its findings or criticisms, the pacing implementation in the PACE trial is dated, and there is an urgent need for more technologically informed approaches to pacing research.

Limitations of the current evidence

The first limitation to the literature included in this scoping review is that not all studies followed the minimum data set (MDS) of patient-reported outcome measures (PROMs) agreed upon by the British Association of CFS/ME Professionals (BACME) (fatigue, sleep quality, self-efficacy, pain/discomfort, anxiety/depression, mobility, activities of daily living, self-care, and illness severity) [ 83 , 84 ]. All but one study included in this review measured illness severity, most studies included fatigue and pain/discomfort, and some studies included assessments of anxiety/depression. There was a lack of quantitative assessment of sleep quality, self-efficacy, mobility, activities of daily living, and self-care. Therefore, studies did not consistently capture the diverse nature of the symptoms experienced, with crucial domains missing from the analyses. The MDS of PROMs were established in 2012 [ 83 , 84 ] and therefore, for studies published prior to 2012, these are not applicable [ 35 , 49 , 51 , 53 , 54 ]. However, for the 12 studies carried out after this time, the MDS should have been considered to elucidate the effects of pacing on ME/CFS. Importantly, despite PEM being a central characteristic of ME/CFS, only two studies included PEM as an outcome measure [ 55 , 60 ]. This may be because of the difficulty of accurately measuring fluctuating symptoms, as PEM occurs multiple times over a period of months, and therefore pre- to post- studies and cross-sectional designs cannot adequately capture PEM incidence. Therefore, it is likely studies opted for measuring general fatigue instead. More appropriate longitudinal study designs are required to track PEM over time to capture a more representative picture of PEM patterns.
Secondly, reporting of participant characteristics was inadequate, but in the studies that did describe participants, characteristics were congruent with the epidemiological literature and reporting of ME/CFS populations (i.e., 60–65% female) [ 85 ]. Therefore, in this respect, studies included herein were representative samples. However, the lack of reporting of participant characteristics limits inferences we can draw concerning any population-related effects (i.e. whether older, or male, or European, or people referred by a national health service would be more or less likely to respond positively to pacing). Thirdly, comparison groups (where included) were not ideal, with CBT or GET sometimes used as comparators to pacing [ 35 ], and often no true control group included. Penultimately, there is a distinct lack of high-quality RCTs (as mentioned throughout this manuscript). Finally, in reference to the previous section, inferences from the literature are dated and do not reflect the technological capabilities of 2023.

Recommendations for advancement of the investigative area

It is clear from the studies included in this scoping review that, for the last decade or more, progress and innovation in pacing research have been limited. This is unfortunate for several reasons. People with ME/CFS or long COVID are, of course, invested in their recovery. From our patient and public involvement (PPI) group engagement, it is clear many are ahead of the research and are using wearable technology to track steps, heart rate, and, in some cases, heart rate variability to improve their own pacing practice. While the lack of progress in the research means this is an understandable response by patients, it is also problematic. Without underpinning research, patients may make decisions based on individual reports of trial-and-error approaches, given the lack of evidence-based guidance.

A more technologically-informed pacing approach could be implemented by integrating wearable trackers [ 77 , 78 , 86 , 87 ] to provide participants with live updates on their activity and could be integrated with research-informed messaging aimed at supporting behaviour change, as has been trialled in other research areas [ 88 , 89 , 90 , 91 ]. However, more work is needed to evaluate how to incorporate wearable activity trackers and which metrics are most helpful.

A more technologically-informed approach could also be beneficial for longitudinal symptom tracking, particularly useful given the highly variable symptom loads of ME/CFS and episodic nature of PEM. This would overcome reliance on assessments at a single point in time (as used by the studies within this review). Similarly, mobile health (mHealth) approaches also allow questionnaires to be digitised, making them easier for participants to complete if they find holding a pen or reading small font problematic [ 92 ]. Reminders and notifications can also be helpful for patients completing tasks [ 77 , 93 , 94 , 95 ]. This approach has the added advantage of allowing contemporaneous data collection rather than relying on pre- to post-intervention designs limited by recall bias. Future work must try to leverage these approaches, as unless we collect large data sets on symptoms and behaviours (i.e. activity, diet, sleep, and pharmacology) in people with conditions like ME/CFS we will not be able to leverage emerging technologies such as AI and machine learning to improve the support and care for people with these debilitating conditions. The key areas for research outlined in the NICE guidelines (2021 update) speak to this, with specific mention of improved self-monitoring strategies, sleep strategies, and dietary strategies, all of which can be measured using mHealth approaches, in a scalable and labour-inexpensive way.

The potential for existing pacing research to address the long COVID pandemic

There is now an urgent public health need to address long COVID, with over 200 million sufferers worldwide [ 30 ]. Given the analogous symptomology between ME/CFS and long COVID, and the lack of promising treatment and management strategies in ME/CFS, pacing remains the only strategy for managing long COVID symptoms. This is concerning as the quality of evidence to support pacing is lacking. Given long COVID has reached pandemic proportions, scalable solutions will be required. In this context, we propose that technology should be harnessed to a) deliver, but also b) evaluate, pacing. We recently reported on a just-in-time adaptive intervention to increase physical activity during the pandemic [ 78 ]. However, this method could be adapted to decrease or maintain physical activity levels (i.e., pacing) in long COVID. This method has the advantage of scalability and remote data collection, reducing resource commitments and participant burden, essential for addressing a condition with so many sufferers.
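As a rough illustration of how a just-in-time adaptive approach might be turned towards limiting rather than increasing activity, the sketch below maps the steps accumulated so far in a day onto a feedback prompt. It is a hypothetical decision rule, not the intervention reported in [ 78 ] and not clinical guidance: the class, the function, the 80% warning fraction, and the step budget are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PacingBudget:
    """Hypothetical, individually agreed daily activity ceiling."""
    daily_step_budget: int
    warn_fraction: float = 0.8  # assumed point at which to prompt slowing down


def pacing_prompt(steps_so_far: int, budget: PacingBudget) -> str:
    """Return 'ok', 'slow_down', or 'rest' based on the share of the budget used."""
    if budget.daily_step_budget <= 0:
        raise ValueError("daily_step_budget must be positive")
    used = steps_so_far / budget.daily_step_budget
    if used >= 1.0:
        return "rest"        # budget reached: prompt rest for the remainder of the day
    if used >= budget.warn_fraction:
        return "slow_down"   # approaching the ceiling: prompt reduced activity
    return "ok"              # comfortably within budget: no prompt needed


if __name__ == "__main__":
    budget = PacingBudget(daily_step_budget=4000)
    for steps in (1000, 3500, 4200):
        print(steps, pacing_prompt(steps, budget))
```

In practice the budget would need to be set and revised with the individual, and step count alone is unlikely to capture cognitive or emotional exertion; the sketch only shows how real-time data could drive a prompt.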

This review highlights the need for more studies concerning pacing in chronic fatiguing conditions. Future studies would benefit from examining pacing’s effect on symptomology and PEM with objectively quantified pacing, over a longer duration of examination, using the MDS. It is essential this is conducted as an RCT, given that in the case of long COVID, participants may improve their health over time, and it is necessary to determine whether pacing exerts an additional effect over time elapsing. Future studies would benefit from digitising pacing to support individuals with varying symptom severity and personalise support. This would improve accessibility and reduce selection bias, in addition to improving scalability of interventions. Finally, clinicians and practitioners should be cognisant of the strength of evidence reported in this review and should exert caution when promoting pacing in their patients, given the varying methods utilised herein.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

APQ: Activity Pacing Questionnaire

BAI: Beck Anxiety Inventory

BDI: Beck Depression Inventory

bCOPE: Brief Coping Orientation to Problems Experienced Scale

COPM: Canadian Occupational Performance Measure

CDC: Centers for Disease Control and Prevention

CFQ: Chalder Fatigue Questionnaire

CIS: Checklist Individual Strength

CPCI: Chronic Pain Coping Inventory

CBT: Cognitive behavioural therapy

CENTRAL: Cochrane Central Register of Controlled Trials

DSQ: DePaul symptom questionnaire

EQ-5D-5L: EuroQol five-dimensions, five-levels questionnaire

GET: Graded exercise therapy

HADS: Hospital Anxiety and Depression Scale

ME/CFS: Myalgic encephalomyelitis/chronic fatigue syndrome

PSEQ: Pain Self Efficacy Questionnaire

PASS-20: Pain Anxiety Symptoms Scale short version

NRS: Pain Numerical Rating Scale

PHQ: Patient health questionnaire

PROMs: Patient reported outcome measures

PEDro: Physiotherapy Evidence Database

PSS: Perceived Stress Scale

PEM: Post exertional malaise

PRISMA-ScR: Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for scoping reviews

RCT: Randomised controlled trial

McMurray JC, May JW, Cunningham MW, Jones OY. Multisystem Inflammatory Syndrome in Children (MIS-C), a post-viral myocarditis and systemic vasculitis-a critical review of its pathogenesis and treatment. Front Pediatr. 2020;8: 626182.

Perrin R, Riste L, Hann M, Walther A, Mukherjee A, Heald A. Into the looking glass: post-viral syndrome post COVID-19. Med Hypotheses. 2020;144: 110055.

Hayes LD, Ingram J, Sculthorpe NF. More than 100 persistent symptoms of SARS-CoV-2 (Long COVID): A scoping review. Front Med. 2021. https://doi.org/10.3389/fmed.2021.750378 .

McLaughlin M, Cerexhe C, Macdonald E, Ingram J, Sanal-Hayes NEM, Hayes LD, et al. A Cross-sectional study of symptom prevalence, frequency, severity, and impact of long-COVID in Scotland: part I. Am J Med. 2023. https://doi.org/10.1016/j.amjmed.2023.07.009 .

McLaughlin M, Cerexhe C, Macdonald E, Ingram J, Sanal-Hayes NEM, Hayes LD, et al. A cross-sectional study of symptom prevalence, frequency, severity, and impact of long-COVID in Scotland: part II. Am J Med. 2023. https://doi.org/10.1016/j.amjmed.2023.07.009 .

Hayes LD, Sanal-Hayes NEM, Mclaughlin M, Berry ECJ, Sculthorpe NF. People with long covid and ME/CFS exhibit similarly impaired balance and physical capacity: a case-case-control study. Am J Med. 2023;S0002–9343(23):00465–75.

Jenkins R. Post-viral fatigue syndrome. Epidemiology: lessons from the past. Br Med Bull. 1991;47:952–65.

Article   PubMed   CAS   Google Scholar  

Sandler CX, Wyller VBB, Moss-Morris R, Buchwald D, Crawley E, Hautvast J, et al. Long COVID and post-infective fatigue syndrome: a review. Open Forum Infect Dis. 2021;8:440.

Carod-Artal FJ. Post-COVID-19 syndrome: epidemiology, diagnostic criteria and pathogenic mechanisms involved. Rev Neurol. 2021;72:384–96.

PubMed   CAS   Google Scholar  

Ballering AV, van Zon SKR, Olde Hartman TC, Rosmalen JGM. Lifelines corona research initiative. Persistence of somatic symptoms after COVID-19 in the Netherlands: an observational cohort study. Lancet. 2022;400:452–61.

Wong TL, Weitzer DJ. Long COVID and Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS)-a systemic review and comparison of clinical presentation and symptomatology. Medicina (Kaunas). 2021;57:418.

Sukocheva OA, Maksoud R, Beeraka NM, Madhunapantula SV, Sinelnikov M, Nikolenko VN, et al. Analysis of post COVID-19 condition and its overlap with myalgic encephalomyelitis/chronic fatigue syndrome. J Adv Res. 2021. https://doi.org/10.1016/j.jare.2021.11.013 .

Bonilla H, Quach TC, Tiwari A, Bonilla AE, Miglis M, Yang P, et al. Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is common in post-acute sequelae of SARS-CoV-2 infection (PASC): results from a post-COVID-19 multidisciplinary clinic. medrxiv. 2022. https://doi.org/10.1101/2022.08.03.22278363v1 .

Twomey R, DeMars J, Franklin K, Culos-Reed SN, Weatherald J, Wrightson JG. Chronic fatigue and postexertional malaise in people living with long COVID: an observational study. Phys Ther. 2022;102:005.

Barhorst EE, Boruch AE, Cook DB, Lindheimer JB. Pain-related post-exertional malaise in myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and fibromyalgia: a systematic review and three-level meta-analysis. Pain Med. 2022;23:1144–57.

Goudsmit EM. The psychological aspects and management of chronic fatigue syndrome [Internet] [Thesis]. Brunel University, School of Social Sciences; 1996 [cited 2022 Jan 20]. https://scholar.google.co.uk/scholar_url?url=https://bura.brunel.ac.uk/bitstream/2438/4283/1/FulltextThesis.pdf&hl=en&sa=X&ei=kNYjZdeuA4-8ywTAmKmADQ&scisig=AFWwaeZvdxcuHmzGL08L3jp-QwNn&oi=scholarr . Accessed 2 Aug 2022

Stussman B, Williams A, Snow J, Gavin A, Scott R, Nath A, et al. Characterization of post-exertional malaise in patients with myalgic encephalomyelitis/chronic fatigue syndrome. Front Neurol. 2020;11:1025.

Holtzman CS, Bhatia KP, Cotler J, La J. Assessment of Post-Exertional Malaise (PEM) in Patients with Myalgic Encephalomyelitis (ME) and Chronic Fatigue Syndrome (CFS): a patient-driven survey. Diagnostics. 2019. https://doi.org/10.3390/diagnostics9010026 .

Fukuda K, Straus SE, Hickie I, Sharpe MC, Dobbins JG, Komaroff A. The chronic fatigue syndrome: a comprehensive approach to its definition and study. International Chronic Fatigue Syndrome Study Group. Ann Intern Med. 1994;121:953–9.

Carruthers BM, van de Sande MI, De Meirleir KL, Klimas NG, Broderick G, Mitchell T, et al. Myalgic encephalomyelitis: international consensus criteria. J Intern Med. 2011;270:327–38.

Carruthers JD, Lowe NJ, Menter MA, Gibson J, Eadie N, Botox Glabellar Lines II Study Group. Double-blind, placebo-controlled study of the safety and efficacy of botulinum toxin type A for patients with glabellar lines. Plast Reconstr Surg. 2003;112:1089–98.

Jason LA, Jordan K, Miike T, Bell DS, Lapp C, Torres-Harding S, et al. A pediatric case definition for myalgic encephalomyelitis and chronic fatigue syndrome. J Chronic Fatigue Syndrome. 2006;13:1–44.

Goudsmit EM, Nijs J, Jason LA, Wallman KE. Pacing as a strategy to improve energy management in myalgic encephalomyelitis/chronic fatigue syndrome: a consensus document. Disabil Rehabil. 2012;34:1140–7.

Antcliff D, Keenan A-M, Keeley P, Woby S, McGowan L. Engaging stakeholders to refine an activity pacing framework for chronic pain/fatigue: a nominal group technique. Musculoskeletal Care. 2019;17:354–62.

Antcliff D, Keeley P, Campbell M, Woby S, McGowan L. Exploring patients’ opinions of activity pacing and a new activity pacing questionnaire for chronic pain and/or fatigue: a qualitative study. Physiotherapy. 2016;102:300–7.

Yoshiuchi K, Cook DB, Ohashi K, Kumano H, Kuboki T, Yamamoto Y, et al. A real-time assessment of the effect of exercise in chronic fatigue syndrome. Physiol Behav. 2007;92:963–8.

Davenport TE, Stevens SR, Baroni K, Van Ness M, Snell CR. Diagnostic accuracy of symptoms characterising chronic fatigue syndrome. Disabil Rehabil. 2011;33:1768–75.

Nurek M, Rayner C, Freyer A, Taylor S, Järte L, MacDermott N, Delaney BC, Panellists D, et al. Recommendations for the recognition, diagnosis, and management of long COVID: a Delphi study. Br J Gen Pract. 2021. https://doi.org/10.3399/BJGP.2021.0265 .

Herrera JE, Niehaus WN, Whiteson J, Azola A, Baratta JM, Fleming TK, Kim SY, Naqvi H, Sampsel S, Silver JK, Gutierrez MV, Maley J, Herman E, Abramoff Benjamin, et al. Multidisciplinary collaborative consensus guidance statement on the assessment and treatment of fatigue in postacute sequelae of SARS-CoV-2 infection (PASC) patients. PM & R. 2021. https://doi.org/10.1002/pmrj.12684 .

Chen C, Haupert SR, Zimmermann L, Shi X, Fritsche LG, Mukherjee B. Global prevalence of post COVID-19 condition or long COVID: a meta-analysis and systematic review. J Infect Dis. 2022. https://doi.org/10.1093/infdis/jiac136 .

Office for National Statistics. Prevalence of ongoing symptoms following coronavirus (COVID-19) infection in the UK [Internet]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/prevalenceofongoingsymptomsfollowingcoronaviruscovid19infectionintheuk/7july2022 . Accessed 2 Aug 2022

Office for National Statistics. Prevalence of ongoing symptoms following coronavirus (COVID-19) infection in the UK [Internet]. [cited 2022 Apr 1]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/prevalenceofongoingsymptomsfollowingcoronaviruscovid19infectionintheuk/3march2022

Baker R, Shaw EJ. Diagnosis and management of chronic fatigue syndrome or myalgic encephalomyelitis (or encephalopathy): summary of NICE guidance. BMJ. 2007;335:446–8.

NICE. Overview | Myalgic encephalomyelitis (or encephalopathy)/chronic fatigue syndrome: diagnosis and management | Guidance | NICE [Internet]. NICE; [cited 2022 Aug 22]. https://www.nice.org.uk/guidance/ng206 . Accessed 2 Aug 2022

White P, Goldsmith K, Johnson A, Potts L, Walwyn R, DeCesare J, et al. Comparison of adaptive pacing therapy, cognitive behaviour therapy, graded exercise therapy, and specialist medical care for chronic fatigue syndrome (PACE): a randomised trial. The Lancet. 2011;377:823–36.

Article   CAS   Google Scholar  

Vink M. PACE trial authors continue to ignore their own null effect. J Health Psychol. 2017;22:1134–40.

Petrie K, Weinman J. The PACE trial: it’s time to broaden perceptions and move on. J Health Psychol. 2017;22:1198–200.

Stouten B. PACE-GATE: an alternative view on a study with a poor trial protocol. J Health Psychol. 2017;22:1192–7.

Agardy S. Chronic fatigue syndrome patients have no reason to accept the PACE trial results: response to Keith J Petrie and John Weinman. J Health Psychol. 2017;22:1206–8.

Kim D-Y, Lee J-S, Park S-Y, Kim S-J, Son C-G. Systematic review of randomized controlled trials for chronic fatigue syndrome/myalgic encephalomyelitis (CFS/ME). J Transl Med. 2020;18:7.

Twisk FNM, Maes M. A review on cognitive behavorial therapy (CBT) and graded exercise therapy (GET) in myalgic encephalomyelitis (ME) / chronic fatigue syndrome (CFS): CBT/GET is not only ineffective and not evidence-based, but also potentially harmful for many patients with ME/CFS. Neuro Endocrinol Lett. 2009;30:284–99.

PubMed   Google Scholar  

Mays N, Roberts E, Popay J. Synthesising research evidence. In: Fulop N, Allen P, Clarke A, Black N, editors. Studying the organisation and delivery of health services: research methods. London: Routledge; 2001. p. 188–220.

Munn Z, Peters MDJ, Stern C, Tufanaru C, McArthur A, Aromataris E. Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med Res Methodol. 2018;18:143.

Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005;8:19–32.

Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169:467–73.

de Morton NA. The PEDro scale is a valid measure of the methodological quality of clinical trials: a demographic study. Aust J Physiother. 2009;55:129–33.

Maher CG, Sherrington C, Herbert RD, Moseley AM, Elkins M. Reliability of the PEDro scale for rating quality of randomized controlled trials. Phys Ther. 2003;83:713–21.

Kos D, van Eupen I, Meirte J, Van Cauwenbergh D, Moorkens G, Meeus M, et al. Activity pacing self-management in chronic fatigue syndrome: a randomized controlled trial. Am J Occup Ther. 2015;69:6905290020.

Meeus M, Nijs J, Van Oosterwijck J, Van Alsenoy V, Truijen S. Pain physiology education improves pain beliefs in patients with chronic fatigue syndrome compared with pacing and self-management education: a double-blind randomized controlled trial. Arch Phys Med Rehabil. 2010;91:1153–9.

Antcliff D, Keenan A-M, Keeley P, Woby S, McGowan L. Testing a newly developed activity pacing framework for chronic pain/fatigue: a feasibility study. BMJ Open. 2021;11: e045398.

Nijs J, van Eupen I, Vandecauter J, Augustinus E, Bleyen G, Moorkens G, et al. Can pacing self-management alter physical behavior and symptom severity in chronic fatigue syndrome? A case series. J Rehabil Res Dev. 2009;46:985–96.

Geraghty K, Hann M, Kurtev S. Myalgic encephalomyelitis/chronic fatigue syndrome patients’ reports of symptom changes following cognitive behavioural therapy, graded exercise therapy and pacing treatments: Analysis of a primary survey compared with secondary surveys. J Health Psychol. 2019;24:1318–33.

Jason L, Muldowney K, Torres-Harding S. The energy envelope theory and myalgic encephalomyelitis/chronic fatigue syndrome. AAOHN J. 2008;56:189–95.

Brown M, Khorana N, Jason LA. The role of changes in activity as a function of perceived available and expended energy in non-pharmacological treatment outcomes for ME/CFS. J Clin Psychol. 2011;67:253.

Association ME. ME/CFS illness management survey results:‘“No decisions about me without me.” Part 1: Results and in-depth analysis of the 2012 ME association patient survey examining the acceptability, efficacy and safety of cognitive behavioural therapy, graded exercise therapy and pacing, as interventions used as management strategies for ME/CFS. 2015. https://www.meassociation.org.uk/wp-content/uploads/NO-DECISIONS-WITHOUT-ME-report.docx . Accessed 2 Feb 2022

Sharpe M, Goldsmith KA, Johnson AL, Chalder T, Walker J, White PD. Rehabilitative treatments for chronic fatigue syndrome: long-term follow-up from the PACE trial. The Lancet Psychiatry. 2015;2:1067–74.

White PD, Goldsmith K, Johnson AL, Chalder T, Sharpe M. Recovery from chronic fatigue syndrome after treatments given in the PACE trial. Psychol Med. 2013;43:2227–35.

O’connor K, Sunnquist M, Nicholson L, Jason LA, Newton JL, Strand EB. Energy envelope maintenance among patients with myalgic encephalomyelitis and chronic fatigue syndrome: Implications of limited energy reserves. Chronic Illn. 2019;15:51–60.

Dougall D, Johnson A, Goldsmith K, Sharpe M, Angus B, Chalder T, et al. Adverse events and deterioration reported by participants in the PACE trial of therapies for chronic fatigue syndrome. J Psychosom Res. 2014;77:20–6.

Brown AA, Evans MA, Jason LA. Examining the energy envelope and associated symptom patterns in chronic fatigue syndrome: does coping matter? Chronic Illn. 2013;9:302–11.

Bourke JH, Johnson AL, Sharpe M, Chalder T, White PD. Pain in chronic fatigue syndrome: response to rehabilitative treatments in the PACE trial. Psychol Med. 2014;44:1545–52.

Jason LA, Brown M, Brown A, Evans M, Flores S, Grant-Holler E, et al. Energy conservation/envelope theory interventions to help patients with myalgic encephalomyelitis/chronic fatigue syndrome. Fatigue. 2013;1:27–42.

Antcliff D, Campbell M, Woby S, Keeley P. Activity pacing is associated with better and worse symptoms for patients with long-term conditions. Clin J Pain. 2017;33:205–14.

Nijs T, Klein Y, Mousavi S, Ahsan A, Nowakowska S, Constable E, et al. The different faces of 4’-Pyrimidinyl-Functionalized 4,2’:6’,4"-Terpyridin es: metal-organic assemblies from solution and on Au(111) and Cu(111) surface platforms. J Am Chem Soc. 2018;140:2933–9.

Cutler DM, Summers LH. The COVID-19 pandemic and the $16 Trillion Virus. JAMA. 2020;324:1495–6.

Cutler DM. The costs of long COVID. JAMA Health Forum. 2022;3:e221809–e221809.

Geraghty K. ‘PACE-Gate’: when clinical trial evidence meets open data access. J Health Psychol. 2017;22:1106–12.

Davis R, Campbell R, Hildon Z, Hobbs L, Michie S. Theories of behaviour and behaviour change across the social and behavioural sciences: a scoping review. Health Psychol Rev. 2015;9:323–44.

Prestwich A, Sniehotta FF, Whittington C, Dombrowski SU, Rogers L, Michie S. Does theory influence the effectiveness of health behavior interventions? Meta-analysis Health Psychol. 2014;33:465–74.

Balinas C, Eaton-Fitch N, Maksoud R, Staines D, Marshall-Gradisnik S. Impact of life stressors on Myalgic encephalomyelitis/chronic fatigue syndrome symptoms: an Australian longitudinal study. Int J Environ Res Public Health. 2021;18:10614.

McGregor NR, Armstrong CW, Lewis DP, Gooley PR. Post-exertional malaise is associated with hypermetabolism, hypoacetylation and purine metabolism deregulation in ME/CFS cases. Diagnostics. 2019;9:70.

Nacul LC, Lacerda EM, Campion P, Pheby D, de Drachler M, Leite JC, et al. The functional status and well being of people with myalgic encephalomyelitis/chronic fatigue syndrome and their carers. BMC Public Health. 2011;11:402.

Deumer U-S, Varesi A, Floris V, Savioli G, Mantovani E, López-Carrasco P, et al. Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS): an overview. J Clin Med. 2021. https://doi.org/10.3390/jcm10204786 .

Düking P, Giessing L, Frenkel MO, Koehler K, Holmberg H-C, Sperlich B. Wrist-worn wearables for monitoring heart rate and energy expenditure while sitting or performing light-to-vigorous physical activity: validation study. JMIR Mhealth Uhealth. 2020;8: e16716.

Falter M, Budts W, Goetschalckx K, Cornelissen V, Buys R. Accuracy of apple watch measurements for heart rate and energy expenditure in patients with cardiovascular disease: cross-sectional study. JMIR Mhealth Uhealth. 2019;7: e11889.

Fuller D, Colwell E, Low J, Orychock K, Tobin MA, Simango B, et al. Reliability and validity of commercially available wearable devices for measuring steps, energy expenditure, and heart rate: systematic review. JMIR Mhealth Uhealth. 2020;8: e18694.

Mair JL, Hayes LD, Campbell AK, Sculthorpe N. Should we use activity tracker data from smartphones and wearables to understand population physical activity patterns? J Measur Phys Behav. 2022;1:1–5.

Mair JL, Hayes LD, Campbell AK, Buchan DS, Easton C, Sculthorpe N. A personalized smartphone-delivered just-in-time adaptive intervention (JitaBug) to increase physical activity in older adults: mixed methods feasibility study. JMIR Formative Res. 2022;6: e34662.

Michie S, Richardson M, Johnston M, Abraham C, Francis J, Hardeman W, et al. The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: building an international consensus for the reporting of behavior change interventions. Ann Behav Med. 2013;46:81–95.

Feehan SM. The PACE trial in chronic fatigue syndrome. The Lancet. 2011;377:1831–2.

Giakoumakis J. The PACE trial in chronic fatigue syndrome. The Lancet. 2011;377:1831.

White PD, Sharpe MC, Chalder T, DeCesare JC, Walwyn R, PACE trial group. Protocol for the PACE trial: a randomised controlled trial of adaptive pacing, cognitive behaviour therapy, and graded exercise, as supplements to standardised specialist medical care versus standardised specialist medical care alone for patients with the chronic fatigue syndrome/myalgic encephalomyelitis or encephalopathy. BMC Neurol. 2007;7:6.

Reuben DB, Tinetti ME. Goal-oriented patient care–an alternative health outcomes paradigm. N Engl J Med. 2012;366:777–9.

Roberts D. Chronic fatigue syndrome and quality of life. PROM. 2018;9:253–62.

Valdez AR, Hancock EE, Adebayo S, Kiernicki DJ, Proskauer D, Attewell JR, et al. Estimating prevalence, demographics, and costs of ME/CFS using large scale medical claims data and machine learning. Front Pediatr. 2019. https://doi.org/10.3389/fped.2018.00412 .

Greiwe J, Nyenhuis SM. Wearable technology and how this can be implemented into clinical practice. Curr Allergy Asthma Rep. 2020;20:36.

Sun S, Folarin AA, Ranjan Y, Rashid Z, Conde P, Stewart C, et al. Using smartphones and wearable devices to monitor behavioral changes during COVID-19. J Med Internet Res. 2020;22: e19992.

Hardeman W, Houghton J, Lane K, Jones A, Naughton F. A systematic review of just-in-time adaptive interventions (JITAIs) to promote physical activity. Int J Behav Nutr Phys Act. 2019;16:31.

Perski O, Hébert ET, Naughton F, Hekler EB, Brown J, Businelle MS. Technology-mediated just-in-time adaptive interventions (JITAIs) to reduce harmful substance use: a systematic review. Addiction. 2022;117:1220–41.

AhmedS A, van Luenen S, Aslam S, van Bodegom D, Chavannes NH. A systematic review on the use of mHealth to increase physical activity in older people. Clinical eHealth. 2020;3:31–9.

Valenzuela T, Okubo Y, Woodbury A, Lord SR, Delbaere K. Adherence to technology-based exercise programs in older adults: a systematic review. J Geriatric Phys Ther. 2018;41:49–61.

Bowling A. Mode of questionnaire administration can have serious effects on data quality. J Public Health. 2005;27:281–91.

Burns SP, Terblanche M, Perea J, Lillard H, DeLaPena C, Grinage N, et al. mHealth intervention applications for adults living with the effects of stroke: a scoping review. Arch Rehabil Res Clin Transl. 2021;3: 100095.

Vandelanotte C, Müller AM, Short CE, Hingle M, Nathan N, Williams SL, et al. Past, present, and future of eHealth and mHealth research to improve physical activity and dietary behaviors. J Nutr Educ Behav. 2016;48:219-228.e1.

Ludwig K, Arthur R, Sculthorpe N, Fountain H, Buchan DS. Text messaging interventions for improvement in physical activity and sedentary behavior in youth: systematic review. JMIR Mhealth Uhealth. 2018;6:e10799.

Download references

Acknowledgements

We have no acknowledgements to make.

Open access funding provided by Swiss Federal Institute of Technology Zurich. This work was supported by grants from the National Institute for Health and Care Research (COV-LT2-0010). The funder had no role in the conceptualisation, design, data collection, analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and affiliations.

Sport and Physical Activity Research Institute, School of Health and Life Sciences, University of the West of Scotland, Glasgow, UK

Nilihan E. M. Sanal-Hayes, Marie Mclaughlin, Lawrence D. Hayes, David Carless, Rachel Meach & Nicholas F. Sculthorpe

Future Health Technologies, Singapore-ETH Centre, Campus for Research Excellence and Technological Enterprise (CREATE), Singapore, Singapore

Jacqueline L. Mair

Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore

Long COVID Scotland, 12 Kemnay Place, Aberdeen, UK

Jane Ormerod

Physios for ME, London, UK

Natalie Hilliard

School of Education and Social Sciences, University of the West of Scotland, Glasgow, UK

Joanne Ingram

School of Health and Society, University of Salford, Salford, UK

Nilihan E. M. Sanal-Hayes

School of Sport, Exercise & Rehabilitation Sciences, University of Hull, Hull, UK

Marie Mclaughlin


Contributions

Authors’ contributions are given according to the CRediT taxonomy as follows: conceptualization, N.E.M.S.-H., M.M., L.D.H., and N.F.S.; methodology, N.E.M.S.-H., M.M., L.D.H., and N.F.S.; software, N.E.M.S.-H., M.M., L.D.H., and N.F.S.B.; validation, N.E.M.S.-H., M.M., L.D.H., and N.F.S.; formal analysis, N.E.M.S.-H., M.M., L.D.H., and N.F.S.; investigation, N.E.M.S.-H., M.M., L.D.H., and N.F.S.; resources, L.D.H., J.O., D.C., N.H., J.L.M., and N.F.S.; data curation, N.E.M.S.-H., M.M., L.D.H., and N.F.S.; writing – original draft preparation, N.E.M.S.-H., M.M., L.D.H., and N.F.S.; writing – review and editing, N.E.M.S.-H., M.M., L.D.H., J.O., D.C., N.H., R.M., J.L.M., J.I., and N.F.S.; visualisation, N.E.M.S.-H. and M.M.; supervision, N.F.S.; project administration, N.E.M.S.-H., M.M., L.D.H., and N.F.S.; funding acquisition, L.D.H., J.O., D.C., N.H., J.L.M., J.I., and N.F.S. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Jacqueline L. Mair.

Ethics declarations

Ethical approval and consent to participate.

This manuscript did not involve human participants, data, or tissues, so did not require ethical approval.

Consent for publication

This paper does not contain any individual person’s data in any form.

Competing interests

We report no financial and non-financial competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Supplementary file 1. Full search string for database searching.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Sanal-Hayes, N.E.M., Mclaughlin, M., Hayes, L.D. et al. A scoping review of ‘Pacing’ for management of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS): lessons learned for the long COVID pandemic. J Transl Med 21, 720 (2023). https://doi.org/10.1186/s12967-023-04587-5


Received: 30 June 2023

Accepted: 03 October 2023

Published: 14 October 2023

DOI: https://doi.org/10.1186/s12967-023-04587-5


  • Myalgic encephalomyelitis
  • Chronic fatigue syndrome
  • Post-exertional malaise

Journal of Translational Medicine

ISSN: 1479-5876


  • Open access
  • Published: 17 April 2024

Deciphering the influence: academic stress and its role in shaping learning approaches among nursing students: a cross-sectional study

  • Rawhia Salah Dogham 1 ,
  • Heba Fakieh Mansy Ali 1 ,
  • Asmaa Saber Ghaly 3 ,
  • Nermine M. Elcokany 2 ,
  • Mohamed Mahmoud Seweid 4 &
  • Ayman Mohamed El-Ashry   ORCID: orcid.org/0000-0001-7718-4942 5  

BMC Nursing volume 23, Article number: 249 (2024)


Background

Nursing education presents unique challenges, including high levels of academic stress and varied learning approaches among students. Understanding the relationship between academic stress and learning approaches is crucial for enhancing nursing education effectiveness and student well-being.

Aim

This study aimed to investigate the prevalence of academic stress and its correlation with learning approaches among nursing students.

Design and Method

A cross-sectional descriptive correlational research design was employed. A convenience sample of 1010 nursing students participated, completing a socio-demographic data form, the Perceived Stress Scale (PSS), and the Revised Study Process Questionnaire (R-SPQ-2F).

Results

Most nursing students experienced moderate academic stress (56.3%) and exhibited moderate levels of deep learning approaches (55.0%). Stress from a lack of professional knowledge and skills correlated negatively with deep learning approaches (r = -0.392) and positively with surface learning approaches (r = 0.365). Female students showed higher deep learning approach scores, while male students exhibited higher surface learning approach scores. Age, gender, educational level, and academic stress significantly influenced learning approaches.

Conclusion

Academic stress significantly impacts learning approaches among nursing students. Strategies addressing stressors and promoting healthy learning approaches are essential for enhancing nursing education and student well-being.

Nursing implication

Understanding academic stress’s impact on nursing students’ learning approaches enables tailored interventions. Recognizing stressors informs strategies for promoting adaptive coping, fostering deep learning, and creating supportive environments. Integrating stress management, mentorship, and counseling enhances student well-being and nursing education quality.


Introduction

Nursing education is a demanding field that requires students to acquire extensive knowledge and skills to provide competent and compassionate care. The nursing curriculum involves high-stress environments that can significantly affect students’ learning approaches and academic performance [ 1 , 2 ]. Numerous studies have investigated learning approaches in nursing education, highlighting the importance of identifying individual students’ preferred approaches. The most studied are the deep, surface, and strategic approaches. Deep learning approaches involve students actively seeking meaning, making connections, and critically analyzing information. Surface learning approaches focus on memorizing and reproducing information without deeper understanding. Strategic learning approaches aim to achieve high grades by adopting specific strategies, such as memorization techniques or time management skills [ 3 , 4 , 5 ].

Nursing education stands out due to its focus on practical training, where the blend of academic and clinical coursework becomes a significant stressor for students, despite academic stress being shared among all university students [ 6 , 7 , 8 ]. Consequently, nursing students are recognized as prone to high-stress levels. Stress is the physiological and psychological response that occurs when a biological control system identifies a deviation between the desired (target) state and the actual state of a fitness-critical variable, whether that discrepancy arises internally or externally to the human [ 9 ]. Stress levels can vary from objective threats to subjective appraisals, making it a highly personalized response to circumstances. Failure to manage these demands leads to stress imbalance [ 10 ].

Nursing students face three primary stressors during their education: academic, clinical, and personal/social stress. Academic stress is caused by the fear of failure in exams, assessments, and training, as well as workload concerns [ 11 ]. Clinical stress, on the other hand, arises from work-related difficulties such as coping with death, fear of failure, and interpersonal dynamics within the organization. Personal and social stressors are caused by an imbalance between home and school, financial hardships, and other factors. Throughout their education, nursing students have to deal with heavy workloads, time constraints, clinical placements, and high academic expectations. Multiple studies have shown that nursing students experience higher stress levels compared to students in other fields [ 12 , 13 , 14 ].

Research has examined the relationship between academic stress and coping strategies among nursing students, but no studies have focused specifically on the relationship between learning approaches and academic stress. However, existing literature suggests that students interested in nursing tend to experience lower levels of academic stress [ 7 ]. Interest in nursing may therefore encourage deep learning approaches, which promote a comprehensive understanding of the subject matter and allow students to feel more confident and less overwhelmed by coursework and exams. Conversely, students employing surface learning approaches may experience higher stress levels because of their reliance on memorization [ 3 ].

Understanding the interplay between academic stress and learning approaches among nursing students is essential for designing effective educational interventions. Nursing educators can foster deep learning approaches by incorporating active learning strategies, critical thinking exercises, and reflection activities into the curriculum [ 15 ]. Creating supportive learning environments encouraging collaboration, self-care, and stress management techniques can help alleviate academic stress. Additionally, providing mentorship and counselling services tailored to nursing students’ unique challenges can contribute to their overall well-being and academic success [ 16 , 17 , 18 ].

Despite the scarcity of research on the link between academic stress and learning approaches in nursing students, it is crucial to identify the unique stressors they encounter. The intensity of these stressors may be connected to the learning strategies these students employ. Academic stress and learning approach are intertwined aspects of the student experience: academic stress can influence learning approaches, and the choice of learning approach can in turn affect the level of academic stress experienced. By understanding this relationship and implementing strategies to promote healthy learning approaches and manage academic stress, educators and institutions can foster an environment conducive to deep learning and student well-being.

Hence, this study aims to investigate the correlation between academic stress and learning approaches experienced by nursing students.

Study objectives

Assess the levels of academic stress among nursing students.

Assess the learning approaches among nursing students.

Identify the relationship between academic stress and learning approach among nursing students.

Identify the effect of academic stress and related factors on learning approach among nursing students.

Materials and methods

Research design.

A cross-sectional descriptive correlational research design adhering to the STROBE guidelines was used for this study.

The study was conducted at Alexandria Nursing College in Egypt. The college adheres to the national standards for nursing education and operates under the jurisdiction of the Egyptian Ministry of Higher Education. It comprises nine specialized nursing departments: Nursing Administration, Community Health Nursing, Gerontological Nursing, Medical-Surgical Nursing, Critical Care Nursing, Pediatric Nursing, Obstetric and Gynecological Nursing, Nursing Education, and Psychiatric Nursing and Mental Health. Both undergraduate and graduate programs are based on the credit hour system, which provides an organized structure for tracking academic progress and evaluating academic outcomes.

Participants and sample size calculation

The researchers used the Epi Info 7 program to calculate the sample size. The calculation was based on a population of 9886 students for the academic year 2022–2023, an expected frequency of 50%, a maximum margin of error of 5%, and a confidence coefficient of 99.9%. With these parameters, the program indicated a minimum sample size of 976 students. The researchers therefore recruited a convenience sample of 1010 nursing students from different academic levels during the 2022–2023 academic year [ 19 ]. This sample was larger than the minimum required, which could help to increase the precision and reliability of the study results. Participation in the study required enrollment in a nursing program and voluntary agreement to take part. The exclusion criteria were a self-reported mental illness and failure to complete the questionnaires.
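The minimum sample size can be approximated by hand. The sketch below assumes the standard formula for estimating a proportion with a finite population correction; the source does not describe how Epi Info 7 performs the calculation internally, so the function name and the formula choice are illustrative assumptions rather than the authors' procedure.

```python
import math
from statistics import NormalDist


def required_sample_size(population, expected_frequency, margin_of_error, confidence):
    """Minimum sample size for a proportion, with finite population correction.

    Hypothetical helper: assumes the textbook formula, not Epi Info's code.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_frequency
    # Sample size for an infinitely large population.
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    # Correct for the finite population, then round up to a whole student.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Parameters reported in the study.
print(required_sample_size(9886, 0.50, 0.05, 0.999))  # 976
```

Under these assumptions the result matches the reported minimum of 976 students.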

Tool one: socio-demographic data

This tool collected socio-demographic data, including students’ age, sex, educational level, hours of sleep at night, hours spent studying, and GPA from the previous semester.

Tool two: the perceived stress scale (PSS)

The scale was initially created by Sheu et al. (1997) to gauge the level and nature of stress perceived by nursing students attending Taiwanese universities [ 20 ]. It comprises 29 items rated on a 5-point Likert scale (0 = never, 1 = rarely, 2 = sometimes, 3 = reasonably often, and 4 = very often), with a total score ranging from 0 to 116. Higher scores indicate higher stress levels. The cut-off points for the levels of perceived stress, according to score percentage, were low (< 33.33%), moderate (33.33–66.66%), and high (> 66.66%). The items are categorized into six subscales reflecting different sources of stress. The first subscale assesses “stress stemming from lack of professional knowledge and skills” and includes 3 items. The second subscale evaluates “stress from caring for patients” with 8 items. The third subscale measures “stress from assignments and workload” with 5 items. The fourth subscale focuses on “stress from interactions with teachers and nursing staff” with 6 items. The fifth subscale gauges “stress from the clinical environment” with 3 items. The sixth subscale addresses “stress from peers and daily life” with 4 items. El-Ashry et al. (2022) reported an internal consistency reliability of 0.83 [ 21 ]. Two bilingual translators translated the English version of the scale into Arabic, and two other independent translators back-translated it into English to verify its accuracy. The suitability of the translated version was evaluated through a confirmatory factor analysis (CFA), which yielded a comparative fit index (CFI) of 0.712, a Tucker-Lewis index (TLI) of 0.812, and a root mean square error of approximation (RMSEA) of 0.100.
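The scoring rule described above can be sketched as follows. The subscale item counts, the 0 to 4 item range, and the percentage cut-offs come from the text; the function and variable names are hypothetical, and treating a score of exactly 66.66% as moderate is an assumption about the boundary.

```python
# Number of items per subscale, as listed in the text (29 items in total).
PSS_SUBSCALE_ITEMS = {
    "lack of professional knowledge and skills": 3,
    "caring for patients": 8,
    "assignments and workload": 5,
    "interactions with teachers and nursing staff": 6,
    "clinical environment": 3,
    "peers and daily life": 4,
}
MAX_ITEM_SCORE = 4  # items are rated from 0 (never) to 4 (very often)


def stress_level(score, n_items=sum(PSS_SUBSCALE_ITEMS.values())):
    """Classify a raw score as low, moderate, or high by score percentage."""
    max_score = n_items * MAX_ITEM_SCORE
    if not 0 <= score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    percentage = 100 * score / max_score
    if percentage < 33.33:
        return "low"
    if percentage <= 66.66:
        return "moderate"
    return "high"


# Total scale: the maximum is 29 * 4 = 116.
print(stress_level(58))             # moderate (50% of 116)
# A single subscale: pass its item count, e.g. the 5 workload items.
print(stress_level(18, n_items=5))  # high (90% of 20)
```

The same function covers the total score and each subscale because the cut-offs are expressed as percentages of the maximum possible score.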

Tool three: revised study process questionnaire (R-SPQ-2F)

The questionnaire was developed by Biggs et al. (2001). It examines deep and surface learning approaches using only 20 questions; each subscale contains 10 questions [ 22 ]. Items are rated on a 5-point Likert scale ranging from 0 (never or only rarely true of me) to 4 (always or almost always true of me). The total score ranged from 0 to 80, with a higher score reflecting a stronger deep or surface learning approach. The cut-off points for the levels of the revised study process questionnaire, according to score percentage, were low (< 33%), moderate (33–66%), and high (> 66%). Biggs et al. (2001) found Cronbach’s alpha values of 0.73 for the deep learning approach and 0.64 for the surface learning approach, which were considered acceptable. Two translators fluent in English and Arabic initially translated the scale from English to Arabic. To ensure the accuracy of the translation, it was then back-translated into English. The translated version’s appropriateness was evaluated using a confirmatory factor analysis (CFA). The CFA produced several goodness-of-fit indices, including a Comparative Fit Index (CFI) of 0.790, a Tucker-Lewis Index (TLI) of 0.912, and a Root Mean Square Error of Approximation (RMSEA) of 0.100.

Ethical considerations

The Research Ethics Committee of the Alexandria University College of Nursing granted ethical approval before the study was implemented. Furthermore, approval was obtained from the pertinent authorities at the participating nursing institutions. The vice deans of the participating institutions provided written informed consent attesting to institutional support and authority. By giving written informed consent, participants confirmed that they were taking part voluntarily. Strict protocols were followed to protect participants’ privacy throughout the investigation. The personal data obtained were kept private and available only to the study team. Ensuring participants’ privacy and anonymity was of utmost importance.

Tools validity

The researchers created tool one after reviewing pertinent literature. To evaluate the applicability of the academic stress and learning approach tools for Arabic-speaking populations, two bilingual translators independently translated the English versions into Arabic. To assure accuracy, two additional impartial translators back-translated the translations into English. The tools were also assessed by a five-person jury of professionals from the education and psychiatric nursing departments. The jury found that the scales adequately measured the intended constructs.

Pilot study

A pilot study involved 100 nursing students, distinct from the final sample, to gauge the efficacy, clarity, and potential obstacles in using the research instruments. The pilot findings indicated that the instruments were accurate, comprehensible, and suitable for the target population. Additionally, Cronbach’s alpha was used to further assess the instruments’ reliability, demonstrating solid internal consistency for both the learning approaches and academic stress tools, with values of 0.91 and 0.85, respectively.
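For readers unfamiliar with the statistic, Cronbach’s alpha can be computed from item-level responses as sketched below. The study ran this analysis in SPSS; the function name and the small response matrix here are hypothetical and serve only to illustrate the formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 students answering 4 items on a 0-4 scale.
pilot_responses = [
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [0, 1, 1, 0],
]
print(round(cronbach_alpha(pilot_responses), 2))
```

Values closer to 1 indicate that the items vary together, which is what the reported values of 0.91 and 0.85 suggest for the two instruments.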

Data collection

The researchers convened with each qualified student in a relaxed, unoccupied classroom in their respective college settings. Following a briefing on the study’s objectives, the students filled out the datasheet. The interviews typically lasted 15 to 20 min.

Data analysis

The data collected were analyzed using IBM SPSS software version 26.0. Following data entry, a thorough examination and verification were undertaken to ensure accuracy. The normality of quantitative data distributions was assessed using Kolmogorov-Smirnov tests. Cronbach’s alpha was employed to evaluate the reliability and internal consistency of the study instruments. Descriptive statistics, including means (M), standard deviations (SD), and frequencies/percentages, were computed to summarize academic stress and learning approaches. Student’s t-tests compared scores between two groups for normally distributed variables, while one-way ANOVA compared scores across more than two categories of a categorical variable. Pearson’s correlation coefficient determined the strength and direction of associations between normally distributed quantitative variables. Hierarchical regression analysis identified the primary independent factors influencing learning approaches. Statistical significance was set at the 5% level (p < 0.05).
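As an illustration of the correlation step, Pearson’s coefficient can be computed directly from paired scores. This is a minimal sketch with hypothetical data, not a reproduction of the SPSS analysis; the function name and the example scores are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)


# Hypothetical scores for six students: total stress vs. deep learning.
stress_scores = [30, 45, 52, 60, 75, 88]
deep_learning_scores = [34, 30, 31, 25, 22, 18]
print(round(pearson_r(stress_scores, deep_learning_scores), 3))
```

A negative value indicates that deep learning scores fall as stress scores rise, and a positive value indicates the opposite; the magnitude reflects the strength of the association.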

Table 1 presents socio-demographic data for the 1010 nursing students. The age distribution shows that 38.8% of the students were between 18 and 21 years old, 32.9% were between 21 and 24 years old, and 28.3% were between 24 and 28 years old, with an average age of approximately 22.79 years. Most of the students were female (77%), while 23% were male. The students were distributed across different educational years, with the largest share (34.4%) in the second year, followed by 29.4% in the fourth year. Approximately two-thirds (67%) of the students studied between 3 and 6 h. Sleep patterns also differed among the students; more than three-quarters (77.3%) slept from 5 to more than 7 h, and only 2.4% slept less than 2 h per night. Finally, regarding the students’ Grade Point Average (GPA) from the previous semester, 21% had a GPA between 2 and 2.5, 40.9% had a GPA between 2.5 and 3, and 38.1% had a GPA between 3 and 3.5.

Figure 1 shows the learning approach levels among nursing students. For the deep learning approach, most students (55.0%) exhibited a moderate level, followed by 25.9% with a high level and 19.1% with a low level. The surface learning approach was less prevalent, with 47.8% of students showing a moderate level, 41.7% a low level, and only 10.5% a high level.

Figure 1. Nursing students’ levels of learning approach (N = 1010)

Figure 2 shows the levels of each type of academic stress among nursing students. The foremost stressor was the pressure and demands associated with academic assignments and workload, with 30.8% of students reporting high stress from this source. Challenges within the clinical environment came second, with 25.7% of students reporting high stress. Interactions with peers and daily life stressors ranked third, with 21.5% of students reporting high stress. Interactions with teachers and nursing staff followed closely, contributing to high stress levels for 20.3% of students. Stress from taking care of patients ranked slightly lower, with 16.7% of students reporting high stress. At the lowest end of the ranking, but still notable, was stress from a perceived lack of professional knowledge and skills, with 15.9% of students experiencing high stress in this area.

Figure 2. Nursing students’ levels of academic stress subtypes (N = 1010)

Figure 3 shows the total levels of academic stress among nursing students. The majority of students experienced moderate academic stress (56.3%), followed by those experiencing low academic stress (29.9%), while a minority experienced high academic stress (13.8%).

Figure 3. Nursing students’ levels of total academic stress (N = 1010)

Table 2 displays the correlations between the academic stress subscales and the deep and surface learning approaches among 1010 nursing students. All stress subscales correlated negatively with the deep learning approach, indicating that the inclination toward deep learning decreases as stress increases. The strongest negative correlation was observed with stress stemming from the lack of professional knowledge and skills (r = -0.392, p < 0.001), followed by stress from the clinical environment (r = -0.109, p = 0.001), stress from assignments and workload (r = -0.103, p = 0.001), stress from peers and daily life (r = -0.095, p = 0.002), and stress from patient care responsibilities (r = -0.093, p = 0.003). The weakest negative correlation was found with stress from interactions with teachers and nursing staff (r = -0.083, p = 0.009). Conversely, all stress subscales correlated positively with the surface learning approach, indicating that higher stress corresponded with a greater tendency toward surface learning. The strongest positive correlation was observed with stress related to the lack of professional knowledge and skills (r = 0.365, p < 0.001), followed by stress from patient care responsibilities (r = 0.334, p < 0.001), stress from interactions with teachers and nursing staff (r = 0.262, p < 0.001), stress from assignments and workload (r = 0.262, p < 0.001), and stress from the clinical environment (r = 0.254, p < 0.001). The weakest positive correlation was noted with stress stemming from peers and daily life (r = 0.186, p < 0.001). Overall stress was also positively correlated with the surface learning approach (r = 0.355, p < 0.001).

Table  3 outlines the association between the socio-demographic characteristics of nursing students and their deep and surface learning approaches. Concerning age, statistically significant differences were observed in deep and surface learning approaches (F = 3.661, p = 0.003 and F = 7.983, p < 0.001, respectively). Gender also demonstrated significant differences in deep and surface learning approaches (t = 3.290, p = 0.001 and t = 8.638, p < 0.001, respectively). Female students exhibited higher scores in the deep learning approach (31.59 ± 8.28) compared to male students (29.59 ± 7.73), while male students had higher scores in the surface learning approach (29.97 ± 7.36) compared to female students (24.90 ± 7.97). Educational level exhibited statistically significant differences in deep and surface learning approaches (F = 5.599, p = 0.001 and F = 17.284, p < 0.001, respectively). Both deep and surface learning approach scores increased with higher educational levels. The duration of study hours demonstrated significant differences only in the surface learning approach (F = 3.550, p = 0.014), with scores increasing as study hours increased. However, no significant difference was observed in the deep learning approach (F = 0.861, p = 0.461). Hours of sleep per night and GPA from the previous semester did not exhibit statistically significant differences in deep or surface learning approaches.

Table 4 presents a multivariate linear regression analysis examining the factors influencing the learning approach among 1010 nursing students. The deep learning approach was positively influenced by age, gender (being female), educational year level, and stress from teachers and nursing staff, as indicated by their positive coefficients and significant p-values (p < 0.05). However, it was negatively influenced by stress from a lack of professional knowledge and skills. The other factors did not significantly influence the deep learning approach. On the other hand, the surface learning approach was positively influenced by gender (being female), educational year level, stress from lack of professional knowledge and skills, stress from assignments and workload, and stress from taking care of patients, as indicated by their positive coefficients and significant p-values (p < 0.05). However, it was negatively influenced by gender (being male). The other factors did not significantly influence the surface learning approach. The adjusted R-squared values indicated that the variables in the model explained 17.8% of the variance in the deep learning approach and 25.5% in the surface learning approach. Both models were statistically significant (p < 0.001).

Nursing students’ academic stress and learning approaches are essential to planning for effective and efficient learning. Nursing education also aims to develop knowledgeable and competent students with problem-solving and critical-thinking skills.

The study’s findings highlight the significant presence of stress among nursing students, with a majority experiencing moderate to severe levels of academic stress. This aligns with previous research indicating that academic stress is prevalent among nursing students. For instance, Zheng et al. (2022) observed moderate stress levels in nursing students during clinical placements [ 23 ], while El-Ashry et al. (2022) found that nearly all first-year nursing students in Egypt experienced severe academic stress [ 21 ]. Similarly, Ali and El-Sherbini (2018) reported that over three-quarters of nursing students faced high academic stress. The complexity of the nursing program likely contributes to these stress levels [ 24 ].

The current study revealed that nursing students identified the highest sources of academic stress as workload from assignments and the stress of caring for patients. This aligns with Banu et al.’s (2015) findings, where academic demands, assignments, examinations, high workload, and combining clinical work with patient interaction were cited as common stressors [ 25 ]. Additionally, Anaman-Torgbor et al. (2021) identified lectures, assignments, and examinations as predictors of academic stress through logistic regression analysis. These stressors may stem from nursing programs emphasizing the development of highly qualified graduates who acquire knowledge, values, and skills through classroom and clinical experiences [ 26 ].

The results regarding learning approaches indicate that most nursing students predominantly employed the deep learning approach. Although a surface learning approach was also present among the participants in this study, the prevalence of deep learning was higher. This inclination toward the deep learning approach is expected in nursing students because of their engagement with advanced courses, which require retention, integration, and transfer of information at elevated levels. The deep learning approach correlates with a gratifying learning experience and contributes to higher academic achievement [ 3 ]. Moreover, the nursing program’s emphasis on active learning strategies fosters critical thinking, problem-solving, and decision-making skills. These findings align with Mahmoud et al.’s (2019) study, which reported a significant presence (83.31%) of the deep learning approach among undergraduate nursing students at King Khalid University’s Faculty of Nursing [ 27 ]. Additionally, Mohamed and Morsi (2019) found that most nursing students at Benha University’s Faculty of Nursing embraced the deep learning approach (65.4%) compared to the surface learning approach [ 28 ].

The study observed a negative correlation between the deep learning approach and the overall mean stress score, contrasting with a positive correlation between surface learning approaches and overall stress levels. Elevated academic stress levels may diminish motivation and engagement in the learning process, potentially leading students to feel overwhelmed, disinterested, or burned out, prompting a shift toward a surface learning approach. This finding resonates with previous research indicating that nursing students who actively seek positive academic support strategies during academic stress have better prospects for success than those who do not [ 29 ]. Nebhinani et al. (2020) identified interface concerns and academic workload as significant stress-related factors. Notably, only an interest in nursing demonstrated a significant association with stress levels, with participants interested in nursing primarily employing adaptive coping strategies compared to non-interested students.

The current research reveals a statistically significant inverse relationship between different dimensions of academic stress and adopting the deep learning approach. The most substantial negative correlation was observed with stress arising from a lack of professional knowledge and skills, succeeded by stress associated with the clinical environment, assignments, and workload. Nursing students encounter diverse stressors, including delivering patient care, handling assignments and workloads, navigating challenging interactions with staff and faculty, perceived inadequacies in clinical proficiency, and facing examinations [ 30 ].

In the current study, the multivariate linear regression analysis reveals that various factors positively influence the deep learning approach, including age, female gender, educational year level, and stress from teachers and nursing staff. In contrast, stress from a lack of professional knowledge and skills exerts a negative influence. Conversely, the surface learning approach is positively influenced by female gender, educational year level, stress from lack of professional knowledge and skills, stress from assignments and workload, and stress from taking care of patients, but negatively affected by male gender. The models explain 17.8% and 25.5% of the variance in the deep and surface learning approaches, respectively, and both are statistically significant. These findings underscore the intricate interplay of demographic and stress-related factors in shaping nursing students’ learning approaches. High workloads and patient care responsibilities may compel students to prioritize completing tasks over deep comprehension. This pressure could lead to a surface learning approach as students focus on meeting immediate demands rather than engaging deeply with course material. This observation aligns with the findings of Alsayed et al. (2021), who identified age, gender, and study year as significant factors influencing students’ learning approaches.

Deep learners often demonstrate better self-regulation skills, such as effective time management, goal setting, and seeking support when needed. These skills can help manage academic stress and maintain a balanced learning approach. This is supported by studies examining the effect of coping strategies on stress levels [ 6 , 31 , 32 ]. In addition, Pacheco-Castillo et al. (2021) found a strong, significant relationship between academic stressors and students’ level of performance. That study also showed that the more academic stress a student faces, the lower their academic achievement.

Strengths and limitations of the study

This study has several strengths. It provides insightful information about the educational experiences of Egyptian nursing students, a population that has so far received little research attention. However, the study’s concentration on this particular group limits its generalizability to other populations or countries. This might be addressed in future studies by using a more varied sample. Another drawback is the dependence on self-reported measures, which may contain biases and errors. Although the cross-sectional design offers a moment-in-time view of the problem, it cannot determine causation or evaluate changes over time. To address this, longitudinal research may be carried out.

Notwithstanding these drawbacks, the study substantially contributes to the expanding knowledge of academic stress and nursing students’ learning styles. Additional research is needed to determine teaching strategies that improve deep-learning approaches among nursing students. A qualitative study is required to analyze learning approaches and factors that may influence nursing students’ selection of learning approaches.

According to the present study’s findings, nursing students encounter considerable academic stress, primarily stemming from heavy assignments and workload, as well as interactions with teachers and nursing staff. Additionally, it was observed that students who experience lower levels of academic stress typically adopt a deep learning approach, whereas those facing higher stress levels tend to resort to a surface learning approach. Demographic factors such as age, gender, and educational level influence nursing students’ choice of learning approach. Specifically, female students are more inclined towards deep learning, whereas male students prefer surface learning. Moreover, deep and surface learning approach scores show an upward trend with increasing educational levels and study hours. Academic stress emerges as a significant determinant shaping the adoption of learning approaches among nursing students.

Implications in nursing practice

Nursing programs should consider integrating stress management techniques into their curriculum. Providing students with resources and skills to cope with academic stress can improve their well-being and academic performance. Educators can incorporate teaching strategies that promote deep learning approaches, such as problem-based learning, critical thinking exercises, and active learning methods. These approaches help students engage more deeply with course material and reduce reliance on surface learning techniques. Recognizing the gender differences in learning approaches, nursing programs can offer gender-specific support services and resources. For example, they could provide targeted workshops or counseling services that address male and female nursing students’ unique stressors and learning needs. Implementing mentorship programs and peer support groups can create a supportive environment where students can share experiences, seek advice, and receive encouragement from their peers and faculty members. Encouraging students to reflect on their learning processes and identify effective study strategies can help them develop metacognitive skills and become more self-directed learners. Faculty members can facilitate this process by incorporating reflective exercises into the curriculum. Nursing faculty and staff should receive training on recognizing signs of academic stress among students and providing appropriate support and resources. Additionally, professional development opportunities can help educators stay updated on evidence-based teaching strategies and practical interventions for addressing student stress.

Data availability

The datasets generated and/or analysed during the current study are not publicly available due to restrictions imposed by the institutional review board to protect participant confidentiality, but are available from the corresponding author on reasonable request.

Liu J, Yang Y, Chen J, Zhang Y, Zeng Y, Li J. Stress and coping styles among nursing students during the initial period of the clinical practicum: A cross-section study. Int J Nurs Sci. 2022a;9(2). https://doi.org/10.1016/j.ijnss.2022.02.004 .

Saifan A, Devadas B, Daradkeh F, Abdel-Fattah H, Aljabery M, Michael LM. Solutions to bridge the theory-practice gap in nursing education in the UAE: a qualitative study. BMC Med Educ. 2021;21(1). https://doi.org/10.1186/s12909-021-02919-x .

Alsayed S, Alshammari F, Pasay-an E, Dator WL. Investigating the learning approaches of students in nursing education. J Taibah Univ Med Sci. 2021;16(1). https://doi.org/10.1016/j.jtumed.2020.10.008 .

Salah Dogham R, Elcokany NM, Saber Ghaly A, Dawood TMA, Aldakheel FM, Llaguno MBB, Mohsen DM. Self-directed learning readiness and online learning self-efficacy among undergraduate nursing students. Int J Afr Nurs Sci. 2022;17. https://doi.org/10.1016/j.ijans.2022.100490 .

Zhao Y, Kuan HK, Chung JOK, Chan CKY, Li WHC. Students’ approaches to learning in a clinical practicum: a psychometric evaluation based on item response theory. Nurse Educ Today. 2018;66. https://doi.org/10.1016/j.nedt.2018.04.015 .

Huang HM, Fang YW. Stress and coping strategies of online nursing practicum courses for Taiwanese nursing students during the COVID-19 pandemic: a qualitative study. Healthcare. 2023;11(14). https://doi.org/10.3390/healthcare11142053 .

Nebhinani M, Kumar A, Parihar A, Rani R. Stress and coping strategies among undergraduate nursing students: a descriptive assessment from Western Rajasthan. Indian J Community Med. 2020;45(2). https://doi.org/10.4103/ijcm.IJCM_231_19 .

Olvera Alvarez HA, Provencio-Vasquez E, Slavich GM, Laurent JGC, Browning M, McKee-Lopez G, Robbins L, Spengler JD. Stress and health in nursing students: the Nurse Engagement and Wellness Study. Nurs Res. 2019;68(6). https://doi.org/10.1097/NNR.0000000000000383 .

Del Giudice M, Buck CL, Chaby LE, Gormally BM, Taff CC, Thawley CJ, Vitousek MN, Wada H. What is stress? A systems perspective. Integr Comp Biol. 2018;58(6):1019–32. https://doi.org/10.1093/icb/icy114 .

Bhui K, Dinos S, Galant-Miecznikowska M, de Jongh B, Stansfeld S. Perceptions of work stress causes and effective interventions in employees working in public, private and non-governmental organisations: a qualitative study. BJPsych Bull. 2016;40(6). https://doi.org/10.1192/pb.bp.115.050823 .

Lavoie-Tremblay M, Sanzone L, Aubé T, Paquet M. Sources of stress and coping strategies among undergraduate nursing students across all years. Can J Nurs Res. 2021. https://doi.org/10.1177/08445621211028076 .

Ahmed WAM, Abdulla YHA, Alkhadher MA, Alshameri FA. Perceived stress and coping strategies among nursing students during the COVID-19 pandemic: a systematic review. Saudi J Health Syst Res. 2022;2(3). https://doi.org/10.1159/000526061 .

Pacheco-Castillo J, Casuso-Holgado MJ, Labajos-Manzanares MT, Moreno-Morales N. Academic stress among nursing students in a Private University at Puerto Rico, and its Association with their academic performance. Open J Nurs. 2021;11(09). https://doi.org/10.4236/ojn.2021.119063 .

Tran TTT, Nguyen NB, Luong MA, Bui THA, Phan TD, Tran VO, Ngo TH, Minas H, Nguyen TQ. Stress, anxiety and depression in clinical nurses in Vietnam: a cross-sectional survey and cluster analysis. Int J Ment Health Syst. 2019;13(1). https://doi.org/10.1186/s13033-018-0257-4 .

Magnavita N, Chiorri C. Academic stress and active learning of nursing students: a cross-sectional study. Nurse Educ Today. 2018;68. https://doi.org/10.1016/j.nedt.2018.06.003 .

Folkvord SE, Risa CF. Factors that enhance midwifery students’ learning and development of self-efficacy in clinical placement: a systematic qualitative review. Nurse Educ Pract. 2023;66. https://doi.org/10.1016/j.nepr.2022.103510 .

Myers SB, Sweeney AC, Popick V, Wesley K, Bordfeld A, Fingerhut R. Self-care practices and perceived stress levels among psychology graduate students. Train Educ Prof Psychol. 2012;6(1). https://doi.org/10.1037/a0026534 .

Zeb H, Arif I, Younas A. Nurse educators’ experiences of fostering undergraduate students’ ability to manage stress and demanding situations: a phenomenological inquiry. Nurse Educ Pract. 2022;65. https://doi.org/10.1016/j.nepr.2022.103501 .

Centers for Disease Control and Prevention. User Guide| Support| Epi Info™ [Internet]. Atlanta: CDC; [cited 2024 Jan 31]. Available from: CDC website.

Sheu S, Lin HS, Hwang SL, Yu PJ, Hu WY, Lou MF. The development and testing of a perceived stress scale for nursing students in clinical practice. J Nurs Res. 1997;5:41–52. Available from: http://ntur.lib.ntu.edu.tw/handle/246246/165917 .

El-Ashry AM, Harby SS, Ali AAG. Clinical stressors as perceived by first-year nursing students of their experience at Alexandria main university hospital during the COVID-19 pandemic. Arch Psychiatr Nurs. 2022;41:214–20. https://doi.org/10.1016/j.apnu.2022.08.007 .

Biggs J, Kember D, Leung DYP. The revised two-factor study process questionnaire: R-SPQ-2F. Br J Educ Psychol. 2001;71(1):133–49. https://doi.org/10.1348/000709901158433 .

Zheng YX, Jiao JR, Hao WN. Stress levels of nursing students: a systematic review and meta-analysis. Med (United States). 2022;101(36). https://doi.org/10.1097/MD.0000000000030547 .

Ali AM, El-Sherbini HH. Academic stress and its contributing factors among faculty nursing students in Alexandria. Alexandria Scientific Nursing Journal. 2018; 20(1):163–181. Available from: https://asalexu.journals.ekb.eg/article_207756_b62caf4d7e1e7a3b292bbb3c6632a0ab.pdf .

Banu P, Deb S, Vardhan V, Rao T. Perceived academic stress of university students across gender, academic streams, semesters, and academic performance. Indian J Health Wellbeing. 2015;6(3):231–235. Available from: http://www.iahrw.com/index.php/home/journal_detail/19#list .

Anaman-Torgbor JA, Tarkang E, Adedia D, Attah OM, Evans A, Sabina N. Academic-related stress among Ghanaian nursing students. Florence Nightingale J Nurs. 2021;29(3):263. https://doi.org/10.5152/FNJN.2021.21030 .

Mahmoud HG, Ahmed KE, Ibrahim EA. Learning Styles and Learning Approaches of Bachelor Nursing Students and its Relation to Their Achievement. Int J Nurs Didact. 2019;9(03):11–20. Available from: http://www.nursingdidactics.com/index.php/ijnd/article/view/2465 .

Mohamed NAAA, Morsi MES, Learning Styles L, Approaches. Academic achievement factors, and self efficacy among nursing students. Int J Novel Res Healthc Nurs. 2019;6(1):818–30. Available from: www.noveltyjournals.com.

Google Scholar  

Onieva-Zafra MD, Fernández-Muñoz JJ, Fernández-Martínez E, García-Sánchez FJ, Abreu-Sánchez A, Parra-Fernández ML. Anxiety, perceived stress and coping strategies in nursing students: a cross-sectional, correlational, descriptive study. BMC Med Educ. 2020;20:1–9. https://doi.org/10.1186/s12909-020-02294-z .

Article   Google Scholar  

Aljohani W, Banakhar M, Sharif L, Alsaggaf F, Felemban O, Wright R. Sources of stress among Saudi Arabian nursing students: a cross-sectional study. Int J Environ Res Public Health. 2021;18(22). https://doi.org/10.3390/ijerph182211958 .

Liu Y, Wang L, Shao H, Han P, Jiang J, Duan X. Nursing students’ experience during their practicum in an intensive care unit: a qualitative meta-synthesis. Front Public Health. 2022;10. https://doi.org/10.3389/fpubh.2022.974244 .

Majrashi A, Khalil A, Nagshabandi E, Al MA. Stressors and coping strategies among nursing students during the COVID-19 pandemic: scoping review. Nurs Rep. 2021;11(2):444–59. https://doi.org/10.3390/nursrep11020042 .

Download references

Acknowledgements

Our sincere thanks go to all the nursing students in the study. We also want to thank Dr. Rasha Badry for help with the statistical analysis and for contributing to this study.

The research was not funded by public, commercial, or non-profit organizations.

Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Author information

Authors and affiliations.

Nursing Education, Faculty of Nursing, Alexandria University, Alexandria, Egypt

Rawhia Salah Dogham & Heba Fakieh Mansy Ali

Critical Care & Emergency Nursing, Faculty of Nursing, Alexandria University, Alexandria, Egypt

Nermine M. Elcokany

Obstetrics and Gynecology Nursing, Faculty of Nursing, Alexandria University, Alexandria, Egypt

Asmaa Saber Ghaly

Faculty of Nursing, Beni-Suef University, Beni-Suef, Egypt

Mohamed Mahmoud Seweid

Psychiatric and Mental Health Nursing, Faculty of Nursing, Alexandria University, Alexandria, Egypt

Ayman Mohamed El-Ashry

Contributions

Ayman M. El-Ashry & Rawhia S. Dogham: conceptualization, preparation, and data collection; methodology; investigation; formal analysis; data analysis; writing-original draft; writing-manuscript; and editing. Heba F. Mansy Ali & Asmaa S. Ghaly: conceptualization, preparation, methodology, investigation, writing-original draft, writing-review, and editing. Nermine M. Elcokany & Mohamed M. Seweid: methodology, investigation, formal analysis, data collection, writing-manuscript, and editing. All authors reviewed the manuscript and accepted it for publication.

Corresponding author

Correspondence to Ayman Mohamed El-Ashry .

Ethics declarations

Ethics approval and consent to participate.

The research adhered to the guidelines and regulations outlined in the Declaration of Helsinki (DoH-Oct2008). The Faculty of Nursing’s Research Ethical Committee (REC) at Alexandria University approved data collection in this study (IRB00013620/95/9/2022). Participants were required to sign an informed written consent form, which included an explanation of the research and an assessment of their understanding.

Consent for publication

Not applicable.

Competing interests

The authors declare that there is no conflict of interest.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Dogham, R.S., Ali, H.F.M., Ghaly, A.S. et al. Deciphering the influence: academic stress and its role in shaping learning approaches among nursing students: a cross-sectional study. BMC Nurs 23, 249 (2024). https://doi.org/10.1186/s12912-024-01885-1

Received: 31 January 2024

Accepted: 21 March 2024

Published: 17 April 2024

DOI: https://doi.org/10.1186/s12912-024-01885-1

  • Academic stress
  • Learning approaches
  • Nursing students

BMC Nursing

ISSN: 1472-6955

  • Research article
  • Open access
  • Published: 15 April 2024

What is quality in long covid care? Lessons from a national quality improvement collaborative and multi-site ethnography

  • Trisha Greenhalgh (ORCID: orcid.org/0000-0003-2369-8088) 1,
  • Julie L. Darbyshire 1,
  • Cassie Lee 2,
  • Emma Ladds 1 &
  • Jenny Ceolta-Smith 3

BMC Medicine volume 22, Article number: 159 (2024)

Long covid (post covid-19 condition) is a complex condition with diverse manifestations, uncertain prognosis and wide variation in current approaches to management. There have been calls for formal quality standards to reduce a so-called “postcode lottery” of care. The original aim of this study—to examine the nature of quality in long covid care and reduce unwarranted variation in services—evolved to focus on examining the reasons why standardizing care was so challenging in this condition.

In 2021–2023, we ran a quality improvement collaborative across 10 UK sites. The dataset reported here was mostly but not entirely qualitative. It included data on the origins and current context of each clinic, interviews with staff and patients, and ethnographic observations at 13 clinics (50 consultations) and 45 multidisciplinary team (MDT) meetings (244 patient cases). Data collection and analysis were informed by relevant lenses from clinical care (e.g. evidence-based guidelines), improvement science (e.g. quality improvement cycles) and philosophy of knowledge.

Participating clinics made progress towards standardizing assessment and management in some topics; some variation remained but this could usually be explained. Clinics had different histories and path dependencies, occupied a different place in their healthcare ecosystem and served a varied caseload including a high proportion of patients with comorbidities. A key mechanism for achieving high-quality long covid care was deliberation by local MDTs on unusual, complex or challenging cases for which evidence-based guidelines provided no easy answers. In such cases, collective learning occurred through idiographic (case-based) reasoning, in which practitioners build lessons from the particular to the general. This contrasts with the nomothetic reasoning implicit in evidence-based guidelines, in which reasoning is assumed to go from the general (e.g. findings of clinical trials) to the particular (management of individual patients).

Not all variation in long covid services is unwarranted. Largely because long covid’s manifestations are so varied and comorbidities common, generic “evidence-based” standards require much individual adaptation. In this complex condition, quality improvement resources may be productively spent supporting MDTs to optimise their case-based learning through interdisciplinary discussion. Quality assessment of a long covid service should include review of a sample of individual cases to assess how guidelines have been interpreted and personalized to meet patients’ unique needs.

Study registration

NCT05057260, ISRCTN15022307.

The term “long covid” [ 1 ] means prolonged symptoms following SARS-CoV-2 infection not explained by an alternative diagnosis [ 2 ]. It embraces the US term “post-covid conditions” (symptoms beyond 4 weeks) [ 3 ], the UK terms “ongoing symptomatic covid-19” (symptoms lasting 4–12 weeks) and “post covid-19 syndrome” (symptoms beyond 12 weeks) [ 4 ] and the World Health Organization’s “post covid-19 condition” (symptoms occurring beyond 3 months and persisting for at least 2 months) [ 5 ]. Long covid thus defined is extremely common. In the UK, for example, 1.8 million of a population of 67 million met the criteria for long covid in early 2023 and 41% of these had been unwell for more than 2 years [ 6 ].

Long covid is characterized by a constellation of symptoms which may include breathlessness, fatigue, muscle and joint pain, chest pain, memory loss and impaired concentration (“brain fog”), sleep disturbance, depression, anxiety, palpitations, dizziness, gastrointestinal problems such as diarrhea, skin rashes and allergy to food or drugs [ 2 ]. These lead to difficulties with essential daily activities such as washing and dressing, impaired exercise tolerance and ability to work, and reduced quality of life [ 2 , 7 , 8 ]. Symptoms typically cluster (e.g. in different patients, long covid may be dominated by fatigue, by breathlessness or by palpitations and dizziness) [ 9 , 10 ]. Long covid may follow a fairly constant course or a relapsing and remitting one, perhaps with specific triggers [ 11 ]. Overlaps between fatigue-dominant subtypes of long covid, myalgic encephalomyelitis and chronic fatigue syndrome have been hypothesized [ 12 ] but at the time of writing remain unproven.

Long covid has been a contested condition from the outset. Whilst long-term sequelae following other coronavirus (SARS and MERS) infections were already well-documented [ 13 ], SARS-CoV-2 was originally thought to cause a short-lived respiratory illness from which the patient either died or recovered [ 14 ]. Some clinicians dismissed protracted or relapsing symptoms as due to anxiety or deconditioning, especially if the patient had not had laboratory-confirmed covid-19. People with long covid got together in online groups and shared accounts of their symptoms and experiences of such “gaslighting” in their healthcare encounters [ 15 , 16 ]. Some groups conducted surveys of their members, documenting the wide range of symptoms listed in the previous paragraph and showing that whilst long covid is more commonly a sequel to severe acute covid-19, it can (rarely) follow a mild or even asymptomatic acute infection [ 17 ].

Early publications on long covid depicted a post-pneumonia syndrome which primarily affected patients who had been hospitalized (and sometimes ventilated) [ 18 , 19 ]. Later, covid-19 was recognized to be a multi-organ inflammatory condition (the pneumonia, for example, was reclassified as pneumonitis) and its long-term sequelae attributed to a combination of viral persistence, dysregulated immune response (including auto-immunity), endothelial dysfunction and immuno-thrombosis, leading to damage to the lining of small blood vessels and (thence) interference with transfer of oxygen and nutrients to vital organs [ 20 , 21 , 22 , 23 , 24 ]. But most such studies were highly specialized, laboratory-based and written primarily for an audience of fellow laboratory researchers. Despite demonstrating mean differences in a number of metabolic variables, they failed to identify a reliable biomarker that could be used routinely in the clinic to rule a diagnosis of long covid in or out. Whilst the evidence base from laboratory studies grew rapidly, it had little influence on clinical management, partly because most long covid clinics had been set up with impressive speed by front-line clinical teams to address an immediate crisis, with little or no input from immunologists, virologists or metabolic specialists [ 25 ].

Studies of the patient experience revealed wide geographical variation in whether any long covid services were provided and (if they were) which patients were eligible for these and what tests and treatments were available [ 26 ]. An interim UK clinical guideline for long covid had been produced at speed and published in December 2020 [ 27 ], but it was uncertain about diagnostic criteria, investigations, treatments and prognosis. Early policy recommendations for long covid services in England, based on wide consultation across the UK, had proposed a tiered service with “tier 1” being supported self-management, “tier 2” generalist assessment and management in primary care, “tier 3” specialist rehabilitation or respiratory follow-up with oversight from a consultant physician and “tier 4” tertiary care for patients with complications or complex needs [ 28 ]. In 2021, ring-fenced funding was allocated to establish 90 multidisciplinary long covid clinics in England [ 29 ]; some clinics were also set up with local funding in Scotland and Wales. These clinics varied widely in eligibility criteria, referral pathways, staffing mix (some had no doctors at all) and investigations and treatments offered. A further policy document on improving long covid services was published in 2022 [ 30 ]; it recommended that specialist long covid clinics should continue, though the long-term funding of these services remains uncertain [ 31 ]. To build the evidence base for delivering long covid services, major programs of publicly funded research were commenced in both the UK [ 32 ] and the USA [ 33 ].

In short, at the time this study began (late 2021), there appeared to be much scope for a program of quality improvement which would capture fast-emerging research findings, establish evidence-based standards and ensure these were rapidly disseminated and consistently adopted across both specialist long covid services and in primary care.

Quality improvement collaboratives

The quality improvement movement in healthcare was born in the early 1980s when clinicians and policymakers in the US and UK [ 34 , 35 , 36 , 37 ] began to draw on insights from outside the sector [ 38 , 39 , 40 ]. Adapting a total quality management approach that had previously transformed the Japanese car industry, they sought to improve efficiency, reduce waste, shift to treating the upstream causes of problems (hence preventing disease) and help all services approach the standards of excellence achieved by the best. They developed an approach based on (a) understanding healthcare as a complex system (especially its key interdependencies and workflows), (b) analysing and addressing variation within the system, (c) learning continuously from real-world data and (d) developing leaders who could motivate people and help them change structures and processes [ 41 , 42 , 43 , 44 ].

Quality improvement collaboratives (originally termed “breakthrough collaboratives” [ 45 ]), in which representatives from different healthcare organizations come together to address a common problem, identify best practice, set goals, share data and initiate and evaluate improvement efforts [ 46 ], are one model used to deliver system-wide quality improvement. It is widely assumed that these collaboratives work because—and to the extent that—they identify, interpret and implement high-quality evidence (e.g. from randomized controlled trials).

Research on why quality improvement collaboratives succeed or fail has produced the following list of critical success factors: taking a whole-system approach, selecting a topic and goal that fits with organizations’ priorities, fostering a culture of quality improvement (e.g. that quality is everyone’s job), engagement of everyone (including the multidisciplinary clinical team, managers, patients and families) in the improvement effort, clearly defining people’s roles and contribution, engaging people in preliminary groundwork, providing organizational-level support (e.g. chief executive endorsement, protected staff time, training and support for teams, resources, quality-focused human resource practices, external facilitation if needed), training in specific quality improvement techniques (e.g. plan-do-study-act cycle), attending to the human dimension (including cultivating trust and working to ensure shared vision and buy-in), continuously generating reliable data on both processes (e.g. current practice) and outcomes (clinical, satisfaction) and a “learning system” infrastructure in which knowledge that is generated feeds into individual, team and organizational learning [ 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 ].

The quality improvement collaborative approach has delivered many successes but it has been criticized at a theoretical level for over-simplifying the social science of human motivation and behaviour and for adopting a somewhat mechanical approach to the study of complex systems [ 55 , 56 ]. Adaptations of the original quality improvement methodology (e.g. from Sweden [ 57 , 58 ]) have placed greater emphasis on human values and meaning-making, on the grounds that reducing the complexities of a system-wide quality improvement effort to a set of abstract and generic “success factors” will miss unique aspects of the case such as historical path dependencies, personalities, framing and meaning-making and micropolitics [ 59 ].

Perhaps this explains why, when the abovementioned factors are met, a quality improvement collaborative’s success is more likely but is not guaranteed, as a systematic review demonstrated [ 60 ]. Some well-designed and well-resourced collaboratives addressing clear knowledge gaps produced few or no sustained changes in key outcome measures [ 49 , 53 , 60 , 61 , 62 ]. To identify why this might be, a detailed understanding of a service’s history, current challenges and contextual constraints is needed. This explains our decision, part-way through the study reported here, to collect rich contextual data on participating sites so as to better explain success or failure of our own collaborative.

Warranted and unwarranted variation in clinical practice

A generation ago, Wennberg described most variation in clinical practice as “unwarranted” (which he defined as variation in the utilization of health care services that cannot be explained by variation in patient illness or patient preferences) [ 63 ]. Others coined the term “postcode lottery” to depict how such variation allegedly impacted on health outcomes [ 64 ]. Wennberg and colleagues’ Atlas of Variation, introduced in 1999 [ 65 ], and its UK equivalent, introduced in 2010 [ 66 ], described wide regional differences in the rates of procedures from arthroscopy to hysterectomy, and were used to prompt services to identify and address examples of under-treatment, mis-treatment and over-treatment. Numerous similar initiatives, mostly based on hospital activity statistics, have been introduced around the world [ 66 , 67 , 68 , 69 ]. Sutherland and Levesque’s proposed framework for analysing variation, for example, has three domains: capacity (broadly, whether sufficient resources are allocated at organizational level and whether individuals have the time and headspace to get involved), evidence (the extent to which evidence-based guidelines exist and are followed), and agency (e.g. whether clinicians are engaged with the issue and the effect of patient choice) [ 70 ].

Whilst it is clearly a good idea to identify unwarranted variation in practice, it is also important to acknowledge that variation can be warranted. The very act of measuring and describing variation carries great rhetorical power, since revealing geographical variation in any chosen metric effectively frames this as a problem with a conceptually simple solution (reducing variation) that will appeal to both politicians and the public [ 71 ]. The temptation to expose variation (e.g. via visualizations such as maps) and address it in mechanistic ways should be resisted until we have fully understood the reasons why it exists, which may include perverse incentives, insufficient opportunities to discuss cases with colleagues, weak or absent feedback on practice, unclear decision processes, contested definitions of appropriate care and professional challenges to guidelines [ 72 ].

Research question, aims and objectives

Research question.

What is quality in long covid care and how can it best be achieved?

Aims.

  • To identify best practice and reduce unwarranted variation in UK long covid services.
  • To explain aspects of variation in long covid services that are or may be warranted.

Our original objectives were to:

  • Establish a quality improvement collaborative for 10 long covid clinics across the UK.
  • Use quality improvement methods in collaboration with patients and clinic staff to prioritize aspects of care to improve. For each priority topic, identify best (evidence-informed) clinical practice, measure performance in each clinic, compare performance with a best practice benchmark and improve performance.
  • Produce organizational case studies of participating long covid clinics to explain their origins, evolution, leadership, ethos, population served, patient pathways and place in the wider healthcare ecosystem.
  • Examine these case studies to explain variation in practice, especially in topics where the quality improvement cycle proves difficult to follow or has limited impact.

The LOCOMOTION study

LOCOMOTION (LOng COvid Multidisciplinary consortium Optimising Treatments and services across the NHS) was a 30-month multi-site case study of 10 long covid clinics (8 in England, 1 in Wales and 1 in Scotland), beginning in 2021, which sought to optimise long covid care. Each clinic offered multidisciplinary care to patients referred from primary or secondary care (and, in some cases, self-referred), and held regular multidisciplinary team (MDT) meetings, mostly online via Microsoft Teams, to discuss cases. A study protocol for LOCOMOTION, with details of ethical approvals, management, governance and patient involvement, has been published [ 25 ]. The three main work packages addressed quality improvement, technology-supported patient self-management, and phenotyping and symptom clustering. This paper reports on the first work package, focusing mainly on qualitative findings.

Setting up the quality improvement collaborative

We broadly followed standard methodology for “breakthrough” quality improvement collaboratives [ 44 , 45 ], with two exceptions. First, because of geographical distance, continuing pandemic precautions and developments in videoconferencing technology, meetings were held online. Second, unlike in the original breakthrough model, patients were included in the collaborative, reflecting the cultural change towards patient partnerships since the model was originally proposed 40 years ago.

Each site appointed a clinical research fellow (doctor, nurse or allied health professional) funded partly by the LOCOMOTION study and partly with clinical sessions; some were existing staff who were backfilled to take on a research role whilst others were new appointments. The quality improvement meetings were held approximately every 8 weeks on Microsoft Teams and lasted about 2 h; there was an agenda and a chair, and meetings were recorded with consent. The clinical research fellow from each clinic attended, sometimes joined by the clinical lead for that site. In the initial meeting, the group proposed and prioritized topics before merging their consensus with the list of priority topics generated separately by patients (there was much overlap but also some differences).

In subsequent meetings, participants attempted to reach consensus on how to define, measure and achieve quality for each priority topic in turn, implement this approach in their own clinic and monitor its impact. Clinical leads prepared illustrative clinical cases and summaries of the research evidence, which they presented using Microsoft PowerPoint; the group then worked towards consensus on the implications for practice through general discussion. Clinical research fellows assisted with literature searches, collected baseline data from their own clinic, prepared and presented anonymized case examples, and contributed to collaborative goal-setting for improvement. Progress on each topic was reviewed at a later meeting after an agreed interval.

An additional element of this work package was semi-structured interviews with 29 patients, recruited from 9 of the 10 participating sites, about their clinic experiences with a view to feeding into service improvement (in the other site, no patient volunteered).

Our patient advisory group initially met separately from the quality improvement collaborative. They designed a short survey of current practice and sent it to each clinic; the results of this informed a prioritization exercise for topics where they considered change was needed. The patient-generated list was tabled at the quality improvement collaborative discussions, but patients were understandably keen to join these discussions directly. After about 9 months, some patient advisory group members joined the regular collaborative meetings. This dynamic was not without its tensions, since sharing performance data requires trust and there were some concerns about confidentiality when real patient cases were discussed with other patients present.

How evidence-informed quality targets were set

At the time the study began, there were no published large-scale randomized controlled trials of any interventions for long covid. We therefore followed a model used successfully in other quality improvement efforts where research evidence was limited or absent or it did not translate unambiguously into models for current services. In such circumstances, the best evidence may be custom and practice in the best-performing units. The quality improvement effort becomes oriented to what one group of researchers called “potentially better practices”—that is, practices that are “developed through analysis of the processes of care, literature review, and site visits” (page 14) [ 73 ]. The idea was that facilitated discussion among clinical teams, drawing on published research where available but also incorporating clinical experience, established practice and systematic analysis of performance data across participating clinics would surface these “potentially better practices”—an approach which, though not formally tested in controlled trials, appears to be associated with improved outcomes [ 46 , 73 ].

Adding an ethnographic component

Following limited progress made on some topics that had been designated high priority, we interviewed all 10 clinical research fellows (either individually or, in two cases, with a senior clinician present) and 18 other clinic staff (five individually plus two groups of 5 and 8), along with additional informal discussions, to explore the challenges of implementing the changes that had been agreed. These interviews were not audiotaped but detailed notes were made and typed up immediately afterwards. It became evident that some aspects of what the collaborative had deemed “evidence-informed” care were contested by front-line clinic staff, perceived as irrelevant to the service they were delivering, or considered impossible to implement. To unpack these issues further, the research protocol was amended to include an ethnographic component.

TG and EL (academic general practitioners) and JLD (a qualitative researcher with a PhD in the patient experience) attended a total of 45 MDT meetings in participating clinics (mostly online or hybrid). Staff were informed in advance that there would be an observer present; nobody objected. We noted brief demographic and clinical details of cases discussed (but no identifying data), dilemmas and uncertainties on which discussions focused, and how different staff members contributed.

TG made 13 in-person visits to participating long covid clinics. Staff were notified in advance; all were happy to be observed. Visits lasted between 5 and 8 h (54 h in total). We observed support staff booking patients in and processing requests and referrals, and shadowed different clinical staff in turn as they saw patients. Patients were informed of our presence and its purpose beforehand and given the opportunity to decline (three of 53 patients approached did). We discussed aspects of each case with the clinician after the patient left. When invited, we took breaks with staff and used these as an opportunity to ask them informally what it was like working in the clinic.

Ethnographic observation, analysis and reporting were geared to generating a rich interpretive account of the clinical, operational and interpersonal features of each clinic, what Van Maanen calls an “impressionist tale” [ 74 ]. Our work was also guided by the principles set out by Golden-Biddle and Locke, namely authenticity (spending time in the field and basing interpretations on these direct observations), plausibility (creating a plausible account through rich persuasive description) and criticality (e.g. reflexively examining our own assumptions) [ 75 ]. Our collection and analysis of qualitative data were informed by our own professional backgrounds (two general practitioners, one physical therapist, two non-clinicians).

In both MDTs and clinics, we took contemporaneous notes by hand and typed these up immediately afterwards.

Data management and analysis

Typed interview notes and field notes from clinics were collated in a set of Word documents, one for each clinic attended. They were analysed thematically [ 76 ] with attention to the literature on quality improvement and variation (see “Background”). Interim summaries were prepared on each clinic, setting out the narrative of how it had been established, its ethos and leadership, setting and staffing, population served and key links with other parts of the local healthcare ecosystem.

Minutes and field notes from the quality improvement collaborative meetings were summarized topic by topic, including initial data collected by the researchers-in-residence, improvement actions taken (or attempted) in that clinic, and any follow-up data shared. Progress or lack of it was interpreted in relation to the contextual case summary for that clinic.

Patient cases seen in clinic, and those discussed by MDTs, were summarized as brief case narratives in Word documents. Using the constant comparative method [ 77 ], we produced an initial synthesis of the clinical picture and principles of management based on the first 10 patient cases seen, and refined this as each additional case was added. Demographic and brief clinical and social details were also logged on Excel spreadsheets. When writing up clinical cases, we used the technique of composite case construction (in which we drew on several actual cases to generate a fictitious one, thereby protecting anonymity whilst preserving key empirical findings [ 78 ]); any names reported in this paper are pseudonyms.

Member checking

A summary was prepared for each clinic, including a narrative of the clinic’s own history and a summary of key quality issues raised across the ten clinics. These summaries included examples from real cases in our dataset. These were shared with the clinical research fellow and a senior clinician from the clinic, and amended in response to feedback. We also shared these summaries with representatives from the patient advisory group.

Overview of dataset

This study generated three complementary datasets. First, the video recordings, minutes, and field notes of 12 quality improvement collaborative meetings, along with the evidence summaries prepared for these meetings and clinic summaries (e.g. descriptions of current practice, audits) submitted by the clinical research fellows. This dataset illustrated wide variation in practice, and (in many topics) gaps or ambiguities in the evidence base.

Second, interviews with staff ( n  = 30) and patients ( n  = 29) from the clinics, along with ethnographic field notes (approximately 100 pages) from 13 in-person clinic visits (54 h), including notes on 50 patient consultations (40 face-to-face, 6 telephone, 4 video). This dataset illustrated the heterogeneity among the ten participating clinics.

Third, field notes (approximately 100 pages), including discussions on 244 clinical cases from the 45 MDT meetings (49 h) that we observed. This dataset revealed further similarities and contrasts among clinics in how patients were managed. In particular, it illustrated how, for the complex patients whose cases were presented at these meetings, teams made sense of, and planned for, each case through multidisciplinary dialogue. This dialogue typically began with one staff member presenting a detailed clinical history along with a narrative of how it had affected the patient’s life and what was at stake for them (e.g. job loss), after which professionals from various backgrounds (nursing, physical therapy, occupational therapy, psychology, dietetics, and different medical specialties) joined in a discussion about what to do.
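The counts reported for the three datasets can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are assumptions, and the derived averages are simple arithmetic on the reported counts, not figures reported by the study.

```python
# Counts as reported in the dataset overview (names are illustrative).
staff_interviews = 30
patient_interviews = 29
clinic_visits = 13
clinic_visit_hours = 54
consultations_by_mode = {"face-to-face": 40, "telephone": 6, "video": 4}
mdt_meetings = 45
mdt_hours = 49
mdt_cases = 244

# The consultation modes should add up to the 50 consultations reported.
total_consultations = sum(consultations_by_mode.values())

# Derived values (not stated in the text): totals and simple averages.
total_interviews = staff_interviews + patient_interviews
cases_per_mdt_meeting = round(mdt_cases / mdt_meetings, 1)
hours_per_clinic_visit = round(clinic_visit_hours / clinic_visits, 1)

print(f"Consultations observed: {total_consultations}")
print(f"Interviews in total: {total_interviews}")
print(f"Mean cases discussed per MDT meeting: {cases_per_mdt_meeting}")
print(f"Mean hours per clinic visit: {hours_per_clinic_visit}")
```

The averages only indicate the scale of the observation; they say nothing about how cases or hours were distributed across sites.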

The ten participating sites are summarized in Table  1 .

In the next two sections, we explore two issues—difficulty defining best practice and the heterogeneous nature of the clinics—that were key to explaining why quality, when pursued in a 10-site collaborative, proved elusive. We then briefly summarize patients’ accounts of their experience in the clinics and give three illustrative examples of the elusiveness of quality improvement using selected topics that were prioritized in our collaborative: outcome measures, investigation of palpitations and management of fatigue. In the final section of the results, we describe how MDT deliberations proved crucial for local quality improvement. Further detail on clinical priority topics will be presented in a separate paper.

“Best practice” in long covid: uncertainty and conflict

The study period (September 2021 to December 2023) corresponded with an exponential increase in published research on long covid. Despite this, the quality improvement collaborative found few unambiguous recommendations for practice. This gap between what the research literature offered and what clinical practice needed was partly ontological (relating to what long covid is). One major bone of contention between patients and clinicians (also evident in discussions with our patient advisory group), for example, was how far (and in whom) clinicians should look for and attempt to treat the various metabolic abnormalities that had been documented in laboratory research studies. The literature on this topic was extensive but conflicting [ 20 , 21 , 22 , 23 , 24 , 79 , 80 , 81 , 82 ]; it was heavy on biological detail but light on clinical application.

Patients were often aware of particular studies that appeared to offer plausible molecular or cellular explanations for symptom clusters along with a drug (often repurposed and off-label) whose mechanism of action appeared to be a good fit with the metabolic chain of causation. In one clinic, for example, we were shown an email exchange between a patient (not medically qualified) and a consultant, in which the patient asked them to reconsider their decision not to prescribe low-dose naltrexone, an opioid receptor antagonist with anti-inflammatory properties. The request included a copy of a peer-reviewed academic paper describing a small, uncontrolled pre-post study (i.e. a weak study design) in which this drug appeared to improve symptoms and functional performance in patients with long covid, as well as a mechanistic argument explaining why the patient felt this drug was a plausible choice in their own case.

This patient’s clinician, in common with most clinicians delivering front-line long covid services, considered that the evidence for such mechanism-based therapies was weak. Clinicians generally felt that this evidence, whilst promising, did not yet support routine measurement of clotting factors, antibodies, immune cells or other biomarkers or the prescription of mechanism-based therapies such as antivirals, anti-inflammatories or anticoagulants. Low-dose naltrexone, for example, is currently being tested in at least one randomized controlled trial (see National Clinical Trials Registry NCT05430152), which had not reported at the time of our observations.

Another challenge to defining best practice was the oft-repeated phrase that long covid is a “diagnosis by exclusion”. The high prevalence of comorbidities meant that the “pure” long covid patient, untainted by other potential explanations for their symptoms, was a textbook ideal. In one MDT, for example, we observed a discussion about a patient who had had both swab-positive covid-19 and erythema migrans (a sign of Lyme disease) in the weeks before developing fatigue, yet local diagnostic criteria for each condition required the other to be excluded.

The logic of management in most participating clinics was pragmatic: prompt multidisciplinary assessment and treatment with an emphasis on obtaining a detailed clinical history (including premorbid health status), excluding serious complications (“red flags”), managing specific symptom clusters (for example, physical therapy for breathing pattern disorder), treating comorbidities (for example, anaemia, diabetes or menopause) and supporting whole-person rehabilitation [ 7 , 83 ]. The evidentiary questions raised in MDT discussions (which did not include patients) addressed the practicalities of the rehabilitation model (for example, whether cognitive therapy for neurocognitive complications is as effective when delivered online as it is when delivered in-person) rather than the molecular or cellular mechanisms of disease. For example, the question of whether patients with neurocognitive impairment should be tested for micro-clots or treated with anticoagulants never came up in the MDTs we observed, though we did visit a tertiary referral clinic (the tier 4 clinic in site H), whose lead clinician had a research interest in inflammatory coagulopathies and offered such tests to selected patients.

Because long covid typically produces dozens of symptoms that tend to be uniquely patterned in each patient, the uncertainties on which MDT discussions turned were rarely about general evidence of the kind that might be found in a guideline (e.g. how should fatigue be managed?). Rather they concerned particular case-based clinical decisions (e.g. how should this patient’s fatigue be managed, given the specifics of this case?). An example from our field notes illustrates this:

Physical therapist presents the case of a 39-year-old woman who works as a cleaner on an overnight ferry. Has had long covid for 2 years. Main symptoms are shortness of breath and possible anxiety attacks, especially when at work. She has had a course of physical therapy to teach diaphragmatic breathing but has found that focusing on her breathing makes her more anxious. Patient has to do a lot of bending in her job (e.g. cleaning toilets and under seats), which makes her dizzy, but Active Stand Test was normal. She also has very mild tricuspid incompetence [someone reads out a cardiology report—not hemodynamically significant].
Rehabilitation guidelines (e.g. WHO) recommend phased return to work (e.g. with reduced hours) and frequent breaks. “Tricky!” says someone. The job is intense and busy, and the patient can’t afford not to work. Discussion on whether all her symptoms can be attributed to tension and anxiety. Physical therapist who runs the breathing group says, “No, it’s long covid”, and describes severe initial covid-19 episode and results of serial chest X-rays which showed gradual clearing of ground glass shadows. Team discussion centers on how to negotiate reduced working hours in this particular job, given the overnight ferry shifts. --MDT discussion, Site D

This example raises important considerations about the nature of clinical knowledge in long covid. We return to it in the final section of the “ Results ” and in the “ Discussion ”.

Long covid clinics: a heterogeneous context for quality improvement

Most participating clinics had been established in mid-2020 to follow up patients who had been hospitalized (and perhaps ventilated) for severe acute covid-19. As mass vaccination reduced the severity of acute covid-19 for most people, the patient population in all clinics progressively shifted to include fewer “post-ICU [intensive care unit]” patients (in whom respiratory symptoms almost always dominated), and more people referred by their general practitioners or other secondary care specialties who had not been hospitalized for their acute covid-19 infection, and in whom fatigue, brain fog and palpitations were often the most troubling symptoms. Despite these similarities, the ten clinics had very different histories, geographical and material settings, staffing structures, patient pathways and case mix, as Table  1 illustrates. Below, we give more detail on three example sites.

Site C was established as a generalist “assessment-only” service by a general practitioner with an interest in infectious diseases. It is led jointly by that general practitioner and an occupational therapist, assisted by a wide range of other professionals including speech and language therapy, dietetics, clinical psychology and community-based physical therapy and occupational therapy. It has close links with a chronic fatigue service and a pain clinic that have been running in the locality for over 20 years. The clinic, which is entirely virtual (staff consult either from home or from a small side office in the community trust building), is physically located in a low-rise building on the industrial outskirts of a large town, sharing office space with various community-based health and social care services. Following a 1-h telephone consultation by one of the clinical leads, each patient is discussed at the MDT and then either discharged back to their general practitioner with a detailed management plan or referred on to one of the specialist services. This arrangement evolved to address a particular problem in this locality—that many patients with long covid were being referred by their general practitioner to multiple specialties (e.g. respiratory, neurology, fatigue), leading to a fragmented patient experience, unnecessary specialist assessments and wasteful duplication. The generalist assessment by telephone is oriented to documenting what is often a complex illness narrative (including pre-existing physical and mental comorbidities) and working with the patient to prioritize which symptoms or problems to pursue in which order.

Site E, in a well-regarded inner-city teaching hospital, had been set up in 2020 by a respiratory physician. Its initial ethos and rationale had been “respiratory follow-up”, with strong emphasis on monitoring lung damage via repeated imaging and lung function tests and on ensuring that patients received specialist physical therapy to “re-learn” efficient breathing techniques. Over time, this site has tried to accommodate a more multi-system assessment, with the introduction of a consultant-led infectious disease clinic for patients without a dominant respiratory component, reflecting the shift towards a more fatigue-predominant case mix. At the time of our fieldwork, each patient was seen in turn by a physician, psychologist, occupational therapist and respiratory physical therapist (half an hour each) before all four staff reconvened in a face-to-face MDT meeting to form a plan for each patient. But whilst a wide range of patients with diverse symptoms were discussed at these meetings, there remained a strong focus on respiratory pathology (e.g. tracking improvements in lung function and ensuring that coexisting asthma was optimally controlled).

Site F, one of the first long covid clinics in the UK, was set up by a rehabilitation consultant who had been drafted to work on the ICU during the first wave of covid-19 in early 2020. He had a longstanding research interest in whole-patient rehabilitation, especially the assessment and management of chronic fatigue and pain. From the outset, clinic F was more oriented to rehabilitation, including vocational rehabilitation to help patients return to work. There was less emphasis on monitoring lung function or pursuing respiratory comorbidities. At the time of our fieldwork, clinic F offered both a community-based service (“tier 2”) led by an occupational therapist, supported by a respiratory physical therapist and psychologist, and a hospital-based service (“tier 3”) led by the rehabilitation consultant, supported by a wider MDT. Staff in both tiers emphasized that each patient needs a full physical and mental assessment and help to set and work towards achievable goals, whilst staying within safe limits so as to avoid post-exertional symptom exacerbation. Because of the research interest of the lead physician, clinic F adapted well to the growing numbers of patients with fatigue and quickly set up research studies on this cohort [ 84 ].

Details of the other seven sites are shown in Table  1 . Broadly speaking, sites B, E, G and H aligned with the “respiratory follow-up” model and sites F and I aligned with the “rehabilitation” model. Sites A and J had a high-volume, multi-tiered service whose community tier aligned with the “holistic GP assessment” model (site C above) and which also offered a hospital-based, rehabilitation-focused tier. The small service in Scotland (site D) had evolved from an initial respiratory focus to become part of the infectious diseases (ME/CFS) service; Lyme disease (another infectious disease whose sequelae include chronic fatigue) was also prevalent in this region.

The patient experience

Whilst the 10 participating clinics were very diverse in staffing, ethos and patient flows, the 29 patient interviews described remarkably consistent clinic experiences. Almost all identified the biggest problem to be the extended wait of several months before they were seen and the limited awareness (when initially referred) of what long covid clinics could provide. Some talked of how they cried with relief when they finally received an appointment. When the quality improvement collaborative was initially established, waiting times and bottlenecks were patients’ top priority for quality improvement, and this ranking was shared by clinic staff, who were very aware of how much delays and uncertainties in assessment and treatment compounded patients’ suffering. This issue resolved to a large extent over the study period in all clinics as the referral backlog cleared and the incidence of new cases of long covid fell [ 85 ]; it will be covered in more detail in a separate publication.

Most patients in our sample were satisfied with the care they received when they were finally seen in clinic, especially how they finally felt “heard” after a clinician took a full history. They were relieved to receive affirmation of their experience, a diagnosis of what was wrong and reassurance that they were believed. They were grateful for the input of different members of the multidisciplinary teams and commented on the attentiveness, compassion and skill of allied professionals in particular (“she was wonderful, she got me breathing again”—patient BIR145 talking about a physical therapist). One or two patient participants expressed confusion about who exactly they had seen and what advice they had been given, and some did not realize that a telephone assessment had been an actual clinical consultation. A minority expressed disappointment that an expected investigation had not been ordered (one commented that they had not had any blood tests at all). Several had assumed that the help and advice from the long covid clinic would continue to be offered until they were better and were disappointed that they had been discharged after completing the various courses on offer (since their clinic had been set up as an “assessment only” service).

In the next sections, we give examples of topics raised in the quality improvement collaborative and how they were addressed.

Example quality topic 1: Outcome measures

The first topic considered by the quality improvement collaborative was how (that is, using which measures and metrics) to assess and monitor patients with long covid. In the absence of a validated biomarker, various symptom scores and quality of life scales—both generic and disease-specific—were mooted. Site F had already developed and validated a patient-reported outcome measure (PROM), the C19-YRS (Covid-19 Yorkshire Rehabilitation Scale) and used it for both research and clinical purposes [ 86 ]. It was quickly agreed that, for the purposes of generating comparative research findings across the ten clinics, the C19-YRS should be used at all sites and completed by patients three-monthly. A commercial partner produced an electronic version of this instrument and an app for patient smartphones. The quality improvement collaborative also agreed that patients should be asked to complete the EUROQOL EQ5D, a widely used generic health-related quality of life scale [ 87 ], in order to facilitate comparisons between long covid and other chronic conditions.

In retrospect, the discussions which led to the unopposed adoption of these two measures as a “quality” initiative in clinical care were somewhat aspirational. A review of progress at a subsequent quality improvement meeting revealed considerable variation among clinics, with a wide variety of measures used in different clinics to different degrees. Reasons for this variation were multiple. First, although our patient advisory group were keen that we should gather as much data as possible on the patient experience of this new condition, many clinic patients found the long questionnaires exhausting to complete due to cognitive impairment and fatigue. In addition, whilst patients were keen to answer questions on symptoms that troubled them, many had limited patience to fill out repeated surveys on symptoms that did not trouble them (“it almost felt as if I’ve not got long covid because I didn’t feel like I fit the criteria as they were laying it out”—patient SAL001). Staff assisted patients in completing the measures when needed, but this was time-consuming (up to 45 min per instrument) and burdensome for both staff and patients. In clinics where a high proportion of patients required assistance, staff time was the rate-limiting factor for how many instruments got completed. For some patients, one short instrument was the most that could be asked of them, and the clinician made a judgement on which one would be in their best interests on the day.

The second reason for variation was that the clinical diagnosis and management of particular features, complications and comorbidities of long covid required more nuance than was provided by these relatively generic instruments, and the level of detail sought varied with the specialist interest of the clinic (and the clinician). The modified C19-YRS [ 88 ], for example, contained 19 items, of which one asked about sleep quality. But if a patient had sleep difficulties, many clinicians felt that these needed to be documented in more detail—for example using the 8-item Epworth Sleepiness Scale, originally developed for conditions such as narcolepsy and obstructive sleep apnea [ 89 ]. The “Epworth score” was essential currency for referrals to some but not all specialist sleep services. Similarly, the C19-YRS had three items relating to anxiety, depression and post-traumatic stress disorder, but in clinics where there was a strong focus on mental health (e.g. when there was a resident psychologist), patients were usually invited to complete more specific tools (e.g. the Patient Health Questionnaire 9 [ 90 ], a 9-item questionnaire originally designed to assess severity of depression).

The third reason for variation was custom and practice. Ethnographic visits revealed that paper copies of certain instruments were routinely stacked on clinicians’ desks in outpatient departments and also (in some cases) handed out by administrative staff in waiting areas so that patients could complete them before seeing the clinician. These familiar clinic artefacts tended to be short (one-page) instruments that had a long tradition of use in clinical practice. They were not always fit for purpose. For example, the Nijmegen questionnaire was developed in the 1980s to assess hyperventilation; it was validated against a longer, “gold standard” instrument for that condition [ 91 ]. It subsequently became popular in respiratory clinics to diagnose or exclude breathing pattern disorder (a condition in which the normal physiological pattern of breathing becomes replaced with less efficient, shallower breathing [ 92 ]), so much so that the researchers who developed the instrument published a paper to warn fellow researchers that it had not been validated for this purpose [ 93 ]. Whilst a validated 17-item instrument for breathing pattern disorder (the Self-Evaluation of Breathing Questionnaire [ 94 ]) does exist, it is not in widespread clinical use. Most clinics in LOCOMOTION used Nijmegen either on all patients (e.g. as part of a comprehensive initial assessment, especially if the service had begun as a respiratory follow-up clinic) or when breathing pattern disorder was suspected.

In sum, the use of outcome measures in long covid clinics was a compromise between standardization and contingency. On the one hand, all clinics accepted the need to use “validated” instruments consistently. On the other hand, there were sometimes good reasons why they deviated from agreed practice, including mismatch between the clinic’s priorities as a research site, its priorities as a clinical service, and the particular clinical needs of a patient; the clinic’s—and the clinician’s—specialist focus; and long-held traditions of using particular instruments with which staff and patients were familiar.

Example quality topic 2: Postural orthostatic tachycardia syndrome (POTS)

Palpitations (common in long covid) and postural orthostatic tachycardia syndrome (POTS, a disproportionate acceleration in heart rate on standing, the assumed cause of palpitations in many long covid patients) were the top priority for quality improvement identified by our patient advisory group. Reflecting discussions and evidence (of various kinds) shared in online patient communities, the group were confident that POTS is common in long covid patients and that many cases remain undetected (perhaps misdiagnosed as anxiety). Their request that all long covid patients should be “screened” for POTS prompted a search for, and synthesis of, evidence (which we published in the BMJ [ 95 ]). In sum, that evidence was sparse and contested, but, combined with standard practice in specialist clinics, broadly supported the judicious use of the NASA Lean Test [ 96 ]. This test involves repeated measurements of pulse and blood pressure with the patient first lying and then standing (with shoulders resting against a wall).

The patient advisory group’s request that the NASA Lean Test should be conducted on all patients met with mixed responses from the clinics. In site F, the lead physician had an interest in autonomic dysfunction in chronic fatigue and was keen; he had already published a paper on how to adapt the NASA Lean Test for self-assessment at home [ 97 ]. Several other sites were initially opposed. Staff at site E, for example, offered various arguments:

The test is time-consuming, labor-intensive, and takes up space in the clinic which has an opportunity cost in terms of other potential uses;

The test is unvalidated and potentially misleading (there is a high incidence of both false negative and false positive results);

There is no proven treatment for POTS, so there is no point in testing for it;

It is a specialist test for a specialist condition, so it should be done in a specialist clinic where its benefits and limitations are better understood;

Objective testing does not change clinical management since what we treat is the patient’s symptoms (e.g. by a pragmatic trial of lifestyle measures and medication);

People with symptoms suggestive of dysautonomia have already been “triaged out” of this clinic (that is, identified in the initial telephone consultation and referred directly to neurology or cardiology);

POTS is a manifestation of the systemic nature of long covid; it does not need specific treatment but will improve spontaneously as the patient goes through standard interventions such as active pacing, respiratory physical therapy and sleep hygiene;

Testing everyone, even when asymptomatic, runs counter to the ethos of rehabilitation, which is to “de-medicalize” patients so as to better orient them to their recovery journey.

When clinics were invited to implement the NASA Lean Test on a consecutive sample of patients to resolve a dispute about the incidence of POTS (from “we’ve only seen a handful of people with it since the clinic began” to “POTS is common and often missed”), all but one site agreed to participate. The tertiary POTS centre linked to site H was already running the NASA Lean Test as standard on all patients. Site C, which operated entirely virtually, passed the work to the referring general practitioner by making this test a precondition for seeing the patient; site D, which was largely virtual, sent instructions for patients to self-administer the test at home.

The NASA Lean Test study has been published separately [ 98 ]. In sum, of 277 consecutive patients tested across the eight clinics, 20 (7%) had a positive NASA Lean Test for POTS and a further 28 (10%) a borderline result. Six of 20 patients who met the criteria for POTS on testing had no prior history of orthostatic intolerance. The question of whether this test should be used to “screen” all patients was not answered definitively. But the experience of participating in the study persuaded some sceptics that postural changes in heart rate could be severe in some long covid patients, did not appear to be fully explained by their previously held theories (e.g. “functional”, anxiety, deconditioning), and had likely been missed in some patients. The outcome of this particular quality improvement cycle was thus not a wholesale change in practice (for which the evidence base was weak) but a more subtle increase in clinical awareness, a greater willingness to consider testing for POTS and a greater commitment to contribute to research into this contested condition.
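The percentages quoted for the NASA Lean Test study follow directly from the reported counts. The sketch below reproduces that arithmetic; the helper `pct` and the variable names are illustrative assumptions, and the combined and "no prior history" percentages are derived here, not quoted from the study.

```python
def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

# Counts as reported in the text.
tested = 277
positive = 20
borderline = 28
positive_without_history = 6

positive_pct = pct(positive, tested)        # reported as 7%
borderline_pct = pct(borderline, tested)    # reported as 10%

# Derived figures (not stated in the text).
positive_or_borderline_pct = pct(positive + borderline, tested)
no_history_pct = pct(positive_without_history, positive)

print(f"Positive: {positive_pct}% of {tested}")
print(f"Borderline: {borderline_pct}% of {tested}")
print(f"Positive or borderline: {positive_or_borderline_pct}%")
print(f"Positive with no prior orthostatic intolerance: {no_history_pct}% of positives")
```

These are descriptive proportions from a consecutive sample; they do not by themselves settle whether universal screening is warranted.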

More generally, the POTS audit prompted some clinicians to recognize the value of quality improvement in novel clinical areas. One physician who had initially commented that POTS was not seen in their clinic, for example, reflected:

“Our clinic population is changing. […] Overall there’s far fewer post-ICU patients with ECMO [extra-corporeal membrane oxygenation] issues and far more long covid from the community, and this is the bit our clinic isn’t doing so well on. We’re doing great on breathing pattern disorder; neuro[logists] are helping us with the brain fogs; our fatigue and occupational advice is ok but some of the dysautonomia symptoms that are more prevalent in the people who were not hospitalized – that’s where we need to improve.” -Respiratory physician, site G (from field visit 6.6.23)

Example quality topic 3: Management of fatigue

Fatigue was the commonest symptom overall and a high priority among both patients and clinicians for quality improvement. It often coexisted with the cluster of neurocognitive symptoms known as brain fog, with both conditions relapsing and remitting in step. Clinicians were keen to systematize fatigue management using a familiar clinical framework oriented around documenting a full clinical history, identifying associated symptoms, excluding or exploring comorbidities and alternative explanations (e.g. poor sleep patterns, depression, menopause, deconditioning), assessing how fatigue affects physical and mental function, implementing a program of physical and cognitive therapy that was sensitive to the patient’s condition and confidence level, and monitoring progress using validated patient-reported outcome measures and symptom diaries.

The underpinning logic of this approach, which broadly reflected World Health Organization guidance [ 99 ], was that fatigue and linked cognitive impairment could be a manifestation of many—perhaps interacting—conditions but that a whole-patient (body and mind) rehabilitation program was the cornerstone of management in most cases. Discussion in the quality improvement collaborative focused on issues such as whether fatigue was so severe that it produced safety concerns (e.g. in a person’s job or with childcare), the pros and cons of particular online courses such as yoga, relaxation and mindfulness (many were viewed positively, though the evidence base was considered weak), and the extent to which respiratory physical therapy had a crossover impact on fatigue (systematic reviews suggested that it may do, but these reviews also cautioned that primary studies were sparse, methodologically flawed, and heterogeneous [ 100 , 101 ]). Members of the collaborative also debated the strengths and limitations of different fatigue-specific outcome measures, each of which had been developed and validated in a different condition, with varying emphasis on cognitive fatigue, physical fatigue, effect on daily life, and motivation. These instruments included the Modified Fatigue Impact Scale; Fatigue Severity Scale [ 102 ]; Fatigue Assessment Scale; Functional Assessment Chronic Illness Therapy—Fatigue (FACIT-F) [ 103 ]; Work and Social Adjustment Scale [ 104 ]; Chalder Fatigue Scale [ 105 ]; Visual Analogue Scale—Fatigue [ 106 ]; and the EQ5D [ 87 ]. In one clinic (site F), three of these scales were used in combination for reasons discussed below.

Some clinicians advocated melatonin or nutritional supplements (such as vitamin D or folic acid) for fatigue on the grounds that many patients found them helpful and formal placebo-controlled trials were unlikely ever to be conducted. But neurostimulants used in other fatigue-predominant conditions (e.g. brain injury, stroke), which also lacked clinical trial evidence in long covid, were viewed as inappropriate in most patients because of lack of evidence of clear benefit and hypothetical risk of harm (e.g. adverse drug reactions, polypharmacy).

Whilst the patient advisory group were broadly supportive of a whole-patient rehabilitative approach to fatigue, their primary concern was fatiguability, especially post-exertional symptom exacerbation (PESE, also known as “crashes”). In these, the patient becomes profoundly fatigued some hours or days after physical or mental exertion, and this state can last for days or even weeks [ 107 ]. Patients viewed PESE as a “red flag” symptom which they felt clinicians often missed and sometimes caused. They wanted the quality improvement effort to focus on ensuring that all clinicians were aware of the risks of PESE and acted accordingly. A discussion among patients and clinicians at a quality improvement collaborative meeting raised a new research hypothesis—that reducing the number of repeated episodes of PESE may improve the natural history of long covid.

These tensions around fatigue management played out differently in different clinics. In site C (the GP-led virtual clinic run from a community hub), fatigue was viewed as one manifestation of a whole-patient condition. The lead general practitioner used the metaphor of untangling a skein of wool: “you have to find the end and then gently pull it”. The underlying problem in a fatigued patient, for example, might be an undiagnosed physical condition such as anaemia, disturbed sleep, or inadequate pacing. Depending on the problem, the response might be referral to the chronic fatigue service (comprising an occupational therapist and specialist psychologist and oriented mainly to teaching the techniques of goal-setting and pacing), a “tiredness” work-up (e.g. to exclude anaemia or menopause), investigation of poor sleep (which, not uncommonly, was due to obstructive sleep apnea), or exploration of mental health issues.

In site G (a hospital clinic which had evolved from a respiratory service), patients with fatigue went through a fatigue management program led by the occupational therapist with emphasis on pacing, energy conservation, avoidance of PESE and sleep hygiene. Those without ongoing respiratory symptoms were often discharged back to their general practitioner once they had completed this; there was no consultant follow-up of unresolved fatigue.

In site F (a rehabilitation clinic which had a longstanding interest in chronic fatigue even before the pandemic), active interdisciplinary management of fatigue was commenced at or near the patient’s first visit, on the grounds that the earlier this began, the more successful it would be. In this clinic, patients were offered a more intensive package: an occupational therapy-led fatigue course similar to that in site G, plus input from a dietician to advise on regular balanced meals and caffeine avoidance, and a group-based facilitated peer support program which centred on fatigue management. The dietician spoke enthusiastically about how improving diet in longstanding long covid patients often improved fatigue (e.g. because they had often lost muscle mass and tended to snack on convenience food rather than make meals from scratch), though she agreed there was no evidence base from trials to support this approach.

Pursuing local quality improvement through MDTs

Whilst some long covid patients had “textbook” symptoms and clinical findings, many cases were unique and some were fiendishly complex. One clinician commented that, somewhat paradoxically, “easy cases” were often the post-ICU follow-ups who had resolving chest complications; they tended to do well with a course of respiratory physical therapy and a return-to-work program. Such cases were rarely brought to MDT meetings. “Difficult cases” were patients who had not been hospitalized for their acute illness but presented with a months- or years-long history of multiple symptoms with fatigue typically predominant. Each one was different, as the following example (some details of which have been fictionalized to protect anonymity) illustrates.

The MDT is discussing Mrs Fermah, a 65-year-old homemaker who had covid-19 a year ago. She has had multiple symptoms since, including fluctuating fatigue, brain fog, breathlessness, retrosternal chest pain of burning character, dry cough, croaky voice, intermittent rashes (sometimes on eating), lips going blue, ankle swelling, orthopnoea, dizziness with the room spinning which can be triggered by stress, low back pain, aches and pains in the arms and legs and pins and needles in the fingertips, loss of taste and smell, palpitations and dizziness (unclear if postural, but clear association with nausea), headaches on waking, and dry mouth. She is somewhat overweight (body mass index 29) and admits to low mood. Functionally, she is mostly confined to the house and can no longer manage the stairs so has begun to sleep downstairs. She has stumbled once or twice but not fallen. Her social life has ceased and she rarely has the energy to see her grandchildren. Her 70-year-old husband is retired and generally supportive, though he spends most evenings at his club. Comorbidities include glaucoma which is well controlled and overseen by an ophthalmologist, mild club foot (congenital) and stage 1 breast cancer 20 years ago. Various tests, including a chest X-ray, resting and exercise oximetry and a blood panel, were normal except for borderline vitamin D level. Her breathing questionnaire score suggests she does not have breathing pattern disorder. ECG showed first-degree atrioventricular block and left axis deviation. No clinician has witnessed the blue lips. Her current treatment is online group respiratory physical therapy; a home visit is being arranged to assess her climbing stairs. She has declined a psychologist assessment. The consultant asks the nurse who assessed her: “Did you get a feel if this is a POTS-type dizziness or an ENT-type?” She sighs. “Honestly it was hard to tell, bless her.”—Site A MDT

This patient’s debilitating symptoms and functional impairments could all be due to long covid, yet “evidence-based” guidance for how to manage her complex suffering does not exist and likely never will exist. The question of which (if any) additional blood or imaging tests to do, in what order of priority, and what interventions to offer the patient will not be definitively answered by consulting clinical trials involving hundreds of patients, since (even if these existed) the decision involves weighing this patient’s history and the multiple factors and uncertainties that are relevant in her case. The knowledge that will help the MDT provide quality care to Mrs Fermah is case-based knowledge: accumulated clinical experience and wisdom from managing and deliberating on multiple similar cases. We consider case-based knowledge further in the “Discussion”.

Summary of key findings

This study has shown that a quality improvement collaborative of UK long covid clinics made some progress towards standardizing assessment and management in some topics, but some variation remained. This could be explained in part by the fact that different clinics had different histories and path dependencies, occupied a different place in the local healthcare ecosystem, served different populations, were differently staffed, and had different clinical interests. Our patient advisory group and clinicians in the quality improvement collaborative broadly prioritized the same topics for improvement but interpreted them somewhat differently. “Quality” long covid care had multiple dimensions, relating to (among other things) service set-up and accessibility, clinical provision appropriate to the patient’s need (including options for referral to other services locally), the human qualities of clinical and support staff, how knowledge was distributed across (and accessible within) the system, and the accumulated collective wisdom of local MDTs in dealing with complex cases (including multiple kinds of specialist expertise as well as relational knowledge of what was at stake for the patient). Whilst both staff and patients were keen to contribute to the quality improvement effort, the burden of measurement was evident: multiple outcome measures, used repeatedly, were resource-intensive for staff and exhausting for patients.

Strengths and limitations of this study

To our knowledge, we are the first to report both a quality improvement collaborative and an in-depth qualitative study of clinical work in long covid. Key strengths of this work include the diverse sampling frame (with sites from three UK jurisdictions and serving widely differing geographies and demographics); the use of documents, interviews and reflexive interpretive ethnography to produce meaningful accounts of how clinics emerged and how they were currently organized; the use of philosophical concepts to analyse data on how MDTs produced quality care on a patient-by-patient basis; and the close involvement of patient co-researchers and coauthors during the research and writing up.

Limitations of the study include its exclusive UK focus (the external validity of findings to other healthcare systems is unknown); the self-selecting nature of participants in a quality improvement collaborative (our patient advisory group suggested that the MDTs observed in this study may have represented the higher end of a quality spectrum, hence would be more likely than other MDTs to adhere to guidelines); and the particular perspective brought by the researchers (two GPs, a physical therapist and one non-clinical person) in ethnographic observations. Hospital specialists or organizational scholars, for example, may have noticed different things or framed what they observed differently.

Explaining variation in long covid care

Sutherland and Levesque’s framework mentioned in the “Background” section does not explain much of the variation found in our study [ 70 ]. In terms of capacity, at the time of this study most participating clinics benefited from ring-fenced resources. In terms of evidence, guidelines existed and were not greatly contested, but as illustrated by the case of Mrs Fermah above, many patients were exceptions to the guideline because of complex symptomatology and relevant comorbidities. In terms of agency, clinicians in most clinics were passionately engaged with long covid (they were pioneers who had set up their local clinic and successfully bid for national ring-fenced resources) and were generally keen to support patient choice (though not if the patient requested tests which were unavailable or deemed not indicated).

Atsma et al.’s list of factors that may explain variation in practice (see “Background”) includes several that may be relevant to long covid, especially that the definition of appropriate care in this condition remains somewhat contested. But lack of opportunity to discuss cases was not a problem in the clinics in our sample. On the contrary, MDT meetings in each locality gave clinicians multiple opportunities to discuss cases with colleagues and reflect collectively on whether and how to apply particular guidelines.

The key problem was not that clinicians disputed the guidelines for managing long covid or were unaware of them; it was that the guidelines were not self-interpreting. Rather, MDTs had to deliberate on the balance of benefits and harms in different aspects of individual cases. In patients whose symptoms suggested a possible diagnosis of POTS (or who suspected themselves of having POTS), for example, these deliberations were sometimes lengthy and nuanced. Should a test result that is not technically in the abnormal range but close to it be treated as diagnostic, given that symptoms point to this diagnosis? If not, should the patient be told that the test excludes POTS or that it is equivocal? If a cardiology opinion has stated firmly that the patient does not have POTS but the cardiologist is not known for their interest in this condition, should a second specialist opinion be sought? If the gold standard “tilt test” [ 108 ] for POTS (usually available only in tertiary centres) is not available locally, does this patient merit a costly out-of-locality referral? Should the patient’s request for a trial of off-label medication, reflecting discussions in an online support group, be honoured? These are the kinds of questions on which MDTs deliberated at length.

The fact that many cases required extensive deliberation does not necessarily justify variation in practice among clinics. But taking into account the clinics’ very different histories, set-up, and local referral pathways, the variation begins to make sense. A patient who is being assessed in a clinic that functions as a specialist chronic fatigue centre and attracts referrals which reflect this interest (e.g. site F in our sample) will receive different management advice from a patient assessed in a clinic that functions as a telephone-only generalist assessment centre and refers on to other specialties (site C in our sample). The wide variation in case mix, coupled with the fact that a different proportion of these cases were highly complex in each clinic (and in different ways), suggests that variation in practice may reflect appropriate rather than inappropriate care.

Our patient advisory group affirmed that many of the findings reported here resonated with their own experience, but they raised several concerns. These included questions about patient groups who may have been missed in our sample because they were rarely discussed in MDTs. The decision to take a case to MDT discussion is taken largely by a clinician, and there was evidence from online support groups that some patients’ requests for their case to be taken to an MDT had been declined (though not, to our knowledge, in the clinics participating in the LOCOMOTION study).

We began this study by asking “what is quality in long covid care?”. We initially assumed that this question referred to a generalizable evidence base, which we felt we could identify, and we believed that we could then determine whether long covid clinics were following the evidence base through conventional audits of structure, process, and outcome. In retrospect, these assumptions were somewhat naïve. On the basis of our findings, we suggest that a better (and more individualized) research question might be “to what extent does each patient with long covid receive evidence-based care appropriate to their needs?”. This question would require individual case review on a sample of cases, tracking each patient longitudinally including cross-referrals, and also interviewing the patient.

Nomothetic versus idiographic knowledge

In a series of lectures first delivered in the 1950s and recently republished [ 109 ], psychiatrist Dr Maurice O’Connor Drury drew on the later philosophy of his friend and mentor Ludwig Wittgenstein to challenge what he felt was a concerning trend: that the nomothetic (generalizable, abstract) knowledge from randomized controlled trials (RCTs) was coming to over-ride the idiographic (personal, situated) knowledge about particular patients. Based on Wittgenstein’s writings on the importance of the particular, Drury predicted, presciently, that if implemented uncritically, RCT findings would result in worse, not better, care for patients, since their uncritical use would go hand-in-hand with a downgrading of experience, intuition, subjective judgement, personal reflection, and collective deliberation.

Much conventional quality improvement methodology is built on an assumption that nomothetic knowledge (for example, findings from RCTs and systematic reviews) is a higher form of knowing than idiographic knowledge. But idiographic, case-based reasoning, despite its position at the very bottom of evidence-based medicine’s hierarchy of evidence [ 110 ], is a legitimate and important element of medical practice. Bioethicist Kathryn Montgomery, drawing on Aristotle’s notion of praxis, considers clinical practice to be an example of case-based reasoning [ 111 ]. Medicine is governed not by hard and fast laws but by competing maxims or rules of thumb; the essence of judgement is deciding which (if any) rule should be applied in a particular circumstance. Clinical judgement incorporates science (especially the results of well-conducted research) and makes use of available tools and technologies (including guidelines and decision-support algorithms that incorporate research findings). But rather than being determined solely by these elements, clinical judgement is guided both by the scientific evidence and by the practical and ethical question “what is it best to do, for this individual, given these circumstances?”.

In this study, we observed clinical management of, and MDT deliberations on, hundreds of clinical cases. In the more straightforward ones (for example, recovering pneumonitis), guideline-driven care was not difficult to implement and such cases were rarely brought to the MDT. But cases like that of Mrs Fermah (see last section of “Results”) required much discussion on which aspects of which guideline were in the patient’s best interests to bring into play at any particular stage in their illness journey.

Conclusions

One systematic review on quality improvement collaboratives concluded that “[those] reporting success generally addressed relatively straightforward aspects of care, had a strong evidence base and noted a clear evidence-practice gap in an accepted clinical pathway or guideline” (page 226) [ 60 ]. The findings from this study suggest that to the extent that such collaboratives address clinical cases that are not straightforward, conventional quality improvement methods may be less useful and even counterproductive.

The question “what is quality in long covid care?” is partly a philosophical one. Our findings support an approach that recognizes and values idiographic knowledge, including by establishing and protecting a safe and supportive space in which deliberation on individual cases can occur, and by valuing and drawing upon the collective learning that occurs in these spaces. It is through such deliberation that evidence-based guidelines can be appropriately interpreted and applied to the unique needs and circumstances of individual patients. We suggest that Drury’s warning about the limitations of nomothetic knowledge should prompt a reassessment of policies that rely too heavily on such knowledge, resulting in one-size-fits-all protocols. We also cautiously hypothesize that the need to centre the quality improvement effort on idiographic rather than nomothetic knowledge is unlikely to be unique to long covid. Indeed, such an approach may be particularly important in any condition that is complex, unpredictable, variable in presentation and clinical course, and associated with comorbidities.

Availability of data and materials

Selected qualitative data (ensuring no identifiable information) will be made available to formal research teams on reasonable request to Professor Greenhalgh at the University of Oxford, on condition that they have research ethics approval and relevant expertise. The quantitative data on the NASA Lean Test have been published in full in a separate paper [ 98 ].

Abbreviations

CFS: Chronic fatigue syndrome

ICU: Intensive care unit

JCS: Jenny Ceolta-Smith

JD: Julie Darbyshire

LOCOMOTION: LOng COvid Multidisciplinary consortium Optimising Treatments and services across the NHS

MDT: Multidisciplinary team

ME: Myalgic encephalomyelitis

MERS: Middle East Respiratory Syndrome

NASA: National Aeronautics and Space Administration

OT: Occupational therapy/ist

PESE: Post-exertional symptom exacerbation

POTS: Postural orthostatic tachycardia syndrome

SALT: Speech and language therapy

SARS: Severe Acute Respiratory Syndrome

TG: Trisha Greenhalgh

UK: United Kingdom

US: United States

WHO: World Health Organization

References

Perego E, Callard F, Stras L, Melville-Jóhannesson B, Pope R, Alwan N. Why the Patient-Made Term “Long Covid” is needed. Wellcome Open Res. 2020;5:224.

Greenhalgh T, Sivan M, Delaney B, Evans R, Milne R. Long covid—an update for primary care. BMJ. 2022;378:e072117.

Centers for Disease Control and Prevention (US): Long COVID or Post-COVID Conditions (updated 16th December 2022). Atlanta: CDC. Accessed 2nd June 2023 at https://www.cdc.gov/coronavirus/2019-ncov/long-term-effects/index.html ; 2022.

National Institute for Health and Care Excellence (NICE), Scottish Intercollegiate Guidelines Network (SIGN) and Royal College of General Practitioners (RCGP): COVID-19 rapid guideline: managing the long-term effects of COVID-19. London: NICE. Accessed 30th January 2022 at https://www.nice.org.uk/guidance/ng188/resources/covid19-rapid-guideline-managing-the-longterm-effects-of-covid19-pdf-51035515742 ; 2022.

World Health Organization: Post Covid-19 Condition (updated 7th December 2022). Geneva: WHO. Accessed 2nd June 2023 at https://www.who.int/europe/news-room/fact-sheets/item/post-covid-19-condition#:~:text=It%20is%20defined%20as%20the,months%20with%20no%20other%20explanation ; 2022.

Office for National Statistics: Prevalence of ongoing symptoms following coronavirus (COVID-19) infection in the UK: 31st March 2023. London: ONS. Accessed 30th May 2023 at https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/alldatarelatingtoprevalenceofongoingsymptomsfollowingcoronaviruscovid19infectionintheuk ; 2023.

Crook H, Raza S, Nowell J, Young M, Edison P. Long covid—mechanisms, risk factors, and management. BMJ. 2021;374.

Sudre CH, Murray B, Varsavsky T, Graham MS, Penfold RS, Bowyer RC, Pujol JC, Klaser K, Antonelli M, Canas LS. Attributes and predictors of long COVID. Nat Med. 2021;27(4):626–31.

Reese JT, Blau H, Casiraghi E, Bergquist T, Loomba JJ, Callahan TJ, Laraway B, Antonescu C, Coleman B, Gargano M. Generalisable long COVID subtypes: findings from the NIH N3C and RECOVER programmes. EBioMedicine. 2023;87.

Thaweethai T, Jolley SE, Karlson EW, Levitan EB, Levy B, McComsey GA, McCorkell L, Nadkarni GN, Parthasarathy S, Singh U. Development of a definition of postacute sequelae of SARS-CoV-2 infection. JAMA. 2023;329(22):1934–46.

Brown DA, O’Brien KK. Conceptualising Long COVID as an episodic health condition. BMJ Glob Health. 2021;6(9): e007004.

Tate WP, Walker MO, Peppercorn K, Blair AL, Edgar CD. Towards a Better Understanding of the Complexities of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome and Long COVID. Int J Mol Sci. 2023;24(6):5124.

Ahmed H, Patel K, Greenwood DC, Halpin S, Lewthwaite P, Salawu A, Eyre L, Breen A, O’Connor R, Jones A. Long-term clinical outcomes in survivors of severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome coronavirus (MERS) outbreaks after hospitalisation or ICU admission: a systematic review and meta-analysis. J Rehabil Med. 2020;52(5):1–11.

World Health Organization: Clinical management of severe acute respiratory infection (SARI) when COVID-19 disease is suspected: Interim guidance (13th March 2020). Geneva: WHO. Accessed 3rd January 2023 at https://t.co/JpNdP8LcV8?amp=1 ; 2020.

Rushforth A, Ladds E, Wieringa S, Taylor S, Husain L, Greenhalgh T: Long Covid – the illness narratives. Under review for Sociology of Health and Illness 2021.

Russell D, Spence NJ, Chase J-AD, Schwartz T, Tumminello CM, Bouldin E. Support amid uncertainty: Long COVID illness experiences and the role of online communities. SSM-Qual Res Health. 2022;2:100177.

Ziauddeen N, Gurdasani D, O’Hara ME, Hastie C, Roderick P, Yao G, Alwan NA. Characteristics and impact of Long Covid: Findings from an online survey. PLoS ONE. 2022;17(3): e0264331.

Evans RA, McAuley H, Harrison EM, Shikotra A, Singapuri A, Sereno M, Elneima O, Docherty AB, Lone NI, Leavy OC. Physical, cognitive, and mental health impacts of COVID-19 after hospitalisation (PHOSP-COVID): a UK multicentre, prospective cohort study. Lancet Respir Med. 2021;9(11):1275–87.

Sykes DL, Holdsworth L, Jawad N, Gunasekera P, Morice AH, Crooks MG. Post-COVID-19 symptom burden: what is long-COVID and how should we manage it? Lung. 2021;199(2):113–9.

Altmann DM, Whettlock EM, Liu S, Arachchillage DJ, Boyton RJ. The immunology of long COVID. Nat Rev Immunol. 2023:1–17.

Klein J, Wood J, Jaycox J, Dhodapkar RM, Lu P, Gehlhausen JR, Tabachnikova A, Greene K, Tabacof L, Malik AA, et al. Distinguishing features of Long COVID identified through immune profiling. Nature. 2023.

Chen B, Julg B, Mohandas S, Bradfute SB. Viral persistence, reactivation, and mechanisms of long COVID. Elife. 2023;12: e86015.

Wang C, Ramasamy A, Verduzco-Gutierrez M, Brode WM, Melamed E. Acute and post-acute sequelae of SARS-CoV-2 infection: a review of risk factors and social determinants. Virol J. 2023;20(1):124.

Cervia-Hasler C, Brüningk SC, Hoch T, Fan B, Muzio G, Thompson RC, Ceglarek L, Meledin R, Westermann P, Emmenegger M, et al. Persistent complement dysregulation with signs of thromboinflammation in active Long Covid. Science. 2024;383(6680):eadg7942.

Sivan M, Greenhalgh T, Darbyshire JL, Mir G, O’Connor RJ, Dawes H, Greenwood D, O’Connor D, Horton M, Petrou S. LOng COvid Multidisciplinary consortium Optimising Treatments and servIces acrOss the NHS (LOCOMOTION): protocol for a mixed-methods study in the UK. BMJ Open. 2022;12(5): e063505.

Rushforth A, Ladds E, Wieringa S, Taylor S, Husain L, Greenhalgh T. Long covid–the illness narratives. Soc Sci Med. 2021;286: 114326.

National Institute for Health and Care Excellence: COVID-19 rapid guideline: managing the long-term effects of COVID-19. London: NICE. Accessed 4th October 2023 at https://www.nice.org.uk/guidance/ng188/resources/covid19-rapid-guideline-managing-the-longterm-effects-of-covid19-pdf-51035515742 ; 2020.

NHS England: Long COVID: the NHS plan for 2021/22. London: NHS England. Accessed 2nd August 2022 at https://www.england.nhs.uk/coronavirus/documents/long-covid-the-nhs-plan-for-2021-22/ ; 2021.

NHS England: NHS to offer ‘long covid’ sufferers help at specialist centres. London: NHS England. Accessed 10th October 2020 at https://www.england.nhs.uk/2020/10/nhs-to-offer-long-covid-help/ ; 2020 (7th October).

NHS England: The NHS plan for improving long COVID services. London: Gov.uk. Accessed 4th February 2024 at https://www.england.nhs.uk/publication/the-nhs-plan-for-improving-long-covid-services/ ; 2022.

NHS England: Commissioning guidance for post-COVID services for adults, children and young people. London: Gov.uk. Accessed 6th February 2024 at https://www.england.nhs.uk/long-read/commissioning-guidance-for-post-covid-services-for-adults-children-and-young-people/ ; 2023.

National Institute for Health Research: Researching Long Covid: Addressing a new global health challenge. London: NIHR. Accessed 9th August 2023 at https://evidence.nihr.ac.uk/collection/researching-long-covid-addressing-a-new-global-health-challenge/ ; 2022.

Subbaraman N. NIH will invest $1 billion to study long COVID. Nature. 2021;591(7850):356.

Donabedian A. The definition of quality and approaches to its assessment and monitoring. Ann Arbor: Michigan; 1980.

Laffel G, Blumenthal D. The case for using industrial quality management science in health care organizations. JAMA. 1989;262(20):2869–73.

Maxwell RJ. Quality assessment in health. BMJ. 1984;288(6428):1470.

Berwick DM, Godfrey BA, Roessner J. Curing health care: New strategies for quality improvement. The Journal for Healthcare Quality (JHQ). 1991;13(5):65–6.

Deming WE. Out of the Crisis. Cambridge, MA: MIT Press; 1986.

Argyris C. Increasing leadership effectiveness. New York: J. Wiley; 1976.

Juran JM. A history of managing for quality: The evolution, trends, and future directions of managing for quality. ASQ Press; 1995.

Institute of Medicine (US): Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: National Academy Press; 2001.

McNab D, McKay J, Shorrock S, Luty S, Bowie P. Development and application of ‘systems thinking’ principles for quality improvement. BMJ Open Qual. 2020;9(1): e000714.

Sampath B, Rakover J, Baldoza K, Mate K, Lenoci-Edwards J, Barker P. Whole-System Quality: A Unified Approach to Building Responsive, Resilient Health Care Systems. Boston: Institute for Healthcare Improvement; 2021.

Batalden PB, Davidoff F. What is “quality improvement” and how can it transform healthcare? Vol. 16. BMJ Publishing Group Ltd; 2007. p. 2–3.

Baker G. Collaborating for improvement: the Institute for Healthcare Improvement’s breakthrough series. New Med. 1997;1:5–8.

Plsek PE. Collaborating across organizational boundaries to improve the quality of care. Am J Infect Control. 1997;25(2):85–95.

Ayers LR, Beyea SC, Godfrey MM, Harper DC, Nelson EC, Batalden PB. Quality improvement learning collaboratives. Qual Manage Healthcare. 2005;14(4):234–47.

Brandrud AS, Schreiner A, Hjortdahl P, Helljesen GS, Nyen B, Nelson EC. Three success factors for continual improvement in healthcare: an analysis of the reports of improvement team members. BMJ Qual Saf. 2011;20(3):251–9.

Dückers ML, Spreeuwenberg P, Wagner C, Groenewegen PP. Exploring the black box of quality improvement collaboratives: modelling relations between conditions, applied changes and outcomes. Implement Sci. 2009;4(1):1–12.

Nadeem E, Olin SS, Hill LC, Hoagwood KE, Horwitz SM. Understanding the components of quality improvement collaboratives: a systematic literature review. Milbank Q. 2013;91(2):354–94.

Shortell SM, Marsteller JA, Lin M, Pearson ML, Wu S-Y, Mendel P, Cretin S, Rosen M. The role of perceived team effectiveness in improving chronic illness care. Med Care. 2004:1040–8.

Wilson T, Berwick DM, Cleary PD. What do collaborative improvement projects do? Experience from seven countries. Joint Commission J Qual Safety. 2004;30:25–33.

Schouten LM, Hulscher ME, van Everdingen JJ, Huijsman R, Grol RP. Evidence for the impact of quality improvement collaboratives: systematic review. BMJ. 2008;336(7659):1491–4.

Hulscher ME, Schouten LM, Grol RP, Buchan H. Determinants of success of quality improvement collaboratives: what does the literature show? BMJ Qual Saf. 2013;22(1):19–31.

Dixon-Woods M, Bosk CL, Aveling EL, Goeschel CA, Pronovost PJ. Explaining Michigan: developing an ex post theory of a quality improvement program. Milbank Q. 2011;89(2):167–205.

Bate P, Mendel P, Robert G. Organizing for quality: the improvement journeys of leading hospitals in Europe and the United States. CRC Press; 2007.

Andersson-Gäre B, Neuhauser D. The health care quality journey of Jönköping County Council, Sweden. Qual Manag Health Care. 2007;16(1):2–9.

Törnblom O, Stålne K, Kjellström S. Analyzing roles and leadership in organizations from cognitive complexity and meaning-making perspectives. Behav Dev. 2018;23(1):63.

Greenhalgh T, Russell J. Why Do Evaluations of eHealth Programs Fail? An Alternative Set of Guiding Principles. PLoS Med. 2010;7(11): e1000360.

Wells S, Tamir O, Gray J, Naidoo D, Bekhit M, Goldmann D. Are quality improvement collaboratives effective? A systematic review. BMJ Qual Saf. 2018;27(3):226–40.

Landon BE, Wilson IB, McInnes K, Landrum MB, Hirschhorn L, Marsden PV, Gustafson D, Cleary PD. Effects of a quality improvement collaborative on the outcome of care of patients with HIV infection: the EQHIV study. Ann Intern Med. 2004;140(11):887–96.

Mittman BS. Creating the evidence base for quality improvement collaboratives. Ann Intern Med. 2004;140(11):897–901.

Wennberg JE. Unwarranted variations in healthcare delivery: implications for academic medical centres. BMJ. 2002;325(7370):961–4.

Bungay H. Cancer and health policy: the postcode lottery of care. Soc Policy Admin. 2005;39(1):35–48.

Wennberg JE, Cooper MM. The Quality of Medical Care in the United States: A Report on the Medicare Program. The Dartmouth Atlas of Health Care 1999. The Center for the Evaluative Clinical Sciences [Internet]; 1999.

DaSilva P, Gray JM. English lessons: can publishing an atlas of variation stimulate the discussion on appropriateness of care? Med J Aust. 2016;205(S10):S5–7.

Gray WK, Day J, Briggs TW, Harrison S. Identifying unwarranted variation in clinical practice between healthcare providers in England: Analysis of administrative data over time for the Getting It Right First Time programme. J Eval Clin Pract. 2021;27(4):743–50.

Wabe N, Thomas J, Scowen C, Eigenstetter A, Lindeman R, Georgiou A. The NSW Pathology Atlas of Variation: Part I—Identifying Emergency Departments With Outlying Laboratory Test-Ordering Practices. Ann Emerg Med. 2021;78(1):150–62.

Jamal A, Babazono A, Li Y, Fujita T, Yoshida S, Kim SA. Elucidating variations in outcomes among older end-stage renal disease patients on hemodialysis in Fukuoka Prefecture, Japan. PLoS ONE. 2021;16(5): e0252196.

Sutherland K, Levesque JF. Unwarranted clinical variation in health care: definitions and proposal of an analytic framework. J Eval Clin Pract. 2020;26(3):687–96.

Tanenbaum SJ. Reducing variation in health care: The rhetorical politics of a policy idea. J Health Polit Policy Law. 2013;38(1):5–26.

Atsma F, Elwyn G, Westert G. Understanding unwarranted variation in clinical practice: a focus on network effects, reflective medicine and learning health systems. Int J Qual Health Care. 2020;32(4):271–4.

Horbar JD, Rogowski J, Plsek PE, Delmore P, Edwards WH, Hocker J, Kantak AD, Lewallen P, Lewis W, Lewit E. Collaborative quality improvement for neonatal intensive care. Pediatrics. 2001;107(1):14–22.

Van Maanen J: Tales of the field: On writing ethnography: University of Chicago Press; 2011.

Golden-Biddle K, Locke K. Appealing work: An investigation of how ethnographic texts convince. Organ Sci. 1993;4(4):595–616.

Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3(2):77–101.

Glaser BG. The constant comparative method of qualitative analysis. Soc Probl. 1965;12:436–45.

Willis R. The use of composite narratives to present interview findings. Qual Res. 2019;19(4):471–80.

Vojdani A, Vojdani E, Saidara E, Maes M. Persistent SARS-CoV-2 Infection, EBV, HHV-6 and other factors may contribute to inflammation and autoimmunity in long COVID. Viruses. 2023;15(2):400.

Choutka J, Jansari V, Hornig M, Iwasaki A. Unexplained post-acute infection syndromes. Nat Med. 2022;28(5):911–23.

Connors JM, Ariëns RAS. Uncertainties about the roles of anticoagulation and microclots in postacute sequelae of severe acute respiratory syndrome coronavirus 2 infection. J Thromb Haemost. 2023;21(10):2697–701.

Patel MA, Knauer MJ, Nicholson M, Daley M, Van Nynatten LR, Martin C, Patterson EK, Cepinskas G, Seney SL, Dobretzberger V. Elevated vascular transformation blood biomarkers in Long-COVID indicate angiogenesis as a key pathophysiological mechanism. Mol Med. 2022;28(1):122.

Greenhalgh T, Sivan M, Delaney B, Evans R, Milne R: Long covid—an update for primary care. bmj 2022, 378.

Parkin A, Davison J, Tarrant R, Ross D, Halpin S, Simms A, Salman R, Sivan M. A multidisciplinary NHS COVID-19 service to manage post-COVID-19 syndrome in the community. J Prim Care Commun Health. 2021;12:21501327211010990.

NHS England: COVID-19 Post-Covid Assessment Service, vol. Accessed 5th March 2024 at https://www.england.nhs.uk/statistics/statistical-work-areas/covid-19-post-covid-assessment-service/ . London: NHS England; 2024.

Sivan M, Halpin S, Gee J, Makower S, Parkin A, Ross D, Horton M, O'Connor R: The self-report version and digital format of the COVID-19 Yorkshire Rehabilitation Scale (C19-YRS) for Long Covid or Post-COVID syndrome assessment and monitoring. Adv Clin Neurosci Rehabil 2021;20(3).

The EuroQol Group. EuroQol-a new facility for the measurement of health-related quality of life. Health Policy. 1990;16(3):199–208.

Sivan M, Preston NJ, Parkin A, Makower S, Gee J, Ross D, Tarrant R, Davison J, Halpin S, O’Connor RJ, et al. The modified COVID-19 Yorkshire Rehabilitation Scale (C19-YRSm) patient-reported outcome measure for Long Covid or Post-COVID syndrome. J Med Virol. 2022;94(9):4253–64.

Johns MW. A new method for measuring daytime sleepiness: the Epworth sleepiness scale. Sleep. 1991;14(6):540–5.

Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606–13.

Van Dixhoorn J, Duivenvoorden H. Efficacy of Nijmegen Questionnaire in recognition of the hyperventilation syndrome. J Psychosom Res. 1985;29(2):199–206.

Evans R, Pick A, Lardner R, Masey V, Smith N, Greenhalgh T: Breathing difficulties after covid-19: a guide for primary care. BMJ 2023;381.

Van Dixhoorn J, Folgering H: The Nijmegen Questionnaire and dysfunctional breathing. In . , vol. 1: Eur Respiratory Soc; 2015.

Courtney R, Greenwood KM. Preliminary investigation of a measure of dysfunctional breathing symptoms: The Self Evaluation of Breathing Questionnaire (SEBQ). Int J Osteopathic Med. 2009;12(4):121–7.

Espinosa-Gonzalez A, Master H, Gall N, Halpin S, Rogers N, Greenhalgh T. Orthostatic tachycardia after covid-19. BMJ (Clinical Research ed). 2023;380:e073488–e073488.

PubMed   Google Scholar  

Bungo M, Charles J, Johnson P Jr. Cardiovascular deconditioning during space flight and the use of saline as a countermeasure to orthostatic intolerance. Aviat Space Environ Med. 1985;56(10):985–90.

CAS   PubMed   Google Scholar  

Sivan M, Corrado J, Mathias C. The Adapted Autonomic Profile (Aap) Home-Based Test for the Evaluation of Neuro-Cardiovascular Autonomic Dysfunction. Adv Clin Neurosci Rehabil. 2022;3:10–13. https://doi.org/10.47795/QKBU46715 .

Lee C, Greenwood DC, Master H, Balasundaram K, Williams P, Scott JT, Wood C, Cooper R, Darbyshire JL, Gonzalez AE. Prevalence of orthostatic intolerance in long covid clinic patients and healthy volunteers: A multicenter study. J Med Virol. 2024;96(3): e29486.

World Health Organization: Clinical management of covid-19 - living guideline. Geneva: WHO. Accessed 4th October 2023 at https://www.who.int/publications/i/item/WHO-2019-nCoV-clinical-2021-2 ; 2023.

Ahmed I, Mustafaoglu R, Yeldan I, Yasaci Z, Erhan B: Effect of pulmonary rehabilitation approaches on dyspnea, exercise capacity, fatigue, lung functions and quality of life in patients with COVID-19: A Systematic Review and Meta-Analysis. Arch Phys Med Rehabil 2022.

Dillen H, Bekkering G, Gijsbers S, Vande Weygaerde Y, Van Herck M, Haesevoets S, Bos DAG, Li A, Janssens W, Gosselink R, et al. Clinical effectiveness of rehabilitation in ambulatory care for patients with persisting symptoms after COVID-19: a systematic review. BMC Infect Dis. 2023;23(1):419.

Learmonth Y, Dlugonski D, Pilutti L, Sandroff B, Klaren R, Motl R. Psychometric properties of the fatigue severity scale and the modified fatigue impact scale. J Neurol Sci. 2013;331(1–2):102–7.

Webster K, Cella D, Yost K. The Functional Assessment of Chronic Illness T herapy (FACIT) Measurement System: properties, applications, and interpretation. Health Qual Life Outcomes. 2003;1(1):1–7.

Mundt JC, Marks IM, Shear MK, Greist JM. The Work and Social Adjustment Scale: a simple measure of impairment in functioning. Br J Psychiatry. 2002;180(5):461–4.

Chalder T, Berelowitz G, Pawlikowska T, Watts L, Wessely S, Wright D, Wallace E. Development of a fatigue scale. J Psychosom Res. 1993;37(2):147–53.

Shahid A, Wilkinson K, Marcu S, Shapiro CM: Visual analogue scale to evaluate fatigue severity (VAS-F). In: STOP, THAT and one hundred other sleep scales . edn.: Springer; 2011:399–402.

Parker M, Sawant HB, Flannery T, Tarrant R, Shardha J, Bannister R, Ross D, Halpin S, Greenwood DC, Sivan M. Effect of using a structured pacing protocol on post-exertional symptom exacerbation and health status in a longitudinal cohort with the post-COVID-19 syndrome. J Med Virol. 2023;95(1): e28373.

Kenny RA, Bayliss J, Ingram A, Sutton R. Head-up tilt: a useful test for investigating unexplained syncope. The Lancet. 1986;327(8494):1352–5.

Drury MOC: Science and Psychology. In: The selected writings of Maurice O’Connor Drury: On Wittgenstein, philosophy, religion and psychiatry. edn.: Bloomsbury Publishing; 2017.

Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342(25):1887–92.

Mongtomery K: How doctors think: Clinical judgment and the practice of medicine: Oxford University Press; 2005.

Download references

Acknowledgements

We are grateful to clinic staff for allowing us to study their work and to patients for allowing us to sit in on their consultations. We also thank the funder of LOCOMOTION (National Institute for Health Research) and the patient advisory group for lived experience input.

This research is supported by a National Institute for Health Research (NIHR) Long Covid Research Scheme grant (Ref COV-LT-0016).

Author information

Authors and affiliations.

Nuffield Department of Primary Care Health Sciences, University of Oxford, Woodstock Rd, Oxford, OX2 6GG, UK

Trisha Greenhalgh, Julie L. Darbyshire & Emma Ladds

Imperial College Healthcare NHS Trust, London, UK

LOCOMOTION Patient Advisory Group and Lived Experience Representative, London, UK


Contributions

TG conceptualized the overall study, led the empirical work, supported the quality improvement meetings, conducted the ethnographic visits, led the data analysis, developed the theorization and wrote the first draft of the paper. JLD organized and led the quality improvement meetings, supported site-based researchers to collect and analyse data on their clinic, collated and summarized data on quality topics, and liaised with the patient advisory group. CL conceptualized and led the quality topic on POTS, including exploring reasons for some clinics’ reluctance to conduct testing and collating and analysing the NASA Lean Test data across all sites. EL assisted with ethnographic visits, data analysis, and theorization. JCS contributed lived experience of long covid and also clinical experience as an occupational therapist; she liaised with the wider patient advisory group, whose independent (patient-led) audit of long covid clinics informed the quality improvement prioritization exercise. All authors provided extensive feedback on drafts and contributed to discussions and refinements. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Trisha Greenhalgh .

Ethics declarations

Ethics approval and consent to participate.

The LOng COvid Multidisciplinary consortium Optimising Treatments and servIces acrOss the NHS (LOCOMOTION) study is sponsored by the University of Leeds and approved by Yorkshire & The Humber—Bradford Leeds Research Ethics Committee (ref: 21/YH/0276) and subsequent amendments.

Patient participants in clinic were approached by the clinician (without the researcher present) and gave verbal informed consent for a clinically qualified researcher to observe the consultation. If they consented, the researcher was then invited to sit in. A written record of this verbal consent was made in field notes. It was impractical to seek consent from patients whose cases were discussed (usually with very brief clinical details) in online MDTs. Therefore, clinical case examples from MDTs presented in the paper are fictionalized cases constructed from multiple real cases and with key clinical details changed (for example, comorbidities were replaced with different conditions which would produce similar symptoms). All fictionalized cases were reviewed by our patient advisory group to confirm that they were plausible to lived experience experts.

Consent for publication

No direct patient cases are reported in this manuscript. For details of how the fictionalized cases were constructed and validated, see “Consent to participate” above.

Competing interests

TG was a member of the UK National Long Covid Task Force 2021–2023 and on the Oversight Group for the NICE Guideline on Long Covid 2021–2022. She is a member of Independent SAGE.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Greenhalgh, T., Darbyshire, J.L., Lee, C. et al. What is quality in long covid care? Lessons from a national quality improvement collaborative and multi-site ethnography. BMC Med 22, 159 (2024). https://doi.org/10.1186/s12916-024-03371-6


Received: 04 December 2023

Accepted: 26 March 2024

Published: 15 April 2024

DOI: https://doi.org/10.1186/s12916-024-03371-6


  • Post-covid-19 syndrome
  • Quality improvement
  • Breakthrough collaboratives
  • Warranted variation
  • Unwarranted variation
  • Improvement science
  • Ethnography
  • Idiographic reasoning
  • Nomothetic reasoning

BMC Medicine

ISSN: 1741-7015

  • J Med Libr Assoc
  • v.107(1); 2019 Jan

Distinguishing case study as a research method from case reports as a publication type

The purpose of this editorial is to distinguish between case reports and case studies. In health, case reports are familiar ways of sharing events or efforts of intervening with single patients with previously unreported features. As a qualitative methodology, case study research encompasses a great deal more complexity than a typical case report and often incorporates multiple streams of data combined in creative ways. The depth and richness of case study description helps readers understand the case and whether findings might be applicable beyond that setting.

Single-institution descriptive reports of library activities are often labeled by their authors as “case studies.” By contrast, in health care, single patient retrospective descriptions are published as “case reports.” Both case reports and case studies are valuable to readers and provide a publication opportunity for authors. A previous editorial by Akers and Amos about improving case studies addresses issues that are more common to case reports; for example, not having a review of the literature or being anecdotal, not generalizable, and prone to various types of bias such as positive outcome bias [ 1 ]. However, case study research as a qualitative methodology is pursued for different purposes than generalizability. The authors’ purpose in this editorial is to clearly distinguish between case reports and case studies. We believe that this will assist authors in describing and designating the methodological approach of their publications and help readers appreciate the rigor of well-executed case study research.

Case reports often provide a first exploration of a phenomenon or an opportunity for a first publication by a trainee in the health professions. In health care, case reports are familiar ways of sharing events or efforts of intervening with single patients with previously unreported features. Another type of study categorized as a case report is an “N of 1” study or single-subject clinical trial, which considers an individual patient as the sole unit of observation in a study investigating the efficacy or side effect profiles of different interventions. Entire journals have evolved to publish case reports, which often rely on template structures with limited contextualization or discussion of previous cases. Examples that are indexed in MEDLINE include the American Journal of Case Reports , BMJ Case Reports, Journal of Medical Case Reports, and Journal of Radiology Case Reports . Similar publications appear in veterinary medicine and are indexed in CAB Abstracts, such as Case Reports in Veterinary Medicine and Veterinary Record Case Reports .

As a qualitative methodology, however, case study research encompasses a great deal more complexity than a typical case report and often incorporates multiple streams of data combined in creative ways. Distinctions include the investigator’s definitions and delimitations of the case being studied, the clarity of the role of the investigator, the rigor of gathering and combining evidence about the case, and the contextualization of the findings. Delimitation is a term from qualitative research about setting boundaries to scope the research in a useful way rather than describing the narrow scope as a limitation, as often appears in a discussion section. The depth and richness of description helps readers understand the situation and whether findings from the case are applicable to their settings.

CASE STUDY AS A RESEARCH METHODOLOGY

Case study as a qualitative methodology is an exploration of a time- and space-bound phenomenon. As qualitative research, case studies require much more from their authors, who act as instruments within the inquiry process. In the case study methodology, a variety of methodological approaches may be employed to explain the complexity of the problem being studied [ 2 , 3 ].

Leading authors diverge in their definitions of case study, but a qualitative research text introduces case study as follows:

Case study research is defined as a qualitative approach in which the investigator explores a real-life, contemporary bounded system (a case) or multiple bound systems (cases) over time, through detailed, in-depth data collection involving multiple sources of information, and reports a case description and case themes. The unit of analysis in the case study might be multiple cases (a multisite study) or a single case (a within-site case study). [ 4 ]

Methodologists writing core texts on case study research include Yin [ 5 ], Stake [ 6 ], and Merriam [ 7 ]. The approaches of these three methodologists have been compared by Yazan, who focused on six areas of methodology: epistemology (beliefs about ways of knowing), definition of cases, design of case studies, and gathering, analysis, and validation of data [ 8 ]. For Yin, case study is a method of empirical inquiry appropriate to determining the “how and why” of phenomena and contributes to understanding phenomena in a holistic and real-life context [ 5 ]. Stake defines a case study as a “well-bounded, specific, complex, and functioning thing” [ 6 ], while Merriam views “the case as a thing, a single entity, a unit around which there are boundaries” [ 7 ].

Case studies are ways to explain, describe, or explore phenomena. Comments from a quantitative perspective about case studies lacking rigor and generalizability fail to consider the purpose of the case study and how what is learned from a case study is put into practice. Rigor in case studies comes from the research design and its components, which Yin outlines as (a) the study’s questions, (b) the study’s propositions, (c) the unit of analysis, (d) the logic linking the data to propositions, and (e) the criteria for interpreting the findings [ 5 ]. Case studies should also provide multiple sources of data, a case study database, and a clear chain of evidence among the questions asked, the data collected, and the conclusions drawn [ 5 ].

Sources of evidence for case studies include interviews, documentation, archival records, direct observations, participant-observation, and physical artifacts. One of the most important sources for data in qualitative case study research is the interview [ 2 , 3 ]. In addition to interviews, documents and archival records can be gathered to corroborate and enhance the findings of the study. To understand the phenomenon or the conditions that created it, direct observations can serve as another source of evidence and can be conducted throughout the study. These can include the use of formal and informal protocols as a participant inside the case or an external or passive observer outside of the case [ 5 ]. Lastly, physical artifacts can be observed and collected as a form of evidence. With these multiple potential sources of evidence, the study methodology includes gathering data, sense-making, and triangulating multiple streams of data. Figure 1 shows an example in which data used for the case started with a pilot study to provide additional context to guide more in-depth data collection and analysis with participants.

Figure 1. Key sources of data for a sample case study

VARIATIONS ON CASE STUDY METHODOLOGY

Case study methodology is evolving and regularly reinterpreted. Comparative or multiple case studies are used as a tool for synthesizing information across time and space to research the impact of policy and practice in various fields of social research [ 9 ]. Because case study research is in-depth and intensive, there have been efforts to simplify the method or select useful components of cases for focused analysis. Micro-case study is a term that is occasionally used to describe research on micro-level cases [ 10 ]. These are cases that occur in a brief time frame, occur in a confined setting, and are simple and straightforward in nature. A micro-level case describes a clear problem of interest. Reporting is very brief and limited to specific points. The lack of complexity in the case description makes obvious the “lesson” that is inherent in the case. Although no definitive “solution” is necessarily forthcoming, this clarity makes the case useful for discussion. A micro-case write-up can be distinguished from a case report by its focus on briefly reporting specific features of a case or cases to analyze or learn from those features.

DATABASE INDEXING OF CASE REPORTS AND CASE STUDIES

Disciplines such as education, psychology, sociology, political science, and social work regularly publish rich case studies that are relevant to particular areas of health librarianship. Case reports and case studies have been defined as publication types or subject terms by several databases that are relevant to librarian authors: MEDLINE, PsycINFO, CINAHL, and ERIC. Library, Information Science & Technology Abstracts (LISTA) does not have a subject term or publication type related to cases, despite including many of them. Whereas “Case Reports” is the main term used by MEDLINE’s Medical Subject Headings (MeSH) and PsycINFO’s thesaurus, CINAHL and ERIC use “Case Studies.”

Case reports in MEDLINE and PsycINFO focus on clinical case documentation. In MeSH, “Case Reports” as a publication type is specific to “clinical presentations that may be followed by evaluative studies that eventually lead to a diagnosis” [ 11 ]. “Case Histories,” “Case Studies,” and “Case Study” are all entry terms mapping to “Case Reports”; however, guidance to indexers suggests that “Case Reports” should not be applied to institutional case reports and refers to the heading “Organizational Case Studies,” which is defined as “descriptions and evaluations of specific health care organizations” [ 12 ].

PsycINFO’s subject term “Case Report” is “used in records discussing issues involved in the process of conducting exploratory studies of single or multiple clinical cases.” The Methodology index offers clinical and non-clinical entries. “Clinical Case Study” is defined as “case reports that include disorder, diagnosis, and clinical treatment for individuals with mental or medical illnesses,” whereas “Non-clinical Case Study” is a “document consisting of non-clinical or organizational case examples of the concepts being researched or studied. The setting is always non-clinical and does not include treatment-related environments” [ 13 ].

Both CINAHL and ERIC acknowledge the depth of analysis in case study methodology. The CINAHL scope note for the thesaurus term “Case Studies” distinguishes between the document and the methodology, though both use the same term: “a review of a particular condition, disease, or administrative problem. Also, a research method that involves an in-depth analysis of an individual, group, institution, or other social unit. For material that contains a case study, search for document type: case study.” The ERIC scope note for the thesaurus term “Case Studies” is simple: “detailed analyses, usually focusing on a particular problem of an individual, group, or organization” [ 14 ].

PUBLICATION OF CASE STUDY RESEARCH IN LIBRARIANSHIP

We call your attention to a few examples published as case studies in health sciences librarianship to consider how their characteristics fit with the preceding definitions of case reports or case study research. All present some characteristics of case study research, but their treatment of the research questions, richness of description, and analytic strategies vary in depth and, therefore, diverge at some level from the qualitative case study research approach. This divergence, particularly in richness of description and analysis, may have been constrained by the publication requirements.

As one example, a case study by Janke and Rush documented a time- and context-bound collaboration involving a librarian and a nursing faculty member [ 15 ]. Three objectives were stated: (1) describing their experience of working together on an interprofessional research team, (2) evaluating the value of the librarian role from librarian and faculty member perspectives, and (3) relating findings to existing literature. Two elements signal the qualitative nature of this case study: the authors were the research participants, and they use the term “evaluation” to mean reflection on their experience. This reads like a case study that could have been enriched by data gathered from others who engaged with this team, which would broaden the understanding of the collaboration.

As another example, the description of the academic context is one of the most salient components of the case study written by Clairoux et al., which had the objectives of (1) describing the library instruction offered and learning assessments used at a single health sciences library and (2) discussing the positive outcomes of instruction in that setting [ 16 ]. The authors focus on sharing what the institution has done more than explaining why this institution is an exemplar to explore a focused question or understand the phenomenon of library instruction. However, like a case study, the analysis brings together several streams of data including course attendance, online material page views, and some discussion of results from surveys. This paper reads somewhat in between an institutional case report and a case study.

The final example is a single author reporting on a personal experience of creating and executing the role of research informationist for a National Institutes of Health (NIH)–funded research team [ 17 ]. There is a thoughtful review of the informationist literature and detailed descriptions of the institutional context and the process of gaining access to and participating in the new role. However, the motivating question in the abstract does not seem to be fully addressed through analysis from either the reflective perspective of the author as the research participant or consideration of other streams of data from those involved in the informationist experience. The publication reads more like a case report about this informationist’s experience than a case study that explores the research informationist experience through the selection of this case.

All of these publications are well written and useful for their intended audiences, but in general, they are much shorter and much less rich in depth than case studies published in social sciences research. It may be that the authors have been constrained by word counts or page limits. For example, the submission category for Case Studies in the Journal of the Medical Library Association (JMLA) limited them to 3,000 words and defined them as “articles describing the process of developing, implementing, and evaluating a new service, program, or initiative, typically in a single institution or through a single collaborative effort” [ 18 ]. This definition’s focus on novelty and description sounds much more like the definition of case report than the in-depth, detailed investigation of a time- and space-bound problem that is often examined through case study research.

Problem-focused or question-driven case study research would benefit from the space provided for Original Investigations that employ any type of quantitative or qualitative method of analysis. One of the best examples in the JMLA of an in-depth multiple case study was authored by a librarian who published the findings from her doctoral dissertation; it represented all the elements of a case study. In eight pages, she provided a theoretical basis for the research question, a pilot study, and a multiple case design, including integrated data from interviews and focus groups [ 19 ].

We have distinguished between case reports and case studies primarily to help librarians who are new to research and to critical appraisal of case study methodology recognize the features that authors use to describe and designate the methodological approaches of their publications. For researchers who are new to case research methodology and are interested in learning more, Hancock and Algozzine provide a guide [ 20 ].

We hope that JMLA readers appreciate the rigor of well-executed case study research. We believe that distinguishing between descriptive case reports and analytic case studies in the journal’s submission categories will allow the depth of case study methodology to increase. We also hope that authors feel encouraged to pursue submitting relevant case studies or case reports for future publication.

Editor’s note: In response to this invited editorial, the Journal of the Medical Library Association will consider manuscripts employing rigorous qualitative case study methodology to be Original Investigations (fewer than 5,000 words), whereas manuscripts describing the process of developing, implementing, and assessing a new service, program, or initiative—typically in a single institution or through a single collaborative effort—will be considered to be Case Reports (formerly known as Case Studies; fewer than 3,000 words).


Determining the true value of a website: A GSA case study


Cleaning up: A hypothetical scenario

Consider this scenario: you’ve been told to clean up a giant room full of Things Your Agency Has Made in the Past and Now Maintains for Public Use. This means disposing of the Things that no longer add value, and sprucing up the Things that are still useful. How do you determine which Things belong in which category, especially when all the Things in that giant room have been used by the public and are available for all to see?

When the “things” we’re talking about are websites, this determination is often much more complicated than it might appear on the surface. This scenario is one facing web teams across the government, including at the U.S. General Services Administration (GSA), every single day. If you’re in this situation, consider all the ways you might begin to tackle this cleanup job.

Evaluating by visits

You decide to start by determining how many people visit each website each month. Delighted, you pull those numbers together and produce a chart that looks something like this:

The chart states that the 10 least-visited GSA websites each had only about 66 visits in the past 30 days, whereas the top 10 websites averaged over 629,000 visits, and GSA websites overall averaged over 244,000 monthly visits. So there you have it: clearly, it appears the websites with only 66 visits are the least useful and should be decommissioned. (Note that the low-traffic websites all show 66 visits because of the analytics tool’s statistical sampling methodology.)

However, you stop to examine one of the low-traffic sites. In studying it, you realize that it was never designed to have many visitors. Instead, it was designed to support a very small audience that only appears at random, unpredictable intervals; say, when a natural disaster strikes. Clearly, you don’t want to get rid of that website, since it’s meeting a specific need of a small but well-defined and important audience.

Through this consideration, you realize that using the number of visitors to determine the usefulness of a website incorrectly assumes:

  • Each visit across all your websites is of the same value.
  • Each audience, whether 66 people or 629,000, has the same level of urgency and need for each website, even if one website is intended to serve a large, continuous audience, while another is designed to serve a small, irregular audience.

Since both of these assumptions are false, visitor numbers are not enough to determine the usefulness of a website. You need another evaluation tactic.

Evaluating by accessibility

After some consideration, you realize that all the websites have to be fully accessible to everyone, regardless of ability. You also have the tools and processes to help determine whether that standard has been reached. Excited, you start by assembling and running your automated accessibility tests.

Five websites stand out as having the worst accessibility errors, according to your tests. Clearly, these websites must go. As you prepare to get rid of them, however, you notice that the vast majority of the errors in the worst website are identical and all seem to originate from the same part of the website. You look closer and realize that the problem causing all those errors is actually quite basic and can be fixed easily, taking the worst website out of the bottom ranking. Looking at the other websites in your list, you realize that other errors that have surfaced are only errors in an automatic test, not a human one. Many of them aren’t on critical paths for the website’s use, so while they should be addressed, they are not meaningfully blocking access to the website.

That throws your entire evaluation into question: how can you possibly batch and judge the usefulness of a website by accessibility, if the severity and impact of each accessibility error varies so much? Instead, you must pair automated accessibility tests with manual testing to reach conclusions on the least accessible websites. That won’t help you quickly get rid of the lowest value websites, so yet another evaluation tactic is needed.

Evaluating by speed and performance

After considering the number of visits and the accessibility, you realize that an evaluation of usefulness needs to consider a basic question: are the performance and speed of the website reasonable? If a product is so frustratingly slow that people don’t use it, then nothing else matters.

To figure out which websites are so slow as to be essentially non-functional, you find a free online tool that tests website performance. Additionally, you get smart based on your previous experiments: this tool tests for a few different parameters, not just one element of performance. It then compiles these parameters into a single index score, so its results are compelling.

This performance metric shows you that, on average, your websites perform at 84% of a perfect 100% score, and there are a few low-performing websites at 26% performance or lower. This works for you; you know you need to get rid of your agency’s low-performing websites. As you’re planning to decommission these sites, however, a user visits one of them to complete a task and provides some feedback.

Evaluating by customer research

The user waits while the website slowly loads. Then, they interact with the website and exit the page. To gauge their satisfaction, you prompt them to give you feedback on the page by asking, “Was this page helpful?” The user shares:

“This website does work; it just works slowly. I’m willing to wait, though, because I need the information. There’s nowhere else to get this information, so please don’t get rid of this website; I have to come back and get information from it every month.”

After taking this customer research into account, you realize that visits, accessibility, performance, and speed do not, on their own, fully reflect the website’s value, so you still don’t know which websites to decommission.

At this point, you’ve discovered that evaluating websites is a multidimensional problem — one that cannot be determined by a single, simple metric. Indeed, even when you consider several metrics, your conclusions lack a customer’s perspective.

Determining the value of agency websites therefore must use an index that is not just composed of similar metrics (like the performance index) but is in fact a composite index of different datasets of different data types. This approach will allow you to evaluate the website’s purpose, function, and ultimately, value, to your agency and your customers. This aggregation of dataset types is known as a composite indicator.

Methodology: The Enterprise Digital Experience composite indicator

This is the story of evaluating websites in GSA. Websites seem simple to evaluate: do they work or not? But in truth, they are a multidimensional problem. In taking on the definition and evaluation of GSA public-facing websites, the Service Design team in GSA’s Office of Customer Experience researched and designed a composite indicator of multiple data sets of different types to evaluate the value of websites in GSA. Since 2021, we’ve been doing this by examining six things:

Accessibility, scored by our agency standard accessibility tool (quantitative data, 21st Century IDEA Section 3A.1)

Customer-centricity, scored by a human-centered design interview (qualitative data, 21st Century IDEA Section 3A.6 and OMB Circular A-11 280.1 and 280.8)

  • Stated audience: Can the website team succinctly and precisely name their website’s primary audience?
  • Stated purpose: Can the website team succinctly and precisely name their website’s primary purpose?
  • Measurement of purpose: Does the website have a replicable means to measure if the website’s purpose is being achieved?
  • Repeatable customer feedback mechanism: Does the website team have a repeatable customer feedback mechanism in place, such as an embedded survey, or recurring, well-promoted and attended meetings, or focus groups with customers? (Receiving ad hoc feedback from customer call centers or email submissions does not meet this mark.)
  • Ability to action: Does the website team have a skillset that can contribute to rapidly improving the website based on feedback and need, such as human-centered design research, user experience, writing, or programming skills?
  • Ability to measure impact: Does the website team have the ability to measure the impact of the improvements they implement? Have they devised and implemented a measurement methodology specifically for their changes (an ability to measure impact) or do they rely solely on blanket measures such as Digital Analytics Program data (no ability to measure impact)?

Performance and search engine optimization, scored by Google Lighthouse (quantitative data, 21st Century IDEA Section 3A.8)

Required links, scored by the Site Scanning Program’s website scan (quantitative data, 21st Century IDEA Section 3A.1 & 3E)

User behavior, non-duplication, scored by Google Analytics with related sites (qualitative + quantitative data, 21st Century IDEA Section 3A.3)

U.S. Web Design System implementation, scored by Site Scanning Program’s website scan (qualitative + quantitative data, 21st Century IDEA Section 3A.1 & 3E)

View all sections of the law and the circular mentioned above:

  • 21st Century IDEA (Public Law No. 115-336)
  • OMB Circular A-11 (PDF, 385 KB, 14 pages, 2023)

We visualize this evaluation in website maps, rendered as charts that are available internally to GSA employees. This helps us see examples of good performers, such as Website A (on the left), and not-so-good performers, like Website B (on the right).

In addition, these charts, like all maps [1], contain some decisions that prioritize how the information is rendered. They include:

  • An equal weight to all datasets and data types, regardless of fidelity . In the charts above, the slices spread out from 0 along even increments. Our measurement of customer-centricity gives equal weight to whether a site proactively listens to their customers, as well as to whether it has the resources to implement change.
  • A direct comparison by slice . For example, our customer-centricity slice gives the same amount of distance from the center for listening to its customers as our required links slice gives for including information about privacy, regardless of the fact that customer listening is foundationally different (and more complicated) as an activity than including required links.

We made these decisions because to weight all of the metrics would be to travel down the coastline paradox [2] , meaning: we had to identify a stopping point for measurement and comparison that is somewhat arbitrary because, paradoxically, the more closely we measure and compare, the less clear the GSA digital ecosystem would become. These measures are the baseline because, broadly, they are fair in their unfairness: some things are easier to do, and some things are harder, but what is “easier” and what is “harder” differs depending on the resources available to each website team.
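One way to picture the equal-increment slices described above is to rescale every dimension to the same 0 to 1 range before charting. The sketch below is illustrative only: the dimension keys and the maximum raw scores are hypothetical placeholders, not GSA's actual scales or tooling.

```python
# Hypothetical maximum raw score per dimension (assumed for illustration only).
MAX_RAW_SCORE = {
    "accessibility": 100,
    "customer_centricity": 6,  # six yes/no interview questions
    "performance_seo": 100,
    "required_links": 1,
    "user_behavior": 1,
    "uswds_implementation": 100,
}


def to_slices(raw_scores):
    """Rescale each raw score to 0..1 so every slice uses the same increments."""
    missing = [name for name in MAX_RAW_SCORE if name not in raw_scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return {name: raw_scores[name] / MAX_RAW_SCORE[name] for name in MAX_RAW_SCORE}


# Example with made-up raw scores for one website.
example = to_slices({
    "accessibility": 50,
    "customer_centricity": 3,
    "performance_seo": 50,
    "required_links": 0.5,
    "user_behavior": 0.5,
    "uswds_implementation": 50,
})
```

The function returns one value per slice instead of a single total, in keeping with the point that the indicator is read as a map and not as a verdict.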

But even in comparing websites using charts and maps containing multiple dataset types, we’re missing some nuance. “Website A” is a simple, informational site, whereas “Website B” contains a pricing feature, which introduces additional complexities that are more difficult to manage than simple textual information. To give visibility to this nuance, the Service Design team uses these maps as part of a broader website evaluation package, which includes qualitative research interviews and subsequent evaluation write ups. These are sent to every website team within three weeks after we conduct the research interview. Taken together, the quantitative and qualitative data in the website evaluation packages allow GSA staff to consistently measure how digital properties are functioning, and what their impact is on customers.

Concluding which websites should exist

The reality is: value exists in dimensions, not in single data points, or even in single datasets. To further complicate things, the closer you look at single datasets, the more your decision-making process is complicated, rather than clarified. This is because each data type and each data point in complex systems can be broken down into infinitely smaller pieces, rendering decisions made based on these pieces more accurate, but also of smaller and smaller impact. [3]

None of the measures in the Enterprise Digital Experience composite indicator or their use as a whole pie results in an affirmation or denial of the value of a digital property to the agency or to the public; value will always exist as an interpretation of these datasets. The indicator can tell us how existing sites are doing, but not whether we should continue supporting them.

To understand whether a website is worth supporting and how to evolve it, the Service Design team pairs qualitative and quantitative data with mission and strategic priorities to evaluate which websites to improve, and which to stop supporting. To achieve this pairing, three elements must come together:

  • Technical evaluations
  • Regular dialogue with each website’s customers, including internal stakeholders and leadership
  • Enterprise-level meta-analysis of a digital property’s functions in comparison to other digital properties

Customer dialogue is the responsibility of each team, and technical evaluations are readily available, thanks to tools like the Digital Analytics Program (DAP), but enterprise-level meta-analyses require a cross-functional view. This view can be attained through matrixed initiatives like GSA’s Service Design program, or cross-functional groups like GSA’s Digital Council, in collaboration with program teams and leadership.

From an enterprise perspective, the next phase in our evaluation of GSA properties is to apply service categories to each website, to better understand how GSA is working along categorical lines, instead of businesses or brands. Taxonomical work like this is the domain of enterprise architecture. Our service category taxonomy was compiled by using the Federal Enterprise Architecture Framework (FEAF) [4] as a starting point, and crosswalks a website’s designed function with its practical function, evaluated through general and agency use.

We’re starting to leverage service categories, and working with teams to create a more coalesced view of website value as we do so.

What can I do next?

Review an introduction to analytics to learn how metrics and data can improve understanding of how people use your website.

If you work at a U.S. federal government agency, and would like to learn more about this work, reach out to GSA’s Service Design team at [email protected] .

Disclaimer : All references to specific brands, products, and/or companies are used only for illustrative purposes and do not imply endorsement by the U.S. federal government or any federal government agency.

  • Open access
  • Published: 19 April 2024

GbyE: an integrated tool for genome widely association study and genome selection based on genetic by environmental interaction

  • Xinrui Liu 1 , 2 ,
  • Mingxiu Wang 1 ,
  • Jie Qin 1 ,
  • Yaxin Liu 1 ,
  • Shikai Wang 1 ,
  • Shiyu Wu 1 ,
  • Ming Zhang 1 ,
  • Jincheng Zhong 1 &
  • Jiabo Wang 1  

BMC Genomics volume 25, Article number: 386 (2024)

The growth and development of organisms depend on genetic effects, environmental effects, and their interaction. In recent decades, many candidate additive genetic markers and genes have been detected using genome-wide association studies (GWAS). However, because of limited computing power and a lack of practical tools, the interactive effects of markers and genes have not been revealed clearly. Using these interactive markers in breeding and prediction, such as genome selection (GS), is also difficult.

Power-FDR curves show that the GbyE algorithm can detect more significant genetic loci at different levels of genetic correlation and heritability, especially at low heritability levels. The additive effect of GbyE exhibits high significance on certain chromosomes, while the interactive effect detects more significant sites on other chromosomes, which were not detected in the first two parts. In prediction accuracy testing, for most combinations of heritability and genetic correlation, the prediction accuracy of GbyE is significantly higher than that of the Mean method, whether the rrBLUP model or the BGLR model is used. Using information from genetic by environmental interaction (G × E), the GbyE algorithm improves the prediction accuracy of the three Bayesian models BRR, BayesA, and BayesLASSO by 9.4%, 9.1%, and 11%, respectively, relative to the Mean value method. When the phenotype is missing in a single environment, the GbyE algorithm is significantly superior to the Mean method regardless of the combination of heritability and genetic correlation, especially at high genetic correlation and heritability.

Conclusions

This study constructed a new genotype design model program (GbyE) for GWAS and GS using the Kronecker product, which can clearly estimate the additive and interactive effects separately. The results showed that GbyE can provide higher statistical power for GWAS and higher prediction accuracy for GS models. In addition, GbyE gives varying degrees of improvement in prediction accuracy in three Bayesian models (BRR, BayesA, and BayesCpi). Whether phenotypes were missing in a single environment or in multiple environments, GbyE also made better predictions for the inference population set. This study helps us understand the interactive relationship between genome and environment in complex traits. The GbyE source code is available at the GitHub website ( https://github.com/liu-xinrui/GbyE ).

Genetic by environmental interaction (G × E) is crucial for explaining individual traits and has gained increasing attention in research. It refers to the influence of genetic factors on susceptibility to environmental factors. In-depth study of G × E contributes to a deeper understanding of the relationship between individual growth, living environment and phenotypes. Genetic factors play a role in most human diseases at the molecular or cellular level, but environmental factors also contribute significantly. Researchers aim to uncover the mechanisms behind complex diseases and quantitative traits by investigating the interactions between organisms and their environment. Common, complex, or rare human diseases are often considered as outcomes resulting from the interplay of genes, environmental factors, and their interactions. Analyzing the joint effects of genes and the environment can provide valuable insights into the underlying pathway mechanisms of diseases. For instance, researchers have successfully identified potential loci associated with asthma risk through G × E interactions [ 1 ], and have explored predisposing factors for challenging-to-treat diseases like cancer [ 2 , 3 ], rhinitis [ 4 ], and depression [ 5 ].

In agricultural production, breeders currently use two main methods to increase crop yields and livestock productivity [ 6 ]. The first is to develop varieties with a relatively low G × E effect to ensure stable production performance in different environments. The second is to use information from different environments to improve the statistical power of genome-wide association studies (GWAS) and reveal potential loci of complex traits. The first method requires long-term commitment, while the second clearly has faster returns. In GWAS, the use of multiple environments or phenotypes for association studies has become increasingly important. This not only improves the statistical power for environmental susceptibility traits [ 7 ], but also allows the detection of signal loci for G × E. There are significant challenges when using multiple environments or phenotypes for GWAS, mainly because most diseases and quantitative traits have numerous associated loci with minimal impact [ 8 ], and thus it is impossible to determine the effect size regulated by the environment at these loci. The current detection strategy for G × E is based on complex statistical models, often requiring a large number of samples to detect important signals [ 9 , 10 ]. In GS, breeders can use whole-genome marker data to identify and select target strains in the early stages of animal and plant production [ 11 , 12 , 13 ]. Initially, GS models, similar to GWAS models, could only analyze a single environment or phenotype [ 14 ]. To improve the predictive accuracy of the models, higher marker densities are often required, which increases the proportion of genetic variation explained by the markers and so indirectly yields higher predictive accuracy. The consideration of G × E and multiple phenotypes in GS models [ 15 ] has been widely studied in plant and animal breeding [ 16 ]. GS models that allow G × E have been developed [ 17 ], and most of them have modeled and interpreted G × E using structured covariates [ 18 ]. In these studies, most of the GS models provided higher predictive accuracy when combined with G × E than single-environment (or single-phenotype) analysis. Hence, there is a need to develop models that leverage G × E information for GWAS and GS studies.

This study developed a novel genotype-by-environment method based on R, termed GbyE, which leverages the interaction among multiple environments or phenotypes to enhance the association study and prediction performance of environmental susceptibility traits. The method enables the identification of mutation sites that exhibit G × E interactions in specific environments. To evaluate the performance of the method, simulation experiments were conducted using a dataset comprising 282 corn samples. Importantly, this method can be seamlessly integrated into any GWAS and GS analysis.

Materials and methods

Support packages

GbyE was developed for GWAS and GS research. It therefore uses the Genome Association and Prediction Integrated Tool (GAPIT) [ 19 ], Bayesian Generalized Linear Regression (BGLR) [ 20 ], and Ridge Regression Best Linear Unbiased Prediction (rrBLUP) [ 21 ] packages as support packages, while GbyE itself only provides conversion to the interactive formats and file generation. To simplify its use, the basic calculation functions are bundled with the package. They comprise four function files:
  • GbyE.Simulation.R: dual-environment phenotype simulation based on heritability, genetic correlation, and QTL quantity.
  • GbyE.Calculate.R: processes numerical genotype and phenotype data into the interactive genotype files of GbyE.
  • GbyE.Power.FDR.R: calculates the statistical power and false discovery rate (FDR) of GWAS.
  • GbyE.Comparison.Pvalue.R: GbyE generates redundant calculations in GWAS, and this function filters the SNP effect loci with minimal p-values.

Samples and sequencing data

In this study, a small dataset was used for software simulation analysis. The same dataset is widely used for testing software such as GAPIT, TASSEL, and rMVP. The demonstration data come from 282 inbred lines of maize and include four phenotypes. There are no missing phenotypes in these data, and the dataset can be obtained from the GAPIT website ( https://zzlab.net/GAPIT/index.html , accessed on May 1, 2022). The phenotype data used here were simulated with a custom R simulation function, and the Mean and GbyE phenotype files were then calculated. The genotype data were converted to HapMap format using PLINK v1.09 and custom scripts.

Simulated traits

Phenotype simulation was performed by modifying the GAPIT.Phenotype.Simulation function in GAPIT. Based on the input parameter NQTN, the genotypes of randomly selected markers from the whole genome were used to simulate the genetic effect of the simulated trait. The genotype effects of these selected QTNs were randomly sampled from a multivariate normal distribution, and the correlation between these normal distributions defined the genetic relationship between environments. The additive heritability ( \(h_{g}^{2}\) ) was used to scale the relationship between additive genetic variance and phenotypic variance. The simulated phenotype conditions in this paper were set as follows: 1) three levels of \(h_{g}^{2}\), 0.8, 0.5, and 0.2, representing high ( \(h_{h}^{2}\) ), medium ( \(h_{m}^{2}\) ) and low ( \(h_{l}^{2}\) ) heritability; 2) three levels of genetic correlation, 0.8, 0.5, and 0.2, representing high ( \(R_{h}\) ), medium ( \(R_{m}\) ) and low ( \(R_{l}\) ) genetic correlation; 3) 20 pre-set QTL effect loci. The phenotype values in each environment were simulated together following the above parameters.
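The simulation scheme described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' R code or the GAPIT function: the name `simulate_two_env`, the unit-variance effect distribution, and the way residual variance is derived from the heritability are all choices made for this example.

```python
import numpy as np


def simulate_two_env(G, n_qtn=20, h2=0.5, r_g=0.5, seed=0):
    """Simulate phenotypes in two environments from a numeric genotype matrix G (n x m).

    Hypothetical helper: QTN effects for the two environments are drawn from a
    bivariate normal with correlation r_g, and noise is scaled so that the
    genetic variance is a fraction h2 of the phenotypic variance.
    """
    rng = np.random.default_rng(seed)
    n, m = G.shape
    qtn = rng.choice(m, size=n_qtn, replace=False)       # randomly selected markers
    cov = np.array([[1.0, r_g], [r_g, 1.0]])             # genetic correlation
    effects = rng.multivariate_normal(np.zeros(2), cov, size=n_qtn)  # (n_qtn, 2)
    genetic = G[:, qtn] @ effects                         # (n, 2) genetic values
    var_g = genetic.var(axis=0)
    var_e = var_g * (1.0 - h2) / h2                       # so var_g / (var_g + var_e) = h2
    noise = rng.normal(0.0, np.sqrt(var_e), size=(n, 2))
    return genetic + noise, qtn


# Example with a made-up genotype matrix coded 0/1/2.
demo_G = np.random.default_rng(1).integers(0, 3, size=(100, 200))
demo_Y, demo_qtn = simulate_two_env(demo_G, n_qtn=20, h2=0.8, r_g=0.2)
```

Each column of the returned matrix is the phenotype in one environment; lowering `r_g` makes the QTN effects in the two environments less alike.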

Genetic by environment interaction model

The GbyE analysis pipeline includes three steps: data preprocessing, GbyE conversion, and association analysis. The phenotype data matrix Y of the dual environment is normalized and converted by GbyE to generate phenotype data in GbyE.Y format. Genotype data in formats such as HapMap, VCF, and BED first need to be converted into numerical genotype format (homozygotes coded as 0 or 2, heterozygotes coded as 1) using software or scripts such as GAPIT or PLINK. The environment (E) matrix is the environment index matrix. Through the Kronecker product, the original genotype matrix G (n × m) is converted into GbyE.GD (2n × 2m), \(\left[\begin{array}{cc}G& 0\\ G& G\end{array}\right]\), and the Y vector (n × 1) is converted into the GbyE.Y vector (2n × 1) after normalization. The duplicated data format indicates different environments, genetic effects, and populations. The genomic data used in the analysis still retain the whole-genome information. The first column of E represents the additive effect, which is the average genetic effect among environments. The other columns of E represent the interactive effects; there is one fewer of them than the number of environments, to avoid linear dependence in the model. In the GbyE algorithm, the first environment is coded as the background by default, meaning that in the interactive part the first environment is coded as 0 and the others as 1. The Kronecker product of G and the environment index matrix is named GbyE.GD. The interactive-effect part of the GbyE.GD matrix in GWAS and GS therefore gives values relative to the first environment (Fig. 1). The GbyE environmental interaction matrix is obtained by constructing the design interaction matrix E (Eq. 1) and taking the Kronecker product of the genotype matrix G with E (Eq. 2), in which the \(\left[\begin{array}{c}G\\ G\end{array}\right]\) matrix is defined as the additive effect and the \(\left[\begin{array}{c}0\\ G\end{array}\right]\) matrix is defined as the interactive effect. The \(\left[\begin{array}{cc}G& 0\\ G& G\end{array}\right]\) matrix is called the gene by environment interaction matrix, hereinafter referred to as the GbyE matrix. After transformation by GbyE, the phenotype file (GbyE.Y) and genotype file (GbyE.GD) are input into the GWAS and GS models and computed as standard phenotype and genotype files.

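\[E=\left[\begin{array}{cc}1& 0\\ 1& 1\end{array}\right] \tag{1}\]

\[\text{GbyE.GD}=E\otimes G=\left[\begin{array}{cc}G& 0\\ G& G\end{array}\right] \tag{2}\]

where G is the whole genotype matrix and E is the design matrix for exploring interactive effects. GbyE mainly uses the Kronecker product of the genetic matrix (G) and the environmental matrix (E) as the genotype for subsequent GWAS, as a way to distinguish between additive and interactive effects.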
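The block structure of the GbyE matrix can be checked numerically. The following is a minimal NumPy sketch for two environments, not the GbyE.Calculate.R implementation: the function name `gbye_transform` and the z-score normalization are assumptions made for this illustration.

```python
import numpy as np


def zscore(y):
    """Normalize one environment's phenotype vector to mean 0 and unit variance."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std()


def gbye_transform(G, y_env1, y_env2):
    """Build the (2n x 2m) genotype matrix and (2n,) phenotype vector for two environments.

    Hypothetical helper: E codes the additive effect in its first column and the
    interaction (relative to environment 1) in its second column.
    """
    E = np.array([[1, 0],
                  [1, 1]])
    GD = np.kron(E, G)  # block matrix [[G, 0], [G, G]]
    Y = np.concatenate([zscore(y_env1), zscore(y_env2)])
    return GD, Y


# Example with a tiny made-up genotype matrix (2 individuals, 2 markers).
G_small = np.array([[0, 1],
                    [2, 1]])
GD_small, Y_small = gbye_transform(G_small, [1.0, 3.0], [2.0, 6.0])
```

The left half of the result (columns 1 to m) carries the additive genotypes for both environments, and the right half is zero for environment 1 and equal to G for environment 2.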

Fig. 1 The workflow pipeline of GbyE. GbyE contains three main steps. (Step 1) Preprocessing of phenotype and genotype data. The phenotype values in each environment are normalized separately. Meanwhile, genotypes in HapMap, VCF, BED, and other formats are converted to numeric genotypes; (Step 2) Generation of the GbyE phenotype and interactive genotype matrix through the GbyE transformation. In the GbyE.GD matrix, the blue characters indicate the additive effect, and the red ones indicate the interactive effect; (Step 3) MLM, rrBLUP, and BGLR are used to perform GWAS and GS

Association analysis model

The mixed linear model (MLM) of GAPIT is used as the basic model for GWAS analysis, with the principal component analysis (PCA) parameter set to 3. The p-values of the detection results are then sorted, and their power and FDR values are calculated. The general expression of the MLM (Fig. 1) is:

\[Y=PCA+{SNP}_{i}+Z\mu +e\]

where Y is the vector of phenotypic measures (2n × 1); PCA and \({SNP}_{i}\) are defined as fixed effects, with a size of (2n × 2m); Z is the incidence matrix of random effects; μ is the random effect vector, which follows the normal distribution μ ~ N(0, \({\delta }_{G}^{2}\) K) with a mean vector of 0 and a variance-covariance matrix of \({\delta }_{G}^{2}\) K, where \({\delta }_{G}^{2}\) is the total genetic variance, including additive variance and interactive variance, and K is the kinship matrix built with all genotypes, including additive and interactive genotypes; e is the random error vector, e ~ N(0, \({\delta }_{e}^{2}\) I), where \({\delta }_{e}^{2}\) is the residual and environment variance and I is the identity matrix.

Detectivity of GWAS

In the GWAS results, the list of markers ordered by P-value was used to evaluate the detectivity of GWAS methods. When all simulated QTNs were detected, the power of the GWAS method was considered to be 1 (100%). Moving down the marker list, the power increases as more real QTNs are detected. The FDR is the ratio of the number of non-QTN markers wrongly declared as real QTNs to the number of all non-QTNs. The mean of 100 cycles was used as the reference value for statistical power comparison. For comparison, we used a method that is common in GWAS research with multiple traits or environmental phenotypes [ 22 ]. This method, called the Mean value method, takes the mean of the phenotypic values under different conditions as the phenotypic value for GWAS analysis. The results of GbyE, for both its additive and interactive effects, were compared with those of the Mean method to evaluate the detection power of the GbyE strategy. Through the combined analysis of these evaluation indicators, we aim to comprehensively evaluate the statistical power of the GbyE strategy in GWAS and provide a reference for future optimization research.

The formulae for calculating Power and FDR are as follows:

\[Power=\frac{\sum_{i}{n}_{i}}{{m}_{r}}\]

where \(n_{i}\) indicates whether the i-th detection is a true QTN (1 if true, 0 if false); \(m_{r}\) is the total number of true QTLs in the sample; the maximum value of Power is 1.

\[FDR=\frac{\sum_{i}{N}_{i}}{{M}_{f}}\]

where \(N_{i}\) indicates whether the i-th detection is a pseudo-QTN, that is, a false detection (1 if true, 0 if false), accumulated over the detections; \(M_{f}\) is the number of all markers labeled as non-QTNs in the total sample; the maximum value of FDR is 1.
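The cumulative calculation along the ranked marker list can be sketched as follows. This is an illustrative NumPy version, not the GbyE.Power.FDR.R code, and the function name `power_fdr_curve` is hypothetical. It follows the denominators defined in the text, so the FDR here is the share of all non-QTN markers declared so far, which differs from the more common definition of false discoveries divided by all discoveries.

```python
import numpy as np


def power_fdr_curve(p_values, is_qtn):
    """Return cumulative Power and FDR after each marker in the P-value ranking.

    p_values : one GWAS P-value per marker
    is_qtn   : True where the marker is a simulated (true) QTN
    """
    order = np.argsort(p_values)                 # most significant marker first
    hits = np.asarray(is_qtn, dtype=bool)[order]
    power = np.cumsum(hits) / hits.sum()         # true QTNs found / all true QTNs
    fdr = np.cumsum(~hits) / (~hits).sum()       # false detections / all non-QTNs
    return power, fdr


# Example with four made-up markers, two of which are true QTNs.
demo_power, demo_fdr = power_fdr_curve(
    p_values=[0.01, 0.5, 0.001, 0.2],
    is_qtn=[True, False, True, False],
)
```

Plotting `demo_power` against `demo_fdr` gives one Power-FDR curve; averaging such curves over repeated simulations corresponds to the 100-cycle mean described above.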

Genomic prediction

To compare the prediction accuracy of different GS models using GbyE, we ran rrBLUP and Bayesian methods using R packages. All phenotypes of the reference population and the genotypes of the whole population were used to train the model and predict the genomic estimated breeding values (gEBV) of all individuals. The correlation between the real phenotypes and the gEBV of the inference population was taken as the prediction accuracy. Fivefold cross-validation with 100 repeats was performed to avoid over-prediction and reduce bias. To distinguish the additive and interactive effects in GbyE, we defined two lists, one for additive and one for interactive effects, in the "ETA" argument of BGLR, and put the additive and interactive effects into the model as two kinships for random effects. However, the two lists of effects could not be loaded separately in rrBLUP, so the additive and interactive genotypes were used together to calculate the whole genetic kinship in rrBLUP (Fig. 1). The relevant parameters in BGLR were set as follows: 1) model set to "BRR"; 2) nIter set to 12000; 3) burnIn set to 10000. The results of the above operations were averaged over 100 cycles. In addition to BRR, we validated the GbyE method using four other Bayesian methods in BGLR (BayesA, BayesB, BayesCpi, and Bayesian LASSO).
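The cross-validation scheme can be illustrated with a small stand-in model. The sketch below uses plain ridge regression in NumPy in place of rrBLUP or BGLR, with a fixed penalty `lam` that is an assumption of this example (rrBLUP estimates the variance components from the data instead). The function names are hypothetical, and accuracy is the correlation between predicted and observed values in the held-out fold, as in the text.

```python
import numpy as np


def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Ridge prediction in dual form, which stays cheap when markers outnumber individuals."""
    mu = y_train.mean()
    K = X_train @ X_train.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    return mu + X_test @ X_train.T @ alpha


def cv_accuracy(X, y, k=5, repeats=100, lam=1.0, seed=0):
    """Mean correlation between predictions and held-out phenotypes over repeated k-fold CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accuracies = []
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n), k)
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            pred = ridge_predict(X[train_idx], y[train_idx], X[test_idx], lam)
            accuracies.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accuracies))


# Example on made-up data with a strong linear signal.
_rng = np.random.default_rng(1)
demo_X = _rng.normal(size=(60, 30))
demo_y = demo_X @ _rng.normal(size=30) + 0.1 * _rng.normal(size=60)
demo_acc = cv_accuracy(demo_X, demo_y, k=5, repeats=2)
```

Passing the GbyE.GD matrix as `X` and GbyE.Y as `y` would mimic the rrBLUP setting in which additive and interactive genotypes are used together.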

Partial missing phenotype in the prediction

In this study, we artificially removed phenotype values in one environment or in both environments, using the whole population of the dataset of 281 maize inbred lines. In the single-environment case, the inference set in the cross-validation was selected from the whole population, and each individual in the inference set had its phenotype removed in one environment only. The phenotype in the other environment was kept, and the genotypes were always kept. In the double-environment case, both phenotypes and genotypes of environment 1 and environment 2 are missing, and the model can only predict the phenotypic values in the two missing environments through the effects of other markers. In addition, the analysis was run on both standardized and unstandardized data to assess whether standardization affected model estimation. This experiment used the "ML" method in rrBLUP to ensure the efficiency of the model.
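As a rough illustration of the two masking schemes for phenotypes, the sketch below hides values for the inference individuals in one or both environments. The data layout (a dictionary keyed by line and environment), the names, and the values are hypothetical and are not taken from the study.

```python
def mask_phenotypes(phenotypes, inference_ids, hidden_envs):
    """Return a copy of the phenotypes with the chosen environments hidden
    (set to None) for the individuals in the inference set."""
    masked = {ind: dict(by_env) for ind, by_env in phenotypes.items()}
    for ind in inference_ids:
        for env in hidden_envs:
            masked[ind][env] = None
    return masked


# Hypothetical phenotypes of three lines in two environments.
phenotypes = {
    "line_A": {"env1": 10.2, "env2": 11.5},
    "line_B": {"env1": 9.8, "env2": 10.1},
    "line_C": {"env1": 12.0, "env2": 12.4},
}

# Single-environment case: line_B keeps its env2 phenotype.
single = mask_phenotypes(phenotypes, ["line_B"], ["env1"])
# Double-environment case: line_B has no phenotype in either environment.
double = mask_phenotypes(phenotypes, ["line_B"], ["env1", "env2"])
print(single["line_B"], double["line_B"])
```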

GWAS statistical power of models at different heritabilities and genetic correlations

Power-FDR plots were used to show the detection efficiency of GbyE at three levels of genetic correlation and three levels of heritability, giving nine simulated scenarios in total (genetic correlation varies from left to right and heritability from top to bottom). To determine whether the improved detection ability of GbyE in genome-wide association analysis comes from the additive effect or from the environmental interaction effect, we plotted their Power-FDR curves separately and added the traditional Mean method for comparison. As shown in Fig. 2, the GbyE algorithm detects more statistically significant genetic loci at lower FDR under any genetic background. However, in the low-heritability combinations (Fig. 2 A, B, C), the interactive effect detected more real loci than GbyE at low FDR, whereas GbyE detected more real loci than the other groups as FDR continued to increase. In the high-heritability combinations, all groups have high statistical power at low FDR, but the advantage of GbyE becomes more apparent as FDR increases. Across all heritability combinations, heritability has no obvious influence on the interactive effect, but GbyE always maintains the highest statistical power. The average detection power of GWAS with GbyE increased by about 20%, and the advantage of GbyE grows as genetic correlation decreases, indicating that G × E plays a role.

figure 2

The power-FDR testing in simulated traits. The efficacy of the GbyE algorithm is compared with that of the conventional Mean method in terms of detection power and FDR. From left to right, the three levels of genetic correlation are shown in the order low, medium and high. From top to bottom, the three levels of heritability are shown in the order low, medium and high. (1) Inter: interactive section extracted from GbyE; (2) AddE: additive section extracted from GbyE; (3) \({{\text{h}}}_{{\text{l}}}^{2}\) , \({{\text{h}}}_{{\text{m}}}^{2}\) , \({{\text{h}}}_{{\text{h}}}^{2}\) : low, medium and high heritability; (4) \({{\text{R}}}_{{\text{l}}}\) , \({{\text{R}}}_{{\text{m}}}\) , \({{\text{R}}}_{{\text{h}}}\) : low, medium and high genetic correlation, where R stands for genetic correlation

Resolution of additive and interactive effect

The output of GbyE can be understood as a resolution of the additive and interactive genetic effects. Hence, we created a combined Manhattan plot with the Mean result from MLM and the additive and interactive results from GbyE. As shown in Fig. 3, true marker loci were detected on chromosomes 1, 6 and 9 in Mean, and the same loci were detected on chromosomes 1 and 6 in the additive result of GbyE (loci detected jointly by the two results are marked with solid gray lines in the figure). All known pseudo QTNs are labeled with gray dots in circles. In total, 20 pseudo QTNs were simulated for this trait (heritability set to 0.9 and genetic correlation set to 0.1). Although the additive section of GbyE did not detect the locus on chromosome 9 (the p-values of its markers did not pass the significance threshold, p-value < 3.23 × 10⁻⁶), that locus still showed high significance relative to the other markers on the same chromosome. In the interactive section of GbyE, we detected additional significant loci on chromosomes 1, 2, 3 and 10 that were not detected in either of the two previous sections. A combined QQ plot (Fig. 3 D) shows that the overall statistical power of Mean and of the additive section of GbyE is close, whereas the interactive section of GbyE shows slight inflation.
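The text does not say how the significance threshold of 3.23 × 10⁻⁶ was derived. One common choice in GWAS is a Bonferroni correction, which divides the significance level by the number of markers tested. The sketch below shows one pair of values that reproduces the threshold; both the significance level of 0.01 and the marker count of 3,093 are assumptions made here for illustration, not figures stated in the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-marker P-value cutoff under a Bonferroni correction."""
    return alpha / n_tests


alpha = 0.01      # assumed family-wise significance level
n_markers = 3093  # assumed number of markers tested
threshold = bonferroni_threshold(alpha, n_markers)
print(f"{threshold:.2e}")
```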

figure 3

Manhattan statistical comparison plot. Manhattan comparison plots of the Mean ( A ), additive ( B ) and gene-environment interactive ( C ) sections at a heritability of 0.9 and a genetic correlation of 0.1. Different colors distinguish the chromosomes (X-axis). Loci marked with a circle and a center dot are the simulated real QTNs. Loci found in both parts are shown as solid lines, and loci found only in the interactive effect are shown as dashed lines. The horizontal lines indicate the significance threshold ( p -value < 3.23 × 10⁻⁶). D Quantile–quantile plots of the simulated phenotypes for the demo data from genome-wide association studies. The x-axis shows the expected values of log p -values and the y-axis shows the observed values of log p -values. The red diagonal has a slope of 1. GbyE-inter is the interactive section in GbyE; GbyE-AddE is the additive section in GbyE

Genomic selection in assumption codistribution

The prediction accuracy of GbyE under the rrBLUP model was significantly higher than that of the Mean value method in most combinations of heritability and genetic correlation (Fig. 4). The prediction accuracy of the additive effect was close to that of the Mean value method, which was consistent with the situation under low heritability. The prediction accuracy of the interactive section in GbyE remains at the same level as that of GbyE as a whole, so the interactive section plays an important role in the model. In \({{\text{h}}}_{{\text{l}}}^{2}{{\text{R}}}_{{\text{h}}}\) (Fig. 4 C), \({{\text{h}}}_{{\text{m}}}^{2}{{\text{R}}}_{{\text{h}}}\) (Fig. 4 F) and \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{l}}}\) (Fig. 4 G), the prediction accuracy of GbyE was slightly higher than that of the Mean value method, but the difference was not significant. Only in \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{m}}}\) (Fig. 4 H) was the prediction accuracy of GbyE slightly lower than that of the Mean value method, again without a significant difference. Under the combination of low heritability and low genetic correlation, the prediction accuracies of the Mean value method and the additive effect model remained at a similar level. However, as heritability and genetic correlation increase, the difference in prediction accuracy between the two gradually increases. In summary, under the rrBLUP model the GbyE algorithm can improve the accuracy of GS by capturing information on effects across multiple environments or traits.

figure 4

Box-plot of model prediction accuracy. The prediction accuracy (Pearson's correlation coefficient) of the GbyE algorithm was compared with that of the traditional Mean value method in a simulation experiment of genomic selection under rrBLUP. The experiment simulated the effect of different levels of heritability and genetic correlation on the prediction accuracy of genomic selection. Each row from top to bottom represents low heritability ( \({{\text{h}}}_{{\text{l}}}^{2}\) ), medium heritability ( \({{\text{h}}}_{{\text{m}}}^{2}\) ) and high heritability ( \({{\text{h}}}_{{\text{h}}}^{2}\) ), respectively; each column from left to right represents low genetic correlation ( \({{\text{R}}}_{{\text{l}}}\) ), medium genetic correlation ( \({{\text{R}}}_{{\text{m}}}\) ) and high genetic correlation ( \({{\text{R}}}_{{\text{h}}}\) ), respectively. The X-axis shows the different test methods and effects, and the Y-axis shows the prediction accuracy

Genomic selection in assumption un-codistribution

The overall performance of GbyE under the 'BRR' statistical model in the BGLR package remained consistent with that under rrBLUP, maintaining high prediction accuracy in most combinations of heritability and genetic correlation (Fig. S1). However, when heritability is low or medium, the difference in prediction accuracy between the GbyE algorithm and the Mean value method gradually decreases as genetic correlation increases, and there is no statistically significant difference between the two. When heritability is high, the prediction accuracy of GbyE in \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{l}}}\) (Fig. S1 G) and \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{h}}}\) (Fig. S1 I) is significantly higher than that of the Mean value method. In contrast, when the genetic correlation is medium, GbyE shows no significant difference from the Mean value method in prediction accuracy, and the overall mean of GbyE is lower than that of Mean. When heritability is relatively high and genetic correlation is low, the prediction accuracy of GbyE is significantly higher than that of the Mean method, as in \({{\text{h}}}_{{\text{m}}}^{2}{{\text{R}}}_{{\text{l}}}\) (Fig. S1 D), \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{l}}}\) (Fig. S1 G), and \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{m}}}\) (Fig. S1 H). Therefore, GbyE is more suitable for situations with high heritability and low genetic correlation.

Adaptability of Bayesian models

Next, we tested more complex Bayesian models. The GbyE algorithm and the Mean value method were each combined with five Bayesian algorithms in BGLR for GS analysis, and the R script was used for a phenotypic simulation test in which heritability and genetic correlation were both set to 0.5. Under three of the Bayesian models (BRR, BayesA, and BayesLASSO), the prediction accuracy of GbyE is significantly higher than that of the Mean value method (Fig. 5). In contrast, under BayesB and BayesCpi, the prediction accuracy of GbyE is lower than that of the Mean value method. Using information from G × E, the GbyE algorithm increases the prediction accuracy of BRR, BayesA, and BayesLASSO by 9.4%, 9.1%, and 11%, respectively, relative to the Mean value method. However, the prediction accuracy decreased by 11.3% under BayesB and by 6% under BayesCpi.
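The relative figures quoted above are ratios against the Mean value method within the same model. A minimal sketch of that arithmetic follows; the two accuracy values are hypothetical, chosen only so that the result matches the 9.4% quoted for BRR, and are not values reported in the study.

```python
def relative_accuracy(acc_method, acc_baseline):
    """Accuracy of a method with the baseline normalized to 1."""
    return acc_method / acc_baseline


def percent_change(acc_method, acc_baseline):
    """Percentage increase (positive) or decrease (negative) vs. the baseline."""
    return (relative_accuracy(acc_method, acc_baseline) - 1.0) * 100.0


acc_mean = 0.500  # hypothetical accuracy of the Mean value method
acc_gbye = 0.547  # hypothetical accuracy of GbyE under the same model
change = percent_change(acc_gbye, acc_mean)
print(f"{change:+.1f}%")
```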

figure 5

Relative prediction accuracy histogram for different Bayesian models. The X-axis shows the Bayesian approaches in BGLR, and the Y-axis shows the relative prediction accuracy. The prediction accuracy of Mean is normalized to 1 in every model; the value shown for GbyE is its increase or decrease relative to Mean within the same model

Impact of all and partial environmental missing

We tested missing environments using simulated data. We simulated nine scenarios with different combinations of heritability and genetic correlation (Fig. 6) and tested both single- and dual-environment missing. With a single environment missing, the prediction accuracy of the GbyE algorithm was significantly higher than that of the Mean value method, regardless of the combination of heritability and genetic correlation. In \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{h}}}\) , the prediction accuracy of GbyE is higher than 0.5, which is the highest value among all simulated combinations. When GbyE estimates the phenotypic values of Environment 1 and Environment 2 separately, its prediction accuracy seems too high. On the other hand, when the phenotypic values of both environments are missing for the same genotype, the prediction accuracy of GbyE does not decrease significantly and even remains comparable to that with a single environment missing. However, when GbyE estimates Environment 1 and Environment 2 separately, the prediction accuracy decreases significantly compared with the single-environment case, and the prediction accuracy for Environment 1 and Environment 2 in \({{\text{h}}}_{{\text{l}}}^{2}{{\text{R}}}_{{\text{m}}}\) is extremely low (Fig. 6 B). In addition, the prediction accuracy of GbyE is lower than that of the Mean value method only in \({{\text{h}}}_{{\text{l}}}^{2}{{\text{R}}}_{{\text{h}}}\) , whether one or both environments are missing.

figure 6

Prediction accuracy of simulated data in single and dual environment absence. The prediction effect of GbyE was divided into two parts, environment 1 and environment 2, to compare the prediction accuracy of GbyE when predicting these two parts separately. This includes simulations with missing phenotypes and genotypes in environment 1 only ( A ) and simulations with missing in both environments ( B ). The horizontal coordinates of the graph indicate the different combinations of heritabilities and genetic correlations of the simulations

The phenotype of an organism is usually controlled by multiple factors, mainly genetic [23] and environmental factors [24] and their interactions. The phenotype of a quantitative trait is often influenced by all three [25, 26]. However, because of computational limitations and the lack of dedicated tools, the interactive effect has been ignored in most GWAS and GS research, and additive and interactive effects have been difficult to distinguish. The ratio of the additive genetic variance to the phenotypic variance is called the narrow-sense heritability. The squared prediction accuracy of an additive GS model is generally considered unable to exceed the narrow-sense heritability. In this study, the additive effects in GbyE are essentially equivalent in detectability to those of traditional models; the key advantage of GbyE is the interactive section, in which more significant markers with interactive effects were detected. Detecting two genetic effects (additive and interactive) in GWAS and GS increases computational complexity, and obtaining the genotypes for the genetic interactions through a Kronecker product is an efficient way to do it. This allows the additive and interactive genetic effects to be estimated separately during the analysis. Finally, the estimated genetic effect of each GbyE genotype (covering both additive and interactive markers) is tested against a t-distribution to calculate its p -value, and the significance of each genotype is assessed with multiple testing. GbyE also extends the estimated heritability to a generalized heritability, which can be interpreted as the ratio of the total genetic variance to the phenotypic variance.
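To make the Kronecker-product idea concrete, here is a minimal sketch for two environments. The environment design matrix `E`, the toy genotypes `G`, and the `kron` helper are illustrative assumptions and not the GbyE source code; the tool's actual coding may differ. With this `E`, the first block of columns carries the additive genotypes shared by both environments and the second block carries the interaction genotypes, which are zero in environment 1.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


# Assumed environment design: rows are environments,
# columns are (additive, interaction contrast).
E = [[1, 0],
     [1, 1]]

# Toy genotypes: 3 individuals x 2 markers, coded 0/1/2.
G = [[0, 2],
     [1, 1],
     [2, 0]]

# Rows: individuals in environment 1, then the same individuals in environment 2.
# Columns: additive markers first, then interaction markers.
gbye_genotype = kron(E, G)
for row in gbye_genotype:
    print(row)
```

The phenotype vector would be stacked in the same order (environment 1, then environment 2), so that a single association or prediction run sees both the additive and the interaction columns.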

The genetic correlation among traits in multiple environments is the main intrinsic factor underlying GbyE. When the genetic correlation is high, additive genetic effects dominate the total genetic effect, and interactive genetic effects across traits or environments are usually small [27]. Therefore, the statistical power of the GbyE algorithm did not improve significantly over the traditional method (Mean value) when high genetic correlation was simulated. In contrast, when the genetic correlation is low, the genetic variance of the additive effects is relatively low and the genetic variance of the interactive effects dominates. In this situation, GbyE draws on multiple environments or traits to raise statistical power. Because the GbyE algorithm obtains additive, environmental, and interactive information by encoding numerical genotypes, it only increases the volume of SNP data and can be applied to any traditional GWAS statistical model. This may slightly increase the computing time of the GWAS model, but unlike other multi-environment or multi-trait models [28, 29], GbyE needs only one complete traditional GWAS run to obtain the results.

In GS, the rrBLUP algorithm is a prediction method based on a linear mixed model that assumes all markers contribute genetic effects whose values follow a normal distribution [30]. In contrast, the BGLR model is a linear mixed model which assumes that gene effects are randomly drawn from a multivariate normal distribution and genotype effects are randomly drawn from a multivariate Gaussian process; it takes potential pleiotropy and polygenic effects into account and allows the effects of single genes to be inferred while genomic values are estimated [31]. The algorithm typically uses Markov chain Monte Carlo methods to estimate the ratio between genetic and residual variances [32, 33]. Because this model already accounts for more biological features and complexity, the overall improvement of GbyE relative to the Mean method is smaller under BGLR. In addition, the Markov chain length set in the BGLR package is often above 20,000, so that the chain runs long enough to stabilize and yield stable parameters [34]. GbyE improves the statistical power of the model under most Bayesian statistical models. For the phenotypes we simulated, limits on computation time prevented us from running more iterations for the BayesB and BayesCpi models, which causes their low prediction accuracy. It is worth noting that the prediction accuracy of BayesCpi may also be influenced by the number of QTLs [35], and the prediction accuracy of BayesB is often related to the distribution of allele frequencies (from rare to common variants) at random loci [36].

The overall statistical power of GbyE was significantly higher with a single environment missing than with both environments missing, because in the single-environment case GbyE can take full advantage of the phenotype information from the second environment. The correlation between the two environments can also affect the detectability of the GbyE algorithm in different ways. On the one hand, a high correlation between two environments can improve the prediction accuracy of the GbyE algorithm by using the information from one environment to predict the breeding values in the other, even if the relationship with that environment is weak [37, 38]. On the other hand, when two environments are nearly uncorrelated, a GbyE model trained in one environment may not transfer well to the other, which may reduce prediction accuracy [39]. In testing, we found that when the GbyE algorithm uses a GS model trained in one environment and tested in another, a high correlation between environments may result in the model capturing similarities between environments that are unrelated to G × E information [40]. However, when estimating the breeding values for each environment separately, GbyE still made effective predictions using the genotypes in that environment and maintained high prediction accuracy. As expected, the additive effect captures the average genetic effect across environments, and its predictive performance differs little from that of the Mean method. The interactive effect, however, has one fewer column than the number of environments and captures the relative values between environments, a component that directly affects predictive performance. The correlation between the two environments may have both positive and negative effects on the detectability of GbyE, so the relationship between the two environments should be considered carefully in subsequent development and testing.

A key advantage of the GbyE algorithm is that it can be applied to almost all current genome-wide association and prediction methods. However, the focus of GbyE remains on estimating additive and interactive effects separately, which makes it easy to determine which portion of the genetic effect is playing a role in the estimation. The GbyE algorithm may have implications for the design of future GS studies. For example, the model could be used to identify the best environments or traits to include in GS studies in order to maximize prediction accuracy. It is particularly important to test the model on large datasets with different genetic backgrounds and environmental conditions to ensure that it can accurately predict genome-wide effects in a variety of contexts.

GbyE can simulate the effects of gene-environment interactions by building genotype files for multiple environments or multiple traits, normalizing the effects of multiple environments and multiple traits on marker effects. It also enables higher statistical power and prediction accuracy for GWAS and GS. The additive and interactive effects of genes can be resolved clearly, which makes it possible to use environmental information to improve the statistical power and prediction accuracy of traditional models and thus helps us better understand the interactions between genes and the environment.

Availability of data and materials

The GbyE source code, demo script, and demo data are freely available on the GitHub website ( https://github.com/liu-xinrui/GbyE ).

Abbreviations

GWAS: Genome-wide association study

GS: Genomic selection

G × E: Genetic by environmental interaction

GAPIT: Genome association and prediction integrated tool

MLM: Mixed linear model

BGLR: Bayesian generalized linear regression

rrBLUP: Ridge regression best linear unbiased prediction

FDR: False discovery rate

PCA: Principal component analysis

gEBV: Genomic estimated breeding value

References

1. Maazi H, Hartiala JA, Suzuki Y, Crow AL, Shafiei Jahani P, Lam J, Patel N, Rigas D, Han Y, Huang P. A GWAS approach identifies Dapp1 as a determinant of air pollution-induced airway hyperreactivity. PLoS Genet. 2019;15(12):e1008528.

2. Simonds NI, Ghazarian AA, Pimentel CB, Schully SD, Ellison GL, Gillanders EM, Mechanic LE. Review of the gene-environment interaction literature in cancer: what do we know? Genet Epidemiol. 2016;40(5):356–65.

3. Wang X, Chen H, Kapoor PM, Su Y-R, Bolla MK, Dennis J, Dunning AM, Lush M, Wang Q, Michailidou K. A Genome-Wide Gene-Based Gene-Environment Interaction Study of Breast Cancer in More than 90,000 Women. Cancer Research Communications. 2022;2(4):211–9.

4. Chen R-X, Dai M-D, Zhang Q-Z, Lu M-P, Wang M-L, Yin M, Zhu X-J, Wu Z-F, Zhang Z-D, Cheng L. TLR Signaling Pathway Gene Polymorphisms, Gene-Gene and Gene-Environment Interactions in Allergic Rhinitis. Journal of Inflammation Research. 2022;15:3613–30.

5. Zhao M-Z, Song X-S, Ma J-S. Gene × environment interaction in major depressive disorder. World Journal of Clinical Cases. 2021;9(31):9368.

6. Falconer DS. The problem of environment and selection. Am Nat. 1952;86(830):293–8.

7. Kim J, Zhang Y, Pan W. Powerful and adaptive testing for multi-trait and multi-SNP associations with GWAS and sequencing data. Genetics. 2016;203(2):715–31.

8. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics. 2017;101(1):5–22.

9. van Os J, Rutten BP. Gene-environment-wide interaction studies in psychiatry. Am J Psychiatry. 2009;166(9):964–6.

10. Winham SJ, Biernacka JM. Gene–environment interactions in genome-wide association studies: current approaches and new directions. Journal of Child Psychology and Psychiatry. 2013;54(10):1120–34.

11. Windhausen VS, Atlin GN, Hickey JM, Crossa J, Jannink J-L, Sorrells ME, Raman B, Cairns JE, Tarekegne A, Semagn K. Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments. G3: Genes|Genomes|Genetics. 2012;2(11):1427–36.

12. Xu S, Zhu D, Zhang Q. Predicting hybrid performance in rice using genomic best linear unbiased prediction. Proc Natl Acad Sci. 2014;111(34):12456–61.

13. Zhao Y, Mette M, Gowda M, Longin C, Reif J. Bridging the gap between marker-assisted and genomic selection of heading time and plant height in hybrid wheat. Heredity. 2014;112(6):638–45.

14. Crossa J, Perez P, Hickey J, Burgueno J, Ornella L, Cerón-Rojas J, Zhang X, Dreisigacker S, Babu R, Li Y. Genomic prediction in CIMMYT maize and wheat breeding programs. Heredity. 2014;112(1):48–60.

15. Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de los Campos G, Burgueño J, González-Camacho JM, Pérez-Elizalde S, Beyene Y. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 2017;22(11):961–75.

16. Roorkiwal M, Jarquin D, Singh MK, Gaur PM, Bharadwaj C, Rathore A, Howard R, Srinivasan S, Jain A, Garg V. Genomic-enabled prediction models using multi-environment trials to estimate the effect of genotype × environment interaction on prediction accuracy in chickpea. Sci Rep. 2018;8(1):11701.

17. Burgueño J, de los Campos G, Weigel K, Crossa J. Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Science. 2012;52(2):707–19.

18. Jarquín D, Crossa J, Lacaze X, Du Cheyron P, Daucourt J, Lorgeou J, Piraux F, Guerreiro L, Pérez P, Calus M. A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theoretical and Applied Genetics. 2014;127:595–607.

19. Wang JB, Zhang ZW. GAPIT Version 3: boosting power and accuracy for genomic association and prediction. Genomics Proteomics Bioinformatics. 2021;19(4):629–40.

20. Pérez P, de los Campos G. Genome-wide regression and prediction with the BGLR statistical package. Genetics. 2014;198(2):483–95.

21. Endelman JB. Ridge Regression and other kernels for genomic selection with R package rrBLUP. Plant Genome J. 2011;4:250–5.

22. Turley P, Walters RK, Maghzian O, Okbay A, Lee JJ, Fontana MA, Nguyen-Viet TA, Wedow R, Zacher M, Furlotte NA. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat Genet. 2018;50(2):229–37.

23. Falconer DS. Introduction to quantitative genetics. Pearson Education India; 1996.

24. Lynch M, Walsh B. Genetics and analysis of quantitative traits, vol. 1. Sunderland, MA: Sinauer; 1998.

25. Mackay TF. The genetic architecture of quantitative traits. Annu Rev Genet. 2001;35(1):303–39.

26. Visscher PM, Hill WG, Wray NR. Heritability in the genomics era – concepts and misconceptions. Nat Rev Genet. 2008;9(4):255–66.

27. Van der Sluis S, Posthuma D, Dolan CV. TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genet. 2013;9(1):e1003235.

28. O’Reilly PF, Hoggart CJ, Pomyen Y, Calboli FC, Elliott P, Jarvelin M-R, Coin LJ. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS ONE. 2012;7(5):e34861.

29. Chung J, Jun GR, Dupuis J, Farrer LA. Comparison of methods for multivariate gene-based association tests for complex diseases using common variants. Eur J Hum Genet. 2019;27(5):811–23.

30. Pérez-Rodríguez P, Gianola D, González-Camacho JM, Crossa J, Manès Y, Dreisigacker S. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat. G3: Genes|Genomes|Genetics. 2012;2(12):1595–605.

31. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23.

32. Meuwissen TH, Hayes BJ, Goddard M. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–29.

33. de los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MP. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics. 2013;193(2):327–45.

34. Andrieu C, De Freitas N, Doucet A, Jordan MI. An introduction to MCMC for machine learning. Mach Learn. 2003;50:5–43.

35. Daetwyler HD, Pong-Wong R, Villanueva B, Woolliams JA. The impact of genetic architecture on genome-wide evaluation methods. Genetics. 2010;185(3):1021–31.

36. Clark SA, Hickey JM, Van der Werf JH. Different models of genetic variation and their effect on genomic evaluation. Genet Sel Evol. 2011;43(1):1–9.

37. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42(7):565–9.

38. González-Recio O, Forni S. Genome-wide prediction of discrete traits using Bayesian regressions and machine learning. Genet Sel Evol. 2011;43:1–12.

39. Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods. 2013;9(1):1–9.

40. Gauderman WJ. Sample size requirements for matched case-control studies of gene–environment interaction. Stat Med. 2002;21(1):35–50.

Acknowledgements

Thank you to all colleagues in the laboratory for their continuous help.

This project was partially funded by the National Key Research and Development Project of China, China (2022YFD1601601), the Heilongjiang Province Key Research and Development Project, China (2022ZX02B09), the Qinghai Science and Technology Program, China (2022-NK-110), Sichuan Science and Technology Program, China (Award #s 2021YJ0269 and 2021YJ0266), the Program of Chinese National Beef Cattle and Yak Industrial Technology System, China (Award #s CARS-37), and Fundamental Research Funds for the Central Universities, China (Southwest Minzu University, Award #s ZYN2023097).

Author information

Authors and affiliations.

Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, 6110041, China

Xinrui Liu, Mingxiu Wang, Jie Qin, Yaxin Liu, Shikai Wang, Shiyu Wu, Ming Zhang, Jincheng Zhong & Jiabo Wang

Nanchong Academy of Agricultural Sciences, Nanchong, 637000, China

Contributions

JW and XL conceived and designed the project. XL managed the entire trial, conducted software code development, software testing, and visualization. MW, JQ, YL, SW, MZ and SW helped with data collection and analysis. JQ, and YL assisted with laboratory analyses. JW, and XL had primary responsibility for the content in the final manuscript. JZ supervised the research. JW designed software and project methodology. All authors approved the final manuscript. All authors have reviewed the manuscript.

Corresponding author

Correspondence to Jiabo Wang .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors have declared no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Liu, X., Wang, M., Qin, J. et al. GbyE: an integrated tool for genome widely association study and genome selection based on genetic by environmental interaction. BMC Genomics 25, 386 (2024). https://doi.org/10.1186/s12864-024-10310-5


Received: 27 December 2023

Accepted: 15 April 2024

Published: 19 April 2024

DOI: https://doi.org/10.1186/s12864-024-10310-5


  • Genomic selection

BMC Genomics

ISSN: 1471-2164

review case study methodology

IMAGES

  1. why use case study approach

  2. How To Do Case Study Analysis?

  3. how to make a methodology in case study

  4. PPT

  5. Write Online: Case Study Report Writing Guide

  6. 4 components of a systematic review

VIDEO

  1. Case Study Methodology

  2. Case Study Methodology

  3. Case study

  4. Research Design ~ How

  5. RESEARCH METHODOLOGY# CASE STUDY METHOD# VIDEO

  6. Methodological Reviews

COMMENTS

  1. Case Study Methodology of Qualitative Research: Key Attributes and

    A case study is one of the most commonly used methodologies of social research. This article attempts to look into the various dimensions of a case study research strategy, the different epistemological strands which determine the particular case study type and approach adopted in the field, discusses the factors which can enhance the effectiveness of a case study research, and the debate ...

  2. Methodology or method? A critical review of qualitative case study

    Case study methodology or method. A third of the case studies reviewed appeared to use a case report method, not case study methodology as described by principal authors (Creswell, 2013b; Merriam, 2009; Stake, 1995; Yin, 2009). Case studies were identified as a case report because of missing methodological detail and by review of the study aims ...

  3. What Is a Case Study?

    A case study is a detailed study of a specific subject, such as a person, group, place, event, organization, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research. A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used.

  4. Continuing to enhance the quality of case study methodology in health

    Purpose of case study methodology. Case study methodology is often used to develop an in-depth, holistic understanding of a specific phenomenon within a specified context. 11 It focuses on studying one or multiple cases over time and uses an in-depth analysis of multiple information sources. 16,17 It is ideal for situations including, but not limited to, exploring under-researched and real ...

  5. The case study approach

    A case study is a research approach that is used to generate an in-depth, multi-faceted understanding of a complex issue in its real-life context. It is an established research design that is used extensively in a wide variety of disciplines, particularly in the social sciences. A case study can be defined in a variety of ways (Table 5 ), the ...

  6. (PDF) Qualitative Case Study Methodology: Study Design and

    For this study, a research methodology that combines case studies and a review of the literature is used. The study used data majorly from annual PPP reports from the Ministry of Finance in Ghana ...

  7. How Qualitative Case Study

    Qualitative case study methodology (QCSM) is a useful research approach that has grown in popularity within the social sciences; however, it has received less attention in the occupational therapy literature. The current scoping review aims to explore how studies utilizing a QCSM help inform occupational therapy knowledge and practice.

  8. Perspectives from Researchers on Case Study Design

    Case study research is typically extensive; it draws on multiple methods of data collection and involves multiple data sources. The researcher begins by identifying a specific case or set of cases to be studied. Each case is an entity that is described within certain parameters, such as a specific time frame, place, event, and process.

  9. Case Study

    Definition: A case study is a research method that involves an in-depth examination and analysis of a particular phenomenon or case, such as an individual, organization, community, event, or situation. It is a qualitative research approach that aims to provide a detailed and comprehensive understanding of the case being studied.

  10. What the Case Study Method Really Teaches

    It's been 100 years since Harvard Business School began using the case study method. Beyond teaching specific subject matter, the case study method excels in instilling meta-skills in students.

  11. Methodology or method? A critical review of qualitative case study

    Improved reporting of case studies by qualitative researchers will advance the methodology for the benefit of researchers and practitioners. Despite on-going debate about credibility, and reported limitations in comparison to other approaches, case study is an increasingly popular approach among qualitative researchers. We critically analysed the methodological descriptions of published case ...

  12. Evaluating complex interventions in context: systematic, meta-narrative

    There is a growing need for methods that acknowledge and successfully capture the dynamic interaction between context and implementation of complex interventions. Case study research has the potential to provide such understanding, enabling in-depth investigation of the particularities of phenomena. However, there is limited guidance on how and when to best use different case study research ...

  13. Literature review as a research methodology: An ...

    An effective and well-conducted review as a research method creates a firm foundation for advancing knowledge and facilitating theory development (Webster & Watson, 2002). By integrating findings and perspectives from many empirical findings, a literature review can address research questions with a power that no single study has.

  14. A methodological review of qualitative case study methodology in

    Results: The review identified both case study research's applicability to midwifery and its low uptake, especially in clinical studies. Many papers included the necessary criteria to achieve rigour. The included measures of authenticity and methodology were varied. A high standard of authenticity was observed, suggesting authors considered ...

  15. Methodology minute: An overview of the case-case study design and its

    The case-case study design is a potentially useful tool for infection preventionists during outbreak or cluster investigations. This column clarifies terminology related to case-case, case-control, and case-case-control study designs. ... In this Methodology Minute column, we review the fundamentals of case-case studies, provide ...

  16. Methodology or method? A critical review of qualitative case study reports

    Differences between published case studies can make it difficult for researchers to define and understand case study as a methodology. Experienced qualitative researchers have identified case study research as a stand-alone qualitative approach (Denzin & Lincoln, 2011b). Case study research has a level of flexibility that is not readily offered ...

  17. 5 Benefits of the Case Study Method

    Through the case method, you can "try on" roles you may not have considered and feel more prepared to change or advance your career. 5. Build Your Self-Confidence. Finally, learning through the case study method can build your confidence. Each time you assume a business leader's perspective, aim to solve a new challenge, and express and ...

  18. A scoping review of 'Pacing' for management of Myalgic

    Using the framework of Arksey and O'Malley, a scoping review aims to use a broad set of search terms and include a wide range of study designs and methods (in contrast to a systematic review). This approach has the benefit of clarifying key concepts, surveying current data collection approaches, and identifying critical knowledge gaps.

  19. Deciphering the influence: academic stress and its role in shaping

    Background Nursing education presents unique challenges, including high levels of academic stress and varied learning approaches among students. Understanding the relationship between academic stress and learning approaches is crucial for enhancing nursing education effectiveness and student well-being. Aim This study aimed to investigate the prevalence of academic stress and its correlation ...

  20. What is quality in long covid care? Lessons from a national quality

    Background Long covid (post covid-19 condition) is a complex condition with diverse manifestations, uncertain prognosis and wide variation in current approaches to management. There have been calls for formal quality standards to reduce a so-called "postcode lottery" of care. The original aim of this study—to examine the nature of quality in long covid care and reduce unwarranted ...

  21. Distinguishing case study as a research method from case reports as a

    VARIATIONS ON CASE STUDY METHODOLOGY. Case study methodology is evolving and regularly reinterpreted. Comparative or multiple case studies are used as a tool for synthesizing information across time and space to research the impact of policy and practice in various fields of social research. Because case study research is in-depth and intensive, there have been efforts to simplify the method ...

  22. Determining the true value of a website: A GSA case study

    This performance metric shows you that, on average, your websites perform at 84% of a perfect 100% score, and there are a few low-performing websites at 26% performance or lower. This works for you; you know you need to get rid of your agency's low-performing websites.

  23. Methodology or method? A critical review of qualitative case study reports

    Definitions of qualitative case study research. Case study research is an investigation and analysis of a single or collective case, intended to capture the complexity of the object of study (Stake, 1995). Qualitative case study research, as described by Stake (1995), draws together "naturalistic, holistic, ethnographic, phenomenological, and biographic research methods ...

  24. Power in mixed martial arts (MMA): a case study of the ultimate

    Literature review. Mixed Martial Arts (MMA) is a combat sport that combines the rules and practices of several martial arts. In the 1990s, tournaments were held with minimal rules and were characterised by violent conduct that resulted in political and public pressure to ban the sport (Hill 2013). Political opposition was notable from senator John McCain who labelled the sport 'human ...

  25. GbyE: an integrated tool for genome widely association study and genome

    The growth and development of an organism depend on genetic effects, environmental effects, and their interaction. In recent decades, many candidate additive genetic markers and genes have been detected using genome-wide association studies (GWAS). However, because of limited computing power and a lack of practical tools, the interactive effects of markers and genes have not been clearly revealed.

  26. Sustainability

    The main aim of this article is to evaluate the impact of dynamic indicators associated with urban spaces on the environmental behavior of residents in Shanghai, China. With the city experiencing rapid urbanization and increasing environmental concerns, it is crucial to understand how the design and management of urban spaces can encourage pro-environmental attitudes and actions among the ...