Systematic Review | Step-by-Step Guide & Example

Translated on 19 August 2022 by Veronique Scharwächter. Originally published by Shaun Turney.

A systematic review is a type of review that uses formal, repeatable methods to find, select, and synthesize all the available evidence from the existing literature.

In the review, you answer a clearly formulated research question and explicitly state the methods used to arrive at the answer.

In the example used throughout this article, Boyle and colleagues answered the question: “What is the effectiveness of probiotics in reducing eczema symptoms and improving quality of life in patients with eczema?”

In this context, a probiotic is a health product that contains live microorganisms and is taken by mouth. Eczema is a common skin condition that causes red, itchy skin.

Table of contents

  • What is a systematic review?
  • Systematic review vs. meta-analysis
  • Systematic review vs. literature review
  • Systematic review vs. scoping review
  • When to conduct a systematic review
  • Pros and cons of systematic reviews
  • Step-by-step example of a systematic review
  • Frequently asked questions about systematic reviews

A review is an overview of the research that has already been done on a topic.

What sets a systematic review apart from other reviews is that its research methods are designed to limit bias. The methods are repeatable (replicable), and the approach is formal and systematic:

  • Formulate a research question
  • Develop a protocol
  • Search for all relevant literature
  • Apply the selection criteria
  • Extract the data
  • Synthesize the data
  • Write and publish a report

Although several guidelines for systematic reviews exist, the Cochrane Handbook for Systematic Reviews is one of the most widely used. The handbook gives detailed guidance on how to carry out each step of the systematic review process.

Systematic reviews are most common in medical research, but they are also used in other fields.

In a systematic review, you typically answer the research question by synthesizing all the available evidence and then evaluating its quality.

Synthesizing means bringing together different pieces of information to tell one coherent story. The synthesis can be narrative (qualitative), quantitative, or both.

Systematic reviews often provide a quantitative synthesis of the evidence by means of a meta-analysis. A meta-analysis is a statistical analysis, not a type of review.

A meta-analysis is a technique for synthesizing (i.e., summarizing) the results of multiple studies. It is a statistical analysis that combines the results of two or more studies, usually to estimate an effect size. You can therefore use a meta-analysis within a systematic review.

A literature review is a type of review that follows a less systematic and formal approach than a systematic review. In a literature review, it is more common for an expert in the field to summarize and evaluate the work qualitatively, without using a formal, explicit method.

A scoping review is similar to a systematic review and likewise uses transparent, repeatable methods to keep bias to a minimum.

The two are not the same, however. The main difference is the goal: rather than answering a specific research question, a scoping review explores a topic. The researcher tries to identify the key concepts, theories, and evidence, as well as the gaps in existing research.

Scoping reviews are sometimes an exploratory, preparatory step toward a systematic review, but they can also be stand-alone projects.

A systematic review is a good choice if you want to answer a question about the effectiveness of an intervention, such as a medical treatment.

To conduct a systematic review, you need the following:

  • A specific research question, usually about the effectiveness of an intervention. The question must concern a topic that several researchers have already studied. If there is no prior research, there is nothing to review.
  • A team of at least three people. Some steps of the systematic review process require three people to be carried out properly. Ideally, you also have an advisory team of about six people alongside the research team.
  • If you conduct a systematic review on your own (e.g., for your thesis or a paper), it is important to take the necessary measures to safeguard the reliability and validity of your research.
  • Access to databases and journal archives. Your educational institution often provides free access.
  • Time. A professional systematic review is a time-consuming process: it takes the lead author about six months of full-time work. As a student, you should limit the scope of your systematic review and stick to a tight schedule.
  • Bibliographic, word-processing, spreadsheet, and statistical software. For example, you could use EndNote, Microsoft Word, Excel, and SPSS or R.

Systematic reviews have many advantages.

  • They minimize bias by considering all the available evidence and evaluating each study for bias.
  • The methods are transparent, so others can check them.
  • The reviews are thorough: they summarize all the available evidence.
  • They can be replicated and updated by others.

Systematic reviews also have some disadvantages.

  • They are time-consuming.
  • They are limited in scope: they answer only the research question.

The seven steps for conducting a systematic review are explained below with an example.

Step 1: Formulate a research question

Formulating a research question is probably the most important step of a systematic review. A clear research question:

  • Lets you communicate your research to other researchers more effectively.
  • Guides your decisions as you plan and conduct your systematic review.

A good research question for a systematic review has four components, which you can remember with the acronym PICO:

  • Population or problem
  • Intervention
  • Comparison
  • Outcome

You can rearrange these four components to formulate your research question:

  • What is the effectiveness of I versus C for O in P?

Sometimes you may want to add a fifth component, the type of study design. In that case, the acronym becomes PICOT:

  • Type of study design

In the eczema example, the research question had the following components:

  • The population of patients with eczema
  • The intervention of probiotics
  • In comparison with no treatment, a placebo, or a non-probiotic treatment
  • The outcome of changes in participant-, parent-, and doctor-rated symptoms of eczema and in quality of life
  • The type of study design: randomized controlled trials

Their research question was:

  • What is the effectiveness of probiotics versus no treatment, a placebo, or a non-probiotic treatment for reducing eczema symptoms and improving quality of life in patients with eczema?
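
The PICO template can also be filled in programmatically, which may help when you want to try several candidate questions. The following is a minimal sketch; the class name `PicoQuestion` and its field names are hypothetical and chosen only for illustration, and the example values are taken from the eczema question above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """Hypothetical container for the four PICO components."""
    population: str
    intervention: str
    comparison: str
    outcome: str

    def as_question(self) -> str:
        # Follows the template "What is the effectiveness of I versus C for O in P?"
        return (
            f"What is the effectiveness of {self.intervention} versus "
            f"{self.comparison} for {self.outcome} in {self.population}?"
        )


eczema = PicoQuestion(
    population="patients with eczema",
    intervention="probiotics",
    comparison="no treatment, a placebo, or a non-probiotic treatment",
    outcome="reducing eczema symptoms and improving quality of life",
)
print(eczema.as_question())
```

Keeping the components separate like this also makes it easy to reuse them later, for instance as the starting point for search terms.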

Step 2: Develop a protocol

A protocol is a document that sets out your research plan for the systematic review. This is an important step because having a plan lets you work more efficiently and reduces research bias.

Your protocol should include the following components:

  • Background information: Provide the context of the research question and explain why it is important.
  • Research objective(s): Rephrase your research question as an objective.
  • Selection criteria: State how you will decide which literature to include in your review and which to exclude.
  • Search strategy: Describe your plan for finding the existing literature.
  • Analysis: Explain what information you will collect from the literature and how you will synthesize the data.

If you are a professional planning to publish your review, it is a good idea to assemble an advisory committee: a group of about six people with experience in the topic you are researching. They can help you make decisions about your protocol.

It is strongly recommended that you register your protocol. Registering your protocol means submitting it to a database such as PROSPERO or ClinicalTrials.gov.

Step 3: Search for all relevant literature

Searching for all relevant literature is the most time-consuming step of a systematic review.

To reduce bias, it is important to search very thoroughly for all relevant literature. Your search strategy will depend on your field and your research question, but sources generally fall into the following four categories:

  • Databases: Search several databases of peer-reviewed literature, such as PubMed, JSTOR, or Scopus. Think carefully about how you phrase your search terms, and include multiple synonyms for each word.
  • Hand searching: In addition to searching for primary sources in databases, you should also search manually. You can scan relevant journals and conference proceedings, or go through the reference lists of other relevant studies.
  • Gray literature: Gray literature consists of documents produced by governments, universities, and other institutions that are not published by traditional publishers. Graduate theses are an important form of gray literature; you can search them in the Networked Digital Library of Theses and Dissertations (NDLTD). In medicine, clinical trial registries are another important form of gray literature.
  • Experts: Contact experts in the field to ask whether they have unpublished studies that should be included in your review.
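
For the database searches, one common way to combine several synonyms per concept is a Boolean search string: synonyms joined with OR, concepts joined with AND. The sketch below shows the idea under stated assumptions: the helper `build_query` is hypothetical, the search terms are illustrative rather than taken from any published review, and the exact syntax accepted by a given database may differ.

```python
def build_query(concepts: dict[str, list[str]]) -> str:
    """Join synonyms with OR inside each concept, and concepts with AND."""
    groups = []
    for synonyms in concepts.values():
        # Quote multi-word terms so they are searched as phrases.
        terms = [f'"{term}"' if " " in term else term for term in synonyms]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


# Illustrative search terms only.
concepts = {
    "population": ["eczema", "atopic dermatitis"],
    "intervention": ["probiotic", "probiotics", "lactobacillus"],
}
query = build_query(concepts)
print(query)
```

Building the string from a dictionary keeps the list of synonyms in one place, so the same terms can be reported in the protocol and reused across databases.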

At this stage of your review, you do not read the articles yet. You simply save all potentially relevant citations with bibliographic software, such as Scribbr’s APA Generator or MLA Generator.

In the eczema example, the researchers searched the following sources:

  • Databases: EMBASE, PsycINFO, AMED, LILACS, and ISI Web of Science
  • Hand searching: conference proceedings and reference lists of articles
  • Gray literature: the Cochrane Library, the metaRegister of Controlled Trials, and the Ongoing Skin Trials Register
  • Experts: authors of unpublished registered trials, pharmaceutical companies, and manufacturers of probiotics

Step 4: Apply the selection criteria

Applying the selection criteria is a task for three people. Two of them independently read the literature and decide which studies to include in the systematic review, based on the selection criteria you stated in the protocol. The third person’s job is to break the tie if the other two disagree.

To increase inter-rater reliability, make sure everyone thoroughly understands the selection criteria before you begin this step.
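
Inter-rater reliability can also be quantified. The text does not prescribe a statistic; Cohen's kappa is one widely used option for two raters, and the sketch below computes it for a made-up set of include/exclude decisions. The function name and the example votes are assumptions for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must judge the same, non-empty set of records.")
    n = len(rater_a)
    # Proportion of records on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(counts_a) | set(counts_b)
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for eight records.
reviewer_1 = ["include", "include", "exclude", "exclude",
              "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude",
              "include", "exclude", "include", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so a low value on a pilot batch suggests the selection criteria need to be clarified before screening continues.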

If you are writing a systematic review as a student for an assignment, you may not have a team at your disposal. In that case, you will have to apply the selection criteria on your own; you can mention this as a limitation in your paper’s discussion.

You should apply the selection criteria in two phases:

  • Based on the titles and abstracts: Decide whether each article potentially meets the selection criteria based on the information in its title and abstract.
  • Based on the full texts: Download the articles that were not excluded in the first phase. If an article is not available online or through your library, you can contact its author to request a copy. Read the articles and determine which ones meet the selection criteria.

It is very important to keep a careful record of why you included or excluded each article. Once the selection process is complete, you can summarize what you did using a PRISMA flow diagram.
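
If the record of decisions is kept in a structured form, the numbers needed for a flow diagram can be derived from it directly. The following is a minimal sketch, assuming a hypothetical log format of (record id, phase at which the record was excluded, reason); the records and reasons are invented, and a real PRISMA flow diagram reports more boxes than this (for example, duplicates removed).

```python
from collections import Counter

# Hypothetical screening log: (record id, phase of exclusion or None, reason or None).
SCREENING_LOG = [
    ("rec1", "title/abstract", "not an RCT"),
    ("rec2", None, None),
    ("rec3", "full text", "wrong population"),
    ("rec4", "title/abstract", "wrong intervention"),
    ("rec5", None, None),
    ("rec6", "full text", "not an RCT"),
]


def prisma_counts(log):
    """Summarize a screening log into counts for a flow diagram."""
    screened = len(log)
    excluded_on_abstract = sum(1 for _, phase, _ in log if phase == "title/abstract")
    # Full-text exclusions are reported per reason.
    full_text_reasons = Counter(
        reason for _, phase, reason in log if phase == "full text"
    )
    assessed_full_text = screened - excluded_on_abstract
    included = assessed_full_text - sum(full_text_reasons.values())
    return {
        "records_screened": screened,
        "excluded_on_title_abstract": excluded_on_abstract,
        "full_texts_assessed": assessed_full_text,
        "full_text_exclusions_by_reason": dict(full_text_reasons),
        "studies_included": included,
    }


counts = prisma_counts(SCREENING_LOG)
for name, value in counts.items():
    print(name, value)
```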

In the example, Boyle and colleagues then collected the full texts of all the remaining literature. Boyle and Tang read through the articles to determine whether any further studies needed to be excluded based on the selection criteria.

When Boyle and Tang disagreed about whether an article should be excluded from the review, they discussed it with Varigos until the three researchers reached agreement.

Step 5: Extract the data

Extracting the data means systematically collecting information from the selected literature. There are two kinds of information you need to collect from each study:

  • Information about the study’s methods and results. The exact information depends on your research question, but it may include the year, study design, sample size, context, research findings, and conclusions.
  • Your judgment of the quality of the evidence, including the risk of bias.

You can collect this information using forms. Sample forms are available from The Registry of Methods and Tools for Evidence-Informed Decision Making and the Grading of Recommendations, Assessment, Development and Evaluations Working Group.
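
An extraction form can also be kept as a small structured record, which makes it easy to export the data to a spreadsheet afterwards. The sketch below is only an illustration: the field names mirror the items listed above, the risk-of-bias labels are an assumption, and the study shown is invented.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ExtractionRecord:
    """Hypothetical extraction form with one row per included study."""
    study_id: str
    year: int
    design: str
    sample_size: int
    setting: str
    main_result: str
    conclusion: str
    risk_of_bias: str  # e.g. "low", "unclear", "high" (labels are an assumption)


def to_csv(records: list[ExtractionRecord]) -> str:
    """Serialize the extraction records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(ExtractionRecord)]
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Invented study, for illustration only.
records = [
    ExtractionRecord("study-01", 2010, "RCT", 120, "outpatient clinic",
                     "no clear difference in symptom score",
                     "no evidence of benefit", "low"),
]
csv_text = to_csv(records)
print(csv_text)
```

Using the same fixed set of fields for every study is what makes the extraction systematic: two people filling in the form independently produce rows that can be compared column by column.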

Extracting the data is also a task for three people. Again, two people carry out this step independently of each other, while a third resolves any disagreements.

In the example, the researchers also collected information about possible sources of bias, such as how the study participants were randomly assigned to the treatment and control groups (randomization).

Step 6: Synthesize the data

Synthesizing the data means bringing the collected information together into a single, coherent story. There are two ways to synthesize the data:

  • Narrative (qualitative): Summarize the information in words. You need to discuss the studies and assess their overall quality.
  • Quantitative: Use statistical methods to summarize and compare the data from different studies. The most common quantitative approach is a meta-analysis, which lets you combine the results of several studies into a summary result.

In general, you will probably need to use both approaches together. If you do not have enough data, or the data from different studies are not comparable, you can use the narrative approach alone. In that case, explain why a quantitative approach was not possible.
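
To make the quantitative approach concrete, here is a minimal sketch of one standard way to combine study results: a fixed-effect, inverse-variance meta-analysis. The text does not say which model to use, so this choice is an assumption, and the effect sizes and standard errors are invented. In practice you would normally rely on dedicated statistical software.

```python
import math


def fixed_effect_pool(effects, standard_errors):
    """Pool effect sizes with inverse-variance weights (fixed-effect model)."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    # Approximate 95% confidence interval under a normal approximation.
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Invented effect sizes (e.g. mean differences) and their standard errors.
effects = [-0.30, -0.10, -0.20]
standard_errors = [0.10, 0.20, 0.10]

pooled, pooled_se, ci = fixed_effect_pool(effects, standard_errors)
print(round(pooled, 3), round(pooled_se, 3), [round(x, 3) for x in ci])
```

The pooled estimate is a weighted average, so a single large, precise study can dominate it; that is one reason to report the individual studies alongside the summary result.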

In the example, the researchers also divided the studies into subgroups, such as studies of infants, children, and adults, and analyzed the effect sizes within each group.

Step 7: Write and publish a report

The purpose of writing up your systematic review is to share the answer to your research question and explain how you arrived at it.

Your report should include the following components:

  • Abstract: A summary of the review.
  • Introduction: Including the rationale and objective(s).
  • Methods: Including the selection criteria, search method, data extraction method, and synthesis method.
  • Results: Including the results of the search and selection process, study characteristics, risk of bias within the studies, and the results of the synthesis.
  • Discussion: Including an interpretation of the results and the limitations of the review.
  • Conclusion: The answer to your research question and the implications for practice, policy, or research.

To check that your report contains everything it needs, you can use the PRISMA checklist.

Once your report is written, you can publish it in a database of systematic reviews, such as the Cochrane Database of Systematic Reviews, and/or in a peer-reviewed journal.

In the example, the researchers updated their report in 2018.

A literature review is a method for gathering existing knowledge about your topic or problem statement. You can find this knowledge in various sources, such as scholarly journal articles, books, papers, theses, and archival material.

The result is not just a description of the information you found, but also a critical discussion of the most relevant information. A literature review can stand on its own or form part of a larger work.

A systematic review is secondary research because it draws on existing research. You do not collect any new data yourself.

Descriptive research aims to describe a population, situation, or phenomenon accurately and systematically. It can answer what, where, when, and how questions, but not why questions. Unlike in experimental research, the researcher does not control or manipulate any variables; the variables are only observed and measured.

Scharwächter, V. (2022, August 19). Systematische Review | Stappenplan & Voorbeeld. Scribbr. Retrieved September 9, 2024, from https://www.scribbr.nl/onderzoeksmethoden/systematische-review/

How to write a systematic literature review [9 steps]

  • What is a systematic literature review?
  • Where are systematic literature reviews used?
  • What types of systematic literature reviews are there?
  • How to write a systematic literature review
      1. Decide on your team
      2. Formulate your question
      3. Plan your research protocol
      4. Search for the literature
      5. Screen the literature
      6. Assess the quality of the studies
      7. Extract the data
      8. Analyze the results
      9. Interpret and present the results
  • Registering your systematic literature review
  • Frequently asked questions about writing a systematic literature review
  • Related articles

A systematic literature review is a summary, analysis, and evaluation of all the existing research on a well-formulated and specific question.

Put simply, a systematic review is a study of studies that is popular in medical and healthcare research. In this guide, we will cover:

  • the definition of a systematic literature review
  • the purpose of a systematic literature review
  • the different types of systematic reviews
  • how to write a systematic literature review

Systematic literature reviews can be utilized in various contexts, but they’re often relied on in clinical or healthcare settings.

Medical professionals read systematic literature reviews to stay up-to-date in their field, and granting agencies sometimes need them to make sure there’s justification for further research in an area. They can even be used as the starting point for developing clinical practice guidelines.

A classic systematic literature review can take different approaches:

  • Effectiveness reviews assess the extent to which a medical intervention or therapy achieves its intended effect. They’re the most common type of systematic literature review.
  • Diagnostic test accuracy reviews produce a summary of diagnostic test performance so that their accuracy can be determined before use by healthcare professionals.
  • Experiential (qualitative) reviews analyze human experiences in a cultural or social context. They can be used to assess the effectiveness of an intervention from a person-centric perspective.
  • Costs/economics evaluation reviews look at the cost implications of an intervention or procedure, to assess the resources needed to implement it.
  • Etiology/risk reviews usually try to determine to what degree a relationship exists between an exposure and a health outcome. This can be used to better inform healthcare planning and resource allocation.
  • Psychometric reviews assess the quality of health measurement tools so that the best instrument can be selected for use.
  • Prevalence/incidence reviews measure both the proportion of a population who have a disease, and how often the disease occurs.
  • Prognostic reviews examine the course of a disease and its potential outcomes.
  • Expert opinion/policy reviews are based around expert narrative or policy. They’re often used to complement, or in the absence of, quantitative data.
  • Methodology systematic reviews can be carried out to analyze any methodological issues in the design, conduct, or review of research studies.

Writing a systematic literature review can feel like an overwhelming undertaking. After all, they can often take 6 to 18 months to complete. Below we’ve prepared a step-by-step guide on how to write a systematic literature review.

  • Decide on your team.
  • Formulate your question.
  • Plan your research protocol.
  • Search for the literature.
  • Screen the literature.
  • Assess the quality of the studies.
  • Extract the data.
  • Analyze the results.
  • Interpret and present the results.

When carrying out a systematic literature review, you should employ multiple reviewers in order to minimize bias and strengthen analysis. A minimum of two is a good rule of thumb, with a third to serve as a tiebreaker if needed.

You may also need to team up with a librarian to help with the search, literature screeners, a statistician to analyze the data, and the relevant subject experts.

Define your answerable question. Then ask yourself, “has someone written a systematic literature review on my question already?” If so, yours may not be needed. A librarian can help you answer this.

You should formulate a “well-built clinical question.” This is the process of generating a good search question. To do this, run through PICO:

  • Patient or Population or Problem/Disease: who or what is the question about? Are there factors about them (e.g. age, race) that could be relevant to the question you’re trying to answer?
  • Intervention: which main intervention or treatment are you considering for assessment?
  • Comparison(s) or Control: is there an alternative intervention or treatment you’re considering? Your systematic literature review doesn’t have to contain a comparison, but you’ll want to stipulate at this stage, either way.
  • Outcome(s): what are you trying to measure or achieve? What’s the wider goal for the work you’ll be doing?

Now you need a detailed strategy for how you’re going to search for and evaluate the studies relating to your question.

The protocol for your systematic literature review should include:

  • the objectives of your project
  • the specific methods and processes that you’ll use
  • the eligibility criteria of the individual studies
  • how you plan to extract data from individual studies
  • which analyses you’re going to carry out

For a full guide on how to systematically develop your protocol, take a look at the PRISMA checklist. PRISMA has been designed primarily to improve the reporting of systematic literature reviews and meta-analyses.

When writing a systematic literature review, your goal is to find all of the relevant studies relating to your question, so you need to search thoroughly.

This is where your librarian will come in handy again. They should be able to help you formulate a detailed search strategy, and point you to all of the best databases for your topic.

The places to consider in your search are electronic scientific databases (the most popular are PubMed, MEDLINE, and Embase), controlled clinical trial registers, non-English literature, raw data from published trials, references listed in primary sources, and unpublished sources known to experts in the field.

Don’t miss out on “gray literature” sources: those sources outside of the usual academic publishing environment. They include:

  • non-peer-reviewed journals
  • pharmaceutical industry files
  • conference proceedings
  • pharmaceutical company websites
  • internal reports

Gray literature sources are more likely to contain negative conclusions, so including them improves the reliability of your findings. You should also document the details of your search, such as:

  • The databases you search and which years they cover
  • The dates you first run the searches, and when they’re updated
  • Which strategies you use, including search terms
  • The numbers of results obtained
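
One lightweight way to keep this documentation is a structured search log with one entry per database search. The sketch below is an assumption about how such a log could look; the class name, field names, dates, search string, and result counts are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SearchLogEntry:
    """Hypothetical record of one database search."""
    database: str
    years_covered: tuple[int, int]
    first_run: date
    last_updated: date
    search_string: str
    results: int


# All values below are invented for illustration.
search_log = [
    SearchLogEntry("PubMed", (1966, 2024), date(2024, 1, 15), date(2024, 6, 1),
                   '(eczema OR "atopic dermatitis") AND probiotic*', 412),
    SearchLogEntry("Embase", (1974, 2024), date(2024, 1, 16), date(2024, 6, 1),
                   '(eczema OR "atopic dermatitis") AND probiotic*', 530),
]

# Total number of records retrieved before removing duplicates.
total_hits = sum(entry.results for entry in search_log)
print(total_hits)
```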

This should be performed by your two reviewers, using the criteria documented in your research protocol. The screening is done in two phases:

  • Pre-screening of all titles and abstracts, and selecting those appropriate
  • Screening of the full-text articles of the selected studies

Make sure reviewers keep a log of which studies they exclude, with reasons why.
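
The two-reviewer rule with a tiebreaker can be written down as a small decision function. This is a minimal sketch under stated assumptions: the function name, the record ids, and the votes are hypothetical, and `True` stands for "include".

```python
from typing import Optional


def screen_record(vote_a: bool, vote_b: bool,
                  tiebreak_vote: Optional[bool] = None) -> bool:
    """Return the final decision; consult the third reviewer only on disagreement."""
    if vote_a == vote_b:
        return vote_a
    if tiebreak_vote is None:
        raise ValueError("Reviewers disagree and no tiebreak vote was recorded.")
    return tiebreak_vote


# Hypothetical votes from the two reviewers, and tiebreak decisions where needed.
votes = {"rec1": (True, True), "rec2": (True, False), "rec3": (False, False)}
third_reviewer = {"rec2": False}

decisions = {
    record: screen_record(a, b, third_reviewer.get(record))
    for record, (a, b) in votes.items()
}
print(decisions)
```

Raising an error when a disagreement has no tiebreak vote keeps unresolved records from silently slipping into either the included or the excluded pile.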

Your reviewers should evaluate the methodological quality of your chosen full-text articles. Make an assessment checklist that closely aligns with your research protocol, including a consistent scoring system, calculations of the quality of each study, and sensitivity analysis.

The kinds of questions you'll come up with are:

  • Were the participants really randomly allocated to their groups?
  • Were the groups similar in terms of prognostic factors?
  • Could the conclusions of the study have been influenced by bias?

Every step of the data extraction must be documented for transparency and replicability. Create a data extraction form and set your reviewers to work extracting data from the qualified studies.

Dalhousie University provides a free, detailed template for recording data extraction. Adapt it to your specific question.

Establish a standard measure of outcome which can be applied to each study on the basis of its effect size.

Measures of outcome for studies with:

  • Binary outcomes (e.g. cured/not cured) are odds ratio and risk ratio
  • Continuous outcomes (e.g. blood pressure) are means, difference in means, and standardized difference in means
  • Survival or time-to-event data are hazard ratios
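
As a rough illustration of the first two kinds of measure, the sketch below computes a risk ratio and an odds ratio from event counts, and a standardized difference in means (Cohen's d with a pooled standard deviation, which is one common definition and an assumption here). All numbers are invented, and the functions do not handle groups with zero events or zero non-events.

```python
import math


def risk_ratio(events_t: int, n_t: int, events_c: int, n_c: int) -> float:
    """Risk in the treatment group divided by risk in the control group."""
    return (events_t / n_t) / (events_c / n_c)


def odds_ratio(events_t: int, n_t: int, events_c: int, n_c: int) -> float:
    """Odds of the event in the treatment group divided by odds in the control group."""
    return (events_t / (n_t - events_t)) / (events_c / (n_c - events_c))


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c) -> float:
    """Difference in means divided by the pooled standard deviation (Cohen's d)."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    return (mean_t - mean_c) / pooled_sd


# Invented binary outcome: 20 of 100 events with treatment, 40 of 100 with control.
rr = risk_ratio(20, 100, 40, 100)
odds = odds_ratio(20, 100, 40, 100)
# Invented continuous outcome, e.g. blood pressure in two groups of 50.
smd = standardized_mean_difference(120, 10, 50, 125, 10, 50)
print(round(rr, 3), round(odds, 3), round(smd, 3))
```

Note that the risk ratio and the odds ratio differ for the same data, so the review should state which one is used and apply it consistently across studies.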

Design a table and populate it with your data results. Draw this out into a forest plot, which provides a simple visual representation of variation between the studies.

Then analyze the data for issues. These can include heterogeneity, which shows up in the forest plot as studies whose lines do not overlap with those of any other study. Again, record any excluded studies here for reference.

Consider different factors when interpreting your results. These include limitations, strength of evidence, biases, applicability, economic effects, and implications for future practice or research.

Apply appropriate grading of your evidence and consider the strength of your recommendations.

It’s best to formulate a detailed plan for how you’ll present your systematic review results. Guidelines for interpreting results are available from Cochrane.

Before writing your systematic literature review, you can register it with OSF for additional guidance along the way. You could also register your completed work with PROSPERO.

A literature review simply provides a summary of the literature available on a topic. A systematic review, on the other hand, is more than just a summary. It also includes an analysis and evaluation of existing research. Put simply, it's a study of studies.

What is a Systematic Literature Review?

A systematic literature review (SLR) is an independent academic method that aims to identify and evaluate all relevant literature on a topic in order to derive conclusions about the question under consideration. "Systematic reviews are undertaken to clarify the state of existing research and the implications that should be drawn from this." (Feak & Swales, 2009, p. 3) An SLR can demonstrate the current state of research on a topic, while identifying gaps and areas requiring further research with regard to a given research question. A formal methodological approach is pursued in order to reduce distortions caused by an overly restrictive selection of the available literature and to increase the reliability of the literature selected (Tranfield, Denyer & Smart, 2003). A special aspect in this regard is the fact that a research objective is defined for the search itself and the criteria for determining what is to be included and excluded are defined prior to conducting the search. The search is mainly performed in electronic literature databases (such as Business Source Complete or Web of Science), but also includes manual searches (reviews of reference lists in relevant sources) and the identification of literature not yet published in order to obtain a comprehensive overview of a research topic.

An SLR protocol documents all the information gathered and the steps taken as part of an SLR in order to make the selection process transparent and reproducible. The PRISMA flow diagram supports you in making the selection process visible.

In an ideal scenario, experts from the respective research discipline, as well as experts working in the relevant field and in libraries, should be involved in setting the search terms. As a rule, the literature is selected by two or more reviewers working independently of one another. Both measures serve to increase the objectivity of the literature selection. An SLR must therefore be more than merely a summary of a topic (Briner & Denyer, 2012). This also distinguishes it from “ordinary” surveys of the available literature. The following table shows the differences between an SLR and an “ordinary” literature review.


Differences to "common" literature reviews

| Characteristic | SLR | Common literature overview |
| --- | --- | --- |
| Independent research method | yes | no |
| Explicit formulation of the search objectives | yes | no |
| Identification of all publications on a topic | yes | no |
| Defined criteria for inclusion and exclusion of publications | yes | no |
| Description of search procedure | yes | no |
| Literature selection and information extraction by several persons | yes | no |
| Transparent quality evaluation of publications | yes | no |

What are the objectives of SLRs?

  • Avoidance of research redundancies despite a growing amount of publications
  • Identification of research areas, gaps and methods
  • Input for evidence-based management, which allows management decisions to be based on scientific methods and findings
  • Identification of links between different areas of research

Process steps of an SLR

An SLR has several process steps, which are defined differently in the literature (Fink 2014, p. 4; Guba 2008; Tranfield et al. 2003). We distinguish the following steps, which are adapted to the economics and management research area:

1. Defining research questions

Briner & Denyer (2009, p. 347ff.) have developed the CIMO scheme to establish clearly formulated and answerable research questions in the field of economic sciences:

C – Context:  Which individuals, relationships, institutional frameworks and systems are being investigated?

I – Intervention:  The effects of which event, action or activity are being investigated?

M – Mechanisms:  Which mechanisms can explain the relationship between interventions and results? Under what conditions do these mechanisms take effect?

O – Outcomes:  What are the effects of the intervention? How are the results measured? What are intended and unintended effects?

The objective of the systematic literature review is used to formulate research questions such as “How can a project team be led effectively?”. Since there are numerous interpretations and constructs for “effective”, “leadership” and “project team”, these terms must be defined precisely.

With the aid of the scheme, the following concrete research questions can be derived with regard to this example:

Under what conditions (C) does leadership style (I) influence the performance of project teams (O)?

Which constructs have an effect upon the influence of leadership style (I) on a project team’s performance (O)?          

Research questions do not necessarily need to follow the CIMO scheme, but they should:

  • ... be formulated in a clear, focused and comprehensible manner and be answerable;
  • ... have been determined prior to carrying out the SLR;
  • ... consist of general and specific questions.

The criteria for inclusion and exclusion are also defined at this early stage. The selection of the criteria must be well-grounded. This may include conceptual factors such as geographical or temporal restrictions, congruent definitions of constructs, as well as quality criteria (e.g. journal impact factor > x).

2. Selecting databases and other research sources

The selection of sources must be described and explained in detail. The aim is to find a balance between the relevance of the sources (content-related fit) and the scope of the sources.

In the field of economic sciences, there are a number of literature databases that can be searched as part of an SLR. Some examples in this regard are:

  • Business Source Complete
  • ProQuest One Business
  • EconBiz        

Our video " Selecting the right databases " explains how to find relevant databases for your topic.

Literature databases are an important source of research for SLRs, as they can minimize distortions caused by an individual literature selection (selection bias), while offering advantages for a systematic search due to their data structure. The aim is to find all database entries on a topic and thus keep the retrieval bias low (tutorial on retrieval bias). Besides articles from scientific journals, it is important to include working papers, conference proceedings, etc. to reduce the publication bias (tutorial on publication bias).

Our online self-study course "Searching economic databases" explains steps 2 and 3.

3. Defining search terms

Once the literature databases and other research sources have been selected, search terms are defined. For this purpose, the research topic/questions is/are divided into blocks of terms of equal ranking. This approach is called the block-building method (Guba 2008, p. 63). The so-called document-term matrix, which lists topic blocks and search terms according to a scheme, is helpful in this regard. The aim is to identify as many different synonyms as possible for the partial terms. A precisely formulated research question facilitates the identification of relevant search terms. In addition, keywords from particularly relevant articles support the formulation of search terms.

A document-term matrix for the topic “The influence of management style on the performance of project teams” is shown in this example .

Identification of headwords and keywords

When setting search terms, a distinction must be made between subject headings (headwords) and keywords, both of which are described below:

Keywords

  • appear in the title, abstract and/or text
  • sometimes specified by the author, but in most cases automatically generated
  • non-standardized
  • different spellings and forms (singular/plural) must be searched separately

Subject headings

  • describe the content
  • are generated by an editorial team
  • are listed in a standardized list (thesaurus)
  • may comprise various keywords
  • include different spellings
  • database-specific

Subject headings are a standardized list of words that are generated by the specialists in charge of some databases. This so-called index of subject headings (thesaurus) helps searchers find relevant articles, since the headwords indicate the content of a publication. By contrast, an ordinary keyword search does not necessarily result in a content-related fit, since the database also displays articles in which, for example, a word appears once in the abstract, even though the article’s content does not cover the topic.

Nevertheless, searches using both headwords and keywords should be conducted, since some articles may not yet have been assigned headwords, or errors may have occurred during the assignment of headwords. 

To add headwords to your search in the Business Source Complete database, please select the Thesaurus tab at the top. Here you can find headwords in a new search field and integrate them into your search query. In the search history, headwords are marked with the addition DE (descriptor).

The EconBiz database of the German National Library of Economics (ZBW – Leibniz Information Centre for Economics), which also contains German-language literature, has created its own index of subject headings with the STW Thesaurus for Economics . Headwords are integrated into the search by being used in the search query.

Since the indexes of subject headings divide terms into synonyms, generic terms and sub-aspects, they facilitate the creation of a document-term matrix. For this purpose it is advisable to specify in the document-term matrix the origin of the search terms (STW Thesaurus for Economics, Business Source Complete, etc.).

Searching in literature databases

Once the document-term matrix has been defined, the search in literature databases begins. It is recommended to enter each word of the document-term matrix individually into the database in order to obtain a good overview of the number of hits per word. Finally, all the words contained in a block of terms are linked with the Boolean operator OR and thereby a union of all the words is formed. The latter are then linked with each other using the Boolean operator AND. In doing so, each block should be added individually in order to see to what degree the number of hits decreases.
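
The OR/AND linking described above can be sketched in a few lines of Python. This is only an illustration: the term blocks are hypothetical synonyms for the running example, the function names are invented for this sketch, and the exact syntax for phrases, wildcards and field codes differs between databases, so the resulting string usually needs adapting before it is pasted into a search form.

```python
# Hypothetical term blocks for the running example; real blocks come from
# your own document-term matrix.
term_blocks = {
    "leadership": ["leadership style", "management style"],
    "team": ["project team", "work group"],
    "performance": ["performance", "effectiveness"],
}


def quote(term: str) -> str:
    """Wrap multi-word terms in quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_block(terms: list) -> str:
    """Link all synonyms of one block with OR (union of the block)."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(blocks: dict) -> str:
    """Link the blocks with AND (intersection of the blocks)."""
    return " AND ".join(build_block(terms) for terms in blocks.values())


def incremental_queries(blocks: dict) -> list:
    """Return one query per step, adding a single block each time,
    so the drop in hits can be observed block by block."""
    names = list(blocks)
    return [
        build_query({name: blocks[name] for name in names[:i]})
        for i in range(1, len(names) + 1)
    ]


if __name__ == "__main__":
    for step, query in enumerate(incremental_queries(term_blocks), start=1):
        print(f"Step {step}: {query}")
```

Running the incremental queries one after another and noting the number of hits for each makes it easy to document how much every additional block narrows the result set.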

Since the search query must be set up separately for each database, tools such as  LitSonar  have been developed to enable a systematic search across different databases. LitSonar was created by  Professor Dr. Ali Sunyaev (Institute of Applied Informatics and Formal Description Methods – AIFB) at the Karlsruhe Institute of Technology.

Advanced search

Certain database-specific commands can be used to refine a search, for example, by taking variable word endings into account (*) or specifying the distance between two words, etc. Our overview shows the most important search commands for our top databases.

Additional searches in sources other than literature databases

In addition to literature databases, other sources should also be searched. Fink (2014, p. 27) lists the following reasons for this:

  • the topic is new and not yet included in indexes of subject headings;
  • search terms are not used congruently in articles because uniform definitions do not exist;
  • some studies are still in the process of being published, or have been completed, but not published.

Therefore, further search strategies are manual search, bibliographic analysis, personal contacts and academic networks (Briner & Denyer, p. 349). Manual search means that you go through the source information of relevant articles and supplement your hit list accordingly. In addition, you should conduct a targeted search for so-called gray literature, that is, literature not distributed via the book trade, such as working papers from specialist areas and conference reports. By including different types of publications, the so-called publication bias (DBWM video “Understanding publication bias” ) – that is, distortions due to exclusive use of articles from peer-reviewed journals – should be kept to a minimum.

The PRESS checklist can help you check the correctness of your search terms.

4. Merging hits from different databases

In principle, large amounts of data can be easily collected, structured and sorted with data processing programs such as Excel. Another option is to use reference management programs such as EndNote, Citavi or Zotero. The Saxon State and University Library Dresden (SLUB Dresden) provides an  overview of current reference management programs  . Software for qualitative data analysis such as NVivo is equally suited for data processing. A comprehensive overview of the features of different tools that support the SLR process can be found in Bandara et al. (2015).

Our online self-study course "Managing literature with Citavi" shows you how to use the reference management software Citavi.

When conducting an SLR, you should specify for each hit the database from which it originates and the date on which the query was made. In addition, you should always indicate how many hits you have identified in the various databases or, for example, by manual search.

Exporting data from literature databases

Exporting from literature databases is very easy. In  Business Source Complete  , you must first click on the “Share” button in the hit list, then “Email a link to download exported results” at the very bottom and then select the appropriate format for the respective literature program.

Exporting data from the literature database  EconBiz  is somewhat more complex. Here you must first create a marked list and then select each hit individually and add it to the marked list. Afterwards, articles on the list can be exported.

After merging all hits from the various databases, duplicate entries (duplicates) are deleted.
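
As a rough illustration of this step, the sketch below merges hits from several sources, keeps the source and query date for each hit, counts hits per source and removes duplicates. The `Hit` record and its field names are assumptions made for this example, not an export format of any particular database. Duplicates are matched by DOI where one is present and otherwise by a normalized title, which is a simplification: a record exported with a DOI from one database and without one from another would not be matched and would still need a manual check.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    title: str
    doi: Optional[str]
    source: str      # database name or "manual search"
    query_date: str  # date of the query, e.g. "2024-03-01"


def normalize_title(title: str) -> str:
    """Lower-case the title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedup_key(hit: Hit) -> str:
    """Prefer the DOI as identifier; fall back to the normalized title."""
    if hit.doi:
        return "doi:" + hit.doi.strip().lower()
    return "title:" + normalize_title(hit.title)


def merge_hits(hits: list) -> tuple:
    """Return (unique hits, number of hits per source before deduplication)."""
    unique = {}
    per_source = {}
    for hit in hits:
        per_source[hit.source] = per_source.get(hit.source, 0) + 1
        # Keep the first occurrence of each record and drop later duplicates.
        unique.setdefault(dedup_key(hit), hit)
    return list(unique.values()), per_source


if __name__ == "__main__":
    hits = [
        Hit("Leading project teams", "10.1000/abc", "Business Source Complete", "2024-03-01"),
        Hit("Leading Project Teams.", "10.1000/ABC", "EconBiz", "2024-03-02"),
        Hit("Team performance revisited", None, "manual search", "2024-03-05"),
    ]
    unique, per_source = merge_hits(hits)
    print(per_source)
    print(len(unique), "unique hits")
```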

5. Applying inclusion and exclusion criteria

All publications are evaluated in the literature management program applying the previously defined criteria for inclusion and exclusion. Only those sources that survive this selection process will subsequently be analyzed. The review process and inclusion criteria should be tested with a small sample and adjustments made if necessary before applying it to all articles. In the ideal case, even this selection would be carried out by more than one person, with each working independently of one another. It needs to be made clear how discrepancies between reviewers are dealt with. 
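
One simple way to make discrepancies between reviewers visible is to compare their independent decisions record by record. The following sketch assumes that each reviewer's decisions are stored as a mapping from a record ID to `True` (include) or `False` (exclude). The record IDs and the function name are hypothetical, and how conflicts are then resolved (discussion, a third reviewer, etc.) remains a decision for the review team.

```python
def compare_decisions(reviewer_a: dict, reviewer_b: dict) -> dict:
    """Split the records both reviewers have screened into agreed
    inclusions, agreed exclusions and conflicts."""
    result = {"include": [], "exclude": [], "conflict": []}
    shared_ids = reviewer_a.keys() & reviewer_b.keys()
    for record_id in sorted(shared_ids):
        a, b = reviewer_a[record_id], reviewer_b[record_id]
        if a != b:
            result["conflict"].append(record_id)
        elif a:
            result["include"].append(record_id)
        else:
            result["exclude"].append(record_id)
    return result


if __name__ == "__main__":
    decisions_a = {"rec1": True, "rec2": False, "rec3": True}
    decisions_b = {"rec1": True, "rec2": False, "rec3": False}
    print(compare_decisions(decisions_a, decisions_b))
```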

In a first step, the criteria for inclusion and exclusion are checked primarily against the title, abstract and subject headings in the databases, as well as the keywords provided by the authors of a publication. In a second step, the whole article or source is read.

You can create tag words for the inclusion and exclusion in your literature management tool to keep an overview.

In addition to the common literature management tools, you can also use software tools that have been developed to support SLRs. The central library of the university in Zurich has published an overview and evaluation of different tools based on a survey among researchers (View SLR tools).

The selection process needs to be made transparent. The PRISMA flow diagram supports the visualization of the number of included / excluded studies.
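
The numbers reported in such a flow diagram follow from simple subtraction, which can be checked with a small helper. The sketch below is a simplified bookkeeping aid with hypothetical stage names and example figures. It is not the official PRISMA template, which distinguishes more boxes (for example, reports not retrieved).

```python
def selection_counts(identified: int, duplicates: int,
                     excluded_on_abstract: int,
                     excluded_on_full_text: int) -> dict:
    """Derive the number of records remaining after each selection stage."""
    screened = identified - duplicates
    full_text_assessed = screened - excluded_on_abstract
    included = full_text_assessed - excluded_on_full_text
    if min(screened, full_text_assessed, included) < 0:
        raise ValueError("More records excluded than available at some stage")
    return {
        "identified": identified,
        "screened": screened,
        "full_text_assessed": full_text_assessed,
        "included": included,
    }


if __name__ == "__main__":
    # Hypothetical example figures.
    for stage, count in selection_counts(500, 120, 300, 50).items():
        print(f"{stage}: {count}")
```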

Forward and backward search

Should it become apparent that the number of sources found is relatively small, or if you wish to proceed with particular thoroughness, a forward and backward search based on the sources found is recommended (Webster & Watson 2002, p. xvi). A backward search means going through the bibliographies of the sources found. A forward search, by contrast, identifies articles that have cited the relevant publications. The Web of Science and Scopus databases can be used to perform citation analyses.
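
Conceptually, both searches walk a citation graph in opposite directions. The sketch below assumes that citation data is already available as a mapping from each paper to the list of papers it cites. The paper IDs are hypothetical, and in practice this data would have to be exported from a citation database first.

```python
def backward_search(included: set, references: dict) -> set:
    """Papers that appear in the reference lists of the included papers."""
    found = set()
    for paper in included:
        found.update(references.get(paper, []))
    return found - included


def forward_search(included: set, references: dict) -> set:
    """Papers whose reference lists cite at least one included paper."""
    citing = {
        paper for paper, cited in references.items()
        if included & set(cited)
    }
    return citing - included


if __name__ == "__main__":
    # Hypothetical citation data: paper -> papers it cites.
    references = {
        "A": ["X", "Y"],
        "B": ["A"],
        "C": ["A", "Y"],
        "D": ["Z"],
    }
    included = {"A"}
    print("backward:", sorted(backward_search(included, references)))
    print("forward:", sorted(forward_search(included, references)))
```

The candidates found this way still have to pass the same inclusion and exclusion criteria as the hits from the database search.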

6. Perform the review

As the next step, the remaining titles are analyzed as to their content by reading them several times in full. Information is extracted according to defined criteria and the quality of the publications is evaluated. If the data extraction is carried out by more than one person, training helps to ensure that the reviewers extract and assess information consistently.

Depending on the research questions, there are different methods for data abstraction (content analysis, concept matrix, etc.). A so-called concept matrix can be used to structure the content of information (Webster & Watson 2002, p. xvii). An example of a concept matrix according to Becker (2014) is given below.

Particularly in the field of economic sciences, the evaluation of a study’s quality cannot be performed according to a generally valid scheme, such as those existing in the field of medicine, for instance. Quality assessment therefore depends largely on the research questions.

Based on the findings of individual studies, a meta-level is then applied to try to understand what similarities and differences exist between the publications, what research gaps exist, etc. This may also result in the development of a theoretical model or reference framework.

Example concept matrix (Becker 2013) on the topic Business Process Management

| Article | Pattern | Configuration | Similarities |
| --- | --- | --- | --- |
| Thom (2008) | x |  |  |
| Yang (2009) | x |  | x |
| Rosa (2009) |  | x | x |
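
A concept matrix like the one above can also be kept as a small data structure, which makes it easy to count how often each concept is covered. The sketch below uses the three articles from the example. The variable and function names are chosen for illustration only.

```python
# Concepts addressed by each article, taken from the example matrix above.
articles = {
    "Thom (2008)": {"Pattern"},
    "Yang (2009)": {"Pattern", "Similarities"},
    "Rosa (2009)": {"Configuration", "Similarities"},
}
concepts = ["Pattern", "Configuration", "Similarities"]


def concept_matrix(articles: dict, concepts: list) -> dict:
    """One row per article with an 'x' for every concept it addresses."""
    return {
        article: ["x" if concept in covered else "" for concept in concepts]
        for article, covered in articles.items()
    }


def concept_counts(articles: dict, concepts: list) -> dict:
    """Number of articles that address each concept."""
    return {
        concept: sum(concept in covered for covered in articles.values())
        for concept in concepts
    }


if __name__ == "__main__":
    for article, row in concept_matrix(articles, concepts).items():
        print(article, row)
    print(concept_counts(articles, concepts))
```

Concepts with low counts can point to topics that have received little attention so far, although such a reading always depends on how completely the literature has been covered.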

7. Synthesizing results

Once the review has been conducted, the results must be compiled and, on the basis of these, conclusions derived with regard to the research question (Fink 2014, p. 199ff.). This includes, for example, the following aspects:

  • historical development of topics (histogram, time series: when, and how frequently, did publications on the research topic appear?);
  • overview of journals, authors or specialist disciplines dealing with the topic;
  • comparison of applied statistical methods;
  • topics covered by research;
  • identifying research gaps;
  • developing a reference framework;
  • developing constructs;
  • performing a meta-analysis: comparison of the correlations of the results of different empirical studies (see for example Fink 2014, p. 203 on conducting meta-analyses)
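
To give an idea of what comparing correlations across studies can involve, the sketch below pools correlation coefficients with a fixed-effect model based on Fisher's z transformation, a common textbook approach. It is not necessarily the procedure described by Fink (2014). The study values are hypothetical, and a real meta-analysis would additionally need to consider heterogeneity, random-effects models and study quality.

```python
import math


def pooled_correlation(studies: list) -> tuple:
    """Fixed-effect pooling of correlations via Fisher's z.

    studies: list of (r, n) pairs, with correlation r and sample size n.
    Returns the pooled correlation and an approximate 95% confidence interval.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in studies:
        if n <= 3:
            raise ValueError("Each study needs a sample size above 3")
        z = math.atanh(r)   # Fisher's z transformation
        weight = n - 3      # inverse of the variance of z
        weighted_sum += weight * z
        total_weight += weight
    z_mean = weighted_sum / total_weight
    standard_error = 1.0 / math.sqrt(total_weight)
    lower = math.tanh(z_mean - 1.96 * standard_error)
    upper = math.tanh(z_mean + 1.96 * standard_error)
    return math.tanh(z_mean), (lower, upper)


if __name__ == "__main__":
    # Hypothetical studies: (correlation, sample size).
    r, (low, high) = pooled_correlation([(0.25, 80), (0.40, 150), (0.10, 60)])
    print(f"pooled r = {r:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```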

Publications about the method

Bandara, W., Furtmueller, E., Miskon, S., Gorbacheva, E., & Beekhuyzen, J. (2015). Achieving Rigor in Literature Reviews: Insights from Qualitative Data Analysis and Tool-Support.  Communications of the Association for Information Systems . 34(8), 154-204.

Booth, A., Papaioannou, D., and Sutton, A. (2012)  Systematic approaches to a successful literature review.  London: Sage.

Briner, R. B., & Denyer, D. (2012). Systematic Review and Evidence Synthesis as a Practice and Scholarship Tool. In Rousseau, D. M. (Ed.),  The Oxford Handbook of Evidence-Based Management  (pp. 112-129). Oxford: Oxford University Press.

Durach, C. F., Wieland, A., & Machuca, Jose A. D. (2015). Antecedents and dimensions of supply chain robustness: a systematic literature review. International Journal of Physical Distribution & Logistics Management, 45(1/2), 118-137. doi:  https://doi.org/10.1108/IJPDLM-05-2013-0133

Feak, C. B., & Swales, J. M. (2009). Telling a Research Story: Writing a Literature Review.  English in Today's Research World 2.  Ann Arbor: University of Michigan Press. doi:  10.3998/mpub.309338

Fink, A. (2014).  Conducting Research Literature Reviews: From the Internet to Paper  (4th ed.). Los Angeles, London, New Delhi, Singapore, Washington DC: Sage Publication.

Fisch, C., & Block, J. (2018). Six tips for your (systematic) literature review in business and management research.  Management Review Quarterly,  68, 103–106.  doi.org/10.1007/s11301-018-0142-x

Guba, B. (2008). Systematische Literaturrecherche.  Wiener Medizinische Wochenschrift, 158(1-2), pp. 62-69. doi:  doi.org/10.1007/s10354-007-0500-0

Hart, C.  Doing a literature review: releasing the social science research imagination.  London: Sage.

Jesson, J. K., Matheson, L. & Lacey, F. (2011).  Doing your Literature Review - Traditional and Systematic Techniques. Los Angeles, London, New Delhi, Singapore, Washington DC: Sage Publication.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71. doi: 10.1136/bmj.n71.

Petticrew, M. and Roberts, H. (2006).  Systematic Reviews in the Social Sciences: A Practical Guide. Oxford: Blackwell.

Ridley, D. (2012).  The literature review: A step-by-step guide. 2nd edn. London: Sage.

Chang, W. and Taylor, S.A. (2016), The Effectiveness of Customer Participation in New Product Development: A Meta-Analysis,  Journal of Marketing , American Marketing Association, Los Angeles, CA, Vol. 80 No. 1, pp. 47–64.

Tranfield, D., Denyer, D. & Smart, P. (2003). Towards a methodology for developing evidence-informed management knowledge by means of systematic review.  British Journal of Management, 14(3), pp. 207-222. doi:  https://doi.org/10.1111/1467-8551.00375

Webster, J., & Watson, R. T. (2002). Analyzing the Past to Prepare for the Future: Writing a Literature Review.  Management Information Systems Quarterly , 26(2), xiii-xxiii.  http://www.jstor.org/stable/4132319

Durach, C. F., Wieland, A. & Machuca, Jose. A. D. (2015). Antecedents and dimensions of supply chain robustness: a systematic literature review. International Journal of Physical Distribution & Logistics Management, 45(1/2), 118 – 137.

What is particularly good about this example is that search terms were defined by a number of experts and the review was conducted by three researchers working independently of one another. Furthermore, the search terms used have been very well extracted and the procedure of the literature selection very well described.

On the downside, the restriction to English-language literature brings the language bias into play, even though the authors consider it to be insignificant for the subject area.

Bos-Nehles, A., Renkema, M. & Janssen, M. (2017). HRM and innovative work behaviour: a systematic literature review. Personnel Review, 46(7), pp. 1228-1253

  • Only very specific keywords used
  • No precise information on how the review process was carried out (who reviewed articles?)
  • Only journals with impact factor (publication bias)

Jia, F., Orzes, G., Sartor, M. & Nassimbeni, G. (2017). Global sourcing strategy and structure: towards a conceptual framework. International Journal of Operations & Production Management, 37(7), 840-864

  • Research questions are explicitly presented
  • Search string very detailed
  • Exact description of the review process
  • 2 persons conducted the review independently of each other


Utrecht University Library

Systematic reviews.

The university library offers advice and support in systematically searching for literature as part of your systematic review.

Are you about to search for literature for a systematic review? Or have you already started but need help? The University Library can support you in different ways, at different times during your research.

Compass+: Systematically searching for literature

The online training course Compass+: Systematically searching for literature shows you step by step what is involved in setting up and performing a systematic database search. After following the module, you can set up a search question, formulate search terms, create a search string and estimate whether you have enough results. It is useful to do this training course before you start searching for literature. You log in with your Solis-id to access the module.

Workshops systematic literature search (for systematic reviews)

At various times of the year, experts from the library give workshops on systematically searching for literature. You will learn to set up a systematic search strategy and you will receive information about where and how to search. You then apply these skills to your own research question. In the library's calendar you can see when the next workshop is planned.

Workshops are also provided as part of various courses. Do you want to include a workshop in your curriculum? Please contact the library.

Upcoming walk-in hours and workshops

  • Walk-in hours systematic searches - 16 September
  • Walk-in hours systematic searches - 7 October
  • Walk-in hours systematic searches - 21 October
  • Walk-in hours systematic searches - 4 November
  • Walk-in hours systematic searches - 18 November

Tailored advice

Experts from the university library offer tailored advice for your systematic search. You can make an appointment for this. During the appointment you will be helped with fine-tuning your search.

Reference management

Reference management is the systematic collection, management and use of references to sources. Especially when processing a large number of sources, it is important to do this in a structured way. You can use a reference tool. Check out the Libguide Reference Management for more information.

ASReview: Active Learning for Systematic Reviews

ASReview uses active learning to help you screen large amounts of search results (title-abstract) or text, so you don't have to go through everything manually. This saves you time. Read more on the website of ASReview. You can also download the software here. 

Do you have any questions? Please contact the university library. 

Utrecht University Heidelberglaan 8 3584 CS Utrecht The Netherlands Tel. +31 (0)30 253 35 50

Systematic Literature Reviews: An Introduction

Guillaume Lamé, CentraleSupélec

  • Proceedings of the Design Society International Conference on Engineering Design 1(1):1633-1642
  • Conference: International Conference on Engineering Design 2019
  • License: CC BY-NC-ND 4.0

Figure caption: Search for "systematic review*" in titles on the Web of Science on 15 Sep 2018



How to Do a Systematic Review: A Best Practice Guide for Conducting and Reporting Narrative Reviews, Meta-Analyses, and Meta-Syntheses

Affiliations

  • 1 Behavioural Science Centre, Stirling Management School, University of Stirling, Stirling FK9 4LA, United Kingdom.
  • 2 Department of Psychological and Behavioural Science, London School of Economics and Political Science, London WC2A 2AE, United Kingdom.
  • 3 Department of Statistics, Northwestern University, Evanston, Illinois 60208, USA.
  • PMID: 30089228
  • DOI: 10.1146/annurev-psych-010418-102803

Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question. The best reviews synthesize studies to draw broad theoretical conclusions about what a literature means, linking theory to evidence and evidence to theory. This guide describes how to plan, conduct, organize, and present a systematic review of quantitative (meta-analysis) or qualitative (narrative review, meta-synthesis) information. We outline core standards and principles and describe commonly encountered problems. Although this guide targets psychological scientists, its high level of abstraction makes it potentially relevant to any subject area or discipline. We argue that systematic reviews are a key methodology for clarifying whether and how research findings replicate and for explaining possible inconsistencies, and we call for researchers to conduct systematic reviews to help elucidate whether there is a replication crisis.

Keywords: evidence; guide; meta-analysis; meta-synthesis; narrative; systematic review; theory.

PubMed Disclaimer

Similar articles

  • The future of Cochrane Neonatal. Soll RF, Ovelman C, McGuire W. Soll RF, et al. Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12. Early Hum Dev. 2020. PMID: 33036834
  • Summarizing systematic reviews: methodological development, conduct and reporting of an umbrella review approach. Aromataris E, Fernandez R, Godfrey CM, Holly C, Khalil H, Tungpunkom P. Aromataris E, et al. Int J Evid Based Healthc. 2015 Sep;13(3):132-40. doi: 10.1097/XEB.0000000000000055. Int J Evid Based Healthc. 2015. PMID: 26360830
  • RAMESES publication standards: meta-narrative reviews. Wong G, Greenhalgh T, Westhorp G, Buckingham J, Pawson R. Wong G, et al. BMC Med. 2013 Jan 29;11:20. doi: 10.1186/1741-7015-11-20. BMC Med. 2013. PMID: 23360661 Free PMC article.
  • A Primer on Systematic Reviews and Meta-Analyses. Nguyen NH, Singh S. Nguyen NH, et al. Semin Liver Dis. 2018 May;38(2):103-111. doi: 10.1055/s-0038-1655776. Epub 2018 Jun 5. Semin Liver Dis. 2018. PMID: 29871017 Review.
  • Publication Bias and Nonreporting Found in Majority of Systematic Reviews and Meta-analyses in Anesthesiology Journals. Hedin RJ, Umberham BA, Detweiler BN, Kollmorgen L, Vassar M. Hedin RJ, et al. Anesth Analg. 2016 Oct;123(4):1018-25. doi: 10.1213/ANE.0000000000001452. Anesth Analg. 2016. PMID: 27537925 Review.
  • The Association between Emotional Intelligence and Prosocial Behaviors in Children and Adolescents: A Systematic Review and Meta-Analysis. Cao X, Chen J. Cao X, et al. J Youth Adolesc. 2024 Aug 28. doi: 10.1007/s10964-024-02062-y. Online ahead of print. J Youth Adolesc. 2024. PMID: 39198344
  • The impact of chemical pollution across major life transitions: a meta-analysis on oxidative stress in amphibians. Martin C, Capilla-Lasheras P, Monaghan P, Burraco P. Martin C, et al. Proc Biol Sci. 2024 Aug;291(2029):20241536. doi: 10.1098/rspb.2024.1536. Epub 2024 Aug 28. Proc Biol Sci. 2024. PMID: 39191283 Free PMC article.
  • Target mechanisms of mindfulness-based programmes and practices: a scoping review. Maloney S, Kock M, Slaghekke Y, Radley L, Lopez-Montoyo A, Montero-Marin J, Kuyken W. Maloney S, et al. BMJ Ment Health. 2024 Aug 24;27(1):e300955. doi: 10.1136/bmjment-2023-300955. BMJ Ment Health. 2024. PMID: 39181568 Free PMC article. Review.
  • Bridging disciplines-key to success when implementing planetary health in medical training curricula. Malmqvist E, Oudin A. Malmqvist E, et al. Front Public Health. 2024 Aug 6;12:1454729. doi: 10.3389/fpubh.2024.1454729. eCollection 2024. Front Public Health. 2024. PMID: 39165783 Free PMC article. Review.
  • Strength of evidence for five happiness strategies. Puterman E, Zieff G, Stoner L. Puterman E, et al. Nat Hum Behav. 2024 Aug 12. doi: 10.1038/s41562-024-01954-0. Online ahead of print. Nat Hum Behav. 2024. PMID: 39134738 No abstract available.

Systematic Reviews


Systematic Review Process with a Librarian

The librarian plays an integral role in systematic reviews at Loma Linda University. 

What is a systematic review?

Cochrane Reviews provides the following definition for a systematic review: "A systematic review attempts to identify, appraise and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a specific research question. Researchers conducting systematic reviews use explicit, systematic methods that are selected with a view aimed at minimizing bias, to produce more reliable findings to inform decision making."

A systematic review is a rigorous and comprehensive approach to reviewing and synthesizing existing research literature on a specific topic. It goes beyond a traditional literature review by using a systematic and transparent process to identify, select, appraise, and analyze relevant studies.

The purpose of a systematic review is to provide a reliable and unbiased summary of the available evidence on a particular research question or topic. By systematically searching for and critically evaluating all relevant studies, systematic reviews aim to minimize bias and provide a more objective assessment of the existing evidence.

Systematic reviews are essential in research for several reasons:

Evidence-based decision making

Summarizing complex bodies of evidence

Identifying research gaps and priorities

Resolving conflicting findings

Improving research efficiency

Systematic Review Service Staff:

To request a systematic review service, contact the JBI-certified librarians below:

Research & Instruction Librarian
Liaison to the School of Allied Health Professions and the School of Public Health
Office: (909) 558-1000 ext. 47564 · E-mail: [email protected]
Make an appointment with Adorée

Liaison to the School of Pharmacy, the School of Dentistry, and the School of Nursing (undergraduate)
Office: (909) 558-1000 ext. 47561 · E-mail: [email protected]

Shan Tamares
Library Director
Office: (909) 558-1000 ext. 47501 · E-mail: [email protected]

  • Last Updated: Sep 12, 2024 10:00 AM
  • URL: https://libguides.llu.edu/library-menu

Systematic reviews of the literature: an introduction to current methods.


Romina Brignardello-Petersen, Nancy Santesso, Gordon H Guyatt, Systematic reviews of the literature: an introduction to current methods, American Journal of Epidemiology , 2024;, kwae232, https://doi.org/10.1093/aje/kwae232


Systematic reviews are a type of evidence synthesis in which authors develop explicit eligibility criteria, collect all the available studies that meet these criteria, and summarize results using reproducible methods that minimize biases and errors. Systematic reviews serve different purposes and use a different methodology than other types of evidence synthesis that include narrative reviews, scoping reviews, and overviews of reviews. Systematic reviews can address questions regarding effects of interventions or exposures, diagnostic properties of tests, and prevalence or prognosis of diseases. All rigorous systematic reviews have common processes that include: 1) determining the question and eligibility criteria, including a priori specification of subgroup hypotheses, 2) searching for evidence and selecting studies, 3) abstracting data and assessing risk of bias of the included studies, 4) summarizing the data for each outcome of interest, whenever possible using meta-analyses, and 5) assessing the certainty of the evidence and drawing conclusions. There are several tools that can guide and facilitate the systematic review process, but methodological and content expertise are always necessary.

  • Online ISSN 1476-6256
  • Print ISSN 0002-9262
  • Copyright © 2024 Johns Hopkins Bloomberg School of Public Health

Systematic Review | Definition, Example & Guide

Published on June 15, 2022 by Shaun Turney. Revised on November 20, 2023.

A systematic review is a type of review that uses repeatable methods to find, select, and synthesize all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer.

Throughout this guide, a systematic review by Boyle and colleagues serves as the running example. They answered the question “What is the effectiveness of probiotics in reducing eczema symptoms and improving quality of life in patients with eczema?”

In this context, a probiotic is a health product that contains live microorganisms and is taken by mouth. Eczema is a common skin condition that causes red, itchy skin.

Table of contents

  • What is a systematic review?
  • Systematic review vs. meta-analysis
  • Systematic review vs. literature review
  • Systematic review vs. scoping review
  • When to conduct a systematic review
  • Pros and cons of systematic reviews
  • Step-by-step example of a systematic review
  • Other interesting articles
  • Frequently asked questions about systematic reviews

What is a systematic review?

A review is an overview of the research that’s already been completed on a topic.

What makes a systematic review different from other types of reviews is that the research methods are designed to reduce bias. The methods are repeatable, and the approach is formal and systematic:

  • Formulate a research question
  • Develop a protocol
  • Search for all relevant studies
  • Apply the selection criteria
  • Extract the data
  • Synthesize the data
  • Write and publish a report

Although multiple sets of guidelines exist, the Cochrane Handbook for Systematic Reviews is among the most widely used. It provides detailed guidelines on how to complete each step of the systematic review process.

Systematic reviews are most commonly used in medical and public health research, but they can also be found in other disciplines.

Systematic reviews typically answer their research question by synthesizing all available evidence and evaluating the quality of the evidence. Synthesizing means bringing together different information to tell a single, cohesive story. The synthesis can be narrative (qualitative), quantitative, or both.

Systematic review vs. meta-analysis

Systematic reviews often quantitatively synthesize the evidence using a meta-analysis. A meta-analysis is a statistical analysis, not a type of review.

A meta-analysis is a technique to synthesize results from multiple studies. It’s a statistical analysis that combines the results of two or more studies, usually to estimate an effect size.
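To make "combining the results of two or more studies" concrete, here is a minimal sketch of one common pooling method, the fixed-effect inverse-variance average. The choice of method, the function name `pooled_effect`, and the three effect sizes and standard errors are illustrative assumptions, not data from any review discussed here.

```python
import math


def pooled_effect(effects, std_errors):
    """Fixed-effect (inverse-variance) pooled estimate and its standard error."""
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need matching lists covering at least two studies")
    # Each study is weighted by 1 / SE^2, so more precise studies count more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


# Hypothetical effect sizes (e.g. mean differences) and their standard errors.
estimate, se = pooled_effect([0.30, 0.10, 0.20], [0.10, 0.20, 0.10])
low, high = estimate - 1.96 * se, estimate + 1.96 * se
print(f"pooled effect {estimate:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

A fixed-effect model assumes every study estimates the same underlying effect. When that assumption is doubtful, a random-effects model is usually considered instead.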

Systematic review vs. literature review

A literature review is a type of review that uses a less systematic and formal approach than a systematic review. Typically, an expert in a topic will qualitatively summarize and evaluate previous work, without using a formal, explicit method.

Although literature reviews are often less time-consuming and can be insightful or helpful, they have a higher risk of bias and are less transparent than systematic reviews.

Systematic review vs. scoping review

Similar to a systematic review, a scoping review is a type of review that tries to minimize bias by using transparent and repeatable methods.

However, a scoping review isn’t a type of systematic review. The most important difference is the goal: rather than answering a specific question, a scoping review explores a topic. The researcher tries to identify the main concepts, theories, and evidence, as well as gaps in the current research.

Sometimes scoping reviews are an exploratory preparation step for a systematic review, and sometimes they are a standalone project.

When to conduct a systematic review

A systematic review is a good choice of review if you want to answer a question about the effectiveness of an intervention, such as a medical treatment.

To conduct a systematic review, you’ll need the following:

  • A precise question, usually about the effectiveness of an intervention. The question needs to be about a topic that’s previously been studied by multiple researchers. If there’s no previous research, there’s nothing to review.
  • If you’re doing a systematic review on your own (e.g., for a research paper or thesis), you should take appropriate measures to ensure the validity and reliability of your research.
  • Access to databases and journal archives. Often, your educational institution provides you with access.
  • Time. A professional systematic review is a time-consuming process: it will take the lead author about six months of full-time work. If you’re a student, you should narrow the scope of your systematic review and stick to a tight schedule.
  • Bibliographic, word-processing, spreadsheet, and statistical software. For example, you could use EndNote, Microsoft Word, Excel, and SPSS.

Pros and cons of systematic reviews

Systematic reviews have many pros.

  • They minimize research bias by considering all available evidence and evaluating each study for bias.
  • Their methods are transparent, so they can be scrutinized by others.
  • They’re thorough: they summarize all available evidence.
  • They can be replicated and updated by others.

Systematic reviews also have a few cons.

  • They’re time-consuming.
  • They’re narrow in scope: they only answer the precise research question.

Step-by-step example of a systematic review

The 7 steps for conducting a systematic review are explained with an example.

Step 1: Formulate a research question

Formulating the research question is probably the most important step of a systematic review. A clear research question will:

  • Allow you to more effectively communicate your research to other researchers and practitioners
  • Guide your decisions as you plan and conduct your systematic review

A good research question for a systematic review has four components, which you can remember with the acronym PICO:

  • Population(s) or problem(s)
  • Intervention(s)
  • Comparison(s)
  • Outcome(s)

You can rearrange these four components to write your research question:

  • What is the effectiveness of I versus C for O in P?

Sometimes, you may want to include a fifth component, the type of study design. In this case, the acronym is PICOT.

  • Type of study design(s)

In the example review, Boyle and colleagues’ question had the following PICOT components:

  • The population of patients with eczema
  • The intervention of probiotics
  • In comparison to no treatment, placebo, or non-probiotic treatment
  • The outcome of changes in participant-, parent-, and doctor-rated symptoms of eczema and quality of life
  • Randomized controlled trials, a type of study design

Their research question was:

  • What is the effectiveness of probiotics versus no treatment, a placebo, or a non-probiotic treatment for reducing eczema symptoms and improving quality of life in patients with eczema?
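The "I versus C for O in P" template can be sketched as a small data structure. This is only an illustration: the class name `Pico` and its field names are hypothetical, and the values are taken from the example question above.

```python
from dataclasses import dataclass


@dataclass
class Pico:
    """The four PICO components of a review question (hypothetical helper)."""
    population: str
    intervention: str
    comparison: str
    outcome: str

    def question(self) -> str:
        # Fill the template "What is the effectiveness of I versus C for O in P?"
        return (
            f"What is the effectiveness of {self.intervention} versus "
            f"{self.comparison} for {self.outcome} in {self.population}?"
        )


eczema = Pico(
    population="patients with eczema",
    intervention="probiotics",
    comparison="no treatment, a placebo, or a non-probiotic treatment",
    outcome="reducing eczema symptoms and improving quality of life",
)
print(eczema.question())
```

Keeping the components as separate fields also makes it easier to reuse them later, for instance as concept groups in a database search.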

Step 2: Develop a protocol

A protocol is a document that contains your research plan for the systematic review. This is an important step because having a plan allows you to work more efficiently and reduces bias.

Your protocol should include the following components:

  • Background information: Provide the context of the research question, including why it’s important.
  • Research objective(s): Rephrase your research question as an objective.
  • Selection criteria: State how you’ll decide which studies to include or exclude from your review.
  • Search strategy: Discuss your plan for finding studies.
  • Analysis: Explain what information you’ll collect from the studies and how you’ll synthesize the data.

If you’re a professional seeking to publish your review, it’s a good idea to bring together an advisory committee. This is a group of about six people who have experience in the topic you’re researching. They can help you make decisions about your protocol.

It’s highly recommended to register your protocol. Registering your protocol means submitting it to a database such as PROSPERO or ClinicalTrials.gov.

Step 3: Search for all relevant studies

Searching for relevant studies is the most time-consuming step of a systematic review.

To reduce bias, it’s important to search for relevant studies very thoroughly. Your strategy will depend on your field and your research question, but sources generally fall into these four categories:

  • Databases: Search multiple databases of peer-reviewed literature, such as PubMed or Scopus. Think carefully about how to phrase your search terms and include multiple synonyms of each word. Use Boolean operators if relevant.
  • Handsearching: In addition to searching the primary sources using databases, you’ll also need to search manually. One strategy is to scan relevant journals or conference proceedings. Another strategy is to scan the reference lists of relevant studies.
  • Gray literature: Gray literature includes documents produced by governments, universities, and other institutions that aren’t published by traditional publishers. Graduate student theses are an important type of gray literature, which you can search using the Networked Digital Library of Theses and Dissertations (NDLTD). In medicine, clinical trial registries are another important type of gray literature.
  • Experts: Contact experts in the field to ask if they have unpublished studies that should be included in your review.
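As a rough illustration of combining synonyms with Boolean operators for a database search, the sketch below joins the synonyms for each concept with OR and then joins the concepts with AND. The function name `build_query` and the synonym lists are hypothetical, and real databases each have their own field tags and syntax rules, so treat the output as a starting point only.

```python
def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for synonyms in concepts:
        # Quote multi-word phrases so they are searched as a unit.
        terms = [f'"{term}"' if " " in term else term for term in synonyms]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


# Hypothetical synonym lists for the intervention and the population.
query = build_query([
    ["probiotic", "probiotics", "lactobacillus"],
    ["eczema", "atopic dermatitis"],
])
print(query)
```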

At this stage of your review, you won’t read the articles yet. Simply save any potentially relevant citations using bibliographic software, such as Scribbr’s APA or MLA Generator.

In the example review, Boyle and colleagues searched the following sources:

  • Databases: EMBASE, PsycINFO, AMED, LILACS, and ISI Web of Science
  • Handsearch: Conference proceedings and reference lists of articles
  • Gray literature: The Cochrane Library, the metaRegister of Controlled Trials, and the Ongoing Skin Trials Register
  • Experts: Authors of unpublished registered trials, pharmaceutical companies, and manufacturers of probiotics

Step 4: Apply the selection criteria

Applying the selection criteria is a three-person job. Two of you will independently read the studies and decide which to include in your review based on the selection criteria you established in your protocol. The third person’s job is to break any ties.

To increase inter-rater reliability, ensure that everyone thoroughly understands the selection criteria before you begin.
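One common way to quantify inter-rater reliability between two screeners is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The guide does not prescribe a particular statistic, so the sketch below is an assumption, and the screening decisions in it are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions by two reviewers on six abstracts.
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "include", "include", "exclude", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 indicates strong agreement, while a value near 0 means the reviewers agree about as often as chance would predict, which suggests the selection criteria need clarifying.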

If you’re writing a systematic review as a student for an assignment, you might not have a team. In this case, you’ll have to apply the selection criteria on your own; you can mention this as a limitation in your paper’s discussion.

You should apply the selection criteria in two phases:

  • Based on the titles and abstracts: Decide whether each article potentially meets the selection criteria based on the information provided in the abstracts.
  • Based on the full texts: Download the articles that weren’t excluded during the first phase. If an article isn’t available online or through your library, you may need to contact the authors to ask for a copy. Read the articles and decide which articles meet the selection criteria.

It’s very important to keep a meticulous record of why you included or excluded each article. When the selection process is complete, you can summarize what you did using a PRISMA flow diagram.
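A record of screening decisions can be kept in any structured form. As a minimal sketch, the log below stores one row per decision and is summarized into the kind of counts a flow diagram reports. The field names, phases, and entries are hypothetical and do not follow any official PRISMA data format.

```python
from collections import Counter

# Hypothetical screening log: one row per decision, with the reason recorded.
SCREENING_LOG = [
    {"id": "c1", "phase": "title/abstract", "decision": "exclude", "reason": "not about eczema"},
    {"id": "c2", "phase": "title/abstract", "decision": "include", "reason": ""},
    {"id": "c3", "phase": "title/abstract", "decision": "include", "reason": ""},
    {"id": "c4", "phase": "title/abstract", "decision": "exclude", "reason": "not a trial"},
    {"id": "c2", "phase": "full text", "decision": "include", "reason": ""},
    {"id": "c3", "phase": "full text", "decision": "exclude", "reason": "no probiotic arm"},
]


def prisma_counts(log):
    """Summarize a screening log into counts for each selection phase."""
    abstracts = [row for row in log if row["phase"] == "title/abstract"]
    full_texts = [row for row in log if row["phase"] == "full text"]
    return {
        "screened": len(abstracts),
        "excluded_at_screening": sum(r["decision"] == "exclude" for r in abstracts),
        "full_texts_assessed": len(full_texts),
        "included": sum(r["decision"] == "include" for r in full_texts),
        # Tally why full-text articles were excluded.
        "full_text_exclusion_reasons": dict(
            Counter(r["reason"] for r in full_texts if r["decision"] == "exclude")
        ),
    }


counts = prisma_counts(SCREENING_LOG)
print(counts)
```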

Next, Boyle and colleagues found the full texts for each of the remaining studies. Boyle and Tang read through the articles to decide if any more studies needed to be excluded based on the selection criteria.

When Boyle and Tang disagreed about whether a study should be excluded, they discussed it with Varigos until the three researchers came to an agreement.

Step 5: Extract the data

Extracting the data means collecting information from the selected studies in a systematic way. There are two types of information you need to collect from each study:

  • Information about the study’s methods and results. The exact information will depend on your research question, but it might include the year, study design, sample size, context, research findings, and conclusions. If any data are missing, you’ll need to contact the study’s authors.
  • Your judgment of the quality of the evidence, including risk of bias.

You should collect this information using forms. You can find sample forms in The Registry of Methods and Tools for Evidence-Informed Decision Making and the Grading of Recommendations, Assessment, Development and Evaluations Working Group.

Extracting the data is also a three-person job. Two people should do this step independently, and the third person will resolve any disagreements.
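The idea of two people extracting independently and a third resolving disagreements can be sketched with a simple form and a comparison step. The form fields below are a hypothetical subset of what a real extraction form would hold, and the two records are invented.

```python
from dataclasses import asdict, dataclass


@dataclass
class ExtractionForm:
    """A hypothetical, heavily simplified data extraction form."""
    study_id: str
    year: int
    design: str
    sample_size: int
    risk_of_bias: str  # e.g. "low", "unclear", or "high"


def disagreements(form_a, form_b):
    """Return the fields on which two independent extractors differ."""
    a, b = asdict(form_a), asdict(form_b)
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}


# Two extractors fill in the form for the same (invented) study.
first = ExtractionForm("trial-01", 2005, "RCT", 56, "low")
second = ExtractionForm("trial-01", 2005, "RCT", 65, "unclear")
to_resolve = disagreements(first, second)
print(to_resolve)
```

Only the fields returned by `disagreements` need to go to the third person, which keeps the resolution step focused.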

In the example review, Boyle and colleagues also collected data about possible sources of bias, such as how the study participants were randomized into the control and treatment groups.

Step 6: Synthesize the data

Synthesizing the data means bringing together the information you collected into a single, cohesive story. There are two main approaches to synthesizing the data:

  • Narrative (qualitative): Summarize the information in words. You’ll need to discuss the studies and assess their overall quality.
  • Quantitative: Use statistical methods to summarize and compare data from different studies. The most common quantitative approach is a meta-analysis, which allows you to combine results from multiple studies into a summary result.

Generally, you should use both approaches together whenever possible. If you don’t have enough data, or the data from different studies aren’t comparable, then you can take just a narrative approach. However, you should justify why a quantitative approach wasn’t possible.

Boyle and colleagues also divided the studies into subgroups, such as studies about babies, children, and adults, and analyzed the effect sizes within each group.

Step 7: Write and publish a report

The purpose of writing a systematic review article is to share the answer to your research question and explain how you arrived at this answer.

Your article should include the following sections:

  • Abstract: A summary of the review
  • Introduction: Including the rationale and objectives
  • Methods: Including the selection criteria, search method, data extraction method, and synthesis method
  • Results: Including results of the search and selection process, study characteristics, risk of bias in the studies, and synthesis results
  • Discussion: Including interpretation of the results and limitations of the review
  • Conclusion: The answer to your research question and implications for practice, policy, or research

To verify that your report includes everything it needs, you can use the PRISMA checklist.

Once your report is written, you can publish it in a systematic review database, such as the Cochrane Database of Systematic Reviews, and/or in a peer-reviewed journal.

In their report, Boyle and colleagues concluded that probiotics cannot be recommended for reducing eczema symptoms or improving quality of life in patients with eczema.

Note: Generative AI tools like ChatGPT can be useful at various stages of the writing and research process and can help you to write your systematic review. However, we strongly advise against trying to pass AI-generated text off as your own work.

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Student’s t-distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval
  • Quartiles & Quantiles
  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Prospective cohort study

Research bias

  • Implicit bias
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hindsight bias
  • Affect heuristic
  • Social desirability bias

Frequently asked questions about systematic reviews

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question.

It is often written as part of a thesis, dissertation, or research paper, in order to situate your work in relation to existing knowledge.

A literature review is a survey of credible sources on a topic, often used in dissertations, theses, and research papers. Literature reviews give an overview of knowledge on a subject, helping you identify relevant theories and methods, as well as gaps in existing research. Literature reviews are set up similarly to other academic texts, with an introduction, a main body, and a conclusion.

An annotated bibliography is a list of source references that has a short description (called an annotation) for each of the sources. It is often assigned as part of the research process for a paper.

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

Cite this Scribbr article


Turney, S. (2023, November 20). Systematic Review | Definition, Example & Guide. Scribbr. Retrieved September 10, 2024, from https://www.scribbr.com/methodology/systematic-review/


PMCID: PMC10248995

Guidance to best tools and practices for systematic reviews

Kat Kolaski

1 Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC USA

Lynne Romeiser Logan

2 Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY USA

John P. A. Ioannidis

3 Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA USA


Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-023-02255-9.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times when thousands of systematic reviews are published monthly [ 3 ] the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometrical increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 – 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 – 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 – 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table 1). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists is required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Guidance for development of evidence syntheses

 Cochrane (formerly Cochrane Collaboration)
 JBI (formerly Joanna Briggs Institute)
 National Institute for Health and Care Excellence (NICE)—United Kingdom
 Scottish Intercollegiate Guidelines Network (SIGN) —Scotland
 Agency for Healthcare Research and Quality (AHRQ)—United States

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 – 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods and not copycat previous citations or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19 [ 2 , 42 ], but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 – 48 ]. At this juncture, the extent to which each of these factors contributes remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 – 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 – 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Table 2.1 Types of traditional systematic reviews

| Review type | Topic assessed | Elements of research question (mnemonic) |
|---|---|---|
| Intervention | Benefits and harms of interventions used in healthcare. | Population, Intervention, Comparator, Outcome (PICO) |
| Diagnostic test accuracy | How well a diagnostic test performs in diagnosing and detecting a particular disease. | Population, Index test(s), and Target condition (PIT) |
| Qualitative: Cochrane | Questions are designed to improve understanding of intervention complexity, contextual variations, implementation, and stakeholder preferences and experiences. | Setting, Perspective, Intervention or Phenomenon of Interest, Comparison, Evaluation (SPICE); Sample, Phenomenon of Interest, Design, Evaluation, Research type (SPIDER); Perspective, Setting, Phenomena of interest/Problem, Environment, Comparison (optional), Time/timing, Findings (PerSPEcTiF) |
| Qualitative: JBI | Questions inform meaningfulness and appropriateness of care and the impact of illness through documentation of stakeholder experiences, preferences, and priorities. | Population, the Phenomena of Interest, and the Context |
| Prognostic | Probable course or future outcome(s) of people with a health problem. | Population, Intervention (model), Comparator, Outcomes, Timing, Setting (PICOTS) |
| Etiology and risk | The relationship (association) between certain factors (eg, genetic, environmental) and the development of a disease or condition or other health outcome. | Population or groups at risk, Exposure(s), associated Outcome(s) (disease, symptom, or health condition of interest), the context/location or the time period and the length of time when relevant (PEO) |
| Measurement properties | What is the most suitable instrument to measure a construct of interest in a specific study population? | Population, Instrument, Construct, Outcomes (PICO) |
| Prevalence and incidence | The frequency, distribution and determinants of specific factors, health states or conditions in a defined population: eg, how common is a particular disease or condition in a specific group of individuals? | Factor, disease, symptom or health Condition of interest, the epidemiological indicator used to measure its frequency (prevalence, incidence), the Population or groups at risk as well as the Context/location and time period where relevant (CoCoPop) |

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

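Table 2.2 Evidence syntheses published by Cochrane and JBI

| Cochrane review type (a) | n | % | JBI review type (b) | n | % |
|---|---|---|---|---|---|
| Intervention | 8572 | 96.3 | Effectiveness | 435 | 61.5 |
| Diagnostic | 176 | 1.9 | Diagnostic Test Accuracy | 9 | 1.3 |
| Overview | 64 | 0.7 | Umbrella | 4 | 0.6 |
| Methodology | 41 | 0.45 | Mixed Methods | 2 | 0.3 |
| Qualitative | 17 | 0.19 | Qualitative | 159 | 22.5 |
| Prognostic | 11 | 0.12 | Prevalence and Incidence | 6 | 0.8 |
| Rapid | 11 | 0.12 | Etiology and Risk | 7 | 1.0 |
| Prototype (c) | 8 | 0.08 | Measurement Properties | 3 | 0.4 |
| | | | Economic | 6 | 0.6 |
| | | | Text and Opinion | 1 | 0.14 |
| | | | Scoping | 43 | 6.0 |
| | | | Comprehensive (d) | 32 | 4.5 |
| Total | 8900 | | Total | 707 | |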

a Data from https://www.cochranelibrary.com/cdsr/reviews . Accessed 17 Sep 2022

b Data obtained via personal email communication on 18 Sep 2022 with Emilie Francis, editorial assistant, JBI Evidence Synthesis

c Includes the following categories: prevalence, scoping, mixed methods, and realist reviews

d This methodology is not supported in the current version of the JBI Manual for Evidence Synthesis
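
The shares quoted for intervention reviews (96% for Cochrane, 62% for JBI) follow directly from the counts reported for the two organizations. As a minimal sketch, assuming the counts and totals given above (8572 of 8900 and 435 of 707), they can be recomputed as follows; the function and variable names are illustrative and not part of the source.

```python
# Counts and totals as reported for Cochrane and JBI; names are illustrative.
COCHRANE_TOTAL = 8900
JBI_TOTAL = 707


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


cochrane_intervention = share(8572, COCHRANE_TOTAL)  # Cochrane intervention reviews
jbi_effectiveness = share(435, JBI_TOTAL)            # JBI effectiveness reviews
print(cochrane_intervention, jbi_effectiveness)
```

The same helper can be applied to any other row to check a reported percentage against its count.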

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.

Distinguishing types of research evidence
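
The distinctions described above (primary or secondary; qualitative or quantitative; group or single-case; randomized or non-randomized) can be recorded as explicit features during data extraction rather than as a single design label. The following is a minimal sketch of one way to do that; the class, its field names, and the output wording are hypothetical and are not taken from the published scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Study:
    # All field names are hypothetical labels for the distinctions in the text.
    is_primary: bool                   # primary versus secondary study
    data_type: Optional[str] = None    # "qualitative" or "quantitative"
    unit: Optional[str] = None         # "group" or "single-case"
    randomized: Optional[bool] = None  # None when not reported or not applicable


def describe(study: Study) -> str:
    """Describe a study by its basic features instead of a design label."""
    if not study.is_primary:
        return "secondary study"
    features = [study.data_type, study.unit]
    if study.randomized is not None:
        features.append("randomized" if study.randomized else "non-randomized")
    known = [f for f in features if f is not None]
    if not known:
        return "primary study: features not reported"
    return "primary study: " + ", ".join(known)


print(describe(Study(is_primary=True, data_type="quantitative",
                     unit="group", randomized=False)))
```

Keeping unreported features as `None` makes gaps in the primary report visible instead of forcing a possibly wrong label.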

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of intervention (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, thus obfuscating the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 – 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1  provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

Table 3.1 Tools specifying standards for systematic reviews with and without meta-analysis

| Tool | Source |
|---|---|
| Quality of Reporting of Meta-analyses (QUOROM) Statement | Moher 1999 |
| Meta-analyses Of Observational Studies in Epidemiology (MOOSE) | Stroup 2000 |
| Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) | Moher 2009 |
| PRISMA 2020 (a) | Page 2021 |
| Overview Quality Assessment Questionnaire (OQAQ) | Oxman and Guyatt 1991 |
| Systematic Review Critical Appraisal Sheet | Centre for Evidence-based Medicine 2005 |
| A Measurement Tool to Assess Systematic Reviews (AMSTAR) | Shea 2007 |
| AMSTAR-2 (a) | Shea 2017 |
| Risk of Bias in Systematic Reviews (ROBIS) (a) | Whiting 2016 |

a Currently recommended

b Validated tool for systematic reviews of interventions developed for use by authors of overviews or umbrella reviews

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems; a thoroughly reported systematic review may still be biased and flawed while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1 but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake and impact and limitations are also discussed.

Evaluation of conduct

Development

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2. Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section; this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

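Table 3.2 Comparison of AMSTAR-2 and ROBIS

| Characteristic | AMSTAR-2 | ROBIS |
|---|---|---|
| Guidance | Extensive | Extensive |
| Review types addressed | Intervention | Intervention, diagnostic, etiology, prognostic (a) |
| Domains | 7 critical, 9 non-critical | 4 |
| Items: total number | 16 | 29 |
| Items: response options | Items # 1, 3, 5, 6, 10, 13, 14, 16: rated yes or no. Items # 2, 4, 7, 8, 9 (b): rated yes, partial yes, or no. Items # 11 (b), 12, 15: rated yes, no, or no meta-analysis conducted | 24 assessment items: rated yes, probably yes, probably no, no, or no information. 5 items regarding level of concern: rated low, high, or unclear |
| Overall rating: construct | Confidence based on weaknesses in critical domains | Level of concern for risk of bias |
| Overall rating: categories | High, moderate, low, critically low | Low, high, unclear |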

a ROBIS includes an optional first phase to assess the applicability of the review to the research question of interest. The tool may be applicable to other review types in addition to the four specified, although modification of this initial phase will be needed (Personal Communication via email, Penny Whiting, 28 Jan 2022)

b AMSTAR-2 item #9 and #11 require separate responses for RCTs and NRSI
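
To illustrate why the AMSTAR-2 overall rating is not a simple total score, the sketch below encodes the rating scheme as we understand it from the developers' guidance: the category depends on how many critical flaws are present, and non-critical weaknesses only distinguish high from moderate confidence. This is an assumption-labeled illustration, not an official implementation; the function name and its inputs are hypothetical, and appraisers should rely on the AMSTAR-2 guidance document for actual ratings.

```python
def amstar2_overall_confidence(critical_flaws: int, noncritical_weaknesses: int) -> str:
    """Map counts of flaws to an overall confidence category.

    Assumed rule (to be checked against the AMSTAR-2 guidance document):
      - more than one critical flaw           -> "critically low"
      - exactly one critical flaw             -> "low"
      - no critical flaw, >1 non-critical     -> "moderate"
      - no critical flaw, <=1 non-critical    -> "high"
    """
    if critical_flaws < 0 or noncritical_weaknesses < 0:
        raise ValueError("counts must be non-negative")
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# A review with one critical flaw is rated "low" no matter how many
# non-critical items it satisfies, which a summed score would obscure.
print(amstar2_overall_confidence(critical_flaws=1, noncritical_weaknesses=0))
```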

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 – 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.
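
As an illustration of how interrater reliability between two independent appraisers might be quantified during a calibration exercise, the sketch below computes Cohen's kappa for paired item-level ratings. The choice of Cohen's kappa is our assumption (the text above does not prescribe a statistic), and the function name and example ratings are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters who rated the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four items by two appraisers.
appraiser_1 = ["yes", "yes", "no", "no"]
appraiser_2 = ["yes", "no", "no", "no"]
print(cohen_kappa(appraiser_1, appraiser_2))
```

In this hypothetical example the appraisers agree on three of four items (observed agreement 0.75) against a chance agreement of 0.5, which gives a kappa of 0.5; disagreements identified this way can then feed into predefined decision rules.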

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered as evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines” and “ROBIS AND clinical practice guidelines”; 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 – 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate because the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or from modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; however, like reports about the actual uptake of these tools, time will tell. Additional data on user experience is also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

PRISMA extensions

Extension | Acronym | Year
PRISMA for systematic reviews with a focus on health equity | PRISMA-E | 2012
Reporting systematic reviews in journal and conference abstracts | PRISMA for Abstracts a | 2015; 2020
PRISMA for systematic review protocols | PRISMA-P | 2015
PRISMA for Network Meta-Analyses | PRISMA-NMA | 2015
PRISMA for Individual Participant Data | PRISMA-IPD | 2015
PRISMA for reviews including harms outcomes | PRISMA-Harms | 2016
PRISMA for diagnostic test accuracy | PRISMA-DTA | 2018
PRISMA for scoping reviews | PRISMA-ScR | 2018
PRISMA for acupuncture | PRISMA-A | 2019
PRISMA for reporting literature searches | PRISMA-S | 2021

PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses

a Note the abstract reporting checklist is now incorporated into PRISMA 2020 [ 93 ]

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of PRISMA 2009 standards [ 43 , 111 ]. Most journals require that a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, with good or bad methodological choices. However, PRISMA checklists evaluate how completely an element of review conduct was reported, but do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not think that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1  links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and reasoning provided.

Systematic review components linked to appraisal with AMSTAR-2 and ROBIS a

Review component | AMSTAR-2 item b | ROBIS item | Expectations | Reasoning
Methods for study selection | #5 | #2.5 | All three components must be done in duplicate, and methods fully described. | Helps to mitigate CoI and bias; also may improve accuracy.
Methods for data extraction | #6 | #3.1 | All three components must be done in duplicate, and methods fully described. | Helps to mitigate CoI and bias; also may improve accuracy.
Methods for RoB assessment | NA | #3.5 | All three components must be done in duplicate, and methods fully described. | Helps to mitigate CoI and bias; also may improve accuracy.
Study description | #8 | #3.2 | Research design features, components of research question (eg, PICO), setting, funding sources. | Allows readers to understand the individual studies in detail.
Sources of funding | #10 | NA | Identified for all included studies. | Can reveal CoI or bias.
Publication bias | #15* | #4.5 | Explored, diagrammed, and discussed. | Publication and other selective reporting biases are major threats to the validity of systematic reviews.
Author CoI | #16 | NA | Disclosed, with management strategies described. | If CoI is identified, management strategies must be described to ensure confidence in the review.

CoI conflict of interest, MA meta-analysis, NA not addressed, PICO participant, intervention, comparison, outcome, PRISMA-P Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols, RoB risk of bias

a Components shown in bold are chosen for elaboration in Part 4 for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors; and/or 2) the component is evaluated by standards of an AMSTAR-2 “critical” domain

b Critical domains of AMSTAR-2 are indicated by *

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: if standards for any one of these critical domains are not met, the review is rated as having “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved for developing different types of systematic reviews; however, while inclusion of the suggested elements makes a review compliant with a particular review’s methods, it does not necessarily make a research question appropriate. Table 4.2  lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop runaway scope allowing them to stray from predefined choices relating to key comparisons and outcomes.

Research question development

Acronym | Meaning
FINER a | feasible, interesting, novel, ethical, and relevant
SMART b | specific, measurable, attainable, relevant, timely
TOPICS + M c | time, outcomes, population, intervention, context, study design, plus (effect) moderators

a Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, editors. Designing clinical research: an epidemiological approach; 4th edn. Lippincott Williams & Wilkins; 2007. p. 14–22

b Doran, GT. There’s a S.M.A.R.T. way to write management’s goals and objectives. Manage Rev. 1981;70:35-6.

c Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

An association between published prospective protocols and attainment of AMSTAR standards in systematic reviews has been reported [ 134 ]. However, completeness of reporting does not seem to be different in reviews with a protocol compared to those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS is used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. The most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), for example, only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Options for protocol registration of evidence syntheses

 BMJ Open
 BioMed Central
 JMIR Research Protocols
 World Journal of Meta-analysis
 Cochrane
 JBI
 PROSPERO

 Research Registry - Registry of Systematic Reviews/Meta-Analyses

 International Platform of Registered Systematic Review and Meta-analysis Protocols (INPLASY)
 Center for Open Science
 Protocols.io
 Figshare
 Open Science Framework
 Zenodo

a Authors are advised to contact their target journal regarding submission of systematic review protocols

b Registration is restricted to approved review projects

c The JBI registry lists review projects currently underway by JBI-affiliated entities. These records include a review’s title, primary author, research question, and PICO elements. JBI recommends that authors register eligible protocols with PROSPERO

d See Pieper and Rombey [ 137 ] for detailed characteristics of these five registries

e See Pieper and Rombey [ 137 ] for other systematic review data repository options

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions while non-randomized designs often are expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and more unbiased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses nor provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. The gray literature and a search of trials may also reveal important details about topics that would otherwise be missed [ 149 – 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in gray literature and trials [ 41 , 151 – 153 ].

Authors should make every attempt to complete their review within one year as that is the likely viable life of a search. If that is not possible, the search should be updated close to the time of completion [ 154 ]. Some research topics may warrant less of a delay; for example, in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of their RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB-2 [ 157 ] tool for RCTs and ROBINS-I [ 158 ] for NRSI for RoB assessment meet the AMSTAR-2 standards. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.
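The comparison described above, re-running a pooled analysis with and without the studies at high RoB, can be sketched as follows. This is a minimal illustration in Python, not a procedure prescribed by AMSTAR-2 or ROBIS: the study records, the RoB labels, and the helper name `pooled_estimate` are hypothetical, and a simple inverse-variance weighted average stands in for whatever model a review's protocol actually specifies.

```python
def pooled_estimate(studies):
    """Inverse-variance weighted average of the study effect estimates."""
    weights = [1.0 / s["se"] ** 2 for s in studies]
    weighted_sum = sum(w * s["effect"] for w, s in zip(weights, studies))
    return weighted_sum / sum(weights)


# Hypothetical studies: effect estimate, standard error, and overall RoB judgment.
studies = [
    {"id": "A", "effect": -0.40, "se": 0.10, "rob": "high"},
    {"id": "B", "effect": -0.10, "se": 0.10, "rob": "low"},
    {"id": "C", "effect": -0.20, "se": 0.20, "rob": "some concerns"},
]

# Primary analysis: every included study.
all_studies = pooled_estimate(studies)

# Sensitivity analysis: drop the studies judged to be at high RoB.
without_high_rob = pooled_estimate([s for s in studies if s["rob"] != "high"])

# How far the pooled estimate moves when high-RoB studies are removed.
shift = without_high_rob - all_studies

print(f"all studies: {all_studies:.3f}")
print(f"excluding high RoB: {without_high_rob:.3f}")
print(f"shift: {shift:.3f}")
```

With only three made-up studies the shift is large, which is exactly the sparse-data situation in which over-interpretation should be avoided.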

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis, and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].

Common methods for quantitative synthesis

Approach | Method | Results | Presentation
Meta-analysis: aggregate data; individual participant data c | Weighted average of effect estimates; pairwise comparisons of effect estimates, CI | Overall effect estimate, CI, P value; evaluation of heterogeneity | Forest plot b with summary statistic for average effect estimate
Network meta-analysis | Variable d | The interventions, which are compared directly or indirectly | Network diagram or graph, tabular presentations
Network meta-analysis | Variable d | Comparisons of relative effects between any pair of interventions | Effect estimates for intervention pairings
Network meta-analysis | Variable d | Summary relative effects for pair-wise comparisons with evaluations of inconsistency and heterogeneity | Forest plot, other methods
Network meta-analysis | Variable d | Treatment rankings (ie, probability that an intervention is among the best options) | Rankogram plot
Synthesis without meta-analysis e | Summarizing effect estimates from separate studies (without combination that would provide an average effect estimate) | Range and distribution of observed effects such as median, interquartile range, range | Box-and-whisker plot, bubble plot; forest plot (without summary effect estimate)
Synthesis without meta-analysis e | Combining P values | Combined P value, number of studies | Albatross plot (study sample size against P values per outcome)
Synthesis without meta-analysis e | Vote counting by direction of effect (eg, favors intervention over the comparator) | Proportion of studies with an effect in the direction of interest, CI, P value | Harvest plot, effect direction plot

CI confidence interval (or credible interval, if analysis is done in Bayesian framework)

a See text for descriptions of the types of data combined in each of these approaches

b See Additional File 4  for guidance on the structure and presentation of forest plots

c General approach is similar to aggregate data meta-analysis but there are substantial differences relating to data collection and checking and analysis [ 162 ]. This approach to syntheses is applicable to intervention, diagnostic, and prognostic systematic reviews [ 163 ]

d Examples include meta-regression, hierarchical and multivariate approaches [ 164 ]

e In-depth guidance and illustrations of these methods are provided in Chapter 12 of the Cochrane Handbook [ 160 ]

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. If considered carefully, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity and estimates of effects. We refer to standard references for a thorough introduction and formal training [ 165 – 167 ].
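To make the idea of a weighted average of effect estimates concrete, the sketch below pools hypothetical study results with inverse-variance weights (a fixed-effect model) and reports Cochran's Q and I² as simple heterogeneity statistics. The function name `pool_fixed_effect` and the input values are assumptions made for illustration; the choice of model (for example, fixed-effect versus random-effects) belongs in the review protocol and is not settled by this example.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of study effect estimates."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need estimates and standard errors from two or more studies")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci95 = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

    # Cochran's Q and I^2 describe how much the studies disagree.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {"pooled": pooled, "se": pooled_se, "ci95": ci95, "q": q, "i_squared": i_squared}


# Hypothetical effect estimates (eg, mean differences) and their standard errors.
result = pool_fixed_effect([-0.30, -0.10, -0.25], [0.10, 0.20, 0.10])
print(result)
```

The two more precise studies (standard error 0.10) each carry four times the weight of the less precise one, which is why the pooled estimate sits close to their results.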

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard meta-analysis combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4 for details).

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].
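The notion of an indirect comparison can be illustrated with the simplest case: two interventions, A and C, that have each been compared with a common comparator B but never with each other. The sketch below computes an adjusted indirect estimate of A versus C from the two direct estimates. It is far from a full NMA, it assumes the effects are on an additive scale (eg, log odds ratios or mean differences) and that the two sets of trials are similar enough to be compared, and the function name and input values are hypothetical.

```python
import math


def indirect_comparison(effect_ab, se_ab, effect_cb, se_cb):
    """Adjusted indirect estimate of A vs C through a common comparator B.

    effect_ab: direct estimate of A vs B; effect_cb: direct estimate of C vs B.
    """
    # The shared comparator cancels out when the two direct effects are subtracted.
    effect_ac = effect_ab - effect_cb
    # Variances add, so the indirect estimate is less precise than either direct one.
    se_ac = math.sqrt(se_ab ** 2 + se_cb ** 2)
    ci95 = (effect_ac - 1.96 * se_ac, effect_ac + 1.96 * se_ac)
    return effect_ac, se_ac, ci95


# Hypothetical log odds ratios versus the common comparator B.
effect_ac, se_ac, ci95 = indirect_comparison(-0.50, 0.30, -0.20, 0.40)
print(f"A vs C: {effect_ac:.2f} (SE {se_ac:.2f}), 95% CI {ci95[0]:.2f} to {ci95[1]:.2f}")
```

The wider standard error of the indirect estimate is one reason these methods demand careful interpretation.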

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that informally use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentations of the data in any tables and plots. In comparison to narrative descriptions of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].
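As one illustration of a structured summary that does not pool results, the sketch below describes the range and distribution of observed effects (median, interquartile range, range) across studies. The effect values and the function name `summarize_effects` are hypothetical, and the "inclusive" quartile definition is only one of several conventions; whichever is used should be stated in the review.

```python
import statistics


def summarize_effects(effects):
    """Describe the distribution of study effect estimates without pooling them."""
    q1, _median, q3 = statistics.quantiles(effects, n=4, method="inclusive")
    return {
        "n_studies": len(effects),
        "median": statistics.median(effects),
        "iqr": (q1, q3),
        "range": (min(effects), max(effects)),
    }


# Hypothetical standardized effect estimates from five studies.
summary = summarize_effects([0.10, 0.25, 0.30, 0.45, 0.80])
print(summary)
```

Such a summary says nothing about the precision of the individual studies, which is one of the limitations these methods have for interpreting effectiveness.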

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
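A vote count by direction of effect can be reported more formally than "7 out of 10 studies favored the intervention." The sketch below, using only the Python standard library, reports the proportion of studies favoring the intervention, a Wilson 95% confidence interval for that proportion, and a two-sided exact binomial (sign) test against a null proportion of 0.5. These particular statistical choices, the function name, and the input data are assumptions for illustration; studies whose direction of effect cannot be determined are assumed to have been set aside beforehand.

```python
import math


def vote_count_by_direction(directions):
    """directions: +1 if a study favors the intervention, -1 if it favors the comparator."""
    n = len(directions)
    k = sum(1 for d in directions if d > 0)
    proportion = k / n

    # Wilson 95% confidence interval for the proportion.
    z = 1.96
    denom = 1 + z ** 2 / n
    centre = (proportion + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(proportion * (1 - proportion) / n + z ** 2 / (4 * n ** 2)) / denom

    # Two-sided exact binomial (sign) test against a null proportion of 0.5.
    tail = min(k, n - k)
    p_value = min(1.0, 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n)

    return {
        "favoring": k,
        "total": n,
        "proportion": proportion,
        "ci95": (centre - half_width, centre + half_width),
        "p_value": p_value,
    }


# Hypothetical example: 7 of 10 studies favor the intervention.
votes = vote_count_by_direction([1, 1, 1, 1, 1, 1, 1, -1, -1, -1])
print(votes)
```

Note that the direction of each study's effect is counted regardless of its statistical significance or magnitude, in keeping with the distinction drawn above.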

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods [ 160 , 161 ], and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Report of an overall certainty of evidence assessment in a systematic review is an important new reporting standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability to different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 – 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine the approach, first introduced GRADE in 2004 [ 188 ]. Currently more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, use of “certainty” of a body of evidence is preferred over the term “quality” [ 191 ]. Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).

GRADE criteria for rating certainty of evidence

Criteria for rating down: risk of bias; imprecision; inconsistency; indirectness; publication bias
Criteria for rating up: large magnitude of effect; dose–response gradient; all residual confounding would decrease magnitude of effect (in situations with an effect)

a Applies to randomized studies

b Applies to non-randomized studies

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE certainty ratings and their interpretation symbols a

⊕⊕⊕⊕ High: We are very confident that the true effect lies close to that of the estimate of the effect
⊕⊕⊕ Moderate: We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
⊕⊕ Low: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect
⊕ Very low: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect

a From the GRADE Handbook [ 192 ]

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.
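The bookkeeping behind a GRADE rating can be sketched in a few lines of code. This is a hedged illustration, not the GRADE Working Group's algorithm: it assumes the conventional starting points (high for RCTs, low for NRSI), which the text above does not spell out, and it treats each judgment as a whole number of levels even though the developers stress that the criteria and ratings form a continuum. The names `grade_certainty`, `rate_down`, and `rate_up` are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]

# Criteria named in Table 5.1.
RATE_DOWN = {"risk_of_bias", "imprecision", "inconsistency",
             "indirectness", "publication_bias"}
RATE_UP = {"large_effect", "dose_response", "residual_confounding"}

# Assumed starting points (index into LEVELS) for this sketch.
START = {"RCT": 3, "NRSI": 1}


def grade_certainty(study_type, rate_down=None, rate_up=None):
    """Return a certainty rating for ONE outcome.

    rate_down / rate_up map a criterion name to the number of levels
    (usually 1 or 2) the review authors judged appropriate.
    """
    rate_down = rate_down or {}
    rate_up = rate_up or {}
    unknown = (set(rate_down) - RATE_DOWN) | (set(rate_up) - RATE_UP)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    score = START[study_type] - sum(rate_down.values()) + sum(rate_up.values())
    # Clamp to the four available ratings.
    return LEVELS[max(0, min(len(LEVELS) - 1, score))]


if __name__ == "__main__":
    # Hypothetical outcome from RCTs with serious risk of bias and imprecision.
    print(grade_certainty("RCT", {"risk_of_bias": 1, "imprecision": 1}))
```

The arithmetic is trivial by design. The substantive work lies in the judgments that produce each entry, which is why review authors are expected to describe the rationale for their final ratings explicitly.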

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE working group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Criteria for using GRADE in a systematic review a

1. The certainty in the evidence (also known as quality of evidence or confidence in the estimates) should be defined consistently with the definitions used by the GRADE Working Group.
2. Explicit consideration should be given to each of the GRADE domains for assessing the certainty in the evidence (although different terminology may be used).
3. The overall certainty in the evidence should be assessed for each important outcome using four or three categories (such as high, moderate, low and/or very low) and definitions for each category that are consistent with the definitions used by the GRADE Working Group.
4. Evidence summaries … should be used as the basis for judgments about the certainty in the evidence.

a Adapted from the GRADE working group [ 206 ]; this list does not contain the additional criteria that apply to the development of a clinical practice guideline

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included in Table 6.3 , which is introduced in the next section.

Concise Guide to best practices for evidence syntheses, version 1.0 a

Cochrane, JBI | Cochrane, JBI | Cochrane | Cochrane, JBI | JBI | JBI | JBI | Cochrane, JBI | JBI
Protocol: PRISMA-P | PRISMA-P | PRISMA-P | PRISMA-P | PRISMA-P | PRISMA-P | PRISMA-P | PRISMA-P | PRISMA-P
Systematic review: PRISMA 2020 | PRISMA-DTA | PRISMA 2020 | eMERGe, ENTREQ | PRISMA 2020 | PRISMA 2020 | PRISMA 2020 | PRIOR | PRISMA-ScR
Synthesis without MA: SWiM | PRISMA-DTA | SWiM | eMERGe, ENTREQ | SWiM | SWiM | SWiM | PRIOR
For RCTs: Cochrane RoB2; for NRSI: ROBINS-I; other primary research | QUADAS-2 | Factor review: QUIPS; model review: PROBAST | CASP qualitative checklist; JBI Critical Appraisal Checklist | JBI checklist for studies reporting prevalence data | For NRSI: ROBINS-I; other primary research | COSMIN RoB Checklist | AMSTAR-2 or ROBIS | Not required
GRADE | GRADE adaptation | GRADE adaptation | CERQual; ConQual | GRADE adaptation | Risk factors | GRADE adaptation | GRADE (for intervention reviews); risk factors | Not applicable

AMSTAR A MeaSurement Tool to Assess Systematic Reviews, CASP Critical Appraisal Skills Programme, CERQual Confidence in the Evidence from Reviews of Qualitative research, ConQual Establishing Confidence in the output of Qualitative research synthesis, COSMIN COnsensus-based Standards for the selection of health Measurement Instruments, DTA diagnostic test accuracy, eMERGe meta-ethnography reporting guidance, ENTREQ enhancing transparency in reporting the synthesis of qualitative research, GRADE Grading of Recommendations Assessment, Development and Evaluation, MA meta-analysis, NRSI non-randomized studies of interventions, P protocol, PRIOR Preferred Reporting Items for Overviews of Reviews, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, PROBAST Prediction model Risk Of Bias ASsessment Tool, QUADAS quality assessment of studies of diagnostic accuracy included in systematic reviews, QUIPS Quality In Prognosis Studies, RCT randomized controlled trial, RoB risk of bias, ROBINS-I Risk Of Bias In Non-randomised Studies of Interventions, ROBIS Risk of Bias in Systematic Reviews, ScR scoping review, SWiM systematic review without meta-analysis

a Superscript numbers represent citations provided in the main reference list. Additional File 6 lists links to available online resources for the methods and tools included in the Concise Guide

b The MECIR manual [ 30 ] provides Cochrane’s specific standards for both reporting and conduct of intervention systematic reviews and protocols

c Editorial and peer reviewers can evaluate completeness of reporting in submitted manuscripts using these tools. Authors may be required to submit a self-reported checklist for the applicable tools

d The decision flowchart described by Flemming and colleagues [ 223 ] is recommended for guidance on how to choose the best approach to reporting for qualitative reviews

e SWiM was developed for intervention studies reporting quantitative data. However, if there is not a more directly relevant reporting guideline, SWiM may prompt reviewers to consider the important details to report. (Personal Communication via email, Mhairi Campbell, 14 Dec 2022)

f JBI recommends their own tools for the critical appraisal of various quantitative primary study designs included in systematic reviews of intervention effectiveness, prevalence and incidence, and etiology and risk as well as for the critical appraisal of systematic reviews included in umbrella reviews. However, except for the JBI Checklists for studies reporting prevalence data and qualitative research, the development, validity, and reliability of these tools are not well documented

g Studies that are not RCTs or NRSI require tools developed specifically to evaluate their design features. Examples include single case experimental design [ 155 , 156 ] and case reports and series [ 82 ]

h The evaluation of methodological quality of studies included in a synthesis of qualitative research is debatable [ 224 ]. Authors may select a tool appropriate for the type of qualitative synthesis methodology employed. The CASP Qualitative Checklist [ 218 ] is an example of a published, commonly used tool that focuses on assessment of the methodological strengths and limitations of qualitative studies. The JBI Critical Appraisal Checklist for Qualitative Research [ 219 ] is recommended for reviews using a meta-aggregative approach

i Consider including risk of bias assessment of included studies if this information is relevant to the research question; however, scoping reviews do not include an assessment of the overall certainty of a body of evidence

j Guidance available from the GRADE working group [ 225 , 226 ]; also recommend consultation with the Cochrane diagnostic methods group

k Guidance available from the GRADE working group [ 227 ]; also recommend consultation with Cochrane prognostic methods group

l Used for syntheses in reviews with a meta-aggregative approach [ 224 ]

m Chapter 5 in the JBI Manual offers guidance on how to adapt GRADE to prevalence and incidence reviews [ 69 ]

n Janiaud and colleagues suggest criteria for evaluating evidence certainty for meta-analyses of non-randomized studies evaluating risk factors [ 228 ]

o The COSMIN user manual provides details on how to apply GRADE in systematic reviews of measurement properties [ 229 ]

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for supporting the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors who may disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. A vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted is thus supported. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

Terms relevant to the reporting of health care–related evidence syntheses a

Systematic review: A review that uses explicit, systematic methods to collate and synthesize findings of studies that address a clearly formulated question.
Statistical synthesis: The combination of quantitative results of two or more studies. This encompasses meta-analysis of effect estimates and other methods, such as combining values, calculating the range and distribution of observed effects, and vote counting based on the direction of effect.
Meta-analysis of effect estimates: A statistical technique used to synthesize results when study effect estimates and their variances are available, yielding a quantitative summary of results.
Outcome: An event or measurement collected for participants in a study (such as quality of life, mortality).
Result: The combination of a point estimate (such as a mean difference, risk ratio or proportion) and a measure of its precision (such as a confidence/credible interval) for a particular outcome.
Report: A document (paper or electronic) supplying information about a particular study. It could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report, or any other document providing relevant information.
Record: The title or abstract (or both) of a report indexed in a database or website (such as a title or abstract for an article indexed in Medline). Records that refer to the same report (such as the same journal article) are “duplicates”; however, records that refer to reports that are merely similar (such as a similar abstract submitted to two different conferences) should be considered unique.
Study: An investigation, such as a clinical trial, that includes a defined group of participants and one or more interventions and outcomes. A “study” might have multiple reports. For example, reports could include the protocol, statistical analysis plan, baseline characteristics, results for the primary outcome, results for harms, results for secondary outcomes, and results for additional mediator and moderator analyses.

a Reproduced from Page and colleagues [ 93 ]
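To make the definitions of a meta-analysis of effect estimates and of a result concrete, the sketch below pools study effect estimates using inverse-variance weights under a fixed-effect (common-effect) model. This is one common technique chosen for illustration; the table does not prescribe a particular model. The function name and the example numbers are hypothetical, and the 95% interval assumes approximate normality.

```python
from math import sqrt


def pool_fixed_effect(estimates, variances):
    """Inverse-variance pooled estimate with a 95% confidence interval.

    estimates: study effect estimates on an additive scale
               (eg, mean differences or log risk ratios).
    variances: the corresponding within-study variances.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]  # more precise studies weigh more
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    se = sqrt(1.0 / total)  # standard error of the pooled estimate
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)


if __name__ == "__main__":
    # Hypothetical mean differences from three studies.
    pooled, ci = pool_fixed_effect([0.20, 0.35, 0.10], [0.04, 0.09, 0.02])
    print(f"pooled = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The returned pair, a point estimate with its interval, is a "result" in the sense defined in the table. A random-effects model would add a between-study variance component to each weight.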

Terminology suggestions for health care–related evidence syntheses

Preferred | Potentially problematic
Evidence synthesis with meta-analysis; Systematic review with meta-analysis | Meta-analysis
Overview or umbrella review | Systematic review of systematic reviews; Review of reviews; Meta-review
Randomized | Experimental
Non-randomized | Observational
Single case experimental design | Single-subject research; N-of-1 design
Case report or case series | Descriptive study
Methodological quality | Quality
Certainty of evidence | Quality of evidence; Grade of evidence; Level of evidence; Strength of evidence
Qualitative systematic review | Qualitative synthesis
Synthesis of qualitative data a | Qualitative synthesis
Synthesis without meta-analysis | Narrative synthesis b, narrative summary; Qualitative synthesis; Descriptive synthesis, descriptive summary

a For example, meta-aggregation, meta-ethnography, critical interpretative synthesis, realist synthesis

b This term may best apply to the synthesis in a mixed methods systematic review in which data from different types of evidence (eg, qualitative, quantitative, economic) are summarized [ 64 ]

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the evidence itself that is summarized is sparse, weak and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence, or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [ 2 ], the shift in paradigm to living systematic review and NMA platforms [ 232 , 233 ] and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined, less duplicative, and, more importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously conducted and reported.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Acknowledgements

Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

Authors’ contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Declarations

The authors declare no competing interests.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews , and the Journal of Pediatric Rehabilitation Medicine .

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A systematic literature review on determinants and outcomes of ESG performance in family firms

  • Open access
  • Published: 09 September 2024


  • Ramona Waldau   ORCID: orcid.org/0009-0007-4410-2892 1  


As the predominant business type, family firms hold a unique position to influence the global sector’s ESG footprint. However, research on their ESG activities and performance is complex, multi-layered, and currently lacks integration. This review aims to bridge these research disciplines by providing an integrative overview of the current state of family firm ESG literature. By systematically reviewing 127 peer-reviewed studies published between 2000 and mid-2024, I examine the determinants and outcomes of family firm ESG performance, synthesize existing knowledge, and suggest future research directions. The findings reveal the nuanced and at times ambiguous role of family involvement across different ESG dimensions. Additionally, methodological challenges have contributed to inconclusive results in certain areas. This literature review identifies several promising new directions for future research at the intersection of family firm and ESG research to enhance our understanding and foster a more integrated and comprehensive approach to studying ESG in family firms.


1 Introduction

The influence of family involvement in relation to environmental, social, and governance (ESG) topics has been the focus of numerous recent studies. Family firms (FFs) have a great impact on the ecological and social footprint of the global private business sector due to their ubiquity and economic contribution (La Porta et al. 1999 ; De Massis et al. 2018 ). Therefore, it is crucial for researchers, practitioners, and policymakers to ask the following question: “What factors determine FFs’ ESG activities and performance, and what are the outcomes of FFs’ ESG activities and performance?” However, the relationship between FFs and their ESG activities and performance has proven to be complex and multi-layered. Some researchers find that certain FF subsystems, particularly ownership, positively impact specific aspects of ESG activities and performance, while others report negative effects (e.g., Rees and Rodionova 2015 ; Bammens and Hünermund 2020 ). A key concept explaining the effect of family ownership on ESG activities and performance is socioemotional wealth (SEW), which is considered to be the most important characteristic distinguishing FFs from all other organizational forms (Berrone et al. 2012 ). SEW is defined as the non-financial aspects that fulfill the emotional needs of the owning family (Gómez-Mejía et al. 2007 ), such as identification of family members with the firm or renewal of family bonds through firm succession (Berrone et al. 2012 ). Given the complexity of both FF subsystems and ESG issues, along with the increasing number of studies that bridge these areas from various research fields, including FF, corporate social responsibility (CSR), sustainability, and management, an integrative review of the current state of research and a future research agenda are necessary and timely.

Few literature reviews recognize the importance of linking FF and ESG research and highlight the role of FFs in the context of individual ESG issues (e.g., Feliu and Botero 2016 ; Ferreira et al. 2021 ; Mariani et al. 2021 ). This literature review differs from previous ones that focused only on parts of the ESG concept. Instead, I paint a broader picture by including studies analyzing both ESG dimensions and FF subsystems and by suggesting future research directions based on both streams. Thereby, determinants of ESG activities and performance in FFs refer to the factors influencing whether, how frequently, or how effectively FFs engage in ESG topics, as defined below. Outcomes of ESG activities and performance stand for the non-financial or financial results and other effects of FFs’ actions regarding ESG topics. The purpose of this literature review is to collect, structure, and synthesize existing knowledge on the determinants and outcomes of ESG activities and performance in FFs, and to propose avenues for future research. A deeper understanding of these relationships is crucial, especially as ESG considerations gain importance (Kiesel and Lücke 2019 ) and FFs play a significant role in global economies (De Massis et al. 2018 ). This review aims to uncover trends, highlight research gaps, and address inconsistencies to guide future studies towards areas needing further exploration. The synthesis enhances our understanding of how ESG activities and performance of FFs are uniquely shaped, ensuring that future research is both comprehensive and focused on these critical yet under-researched aspects of significant economic relevance.

This article employs a systematic review process (Webster and Watson 2002 ) to synthesize the findings of 127 peer-reviewed studies published between 2000 and mid-2024. These studies represent the current body of knowledge from the intersection of ESG and FF with high levels of relevance and scientific rigor, as only publications from journals ranked 3 and above in the Academic Journal Guide by the Chartered Association of Business Schools (CABS 2021 ) are included in this review. The latter differentiates the foci of the reviewed studies in terms of FF subsystems (Pieper and Klein 2007 ) and ESG dimensions (Li et al. 2021 ). The findings indicate that researchers have primarily focused on identifying determinants of ESG performance differences between FFs and non-FFs by measuring the level of family ownership. However, less attention has been paid to the heterogeneity among FFs, the actual measurement of FF goals, and ESG performance outcomes. These gaps present numerous opportunities for future research.

The study contributes to extant research in two ways. First, by addressing the fragmented body of studies, this review synthesizes the current state of knowledge, creating a consolidated framework that integrates diverse findings and perspectives. This comprehensive approach helps identify areas where data may be conflicting or incomplete, guiding future investigations. The added value of this synthesis lies in its ability to provide a coherent narrative and actionable insights, which are crucial for advancing the field and informing both theory and practice. Second, this review aims to bridge the gap between FF and ESG research, specifically targeting scholars in both fields. My goal is to encourage these researchers to transfer knowledge between both streams, considering the unique aspects of ESG performance in FFs and its implications for other types of organizations. This is particularly relevant for ESG scholars interested in non-economic behaviors and FF scholars focused on ESG issues. By fostering interdisciplinary collaboration, this review seeks to enhance the understanding and application of ESG principles in FFs.

2 ESG dimensions across FF subsystems

ESG dimensions. Environmental, social, and governance dimensions, nowadays known by their acronym ESG, were introduced as significant extensions to investment criteria in global financial markets (Christensen et al. 2022 ) and have since evolved into a central framework through which firms integrate ESG concerns into their operations and activities (Gillan et al. 2021 ). Increasing interest from investors, customers, and policymakers, as well as ESG’s ubiquity and comprehensiveness, has led to a growing number of researchers investigating the determinants and outcomes of ESG performance in various contexts (e.g., Friede et al. 2015 ; Widyawati 2020 ; Gillan et al. 2021 ). By employing the ESG concept in this review, the broadest definition of corporate responsibility is applied to ensure a holistic overview (Hubbard 2009 ; Capelle-Blancard and Petit 2017 ). I draw on the ESG framework by Li et al. ( 2021 ), which is based on the glossary and definitions by the European Banking Authority. As ESG categories continuously evolve and clarify, their taxonomy provides a well-founded framework for the definition of each ESG dimension. Similar definitions of each dimension have also been used by other researchers, such as Berg et al. ( 2022 ). Environmental activities and performance refer to a firm’s impact on the natural environment, including biodiversity, energy, water, waste management, and greenhouse gas emissions. Social activities and performance are linked to a firm’s impact on stakeholders, such as product quality and safety, community relations, and employee health and safety. Governance activities and performance involve evaluating a firm’s strengths and concerns regarding aspects like business ethics, board diversity, corruption, taxes, reporting, and remuneration.

In this literature review, I examine ESG activities and performance as quantitatively measured and evaluated by researchers, using predefined variables to compare firms. Common metrics include the KLD ratings, which have been employed to aggregate strengths and weaknesses in each ESG aspect to calculate an overall ESG performance score (e.g., Wagner 2010; Cui et al. 2018; Lamb and Butler 2018). However, the lack of standardized metrics and methodologies across rating agencies poses challenges for consistent assessment and comparability between firms (Berg et al. 2022). Thus, I propose that future research explore additional ESG measures beyond ESG and CSR ratings, as elaborated in the discussion section.

FF subsystems. FFs exhibit characteristics that distinguish them from other forms of organization, such as their SEW (Berrone et al. 2012). As a result, FF research has become a distinct and rapidly growing field within management research (e.g., Gedajlovic et al. 2012). FFs are defined by the family’s control through ownership and/or management, guiding the company’s vision and strategy across generations (Chua et al. 1999). To address the growing breadth of topics investigated in FF research (Zahra and Sharma 2004), I build on the “bulleye” framework by Pieper and Klein (2007). This model accounts for the idiosyncrasies and heterogeneity of FFs by differentiating between six subsystems: individual, family, ownership, management, business, and environment, which represent the multiple levels of analysis relevant for FFs. A firm qualifies as an FF when the family and business subsystems interact with each other (constituting subsystems), while the ownership and management subsystems establish the connection between the FF as a business entity and the family itself (connecting subsystems). This framework has also been used to structure and synthesize findings in other literature reviews on FF research (e.g., Labaki et al. 2019; Williams et al. 2019).

All following definitions and examples are based on Pieper and Klein (2007). First, the individual subsystem is the basic level of analysis, representing the human element of FFs. Relevant dimensions include characteristics, intentions, and beliefs of individual family or non-family managers and members related to the FF (Pieper and Klein 2007). Second, the family subsystem refers to the nuclear and extended group of related persons (Rothausen 1999; Pieper and Klein 2007). Exemplary dimensions are the generations involved and their lifecycle stage, the values and goals of the family, and feelings towards each other and the FF, such as love, trust, or commitment. Third, the ownership subsystem stands for the level of firm ownership in the form of voting rights or company capital. Relevant dimensions for this subsystem are, for instance, the number of owners, the percentage of equity held by a family, and the stage of ownership (Pieper and Klein 2007). Fourth, the management subsystem comprises the board and top management team, which may include varying shares of family and non-family managers. Relevant dimensions include the composition of the board or management team, leadership style, and characteristics (Pieper and Klein 2007). Fifth, the business subsystem refers to the FF as an organization with inputs and outputs (Thompson 1967; Katz and Kahn 2015), including financial and non-financial returns, market position, strategic approaches, resources, and capabilities (Pieper and Klein 2007). Finally, the environment subsystem encompasses the other subsystems, as FFs operate in distinctive surroundings with different cultural, economic, and physical characteristics, which influence their attitudes and behaviors. Relevant dimensions are FFs’ location and country, cultural system, and stakeholders such as customers, government, or competitors (Kast and Rosenzweig 1992; Pieper and Klein 2007).

These dimensions highlight the complexity and scope of current FF ESG research and serve as structural criteria for all studies reviewed. To the best of my knowledge, no other literature review has analyzed the determinants and outcomes of ESG performance in FFs. Instead, researchers have focused on narrower research fields, such as CSR (Mariani et al. 2021 ), sustainability (Ferreira et al. 2021 ) or philanthropy (Feliu and Botero 2016 ). This literature review makes an important contribution by structuring and synthesizing the complex and heterogeneous multitude of studies from various research fields dealing with the ESG activities and performance of FFs.

3 Methodology

3.1 Search and analysis process

FF researchers have identified numerous determinants and outcomes of various dimensions of ESG-related topics. As shown in Fig.  1 , a systematic search, review, and analysis process is applied to ensure transparency and comprehensiveness (Webster and Watson 2002 ).

Fig. 1 Overview of the search and analysis process

Step 1: Search and selection. First, I followed a four-stage search and selection process to ensure that all relevant literature is included:

Broad search. The first step comprised a broad search in popular and comprehensive databases, namely Ebsco Business Source Premier, Elsevier ScienceDirect, Web of Science, and ABI Inform (e.g., Calabrò et al. 2019). Based on the definitions in the previous chapter, various keywords were used to search titles and abstracts to gain a comprehensive overview of the research topic: (“esg” OR “csr” OR “sustainab*” OR “corporate social resp*”) AND (“family control*” OR “family firm*” OR “family business*” OR “family ownership*” OR “family-owned*”). To ensure adequate levels of scientific quality and novelty of studies, articles were required to be from peer-reviewed journals ranked 3 and above in the Academic Journal Guide (Paul and Criado 2020; CABS 2021; Davidsson and Gruenhagen 2021), written in English, and published from the year 2000 onwards. These are common inclusion criteria in recently published literature reviews (e.g., Waldkirch 2020; Debellis et al. 2021). My search yielded 1,323 articles, of which 898 remained after removing the duplicates that resulted from merging the results of different databases. Finally, 214 unique articles were admitted to the next step after eliminating all studies from journals either unranked or ranked 2 and below (CABS 2021).

Title and abstract analysis. The second step aimed to exclude non-relevant publications detected in the broad search. I reviewed the titles and abstracts of all articles, eliminating those that did not include FFs and any of the above-described ESG dimensions, as keywords are used with varying meanings. For example, Ahmad et al. (2021) use the term “sustainability” as a synonym for longevity, referring to an FF’s ability to endure beyond the founder’s career, rather than in the context of ESG-related sustainability. This step resulted in a reduced sample of 104 studies.

Full text assessment. I retrieved and read the full texts of all remaining articles to assess their relevance for this review, as titles and abstracts leave open questions about the articles’ purpose, methodology, and findings. The inclusion criteria required that a publication include both perspectives on FFs and any ESG-related topic, but not be a literature review. For instance, studies covering SMEs without also analyzing (family) ownership structures were excluded (e.g., Russo and Perrini 2010). This step yielded 79 relevant studies.

Hand searching. The final step ensured the comprehensiveness of my findings. I checked the reference lists of the articles already identified for studies not yet captured. Additionally, I scanned other related literature reviews (Feliu and Botero 2016; Ferreira et al. 2021; Mariani et al. 2021) to ensure no relevant research was missing. I identified an additional 48 relevant studies, a share comparable to other systematic literature reviews (Calabrò et al. 2019). Thus, 127 papers were selected for the analysis process.
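The search string and the selection funnel described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used for the review: the function names, the translation of the `*` truncation wildcard into a regular expression, and the sample titles are hypothetical, while the stage counts are the ones reported in the text.

```python
import re

# Keyword groups from the search string; "*" is a truncation wildcard.
ESG_TERMS = ["esg", "csr", "sustainab*", "corporate social resp*"]
FF_TERMS = ["family control*", "family firm*", "family business*",
            "family ownership*", "family-owned*"]


def _to_regex(term):
    """Translate a database-style term into a regex (assumed wildcard semantics)."""
    body = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(rf"\b{body}{suffix}\b", re.IGNORECASE)


def matches_query(text):
    """True if the text has at least one ESG term AND at least one FF term."""
    has_esg = any(_to_regex(t).search(text) for t in ESG_TERMS)
    has_ff = any(_to_regex(t).search(text) for t in FF_TERMS)
    return has_esg and has_ff


# Records remaining after each stage, as reported in the text.
FUNNEL = [
    ("broad search", 1323),
    ("duplicates removed", 898),
    ("journal ranking filter", 214),
    ("title and abstract analysis", 104),
    ("full text assessment", 79),
]
HAND_SEARCH_ADDITIONS = 48


def excluded_per_stage():
    """Number of records dropped at each stage after the broad search."""
    return [(name, prev - cur)
            for (_, prev), (name, cur) in zip(FUNNEL, FUNNEL[1:])]


def final_sample():
    """Studies after full text assessment plus hand-searched additions."""
    return FUNNEL[-1][1] + HAND_SEARCH_ADDITIONS


if __name__ == "__main__":
    print(matches_query("CSR disclosure in family firms"))
    for stage, dropped in excluded_per_stage():
        print(f"{stage}: {dropped} records excluded")
    print("final sample:", final_sample())
```

Keeping the stage counts in one ordered list makes it easy to check that the reported numbers are consistent, for example that the 79 studies from the full text assessment and the 48 hand-searched studies add up to the final sample of 127.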

Step 2: Coding and analysis. Second, I followed a clear coding scheme and analysis process:

Coding. First, I openly coded all publications to identify patterns of repeated topics within this research stream. Subsequently, I searched for an established framework in FF literature that matched my coding insights. Finally, I developed the coding scheme presented in Tables 1 and 2 after refining my main and sub-categories (Wolfswinkel et al. 2013 ). This final coding scheme was applied for all articles, considering their respective methodology, sample, theoretical perspective, FF subsystems, dependent and independent variable(s) analyzed including key FF variable(s), ESG dimensions, definition of ESG variable(s), and overarching findings. Unless otherwise indicated by the authors, I assumed that firm samples included both private and listed firms. This coding result served as the basis for structuring and analyzing all papers in the following steps as well as the descriptive results in the next subchapter. Tables 1 and 2 show a summarized version of this coding result, organizing studies by FF subsystem, ESG dimensions, type (comparative vs. heterogeneity studies), and alphabetically.

Structuring. Based on this coding, I organized all studies in a matrix structure (Webster and Watson 2002) as shown in Table 1, providing an overview of the focus areas of all reviewed studies. This 6 × 4 matrix is based on the six FF subsystems and the four ESG dimensions as described in chapter 2. If publications included variables from multiple FF subsystems, such as ownership and management variables in the study by Cuadrado-Ballesteros et al. (2015), the FF subsystem that best aligned with the authors’ research question and analytical focus was selected. Table 1 also specifies whether the sample represents a comparative study (differences between non-FFs and FFs) or a heterogeneity study (differences among FFs), helping to clarify the studies’ purpose and methodology.
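As a rough illustration of this structuring step, the following Python sketch tallies coded studies into a 6 × 4 matrix. The record fields, the study labels, and the example records are hypothetical, and treating the fourth column as an overall ESG category next to the environmental, social, and governance dimensions is an assumption made for this sketch.

```python
from collections import Counter

SUBSYSTEMS = ["individual", "family", "ownership",
              "management", "business", "environment"]
# Assumption: the fourth column is an overall-ESG category next to E, S, and G.
DIMENSIONS = ["overall ESG", "environmental", "social", "governance"]


def build_matrix(coded_studies):
    """Count studies per (FF subsystem, ESG dimension) cell of the 6 x 4 matrix."""
    for study in coded_studies:
        if (study["subsystem"] not in SUBSYSTEMS
                or study["dimension"] not in DIMENSIONS):
            raise ValueError(f"unknown code in {study['id']}")
    counts = Counter((s["subsystem"], s["dimension"]) for s in coded_studies)
    # Counter returns 0 for cells without any study.
    return [[counts[(sub, dim)] for dim in DIMENSIONS] for sub in SUBSYSTEMS]


# Hypothetical coded records. Each study is assigned to exactly one cell,
# mirroring the rule that the best-fitting subsystem is selected.
EXAMPLE = [
    {"id": "study A", "subsystem": "ownership",
     "dimension": "environmental", "type": "comparative"},
    {"id": "study B", "subsystem": "ownership",
     "dimension": "overall ESG", "type": "comparative"},
    {"id": "study C", "subsystem": "family",
     "dimension": "social", "type": "heterogeneity"},
    {"id": "study D", "subsystem": "ownership",
     "dimension": "environmental", "type": "comparative"},
]

matrix = build_matrix(EXAMPLE)
for sub, row in zip(SUBSYSTEMS, matrix):
    print(f"{sub:<12}", row)
```

Because every study lands in exactly one cell, the cell counts sum to the number of reviewed studies, which gives a simple consistency check on the coding.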

Analysis. I analyzed the main findings of all publications within each matrix field and across matrix fields to understand the current state of research findings. Subsequently, I synthesized these insights in the following chapter and derived potential avenues for future research in the final chapter. Since it is not feasible to discuss all articles in detail, I concentrated the summary and synthesis on findings that link FF variables as determinants or outcomes to ESG actions and performance, excluding results that pertain to other relationships.

3.2 Descriptive results

This review analyzes 127 articles, 68 of which were published since 2020, highlighting the rapid expansion and growing importance of ESG-related research. This surge in recent publications aligns with insights from previous literature reviews on FF research (Waldkirch 2020 ; Fries et al. 2021 ). Given the fast-paced developments in this field, a comprehensive literature review that integrates the most recent findings is more necessary than ever.

A few outlets account for a substantial share of the published studies with the Journal of Business Ethics taking the lead (31 articles; 24%), followed by Business Strategy and the Environment (20 articles; 16%) and Family Business Review (11 articles; 9%), indicating that FF ESG research is mainly published in specialized journals for ethics, sustainability, or FF research.

The theoretical development of the studies’ hypotheses is based on a limited set of theories, with the most frequently used perspective being SEW (37 articles; 30%), followed by atheoretical articles (28 articles; 23%), and agency theory (24 articles; 20%). The study by Berrone et al. (2010) potentially contributed to this trend, linking FFs’ desire to protect their SEW to superior environmental performance and being one of the most cited articles in this review with more than 2,200 citations as of July 2024. However, the relatively large share of atheoretical articles suggests a potential lack of integration of theories and constructs from the FF domain into ESG research.

Most studies follow a quantitative approach (117 articles; 92%), with 79 articles using secondary data (62%). This approach is common due to the availability of ESG performance indicators. For instance, ESG ratings or databases, such as the MSCI ESG database, are frequently used to estimate firms’ ESG performance (e.g., Wagner 2010 ; Lamb and Butler 2018 ), while firms’ homepages and sustainability reports offer insights into ESG disclosure (e.g., Arena and Michelon 2018 ). Only 7 studies (6%) derived their insights from a qualitative approach and three were conceptual (2%). This dominance of quantitative research indicates a current focus on measurable activities rather than analyzing underlying motivations related to FFs’ ESG initiatives.

Given the specific nature of this line of research, it is also worthwhile to better understand the samples of the studies. The use of secondary data allows for analyzing time series data and large sample sizes. For instance, Dal Maso et al. (2020) were able to analyze 33,901 firm-years of 4,932 individual firms from 2002 to 2016 across 56 countries by leveraging external data sources. Geographically, most studies analyzed US-American businesses (36 articles; 28%), followed by studies with a multi-country approach (17 articles; 13%) and Chinese firms (16 articles; 13%). The large number of studies comparing FFs and non-FFs (88 articles; 69%) highlights the great interest in understanding how ownership structures relate to ESG activities. Researchers have focused more on investigating the determinants of ESG performance (106 articles; 83%) than the outcomes (21 articles; 17%).

The analysis of the literature reveals an imbalance in favor of studies comparing FFs with non-FFs rather than exploring FF heterogeneity. Research has primarily focused on the ownership and business subsystems, as well as overarching ESG and environmental performance, with a greater emphasis on identifying determinants than outcomes. Figure 2 visualizes the findings from all studies reviewed, indicating the frequency (n) of studies analyzing each combination of FF subsystem and ESG dimension. Additionally, the main FF topics investigated within each cell are noted. Harvey balls indicate the level of conclusiveness of research findings within that FF subsystem: the fuller the Harvey ball, the more conclusive the results have been. Based on the detailed synthesis below, potential areas for future research are highlighted and will be discussed in the following chapter.

Fig. 2 Framework for the current state and future research avenues in FF ESG research

4.1 Individual subsystem

Ten studies have examined the influence of individual managers or CEOs on FFs’ ESG activities and performance, highlighting how individual attitudes, religious beliefs, and character traits shape these activities. However, this area of research is still in its infancy compared to CEO research outside the ESG context (e.g., Kaplan et al. 2012). Research on the individual subsystem stands out for two reasons. First, it more frequently examines heterogeneity among FFs (n = 7) than it compares FFs to non-FFs (n = 3), acknowledging that family control is highly personalized and dependent on individual willingness (Fassin et al. 2011). Second, nearly all studies use a broad range of theoretical perspectives, such as SEW (Bhatnagar et al. 2020) or upper echelons theory (Luo et al. 2024), suggesting a higher integration between FF and ESG research. This may explain the consistent results in this subsystem and inspire other researchers to base their studies on theoretical foundations applicable to other subsystems.

Five studies focused on the effect of religious beliefs or individual values on FFs’ ESG activities, often using qualitative research approaches due to the strongly human nature of this topic. These studies argue that religious beliefs and community-oriented values drive ESG activities for non-financial reasons, promoting a sense of responsibility and selflessness in favor of the needs of FFs’ environment and stakeholders (Fassin et al. 2011; Bhatnagar et al. 2020; Vu et al. 2024). Quantitative studies support these findings, identifying religiosity as an antecedent of attitudes that promote FFs’ environmental intention (Singh et al. 2021) and linking chairpersons’ collectivist background to increased CSR activities in China (Luo et al. 2024).

Three quantitative studies have found that individual managers’ attitudes, similar to religious beliefs, foster ESG activities and performance. FF managers prioritize non-financial over financial gains from ESG activities, influenced by their positive attitude toward the community and higher levels of social identification (Fitzgerald et al. 2010; Mueller and Flickinger 2021). This results in FF managers contributing more financial and personal resources to their communities (Fitzgerald et al. 2010). Regardless, FF managers can exhibit attitudes that differ from those of non-FF managers, resulting in heterogeneous business strategies (Lewis et al. 2015).

Two quantitative studies identified individual managers’ character traits as determinants of FFs’ ESG activities and performance. Overconfident managers boost CSR performance in FFs due to the perceived reputational benefits (Dick et al. 2021 ). CEO narcissism affects the selection of CSR instruments, steering them towards peripheral rather than embedded CSR. However, family influence can mitigate this negative impact by prioritizing collective goals over individual ambitions (Chen et al. 2021 ). Overall, research from the individual subsystem agrees that FF managers are driven by intrinsic motivations rather than economic or financial reasons to invest in ESG.

4.2 Family subsystem

A total of 23 studies have examined the family subsystem, revealing the importance of SEW, family values, and generational involvement for FFs’ ESG activities and performance. Notably, SEW is the most frequently used theoretical perspective (n = 15). Heterogeneity studies (n = 15) outnumber comparative studies (n = 8), and three conceptual studies were published. This trend aligns with the argumentation that the family subsystem is crucial in transforming an “ordinary” firm into an FF (Pieper and Klein 2007 ; Gómez-Mejía et al. 2011 ; Berrone et al. 2012 ). However, the relatively small number of studies indicates opportunities to further explore the core of FF research.

Several studies investigate the effect of families’ SEW-related needs and goals on ESG activities and performance, yielding ambivalent findings. On the one hand, studies indicate that, in general, SEW-related goals in FFs positively influence ESG activities and performance, enhancing reputation and legitimacy (e.g., Bammens and Hünermund 2020; Ernst et al. 2022; Saeed et al. 2023). For instance, high levels of trust and SEW, combined with low levels of conflict, foster environmental and social strategies in FFs (Nikolakis et al. 2022). Individual SEW dimensions such as identification, commitment, and power drive CSR and environmental efforts (Uhlaner et al. 2012; Marques et al. 2014; Arena and Michelon 2018; Dayan et al. 2019). Similarly, FFs with a strong long-term orientation often adopt sustainability strategies (Memili et al. 2018; Dou et al. 2019). Additionally, FFs with transgenerational control intentions are more likely to embrace green innovations (Delmas and Gergaud 2014; Bammens and Hünermund 2020). If the next generation is willing to take over, the duration of family involvement is positively related to donations (Dou et al. 2014). Another contribution points out differences between first- and second-generation FFs regarding their green innovation engagement, as second-generation leaders shift their focus from internal to external stakeholders and aim to renew the FF (Wu et al. 2024). FFs’ unique resources and capabilities, often referred to as “familiness”, also have a positive impact on their absorptive capacity, mediated by CSR activities (Pütz et al. 2023).

On the other hand, studies find that ESG activities can be hindered if they are not aligned with the family’s SEW goals. A conceptual article suggests that FFs are more inclined to adopt a selective approach to CSR rather than a holistic one, prioritizing the protection of their SEW (Zientara 2017). The close bond between the family and the FF can also become a hindrance if family conflicts arise, reducing the likelihood of adopting environmental and social strategies (Nikolakis et al. 2022). Likewise, multi-generation control may negatively affect sustainability certifications, driven by the desire to control the FF and maintain its legacy and traditional ways of working (Richards et al. 2017). FFs prioritizing control and influence are also less likely to disclose environmental engagement as a way to protect their SEW (Arena and Michelon 2018). Further, FF identity can drive tax evasion when perceived performance is low, as prioritizing FF survival may take precedence over risk and reputation concerns (Eddleston and Mulki 2021). Hence, it is crucial for future research to gain deeper insights into the ambivalent impact of SEW, the core concept of FF research, on ESG activities. Researchers should aim to comprehensively understand which ESG dimensions align with families’ SEW goals and which do not, and what factors influence this relationship, building on the following recent insights. Randerson (2022) introduces the concept of family business social responsibility, explaining FFs’ engagement in socially responsible activities based on whether the business, owners, or family is the determining stakeholder. Hsueh et al. (2023a) provide related insights into the heterogeneity of SEW in FFs and its effect on their CSR strategy: forward-looking SEW leads to the formalization of FFs’ CSR strategy for dynasty renewal, while backward-looking SEW results in informal CSR strategies to maintain the current legacy.

Two studies identify other motivational factors: ecological and social considerations (Kallmuenzer et al. 2018 ), market-related ambitions, and compliance with ethical and legal requirements (Déniz and Suárez 2005 ) foster CSR engagement, as FFs aim to remain competitive and secure their future survival. Family values also influence the willingness to pursue ESG-related strategies (Sharma and Sharma 2011 ). For example, FFs run by charitable families with high levels of foundation giving engage in more community-related CSR activities (Cruz et al. 2024 ).

4.3 Ownership subsystem

The ownership subsystem is the most researched subsystem in this review, with a total of 34 studies. These studies analyze the effect of family ownership on various ESG activities compared to other ownership types. Two key observations emerge: First, almost all studies are comparative, focusing on FFs versus non-FFs, as varying levels of family ownership among FFs yield fewer insights. Second, studies of the ownership subsystem exclusively use quantitative methods, typically measuring family ownership with large secondary databases. Researchers assume that FFs’ SEW and their distinctive goals lead to the observed outcomes without actually measuring them (Dyer and Whetten 2006; De Massis et al. 2014; Miller and Le Breton–Miller 2014). Some researchers have critiqued this approach, noting that the level of family ownership is an imprecise proxy for FFs’ goals (Berrone et al. 2012), suggesting an opportunity for more nuanced future research methodologies.

First, studies on overall ESG performance provide inconclusive results regarding whether family ownership has a positive or negative impact. Some studies suggest that family ownership fosters strengths or reduces concerns (Dyer and Whetten 2006 ; McGuire et al. 2012 ; Block and Wagner 2014 ; Cordeiro et al. 2018 ; Lamb and Butler 2018 ; Herrero et al. 2024 ; Rivera-Franco et al. 2024 ). Others indicate the opposite effect (Cruz et al. 2014 ; Rees and Rodionova 2015 ; El Ghoul et al. 2016 ; Labelle et al. 2018 ; Beji et al. 2021 ; Ahmed et al. 2024 ). Two main lines of argument are most prevalent: On the one hand, SEW can serve as a source of inspiration, and the goal of building strong ties with stakeholders and ensuring survival encourages FFs to invest in ESG (e.g., Lamb and Butler 2018 ). On the other hand, SEW can also be a restriction, and FFs shy away from ESG topics due to strategic and financial considerations, prioritizing family interests over non-family stakeholders (e.g., Labelle et al. 2018 ). Besides, family ownership is negatively related to organizational wrongdoing (Smulowitz et al. 2023 ) and positively to CSR legitimacy (Panwar et al. 2014 ). Notably, studies with U.S.-American samples generally report positive effects (e.g., Block and Wagner 2014 ), while cross-country studies find a negative effect (e.g., Rees and Rodionova 2015 ). European (e.g., Herrero et al. 2024 ) and Asian (e.g., El Ghoul et al. 2016 ) samples do not provide a consistent picture. In addition, studies that utilize the KLD database often find positive effects (e.g., Lamb and Butler 2018 ), whereas the remaining studies employ a variety of CSR databases (e.g., Cruz et al. 2014 ), which may contribute to the inconclusiveness of findings. Hence, other unidentified explanatory or moderating variables, along with the lack of standardized methods for measuring key variables, represent opportunities for future research.

Second, regarding environmental performance and green innovation, the findings are also mixed, indicating potential for future research. Researchers in favor of a positive influence assume that FFs, unlike non-FFs, have a greater interest in being perceived as responsible, protect their ecological environment, and are willing to make financial investments in the light of institutional pressures (Dyer and Whetten 2006; Berrone et al. 2010, 2023; Block and Wagner 2014; Ardito et al. 2019; Dou et al. 2019; Horbach et al. 2023; Lorenzen et al. 2024). However, other researchers reason that FFs are too risk averse for fear of losing SEW and not flexible enough internally to invest in novel technologies, leading to a negative influence of family ownership on environmental performance compared to non-FFs (Rees and Rodionova 2015; Dal Maso et al. 2020; Miroshnychenko et al. 2022; Miroshnychenko and De Massis 2022) or a negative moderating effect of family ownership in the presence of institutional investors (Wu et al. 2023). One contribution attributes almost half of the negative effect of family ownership on environmental performance to a lack of investments in training and development in FFs (Dal Maso et al. 2020). Following a more nuanced approach, Lorenzen et al. (2024) find in their meta-analysis that FFs have a lower footprint than non-FFs but do not differ regarding their handprint. The variability in measurement methods, such as ESG ratings (Dal Maso et al. 2020) and patent citations (Ardito et al. 2019), potentially contributes to these inconclusive findings and offers opportunities for future research.

Third, FFs are generally associated with more social strengths and fewer social concerns, including employee-friendly policies (Dyer and Whetten 2006; Bingham et al. 2011; Block and Wagner 2014; Kang and Kim 2020; Sahasranamam et al. 2020; Herrero et al. 2024). These studies argue that FFs aim to protect their SEW by forming close bonds with their stakeholders and supporting them. Further, family ownership leads to fewer layoffs of employees (Stavrou et al. 2007; Block 2010; Kim et al. 2020), as family managers identify strongly with the FF and are more concerned about protecting its reputation.

Fourth, the impact of family ownership on governance activities depends on the specific governance topic of analysis. Family ownership is linked to lower disclosure quantity and credibility, as disclosure of information contradicts FF goals of protecting SEW and maintaining the information asymmetry with other stakeholders (Blodgett et al. 2011 ; Campopiano and De Massis 2015 ; Hsueh 2018 ). However, one contribution finds a U-shaped relationship between family control and environmental performance disclosure, arguing that the reputational benefits outweigh the potential control loss when the family has extensive control over the firm (Terlaak et al. 2018 ). Adding a contextual pressure factor to this relationship, Zamir and Saeed ( 2020 ) show that FFs closer to financial centers are more willing to disclose information about their CSR activities. Only a few studies have focused on other governance dimensions, suggesting a need for future research: FFs are less likely to engage in tax avoidance strategies, valuing long-term reputation over short-term gains (Temouri et al. 2022 ). Regarding board compensation, family ownership negatively impacts CEO compensation, with family CEOs being paid less than professional CEOs in FFs (Croci et al. 2012 ).

4.4 Management subsystem

In this review, 12 studies focus on variables within the management subsystem to identify determinants of FFs’ ESG activities and performance. Most research has explored the effect of family involvement on boards and management, particularly the roles of family CEOs, as well as board characteristics such as diversity and independence. Three specific aspects of the research in this subsystem stand out. First, only quantitative studies have investigated the management subsystem, likely because board composition can be effectively measured using secondary databases. Second, many studies have included variables from both the management and ownership subsystems, examining the impact of board composition and characteristics across various types of ownership. Third, all but one of the articles in this subsystem are underpinned by a theoretical framework, including agency theory (e.g., Jiang et al. 2023) and stakeholder theory (e.g., Nadeem et al. 2020). The relatively conclusive results may be attributed to this solid theoretical foundation and can serve as a guide for future research on other subsystems.

Greater levels of family control via board involvement and family CEOs positively impact CSR practices (Cui et al. 2018 ; Meier and Schier 2021 ), philanthropy (Jiang et al. 2023 ), and pollution prevention (Chen et al. 2022 ). This relationship holds regardless of whether management variables are considered as a direct influence in a subset of FFs or as moderating variables that interact with family ownership. However, increased family involvement can also heighten SEW concerns regarding CSR disclosure, as families may hesitate to share information with external stakeholders, exacerbating feelings of losing control (Muttakin et al. 2018 ). Similarly, the negative effect on tax aggressiveness is stronger for founders with high control compared to later generations or hired CEOs due to their greater FF attachment (Brune et al. 2019 ). These findings may also inform future research on the ownership subsystem, helping to resolve inconclusive findings regarding specific ESG dimensions.

Board gender and age diversity generally promote ESG activities and performance in FFs. Researchers suggest that more diverse boards, particularly those with female directors, are more oriented towards stakeholder and reputation management. This diversity enhances CSR, improves disclosure, boosts environmental performance, and reduces tax aggressiveness (Cuadrado-Ballesteros et al. 2015; Cordeiro et al. 2020; Nadeem et al. 2020; Beji et al. 2021). Additionally, board capital and top management teams’ attention to environmental issues foster CSR disclosure and environmental performance in FFs (Kim et al. 2017; Muttakin et al. 2018). Likewise, directors chosen by non-family shareholders foster FFs’ green innovation by contributing important resources and shifting families’ attention to extended SEW (Du and Cao 2023).

4.5 Business subsystem

A total of 28 studies focus on the intersection of the business subsystem of FFs and their ESG activities and performance, making it the second most researched subsystem. Most studies analyze the effects of CSR performance, environmental activities, and disclosure on financial, stock market, and innovation outcomes, while non-economic outcomes like succession and strategy remain rather underexplored (cf. Gómez-Mejía et al. 2011). Two characteristics of this subsystem stand out: First, 21 out of the 28 studies analyze outcomes, with the business subsystem revolving around market-related outcomes and performance metrics. Future research could consider the business subsystem as a source of determinants for ESG activities. Second, this subsystem is the most atheoretical, which is common in finance-related outlets, seemingly without impacting the conclusiveness of findings.

Researchers predominantly agree that higher levels of ESG activities positively relate to FFs’ actual and perceived economic outcomes (Niehm et al. 2008 ; O'Boyle et al. 2010 ; Kang et al. 2015 ; Leonidou et al. 2023 ), which is amplified in FFs compared to non-FFs (Craig and Dibrell 2006 ; Singal 2014 ; Lartey et al. 2020 ; Yáñez-Araque et al. 2021 ; Yeon et al. 2021 ; Combs et al. 2023 ). They argue that FFs benefit from stronger ties to customers, suppliers, and public community, so that their ESG-related activities are perceived as more credible and amplify reciprocal responses (Combs et al. 2023 ). However, this relationship was not supported in all contexts, such as in Ghanaian SMEs (Adomako et al. 2019 ). Besides, FFs benefit from a protective effect, as outward-oriented CSR initiatives weaken the negative effect of R&D intensity on FF value in times of economic recession (Hu et al. 2023 ). FFs’ lack of diversity management does not negatively affect their financial performance if there is seemingly little connection to the prevailing culture in FFs (Singal and Gerde 2015 ). Similarly, as the distance between the family and the FF increases, such as in business groups, family ownership is negatively related to affiliated firms’ CSR (Oh et al. 2023 ).

Further, FFs better translate ESG activities into innovation than non-FFs, attributed to their organizational flexibility and long-term orientation (Craig and Dibrell 2006 ; Wagner 2010 ; Haddoud et al. 2021 ). Although FFs initially adopted a more risk-averse, long-term green innovation strategy in the early 2000s, they eventually caught up with non-FFs, demonstrating less volatility (Doluca et al. 2018 ). FFs also benefit more from transparency about their ESG activities compared to non-FFs (Nekhili et al. 2017 ; Maung et al. 2020 ; Sekerci et al. 2022 ), such as reducing costs of capital and debt (Gjergji et al. 2021 ; Duggal et al. 2024 ). Sekerci et al. ( 2022 ) attribute these findings to greater honesty, credibility, and expected responsible behavior, while investors suspect non-FFs of spreading dishonest news.

Only a few studies have started to investigate business variables as determinants of ESG activities. Firm size influences CSR instrument selection (Graafland et al. 2003), while financial resources have an inverted U-shaped effect on CSR performance due to FFs’ alternative financial sources and normative CSR motivations (Tewari and Bhattacharya 2023). Professionalization has a positive impact on FFs’ financial performance and sustainability reputation (Piyasinchai et al. 2024), and proactive CSR strategies and non-hierarchical CSR decision-making enhance non-family employees’ organizational identification (Hsueh et al. 2023b). Marketing strategies show ambivalent relationships with sustainable performance in FFs (Battisti et al. 2023), hinting at a need for further research on business subsystem variables as determinants.

4.6 Environment subsystem

A total of 20 studies investigate the environmental subsystem of FFs and its impact on their ESG activities and performance by examining the power balance between internal and external pressures as well as specific types of external pressures. Comparative studies (n = 15) outnumber heterogeneity studies (n = 5), suggesting a need for more research on the heterogeneity among FFs in this context. Three studies in the environmental subsystem utilize qualitative methods, indicating a potentially complex relationship between environmental factors and ESG topics in FFs (e.g., Dangelico et al. 2019 ).

First, researchers agree that FFs react differently to ESG pressures compared to non-FFs. The challenge for FFs lies in reconciling the diverse influences of internal and external stakeholders, as well as business- and family-related stakeholders, each with their own goals and expectations (Discua Cruz 2020). FFs prioritize internal stakeholders over external ones, balancing long-term SEW goals against the potential consequences of not complying with external influences (Dangelico et al. 2019; Abeysekera and Fernando 2020; García‐Sánchez et al. 2021).

Second, FFs adapt to external pressures from local communities (Dekker and Hasso 2016; Peake et al. 2017; Bammens and Hünermund 2023), external investors (Chen et al. 2010), and institutional and public forces (Fritz et al. 2021; Wu et al. 2022; Ahmed et al. 2024). However, the effects are ambiguous, suggesting opportunities for further research. Some studies suggest that FFs self-regulate due to strong internal governance and higher self-motivation, reducing the impact of new regulations compared to non-FFs (Ding et al. 2016, 2022; Peake et al. 2017; Bendell 2022). For instance, FFs invest in environmental innovation in anticipation of stricter regulations (Bendell 2022), commit less bribery in weaker macro-governance environments compared to non-FFs (Ding et al. 2016), and lay off fewer employees in regions with low population due to negative reputation concerns (Kim et al. 2020). However, other studies argue that external control mechanisms are crucial for improving FFs’ environmental performance, as they find that regulatory stringency and market competitiveness mitigate agency problems in FFs (Yu et al. 2021). Additionally, financial sanctions promote ESG investments to manage risk in Chinese FFs (Ahmed et al. 2024), and the one-child policy in China fosters CSR to facilitate intergenerational succession (Cumming et al. 2024). In line with this second line of argument, political connections can be leveraged to reduce external control, leading to lower environmental orientation (Brumana et al. 2024) and increased misconduct and bribery (Du 2015; Jiang and Min 2023).

5 Discussion

5.1 Future research opportunities

This literature review has uncovered potential areas for future research, as illustrated in Fig. 2. To address gaps and narrow investigations in the field, researchers are encouraged to adopt a more nuanced and holistic approach. Future research directions are categorized using the same matrix employed in the findings. Research avenues across the three categories can and should be combined to advance FF ESG research. Specifically, methodical approaches should be prioritized initially, as refining these methods can provide a solid foundation for exploring other research avenues and significantly progress the field of FF ESG research.

5.1.1 Future research opportunities of first priority on FF subsystems

Future research avenue 1—Intersection of family, management, and ownership subsystems. FF research has established that only the presence of both ability and willingness can result in FF-particularistic behavior (De Massis et al. 2014 ), i.e., combination of family, business, and ownership subsystem, and has predominantly focused on SEW preservation as FFs’ main goal (Gómez-Mejía et al. 2011 ). However, studies frequently use broad measures of family ownership or involvement in management to infer FF goals, potentially misattributing outcomes to SEW preservation (Berrone et al. 2012 ; Miller and Le Breton-Miller 2014 ). This rather vague approach may explain the inconsistent findings related to family ownership and ESG performance. To address these gaps, future research should delve into the following areas: First, researchers may investigate how the ability dimension in FFs, i.e., family ownership and management involvement, has been measured and how each subdimension of ability impacts ESG activities and performance (Miroshnychenko et al. 2022 ). For instance, Block ( 2010 ) finds varying effects of family ownership versus management involvement on downsizing, highlighting the need for more precise measurement. Second, research could benefit from examining the interplay among various SEW dimensions to understand their combined effect on ESG performance. Therefore, researchers may explore how SEW dimensions influence ESG activities and assess whether FFs and non-FFs pursue similar ESG goals, drawing on recent heterogeneity studies from the family subsystem. These studies have tried to capture the unique dynamics among various FF subsystems and provide initial starting points as to which dimensions are relevant, such as long-term orientation (Dou et al. 2019 ), reputation motives (Bammens and Hünermund 2020 ), trust and conflict among family members (Nikolakis et al. 2022 ), and dynastic renewal and identification (Hsueh et al. 2023a ). 
Moreover, Randerson (2022) provides a conceptual basis to integrate SEW and familiness into FF ESG research. To enhance understanding, researchers should prioritize survey-based approaches that directly measure FFs’ ability and willingness to engage in ESG initiatives. As measuring FF and non-FF goals more precisely bears challenges, researchers should start with a rigorous analysis of previous research to find the most fitting scale or way of measuring goals based on their specific research questions and hypotheses (e.g., Prügl 2019). Despite the additional effort required, such approaches will provide deeper insights and clarify the complex interactions between family, management, and ownership subsystems in relation to ESG performance. Third, researchers may include relevant moderating and mediating factors, as only a few studies analyze such variables beyond family control measures, despite previous suggestions from researchers that FF behavior is likely influenced by a multitude of moderating and mediating factors (De Massis et al. 2014). Examples include long-term orientation (Memili et al. 2018), normative motivation (Ernst et al. 2022), and the use of the family name as the FF name (Cruz et al. 2024). Thus, these are exemplary unaddressed questions for future research on family control and goals related to ESG performance: Are current measurement methods capturing the nuanced effects of family control? Do FFs and non-FFs pursue similar goals with ESG engagement? Do FFs differ among each other regarding their SEW preservation goals, which, in turn, affects their ESG activities? Which SEW dimensions have the most significant impact on ESG performance? Are there specific SEW aspects that enhance or hinder ESG outcomes? Are there interdependencies among SEW dimensions? What other family-related variables are relevant to understand FFs’ ESG activities? Which level of family influence benefits ESG performance most, and which hinders it most? 
How do FFs and non-FFs differ with regard to their ability and willingness to invest in ESG activities?

Future research avenue 2—FF heterogeneity. In this review, 69% of all studies compare FFs and non-FFs, predominantly treating FFs as a homogeneous group. However, this view has been challenged in FF research in recent years (De Massis et al. 2014; Daspit et al. 2021). Researchers advocate for further investigation into FF heterogeneity (Calabrò et al. 2019), a need that has also become evident concerning the individual and environmental subsystems. Studies with FF samples find significant differences in their ESG performance and outcomes, suggesting that treating them as a single category may overlook critical differences in ESG performance and strategies (e.g., Chen et al. 2010; Berrone et al. 2023). Although FF heterogeneity studies have gained momentum in recent years, with 21 out of 39 such studies in this review published since 2021, conceptual and qualitative approaches, such as in-depth case studies, are needed to foster our understanding. Such studies enable researchers to identify and reveal variations among FFs, distinguish between different archetypes of FFs in terms of their ESG approaches, and identify key variables that differentiate FFs. Researchers can draw on the findings and methodologies from studies by Discua Cruz (2020), Marques et al. (2014), Bhatnagar et al. (2020), and Hsueh et al. (2023a). These studies have leveraged non-quantitative approaches to examine underlying motives and relationships within FFs regarding their ESG reporting, CSR engagement, philanthropy, and non-family employees’ organizational identification. For instance, researchers can analyze how FFs’ familiness impacts their ESG performance (Pütz et al. 2023), which has rarely been the focus of studies. Researchers may also examine how the generational life cycle stage of FFs affects their ESG activities and performance, as this stage may influence strategic decisions and long-term sustainability (Wu et al. 2024). 
Afterwards, scholars could revisit the comparison between FFs’ and non-FFs’ ESG performance, trying to solve mixed and inconclusive findings in comparative studies by incorporating a broader range of variables that have been found to explain differences among FFs. Exemplary unanswered research questions are: Why do FFs differ regarding their ESG performance? What role do their individual managers’ attitudes and character traits, resources, history, processes, goals, or local environment play to explain the differences in their determinants and outcomes of ESG performance? Are there differences in terms of the individual ESG dimensions? Have these factors changed over time in FFs? Are there noticeable differences in ESG approaches between first-generation and multi-generational FFs? Do specific attributes of familiness drive better or worse ESG outcomes?

5.1.2 Future research opportunities of first priority on ESG dimensions

Future research avenue 3—ESG measurement and greenwashing. More than half of all reviewed studies rely on CSR or ESG databases that collect and accumulate self-reported firm data. This approach warrants reconsideration for two main reasons, as indicated in the subchapter regarding the ownership subsystem. First, different ESG databases or rating agencies apply varying definitions and standards when calculating ESG performance. The same FF can actually achieve opposite results depending on the source, which not only harms comparability between studies but also affects the validity of effect sizes, resulting in inconclusive findings across studies (Billio et al. 2021). Second, current research relies mainly on ESG data based on self-reported firm disclosures. However, ESG performance in particular is a topic where the risk of data bias due to social desirability is very high, so that firms may exaggerate their positive ESG traits while downplaying negative ones (Nederhof 1985; Arvidsson and Dumay 2022). Hence, future research would benefit from more objective and comparable ESG indicators, such as the actual level of emissions, as demonstrated by Berrone et al. (2010), who used weighted on-site emissions to assess environmental performance. In addition, meta-analytical studies can help to clarify the impact of ESG measurement on the significance of determinants and outcomes. An interesting approach is also shown by Lorenzen et al. (2024), as they differentiate between ecological footprint and handprint in their meta-analysis, providing a more nuanced picture of FFs’ environmental performance. Researchers may also investigate greenwashing behaviors among FFs. For instance, Kim et al. (2017) find a higher tendency of greenwashing for non-FFs than FFs, arguably due to FFs’ desire to preserve their SEW. I encourage fellow scholars to adopt similar approaches to extend knowledge on this topic. 
Hence, I propose the following exemplary research questions: How do more objective ESG measurements relate to ESG ratings? Do determinants and outcomes vary based on the measurement of ESG variables? Are actual and perceived ESG performance correlated in FFs? Is there a stronger or weaker correlation compared to non-FFs? Do FFs exhibit a stronger tendency to adjust their reported ESG performance due to social desirability pressures? How do contextual factors, such as current financial performance, influence greenwashing behavior in FFs? Which goals, such as transgenerational intentions and reputation, impact FFs’ greenwashing behavior?

5.1.3 Future research opportunities of second priority on FF subsystems

Future research avenue 4—Moderating effects within the ownership subsystem. Future research should more thoroughly examine firm-, industry-, and country-level differences to explain the mixed results seen in comparative studies, particularly within the ownership subsystem. Initial studies hint at the importance of accounting for country-level differences to acknowledge the varying influences of governmental or regulatory environments, culture, and society’s norms on FFs’ ESG performance (e.g., Rees and Rodionova 2015; Miroshnychenko and De Massis 2022). The comparison of findings among the comparative studies indicates a relevant influence of the sample firms’ origin, since most studies with a positive relationship of family control on ESG have analyzed U.S. or other one-country samples (e.g., Cui et al. 2018), whereas studies with partially contradicting results have relied on Asian or global samples (e.g., El Ghoul et al. 2016). In addition, scholars should also gain a more nuanced understanding of how firm characteristics and industry influence the ESG performance of FFs. For example, Terlaak et al. (2018) find a moderating effect of heavily polluting industries, Battisti et al. (2023) highlight the importance of advertising intensity, and Piyasinchai et al. (2024) emphasize the role of professional management practices. The specific industry of FFs can affect their pressures, goals, and resources, which is why future research would benefit from investigating firm- and industry-level differences. Researchers may also rely on Gómez-Mejía et al. (2011), who argue why and how FFs differ from non-FFs in five categories of managerial decisions and firm characteristics. Moreover, future studies can build upon the insights provided by Lorenzen et al. (2024), whose meta-analysis demonstrates that the effect of FF status on environmental performance differs for large and listed firms compared to small- and medium-sized firms. 
In addition, Bammens and Hünermund ( 2020 ) argue that listed FFs may prioritize short-term goals due to the greater distance between family and firm, contrasting with privately held FFs that are more focused on long-term objectives, in comparison to findings by Rees and Rodionova ( 2015 ). Thus, future research would benefit from either conducting meta-analyses or analyzing large global samples that can account for different business types, industries, and origins of firms. Building upon these first insights, additional research questions arise, such as: What are the key factors that distinguish countries where FFs exhibit varying ESG performance? Which industries excel in ESG performance, and how do they compare to others? Are non-FFs more affected by their industry than FFs? Do listed FFs show lower ESG performance across all dimensions or just specific ones? Do listed FFs act more like non-FFs than private FFs? What firm characteristics unique to FFs impact their ESG activities?

Future research avenue 5—Individual subsystem. Until now, the effect of individual FF manager attitudes and character traits on FF ESG activities and performance has received rather little attention in research. Future research can transfer and extend both the established theoretical foundation, the initial findings from FF ESG research, and CEO research. First, upper echelons theory by Hambrick and Mason ( 1984 ) proposes that managers’ experiences, values, and personalities have a high influence on their business choices and is a key theory in strategic management research. This theory offers a great fundamental basis for FF ESG research to deduce hypotheses, as research on the effect of CEOs on their firms’ actions and performance demonstrates (Wang et al. 2016 ). Second, both first quantitative and conceptual insights highlight that these individual influences are key determinants to understand why FFs engage in enhancing their ESG performance (e.g., Singh et al. 2021 ). Researchers could build on and empirically test the hypotheses by Le Breton-Miller and Miller ( 2016 ), proposing that FFs are more engaged in sustainable practices when FF owners and managers are influenced by religious values and their parents and educational background convey the importance of discipline and commitment to the FF and society. It would be highly relevant for future research to analyze whether this effect is potentially amplified in FFs due to the high levels of personalized control (Sharma and Sharma 2011 ). Third, existent research on CEO characteristics demonstrates the plethora of family manager and owner characteristics that may also be relevant for their decision-making in FFs. Compared to the state of research for CEOs, the knowledge about the individual FF manager or owner as key decision makers for ESG behaviors in FFs is still quite limited. Thus, FF researchers may draw on research by Kaplan et al. 
(2012), Manner (2010), and Barker and Mueller (2002) to identify relevant determinants from the individual subsystem, such as educational background, career experience, interpersonal or execution skills, and tenure. For example, Cumming et al. (2024) find moderating effects of highly personal owner characteristics, such as number of children, number of sons, and surpassing of reproductive age, on the effect of one-child policy introduction on CSR performance. Similarly, Du and Cao (2023) show a positive effect of green professional backgrounds of directors on green innovation in FFs. Exemplary unanswered research questions are the following: How do FF managers’ attitudes, beliefs, and character traits influence their decisions regarding ESG engagement? Do family members exhibit different values and attitudes towards the environment and society than non-family members in FFs? How do individual differences, such as values or educational backgrounds, between preceding generations and successors impact FF ESG performance? Are younger family members more likely to foster engagement in ESG practices? Do FF managers’ attitudes and beliefs moderate the influence of FF control and goals on ESG engagement?

Future research avenue 6—Business subsystem. Current research has primarily explored how ESG engagement impacts the financial performance of FFs, focusing on the “CSR-CFP relationship”. Future studies should investigate how ESG activities affect critical non-financial goals in FFs, which can offer a more holistic view of the benefits and challenges associated with ESG behaviors. For example, interdependencies between FFs’ SEW and ESG activities could be a valuable area of study, as ESG initiatives may enhance FFs’ SEW by improving the next generation’s willingness to take over (Dou et al. 2014 ) or increasing non-family employees’ organizational identification (Hsueh et al. 2023b ). Exploring these dynamics could provide insights into how ESG efforts contribute to both financial and non-financial outcomes. Moreover, future studies would benefit from adopting a different perspective and analyzing FF business variables, especially financial performance, as determinants of ESG performance for two reasons. First, financial resources are essential for any firm to invest in ESG issues, as payback periods are typically longer and the prospects for success unclear. For instance, green innovation is considered to be significantly more financially intensive than general innovation (e.g., Holzner and Wagner 2022 ). Research by Miroshnychenko and De Massis ( 2022 ) highlights that cash reserves, net sales growth, and firm size significantly influence the likelihood of adopting sustainability practices. However, non-financial business variables, such as professional management practices and dynamic capabilities, can also enhance FFs’ ESG strategy, reputation, and financial performance (Leonidou et al. 2023 ; Piyasinchai et al. 2024 ). Second, FFs have unique financial structures compared to non-FFs, which is likely to influence their financial flexibility for ESG engagement. 
Sirmon and Hitt (2003) argue that FFs possess patient financial capital that is invested with a long-term commitment and allows FFs to pursue more innovative strategies than non-FFs. Tewari and Bhattacharya (2023) provide first empirical support, showing that the impact of financial resources on CSR activities differs between FFs and non-FFs. However, they also acknowledge that FFs have less access to external capital due to their control aspirations and smaller firm size. Based on these studies, the importance of firm performance as a determinant of ESG performance in FFs needs further investigation. Exemplary research questions are as follows: How do ESG activities and performance impact FFs’ non-financial performance? Which dimensions of FF performance influence ESG investments? What technical and management capabilities foster ESG activities in FFs? Does FF performance affect different ESG dimensions similarly? How does this compare to non-FFs? What role does the expected financial performance play? How does the FF life cycle affect the relationship of FF financial performance to ESG investments? For example, are founder-led FFs more conservative than FFs at a later generational stage? Are FFs willing to give up control to raise external capital for ESG investments? How do ESG activities serve as risk management or value protection tools in times of financial or reputational distress?

Future research avenue 7—Environment subsystem. The current understanding of how environmental factors and stakeholders influence FFs’ ESG performance is still limited and requires deeper exploration. First, the impact of climate change as a pressure on FFs is an underexplored area. A study by Wright and Nyberg (2017) shows that firms normalize and downplay the challenge of climate change, with their environmental engagement diminishing over time. Future research could analyze this phenomenon among FFs, since their higher level of long-term orientation and local embeddedness (e.g., Berrone et al. 2010; Memili et al. 2018) might counteract this reaction. Thus, tangible climate change consequences in their immediate environment could foster ESG performance improvements. For example, Horbach et al. (2023) demonstrate that extreme weather events promote firms’ level of greenness. Second, given recent energy and environmental policy developments, the impact of regulatory and governmental influences on FFs’ behaviors offers opportunities for further investigation with great practical impact for FF owners and managers. Initial studies detect a self-regulatory approach of FFs (e.g., Bendell 2022), but many potential determinants of the relationship between regulatory pressure and ESG investments have not been investigated yet. For instance, Amore and Minichilli (2018) find that FFs show greater investment resilience under local political uncertainty. Bammens and Hünermund (2023) introduce a strong ecological community logic as a fostering condition for green innovation, but the reverse influence and effect have not been investigated yet. Future research should examine how similar environmental context factors influence ESG activities, particularly considering FFs’ place-based culture (e.g., Kim et al. 2020; Bammens and Hünermund 2023). Third, future research would benefit from analyzing more comprehensive models and including variables from other FF subsystems. For example, FF governance could be a relevant factor, as female managers are considered to be more stakeholder-oriented (e.g., Cordeiro et al. 2020) and non-family managers to be less focused on SEW goals and FF reputation (e.g., El Ghoul et al. 2016). Hence, a higher share of female managers could amplify the effect of regulatory pressure, whereas a higher share of non-family managers could weaken it. Cumming et al. (2024) find that individual owner characteristics significantly moderate policy responses due to the highly personalized control in FFs. Other variables that have an influence on ESG-related behavior, such as FFs’ transgenerational control intentions (e.g., Bammens and Hünermund 2020) or political connectedness (e.g., Du 2015), might also moderate or mediate the effects of regulatory pressures. These factors should be considered in future studies to incorporate the uniqueness of FFs and their goals in their response to regulatory pressures. Hence, some unanswered research questions are the following: Does climate change impact the environmental performance of FFs more or less than that of non-FFs? Are local versus distant environmental disasters experienced differently in terms of impact and response by FFs? How do FFs differ in their reaction to regulatory pressure on ESG topics, e.g., self-regulatory versus reactive? What role do FF employees play in shaping their firms’ ESG activities, and how is this influenced by the level of ESG community logic? Does FF governance affect the impact of regulatory pressure on ESG performance? Are FFs pursuing different goals in their strategy to deal with regulatory pressures related to ESG issues?

5.1.4 Future research opportunities of second priority on ESG dimensions

Future research avenue 8—ESG heterogeneity. Research on ESG heterogeneity is crucial, as aggregated ESG and CSR metrics often present inconsistent results, which mask the nuanced approaches of FFs. Quantitative analyses of overarching ESG metrics can be misleading, failing to capture the complexity of FFs’ ESG practices, as findings are more conclusive when comparing results for each subtopic. For instance, Block and Wagner (2014) highlight that FFs may simultaneously act responsibly and irresponsibly, with varying engagement levels across different ESG dimensions. They find that family ownership positively influences environmental and product-related ESG performance but negatively impacts community-related performance. This nuanced approach has partially been overlooked, with research often focusing primarily on environmental issues or general ESG performance. Future research could capture FF behavior at a more granular level by acknowledging the complexity and heterogeneity of ESG. For instance, Lorenzen et al. (2024) suggest distinguishing between detailed aspects of environmental performance, such as handprint and footprint, to gain deeper insights. By building on studies such as Croci et al. (2012), Bingham et al. (2011), or Kang and Kim (2020), researchers should integrate more variables measuring the governance and social dimensions, such as human and employee rights, diversity, compensation and competition fairness, and transparent communication. Potential research questions for future studies include the following: Which ESG dimensions are particularly aligned with which FF goals? Do particular family members, such as women or later generations, prioritize certain ESG issues differently? How have FFs’ ESG preferences changed over time? Are FFs more effective than non-FFs on social issues aligned with their SEW goals, and less effective on unrelated issues? Why might some FFs exhibit behavior similar to non-FFs in governance-related topics?

5.2 Practical implications

This article also provides important practical insights for managers and stakeholders of FFs who aim to gain a deeper understanding of FFs’ ESG performance. By increasing awareness of the factors influencing FFs’ ESG performance and the potential outcomes, FF owners and managers can more effectively design their firm’s individual ESG strategy. This understanding also helps them anticipate potential obstacles and the effects of their actions. Given that the family is often also the largest shareholder and has an interest in the firm’s longevity, it is crucial for FFs to comply with the latest ESG regulations and to understand their firms’ future ESG readiness. Understanding ESG performance helps FFs to identify risks that could affect their survival. Effective management of these risks can protect the firm from potential financial, legal, and reputational damage, which is especially important for family members.

Additionally, investors and financial institutions are increasingly prioritizing ESG factors when making decisions. FFs that demonstrate strong ESG performance are more likely to attract investment from socially conscious investors and funds that prioritize sustainability alongside financial returns, aligning with their strategic approach and cultural values. Superior ESG performance can be a strong value proposition for FFs, attracting customers, employees, and business partners who value sustainability and ethical business practices. A future-ready and sustainability-oriented FF may also be more attractive for the next generation, supporting the long-term survival of the FF and family control over it. Furthermore, stakeholders from the market, politics, and government may prefer future-proof and compliant FFs as suppliers or customers or feel inclined to shape effective ESG initiatives or subsidies for FFs.

5.3 Limitations

As with all research, this literature review has a few limitations. The first limitation is the potential oversight of influences stemming from variations in the samples used, such as whether FFs are private or public, their country of origin, or the industry in which they operate. Analyzing how these factors impact ESG performance was beyond the purpose and scope of this review. The second limitation is the inability to distinguish between effects based on how ESG performance was measured. The exact measurement of certain FF variables, such as family control as a dummy or a metric variable including ownership and/or management involvement, is particularly difficult to consider. Since several studies use different measurement approaches for family control (e.g., family ownership or management involvement), future meta-analyses or sensitivity studies should explore these variations and their impact on results (e.g., Miroshnychenko et al. 2022). Finally, the article selection and analysis process is inherently subjective. To mitigate this, I adhered to a systematic approach and meticulously documented each step, aiming to reduce any bias (Webster and Watson 2002; Wolfswinkel et al. 2013).

6 Conclusion

ESG-related FF research has continuously grown in both quantity and importance over the past two decades and will remain a central topic for researchers and practitioners. To support and steer future research, this literature review structures and synthesizes the current body of knowledge at the intersection of FF and ESG research, providing a comprehensive understanding of how various ESG dimensions impact different FF subsystems. By identifying and structuring new research avenues, this review supports the advancement of the field and enriches both FF and ESG research.

Availability of data and material

I do not analyze or generate any datasets, because my work is based on a literature review approach. All reviewed studies are included in this article.

References

Abeysekera AP, Fernando CS (2020) Corporate social responsibility versus corporate shareholder responsibility: a family firm perspective. J Corp Finan 61:101370. https://doi.org/10.1016/J.Jcorpfin.2018.05.003

Adomako S, Amankwah-Amoah J, Danso A, Konadu R, Owusu-Agyei S (2019) Environmental sustainability orientation and performance of family and nonfamily firms. Bus Strat Environ 28:1250–1259. https://doi.org/10.1002/Bse.2314

Ahmad S, Omar R, Quoquab F (2021) Family firms’ sustainable longevity: the role of family involvement in business and innovation capability. J Fam Bus Manag 11:86–106. https://doi.org/10.1108/Jfbm-12-2019-0081

Ahmed R, Abweny M, Benjasak C, Nguyen DTK (2024) Financial sanctions and environmental, social, and governance (ESG) performance: a comparative study of ownership responses in the Chinese context. J Environ Manage 351:119718. https://doi.org/10.1016/J.Jenvman.2023.119718

Amore MD, Minichilli A (2018) Local political uncertainty, family control, and investment behavior. J Financ Quant Anal 53:1781–1804. https://doi.org/10.1017/S002210901800025x

Ardito L, Messeni Petruzzelli A, Pascucci F, Peruffo E (2019) Inter-firm R&D collaborations and green innovation value: the role of family firms’ involvement and the moderating effects of proximity dimensions. Bus Strat Environ 28:185–197. https://doi.org/10.1002/Bse.2248

Arena C, Michelon G (2018) A matter of control or identity? Family firms’ environmental reporting decisions along the corporate life cycle. Bus Strat Environ 27:1596–1608. https://doi.org/10.1002/Bse.2225

Arvidsson S, Dumay J (2022) Corporate ESG reporting quantity, quality and performance: where to now for environmental policy and practice? Bus Strat Environ 31:1091–1110. https://doi.org/10.1002/Bse.2937

Bammens Y, Hünermund P (2020) Nonfinancial considerations in eco-innovation decisions: the role of family ownership and reputation concerns. J Prod Innov Manag 37:431–453. https://doi.org/10.1111/Jpim.12550

Bammens Y, Hünermund P (2023) Ecological community logics, identifiable business ownership, and green innovation as a company response. Res Policy 52:104826. https://doi.org/10.1016/J.Respol.2023.104826

Barker VL, Mueller GC (2002) CEO characteristics and firm R&D spending. Manage Sci 48:782–801. https://doi.org/10.1287/Mnsc.48.6.782.187

Battisti E, Nirino N, Leonidou E, Salvi A (2023) Corporate social responsibility in family firms: can corporate communication affect CSR performance? J Bus Res 162:113865. https://doi.org/10.1016/J.Jbusres.2023.113865

Beji R, Yousfi O, Loukil N, Omri A (2021) Board diversity and corporate social responsibility: empirical evidence from France. J Bus Ethics 173:133–155. https://doi.org/10.1007/S10551-020-04522-4

Bendell BL (2022) Environmental investment decisions of family firms—an analysis of competitor and government influence. Bus Strat Environ 31:1–14. https://doi.org/10.1002/Bse.2870

Berg F, Kölbel JF, Rigobon R (2022) Aggregate confusion: the divergence of ESG ratings. Rev Finance 26:1315–1344. https://doi.org/10.1093/Rof/Rfac033

Berrone P, Cruz C, Gómez-Mejía LR, Larraza-Kintana M (2010) Socioemotional wealth and corporate responses to institutional pressures: do family-controlled firms pollute less? Adm Sci Q 55:82–113. https://doi.org/10.2189/Asqu.2010.55.1.82

Berrone P, Cruz C, Gómez-Mejía LR (2012) Socioemotional wealth in family firms: theoretical dimensions, assessment approaches, and agenda for future research. Fam Bus Rev 25:258–279. https://doi.org/10.1177/0894486511435355

Berrone P, Gómez-Mejía LR, Xu K (2023) The role of family ownership in norm-conforming environmental initiatives: lessons from China. Entrep Theory Pract 47:1915–1941. https://doi.org/10.1177/10422587221115362

Bhatnagar N, Sharma P, Ramachandran K (2020) Spirituality and corporate philanthropy in Indian family firms: an exploratory study. J Bus Ethics 163:715–728. https://doi.org/10.1007/S10551-019-04394-3

Billio M, Costola M, Hristova I, Latino C, Pelizzon L (2021) Inside the ESG ratings: (dis)agreement and performance. Corp Soc Responsib Environ Manage 28:1426–1445. https://doi.org/10.1002/Csr.2177

Bingham JB, Dyer WG, Smith I, Adams GL (2011) A stakeholder identity orientation approach to corporate social performance in family firms. J Bus Ethics 99:565–585. https://doi.org/10.1007/S10551-010-0669-9

Block JH (2010) Family management, family ownership, and downsizing: evidence from S&P 500 firms. Fam Bus Rev 23:109–130. https://doi.org/10.1177/089448651002300202

Block JH, Wagner M (2014) The effect of family ownership on different dimensions of corporate social responsibility: evidence from large US firms. Bus Strat Environ 23:475–492. https://doi.org/10.1002/Bse.1798

Blodgett MS, Dumas C, Zanzi A (2011) Emerging trends in global ethics: a comparative study of U.S. and international family business values. J Bus Ethics 99:29–38. https://doi.org/10.1007/S10551-011-1164-7

Brumana M, Madonna A, Campopiano G, Boffelli A (2024) Orientation towards environmental sustainability in European family versus nonfamily firms: the role of policymaker engagement and incentives. Entrep Reg Dev. https://doi.org/10.1080/08985626.2024.2371883

Brune A, Thomsen M, Watrin C (2019) Family firm heterogeneity and tax avoidance: the role of the founder. Fam Bus Rev 32:296–317. https://doi.org/10.1177/0894486519831467

CABS (2021) Academic Journal Guide 2021. Accessed 8 Feb 2022

Calabrò A, Vecchiarini M, Gast J, Campopiano G, De Massis A, Kraus S (2019) Innovation in family firms: a systematic literature review and guidance for future research. Int J Manag Rev 21:317–355. https://doi.org/10.1111/Ijmr.12192

Campopiano G, De Massis A (2015) Corporate social responsibility reporting: a content analysis in family and non-family firms. J Bus Ethics 129:511–534. https://doi.org/10.1007/S10551-014-2174-Z

Capelle-Blancard G, Petit A (2017) The weighting of CSR dimensions: one size does not fit all. Bus Soc 56:919–943. https://doi.org/10.1177/0007650315620118

Chen S, Chen X, Cheng Q, Shevlin T (2010) Are family firms more tax aggressive than non-family firms? J Financ Econ 95:41–61. https://doi.org/10.1016/J.Jfineco.2009.02.003

Chen J, Zhang Z, Jia M (2021) How CEO narcissism affects corporate social responsibility choice? Asia Pac J Manag 38:897–924. https://doi.org/10.1007/S10490-019-09698-6

Chen X, Pan X, Sinha P (2022) What to green: family involvement and different types of eco-innovation. Bus Strat Environ 31:2588–2602. https://doi.org/10.1002/Bse.3045

Christensen DM, Serafeim G, Sikochi A (2022) Why is corporate virtue in the eye of the beholder? The case of ESG ratings. Account Rev 97:147–175. https://doi.org/10.2308/Tar-2019-0506

Chua JH, Chrisman JJ, Sharma P (1999) Defining the family business by behavior. Entrep Theory Pract 23:19–39. https://doi.org/10.1177/104225879902300402

Combs JG, Jaskiewicz P, Ravi R, Walls JL (2023) More bang for their buck: why (and when) family firms better leverage corporate social responsibility. J Manag 49:575–605. https://doi.org/10.1177/01492063211066057

Cordeiro JJ, Galeazzo A, Shaw TS, Veliyath R, Nandakumar MK (2018) Ownership influences on corporate social responsibility in the Indian context. Asia Pac J Manag 35:1107–1136. https://doi.org/10.1007/S10490-017-9546-8

Cordeiro JJ, Profumo G, Tutore I (2020) Board gender diversity and corporate environmental performance: the moderating role of family and dual-class majority ownership structures. Bus Strat Environ 29:1127–1144. https://doi.org/10.1002/Bse.2421

Craig J, Dibrell C (2006) The natural environment, innovation, and firm performance: a comparative study. Fam Bus Rev 19:275–288. https://doi.org/10.1111/J.1741-6248.2006.00075.X

Croci E, Gonenc H, Ozkan N (2012) CEO compensation, family control, and institutional investors in continental Europe. J Bank Finance 36:3318–3335. https://doi.org/10.1016/J.Jbankfin.2012.07.017

Cruz C, Larraza-Kintana M, Garcés-Galdeano L, Berrone P (2014) Are family firms really more socially responsible? Entrep Theory Pract 38:1295–1316. https://doi.org/10.1111/Etap.12125

Cruz C, Milanov H, Klein J (2024) It’s a family affair: a case for consistency in family foundation giving and family firm community CSR activity. J Bus Ethics 191:633–649. https://doi.org/10.1007/S10551-023-05424-X

Cuadrado-Ballesteros B, Rodríguez-Ariza L, García-Sánchez I-M (2015) The role of independent directors at family firms in relation to corporate social responsibility disclosures. Int Bus Rev 24:890–901. https://doi.org/10.1016/J.Ibusrev.2015.04.002

Cui V, Ding S, Liu M, Wu Z (2018) Revisiting the effect of family involvement on corporate social responsibility: a behavioral agency perspective. J Bus Ethics 152:291–309. https://doi.org/10.1007/S10551-016-3309-1

Cumming D, Hu J, Wu H (2024) Leaving a legacy for my children: the one-child policy reform and engagement in CSR among family firms in China. J Bus Ethics. https://doi.org/10.1007/S10551-023-05603-W

Dal Maso L, Basco R, Bassetti T, Lattanzi N (2020) Family ownership and environmental performance: the mediation effect of human resource practices. Bus Strat Environ 29:1548–1562. https://doi.org/10.1002/Bse.2452

Dangelico RM, Nastasi A, Pisa S (2019) A comparison of family and nonfamily small firms in their approach to green innovation: a study of Italian companies in the agri-food industry. Bus Strat Environ 28:1434–1448. https://doi.org/10.1002/Bse.2324

Daspit JJ, Chrisman JJ, Ashton T, Evangelopoulos N (2021) Family firm heterogeneity: a definition, common themes, scholarly progress, and directions forward. Fam Bus Rev 34:296–322. https://doi.org/10.1177/08944865211008350

Davidsson P, Gruenhagen JH (2021) Fulfilling the process promise: a review and agenda for new venture creation process research. Entrep Theory Pract 45:1083–1118. https://doi.org/10.1177/1042258720930991

Dayan M, Ng PY, Ndubisi NO (2019) Mindfulness, socioemotional wealth, and environmental strategy of family businesses. Bus Strat Environ 28:466–481. https://doi.org/10.1002/Bse.2222

De Massis A, Kotlar J, Chua JH, Chrisman JJ (2014) Ability and willingness as sufficiency conditions for family-oriented particularistic behavior: implications for theory and empirical studies. J Small Bus Manage 52:344–364. https://doi.org/10.1111/Jsbm.12102

De Massis A, Frattini F, Majocchi A, Piscitello L (2018) Family firms in the global economy: toward a deeper understanding of internationalization determinants, processes, and outcomes. Glob Strateg J 8:3–21. https://doi.org/10.1002/Gsj.1199

Debellis F, Rondi E, Plakoyiannaki E, De Massis A (2021) Riding the waves of family firm internationalization: a systematic literature review, integrative framework, and research agenda. J World Bus 56:101144. https://doi.org/10.1016/J.Jwb.2020.101144

Dekker JC, Hasso T (2016) Environmental performance focus in private family firms: the role of social embeddedness. J Bus Ethics 136:293–309. https://doi.org/10.1007/S10551-014-2516-X

Delmas MA, Gergaud O (2014) Sustainable certification for future generations. Fam Bus Rev 27:228–243. https://doi.org/10.1177/0894486514538651

Déniz MD, Suárez MKC (2005) Corporate social responsibility and family business in Spain. J Bus Ethics 56:27–41. https://doi.org/10.1007/S10551-004-3237-3

Dick M, Wagner E, Pernsteiner H (2021) Founder-controlled family firms, overconfidence, and corporate social responsibility engagement: evidence from survey data. Fam Bus Rev 34:71–92. https://doi.org/10.1177/0894486520918724

Ding S, Qu B, Wu Z (2016) Family control, socioemotional wealth, and governance environment: the case of bribes. J Bus Ethics 136:639–654. https://doi.org/10.1007/S10551-015-2538-Z

Ding W, Levine R, Lin C, Xie W (2022) Competition laws, ownership, and corporate social responsibility. J Int Bus Stud 53:1576–1602. https://doi.org/10.1057/S41267-022-00536-4

Discua Cruz A (2020) There is no need to shout to be heard! The paradoxical nature of corporate social responsibility (CSR) reporting in a Latin American family small and medium-sized enterprise (SME). Int Small Bus J 38:243–267. https://doi.org/10.1177/0266242619884852

Doluca H, Wagner M, Block JH (2018) Sustainability and environmental behaviour in family firms: a longitudinal analysis of environment-related activities, innovation and performance. Bus Strat Environ 27:152–172. https://doi.org/10.1002/Bse.1998

Dou J, Zhang Z, Su E (2014) Does family involvement make firms donate more? Empirical evidence from Chinese private firms. Fam Bus Rev 27:259–274. https://doi.org/10.1177/0894486514538449

Dou J, Su E, Wang S (2019) When does family ownership promote proactive environmental strategy? The role of the firm’s long-term orientation. J Bus Ethics 158:81–95. https://doi.org/10.1007/S10551-017-3642-Z

Du X (2015) Is corporate philanthropy used as environmental misconduct dressing? Evidence from Chinese family-owned firms. J Bus Ethics 129:341–361. https://doi.org/10.1007/S10551-014-2163-2

Du S, Cao J (2023) Non-family shareholder governance and green innovation of family firms: a socio-emotional wealth theory perspective. Int Rev Financ Anal 90:102857. https://doi.org/10.1016/J.Irfa.2023.102857

Duggal N, He L, Shaw TS (2024) Mandatory corporate social responsibility spending family control and the cost of debt. Br Account Rev. https://doi.org/10.1016/J.Bar.2024.101356

Dyer WG, Whetten DA (2006) Family firms and social responsibility: preliminary evidence from the S&P 500. Entrep Theory Pract 30:785–802. https://doi.org/10.1111/J.1540-6520.2006.00151.X

Eddleston KA, Mulki JP (2021) Differences in family-owned SMEs’ ethical behavior: a mixed gamble perspective of family firm tax evasion. Entrep Theory Pract 45:767–791. https://doi.org/10.1177/1042258720964187

El Ghoul S, Guedhami O, Wang H, Kwok CC (2016) Family control and corporate social responsibility. J Bank Finance 73:131–146. https://doi.org/10.1016/J.Jbankfin.2016.08.008

Ernst R-A, Gerken M, Hack A, Hülsbeck M (2022) Family firms as agents of sustainable development: a normative perspective. Technol Forecast Soc Chang 174:121135. https://doi.org/10.1016/J.Techfore.2021.121135

Fassin Y, Van Rossem A, Buelens M (2011) Small-business owner-managers’ perceptions of business ethics and CSR-related concepts. J Bus Ethics 98:425–453. https://doi.org/10.1007/S10551-010-0586-Y

Feliu N, Botero IC (2016) Philanthropy in family enterprises. Fam Bus Rev 29:121–141. https://doi.org/10.1177/0894486515610962

Ferreira JJ, Fernandes CI, Schiavone F, Mahto RV (2021) Sustainability in family business—a bibliometric study and a research agenda. Technol Forecast Soc Chang 173:121077. https://doi.org/10.1016/J.Techfore.2021.121077

Fitzgerald MA, Haynes GW, Schrank HL, Danes SM (2010) Socially responsible processes of small family business owners: exploratory evidence from the national family business survey. J Small Bus Manage 48:524–551. https://doi.org/10.1111/J.1540-627x.2010.00307.X

Friede G, Busch T, Bassen A (2015) ESG and financial performance: aggregated evidence from more than 2000 empirical studies. J Sustain Finance Investment 5:210–233. https://doi.org/10.1080/20430795.2015.1118917

Fries A, Kammerlander N, Leitterstorf M (2021) Leadership styles and leadership behaviors in family firms: a systematic literature review. J Fam Bus Strat 12:100374. https://doi.org/10.1016/J.Jfbs.2020.100374

Fritz MMC, Ruel S, Kallmuenzer A, Harms R (2021) Sustainability management in supply chains: the role of familiness. Technol Forecast Soc Chang 173:121078

García-Sánchez I-M, Martín-Moreno J, Khan SA, Hussain N (2021) Socio-emotional wealth and corporate responses to environmental hostility: are family firms more stakeholder oriented? Bus Strat Environ 30:1003–1018. https://doi.org/10.1002/Bse.2666

Gedajlovic E, Carney M, Chrisman JJ, Kellermanns FW (2012) The adolescence of family firm research: taking stock and planning for the future. J Manag 38:1010–1037. https://doi.org/10.2139/Ssrn.1837578

Gillan SL, Koch A, Starks LT (2021) Firms and social responsibility: a review of ESG and CSR research in corporate finance. J Corp Finan 66:101889. https://doi.org/10.1016/J.Jcorpfin.2021.101889

Gjergji R, Vena L, Sciascia S, Cortesi A (2021) The effects of environmental, social and governance disclosure on the cost of capital in small and medium enterprises: the role of family business status. Bus Strat Environ 30:683–693. https://doi.org/10.1002/Bse.2647

Gómez-Mejía LR, Haynes KT, Núñez-Nickel M, Jacobson KJL, Moyano-Fuentes J (2007) Socioemotional wealth and business risks in family-controlled firms: evidence from Spanish olive oil mills. Adm Sci Q 52:106–137. https://doi.org/10.2189/Asqu.52.1.106

Gómez-Mejía LR, Cruz C, Berrone P, De Castro J (2011) The bind that ties: socioemotional wealth preservation in family firms. Acad Manag Ann 5:653–707. https://doi.org/10.5465/19416520.2011.593320

Graafland J, Van De Ven B, Stoffele N (2003) Strategies and instruments for organising CSR by small and large businesses in the Netherlands. J Bus Ethics 47:45–60. https://doi.org/10.1023/A:1026240912016

Haddoud MY, Onjewu AK, Nowiński W (2021) Environmental commitment and innovation as catalysts for export performance in family firms. Technol Forecast Soc Chang 173:121085. https://doi.org/10.1016/J.Techfore.2021.121085

Hambrick DC, Mason PA (1984) Upper echelons: the organization as a reflection of its top managers. Acad Manage Rev 9:193–206. https://doi.org/10.5465/Amr.1984.4277628

Herrero I, López C, Ruiz-Benítez R (2024) So...are family firms more sustainable? On the economic, social and environmental sustainability of family SMEs. Bus Strat Environ 33:4252–4270. https://doi.org/10.1002/Bse.3699

Holzner B, Wagner M (2022) Linking levels of green innovation with profitability under environmental uncertainty: an empirical study. J Clean Prod 378:134438. https://doi.org/10.1016/J.Jclepro.2022.134438

Horbach J, Prokop V, Stejskal J (2023) Determinants of firms’ greenness towards sustainable development: a multi-country analysis. Bus Strat Environ 32:2868–2881. https://doi.org/10.1002/Bse.3275

Hsueh JW-J (2018) Governance structure and the credibility gap: experimental evidence on family businesses’ sustainability reporting. J Bus Ethics 153:547–568. https://doi.org/10.1007/S10551-016-3409-Y

Hsueh JW-J, De Massis A, Gomez-Mejia L (2023a) Examining heterogeneous configurations of socioemotional wealth in family firms through the formalization of corporate social responsibility strategy. Fam Bus Rev 36:172–198. https://doi.org/10.1177/08944865221146350

Hsueh JW-J, Campopiano G, Tetzlaff E, Jaskiewicz P (2023b) Managing non-family employees’ emotional connection with the family firms via shifting, compensating and leveraging approaches. Long Range Plan 56:102274. https://doi.org/10.1016/J.Lrp.2022.102274

Hu Q, Hughes P, Hughes M, Chapman G, He X (2023) When is R&D beneficial for family firms? The concurrent roles of CSR and economic conditions. R&D Manage 53:524–542. https://doi.org/10.1111/Radm.12580

Hubbard G (2009) Measuring organizational performance: beyond the triple bottom line. Bus Strat Environ 18:177–191. https://doi.org/10.1002/Bse.564

Jiang S, Min Y (2023) The ability and willingness of family firms to bribe: a socioemotional wealth perspective. J Bus Ethics 184:237–254. https://doi.org/10.1007/S10551-022-05086-1

Jiang F, Jiang P, Zheng X (2023) An axe to grind: family outsiders and firms doing good. Corp Gov an Int Rev 31:921–944. https://doi.org/10.1111/Corg.12509

Kallmuenzer A, Nikolakis W, Peters M, Zanon J (2018) Trade-offs between dimensions of sustainability: exploratory evidence from family firms in rural tourism regions. J Sustain Tour 26:1204–1221. https://doi.org/10.1080/09669582.2017.1374962

Kang J-K, Kim J (2020) Do family firms invest more than nonfamily firms in employee-friendly policies? Manage Sci 66:1300–1324. https://doi.org/10.1287/Mnsc.2018.3231

Kang J-S, Chiang C-F, Huangthanapan K, Downing S (2015) Corporate social responsibility and sustainability balanced scorecard: the case study of family-owned hotels. Int J Hosp Manag 48:124–134. https://doi.org/10.1016/J.Ijhm.2015.05.001

Kaplan SN, Klebanov MM, Sorensen M (2012) Which CEO characteristics and abilities matter? J Financ 67:973–1007. https://doi.org/10.1111/J.1540-6261.2012.01739.X

Kast FE, Rosenzweig JE (1992) System concepts: pervasiveness and potential. Mir Manage Int Rev 32:40–49

Katz D, Kahn R (2015) The social psychology of organizations. In: Organizational Behavior 2. Routledge, pp 152–168. https://doi.org/10.4324/9781315702001-12

Kiesel F, Lücke F (2019) ESG in credit ratings and the impact on financial markets. Financ Mark Inst Instrum 28:263–290. https://doi.org/10.1111/Fmii.12114

Kim J, Fairclough S, Dibrell C (2017) Attention, action, and greenwash in family-influenced firms? Evidence from polluting industries. Organ Environ 30:304–323. https://doi.org/10.1177/1086026616673410

Kim K, Haider ZA, Wu Z, Dou J (2020) Corporate social performance of family firms: a place-based perspective in the context of layoffs. J Bus Ethics 167:235–252. https://doi.org/10.1007/S10551-019-04152-5

La Porta R, Lopez-De-Silanes F, Shleifer A (1999) Corporate ownership around the world. J Financ 54:471–517. https://doi.org/10.1111/0022-1082.00115

Labaki R, Bernhard F, Cailluet L (2019) The strategic use of historical narratives in the family business. In: Memili E, Dibrell C (eds) The Palgrave handbook of heterogeneity among family firms. Palgrave Macmillan, Cham, Switzerland, pp 531–553. https://doi.org/10.1007/978-3-319-77676-7_20

Labelle R, Hafsi T, Francoeur C, Ben Amar W (2018) Family firms’ corporate social performance: a calculated quest for socioemotional wealth. J Bus Ethics 148:511–525. https://doi.org/10.1007/S10551-015-2982-9

Lamb NH, Butler FC (2018) The influence of family firms and institutional owners on corporate social responsibility performance. Bus Soc 57:1374–1406. https://doi.org/10.1177/0007650316648443

Lartey T, Yirenkyi DO, Adomako S, Danso A, Amankwah-Amoah J, Alam A (2020) Going green, going clean: lean-green sustainability strategy and firm growth. Bus Strat Environ 29:118–139. https://doi.org/10.1002/Bse.2353

Le Breton-Miller I, Miller D (2016) Family firms and practices of sustainability: a contingency view. J Fam Bus Strat 7:26–33. https://doi.org/10.1016/J.Jfbs.2015.09.001

Leonidou LC, Eteokleous PP, Christodoulides P, Strømfeldt Eduardsen J (2023) A dynamic capabilities perspective to socially responsible family business: implications on social-based advantage and market performance. J Bus Res 155:113390. https://doi.org/10.1016/J.Jbusres.2022.113390

Lewis KV, Cassells S, Roxas H (2015) SMEs and the potential for a collaborative path to environmental responsibility. Bus Strat Environ 24:750–764. https://doi.org/10.1002/Bse.1843

Li T-T, Wang K, Sueyoshi T, Wang DD (2021) ESG: research progress and future prospects. Sustainability 13:11663. https://doi.org/10.3390/Su132111663

Liu M, Shi Y, Wilson C, Wu Z (2017) Does family involvement explain why corporate social responsibility affects earnings management? J Bus Res 75:8–16. https://doi.org/10.1016/J.Jbusres.2017.02.001

Lorenzen S, Gerken M, Steinmetz H, Block J, Hülsbeck M, Lux FS (2024) Environmental sustainability of family firms: a meta-analysis of handprint and footprint. Entrep Theory Pract. https://doi.org/10.1177/10422587231221799

Luo Y, Kong D, Cui H (2024) Top managers’ rice culture and corporate social responsibility performance. J Bus Ethics. https://doi.org/10.1007/S10551-024-05627-W

Manner MH (2010) The impact of CEO characteristics on corporate social performance. J Bus Ethics 93:53–72. https://doi.org/10.1007/S10551-010-0626-7

Mariani MM, Al-Sultan K, De Massis A (2021) Corporate social responsibility in family firms: a systematic literature review. J Small Bus Manage 61:1192–1246. https://doi.org/10.1080/00472778.2021.1955122

Marques P, Presas P, Simon A (2014) The heterogeneity of family firms in CSR engagement. Fam Bus Rev 27:206–227. https://doi.org/10.1177/0894486514539004

Maung M, Miller D, Tang Z, Xu X (2020) Value-enhancing social responsibility: market reaction to donations by family vs. non-family firms with religious CEOs. J Bus Ethics 163:745–758. https://doi.org/10.1007/S10551-019-04381-8

McGuire J, Dow S, Ibrahim B (2012) All in the family? Social performance and corporate governance in the family firm. J Bus Res 65:1643–1650. https://doi.org/10.1016/J.Jbusres.2011.10.024

Meier O, Schier G (2021) CSR and family CEO: the moderating role of CEO’s age. J Bus Ethics 174:595–612. https://doi.org/10.1007/S10551-020-04624-Z

Memili E, Fang HC, Koç B, Yildirim-Öktem Ö, Sonmez S (2018) Sustainability practices of family firms: the interplay between family ownership and long-term orientation. J Sustain Tour 26:9–28. https://doi.org/10.1080/09669582.2017.1308371

Miller D, Le Breton-Miller I (2014) Deconstructing socioemotional wealth. Entrep Theory Pract 38:713–720. https://doi.org/10.1111/Etap.12111

Miroshnychenko I, De Massis A (2022) Sustainability practices of family and nonfamily firms: a worldwide study. Technol Forecast Soc Chang 174:121079. https://doi.org/10.1016/J.Techfore.2021.121079

Miroshnychenko I, De Massis A, Barontini R, Testa F (2022) Family firms and environmental performance: a meta-analytic review. Fam Bus Rev 35:68–90. https://doi.org/10.1177/08944865211064409

Mueller EF, Flickinger M (2021) It’s a family affair: how social identification influences family CEO compensation. Corp Gov an Int Rev 29:461–478

Muttakin MB, Khan A, Mihret DG (2018) The effect of board capital and CEO power on corporate social responsibility disclosures. J Bus Ethics 150:41–56. https://doi.org/10.1007/S10551-016-3105-Y

Nadeem M, Gyapong E, Ahmed A (2020) Board gender diversity and environmental, social, and economic value creation: does family ownership matter? Bus Strat Environ 29:1268–1284. https://doi.org/10.1002/Bse.2432

Nederhof AJ (1985) Methods of coping with social desirability bias: a review. Eur J Soc Psychol 15:263–280. https://doi.org/10.1002/Ejsp.2420150303

Nekhili M, Nagati H, Chtioui T, Rebolledo C (2017) Corporate social responsibility disclosure and market value: family versus nonfamily firms. J Bus Res 77:41–52. https://doi.org/10.1016/J.Jbusres.2017.04.001

Niehm LS, Swinney J, Miller NJ (2008) Community social responsibility and its consequences for family business performance. J Small Bus Manage 46:331–350. https://doi.org/10.1111/J.1540-627x.2008.00247.X

Nikolakis W, Olaru D, Kallmuenzer A (2022) What motivates environmental and social sustainability in family firms? A cross-cultural survey. Bus Strat Environ 31:2351–2364. https://doi.org/10.1002/Bse.3025

O’Boyle EH, Rutherford MW, Pollack JM (2010) Examining the relation between ethical focus and financial performance in family firms: an exploratory study. Family Bus Rev 23(4):310–26

Oh W-Y, Ree H, Chang YK, Postuła I (2023) Trees in the forest: how do family owners make CSR decisions in business groups? J Bus Ethics 187:759–780. https://doi.org/10.1007/S10551-022-05270-3

Panwar R, Paul K, Nybakk E, Hansen E, Thompson D (2014) The legitimacy of CSR actions of publicly traded companies versus family-owned companies. J Bus Ethics 125:481–496. https://doi.org/10.1007/S10551-013-1933-6

Paul J, Criado AR (2020) The art of writing literature review: what do we know and what do we need to know? Int Bus Rev 29:101717. https://doi.org/10.1016/J.Ibusrev.2020.101717

Peake WO, Cooper D, Fitzgerald MA, Muske G (2017) Family business participation in community social responsibility: the moderating effect of gender. J Bus Ethics 142:325–343. https://doi.org/10.1007/S10551-015-2716-Z

Pieper TM, Klein SB (2007) The bulleye: a systems approach to modeling family firms. Fam Bus Rev 20:301–319. https://doi.org/10.1111/J.1741-6248.2007.00101.X

Piyasinchai N, Thananusak T, Hughes M (2024) Effects of family ownership and professionalization on firms’ financial performance and sustainability reputation. Entrep Theory Pract 48:856–880. https://doi.org/10.1177/10422587231206573

Prügl R (2019) Capturing the heterogeneity of family firms: reviewing scales to directly measure socioemotional wealth. In: Memili E, Dibrell C (eds) The Palgrave handbook of heterogeneity among family firms. Palgrave Macmillan, Cham, Switzerland, pp 461–484

Pütz L, Schell S, Werner A (2023) Openness to knowledge: does corporate social responsibility mediate the relationship between familiness and absorptive capacity? Small Bus Econ 60:1449–1482. https://doi.org/10.1007/S11187-022-00671-0

Randerson K (2022) Conceptualizing family business social responsibility. Technol Forecast Soc Chang 174:121225. https://doi.org/10.1016/J.Techfore.2021.121225

Rees W, Rodionova T (2015) The influence of family ownership on corporate social responsibility: an international analysis of publicly listed companies. Corp Gov an Int Rev 23:184–202. https://doi.org/10.1111/Corg.12086

Richards M, Zellweger T, Gond J-P (2017) Maintaining moral legitimacy through worlds and words: an explanation of firms’ investment in sustainability certification. J Manage Stud 54:676–710. https://doi.org/10.1111/Joms.12249

Rivera-Franco P, Requejo I, Suárez-González I (2024) Internal versus external CSR practices: the trade-off in family firms. Eur Manag Rev. https://doi.org/10.1111/Emre.12662

Rothausen TJ (1999) ‘Family’ in organizational research: a review and comparison of definitions and measures. J Organiz Behav 20:817–836

Russo A, Perrini F (2010) Investigating stakeholder theory and social capital: CSR in large firms and SMEs. J Bus Ethics 91:207–221. https://doi.org/10.1007/S10551-009-0079-Z

Saeed A, Riaz H, Liedong TA, Rajwani T (2023) Does family matter? Ownership, motives and firms’ environmental strategy. Long Range Plan 56:102216. https://doi.org/10.1016/J.Lrp.2022.102216

Sahasranamam S, Arya B, Sud M (2020) Ownership structure and corporate social responsibility in an emerging market. Asia Pac J Manag 37:1165–1192. https://doi.org/10.1007/S10490-019-09649-1

Sekerci N, Jaballah J, Van Essen M, Kammerlander N (2022) Investors’ reactions to CSR news in family versus nonfamily firms: a study on signal (in)credibility. Entrep Theory Pract 46:82–116. https://doi.org/10.1177/10422587211010498

Sharma P, Sharma S (2011) Drivers of proactive environmental strategy in family firms. Bus Ethics Q 21:309–334. https://doi.org/10.5840/Beq201121218

Singal M (2014) Corporate social responsibility in the hospitality and tourism industry: do family control and financial condition matter? Int J Hosp Manag 36:81–89. https://doi.org/10.1016/J.Ijhm.2013.08.002

Singal M, Gerde VW (2015) Is diversity management related to financial performance in family firms? Fam Bus Rev 28:243–259. https://doi.org/10.1177/0894486514566012

Singh G, Sharma S, Sharma R, Dwivedi YK (2021) Investigating environmental sustainability in small family-owned businesses: integration of religiosity, ethical judgment, and theory of planned behavior. Technol Forecast Soc Chang 173:121094. https://doi.org/10.1016/J.Techfore.2021.121094

Sirmon DG, Hitt MA (2003) Managing resources: linking unique resources, management, and wealth creation in family firms. Entrep Theory Pract 27:339–358. https://doi.org/10.1111/1540-8520.T01-1-00013

Smulowitz SJ, Cossin D, Massis DE, A, Lu H, (2023) Wrongdoing in publicly listed family- and nonfamily-owned frms: a behavioral perspective. Entre Theory Pract 47:1233–1264. https://doi.org/10.1177/10422587221142230

Stavrou E, Kassinis G, Filotheou A (2007) Downsizing and stakeholder orientation among the Fortune 500: does family ownership matter? J Bus Ethics 72:149–162. https://doi.org/10.1007/S10551-006-9162-X

Temouri Y, Nardella G, Jones C, Brammer S (2022) Haven-sent? Tax havens, corporate social irresponsibility and the dark side of family firm internationalization. Brit J Manage 33:1447–1467. https://doi.org/10.1111/1467-8551.12559

Terlaak A, Kim S, Roh T (2018) Not good, not bad: the effect of family control on environmental performance disclosure by business group firms. J Bus Ethics 153:977–996. https://doi.org/10.1007/S10551-018-3911-5

Tewari S, Bhattacharya B (2023) Financial resources, corporate social responsibility, and ownership type: evidence from India. Asia Pac J Manag 40:1093–1132. https://doi.org/10.1007/S10490-022-09810-3

Thompson JD (1967) Organizations in action. Mcgraw-Hill, New York. https://doi.org/10.4324/9781315125930

Uhlaner LM, Berent-Braun MM, Jeurissen RJM, De WG (2012) Beyond size: predicting engagement in environmental management practices of Dutch SMEs. J Bus Ethics 109:411–429. https://doi.org/10.1007/S10551-011-1137-X

Vu MC, Discua Cruz A, Burton N (2024) Contributing to the sustainable development goals as normative and instrumental acts: the role of Buddhist religious logics in family SMEs. Int Small Bus J 42:246–275. https://doi.org/10.1177/02662426231182425

Wagner M (2010) Corporate social performance and innovation with high social benefits: a quantitative analysis. J Bus Ethics 94:581–594. https://doi.org/10.1007/S10551-009-0339-Y

Waldkirch M (2020) Non-family CEOs in family firms: spotting gaps and challenging assumptions for a future research agenda. J Fam Bus Strat 11:100305. https://doi.org/10.1016/J.Jfbs.2019.100305

Wang G, Holmes RM, Oh IS, Zhu W (2016) Do CEOs matter to firm strategic actions and firm performance? A meta-analytic investigation based on Upper Echelons Theory. Pers Psychol 69:775–862. https://doi.org/10.1111/Peps.12140

Webster J, Watson RT (2002) Analyzing the past to prepare for the future: writing a literature review. MIS Q 26:13–23

Widyawati L (2020) A systematic literature review of socially responsible investment and environmental social governance metrics. Bus Strat Environ 29:619–637. https://doi.org/10.1002/Bse.2393

Williams RI, Pieper TM, Astrachan JH (2019) Private family business goals: a concise review, goal relationships, and goal formation processes. In: Memili E, Dibrell C (eds) The Palgrave handbook of heterogeneity among family firms. Palgrave Macmillan, Cham, Switzerland, pp 377–405

Wolfswinkel JF, Furtmueller E, Wilderom CPM (2013) Using grounded theory as a method for rigorously reviewing literature. Eur J Inf Syst 22:45–55. https://doi.org/10.1057/Ejis.2011.51

Wright C, Nyberg D (2017) An Inconvenient truth: how organizations translate climate change into business as usual. Amj 60:1633–1661. https://doi.org/10.5465/Amj.2015.0718

Wu B, Monfort A, Jin C, Shen X (2022) Substantial response or impression management? Compliance strategies for sustainable development responsibility in family firms. Technol Forecast Soc Chang 174:121214. https://doi.org/10.1016/J.Techfore.2021.121214

Wu B, Gu Q, Liu Z, Liu J (2023) Clustered institutional investors, shared ESG preferences and low-carbon innovation in family firm. Technol Forecast Soc Chang 194:122676. https://doi.org/10.1016/J.Techfore.2023.122676

Wu B, Chen F, Li L, Xu L, Liu Z, Wu Y (2024) Institutional investor ESG activism and exploratory green innovation: unpacking the heterogeneous responses of family firms across intergenerational contexts. British Account Rev. https://doi.org/10.1016/J.Bar.2024.101324

Yáñez-Araque B, Sánchez-Infante Hernández JP, Gutiérrez-Broncano S, Jiménez-Estévez P (2021) Corporate social responsibility in micro-, small- and medium-sized enterprises: multigroup analysis of family vs. nonfamily firms. J Bus Res 124:581–592. https://doi.org/10.1016/J.Jbusres.2020.10.023

Yeon J, Lin MS, Lee S, Sharma A (2021) Does family matter? The moderating role of family involvement on the relationship between CSR and firm performance. Ijchm 33:3729–3751. https://doi.org/10.1108/Ijchm-03-2021-0315

Yu B, Zeng S, Chen H, Meng X, Tam C (2021) Doing more and doing better are two different entities: different patterns of family control and environmental performance. Bus Strate Environ 30:1–20. https://doi.org/10.1002/Bse.2605

Zahra SA, Sharma P (2004) Family business research: a strategic reflection. Fam Bus Rev 17:331–346. https://doi.org/10.1111/J.1741-6248.2004.00022.X

Zamir F, Saeed A (2020) Location matters: impact of geographical proximity to financial centers on corporate social responsibility (CSR) disclosure in emerging economies. Asia Pac J Manag 37:263–295

Zientara P (2017) Socioemotional wealth and corporate social responsibility: a critical analysis. J Bus Ethics 144:185–199

Download references

Open Access funding enabled and organized by Projekt DEAL. The author has no other relevant financial or non-financial interests to disclose.

Author information

Authors and affiliations.

Research Group of Strategic and International Management, Philipps-Universität Marburg, Marburg (Hesse), Germany

Ramona Waldau

Corresponding author

Correspondence to Ramona Waldau.

Ethics declarations

Conflict of interest.

The author declares that there is no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Waldau, R. A systematic literature review on determinants and outcomes of ESG performance in family firms. Manag Rev Q (2024). https://doi.org/10.1007/s11301-024-00462-9

Received: 24 April 2023

Accepted: 09 August 2024

Published: 09 September 2024

DOI: https://doi.org/10.1007/s11301-024-00462-9

Keywords

  • Family firms
  • Family ownership
  • Literature review
  • Determinants and outcomes


Open Access

Peer-reviewed

Research Article

SARS-CoV-2 impairs male fertility by targeting semen quality and testosterone level: A systematic review and meta-analysis

Roles Investigation, Project administration, Writing – original draft, Writing – review & editing

Affiliations Medical Faculty, Department of Cardiovascular Surgery and Research Group for Experimental Surgery, Cardiovascular Regenerative Medicine and Tissue Engineering 3D Lab, Heinrich Heine University, Düsseldorf, Germany, Reproductive Biology and Toxicology Research Laboratory, Oasis of Grace Hospital, Osogbo, Nigeria

Affiliations Reproductive Biology and Toxicology Research Laboratory, Oasis of Grace Hospital, Osogbo, Nigeria, Department of Physiology, Ladoke Akintola University of Technology, Ogbomosho, Oyo State, Nigeria

Roles Formal analysis, Investigation, Methodology, Project administration, Supervision, Writing – review & editing

Affiliations Reproductive Biology and Toxicology Research Laboratory, Oasis of Grace Hospital, Osogbo, Nigeria, Department of Agronomy, Breeding and Genetic Unit, Osun State University, Osun State, Nigeria

Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing

* E-mail: [email protected] , [email protected]

  • Ashonibare V. J.,
  • Ashonibare P. J.,
  • Akhigbe T. M.,
  • Akhigbe R. E.

  • Published: September 9, 2024
  • https://doi.org/10.1371/journal.pone.0307396

Abstract

Background

Since the discovery of COVID-19 in December 2019, the novel virus has spread globally, causing a significant medical and socio-economic burden. Although the pandemic has been curtailed, the virus and its attendant complications live on. A major global concern is its adverse impact on male fertility.

Objective

This study aimed to provide up-to-date and robust data on the effect of COVID-19 on semen variables and male reproductive hormones.

Materials and methods

A literature search was performed according to the PRISMA recommendations. Of the 852 studies collected, only 40 were eligible for inclusion in assessing the effect of SARS-CoV-2 on semen quality and androgens. In addition, a SWOT analysis was conducted.

Results

The present study demonstrated that SARS-CoV-2 significantly reduced ejaculate volume, sperm count, concentration, viability, normal morphology, and total and progressive motility. Furthermore, SARS-CoV-2 led to a reduction in circulating testosterone level, but a rise in oestrogen, prolactin, and luteinizing hormone levels. These findings were associated with a decline in the testosterone/luteinizing hormone ratio.

Conclusions

The current study provides compelling evidence that SARS-CoV-2 may lower male fertility by reducing semen quality through a hormone-dependent mechanism: a reduction in testosterone level and an increase in oestrogen and prolactin levels.

Citation: Ashonibare VJ, Ashonibare PJ, Akhigbe TM, Akhigbe RE (2024) SARS-CoV-2 impairs male fertility by targeting semen quality and testosterone level: A systematic review and meta-analysis. PLoS ONE 19(9): e0307396. https://doi.org/10.1371/journal.pone.0307396

Editor: Stefan Schlatt, University Hospital of Münster, GERMANY

Received: April 26, 2024; Accepted: July 4, 2024; Published: September 9, 2024

Copyright: © 2024 Ashonibare et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data are in the paper and/or Supporting Information files.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative organism of coronavirus disease 2019 (COVID-19), has remained a global concern since its outbreak [1–3]. SARS-CoV-2 is an enveloped β-coronavirus that is genetically similar to SARS-CoV-1 (80%) and to bat coronavirus RaTG13 (96.2%) [4]. The S protein contains the S1 sub-unit, which carries the receptor binding domain that tethers to angiotensin-converting enzyme 2 (ACE2) [5, 6] and facilitates binding to and entry into host cells [4, 6]. Though quite similar, SARS-CoV-2 spreads more rapidly than SARS-CoV-1, as it has a higher net reproductive rate. Additionally, SARS-CoV-2 binds more strongly to its host cell receptor and invades the host more effectively because of its slight structural difference from SARS-CoV-1 [7, 8]. ACE2 is also the primary host receptor of SARS-CoV [4]. It is abundantly present in the epithelial tissue of the lung and small intestine, as well as in the heart, kidneys, and testes in humans [9–19], and may provide an entry portal for SARS-CoV [20].

As of May 2023, over 766 million COVID-19 cases and about 7 million deaths had been reported [9]. Studies have revealed that COVID-19 mainly affects the respiratory system in both males and females [4, 8]. Studies have also demonstrated that the virus damages multiple organs, including the kidney, heart, liver, brain [10, 12], and testes [2, 4, 6, 8, 13]. In addition, there is evidence that SARS-CoV-1 exerts a more severe impact on males than on females [6, 14–17]. Orchitis has also been reported in males recovering from the SARS virus [3, 18]. Despite this, findings on the adverse effect of this deadly virus on the male reproductive system are limited and contentious. In a systematic review and meta-analysis by Corona et al. [21], SARS-CoV-2 infection was linked with low semen quality and serum testosterone level. This agrees with an earlier systematic review and meta-analysis by Tiwari et al. [22]. The study, however, had some limitations. First, the random-effects model was used irrespective of the level of heterogeneity, which might affect the findings of the meta-analysis. Second, no sensitivity analyses were performed to rule out the influence of heterogeneity. Finally, the authors did not apply the findings of their quality appraisal of the included studies to their analysis.

Therefore, this study aims to provide a comprehensive meta-analysis of the consequences of COVID-19 for male fertility. This review gives an insight into how COVID-19 affects semen quality and male reproductive hormones to modulate male fertility. As far as we are aware, this research is the first to evaluate the impact of COVID-19 using three comparisons: infected versus non-infected subjects, infected patients before versus after treatment, and the infected versus the pre-COVID state in the same patients. Hence, the present study provides a robust review and analysis of the influence of SARS-CoV-2 on male fertility.

Protocol and eligibility criteria for inclusion

This study was registered on PROSPERO (CRD42024533906). It was conducted on published works that evaluated the influence of SARS-CoV-2 on male fertility. The study adopted the “Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA)” strategy, which is provided as Fig 1.

Fig 1. https://doi.org/10.1371/journal.pone.0307396.g001

This study adopted the Population, Exposure, Comparator/Comparison, and Outcomes (PECO) model. All studies published until October 2023 that were eligible based on the set criteria were collected. The studied populations were males of reproductive age who had been exposed to SARS-CoV-2 and developed COVID-19. The studies were either retrospective or prospective studies of COVID-19-infected patients with age-matched controls who were COVID-19-negative. Where there was no COVID-19-negative control group, studies had to present outcomes before and after the treatment of COVID-19, or in the pre-COVID and COVID-19-infected states. The outcomes measured were conventional semen parameters, viz. ejaculate volume, sperm count, concentration, viability, normal morphology, total and progressive motility, and seminal fluid leukocyte level, and male reproductive hormones, namely testosterone (T), oestrogen, prolactin, follicle-stimulating hormone (FSH), and luteinizing hormone (LH) levels. The T/LH and FSH/LH ratios were also measured.

Exclusion criteria included the absence of a comparator as control, studies in females, in vitro studies, commentaries, review articles, letters to the editor, editorials, preprints, conference abstracts, retracted papers, and degree theses. No language or country restriction was applied.

Search strategy

A structured search of the EMBASE, PubMed/MEDLINE, Scopus, and Web of Science databases was performed. The keywords combined were “COVID”, “COVID 19”, “coronavirus”, “SARS-CoV-2”, “semen”, “semen analysis”, “seminal fluid”, “sperm”, “sperm cells”, “spermatozoa”, “sperm parameter”, “sperm variable”, “sperm count”, “sperm concentration”, “sperm viability”, “sperm vitality”, “sperm motility”, “total sperm motility”, “progressive sperm motility”, “sperm morphology”, “semen volume”, “ejaculate volume”, “seminal leukocyte”, “seminal WBC”, “luteinizing hormone”, “LH”, “follicle stimulating hormone”, “FSH”, “testosterone”, “male fertility”, “male infertility”, and “male reproduction”. Abstracts and full texts of the articles collected were independently evaluated for eligibility by AVJ, APJ, and ATM, and differences of opinion were resolved by ARE.

Data collection, assessment of quality of eligible studies, and meta-analysis

The eligible studies were appraised for quality, and data were collected, by AVJ, APJ, and ATM. Disputes were resolved by ARE. Data gathered from the eligible studies included the last name of the principal investigator, publication date, country of study origin, study design, method of COVID-19 diagnosis, sample size and ages of patients, duration of infection, and measured outcomes of interest. The outcomes of interest were extracted as mean and standard deviation. When the variables were presented in other forms, the mean and standard deviation were derived from the provided data. Where the outcomes were reported in figures, they were converted to values using WebPlotDigitizer.
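The paper does not say which conversions were used to derive standard deviations from other reported forms. As an illustration only, the sketch below shows two standard conversions, from a standard error and from a 95% confidence interval of a mean. The function names are hypothetical, and the confidence-interval version assumes a large-sample normal approximation.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Standard deviation from the standard error of a mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci95(lower: float, upper: float, n: int) -> float:
    """Standard deviation from a 95% CI of a mean.

    Assumes a normal approximation, so the CI spans 2 * 1.96 standard errors.
    """
    se = (upper - lower) / (2 * 1.96)
    return se * math.sqrt(n)


# Hypothetical inputs, not data from the included studies.
print(sd_from_se(0.5, 16))
print(sd_from_ci95(9.02, 10.98, 100))
```

For small samples, a t-distribution quantile would be more appropriate than 1.96.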

The quality of evidence in the eligible papers was evaluated using the ErasmusAGE quality score for systematic reviews, which assigns a score between 0 and 2 to each of five domains [23]. Furthermore, the Office of Health Assessment and Translation (OHAT) methodology was used to evaluate the risk of bias (RoB) [24]. Guided by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group standards, the OHAT approach for systematic review and evidence integration for literature-based health assessment was used to assess the certainty of the evidence [25, 26].

Review Manager (version 5.4.1) was used to conduct the quantitative meta-analyses. From the eligible studies, the standardized mean difference (SMD) with 95% confidence intervals (CIs) was calculated. A random-effects model was used when the heterogeneity P-value was < 0.1 or I² was > 50%, which indicates significant heterogeneity; otherwise, a fixed-effect model was used. To assess the possible sources of heterogeneity, sensitivity analysis was conducted by excluding the studies with the largest weight, high RoB (< 4), low quality of evidence (< 5), and low certainty of evidence. The generated funnel plots were also visually assessed for publication bias.
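The model-selection rule above can be sketched in a few lines. This is a minimal illustration, not the Review Manager implementation. It assumes a bias-corrected SMD (Hedges' g) with a common large-sample variance approximation, inverse-variance pooling, and the DerSimonian-Laird estimator for the between-study variance. All function names and the example inputs are hypothetical.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, variance) for group 1 versus group 2."""
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / s_pooled
    g = (1 - 3 / (4 * df - 1)) * d        # small-sample bias correction
    n = n1 + n2
    var = n / (n1 * n2) + g**2 / (2 * n)  # assumed large-sample approximation
    return g, var


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += h**a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def pool(effects, variances, p_cut=0.1, i2_cut=50.0):
    """Pool SMDs; use random effects when P < p_cut or I2 > i2_cut."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    p_het = chi2_sf(q, df)
    model = "fixed"
    if p_het < p_cut or i2 > i2_cut:
        model = "random"
        c = sw - sum(wi**2 for wi in w) / sw
        tau2 = max(0.0, (q - df) / c)     # DerSimonian-Laird estimate
        w = [1 / (v + tau2) for v in variances]
        sw = sum(w)
    est = sum(wi * y for wi, y in zip(w, effects)) / sw
    se = math.sqrt(1 / sw)
    return {"model": model, "smd": est,
            "ci": (est - 1.96 * se, est + 1.96 * se),
            "i2": i2, "p_het": p_het}


# Hypothetical study-level SMDs and variances, not data from the paper.
print(pool([-0.1, -0.9, -0.5], [0.01, 0.01, 0.01]))
```

A sensitivity analysis in this setting amounts to calling `pool` again after dropping the studies that meet the exclusion rules.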

The selection of studies and the attributes of the relevant studies

Of the 852 publications screened, only 50 were potentially eligible for evaluation. Finally, 40 studies [27–66] were deemed eligible for inclusion in this study (Fig 1). The eligible papers were published between 2020 and 2023, and they were from China (7), Germany (1), India (1), Indonesia (1), Iran (6), Iraq (2), Italy (5), Jordan (2), Russia (1), Turkey (12), UK (1), and USA (1). The data collected included the surname of the principal investigator, year of publication, country of study origin, study design, method of diagnosing COVID-19, studied population size, participants’/patients’ age range, duration of infection, and outcomes measured (Table 1).

Table 1. https://doi.org/10.1371/journal.pone.0307396.t001

Assessment of the quality of evidence, RoB, and certainty of evidence

Most of the studies had good quality of evidence, except 7 [27, 31, 40, 48, 50, 56, 64] that had low quality of evidence (< 5) (Table 2). The included studies had moderate (4/9 to 6/9) to low (> 6/9) RoB (Table 3). In addition, the certainty of evidence in the included studies was moderate to high, except in 3 studies [29, 48, 56] with low certainty of evidence (Table 4).

Table 2. https://doi.org/10.1371/journal.pone.0307396.t002

Table 3. https://doi.org/10.1371/journal.pone.0307396.t003

Table 4. https://doi.org/10.1371/journal.pone.0307396.t004
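The cut-offs quoted in the text (quality of evidence < 5 is low; RoB score < 4 is high, 4 to 6 is moderate, > 6 is low, out of 9) can be written as a small classifier. This is an illustrative sketch: the function names are hypothetical, and it assumes the quality score is the sum of the five domain scores.

```python
def quality_category(domain_scores):
    """Classify an ErasmusAGE-style score built from five domains scored 0 to 2."""
    if len(domain_scores) != 5 or any(s not in (0, 1, 2) for s in domain_scores):
        raise ValueError("expected five domain scores, each 0, 1, or 2")
    total = sum(domain_scores)  # assumed to range from 0 to 10
    return "low" if total < 5 else "good"


def rob_category(score):
    """Classify risk of bias from a score out of 9 (a higher score means lower risk)."""
    if not 0 <= score <= 9:
        raise ValueError("expected a score between 0 and 9")
    if score < 4:
        return "high"
    if score <= 6:
        return "moderate"
    return "low"


# Hypothetical scores, not taken from Tables 2 or 3.
print(quality_category([2, 1, 1, 0, 0]), rob_category(7))
```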

Meta-analysis and sensitivity analysis

Ejaculate volume.

A meta-analysis of the 13 eligible studies that compared ejaculate volume in 591 COVID-positive patients with 722 COVID-negative individuals showed that SARS-CoV-2 significantly reduced the ejaculate volume of infected patients (SMD -0.38 [95% CI: -0.70, -0.05], P = 0.02), with marked inter-study heterogeneity (I² = 85%; χ² P < 0.00001). After sensitivity analysis, ejaculate volume was still significantly lower in SARS-CoV-2-infected patients than in SARS-CoV-2-negative individuals (SMD -0.42 [95% CI: -0.77, -0.07], P = 0.02), and the inter-study heterogeneity remained significant (I² = 85%; χ² P < 0.00001) (Fig 2A). Furthermore, the comparison of 286 COVID-positive patients before treatment with 300 patients after treatment revealed that ejaculate volume was significantly higher after treatment than before treatment (SMD -0.30 [95% CI: -0.46, -0.14], P = 0.0003), with no significant inter-study heterogeneity (I² = 36%; χ² P = 0.13). However, sensitivity analysis showed that ejaculate volume did not differ before and after COVID treatment (SMD -0.24 [95% CI: -0.59, 0.11], P = 0.19), with marginally significant inter-study heterogeneity (I² = 55%; χ² P = 0.05) (Fig 2B). In addition, SARS-CoV-2 infection significantly reduced the ejaculate volume of patients when compared with their pre-COVID values (SMD -0.28 [95% CI: -0.55, -0.01], P = 0.04), with significant inter-study heterogeneity (I² = 67%; χ² P = 0.004). This significant difference persisted after sensitivity analysis (SMD -0.29 [95% CI: -0.55, -0.03], P = 0.03), with no significant inter-study heterogeneity (I² = 35%; χ² P = 0.20) (Fig 2C). The funnel plots for publication bias are shown in Fig 3.
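A reported SMD, its 95% CI, and its P-value are linked through the standard error, so one can be roughly checked against the others. The sketch below assumes a normal (z) test and a symmetric CI. The function name is hypothetical. Because the published figures are rounded to two decimals, small differences from the reported P-values are expected.

```python
import math


def p_from_ci(smd, lower, upper, z_crit=1.96):
    """Two-sided P-value implied by an effect estimate and its 95% CI."""
    se = (upper - lower) / (2 * z_crit)  # CI half-width divided by 1.96
    z = smd / se
    return math.erfc(abs(z) / math.sqrt(2))


# First comparison in the text: SMD -0.38 [95% CI: -0.70, -0.05], reported P = 0.02
print(round(p_from_ci(-0.38, -0.70, -0.05), 2))
```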

Fig 2. Forest plot of ejaculate volume comparing COVID-19 positive and COVID-19 negative patients (A), before and after COVID-19 treatment (B), and the COVID-19 positive and pre-COVID-19 periods (C).

https://doi.org/10.1371/journal.pone.0307396.g002

Fig 3. Funnel plot of ejaculate volume comparing COVID-19 positive and COVID-19 negative patients (A), before and after COVID-19 treatment (B), and the COVID-19 positive and pre-COVID-19 periods (C).

https://doi.org/10.1371/journal.pone.0307396.g003

Sperm count.

SARS-CoV-2 infection significantly reduced sperm count compared with non-infected persons (SMD -0.74 [95% CI: -1.43, -0.06], P = 0.03), with marked heterogeneity between studies (I² = 95%; χ² P < 0.00001). After sensitivity analysis, however, SARS-CoV-2 infection led to only a marginal decline in sperm count (SMD -0.90 [95% CI: -1.91, 0.10], P = 0.08), again with marked heterogeneity between studies (I² = 96%; χ² P < 0.00001) (Fig 4A). COVID-19 treatment did not significantly improve sperm count when compared with the pre-treatment value (SMD -0.24 [95% CI: -0.66, 0.17], P = 0.24), with marked heterogeneity between studies (I² = 83%; χ² P < 0.00001). This lack of difference persisted after sensitivity analysis (SMD -0.20 [95% CI: -0.78, 0.38], P = 0.50), and the heterogeneity between studies remained marked (I² = 83%; χ² P < 0.00001) (Fig 4B). Nonetheless, SARS-CoV-2 infection significantly reduced sperm count when compared with the pre-COVID value of the patients (SMD -0.27 [95% CI: -0.45, -0.10], P = 0.002), and no substantial inter-study heterogeneity was found (I² = 37%; χ² P = 0.16) (Fig 4C). The funnel plots for publication bias are presented in Fig 5.

Fig 4. Forest plot of sperm count comparing COVID-19 positive and COVID-19 negative patients (A), before and after COVID-19 treatment (B), and the COVID-19 positive and pre-COVID-19 periods (C).

https://doi.org/10.1371/journal.pone.0307396.g004

Fig 5. Funnel plot of sperm count comparing COVID-19 positive and COVID-19 negative patients (A), before and after COVID-19 treatment (B), and the COVID-19 positive and pre-COVID-19 periods (C).

https://doi.org/10.1371/journal.pone.0307396.g005

Sperm concentration.

SARS-CoV-2 infection significantly reduced sperm concentration compared with SARS-CoV-2-uninfected individuals (SMD -0.83 [95% CI: -1.46, -0.20] P = 0.010), with marked heterogeneity between studies (I² = 95%; χ² P < 0.00001). After sensitivity analysis, the reduction was no longer statistically significant (SMD -1.02 [95% CI: -2.16, 0.12] P = 0.08), and heterogeneity remained marked (I² = 97%; χ² P < 0.00001) (Fig 6A). Sperm concentration did not differ significantly before and after SARS-CoV-2 treatment (SMD -0.21 [95% CI: -0.53, 0.10] P = 0.19), with significant heterogeneity between studies (I² = 69%; χ² P = 0.001). This held after sensitivity analysis (SMD -0.18 [95% CI: -0.59, 0.23] P = 0.39), and heterogeneity remained significant (I² = 67%; χ² P = 0.010) (Fig 6B). SARS-CoV-2 significantly reduced the patients' sperm concentration compared with the pre-COVID period (SMD -0.42 [95% CI: -0.70, -0.14] P = 0.004), with significant heterogeneity between studies (I² = 69%; χ² P = 0.002). After sensitivity analysis, the reduction relative to the patients' pre-COVID values remained significant (SMD -0.31 [95% CI: -0.50, -0.12] P = 0.001), with no significant heterogeneity between studies (I² = 32%; χ² P = 0.21) (Fig 6C). The funnel plots assessing publication bias are shown in Fig 7.
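The I² values quoted throughout these results express the share of between-study variation that is not attributable to chance. A minimal sketch of the usual calculation from Cochran's Q follows. The Q values in the example are hypothetical, since the source reports only I² and the χ² P value, and the function name `i_squared` is an assumption for illustration.

```python
def i_squared(q, df):
    """Higgins' I² (percent) from Cochran's Q and its degrees of freedom (studies - 1)."""
    if q <= 0:
        return 0.0
    # I² = (Q - df) / Q, truncated at zero when Q is smaller than its expectation
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical inputs, not taken from the source:
print(i_squared(q=20.0, df=1))  # Q far above df: most variation is between studies
print(i_squared(q=3.0, df=5))   # Q below df: I² is truncated to 0
```

Because I² is truncated at zero, a reported I² = 0% means only that Q did not exceed its degrees of freedom. It does not show that the studies are identical.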

Forest plot of sperm concentration comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g006

Funnel plot of sperm concentration comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g007

Sperm viability.

SARS-CoV-2 significantly lowered sperm viability compared with SARS-CoV-2-uninfected individuals (SMD -1.08 [95% CI: -1.83, -0.33] P = 0.005), with notable heterogeneity between studies (I² = 88%; χ² P < 0.00001). After sensitivity analysis, the reduction in sperm viability relative to the control remained significant (SMD -1.34 [95% CI: -1.95, -0.72] P < 0.0001), and heterogeneity remained substantial (I² = 73%; χ² P = 0.01) (Fig 8A). Moreover, sperm viability in SARS-CoV-2 positive individuals was significantly lower before treatment than after treatment (SMD -0.84 [95% CI: -1.37, -0.31] P = 0.002), with significant heterogeneity between studies (I² = 75%; χ² P = 0.003). After sensitivity analysis, this difference remained significant (SMD -0.53 [95% CI: -0.86, -0.20] P = 0.002), with no significant heterogeneity between studies (I² = 0%; χ² P = 0.53) (Fig 8B). In addition, sperm viability in SARS-CoV-2 positive patients was significantly reduced compared with their premorbid state (SMD -0.85 [95% CI: -1.43, -0.26] P = 0.005), with substantial heterogeneity between studies (I² = 82%; χ² P = 0.02) (Fig 8C). Fig 9 shows the funnel plots assessing publication bias.
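The pooled SMDs and the sensitivity analyses reported here can be illustrated with a small sketch. It assumes a DerSimonian-Laird random-effects model and a leave-one-out sensitivity analysis; the source does not confirm either choice in this section, so treat both as assumptions. The study effects and variances in the example are hypothetical, and the function names are illustrative only.

```python
import math


def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate, 95% CI and I² (percent)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                     # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2


def leave_one_out(effects, variances):
    """Re-pool after dropping each study in turn (one common form of sensitivity analysis)."""
    results = []
    for i in range(len(effects)):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results.append((i, *pool_random_effects(rest_e, rest_v)))
    return results


# Hypothetical SMDs and variances for four studies (not from the source)
smds = [-1.6, -0.5, -0.6, -0.4]
var = [0.05, 0.04, 0.06, 0.05]
print(pool_random_effects(smds, var))
for idx, pooled, ci, i2 in leave_one_out(smds, var):
    print(f"without study {idx}: SMD {pooled:.2f}, I2 {i2:.0f}%")
```

In data like these, dropping a single outlying study can lower I² sharply while leaving the direction of the pooled effect unchanged, which is the pattern reported for sperm viability before and after treatment.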

Forest plot of sperm viability comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g008

Funnel plot of sperm viability comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g009

Total and progressive sperm motility.

Total sperm motility was only marginally diminished in SARS-CoV-2 positive patients compared with the control (SMD -0.30 [95% CI: -0.61, 0.00] P = 0.05), with marked heterogeneity between studies (I² = 63%; χ² P = 0.008). After sensitivity analysis, the difference in total sperm motility remained non-significant (SMD -0.34 [95% CI: -0.86, 0.18] P = 0.20), with marked heterogeneity between studies (I² = 82%; χ² P < 0.0001) (Fig 10A). Likewise, total sperm motility in SARS-CoV-2 positive patients was not significantly lower before treatment than after treatment (SMD -0.34 [95% CI: -0.86, 0.18] P = 0.20), with marked heterogeneity between studies (I² = 82%; χ² P < 0.0001). This held after sensitivity analysis (SMD -0.54 [95% CI: -1.36, 0.28] P = 0.20), and heterogeneity remained marked (I² = 84%; χ² P = 0.0002) (Fig 10B). However, SARS-CoV-2 led to a significant decline in total sperm motility in infected patients compared with their premorbid values (SMD -0.68 [95% CI: -1.12, -0.24] P = 0.002), with marked heterogeneity between studies (I² = 87%; χ² P < 0.00001). After sensitivity analysis, the difference between the infected and premorbid states remained significant (SMD -0.73 [95% CI: -1.42, -0.04] P = 0.04), with significant heterogeneity between studies (I² = 90%; χ² P < 0.00001) (Fig 10C). The funnel plots assessing publication bias are presented in Fig 11.

Forest plot of total sperm motility comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g010

Funnel plot of total sperm motility comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g011

Compared with the controls, progressive sperm motility was significantly lower in SARS-CoV-2 positive patients (SMD -0.48 [95% CI: -0.94, -0.02] P = 0.04), with marked heterogeneity between studies (I² = 86%; χ² P < 0.00001). After sensitivity analysis, however, the decline relative to the control was no longer statistically significant (SMD -0.51 [95% CI: -1.09, 0.07] P = 0.08), and heterogeneity remained marked (I² = 89%; χ² P < 0.00001) (Fig 12A). In addition, progressive sperm motility in infected patients was significantly lower before treatment than after treatment (SMD -0.41 [95% CI: -0.77, -0.05] P = 0.02), with significant heterogeneity between studies (I² = 77%; χ² P < 0.0001). This difference remained significant after sensitivity analysis (SMD -0.53 [95% CI: -1.02, -0.05] P = 0.03), with marked heterogeneity between studies (I² = 74%; χ² P = 0.002) (Fig 12B). Furthermore, SARS-CoV-2 caused a significant decline in progressive sperm motility in infected cohorts compared with their premorbid state (SMD -0.49 [95% CI: -0.80, -0.19] P = 0.002), with significant heterogeneity between studies (I² = 65%; χ² P = 0.009). After sensitivity analysis, however, the decline was no longer significant (SMD -0.18 [95% CI: -0.56, 0.19] P = 0.34), and there was no significant heterogeneity between studies (I² = 0%; χ² P = 0.81) (Fig 12C). The funnel plots assessing publication bias are presented in Fig 13.

Forest plot of progressive sperm motility comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g012

Funnel plot of progressive sperm motility comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g013

Sperm morphology.

SARS-CoV-2 infection did not significantly alter normal sperm morphology compared with the COVID-19-negative controls (SMD -0.49 [95% CI: -1.33, 0.34] P = 0.25), with marked heterogeneity between studies (I² = 95%; χ² P < 0.00001). This held after sensitivity analysis (SMD -0.70 [95% CI: -1.83, 0.43] P = 0.23), and heterogeneity remained significant (I² = 96%; χ² P < 0.00001) (Fig 14A). Similarly, sperm morphology in infected patients did not differ significantly before and after treatment (SMD -0.19 [95% CI: -0.58, 0.21] P = 0.36), with marked heterogeneity between studies (I² = 84%; χ² P < 0.00001). Sensitivity analysis did not change this finding (SMD -0.25 [95% CI: -0.81, 0.31] P = 0.38), and heterogeneity remained marked (I² = 85%; χ² P < 0.00001) (Fig 14B). SARS-CoV-2 was associated with a non-significant decline in normal sperm morphology in infected cohorts compared with their premorbid states (SMD -0.83 [95% CI: -1.69, 0.03] P = 0.06), with marked heterogeneity between studies (I² = 92%; χ² P < 0.00001). After sensitivity analysis, however, the reduction in the proportion of sperm with normal morphology relative to the pre-COVID state became significant (SMD -0.65 [95% CI: -1.03, -0.26] P = 0.0010), with no marked heterogeneity between studies (I² = 0%; χ² P = 0.50) (Fig 14C). The funnel plots assessing publication bias are presented in Fig 15.

Forest plot of normal sperm morphology comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g014

Funnel plot of normal sperm morphology comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g015

Seminal leukocyte count.

Only two studies reported data on seminal fluid leukocytes comparing COVID-positive and COVID-negative patients, while three studies reported this parameter comparing infected patients before and after treatment. Unexpectedly, SARS-CoV-2 infection did not alter seminal leukocyte levels compared with controls (SMD -0.01 [95% CI: -0.46, 0.43] P = 0.95), with no marked heterogeneity between studies (I² = 29%; χ² P = 0.24). In addition, seminal leukocyte levels in SARS-CoV-2 positive patients did not differ significantly before and after treatment (SMD 0.34 [95% CI: -0.33, 1.00] P = 0.32), with marked heterogeneity between studies (I² = 80%; χ² P = 0.007) (Fig 16). The funnel plots assessing publication bias are shown in Fig 17.

Forest plot of seminal leukocyte count comparing between COVID-19 positive and COVID-19 negative patients (A) and before COVID-19 treatment and after COVID-19 treatment (B).

https://doi.org/10.1371/journal.pone.0307396.g016

Funnel plot of seminal leukocyte count comparing between COVID-19 positive and COVID-19 negative patients (A) and before COVID-19 treatment and after COVID-19 treatment (B).

https://doi.org/10.1371/journal.pone.0307396.g017

Circulating testosterone, oestrogen, and prolactin levels.

SARS-CoV-2 infection caused a significant reduction in serum testosterone level compared with COVID-19-negative controls (SMD -1.00 [95% CI: -1.49, -0.51] P < 0.0001), with marked heterogeneity between studies (I² = 96%; χ² P < 0.00001) (Fig 18A). However, serum testosterone level in infected patients did not differ significantly before and after treatment (SMD -0.87 [95% CI: -1.90, 0.16] P = 0.10), with significant heterogeneity between studies (I² = 95%; χ² P < 0.00001). After sensitivity analysis, the difference in serum testosterone level before and after treatment remained non-significant (SMD -1.30 [95% CI: -3.27, 0.67] P = 0.20), with significant heterogeneity between studies (I² = 98%; χ² P < 0.00001) (Fig 18B). Likewise, circulating testosterone level in SARS-CoV-2 positive patients was not significantly altered compared with their premorbid state (SMD -0.51 [95% CI: -1.22, 0.19] P = 0.15), with marked heterogeneity between studies (I² = 88%; χ² P = 0.0003) (Fig 18C). The funnel plots assessing publication bias are shown in Fig 19.
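Hormone levels are reported in different units and assays across studies, which is why results are pooled as standardized mean differences rather than raw differences. The sketch below shows one common way to compute an SMD for a single study, Hedges' g with its small-sample correction. Whether the source used this exact estimator is not stated in this section, so it is an assumption, and the group summaries in the example are hypothetical.

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference (Hedges' g) and its approximate variance."""
    df = n1 + n2 - 2
    # Pooled standard deviation of the two groups
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / sd_pooled            # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)           # small-sample correction factor
    g = j * d
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2.0 * (n1 + n2))
    return g, j ** 2 * var_d


# Hypothetical testosterone summaries: infected group vs control group
g, var_g = hedges_g(mean1=10.0, sd1=2.0, n1=20, mean2=12.0, sd2=2.0, n2=20)
print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```

A negative g here means the first group (infected patients) has the lower mean, matching the sign convention of the SMDs reported for testosterone.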

Forest plot of serum testosterone level comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g018

Funnel plot of serum testosterone level comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g019

In addition, the serum concentration of oestrogen was significantly higher in SARS-CoV-2 patients than in uninfected controls (SMD 0.62 [95% CI: 0.18, 1.07] P = 0.006), with marked heterogeneity between studies (I² = 70%; χ² P = 0.04) (Fig 20A). The funnel plot assessing publication bias is shown in Fig 20B.

Forest plot (A) and funnel plot (B) of serum oestrogen level comparing between COVID-19 positive and COVID-19 negative patients.

https://doi.org/10.1371/journal.pone.0307396.g020

SARS-CoV-2 infection also significantly increased serum prolactin concentration compared with uninfected controls (SMD 0.53 [95% CI: 0.11, 0.95] P = 0.01), with notable heterogeneity between studies (I² = 86%; χ² P < 0.00001) (Fig 21A). Serum prolactin level in SARS-CoV-2 positive patients did not differ significantly before and after treatment (SMD 0.39 [95% CI: -0.85, 1.64] P = 0.54), with substantial heterogeneity between studies (I² = 91%; χ² P < 0.0001) (Fig 21B). The funnel plots assessing publication bias are shown in Fig 22.

Forest plot of serum prolactin level comparing between COVID-19 positive and COVID-19 negative patients (A) and before COVID-19 treatment and after COVID-19 treatment (B).

https://doi.org/10.1371/journal.pone.0307396.g021

Funnel plot of serum prolactin level comparing between COVID-19 positive and COVID-19 negative patients (A) and before COVID-19 treatment and after COVID-19 treatment (B).

https://doi.org/10.1371/journal.pone.0307396.g022

Serum levels of gonadotropins.

Serum LH level was significantly elevated in SARS-CoV-2 positive patients compared with uninfected controls (SMD 0.75 [95% CI: 0.19, 1.31] P = 0.009), with marked heterogeneity between studies (I² = 96%; χ² P < 0.0001). After sensitivity analysis, serum LH level remained higher in SARS-CoV-2 positive cohorts than in negative cohorts (SMD 1.09 [95% CI: 0.10, 2.07] P = 0.03), with substantial heterogeneity between studies (I² = 97%; χ² P < 0.0001) (Fig 23A). However, serum LH level in SARS-CoV-2 positive patients did not differ significantly before and after treatment (SMD 0.05 [95% CI: -0.28, 0.37] P = 0.78), with no significant heterogeneity between studies (I² = 0%; χ² P = 0.76) (Fig 23B). In addition, serum LH levels in SARS-CoV-2 positive patients did not differ significantly from their pre-COVID state (SMD 0.54 [95% CI: -0.47, 1.56] P = 0.29), with substantial heterogeneity between studies (I² = 94%; χ² P < 0.00001) (Fig 23C). The funnel plots assessing publication bias are shown in Fig 24.

Forest plot of serum luteinizing hormone (LH) level comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g023

Funnel plot of serum luteinizing hormone (LH) level comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g024

Serum FSH was not significantly higher in SARS-CoV-2 positive patients than in the control (SMD 0.13 [95% CI: -0.16, 0.43] P = 0.37), with marked heterogeneity between studies (I² = 90%; χ² P < 0.00001). This finding persisted after sensitivity analysis (SMD 0.13 [95% CI: -0.25, 0.51] P = 0.50), and heterogeneity remained marked (I² = 91%; χ² P < 0.00001) (Fig 25A). FSH level in infected patients did not differ significantly before and after treatment (SMD -0.36 [95% CI: -1.07, 0.35] P = 0.32), with marked heterogeneity between studies (I² = 89%; χ² P < 0.0001) (Fig 25B). FSH level in SARS-CoV-2 positive patients also did not differ significantly from the pre-COVID state (SMD 0.11 [95% CI: -0.03, 0.25] P = 0.12), with no significant heterogeneity between studies (I² = 0%; χ² P = 0.98) (Fig 25C). The funnel plots assessing publication bias are presented in Fig 26.

Forest plot of serum follicle-stimulating hormone (FSH) level comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g025

Funnel plot of serum follicle-stimulating hormone (FSH) level comparing between COVID-19 positive and COVID-19 negative patients (A), before COVID-19 treatment and after COVID-19 treatment (B), and COVID-19 positive and preCOVID-19 period (C).

https://doi.org/10.1371/journal.pone.0307396.g026

Reproductive hormone indices.

Serum testosterone/LH and FSH/LH ratios were compared between SARS-CoV-2 positive patients and uninfected controls. SARS-CoV-2 caused a significant decline in the testosterone/LH ratio compared with the control (SMD -2.44 [95% CI: -3.69, -1.19] P = 0.0001), with notable heterogeneity between studies (I² = 99%; χ² P < 0.00001) (Fig 27A). The funnel plot assessing publication bias is shown in Fig 27B.

Forest (A) and funnel (B) plots of serum testosterone/luteinizing hormone (T/LH) ratio comparing between COVID-19 positive and COVID-19 negative patients.

https://doi.org/10.1371/journal.pone.0307396.g027

Furthermore, SARS-CoV-2 infection was associated with a non-significant reduction in the FSH/LH ratio compared with the control (SMD -2.06 [95% CI: -4.36, 0.25] P = 0.08), with significant heterogeneity between studies (I² = 98%; χ² P < 0.00001) (Fig 28A). The funnel plot assessing publication bias is shown in Fig 28B.

Forest (A) and funnel (B) plots of serum follicle-stimulating hormone/luteinizing hormone (FSH/LH) ratio comparing between COVID-19 positive and COVID-19 negative patients.

https://doi.org/10.1371/journal.pone.0307396.g028

Although the achievement of clinical pregnancy and live birth is the true test of fertility, conventional semen analysis remains the cornerstone of the diagnosis and management of male infertility [67]. Evaluation of male sex hormones is also a useful tool in the management of male infertility. Our present data revealed that SARS-CoV-2 caused reductions in ejaculate volume, sperm count, concentration, viability, normal morphology, and total and progressive motility. These findings were accompanied by a SARS-CoV-2-induced decline in serum testosterone level and in the testosterone/LH ratio, and by increases in oestrogen, prolactin, and LH levels. These data indicate that SARS-CoV-2 may impede fertility in males by lowering semen quality and distorting the male reproductive hormone milieu.

The present findings corroborate and extend the earlier meta-analyses of Corona et al. [21], Tiwari et al. [22], and Xie et al. [68]. They provide updated and robust data demonstrating the detrimental sequelae of SARS-CoV-2 on semen quality and male sex hormones. These data also add to the evidence in the scientific literature that SARS-CoV-2 has serious consequences for male reproductive function.

It is plausible to infer that SARS-CoV-2 may impair male fertility through multiple pathways. The detection of SARS-CoV-2 in the semen of infected patients [69–71] suggests that the virus may exert a local effect on the sperm cells. SARS-CoV-2 promotes oxidative stress, evidenced by heightened reactive oxygen species (ROS) generation, a raised malondialdehyde (MDA) level, and a decline in total antioxidant capacity (TAC) in the seminal fluid of infected patients [38]. Since sperm cells are rich in polyunsaturated fatty acids, which make them highly susceptible to ROS attack, SARS-CoV-2-induced ROS generation in the spermatozoa may cause oxidative sperm damage, leading to reduced sperm count, viability, motility, concentration, and normal morphology.

In addition, studies have shown that SARS-CoV-2 upregulates cytokines through activation of extracellular-regulated protein kinase (ERK) and p38 mitogen-activated protein kinases (MAPK) [3, 4, 72]. This activates a cascade of immune responses that leads to a hyper-inflammatory state, which compromises the blood-testis barrier [3, 73, 74] and increases the susceptibility of the testis and germ cells to SARS-CoV-2-driven ROS attack. This may explain the reduced semen quality and testosterone levels observed in SARS-CoV-2 positive patients. Since LH and FSH levels were not reduced alongside the reduced testosterone, it is credible to infer that the SARS-CoV-2-induced testosterone decline is a local effect and not due to suppression of the hypothalamic-pituitary-testicular axis. The observed rise in circulating oestrogen and prolactin concentrations in SARS-CoV-2 positive patients may also point to endocrine-disrupting activity of the viral infection as a pathway for impairing male fertility.

Beyond semen quality, SARS-CoV-2 infection may also affect the success of testicular sperm extraction, and hence the outcome of assisted reproductive techniques (ART). The testosterone/LH ratio is a known predictor of sperm concentration and successful sperm retrieval [75, 76]. The reduced testosterone/LH ratio in SARS-CoV-2-infected patients is therefore consistent with the reduced sperm concentration found in these patients and also suggests a lower success rate of sperm retrieval in them. This implies that SARS-CoV-2 may lower the rate of spontaneous conception as well as reduce the success of ART. Since the testosterone/LH ratio is also a predictor of Leydig cell function [76, 77], it is also credible to infer that SARS-CoV-2 impairs Leydig cell function. This may explain the reduced testosterone found in SARS-CoV-2 positive men.

It is important to note that the duration of the infection and the time between infection and semen collection might have an effect on the study outcomes. Koç and Keseroğlu [48] and Temiz et al. [63], who performed semen analysis after 5 and 4 days of infection respectively, found insignificant changes for most of the sperm variables and testosterone level. It is also worth mentioning that most of the eligible studies were published between 2020 and 2022, indicating that they were likely conducted before the introduction of COVID-19 vaccines and before infection by the most recent and less dangerous variants of COVID-19; hence, the impact of the virus may differ. It is likely that COVID-19 vaccination confers protection against sperm-endocrine aberrations induced by the novel virus. Moreover, the less virulent variants of COVID-19 may exert less adverse effect on the sperm-endocrine system than the more virulent variants. Like other systemic viral infections, SARS-CoV-2 possibly impairs male fertility by upregulating pro-inflammatory cytokines and promoting hyper-inflammation and oxidative stress, or through direct sperm-endocrine alterations [3]. The peculiarity of SARS-CoV-2 hinges on its novelty.

Despite the convincing findings of this study, there are some limitations. First, the effect of SARS-CoV-2 on live-birth rate is not presented, which limits our conclusion on the effect of the viral disease on male fertility. Also, there was a notable risk of publication bias in many of the analyses. Moreover, the significant heterogeneity in most of the analyses is a major concern, although this was addressed by sensitivity analysis. Lastly, studies exploring the actual mechanisms by which SARS-CoV-2 affects semen quality and male sex hormones are lacking, and most studies were speculative. Nonetheless, the present meta-analysis provides updated and robust data delineating the consequences of SARS-CoV-2 on conventional semen parameters and male sex hormones. A detailed Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis of the current study is shown in Fig 29.

https://doi.org/10.1371/journal.pone.0307396.g029

In conclusion, this study demonstrates that SARS-CoV-2 may diminish fertility in males by reducing semen quality, viz. ejaculate volume, sperm count, concentration, viability, motility, and normal morphology, through a hormone-dependent mechanism (reduction in testosterone level and increase in oestrogen and prolactin levels). It is also likely that the induction of oxidative stress and inflammatory injury play significant roles. Further well-designed studies with larger sample sizes are recommended to validate these findings, evaluate the long-term effect of SARS-CoV-2 on sperm function and testosterone concentration, establish the associated mechanisms, and address the weaknesses highlighted.

Supporting information

S1 Checklist. PRISMA 2020 checklist.

https://doi.org/10.1371/journal.pone.0307396.s001

S1 Raw data.

https://doi.org/10.1371/journal.pone.0307396.s002

  • 25. OHAT (Office of Health Assessment and Translation) and NTP (National Toxicology Program). Handbook for Conducting a Literature-Based Health Assessment Using OHAT Approach for Systematic Review and Evidence Integration. Institute of Environmental Health Sciences, US Department of Health and Human Services 2019. Available at: https://ntp.niehs.nih.gov/sites/default/files/ntp/ohat/pubs/handbookmarch2019_508.pdf (accessed on 13th March, 2024).

IMAGES

  1. How to write a systematic literature review [9 steps]

    systematic literature review vertaling

  2. How to conduct Systematic Literature Review

    systematic literature review vertaling

  3. Steps of Systematic Literature Review

    systematic literature review vertaling

  4. The methodology of the systematic literature review. Four phases of the

    systematic literature review vertaling

  5. How to Write a Systematic Literature Review (7-Step-Guide) 📚🔍

    systematic literature review vertaling

  6. Systematic Literature Review

    systematic literature review vertaling

VIDEO

  1. Systematic Literature Review Paper

  2. Systematic literature review in Millitary Studies'...free webinar

  3. Systematic Literature Review Part2 March 20, 2023 Joseph Ntayi

  4. Systematic Literature Review

  5. ONLINE CLASS SYSTEMATIC LITERATURE REVIEW RESEARH METHODOLOGY PART 3

  6. Introduction Systematic Literature Review-Various frameworks Bibliometric Analysis

COMMENTS

  1. Systematische Review

    Systematische Review | Stappenplan & Voorbeeld. Vertaald op 19 augustus 2022 door Veronique Scharwächter. Oorspronkelijk gepubliceerd door Shaun Turney Een systematische review (systematic review) is een soort review waarbij formele, herhaalbare methoden worden gebruikt om al het beschikbare bewijsmateriaal uit de bestaande literatuur te vinden, te selecteren en te synthetiseren.

  2. PDF Systematic Literature Reviews: an Introduction

    Systematic literature reviews (SRs) are a way of synthesising scientific evidence to answer a particular research question in a way that is transparent and reproducible, while seeking to include all published evidence on the topic and appraising the quality of th is evidence. SRs have become a major methodology

  3. How-to conduct a systematic literature review: A quick guide for

    Method details Overview. A Systematic Literature Review (SLR) is a research methodology to collect, identify, and critically analyze the available research studies (e.g., articles, conference proceedings, books, dissertations) through a systematic procedure [12].An SLR updates the reader with current literature about a subject [6].The goal is to review critical points of current knowledge on a ...

  4. Guidelines for writing a systematic review

    A Systematic Review (SR) is a synthesis of evidence that is identified and critically appraised to understand a specific topic. SRs are more comprehensive than a Literature Review, which most academics will be familiar with, as they follow a methodical process to identify and analyse existing literature (Cochrane, 2022).

  5. Guidance on Conducting a Systematic Literature Review

    Literature reviews establish the foundation of academic inquires. However, in the planning field, we lack rigorous systematic reviews. In this article, through a systematic search on the methodology of literature review, we categorize a typology of literature reviews, discuss steps in conducting a systematic literature review, and provide suggestions on how to enhance rigor in literature ...

  6. How to write a systematic literature review [9 steps]

    Screen the literature. Assess the quality of the studies. Extract the data. Analyze the results. Interpret and present the results. 1. Decide on your team. When carrying out a systematic literature review, you should employ multiple reviewers in order to minimize bias and strengthen analysis.

  7. How to Perform a Systematic Literature Review

    How to Perform a Systematic Literature Review A Guide for Healthcare Researchers, Practitioners and Students. ... The systematic review is a rigorous method of collating and synthesizing evidence from multiple studies, producing a whole greater than the sum of parts. This textbook is an authoritative and accessible guide to an activity that is ...

  8. How to Write a Systematic Review of the Literature

    SLR, as the name implies, is a systematic way of collecting, critically evaluating, integrating, and presenting findings from across multiple research studies on a research question or topic of interest. SLR provides a way to assess the quality level and magnitude of existing evidence on a question or topic of interest.

  9. What is a Systematic Literature Review?

    A systematic literature review (SLR) is an independent academic method that aims to identify and evaluate all relevant literature on a topic in order to derive conclusions about the question under consideration. "Systematic reviews are undertaken to clarify the state of existing research and the implications that should be drawn from this."

  10. Systematic Reviews and Meta-Analysis: A Guide for Beginners

    The graphical output of meta-analysis is a forest plot which provides information on individual studies and the pooled effect. Systematic reviews of literature can be undertaken for all types of questions, and all types of study designs. This article highlights the key features of systematic reviews, and is designed to help readers understand ...

  11. Systematic reviews: Structure, form and content

    Topic selection and planning. In recent years, there has been an explosion in the number of systematic reviews conducted and published (Chalmers & Fox 2016, Fontelo & Liu 2018, Page et al 2015), although a systematic review may be an inappropriate or unnecessary research methodology for answering many research questions. Systematic reviews can be inadvisable for a variety of reasons.

  12. Home

    A qualitative systematic review derives data from observation, interviews, or verbal interactions and focuses on the meanings and interpretations of the participants. It will include focus groups, interviews, observations and diaries. Narrative reviews: Broad perspective on topic (like a textbook chapter), no specified search strategy ...

  13. Systematic reviews: Structure, form and content

    Abstract. This article aims to provide an overview of the structure, form and content of systematic reviews. It focuses in particular on the literature searching component, and covers systematic database searching techniques, searching for grey literature and the importance of librarian involvement in the search.

  14. Systematic reviews

    At various times of the year, experts from the library give workshops on systematically searching for literature. You will learn to set up a systematic search strategy and you will receive information about where and how to search. You then apply these skills to your own research question. In the library's calendar you can see when the next ...

  15. Systematic Literature Reviews: An Introduction

    Systematic literature reviews (SRs) are a way of synthesising scientific evidence to answer a particular research question in a way that is transparent and reproducible, while seeking to include ...

  16. How to Do a Systematic Review: A Best Practice Guide for Conducting and

    Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question.

  17. How-to conduct a systematic literature review: A quick guide for

    Abstract. Performing a literature review is a critical first step in research to understanding the state-of-the-art and identifying gaps and challenges in the field. A systematic literature review is a method which sets out a series of steps to methodically organize the review. In this paper, we present a guide designed for researchers and in ...

  18. LibGuides: Library Services Menu: Systematic Reviews

    "Researchers conducting systematic reviews use explicit, systematic methods that are selected with a view aimed at minimizing bias, to produce more reliable findings to inform decision making." A systematic review is a rigorous and comprehensive approach to reviewing and synthesizing existing research literature on a specific topic.

  19. Systematic reviews of the literature: an introduction to current

    Systematic reviews serve different purposes and use a different methodology than other types of evidence synthesis that include narrative reviews, scoping reviews, and overviews of reviews. Systematic reviews can address questions regarding effects of interventions or exposures, diagnostic properties of tests, and prevalence or prognosis of ...

  20. Systematic Review

    A systematic review is a type of review that uses repeatable methods to find, select, and synthesize all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer. Example: Systematic review. In 2008, Dr. Robert Boyle and his colleagues published a systematic review in ...

  21. Literature review as a research methodology: An overview and guidelines

    2.1.1. Systematic literature review. What is it and when should we use it? Systematic reviews have foremost been developed within medical science as a way to synthesize research findings in a systematic, transparent, and reproducible way and have been referred to as the gold standard among reviews (Davis et al., 2014). Despite all the advantages of this method, its use has not been overly ...

  22. Full article: Digitalising the Systematic Literature Review process

    A systematic review can consider only quantitative studies (i.e., meta-analysis), or just qualitative studies (i.e., meta-ethnography; Mays et al., 2005). All in all, SLR combines the Literature Review core feature, the use of scientific sources, with the structured, unbiased, and evidence-based Systematic Review (see Figure 2). It is ...

  23. Data Envelopment Analysis and Higher Education: A Systematic Review of

    Interest in Data Envelopment Analysis (DEA) has grown since it was first put forward in 1978. In response to this overwhelming interest, systematic literature reviews, as well as bibliometric studies, have been performed to describe the state of the art and offer quantitative outlines of the high-impact papers on global applications of DEA and the higher education system (DEA-HE).

  24. Guidance to best tools and practices for systematic reviews

    Methods and guidance to produce a reliable evidence synthesis. Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table 1). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and ...

  25. A systematic literature review on determinants and outcomes of ESG

    As the predominant business type, family firms hold a unique position to influence the global sector's ESG footprint. However, research on their ESG activities and performance is complex, multi-layered, and currently lacks integration. This review aims to bridge these research disciplines by providing an integrative overview of the current state of family firm ESG literature. By ...

  26. Public-Private Partnerships in the Healthcare Sector and Sustainability

    Fabio De Matteis, PhD, is an Associate Professor of Public Management at the Ionic Department in "Legal and Economic System of Mediterranean: Society, Environment, Culture", University of Bari (Italy). He has been teaching and researching on public management issues for many years, and has coordinated a three-year research project ("Multidimensionality, measurement and valorisation of ...

  27. SARS-CoV-2 impairs male fertility by targeting semen quality and

    In a systematic review and meta-analysis by Corona et al., SARS-CoV-2 infection was linked with low semen quality and serum testosterone level. This is in agreement with an earlier systematic review and meta-analysis by Tiwari et al. The study, however, had some frailties: first, the random-effect model was used irrespective of the level of ...