

CSR Initiatives to Address Bias in Peer Review

CSR is committed to addressing bias in peer review. Learn about our commitment and relevant data.


Words from Dr. Noni Byrnes, Director

  • CSR’s Commitment to Advancing EDI in Peer Review, 3 March 2021


Words from Dr. Bruce Reed, Deputy Director

  • Bias Awareness and Mitigation Training
  • Reporting Avenues for Bias
  • Broadening the Reviewer Pool
  • Exploring Changes to Review to Make it More Fair and Effective

Bias Awareness and Mitigation Training for Reviewers, Chairs, and SROs

As of Dec 2022, 16,646 reviewers have completed the training.

91% of reviewers thought that the training substantially improved their ability to identify bias in peer review.

93% of reviewers stated the training made them substantially more comfortable intervening against bias.

CSR developed training specifically targeted toward mitigating the most common biases in the peer review process. The training includes personal testimonials, interactive exercises, and a narrated mock study section demonstrating techniques to intervene, all based on real-life examples. The training was developed with the assistance of a diverse CSR Advisory Council Working Group. Training has been provided to all CSR reviewers since August 2021.


Reporting Avenues for Bias, Unfair Reviews, and Uncivil Conduct on Panels

CSR launched a widely-publicized reporting avenue for issues related to respectful interactions, bias or anything else that could affect the fairness of the review process. The reporting channel is open to all – investigators, reviewers, and program staff.

Steps

Investigate: Every allegation is carefully investigated by CSR senior management.

Resolve: If we agree the review was biased or flawed, CSR will re-review the application in the same council round. If we don’t agree, the official NIH appeals process remains available to all investigators.

Closure and Culture Change: The CSR Scientific Division Director discusses the issue with the reviewer, if appropriate. Actions might also include not inviting the reviewer to serve in the future.

Report a Bias Incident: Send a message to [email protected]

Broadening the Reviewer Pool to Diversify Peer Review Groups


Scientific Background


Career Stage


Peer Review Experience

There is a critical need for NIH to hear diverse perspectives to fulfill peer review’s mission of identifying the best, most novel science. The most effective, highest-quality review committees are broadly diverse in multiple dimensions including 1) scientific background and perspective; 2) demography; 3) geography; 4) career stage; 5) peer review experience. The value of diversity is evident in expectations of scientific review officers in recruiting for panels and in our investment of resources and public programs to more easily identify reviewers who are not already known to NIH.

Study section membership

The selection process for members of standing study sections is thorough and involves multiple levels of oversight and approval. Guidelines for staff in developing a slate of nominees explicitly address the value of diversity on panels, and the advance planning process includes an analysis of current diversity on the panel, gaps, and plans to address those gaps.

Currently, CSR is increasing focus on diversity of special emphasis panels (SEP) and has developed tools to allow SROs and supervisors to track diversity on SEPs.

Early Career Reviewer Program

CSR’s Early Career Reviewer (ECR) program offers early career scientists the opportunity to gain first-hand experience in peer review. In 2020, the program was expanded by requiring all standing study sections to include two ECRs at each meeting. This benefits ECRs by providing experience they can use in crafting their own grant applications and benefits CSR in that ECRs are more diverse. In 2021, 16.8% of ECRs were underrepresented minorities, compared to 10.3% for all CSR reviewers.

Development of tools to assist SROs in finding new, qualified reviewers

CSR has developed an internal database that allows scientific review officers (SROs) to more easily identify active scientists who might not already be known to NIH. The tool includes scientists funded through other government agencies and some non-profit organizations, those recommended by NIH staff, and those recommended by scientific societies.

CSR also regularly facilitates sharing among SROs of best practices in broader recruitment strategies.

Impact of CSR’s Efforts


Exploring Changes to Review to Make it More Fair and Effective


Leading efforts to simplify review criteria

CSR Advisory Council working groups (Clinical Trials, Non-Clinical Trials) developed recommendations to simplify review criteria in a way that focuses reviewers on the importance and feasibility of the research proposed and in which reputation of the investigator and institution, in the global sense, does not have a place. The reorganization of the current five review criteria into three factors allows for the possibility of a multi-stage, partially-blinded review process in the future. These recommendations were considered and modified by NIH leadership; see details of the proposed review framework. NIH gathered additional input from the scientific community on the proposed changes through a Request for Information, which closed March 10, 2023. An overview of the input received from more than 800 individuals, scientific societies, and academic institutions has been shared on Review Matters and a content analysis posted.


Improving NRSA Fellowship Review

A CSR Advisory Council Working Group on Fellowship Review developed recommendations to modify the review criteria and change the required application materials for NRSA fellowship applications. The changes are expected to make the process more fair and more effective in identifying the next generation of promising scientists.


Review criteria will be modified to better focus reviewers on three key assessments: (1) potential of the applicant; (2) strength of the science; (3) quality of the training plan. In order to provide equal opportunity, accomplishments of the applicant will be evaluated in the context of their opportunities. Sponsor and institution will be evaluated with respect to the quality of the proposed science and quality of the training plan, minimizing effects of reputation. Changes in required application materials fit with the modified review criteria. Importantly, coursework grades will no longer be considered.

Details of the recommended changes can be found here. These recommendations were considered and approved by NIH leadership. An RFI was issued to gather additional input from the public and closed in June 2023. Responses to the RFI were supportive but underscored the need for ample communications and thoughtful guidance and training for investigators, reviewers, and NIH staff. An NIH Guide Notice will be issued in late 2023 with more information.

Collaboration with the Common Fund High Risk, High Reward (tR01) program

  • CSR collaborated with the Common Fund to test a multi-stage review process in which information about the investigator and institution is provided after assessment of the abstract, aims, and research plan. Study section reviews took place in April 2021 and the process is being evaluated by an external contractor. Initial results are encouraging with a statistically significant increase in the demographic diversity of the applicant pool.
  • CSR developed and supports the TRA Anonymization Check, an online tool that allows applicants to verify that their specific aims and research strategy sections do not contain identifying information.

CSR investigated the effects of anonymization on review outcomes

CSR conducted a large-scale study of the effect of redacting identifying information from grant applications on the preliminary assessment of scientific and technical merit (Nakamura et al, 2021).


Last updated: 12/20/2023 13:46


National Institutes of Health

  • Open access
  • Published: 11 December 2020

Quantifying and addressing the prevalence and bias of study designs in the environmental and social sciences

  • Alec P. Christie   ORCID: orcid.org/0000-0002-8465-8410 1 ,
  • David Abecasis   ORCID: orcid.org/0000-0002-9802-8153 2 ,
  • Mehdi Adjeroud 3 ,
  • Juan C. Alonso   ORCID: orcid.org/0000-0003-0450-7434 4 ,
  • Tatsuya Amano   ORCID: orcid.org/0000-0001-6576-3410 5 ,
  • Alvaro Anton   ORCID: orcid.org/0000-0003-4108-6122 6 ,
  • Barry P. Baldigo   ORCID: orcid.org/0000-0002-9862-9119 7 ,
  • Rafael Barrientos   ORCID: orcid.org/0000-0002-1677-3214 8 ,
  • Jake E. Bicknell   ORCID: orcid.org/0000-0001-6831-627X 9 ,
  • Deborah A. Buhl 10 ,
  • Just Cebrian   ORCID: orcid.org/0000-0002-9916-8430 11 ,
  • Ricardo S. Ceia   ORCID: orcid.org/0000-0001-7078-0178 12 , 13 ,
  • Luciana Cibils-Martina   ORCID: orcid.org/0000-0002-2101-4095 14 , 15 ,
  • Sarah Clarke 16 ,
  • Joachim Claudet   ORCID: orcid.org/0000-0001-6295-1061 17 ,
  • Michael D. Craig 18 , 19 ,
  • Dominique Davoult 20 ,
  • Annelies De Backer   ORCID: orcid.org/0000-0001-9129-9009 21 ,
  • Mary K. Donovan   ORCID: orcid.org/0000-0001-6855-0197 22 , 23 ,
  • Tyler D. Eddy 24 , 25 , 26 ,
  • Filipe M. França   ORCID: orcid.org/0000-0003-3827-1917 27 ,
  • Jonathan P. A. Gardner   ORCID: orcid.org/0000-0002-6943-2413 26 ,
  • Bradley P. Harris 28 ,
  • Ari Huusko 29 ,
  • Ian L. Jones 30 ,
  • Brendan P. Kelaher 31 ,
  • Janne S. Kotiaho   ORCID: orcid.org/0000-0002-4732-784X 32 , 33 ,
  • Adrià López-Baucells   ORCID: orcid.org/0000-0001-8446-0108 34 , 35 , 36 ,
  • Heather L. Major   ORCID: orcid.org/0000-0002-7265-1289 37 ,
  • Aki Mäki-Petäys 38 , 39 ,
  • Beatriz Martín 40 , 41 ,
  • Carlos A. Martín 8 ,
  • Philip A. Martin 1 , 42 ,
  • Daniel Mateos-Molina   ORCID: orcid.org/0000-0002-9383-0593 43 ,
  • Robert A. McConnaughey   ORCID: orcid.org/0000-0002-8537-3695 44 ,
  • Michele Meroni 45 ,
  • Christoph F. J. Meyer   ORCID: orcid.org/0000-0001-9958-8913 34 , 35 , 46 ,
  • Kade Mills 47 ,
  • Monica Montefalcone 48 ,
  • Norbertas Noreika   ORCID: orcid.org/0000-0002-3853-7677 49 , 50 ,
  • Carlos Palacín 4 ,
  • Anjali Pande 26 , 51 , 52 ,
  • C. Roland Pitcher   ORCID: orcid.org/0000-0003-2075-4347 53 ,
  • Carlos Ponce 54 ,
  • Matt Rinella 55 ,
  • Ricardo Rocha   ORCID: orcid.org/0000-0003-2757-7347 34 , 35 , 56 ,
  • María C. Ruiz-Delgado 57 ,
  • Juan J. Schmitter-Soto   ORCID: orcid.org/0000-0003-4736-8382 58 ,
  • Jill A. Shaffer   ORCID: orcid.org/0000-0003-3172-0708 10 ,
  • Shailesh Sharma   ORCID: orcid.org/0000-0002-7918-4070 59 ,
  • Anna A. Sher   ORCID: orcid.org/0000-0002-6433-9746 60 ,
  • Doriane Stagnol 20 ,
  • Thomas R. Stanley 61 ,
  • Kevin D. E. Stokesbury 62 ,
  • Aurora Torres 63 , 64 ,
  • Oliver Tully 16 ,
  • Teppo Vehanen   ORCID: orcid.org/0000-0003-3441-6787 65 ,
  • Corinne Watts 66 ,
  • Qingyuan Zhao 67 &
  • William J. Sutherland 1 , 42  

Nature Communications volume 11, Article number: 6377 (2020)


  • Environmental impact
  • Scientific community
  • Social sciences

Building trust in science and evidence-based decision-making depends heavily on the credibility of studies and their findings. Researchers employ many different study designs that vary in their risk of bias to evaluate the true effect of interventions or impacts. Here, we empirically quantify, on a large scale, the prevalence of different study designs and the magnitude of bias in their estimates. Randomised designs and controlled observational designs with pre-intervention sampling were used by just 23% of intervention studies in biodiversity conservation, and 36% of intervention studies in social science. We demonstrate, through pairwise within-study comparisons across 49 environmental datasets, that these types of designs usually give less biased estimates than simpler observational designs. We propose a model-based approach to combine study estimates that may suffer from different levels of study design bias, discuss the implications for evidence synthesis, and how to facilitate the use of more credible study designs.


Introduction

The ability of science to reliably guide evidence-based decision-making hinges on the accuracy and credibility of studies and their results 1 , 2 . Well-designed, randomised experiments are widely accepted to yield more credible results than non-randomised, ‘observational studies’ that attempt to approximate and mimic randomised experiments 3 . Randomisation is a key element of study design that is widely used across many disciplines because of its ability to remove confounding biases (through random assignment of the treatment or impact of interest 4 , 5 ). However, ethical, logistical, and economic constraints often prevent the implementation of randomised experiments, whereas non-randomised observational studies have become popular as they take advantage of historical data for new research questions, larger sample sizes, less costly implementation, and more relevant and representative study systems or populations 6 , 7 , 8 , 9 . Observational studies nevertheless face the challenge of accounting for confounding biases without randomisation, which has led to innovations in study design.

We define ‘study design’ as an organised way of collecting data. Importantly, we distinguish between data collection and statistical analysis (as opposed to other authors 10 ) because of the belief that bias introduced by a flawed design is often much more important than bias introduced by statistical analyses. This was emphasised by Light, Singer & Willet 11 (p. 5): “You can’t fix by analysis what you bungled by design…”; and Rubin 3 : “Design trumps analysis.” Nevertheless, the importance of study design has often been overlooked in debates over the inability of researchers to reproduce the original results of published studies (so-called ‘reproducibility crises’ 12 , 13 ) in favour of other issues (e.g., p-hacking 14 and Hypothesizing After Results are Known or ‘HARKing’ 15 ).

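To demonstrate the importance of study designs, we can use the following decomposition of estimation error 16 :

$$\text{Estimation error} = (\text{Estimator} - \text{true causal effect}) = \text{Design bias} + \text{Modelling bias} + \text{Statistical noise} \qquad (1)$$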

This demonstrates that even if we improve the quality of modelling and analysis (to reduce modelling bias through a better bias-variance trade-off 17 ) or increase sample size (to reduce statistical noise), we cannot remove the intrinsic bias introduced by the choice of study design (design bias) unless we collect the data in a different way. The importance of study design in determining the levels of bias in study results therefore cannot be overstated.
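The point that a larger sample shrinks statistical noise but leaves design bias untouched can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true effect, the size of the pre-existing difference between groups, and the noise level are hypothetical values chosen for illustration, and the design simulated is a simple comparison of an impact group with a control group after the impact.

```python
import random
import statistics

TRUE_EFFECT = 2.0   # assumed true causal effect of the impact
DESIGN_BIAS = 1.5   # assumed pre-existing impact-minus-control difference
NOISE_SD = 4.0      # assumed sampling noise per replicate


def after_period_estimate(n, rng):
    """Impact-minus-control difference in means from n replicates per group."""
    control = [rng.gauss(10.0, NOISE_SD) for _ in range(n)]
    impact = [rng.gauss(10.0 + DESIGN_BIAS + TRUE_EFFECT, NOISE_SD) for _ in range(n)]
    return statistics.fmean(impact) - statistics.fmean(control)


def simulate(n, runs=200, seed=1):
    """Return (mean estimation error, spread of estimates) over repeated studies."""
    rng = random.Random(seed)
    estimates = [after_period_estimate(n, rng) for _ in range(runs)]
    errors = [e - TRUE_EFFECT for e in estimates]
    return statistics.fmean(errors), statistics.stdev(estimates)


small_error, small_spread = simulate(n=10)
large_error, large_spread = simulate(n=1000)
print(f"n=10:   mean error {small_error:.2f}, spread {small_spread:.2f}")
print(f"n=1000: mean error {large_error:.2f}, spread {large_spread:.2f}")
```

With more replicates the spread of the estimates narrows, but the mean error stays close to the assumed design bias rather than approaching zero.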

For the purposes of this study we consider six commonly used study designs; differences and connections can be visualised in Fig.  1 . There are three major components that allow us to define these designs: randomisation, sampling before and after the impact of interest occurs, and the use of a control group.

Fig. 1

A hypothetical study set-up is shown where the abundance of birds in three impact and control replicates (e.g., fields represented by blocks in a row) are monitored before and after an impact (e.g., ploughing) that occurs in year zero. Different colours represent each study design and illustrate how replicates are sampled. Approaches for calculating an estimate of the true effect of the impact for each design are also shown, along with synonyms from different disciplines.

Of the non-randomised observational designs, the Before-After Control-Impact (BACI) design uses a control group and samples before and after the impact occurs (i.e., in the ‘before-period’ and the ‘after-period’). Its rationale is to explicitly account for pre-existing differences between the impact group (exposed to the impact) and control group in the before-period, which might otherwise bias the estimate of the impact’s true effect 6 , 18 , 19 .

The BACI design improves upon several other commonly used observational study designs, of which there are two uncontrolled designs: After, and Before-After (BA). An After design monitors an impact group in the after-period, while a BA design compares the state of the impact group between the before- and after-periods. Both designs can be expected to yield poor estimates of the impact’s true effect (large design bias; Equation (1)) because changes in the response variable could have occurred without the impact (e.g., due to natural seasonal changes; Fig.  1 ).

The other observational design is Control-Impact (CI), which compares the impact group and control group in the after-period (Fig.  1 ). This design may suffer from design bias introduced by pre-existing differences between the impact group and control group in the before-period; bias that the BACI design was developed to account for 20 , 21 . These differences have many possible sources, including experimenter bias, logistical and environmental constraints, and various confounding factors (variables that change the propensity of receiving the impact), but can be adjusted for through certain data pre-processing techniques such as matching and stratification 22 .
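The observational designs described above differ mainly in which group means they compare. The sketch below uses made-up bird counts (not data from the study) and hypothetical function names to show how a BA, CI, and BACI estimate could be computed from the same set of replicates; the BACI estimate is taken here as the change in the impact group minus the change in the control group. An After design has no comparison group or before-period, so it yields only a description of the impact group.

```python
from statistics import fmean

# Hypothetical bird counts per replicate field (illustrative values only).
counts = {
    ("control", "before"): [10, 12, 11],
    ("control", "after"): [13, 15, 14],
    ("impact", "before"): [14, 16, 15],
    ("impact", "after"): [12, 14, 13],
}


def group_mean(group, period):
    """Mean response of one group in one period."""
    return fmean(counts[(group, period)])


def ba_estimate():
    """Before-After: change in the impact group only (uncontrolled)."""
    return group_mean("impact", "after") - group_mean("impact", "before")


def ci_estimate():
    """Control-Impact: impact minus control in the after-period only."""
    return group_mean("impact", "after") - group_mean("control", "after")


def baci_estimate():
    """BACI: change in the impact group minus change in the control group."""
    control_change = group_mean("control", "after") - group_mean("control", "before")
    return ba_estimate() - control_change


print(ba_estimate(), ci_estimate(), baci_estimate())
```

In this toy example the control group increases over time and the impact group starts from a higher baseline, so the three designs give three different answers from the same fields.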

Among the randomised designs, the most commonly used are counterparts to the observational CI and BACI designs: Randomised Control-Impact (R-CI) and Randomised Before-After Control-Impact (R-BACI) designs. The R-CI design, often termed ‘Randomised Controlled Trials’ (RCTs) in medicine and hailed as the ‘gold standard’ 23 , 24 , removes any pre-impact differences in a stochastic sense, resulting in zero design bias (Equation ( 1 )). Similarly, the R-BACI design should also have zero design bias, and the impact group measurements in the before-period could be used to improve the efficiency of the statistical estimator. No randomised equivalents exist of After or BA designs as they are uncontrolled.

It is important to briefly note that there is debate over two major statistical methods that can be used to analyse data collected using BACI and R-BACI designs, and which is superior at reducing modelling bias 25 (Equation (1)). These statistical methods are: (i) Differences in Differences (DiD) estimator; and (ii) covariance adjustment using the before-period response, which is an extension of Analysis of Covariance (ANCOVA) for generalised linear models — herein termed ‘covariance adjustment’ (Fig.  1 ). These estimators rely on different assumptions to obtain unbiased estimates of the impact’s true effect. The DiD estimator assumes that the control group response accurately represents the impact group response had it not been exposed to the impact (‘parallel trends’ 18 , 26 ) whereas covariance adjustment assumes there are no unmeasured confounders and linear model assumptions hold 6 , 27 .
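To make the contrast between the two estimators concrete, the following sketch computes both on hypothetical replicate-level data. It assumes the simplest case, an ordinary linear model with a common slope in both groups, rather than the generalised linear models used in the study. In that case the covariance-adjusted effect equals the after-period difference in means minus the pooled within-group slope times the before-period difference in means, and DiD corresponds to fixing that slope at 1. All data and function names are illustrative.

```python
from statistics import fmean

# Hypothetical (before, after) responses per site; not data from the study.
control = [(10.0, 12.0), (12.0, 13.0), (14.0, 14.0), (16.0, 15.0)]
impact = [(14.0, 11.0), (16.0, 12.0), (18.0, 13.0), (20.0, 14.0)]


def means(sites):
    """Return (mean before, mean after) for a group of sites."""
    return fmean(b for b, _ in sites), fmean(a for _, a in sites)


def pooled_slope(groups):
    """Pooled within-group slope of the after response on the before response."""
    sxy = sxx = 0.0
    for sites in groups:
        mean_before, mean_after = means(sites)
        sxy += sum((b - mean_before) * (a - mean_after) for b, a in sites)
        sxx += sum((b - mean_before) ** 2 for b, _ in sites)
    return sxy / sxx


def adjusted_effect(slope):
    """After-period difference corrected for the before-period difference."""
    control_before, control_after = means(control)
    impact_before, impact_after = means(impact)
    return (impact_after - control_after) - slope * (impact_before - control_before)


slope = pooled_slope([control, impact])
did_estimate = adjusted_effect(1.0)    # DiD: before-period difference removed in full
ca_estimate = adjusted_effect(slope)   # covariance adjustment: scaled by fitted slope
print(slope, did_estimate, ca_estimate)
```

Setting the slope to 0 recovers the plain after-period comparison, which shows how the three estimates relate to one another in this linear setting.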

From both theory and Equation (1), with similar sample sizes, randomised designs (R-BACI and R-CI) are expected to be less biased than controlled, observational designs with sampling in the before-period (BACI), which in turn should be superior to observational designs without sampling in the before-period (CI) or without a control group (BA and After designs 7 , 28 ). Between randomised designs, we might expect that an R-BACI design performs better than a R-CI design because utilising extra data before the impact may improve the efficiency of the statistical estimator by explicitly characterising pre-existing differences between the impact group and control group.

Given the likely differences in bias associated with different study designs, concerns have been raised over the use of poorly designed studies in several scientific disciplines 7 , 29 , 30 , 31 , 32 , 33 , 34 , 35 . Some disciplines, such as the social and medical sciences, commonly undertake direct comparisons of results obtained by randomised and non-randomised designs within a single study 36 , 37 , 38 or between multiple studies (between-study comparisons 39 , 40 , 41 ) to specifically understand the influence of study designs on research findings. However, within-study comparisons are limited in their scope (e.g., a single study 42 , 43 ) and between-study comparisons can be confounded by variability in context or study populations 44 . Overall, we lack quantitative estimates of the prevalence of different study designs and the levels of bias associated with their results.

In this work, we aim to first quantify the prevalence of different study designs in the social and environmental sciences. To fill this knowledge gap, we take advantage of summaries for several thousand biodiversity conservation intervention studies in the Conservation Evidence database 45 (www.conservationevidence.com) and social intervention studies in systematic reviews by the Campbell Collaboration (www.campbellcollaboration.org). We then quantify the levels of bias in estimates obtained by different study designs (R-BACI, R-CI, BACI, BA, and CI) by applying a hierarchical model to approximately 1000 within-study comparisons across 49 raw environmental datasets from a range of fields. We show that R-BACI, R-CI and BACI designs are poorly represented in studies testing biodiversity conservation and social interventions, and that these types of designs tend to give less biased estimates than simpler observational designs. We propose a model-based approach to combine study estimates that may suffer from different levels of study design bias, discuss the implications for evidence synthesis, and how to facilitate the use of more credible study designs.

Results

Prevalence of study designs

We found that the biodiversity-conservation (Conservation Evidence) and social-science (Campbell Collaboration) literature had similarly high proportions of intervention studies that used CI designs and After designs, but low proportions that used R-BACI, BACI, or BA designs (Fig. 2). There were slightly higher proportions of R-CI designs used by intervention studies in social-science systematic reviews than in the biodiversity-conservation literature (Fig. 2). The R-BACI, R-CI, and BACI designs made up 23% of intervention studies for biodiversity conservation, and 36% of intervention studies for social science.

Fig. 2

Intervention studies from the biodiversity-conservation literature were screened from the Conservation Evidence database (n = 4260 studies) and studies from the social-science literature were screened from 32 Campbell Collaboration systematic reviews (n = 1009 studies; note that studies excluded by these reviews based on their study design were still counted). Percentages for the social-science literature were calculated for each systematic review (blue data points) and then averaged across all 32 systematic reviews (blue bars and black vertical lines represent mean and 95% Confidence Intervals, respectively). Percentages for the biodiversity-conservation literature are absolute values (shown as green bars) calculated from the entire Conservation Evidence database (after excluding any reviews). Source data are provided as a Source Data file. BA before-after, CI control-impact, BACI before-after-control-impact, R-BACI randomised BACI, R-CI randomised CI.
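The averaging of per-review percentages described in this caption can be sketched as follows. The caption does not state how the 95% confidence intervals were computed, so the normal approximation used here (mean ± 1.96 standard errors) is an assumption, and the percentages are hypothetical values, not the study's source data.

```python
from math import sqrt
from statistics import fmean, stdev


def mean_with_ci(percentages, z=1.96):
    """Mean of per-review percentages with an approximate 95% interval.

    Assumes a normal approximation; with few reviews a t-based interval
    would be somewhat wider.
    """
    mean = fmean(percentages)
    half_width = z * stdev(percentages) / sqrt(len(percentages))
    return mean, mean - half_width, mean + half_width


# Hypothetical share (%) of studies using one design in each of five reviews.
review_percentages = [20.0, 35.0, 50.0, 40.0, 30.0]
mean, lower, upper = mean_with_ci(review_percentages)
print(f"mean {mean:.1f}%, 95% CI {lower:.1f}% to {upper:.1f}%")
```

Averaging per-review percentages gives each review equal weight, which differs from the absolute percentages reported for the Conservation Evidence database, where every study counts once.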

Influence of different study designs on study results

In non-randomised datasets, we found that estimates of BACI (with covariance adjustment) and CI designs were very similar, while the point estimates for most other designs often differed substantially in their magnitude and sign. We found similar results in randomised datasets for R-BACI (with covariance adjustment) and R-CI designs. For ~30% of responses, in both non-randomised and randomised datasets, study design estimates differed in their statistical significance (i.e., p < 0.05 versus p ≥ 0.05), except for estimates of (R-)BACI (with covariance adjustment) and (R-)CI designs (Table 1; Fig. 3). It was rare for the 95% confidence intervals of different designs’ estimates to not overlap, except when comparing estimates of BA designs to (R-)BACI (with covariance adjustment) and (R-)CI designs (Table 1). It was even rarer for estimates of different designs to have significantly different signs (i.e., one estimate with entirely negative confidence intervals versus one with entirely positive confidence intervals; Table 1, Fig. 3). Overall, point estimates often differed greatly in their magnitude and, to a lesser extent, in their sign between study designs, but did not differ as greatly when accounting for the uncertainty around point estimates, except in terms of their statistical significance.

Fig. 3

t-statistics were obtained from two-sided t-tests of estimates obtained by each design for different responses in each dataset using Generalised Linear Models (see Methods). For randomised datasets, BACI and CI axis labels refer to R-BACI and R-CI designs (denoted by ‘R-’). DiD Difference in Differences; CA covariance adjustment. Lines at t-statistic values of 1.96 denote boundaries between cells and colours of points indicate differences in direction and statistical significance (p < 0.05; grey = same sign and significance, orange = same sign but difference in significance, red = different sign and significance). Numbers refer to the number of responses in each cell. Source data are provided as a Source Data file. BA Before-After, CI Control-Impact, BACI Before-After-Control-Impact.
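The comparison of two designs' t-statistics against the 1.96 boundary can be sketched as a small classification function. The function name and the category labels are hypothetical; in particular, splitting opposite-sign pairs by whether both estimates are significant is an assumption made here for completeness, not a rule taken from the figure.

```python
def compare_designs(t_a, t_b, threshold=1.96):
    """Classify a pair of t-statistics by agreement in sign and significance."""
    significant_a = abs(t_a) >= threshold
    significant_b = abs(t_b) >= threshold
    same_sign = (t_a >= 0) == (t_b >= 0)
    if same_sign and significant_a == significant_b:
        return "same sign, same significance"
    if same_sign:
        return "same sign, different significance"
    if significant_a and significant_b:
        return "opposite sign, both significant"
    return "opposite sign, not both significant"


# Hypothetical t-statistics for the same response estimated by two designs.
pairs = [(2.5, 3.0), (2.5, 1.0), (2.5, -3.0), (0.5, -0.3)]
for t_a, t_b in pairs:
    print(t_a, t_b, "->", compare_designs(t_a, t_b))
```

Counting how many responses fall into each category for a given pair of designs gives the kind of tally the figure reports per cell.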

Levels of bias in estimates of different study designs

We modelled study design bias using a random effect across datasets in a hierarchical Bayesian model; σ is the standard deviation of the bias term, and assuming bias is randomly distributed across datasets and is on average zero, larger values of σ will indicate a greater magnitude of bias (see Methods). We found that, for randomised datasets, estimates of both R-BACI (using covariance adjustment; CA) and R-CI designs were affected by negligible amounts of bias (very small values of σ; Table  2 ). When the R-BACI design used the DiD estimator, it suffered from slightly more bias (slightly larger values of σ), whereas the BA design had very high bias when applied to randomised datasets (very large values of σ; Table  2 ). There was a highly positive correlation between the estimates of R-BACI (using covariance adjustment) and R-CI designs (Ω[R-BACI CA, R-CI] was close to 1; Table  2 ). Estimates of R-BACI using the DiD estimator were also positively correlated with estimates of R-BACI using covariance adjustment and R-CI designs (moderate positive mean values of Ω[R-BACI CA, R-BACI DiD] and Ω[R-BACI DiD, R-CI]; Table  2 ).

For non-randomised datasets, controlled designs (BACI and CI) were substantially less biased (far smaller values of σ) than the uncontrolled BA design (Table  2 ). A BACI design using the DiD estimator was slightly less biased than the BACI design using covariance adjustment, which was, in turn, slightly less biased than the CI design (Table  2 ).
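The role of σ as the spread of a zero-mean bias term can be illustrated with a much simpler calculation than the hierarchical Bayesian model used in the study. The sketch below assumes that each design estimate deviates from a trusted reference value by a bias drawn with standard deviation σ plus sampling noise with a known standard error, and recovers σ by a method-of-moments calculation. The function names, the simulated data, and the moment estimator itself are illustrative assumptions, not the authors' method.

```python
import random
from math import sqrt
from statistics import fmean


def moment_sigma(deviations, standard_errors):
    """Estimate the bias standard deviation from deviations and their standard errors.

    Assumes deviation = bias + noise, with zero-mean bias and noise variance
    equal to the squared standard error; the result is truncated at zero.
    """
    excess = fmean(d * d for d in deviations) - fmean(s * s for s in standard_errors)
    return sqrt(max(excess, 0.0))


def simulate_deviations(sigma, se, n, seed=0):
    """Simulate n deviations with bias sd `sigma` and sampling noise sd `se`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) + rng.gauss(0.0, se) for _ in range(n)]


deviations = simulate_deviations(sigma=2.0, se=1.0, n=5000)
sigma_hat = moment_sigma(deviations, [1.0] * len(deviations))
print(f"estimated sigma: {sigma_hat:.2f}")
```

A design whose deviations are no larger than its sampling noise would give an estimate near zero, matching the reading that small σ indicates negligible bias.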

Standard errors estimated by the hierarchical Bayesian model were reasonably accurate for the randomised datasets (see λ in Methods and Table  2 ), whereas there was some underestimation of standard errors and lack-of-fit for non-randomised datasets.

Discussion

Our approach provides a principled way to quantify the levels of bias associated with different study designs. We found that randomised study designs (R-BACI and R-CI) and observational BACI designs are poorly represented in the environmental and social sciences; collectively, descriptive case studies (the After design), the uncontrolled, observational BA design, and the controlled, observational CI design made up a substantially greater proportion of intervention studies (Fig.  2 ). And yet R-BACI, R-CI and BACI designs were found to be quantifiably less biased than other observational designs.

As expected the R-CI and R-BACI designs (using a covariance adjustment estimator) performed well; the R-BACI design using a DiD estimator performed slightly less well, probably because the differencing of pre-impact data by this estimator may introduce additional statistical noise compared to covariance adjustment, which controls for these data using a lagged regression variable. Of the observational designs, the BA design performed very poorly (both when analysing randomised and non-randomised data) as expected, being uncontrolled and therefore prone to severe design bias 7 , 28 . The CI design also tended to be more biased than the BACI design (using a DiD estimator) due to pre-existing differences between the impact and control groups. For BACI designs, we recommend that the underlying assumptions of DiD and CA estimators are carefully considered before choosing to apply them to data collected for a specific research question 6 , 27 . Their levels of bias were negligibly different and their known bracketing relationship suggests they will typically give estimates with the same sign, although their tendency to over- or underestimate the true effect will depend on how well the underlying assumptions of each are met (most notably, parallel trends for DiD and no unmeasured confounders for CA; see Introduction) 6 , 27 . Overall, these findings demonstrate the power of large within-study comparisons to directly quantify differences in the levels of bias associated with different designs.

We must acknowledge that the assumptions of our hierarchical model (that the bias for each design (j) is on average zero and normally distributed) cannot be verified without gold standard randomised experiments and that, for observational designs, the model was overdispersed (potentially due to underestimation of statistical error by GLM(M)s or positively correlated design biases). The exact values of our hierarchical model should therefore be treated with appropriate caution, and future research is needed to refine and improve our approach to quantify these biases more precisely. Responses within datasets may also not be independent as multiple species could interact; therefore, the estimates analysed by our hierarchical model are statistically dependent on each other, and although we tried to account for this using a correlation matrix (see Methods, Eq. ( 3 )), this is a limitation of our model. We must also recognise that we collated datasets using non-systematic searches 46 , 47 and therefore our analysis potentially exaggerates the intrinsic biases of observational designs (i.e., our data may disproportionately reflect situations where the BACI design was chosen to account for confounding factors). We nevertheless show that researchers were wise to use the BACI design because it was less biased than CI and BA designs across a wide range of datasets from various environmental systems and locations. Without undertaking costly and time-consuming pre-impact sampling and pilot studies, researchers are also unlikely to know the levels of bias that could affect their results. Finally, we did not consider sample size, but it is likely that researchers might use larger sample sizes for CI and BA designs than BACI designs. This is, however, unlikely to affect our main conclusions because larger sample sizes could increase type I errors (false positive rate) by yielding more precise, but biased estimates of the true effect 28 .

Our analyses provide several empirically supported recommendations for researchers designing future studies to assess an impact of interest. First, using a controlled and/or randomised design (if possible) was shown to strongly reduce the level of bias in study estimates. Second, when observational designs must be used (as randomisation is not feasible or too costly), we urge researchers to choose the BACI design over other observational designs—and when that is not possible, to choose the CI design over the uncontrolled BA design. We acknowledge that limited resources, short funding timescales, and ethical or logistical constraints 48 may force researchers to use the CI design (if randomisation and pre-impact sampling are impossible) or the BA design (if appropriate controls cannot be found 28 ). To facilitate the usage of less biased designs, longer-term investments in research effort and funding are required 43 . Far greater emphasis on study designs in statistical education 49 and better training and collaboration between researchers, practitioners and methodologists, is needed to improve the design of future studies; for example, potentially improving the CI design by pairing or matching the impact group and control group 22 , or improving the BA design using regression discontinuity methods 48 , 50 . Where the choice of study design is limited, researchers must transparently communicate the limitations and uncertainty associated with their results.

Our findings also have wider implications for evidence synthesis, specifically the exclusion of certain observational study designs from syntheses (the ‘rubbish in, rubbish out’ concept 51 , 52 ). We believe that observational designs should be included in systematic reviews and meta-analyses, but that careful adjustments are needed to account for their potential biases. Exclusion of observational studies often results from subjective, checklist-based ‘Risk of Bias’ or quality assessments of studies (e.g., AMSTAR 2 53 , ROBINS-I 54 , or GRADE 55 ) that are not data-driven and often neglect to identify the actual direction, or quantify the magnitude, of possible bias introduced by observational studies when rating the quality of a review’s recommendations. We also found that there was a small proportion of studies that used randomised designs (R-CI or R-BACI) or observational BACI designs (Fig.  2 ), suggesting that systematic reviews and meta-analyses risk excluding a substantial proportion of the literature and limiting the scope of their recommendations if such exclusion criteria are used 32 , 56 , 57 . This problem is compounded by the fact that, at least in conservation science, studies using randomised or BACI designs are strongly concentrated in Europe, Australasia, and North America 31 . Systematic reviews that rely on these few types of study designs are therefore likely to fail to provide decision makers outside of these regions with locally relevant recommendations that they prefer 58 . The Covid-19 pandemic has highlighted the difficulties in making locally relevant evidence-based decisions using studies conducted in different countries with different demographics and cultures, and on patients of different ages, ethnicities, genetics, and underlying health issues 59 .
This problem is also acute for decision-makers working on biodiversity conservation in the tropical regions, where the need for conservation is arguably the greatest (i.e., where most of Earth’s biodiversity exists 60 ) but they either have to rely on very few well-designed studies that are not locally relevant (i.e., have low generalisability), or more studies that are locally relevant but less well-designed 31 , 32 . Either option could lead decision-makers to take ineffective or inefficient decisions. In the long-term, improving the quality and coverage of scientific evidence and evidence syntheses across the world will help solve these issues, but shorter-term solutions to synthesising patchy evidence bases are required.

Our work furthers sorely needed research on how to combine evidence from studies that vary greatly in their design. Our approach is an alternative to conventional meta-analyses which tend to only weight studies by their sample size or the inverse of their variance 61 ; when studies vary greatly in their study design, simply weighting by inverse variance or sample size is unlikely to account for different levels of bias introduced by different study designs (see Equation (1)). For example, a BA study could receive a larger weight if it had lower variance than a BACI study, despite our results suggesting a BA study usually suffers from greater design bias. Our model provides a principled way to weight studies by both their variance and the likely amount of bias introduced by their study design; it is therefore a form of ‘bias-adjusted meta-analysis’ 62 , 63 , 64 , 65 , 66 . However, instead of relying on elicitation of subjective expert opinions on the bias of each study, we provide a data-driven, empirical quantification of study biases – an important step that was called for to improve such meta-analytic approaches 65 , 66 .

Future research is needed to refine our methodology, but our empirically grounded form of bias-adjusted meta-analysis could be implemented as follows: 1.) collate studies for the same true effect, their effect size estimates, standard errors, and the type of study design; 2.) enter these data into our hierarchical model, where effect size estimates share the same intercept (the true causal effect), a random effect term due to design bias (whose variance is estimated by the method we used), and a random effect term for statistical noise (whose variance is estimated by the reported standard error of studies); 3.) fit this model and estimate the shared intercept/true effect. Heuristically, this can be thought of as weighting studies by both their design bias and their sampling variance and could be implemented on a dynamic meta-analysis platform (such as metadataset.com 67 ). This approach has substantial potential to develop evidence synthesis in fields (such as biodiversity conservation 31 , 32 ) with patchy evidence bases, where reliably synthesising findings from studies that vary greatly in their design is a fundamental and unavoidable challenge.
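The heuristic described above (weighting studies by both design bias and sampling variance) can be sketched as a simple inverse-variance pool in which each study's total variance is the sum of a design-bias variance and its squared standard error. This is a deliberately simplified illustration, not the hierarchical Bayesian model itself: it assumes independent estimates, and the design-bias standard deviations in `DESIGN_BIAS_SD`, the `Study` class, and the example values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical design-bias standard deviations (sigma_j); illustrative only,
# not values estimated in the paper.
DESIGN_BIAS_SD = {"R-BACI": 0.1, "BACI": 0.3, "CI": 0.6, "BA": 1.0}


@dataclass
class Study:
    estimate: float   # effect size, e.g. a log response ratio
    std_error: float  # reported standard error
    design: str       # key into DESIGN_BIAS_SD


def bias_adjusted_pool(studies, bias_sd=DESIGN_BIAS_SD):
    """Pool estimates with weights 1 / (sigma_design^2 + se^2)."""
    weights = [
        1.0 / (bias_sd[s.design] ** 2 + s.std_error ** 2) for s in studies
    ]
    total = sum(weights)
    pooled = sum(w * s.estimate for w, s in zip(weights, studies)) / total
    pooled_se = (1.0 / total) ** 0.5
    relative_weights = [w / total for w in weights]
    return pooled, pooled_se, relative_weights


# A precise BA study and a noisier BACI study (made-up numbers).
studies = [
    Study(estimate=0.50, std_error=0.05, design="BA"),
    Study(estimate=0.10, std_error=0.20, design="BACI"),
]
pooled, pooled_se, rel_weights = bias_adjusted_pool(studies)
```

With plain inverse-variance weighting the BA study would dominate because of its small standard error; once the assumed design-bias variance is added, most of the weight shifts to the BACI study.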

Our study has highlighted an often overlooked aspect of debates over scientific reproducibility: that the credibility of studies is fundamentally determined by study design. Testing the effectiveness of conservation and social interventions is undoubtedly of great importance given the current challenges facing biodiversity and society in general and the serious need for more evidence-based decision-making 1 , 68 . And yet our findings suggest that quantifiably less biased study designs are poorly represented in the environmental and social sciences. Greater methodological training of researchers and funding for intervention studies, as well as stronger collaborations between methodologists and practitioners is needed to facilitate the use of less biased study designs. Better communication and reporting of the uncertainty associated with different study designs is also needed, as well as more meta-research (the study of research itself) to improve standards of study design 69 . Our hierarchical model provides a principled way to combine studies using a variety of study designs that vary greatly in their risk of bias, enabling us to make more efficient use of patchy evidence bases. Ultimately, we hope that researchers and practitioners testing interventions will think carefully about the types of study designs they use, and we encourage the evidence synthesis community to embrace alternative methods for combining evidence from heterogeneous sets of studies to improve our ability to inform evidence-based decision-making in all disciplines.

Quantifying the use of different designs

We compared the use of different study designs in the literature that quantitatively tested interventions between the fields of biodiversity conservation (4,260 studies collated by Conservation Evidence 45 ) and social science (1,009 studies found by 32 systematic reviews produced by the Campbell Collaboration: www.campbellcollaboration.org ).

Conservation Evidence is a database of intervention studies, each of which has quantitatively tested a conservation intervention (e.g., sowing strips of wildflower seeds on farmland to benefit birds), that is continuously being updated through comprehensive, manual searches of conservation journals for a wide range of fields in biodiversity conservation (e.g., amphibian, bird, peatland, and farmland conservation 45 ). To obtain the proportion of studies that used each design from Conservation Evidence, we simply extracted the type of study design from each study in the database in 2019 – the study design was determined using a standardised set of criteria; reviews were not included (Table  3 ). We checked if the designs reported in the database accurately reflected the designs in the original publication and found that for a random subset of 356 studies, 95.1% were accurately described.

Each systematic review produced by the Campbell Collaboration collates and analyses studies that test a specific social intervention; we collated systematic reviews that tested a variety of social interventions across several fields in the social sciences, including education, crime and justice, international development and social welfare (Supplementary Data  1 ). We retrieved systematic reviews produced by the Campbell Collaboration by searching their website ( www.campbellcollaboration.org ) for reviews published between 2013‒2019 (as of 8th September 2019) — we limited the date range as we could not go through every review. As we were interested in the use of study designs in the wider social-science literature, we only considered reviews (32 in total) that contained sufficient information on the number of included and excluded studies that used different study designs. Studies may be excluded from systematic reviews for several reasons, such as their relevance to the scope of the review (e.g., testing a relevant intervention) and their study design. We only considered studies if the sole reason for their exclusion from the systematic review was their study design – i.e., reviews clearly reported that the study was excluded because it used a particular study design, and not because of any other reason, such as its relevance to the review’s research questions. We calculated the proportion of studies that used each design in each systematic review (using the same criteria as for the biodiversity-conservation literature – see Table  3 ) and then averaged these proportions across all systematic reviews.
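The final averaging step can be illustrated with a short sketch. It computes the proportion of studies using each design within each review and then takes the unweighted mean of those proportions across reviews, so that a large review does not dominate. The function name and the study counts are hypothetical and are not taken from the Campbell Collaboration reviews.

```python
def mean_design_proportions(reviews):
    """Average per-review design proportions across reviews.

    reviews: list of dicts mapping design name -> number of studies.
    """
    designs = sorted({design for review in reviews for design in review})
    per_review = []
    for review in reviews:
        total = sum(review.values())
        # Designs absent from a review count as a proportion of zero.
        per_review.append({d: review.get(d, 0) / total for d in designs})
    n = len(per_review)
    return {d: sum(p[d] for p in per_review) / n for d in designs}


# Two made-up reviews of very different size.
reviews = [
    {"R-CI": 8, "BA": 2},
    {"R-CI": 10, "CI": 30, "BA": 60},
]
proportions = mean_design_proportions(reviews)
```

Pooling all studies before computing proportions would give a different answer, because the second review contributes ten times as many studies as the first.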

Within-study comparisons of different study designs

We wanted to make direct within-study comparisons between the estimates obtained by different study designs (e.g., see 38 , 70 , 71 for single within-study comparisons) for many different studies. If a dataset contains data collected using a BACI design, subsets of these data can be used to mimic the use of other study designs (a BA design using only data for the impact group, and a CI design using only data collected after the impact occurred). Similarly, if data were collected using a R-BACI design, subsets of these data can be used to mimic the use of a BA design and a R-CI design. Collecting BACI and R-BACI datasets would therefore allow us to make direct within-study comparisons of the estimates obtained by these designs.
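The subsetting idea can be sketched as follows. The example takes records from a BACI-style dataset and derives BA, CI, and BACI estimates as log response ratios of simple group means. This is only an illustration of which subsets each design uses: the analyses in this study used GLMs and GLMMs rather than raw means, and the function name and data values here are hypothetical.

```python
from math import log
from statistics import mean


def design_estimates(records):
    """Estimate an effect from one BACI dataset under three designs.

    records: iterable of (period, group, value) tuples, where period is
    "before" or "after" and group is "control" or "impact".
    Returns log response ratios based on cell means.
    """
    cells = {}
    for period, group, value in records:
        cells.setdefault((period, group), []).append(value)
    m = {key: mean(values) for key, values in cells.items()}

    # BA: impact group only, after vs before.
    ba = log(m[("after", "impact")] / m[("before", "impact")])
    # CI: after period only, impact vs control.
    ci = log(m[("after", "impact")] / m[("after", "control")])
    # BACI (difference-in-differences on the log scale):
    # change in the impact group minus change in the control group.
    baci = ba - log(m[("after", "control")] / m[("before", "control")])
    return {"BA": ba, "CI": ci, "BACI": baci}


# Made-up data: both groups grow by 50%, so the impact has no effect,
# but the impact sites started with higher values than the controls.
records = [
    ("before", "control", 10.0), ("before", "control", 12.0),
    ("before", "impact", 20.0), ("before", "impact", 22.0),
    ("after", "control", 15.0), ("after", "control", 18.0),
    ("after", "impact", 30.0), ("after", "impact", 33.0),
]
estimates = design_estimates(records)
```

In this constructed example the BA estimate picks up the background trend and the CI estimate picks up the pre-existing difference between groups, while the BACI estimate is zero.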

We collated BACI and R-BACI datasets by searching the Web of Science Core Collection 72 which included the following citation indexes: Science Citation Index Expanded (SCI-EXPANDED) 1900-present; Social Sciences Citation Index (SSCI) 1900-present; Arts & Humanities Citation Index (A&HCI) 1975-present; Conference Proceedings Citation Index - Science (CPCI-S) 1990-present; Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH) 1990-present; Book Citation Index - Science (BKCI-S) 2008-present; Book Citation Index - Social Sciences & Humanities (BKCI-SSH) 2008-present; Emerging Sources Citation Index (ESCI) 2015-present; Current Chemical Reactions (CCR-EXPANDED) 1985-present (Includes Institut National de la Propriete Industrielle structure data back to 1840); Index Chemicus (IC) 1993-present. The following search terms were used: [‘BACI’] OR [‘Before-After Control-Impact’] and the search was conducted on the 18th December 2017. Our search returned 674 results, which we then refined by selecting only ‘Article’ as the document type and using only the following Web of Science Categories: ‘Ecology’, ‘Marine Freshwater Biology’, ‘Biodiversity Conservation’, ‘Fisheries’, ‘Oceanography’, ‘Forestry’, ‘Zoology’, ‘Ornithology’, ‘Biology’, ‘Plant Sciences’, ‘Entomology’, ‘Remote Sensing’, ‘Toxicology’ and ‘Soil Science’. This left 579 results, which we then restricted to articles published since 2002 (15 years prior to search) to give us a realistic opportunity to obtain the raw datasets, thus reducing this number to 542. We were able to access the abstracts of 521 studies and excluded any that did not test the effect of an environmental intervention or threat using an R-BACI or BACI design with response measures related to the abundance (e.g., density, counts, biomass, cover), reproduction (reproductive success) or size (body length, body mass) of animals or plants.
Many studies did not test a relevant metric (e.g., they measured species richness), did not use a BACI or R-BACI design, or did not test the effect of an intervention or threat — this left 96 studies for which we contacted all corresponding authors to ask for the raw dataset. We were able to fully access 54 raw datasets, but upon closer inspection we found that three of these datasets either: did not use a BACI design; did not use the metrics we specified; or did not provide sufficient data for our analyses. This left 51 datasets in total that we used in our preliminary analyses (Supplementary Data  2 ).

All the datasets were originally collected to evaluate the effect of an environmental intervention or impact. Most of them contained multiple response variables (e.g., different measures for different species, such as abundance or density for species A, B, and C). Within a dataset, we use the term “response” to refer to the estimation of the true effect of an impact on one response variable. There were 1,968 responses in total across 51 datasets. We then excluded 932 responses (resulting in the exclusion of one dataset) where one or more of the four time-period and treatment subsets (Before Control, Before Impact, After Control, and After Impact data) consisted of entirely zero measurements, or two or more of these subsets had more than 90% zero measurements. We also excluded one further dataset as it was the only one to not contain repeated measurements at sites in both the before- and after-periods. This was necessary to generate reliable standard errors when modelling these data. We modelled the remaining 1,036 responses from across 49 datasets (Supplementary Table  1 ).
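The zero-measurement exclusion rule can be written as a small predicate. The sketch below is one reading of the rule as stated above (exclude if any subset is entirely zeros, or if at least two subsets have more than 90% zeros); the function names, the dictionary layout, and the example data are hypothetical.

```python
SUBSETS = ("before_control", "before_impact", "after_control", "after_impact")


def zero_fraction(values):
    """Fraction of measurements in a subset that are exactly zero."""
    if not values:
        raise ValueError("each subset needs at least one measurement")
    return sum(1 for v in values if v == 0) / len(values)


def should_exclude(response, threshold=0.9):
    """Apply the exclusion rule to one response.

    response: dict mapping each name in SUBSETS to a list of measurements.
    """
    fractions = [zero_fraction(response[name]) for name in SUBSETS]
    any_all_zero = any(f == 1.0 for f in fractions)
    several_mostly_zero = sum(1 for f in fractions if f > threshold) >= 2
    return any_all_zero or several_mostly_zero


sparse = [0] * 19 + [4]  # 95% zeros
dense = [3, 0, 2, 5, 1]  # 20% zeros

kept = dict.fromkeys(SUBSETS, dense)
one_sparse = {**kept, "after_impact": sparse}
two_sparse = {**one_sparse, "before_impact": sparse}
all_zero = {**kept, "after_control": [0, 0, 0]}
```

Under this reading, `kept` and `one_sparse` would be retained, while `two_sparse` and `all_zero` would be excluded.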

We applied each study design to the appropriate components of each dataset using Generalised Linear Models (GLMs 73 , 74 ) because of their generality and ability to implement the statistical estimators of many different study designs. The model structure of GLMs was adjusted for each response in each dataset based on the study design specified, response measure and dataset structure (Supplementary Table  2 ). We quantified the effect of the time period for the BA design (After vs Before the impact) and the effect of the treatment type for the CI and R-CI designs (Impact vs Control) on the response variable (Supplementary Table  2 ). For BACI and R-BACI designs, we implemented two statistical estimators: 1.) a DiD estimator that estimated the true effect using an interaction term between time and treatment type; and 2.) a covariance adjustment estimator that estimated the true effect using a term for the treatment type with a lagged variable (Supplementary Table  2 ).
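As a rough guide to the two BACI estimators, their linear predictors can be written as below. This is a sketch under assumed notation rather than the exact specification (which is given in Supplementary Table 2): \(T_t\) is an indicator for the after period, \(D_s\) is an indicator for the impact group at site \(s\), and all coefficient names are introduced here for illustration only.

```latex
% DiD estimator: the effect is the coefficient on the time x treatment interaction.
\log \mathbb{E}[y_{st}] = \alpha + \beta_1 T_t + \beta_2 D_s
    + \beta_{\mathrm{DiD}} \,(T_t \times D_s)

% Covariance adjustment estimator: after-period response regressed on
% treatment type, with the before-period response as a lagged covariate.
\log \mathbb{E}[y_{s,\mathrm{after}}] = \alpha + \beta_{\mathrm{CA}} D_s
    + \theta \, y_{s,\mathrm{before}}
```

In both cases the coefficient of interest (\(\beta_{\mathrm{DiD}}\) or \(\beta_{\mathrm{CA}}\)) is on the log scale, which matches the use of a log link to obtain a log response ratio.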

As there were large numbers of responses, we used general a priori rules to specify models for each response; this may have led to some model misspecification, but was unlikely to have substantially affected our pairwise comparison of estimates obtained by different designs. The error family of each GLM was specified based on the nature of the measure used and preliminary data exploration: count measures (e.g., abundance) = poisson; density measures (e.g., biomass or abundance per unit area) = quasipoisson, as data for these measures tended to be overdispersed; percentage measures (e.g., percentage cover) = quasibinomial; and size measures (e.g., body length) = gaussian.

We treated each year or season in which data were collected as independent observations because the implementation of a seasonal term in models is likely to vary on a case-by-case basis; this will depend on the research questions posed by each study and was not feasible for us to consider given the large number of responses we were modelling. The log link function was used for all models to generate a standardised log response ratio as an estimate of the true effect for each response; a fixed effect coefficient (a variable named treatment status; Supplementary Table  2 ) was used to estimate the log response ratio 61 . If the response had at least ten ‘sites’ (independent sampling units) and two measurements per site on average, we used the random effects of subsample (replicates within a site) nested within site to capture the dependence within a site and subsample (i.e., a Generalised Linear Mixed Model or GLMM 73 , 74 was implemented instead of a GLM); otherwise we fitted a GLM with only the fixed effects (Supplementary Table  2 ).
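The a priori specification rules from the two preceding paragraphs can be summarised in a small lookup. The sketch below returns plain labels rather than fitting anything; the function name and argument names are hypothetical, and it reads the mixed-model rule as "at least ten sites and an average of at least two measurements per site".

```python
# Error family by type of response measure, as listed in the text.
ERROR_FAMILY = {
    "count": "poisson",
    "density": "quasipoisson",
    "percentage": "quasibinomial",
    "size": "gaussian",
}


def choose_model(measure_type, n_sites, n_measurements):
    """Pick the error family and model type for one response."""
    family = ERROR_FAMILY[measure_type]
    # The average is only evaluated when n_sites >= 10, so there is
    # no division by zero for empty inputs.
    use_glmm = n_sites >= 10 and n_measurements / n_sites >= 2
    return {
        "family": family,
        "link": "log",
        "model": "GLMM" if use_glmm else "GLM",
    }


spec_a = choose_model("count", n_sites=12, n_measurements=30)
spec_b = choose_model("size", n_sites=8, n_measurements=40)
spec_c = choose_model("density", n_sites=10, n_measurements=15)
```

Here `spec_a` meets both thresholds and is assigned a mixed model, whereas `spec_b` has too few sites and `spec_c` too few measurements per site.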

We fitted all models using R version 3.5.1 75 , and packages lme4 76 and MASS 77 . Code to replicate all analyses is available (see Data and Code Availability). We compared the estimates obtained using each study design (both in terms of point estimates and estimates with associated standard error) by their magnitude and sign.

A model-based quantification of the bias in study design estimates

We used a hierarchical Bayesian model motivated by the decomposition in Equation (1) to quantify the bias in different study design estimates. This model takes the estimated effects of impacts and their standard errors as inputs. Let \(\hat \beta _{ij}\) be the true effect estimator in study \(i\) using design \(j\) and \(\hat \sigma _{ij}\) be its estimated standard error from the corresponding GLM or GLMM. Our hierarchical model assumes:

\[\hat \beta _{ij} = \beta _i + \gamma _{ij} + \varepsilon _{ij} \qquad (2)\]

where \(\beta _i\) is the true effect for response \(i\), \(\gamma _{ij}\) is the bias of design \(j\) in response \(i\), and \(\varepsilon _{ij}\) is the sampling noise of the statistical estimator. Although \(\gamma _{ij}\) technically incorporates both the design bias and any misspecification (modelling) bias due to using GLMs or GLMMs (Equation (1)), we expect the modelling bias to be much smaller than the design bias 3 , 11 . We assume the statistical errors \(\varepsilon _i\) within a response are related to the estimated standard errors through the following joint distribution:

\[\varepsilon _i \sim N\left( 0,\ \lambda \cdot \mathrm{diag}(\hat \sigma _i)\, {\Omega}\, \mathrm{diag}(\hat \sigma _i) \right) \qquad (3)\]

where \({\Omega}\) is the correlation matrix for the different estimators in the same response and λ is a scaling factor to account for possible over/under-estimation of the standard errors.

This model effectively quantifies the bias of design \(j\) using the value of \(\sigma _j\) (larger values = more bias) by accounting for within-response correlations using the correlation matrix \({\Omega}\) and for possible under-estimation of the standard error using \(\lambda\) . We ensured that the prior distributions we used had very large variances so they would have a very small effect on the posterior distribution — accordingly we placed the following disperse priors on the variance parameters:

We fitted the hierarchical Bayesian model in R version 3.5.1 using the Bayesian inference package rstan 78 .
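To make the generative structure of the hierarchical model concrete, the sketch below simulates estimates as a true effect plus a design bias plus sampling noise, and then summarises how far each design's estimates fall from the truth. It is not the rstan fit: it assumes independent errors across designs (ignoring \({\Omega}\)), omits the scaling factor \(\lambda\), draws the true effects from a standard normal for convenience, and uses hypothetical values for the design-bias standard deviations.

```python
import random
from statistics import pstdev


def simulate_spread(n_responses, sigma_by_design, std_error=0.1, seed=0):
    """Simulate beta_hat_ij = beta_i + gamma_ij + eps_ij for each design.

    Returns the standard deviation of (beta_hat_ij - beta_i) per design.
    """
    rng = random.Random(seed)
    deviations = {design: [] for design in sigma_by_design}
    for _ in range(n_responses):
        beta_i = rng.gauss(0.0, 1.0)  # true effect (assumed distribution)
        for design, sigma_j in sigma_by_design.items():
            gamma_ij = rng.gauss(0.0, sigma_j)  # design bias, mean zero
            eps_ij = rng.gauss(0.0, std_error)  # sampling noise
            beta_hat_ij = beta_i + gamma_ij + eps_ij
            deviations[design].append(beta_hat_ij - beta_i)
    return {d: pstdev(values) for d, values in deviations.items()}


# Hypothetical design-bias standard deviations (sigma_j).
spread = simulate_spread(2000, {"BACI": 0.3, "BA": 1.0})
```

Designs with a larger \(\sigma _j\) produce estimates that scatter more widely around the true effect, which is the sense in which \(\sigma _j\) quantifies design bias.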

Data availability

All data analysed in the current study are available from Zenodo, https://doi.org/10.5281/zenodo.3560856 .  Source data are provided with this paper.

Code availability

All code used in the current study is available from Zenodo, https://doi.org/10.5281/zenodo.3560856 .

References

Donnelly, C. A. et al. Four principles to make evidence synthesis more useful for policy. Nature 558 , 361–364 (2018).

McKinnon, M. C., Cheng, S. H., Garside, R., Masuda, Y. J. & Miller, D. C. Sustainability: map the evidence. Nature 528 , 185–187 (2015).

Rubin, D. B. For objective causal inference, design trumps analysis. Ann. Appl. Stat. 2 , 808–840 (2008).

Peirce, C. S. & Jastrow, J. On small differences in sensation. Mem. Natl Acad. Sci. 3 , 73–83 (1884).

Fisher, R. A. Statistical methods for research workers . (Oliver and Boyd, 1925).

Angrist, J. D. & Pischke, J.-S. Mostly harmless econometrics: an empiricist’s companion . (Princeton University Press, 2008).

de Palma, A. et al. Challenges with inferring how land-use affects terrestrial biodiversity: study design, time, space and synthesis. in Next Generation Biomonitoring: Part 1 163–199 (Elsevier Ltd., 2018).

Sagarin, R. & Pauchard, A. Observational approaches in ecology open new ground in a changing world. Front. Ecol. Environ. 8 , 379–386 (2010).

Shadish, W. R., Cook, T. D. & Campbell, D. T. Experimental and quasi-experimental designs for generalized causal inference . (Houghton Mifflin, 2002).

Rosenbaum, P. R. Design of observational studies . vol. 10 (Springer, 2010).

Light, R. J., Singer, J. D. & Willett, J. B. By design: Planning research on higher education. (Harvard University Press, 1990).

Ioannidis, J. P. A. Why most published research findings are false. PLOS Med. 2 , e124 (2005).

Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349 , aac4716 (2015).

John, L. K., Loewenstein, G. & Prelec, D. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23 , 524–532 (2012).

Kerr, N. L. HARKing: hypothesizing after the results are known. Personal. Soc. Psychol. Rev. 2 , 196–217 (1998).

Zhao, Q., Keele, L. J. & Small, D. S. Comment: will competition-winning methods for causal inference also succeed in practice? Stat. Sci. 34 , 72–76 (2019).

Friedman, J., Hastie, T. & Tibshirani, R. The Elements of Statistical Learning . vol. 1 (Springer series in statistics, 2001).

Underwood, A. J. Beyond BACI: experimental designs for detecting human environmental impacts on temporal variations in natural populations. Mar. Freshw. Res. 42 , 569–587 (1991).

Stewart-Oaten, A. & Bence, J. R. Temporal and spatial variation in environmental impact assessment. Ecol. Monogr. 71 , 305–339 (2001).

Eddy, T. D., Pande, A. & Gardner, J. P. A. Massive differential site-specific and species-specific responses of temperate reef fishes to marine reserve protection. Glob. Ecol. Conserv. 1 , 13–26 (2014).

Sher, A. A. et al. Native species recovery after reduction of an invasive tree by biological control with and without active removal. Ecol. Eng. 111 , 167–175 (2018).

Imbens, G. W. & Rubin, D. B. Causal Inference in Statistics, Social, and Biomedical Sciences . (Cambridge University Press, 2015).

Greenhalgh, T. How to read a paper: the basics of Evidence Based Medicine . (John Wiley & Sons, Ltd, 2019).

Salmond, S. S. Randomized Controlled Trials: Methodological Concepts and Critique. Orthopaedic Nursing 27 , (2008).

Geijzendorffer, I. R. et al. How can global conventions for biodiversity and ecosystem services guide local conservation actions? Curr. Opin. Environ. Sustainability 29 , 145–150 (2017).

Dimick, J. B. & Ryan, A. M. Methods for evaluating changes in health care policy. JAMA 312 , 2401 (2014).

Ding, P. & Li, F. A bracketing relationship between difference-in-differences and lagged-dependent-variable adjustment. Political Anal. 27 , 605–615 (2019).

Christie, A. P. et al. Simple study designs in ecology produce inaccurate estimates of biodiversity responses. J. Appl. Ecol. 56 , 2742–2754 (2019).

Watson, M. et al. An analysis of the quality of experimental design and reliability of results in tribology research. Wear 426–427 , 1712–1718 (2019).

Kilkenny, C. et al. Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS ONE 4 , e7824 (2009).

Christie, A. P. et al. The challenge of biased evidence in conservation. Conserv. Biol. 13577, https://doi.org/10.1111/cobi.13577 (2020).

Christie, A. P. et al. Poor availability of context-specific evidence hampers decision-making in conservation. Biol. Conserv. 248 , 108666 (2020).

Moscoe, E., Bor, J. & Bärnighausen, T. Regression discontinuity designs are underutilized in medicine, epidemiology, and public health: a review of current and best practice. J. Clin. Epidemiol. 68 , 132–143 (2015).

Goldenhar, L. M. & Schulte, P. A. Intervention research in occupational health and safety. J. Occup. Med. 36 , 763–778 (1994).

Junker, J. et al. A severe lack of evidence limits effective conservation of the World’s primates. BioScience https://doi.org/10.1093/biosci/biaa082 (2020).

Altindag, O., Joyce, T. J. & Reeder, J. A. Can Nonexperimental Methods Provide Unbiased Estimates of a Breastfeeding Intervention? A Within-Study Comparison of Peer Counseling in Oregon. Evaluation Rev. 43 , 152–188 (2019).

Chaplin, D. D. et al. The Internal And External Validity Of The Regression Discontinuity Design: A Meta-Analysis Of 15 Within-Study Comparisons. J. Policy Anal. Manag. 37 , 403–429 (2018).

Cook, T. D., Shadish, W. R. & Wong, V. C. Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. J. Policy Anal. Manag. 27 , 724–750 (2008).

Ioannidis, J. P. A. et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. J. Am. Med. Assoc. 286 , 821–830 (2001).

dos Santos Ribas, L. G., Pressey, R. L., Loyola, R. & Bini, L. M. A global comparative analysis of impact evaluation methods in estimating the effectiveness of protected areas. Biol. Conserv. 246 , 108595 (2020).

Benson, K. & Hartz, A. J. A Comparison of Observational Studies and Randomized, Controlled Trials. N. Engl. J. Med. 342 , 1878–1886 (2000).

Smokorowski, K. E. et al. Cautions on using the Before-After-Control-Impact design in environmental effects monitoring programs. Facets 2 , 212–232 (2017).

França, F. et al. Do space-for-time assessments underestimate the impacts of logging on tropical biodiversity? An Amazonian case study using dung beetles. J. Appl. Ecol. 53 , 1098–1105 (2016).

Duvendack, M., Hombrados, J. G., Palmer-Jones, R. & Waddington, H. Assessing ‘what works’ in international development: meta-analysis for sophisticated dummies. J. Dev. Effectiveness 4 , 456–471 (2012).

Sutherland, W. J. et al. Building a tool to overcome barriers in research-implementation spaces: The Conservation Evidence database. Biol. Conserv. 238 , 108199 (2019).

Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11 , 181–217 (2020).

Konno, K. & Pullin, A. S. Assessing the risk of bias in choice of search sources for environmental meta‐analyses. Res. Synth. Methods 11 , 698–713 (2020).

Butsic, V., Lewis, D. J., Radeloff, V. C., Baumann, M. & Kuemmerle, T. Quasi-experimental methods enable stronger inferences from observational data in ecology. Basic Appl. Ecol. 19 , 1–10 (2017).

Brownstein, N. C., Louis, T. A., O’Hagan, A. & Pendergast, J. The role of expert judgment in statistical inference and evidence-based decision-making. Am. Statistician 73 , 56–68 (2019).

Hahn, J., Todd, P. & Klaauw, W. Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica 69 , 201–209 (2001).

Slavin, R. E. Best evidence synthesis: an intelligent alternative to meta-analysis. J. Clin. Epidemiol. 48 , 9–18 (1995).

Slavin, R. E. Best-evidence synthesis: an alternative to meta-analytic and traditional reviews. Educ. Researcher 15 , 5–11 (1986).

Shea, B. J. et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ (Online) 358 , 1–8 (2017).

Sterne, J. A. C. et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ 355 , i4919 (2016).

Guyatt, G. et al. GRADE guidelines: 11. Making an overall rating of confidence in effect estimates for a single outcome and for all outcomes. J. Clin. Epidemiol. 66 , 151–157 (2013).

Davies, G. M. & Gray, A. Don’t let spurious accusations of pseudoreplication limit our ability to learn from natural experiments (and other messy kinds of ecological monitoring). Ecol. Evolution 5 , 5295–5304 (2015).

Lortie, C. J., Stewart, G., Rothstein, H. & Lau, J. How to critically read ecological meta-analyses. Res. Synth. Methods 6 , 124–133 (2015).

Gutzat, F. & Dormann, C. F. Exploration of concerns about the evidence-based guideline approach in conservation management: hints from medical practice. Environ. Manag. 66 , 435–449 (2020).

Greenhalgh, T. Will COVID-19 be evidence-based medicine’s nemesis? PLOS Med. 17 , e1003266 (2020).

Barlow, J. et al. The future of hyperdiverse tropical ecosystems. Nature 559 , 517–526 (2018).

Gurevitch, J. & Hedges, L. V. Statistical issues in ecological meta‐analyses. Ecology 80 , 1142–1149 (1999).

Stone, J. C., Glass, K., Munn, Z., Tugwell, P. & Doi, S. A. R. Comparison of bias adjustment methods in meta-analysis suggests that quality effects modeling may have less limitations than other approaches. J. Clin. Epidemiol. 117 , 36–45 (2020).

Rhodes, K. M. et al. Adjusting trial results for biases in meta-analysis: combining data-based evidence on bias with detailed trial assessment. J. R. Stat. Soc.: Ser. A (Stat. Soc.) 183 , 193–209 (2020).

Efthimiou, O. et al. Combining randomized and non-randomized evidence in network meta-analysis. Stat. Med. 36 , 1210–1226 (2017).

Welton, N. J., Ades, A. E., Carlin, J. B., Altman, D. G. & Sterne, J. A. C. Models for potentially biased evidence in meta-analysis using empirically based priors. J. R. Stat. Soc. Ser. A (Stat. Soc.) 172 , 119–136 (2009).

Turner, R. M., Spiegelhalter, D. J., Smith, G. C. S. & Thompson, S. G. Bias modelling in evidence synthesis. J. R. Stat. Soc.: Ser. A (Stat. Soc.) 172 , 21–47 (2009).

Shackelford, G. E. et al. Dynamic meta-analysis: a method of using global evidence for local decision making. bioRxiv 2020.05.18.078840, https://doi.org/10.1101/2020.05.18.078840 (2020).

Sutherland, W. J., Pullin, A. S., Dolman, P. M. & Knight, T. M. The need for evidence-based conservation. Trends Ecol. Evolution 19 , 305–308 (2004).

Ioannidis, J. P. A. Meta-research: Why research on research matters. PLOS Biol. 16 , e2005468 (2018).

LaLonde, R. J. Evaluating the econometric evaluations of training programs with experimental data. Am. Econ. Rev. 76 , 604–620 (1986).

Long, Q., Little, R. J. & Lin, X. Causal inference in hybrid intervention trials involving treatment choice. J. Am. Stat. Assoc. 103 , 474–484 (2008).

Article   MathSciNet   CAS   MATH   Google Scholar  

Thomson Reuters. ISI Web of Knowledge. http://www.isiwebofknowledge.com (2019).

Stroup, W. W. Generalized linear mixed models: modern concepts, methods and applications . (CRC press, 2012).

Bolker, B. M. et al. Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol. Evolution 24 , 127–135 (2009).

R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing (2019).

Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67 , 1–48 (2015).

Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S . (Springer, 2002).

Stan Development Team. RStan: the R interface to Stan. R package version 2.19.3 (2020).

Download references

Acknowledgements

We are grateful to the following people and organisations for contributing datasets to this analysis: P. Edwards, G.R. Hodgson, H. Welsh, J.V. Vieira, authors of van Deurs et al. 2012, T. M. Grome, M. Kaspersen, H. Jensen, C. Stenberg, T. K. Sørensen, J. Støttrup, T. Warnar, H. Mosegaard, Axel Schwerk, Alberto Velando, Dolores River Restoration Partnership, J.S. Pinilla, A. Page, M. Dasey, D. Maguire, J. Barlow, J. Louzada, Jari Florestal, R.T. Buxton, C.R. Schacter, J. Seoane, M.G. Conners, K. Nickel, G. Marakovich, A. Wright, G. Soprone, CSIRO, A. Elosegi, L. García-Arberas, J. Díez, A. Rallo, Parks and Wildlife Finland, Parc Marin de la Côte Bleue. Author funding sources: T.A. was supported by the Grantham Foundation for the Protection of the Environment, Kenneth Miller Trust and Australian Research Council Future Fellowship (FT180100354); W.J.S. and P.A.M. were supported by Arcadia, MAVA, and The David and Claudia Harding Foundation; A.P.C. was supported by the Natural Environment Research Council via Cambridge Earth System Science NERC DTP (NE/L002507/1); D.A. was funded by Portugal national funds through the FCT – Foundation for Science and Technology, under the Transitional Standard – DL57 / 2016 and through the strategic project UIDB/04326/2020; M.A. acknowledges Koniambo Nickel SAS, and particularly Gregory Marakovich and Andy Wright; J.C.A. was funded through by Dirección General de Investigación Científica, projects PB97-1252, BOS2002-01543, CGL2005-04893/BOS, CGL2008-02567 and Comunidad de Madrid, as well as by contract HENARSA-CSIC 2003469-CSIC19637; A.A. was funded by Spanish Government: MEC (CGL2007-65176); B.P.B. was funded through the U.S. Geological Survey and the New York City Department of Environmental Protection; R.B. was funded by Comunidad de Madrid (2018-T1/AMB-10374); J.A.S. and D.A.B. were funded through the U.S. Geological Survey and NextEra Energy; R.S.C. 
was funded by the Portuguese Foundation for Science and Technology (FCT) grant SFRH/BD/78813/2011 and strategic project UID/MAR/04292/2013; A.D.B. was funded through the Belgian offshore wind monitoring program (WINMON-BE), financed by the Belgian offshore wind energy sector via RBINS—OD Nature; M.K.D. was funded by the Harold L. Castle Foundation; P.M.E. was funded by the Clackamas County Water Environment Services River Health Stewardship Program and the Portland State University Student Watershed Research Project; T.D.E., J.P.A.G. and A.P. were supported by funding from the New Zealand Department of Conservation (Te Papa Atawhai) and from the Centre for Marine Environmental & Economic Research, Victoria University of Wellington, New Zealand; F.M.F. was funded by CNPq-CAPES grants (PELD site 23 403811/2012-0, PELD-RAS 441659/2016-0, BEX5528/13-5 and 383744/2015-6) and BNP Paribas Foundation (Climate & Biodiversity Initiative, BIOCLIMATE project); B.P.H. was funded by NOAA-NMFS sea scallop research set-aside program awards NA16FM1031, NA06FM1001, NA16FM2416, and NA04NMF4720332; A.L.B. was funded by the Portuguese Foundation for Science and Technology (FCT) grant FCT PD/BD/52597/2014, Bat Conservation International student research fellowship and CNPq grant 160049/2013-0; L.C.M. acknowledges Secretaría de Ciencia y Técnica (UNRC); R.A.M. acknowledges Alaska Fisheries Science Center, NOAA Fisheries, and U.S. Department of Commerce for salary support; C.F.J.M. was funded by the Portuguese Foundation for Science and Technology (FCT) grant SFRH/BD/80488/2011; R.R. was funded by the Portuguese Foundation for Science and Technology (FCT) grant PTDC/BIA-BIC/111184/2009, by Madeira’s Regional Agency for the Development of Research, Technology and Innovation (ARDITI) grant M1420-09-5369-FSE-000002 and by a Bat Conservation International student research fellowship; J.C. and S.S. were funded by the Alabama Department of Conservation and Natural Resources; A.T. 
was funded by the Spanish Ministry of Education with a Formacion de Profesorado Universitario (FPU) grant AP2008-00577 and Dirección General de Investigación Científica, project CGL2008-02567; C.W. was funded by Strategic Science Investment Funding of the Ministry of Business, Innovation and Employment, New Zealand; J.S.K. acknowledges Boreal Peatland LIFE (LIFE08 NAT/FIN/000596), Parks and Wildlife Finland and Kone Foundation; J.J.S.S. was funded by the Mexican National Council on Science and Technology (CONACYT 242558); N.N. was funded by The Carl Tryggers Foundation; I.L.J. was funded by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada; D.D. and D.S. were funded by the French National Research Agency via the “Investment for the Future” program IDEALG (ANR-10-BTBR-04) and by the ALGMARBIO project; R.C.P. was funded by CSIRO and whose research was also supported by funds from the Great Barrier Reef Marine Park Authority, the Fisheries Research and Development Corporation, the Australian Fisheries Management Authority, and Queensland Department of Primary Industries (QDPI). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. The scientific results and conclusions, as well as any views or opinions expressed herein, are those of the author(s) and do not necessarily reflect those of NOAA or the Department of Commerce.

Author information

Authors and affiliations.

Conservation Science Group, Department of Zoology, University of Cambridge, The David Attenborough Building, Downing Street, Cambridge, CB3 3QZ, UK

Alec P. Christie, Philip A. Martin & William J. Sutherland

Centre of Marine Sciences (CCMar), Universidade do Algarve, Campus de Gambelas, 8005-139, Faro, Portugal

David Abecasis

Institut de Recherche pour le Développement (IRD), UMR 9220 ENTROPIE & Laboratoire d’Excellence CORAIL, Université de Perpignan Via Domitia, 52 avenue Paul Alduy, 66860, Perpignan, France

Mehdi Adjeroud

Museo Nacional de Ciencias Naturales, CSIC, Madrid, Spain

Juan C. Alonso & Carlos Palacín

School of Biological Sciences, University of Queensland, Brisbane, 4072, QLD, Australia

Tatsuya Amano

Education Faculty of Bilbao, University of the Basque Country (UPV/EHU). Sarriena z/g E-48940 Leioa, Basque Country, Spain

Alvaro Anton

U.S. Geological Survey, New York Water Science Center, 425 Jordan Rd., Troy, NY, 12180, USA

Barry P. Baldigo

Universidad Complutense de Madrid, Departamento de Biodiversidad, Ecología y Evolución, Facultad de Ciencias Biológicas, c/ José Antonio Novais, 12, E-28040, Madrid, Spain

Rafael Barrientos & Carlos A. Martín

Durrell Institute of Conservation and Ecology (DICE), School of Anthropology and Conservation, University of Kent, Canterbury, CT2 7NR, UK

Jake E. Bicknell

U.S. Geological Survey, Northern Prairie Wildlife Research Center, Jamestown, ND, 58401, USA

Deborah A. Buhl & Jill A. Shaffer

Northern Gulf Institute, Mississippi State University, 1021 Balch Blvd, John C. Stennis Space Center, Mississippi, 39529, USA

Just Cebrian

MARE – Marine and Environmental Sciences Centre, Dept. Life Sciences, University of Coimbra, Coimbra, Portugal

Ricardo S. Ceia

CFE – Centre for Functional Ecology, Dept. Life Sciences, University of Coimbra, Coimbra, Portugal

Departamento de Ciencias Naturales, Universidad Nacional de Río Cuarto (UNRC), Córdoba, Argentina

Luciana Cibils-Martina

CONICET, Buenos Aires, Argentina

Marine Institute, Rinville, Oranmore, Galway, Ireland

Sarah Clarke & Oliver Tully

National Center for Scientific Research, PSL Université Paris, CRIOBE, USR 3278 CNRS-EPHE-UPVD, Maison des Océans, 195 rue Saint-Jacques, 75005, Paris, France

Joachim Claudet

School of Biological Sciences, University of Western Australia, Nedlands, WA, 6009, Australia

Michael D. Craig

School of Environmental and Conservation Sciences, Murdoch University, Murdoch, WA, 6150, Australia

Sorbonne Université, CNRS, UMR 7144, Station Biologique, F.29680, Roscoff, France

Dominique Davoult & Doriane Stagnol

Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Ankerstraat 1, 8400, Ostend, Belgium

Annelies De Backer

Marine Science Institute, University of California Santa Barbara, Santa Barbara, CA, 93106, USA

Mary K. Donovan

Hawaii Institute of Marine Biology, University of Hawaii at Manoa, Honolulu, HI, 96822, USA

Baruch Institute for Marine & Coastal Sciences, University of South Carolina, Columbia, SC, USA

Tyler D. Eddy

Centre for Fisheries Ecosystems Research, Fisheries & Marine Institute, Memorial University of Newfoundland, St. John’s, Canada

School of Biological Sciences, Victoria University of Wellington, P O Box 600, Wellington, 6140, New Zealand

Tyler D. Eddy, Jonathan P. A. Gardner & Anjali Pande

Lancaster Environment Centre, Lancaster University, LA1 4YQ, Lancaster, UK

Filipe M. França

Fisheries, Aquatic Science and Technology Laboratory, Alaska Pacific University, 4101 University Dr., Anchorage, AK, 99508, USA

Bradley P. Harris

Natural Resources Institute Finland, Manamansalontie 90, 88300, Paltamo, Finland

Department of Biology, Memorial University, St. John’s, NL, A1B 2R3, Canada

Ian L. Jones

National Marine Science Centre and Marine Ecology Research Centre, Southern Cross University, 2 Bay Drive, Coffs Harbour, 2450, Australia

Brendan P. Kelaher

Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland

Janne S. Kotiaho

School of Resource Wisdom, University of Jyväskylä, Jyväskylä, Finland

Centre for Ecology, Evolution and Environmental Changes – cE3c, Faculty of Sciences, University of Lisbon, 1749-016, Lisbon, Portugal

Adrià López-Baucells, Christoph F. J. Meyer & Ricardo Rocha

Biological Dynamics of Forest Fragments Project, National Institute for Amazonian Research and Smithsonian Tropical Research Institute, 69011-970, Manaus, Brazil

Granollers Museum of Natural History, Granollers, Spain

Adrià López-Baucells

Department of Biological Sciences, University of New Brunswick, PO Box 5050, Saint John, NB, E2L 4L5, Canada

Heather L. Major

Voimalohi Oy, Voimatie 23, Voimatie, 91100, Ii, Finland

Aki Mäki-Petäys

Natural Resources Institute Finland, Paavo Havaksen tie 3, 90014 University of Oulu, Oulu, Finland

Fundación Migres CIMA Ctra, Cádiz, Spain

Beatriz Martín

Intergovernmental Oceanographic Commission of UNESCO, Marine Policy and Regional Coordination Section Paris 07, Paris, France

BioRISC, St. Catharine’s College, Cambridge, CB2 1RL, UK

Philip A. Martin & William J. Sutherland

Departamento de Ecología e Hidrología, Universidad de Murcia, Campus de Espinardo, 30100, Murcia, Spain

Daniel Mateos-Molina

RACE Division, Alaska Fisheries Science Center, National Marine Fisheries Service, NOAA, 7600 Sand Point Way NE, Seattle, WA, 98115, USA

Robert A. McConnaughey

European Commission, Joint Research Centre (JRC), Ispra, VA, Italy

Michele Meroni

School of Science, Engineering and Environment, University of Salford, Salford, M5 4WT, UK

Christoph F. J. Meyer

Victorian National Park Association, Carlton, VIC, Australia

Department of Earth, Environment and Life Sciences (DiSTAV), University of Genoa, Corso Europa 26, 16132, Genoa, Italy

Monica Montefalcone

Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden

Norbertas Noreika

Chair of Plant Health, Institute of Agricultural and Environmental Sciences, Estonian University of Life Sciences, Tartu, Estonia

Biosecurity New Zealand – Tiakitanga Pūtaiao Aotearoa, Ministry for Primary Industries – Manatū Ahu Matua, 66 Ward St, PO Box 40742, Wallaceville, New Zealand

Anjali Pande

National Institute of Water & Atmospheric Research Ltd (NIWA), 301 Evans Bay Parade, Greta Point Wellington, New Zealand

CSIRO Oceans & Atmosphere, Queensland Biosciences Precinct, 306 Carmody Road, ST. LUCIA QLD, 4067, Australia

C. Roland Pitcher

Museo Nacional de Ciencias Naturales, CSIC, José Gutiérrez Abascal 2, E-28006, Madrid, Spain

Carlos Ponce

Fort Keogh Livestock and Range Research Laboratory, 243 Fort Keogh Rd, Miles City, Montana, 59301, USA

Matt Rinella

CIBIO-InBIO, Research Centre in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal

Ricardo Rocha

Departamento de Sistemas Físicos, Químicos y Naturales, Universidad Pablo de Olavide, ES-41013, Sevilla, Spain

María C. Ruiz-Delgado

El Colegio de la Frontera Sur, A.P. 424, 77000, Chetumal, QR, Mexico

Juan J. Schmitter-Soto

Division of Fish and Wildlife, New York State Department of Environmental Conservation, 625 Broadway, Albany, NY, 12233-4756, USA

Shailesh Sharma

University of Denver Department of Biological Sciences, Denver, CO, USA

Anna A. Sher

U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, 80526, USA

Thomas R. Stanley

School for Marine Science and Technology, University of Massachusetts Dartmouth, New Bedford, MA, USA

Kevin D. E. Stokesbury

Georges Lemaître Earth and Climate Research Centre, Earth and Life Institute, Université Catholique de Louvain, 1348, Louvain-la-Neuve, Belgium

Aurora Torres

Center for Systems Integration and Sustainability, Department of Fisheries and Wildlife, 13 Michigan State University, East Lansing, MI, 48823, USA

Natural Resources Institute Finland, Latokartanonkaari 9, 00790, Helsinki, Finland

Teppo Vehanen

Manaaki Whenua – Landcare Research, Private Bag 3127, Hamilton, 3216, New Zealand

Corinne Watts

Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, UK

Qingyuan Zhao

Contributions

A.P.C., T.A., P.A.M., Q.Z., and W.J.S. designed the research; A.P.C. wrote the paper; D.A., M.A., J.C.A., A.A., B.P.B., R.B., J.B., D.A.B., J.C., R.S.C., L.C.M., S.C., J.C., M.D.C., D.D., A.D.B., M.K.D., T.D.E., P.M.E., F.M.F., J.P.A.G., B.P.H., A.H., I.L.J., B.P.K., J.S.K., A.L.B., H.L.M., A.M., B.M., C.A.M., D.M., R.A.M., M.M., C.F.J.M., K.M., M.M., N.N., C.P., A.P., C.R.P., C.P., M.R., R.R., M.C.R., J.J.S.S., J.A.S., S.S., A.A.S., D.S., K.D.E.S., T.R.S., A.T., O.T., T.V., and C.W. contributed datasets for analyses. All authors reviewed, edited, and approved the manuscript.

Corresponding author

Correspondence to Alec P. Christie.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks Casper Albers, Samuel Scheiner, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Christie, A.P., Abecasis, D., Adjeroud, M. et al. Quantifying and addressing the prevalence and bias of study designs in the environmental and social sciences. Nat Commun 11, 6377 (2020). https://doi.org/10.1038/s41467-020-20142-y

Received: 29 January 2020

Accepted: 13 November 2020

Published: 11 December 2020

DOI: https://doi.org/10.1038/s41467-020-20142-y

Understanding Bias in Peer Review

November 30, 2017

Posted by Andrew Tomkins, Director of Engineering and William D. Heavlin, Statistician, Google Research

  • We invited a number of experts to join the conference Program Committee (PC).
  • We randomly split these PC members into a single-blind cadre and a double-blind cadre.
  • We asked all PC members to “bid” for papers they were qualified to review, but only the single-blind cadre had access to the names and institutions of the paper authors.
  • Based on the resulting bids, we then allocated two single-blind and two double-blind PC members to each paper.
  • Each PC member read his or her assigned papers and entered reviews, again with only single-blind PC members able to see the authors and institutions.
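The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used for the experiment: the function names `split_cadres` and `allocate`, the format of `bids`, and the "first eligible bidders win" rule are all hypothetical, and a real assignment would also balance reviewer load.

```python
import random


def split_cadres(members, seed=0):
    """Randomly split PC members into a single-blind and a double-blind cadre."""
    rng = random.Random(seed)
    shuffled = list(members)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def allocate(bids, single, double, per_cadre=2):
    """Pick `per_cadre` bidders from each cadre for every paper.

    `bids` maps a paper id to the list of PC members who bid on it
    (hypothetical format; earlier entries are preferred).
    """
    assignment = {}
    for paper, bidders in bids.items():
        single_blind = [m for m in bidders if m in single][:per_cadre]
        double_blind = [m for m in bidders if m in double][:per_cadre]
        if len(single_blind) < per_cadre or len(double_blind) < per_cadre:
            raise ValueError(f"not enough bids from both cadres for {paper}")
        assignment[paper] = {
            "single_blind": single_blind,  # these reviewers see authors and institutions
            "double_blind": double_blind,  # these reviewers do not
        }
    return assignment


# Toy example with made-up identifiers.
members = [f"pc{i}" for i in range(8)]
single, double = split_cadres(members, seed=1)
bids = {"paper-A": members, "paper-B": list(reversed(members))}
assignment = allocate(bids, single, double)
```

Because cadre membership is decided by a random shuffle before bidding, any systematic difference between the two groups' bids or reviews can be attributed to what they were shown rather than to who they are.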

Minimising or Avoiding Bias during Peer Review

By Charlesworth Author Services

30 August, 2022

Bias in scholarly research can affect study findings before, during or after the research has been conducted. Did you know that academic peer review can also introduce bias?

Overview of bias in peer review

Peer review is meant to be a neutral, impartial assessment of the novelty, rigour and scientific merit of a study. Any deviation from objectivity in academic peer review is considered biased peer review. Journal editors or peer reviewers might be swayed by conscious or subconscious biases when deciding which manuscripts are reviewed or accepted for publication. Biased peer review compounds other biases, such as publication bias (e.g. preventing publication by authors of a particular region).

To avoid disappointment, wasted time and the amplification of other research biases, all stakeholders in the peer review process need to be more aware of biased peer review so that they can avoid or minimise it.

Common reasons for biased peer review

a. Biases toward or against certain author profiles

When peer reviewers know the identity of the authors, biases against their institute, geography, culture or gender might come into play. For example …

Simply based on the lead author’s country, a peer reviewer may have preconceived notions about the quality of the language used within the manuscript, without even reading it.

Such pre-review preconceptions can colour the peer review report and make the reviewer pass unfair judgement on the quality of the work.

Such a bias can creep in even before actual peer review, i.e. it might be introduced by the journal editor, who might make judgments based on the institute or country and decide not to send the manuscript for peer review in the first place.

Bias need not necessarily be negative. Reviewers might perform a cursory review and offer unmerited glowing reviews simply based on the reputation of the authors and their affiliations.

b. Prejudice towards specific findings

Research is, by definition, an exercise in discovery. On occasion, new discoveries may contradict long- and deeply held beliefs, which can test even normally inquisitive and dispassionate peer reviewers.

  • When encountering novel and unanticipated reports on research, some reviewers might hold rigidly to their own ideas and ask authors to delete outcomes or modify analyses.
  • Alternatively, other reviewers might discourage the publication of nonsignificant results.

Such practices lead to bias in outcomes and reporting, as well as publication bias.

c. Conflicts of interest (CoIs)

  • If a reviewer knows that a manuscript is written by someone with whom they have a personal grudge or professional rivalry, they might give harsh feedback.
  • On the other hand, they might provide an inappropriately favourable review to a paper authored by a friend or erstwhile student.
  • A reviewer might even feel obliged to give a positive review if they were suggested by the author to act as a reviewer.

Minimising bias in peer review

a. Suggesting peer reviewers ethically

Journals are increasingly asking authors for reviewer suggestions.

  • Researchers must ensure that the suggestions are free of potential CoIs – positive or negative.
  • On the journal side, editorial offices and editors need to carefully vet all new reviewers, especially those suggested by manuscript authors.

b. Declining peer review if needed

As a reviewer, when you are sure that the identity of a manuscript’s author has a good chance of affecting your judgement, you should decline the review invitation. Let the journal editor know why, and perhaps suggest alternate reviewers who could provide a more objective evaluation.

c. Conducting a blinded peer review

Blinding can help reduce bias in peer review.

  • In double-blind peer review, the identities of authors and reviewers are concealed from each other.
  • Some journals have even introduced triple-blind peer review, where the authors’ identity is also hidden from the journal editors.

These types of blinding efforts help reviewers focus on the content of an assigned manuscript, rather than a paper’s authorship or institutional affiliation.

d. Conducting an open peer review

In open peer review, the peer review history, including reviewer comments and author responses, is made publicly available. Such a system increases transparency. It also encourages reviewers to be (more) constructive in their comments.

e. Recognising diversity, equity and inclusion in peer review

The quality, objectivity and breadth of research are enhanced when all members of the research community have access to equitable dissemination of their research findings. Publishers and research institutions are well-positioned to provide training to promote diversity and inclusion in peer review. When journals and publishers have diverse editorial boards and diverse pools of reviewers, everyone benefits and knowledge advances freely. Indeed, awareness and open conversations around diversity should be encouraged in academia.

There is a pressing need to minimise bias and improve transparency in academic peer review. Realise that every researcher is a potential peer reviewer. Thus, as a researcher, you should suggest reviewers ethically, and as a reviewer, strive to be impartial when critiquing the work of others.


Feature Article

Research: Gender bias in scholarly peer review

  • Markus Helmer
  • Manuel Schottdorf
  • Andreas Neef
  • Demian Battaglia

  • Max Planck Institute for Dynamics and Self-Organization, Germany
  • Bernstein Center for Computational Neuroscience, Germany
  • Yale University, United States
  • Aix-Marseille University, France

Peer review is the cornerstone of scholarly publishing and it is essential that peer reviewers are appointed on the basis of their expertise alone. However, it is difficult to check for any bias in the peer-review process because the identity of peer reviewers generally remains confidential. Here, using public information about the identities of 9000 editors and 43000 reviewers from the Frontiers series of journals, we show that women are underrepresented in the peer-review process, that editors of both genders operate with substantial same-gender preference (homophily), and that the mechanisms of this homophily are gender-dependent. We also show that homophily will persist even if numerical parity between genders is reached, highlighting the need for increased efforts to combat subtler forms of gender bias in scholarly publishing.

Peer review has an important role in improving the quality of research papers. It is the “lifeblood of research in academia […] the social structure that subjects research to the critical assessment of other researchers” (Bourdieu, 1975). This structure relies on self-regulated interactions within the scientific community, in which a journal editor appoints peer reviewers with expertise in the subject of a particular manuscript to report on the quality of that manuscript and to provide recommendations for its improvement. Other attributes of the peer reviewer, such as their gender, should be irrelevant (Moss-Racusin et al., 2012; Nature, 2013). However, the identities of peer reviewers and editors are usually confidential, so previous work on gender balance in the peer-review process has relied on small, monodisciplinary data sets, and these studies have given partly contradictory reports (Lloyd, 1990; Gilbert et al., 1994; Budden et al., 2008; Borsuk et al., 2009; Knobloch-Westerwick et al., 2013; Larivière et al., 2013; Buckley et al., 2014; Demarest et al., 2014; Handley et al., 2015b; Fox et al., 2016).

Frontiers journals (www.frontiersin.org) differ from most journals in that they generally disclose the identities of peer reviewers and associate editors alongside each published article in an attempt to increase the transparency and quality of the publication process (Poynder, 2016). This allowed us to extract the names of associate editors, peer reviewers and authors for articles published in Frontiers journals between 2007 (when the first Frontiers journal was published) and the end of 2015. This data set included the names of more than 9000 editors, 43,000 reviewers, and 126,000 authors for about 41,000 articles published in 142 journals in Science, Health, Engineering and the Humanities and Social Sciences (see Materials and methods). This data set is one of the largest available to date, and contains at least an order of magnitude more information than most data sets used in previous studies of peer review (see Supplementary file 2 for comparison).

Analysis of this data set reveals that women are underrepresented in the peer-review process, and that editors of both genders operate with substantial same-gender preference (homophily) when appointing reviewers. Moreover, our analysis suggests that this homophilic tendency will persist even when men and women are fairly represented in the peer-review process. Our results confirm the need for increased efforts to fight against subtler forms of gender bias in scholarly publishing and not just focus on numerical under-representation alone.

To assess whether our data set was representative of an active and mature research community, we created directed networks (Figure 1a1), in which individual scientists appeared as vertices, while arrows denoted interactions between them (“is appointing” in the editor-to-reviewer network, and “is editing (reviewing) a manuscript of” in the editor (reviewer)-to-author network). As a whole, the networks grew exponentially over time, with a large fraction of people participating in a connected component of the graph that reached 90% of the total network size. Furthermore, graph-theoretical metrics such as shortest path length and small-world index, as well as several other network properties, changed little over the last 3-5 years considered (Figure 1—figure supplement 1). Thus, peer-reviewing interactions in the Frontiers journal series gave rise to a mature, topologically stable and integrated community, even though its contributors constitute only a small subset of researchers worldwide.
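As a rough illustration of the "connected component" measure mentioned above, the sketch below computes the share of vertices that fall in the largest component of a small directed network. It is a toy under stated assumptions: edge direction is ignored (weak connectivity), which the text does not specify, and the function name `largest_component_fraction` and the edge list are made up for the example.

```python
from collections import Counter


def largest_component_fraction(edges):
    """Share of vertices in the largest weakly connected component.

    `edges` is a list of (source, target) pairs, e.g. (editor, reviewer)
    for an "is appointing" interaction. Direction is ignored here.
    """
    parent = {}

    def find(v):
        # Union-find lookup with path halving.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    sizes = Counter(find(v) for v in list(parent))
    return max(sizes.values()) / len(parent)


# Made-up editor-to-reviewer edges for illustration only.
edges = [
    ("editor1", "reviewerA"),
    ("editor1", "reviewerB"),
    ("editor2", "reviewerB"),
    ("editor3", "reviewerC"),
]
fraction = largest_component_fraction(edges)
```

In this toy network four of the six scientists are linked through shared reviewers, so the largest component covers two thirds of the vertices.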

Figure 1. Women review and author even fewer articles than expected from their numeric underrepresentation.

(a1) We represent peer-reviewing interactions as directed graphs, in which vertices denote scientists. In the editor-to-reviewer network every edge represents the act of an editor (source vertex) appointing a reviewer (target vertex) to review a manuscript (and the reviewer has accepted the invitation). Analogously, in the reviewer-to-author network edges represent a reviewer reviewing a manuscript of an author. (b) The development of the fraction of contributions for each gender is shown for editors, reviewers and authors. From the start of the Frontiers journals in 2007 until 2015, women (circles) edited, reviewed and authored much less than 50% of manuscripts, as expected from their numeric underrepresentation. However, the actual numbers of reviewing and authoring contributions by women are even smaller than expected by chance, taking into account their numeric underrepresentation. This is revealed by comparison with a null hypothesis in which gender and number of contributions are assumed to be independent. To this end, we generated surrogate ensembles by shuffling the genders of scientists appearing in a given role in the network (a2). From the surrogate ensembles, we obtained 95% confidence intervals (CIs; shaded areas in b). *, **, *** over (under) the data symbols denote the data lying over (under) the 95%, 99%, 99.9% CIs. Note that for all three subnetworks, there is a noticeable but extremely slow trend towards equity (dashed line) for the fraction of contributions. (c) The fraction of female contributors, ranked in increasing order of authoring contributions, for the 47 Frontiers journals whose published articles were handled by at least 25 distinct editors. Women were underrepresented consistently across all fields and particularly severely in math-intensive disciplines.

We then looked for signatures of gender bias and of its evolution across time in the structure of these large networks. We first studied the fractions of assignments for reviewing or editing given to female or male scientists and, for comparison, we also show the fractions of author contributions. Figure 1b reveals that the fractions of authoring, reviewing and editing contributions by women (amounting to 37%, 28% and 26%, respectively, in the complete accumulated data until 2015) are always significantly smaller than the corresponding fractions for men. The imbalance between male and female contributions thus worsens when gradually ascending through the peer-review hierarchy. Apart from a few outlier countries, this pattern was dominant worldwide ( Figure 1—figure supplement 2 ). It was also largely present in all the considered journals when looking at them individually ( Figure 1c ). Overall, the fraction of contributions by female authors varies between about 15% (Frontiers in Neurorobotics) and 50% (Public Health), by female reviewers between about 15% (Surgery) and 50% (Public Health), and by female editors from ca. 5% (Robotics AI) to 35% (Aging Neuroscience). Globally, we observed a trend towards gender parity across time. The rates of change were, however, very slow. Linear extrapolation based on the fractions observed from 2012 to 2015 would predict that exact parity could be achieved as late as 2027 for authoring, 2034 for reviewing and 2042 for editing.
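The linear extrapolation mentioned above can be illustrated with a minimal sketch. It fits an ordinary least-squares line to yearly fractions and solves for the year at which the line reaches 50%. The function name `year_of_parity` is hypothetical, and any input values are to be supplied by the reader; the sketch does not reproduce the data behind Figure 1b or the exact fitting procedure used in the study.

```python
def year_of_parity(years, fractions, target=0.5):
    """Extrapolate a least-squares line to the year where it reaches `target`.

    `years` and `fractions` are equal-length sequences (illustrative inputs,
    not the study's data). Returns None if the fitted trend is not increasing.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(fractions) / n
    # Ordinary least-squares slope: covariance(x, y) / variance(x).
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, fractions))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no trend towards the target, so no crossing year
    # The fitted line passes through (mean_x, mean_y); solve for the target.
    return mean_x + (target - mean_y) / slope
```

Because the fit uses only four yearly points in the study, the extrapolated year is very sensitive to small changes in slope, which is one reason to read the predicted parity dates as rough indications.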

We wondered whether these lower fractions of contributions to the different roles were just due to the fact that overall there are fewer female than male authors, reviewers and editors (39%, 30% and 28% out of all available authors, reviewers and editors, respectively, were women, closely mirroring the observed fractions of assignments). To test this hypothesis, we took the exact same network of peer-review interactions in the Frontiers journals as given, and randomly permuted gender labels among scientists of a given role ( Figure 1a 2 ). This procedure maintained the ratio of female and male scientists acting in the different roles, but destroyed all direct correlations between gender and numbers of contributions. Repeatedly drawing random genders for the scientists in the network generated a surrogate ensemble that we used to estimate the expected number of contributions in a gender-blind control network. Author and reviewing contributions by women lay significantly below the confidence intervals obtained through this permutation testing procedure since 2009 and 2011, respectively. For female editing contributions we found the same, though non-significant, trend. Thus, the mere overall smaller number of female actors cannot explain the observed unbalanced fractions of female contributions to the peer-review chain.
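The gender-shuffling procedure described above can be sketched as follows, under simplifying assumptions. Each scientist in a given role is reduced to a gender label and a contribution count; labels are permuted while the counts stay fixed, and the spread of the resulting female contribution fraction gives an interval for the gender-blind expectation. The function names, the percentile-based interval and the default of 2000 shuffles are assumptions for illustration, not the study's implementation.

```python
import random


def female_contribution_fraction(genders, contributions):
    """Fraction of all contributions made by scientists labelled "F"."""
    total = sum(contributions)
    female = sum(c for g, c in zip(genders, contributions) if g == "F")
    return female / total


def surrogate_interval(genders, contributions, n_shuffles=2000, level=0.95, seed=0):
    """Percentile interval of the female fraction under shuffled gender labels.

    Shuffling keeps the number of women and men in the role fixed, and keeps
    every scientist's contribution count fixed, but breaks any link between
    gender and number of contributions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    labels = list(genders)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        samples.append(female_contribution_fraction(labels, contributions))
    samples.sort()
    tail = (1 - level) / 2
    lower = samples[int(tail * (n_shuffles - 1))]
    upper = samples[int((1 - tail) * (n_shuffles - 1))]
    return lower, upper
```

An observed fraction below the lower bound corresponds to the situation reported for female authors and reviewers: fewer contributions than their head count alone would predict.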

We then looked for possible differences over the entire distributions of the number of peer-review tasks and authoring contributions for men and women. These distributions are fat-tailed ( Figure 2a-c ), indicating that some individuals provided a large number of contributions to the publication chain, while a majority of scientists authored, reviewed or edited only a small number of manuscripts. Moreover, comparing the observed degree distributions to the expectations derived from the same null hypothesis used above, women had a significantly smaller than chance probability to review (and author) more than one article, while their probability to act as single-time reviewer or author exceeded the expected chance level ( Figure 2e-f ). In the editing role, the underrepresentation of women was significant only for a high number of contributions. Furthermore, we found significant deviations from chance-level expectations across the entire studied time-span ( Figure 2—figure supplement 1 ).


Women are underrepresented in the fat tail of contributions.

A breakdown of the number of individuals contributing a given number of times as editors, reviewers and authors (binned; the x-axis marks the bin edges) shows that the majority of scientists ( a ) edited, ( b ) reviewed or ( c ) authored (corresponding zooms for small contribution numbers are shown in e-f ) only a small number of manuscripts. Chance levels (shaded) were derived from an ensemble of reference networks constructed as shown in Figure 1a . The underrepresentation of women in relation to these chance levels tends to increase towards the fat tail of the distribution, associated with the relatively few individuals that made many contributions. In the group of one-time authors or reviewers, however, women are overrepresented. Time-resolved distributions are shown in Figure 2—figure supplement 1 .

The differences in assignment numbers may reflect behavioral or psychological differences between the groups of male and female scientists, either intrinsic or due to sociocultural context ( Moss-Racusin et al., 2012 ; Nature Neuroscience, 2006 ; Ceci et al., 2009 ; Ceci and Williams, 2010 ; Goulden et al., 2011 ; Ceci and Williams, 2011 ; Bloch, 2012 ; Raymond, 2013 ; Shen, 2013 ; Handley et al., 2015a ). Nevertheless, assignment numbers are also ultimately influenced by the editors’ active choices. To reveal whether any bias exists in the reviewer assignment relation, we first analyzed gender correlations between directly connected pairs of nodes in the editor-to-reviewer appointment network ( Figure 3a ) and found a marked gender homophily bias for both male and female editor nodes. Specifically, 73% of reviewers appointed by men were also men, 33% of reviewers appointed by women were women, but, importantly, both these numbers lay above the expectations drawn from the assumption that genders were randomly distributed in the given editor-to-reviewer network topology. Similarly, in the reviewer-to-author network ( Figure 3b ), male (female) reviewers assessed articles authored by male (female) authors significantly more often than expected.
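As an illustration of the pairwise gender correlations analyzed above, the following minimal sketch computes, for each editor gender, the share of appointments that went to a reviewer of the same gender. The edge-list representation, the `gender` mapping and the function name `same_gender_rates` are assumptions for illustration. On their own these rates say nothing about bias; as in the text, they only become informative when compared with gender-shuffled versions of the same network topology.

```python
from collections import Counter


def same_gender_rates(edges, gender):
    """Share of appointments going to same-gender reviewers, per editor gender.

    `edges` is a list of (editor, reviewer) pairs, one per appointment;
    `gender` maps every node to "F" or "M" (hypothetical data layout).
    """
    same = Counter()
    total = Counter()
    for editor, reviewer in edges:
        g = gender[editor]
        total[g] += 1
        if gender[reviewer] == g:
            same[g] += 1
    return {g: same[g] / total[g] for g in total}
```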


Editors have a same-gender preference for appointing reviewers.

( a ) Female editors (orange) appoint significantly more female reviewers than expected under the gender-blind assumption (shaded area). At the same time, male editors (green) appoint fewer women than expected. The development of this trend over time is shown, including articles cumulatively until the indicated year. ( b ) Likewise, female/male reviewers review significantly more female/male-authored articles than expected. ( c ) Homophily is widespread across scientific fields, including those with relatively mild underrepresentation of women. We here report four example disciplinary groupings, with large numbers of contributions (from left to right, respectively, 13416, 4721, 4020, 5680) and the propensity of appointing a female reviewer depending on the editor's gender for each of these groupings. Only assignments by female neuroscience editors were not homophilic; otherwise the occurrence of same-gender preferences was general, arguing against heterogeneity between subfields as a cause for homophily in assignments. ( d ) Plotted here are distributions of a measure of inbreeding homophily. To control for baseline homophily at the level of a narrow local neighborhood, we measure, for each editor node, the actual number of reviewer assignments given to women and subtract the expected number, which would be observed if the considered editor appointed women with the same frequency as in his/her local vicinity. For male editors (green) the distribution is skewed towards an underrepresentation of female assignments (left-leaning), while for female editors the distribution is skewed towards an overrepresentation of female assignments. This highlights that homophily bias is detectable even at the level of the reachable narrow surrounding of each editor.
( e ) A histogram of the probability that an editor assigns at least as many reviews to people of the same gender as he/she actually does reveals that there is an excess of strongly inbreeding-homophilic editors (small Φ hom -values) among both men and women compared to expectation (shaded area). Note that below Φ hom < 0.1 there are only a few strongly homophilic female editors. For male editors, significant homophily extends through many more editors until Φ hom < 0.6. ( f ) Using all data until 2015, the probability that a woman is appointed is above expectation (shaded areas) only for female editors and only when all or all but the most extremely inbreeding-homophilic editors are included in the analysis.

While these findings seem to point at homophily created by choices, they might also stem from “baseline” homophily ( McPherson et al., 2001 ), i.e. subtle but unavoidable bias caused by disproportions in the number of reachable male and female nodes due to heterogeneous network structure. We first checked for the influence of local subnetwork structure on apparent gender bias by looking at different scientific fields, including those with relatively mild underrepresentation of women, and found homophily widespread across disciplines ( Figure 3c ). Second, a more detailed analysis of inter-node gender correlations in the editor-to-reviewer appointment network detected a clear tendency to gender homophily already at the level of the narrow neighborhood of individual nodes ( Figure 3d ). Specifically, to control for baseline homophily at the level of a narrow local neighborhood, we measured, for each editor node, the actual number of reviewer assignments given to women. We then subtracted from this number its chance expectation, derived individually for every node from the frequency of locally reachable female reviewers, i.e. reviewers situated at most five links away (which is a short distance relative to the average shortest path length of 12 steps for the editor-to-reviewer network, cf Figure 1—figure supplement 1e ). Even at this local neighborhood level, we continued to find that male (female) editors generally appointed female reviewers at a lower (higher) rate than expected. Both independent analyses – by topic or localized – validate the existence of a so-called “inbreeding” homophily, i.e. an active preference to connect with same-gender network nodes, on top of “baseline” homophily ( McPherson et al., 2001 ).
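The local-neighborhood control described above can be sketched as follows. For one editor, the code collects the reviewers reachable within a given number of links, uses the share of women among them as the local appointment frequency, and subtracts the resulting expected number of female appointments from the actual number. Two points are assumptions not settled by the text: links are traversed as undirected when measuring distance, and the function name and data layout are hypothetical.

```python
from collections import deque


def local_excess_female(editor, edges, gender, max_dist=5):
    """Actual minus locally expected number of female reviewer appointments.

    `edges` is a list of (editor, reviewer) appointments and `gender` maps
    nodes to "F" or "M". Distances ignore edge direction (an assumption).
    """
    neighbors = {}
    reviewers = set()
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
        reviewers.add(dst)

    # Breadth-first search, stopping expansion at max_dist links.
    dist = {editor: 0}
    queue = deque([editor])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    reachable = [n for n in dist if n in reviewers and n != editor]
    if not reachable:
        return 0.0  # isolated editor: nothing to compare against
    p_female = sum(gender[n] == "F" for n in reachable) / len(reachable)

    assigned = [dst for src, dst in edges if src == editor]
    actual = sum(gender[r] == "F" for r in assigned)
    return actual - p_female * len(assigned)
```

Negative values indicate fewer female appointments than the editor's local surroundings would suggest, positive values more; the distributions of such values are what Figure 3d summarizes.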

Finally, we wondered whether the observed inbreeding homophily in the network was due to the presence of a few strongly homophilic editors or whether, alternatively, homophilic attachment was a feature shared by most editors. To that end, we defined an index of inbreeding homophily at the local level of each editor node. For each considered editor node, we first evaluated the number k of connected same-gender reviewers. We then evaluated the probability 0 ≤ Φ hom  ≤ 1 that k (or more) homophilic connections could arise by baseline homophily only, taking into account the editor-specific basin of locally reachable male and female reviewers (defined as for Figure 3d ). Such Φ hom can serve as an index tracking the strength of inbreeding homophily in shaping the actual reviewer appointments by an editor. Large values of Φ hom approaching 1 indicate that the observed gender homophilic choices of a given editor are plausibly just due to “passive” baseline homophily. In contrast, small values of Φ hom approaching 0 hint at a stronger tendency to “active” (conscious or unconscious, see Discussion) inbreeding homophily. Figure 3e shows the histograms of the index Φ hom for male and female editors, compared with expectations from gender-shuffled networks. For male editors, most histogram bins for Φ hom < 0.6 displayed node counts significantly larger than gender-shuffled estimations. The histogram of Φ hom for female editors showed far fewer significant overrepresentations, most of them at very low values of Φ hom ; it remained compatible with gender-shuffled estimations for most of the Φ hom range.
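One simple way to make the index Φ hom concrete is as a binomial tail probability. This is an assumption made for illustration only: the text defines Φ hom as the probability of k or more same-gender connections under baseline homophily but does not state which distribution was used. In the sketch, `n` is the editor's number of reviewer appointments and `p_same` is the share of locally reachable reviewers who have the editor's gender; both names, and `phi_hom` itself, are hypothetical.

```python
from math import comb


def phi_hom(k, n, p_same):
    """P(at least k same-gender appointments out of n) under a binomial model.

    Assumes each appointment independently goes to a same-gender reviewer
    with probability p_same (a simplification, not the study's stated method).
    """
    return sum(
        comb(n, i) * p_same**i * (1 - p_same) ** (n - i)
        for i in range(k, n + 1)
    )
```

For example, an editor whose five appointments all went to same-gender reviewers, in a neighborhood where half the reachable reviewers share that gender, would obtain a small value (1/32 under this model), pointing towards inbreeding rather than baseline homophily.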

These different distributions of inbreeding homophilic tendencies resulted in a gender-dependent impact of the reviewer-appointment choices of male and female editors in determining the overall number of female reviewer appointments. To determine this impact we pruned links originating from editors with inbreeding homophily index Φ hom below a growing threshold Φ thr (retaining only editors whose Φ hom satisfies 0 ≤ Φ thr ≤ Φ hom ≤ 1) and we did so separately for male and female editors ( Figure 3f ). After pruning the most homophilic male or female editors, we evaluated the new resulting probabilities of appointing a female reviewer. On the one hand, we found that it was enough to remove the few most homophilic female editors with the lowest values of Φ hom from the network, to bring the probability for a female editor to appoint a female reviewer back to chance-level. On the other hand, the probability for a male editor to appoint a female reviewer increased only very slowly by pruning more and more male editors. In particular, it remained significantly below chance expectations for all the considered thresholds for inclusion, 0 ≤ Φ thr ≤ 0.5. This means that the overall smaller-than-chance probability of appointing female reviewers for male editors is due to inbreeding homophilic tendencies that are widespread among male editors , although at varying degrees of strength. In contrast, the overall larger-than-chance probability to appoint female reviewers for female editors is driven by the action of just a small number of strongly homophilic female editors , with most other female editors showing only “passive” baseline homophily.
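The pruning analysis above can be sketched as a simple filter. Editors whose index falls below the threshold Φ thr are removed, and the probability of appointing a female reviewer is recomputed from the appointments of the editors that remain. The per-editor records with keys `phi_hom`, `n_assigned` and `n_female` are a hypothetical data layout chosen for illustration; in the study the analysis is run separately for male and female editors and compared against gender-shuffled expectations.

```python
def female_appointment_rate(editors, phi_thr):
    """Probability of appointing a woman after pruning homophilic editors.

    Keeps only editors with phi_hom >= phi_thr, i.e. removes the most
    strongly inbreeding-homophilic ones first as the threshold grows.
    Returns None when no appointments remain.
    """
    kept = [e for e in editors if e["phi_hom"] >= phi_thr]
    total = sum(e["n_assigned"] for e in kept)
    if total == 0:
        return None
    return sum(e["n_female"] for e in kept) / total
```

Evaluating this function over a grid of thresholds between 0 and 0.5, once per editor gender, yields curves of the kind shown in Figure 3f.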

In this study, we found that apart from a few outliers depending on country and discipline, women are underrepresented in the scientific community with a very slow trend towards balance, which is consistent with earlier studies ( Larivière et al., 2013 ; Fox et al., 2016 ; Topaz and Sen, 2016 ; Lerback and Hanson, 2017 ; Nature Neuroscience, 2006 ; Shen, 2013 ; Nature, 2012 ). In addition, we found that women contribute to the system-relevant peer-reviewing chain even less than expected by their numerical underrepresentation, revealing novel and subtler forms of bias than numeric disproportion alone. We reported clear evidence for homophily beyond the expected baseline levels in both genders ( Figure 3 ) using a very large trans-disciplinary data set that allowed us to clarify a previously ambiguous picture ( Lloyd, 1990 ; Gilbert et al., 1994 ; Borsuk et al., 2009 ; Buckley et al., 2014 ; Fox et al., 2016 ). This network-level inbreeding homophily is driven by a large fraction of male editors, together with only a few highly homophilic female editors.

Evolution of participation rates by gender and causes for remaining inequity

To start our discussion on a positive note, we found that the participation of women in science, at least in terms of their numerical representation, has increased during the last years, which is consistent with other studies. The number of female doctoral recipients at US institutions increased by, on average, 0.1% - 0.6% per year between 2005 and 2015, depending on broad field of study ( National Science Foundation, 2016 ). Ley and Hamilton (2008) reported that the fraction of women in medical schools increased by, on average, 0.6% to 0.8% per year (depending on the scientists' rank) between 1996 and 2007. The percentage of women professors has increased at a rate of 0.5%-1% per year in the European Union ( ETAN Expert Working Group on Women and Science, 2000 ). The fraction of publishing female scientists in Germany increased by, on average, 0.7% from 2010 to 2014 and is now 30.9% ( Pan and Kalinaki, 2015 ). Fox et al. (2016) found that the number of selected female reviewers in Functional Ecology increased by, on average, 0.8% per year between 2004 and 2014, while, notably, the number of female editors increased by, on average, 3.8% per year. Caplar et al. (2016) noted that the number of female first authors of astronomy articles increased by about 0.4% per year between 1960 and 2015. In the Frontiers series of journals, we found that the number of contributions by female authors, reviewers and editors increased by, on average, 1.1% / year, 1.2% / year and 0.9% / year between 2012 and 2015, respectively, similar to the numbers above.

What could be the reasons for the remaining inequity? It has been argued that underrepresentation of women in science may be due to conscious career choices by female researchers ( Ceci et al., 2009 ; Ceci and Williams, 2010 , 2011 ), even if it is not clear to what extent these choices are really free or rather constrained by society. Previous studies reported that, measured by their number of publications, women are generally less productive than men ( Cole and Zuckerman, 1984 ; Zuckerman, 1991 ; Long and Fox, 1995 ; Xie and Shauman, 1998 ; Pan and Kalinaki, 2015 ; Caplar et al., 2016 ) and it has been suggested ( Xie and Shauman, 1998 ) that this might be due to personal characteristics, structural positions, and marital status. Moreover, the fraction of female scientists decreases with rank or age ( ETAN Expert Working Group on Women and Science, 2000 ; Ley and Hamilton, 2008 ; Goulden et al., 2011 ) and this shorter career length might contribute to the drop of the female-to-male ratio for a high number of contributions. Nevertheless, women who persevere longer in their career despite obstacles are high-performing. While the productivity of young publishing female scientists in Germany was 10% lower than that of their male counterparts, the discrepancy reduced to just 3% for more senior scientists ( Pan and Kalinaki, 2015 ). Also, it has been reported that women with children are not less productive than those without ( Hamovitch and Morgenstern, 1977 ; Cole, 1979 ; Cole and Zuckerman, 1987 ), although young children might decrease productivity ( Kyvik, 1990 ; Kyvik and Teigen, 1996 ). The low number of women among senior scientists might be particularly detrimental for a gender-neutral evaluation of scientific work, as the implicit association of “male” and “science” is strongest in the group of 40-65 year olds ( Nosek et al., 2007 ).
Moreover, declining an invitation to review is often due to a lack of time ( Tite and Schroter, 2007 ) and it is possible that female scientists spend more time with duties beyond research (e.g. teaching, mentoring, service;  ETAN Expert Working Group on Women and Science, 2000 ; Knapp, 2005 ; Misra et al., 2011 ). On the other hand, a compensating factor seems to be that female editors, in contrast to authors, have been reported to be more productive than male editors ( Gilbert et al., 1994 ). Interestingly, men and women who are invited to review a manuscript have very similar propensities to accept the invitation ( Fox et al., 2016 ; Lerback and Hanson, 2017 ), suggesting: (1) that simply increasing the number of invitations to female reviewers would have a direct and proportional effect; and, (2) that the low number of female reviews in our data is caused in part by a lower number of invitations.

The underrepresentation of, and discrimination against, women in the scientific community is a problem that will not resolve itself, given the pervasive, generally unconscious nature of gender bias. Women have been reported to be less likely to be hired ( Moss-Racusin et al., 2012 ), to receive a grant ( Wennerås and Wold, 1997 ), and to receive higher salaries ( Shen, 2013 ). Still today, most people implicitly associate science with men, and liberal arts with women more than the other way round ( Nosek et al., 2007 ), and this tendency, for both men and women, is apparent from a very young age ( del Río and Strasser, 2013 ; Bian et al., 2017 ) and possibly reinforced by social dynamics in school education ( American Psychological Association, 2007 ; Duru-Bellat, 2008 ). Beyond that, men are more reluctant than women to believe that such a bias exists ( Handley et al., 2015a ), manifesting lack of interest in the problem (“negligence”) or, even, consciously assuming that gender discrimination cannot be avoided (“philosophical acceptance”) more often than women ( Parodi, 2011 ).

How representative are the Frontiers journals?

The data analyzed here comprise a wide spectrum of scientific topics and the findings should generalize. However, Frontiers articles are unusual insofar as they undergo open peer review, whereas the identity of reviewers is not revealed in most other journals. Reports are ambiguous as to whether open peer review (as opposed to single- or double-blind peer review) affects potential reviewers' willingness to assess a paper ( Nature Neuroscience, 1999 ; van Rooyen et al., 1999 ; Ware, 2008 ; Baggs et al., 2008 ). In particular, a primary concern in disclosing a reviewer’s identity is that a rejected author may later become a prospective employer of the reviewer, so that peers in more vulnerable positions may be reluctant to accept an invitation to review. While it is conceivable that assignment rejection due to non-anonymity is more likely for early career scientists, we do not see any reason for a direct effect of gender and, to the best of our knowledge, no such effect has been reported.

Then, how does the population of scientists contributing to the Frontiers series of journals compare to other scientometric populations? First, we compared our authorship data to that of Larivière et al. (2013) who analyzed gender bias in articles from a wide range of journals that were published between 2008 and 2012, comprising about 3 million authors. While no analysis of peer review is performed therein, this study comprises an order of magnitude more authors than in our study. It can therefore serve as a benchmark for gender-composition among authors. They reported that 42% of authors in their analyzed scientific articles were women, whereas we found that number to be 39% in the Frontiers journals. Given uncertainties in determining a person's gender these numbers are comparable. Broken down by country, we find overall similar fractions of female authors, although, for some countries, the relative deviations can rise up to 29% ( Supplementary file 1 ). However, small sample sizes, together with, possibly, a varying popularity of the Frontiers journals in different countries, might contribute to such deviations.

Second, not much data was available concerning gender bias among reviewers and editors, until very recently. Many previous studies (cf. Supplementary file 2 ) were self-diagnoses performed by editorial boards of the corresponding journal and, as a consequence, tended to be based on mono-disciplinary data of relatively small sample size. Larger sample sizes, but limited to editors, were considered in an analysis of the composition of editorial boards of 435 mathematical journals ( Topaz and Sen, 2016 ). Only 9% of editors were women. Other reported numbers for the fraction of women editors in journals of different disciplines range from 38% to 54% (cf. Supplementary file 2 ). These numbers lie at the lower and upper end of the female editor fractions across the Frontiers journals, ranging between 6% (Frontiers in Robotics and AI) and 37% (Frontiers in Aging Neuroscience), with an average of 28%. Concerning female reviewers or female reviewer appointments, fractions reported in the literature range between 16% and 48% (cf. Supplementary file 2 ), to be compared with the range between 11% and 48% for the Frontiers journals with an average of 30%. Concerning female authors, Pan and Kalinaki (2014) report fractions ranging from 15% in computer science to 57% in veterinary science. These numbers are once again comparable to female author fractions in Frontiers journals, ranging from 17% (Neurorobotics) to 48% (Public Health). Overall, our study thus provides a global account of the prevalence of women among editors and reviewers and places previous reports on a continuum of field-specific participation numbers. Importantly, our data are consistent with these diverse reports, highlighting that the Frontiers peer-review networks are representative of widespread patterns.

Our work calls for a detailed comparison with another recently published report about peer reviewer assignments in 20 journals of the American Geophysical Union (AGU), based on a slightly smaller sample size compared to ours ( Lerback and Hanson, 2017 ). This study reports information about aspects that our study could not have access to, breaking down women’s underrepresentation by age and showing that the decline rate for invited reviews is only slightly smaller for women than men. Overall, relative fractions of female participation reported by this study are compatible with numbers we found for the journal Frontiers in Earth Science, with e.g. a matching female reviewer appointments fraction close to 20%, suggesting that women play a larger role in other fields compared to that report (cf. Figure 1c ). For the AGU journals the authors conclude that editors, especially male ones, appoint too few female reviewers. Male editors’ behavior in that study thus agrees with our findings for the entirety of Frontiers journals, while we find an opposite trend for female editors. We note here that Lerback and Hanson (2017) reached their conclusion of women’s underrepresentation by comparing actual reviewer appointment numbers to the fraction of female first authors. This comparison, however, might be questionable, because reviewers in low age groups are rarely invited by editors (3% of times) whereas first authors tend to be young. To account for such differences, we determined expectation levels by gender shuffling among the reviewers and editors in the fixed network of actual reviewer-editor interactions and find that the fraction of female authors (the expectation value that Lerback and Hanson used) is much higher than the expected number of female reviewer contributions (our expectation value; cf. Figures 1b and 3a ). For that reason, Lerback and Hanson may have quantitatively overestimated the female editors’ bias against female reviewer appointments. 
Still, despite this overestimation, even Lerback and Hanson reported female editors’ preference for female reviewers for certain age classes (although not commented upon).

Homophily in society and science

The phenomenon of gender homophily in peer-reviewing networks has already been described, but these previous reports have reached ambiguous conclusions. Lloyd (1990) found that female reviewers accepted female-authored papers at a higher rate than those of male authors, whereas male reviewers did not show such a bias. In contrast, Borsuk et al. (2009) reported that male and female reviewers were equally likely to reject a female-authored paper. The probability that a female editor appoints a female reviewer was reported to be 31%-33%, whereas male editors appointed female reviewers in 22%-27% of cases ( Gilbert et al., 1994 ; Buckley et al., 2014 ; Fox et al., 2016 ). Here, for the whole spectrum of Frontiers journals we found these numbers to be similar: 33% and 27%, respectively. However, our study concludes that significant inbreeding homophily exists in the reviewer-appointing behavior of both male and female editors, and does so based on a pluri- rather than mono-disciplinary data set, substantially larger than all previous accounts of homophily in peer review.

Socrates, in Plato’s Phaedrus, already asserted that: “similarity begets friendship”. Homophily – or “attraction for the similar”, not only limited to the gender attribute – is ubiquitous in social networks. Since the classic studies of Park and Burgess (1921) and Lazarsfeld and Merton (1954) , gender homophily has been found in groups of playing children ( Bott, 1928 ; Shrum et al., 1988 ) and adult friends ( Verbrugge, 1977 ) and is also present in work environments ( Brass, 1985 ; Bielby and Baron, 1986 ; Ibarra, 1992 ) and voluntary organizations ( Popielarz, 1999 ). Since focused interactions between co-workers favor the formation of relations, operation in already homophilic environments will lead to an amplification of homophily ( Feld, 1981 ; Feld, 1984 ). In particular, homophilic styles of professional interaction with peers may persist since the time in which they were (un-)consciously learned in homophilic school environments ( Vinsonneau, 1999 ).

Importantly, even a slight homophily can influence and alter the way in which information spreads ( Yava and Yucel, 2014 ) and opinions form through the social network of interactions, leading to the emergence of “dead-end” cultural niches ( Mark, 2003 ). Homophilic groups indeed tend to vote together when asked to decide for something ( Caldeira and Patterson, 1987 ) and have similar prospective evaluations, a same mindset ( Galaskiewicz, 1985 ). While homophily can in principle be put to good use, as for instance in the education about good health practices ( Centola, 2011 ), the uncontrolled effects of homophily may constitute a threat to the universalism of the peer-review system, and thus to science.

Gender-specific mechanisms of homophily

We observed very different patterns of homophily for male and female editors, with a widespread homophily across men, while dominated by very few highly homophilic editors for women. After removal of their contribution, homophily became insignificant (cf. Figure 3e,f ). This suggests that there is only baseline homophily for the majority of female editors and most assignments are gender-blind (for instance in the neuroscience community, cf. Figure 3c ). Differences between men’s and women’s homophily patterns are classically known, finding their root in different styles of social network construction. For instance, in situations where a mutual friendship exists between A and B a friendship initiation with C tends to be reciprocated by boys, but not by girls ( Eder and Hallinan, 1978 ). Such differences in attachment strategies tend to generate gender-segregated worlds for children to preadolescents in which girls evolve in small homogeneous groups and boys form larger but more heterogeneous cliques, with boundaries made looser only later by romantic ties ( Shrum et al., 1988 ). Professional social networks of men are more homophilic than women’s, especially in work environments in which men are dominant ( Brass, 1985 ; Ibarra, 1992 ). Another source of asymmetry may be that both men and women tend to form connection routes passing through a male node when reaching toward distant domains ( Aldrich, 1989 ).

One could speculate that other factors might contribute, like friendship or (perceived) status, competency and reputation. These factors might, in turn, be partly depending on gender, e.g. through implicit biases ( Nosek et al., 2007 ; Merton, 1968 ; Paludi and Bauer, 1983 ). Multiple categories of relationships were analyzed, for instance, by Ibarra (1992) who reported that, in a company setting, men named mostly men as points of contact for five different business-relationship categories, whereas for women the preferred gender was category-dependent. A similar situation could be at work here: one could speculate that a set of other, hidden, variables influence reviewer appointment decisions, and that these variables have a different importance for male and female editors. Determining which factors are most important for male and female editors in the choice of the reviewer and how these factors are or are not, in principle, related to gender, might thus aid in reducing homophily in the peer-review system.

Our finding of strongly homophilic “topology-organizer” female editors is reminiscent of the notion of “femocrat” introduced in political studies, referring to the role played by isolated feminists who, after having managed to integrate into male-dominated decision-making bodies, provide a bridge to the spheres of power for the requests of activists outside of them ( Yeatman, 1990 ). Now, while the active engagement of these femocrats is very useful in pushing forward technocratic (i.e. top-down) solutions aiming at reducing gender discrimination, especially at an early stage, in the long term the effects of their action may be precarious. Indeed, political experience has shown that when an external event reduces the influence of these isolated driver women, the situation can quickly deteriorate again ( Outshoorn, 2005 ), aggravated by the suspicion with which femocrats are regarded by formerly dominant men or, paradoxically, even by women, who find them too prone to compromise or too aggressive ( Outshoorn and Kantola, 2007 ). It is thus important to devise strategies ‘healing’ network topology in depth, and in a bottom-up fashion, via pervasive education campaigns targeted at the decision-makers ( Sainsbury, 1994 ), in our case chiefly the editors. Such strategies are required to protect the gains of top-down actions against gender discrimination: increasing the number of women will not be enough to overcome gender bias ( Isbell et al., 2012 ; Avin et al., 2015 ).

Ideally, all scientific interactions are gender-blind. A scientist's status and the provision of resources to scientists should not be influenced by gender but should depend solely on the value of the scientific contributions. Access to the publication system is a critical determinant of a scientist’s success. Accordingly, reviewers and editors, the gatekeepers of the scientific canon, should take particular care to base their judgment solely on the merit of scientific work. This merit, however, is difficult to determine and any assessment is necessarily influenced by the assessor's view of the field, including his or her personal position in the network of colleagues and the interactions with them ( Mulkay, 1979 ; Cole, 1992 ).

Inbreeding homophily, an increased affinity between persons with similar attributes, appears to be a sociological, population-level trait of human societies. It is only natural, thus, that we find gender homophily in interactions between editors, reviewers and authors. Nonetheless, this inbreeding homophily is damaging to female scientists, whose work ends up being overlooked, due to unconscious negative bias. The phenomenon of inbreeding homophily is also likely not restricted to the peer review of manuscripts, so it needs to be taken into account for grant evaluation, hiring, or when designing mentoring programs. Importantly, it is likely to persist even when numerical balance between genders is achieved ( Isbell et al., 2012 ). Altogether, inbreeding homophily negatively affects science as a whole because a stronger involvement of women would increase the quality of scientific output ( Merton, 1973 ; Woolley et al., 2010 ; Nature, 2013 ; Campbell et al., 2013 ). Consequently, all scientists should wholeheartedly support the endeavor to remove gender bias from science - but how could that be achieved?

Initiatives to remove gender-based inequality can roughly be divided into two categories. On the one hand, “gender mainstreaming” ( Special Adviser on Gender Issues and Advancement of Women, 2002 ) promotes consideration, by actors at all levels, of the implications of every action and policy for women and men, and is geared towards creating long-lasting “bottom-up” changes. On the other hand, fast progress could be attempted through “top-down” implementation of technocratic instruments such as quotas. This politically issued ‘state feminism’ ( Mazur and McBride Stetson, 1995 ) is suboptimal in that it might even “provide an alibi” for not modifying attitudes in depth ( Squires, 2008 ). As inbreeding homophily is an expression of a state of mind, it is unlikely to be changed by externally enforced measures. Raising awareness, in comparison, seems to be the most promising route. The goal should be to motivate all scientific actors to “integrate thinking about gender discrimination in every decisional process” (translated from Woodward, 2008 ). Educative actions should be conducted with tact, and not rely solely on inducing feelings of guilt and shame, in order not to be perceived as annoying ( Woodward, 2003 ). At the same time, existing formal actions to reduce bias should be upheld.

In the field of peer review two more specific strategies are available to reduce bias: blind review and automated editorial management. However, both strategies are of limited acceptance and use. First of all, removing the authors’ names is often not sufficiently blinding. References to the authors’ previous publications or to the approving ethics committee all but spell out the authors. Second, while removal of the authors’ names does indeed blind the reviewers to all irrelevant attributes, it also blinds them to relevant meta-data, such as the scientific experience of the authors, which might be considered as relevant by many reviewers. In an attempt to assist editors of Frontiers journals, keyword-based reviewer suggestions are automatically provided to them but the editors remain free to make their own choices. While these gender-blind automated suggestions could already contribute to an assignment that is less influenced by homophily, an editorial management software is also the ideal platform to routinely direct the editor’s attention to the issue of homophily. It could display statistics similar to our Figure 3 and encourage non-homophilic choices of reviewers. Such a strategy maintains full editorial freedom and could easily be evaluated, either internally or, in the case of open review as in the Frontiers journals, through analysis of the publicly available data.

Given how ingrained homophily is in our nature, the path towards a gender-blind science will be arduous. Yet, with the joint effort of the scientific community to overcome partisanship and discrimination, a merit-based system with equal opportunities for all scientists might just be within reach. After all, which social enterprise would be more apt to follow ratio over instinct than science?

Collection and parsing of data

All article data were obtained exclusively from the publicly available article web pages of the Frontiers Journal Series (RRID:SCR_007214), which were listed (at the date of last data download in March 2016) on: http://www.frontiersin.org/SearchIndexFiles/Index_Articles.aspx , as well as from the associated XML file if the HTML code of the article web page contained a corresponding reference. Subsequently, the articles' metadata (article id, authors, reviewers, editors, publication date, etc.) were extracted from the XML files and the web pages. All gathered personal identity information was deleted after inference of individual genders (see later), thus resulting in a fully anonymized data set. In total, we analyzed 41,100 articles published before January 1st, 2016, covering 142 Frontiers journals from Science, Health, Engineering, and Humanities and Social Sciences. Our parsing routine was able to find information about authors in 41,092 of these articles, about reviewers in 39,788 articles (note that some articles, like editorial articles, might not have been reviewed), and about editors in 40,405 articles. The anonymized network data are provided as Supplementary file 3 .

To recognize and identify people re-occurring in more than one article, every person was assigned a unique identifier number (UID). When a contributor was found to be associated with an official profile identification number in the Frontiers database, we relied on it, directly translating it into a UID (this happened for 71% of contributors). In the remaining cases, we decided whether a record matched another based on the names and affiliations of people. Specifically, for two names to be matched, we required that the surnames coincided and that each given name of the contributor with fewer given names had a corresponding match in the other contributor's name (a match could also be an initial like “J” with a fully specified name like “John”). In case both contributors' given names consisted of only initials, we required, in addition, that their affiliations were sufficiently similar. Newman (2001) found that name-matching in the absence of UIDs, and even abbreviating all given names to initials, resulted in errors on the order of a few percent in a data set comprising more than a million people. Correspondingly, as we expect the UIDs to be correctly associated with a contributor in the vast majority of cases, erroneously matching or not matching people is likely relatively uncommon.
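The matching rule for contributors without a profile ID can be sketched in a few lines of Python. This is only an illustration, not the authors' parsing routine: the function names, the greedy one-to-one pairing of given names, and the `affiliations_similar` flag (a stand-in for the affiliation-similarity test, which the text does not specify) are all assumptions.

```python
def is_initial(name):
    """Treat a single letter, optionally followed by a dot, as an initial."""
    return len(name.rstrip(".")) == 1


def given_name_matches(a, b):
    """Names match if identical, or if one is an initial of the other."""
    a, b = a.rstrip(".").lower(), b.rstrip(".").lower()
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def same_person(surname_a, given_a, surname_b, given_b,
                affiliations_similar=False):
    """given_a and given_b are lists of non-empty given-name strings."""
    if surname_a.lower() != surname_b.lower():
        return False
    # Every given name of the contributor with fewer given names must
    # find a partner in the other contributor's list (greedy pairing,
    # an assumption of this sketch).
    shorter, longer = sorted((given_a, given_b), key=len)
    remaining = list(longer)
    for name in shorter:
        hit = next((c for c in remaining if given_name_matches(name, c)), None)
        if hit is None:
            return False
        remaining.remove(hit)
    # If both sides consist of initials only, affiliations must also agree.
    if all(is_initial(n) for n in given_a + given_b):
        return affiliations_similar
    return True
```

For example, `same_person("Smith", ["John"], "Smith", ["J", "Robert"])` accepts the match, whereas two records that both read "Smith, J" are matched only when `affiliations_similar` is true.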

Determination of gender

Each UID was assigned a gender based on their associated given names (note that after the steps described in the previous section, at least one first name was fully specified for 99.6% of the UIDs, while for the remaining 0.4% of UIDs all given names consisted of only initials so that no gender could be attributed). The extracted given names were compared with an extensive name list, assembled from public web-sources, such as:

http://japanese.about.com/library/blgirlsname_[a-z].htm ,

http://japanese.about.com/library/blboysname_[a-z].htm (retrieved December 9, 2015)

http://www.top-100-baby-names-search.com/chinese-girl-names.html ,

http://www.top-100-baby-names-search.com/chinese-boys-names.html

http://www.babynames.org.uk (retrieved December 11, 2015)

US census data ( https://www.ssa.gov/oact/babynames/limits.html ; retrieved March 17, 2016).

Note that some given names (like Andrea) are in use for both men and women. Gender-ambiguous given names present in the US census database were categorized to the gender to which they were more frequently attributed. When a name appeared as both male and female in one of the other sources, or when different sources did not agree on the gender for a name, we decided not to associate that given name with a gender.
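The rules for ambiguous names can be illustrated with a small sketch. The data layout is hypothetical: `census_counts` stands for a frequency table such as the US data, and `other_sources` for name lists without counts. Treating the census majority as one vote that can disagree with the other lists, and dropping exact ties, are assumptions of this sketch rather than details given in the text.

```python
def build_name_gender(census_counts, other_sources):
    """Return {name: "F" or "M"} for names with an unambiguous gender.

    census_counts: {name: {"F": count, "M": count}}
    other_sources: list of {name: set of genders listed for that name}
    """
    table, excluded = {}, set()
    for source in other_sources:
        for name, genders in source.items():
            if len(genders) != 1:            # listed as both male and female
                excluded.add(name)
                continue
            gender = next(iter(genders))
            if table.setdefault(name, gender) != gender:
                excluded.add(name)           # sources disagree
    for name, counts in census_counts.items():
        f, m = counts.get("F", 0), counts.get("M", 0)
        if f == m:                           # exact tie: assumption, drop it
            excluded.add(name)
            continue
        majority = "F" if f > m else "M"     # more frequent gender wins
        if table.setdefault(name, majority) != majority:
            excluded.add(name)
    for name in excluded:
        table.pop(name, None)
    return table


census = {"andrea": {"F": 900, "M": 100}, "john": {"M": 1000}}
others = [
    {"yuki": {"F", "M"}, "maria": {"F"}},
    {"maria": {"F"}, "kim": {"F"}},
    {"kim": {"M"}},
]
name_gender = build_name_gender(census, others)
```

In this toy input, "andrea" is resolved by frequency, while "yuki" and "kim" receive no gender because a source lists both genders or two sources disagree.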

We validated the gender assignment procedure by performing a web search for 1053 randomly selected people from our data set, and determining their gender based on a picture or the use of gender-specific pronouns in a biographical text. We were able to find such information for 924 out of the 1053 people (88%). The gender automatically assigned by our algorithm to those identified was correct in 96% of cases. For comparison, we note that the name-gender algorithm used in Larivière et al. (2013) misclassified male and female names in 8% of cases.

Our list comprised 66,605 female and 43,482 male names. In addition to the name list, we manually assigned the gender of 643 people who had a high number of re-occurrences and whose gender had not been identified automatically. In total, we were thereby able to assign a gender to 131,885 UIDs, that is, 87% of all UIDs. All further analyses were done ignoring the remaining 13% of scientists.

Network construction

We represented the available data in directed networks ( Figure 1a ), in which vertices were individual scientists and edges denoted peer-reviewing interactions: “is appointing” in the editor-to-reviewer network, and “is editing (reviewing) a manuscript of” in the editor (reviewer)-to-author network. Year-resolved graphs were constructed by deleting all links representing articles that were published later than the given year.
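A minimal sketch of this construction, using plain edge lists rather than igraph objects, might look as follows. The `Article` record and its field names are hypothetical; only the edge directions and the year cut-off follow the description above.

```python
from collections import namedtuple

# Hypothetical record layout; the field names are assumptions.
Article = namedtuple("Article", "year editors reviewers authors")


def editor_to_reviewer_edges(articles, up_to_year):
    """Directed edges editor -> reviewer ("is appointing")."""
    return [(e, r)
            for a in articles if a.year <= up_to_year
            for e in a.editors for r in a.reviewers]


def reviewer_to_author_edges(articles, up_to_year):
    """Directed edges reviewer -> author ("is reviewing a manuscript of")."""
    return [(r, w)
            for a in articles if a.year <= up_to_year
            for r in a.reviewers for w in a.authors]


articles = [
    Article(2012, ["E1"], ["R1", "R2"], ["A1"]),
    Article(2015, ["E2"], ["R1"], ["A2", "A3"]),
]
edges_2012 = editor_to_reviewer_edges(articles, 2012)
edges_2015 = editor_to_reviewer_edges(articles, 2015)
```

The year-resolved graph for 2012 contains only the links of the first article; the 2015 graph contains the links of both.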

Graph analytics

All graph analyses ( Figure 1—figure supplement 1 ) were performed with the freely-available Python igraph package.

In graph theory, a connected component is a subgraph in which any two vertices are connected to each other by at least one path, and which is connected to no additional vertices in the full graph. The largest of all the connected components of a graph is called its giant component. One can distinguish between the weak giant component (in which the direction of edges is ignored when building inter-node paths) and the strong giant component (in which the direction is taken into account). All the following graph analyses have been performed on the weak giant component of the networks observed at each time.
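To make the definition concrete, the weak giant component of a directed edge list can be found with a breadth-first search that ignores edge direction. This is a standard-library sketch for illustration, not the igraph call used in the study.

```python
from collections import defaultdict, deque


def weak_giant_component(edges):
    """Return the node set of the largest weakly connected component."""
    neighbours = defaultdict(set)
    for u, v in edges:               # ignore direction: store both ways
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node] - component:
                component.add(nxt)
                queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


toy_edges = [("e1", "r1"), ("e2", "r1"), ("e3", "r2")]
giant = weak_giant_component(toy_edges)
```

Here `e1` and `e2` are joined through the shared reviewer `r1` even though no directed path links them, which is exactly what distinguishes the weak from the strong component.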

Undirected transitivity (clustering coefficient) is calculated as the ratio of triangles to connected triplets in the graph (each triangle being counted once for each of its three vertices), considering connections between nodes independent of their direction.
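As a worked illustration of this ratio, the sketch below counts, for every node, the pairs of its neighbours (connected triplets) and how many of those pairs are themselves linked (closed triplets, three per triangle). It is a plain-Python stand-in written for clarity, not the igraph routine used in the study.

```python
from itertools import combinations


def transitivity_undirected(edges):
    """Closed triplets divided by all connected triplets."""
    nbrs = {}
    for u, v in edges:
        if u == v:                   # self-loops do not form triplets
            continue
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    closed = triplets = 0
    for adj in nbrs.values():
        for a, b in combinations(adj, 2):
            triplets += 1            # a - node - b is a connected triplet
            if b in nbrs[a]:
                closed += 1          # a and b are also linked: closed
    return closed / triplets if triplets else float("nan")


# One triangle (a, b, c) plus a pendant node d attached to c.
toy = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
clustering = transitivity_undirected(toy)
```

The toy graph has five connected triplets, three of which are closed, giving a transitivity of 0.6.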

Average path length calculates the mean of the geodesic directed path lengths between all pairs of nodes in a connected component. The geodesic path length between a given pair of nodes is the minimum number of links needed to travel between the nodes along connected edges.
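The following sketch computes this quantity with one breadth-first search per node. Averaging only over ordered pairs that are actually connected by a directed path is an assumption of the sketch; the text does not say how unreachable pairs were treated.

```python
from collections import deque


def average_path_length(edges):
    """Mean directed geodesic length over all reachable ordered pairs."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())

    total = pairs = 0
    for source in succ:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in succ[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1       # exclude the source itself
    return total / pairs if pairs else float("nan")


chain = [("a", "b"), ("b", "c")]
mean_length = average_path_length(chain)
```

For the chain a → b → c the reachable pairs are (a, b), (a, c) and (b, c), with lengths 1, 2 and 1, so the mean is 4/3.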

Small-worldness S is defined in Humphries and Gurney (2008) as S = γ/λ. Here γ is the undirected transitivity of the graph divided by k/n, which is an approximation for the undirected transitivity of an Erdős-Rényi random graph with n nodes and average degree (in+out) of k, and λ is the ratio of the average shortest path length of the graph to ln(n)/ln(k), which is the average shortest path length of an Erdős-Rényi graph with n nodes and average degree k.
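Written out as code, the definition is a two-line calculation. The function and argument names below are illustrative; the inputs are the measured transitivity and average path length together with the node count n and the average degree k.

```python
import math


def small_worldness(transitivity, avg_path_length, n, k):
    """S = gamma / lambda, using Erdos-Renyi reference values."""
    gamma = transitivity / (k / n)                         # clustering vs. random
    lam = avg_path_length / (math.log(n) / math.log(k))    # path length vs. random
    return gamma / lam


# Illustrative numbers only: n = 1000 nodes, average degree k = 10.
s_value = small_worldness(transitivity=0.1, avg_path_length=3.0, n=1000, k=10)
```

With these made-up inputs the random-graph references are k/n = 0.01 and ln(1000)/ln(10) = 3, so γ = 10, λ = 1 and S is approximately 10; values well above 1 indicate a small-world structure.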

Statistical testing

Statistical significance was established by comparing a feature of the data to its confidence interval (CI). The graphic notations *, ** and *** denote that this feature lay outside the 95%, 99% and 99.9% CI, respectively. Confidence intervals were calculated by recalculating the given feature 10,000 times, after permuting gender labels (with the exception of Figure 3e where, for computational reasons, only 100 recalculations were performed). Specifically, Figures 1 and 2 are derived from a table with one column giving the number of contributions (up to a specified time point) in a given role for each person, and another column giving each person's gender; the latter column was permuted while keeping the former constant. On the other hand, confidence intervals in Figure 3 were obtained by repeatedly permuting genders among all nodes in a given graph, independent of their associated roles. The underlying graph used for Figure 3a and Figure 3c-f was a suitably pruned editor-to-reviewer graph: first, we removed all self-loops (i.e. editor and reviewer are identical); second, we deleted all leaf nodes, i.e. scientists who never edited or reviewed anything and therefore had a null out-degree; third, for Figure 3c , we removed cross-disciplinary assignments from journals not belonging to the indicated category. Similarly, Figure 3b was derived from a deleafed reviewer-to-author graph.
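The permutation scheme for the table-based figures can be sketched as follows. The statistic shown (the share of all contributions made by women), the simple percentile indexing, and the fixed seed are assumptions chosen for illustration; the text does not specify how the interval bounds were read off the 10,000 recalculations.

```python
import random


def female_share(genders, contributions):
    """Example statistic: fraction of all contributions made by women."""
    by_women = sum(c for g, c in zip(genders, contributions) if g == "F")
    return by_women / sum(contributions)


def permutation_ci(genders, contributions, statistic,
                   n_perm=10000, level=0.95, seed=0):
    """Chance-level interval of `statistic` under shuffled gender labels."""
    rng = random.Random(seed)
    labels = list(genders)           # contribution counts stay fixed
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)          # permute only the gender column
        null.append(statistic(labels, contributions))
    null.sort()
    tail = (1 - level) / 2
    lower = null[int(tail * (n_perm - 1))]
    upper = null[int((1 - tail) * (n_perm - 1))]
    return lower, upper


genders = ["F", "F", "F", "M", "M", "M", "M", "M"]
contributions = [1, 1, 2, 1, 3, 5, 8, 13]
low, high = permutation_ci(genders, contributions, female_share, n_perm=2000)
observed = female_share(genders, contributions)
```

An observed value falling outside `(low, high)` would be marked with one asterisk; repeating the call with `level=0.99` and `level=0.999` gives the bounds for two and three asterisks.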

Inbreeding homophily at a local level

Figure 3d shows two histograms, one over all male editor nodes, the other over all female editor nodes. For each editor $i$ who appointed at least 2 distinct reviewers we calculated a measure $H_i$ of inbreeding homophily. To compute it, we first measured the actual number $W_i$ of reviewer assignments given to women nodes by the considered editor $i$. The next step was to subtract the expected number of reviewer assignments given to women, which would be observed if the given editor node appointed women with the average frequency $p_i$ with which they are appointed in its local vicinity. To evaluate $p_i$ we took the set of all editors (both males and females) at a distance of at most 5 directed edges from the considered editor node $i$. We counted the overall number $A_{\mathrm{all}}$ of reviewer assignments made by these editors (i.e. the total number of edges originating from editor nodes in the neighborhood shell), and neglected those editors for which $A_{\mathrm{all}} < 62$ (i.e. we required that, on average, at each of the 5 steps away from the considered editor at least 2 novel reviewers are encountered that could not have been reached in a shorter step count). We then determined the number $A_{\mathrm{female}} \le A_{\mathrm{all}}$ of reviewer assignments made toward female nodes. We finally assumed $p_i = A_{\mathrm{female}} / A_{\mathrm{all}}$.

We could then compute the local inbreeding homophily measure $H_i = W_i - A_i p_i$, where $A_i$ was the total number of assignments made by each considered editor $i$.
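The calculation of the local measure can be sketched as below. The dictionary-based graph layout and all names are hypothetical, and two details are assumptions of the sketch: the considered editor is counted as part of its own neighbourhood (distance 0), and every node in `assignments` has a known gender.

```python
from collections import deque


def local_homophily(editor, assignments, gender,
                    max_dist=5, min_assignments=62):
    """Return H_i = W_i - A_i * p_i, or None if the editor is skipped.

    assignments: dict mapping a node to the list of reviewers it appointed
    gender:      dict mapping a node to "F" or "M"
    """
    own = assignments.get(editor, [])
    if len(set(own)) < 2:            # fewer than 2 distinct reviewers
        return None

    # Breadth-first search along directed edges, up to max_dist steps.
    dist = {editor: 0}
    queue = deque([editor])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue
        for nxt in assignments.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    # All assignments made by nodes in the neighbourhood.
    local = [r for node in dist for r in assignments.get(node, [])]
    a_all = len(local)
    if a_all < min_assignments:      # neighbourhood too small to estimate p_i
        return None
    p_i = sum(gender[r] == "F" for r in local) / a_all

    w_i = sum(gender[r] == "F" for r in own)
    return w_i - len(own) * p_i


toy_assignments = {"e1": ["a", "b", "c"], "a": ["d", "e"]}
toy_gender = {"e1": "M", "a": "F", "b": "F", "c": "M", "d": "M", "e": "M"}
h_toy = local_homophily("e1", toy_assignments, toy_gender, min_assignments=5)
```

In the toy graph two of the five local assignments go to women, so the local frequency is 0.4; editor `e1` appoints two women among three reviewers, giving a value of 2 - 3 × 0.4 = 0.8. The threshold is lowered to 5 here only because the example is tiny.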

We used a similar technique to assess the impact of the most homophilic editors on the overall network-homophily in the female editor-to-reviewer and male editor-to-reviewer networks. Let $q_i$ be the probability that a person of the same gender is chosen by an editor $i$, where $q_i$ is calculated exactly as $p_i$ in the previous paragraph, i.e. by considering all people at most 5 directed edges away from editor $i$ in the editor-to-reviewer network, counting the number of assignments these people gave to people of the same gender and dividing by the total number of assignments these people made. Next, let $k_i$ denote the number of assignments editor $i$ gives to a person of the same gender and $n_i$ the total number of assignments editor $i$ makes. Assuming editor $i$ chooses the gender of a reviewer at random, the probability that $k_i$ of the $n_i$ reviewers assigned by $i$ have the same gender follows a binomial distribution $\mathrm{binom}(k_i; n_i, q_i)$, and $\Phi_{\mathrm{hom}} = \sum_{\nu = k_i}^{n_i} \mathrm{binom}(\nu; n_i, q_i)$ measures how likely it is that editor $i$ assigns at least $k_i$ reviews to a person of the same gender.
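The tail probability defined above is the upper tail of a binomial distribution and can be evaluated directly. The function name is illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def phi_hom(k_i, n_i, q_i):
    """P(at least k_i same-gender picks out of n_i) if each pick is
    independently of the same gender with probability q_i."""
    return sum(comb(n_i, v) * q_i**v * (1 - q_i)**(n_i - v)
               for v in range(k_i, n_i + 1))


# An editor who appoints 2 reviewers, both of the editor's own gender,
# in a neighbourhood where same-gender picks have probability 0.5.
tail = phi_hom(2, 2, 0.5)
```

A small value means that so many same-gender assignments would rarely arise by chance; in the two-reviewer example the value is 0.25, which is not unusual at all.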

Reviewing Editor: Peter Rodgers, eLife, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Gender bias in scholarly peer review" to eLife for consideration as a Feature Article. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Senior/Reviewing Editor. The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

This is an interesting and high quality manuscript on a topic that is important and timely. The manuscript (which is based on a novel data set) merits publication after revision to improve readability and replicability.

Essential revisions:

1) To maintain high standards of research reproducibility, and to promote the reuse of new findings, eLife requires all major data sets associated with an article to be made freely and widely available (unless there are strong reasons to restrict access, for example in the case of human subjects’ data), in the most useful formats, and according to the relevant reporting standards. (See below for more details).

2) How representative is the population studied by the authors? It would be interesting to compare the population studied by the authors with other large-scale studies to enhance the generalizeability of their findings. For example, it seems as if the authors could compare Figure 1—figure supplement 2 to Lariviere et al. (2013) in order to determine whether the population they study is significantly different to traditional scientometric populations. Also, there are several studies that show that women are less productive than men. This might be useful for explaining Figure 2 .

3) The authors should more fully acknowledge the limitations of the data set. For example, the authors mention that the names are publicly disclosed (this is what allows this analysis). However, several studies have shown that open peer review significantly affects the willingness of certain populations to conduct reviews (e.g., by age, rank, gender, minority status). Could the fact that names are disclosed make this less generalizable? The limitations of this should be discussed.

4) The analyses only included reviewers who agreed to review (understandably, as there is no publicly available data for reviewers who declined to review). However, women might be more likely to say no to requests to review for various reasons (e.g. they might have less time to review, or they might feel less qualified to comment on their peers' work). Hence, might their underrepresentation stem from self-selection rather than discrimination?

5) It would be interesting to comment on the possible effect of age on gender discrepancies at different levels. Specifically, editors might (on average) be older than reviewers, who in turn might (on average) be older than authors (although with large variability). Is it possible that higher gender discrepancy among editors and reviewers is because women might be less represented in older age groups?

6) There is sparse information on author name disambiguation (for those that lacked a UID). More information should be provided on this. The authors state that they cannot guarantee a "completely error-free parsing" – could they run a validation test to give an estimate of the error? And was any validation exercise conducted on the name-gender algorithm? How does this compare with other name-gender algorithms?

7) Several seminal books on peer review and articles specifically focused on bias in peer review are omitted here. Furthermore, the article fails to engage with the previous literature in a substantial way. Given that this work is one of the largest peer-review studies to-date, it would be useful for the authors to establish how their work refutes or confirms previous work.

8) Regarding the third paragraph in the Results section: please include a figure which shows that the gender gap in various contributions cannot be fully explained by population.

9) There are many interesting results that warrant more explanation in the text. At present, some of the most interesting results are buried in the figure captions. It would be great to see the authors embed some of these results, in more detail, in the main text.

10) Overall, the differences between observed and chance-level participation of women (for instance in Figure 2 ) are not that large, even though they are statistically significant. How concerned should we be about them in the first place?

11) The authors advise that editors should be encouraged to make non-homophilic reviewer selections. But if the goal is to increase gender parity, shouldn't women editors be 'allowed' to be more homophilic, at least until parity is achieved? Might it be better to suggest that all editors should try to recruit more women reviewers?

We thank the reviewer for the reminder. We have included the anonymized raw data in the revised submission.

We thank the reviewer for these suggestions. We have now augmented the Discussion section with a comparison of the Frontiers population with other scientometric populations (subsection “How representative are the Frontiers journals?”) and included two tables ( Supplementary files 1 and 2 ) summarizing previous studies. Specifically, we find that gender composition of Frontiers articles authors is similar to that of Lariviere et al. (2013), although when broken down by country outliers do exist. We also included a comparison with (1) a large data set for the gender composition of editors in mathematical journals (Topaz and Sen, 2016) and (2) a recent study (Lerback and Hansen, 2017) with data from 20 journals of the American Geophysical Union. We find that, overall, our study provides a global account on the prevalence of women among editors and reviewers and ranks these previous reports in a continuum of field-specific participation numbers.

While our data does not allow inferring the reasons for why women contribute less, some studies indeed suggested that women are less productive. These results are now mentioned and discussed as a potential reason in the revised Discussion (subsection “Evolution of participation rates by gender and causes for remaining inequity”, second paragraph). However, for completeness, we also refer to other partially conflicting studies indicating that women may have different, not necessarily lower productivity patterns than men (e.g. Gilbert et al., 1994; Pan & Kalinaki, 2015). Furthermore, we stress that the key new findings of our study concern homophily ( Figure 3 ) and that the smaller number of women has been taken into account in the analysis leading to these findings.

We thank the reviewer for pointing this out. We agree that open peer review could potentially have effects other than the intended increase in transparency and quality. These effects are now explicitly discussed in the revised manuscript (subsection “How representative are the Frontiers journals?”, first paragraph). Despite extensive search we could not find any study that specifically assesses whether open peer review affects the gender specific acceptance rate of reviewer assignments. Should we have missed a crucial publication, we would be very grateful for a reference.

We also now systematically compare our findings to a large number of previous studies which considered smaller sample sizes but had access to more meta-data, since they were initiated journal-side. Overall, as we comment in the extended Discussion, we find women’s participation rates among editors, reviewers and authors of Frontiers journals to be comparable to previous reports, indicating that the Frontiers community is representative of the general situation.

The reviewer is correct in suggesting that it is critical to distinguish between self-selection and discrimination. Our data set does not allow inferring the reasons for women’s underrepresentation and therefore self-selection (due to various motivations) could potentially play a role. This possibility, along with others, is now acknowledged and discussed in the revised manuscript (subsection “Evolution of participation rates by gender and causes for remaining inequity”, second paragraph). Besides this discussion, we also performed a more detailed analysis of the different prevalence of homophily in the male and female subpopulation. Taken together we feel that the data clearly point to same-gender preference of mostly male editors as a contribution to the under-representation of female scientists across the Frontiers series of journals. This conclusion is particularly strengthened by our new analysis of the different patterns of gender homophily for male and female editors ( Figure 3e -f).

Our data set does not include scientists’ ages and we are, therefore, unable to analyze how age affects women’s participation in the peer-review system. However, as the number of female scientists decreases with age and rank, the differences seen in the gender fraction of editors, reviewers and authors could well be influenced by the age distribution in these populations. The revised manuscript now mentions this possibility (subsection “Evolution of participation rates by gender and causes for remaining inequity”, second paragraph) and we further discuss our results in light of a recent study with access to age data (Lerback and Hansen, 2017; subsection “How representative are the Frontiers journals?”, last paragraph).

We thank the reviewer for pointing this out. The gender assignment is a critical step of the analysis and we have now included both a more detailed description of the name-gender algorithm and a validation test in the Materials and methods section (subsections “Collection and parsing of data” and “Determination of gender”). While errors are in principle unavoidable when identifying people based on their names, Newman (2001) estimated their rate to be only few percent. Combined with our reliance on user IDs for the vast majority of people, we expect that errors due to name disambiguation are uncommon.

To estimate the error rate in gender assignment, we have validated the genders of a random selection of 924 scientists through a manual web search (as described in subsection “Determination of gender”, second paragraph) and compared our assignment error with the supplemental material of Lariviere et al. (2013). Notably, our algorithm’s error rate is smaller than the reported error rate of the name-gender algorithm of Lariviere et al. (2013). Independent of the error rate, we would like to stress that mis-assigning gender can never increase the homophily that we see. Random mis-assignments have the same effect as a small rate of random permutation of gender labels. This would reduce evidence for inbreeding homophily rather than spuriously inducing it. Thus, although we cannot reach a 100% correct gender assignment, the finding of inbreeding homophily is robust against this type of uncertainty.

We thank the reviewer for pointing this out. It is true that our previous version had the style of a short letter, rather than that of an extended article. We now rewrote and substantially extended the Discussion section of the manuscript, including generous paragraphs of comparison with other empirical work, the current state of the art and detailed discussions about why our study can clarify previously ambiguous pictures.

Despite these comprehensive additions and expanded reference section (99 references cited, while in the previous version we had just 31) we have failed to include the crucial references the reviewer had in mind; we would be keen to receive more explicit guidance.

Every figure in our article clearly indicates comparison with confidence intervals associated to a null hypothesis of no bias effects besides numerical differences between male and female populations. Therefore, our figures already showed the requested information.

Indeed, our statistical assessment – random shuffling of gender labels – leaves the network structure and the fraction of females at every level intact. In essence, we ask whether the observed contributions could have occurred randomly, if all actors of different gender behaved identically with respect to the contribution in question.

We apologize if this important procedural aspect was not clear enough. We have now better highlighted the concept of the shuffling method (cf. Figure 1A ) for the generation of gender-blind control networks, and edited the third paragraph of the Results section in an attempt to make it more understandable. Furthermore, we have also improved figure captions.

We thank the reviewer for this suggestion. Former Figure S3, now Figure 1C , as well as former Figure S5, now Figure 3D are now presented in more detail in the Results section, (second and sixth paragraphs, respectively). We also better highlighted the homophily findings, which are a key novel contribution of our study. In particular, we have added a completely new section and figure panel about the observed different mechanisms of homophily for male and female editors ( Figure 3E and 3F ) in the main manuscript.

We further rewrote and substantially extended the Discussion section of the manuscript to now include a discussion of these findings.

We agree with the reviewer that it is important to discern statistical significance, effect size, and relevance. First, while the differences, for the most part, seem small, they can reach up to a factor of 2 between the observed fraction of females with 10 contributions and the chance range. Given the large range in the number of occurrences, visualized in Figure 2 , we had to use a logarithmic scale. This requires care when attempting to judge the magnitude of the bias just from the visual inspection of those graphs, while statistics are precise. Second, to highlight why our findings are concerning, we now include several sections in the revised Discussion that:

A) Argue why we are confident, in the light of previous findings, to state that discrimination is a critical reason for the remaining inequity (subsection “Evolution of participation rates by gender and causes for remaining inequity”, second paragraph);

B) List direct results of subtler forms of gender bias. Notably, even a slight homophily can influence and alter the way in which information spreads and is damaging to female scientists, whose work is perceived with a negative bias by the majority (subsection “Homophily in society and science”, last paragraph) and;

C) Translate the findings to specific structural disadvantages for women (subsection “Conclusions”, second paragraph).

The reviewer is asking an interesting question and we agree: at first glance, it might seem like a prudent policy to support homophily among female editors to reach equity.

In response to question 1, we would like to highlight, however, that the goal of scientific policy is not equity per se, but equal opportunity and scientific meritocracy. To this end homophily should be reduced, too, as homophily, even in a situation when parity is reached, prevents equal opportunity. This is now detailed in the revised Discussion in the first two paragraphs of the subsection “Conclusions”.

In response to question 2, encouraging editors of both genders to recruit more women might foster the advance of excellent female scientists and advertise that women do good science. We stress, however, that a broad literature corpus in the sociology of work environments has shown that the efficacy of top-down measures aimed at enforcing equal representation is poor if not paired with policies aimed at educating decision-makers about the importance of fighting gender discrimination (i.e. healing “from within” network construction mechanisms, rather than just artificially “transplanting” more female nodes into the network).

Considering both sides of the coin, it is unclear whether the positive effects (faster route to equity) or negative effects (undesirable behavioral trait) prevail in the long run. We added a corresponding paragraph to the Discussion (last paragraph) and the subsection “Conclusions” (third paragraph).

Author details

  • Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
  • Bernstein Center for Computational Neuroscience, Göttingen, Germany
  • Yale University, New Haven, United States


  • Institute for Systems Neuroscience, Aix-Marseille University, Marseille, France

Bundesministerium für Bildung und Forschung (01GQ1005B)

Marie Curie Career Development Fellowship (FP7-IEF 330792 (DYNVIB)), Boehringer Ingelheim Fonds.

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Publication history

  • Received: September 20, 2016
  • Accepted: February 27, 2017
  • Version of Record published: March 21, 2017

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.


Categories and tags

  • peer review
  • gender bias
  • large-scale data analysis

  • Open access
  • Published: 26 August 2024

Public attitudes towards personal health data sharing in long-term epidemiological research: a Citizen Science approach in the KORA study

  • Ina-Maria Rückert-Eheberg 1 ,
  • Margit Heier 1 , 2 ,
  • Markus Simon 1 ,
  • Monika Kraus 1 , 3 ,
  • Annette Peters 1 , 3 , 4 , 5 &
  • Birgit Linkohr 1 , 3  

BMC Public Health volume 24, Article number: 2317 (2024)


Loss to follow-up in long-term epidemiological studies is well-known and often substantial. Consequently, there is a risk of bias in the results. The motivation to take part in an epidemiological study can change over time, but ways to minimize loss to follow-up are not well studied. The Citizen Science approach offers researchers the opportunity to engage in direct discussions with study participants and to integrate their opinions and requirements into cohort management.

Guided group discussions were conducted with study participants from the KORA cohort in the Augsburg Region in Germany, established 40 years ago, as well as a group of independently selected citizens. The aim was to look at the relevant aspects of health studies with a focus on long-term participation. A two-sided questionnaire was developed subsequently in a co-creation process and presented to 500 KORA participants and 2,400 employees of the research facility Helmholtz Munich.

The discussions revealed that altruistic motivations (i.e. supporting research and public health), personal benefits (i.e. a health check-up during a study examination), data protection, and information about research results in layman’s terms were crucial to ensure interest and long-term study participation. The results of the questionnaire confirmed these aspects and showed that exclusively digital information channels may be an obstacle for older and less educated people. Thus, paper-based media such as newsletters are still important.

Conclusions

The findings shed light on cohort management and long-term engagement with study participants. A long-term health study needs to benefit public and individual health; the institution needs to be trustworthy; and the results and their impact need to be disseminated in widely understandable terms and by the right means of communication back to the participants.


In a long-term prospective cohort study, the motivation of people to participate over an extended period and trustfully share their health data is essential to investigating causal relationships between health and disease in constantly changing environments. However, loss to follow-up, i.e. declining willingness to take part in follow-up examinations and questionnaires, is a major problem in all long-term prospective cohort studies [ 1 , 2 , 3 ], raising questions about the generalizability of results [ 4 ]. Information on the reasons to participate is often gathered at the initial sign-up of the study by short non-participant questionnaires [ 5 , 6 , 7 ], satisfaction polls after the study examination for internal conduct improvement, witness statements [ 8 ] or by chance when study participants comment to staff or leave remarks in questionnaires. Non-participants often report acute health problems or stressful life-events, but also unspecific reasons like lack of interest or time constraints. In good epidemiological practice, efforts to characterize the loss to follow-up during analysis are made [ 9 ] and particular groups can be identified, e.g. less educated groups or middle-aged men, depending on the cohort [ 10 , 11 ]. However, cohort management should seek to maximize participation in follow-up studies in the first place by trying to meet participants’ expectations. Personal attitudes towards data sharing may change during long-term studies, particularly in the light of the experience of the COVID-19 pandemic. To our knowledge, systematic research into cohort management strategies in long-term epidemiological studies is rare.

Citizen Science, also called “participatory research,” has increasingly been supported by public organizations in and outside of academic institutions to meet information requirements, increase transparency, and improve people’s attitudes towards science [ 12 ]. In 2022, the White Paper “Citizen Science Strategy 2030 for Germany” was published that comprehensively informs about Citizen Science, action areas, networking, funding, volunteer management, and many other aspects [ 13 ]. Meanwhile, a wide range of scientific projects covering all areas of interest are offered to the public [ 14 , 15 , 16 ]. Participatory research strategies have been introduced into health research in various initiatives (e.g. [ 17 ]) with the overarching goal “to reduce concerns about the use of data through intensive exchange with interested citizens and to demonstrate the opportunities it offers” [ 18 ]. Citizen Science in public health can be characterized by typology according to aim, approach, and size, depending on the level of engagement with the community [ 19 ].

Recently, Marcs et al. published a scoping review on Citizen Science approaches in chronic disease prevention where they used Citizen Science to identify problems from the perspective of community members, generate and prioritize solutions, develop, test and/or evaluate interventions, and/or build community capacity [ 20 ]. Frameworks for a systematic development of participatory epidemiology have also been proposed [ 21 ].

Our aim was to employ Citizen Science approaches to engage in direct discussion with study participants from a well-established epidemiological study and to evaluate how to maximize long-term study participation through high response rates and low subsequent withdrawal of consent. We were particularly interested in the reasons for continuing to take part in follow-up studies as well as concerns and wishes regarding the collection and use of health data. The research methods combined Citizen Science approaches like qualitative research and co-design elements with a classical quantitative approach in a nested but work-efficient study design. The project was conducted in a randomly selected subgroup of participants of a long-term prospective cohort study and, for comparison, a group of independent citizens and employees of a large health research institution.

The Citizen Science project was embedded in the KORA study (Cooperative Health Research in the Region of Augsburg), an adult population-based prospective cohort study established in 1984 in the City of Augsburg and the adjacent rural counties Augsburg and Aichach-Friedberg in Southern Germany [ 22 ]. Briefly, the KORA study consists of four cross-sectional baseline surveys (S1 from 1984/85 with N = 4,022 (response: 79.3%); S2 from 1989/90 with N = 4,940 (response: 76.9%); S3 from 1994/95 with N = 4,856 (response: 74.9%); and S4 from 1999/2001 with N = 4,261 (response: 66.8%)). Participants aged 25–74 years (S1: 25–64 years) were randomly selected from population registries. The KORA study is still in active follow-up with a KORA study centre located in the City of Augsburg. A general health survey was sent out in 2021 to all S1 to S4 participants still living in the study area and with consent for recontact. Of 9,109 participants, 6,070 answered the survey (66.6%).

The starting point of the project was qualitative research with three guided discussion groups: two with KORA study participants and one with newly recruited citizens. In a co-creation process at a subsequent meeting, a questionnaire was developed with a smaller group of volunteers from the discussion groups. For the quantitative part of the study, this questionnaire was mailed to participants of the KORA study and distributed to all employees of Helmholtz Munich.

Discussion groups

During the preparation of the study setup, a pilot discussion group was conducted with seven acquaintances of the involved scientists. For the two discussion groups with KORA volunteers, 183 KORA study participants were selected (criteria: 50% women; 50% with online and 50% with paper-based completion of the latest KORA general health survey in 2021; born 1949–1969; residing in or near Augsburg). They were invited in writing by post and contacted by telephone. Citizens were recruited via a newsletter advertisement of the Volunteer Centre Augsburg [ 23 ], and via posters and flyers that were distributed in shops, restaurants, the library, the University Hospital Augsburg, and other public places in Augsburg. To compensate for expenses such as travel, we paid a small expense allowance.

The discussions took place between May and June 2023 in the KORA Study Centre in Augsburg. Following a short introductory presentation on the KORA study, the attendees were asked to note their motivations, concerns, and wishes regarding the participation in a long-term observational health study separately on index cards. The number of cards was not specified. The participants had the opportunity to present each card to the group before it was displayed on a whiteboard sorted by the respective category. Guided by two moderators, the raised aspects were discussed in greater depth along with a set of prepared questions. To provide more information on data privacy and protection in the KORA study, the consent form and study information from the most recent KORA general health survey in 2021 were distributed. Each discussion group lasted about 90 minutes and was rounded off with an informal get-together at the end. The discussions were audiotaped with Audacity® 3.2.5 and a microphone of the conference system Logitech CC3000e ConferenceCam and transcribed subsequently. Afterwards, the index cards were coded according to recurring themes. One of the authors, who was part of all three discussion groups, developed a coding scheme with the help of the audiotapes. The scheme was reviewed by another author who was not present at the discussions, and discrepant interpretations were resolved by consensus. Anonymized quotes were selected and translated for publication purposes.

Questionnaire development

The discussion group participants were invited to a subsequent meeting to develop the questionnaire together with the researchers in a co-creation process. The aim was to recruit six volunteers (two per group) to discuss a prepared questionnaire draft in the light of the results from the discussion groups. The questionnaire was designed for mailing to the KORA study participants first and modified slightly for the employees of Helmholtz Munich thereafter. It consisted of questions on the three pre-defined categories motivations, concerns, and wishes and a section on personal data such as sex, age, and school education. Many of the questions were formatted as 5-point Likert scales.

The questionnaire was piloted at the Institute of Epidemiology, and the final version was also translated into English for the Helmholtz Munich employees (Supplement).

Questionnaire survey

The paper version was posted to 500 selected KORA participants, equally balanced by sex. They were randomly chosen from the KORA S1-S3 studies from a total of N = 2,933 participants born between 1945 and 1964, still living in the study area, and with consent for recontact. Of these, 400 had taken part in the latest KORA general health survey in 2021, while 100 had not. The approximately 2,400 Helmholtz employees were invited to complete the questionnaire personally on paper in the canteen on campus or online (in PDF format).
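As an illustration only, a sex-balanced random draw of this kind can be sketched in a few lines of Python. This is not the procedure the authors used, which the text does not describe in detail. The sampling frame, the field names `id` and `sex`, and the function `draw_balanced_sample` are hypothetical, and the split into 400 previous responders and 100 non-responders is not modelled because the text does not say whether it was a design target.

```python
import random

def draw_balanced_sample(pool, n_total, seed=0):
    """Draw n_total people at random, half from each sex stratum (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    per_sex = n_total // 2
    sample = []
    for sex in ("F", "M"):
        stratum = [person for person in pool if person["sex"] == sex]
        # rng.sample draws without replacement within the stratum
        sample.extend(rng.sample(stratum, per_sex))
    return sample

# Synthetic sampling frame of 2,933 made-up records (not KORA data).
pool = [{"id": i, "sex": "F" if i % 2 == 0 else "M"} for i in range(2933)]
sample = draw_balanced_sample(pool, 500)

print(len(sample))
print(sum(1 for person in sample if person["sex"] == "F"))
```

Drawing within each stratum separately guarantees the 50/50 balance by construction, whereas a single simple random draw of 500 would only be balanced on average.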

Ethics approval and consent to participate

All discussion group participants gave their written informed consent to take part in the discussions. The questionnaire was conducted anonymously, and no written informed consent was required. This study protocol was approved by the ethics committee of the Bavarian Medical Association (EC 23010).

Statistical analysis

The data from the completed questionnaires were transferred to a database and analyzed primarily with R and RStudio (Boston, MA, USA). Characteristics of the qualitative study groups were reported as absolute numbers, and characteristics of the quantitative questionnaire study population as numbers and percentages. The R package “likert” was used to create Likert scale charts (Figs.  1 and 2 ). Percentages were summed over the two categories “not important” and “not very important” and over the two categories “important” and “very important”, respectively. The category “neutral” was also visualized, and its percentage was given. Figure  3 was set up in Excel. Percentages were calculated and displayed by education level after exclusion of participants with missing information on education ( N  = 1) and those who had no school-leaving certificate ( N  = 2). Significance tests were not performed because the statistics were descriptive and not adjusted for confounding factors.
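The collapsing of the 5-point scale into two outer groups plus “neutral” can be sketched as follows. The authors worked in R, so this Python version is only an illustrative equivalent. The function name `collapse_likert`, the output keys, and the example responses are assumptions made for the sketch.

```python
from collections import Counter

LOW = {"not important", "not very important"}
HIGH = {"important", "very important"}
NEUTRAL = "neutral"

def collapse_likert(responses):
    """Return rounded percentages for the low, neutral, and high groups."""
    counts = Counter(responses)
    low = sum(counts[c] for c in LOW)
    high = sum(counts[c] for c in HIGH)
    neutral = counts[NEUTRAL]
    n = low + neutral + high  # answers outside the five categories are ignored
    if n == 0:
        raise ValueError("no valid Likert responses")
    return {
        "low": round(100 * low / n),
        "neutral": round(100 * neutral / n),
        "high": round(100 * high / n),
    }

# Made-up responses for illustration only (not study data).
example = ["very important"] * 6 + ["important"] * 3 + ["neutral"] * 1
print(collapse_likert(example))
```

Because each group is rounded separately, the three percentages need not add up to exactly 100.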

Fig. 1 Reasons to participate in the KORA study or a long-term health study. Percentages on the left represent purple responses, percentages on the right represent green responses

Fig. 2 Concerns about data protection, linkage of study data with secondary health information, and use for non-public research. Percentages on the left represent green responses, percentages on the right represent red responses

Fig. 3 Preferred information channels to disseminate research results of the KORA study, stratified by school education

Twenty-four people participated in the three discussion groups (17 KORA study participants and 7 citizens; 11 women and 13 men). Their age range was 42 to 78 years (mean age: 65 years). Fourteen people reported high (12–13 years), nine intermediate (10 years), and one low (9 years) school education.

Table  1 shows the results of the group discussions stratified by category. There was no major difference between KORA participants and the citizen group. Most ideas were raised in the category motivations, followed by wishes and concerns. We excluded statements that went beyond the scope of a health study (concerns: general criticism of the health system (3x) and study staff would not listen (1x); wishes: individual health advice (8x) and contact between participants (2x)).

The number of people who referred to one of the aspects listed in the table is depicted in column N.

For many volunteers, a motivation to take part in the KORA study or a health study in general was the free preventive medical check-up in the form of the study examinations.

Discussion Group 1, KORA participant: “So, my motivation to join was to get information about my health that I wouldn’t have gotten otherwise.”

Additionally, the discussants placed great importance on the benefits for the public, their contribution to health research, and their interest in it.

Discussion Group 3, KORA participant: “In terms of motivation, the focus is, of course, quite clearly on the fact that the benefit is for the general public.”

Discussion Group 1, KORA participant: “And then, of course, that one contributes to general research.”

The professional conduct of the study was also mentioned several times.

The participants raised fewer issues in the category concerns than in the categories motivations and wishes. The main aspects were protection and security of health data in KORA or generally in health studies.

Discussion Group 1, KORA participant: “My concerns are (…) data protection and data usage. Not particularly in relation to Helmholtz Munich, but the overall (…) misuse, data hackers, cybercrime, all that stuff. And that will increase even more in the future.”

Discussion Group 2, Citizen: “…it is always difficult with data protection in an international comparison. We have very high standards here, but can we maintain them in the long term? Because, of course, we also create barriers that are incomprehensible to others.”

Some of these concerns were not directed at the discussants themselves but rather at younger people who might suffer greater harm through misuse. Discrimination in professional life or when taking out insurance were mentioned as examples in this context.

Discussion Group 1, KORA participant: “Personally, I wouldn’t mind (…), but with younger, working people, I would probably have a different opinion. Because today, you can supposedly already say that people might get certain diseases at some point. (…) And I think that is dangerous if this information goes to the insurance companies or to the employers themselves (…).”

The participants did express their trust in Helmholtz Munich as a publicly funded research institution, and the consent form and study information were considered informative and clear; some participants even found them too detailed.

A minority of the participants had no worries whatsoever.

Discussion Group 3, KORA participant: “I really can’t say anything about concerns. If my data were published with my name, I wouldn’t care at all.”

In the category wishes, the participants pointed out that more communication on study results and their translation into the health care system would motivate them long-term to participate in a study.

Discussion Group 2, Citizen: “(…) the research results must be disseminated more widely. In my opinion, they have primarily been intended for experts.”

Discussion Group 2, Citizen: “I find the contributions on the Internet (…) terrible. The layperson gets all mixed up. You’d have to clean up that mess, too.”

Many participants indicated that simple, brief, and comprehensible communication was appreciated. Some discussants preferred digital formats, while others explicitly stated that they wanted paper-based communication only. Overall, the discussion group participants were open to health research and were interested in more frequent examinations and additional study offers.

A two-page questionnaire was developed in a meeting between two out of the 24 discussion group participants and two researchers. The participants pointed out some complicated questions and assessed the overall comprehensibility.

The survey was completed by 278 KORA participants (response rate: 67% among those who had participated in the latest KORA follow-up and 9% among those who had not) and 285 Helmholtz Munich employees (response rate: about 12%; the exact number of employees was not available), resulting in a total study population of 563 people. The characteristics of the study population are displayed in Table  2 . Approximately the same number of women and men took part in the survey. The KORA study participants were between 58 and 78 years old (mean age: 67.9 years). The Helmholtz Munich employees were younger, mostly between 20 and 50 years old (mean age: 39.8 years). About one-third each of the KORA participants had low (9 years), intermediate (10 years), and high (12–13 years) levels of school education. In contrast, most of the Helmholtz Munich employees (89.2%) had a high level of education. Of the Helmholtz Munich employees, 71.4% worked in scientific roles, and 70.4% had German citizenship.
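The reported counts can be cross-checked with simple arithmetic, sketched below in Python. The inputs are the figures given in the text. The overall KORA response rate is derived here and is not a number reported by the authors, and the Helmholtz denominator is the approximate headcount of 2,400.

```python
def response_rate(completed, invited):
    """Response rate in percent."""
    return 100 * completed / invited

# Figures as reported in the text.
kora_invited = 500
kora_completed = 278
helmholtz_completed = 285
helmholtz_invited_approx = 2400  # approximate; exact headcount not available

total_respondents = kora_completed + helmholtz_completed
kora_rate = response_rate(kora_completed, kora_invited)
helmholtz_rate = response_rate(helmholtz_completed, helmholtz_invited_approx)

print(total_respondents)         # 563
print(round(kora_rate, 1))       # 55.6 (derived here, not reported in the text)
print(round(helmholtz_rate, 1))  # 11.9, i.e. "about 12%"
```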

In the questionnaire, participants were asked how important they rated the three listed reasons to participate in the KORA study or a long-term health study (Fig.  1 ). The answers of the KORA study participants and the Helmholtz employees were very similar. A majority of about 90% deemed “contributing to health research” and “benefits for the general public” as very important or important. “Free comprehensive medical check-ups” were also seen as important or very important by about 70%, while about 20% took a neutral position on this aspect.

Differences between the two participant groups were found regarding questions about concerns in relation to data protection and data linkage (Fig.  2 ). Only a small proportion of the KORA study participants had reservations about data protection in the KORA study (3%). Concerns or strong concerns were more frequent with regard to linking their study data to secondary health data such as diagnoses from their physicians (7%) and prescription and treatment data from their health insurance (14%), but less frequent again for linkage to the cause of death sometime in the future (7%). In comparison, 35% of the Helmholtz Munich employees had concerns or strong concerns about data protection in a long-term health study. Data linkage was seen critically by 35% regarding study and physician diagnosis data, by 41% regarding study and health insurance data, and by 17% regarding study and death certificate data.

A larger proportion in both groups (29% of the KORA participants and 57% of the Helmholtz Munich employees) indicated concerns or strong concerns about the utilization of their health data by non-public research organizations.

The KORA participants were asked how they would like to be informed about the research results of the KORA study. Multiple selections were allowed. Figure  3 shows the percentages stratified by school education. Participants with a high level of school education preferred digital channels such as electronic newsletters and websites, whereas participants with low or intermediate school education preferred paper-based information, i.e. newsletters by paper mail. About 20% of each group indicated that they would appreciate coverage of scientific research results via newspapers, radio, and TV, while books were of interest to only a small proportion of participants. Less than 10% did not wish for any information. Of the 147 participants who chose a newsletter by paper mail, 20% also selected a newsletter by email, and 4% also selected the website category – thus, 77% of those who chose paper mail wanted no digital information.
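Because multiple selections were allowed, the shares for email (20%) and website (4%) can overlap, which is why they need not add up to the 23% implied by the 77% figure. A minimal Python sketch with made-up selections shows how such a "paper only" share is tallied. The data, the channel labels, and the function name `paper_only_share` are hypothetical and chosen only to reproduce the pattern of the reported percentages.

```python
DIGITAL = {"email", "website"}

def paper_only_share(selections):
    """Return (number choosing paper, percent of them with no digital channel)."""
    paper = [s for s in selections if "paper" in s]
    paper_only = [s for s in paper if not (s & DIGITAL)]
    return len(paper), round(100 * len(paper_only) / len(paper))

# Made-up multi-select answers (not study data): each set is one respondent.
selections = (
    [{"paper"}] * 77
    + [{"paper", "email"}] * 19
    + [{"paper", "email", "website"}] * 1  # counted under both email and website
    + [{"paper", "website"}] * 3
    + [{"email"}] * 10                     # did not choose paper, so excluded
)

n_paper, pct_paper_only = paper_only_share(selections)
print(n_paper, pct_paper_only)
```

In this toy data, 20 of the 100 paper respondents also chose email and 4 also chose the website, yet only 23 chose any digital channel, because one respondent selected both.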

Using Citizen Science approaches, this project examined the motivations, concerns, and wishes of research participants to help slow down the decline in follow-up study participation. The KORA study was established almost four decades ago and is still in active follow-up with relatively high response rates, e.g. 64% in an examination in 2018/19 [ 24 ] and 66.6% in a general health survey in 2021. Longitudinal data is particularly informative for life-course health research, but few studies exist on how to keep up motivation in follow-up studies. The findings from the discussion groups and the questionnaire survey showed that participants can be motivated to provide their personal health data for scientific purposes over long periods of time if their expectations are met. Three main reasons to participate in a long-term health study were identified: the benefit to the public, scientific progress, and personal health. Those findings are consistent with a previous study led by KORA scientists in 2010 on the public perceptions of cohort studies and biobanks during the recruitment phase of the German National Cohort (NAKO) [ 25 ]. They found that in general, citizens approve of epidemiological research based on expectations for communal and individual benefits (e.g., health check-ups and health information). This suggests that the basic motivation for study participation does not change between study initiation and long-term follow-up. Collaboration with science [ 26 ], making a contribution to society [ 27 ], and receiving information about personal health [ 28 ] are also known motivations for participation in clinical studies. In a recent study on retaining participants in longitudinal studies of Alzheimer’s disease, altruism and personal benefit were the factors associated with continued study participation as well [ 29 ].

In the discussion groups, data protection did not come up as a major concern and was not necessarily directed at the KORA study. In the questionnaire, participants had no strong concerns about their data in the KORA study, even for data linkage. This is in line with the findings by Bongartz et al. that the trustworthiness of those conducting research appeared to be most important for the decision to participate in a health-related study [ 30 ]. However, Helmholtz Munich employees expressed more concerns with regard to data protection and data linkage. A likely explanation for this difference is that KORA participants referred to a specific study that they had a lot of experience with, while Helmholtz employees imagined some theoretical long-term health study. Moreover, the Helmholtz employees were, on average, younger, more highly educated, and probably more informed about data protection and data security risks. Our findings showed that institutional trust is essential for long-term participation in a health study. Once trust is gained at initial sign-up, it is important to maintain it. The comprehensive study by Tommel et al. also supports the importance of trust [ 31 ]. They explored citizens’ and healthcare professionals’ perspectives on personalized genomic medicine and personal health data spaces in questionnaires and interviews. Cohort management can help maintain trust, but factors such as overall satisfaction with the health system, public health policy, or pandemics are outside its scope.

About 29% of the KORA participants and 57% of the Helmholtz Munich employees expressed concern about sharing data with non-public research organizations. This is in line with findings that people are generally prepared to participate in epidemiological research if it is conducted by a trusted public institution, but that there is widespread distrust of research conducted or sponsored by pharmaceutical companies [ 32 , 33 ]. However, this degree of concern in both groups was somewhat surprising, as most KORA participants had given consent to sharing their data with industry previously, and Helmholtz Munich contributes to the translation of research into medical innovation with commercial partners.

The discussion group participants wished to be informed about the results and impact of the research in a generally understandable format. The information should be addressed to them personally, such as through a newsletter, rather than through the press, TV, or the internet. A notable proportion of the KORA participants wished to be informed via non-digital means. This is an important finding for those running population-based studies such as the German National Cohort [ 34 ] and their financing bodies. While the finding may be specific to the setting in Southern Germany and a long-term cohort study with older participants, it is important to monitor the information preferences. In addition to digital tools, paper-based methods will still be needed for many more years so as not to lose large groups of the general population. Future research should focus particularly on the digital readiness of older citizens, so that cohort management strategies can engage participants at their level. In long-term health studies, morbidity and mortality are often relevant health outcomes. Public health policies that enable secondary data linkage could also compensate for loss to follow-up and limit selection bias.

Strengths and limitations

A strength of this project is its diverse group of participants, which includes stakeholders from a long-term epidemiological study, independent citizens, and staff from a research institution earning their living in health science research.

The discussion groups were structured but allowed participants to explain their own narratives and introduce new issues. The questionnaire was administered to two very different groups of participants, and in part, similar results were obtained that confirmed each other (i.e., important motivations to take part in health research (Fig.  1 )).

With respect to limitations, a Citizen Science project depends on participants who are interested and motivated to take part. It is quite difficult to find enough participants, and the 24 discussion group volunteers do not necessarily represent the “general” public, especially as discussants with low education were underrepresented. Participants living in rural areas were completely absent because the recruitment strategy required them to live within reasonable travel distance of the KORA study centre. The dates and times of the discussion groups were fixed by the researchers and probably discouraged very busy people. However, the small fee and snacks seemed to motivate some of the participants with lower economic status to take part.

In addition, it cannot be ruled out that the ideas of the discussants as well as the answers to the questionnaire survey were influenced by social desirability, perhaps on a subconscious level, and people might thus act somewhat differently in real life than they indicated they would in a theoretical setting. In a group discussion, participants may give answers that they believe to be expected and that will please the interviewer or moderator. Social desirability bias was certainly less of an issue in the questionnaire survey as it was anonymous. However, the outcomes of the discussion groups generally agreed with the responses to the questionnaire given to KORA participants. This questionnaire represents the views of a pre-selected group of people who were recruited up to forty years ago and who still consent to be contacted again for follow-up research. The response to the questionnaire by the KORA participants was as expected: it was high among those who had participated in the latest KORA general health survey in 2021, but very low among those who had not. This shows that participants who are lost to follow-up are difficult to re-engage.

Finally, the development of the questionnaire was intended to be a co-creation process between selected discussion group participants and scientists. However, the interest of the discussion group members in co-creation was low, and only two participants were willing to take part in this process. They improved the comprehensibility of the questionnaire draft but saw themselves clearly as contributors rather than co-creators. A successful co-creation process requires more capacity building than was possible in this project. As Laird et al. pointed out, Citizen Science approaches often face barriers like building up longer-term collaborative relationships, and their implementation is often time and resource constrained [ 35 ].

The Citizen Science approach opens a new possibility to get in touch with study participants more closely and to integrate their opinions and requirements into cohort management.

On the one hand, people are altruistically motivated when they decide to take part in a long-term health study, and they enjoy the possibility to contribute to public benefit and scientific progress. On the other hand, they also see benefits for their personal health. Concerns do not seem to prevail. Feedback in layman’s terms on the long-term results of the study is highly appreciated and should be addressed to the participant personally.

Cohort management should include regular feedback of results as a thank you for the data donation and contribution to society.

In other words, a long-term health study needs to benefit public and individual health, to be trustworthy regarding data protection and data use, and to provide long-term research results in generally understandable terms and in the preferred communication mode back to the participants.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request via the application tool KORA.passt ( https://helmholtz-muenchen.managed-otrs.com/external/ ).

Abbreviations

KORA: Cooperative Health Research in the Region of Augsburg

Osler M, Linneberg A, Glumer C, Jorgensen T. The cohorts at the Research Centre for Prevention and Health, formerly ‘The Glostrup Population studies’. Int J Epidemiol. 2011;40(3):602–10.

Rabel M, Meisinger C, Peters A, Holle R, Laxy M. The longitudinal association between change in physical activity, weight, and health-related quality of life: results from the population-based KORA S4/F4/FF4 cohort study. PLoS ONE. 2017;12(9):e0185205.

Volzke H, Schossow J, Schmidt CO, Jurgens C, Richter A, Werner A, et al. Cohort Profile Update: the study of Health in Pomerania (SHIP). Int J Epidemiol. 2022;51(6):e372–83.

Zivadinovic N, Abrahamsen R, Pesonen M, Wagstaff A, Toren K, Henneberger PK, et al. Loss to 5-year follow-up in the population-based Telemark Study: risk factors and potential for bias. BMJ Open. 2023;13(3):e064311.

Hoffmann W, Terschüren C, Holle R, Kamtsiuris P, Bergmann M, Kroke A, et al. [The problem of response in epidemiologic studies in Germany (Part II)]. Gesundheitswesen. 2004;66(8–9):482–91.

Enzenbach C, Wicklein B, Wirkner K, Loeffler M. Evaluating selection bias in a population-based cohort study with low baseline participation: the LIFE-Adult-study. BMC Med Res Methodol. 2019;19(1):135.

Holle R, Hochadel M, Reitmeir P, Meisinger C, Wichmann HE. Prolonged recruitment efforts in health surveys: effects on response, costs, and potential bias. Epidemiology. 2006;17(6):639–43.

NaKo - Botschafter. https://nako.de/studie/nako-botschafter/ . Accessed 07 May 2024.

Nohr EA, Liew Z. How to investigate and adjust for selection bias in cohort studies. Acta Obstet Gynecol Scand. 2018;97(4):407–16.

Powers J, Tavener M, Graves A, Loxton D. Loss to follow-up was used to estimate bias in a longitudinal study: a new approach. J Clin Epidemiol. 2015;68(8):870–6.

Kendall CE, Raboud J, Donelle J, Loutfy M, Rourke SB, Kroch A, et al. Lost but not forgotten: a population-based study of mortality and care trajectories among people living with HIV who are lost to follow-up in Ontario, Canada. HIV Med. 2019;20(2):88–98.

European Citizen Science Platform. https://eu-citizen.science. Accessed 07 May 2024.

Bonn A, Brink W, Hecker S, Herrmann TM, Liedtke C, Premke-Kraus M, Voigt-Heucke S, et al. White Paper Citizen Science Strategy 2030 for Germany. 2022. https://www.mitforschen.org/sites/default/files/grid/2024/07/24/White_Paper_Citizen_Science_Strategy_2030_for_Germany.pdf

mit:forschen!. Gemeinsam Wissen schaffen (ehemals Bürger schaffen Wissen). www.buergerschaffenwissen.de. Accessed 07 May 2024.

Zooniverse. People-powered research. www.zooniverse.org. Accessed 07 May 2024.

European Commission. Marie Skłodowska-Curie Actions. https://marie-sklodowska-curie-actions.ec.europa.eu/news/marie-sklodowska-curie-actions-funds-44-projects-to-bring-research-closer-to-education-and-society-across-europe . Accessed 08 May 2024.

Schütt AM-F, Weschke E. Sarah. Aktive Beteiligung Von Patientinnen Und Patienten in Der Gesundheitsforschung. Eine Heranführung für (klinisch) Forschende. Bonn/Berlin: DLR Projektträger; 2023.

NFDI4Health. Nationale Forschungsdateninfrastruktur für personenbezogene Gesundheitsdaten. https://www.nfdi4health.de . Accessed 07 May 2024.

Den Broeder L, Devilee J, Van Oers H, Schuit AJ, Wagemakers A. Citizen Science for public health. Health Promot Int. 2018;33(3):505–14.

Marks L, Laird Y, Trevena H, Smith BJ, Rowbotham S. A scoping review of Citizen Science Approaches in Chronic Disease Prevention. Front Public Health. 2022;10:743348.

Bach M, Jordan S, Hartung S, Santos-Hovener C, Wright MT. Participatory epidemiology: the contribution of participatory research to epidemiology. Emerg Themes Epidemiol. 2017;14:2.

Holle R, Happich M, Lowel H, Wichmann HE, Group MKS. KORA–a research platform for population based health research. Gesundheitswesen. 2005;67(Suppl 1):S19–25.

Freiwilligen-Zentrum Augsburg. https://www.freiwilligen-zentrum-augsburg.de/ . Accessed 07 May 2024.

Rooney JP, Rakete S, Heier M, Linkohr B, Schwettmann L, Peters A. Blood lead levels in 2018/2019 compared to 1987/1988 in the German population-based KORA study. Environ Res. 2022;215(Pt 1):114184.

Starkbaum J, Gottweis H, Gottweis U, Kleiser C, Linseisen J, Meisinger C, et al. Public perceptions of cohort studies and biobanks in Germany. Biopreserv Biobank. 2014;12(2):121–30.

Costas L, Bayas JM, Serrano B, Lafuente S, Muñoz MA. Motivations for participating in a clinical trial on an avian influenza vaccine. Trials. 2012;13:28.

Richter G, Krawczak M, Lieb W, Wolff L, Schreiber S, Buyx A. Broad consent for health care-embedded biobanking: understanding and reasons to donate in a large patient sample. Genet Med. 2018;20(1):76–82.

Akmatov MK, Jentsch L, Riese P, May M, Ahmed MW, Werner D, et al. Motivations for (non)participation in population-based health studies among the elderly - comparison of participants and nonparticipants of a prospective study on influenza vaccination. BMC Med Res Methodol. 2017;17(1):18.

Gabel M, Bollinger RM, Coble DW, Grill JD, Edwards DF, Lingler JH, et al. Retaining participants in Longitudinal studies of Alzheimer’s Disease. J Alzheimers Dis. 2022;87(2):945–55.

Bongartz H, Rübsamen N, Raupach-Rosin H, Akmatov MK, Mikolajczyk RT. Why do people participate in health-related studies? Int J Public Health. 2017;62(9):1059–62.

Tommel J, Kenis D, Lambrechts N, Brohet RM, Swysen J, Mollen L et al. Personal Genomes in Practice: Exploring Citizen and Healthcare Professionals’ Perspectives on Personalized Genomic Medicine and Personal Health Data Spaces Using a Mixed-Methods Design. Genes (Basel). 2023;14(4).

Slegers C, Zion D, Glass D, Kelsall H, Fritschi L, Brown N, et al. Why do people participate in epidemiological research? J Bioeth Inq. 2015;12(2):227–37.

Richter G, Borzikowsky C, Lesch W, Semler SC, Bunnik EM, Buyx A, et al. Secondary research use of personal medical data: attitudes from patient and population surveys in the Netherlands and Germany. Eur J Hum Genet. 2021;29(3):495–502.

Peters A, German National Cohort C, Peters A, Greiser KH, Gottlicher S, Ahrens W, et al. Framework and baseline examination of the German National Cohort (NAKO). Eur J Epidemiol. 2022;37(10):1107–24.

Laird Y, Marks L, Smith BJ, Walker P, Garvey K, Jose K et al. Harnessing citizen science in health promotion: perspectives of policy and practice stakeholders in Australia. Health Promot Int. 2023;38(5).

Acknowledgements

We thank all participants of the discussion groups and the questionnaire survey for their contributions, the staff for data collection and research data management, and the members of the KORA Study Group (https://www.helmholtz-munich.de/en/epi/cohort/kora) who are responsible for the design and conduct of the KORA study.

The KORA study was initiated and financed by the Helmholtz Zentrum München – German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education and Research (BMBF) and by the State of Bavaria. Data collection in the KORA study is done in cooperation with the University Hospital of Augsburg. The project was supported by the NFDI4Health (National Research Data Infrastructure for Personal Health Data) citizen-science 2023 initiative to support participatory research ( https://www.nfdi4health.de/community/citizen-science.html ).

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and affiliations.

Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Munich, Germany

Ina-Maria Rückert-Eheberg, Margit Heier, Markus Simon, Monika Kraus, Annette Peters & Birgit Linkohr

KORA Study Centre, University Hospital of Augsburg, Augsburg, Germany

Margit Heier

German Centre for Cardiovascular Research (DZHK e.V.), Munich Heart Alliance, Munich, Germany

Monika Kraus, Annette Peters & Birgit Linkohr

Institute for Medical Information Processing, Biometry and Epidemiology (IBE), Faculty of Medicine, Ludwig-Maximilians-Universität München, Munich, Germany

Annette Peters

Partner Site München-Neuherberg, German Center for Diabetes Research (DZD), Munich-Neuherberg, Germany

Contributions

IMRE contributed to the conception, design, and conduct of the study, analyzed and interpreted the data, and drafted the manuscript. MH contributed to the conception, design, and conduct of the study, interpreted the data, and revised the manuscript. MS contributed to the design and conduct of the study, interpreted the data, and revised the manuscript. MK contributed to the design and conduct of the study, interpreted the data, and revised the manuscript. AP contributed to the conception and design of the study, interpreted the data, and revised the manuscript. BL contributed to the conception, design, and conduct of the study, interpreted the data, and drafted the manuscript. All authors read and approved the final manuscript. They agree to be accountable for their own contributions and that questions that may arise on the accuracy or integrity of the work will be appropriately investigated, resolved, and documented.

Corresponding author

Correspondence to Ina-Maria Rückert-Eheberg .

Ethics declarations

Consent for publication.

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Rückert-Eheberg, IM., Heier, M., Simon, M. et al. Public attitudes towards personal health data sharing in long-term epidemiological research: a Citizen Science approach in the KORA study. BMC Public Health 24, 2317 (2024). https://doi.org/10.1186/s12889-024-19730-0

Received: 17 May 2024

Accepted: 08 August 2024

Published: 26 August 2024

DOI: https://doi.org/10.1186/s12889-024-19730-0

  • Citizen science
  • Participatory research
  • Public engagement
  • Health data sharing
  • Epidemiological cohort management
  • Co-creation

BMC Public Health

ISSN: 1471-2458

Suicide rates among physicians compared with the general population in studies from 20 countries: gender stratified systematic review and meta-analysis

  • 1 Department of Epidemiology, Center for Public Health, Medical University of Vienna, Vienna, Austria
  • 2 Department of Emergency Medicine, Vienna General Hospital, Medical University of Vienna, Vienna, Austria
  • 3 Department of Social and Preventive Medicine, Center for Public Health, Medical University of Vienna, Vienna, Austria
  • 4 Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
  • Correspondence to: E Schernhammer eva.schernhammer{at}muv.ac.at ( @EvaSchernhammer on X)
  • Accepted 10 June 2024

Objectives To estimate age standardised suicide rate ratios in male and female physicians compared with the general population, and to examine heterogeneity across study results.

Design Systematic review and meta-analysis.

Data sources Studies published between 1960 and 31 March 2024 were retrieved from Embase, Medline, and PsycINFO. There were no language restrictions. Forward and backwards reference screening was performed for selected studies using Google Scholar.

Eligibility criteria for selecting studies Observational studies with directly or indirectly age standardised mortality ratios for physician deaths by suicide, or suicide rates per 100 000 person years of physicians and a reference group similar to the general population, or extractable data on physician deaths by suicide suitable for the calculation of ratios. Two independent reviewers extracted data and assessed the risk of bias using an adapted version of the Joanna Briggs Institute checklist for prevalence studies. Mean effect estimates for male and female physicians were calculated based on random effects models, with subgroup analyses for geographical region and a secondary analysis of deaths by suicide in physicians compared with other professions.

Results Among 39 included studies, 38 studies for male physicians and 26 for female physicians were eligible for analyses, with a total of 3303 suicides in male physicians and 587 in female physicians (observation periods 1935-2020 and 1960-2020, respectively). Across all studies, the suicide rate ratio for male physicians was 1.05 (95% confidence interval 0.90 to 1.22). For female physicians, the rate ratio was significantly higher at 1.76 (1.40 to 2.21). Heterogeneity was high for both analyses. Meta-regression revealed a significant effect of the midpoint of study observation period, indicating decreasing effect sizes over time. The suicide rate ratio for male physicians compared with other professions was 1.81 (1.55 to 2.12).

Conclusion Standardised suicide rate ratios for male and female physicians decreased over time. However, the rates remained increased for female physicians. The findings of this meta-analysis are limited by a scarcity of studies from regions outside of Europe, the United States, and Australasia. These results call for continued efforts in research and prevention of physician deaths by suicide, particularly among female physicians and at risk subgroups.

Systematic review registration PROSPERO CRD42019118956.

Introduction

In 2019, suicide caused over 700 000 deaths globally, which was more than one in every 100 deaths that year (1.3%). While the worldwide age standardised suicide rate was estimated at 9.0 per 100 000 population, there was great variation between individual countries (from <2 to >80 suicide deaths per 100 000). 1 The overall global decline in suicide rates by 36% since 2000 is not a universal trend because some countries like the United States or Brazil saw an increase of roughly the same magnitude. 1 2 Among many other social and environmental factors, occupation has been shown to influence suicide risk beyond established risk factors such as low socioeconomic status or educational attainment. 3 4 5 6 7

Physicians are one of several occupational groups linked to a higher risk of death by suicide, and the medical community has a longstanding and often conflicted history in addressing this issue. 8 A JAMA editorial from 1903 reviewed annual suicide numbers for US physicians and concluded that their suicide risk is higher compared with the general population. 9 A substantial amount of evidence has been accumulated globally in the 120 years since then, providing more insight on the topic and the challenges involved in its assessment. Most earlier research reported higher suicide rates for male and female physicians compared with the general population, and the mean effect estimates from the first meta-analysis in 2004 indicated a significantly increased standardised mortality ratio (SMR) of 1.41 for male physicians and 2.27 for female physicians. 10 This meta-analysis included 22 studies on suicide in physicians with observation periods between 1910 and 1998 and revealed some heterogeneity among study results, which was partly explained by the decline in risk over time. Similarly, another meta-analysis that included nine studies with observation periods between 1980 and 2015 reported a significantly decreased SMR of 0.68 for male physicians and a significantly increased SMR of 1.46 for female physicians. 11

In addition to publication year, several other factors could potentially drive heterogeneity between the published studies. Methodological differences in study design, outcome measures, and level of age standardisation could explain heterogeneity between studies. Furthermore, individual countries and world regions have varying levels of stigma about suicide in general and among physicians in particular, associated with different risks of underreporting, access to support systems, and generally different training and working conditions.

In this study, we aimed to perform an appraisal of the currently available evidence on suicide deaths in male and female physicians compared with the general population. We also aimed to explore heterogeneity by considering a broader spectrum of potential covariates. We hypothesise that suicide rate ratios for male and female physicians have declined over time, but gender differences persist and suicide risk remains increased for female physicians.

Search strategy and study selection

This meta-analysis was conducted based on recommendations of the Cochrane Collaboration, 12 and is reported in accordance with the preferred reporting items for systematic review and meta-analyses (PRISMA) statement. 13 We searched for observational studies with data on suicide rates in physicians compared with the general population or similar using Medline, PsycINFO, and Embase. “Physician,” “mortality,” and “suicide” were entered as MeSH terms and text words and then connected through Boolean operators. The specific search strategy was developed and adapted for each database with the support of librarians from the Medical University of Vienna (supplement table S1). Following Schernhammer and Colditz, 10 we limited the search period to articles published after 1960 but updated it through to 31 March 2024. No constraints were placed on the language in which the reports were written, the region where study participants lived, or their age group. Articles published in languages other than English or German were screened with the help of the translation software DeepL 14 and colleagues fluent in these languages. Screening of the literature was done independently by two reviewers (CZ and SS). We also performed forward and backwards reference screening for the included articles and searched for unpublished data from sources and databases listed in included articles, such as the US National Institute for Occupational Safety and Health, the UK Office for National Statistics, Switzerland’s Federal Statistical Office, and Statistics Denmark.

We excluded studies that reported only on specific suicide methods in physicians, non-fatal suicidal behaviour or thoughts, mental health and burnout, and suicide prevention. We also excluded conference abstracts, editorials, case studies, and letters. Only reports with adequate data about physician deaths by suicide (not attempts) were eligible.

At the full text screening stage, we decided to only include rate based outcome measures that compare the suicide mortality in a physician population with the suicide mortality in a reference population. This includes the indirectly standardised mortality ratio (SMR), directly standardised rate ratio (SRR), and the comparative mortality figure. Even though their formulas and recommended uses differ and might yield slightly different results when calculated for the exact same population, 15 it can be argued that they are comparable estimates for the purpose of meta-analysing suicide deaths in physicians compared with a reference population. We also included rate ratios, even though their level of age standardisation is typically less detailed and only comprises one age group (with lower or upper age cutoff points). However, the proportionate mortality ratio expresses a different concept (the cause specific SMR divided by the all cause SMR, or the rate of suicides in all physician deaths divided by the rate of suicides in all population deaths). This outcome measure is not suitable for calculation of combined estimates with SMRs, especially in target populations with higher general life expectancy like physicians, 16 and was therefore not included. We also excluded studies that reported odds ratios and relative risk calculations because these are not based on rates.
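The difference between an SMR and the proportionate mortality ratio can be made concrete with a small worked example. The sketch below applies the definition given above (cause specific SMR divided by all cause SMR); the counts are hypothetical and the function names are illustrative, not taken from the study.

```python
def smr(observed: float, expected: float) -> float:
    """Indirectly standardised mortality ratio: observed / expected deaths."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


def proportionate_mortality_ratio(cause_smr: float, all_cause_smr: float) -> float:
    """PMR as described in the text: cause specific SMR / all cause SMR."""
    if all_cause_smr <= 0:
        raise ValueError("all cause SMR must be positive")
    return cause_smr / all_cause_smr


# Hypothetical cohort: suicides occur exactly as often as expected,
# but overall mortality is half of what the reference population predicts.
suicide_smr = smr(observed=20, expected=20)
all_cause_smr = smr(observed=500, expected=1000)
pmr = proportionate_mortality_ratio(suicide_smr, all_cause_smr)

print(suicide_smr, all_cause_smr, pmr)
```

In this hypothetical cohort the suicide SMR shows no excess, yet the proportionate mortality ratio is doubled purely because all cause mortality is low, which is the situation the text describes for populations with higher general life expectancy.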

We avoided overlapping time periods of the same geographical regions among included studies so that any physician death by suicide would only be counted once towards the pooled result. In case of overlaps, only one study was included, and the decision of which to include was based on three criteria in sequential order: sample size (higher number of observed suicides); risk of bias (lower risk of bias based on the Joanna Briggs Institute (JBI) checklist for prevalence studies); and recentness (more recent midpoint of observation period). We also excluded studies that only reported overall (and not gender stratified) suicide ratios, only covered physician subgroups (eg, medical specialties), or did not meet minimum requirements for sample size (ie, an expected number of one suicide). When necessary information for inclusion was missing from eligible studies or the source of data was unclear, we contacted the authors. We excluded studies if the necessary information could not be obtained. A detailed list of excluded references including reason for exclusion can be found in the supplement (table S2).
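The three sequential criteria for choosing among overlapping studies can be read as a lexicographic comparison. The following is a minimal sketch under stated assumptions: the `Study` record, its field names, and the reduction of risk of bias to a single true/false flag are hypothetical simplifications, not the reviewers' actual procedure.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Study:
    label: str                # hypothetical identifier
    observed_suicides: int    # criterion 1: larger sample preferred
    low_risk_of_bias: bool    # criterion 2: low risk preferred (simplified to a flag)
    midpoint_year: float      # criterion 3: more recent observation midpoint preferred


def pick_among_overlapping(studies: Sequence[Study]) -> Study:
    """Return the single study to keep from a set with overlapping period and region.

    Tuples compare element by element, so a later criterion only matters
    when all earlier criteria are tied.
    """
    if not studies:
        raise ValueError("need at least one study")
    return max(
        studies,
        key=lambda s: (s.observed_suicides, s.low_risk_of_bias, s.midpoint_year),
    )


# Hypothetical overlapping studies from the same region
a = Study("A", observed_suicides=40, low_risk_of_bias=False, midpoint_year=1990)
b = Study("B", observed_suicides=40, low_risk_of_bias=True, midpoint_year=1985)
c = Study("C", observed_suicides=25, low_risk_of_bias=True, midpoint_year=2005)
kept = pick_among_overlapping([a, b, c])
print(kept.label)
```

Here A and B tie on sample size, so risk of bias decides between them; C is more recent but never reaches that comparison because it has fewer observed suicides.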

Data extraction and risk of bias

Data extraction was conducted by two reviewers (CZ and SS) using a standardised table in Microsoft Excel. If studies did not include an SMR, but reported the numbers of observed (O) and expected (E) suicides or the necessary information to calculate them, the SMR was calculated by the reviewers (SMR=O/E). If the studies did not include an SRR or rate ratio, but reported (age standardised) suicide rates per 100 000 person years for physicians (R1) and a suitable reference population (R2) for a similar time period, the SRR or rate ratio was calculated (SRR=R1/R2, rate ratio=R1/R2). For one study, R1 and R2 were estimated from graphs. 17 Because not all studies reported confidence limits and the ones that did used different methods, we calculated 95% confidence intervals (CIs) for all studies based on Fisher’s exact test using observed and expected suicide numbers. For SRRs or rate ratios, we calculated the expected suicides by treating the SRR as an SMR (E=O/SRR). Standard errors were derived from the calculated 95% CIs by using the formula recommended for ratios in the Cochrane handbook (standard error=(ln upper CI limit – ln lower CI limit)/3.92). 12
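The calculations described above are simple ratios and one logarithmic transformation. The sketch below restates them in Python using the formulas from the text (SMR=O/E, SRR=R1/R2, E=O/SRR, and the Cochrane formula for the standard error); the function names and all input numbers are hypothetical, and the Fisher exact confidence limits themselves are not reproduced here.

```python
import math


def smr_from_counts(observed: float, expected: float) -> float:
    """SMR = O / E."""
    return observed / expected


def ratio_from_rates(rate_physicians: float, rate_reference: float) -> float:
    """SRR or rate ratio = R1 / R2 (both rates per 100 000 person years)."""
    return rate_physicians / rate_reference


def expected_from_ratio(observed: float, ratio: float) -> float:
    """Treat an SRR like an SMR to back-calculate expected suicides: E = O / SRR."""
    return observed / ratio


def standard_error_from_ci(lower: float, upper: float) -> float:
    """Standard error on the log scale from a 95% CI of a ratio.

    3.92 is 2 x 1.96, the width of a 95% normal interval in standard errors.
    """
    return (math.log(upper) - math.log(lower)) / 3.92


# Hypothetical study reporting counts
print(smr_from_counts(observed=30, expected=20))

# Hypothetical study reporting only rates and the observed count
srr = ratio_from_rates(rate_physicians=30.0, rate_reference=20.0)
print(srr, expected_from_ratio(observed=30, ratio=srr))

# Hypothetical 95% CI of 0.5 to 2.0 for a ratio
print(standard_error_from_ci(lower=0.5, upper=2.0))
```

Working on the log scale keeps the interval symmetric around the log ratio, which is why the standard error is derived from the difference of the log limits rather than from the limits themselves.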

In addition to variables relating to the main outcome, we extracted data on the following study characteristics to be used in sensitivity analyses: geographical location, observation period, age range, level of age standardisation, suicide classification, study design, and reference group. We used duplicate extraction and checked the final extraction table for errors to ensure accuracy.

Because there was no suitable validated scale to assess the quality of observational studies on mortality ratios, we used the JBI checklist for prevalence studies 18 as a critical appraisal tool for risk of bias assessment. Out of nine questions on this checklist, three were deemed not applicable owing to the investigation of mortality rather than morbidity (see supplement table S3a). Two reviewers (CZ and SS) independently evaluated a subsample of the included studies and the JBI checklist was subsequently further specified to achieve clear criteria for risk of bias assessment (see supplement table S3b). The same two reviewers then independently evaluated all studies (supplement table S4a and S4b). Consistency in rating was high, and disagreements were resolved through discussion. If all applicable items of the JBI checklist were rated positive, a study was classified as having low risk of bias. If at least one item was rated negative or unclear, a study was classified as having moderate or high risk of bias.

Data analysis

We performed separate meta-analyses of suicide rate ratios for male and female physicians. Random effects models were chosen a priori owing to the assumption that the included studies represent a random sample of different yet comparable physician populations with some heterogeneity in effect size. 19 Random effects models were calculated based on the Hartung-Knapp method (also known as the Sidik-Jonkman method). 20 Cumulative meta-analyses were performed to examine changes in the overall mean effect estimate over time. Heterogeneity was assessed by Q tests, I², T², and prediction intervals.
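To show how a random effects estimate with a Hartung-Knapp standard error is assembled, here is a minimal sketch operating on log rate ratios. It is not the study's implementation: the text does not say which estimator of the between-study variance was used, so the DerSimonian-Laird moment estimator is swapped in as an assumption, and the function name and inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pool_random_effects(
    log_ratios: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Return (pooled log ratio, Hartung-Knapp SE, T2, Q).

    T2 is estimated with the DerSimonian-Laird moment estimator
    (an assumption; other estimators give different values).
    """
    k = len(log_ratios)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")

    variances = [se ** 2 for se in std_errors]

    # Fixed effect (inverse variance) step, needed for Cochran's Q
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ratios)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ratios))

    # DerSimonian-Laird estimate of the between-study variance, truncated at 0
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights and pooled estimate
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ratios)) / sum_w_re

    # Hartung-Knapp variance: weighted residual variance around the pooled mean
    hk_var = sum(
        wi * (yi - pooled) ** 2 for wi, yi in zip(w_re, log_ratios)
    ) / ((k - 1) * sum_w_re)

    return pooled, math.sqrt(hk_var), tau2, q


# Two hypothetical studies with log ratios 0 and 2, each with standard error 1
pooled, hk_se, tau2, q = pool_random_effects([0.0, 2.0], [1.0, 1.0])
print(math.exp(pooled), hk_se, tau2, q)
```

With the Hartung-Knapp approach, the confidence interval for the pooled ratio is then exp(pooled ± t × SE), where t is the 97.5th percentile of a t distribution with k−1 degrees of freedom rather than the normal value 1.96.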

Begg and Egger tests were conducted to evaluate the possibility of publication bias, which was also assessed by funnel plot and trim-and-fill analysis. We performed sensitivity analyses using meta-regression (for single covariates and adjusted for study observation period midpoint), including binary variables for several study characteristics (see supplement table S5a and S5b): risk of bias (low risk v moderate or high risk studies), study design (registry based studies v others), outcome measures (SMR v others), level of age standardisation (detailed with several age groups used v others), suicide classification (narrow international classification of diseases (ICD) definition without deaths of undetermined intent v others), age range (studies with a cutoff point around retirement age v others), and reference group (general population v similar). We also performed meta-regressions for length of observation period and number of suicides. Subgroup analysis was performed to assess geographical differences in two categorisations: World Health Organization world regions (with studies from the Americas, European Region, and Western Pacific Region for male and female physicians, only one study from the African Region for male physicians, and no studies from the South East Asian and Eastern Mediterranean Region) and most common study origin regions, reflecting the accumulation of reports from certain parts of the world (US, UK, Scandinavia, other European countries, rest of the world). We also used subgroups to calculate mean effect estimates in older and more recent studies. Two groups were formed based on the midpoint of study observation period, with one subgroup consisting of the 10 most recent studies, and another subgroup with the remaining studies. To accommodate for multiple testing, we adapted the level of significance to P<0.01 for all sensitivity analyses.
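A meta-regression with a single covariate, such as the midpoint of the observation period, is in essence a weighted regression of the log rate ratio on that covariate. The sketch below is a simplified weighted least squares fit under stated assumptions: the weights are taken as given (a full random effects meta-regression would also estimate the residual between-study variance), and the data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_linear_fit(
    x: Sequence[float], y: Sequence[float], weights: Sequence[float]
) -> Tuple[float, float]:
    """Weighted least squares fit of y = intercept + slope * x.

    Returns (slope, intercept). In a meta-regression, y would hold log rate
    ratios, x the covariate, and weights the inverse variances.
    """
    if not (len(x) == len(y) == len(weights)) or len(x) < 2:
        raise ValueError("need at least two studies with matching inputs")
    sum_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sum_w
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sum_w
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    if sxx == 0:
        raise ValueError("covariate has no variation")
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical studies: log rate ratio falls by 0.015 per calendar year
midpoints = [1950.0, 1970.0, 1990.0, 2010.0]
log_ratios = [0.5, 0.2, -0.1, -0.4]
weights = [10.0, 25.0, 40.0, 30.0]  # hypothetical inverse variances

slope, intercept = weighted_linear_fit(midpoints, log_ratios, weights)
per_decade = math.exp(10 * slope)  # multiplicative change in the ratio per 10 years
print(slope, per_decade)
```

Because the model is fitted on the log scale, a slope is read multiplicatively: exponentiating ten times the slope gives the factor by which the rate ratio changes per decade.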

We conducted a secondary meta-analysis on suicide rates in physicians compared with another reference group that was more similar than the general population in terms of socioeconomic status. Studies were included if they provided data on deaths by suicide in physicians as well as a group of other professions with similar socioeconomic status (all other eligibility criteria remained the same).

All analyses were performed with Stata (version 17). This study was registered at the International Prospective Register of Ongoing Systematic Reviews (PROSPERO) under CRD42019118956.

Patient and public involvement

Several authors of this paper have trained and worked as physicians, and lived through the loss of colleagues to suicide. Their firsthand experiences offered valuable insights similar to those typically provided by patients. Because of the highly methodical nature of a systematic review and meta-analysis, it was difficult to involve members of the public in most areas of the study design and execution. However, patient and public involvement representatives reviewed the manuscript after submission and offered suggestions on language, dissemination, and general improvements to increase its relevance to those affected by physician deaths by suicide.

Included studies

The initial literature search yielded 23 458 studies. After removing duplicates and screening titles and abstracts, we were left with 786 articles. Application of the inclusion criteria resulted in 75 reports and we found a further 22 potentially eligible studies through reference list and registry based searches. Full text screening resulted in 38 studies for male physicians and 26 for female physicians that were eligible for analyses ( fig 1 ). Because a few studies provided more than one effect estimate, 21 22 a total of 42 datasets (male physicians) and 27 datasets (female physicians) were used for meta-analysis ( table 1 and table 2 ).

Fig 1

Flowchart showing study selection

Characteristics of included studies on male physicians

Characteristics of included studies on female physicians

Meta-analyses

The meta-analysis on suicide deaths in male physicians (fig 2) produced a mean effect estimate of 1.05 (95% CI 0.90 to 1.22). The Q test was highly significant (Q=460.2, df=41, P<0.001), and the I² of 94% indicated that a high proportion of the variance in observed effects was caused by heterogeneity in true effects rather than by sampling error. The variance of the true effect size, estimated with T², was 0.216, and the standard deviation T was 0.465. The resulting prediction interval ranged from 0.41 to 2.72, which indicates that in 95% of all comparable future studies in male physician populations, the true effect size will fall in this interval. This finding reflects a high level of dispersion, suggesting that the suicide rates are decreased in some male physician populations but increased in others compared with the general population. Meta-regression confirmed calendar time (measured by midpoint of study observation period) as a highly significant covariate (β=−0.015, P<0.001), with an adjusted R² indicating an explained proportion of 52% of between-study variance.
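The reported prediction interval can be approximately reproduced from the figures above. The following is a minimal sketch, not the authors' Stata procedure: it assumes the interval is computed on the log scale as the pooled log ratio plus or minus a Student's t critical value (k − 2 degrees of freedom) times the square root of T² plus the squared standard error, and it recovers the standard error from the rounded 95% CI. The function name and the hard-coded critical values are illustrative assumptions.

```python
import math

def prediction_interval(pooled_rr, ci_low, ci_high, tau2, t_crit):
    """Approximate 95% prediction interval for a random-effects pooled ratio.

    Works on the log scale: the standard error of the pooled log ratio is
    recovered from the reported 95% CI, then combined with the
    between-study variance tau2 (T squared in the text).
    """
    log_rr = math.log(pooled_rr)
    # A 95% CI on the log scale spans 2 * 1.96 standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.959964)
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return math.exp(log_rr - half_width), math.exp(log_rr + half_width)

# Male physicians: 42 datasets, so k - 2 = 40 degrees of freedom (assumed).
# 2.021 is the 97.5th percentile of Student's t with 40 df, hard-coded here.
low, high = prediction_interval(1.05, 0.90, 1.22, tau2=0.216, t_crit=2.021)
print(f"male physicians: {low:.2f} to {high:.2f}")
```

Because the inputs are rounded published values, small discrepancies in the second decimal place are to be expected.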

Fig 2

Forest plot of suicide rate ratios for male physicians compared with general population

The mean effect estimate for suicide deaths in female physicians (fig 3) was 1.76 (95% CI 1.40 to 2.21). The Q test for heterogeneity was highly significant (Q=143.2, df=26, P<0.001), and the I² of 84% indicated a high proportion of variance caused by heterogeneity in true effects, with T² estimated at 0.278 and T at 0.523. The prediction interval ranged from 0.58 to 5.35, so the dispersion of the true effect size across studies on female physicians was also substantial, ranging from decreased suicide rates in some female physician populations to considerably increased rates in others. The midpoint of study observation period also showed a highly significant association with the pooled estimate in a meta-regression (β=−0.024, P<0.001), explaining 87% of between-study variance.
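To give the meta-regression slopes a more tangible scale, the sketch below converts them into a multiplicative change per decade. It assumes that β is expressed on the natural log scale of the suicide rate ratio per calendar year of observation midpoint, which the text does not state explicitly; the function name is illustrative.

```python
import math

def ratio_change(beta_per_year, years):
    """Multiplicative change in a rate ratio implied by a log-linear slope."""
    return math.exp(beta_per_year * years)

# Slopes reported for the two main meta-regressions.
for label, beta in [("male", -0.015), ("female", -0.024)]:
    per_decade = ratio_change(beta, 10)
    print(f"{label} physicians: rate ratio multiplied by {per_decade:.2f} per decade")
```

Under that assumption, the slopes would correspond to the pooled rate ratio falling by roughly 14% per decade for male physicians and roughly 21% per decade for female physicians.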

Fig 3

Forest plot of suicide rate ratios for female physicians compared with general population

A decrease in suicide rate ratios over time is shown by cumulative meta-analyses (supplement figure S1a and S1b). A decline in pooled estimates is observed for female physicians throughout all studies, and a decline for studies with midpoints of observation period after 1985 can be seen for male physicians.

Further analyses

We performed sensitivity analyses across all studies using meta-regression. We did not observe any significant (P<0.01) results for male or female physicians, for study design, outcome measures, level of age standardisation, suicide classification, age range, reference group, length of observation period, and number of suicides. We found a significant association between risk of bias and effect size for male (β=−0.475, P=0.001) and female (β=−0.601, P=0.003) physicians, but when adjusting for midpoint of observation period, this association was no longer significant.

The Egger and Begg tests gave no evidence of publication bias for studies on male or female physicians. The funnel plots showed no asymmetry, although they did reflect the high heterogeneity between studies (figure S2a and S2b). The non-parametric trim-and-fill analyses imputed no studies for male or female physicians, so the effect size for observed plus imputed studies was identical to that for observed studies alone.

We also performed subgroup analyses based on geographical study location in two different categorisations: WHO world regions and most common study origin regions. With both analyses, the decrease in effect sizes over time was visible in most subgroups, and lower effect sizes were observed especially in studies from Asian countries (supplement figures S3a, S3b, S4a, and S4b). This finding translates to a lower pooled suicide rate ratio for male physicians in the Western Pacific Region (0.61, 95% CI 0.35 to 1.04) and, similarly, in studies outside of Europe and the US (0.69, 0.45 to 1.06). This pattern was not observed for female physicians, although the suicide rate ratio for the Western Pacific Region (1.06, 0.34 to 3.32) was also the lowest of all subgroups.

Given that calendar time has been shown to have a strong association with effect size, we also performed a subgroup analysis of the 10 most recent studies versus all older studies. For male physicians (supplement figure S5a), the mean effect estimate in the subgroup of 32 older datasets was increased at 1.17 (0.96 to 1.41), whereas in the subgroup of the 10 most recent studies it was significantly decreased at 0.78 (0.70 to 0.88). For female physicians (supplement figure S5b), the mean suicide rate ratio in the subgroup of 17 older studies was significantly increased at 2.21 (1.63 to 3.01). In the subgroup of the 10 most recent studies, the mean effect was still significantly increased at a lower level of 1.24 (1.00 to 1.55).

Secondary meta-analysis

We conducted another meta-analysis on suicide rates in physicians compared with other professions of similar socioeconomic status and identified eight studies that compared male physicians with a reference group of other academics, other professionals, other health professionals, or members of social class I (supplement figure S6 and table S6). The pooled effect estimate was significantly increased at 1.81 (95% CI 1.55 to 2.12). The Q test (Q=17.6, df=7, P=0.01) was significant, but the I² of 58% and the prediction interval of 1.15 to 2.87 indicated a lower level of heterogeneity compared with the main analysis, and a more similar effect size across studies. We found five studies on female physicians (supplement table S6). The results of these studies appeared similar to those for male physicians, but we deemed the number of eligible studies too low for a random effects meta-analysis. 62

In this meta-analysis summarising the available evidence on physician deaths by suicide, we found the rate ratio for female physicians to be significantly raised, but not for male physicians. This result confirmed our hypothesis that mean effect estimates would be lower than in a previous meta-analysis on the subject published in 2004. 10 Calendar time was identified as a significant covariate in both analyses, indicating decreasing suicide rate ratios for physicians over time. The high level of heterogeneity in results from different studies suggests that suicide risk for male and female physicians is not consistent across various physician populations. Therefore, the pooled effect estimate is only of limited use in describing the overall suicide risk for physicians compared with the general population. In a secondary meta-analysis, the suicide rate ratio of male physicians was shown to be significantly raised when other professional groups with similar socioeconomic status were used as a reference group, with less heterogeneity across study results.

Strengths and limitations of this study

We did not impose any language restrictions on our search strategy so that relevant studies from different geographical regions were found. Consequently, we were able to include a large number of studies from 20 countries providing overall and recent summary estimates based on a complete assessment of the available evidence. This study also explored a range of covariates as potential causes for heterogeneity.

Several weaknesses should also be mentioned. Underreporting of suicide deaths might be more common for physicians compared with the general population, 8 influencing ratios between those two populations in the original studies. Despite the large number of included reports, several geographical regions are still underrepresented in the available evidence, which limits the generalisability of findings.

Comparison with other studies

A systematic review on physician deaths by suicide included a meta-analysis of studies with observation periods between 1980 and 2015, 11 but found only nine eligible studies (a third of which were already included in the first meta-analysis by Schernhammer and Colditz 10 ). This analysis was also subject to some methodological limitations, such as using a potentially arbitrary starting point for study observation periods and not accounting for overlap between included studies (therefore counting some physician deaths by suicide twice). Another systematic review and meta-analysis on physician and healthcare worker deaths by suicide included only one new study compared with Schernhammer and Colditz 10 and so did not provide an updated estimate. 63 Additionally, this analysis included a large US study that reported increased proportionate mortality ratios, impacting the pooled estimate for male physicians towards showing an effect.

Meaning of the study

The results of this study suggest that across different physician populations, the suicide risk is decreasing compared with the general population, although it remains raised for female physicians. The causes of this decline are unknown, but several factors might play a part. The critical appraisal of the included studies indicated better study quality among more recent studies, which might have contributed to the decrease in effect sizes over time. Meta-regression results by Duarte and colleagues suggested that the decrease in suicide risk in male physicians was driven by a reduction in the rate of suicide deaths in physicians rather than an increase in suicide deaths in the population. 11 This finding could mean that physicians have benefitted more from general or targeted suicide prevention efforts compared with the general population, which is testament to the repeated calls for more awareness and interventions to support the mental health of physicians. 64 65 Furthermore, the proportion of female physicians has increased over recent decades, and the average proportion of female physicians across all OECD (Organisation for Economic Co-operation and Development) countries reached 50% in 2021. 66 This change is likely to affect working conditions in a historically male dominated field that could be relevant to the mental health of workers. Some evidence exists that occupational gender composition affects the availability of workplace support and affective wellbeing, with higher support levels in mixed rather than male dominated occupations. 67 68

It is important to note, however, that considerable heterogeneity exists in the suicide risk of different physician populations that is still partly unexplained. Working as a physician is probably associated with different risk and protective factors across diverse healthcare systems, as well as training and work environments. Additionally, prevailing attitudes and stigma about mental health and suicide could vary. Societal influences on suicide rates over time might affect physicians differently compared with the general population (eg, mental health stigma might differ for physicians compared with the general population, and change at a different rate). Therefore, it seems plausible that the relation between suicide deaths in physicians compared with the general population differs between regions and countries.

Policy implications

Overall, this study highlights the ongoing need for suicide prevention measures among physicians. We found evidence for increased suicide rates in female physicians compared with the general population, and for male physicians compared with other professionals. Additionally, the decreasing trend in suicide risk in physicians is not a universal phenomenon. An Australian study found a substantial increase in suicide risk for female physicians, which doubled between 2001 and 2017. 58 The recent covid-19 pandemic has put additional strain on the mental health of physicians, potentially exacerbating risk factors for suicide such as depression and substance use. 69 70 Other important risk factors include suicidal ideation and attempted suicide, and their prevalence among physicians was estimated by a recent meta-analysis. The results suggest higher levels of suicidal ideation among physicians compared with the general population, whereas the prevalence of suicide attempts appeared to be lower. 71 This finding could indicate that suicidal intent in physicians is more likely to result in fatal rather than non-fatal suicidal behaviours. 72 A systematic review on mental illness in physicians concluded that a coordinated range of mental health initiatives needs to be implemented at the individual and organisational level to create workplaces that support their mental health. 73 Evidence exists for effective physician directed interventions, but hardly any research on organisational measures to address suicide risk in physicians. 74 Continued advances in organisational strategies for the mental wellbeing of physicians are essential to support individual medical institutions in their efforts to foster supportive environments, combat gender discrimination, and integrate mental health awareness into medical education and training.

Recommendations for future research

In addition to more primary studies from world regions other than Europe, the US, and Australia, future research also needs to systematically look into other factors beyond study characteristics that might explain the heterogeneity in suicide risk in physicians. Such research would help in identifying physicians who are at risk, with targeted prevention measures and ways to adapt them to different clinical and cultural contexts. Because geographical or national differences appear to be important factors, future studies on suicide risk in physicians should bear in mind that the specific settings of any physician population might influence their risk and resilience factors to a much higher degree than previously assumed. Other major events that affect healthcare, such as the covid-19 pandemic, could also have a large impact. Future research is needed to assess any covid-19 related effects on suicide rates in physicians around the world.

What is already known on this topic

Many studies reported increased suicide rates for physicians, and a 2004 meta-analysis found significantly increased suicide rates for male and female physicians compared with the general population

Evidence on increased suicide rates for physicians is inconsistent across countries

What this study adds

Suicide rate ratios for physicians appear to have decreased over time, but are still increased for female physicians

A high level of heterogeneity exists across studies, suggesting that suicide risk varies among different physician populations

Further research is needed to identify physician populations and subgroups at higher risk of suicide

Ethics statements

Ethical approval.

Not required.

Data availability statement

Additional data are available from the corresponding author upon request.

Acknowledgments

The authors are grateful for the support in developing the literature search strategy that was provided by the library staff at the Medical University of Vienna, and for the generous help with translations that was provided by a number of colleagues from within and outside of this institution. The authors also want to acknowledge the efforts undertaken by the Federal Statistics Office (Switzerland) and the Office for National Statistics (UK) to provide original data that were used in this analysis. Furthermore, the authors thank Eduardo Vega who reviewed the paper after submission as a member of the public, as well as Lena Hübl and Klaus Michael Fröhlich who provided their perspectives as physicians.

Contributors: CZ, SS, and ES conceived and designed the study, HH and TN contributed and advised on methodological aspects. CZ performed the literature search and was the first reviewer for article screening, data extraction, and risk of bias assessment. SS was the second reviewer for article screening, data extraction, and risk of bias assessment. CZ performed the statistical analyses and SS accessed and verified the underlying study data. CZ, SS, and ES interpreted the data. CZ drafted the manuscript and prepared tables and figures. All authors critically revised the manuscript for intellectual content and approved the final version. ES supervised the study. CZ is the study guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: This study was partially supported by the Vienna Anniversary Foundation for Higher Education (grant number H-303766/2019). The funder had no role in the study design, data collection, analysis, or interpretation, or in writing or submitting the report. The researchers were independent from the funder and all authors had full access to all of the data (including statistical reports and tables) and can take responsibility for the integrity of the data and the accuracy of the data analysis.

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: CZ received partial funding from the Vienna Anniversary Foundation for Higher Education for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

Transparency: The lead author (the manuscript’s guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned (and, if relevant, registered) have been explained.

Dissemination to participants and related patient and public communities: The authors plan to disseminate the study findings through conference presentations, talks, press releases, social media, and in mandatory courses on mental wellbeing for medical students. The results will also be forwarded to national and international organisations that the authors have had contact with, to be disseminated both within these organisations and through their communication channels. This includes organisations in the field of mental health, public health, suicide prevention, and professional associations (for physicians and medical students); examples include the American Foundation for Suicide Prevention, the International Association for Suicide Prevention and particularly its Special Interest Group on Suicide and the Workplace, the Canadian Medical Association, the Austrian Public Health Association, and the Austrian Medical Chamber. Discussions on how these findings might be used in local and national suicide prevention efforts in Austria will involve physicians, hospital administrators, mental and occupational health professionals, and interested members of the public who are affected by suicidality among physicians.

Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .

  • World Health Organization. Suicide worldwide in 2019: global health estimates. Published online 2021. Accessed 12 Sep 2023. https://www.who.int/publications-detail-redirect/9789240026643
  • Cochrane. Cochrane Handbook for Systematic Reviews of Interventions. Version 6.3. Published 2022. Accessed 19 Jul 2023. https://training.cochrane.org/handbook/current
  • Deep L. DeepL Translator. Accessed 19 Jul 2023. https://www.DeepL.com/translator
  • Windsor-Shellard B. Suicide by Occupation, England: 2011 to 2015. Office for National Statistics; 2017. Accessed 17 Sep 2023. https://www.ons.gov.uk/releases/suicidesbyoccupationengland2011to2015
  • Munn Z, Moola S, Lisy K, Riitano D, Tufanaru C. Chapter 5: Systematic reviews of prevalence and incidence. In: JBI Manual for Evidence Synthesis. 2020. Accessed 19 Jul 2023. https://jbi.global/critical-appraisal-tools
  • Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to Meta-Analysis. Wiley; 2021. https://www.google.at/books/edition/Introduction_to_Meta_Analysis/2oYmEAAAQBAJ
  • Office of Population Censuses and Surveys. Occupational Mortality 1979-80, 1982-83, Decennial Supplement, Part I+II. Her Majesty’s Stationery Office; 1986.
  • Dean G. The causes of death of South African doctors and dentists. Afr Med J . Published online 1969.
  • FSO (Federal Statistics Office) Switzerland. Data request for physician suicide data. https://www.bfs.admin.ch/bfs/en/home.html
  • FMH (Foederatio Medicorum Helveticorum). Online-tool for the physician statistics of the Swiss Medical Association (FMH), data for 2008-2020. https://aerztestatistik.fmh.ch/
  • ONS (Office for National Statistics) UK. Suicide by Occupation in England: 2011 to 2015 and 2016 to 2020. Published 2021. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/adhocs/13768suicidebyhealthcarerelatedoccupationsengland2011to2015and2016to2020registrations
  • Rothman K, Boyce J. Epidemiologic Analysis with a Programmable Calculator . Vol NIH Publication No. 79-1649. National Institutes of Health; 1979. Accessed 13 Sep 2023. https://hero.epa.gov/hero/index.cfm/reference/details/reference_id/3978444
  • OECD. Health at a Glance 2023: OECD Indicators. OECD Publ. Published online 2023. doi: 10.1787/7a7afb35-en
  • Frank E, Dingle AD. Self-reported depression and suicide attempts among U.S. women physicians. Am J Psychiatry. Published online 1999. Accessed 25 Jun 2018. https://ajp.psychiatryonline.org/doi/pdf/10.1176/ajp.156.12.1887

Eliminating Explicit and Implicit Biases in Health Care: Evidence and Research Needs

Monica B. Vela

1 Department of Medicine, Section of Academic Internal Medicine, University of Illinois College of Medicine in Chicago, Chicago, Illinois, USA

Amarachi I. Erondu

2 Department of Internal Medicine and Pediatrics, University of California, Los Angeles Medical Center, Los Angeles, California, USA

Nichole A. Smith

3 Department of Internal Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, USA

Monica E. Peek

4 Department of Medicine, Section of General Internal Medicine and Chicago Center for Diabetes Translation Research, University of Chicago, Chicago, Illinois, USA

James N. Woodruff

5 Pritzker School of Medicine, University of Chicago, Chicago, Illinois, USA

Marshall H. Chin

6 Department of Medicine and Chicago Center for Diabetes Translation Research, University of Chicago, Chicago, Illinois, USA

ABSTRACT

Health care providers hold negative explicit and implicit biases against marginalized groups of people such as racial and ethnic minoritized populations. These biases permeate the health care system and affect patients via patient–clinician communication, clinical decision making, and institutionalized practices. Addressing bias remains a fundamental professional responsibility of those accountable for the health and wellness of our populations. Current interventions include instruction on the existence and harmful role of bias in perpetuating health disparities, as well as skills training for the management of bias. These interventions can raise awareness of provider bias and engage health care providers in establishing egalitarian goals for care delivery, but these changes are not sustained, and the interventions have not demonstrated change in behavior in the clinical or learning environment. Unfortunately, the efficacy of these interventions may be hampered by health care providers’ work and learning environments, which are rife with discriminatory practices that sustain the very biases US health care professions are seeking to diminish. We offer a conceptual model demonstrating that provider-level implicit bias interventions should be accompanied by interventions that systemically change structures inside and outside the health care system if the country is to succeed in influencing biases and reducing health inequities.

1. INTRODUCTION

Although expressions of explicit bias have declined in the United States over time, implicit bias has remained unrelenting. Health care providers hold negative explicit and implicit biases against many marginalized groups of people, including racial and ethnic minoritized populations, disabled populations, and gender and sexual minorities, among others ( 29 , 63 ). Implicit bias permeates the health care system and affects patients via patient–clinician communication, clinical decision making, and institutionalized practices ( 78 ). Higher education systems, including medical schools and academic hospitals, have been affected by the discrimination and bias that have long permeated the health care delivery system ( 84 , 104 ). Bias in admissions and promotions processes, in classroom and bedside instruction, and by health care providers contributes to the constant messaging that stereotypes and isolates marginalized groups ( 80 , 102 , 105 ). These biases hinder improvement in compositional diversity of health care providers, long recognized as an important mechanism in reducing health care disparities ( 60 ). This complex system of discrimination and biases causes devastating health inequities that persist despite a growing understanding of the root causes and the health care system’s professional, ethical, and moral responsibility to address these inequities.

It has been theorized that implicit bias and structural racism mutually reinforce one another—ambient structural racism and its outcomes reinforce an individual’s psychological associations between racial identity and poorer outcomes (implicit bias) ( 20 , 21 ). Inequitable structural determinants have diminished housing, education, health care, and income and have increased exposure to environmental pollutants and chronic stressors for marginalized populations ( 76 , 108 ). Structural inequities and discrimination have created stereotypes of marginalized populations or communities and implicit and explicit biases toward them. Health care providers hold negative explicit and implicit biases against racialized minorities. A similar reinforcing dynamic may exist for marginalized populations such as those who are overweight/obese, use wheelchairs, have limited English proficiency, have mental health illness, and belong to lower socioeconomic classes ( 29 ). These biases can facilitate the creation and perpetuation of discriminatory systems and practices, creating a complex feedback loop that sustains itself.

Addressing bias remains a fundamental professional responsibility of health care and public health professionals accountable for population health and wellness ( 64 , 65 ). This article ( a ) provides an overview of existing evidence of bias among health professionals, health practitioners, and public health workers in the practice and training environments (and lay health workers as appropriate) and its impact on health disparities; ( b ) systematically reviews the extant literature for evidence and limitations of current interventions designed to reduce or manage biases; ( c ) explores the interaction between bias and structural elements of the health care system (including medical education); and ( d ) proposes a conceptual model that frames bias not as an independent factor in the generation of disparities but as one element of a reinforcing system of elements that perpetuates such disparities. Ultimately, we provide evidence that interventions designed to reduce or manage existing explicit and implicit biases in clinical settings and public health are insufficient and will continue to fall short in reducing health inequities if we do not concomitantly address the racism and discrimination ingrained in health, medical educational systems, and other societal structures.

2. BACKGROUND

2.1. Overview of Bias

Critical to an understanding of interventions that address explicit and implicit biases in health care is an understanding of key terminology, tools used to measure bias, and the evidence for and impact of these biases in health care.

2.1.1. Key terminology: What are implicit and explicit biases?

Implicit biases are unconscious mental processes that lead to associations and reactions that are automatic and without intention; actors have no awareness of the associations with a stimulus ( 41 , 43 ) ( Table 1 ). Axt et al. ( 4 ) maintain that social status is relational and people unconsciously hold more negative attitudes or feelings about membership of an outgroup (people with whom they do not share identities) than about membership of an ingroup (people with whom they share identities). A stereotype is a fixed set of attributes associated with a social group ( 49 ).

Table 1. Terminology of bias

| Term | Definition |
| --- | --- |
| Discrimination | Discrimination is “the result of either implicit or explicit biases and is the inequitable treatment and/or impact of general policies, practices, and norms on individuals and communities based on social group membership” ( , p. S5). |
| Ethnicity | Ethnicity is “a social system defining a group that shares a common ancestry, history or culture with some combination of shared geographic origins, family patterns, language, or cultural norms, religious traditions, or other cultural and social characteristics” ( , p. 325). |
| Explicit bias | Explicit forms of bias include “preferences, beliefs, and attitudes of which people are generally consciously aware, endorsed, and can be identified and communicated” ( , p. 1). |
| Hidden curriculum | “Lessons taught through socialization of learners especially as it pertains to professionalism, humanism, and accountability, as opposed to explicitly taught in the classroom or bedside” ( , p. 50). |
| Implicit bias | Implicit biases are “unconscious mental processes that lead to associations and reactions that are automatic and without intention and actors have no awareness of the associations with a stimulus. Implicit bias goes beyond stereotyping to include favorable or unfavorable evaluations toward groups of people.” While we are not aware these implicit biases exist, they have a significant impact on decision making ( , p. 14). |
| Institutional racism | Institutional racism (structural) “refers to the processes of racism that are embedded in laws (local, state and federal), policies, and practices of society and its institutions that provide advantages to racial groups deemed superior while differentially oppressing, disadvantaging or otherwise neglecting racial groups viewed as inferior” ( , p. 107). |
| Race | “Race is primarily a social category, based on nationality, ethnicity, phenotypic or other markers of social difference, which captures differential access to power and resources in society. It functions on many levels and socializes people to accept as true the inferiority of nondominant racial groups leading to negative normative beliefs (stereotypes) and attitudes (prejudice) toward stigmatized racial groups which undergird differential treatment of members of these groups by both individuals and social institutions” ( , p. 106). |
| Racism | “Racism is an organized social system in which the dominant racial group, based on an ideology of inferiority, categorizes and ranks people into social groups called ‘races’ and uses its power to devalue, disempower, and differentially allocate valued society resources and opportunities to groups defined as inferior... A characteristic of racism is that its structure and ideology can persist in governmental and institutional policies in the absence of individual actors who are explicitly racially prejudiced” ( , p. 106). |
| Role modeling | Role modeling is a mechanism for teaching behavior through learning by observation ( , p. 26). |
| Stereotype | A stereotype is “a fixed set of attributes associated with a social group” ( , p. 209). |
| Stereotype threat | Stereotype threat “occurs when cues in the environment make negative stereotypes associated with an individual’s group status salient, triggering physiological and psychological processes that have detrimental consequences for behavior” and performance of the individual who identifies as a member of the stereotyped group ( , p. S169). |

Implicit bias goes beyond stereotyping to include favorable or unfavorable evaluations toward groups of people ( Table 1 ). Although we are not aware these implicit biases exist, they have a significant impact on decision making ( 97 ).

A belief is explicit if consciously endorsed ( 43 ). Explicit forms of bias include preferences, beliefs, and attitudes of which people are generally consciously aware, personally endorse, and can identify and communicate ( 22 ). Discrimination, the result of either implicit or explicit biases, is the inequitable treatment and/or impact of general policies, practices, and norms on individuals and communities based on social group membership ( 65 , 76 ). Daumeyer et al. ( 22 ) argue that implicit biases must be exposed and discussed so that people and institutions can be held accountable for their effects. They argue for nuanced conversations about the ways in which implicit biases shape behavior and the ways to combat them.

2.1.2. Tools used to measure implicit bias: How good are these measures? Have they been used outside of medicine?

In 1998, Greenwald et al. ( 45 ) described a word association test that identified implicit stereotype effects through indirect reaction time measures even when subjects self-reported low measures of prejudice. Since then, the implicit association test (IAT) has consistently demonstrated implicit stereotyping for a range of different social categories, particularly gender and ethnicity ( Table 1 ). Greenwald et al. ( 42 ) maintain that statistically small effects of the IAT can have socially large effects. A meta-analysis by Greenwald et al. ( 45 ) demonstrated the predictive validity of the IAT regarding implicit stereotype associations to behavioral outcomes across a range of social subject areas. Some critics challenge whether the IAT measures implicit bias and predicts behavior, and question its utility in clinical and other real-world situations ( 3 , 69 ). Most researchers agree that the IAT has limitations ( 44 ). It does not have high test-retest reliability in the same individual, and it is not useful as a tool to label individuals as implicitly sexist or racist or to predict behavior ( 73 ). The IAT has been used in health professions education as a metric to demonstrate the efficacy of educational interventions meant to reduce implicit bias and as a tool to raise awareness of existing implicit bias among health care trainees and providers ( 101 ).

2.1.3. Implicit biases in health care: What is the evidence for racial bias among health care professionals? What is the impact of such bias in health care?

Implicit racial and ethnic bias exists among health care professionals in favor of White patients and against Black, Hispanic, and dark-skinned patients even when all other major factors (e.g., socioeconomic differences, insurance status) have been controlled and accounted for. Hall et al. ( 47 ) published a systematic literature review of 15 studies designed to explore the evidence of provider implicit racial bias and health outcomes. In the studies measuring prevalence, rates of anti-Black bias in health care providers ranged from 42% to 100%. These findings were redemonstrated in similar reviews conducted in 2017 ( 29 ) and 2018 ( 63 ).

Hoffman et al. ( 50 ) demonstrated in 2016 that White medical students and residents were more likely to believe that Black patients had thicker skin and smaller brains, and were more likely to rate Black patients as feeling less pain than White patients and as not needing the same levels of pain medication. Several studies have demonstrated that negative implicit biases held by those in the health professions are similar to those seen in the lay population ( 29 ).

The Medical Student Cognitive Habits and Growth Evaluation Study (CHANGES) has provided the greatest insight into the implicit and explicit biases held by medical students and trainees in the United States. This longitudinal multimeasure study followed a large sample of students attending a stratified random sample of 49 US allopathic medical schools and measured associations between possible interventions and levels of biases held by students. A web-based survey completed by more than 4,500 first-year medical students demonstrated that most students exhibited implicit (74%) and explicit (67%) weight bias. The study also demonstrated that scores of implicit weight bias were similar to scores of implicit bias against racial minorities (74%) in the same group of students ( 86 ). The size and scope of this study demonstrate undeniable evidence that implicit bias is pervasive among medical students, even in the first year of medical school. The multiple papers and findings generated by this foundational study were excluded from the final selection of studies in the results section because the study was observational and did not introduce interventions.

Biases affect health care delivery and public health outcomes, the health professions workplace and learning environments, and the diversity of trainees and workforce ( Table 2 ). Hall et al. ( 47 ) demonstrated that these implicit biases have negatively affected patient–provider interactions, treatment decisions, and patient adherence to treatment. The most consistent evidence is found in studies of patient–provider interactions in which the bias of health care providers has been repeatedly linked to discriminatory care ( 18 )—patients rate physicians with higher levels of implicit bias as less patient-centered in the primary care setting. Blanchard & Lurie ( 6 ) demonstrated that patients who perceived that they would have received better treatment if they were of a different race were significantly less likely to receive optimal chronic disease screening and more likely to not follow the doctor’s advice or to delay care. In a large study of adult primary care, higher implicit bias among health care providers was associated with patients’ lower ratings of interpersonal treatment, contextual knowledge, communication, and trust ( 5 ).

Table 2. Impacts of implicit bias

| Area | Impacts |
| --- | --- |
| Health care delivery | Patient-provider communication; Patient-provider relationships; Patient satisfaction; Patient perception of physician’s patient-centeredness; Patient treatment adherence; Provider decision making; Provider’s perspective of patient’s likelihood to adhere to treatment |
| Public health | Resource allocation (testing locations, vaccine distribution, location of environmental stressors) |
| Health professions workplace and learning environments | Promotions practices; Compensation; Evaluations; Awards and recognition; Research grants; Stress, isolation |
| Diversity of trainees and workforce | Recruitment and selection of future trainees; Inclusive learning environment |

Other studies have confirmed associations between provider bias (demonstrated via IAT testing) and disparate treatment of their patients ( 63 ). In a systematic literature review, six studies found that higher implicit bias among health care providers was associated with disparities in treatment recommendations, expectations of therapeutic bonds, pain management, and empathy ( 63 ). Seven studies that examined the impact of implicit provider bias on real-world patient–provider interaction found that health care providers with stronger implicit bias demonstrated poorer patient–provider communication and that health care providers with high implicit biases ( a ) provided lower rates of postoperative narcotic prescriptions for Black children than for White children ( 93 ), ( b ) had poorer bonding with Black patients than with White patients ( 55 ), and ( c ) made disparate recommendations for thrombolytic therapy for Black patients and White patients ( 40 ).

A study of 3,756 students at 49 US medical schools demonstrated that high scores of racism as measured by the three variables were significantly correlated with low scores of student intentions to work in underserved areas and to provide care to minority populations ( 74 ).

Implicit bias affects not only patients but also trainees and faculty within health care systems. A 2014 systematic literature review revealed that rates of harassment and discrimination against trainees (24% reported racial discrimination, 33% reported sexual harassment, and 54% reported gender discrimination) have remained unchanged over time ( 31 ). Minority trainees report facing daily bias and microaggressions and having feelings of isolation and substantial stress ( 74 ). Minority medical students reported five-times-higher odds of racial discrimination and isolation than did nonminority peers ( 26 ). Stereotype threat (defined in Table 1 ) is common, particularly among non-White students, interferes with learning, and adds to the cognitive load of minoritized students ( 9 ). Thus, bias in health professions training can affect the performance of racialized minorities. Early and small differences in assessed clinical performance, which may be affected by implicit biases, lead to larger differences in grades and selection for awards [e.g., Alpha Omega Alpha Honor Medical Society (AOA)], ultimately affecting career trajectories of racial minority candidates ( 102 ). For example, significant differences in negative descriptive words on medical students’ evaluations have been found across different racial and gender groups ( 91 ). Membership in AOA, conferred to only 16% of each graduating medical school class, has effectively barred diversity in many specialties and may represent a longstanding form of structural racism ( 7 ).

2.2. Impact of Interventions Designed to Reduce or Manage Bias

Literature outside of health care has introduced techniques to manage implicit bias, including stereotype replacement (replacing stereotypical responses to bias with nonstereotypical ones), counter-stereotypic imaging (imagining known counter-stereotypical people), individuation (learning personal attributes of persons present rather than identifying group attributes), perspective taking (taking the perspective of persons present), and increasing opportunities for contact. Several studies have explored the efficacy of these interventions. Strikingly, the only study demonstrating reduction of measured implicit bias was conducted on undergraduate students enrolled in a course using a prejudice-habit-breaking intervention that taught all of the aforementioned techniques, with effects lasting 8 weeks ( 24 ). Unfortunately, these results may not be generalizable and have not been reproduced. Lai et al. ( 57 ) tested nine interventions, and although all immediately reduced implicit preferences, results were sustained for only several hours to days. In 2019, FitzGerald et al. ( 30 ) conducted a systematic review of bias interventions utilizing the IAT or other measures across multiple disciplines. They found that most studies did not provide robust data to support many interventions, although perspective taking was more successful than counter-stereotypic imaging.

2.3. Interactions Between Bias and Structural Elements of the Health Care System

Implicit bias has important interactions with structural elements of the health care system. Evidence suggests that implicit bias can reinforce structural dimensions of the health care system that generate disparities. Other evidence suggests that structural dimensions of the health care system and medical education can reinforce implicit bias. These interactions suggest a complex and mutually reinforcing relationship between implicit bias and structural elements of the health care system.

2.3.1. The relationship between implicit bias and public policy.

Implicit biases influence the decisions of policy makers in government and health care that result in structural racism ( 70 , 75 , 81 ). Public health responses to the coronavirus disease 2019 (COVID-19) pandemic offer evidence of this dynamic. Despite data demonstrating that non-Hispanic Black populations and Hispanic populations were dying at a younger average age (71.8 years and 67.3 years, respectively) than non-Hispanic White populations were (80.9 years), the phase 1b vaccination strategy targeted individuals age 75 and older ( 25 ). Thus, federal public health recommendations ignored or discounted the evidence that an age-based approach would lead to further disparities in COVID-19 infections and mortality, amounting to structural racism against Black and Hispanic populations.

2.3.2. The relationship between implicit bias and cognitive workload: overcrowding and patient load.

Studies have consistently shown that decision makers burdened with higher cognitive load are more likely to make biased decisions ( 10 ). A more recent study of physicians in the emergency department has confirmed that cognitive stressors such as patient overcrowding and patient load were associated with increased implicit racial bias, as measured by a race IAT administered preshift and postshift ( 53 ).

2.3.3. The relationship between implicit bias and the learning/training environment.

Unfortunately, to date, medical education and educators have not adequately addressed the implicit biases that place marginalized patients at high risk of receiving disparate care and suffering poorer health outcomes. In fact, Phelan et al. ( 84 ) concluded that structural racism is at play in medical education through many medical schools’ formal and hidden curricula ( 52 , 88 ). In contrast to a formal curriculum, which can be measured by the number of hours students receive training related to racial disparities and bias, structured service-learning, minority health activities, cultural awareness programming, and the completion of an IAT, the hidden curriculum is unofficial and often more powerful, consisting of faculty role modeling ( 52 ), institutional priorities around the interracial climate, and experiences of microaggressions.

Most medical students continue to believe that both race and gender (as opposed to sex) are genetic and biological constructs. Even when students are taught otherwise, the practice of race-based medicine reinforces these characterizations. When students are taught about health disparities without the appropriate contextualization of structural racism, historic segregation, the pathologization of gender and sexual orientation, and the medical professions’ complicity in scientific racism, students may assume there is something inherently wrong with racialized minorities rather than with the systems that have harmed them. Students are often taught that race, instead of racism, is an independent risk factor for disease. They learn to associate race with any number of diseases. They are taught to incorporate the race of their patient into the opening line of clinical presentations even though there is no evidence that race is relevant to the establishment of diagnoses. They learn to use race-based algorithms to calculate glomerular filtration rates, pulmonary function testing, hypertension guidelines, and even urinary tract infection diagnoses in pediatric populations ( 2 ). Such messaging only serves to undo any structured teaching on the social construct of race and gender ( 16 ).

2.3.4. The relationship between implicit bias and health care outcomes.

As discussed above, there is substantial evidence that implicit bias results in health care disparities through mechanisms including disparate care and trust. But the relationship between implicit bias and outcomes may be bidirectional. Evidence has shown that implicit attitudes are malleable and that such attitudes are learned and strengthened through repeated observation of particular classes of people in valued or devalued circumstances. For example, individuals exposed to less favorable exemplars from a given identity demonstrate increased implicit bias and stereotypes with respect to that entire group ( 20 ). Furthermore, these investigators showed that changing exposure to more favorable exemplars can diminish established implicit bias. This phenomenon has been demonstrated in experiments looking specifically at race- and age-related attitudes ( 21 ). These findings suggest that a practitioner’s implicit bias toward a marginalized group may be augmented or diminished by the clinical outcomes of that group.

2.3.5. Favorable relationships between structural elements of training and bias: curricula, climate, and contact.

The CHANGES study demonstrated that students’ implicit bias against sexual minorities was reduced at 42 medical schools and increased at only 7 schools. Reduced bias was associated with more frequent interaction with LGBT students, faculty, and patients; the perceived quality of that contact; and increased training involving skills in caring for sexual minorities ( 85 ).

The CHANGES study found that changes in student implicit racial attitudes were independently associated with formal curricula related to disparities in health and health care, cultural competence, and minority health; informal curricula (or hidden curricula, defined in Table 1 ), including racial climate and role model behavior; and the amount and favorability of interracial contact during medical school ( 84 ).

Thus, carefully designed structural elements of the learning environment can favorably affect the implicit biases and wellness of students.

2.4. Systematic Review of Studies with Interventions

A systematic literature review was performed with the goal of assessing the efficacy of extant interventions designed to reduce the explicit and implicit biases of health care providers and of learners across the continuum of health professions education.

2.4.1. Methods.

We searched three databases (ERIC, PubMed, and MedEdPORTAL) using key terms ( Figure 1 ). The terms “implicit bias,” “prejudice,” and “stigma” were often used interchangeably, and the terms “bias” and “biases” yielded more than 100,000 articles, often with little relevance to implicit bias in the health professions. We found, as did FitzGerald et al. ( 30 ) in their systematic review, that indexing in databases for these terms was inconsistent and that titles and abstracts were often imprecise. We conducted repeated searches with and without these terms, comparing the number of search results. We developed a set of terms most frequently encountered in the titles and abstracts of irrelevant articles and defined important terminology ( Table 1 ) to narrow the search. We reviewed the references of landmark articles and used the advanced search function to increase the likelihood that no key articles were missed.

Figure 1. PRISMA flow diagram of the systematic review.

A study had to include health care professionals, assess an intervention (e.g., training, workshop, didactics, contact, program) designed to address explicit or implicit bias held by health care providers, be written in English, and be published between May 2011 and May 2021. We excluded commentaries, theoretical frameworks, editorials, and institutional or societal pledges that address racism, although these were reviewed for context. We did not exclude qualitative studies, studies without comparison groups, or studies outside North America. However, although we did find studies from other countries detailing explicit and implicit biases, we did not find articles with interventions addressing these biases for inclusion in this review. We extracted subjects, intervention format (e.g., lectures, workshops, discussions, panels, interviews), target (e.g., knowledge, skills, attitudes, IAT), and summary of key findings.

We excluded abstracts that did not include original research or bias reduction as an expected outcome; that did not employ a discrete intervention or, like the CHANGES study, retrospectively identified effective interventions; or that studied populations other than health professions students, trainees, or providers. We excluded articles that focused on self-stigma (e.g., from a diagnosis of obesity, HIV, sexually transmitted infection, mental health) and community-based interventions, as they were not focused specifically on the bias of health professionals. Observational studies without discrete interventions were excluded but were reviewed in Section 1 .

Title, abstract, and full-text review were conducted by three authors (M.B.V., A.I.E., and N.A.S.) and coded to consensus.
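The eligibility criteria described above can be sketched as a simple screening function. This is a minimal illustration only, not the authors' actual screening tool: the `Record` class, its field names, and the `exclusion_reasons` helper are hypothetical, and the publication window assumes that both endpoint months (May 2011 and May 2021) are included in full.

```python
from dataclasses import dataclass
from datetime import date

# Assumed reading of "published between May 2011 and May 2021":
# both endpoint months are included in full.
WINDOW_START = date(2011, 5, 1)
WINDOW_END = date(2021, 5, 31)


@dataclass
class Record:
    """Hypothetical screening record; every field name is illustrative."""
    published: date
    language: str
    includes_health_professionals: bool
    assesses_bias_intervention: bool   # discrete intervention targeting provider bias
    is_original_research: bool
    is_commentary_or_pledge: bool = False  # commentary, framework, editorial, pledge
    focuses_on_self_stigma: bool = False
    is_community_based: bool = False


def exclusion_reasons(record: Record) -> list[str]:
    """Return every criterion the record fails; an empty list means eligible.

    Qualitative designs, missing comparison groups, and non-North American
    settings are deliberately not checked, because the review did not
    exclude studies on those grounds.
    """
    reasons = []
    if not (WINDOW_START <= record.published <= WINDOW_END):
        reasons.append("outside publication window")
    if record.language.lower() != "english":
        reasons.append("not written in English")
    if not record.includes_health_professionals:
        reasons.append("population is not health professionals")
    if not record.assesses_bias_intervention:
        reasons.append("no discrete bias intervention assessed")
    if not record.is_original_research:
        reasons.append("not original research")
    if record.is_commentary_or_pledge:
        reasons.append("commentary, framework, editorial, or pledge")
    if record.focuses_on_self_stigma:
        reasons.append("focuses on self-stigma")
    if record.is_community_based:
        reasons.append("community-based intervention")
    return reasons
```

Returning the list of failed criteria, rather than a single yes/no flag, mirrors how a PRISMA flow diagram reports the reasons for exclusion at each stage. An observational study without a discrete intervention would come back with "no discrete bias intervention assessed" and could still be set aside for contextual review.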

2.4.2. Findings.

Twenty-five studies met inclusion criteria ( Table 3 ). None of the studies mentioned in Sections 1 and 2 met inclusion criteria; they were reviewed because of their significant contributions to the understanding of the interactions of implicit bias in learning and clinical settings. Most studies (68%) engaged medical students and utilized classroom or web-based interventions. Most studies did not have a control group (72%) and none used actual clinical settings. Three studies focused on interventions for implicit bias of faculty serving on admissions or search committees.

Provider-level implicit bias interventions

Study populationInterventionEvaluation/outcomesLimitationsReference
Interventions without formal measurement of implicit bias/attitudesMedical students ( = 25)Study and control groups Study group participated in 5-h dialogues on race and biasPre- and postsurveys
Paired -tests demonstrated increased knowledge and awareness of racial bias and increased comfort talking about race.
No formal bias measure
Self-selected study group of students
Faculty who serve on search committees ( = 22)2-h reflection-based workshop on unconscious biasPost-intervention survey evaluated effectiveness and utility of exercise.
Most surveyed found workshop helpful in preparing for faculty searches.
Extremely limited evaluation (no pre-/postcomparison)
No formal bias measure
Medical students = 615)2-day orientation on power, privilege, and biasPost-intervention survey Surveys demonstrated raised bias awareness.No formal bias measure
No pre-/postcomparison
Medical students ( = 187)Five 2-h workshops with lectures on biasPre- and postsurveys
Paired -tests on surveys demonstrated raised awareness of own biases and intent to address bias.
No formal bias measure
Health professions educators = 70)Introduced new longitudinal case conference curriculum called HER to discuss and address the impact of structural racism and implicit bias on patient care
Utilized case-based discussion, evidence-based exercises, and two conceptual frameworks
Tracked conference attendance and postconference surveys
Most survey respondents (88% or more) indicated that HER promoted personal reflection on implicit bias, and 7 5 % or more indicated that HER would affect their clinical practice.
No pre-/postcomparison
No formal bias measure
No control group
Faculty = 66)90-min interactive workshop that included a reflective exercise, role-play, brief didactic session, and case-based discussion on use of language in patient chartsPost-intervention survey with four Likert scale questions
Participants felt workshop met its
objectives (4.8 out of 5.0) and strongly agreed that they would apply skills learned (4.8).
Self-selected study group
No measure of bias
No control group
Family medicine residents ( = 31)Training on institutional racism, colonization, and cultural power followed by humanism and instruction on taking health equity time-outs during clinical timeFocus groups conducted 6 months post-intervention
Four themes:
No measure of bias
No pre-/postcomparison
Qualitative analysis only
No control group
Medical students ( = 26)Service-learning plus reflectionReflection practice questionnaire analysis
Students reported recognizing and mitigating bias.
No formal measure of bias used
No control group
Medical students ( = 127)Readings/reflections on weight stigma
Standardized patient before and after
Pre-/post-intervention questionnaires
Reduced stereotyping, increased empathy, and improved counseling confidence
Weak analysis may be biased itself.
No formal bias measurement
No control group
Interventions with formal measurement of implicit bias/attitudesMedical students/elective ( = 218)Single session in which students completed an IAT followed by discussionPost IAT survey
Implicit bias deniers were significantly more likely to report IAT results with implicit preferences toward self, to believe the IAT is invalid, and to believe that doctors and the health care system provide equal care to all, and were less likely to report having directly observed inequitable care.
Self-selected study group
No control group
Medical students ( = 180)Single IAT administration followed by guided reflective discussion and essay writingEvaluation of reflective essays
Students noted raised awareness of bias but were not able to strategize solutions to mitigate bias.
Prompt did not ask for strategies
No control group
Medical students ( = 15)Nine 1.5-h sessions focused on promoting skills to empower students to recognize implicit bias reduction as part of professionalism
Three objectives (grounded in implicit bias recognition and transformative learning theory):
Post-intervention focus groups and analysis of semistructured interviews
Major themes:
Self-selected small group of students
No control group
Medical students ( = 72)IAT administration followed by small group debrief and discussion on biasQualitative analysis of discussion transcripts
Students who reach for normative versus personal standards had higher implicit bias post-intervention.
No post IAT measure of bias
No control group
Nursing students ( = 75)Pre/post IAT with debriefing, writing, and teaching of bias management techniques (e.g., internal feedback, humanism)Postclass survey, conducted 5 weeks after the intervention
Learners were extremely likely or likely to ( ) take additional IATs and reflect on the results and ( ) learn more about unconscious bias.
No formal analysis of pre/post IATs, but focus was on acceptance of bias and management
No control group
Medical students ( = 78)Workshops that involved IAT administration, instruction on implicit bias and impact on decision making, and presentation of six strategies to reduce implicit biasReduction of implicit bias against Hispanics as measured by an IAT in majority students only
No change for minority students was demonstrated.
No control group
Nonclinical setting
Medical students, house staff, faculty ( = 468)Twenty workshops to emphasize skill building and include lectures, guided reflections, and facilitated discussions focused on the following: Survey response rate was 80%; a paired -test
Pre- and postsurveys to evaluate the intervention’s capacity to improve awareness of bias and address it through allyship
Demonstrated greatest improvements in understanding of the process of allyship; ability to describe strategies to address, assess, and recognize unconscious bias; and knowledge of managing situations in which prejudice, power, and privilege are involved
Improved confidence in addressing bias but no measure of bias reduction
Faculty on admissions committee ( = 140)Black-White IAT administered before 2012–2013 medical school admission cycle
Study participants received results before start of admission cycle and were surveyed on the impact at the end of cycle in May 2013
Most survey respondents (67%) thought the IAT might be helpful in reducing bias, 48% were conscious of their individual results when interviewing candidates in the next cycle, and 21 % reported knowledge of their IAT results impacted their admissions decisions in the subsequent cycle.
This class is the most diverse to
matriculate in the Ohio State University College of Medicine’s history.
Unclear whether other factors affected matriculation of students
Faculty members ( = 281)Standardized, 20-min educational intervention to educate faculty about implicit biases and strategies for overcoming themPre-/postassessments that included the following: The intervention had a small but significant effect on the implicit biases surrounding women and leadership of all participants regardless of age and gender.
Faculty experienced significant increases in their perceptions of personal bias (Cohen’s = 0.50 and 0.17; < 0.01 for both questions), perceptions of societal bias (Cohen’s = 0.14, 0.12, and 0.25; < 0.05 for all three questions), and perceptions of bias in academic medicine (Cohen’s = 0.38, 0.57, and 0.58; < 0.001 for all three questions).
Immediate impact only
No control group
Medical students ( = 64)Study participants watched video linking obesity to genetics and environmentBeliefs about Obese Persons, Attitudes toward Obese Persons, and Fat Phobia Scales administered pre- and post-intervention
Paired -tests revealed decreased negative stereotypes and beliefs.
No longitudinal results
No control group
House staff ( = 69)Narrative photography to prompt reflection and photovoice of Latino adolescentsControl and intervention groups
Measured ethnocultural empathy, health care empathy, patient centeredness, and implicit attitudes using the affect misattribution procedure
All measures improved with some note of dose response with more exposure.
Nonclinical setting
Medical students (n = 129)
Workshop to address obesity-related bias using theater reading (intervention group) of play versus lecture (control group) on obesity
Students randomly assigned to groups
Obesity-specific IAT, anti-fat attitudes questionnaire pre-/postworkshop
Reduced explicit fat bias in theater group with no change in implicit bias or empathy post-intervention or 4 months later
Nonclinical setting
Primary care providers (n = 185)
Study participants randomized to intervention (lecture and contact)/control (lecture and discussion)
Beliefs and Attitudes towards Mental Health Service Users’ Rights Scale
Reduced stigmatizing beliefs and attitudes at 1 month in the intervention group but rebound effect at 3 months
No formal measure of bias
Nonclinical setting
Medical students (n = 111)
One-time contact-based educational intervention on the stigma of mental illness among medical students, compared with a multimodal undergraduate psychiatry course
Opening Minds Scale for Health Care Providers to assess changes in stigma
Stigma scores for both groups were significantly reduced upon course completion (p < 0.0001) but were not significantly changed following the one-time contact-based educational intervention in the primary analysis.
Nonclinical setting
Medical students (n = 160)
Intergroup contact theory (facilitated contact to reduce bias) plus 50 h of competency-based curriculum on inclusive care of LGBTQ and gender-nonconforming individuals through lectures, standardized patients, discussion, panels, and reflective writing
Had study and control groups
Pre and post IATs with debriefings demonstrated reduced implicit preference for straight people.
IATs with debriefings were important when used to facilitate the curriculum.
Nonclinical setting
Medical students (n = 50)
Three cultural competency training sessions led by LGBTQ2S+ experts and elders from the community
Study participants randomized to intervention and control groups
Focus group discussions conducted
Pre-/postassessment
Lesbian, Gay, and Bisexual Knowledge and Attitudes Scale for Heterosexuals and The Riddle Scale: Attitudes towards Gay, Lesbian, Bisexual, and Trans people survey
Measurable and relevant changes in health care students’ perceived knowledge, attitudes, and clinical behavior regarding LGBTQ2S+ populations as a result
Nonclinical setting

Abbreviations: HER, Health Equity Round; IAT, implicit association test.

3. DURATION OF INTERVENTION EFFECT

The three studies of faculty serving on admissions or search committees reported increased awareness of biases, but none reported bias reduction or long-lasting impact.

Three studies followed subjects 3, 4, and 6 months post-intervention, but only one noted a lasting positive impact ( 96 ).

4. NOVEL INTERVENTION CONTENT

All studies addressing implicit bias among health care providers raised awareness of implicit bias through didactic instruction, discussions, workshops or other reflection-based techniques (e.g., service-learning, photovoice, contact-based interventions, theater reading; see Table 4 ), or an IAT or similar measure.

Table 4. Definitions of intervention types used in selected studies

Intervention type: Definition
Allyship training: “An active, consistent, and arduous practice of unlearning and re-evaluating, in which a person of privilege seeks to operate in solidarity with a marginalized group” ( ). “Allyship begins with an awareness of unconscious biases and then moves to actions that address inequities in everyday interactions to create an inclusive culture for example to amplify the voices of those in underrepresented groups and to advocate for equitable practices” ( , p. 6).
Bias literacy: Promotes a basic understanding of key terms, skills and concepts related to bias as a first step to organizational change ( , p. 64; , p. 22)
Brave space: “A space where difficult, diverse, and often controversial issues are presented and can be discussed with a common goal of understanding the barriers to equity in health care” ( , p. 87)
Emotional regulation: “The processes by which we influence which emotions we have, when we have them, and how we experience and express them” ( , p. 282)
Intergroup contact: The promotion of contact between two groups with the goal of reducing prejudice ( , p. 66)
Photovoice: “A method that allows participants to use photography to document their experiences and dialogue to eventually influence change” ( , p. 318)
Service-learning: A “pedagogy of engagement wherein students address a genuine community need by engaging in volunteer service that is connected explicitly to the academic curriculum through structured ongoing reflections” ( , p. 115)
Theater reading: Play reading with students as active participants ( , p. 232)

Despite the limitations noted in Section 2 , the IAT continues to be widely utilized. The IAT and other measures ( 32 ) of implicit bias, stigma, and attitudes toward groups of persons were used among subjects to ( a ) demonstrate the existence of participant implicit biases, ( b ) act as a springboard to create cognitive dissonance for oral and/or written reflection and to practice bias management skills, and ( c ) evaluate interventions. Gonzalez et al. ( 37 ) found that using the IAT without priming on its results and without a follow-up debriefing led some subjects (22%) to question the validity of the measure and the existence of implicit biases, and therefore advised judicious use of the IAT and trained facilitators. Subjects who accepted the results of the IAT were not able to develop management strategies for those biases without dedicated instruction.

Despite having low explicit bias based on a self-reported survey, admissions committee members at The Ohio State University College of Medicine ( 14 ) had high levels of implicit preference for White versus Black students as measured by the Black-White IAT. Results were presented to committee members with strategies to reduce implicit bias. The following admissions cycle resulted in an increase in underrepresented minority matriculation from 17% to 20%, a change that was not statistically significant.

Seventy-six percent of studies ( 8 , 13 , 14 , 23 , 28 , 35 – 38 , 48 , 51 , 58 , 59 , 77 , 82 , 94 , 96 , 99 , 109 ) instructed on structural determinants such as structural racism and/or historic oppression of groups so that subjects could explore explicit and implicit biases. All these studies demonstrated an increased awareness of bias, and subjects often voiced a willingness to address their biases. Four studies explored the use of contact with groups with identities such as LGBTQI ( 58 , 59 ) and persons with mental illness ( 27 , 77 ) with positive and negative results, respectively.

In recognition that biases may be immutable in the current health care context but can be managed, educators have used transformative learning theory (TLT) in concert with implicit bias management techniques. TLT transforms the individual’s existing paradigm by disrupting assumptions and then engaging in critical reflection and dialogue to interpret the disruptions ( 68 ). TLT may move learners to an “inclusive, self-reflective and integrative frame of reference” ( 100 , p. 718). This paired approach has had early success. Sherman et al. ( 96 ) engaged both residents and faculty in transformative learning to address issues of race, racism, and Whiteness and created an environment for critical dialogue incorporating practical recommendations for addressing implicit bias in clinical practice. Focus groups 4 months later revealed that subjects noted increased awareness of their biases and sustained commitment to addressing racial bias, to challenging their own clinical decision making, and to engaging leadership in dialogue regarding bias.

Gonzalez et al. ( 38 ) describe implicit bias recognition and management (IBRM), a process that promotes conscious awareness of biases and fosters behavioral changes. IBRM supposes that biases are difficult to reduce and should therefore be managed. IBRM has helped medical students interrupt biases in learning and clinical settings. Wu et al. ( 109 ) paired IAT administration with training to improve skills in bias literacy, emotional regulation, and allyship ( Table 4 ). Trainees practiced these skills in clinical vignettes and improved their confidence in addressing bias in real-world settings. All three studies created a brave space to explore biases and emphasized continued practice and development of skills.

These studies have multiple limitations. They often lacked control groups or used pre- and postcomparison designs. They had limited longitudinal follow-up and often were not performed in real-world clinical or learning environments. Many studies did not focus on targeted outcomes, and most did not access the continuum of learners in medical education such as practicing health care providers and leadership. Most interventions had a limited one-time delivery with no opportunity to measure a dose- or time-dependent effect.

5. DISCUSSION

Many of the interventions demonstrated successful promotion of awareness of implicit bias held among subjects as well as an interest in mitigating implicit biases among subjects. No intervention in this review, however, achieved sustained reduction of implicit bias among health care professionals or trainees. In addition, no study demonstrated that an intervention improved clinical outcomes, the learning environment, interprofessional team dynamics, patient care, health disparities, patient satisfaction, or satisfaction of health professionals. Studies were hampered by lack of statistical analysis, lack of control group, limited numbers of participants, findings that are not necessarily generalizable from the classroom or web-based setting to the clinical or real-world setting, and heavy reliance on qualitative assessments or nonvalidated instruments. Future studies should also assess whether regularly timed booster interventions manifest in sustained changes over time and should have longer-term follow-up to assess sustainability of initial gains. Future studies should include educational models that use direct clinical observation or standardized patients. Studies should assess health care trainees’ ability to incorporate skills into patient communication and shared decision making, their improvement of clinical delivery practices, their interactions with colleagues, and their teaching practices.

5.1. Conceptual Model

Based on Jones’s ( 54 ) allegory A Gardener’s Tale , we present a conceptual model of implicit biases of health care providers and the key structural factors affecting these biases ( Figure 2 ). In the vicious cycle of health disparity, students, trainees, and providers receive a constant barrage of messaging that reinforces biases. The soil of their work (practice and learning environments) is laden with structural bias from racialized medicine, a biased learning environment, and poor compositional diversity. Furthermore, these trainees and health care providers are under substantial time pressure and cognitive load. These characteristics of the practice and learning environments may be considered structural determinants of implicit bias.

Figure 2. Interactions between structural determinants and provider implicit bias. The vicious cycle: Structural determinants of implicit bias in the practice environment support biased decision making. Structural determinants of health in the community further impair outcomes in marginalized populations, leading to confirmation of the practitioner’s implicit bias. Health disparities are exacerbated. The virtuous cycle: A favorable practice environment regarding structural determinants of implicit bias supports unbiased clinical decision making. Favorable structural determinants of health in the community further enhance patient outcomes, positively reinforcing unbiased practice. Health disparities are reduced.

Biases are now primed as the clinician moves to provide care to patients (see the left side of Figure 2 ). When caring for marginalized patients, the provider’s bias influences communication with the patient, potentially resulting in suboptimal decision making. The patient may sense the bias, may distrust the provider and system, and may decide to not follow through on treatment plans or may modify them. The patient lives in underresourced and unhealthy spaces that contribute to poor outcomes. The provider notes the poor outcomes and their implicit bias is confirmed. Health care disparities are exacerbated. Further exacerbation of the vicious cycle occurs when this dynamic is accompanied by biases toward students, trainees, and providers from marginalized groups. Individuals from these marginalized groups are less likely to succeed, confirming biases about them and perpetuating poor diversity in the health care workforce. The benefits of diversity to education and patient care are lost.

The right side of Figure 2 depicts the virtuous cycle of health equity. A well-resourced provider learning and working within an environment devoid of racialized medicine and bias and characterized by compositional diversity is less likely to display biases against the patient. Compositional diversity also increases the likelihood that the provider shares lived experiences with the patient. The patient notes the absence of provider bias, develops a trusting relationship, adheres to the treatment plan in a well-resourced environment, and returns with improved health outcomes. The patient’s outcome confirms the provider’s more favorable bias. Health care disparities are reduced.

This conceptual model highlights two important dynamics in the perpetuation of implicit bias and its impact on care. First, structural determinants in the health care system and surrounding community contribute to the development of implicit bias toward marginalized patient populations and then reinforce that implicit bias through generation of poorer patient outcomes. Second, interruption of this cycle is possible only through an overall shift toward favorable structural influences on implicit bias. Discrete, time-limited training as the sole intervention to reduce implicit bias is unlikely to result in sustained change; health care providers return to a practice or learning environment that is often replete with structural determinants and patient outcomes that reinforce implicit bias. To avoid the ongoing creation and perpetuation of racist structures in society, systems, and organizations, it is crucial to recognize that these dynamics may enhance the implicit bias of medical leaders and policy makers as well.

5.2. Taking Action

To enable provider-level bias interventions to succeed in improving health outcomes, multiple other concurrent approaches should address structural factors inside and outside the health care system that influence these biases ( 80 ).

Structural inequities outside the health care system include poor access to high-quality health care, racialized violence, the carceral state, crowded housing, healthy food scarcity, lack of access to green spaces, environmental toxins, and poorly protected workspaces, among other issues related to geography and place ( 19 , 103 ).

Structural inequities inside the health care system that prime bias include the work and learning environments of students, trainees, and providers ( 104 ). It will be important to address these structural drivers of bias, including time pressures, cognitive load, and the practice of racialized medicine. Racism, sex and gender discrimination, and other forms of discrimination must be rooted out, as they prevent marginalized trainees and faculty from thriving, create stereotype threat for the marginalized, and confirm bias for the nonmarginalized. Bioethical principles of fairness, distributive justice, and reciprocity should be core for public health officials and health care providers, and practitioner and provider trainings in these areas can raise awareness. For example, to address health inequities laid bare by COVID-19, Peek et al. ( 79 ) recommend a multifactorial approach that acknowledges the systemic racism of the health care system and other societal structures as well as the biases of providers ( 67 ).

Addressing compositional diversity in health care is another avenue for treating the structures that influence implicit and explicit biases and for eliminating health care disparities. Minority health professionals are underrepresented in the workforce and health professions faculty ( 60 ). Only 6.2% of medical students identify as Hispanic or Latinx, and only 8.4% as Black or African American ( 1 ). Gender parity among medical school students has been achieved. However, women are underrepresented at the faculty instructor level, with substantially less representation at the professor level, and are also underrepresented in hospital leadership, with even starker inequities for female racial and ethnic minorities ( 33 , 88 ). Gender inequalities in salaries have been well documented ( 12 , 62 , 71 ). In academic medicine, Black male faculty are offered lower rates of compensation than their White counterparts and are less likely to be awarded research funding from the National Institutes of Health ( 34 ). Similarly, in 2016, graduate student enrollment in the Association of Schools and Programs of Public Health demonstrated a ≤5% increase over a 20-year period among Asian, Black, Hispanic, and Native American students; only 11.1% of students were Black and 12% were Hispanic. Black, Hispanic, and Native American representation among tenured public health faculty increased <3% during this same 20-year period ( 39 ).

6. CONCLUSION

TLT, IBRM, and a skills-based approach offer promise for future interventions in implicit bias management. It is also encouraging that discussions around disparities and inequities have moved from race to racism and have focused on the professional responsibility of providers to root out inequities and manage biases. The extant literature regarding the use of provider-level implicit bias interventions suggests that these interventions can play an important role in concert with other interventions that more broadly address bias and discrimination inside and outside the health care system. Evidence supports the use of provider-level interventions in immediate-impact activities such as decision making on search committees or admissions committees and raising critical awareness of the bioethical principles of fairness, distributive justice, and reciprocity. However, provider-level implicit bias interventions alone have not improved health outcomes. Thus, provider-level implicit bias interventions should be accompanied by interventions that systemically change structures inside and outside the health care system that influence biases and perpetuate health inequities.

ACKNOWLEDGMENTS

The authors extend their heartfelt thanks to Debra A. Werner, the University of Chicago’s Librarian for Science Instruction & Outreach and Biomedical Reference Librarian, for her patient guidance and assistance with the systematic literature review, and Morgan Ealey, Administrative Manager, Section of General Internal Medicine, who helped format the manuscript.

DISCLOSURE STATEMENT

M.E.P. and M.H.C. were supported in part by Bridging the Gap: Reducing Disparities in Diabetes Care National Program Office, funded by the Merck Foundation, and the Chicago Center for Diabetes Translation Research, funded by the National Institute of Diabetes and Digestive and Kidney Diseases (P30 DK092949). M.H.C. was also supported in part by Advancing Health Equity: Leading Care, Payment, and Systems Transformation, a program funded by the Robert Wood Johnson Foundation. M.H.C. is a member of the Blue Cross Blue Shield Health Equity Strategy advisory panel, Bristol Myers Squibb Company Health Equity Initiative advisory board, and The Joint Commission and Kaiser Permanente Bernard J. Tyson National Award for Excellence in Pursuit of Healthcare Equity review panel. The other authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.


  • Research article
  • Open access
  • Published: 22 August 2024

A systematic review and meta-analysis of randomized trials of substituting soymilk for cow’s milk and intermediate cardiometabolic outcomes: understanding the impact of dairy alternatives in the transition to plant-based diets on cardiometabolic health

  • M. N. Erlich 1 , 2 ,
  • D. Ghidanac 1 , 2 ,
  • S. Blanco Mejia 1 , 2 ,
  • T. A. Khan 1 , 2 ,
  • L. Chiavaroli 1 , 2 , 3 ,
  • A. Zurbau 1 , 2 ,
  • S. Ayoub-Charette 1 , 2 ,
  • A. Almneni 4 ,
  • M. Messina 5 ,
  • L. A. Leiter 1 , 2 , 3 , 6 , 7 ,
  • R. P. Bazinet 1 ,
  • D. J. A. Jenkins 1 , 2 , 3 , 6 , 7 ,
  • C. W. C. Kendall 1 , 2 , 8 &
  • J. L. Sievenpiper 1 , 2 , 3 , 6 , 7  

BMC Medicine volume 22, Article number: 336 (2024)


Background

Dietary guidelines recommend a shift to plant-based diets. Fortified soymilk, a prototypical plant protein food used in the transition to plant-based diets, usually contains added sugars to match the sweetness of cow’s milk and is classified as an ultra-processed food. Whether soymilk can replace minimally processed cow’s milk without the adverse cardiometabolic effects attributed to added sugars and ultra-processed foods remains unclear. We conducted a systematic review and meta-analysis of randomized controlled trials to assess the effect of substituting soymilk for cow’s milk and its modification by added sugars (sweetened versus unsweetened) on intermediate cardiometabolic outcomes.

Methods

MEDLINE, Embase, and the Cochrane Central Register of Controlled Trials were searched (through June 2024) for randomized controlled trials of ≥ 3 weeks in adults. Outcomes included established markers of blood lipids, glycemic control, blood pressure, inflammation, adiposity, renal disease, uric acid, and non-alcoholic fatty liver disease. Two independent reviewers extracted data and assessed risk of bias. The certainty of evidence was assessed using GRADE (Grading of Recommendations, Assessment, Development, and Evaluation). To explore the role of added sugars in sweetened soymilk, a sub-study of lactose versus sucrose outside of a dairy-like matrix was conducted following the same methodology.

Results

Eligibility criteria were met by 17 trials (n = 504 adults with a range of health statuses), assessing the effect of a median daily dose of 500 mL of soymilk (22 g soy protein and 17.2 g or 6.9 g/250 mL added sugars) in substitution for 500 mL of cow’s milk (24 g milk protein and 24 g or 12 g/250 mL total sugars as lactose) on 19 intermediate outcomes. The substitution of soymilk for cow’s milk resulted in moderate reductions in non-HDL-C (mean difference, −0.26 mmol/L [95% confidence interval, −0.43 to −0.10]), systolic blood pressure (−8.00 mmHg [−14.89 to −1.11]), and diastolic blood pressure (−4.74 mmHg [−9.17 to −0.31]); small important reductions in LDL-C (−0.19 mmol/L [−0.29 to −0.09]) and C-reactive protein (CRP) (−0.82 mg/L [−1.26 to −0.37]); and trivial increases in HDL-C (0.05 mmol/L [0.00 to 0.09]). No other outcomes showed differences. There was no meaningful effect modification by added sugars across outcomes. The certainty of evidence was high for LDL-C and non-HDL-C; moderate for systolic blood pressure, diastolic blood pressure, CRP, and HDL-C; and generally moderate-to-low for all other outcomes. We could not conduct the sub-study of the effect of lactose versus added sugars, as no eligible trials could be identified.

Conclusions

Current evidence provides a good indication that replacing cow’s milk with soymilk (including sweetened soymilk) does not adversely affect established cardiometabolic risk factors and may result in advantages for blood lipids, blood pressure, and inflammation in adults with a mix of health statuses. The classification of plant-based dairy alternatives such as soymilk as ultra-processed may be misleading as it relates to their cardiometabolic effects and may need to be reconsidered in the transition to plant-based diets.

Trial registration

ClinicalTrials.gov identifier, NCT05637866.


Major dietary guidelines recommend a shift to plant-based diets for public and planetary health [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ] , while recommending simultaneous reductions in ultra-processed foods [ 2 , 3 , 4 , 5 , 6 , 7 , 8 ]. The shift to plant-based diets has resulted in an explosion of dairy, meat, and egg alternatives with plant protein foods projected to reach almost 10% of the global protein market by 2030 [ 9 ]. Although these foods can aid in the transition to plant-based diets, food classification systems such as the World Health Organization (WHO)-endorsed NOVA classification system classify them as ultra-processed foods to be avoided [ 10 ].

Dairy alternatives are an important example of a food category at the crossroads of these competing recommendations. School milk programs provide > 150 million servings of cow’s milk to children worldwide [ 11 ]. These programs are in addition to the food service and procurement policies of public institutions such as schools, universities, hospitals, long-term care homes, and prisons. Many of these programs and policies do not allow for the free replacement of cow’s milk with nutrient-dense plant milks [ 12 , 13 ]. Although the Dietary Guidelines for Americans [ 1 ], Canada’s Food Guide [ 3 ], and several European food-based dietary guidelines [ 14 ] recognize fortified soymilk [ 1 ] as nutritionally equivalent to cow’s milk, school nutrition programs in the United States (US) [ 12 ] and Europe [ 13 ] only provide funding for cow’s milk. There is a bipartisan bill before the US Congress to change this policy and provide funding for fortified soymilk [ 15 ]. A major barrier to the use of fortified soymilk is that it contains added sugars to match the sweetness of cow’s milk, at a level that would disqualify it from meeting the Food and Drug Administration’s proposed definition of “healthy” [ 16 ] (although its total sugar content is usually ~ 60% less than that of cow’s milk, given the higher sweetness intensity of sucrose vs lactose [ 17 ]). It is also classified, irrespective of its sugar content, as an ultra-processed food to be avoided [ 10 , 18 ]. Cow’s milk, on the other hand, enjoys classification as a “healthy,” minimally processed food to be encouraged [ 10 , 18 ].

As industry innovates in response to the growing demand and policy makers develop public health nutrition policies and programs in response to the evolving dietary guidance for more plant-based diets, it is important to understand whether nutrient-dense ultra-processed plant protein foods can replace minimally processed dairy foods without the adverse cardiometabolic effects attributed to added sugars and ultra-processed foods. We conducted a systematic review and meta-analysis of randomized controlled trials of the effect of substituting soymilk for minimally processed cow’s milk and its modification by added sugars (sweetened versus unsweetened) on intermediate cardiometabolic outcomes as a basis for understanding the role of nutrient-dense ultra-processed plant protein foods in the transition to plant-based diets.

We followed the Cochrane Handbook for Systematic Reviews of Interventions to conduct this systematic review and meta-analysis and reported our results in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines [ 19 , 20 ] (Additional file 1 : Table 1). To explore whether added sugars mediate any effects observed in sweetened soymilk studies, we conducted an additional systematic review and meta-analysis sub-study. This separate investigation followed the same protocol and methodology as our main study. It focused on controlled trials examining the impact of lactose in isocaloric comparisons with fructose-containing sugars (such as sucrose, high-fructose corn syrup [HFCS], or fructose), when not included in a dairy-like matrix, on all outcomes in the main study. The protocol is registered at ClinicalTrials.gov (NCT05637866).

Data sources and search strategy

We searched MEDLINE, Embase, and the Cochrane Central Register of Controlled Trials databases through June 2024. The detailed search strategies for the main study and sub-study were based on validated search terms [ 21 ] (Additional file 1 : Tables 2 and 4). Manual searches of the reference lists of included studies supplemented the systematic search.

Study selection

The main study included randomized controlled trials in human adults with any health status. Included trials had a study duration of ≥ 3 weeks and investigated the effects of soymilk compared with cow’s milk in energy matched conditions on intermediate cardiometabolic outcomes (Additional file 1 : Table 3). Trials that included other comparators that were not cow’s milk or had no viable outcome data were excluded. No restrictions were placed on language. For the sub-study, we included controlled trials involving adults of all health statuses that had a study duration of ≥ 3 weeks and investigated the effects of added sugars compared with lactose on the same intermediate cardiometabolic outcomes (Additional file 1 : Table 5).

Data extraction

A minimum of two investigators (ME, DG, SBM, AA) independently extracted relevant data from eligible studies. Extracted data included study design, sample size, sample characteristics (age, body mass index [BMI], sex, health status), intervention characteristics (soymilk volume, total sugars content, soy protein dose), control characteristics (cow’s milk volume, total sugars content, milk protein dose, milk fat content), baseline outcome levels, background diet, follow-up duration, setting, funding sources, and outcome data. The authors were contacted for missing outcome data when it was indicated that a relevant outcome was measured but not reported. Graphically presented data were extracted from figures using Plot Digitizer [ 22 ].

Outcomes for the main study and sub-study included blood lipids (low-density lipoprotein cholesterol [LDL-C], high-density lipoprotein cholesterol [HDL-C], non-high-density lipoprotein cholesterol [non-HDL-C], triglycerides, and apolipoprotein B [ApoB]), glycemic control (hemoglobin A1c [HbA1c], fasting plasma glucose, 2-h postprandial glucose, fasting insulin, and plasma glucose area under the curve [PG-AUC]), blood pressure (systolic blood pressure and diastolic blood pressure), inflammation (c-reactive protein [CRP]), adiposity (body weight, BMI, body fat, and waist circumference), kidney function and structure (creatinine, creatinine clearance, glomerular filtration rate [GFR], estimated glomerular filtration rate [eGFR], albuminuria, and albumin-creatinine ratio [ACR]), uric acid, and non-alcoholic fatty liver disease (NAFLD) (intrahepatocellular lipid [IHCL], alanine transaminase [ALT], aspartate aminotransferase [AST], and fatty liver index).

Mean differences (MDs) between the intervention and control arms and their respective standard errors were extracted for each trial. If these were not provided, they were derived from available data using published formulas [ 19 ]. Mean pairwise differences in change-from-baseline values were preferred over end values. When median data were provided, they were converted to mean data with corresponding variances using methods developed by McGrath et al. [ 23 ]. When no variance data were available, the standard deviation of the MDs was borrowed from a trial similar in size, participants, and nature of intervention. All disagreements were reconciled by consensus or with a senior reviewer (JLS).
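
As an illustration of how an MD and its standard error can be derived when a trial reports only arm-level summaries, the following minimal Python sketch handles the simplest case of two independent arms in a parallel trial. It is not the authors' procedure (the published formulas they cite cover more cases), and the function names and the LDL-C values are hypothetical, chosen only to show the arithmetic.

```python
import math

def parallel_md(mean_trt, sd_trt, n_trt, mean_ctl, sd_ctl, n_ctl):
    """Mean difference and its standard error for two independent arms.

    Assumes a parallel design with unpooled variances; inputs are
    change-from-baseline (or end-value) means, SDs, and sample sizes.
    """
    md = mean_trt - mean_ctl
    se = math.sqrt(sd_trt ** 2 / n_trt + sd_ctl ** 2 / n_ctl)
    return md, se

def ci95(md, se):
    """Normal-approximation 95% confidence interval around an MD."""
    half_width = 1.96 * se
    return md - half_width, md + half_width

# Hypothetical change-from-baseline LDL-C (mmol/L): soymilk arm vs cow's milk arm.
md, se = parallel_md(-0.30, 0.40, 25, -0.10, 0.40, 25)
low, high = ci95(md, se)
print(f"MD = {md:.2f} mmol/L, SE = {se:.3f}, 95% CI [{low:.2f}, {high:.2f}]")
```

The resulting MD and SE are the two quantities each trial contributes to a generic inverse variance meta-analysis.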

Risk of bias assessment

Included studies were assessed for the risk of bias independently and in duplicate by at least two investigators (ME, DG, SBM, AA) using the Cochrane Risk of Bias (ROB) 2 Tool [ 24 ]. The assessment was performed across six domains of bias (randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, selection of the reported result, and overall bias). Crossover studies were assessed for an additional domain of bias (risk of bias arising from period or carryover effects). The ROB for each domain was assessed as “low” (plausible bias unlikely to seriously alter the results), “high” (plausible bias that seriously weakens confidence in results), or “some concern” (plausible bias that raises some doubt about the results). Reviewer discrepancies were resolved by consensus or arbitration by a senior investigator (JLS).

Statistical analysis

STATA (version 17; StataCorp LP, College Station, TX) was used for all analyses for the main study and sub-study. The principal effect measures were the mean pairwise differences in change from baseline (or alternatively, end differences) between the intervention arm providing the soymilk and the cow’s milk comparator/control arm in each trial (significance at P_MD < 0.05). Results are reported as MDs with 95% confidence intervals (95% CI). As one of our primary research questions relates to the role of added sugars as a mediator in any observed differences between soymilk and cow’s milk, we stratified results by the presence of added sugars in the soymilk (sweetened versus unsweetened) and assessed effect modification by this variable on pooled estimates. Data were pooled using the generic inverse variance method with DerSimonian and Laird random effects models [ 25 ]. Fixed effects were used when fewer than five trials were available for an outcome [ 26 ]. A paired analysis was applied for crossover designs, assuming a within-individual correlation coefficient between treatments of 0.5, as described by Elbourne et al. [ 27 , 28 ].
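The pooling step can be sketched in a few lines. The analyses were run in STATA; the Python function below is only an illustrative re-implementation of generic inverse-variance pooling with the DerSimonian and Laird moment estimator of between-trial variance, and the function name, return keys, and input values are assumptions made for the example.

```python
import math


def pool(effects, ses, random_effects=True):
    """Generic inverse-variance pooling of mean differences.

    With random_effects=True, the DerSimonian-Laird estimate of tau^2 is
    added to each trial's variance; otherwise tau^2 is fixed at zero.
    """
    w = [1 / se**2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    tau2 = 0.0
    if random_effects and df > 0:
        c = sum(w) - sum(wi**2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
    w_star = [1 / (se**2 + tau2) for se in ses]
    est = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_est = math.sqrt(1 / sum(w_star))
    ci = (est - 1.96 * se_est, est + 1.96 * se_est)
    return {"md": est, "se": se_est, "ci": ci, "tau2": tau2, "q": q}


# Hypothetical trial-level mean differences and standard errors.
result = pool([-0.2, -0.1, -0.3], [0.1, 0.1, 0.1])
```

Calling `pool(..., random_effects=False)` gives the fixed-effect estimate that the text describes for outcomes with fewer than five trials.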

Heterogeneity was assessed using Cochran’s Q statistic and quantified using the I² statistic, where I² ≥ 50% and P_Q < 0.10 were used as evidence of substantial heterogeneity [ 19 ]. Potential sources of heterogeneity were explored using sensitivity analyses, which were done via two methods. First, we conducted an influence analysis by systematically removing one trial at a time and recalculating the overall effect estimate and heterogeneity. A trial was considered influential if its removal explained the substantial heterogeneity or altered the direction, magnitude, or significance of the summary estimate. Second, to determine whether the overall summary estimates were robust to the use of an assumed correlation coefficient for crossover trials, we repeated the analyses using correlation coefficients of 0.25 and 0.75. If ≥ 10 trials were available, meta-regression analyses were used to assess the significance of each subgroup categorically and, when possible, continuously (significance at P < 0.05). A priori subgroup analyses included soy protein dose, follow-up duration, baseline outcome levels, comparator, design, age, health status, funding, and risk of bias.
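The heterogeneity statistics and the leave-one-out influence analysis can be illustrated as follows. This is a simplified sketch: it uses the fixed-effect inverse-variance estimate for brevity (an assumption, since the reported analyses used random effects where five or more trials were available), it omits the P value for Q (which needs a chi-square distribution function, for example `scipy.stats.chi2.sf`), and the function names and data are hypothetical.

```python
def fixed_summary(effects, ses):
    """Return (pooled estimate, Cochran's Q, I-squared in percent)."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is the share of Q in excess of its degrees of freedom, floored at 0.
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


def leave_one_out(effects, ses):
    """Recompute the summary with each trial removed in turn.

    Each result is (index of removed trial, pooled estimate, Q, I-squared).
    """
    results = []
    for i in range(len(effects)):
        rest_y = effects[:i] + effects[i + 1:]
        rest_se = ses[:i] + ses[i + 1:]
        results.append((i,) + fixed_summary(rest_y, rest_se))
    return results


# Hypothetical data in which the third trial drives the heterogeneity.
influence = leave_one_out([-0.2, -0.2, 0.6], [0.1, 0.1, 0.1])
```

In this toy data set, removing the third trial drops I² to zero, which is the pattern the text describes as a removal that "explained" the heterogeneity.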

If ≥ 6 trials were available [ 29 ], dose–response analyses were performed using meta-regression to assess linear (by generalized least squares trend [GLST] estimation models) and non-linear (by spline curve modeling with the MKSPLINE procedure) dose–response gradients (significance at P < 0.05).

If ≥ 10 studies were available, publication bias was assessed by inspection of contour-enhanced funnel plots and formal testing with Egger’s and Begg’s tests (significance at P  < 0.10) [ 30 , 31 , 32 ]. If evidence of publication bias was suspected, the Duval and Tweedie trim-and-fill method was performed to adjust for funnel plot asymmetry by imputing missing study data and assess for small-study effects [ 33 ].
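For readers unfamiliar with Egger’s test, the sketch below shows the underlying regression: each trial’s standardized effect (effect divided by its standard error) is regressed on its precision (one divided by the standard error), and an intercept far from zero suggests funnel plot asymmetry. The function name and data are hypothetical, the example uses fewer trials than the ≥ 10 threshold stated above purely to stay short, and the P value is omitted because it requires a t distribution function that is not in the Python standard library.

```python
import math


def egger_regression(effects, ses):
    """Ordinary least squares fit of (effect / SE) on (1 / SE).

    Returns the intercept, slope, standard error of the intercept, and the
    t statistic for the intercept (NaN if the fit is exact).
    Requires at least three trials with differing standard errors.
    """
    x = [1 / se for se in ses]
    y = [e / se for e, se in zip(effects, ses)]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(rss / (n - 2) * (1 / n + mx**2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else float("nan")
    return {"intercept": intercept, "slope": slope, "se": se_int, "t": t_stat}


# Hypothetical data: smaller (less precise) trials report larger reductions.
fit = egger_regression([-0.1, -0.2, -0.4, -0.6], [0.05, 0.10, 0.15, 0.20])
```

When every trial estimates the same effect regardless of its size, the intercept is zero and the slope equals that common effect.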

Certainty of evidence

The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach was used to assess the certainty of evidence. The GRADE Handbook and GRADEpro V.3.2 software were used [ 34 , 35 ]. A minimum of two investigators (ME, DG, SBM) independently performed GRADE assessments for each outcome [ 36 ]. Discrepancies were resolved by consensus or arbitration by the senior author (JLS). The overall certainty of evidence was graded as either high, moderate, low, or very low. Randomized trials are initially graded as high by default and then downgraded or upgraded based on prespecified criteria. Reasons for downgrading the evidence included study limitations (risk of bias assessed by the Cochrane ROB Tool), inconsistency of results (substantial unexplained interstudy heterogeneity, I² > 50% and P_Q < 0.10), indirectness of evidence (presence of factors that limit the generalizability of the results), imprecision (the 95% CI for effect estimates overlaps with the MID for benefit or harm), and publication bias (evidence of small-study effects). The evidence was upgraded if a significant dose–response gradient was detected. We defined the importance of the magnitude of the pooled effect estimates using prespecified MIDs (Additional file 1 : Table 6) with GRADE guidance [ 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 , 46 , 47 , 48 , 49 , 50 ] according to five levels: very large (≥ 10 MID); large (≥ 5 MID); moderate (≥ 2 MID); small important (≥ 1 MID); and trivial/unimportant (< 1 MID) effects.
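The five magnitude levels amount to a threshold rule on the ratio of the pooled effect to the MID. A minimal sketch follows; the function name is hypothetical, and the MID of 0.1 mmol/L used in the example is a placeholder assumption, since the prespecified MIDs are given only in Additional file 1 : Table 6.

```python
def effect_magnitude(md, mid):
    """Classify the size of a pooled mean difference relative to the MID.

    md  : pooled mean difference (sign is ignored)
    mid : minimally important difference, in the same units as md
    """
    ratio = abs(md) / mid
    if ratio >= 10:
        return "very large"
    if ratio >= 5:
        return "large"
    if ratio >= 2:
        return "moderate"
    if ratio >= 1:
        return "small important"
    return "trivial/unimportant"


# Placeholder MID of 0.1 mmol/L (an assumption for illustration only).
label = effect_magnitude(-0.26, 0.1)
```

This rule describes magnitude only; whether an effect is statistically significant is judged separately from its confidence interval.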

Search results

Figure 1 in Appendix shows the flow of the literature for the main analysis. We identified 522 reports through database and manual searches. A total of 17 reports [ 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 62 , 63 , 64 , 65 , 66 , 67 ] met the inclusion criteria and contained data for LDL-C (10 trials, n  = 312), HDL-C (8 trials, n  = 271), non-HDL-C (7 trials, n  = 243), triglycerides (9 trials, n  = 278), HbA1c (1 trial, n  = 25), fasting plasma glucose (5 trials, n  = 147), 2-h plasma glucose (1 trial, n  = 28), fasting insulin (4 trials, n  = 119), systolic blood pressure (5 trials, n  = 158), diastolic blood pressure (5 trials, n  = 158), CRP (5 trials, n  = 147), body weight (6 trials, n  = 163), BMI (6 trials, n  = 173), body fat (1 trial, n  = 43), waist circumference (3 trials, n  = 90), creatinine (1 trial, n  = 25), eGFR (1 trial, n  = 25), ALT (1 trial, n  = 24), and AST (1 trial, n  = 24) involving 504 participants. No trials were available for ApoB, PG-AUC, creatinine clearance, GFR, albuminuria, ACR, uric acid, IHCL, or fatty liver index.

Additional file 1 : Fig. 1 shows the flow of literature for the sub-study. We identified 1010 reports through database and manual searches. After excluding 305 duplicates, a total of 705 reports were reviewed by title and abstract. No reports met the inclusion criteria and therefore no data was available for analysis.

Trial characteristics

Table 1 shows the characteristics of the included trials. The trials were conducted in a variety of locations, with most conducted in Iran (7/17 trials, 41%), followed by the US (3/17 trials, 18%), Italy (2/17 trials, 12%), Brazil (1/17 trials, 6%), Scotland (1/17 trials, 6%), Sweden (1/17 trials, 6%), Spain (1/17 trials, 6%), and Australia (1/17 trials, 6%). All trials took place in outpatient settings (17/17, 100%). The median trial size was 25 participants (range, 7–60 participants). The median age of the participants was 48.5 years (range, 20–70 years) and the median BMI was 27.9 kg/m² (range, 20–31.1 kg/m²). The trials included participants with hypercholesterolemia (4/17 trials, 24%), overweight or obesity (4/17 trials, 24%), type 2 diabetes (2/17 trials, 12%), hypertension (1/17 trials, 6%), rheumatoid arthritis (1/17 trials, 6%), or were healthy (3/17 trials, 18%) or post-menopausal (2/17 trials, 12%). Trials with both crossover (10/17 trials, 59%) and parallel (7/17 trials, 41%) designs were included. The intervention included sweetened (11/17 trials, 65%) and unsweetened (6/17 trials, 35%) soymilk.

The median soymilk dose was 500 mL/day (range, 240–1000 mL/day) with a median soy protein of 22 g/day (range, 2.5–70 g/day) or 6.6 g/250 mL (range, 2.6–35 g/250 mL) and median total (added) sugars of 17.2 g/day (range, 4.0–32 g/day) or 6.9 g/250 mL (range, 1–16 g/250 mL) in the sweetened soymilk. The comparators included skim (0% milk fat) (2/17 trials, 12%), low-fat (1% milk fat) (4/17 trials, 24%), reduced fat (1.5–2.5% milk fat) (7/17 trials, 41%), and whole (3% milk fat) (1/17 trials, 6%) cow’s milk. Three trials did not report the milk fat content of cow’s milk used. The median cow’s milk dose was 500 mL/day (range, 236–1000 mL/day) with a median milk protein of 24 g/day (range, 3.3–70 g/day) or 8.3 g/250 mL (range, 3.4–35 g/250 mL) and median total (lactose) sugars of 24 g/day (range, 11.5–49.2 g/day) or 12 g/250 mL (range, 10.8–12.8 g/250 mL). The median study duration was 4 weeks (range, 4–16 weeks). The trials received funding from industry (1/17 trials, 6%), agency (8/17 trials, 47%), both industry and agency (4/17 trials, 24%), or they did not report the funding source (4/17 trials, 24%).

Additional file 1 : Fig. 2 shows the ROB assessments of the included trials. Two trials were assessed as having some concerns from period or carryover effects: Bricarello et al. [ 53 ] and Steele [ 67 ]. All other trials were judged as having an overall low risk of bias. There was no evidence of serious risk of bias across the included trials.

Markers of blood lipids

Figure 2 and Additional file 1 : Figs. 3–6 show the effect of substituting soymilk for cow’s milk on markers of blood lipids. The substitution resulted in a small important reduction in LDL-C (10 trials; MD: − 0.19 mmol/L; 95% CI: − 0.29 to − 0.09 mmol/L; P_MD < 0.001; no heterogeneity: I² = 0.0%; P_Q = 0.823), a trivial increase in HDL-C (8 trials; MD: 0.05 mmol/L; 95% CI: 0.00 to 0.09 mmol/L; P_MD = 0.036; no heterogeneity: I² = 0.0%; P_Q = 0.053), a moderate reduction in non-HDL-C (7 trials; MD: − 0.26 mmol/L; 95% CI: − 0.43 to − 0.10 mmol/L; P_MD = 0.002; no heterogeneity: I² = 0.0%; P_Q = 0.977), and no effect on triglycerides. There were no interactions by added sugars in soymilk for any blood lipid markers ( P  = 0.49–0.821).

Markers of glycemic control

Figure 2 and Additional file 1 : Figs. 7–10 show the effect of substituting soymilk for cow’s milk on markers of glycemic control. The substitution had no effect on HbA1c, fasting plasma glucose, 2-h plasma glucose, or fasting insulin. There was no interaction by added sugars in soymilk for fasting plasma glucose ( P  = 0.747), but there was an interaction for fasting insulin ( P  = 0.026). Even so, neither the sweetened soymilk (non-significant increase) nor the unsweetened soymilk (non-significant decrease) had a significant effect on fasting insulin. We could not assess this interaction for HbA1c or 2-h plasma glucose, as there was only one trial available for each outcome.

Blood pressure

Figure 2 and Additional file 1 : Figs. 11 and 12 show the effect of substituting soymilk for cow’s milk on blood pressure. The substitution resulted in a moderate reduction in both systolic blood pressure (5 trials; MD: − 8.00 mmHg; 95% CI: − 14.89 to − 1.11 mmHg; P_MD = 0.023; substantial heterogeneity: I² = 86.89%; P_Q ≤ 0.001) and diastolic blood pressure (5 trials; MD: − 4.74 mmHg; 95% CI: − 9.17 to − 0.31 mmHg; P_MD = 0.036; substantial heterogeneity: I² = 77.3%; P_Q = 0.001). There were no interactions by added sugars in soymilk for blood pressure ( P  = 0.747 and 0.964).

Markers of inflammation

Figure 2 and Additional file 1 : Fig. 13 show the effect of substituting soymilk for cow’s milk on markers of inflammation. The substitution resulted in a small important reduction in CRP (5 trials; MD: − 0.81 mg/dL; 95% CI: − 1.26 to − 0.37 mg/dL; P_MD < 0.001; no heterogeneity: I² = 0.0%; P_Q = 0.814). There was no interaction by added sugars in soymilk for CRP ( P  = 0.275).

Markers of adiposity

Figure 2 and Additional file 1 : Figs. 14–17 show the effect of substituting soymilk for cow’s milk on markers of adiposity. The substitution had no effect on body weight, BMI, body fat, or waist circumference. There were no interactions by added sugars in soymilk for any adiposity outcome ( P  = 0.664–0.733).

Markers of kidney function

Figure 2 and Additional file 1 : Figs. 18 and 19 show the effect of substituting soymilk for cow’s milk on markers of kidney function. The substitution had no effect on creatinine or eGFR. We could not assess the interaction by added sugars in soymilk for creatinine or eGFR, as there was only one trial available for each outcome which included soymilk without added sugars.

Markers of NAFLD

Figure 2 and Additional file 1 : Figs. 20 and 21 show the effect of substituting soymilk for cow’s milk on markers of NAFLD. The substitution had no effect on ALT or AST. We could not assess heterogeneity or the interaction by added sugars in soymilk for ALT or AST, as there was only one trial available for each outcome which included soymilk without added sugars.

Sensitivity analysis

Additional file 1 : Figs. 22–33 present the influence analyses across all outcomes. The removal of Bricarello et al. [ 53 ] or Steele [ 67 ] each resulted in loss of significant effect for HDL-C. The removal of Onning et al. [ 62 ] or Steele [ 67 ] each resulted in a partial explanation of heterogeneity for triglycerides. The removal of Hasanpour et al. [ 56 ] explained the heterogeneity for fasting insulin. The removal of Keshavarz et al. [ 57 ] or Miraghajani et al. [ 59 ] each resulted in a loss of significant effect for systolic blood pressure and the removal of Rivas et al. [ 63 ] resulted in a partial explanation of the heterogeneity for systolic blood pressure. The removal of Hasanpour et al. [ 56 ], Keshavarz et al. [ 57 ], Miraghajani et al. [ 59 ], or Rivas et al. [ 63 ] each resulted in a loss of significant effect for diastolic blood pressure and the removal of Rivas et al. [ 63 ] resulted in a partial explanation of heterogeneity for diastolic blood pressure. The removal of Mohammad-Shahi et al. [ 58 ] resulted in loss of significant effect for CRP.

Additional file 1 : Table 8 shows the sensitivity analyses for the different correlation coefficients (0.25 and 0.75) used in paired analyses of crossover trials for all outcomes. The different correlation coefficients did not alter the direction, magnitude, or significance of the effect or evidence for heterogeneity, with the following exceptions: loss of significance for the effect of the substitution on HDL-C (8 trials; MD: 0.04 mmol/L; 95% CI: − 0.10 to 0.01 mmol/L; P_MD = 0.107; I² = 0.0%; P_Q = 0.670) with the use of 0.25 and (8 trials; MD: 0.05 mmol/L; 95% CI: − 0.10 to 0.01 mmol/L; P_MD = 0.089; I² = 0.0%; P_Q = 0.640) with the use of 0.75.
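The reason the assumed correlation matters is that, in a paired (crossover) analysis, the standard error of the within-individual difference depends on how strongly the two treatment periods are correlated. The sketch below applies the usual formula for the standard deviation of a paired difference; the function name and input values are hypothetical and are not taken from any included trial.

```python
import math


def crossover_se(sd_a, sd_b, n, r=0.5):
    """Standard error of the mean within-individual difference in a crossover trial.

    sd_a, sd_b : standard deviations under each treatment
    n          : number of participants completing both periods
    r          : assumed within-individual correlation between treatments
    """
    sd_diff = math.sqrt(sd_a**2 + sd_b**2 - 2 * r * sd_a * sd_b)
    return sd_diff / math.sqrt(n)


# Hypothetical trial: SD of 1.0 under both treatments, 25 participants.
ses_by_r = {r: crossover_se(1.0, 1.0, 25, r) for r in (0.25, 0.5, 0.75)}
```

A lower assumed correlation inflates the standard error and widens each crossover trial's confidence interval, which is one way a borderline pooled effect can lose significance when the coefficient is varied.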

Subgroup analyses

Additional file 1 : Figs. 34–36 present the subgroup analyses and continuous meta-regression analyses for LDL-C. Subgroup analysis was not conducted for any other outcome as there were < 10 trials included. There was no significant effect modification by health status, BMI, age, comparator, baseline LDL-C, study design, follow-up duration, funding source, dose of soy protein, or risk of bias for LDL-C. However, there were tendencies towards a greater reduction in LDL-C by point estimates in groups with certain health statuses (hypercholesterolemic and overweight/obesity), a higher baseline LDL-C, and a higher soy protein dose (> 25 g/day).

Dose–response analyses

Additional file 1 : Figs. 37–42 present linear and non-linear dose–response analyses for LDL-C, HDL-C, non-HDL-C, triglycerides, body weight, and BMI. There was no dose–response seen for the effect of substituting soymilk for cow’s milk, with the exception of a positive linear dose–response for triglycerides ( P_linear = 0.038). We did not downgrade the certainty of evidence as the greater reduction in triglycerides seen at lower doses of soy protein was lost at higher doses. There were no dose–response analyses performed for the remaining outcomes because there were < 6 trials available for each.

Publication bias assessment

Additional file 1 : Fig. 43 presents the contour-enhanced funnel plot for assessment of publication bias for LDL-C. There was no asymmetry on visual inspection and no statistical evidence of funnel plot asymmetry for LDL-C (Begg’s test, P  = 0.721; Egger’s test, P  = 0.856). No other publication bias analyses could be performed as there were < 10 trials available for each.

Adverse events and acceptability

Additional file 1 : Table 9 shows the reported adverse events and acceptability of study beverages. Adverse events were reported in nine trials. In one trial by Gardner et al. [ 55 ], one participant experienced a recurrence of a cancer; however, it was considered to be unrelated to the short-term consumption of the study milks. Three trials (Miraghajani et al., Hasanpour et al., and Mohammad-Shahi et al.) [ 56 , 58 , 59 ] reported one to two withdrawals due to digestive difficulties related to soymilk consumption. Two trials (Sirtori et al. 1999 and 2002) [ 65 , 66 ] reported one or more participants with digestive difficulties related to cow’s milk consumption. Two trials (Nourieh et al. and Keshavarz et al.) [ 57 , 61 ] each reported two participant withdrawals related to digestive problems that were not specific to either study beverage. Of these, four trials indicated that most participants found the soymilk and cow’s milk acceptable and tolerable. One trial, by Onning et al. [ 62 ], incorporated a sensory evaluation of appearance, consistency, flavor, and overall impression, which showed declining scores for both types of milk over the 3-week test period.

GRADE assessment

Additional file 1 : Table 10 presents the GRADE assessment. The certainty of evidence for the effect of substituting soymilk for cow’s milk was high for LDL-C, non-HDL-C, fasting plasma glucose, and waist circumference. The certainty of evidence was moderate for HDL-C, triglycerides, fasting insulin, systolic blood pressure, diastolic blood pressure, CRP, body weight, and BMI owing to a downgrade for imprecision of the pooled effect estimates and was moderate for body fat owing to a downgrade for indirectness. The certainty of evidence was low for HbA1c, 2-h plasma glucose, creatinine, eGFR, ALT, and AST owing to downgrades for indirectness and imprecision.

We conducted a systematic review and meta-analysis of 17 trials that examined the effect of substituting soymilk (median dose of 22 g/day or 6.6 g/250 mL serving of soy protein and 17.2 g/day or 6.9 g/250 mL of total [added] sugars in the sweetened soymilk) for cow’s milk (median dose of 24 g/day or 8.3 g/250 mL of milk protein and 24 g/day or 12 g/250 mL of total sugars [lactose]) and its modification by added sugars (sweetened versus unsweetened soymilk) on 19 intermediate cardiometabolic outcomes over a median follow-up period of 4 weeks in adults of varying health status. The substitution of soymilk for cow’s milk led to moderate reductions in non-HDL-C (− 0.26 mmol/L or ~ − 7%), systolic blood pressure (− 8.00 mmHg), and diastolic blood pressure (− 4.74 mmHg); small important reductions in LDL-C (− 0.19 mmol/L or ~ − 6%) and CRP (− 0.81 mg/L or ~ − 22%); and a trivial increase in HDL-C (0.05 mmol/L or ~ 4%), with no adverse effects on other intermediate cardiometabolic outcomes. There was no meaningful interaction by added sugars in soymilk, with sweetened and unsweetened soymilk showing similar effects across outcomes. There was no dose–response relationship seen across the outcomes for which dose–response analyses were performed.

Findings in relation to the literature

Our findings agree with previous evidence syntheses of soy. Regulatory authorities such as the United States Food and Drug Administration (FDA) and Health Canada have conducted comprehensive evaluations of the randomized controlled trials of the effect of soy protein from different sources on total-C and LDL-C, resulting in approved health claims for soy protein (based on an intake of 25 g/day of soy protein irrespective of source) for cholesterol reduction [ 68 ] and coronary heart disease risk reduction [ 69 ]. Updated systematic reviews and meta-analyses of the 46 randomized controlled trials included in the re-evaluation of the FDA health claim [ 70 ] showed reductions in LDL-C of − 3.2% [ 71 ]. This reduction has been stable since the health claim was first approved in 1999 [ 72 ] and is smaller than, but consistent with, our findings specifically for soymilk. No increase in HDL-C, however, was detected. Previous systematic reviews and meta-analyses of randomized controlled trials of soy protein and soy isoflavones have also shown significant but smaller reductions in systolic blood pressure (− 1.70 mmHg) and diastolic blood pressure (− 1.27 mmHg) [ 73 ] than were found in the current analysis. These reductions in LDL-C and blood pressure are further supported by reductions in clinical events, with updated pooled analyses of prospective cohort studies showing that legumes including soy are associated with reduced incidence of total cardiovascular disease and coronary heart disease [ 74 ].

Systematic reviews and meta-analyses that specifically isolated the effect of soymilk (as a single food matrix) in its intended substitution for cow’s milk are lacking. Sohouli and coworkers [ 75 ] conducted a systematic review and meta-analysis of 18 randomized controlled trials in 665 individuals of varying health status that assessed the effect of soymilk in comparison with a mix of comparators on intermediate cardiometabolic outcomes but did not isolate its substitution for cow’s milk. This synthesis showed similar improvements in LDL-C (− 0.24 mmol/L), systolic blood pressure (− 7.38 mmHg), diastolic blood pressure (− 4.36 mmHg), and CRP (− 1.07 mg/L), while also showing reductions in waist circumference and TNF-α [ 75 ]. The substitution of legumes that include soy for various animal protein sources, and more specifically of legumes/nuts (the only exposure available) for dairy, in syntheses of prospective cohort studies has also shown reductions in incident total cardiovascular disease and all-cause mortality [ 76 ].

Indirect evidence from dietary patterns that contain soy foods including soymilk in substitution for different animal sources of protein including cow’s milk further supports our findings. Systematic reviews and meta-analyses of randomized trials of the Portfolio diet and vegetarian and vegan dietary patterns have shown additive reductions in LDL-C, non-HDL-C, blood pressure, and CRP when soy foods including soymilk are combined with other foods that target these same intermediate risk factors with displacement of different animal sources of protein including cow’s milk [ 77 , 78 ]. These reductions have also been shown to translate to reductions in clinical events with systematic reviews and meta-analyses of prospective cohort studies showing that adherence to these dietary patterns is associated with reductions in incident coronary heart disease, total cardiovascular disease, and all-cause mortality [ 79 , 80 , 81 ].

Potential mechanisms of action

The potential mechanism mediating the effects of soy remains unclear. Specific components within the soy food matrix, including soy protein and phytochemicals like isoflavones [ 82 ], have been implicated. The well-established lipid-lowering effect of soy [ 72 ] may be attributed to the 7S globulin fraction of soy protein, which exerts its primary action by upregulating LDL-C receptors predominantly within the liver, thereby augmenting the clearance of LDL-C from circulation [ 82 ]. The isoflavone, fiber, fatty acids, and anti-nutrient components may also exert some mediation [ 83 ]. The reduction in blood pressure has been most linked to the soy isoflavones [ 83 ]. There is evidence that soy isoflavones may modulate the renin–angiotensin–aldosterone system (RAAS), with the capacity to inhibit the production of angiotensin II and aldosterone, thereby contributing to the regulation of blood pressure [ 73 ]. Another blood pressure lowering mechanism may involve the ability of soy isoflavones to enhance endothelial function by mitigating oxidative stress and inflammation, consequently promoting the release of the relaxing factor nitric oxide (NO) [ 73 ]. This potential mechanism of isoflavones may also explain the reductions seen in inflammation.

Strengths and limitations

Our evidence synthesis had several strengths. First, we completed a comprehensive and reproducible systematic search and selection process of the available literature examining the effect of substituting soymilk for cow’s milk on intermediate cardiometabolic outcomes. Second, we synthesized the totality of available evidence from a large body of randomized controlled trials, which gives the greatest protection against systematic error. Third, we included an extensive and comprehensive list of outcomes to fully capture the impact of soymilk on cardiometabolic health. Fourth, we only included randomized controlled trials that compared soymilk to cow’s milk directly, to increase the specificity of our conclusion. Finally, we included a GRADE assessment to explore the certainty of available evidence.

There were also several limitations. First, we could not conduct the sub-study of the effect of lactose versus added sugars outside of a dairy-like matrix, as no eligible trials could be identified. Although this analysis is important for isolating the effect of added sugars as a mediator of any adverse effects, we did not observe any meaningful interaction by added sugars in soymilk. Second, there was serious imprecision in the pooled estimates across many of the outcomes with the 95% confidence intervals overlapping the MID in each case, with the exception of LDL-C, non-HDL-C, fasting plasma glucose, and waist circumference. The certainty of evidence for HDL-C, triglycerides, HbA1c, 2-h plasma glucose, fasting insulin, systolic blood pressure, diastolic blood pressure, CRP, body weight, BMI, body fat, creatinine, eGFR, ALT, and AST was downgraded for this reason. Third, there was evidence of indirectness related to insufficient trials for HbA1c, 2-h plasma glucose, creatinine, eGFR, ALT, and AST, which limits generalizability. Each outcome with data from only 1 trial was downgraded for this reason. Another source of indirectness could be the median follow-up duration of 4 weeks (range, 4–16 weeks). This time frame may be sufficient for observing certain effects, but other outcomes may require a longer period to manifest changes. Despite acknowledging this variation in response time among different outcomes, we did not further downgrade for this aspect of indirectness. Instead, we tailored our conclusions to reflect short-to-moderate term effects. Finally, although publication bias was not suspected, we were only able to make this assessment for LDL-C, as there were < 10 trials for all other outcomes.

Considering these strengths and limitations, we assessed the certainty of evidence as high for LDL-C and non-HDL-C; moderate for systolic blood pressure, diastolic blood pressure, CRP, and HDL-C; and moderate-to-low for all outcomes where significant effects were not observed.

Implications

This work has important implications for plant protein foods in the recommended shift to more plant-based diets. Major international dietary guidelines in the US [ 1 ], Canada [ 3 ], and Europe [ 4 , 5 , 6 ] recommend fortified soymilk as the only suitable replacement for cow’s milk. Our findings support this recommendation, showing that soymilk, including sweetened soymilk (up to 7 g added sugars per 250 mL), does not have any adverse effects compared with cow’s milk across 19 intermediate cardiometabolic outcomes, with benefits for lipids, blood pressure, and inflammation. This evidence suggests that, as it relates to their cardiometabolic effects, it may be misleading to classify fortified soymilk as an ultra-processed food to be avoided while classifying cow’s milk as a minimally processed food to be encouraged (based on the WHO-endorsed NOVA classification system [ 10 ]). It also suggests that it may be misleading not to allow fortified soymilk that is sweetened with small amounts of sugars to be classified as “healthy” (based on the FDA’s new proposed definition that only permits this claim on products with added sugars ≤ 2.5 g or 5% daily value [DV] per 250 mL serving [ 16 ]). The proposed FDA criteria would prevent this claim on soymilk products designed to be iso-sweet analogs of cow’s milk (in which 5 g or 10% DV of added sugars from sucrose in soymilk is equivalent to the 12 g of lactose in cow’s milk per 250 mL serving, as sucrose is 1.4 times sweeter than lactose [ 17 ]). To prevent confusion, policy makers may want to exempt fortified soymilk from classification as an ultra-processed food and allow added sugars up to 10% DV for the definition of “healthy,” as has been proposed by the FDA for sodium and saturated fat in dairy products (including soy-based dairy alternatives) to account for accepted processing and preservation methods [ 16 ]. These policy considerations would balance the need to limit nutrient-poor energy-dense foods with the need to promote nutrient-dense foods like fortified soymilk in the shift to healthy plant-based diets.

In conclusion, the evidence provides a good indication that substituting either sweetened or unsweetened soymilk for cow’s milk in adults with varying health statuses does not have the adverse effects on intermediate cardiometabolic outcomes attributed to added sugars and ultra-processed foods in the short-to-moderate term. There appear even to be advantages with small to moderate reductions in established markers of blood lipids (LDL-C, non-HDL-C) that are in line with approved health claims for cholesterol and coronary heart disease risk reduction, as well as small to moderate reductions in blood pressure and inflammation (CRP). Sources of uncertainty include imprecision and indirectness in several of the estimates. There remains a need for more well-powered randomized controlled trials of the effect of substituting soymilk for cow’s milk on less studied intermediate cardiometabolic outcomes, especially established markers of glycemic control, kidney structure and function, and NAFLD. There is also a need for trials comparing lactose versus added sugars outside of a dairy-like matrix to understand better the role of added sugars at different levels in substitution for lactose across outcomes. In the meantime, our findings support the use of fortified soymilk with up to 7 g added sugars per 250 mL as a suitable replacement for cow’s milk and suggest that its classification as ultra-processed and/or not healthy based on small amounts of added sugars may be misleading and need to be reconsidered to facilitate the recommended transition to plant-based diets.

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its Additional file 1.

Abbreviations

GRADE: Grading of Recommendations, Assessment, Development, and Evaluation

Non-HDL-C: Non-high-density lipoprotein cholesterol

LDL-C: Low-density lipoprotein cholesterol

CRP: C-reactive protein

HDL-C: High-density lipoprotein cholesterol

WHO: World Health Organization

US: United States

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analysis

HFCS: High-fructose corn syrup

BMI: Body mass index

ApoB: Apolipoprotein B

HbA1c: Hemoglobin A1c

PG-AUC: Plasma glucose area under the curve

GFR: Glomerular filtration rate

eGFR: Estimated glomerular filtration rate

ACR: Albumin-creatinine ratio

NAFLD: Non-alcoholic fatty liver disease

IHCL: Intrahepatocellular lipid

ALT: Alanine transaminase

AST: Aspartate aminotransferase

MD: Mean difference

ROB: Risk of bias

95% CI: 95% Confidence interval

GLST: Generalized least squares trend

FDA: Food and Drug Administration

TNF-α: Tumor necrosis factor alpha

RAAS: Renin-angiotensin-aldosterone system

NO: Nitric oxide

DV: Daily value

Dietary guidelines for Americans, 2020–2025. 2020. Available from: www.dietaryguidelines.gov.

Canada, Health. Canada’s Food Guide. Ottawa; 2019.  https://food-guide.canada.ca/en/ .

Canada’s food guide. Ottawa; 2019. Available from: https://food-guide.canada.ca/en/.

Blomhoff R, Andersen R, Arnesen EK, Christensen JJ, Eneroth H, Erkkola M, Gudanaviciene I, Halldórsson ÞI, Höyer-Lund A, Lemming EW. Nordic nutrition recommendations 2023: integrating environmental aspects. Nordisk Ministerråd; 2023.

García EL, Lesmes IB, Perales AD, Arribas VM, del Puy Portillo Baquedano M, Velasco AMR, Salvo UF, Romero LT, Porcel FBO, Laín SA. Report of the Scientific Committee of the Spanish Agency for Food Safety and Nutrition (AESAN) on sustainable dietary and physical activity recommendations for the Spanish population. Wiley Online Library; 2023. Report No.: 2940–1399.

Brink E, van Rossum C, Postma-Smeets A, Stafleu A, Wolvers D, van Dooren C, et al. Development of healthy and sustainable food-based dietary guidelines for the Netherlands. Public Health Nutr. 2019;22(13):2419–35.

Lichtenstein AH, Appel LJ, Vadiveloo M, Hu FB, Kris-Etherton PM, Rebholz CM, et al. 2021 dietary guidance to improve cardiovascular health: a scientific statement from the American Heart Association. Circulation. 2021;144(23):e472–87.

Willett W, Rockström J, Loken B, Springmann M, Lang T, Vermeulen S, et al. Food in the Anthropocene: the EAT–Lancet Commission on healthy diets from sustainable food systems. Lancet. 2019;393(10170):447–92.

Bartashus J, Srinivasan G. Plant-based foods poised for explosive growth. Bloomberg Intelligence. 2021.

Monteiro CA, Cannon G, Lawrence M, Costa Louzada Md, Pereira Machado P. Ultra-processed foods, diet quality, and health using the NOVA classification system. Rome: FAO; 2019. p. 48.

International Dairy Federation. The contribution of school milk programmes to the nutrition of children worldwide. Brussels: Belgium; 2020.

USDA Food and Nutrition Service. Special Milk Program. Available from: https://www.fns.usda.gov/smp/special-milk-program.

The European Parliament. European Parliament resolution of 9 May 2023 on the implementation of the school scheme. Available from: https://www.europarl.europa.eu/doceo/document/TA-9-2023-0135_EN.html.

European Commission. Summary of FBDG recommendations for milk and dairy products for the EU, Iceland, Norway, Switzerland and the United Kingdom. Available from: https://knowledge4policy.ec.europa.eu/health-promotion-knowledge-gateway/food-based-dietary-guidelines-europe-table-7_en.

Addressing Digestive Distress in Stomachs of Our Youth (ADD SOY) Act, House of Representatives, 1st Sess.; 2023.  https://troycarter.house.gov/sites/evo-subsites/troycarter.house.gov/files/evo-media-document/add-soy-act.pdf .

Food and Drug Administration. Food labeling: nutrient content claims; definition of term “healthy”. In: Department of Health and Human Services (HHS); 2022.  https://www.federalregister.gov/documents/2022/09/29/2022-20975/food-labeling-nutrient-content-claims-definition-of-term-healthy .

Helstad S. Chapter 20 - corn sweeteners. In: Serna-Saldivar SO, editor. Corn. 3rd ed. Oxford: AACC International Press; 2019. p. 551–91.

Messina M, Sievenpiper JL, Williamson P, Kiel J, Erdman JW. Perspective: soy-based meat and dairy alternatives, despite classification as ultra-processed foods, deliver high-quality nutrition on par with unprocessed or minimally processed animal-based counterparts. Adv Nutr. 2022;13(3):726–38.

Higgins J, Thomas J, Chandler J. Cochrane handbook for systematic reviews of interventions version 6.2. 2021.

Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med. 2009;151(4):264–9.

BMJ Best Practice. Search strategies. Available from: https://bestpractice.bmj.com/info/toolkit/learn-ebm/study-design-search-filters/.

Rohatgi A. WebPlotDigitizer 4.6; 2022.  https://automeris.io/WebPlotDigitizer/ .

McGrath S, Zhao X, Steele R, Thombs BD, Benedetti A, Collaboration DESD. Estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis. Stat Methods Med Res. 2020;29(9):2520–37.

Sterne JAC, Savovic J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366: l4898.

DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–88.

Tufanaru C, Munn Z, Stephenson M, Aromataris E. Fixed or random effects meta-analysis? Common methodological issues in systematic reviews of effectiveness. Int J Evid Based Healthc. 2015;13(3):196–207.

Elbourne DR, Altman DG, Higgins JP, Curtin F, Worthington HV, Vail A. Meta-analyses involving cross-over trials: methodological issues. Int J Epidemiol. 2002;31(1):140–9.

Balk EM, Earley A, Patel K, Trikalinos TA, Dahabreh IJ. Empirical assessment of within-arm correlation imputation in trials of continuous outcomes. 2013.

Fu R, Gartlehner G, Grant M, Shamliyan T, Sedrakyan A, Wilt TJ, et al. Conducting quantitative synthesis when comparing medical interventions: AHRQ and the Effective Health Care Program. J Clin Epidemiol. 2011;64(11):1187–97.

Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. J Clin Epidemiol. 2008;61(10):991–6.

Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ. 1997;315(7109):629–34.

Begg CB, Mazumdar M. Operating characteristics of a rank correlation test for publication bias. Biometrics. 1994;50(4):1088–101.

Duval S, Tweedie R. Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics. 2000;56(2):455–63.

Schünemann H, Brożek J, Guyatt G, Oxman A. GRADE handbook. Grading of Recommendations Assessment, Development and Evaluation, Grade Working Group. 2013.

McMaster University and Evidence Prime. GRADEpro GDT: GRADEpro Guideline Development Tool [Software]. gradepro.org .

Brunetti M, Shemilt I, Pregno S, Vale L, Oxman AD, Lord J, et al. GRADE guidelines: 10. Considering resource use and rating the quality of economic evidence. J Clin Epidemiol. 2013;66(2):140–50.

Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, et al. GRADE guidelines 6. Rating the quality of evidence—imprecision. J Clin Epidemiol. 2011;64(12):1283–93.

Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 8. Rating the quality of evidence—indirectness. J Clin Epidemiol. 2011;64(12):1303–10.

Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 7. Rating the quality of evidence—inconsistency. J Clin Epidemiol. 2011;64(12):1294–302.

Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, et al. GRADE guidelines: 5. Rating the quality of evidence—publication bias. J Clin Epidemiol. 2011;64(12):1277–82.

Guyatt GH, Oxman AD, Santesso N, Helfand M, Vist G, Kunz R, et al. GRADE guidelines: 12. Preparing summary of findings tables-binary outcomes. J Clin Epidemiol. 2013;66(2):158–72.

Guyatt GH, Oxman AD, Sultan S, Glasziou P, Akl EA, Alonso-Coello P, et al. GRADE guidelines: 9. Rating up the quality of evidence. J Clin Epidemiol. 2011;64(12):1311–6.

Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, et al. GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias). J Clin Epidemiol. 2011;64(4):407–15.

Guyatt GH, Thorlund K, Oxman AD, Walter SD, Patrick D, Furukawa TA, et al. GRADE guidelines: 13. Preparing summary of findings tables and evidence profiles-continuous outcomes. J Clin Epidemiol. 2013;66(2):173–83.

Kaminski-Hartenthaler A, Gartlehner G, Kien C, Meerpohl JJ, Langer G, Perleth M, et al. GRADE-Leitlinien: 11. Gesamtbeurteilung des Vertrauens in Effektschätzer für einen einzelnen Studienendpunkt und für alle Endpunkte. Zeitschrift für Evidenz, Fortbildung und Qualität im Gesundheitswesen. 2013;107(9):638–45.

Langendam M, Carrasco-Labra A, Santesso N, Mustafa RA, Brignardello-Petersen R, Ventresca M, et al. Improving GRADE evidence tables part 2: a systematic survey of explanatory notes shows more guidance is needed. J Clin Epidemiol. 2016;74:19–27.

Santesso N, Carrasco-Labra A, Langendam M, Brignardello-Petersen R, Mustafa RA, Heus P, et al. Improving GRADE evidence tables part 3: detailed guidance for explanatory footnotes supports creating and understanding GRADE certainty in the evidence judgments. J Clin Epidemiol. 2016;74:28–39.

Santesso N, Glenton C, Dahm P, Garner P, Akl EA, Alper B, et al. GRADE guidelines 26: informative statements to communicate the findings of systematic reviews of interventions. J Clin Epidemiol. 2020;119:126–35.

Balshem H, Helfand M, Schünemann HJ, Oxman AD, Kunz R, Brozek J, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol. 2011;64(4):401–6.

Schünemann HJ, Higgins JPT, Vist GE, Glasziou P, Akl EA, Skoetz N, Guyatt GH; Cochrane GRADEing Methods Group and Cochrane Statistical Methods Group. Chapter 14: completing ‘summary of findings’ tables and grading the certainty of the evidence. Cochrane handbook for systematic reviews of interventions. 2019. p. 375–402.

Azadbakht L, Nurbakhsh S. Effect of soy drink replacement in a weight reducing diet on anthropometric values and blood pressure among overweight and obese female youths. Asia Pac J Clin Nutr. 2011;20(3):383–9.

Beavers KM, Serra MC, Beavers DP, Cooke MB, Willoughby DS. Soymilk supplementation does not alter plasma markers of inflammation and oxidative stress in postmenopausal women. Nutr Res. 2009;29(9):616–22.

Bricarello LP, Kasinski N, Bertolami MC, Faludi A, Pinto LA, Relvas WG, et al. Comparison between the effects of soy milk and non-fat cow milk on lipid profile and lipid peroxidation in patients with primary hypercholesterolemia. Nutrition. 2004;20(2):200–4.

Faghih S, Hedayati M, Abadi A, Kimiagar M. Comparison of the effects of cow’s milk, fortified soy milk, and calcium supplement on plasma adipocytokines in overweight and obese women. Iranian Journal of Endocrinology and Metabolism. 2009;11(6):692–8.

Gardner CD, Messina M, Kiazand A, Morris JL, Franke AA. Effect of two types of soy milk and dairy milk on plasma lipids in hypercholesterolemic adults: a randomized trial. J Am Coll Nutr. 2007;26(6):669–77.

Hasanpour A, Babajafari S, Mazloomi SM, Shams M. The effects of soymilk plus probiotics supplementation on cardiovascular risk factors in patients with type 2 diabetes mellitus: a randomized clinical trial. BMC Endocr Disord. 2023;23(1):36.

Keshavarz SA, Nourieh Z, Attar MJ, Azadbakht L. Effect of soymilk consumption on waist circumference and cardiovascular risks among overweight and obese female adults. Int J Prev Med. 2012;3(11):798–805.

Mohammad-Shahi M, Mowla K, Haidari F, Zarei M, Choghakhori R. Soy milk consumption, markers of inflammation and oxidative stress in women with rheumatoid arthritis: a randomised cross-over clinical trial. Nutr Diet. 2016;73(2):139–45.

Miraghajani MS, Esmaillzadeh A, Najafabadi MM, Mirlohi M, Azadbakht L. Soy milk consumption, inflammation, coagulation, and oxidative stress among type 2 diabetic patients with nephropathy. Diabetes Care. 2012;35(10):1981–5.

Mitchell JH, Collins AR. Effects of a soy milk supplement on plasma cholesterol levels and oxidative DNA damage in men—a pilot study. Eur J Nutr. 1999;38(3):143–8.

Nourieh Z, Keshavarz SA, Attar MJH, Azadbakht L. Effects of soy milk consumption on inflammatory markers and lipid profiles among non-menopausal overweight and obese female adults. Int J Prev Med. 2012;3:798.

Onning G, Akesson B, Oste R, Lundquist I. Effects of consumption of oat milk, soya milk, or cow’s milk on plasma lipids and antioxidative capacity in healthy subjects. Ann Nutr Metab. 1998;42(4):211–20.

Rivas M, Garay RP, Escanero JF, Cia P Jr, Cia P, Alda JO. Soy milk lowers blood pressure in men and women with mild to moderate essential hypertension. J Nutr. 2002;132(7):1900–2.

Ryan-Borchers TA, Park JS, Chew BP, McGuire MK, Fournier LR, Beerman KA. Soy isoflavones modulate immune function in healthy postmenopausal women. Am J Clin Nutr. 2006;83(5):1118–25.

Sirtori CR, Pazzucconi F, Colombo L, Battistin P, Bondioli A, Descheemaeker K. Double-blind study of the addition of high-protein soya milk v. cows’ milk to the diet of patients with severe hypercholesterolaemia and resistance to or intolerance of statins. Br J Nutr. 1999;82(2):91–6.

Sirtori CR, Bosisio R, Pazzucconi F, Bondioli A, Gatti E, Lovati MR, et al. Soy milk with a high glycitein content does not reduce low-density lipoprotein cholesterolemia in type II hypercholesterolemic patients. Ann Nutr Metab. 2002;46(2):88–92.

Steele M. Effect on serum cholesterol levels of substituting milk with a soya beverage. Aust J Nutr Diet. 1992;49(1):24–8.

Summary of Health Canada’s assessment of a health claim about soy protein and cholesterol lowering. Ottawa: Health Canada; 2015. Available from: https://www.canada.ca/en/health-canada/services/food-nutrition/food-labelling/health-claims/assessments/summary-assessment-health-claim-about-protein-cholesterol-lowering.html.

Food and Drug Administration. Food labeling health claims; soy protein and coronary heart disease. Fed Regist. 1999;64:57699–733.

Food and Drug Administration. Food labeling health claims; soy protein and coronary heart disease. Fed Regist. 2017;82:50324–46.

Blanco Mejia S, Messina M, Li SS, Viguiliouk E, Chiavaroli L, Khan TA, et al. A meta-analysis of 46 studies identified by the FDA demonstrates that soy protein decreases circulating LDL and total cholesterol concentrations in adults. J Nutr. 2019;149(6):968–81.

Jenkins DJA, Blanco Mejia S, Chiavaroli L, Viguiliouk E, Li SS, Kendall CWC, et al. Cumulative meta-analysis of the soy effect over time. J Am Heart Assoc. 2019;8(13):e012458.

Mosallanezhad Z, Mahmoodi M, Ranjbar S, Hosseini R, Clark CCT, Carson-Chahhoud K, et al. Soy intake is associated with lowering blood pressure in adults: a systematic review and meta-analysis of randomized double-blind placebo-controlled trials. Complement Ther Med. 2021;59:102692.

Viguiliouk E, Glenn AJ, Nishi SK, Chiavaroli L, Seider M, Khan T, et al. Associations between dietary pulses alone or with other legumes and cardiometabolic disease outcomes: an umbrella review and updated systematic review and meta-analysis of prospective cohort studies. Adv Nutr. 2019;10(Suppl_4):S308–19.

Sohouli MH, Lari A, Fatahi S, Shidfar F, Găman M-A, Guimaraes NS, et al. Impact of soy milk consumption on cardiometabolic risk factors: a systematic review and meta-analysis of randomized controlled trials. Journal of Functional Foods. 2021;83:104499.

Neuenschwander M, Stadelmaier J, Eble J, Grummich K, Szczerba E, Kiesswetter E, et al. Substitution of animal-based with plant-based foods on cardiometabolic health and all-cause mortality: a systematic review and meta-analysis of prospective studies. BMC Med. 2023;21(1):404.

Chiavaroli L, Nishi SK, Khan TA, Braunstein CR, Glenn AJ, Mejia SB, et al. Portfolio dietary pattern and cardiovascular disease: a systematic review and meta-analysis of controlled trials. Prog Cardiovasc Dis. 2018;61(1):43–53.

Viguiliouk E, Kendall CW, Kahleova H, Rahelic D, Salas-Salvado J, Choo VL, et al. Effect of vegetarian dietary patterns on cardiometabolic risk factors in diabetes: a systematic review and meta-analysis of randomized controlled trials. Clin Nutr. 2019;38(3):1133–45.

Glenn AJ, Guasch-Ferre M, Malik VS, Kendall CWC, Manson JE, Rimm EB, et al. Portfolio diet score and risk of cardiovascular disease: findings from 3 prospective cohort studies. Circulation. 2023;148(22):1750–63.

Glenn AJ, Lo K, Jenkins DJA, Boucher BA, Hanley AJ, Kendall CWC, et al. Relationship between a plant-based dietary portfolio and risk of cardiovascular disease: findings from the Women’s Health Initiative prospective cohort study. J Am Heart Assoc. 2021;10(16): e021515.

Lo K, Glenn AJ, Yeung S, Kendall CWC, Sievenpiper JL, Jenkins DJA, Woo J. Prospective association of the portfolio diet with all-cause and cause-specific mortality risk in the Mr. OS and Ms. OS study. Nutrients. 2021;13(12):4360.  https://doi.org/10.3390/nu13124360 .

Jenkins DJ, Mirrahimi A, Srichaikul K, Berryman CE, Wang L, Carleton A, et al. Soy protein reduces serum cholesterol by both intrinsic and food displacement mechanisms. J Nutr. 2010;140(12):2302S–2311S.

Ramdath DD, Padhi EM, Sarfaraz S, Renwick S, Duncan AM. Beyond the cholesterol-lowering effect of soy protein: a review of the effects of dietary soy and its constituents on risk factors for cardiovascular disease. Nutrients. 2017;9(4):324.  https://doi.org/10.3390/nu9040324 .

Acknowledgements

Aspects of this work were presented at the following conferences: Canadian Nutrition Society (CNS), Quebec City, Canada, May 4–6, 2023; 40th International Symposium on Diabetes and Nutrition, Pula, Croatia, June 15–18, 2023; and Nutrition 2023—American Society for Nutrition (ASN), Boston, USA, July 22–25, 2023.

Authors’ Twitter handles

@Toronto_3D_Unit.

Funding

This work was supported by the United Soybean Board (the United States Department of Agriculture Soybean Checkoff Program [funding reference number, 2411–108-0101]) and the Canadian Institutes of Health Research (funding reference number, 129920) through the Canada-wide Human Nutrition Trialists’ Network (NTN). The Diet, Digestive tract, and Disease (3D) Centre, funded through the Canada Foundation for Innovation and the Ministry of Research and Innovation’s Ontario Research Fund, provided the infrastructure for the conduct of this work. ME was funded by a CIHR Canada Graduate Scholarship and Toronto 3D PhD Scholarship award. DG was funded by an Ontario Graduate Scholarship. TAK and AZ were funded by a Toronto 3D Postdoctoral Fellowship Award. LC was funded by a Toronto 3D New Investigator Award. SA-C was funded by a CIHR Canadian Graduate Scholarship. DJAJ was funded by the Government of Canada through the Canada Research Chair Endowment. None of the sponsors had any role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication. However, one of the co-authors, Mark Messina, who was involved in all aspects of the study except data collection and analysis, is the Director of Nutrition Science and Research at the Soy Nutrition Institute Global, an organization that receives partial funding from the principal funder, the United Soybean Board (USB).

Author information

Authors and affiliations.

Department of Nutritional Sciences, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada

M. N. Erlich, D. Ghidanac, S. Blanco Mejia, T. A. Khan, L. Chiavaroli, A. Zurbau, S. Ayoub-Charette, L. A. Leiter, R. P. Bazinet, D. J. A. Jenkins, C. W. C. Kendall & J. L. Sievenpiper

Toronto 3D Knowledge Synthesis and Clinical Trials Unit, Clinical Nutrition and Risk Factor Modification Centre, St. Michael’s Hospital, Toronto, ON, Canada

M. N. Erlich, D. Ghidanac, S. Blanco Mejia, T. A. Khan, L. Chiavaroli, A. Zurbau, S. Ayoub-Charette, L. A. Leiter, D. J. A. Jenkins, C. W. C. Kendall & J. L. Sievenpiper

Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, ON, Canada

L. Chiavaroli, L. A. Leiter, D. J. A. Jenkins & J. L. Sievenpiper

Royal College of Surgeons in Ireland, Dublin, Ireland

Soy Nutrition Institute Global, Washington, DC, USA

Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada

L. A. Leiter, D. J. A. Jenkins & J. L. Sievenpiper

Division of Endocrinology and Metabolism, Department of Medicine, St. Michael’s Hospital, Toronto, ON, Canada

College of Pharmacy and Nutrition, University of Saskatchewan, Saskatoon, SK, Canada

C. W. C. Kendall

Contributions

The authors’ responsibilities were as follows: JLS designed the research (conception, development of overall research plan, and study oversight); ME and DG acquired the data; ME, SBM, TAK, and SAC performed the data analysis; JLS, ME, DG, SBM, AA, TAK, and LC interpreted the data; JLS and ME drafted the manuscript, have primary responsibility for the final content, and take responsibility for the integrity of the data and accuracy of the data analysis; JLS, MNE, DG, SBM, TAK, LC, AZ, SAC, AA, MM, LAL, RPB, CWCK, and DJD contributed to the project conception and critical revision of the manuscript for important intellectual content and read and approved the final version of the manuscript. The corresponding author attests that all listed authors meet the authorship criteria and that no others meeting the criteria have been omitted. All authors read and approved the final manuscript.

Corresponding author

Correspondence to J. L. Sievenpiper .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

TAK reports receiving grants from the Institute for the Advancement of Food and Nutrition Sciences (IAFNS, formerly ILSI North America) and the National Honey Board (USDA Checkoff program). He has received honoraria from the Institute for the Advancement of Food and Nutrition Sciences (IAFNS), the International Food Information Council (IFIC), the Calorie Control Council (CCC), the International Sweeteners Association (ISA), and AmCham Dubai. He has received funding from the Toronto 3D Knowledge Synthesis and Clinical Trials foundation. LC has received research support from the Canadian Institutes of Health Research (CIHR), Protein Industries Canada (a Government of Canada Global Innovation Cluster), The United Soybean Board (USDA soy “Checkoff” program), and the Alberta Pulse Growers Association. AZ is a part-time research associate at INQUIS Clinical Research, Ltd., a contract research organization. She has received consulting fees from Glycemic Index Foundation Inc. SA-C has received an honorarium from the International Food Information Council (IFIC) for a talk on artificial sweeteners, the gut microbiome, and the risk for diabetes. MM was employed by the Soy Nutrition Institute Global, an organization that receives funding from the United Soybean Board (USB) and from members involved in the soy industry. RPB has received industrial grants, including those matched by the Canadian government, and/or travel support or consulting fees largely related to work on brain fatty acid metabolism or nutrition from Arctic Nutrition, Bunge Ltd., Dairy Farmers of Canada, DSM, Fonterra Inc, Mead Johnson, Natures Crops International, Nestec Inc., Pharmavite, Sancero Inc., and Spore Wellness Inc. Moreover, Dr. Bazinet is on the executive of the International Society for the Study of Fatty Acids and Lipids and held a meeting on behalf of Fatty Acids and Cell Signaling, both of which rely on corporate sponsorship. Dr. Bazinet has given expert testimony in relation to supplements and the brain.
DJAJ has received research grants from Saskatchewan & Alberta Pulse Growers Associations, the Agricultural Bioproducts Innovation Program through the Pulse Research Network, the Advanced Foods and Material Network, Loblaw Companies Ltd., Unilever Canada and Netherlands, Barilla, the Almond Board of California, Agriculture and Agri-food Canada, Pulse Canada, Kellogg’s Company, Canada, Quaker Oats, Canada, Procter & Gamble Technical Centre Ltd., Bayer Consumer Care, Springfield, NJ, Pepsi/Quaker, International Nut & Dried Fruit Council (INC), Soy Foods Association of North America, the Coca-Cola Company (investigator initiated, unrestricted grant), Solae, Haine Celestial, the Sanitarium Company, Orafti, the International Tree Nut Council Nutrition Research and Education Foundation, the Peanut Institute, Soy Nutrition Institute (SNI), the Canola and Flax Councils of Canada, the Calorie Control Council, the Canadian Institutes of Health Research (CIHR), the Canada Foundation for Innovation (CFI), and the Ontario Research Fund (ORF). He has received in-kind supplies for trials as a research support from the Almond board of California, Walnut Council of California, the Peanut Institute, Barilla, Unilever, Unico, Primo, Loblaw Companies, Quaker (Pepsico), Pristine Gourmet, Bunge Limited, Kellogg Canada, and WhiteWave Foods. 
He has been on the speaker’s panel, served on the scientific advisory board and/or received travel support and/or honoraria from Lawson Centre Nutrition Digital Series, Nutritional Fundamentals for Health (NFH)-Nutramedica, Saint Barnabas Medical Center, The University of Chicago, 2020 China Glycemic Index (GI) International Conference, Atlantic Pain Conference, Academy of Life Long Learning, the Almond Board of California, Canadian Agriculture Policy Institute, Loblaw Companies Ltd, the Griffin Hospital (for the development of the NuVal scoring system), the Coca-Cola Company, Epicure, Danone, Diet Quality Photo Navigation (DQPN), Better Therapeutics (FareWell), Verywell, True Health Initiative (THI), Heali AI Corp, Institute of Food Technologists (IFT), Soy Nutrition Institute (SNI), Herbalife Nutrition Institute (HNI), Saskatchewan & Alberta Pulse Growers Associations, Sanitarium Company, Orafti, the International Tree Nut Council Nutrition Research and Education Foundation, the Peanut Institute, Herbalife International, Pacific Health Laboratories, Barilla, Metagenics, Bayer Consumer Care, Unilever Canada and Netherlands, Solae, Kellogg, Quaker Oats, Procter & Gamble, Abbott Laboratories, Dean Foods, the California Strawberry Commission, Haine Celestial, PepsiCo, the Alpro Foundation, Pioneer Hi-Bred International, DuPont Nutrition and Health, Spherix Consulting and WhiteWave Foods, the Advanced Foods and Material Network, the Canola and Flax Councils of Canada, Agri-Culture and Agri-Food Canada, the Canadian Agri-Food Policy Institute, Pulse Canada, the Soy Foods Association of North America, the Nutrition Foundation of Italy (NFI), Nutra-Source Diagnostics, the McDougall Program, the Toronto Knowledge Translation Group (St. 
Michael’s Hospital), the Canadian College of Naturopathic Medicine, The Hospital for Sick Children, the Canadian Nutrition Society (CNS), the American Society of Nutrition (ASN), Arizona State University, Paolo Sorbini Foundation, and the Institute of Nutrition, Metabolism and Diabetes. He received an honorarium from the United States Department of Agriculture to present the 2013 W.O. Atwater Memorial Lecture. He received the 2013 Award for Excellence in Research from the International Nut and Dried Fruit Council. He received funding and travel support from the Canadian Society of Endocrinology and Metabolism to produce mini cases for the Canadian Diabetes Association (CDA). He is a member of the International Carbohydrate Quality Consortium (ICQC). His wife, Alexandra L Jenkins, is a director and partner of INQUIS Clinical Research for the Food Industry, his 2 daughters, Wendy Jenkins and Amy Jenkins, have published a vegetarian book that promotes the use of the foods described here, The Portfolio Diet for Cardiovascular Risk Reduction (Academic Press/Elsevier 2020 ISBN:978–0-12–810510-8), and his sister, Caroline Brydson, received funding through a grant from the St. Michael’s Hospital Foundation to develop a cookbook for one of his studies. He is also a vegan. CWCK has received grants or research support from the Advanced Food Materials Network, Agriculture and Agri-Foods Canada (AAFC), Almond Board of California, Barilla, Canadian Institutes of Health Research (CIHR), Canola Council of Canada, International Nut and Dried Fruit Council, International Tree Nut Council Research and Education Foundation, Loblaw Brands Ltd, the Peanut Institute, Pulse Canada, and Unilever. He has received in-kind research support from the Almond Board of California, Barilla, California Walnut Commission, Kellogg Canada, Loblaw Companies, Nutrartis, Quaker (PepsiCo), the Peanut Institute, Primo, Unico, Unilever, and WhiteWave Foods/Danone. 
He has received travel support and/or honoraria from the Barilla, California Walnut Commission, Canola Council of Canada, General Mills, International Nut and Dried Fruit Council, International Pasta Organization, Lantmannen, Loblaw Brands Ltd, Nutrition Foundation of Italy, Oldways Preservation Trust, Paramount Farms, the Peanut Institute, Pulse Canada, Sun-Maid, Tate & Lyle, Unilever, and White Wave Foods/Danone. He has served on the scientific advisory board for the International Tree Nut Council, International Pasta Organization, McCormick Science Institute, and Oldways Preservation Trust. He is a founding member of the International Carbohydrate Quality Consortium (ICQC), Executive Board Member of the Diabetes and Nutrition Study Group (DNSG) of the European Association for the Study of Diabetes (EASD), is on the Clinical Practice Guidelines Expert Committee for Nutrition Therapy of the EASD, and is a Director of the Toronto 3D Knowledge Synthesis and Clinical Trials foundation. JLS has received research support from the Canadian Foundation for Innovation, Ontario Research Fund, Province of Ontario Ministry of Research and Innovation and Science, Canadian Institutes of health Research (CIHR), Diabetes Canada, American Society for Nutrition (ASN), National Honey Board (U.S. 
Department of Agriculture [USDA] honey “Checkoff” program), Institute for the Advancement of Food and Nutrition Sciences (IAFNS), Pulse Canada, Quaker Oats Center of Excellence, INC International Nut and Dried Fruit Council Foundation, The United Soybean Board (USDA soy “Checkoff” program), Protein Industries Canada (a Government of Canada Global Innovation Cluster), Almond Board of California, European Fruit Juice Association, The Tate and Lyle Nutritional Research Fund at the University of Toronto, The Glycemic Control and Cardiovascular Disease in Type 2 Diabetes Fund at the University of Toronto (a fund established by the Alberta Pulse Growers), The Plant Protein Fund at the University of Toronto (a fund which has received contributions from IFF among other donors), The Plant Milk Fund at the University of Toronto (a fund established by the Karuna Foundation through Vegan Grants), and The Nutrition Trialists Network Fund at the University of Toronto (a fund established by donations from the Calorie Control Council and Physicians Committee for Responsible Medicine). He has received food donations to support randomized controlled trials from the Almond Board of California, California Walnut Commission, Danone, Nutrartis, Soylent, and Dairy Farmers of Canada. He has received travel support, speaker fees and/or honoraria from Danone, FoodMinds LLC, Nestlé, Abbott, General Mills, Nutrition Communications, International Food Information Council (IFIC), Arab Beverages, International Sweeteners Association, Association Calorie Control Council, and Phynova. He has or has had ad hoc consulting arrangements with Perkins Coie LLP, Tate & Lyle, Ingredion, and Brightseed. He is on the Clinical Practice Guidelines Expert Committees of Diabetes Canada, European Association for the study of Diabetes (EASD), Canadian Cardiovascular Society (CCS), and Obesity Canada/Canadian Association of Bariatric Physicians and Surgeons. 
He serves as an unpaid member of the Board of Trustees of IAFNS. He is a Director at Large of the Canadian Nutrition Society (CNS), founding member of the International Carbohydrate Quality Consortium (ICQC), Executive Board Member of the Diabetes and Nutrition Study Group (DNSG) of the EASD, and Director of the Toronto 3D Knowledge Synthesis and Clinical Trials foundation. His spouse is an employee of AB InBev. All other authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

12916_2024_3524_moesm1_esm.docx.

Additional file 1: This file contains Additional file 1 material, including the PRISMA checklist, further details on the search process, and additional results.

Figure 1

Flow of literature on the effect of substituting soymilk for cow’s milk on intermediate cardiometabolic outcomes. Exclusion criteria: duplicate, abstract only (conference abstract), non-human (animal study), in vitro, review/position paper/commentary/letter, observational (observational study), no soymilk (intervention was not soymilk), children (participants < 18 years of age), no suitable comparator (comparator was not cow’s milk), isolated soy protein (an ISP powder was given to participants), acute (follow-up of < 3 weeks), combined intervention (effects of intervention and comparator could not be isolated), wrong endpoint (no data for outcomes of interest), alternative publication (repeated data from original publication)

Figure 2

A summary plot for the effect of substituting soymilk for cow’s milk on intermediate cardiometabolic outcomes. Analyses were conducted using generic, inverse variance random-effects models (at least 5 trials available), or fixed-effects models (fewer than 5 trials available). Between-study heterogeneity was assessed by the Cochran Q statistic, where P_Q < 0.100 was considered statistically significant, and quantified by the I² statistic, where I² ≥ 50% was considered evidence of substantial heterogeneity. Under GRADE, evidence from randomized controlled trials is rated as “high” certainty and can be downgraded by 5 domains and upgraded by 1 domain. The white squares represent no downgrades, the filled black squares indicate a single downgrade or upgrade for each outcome, and the black square with a white “2” indicates a double downgrade for each outcome. Because all included trials were randomized or nonrandomized controlled trials, the certainty of the evidence was graded as high for all outcomes by default and then downgraded or upgraded based on prespecified criteria. Criteria for downgrades included risk of bias (downgraded if most trials were considered to be at high ROB); inconsistency (downgraded if there was substantial unexplained heterogeneity: I² ≥ 50%; P_Q < 0.10); indirectness (downgraded if there were factors absent or present relating to the participants, interventions, or outcomes that limited the generalizability of the results); imprecision (downgraded if the 95% CI crossed the minimally important difference (MID) for harm or benefit); and publication bias (downgraded if there was evidence of publication bias based on the funnel plot asymmetry and/or significant Egger or Begg test (P < 0.10)), with confirmation by adjustment using the trim-and-fill analysis of Duval and Tweedie. The criteria for upgrades included a significant dose–response gradient.
For the interpretation of the magnitude, we used the MIDs to assess the importance of the magnitude of our point estimates using the effect size categories according to the new GRADE guidance as follows: a large effect (≥ 5 × MID); moderate effect (≥ 2 × MID); small important effect (≥ 1 × MID); and trivial/unimportant effect (< 1 × MID). *HDL-C values reversed to show benefit. **LDL-C was not downgraded for imprecision, as the degree to which the upper 95% CI crosses the MID is not clinically meaningful. Additionally, the moderate change in non-HDL-C, with high certainty of evidence, substantiates the high certainty of the LDL-C results.
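The heterogeneity statistics and effect size categories named in this caption can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names and inputs are hypothetical, the pooled estimate is a simple fixed-effect inverse-variance mean, and the P value for Q (which needs a chi-square distribution) is left out.

```python
def heterogeneity(effects, std_errors):
    """Return (pooled estimate, Cochran's Q, I^2 in percent).

    Illustrative only: fixed-effect inverse-variance pooling,
    with I^2 = max(0, (Q - df) / Q) * 100.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


def magnitude_category(point_estimate, mid):
    """Classify an effect by its size relative to the MID (thresholds from the caption)."""
    ratio = abs(point_estimate) / mid
    if ratio >= 5:
        return "large"
    if ratio >= 2:
        return "moderate"
    if ratio >= 1:
        return "small important"
    return "trivial/unimportant"
```

With these definitions, an I² of 50% or more would correspond to the caption's threshold for substantial heterogeneity, and a point estimate of 2.5 times the MID would fall in the "moderate" category.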

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Erlich, M.N., Ghidanac, D., Blanco Mejia, S. et al. A systematic review and meta-analysis of randomized trials of substituting soymilk for cow’s milk and intermediate cardiometabolic outcomes: understanding the impact of dairy alternatives in the transition to plant-based diets on cardiometabolic health. BMC Med 22 , 336 (2024). https://doi.org/10.1186/s12916-024-03524-7

Received : 20 December 2023

Accepted : 09 July 2024

Published : 22 August 2024

DOI : https://doi.org/10.1186/s12916-024-03524-7

  • Soy protein
  • Cardiovascular disease
  • Systematic review
  • Meta-analysis
  • Randomized controlled feeding trials

BMC Medicine

ISSN: 1741-7015

research bias in peer review

Physical Review E’s Chief Editor on Interdisciplinary Research and Peer Review Challenges

An interview with Dario Corradini.

Dario Corradini

Interdisciplinary science has quickly become a new norm in physics, and cross-pollination between subjects is leading to better questions and more innovative ideas. It’s fitting, then, that the new chief editor of one of physics’ broadest journals seeks to expand Physical Review E into the interstices between physics, biology, chemistry, and computer science.

We spoke with Dario Corradini about the nature of the scientific review process, the increasing internationality of the journal’s leadership and readership, and more.

You’ve been Chief Editor of Physical Review E since April. What’s in store for the journal under your leadership?

We want to expand further into a few topics that we already cover, including soft matter, machine learning and artificial intelligence, and biophysics. To attract more papers in these areas, we are putting in place different strategies tailored to each topic.

We’ve also started thinking about revamping the image of the journal. We’ve served our audiences well over the years, but we want to be more nimble and able to follow the most recent trends in research, so as to be valuable for the next generations of researchers.

Lastly, we want to get even broader in the geographical representation of not just our authors, but our editorial board — especially across Asia. I’ve already contacted several people at different career levels from different parts of the globe who have agreed to serve on our editorial board next year.

You’ve worked and studied in several different countries. Does that inform your role as chief editor?

Yes, definitely. I started in Rome for my Ph.D., then spent two years in Boston, three in Paris, then came back to the U.S. for a position as associate editor of Physical Review X , which I held for nine years. All this gave me lots of colleagues in both the U.S. and Europe, and within the statistical physics community, which is one of the biggest communities served by Physical Review E . Having lived and worked in different countries has also made me appreciate that science is a truly universal endeavor transcending language and cultural barriers.

What was it like transitioning from research into publishing?

When you do research, you’re used to working on a very specific problem or set of problems. You know everything about your project — it’s like your baby, and you’re very focused. But when you become an editor, your horizons expand a lot. You’re exposed to so much more physics. It's definitely daunting at first.

But over time, I developed a sense that my role now is not to understand every single technical detail. For that, I rely on my referees, the technical experts. As an editor, I have to understand if the paper is a good fit for the journal. Not just to support the journal, but to help researchers reach the best audience for their work. There’s a psychology element to this role, too. I have to know what the referees are saying, but I also have to figure out what they aren’t saying. It’s really an investigative job.

Speaking of expanding your horizons, Physical Review E has one of the broadest scopes in the Physical Review portfolio. Why is that valuable to physics research?

It’s true. Among the Physical Review journals, Physical Review E is arguably the broadest in scope. We could have different journals for each niche topic, but physics is ultimately “one.” Many of the concepts used in a certain area can be translated to other contexts, but we all rely on the same physical principles.

I would actually go so far as to say that science is one. Biologists, chemists, mathematicians, and engineers may have different approaches, but they are not alien to each other. Lots of the most interesting research these days is at the interfaces of these different fields. At Physical Review E , we encourage submissions on physics-adjacent research for this reason. If a variety of tangentially related topics are published in the same journal, our readers can more clearly see the connections between fields, and ultimately ask better questions.

If you could chat with each author who submits to Physical Review E , what would you say?

First of all, thank you. Thank you for trusting us with your research. I would also ask for your help. I can sense that there is a crisis with referees. There are not enough researchers offering their time as referees, and peer review only works when people are willing to review papers. I think it’s really important for younger people to act as referees, so it’s equally important for established researchers to train and mentor younger scientists to be good referees. Part of being in the scientific community means offering your time as an expert referee, because all of your papers are published thanks to others who volunteered their time.

We ask authors to suggest referees when they submit, and we really mean it. But we also ask that they avoid suggesting the usual suspects. The big names in the field can only give so much of the time that’s asked of them. Outside of the top experts, there are many researchers who make great referees, including younger scientists.

Cypress Hansen

Cypress Hansen is a science writer in the San Diego area.

University Libraries purchases Sage Research Methods package

Sage Research Methods

Ohio University Libraries has purchased Sage Research Methods, a platform that includes textbooks, foundation research guidelines, data sets, code books, peer-reviewed case studies and more with updates through 2029.

If members of the OHIO community are looking to explore a new research methodology, hoping to reduce textbook costs or needing a case study for a course, Sage Research Methods can help.

The platform boasts more than 500 downloadable datasets, with code books and instructional materials that provide step-by-step walk-throughs of the data analysis. The platform also includes quantitative data sets that come with software guides that can assist in understanding tools like SPSS, R, Stata, and Python.

Ohio University now has access to more than 1,000 book titles (including the quantitative social sciences "little green books") that support a variety of research methodologies and approaches from beginner to expert. The OHIO package also includes peer-reviewed case studies, with accompanying discussion questions and multiple-choice quiz questions, that can be embedded into Canvas courses. Further, the collection includes a Diversifying and Decolonizing Research subcollection that highlights the importance of inclusive research, perspectives from marginalized populations and cultures, and minimizing bias in data analysis.

Highlighted features within Sage Research Methods include:

  • Ability to browse content, including datasets, by disciplines and/or methodology. 
  • Informational and instructional videos that cover topics like market research, data visualization, ethics and integrity, and Big Data. The videos are easy to embed, too.
  • Interactive research tools that help with research plan development: Methods Map visualizer, Project Planner for outlining, and Reading Lists. 
  • Permalinks are easy to access by just copying and pasting the URL into Canvas or your syllabus.

Learn more about Sage Research Methods. 

University Libraries strives to support the OHIO community in and out of the classroom by supporting varying pedagogic approaches and finding ways to make learning more affordable for our students. Further, the Libraries aims to provide access and discoverability to research materials to support Ohio University’s innovative research enterprise. Purchasing Sage Research Methods supports both initiatives as this resource can be used by all students, faculty and staff at Ohio University for research support and instructors for course materials.

Students, faculty and staff interested in learning more about any of the resources mentioned above are encouraged to reach out to Head of Learning Services and Education Librarian Dr. Chris Guder, Head of Research Services and Health Sciences Librarian Hanna Schmillen or a subject librarian.

Be sure to explore Sage Research Methods on your own; the platform can be accessed through Ohio University Libraries. In addition, there are training sessions and videos from Sage on its training website.

Sexual side effects of antipsychotic drugs in schizophrenia: Protocol for a systematic review with single-arm, pairwise and network meta-analysis of randomized controlled trials and non-randomized studies

  • Johannes Schneider-Thoma. Roles: Conceptualization, Methodology, Writing – Review & Editing
  • Shimeng Dong. Roles: Methodology, Project Administration, Writing – Original Draft Preparation
  • Orestis Efthimiou. Roles: Formal Analysis
  • Spyridon Siafis. Roles: Conceptualization, Methodology
  • Wulf Peter Hansen. Roles: Resources
  • Elfriede Scheuring. Roles: Resources
  • Karl Heinz Möhrmann. Roles: Resources
  • Stefan Leucht. Roles: Conceptualization, Supervision

Introduction

Sexual dysfunctions are common yet underreported side effects of antipsychotics for schizophrenia, affecting 30-80% of treated individuals. These side effects can severely impact social interactions and treatment adherence for individuals with schizophrenia, but comprehensive comparative evidence assessing the risk profiles of different antipsychotics is lacking. This study aims to address this gap using network meta-analysis that integrates data from both randomized-controlled trials (RCTs) and non-randomized studies (NRS).

This systematic review will include both RCTs and NRS focusing on participants with schizophrenia or schizophrenia-like psychoses, without restrictions on symptoms, gender, ethnicity, age, or setting. For interventions, all second-generation antipsychotics will be included. The primary outcome will be the occurrence of at least one sexual adverse event of any kind. Secondary outcomes will be the occurrence of any sexual adverse event evaluated in men and women separately, and any adverse event related to the three phases of sexual response cycle separately: desire (e.g. libido, sexual thoughts), arousal (e.g. erection, lubrication) and orgasm (e.g. ejaculation, anorgasmia), and any adverse effect related to breast dysfunction and menstruation irregularities. Study selection and data extraction will be performed independently by two reviewers. The Cochrane Risk of Bias tool 1 and ROBINS-I will be employed to evaluate the risk of bias for RCTs and NRS, respectively. Single-arm meta-analysis of proportions will synthesize the average frequency of sexual adverse events in treated participants. Pairwise and network meta-analysis of RCTs and NRS will be used to evaluate comparative tolerability. Subgroup and sensitivity analyses will explore possible heterogeneity in results and validate the findings’ robustness. The quality of the evidence will be evaluated using GRADE.

This study will provide vital insights into the sexual side effects of antipsychotics by combining evidence from clinical trials and real-world practice, facilitating better decision-making in choosing the optimal antipsychotic for individuals.

Sexual side effects, antipsychotics, schizophrenia, meta-analysis

Schizophrenia is a prevalent severe mental illness with worldwide distribution, affecting approximately 1% of the population during their lifetime due to its start during early adulthood ( McGrath et al. 2008 ). Antipsychotics, which are critical for both acute management and prevention of relapse in schizophrenia ( DGPPN e.V. for the Guideline Group 2019 ), are often prescribed over long periods, potentially lifelong. These medications, however, are associated with various side effects, including sexual dysfunctions.

Sexual dysfunctions induced by antipsychotics can manifest as disturbances in sexual desire, erection and ejaculation, vaginal lubrication, and orgasmic dysfunctions as well as partly related disorders of the menstruation cycle and the breast (such as gynecomastia and galactorrhea) ( Kelly and Conley 2004 ; La Torre et al. 2013 ; Montejo et al. 2018 ). These dysfunctions are not only common —mostly reported in 30-80% of treated individuals with prevalence rates varying from 0 to over 90% ( La Torre et al. 2013 ) —but also highly distressing and a frequent cause of non-adherence to treatment ( Perkins 2002 ; Lambert et al. 2004 ). Non-adherence significantly elevates the risk of relapse of psychotic symptoms. Moreover, sexual side effects critically interfere with normal participation in social life in terms of having close and satisfying personal relations in a romantic partnership, which is one of the most important unmet needs of individuals with schizophrenia ( Jager and McCann 2017 ). Therefore, sexual side effects severely diminish the quality of life for those affected ( Bebbington et al. 2009 ; Olfson et al. 2005 ).

Despite the significant clinical impact of sexual side effects induced by antipsychotics, there is a lack of comprehensive meta-analyses addressing this critical issue, particularly no network meta-analyses presenting differences between antipsychotics in this regard. Existing reviews include several narrative reviews and some pairwise meta-analyses (mainly Cochrane reviews) that only focused on specific antipsychotics and investigated sexual side effects as secondary outcomes (risperidone ( Hunter et al. 2003 ; Jayaram and Hosalli 2005 ; Komossa et al. 2011 ), sertindole ( Komossa et al. 2009 ; Lewis et al. 2005 ), paliperidone ( Harrington and English 2010 ), or amisulpride ( Men et al. 2018 )). Moreover, some meta-analyses only included observational studies ( Zhao et al. 2020 ; Korchia et al. 2023 ) or had a small number of studies ( Trinchieri et al. 2021 ). One single-arm meta-analysis combined both randomized and observational data and calculated overall percentages of sexual dysfunctions with each antipsychotic across 34 studies ( Serretti and Chiesa 2011 ). However, as the authors report themselves, this approach is not suitable to make statements about differences between antipsychotics in their propensity to cause sexual side effects. In summary, the existing evidence leaves us with an incomplete and only impressionistic picture, which is limited in terms of available trials, number of events and use of inappropriate methods.

This study aims to fill this knowledge gap by providing evidence-based insights on sexual adverse events associated with antipsychotics to guide the selection of the optimal drug for individual needs. To summarize according to the PICO(S) scheme, we will conduct a comprehensive network meta-analysis combining data from randomized-controlled trials and real-world observational studies (Study design) to compare all second-generation antipsychotics (Intervention) with each other (Comparator) on their propensity to cause sexual side effects (Outcome) in patients with schizophrenia (Population).

We report this systematic review and network meta-analysis protocol according to the Preferred Reporting Items for Systematic review and Meta-analysis Protocols (PRISMA-P) checklist, and the PRISMA extension for network meta-analysis ( Hutton et al. 2015 ). The PRISMA-P Checklist can be found in the extended data. This protocol has been registered with PROSPERO (registration number: CRD42024510190) and will be updated with any necessary amendments.

Criteria for considering studies for this review

Study designs

We will include randomized controlled trials (RCTs) and non-randomized studies (NRS). RCTs identified with high risk of bias in sequence generation will be considered as quasi-randomized studies and grouped with NRS. The inclusion of NRS is not limited to specific study designs because as stressed by the Cochrane handbook ( Reeves et al. 2022 ), design labels are used very inconsistently and the risk of bias of a certain NRS can be only assessed when the specific study features are known. Accordingly, studies will first be classified by design, followed by a careful assessment of bias risk for each study and studies with critical risk of bias will be excluded from the analysis. We will also exclude studies from mainland China that are not conducted by international pharmaceutical companies or published in international scientific journals due to significant concerns regarding methodological and reporting quality ( Leucht et al. 2022 ). Both open-label and blinded studies will be included; however, open-label and single-blind studies will be excluded in a sensitivity analysis to address potential bias in expectations of sexual side effects. The minimum study duration will be 3 weeks because shorter studies usually do not focus on clinical efficacy and tolerability of antipsychotics but on more experimental research questions. For cross-over studies, only data from the first phase will be used to avoid carry-over effects, which are common in schizophrenia.

Participants

We will include trials in which at least 80% of the participants are diagnosed with schizophrenia or related disorders (such as schizophreniform or schizoaffective disorders) without restrictions in terms of symptoms (acute episode or maintenance phase), gender, ethnicity, age, or setting. These inclusion criteria are adopted because occurrence of side effects can be considered largely independent of psychopathology and they will increase the data availability for these typically underreported outcomes ( Zorzela et al. 2016 ). Of note, we will record potentially important population characteristics for each trial and consider them in the assessments of heterogeneity and transitivity as well as in subgroup and sensitivity analyses.

Interventions

All second-generation antipsychotics (SGAs), which are predominantly prescribed for schizophrenia in Europe, Japan and the USA, will be included in this study, namely amisulpride, aripiprazole, asenapine, blonanserin, brexpiprazole, cariprazine, clozapine, iloperidone, lumateperone, lurasidone, olanzapine, olanzapine-samidorphan, paliperidone, perospirone, quetiapine, risperidone, sertindole, ziprasidone, zotepine. Only SGAs are included because those were investigated in recent clinical research adhering to standardized procedures. These standards include systematic documentation of adverse events according to protocols like the Good-Clinical-Practice guideline and use of standardized nomenclatures of adverse events such as MedDRA ( International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use 2022 ). Furthermore, reporting of studies involving SGAs typically complies with guidelines like CONSORT for RCTs ( Schulz et al. 2010 ) and STROBE for NRS ( Vandenbroucke et al. 2007 ), ensuring detailed information on study design and outcomes. Moreover, study authors and pharmaceutical companies of these trials are very likely to keep electronic records and are contactable to provide necessary additional information, which is very important for this review. However, we will include first-generation antipsychotics (FGAs), placebo and no treatment when they were used as comparators in RCTs and NRS of SGAs.

We will include all these compounds, when used in monotherapy, in any form of administration (e.g. oral or intramuscular depot). In the primary analysis, different forms of administration of the same drug will be combined because side effects predominantly follow the pharmacodynamic profile of a compound rather than its pharmacokinetics, as observed in previous reviews ( Huhn et al. 2019 ; Schneider-Thoma et al. 2022 ); they will be considered separate interventions in a sensitivity analysis. For RCTs, we will only include fixed-dose studies within the target to maximum range according to a recent consensus reached after a two-step Delphi survey among international experts in the treatment of schizophrenia ( McAdam et al. 2023 ); all flexible-dose treatment regimens (as long as they overlap with the target to maximum range) will be included as these allow investigators to titrate doses to optimal levels for individual participants. Similarly, NRS that rely on observed clinical data will be treated as having flexible dose. In sensitivity analyses, we will exclude flexible-dose RCTs and NRS in which the applied doses were outside the target to maximum range for some participants, to control for potential effects of extremely low or high doses.

Comparators

In network meta-analysis there is no formal comparator as all interventions will be compared with each other.

Outcome measures

Primary outcome

The primary outcome will be “Any sexual side effect”. We will use the occurrence of at least one sexual adverse event of any kind provided by the original authors, for example from specific questionnaires for sexual side effects. In case the occurrence of any sexual side effect is not explicitly reported, we will use the highest number of participants reported for any specific sexual adverse event, in line with methodologies used in previous reviews ( Serretti and Chiesa 2011 ; Huhn et al. 2019 ; Schneider-Thoma et al. 2022 ).
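The fallback rule for the primary outcome can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary keys `any_sexual_ae` and `specific_aes` and the function name are hypothetical and do not come from the protocol's extraction form.

```python
def any_sexual_side_effect_count(arm):
    """Number of participants with at least one sexual adverse event in one study arm.

    `arm` is a hypothetical record with two optional keys:
      - "any_sexual_ae": count reported directly by the original authors
      - "specific_aes": dict mapping each specific sexual adverse event to its count
    """
    reported = arm.get("any_sexual_ae")
    if reported is not None:
        # Preferred: the overall count as provided by the original authors.
        return reported
    specific = arm.get("specific_aes") or {}
    if not specific:
        # Nothing usable was reported for this arm.
        return None
    # Fallback: the highest count among the specific sexual adverse events.
    return max(specific.values())
```

Taking the maximum rather than the sum avoids counting a participant twice when one person reports several specific events, at the cost of possibly underestimating the true number.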

Secondary outcomes

1. Any sexual adverse event in men and women separately.

2. Any adverse events related to the “desire” phase of the sexual response cycle, such as libido decrease, loss of sexual thoughts.

3. Any adverse events related to the “arousal” phase of the sexual response cycle, such as erectile dysfunction, vaginal lubrication decrease.

4. Any adverse events related to the “orgasm” phase of the sexual response cycle, such as ejaculation dysfunction, anorgasmia.

5. Any adverse events related to breast dysfunction, such as gynecomastia, galactorrhea.

6. Any adverse events related to menstruation irregularities, such as amenorrhea.

Of note, there is discussion whether breast dysfunction and menstruation irregularities should be considered as sexual side effects because they are not part of the sexual function per se. However, they are frequently mentioned in parallel to dysfunctions of the sexual response cycle, included in some scales for sexual side effects ( Serretti and Chiesa 2011 ; Boer et al. 2014 ) and very bothersome for participants, and therefore we decided to address them as secondary outcomes.

Timing of outcome measurement will be at study endpoint.

Search strategy

Electronic searches

As recommended by the PRISMA harms checklist ( Zorzela et al. 2016 ) and the Cochrane handbook ( Reeves et al. 2022 ), we search for any study that might have reported adverse events in general and not only for studies mentioning specific sexual adverse events in title/abstract because it is impossible to report all adverse events in searchable/indexable parts of publications. For RCTs, we search the Cochrane Schizophrenia Group’s Study-Based Register of trials ( Shokraneh and Adams 2020 ) for published and unpublished reports. Following the methods from Cochrane ( Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA 2022 ) the Information Specialist compiles this register from systematic searches in MEDLINE, Embase, Allied and Complementary Medicine (AMED), Cumulative Index to Nursing and Allied Health Literature (CINAHL), PsycINFO, PubMed, US National Institute of Health Ongoing Trials Register ClinicalTrials.gov , World Health Organization International Clinical Trials Registry Platform ( www.who.int/ictrp ), ProQuest Dissertations and Theses A&I. The register also includes hand searches and conference proceedings and does not place any limitations on language, date, document type or publication status. For NRS, we search multiple electronic databases including ClinicalTrials.gov , Embase, MEDLINE, PsycINFO, Science Citation Index-Expanded, and WHO International Clinical Trials Registry Platform (ICTRP) with no date/time, language, document type, and publication status limitations. The search string contains terms for schizophrenia and the included antipsychotics. The detailed search strategies can be found in the extended data.

Reference lists and other sources

As additional hand searches, we will check the included studies in previously published relevant systematic reviews. Moreover, because adverse events are often underreported, we will contact the corresponding authors of each included study for unpublished information about adverse events.

Identification and selection of studies

Using Rayyan ( Ouzzani et al. 2016 ), title and abstracts of identified references are screened in duplicate by two reviewers with regard to the eligibility criteria above. Any disagreements between the two reviewers are solved by discussion. Then, again in duplicate, two reviewers will inspect the full articles of references selected in title/abstract screening for eligibility and for availability of sexual side effects. Any disagreements will be solved by discussion among the two reviewers or with a third, experienced reviewer (JST, SL). If a decision cannot be made, the study authors will be contacted for clarification.

Data extraction

− General information, such as author name, year of publication, treatment arms and sample size.

− Methodology, such as study design, blinding, duration of study, diagnostic criteria used, study population (Intention-to-treat, observed cases) for which adverse events are reported.

− Participant characteristics, such as age, weight, number of men/women, diagnosis details, plasma prolactin level.

− Intervention characteristics, such as doses, form of application, percent co-medication with antidepressants.

− Outcome measures.

Risk-of-bias assessment

Risk of bias will be assessed for each included study by two reviewers in duplicate using the Cochrane Collaboration’s risk of bias tools for randomized controlled studies (RoB tool 1) and non-randomized studies (Risk Of Bias In Non-randomized Studies – of Interventions, ROBINS-I). Disagreements in the assessment will be discussed between the two reviewers and, if needed, with a third, experienced reviewer (JST, SL). We will exclude NRS judged as carrying an overall critical risk of bias from the primary analysis. In a sensitivity analysis, we will additionally exclude RCTs judged at high risk of bias and NRS judged at serious risk of bias.

Data analysis

Overview of the step-wise process for data synthesis of randomized and non-randomized data

First, we will conduct frequentist random effects single-arm and pairwise meta-analyses with RCTs and NRS as subgroups to synthesize estimates of overall prevalence and comparative tolerability and to assess heterogeneity. In the next step, we will conduct network meta-analysis of RCTs (including assessment of transitivity and evaluation of consistency). If the network meta-analysis of RCTs is internally consistent, we proceed with comparing the different estimates from RCTs (direct, indirect, mixed evidence) to the estimates of NRS. If there are no indications for systematic differences between RCT and NRS estimates, we proceed with combined network meta-analysis (again including assessment of transitivity and inconsistency).

Of note, if the requirements for network meta-analysis of RCTs or for joint network meta-analysis of RCTs and NRS are not met, we will not proceed to the next step and will instead use pairwise meta-analysis or network meta-analysis of RCTs, respectively, for data synthesis.
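The step-wise logic described above can be summarized as a small decision function. This is only an illustrative sketch: the four flags are hypothetical names for judgments that, in the protocol, are made qualitatively by the review team (feasibility of the RCT network, its consistency, agreement between RCT and NRS estimates, and feasibility of the joint network).

```python
def choose_synthesis(rct_nma_feasible: bool,
                     rct_network_consistent: bool,
                     rct_nrs_agree: bool,
                     joint_nma_feasible: bool) -> str:
    """Return the most advanced synthesis that the step-wise plan permits.

    All four arguments are hypothetical summary flags used for illustration;
    the protocol does not define them as formal yes/no criteria.
    """
    # Step 1: single-arm and pairwise meta-analyses are always carried out.
    if not (rct_nma_feasible and rct_network_consistent):
        return "pairwise meta-analysis"
    # Step 2: the RCT network is usable; check whether NRS may be added.
    if not (rct_nrs_agree and joint_nma_feasible):
        return "network meta-analysis of RCTs"
    # Step 3: RCT and NRS evidence can be combined in one network.
    return "combined network meta-analysis of RCTs and NRS"


if __name__ == "__main__":
    print(choose_synthesis(True, True, False, True))
```

The function only encodes the order of the fallbacks; it does not replace the assessments of transitivity and consistency on which each step depends.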

Details of synthesis

Estimation procedures

For estimating the proportion of patients experiencing side effects with antipsychotics, we will use the number of participants experiencing sexual adverse events and non-events among participants exposed to antipsychotics or placebo/no treatment. We will meta-analyze the data using generalized linear mixed models ( Schwarzer et al. 2019 ).

For comparative pairwise and network meta-analysis, the number of participants experiencing sexual adverse events will be synthesized using odds ratios (OR) because ORs have better mathematical properties for meta-analysis, particularly in the case of studies with varying prevalence rates ( Doi et al. 2022 ) and because it is the only measure available in case-control studies. If available in the original publication, we will use reported ORs that are adjusted for possible confounders, such as differences in age and sex between the compared groups. If not available, we will calculate ORs based on the number of participants with events and the number of participants assessed (considering that some sexual adverse events only occur in men or women).
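As an illustration of the unadjusted calculation, the following sketch computes the log odds ratio and its standard error from the number of participants with events and the number assessed in each arm. The function name and the example counts are invented for illustration. For sex-specific events, the number assessed would be restricted to the relevant sex. Arms with zero cells are rejected here rather than corrected, since the protocol plans dedicated methods for sparse data.

```python
import math


def log_odds_ratio(events_1: int, assessed_1: int,
                   events_2: int, assessed_2: int) -> tuple[float, float]:
    """Log odds ratio (arm 1 versus arm 2) and its standard error.

    Hypothetical helper for illustration only. 'assessed' is the number of
    participants in whom the event could be observed.
    """
    a, b = events_1, assessed_1 - events_1   # events / non-events, arm 1
    c, d = events_2, assessed_2 - events_2   # events / non-events, arm 2
    if min(a, b, c, d) <= 0:
        raise ValueError("zero cell: use a method suited to rare events")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


if __name__ == "__main__":
    # Invented counts: 12/100 events with drug A, 6/100 with drug B.
    log_or, se = log_odds_ratio(12, 100, 6, 100)
    print(f"OR = {math.exp(log_or):.2f}, SE(log OR) = {se:.3f}")
```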

For the pairwise meta-analysis we aim to use a random effects meta-analysis model. However, if the data are sparse, i.e. if there are many studies with few or zero events in one or more of their arms, the usual inverse variance model for meta-analysis has limitations. In that case we will use models that can better handle rare events, such as the Mantel-Haenszel model and Bayesian approaches, as per methodological recommendations ( Efthimiou 2018 ).
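To show why the Mantel-Haenszel approach copes better with sparse data, here is a minimal sketch of the Mantel-Haenszel pooled odds ratio. Studies with zero events in one arm still contribute to the sums without any continuity correction. The function name and the example tables are invented for illustration, and the sketch omits the variance estimate that a full analysis would need.

```python
def mantel_haenszel_or(tables) -> float:
    """Mantel-Haenszel pooled odds ratio.

    'tables' is an iterable of (a, b, c, d) tuples per study:
    a = events in arm 1, b = non-events in arm 1,
    c = events in arm 2, d = non-events in arm 2.
    Hypothetical helper for illustration only.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d              # total participants in the study
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio undefined: no events in arm 2")
    return numerator / denominator


if __name__ == "__main__":
    # Invented data; the first study has no events in arm 2.
    studies = [(2, 48, 0, 50), (3, 97, 1, 99)]
    print(round(mantel_haenszel_or(studies), 2))
```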

Network meta-analysis of RCTs will be performed in a frequentist framework using a random effects model. We will assume a common heterogeneity parameter across the various treatment comparisons. For combined network meta-analysis of RCTs and NRS, several different statistical models are available. The most suitable model will be selected after careful consideration of the actual data, the distribution of studies by design and the risk of bias assessment ( Efthimiou et al. 2017 ). In case of sparse data, we will explore the use of a Bayesian model or a frequentist model based on the Mantel-Haenszel approach ( Efthimiou et al. 2019 ).

Assessment of heterogeneity

Heterogeneity (variability in relative treatment effects within the same treatment comparison) will be assessed within and across study designs by visual inspection of forest plots and by estimating the statistical heterogeneity τ, i.e. the standard deviation of random effects, and I². We will employ empirical distributions to characterize the amount of heterogeneity as low, moderate or high ( Turner et al. 2012 ). Substantial heterogeneity indicates important differences in clinical and methodological characteristics of the studies which warrant further investigation, such as checking for mistakes in data entry and for potential effect modifiers and bias factors. Moreover, to assess how much heterogeneity affects the clinical interpretation of the relative treatment effects with respect to the extra uncertainty anticipated in a future study, we will produce prediction intervals.
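The quantities named above can be illustrated with a short sketch. It assumes the DerSimonian-Laird moment estimator for τ², which the protocol does not specify, so this is one common choice rather than the planned method. The prediction interval uses the usual form of pooled estimate ± t × √(τ² + SE²), with the t quantile (k − 2 degrees of freedom) supplied by the caller. All function names are hypothetical.

```python
import math


def dl_heterogeneity(effects, variances) -> dict:
    """Cochran's Q, tau (DerSimonian-Laird, an assumed estimator) and I² in %.

    'effects' are study estimates (e.g. log odds ratios) and 'variances'
    their within-study variances.
    """
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"Q": q, "tau": math.sqrt(tau2), "I2": i2}


def prediction_interval(pooled, se_pooled, tau, t_crit):
    """Prediction interval for the effect in a future study.

    't_crit' is the t quantile with k - 2 degrees of freedom, looked up
    by the caller (for example from statistical tables).
    """
    half_width = t_crit * math.sqrt(tau ** 2 + se_pooled ** 2)
    return pooled - half_width, pooled + half_width


if __name__ == "__main__":
    # Invented log odds ratios and variances from three studies.
    print(dl_heterogeneity([0.2, 0.9, 0.5], [0.04, 0.09, 0.06]))
```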

Assessment of the transitivity assumption in network meta-analysis

Joint analysis of treatments can be misleading if the network is substantially intransitive. Intransitivity can arise when design, population or treatment characteristics that may modify the relative effects between interventions are distributed differently between comparisons. For relative treatment effects in terms of sexual side effects, there is no clear a priori evidence, but several characteristics may play a role (e.g. study design, blinding, gender, age, dose, antidepressant co-medication). Therefore, we will investigate whether relevant characteristics are similarly distributed across studies grouped by comparison.

Assessment of inconsistency

Consistency, i.e. the agreement between direct evidence and indirect evidence of a network meta-analysis, will be statistically evaluated globally, by using the design-by-treatment test ( Higgins et al. 2012 ) and locally, via the back-calculation method ( König et al. 2013 ). In case of evidence of inconsistency, we will investigate possible sources of it (mistakes in data entry, clear differences in study characteristics).
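The idea behind the local check can be sketched as follows. Under the assumption that the network (mixed) estimate for a comparison is an inverse-variance combination of independent direct and indirect evidence, the indirect estimate can be back-calculated from the network and direct estimates and then compared with the direct one by a z-test. The function name and example numbers are invented, and the sketch is a simplification of what a package such as “netmeta” reports.

```python
import math
from statistics import NormalDist


def back_calculate(direct, se_direct, network, se_network):
    """Back-calculate the indirect estimate and test direct vs indirect.

    Assumes (for illustration) that the network estimate is the
    inverse-variance weighted average of independent direct and
    indirect evidence. Returns (indirect, se_indirect, z, p_value).
    """
    prec_direct = 1 / se_direct ** 2
    prec_network = 1 / se_network ** 2
    prec_indirect = prec_network - prec_direct
    if prec_indirect <= 0:
        raise ValueError("no indirect evidence for this comparison")
    indirect = (network * prec_network - direct * prec_direct) / prec_indirect
    se_indirect = math.sqrt(1 / prec_indirect)
    # Disagreement between the two sources of evidence.
    z = (direct - indirect) / math.sqrt(se_direct ** 2 + se_indirect ** 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return indirect, se_indirect, z, p_value


if __name__ == "__main__":
    # Invented log odds ratios for one treatment comparison.
    print(back_calculate(direct=0.60, se_direct=0.30,
                         network=0.45, se_network=0.20))
```

A small p-value for a comparison would point to local inconsistency and prompt the investigation of sources described above.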

Investigation of heterogeneity and inconsistency

Substantial heterogeneity, intransitivity or inconsistency will prevent network meta-analysis. Small or moderate amounts will be further explored by subgroup, network meta-regression, and sensitivity analyses.

We plan a priori to investigate the impact of the following potential effect modifiers via Bayesian network meta-regression analyses of the primary outcome: percentage women, mean age, prolactin level, percent co-medication with antidepressants (which also cause sexual side effects), and duration of study. Additionally, we will perform separate network meta-analyses for sexual adverse events occurring in men and women (see secondary outcome).

Moreover, we will explore the robustness of results (with regard to the inclusion of studies with differences in study design, population and intervention characteristics in the primary analysis) by sensitivity analyses. The following sensitivity analyses of the primary outcomes are predefined: exclusion of (1) non-double-blind studies, (2) studies that report only observed-case analyses, (3) RCTs with high risk/NRS with serious risk of bias, (4) flexible-dose studies in which the range of applied doses exceeded the recommended (target to maximum) dose range ( McAdam et al. 2023 ), (5) studies that did not use specific questionnaires to assess sexual side effects, (6) studies in acutely ill patients (because acute psychosis might interfere with sexual functioning). Moreover, we will perform network meta-analysis with oral and depot applications of the same compound as separate interventions.

Small study effects and publication bias

For assessment of small study effects and publication bias, we will employ a comparison-adjusted funnel plot method to explore the association between study size and effect size ( Chaimani and Salanti 2012 ). Moreover, comparisons with 10 or more studies will be plotted in a contour-enhanced funnel plot ( Peters et al. 2008 ). Similarly, we will plot a contour-enhanced funnel plot of all SGAs combined versus placebo.

Assessment of the confidence in estimates

The quality of the evidence of the primary outcome will be evaluated using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework extended to NMA ( Puhan et al. 2014 ).

Statistical software

Analyses will be performed in R using the packages “meta” for single-arm and pairwise meta-analysis ( Balduzzi et al. 2019 ), “netmeta” for network meta-analyses ( Rücker et al. 2020 ), and “crossnma” for combined network meta-analyses of RCTs and NRS ( Hamza and Salanti 2022 ). Bayesian analyses will be performed using self-programmed routines in “rjags” ( Plummer et al. 2023 ). R and the mentioned packages are freely available from CRAN ( https://cran.r-project.org/bin/windows/base/ ).

Study status: the search and the selection process are currently ongoing.

Despite its significant clinical relevance, there is a lack of scientific comparison between different antipsychotics regarding their sexual side effects. This project will address this gap by providing a comprehensive synthesis of evidence from clinical trials and real-world clinical practice. This information is relevant for clinicians and guideline developers in selecting the most appropriate medication for individuals. Additionally, this review is of high importance for future clinical research, regarding both RCTs and NRS, as it will report the current state of evidence concerning sexual side effects of antipsychotics and identify existing limitations. Finally, this review will be among the first to integrate randomized and non-randomized evidence in a network meta-analysis, thereby advancing methodological approaches in evidence-based medicine.

Patients and public involvement

We collaborate with members of the patient organization “BASTA - Bündnis für psychisch erkrankte Menschen” and the relatives’ organization “Landesverband Bayern der Angehörigen psychisch erkrankter Menschen e.V.” in this project. They contributed to identifying the research idea and developing this review protocol from their perspective as people with lived experience of schizophrenia and of treatment with antipsychotics. They will be updated regularly about the state of the project and will help with any upcoming questions. Moreover, they will be involved in interpreting the results and in preparing a lay summary, so that other patients and relatives can be informed directly about the scientific results in an understandable text, e.g. via the BASTA-newsblog ( http://www.bastagegenstigma.de/ ).

Ethics and consent

This review does not require ethical approval.

Contributions of authors

SL is the principal investigator, obtained funding, and supervises the study. JST, SD, SS and SL designed the study and provided clinical and methodological advice. JST and SD drafted the manuscript and had previously registered the protocol with PROSPERO. OE provided substantial methodological and statistical advice. WPH, ES and KHM provided the patient perspective when designing the study. All authors critically reviewed the manuscript for important intellectual content and approved its final version.

Data availability

Underlying data

No data associated with this article.

Extended data

Figshare: Sexual side effects of antipsychotic drugs in schizophrenia: Protocol for a systematic review with single-arm, pairwise and network meta-analysis of randomized controlled trials and non-randomized studies, https://doi.org/10.6084/m9.figshare.26396275.v2 ( Dong, 2024 ).

• PRISMA-p checklist

• Search strategy

Data are available under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0).

Acknowledgements

We would like to thank Dr Farhad Shokraneh, Systematic Review Consultants LTD, for designing the searches and AR, who wants to stay anonymous, for providing the patient perspective in the design of this review.

  •   Balduzzi S, Rücker G, Schwarzer G: How to perform a meta-analysis with R: a practical tutorial. Evid. Based Ment. Health. 2019; 22(4): 153–160.
  •   Bebbington PE, Angermeyer M, Azorin J-M, et al.: Side-effects of antipsychotic medication and health-related quality of life in schizophrenia. Acta Psychiatr. Scand. Suppl. 2009; 119: 22–28.
  •   de Boer MK, Castelein S, Wiersma D, et al.: A systematic review of instruments to measure sexual functioning in patients using antipsychotics. J. Sex Res. 2014; 51(4): 383–389.
  •   Chaimani A, Salanti G: Using network meta-analysis to evaluate the existence of small-study effects in a network of interventions. Res. Synth. Methods. 2012; 3(2): 161–176.
  •   DGPPN e.V. for the Guideline Group, editor. S3 Guideline for Schizophrenia. 2019.
  •   Doi SA, Furuya-Kanamori L, Xu C, et al.: Controversy and Debate: Questionable utility of the relative risk in clinical research: Paper 1: A call for change to practice. J. Clin. Epidemiol. 2022; 142: 271–279.
  •   Dong S: Extented data for SexualSE protocol. figshare. 2024.
  •   Efthimiou O: Practical guide to the meta-analysis of rare events. Evid. Based Ment. Health. 2018; 21(2): 72–76.
  •   Efthimiou O, Mavridis D, Debray TPA, et al.: Combining randomized and non-randomized evidence in network meta-analysis. Stat. Med. 2017; 36(8): 1210–1226.
  •   Efthimiou O, Rücker G, Schwarzer G, et al.: Network meta-analysis of rare events using the Mantel-Haenszel method. Stat. Med. 2019; 38(16): 2992–3012.
  •   Hamza T, Salanti G: P41 Crosnma: A New R Package to Synthesize Cross-Design Evidence and Cross-Format Data. Value Health. 2022; 25(1): S9.
  •   Harrington CA, English C: Tolerability of paliperidone: a meta-analysis of randomized, controlled trials. Int. Clin. Psychopharmacol. 2010; 25(6): 334–341.
  •   Higgins JPT, Jackson D, Barrett JK, et al.: Consistency and inconsistency in network meta-analysis: concepts and models for multi-arm studies. Res. Synth. Methods. 2012; 3(2): 98–110.
  •   Higgins JPT, Thomas J, Chandler J, et al.: Cochrane Handbook for Systematic Reviews of Interventions. version 6.3. Cochrane; 2022.
  •   Huhn M, Nikolakopoulou A, Schneider-Thoma J, et al.: Comparative efficacy and tolerability of 32 oral antipsychotics for the acute treatment of adults with multi-episode schizophrenia: a systematic review and network meta-analysis. Lancet. 2019; 394(10202): 939–951.
  •   Hunter RH, Joy CB, Kennedy E, et al.: Risperidone versus typical antipsychotic medication for schizophrenia. Cochrane Database Syst. Rev. 2003; 2003: CD000440.
  •   Hutton B, Salanti G, Caldwell DM, et al.: The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Ann. Intern. Med. 2015; 162(11): 777–784.
  •   International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use: MedDRA® the Medical Dictionary for Regulatory Activities terminology. Version 25.1. 2022.
  •   de Jager J, McCann E: Psychosis as a Barrier to the Expression of Sexuality and Intimacy: An Environmental Risk? Schizophr. Bull. 2017; 43(2): sbw172–sbw239.
  •   Jayaram MB, Hosalli P: Risperidone versus olanzapine for schizophrenia. Cochrane Database Syst. Rev. 2005; 2: CD005237.
  •   Kelly DL, Conley RR: Sexuality and schizophrenia: a review. Schizophr. Bull. 2004; 30(4): 767–779.
  •   Komossa K, Rummel-Kluge C, Hunger H, et al.: Sertindole versus other atypical antipsychotics for schizophrenia. Cochrane Database Syst. Rev. 2009; 2: CD006752.
  •   Komossa K, Rummel-Kluge C, Schwarz S, et al.: Risperidone versus other atypical antipsychotics for schizophrenia. Cochrane Database Syst. Rev. 2011; 1: CD006626.
  •   König J, Krahn U, Binder H: Visualizing the flow of evidence in network meta-analysis and characterizing mixed treatment comparisons. Stat. Med. 2013; 32(30): 5414–5429.
  •   Korchia T, Achour V, Faugere M, et al.: Sexual Dysfunction in Schizophrenia: A Systematic Review and Meta-Analysis. JAMA Psychiatry. 2023; 80(11): 1110–1120.
  •   La Torre A, Conca A, Duffy D, et al.: Sexual dysfunction related to psychotropic drugs: a critical review part II: antipsychotics. Pharmacopsychiatry. 2013; 46(6): 201–208.
  •   Lambert M, Conus P, Eide P, et al.: Impact of present and past antipsychotic side effects on attitude toward typical antipsychotic treatment and adherence. Eur. Psychiatry. 2004; 19(7): 415–422.
  •   Leucht S, Li C, Davis JM, et al.: About the issue of including or excluding studies from China in systematic reviews. Schizophr. Res. 2022; 240: 162–163.
  •   Lewis R, Bagnall A-M, Leitner M: Sertindole for schizophrenia. Cochrane Database Syst. Rev. 2005; 2005(3): CD001715.
  •   McAdam MK, Baldessarini RJ, Murphy AL, et al.: Second International Consensus Study of Antipsychotic Dosing (ICSAD-2). J. Psychopharmacol. 2023; 37(10): 982–991.
  •   McGrath J, Saha S, Chant D, et al.: Schizophrenia: a concise overview of incidence, prevalence, and mortality. Epidemiol. Rev. 2008; 30: 67–76.
  •   Men P, Yi Z, Li C, et al.: Comparative efficacy and safety between amisulpride and olanzapine in schizophrenia treatment and a cost analysis in China: a systematic review, meta-analysis, and cost-minimization analysis. BMC Psychiatry. 2018; 18(1): 286.
  •   Montejo AL, Montejo L, Baldwin DS: The impact of severe mental disorders and psychotropic medications on sexual health and its implications for clinical management. World Psychiatry. 2018; 17(1): 3–11.
  •   Olfson M, Uttaro T, Carson WH, et al.: Male sexual dysfunction and quality of life in schizophrenia. J. Clin. Psychiatry. 2005; 66(3): 331–338.
  •   Ouzzani M, Hammady H, Fedorowicz Z, et al.: Rayyan-a web and mobile app for systematic reviews. Syst. Rev. 2016; 5(1): 210.
  •   Perkins DO: Predictors of noncompliance in patients with schizophrenia. J. Clin. Psychiatry. 2002; 63(12): 1121–1128.
  •   Peters JL, Sutton AJ, Jones DR, et al.: Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. J. Clin. Epidemiol. 2008; 61(10): 991–996.
  •   Plummer M, Stukalov A, Denwood M: rjags: Bayesian Graphical Models using MCMC. 2023.
  •   Puhan MA, Schünemann HJ, Murad MH, et al.: A GRADE Working Group approach for rating the quality of treatment effect estimates from network meta-analysis. BMJ (Clinical Research ed.). 2014; 349: g5630.
  •   Reeves BC, Deeks JJ, Higgins JPT, et al.: Chapter 24: Including non-randomized studies on intervention effects. Higgins JPT, Thomas J, Chandler J, et al., editors. Cochrane Handbook for Systematic Reviews of Interventions. version 6.3. 2022.
  •   Rücker G, Krahn U, König J, et al.: netmeta: Network Meta-Analysis using Frequentist Methods. R package version 1.2-1. 2020.
  •   Schneider-Thoma J, Chalkou K, Dörries C, et al.: Comparative efficacy and tolerability of 32 oral and long-acting injectable antipsychotics for the maintenance treatment of adults with schizophrenia: a systematic review and network meta-analysis. Lancet. 2022; 399(10327): 824–836.
  •   Schulz KF, Altman DG, Moher D: CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. Ann. Intern. Med. 2010; 152(11): 726–732.
  •   Schwarzer G, Chemaitelly H, Abu-Raddad LJ, et al.: Seriously misleading results using inverse of Freeman-Tukey double arcsine transformation in meta-analysis of single proportions. Res. Synth. Methods. 2019; 10(3): 476–483.
  •   Serretti A, Chiesa A: A meta-analysis of sexual dysfunction in psychiatric patients taking antipsychotics. Int. Clin. Psychopharmacol. 2011; 26(3): 130–140.
  •   Shokraneh F, Adams CE: Cochrane Schizophrenia Group’s Study-Based Register of Randomized Controlled Trials: Development and Content Analysis. Schizophrenia Bulletin Open. 2020; 1(1): Article sgaa061.
  •   Trinchieri M, Trinchieri M, Perletti G, et al.: Erectile and Ejaculatory Dysfunction Associated with Use of Psychotropic Drugs: A Systematic Review. J. Sex. Med. 2021; 18(8): 1354–1363.
  •   Turner RM, Davey J, Clarke MJ, et al.: Predicting the extent of heterogeneity in meta-analysis, using empirical data from the Cochrane Database of Systematic Reviews. Int. J. Epidemiol. 2012; 41(3): 818–827.
  •   Vandenbroucke JP, von Elm E, Altman DG, et al.: Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Ann. Intern. Med. 2007; 147(8): W–94.
  •   Zhao S, Wang X, Qiang X, et al.: Is There an Association Between Schizophrenia and Sexual Dysfunction in Both Sexes? A Systematic Review and Meta-Analysis. J. Sex. Med. 2020; 17(8): 1476–1488.
  •   Zorzela L, Loke YK, Ioannidis JP, et al.: PRISMA harms checklist: improving harms reporting in systematic reviews. BMJ (Clinical Research ed.). 2016; 352: i157.

  • Open access
  • Published: 26 August 2024

The association between ibuprofen administration in children and the risk of developing or exacerbating asthma: a systematic review and meta-analysis

Luke Baxter¹, Maria M. Cobo¹,², Aomesh Bhatt¹, Rebeccah Slater¹, Olutoba Sanni³ & Nutan Shinde⁴

BMC Pulmonary Medicine volume 24, Article number: 412 (2024)

Ibuprofen is one of the most commonly used analgesic and antipyretic drugs in children. However, its potential causal role in childhood asthma pathogenesis remains uncertain. In this systematic review, we assessed the association between ibuprofen administration in children and the risk of developing or exacerbating asthma.

We searched MEDLINE, Embase, Cochrane Library, CINAHL, Web of Science, and Scopus from inception to May 2022, with no language limits; searched relevant reviews; and performed citation searching. We included studies of any design that were primary empirical peer-reviewed publications, where ibuprofen use in children 0–18 years was reported. Screening was performed in duplicate by blinded review. In total, 24 studies met our criteria. Data were extracted according to PRISMA guidelines, and the risk of bias was assessed using RoB2 and NOS tools. Quantitative data were pooled using fixed effect models, and qualitative data were pooled using narrative synthesis. Primary outcomes were asthma or asthma-like symptoms. The results were grouped according to population (general, asthmatic, and ibuprofen-hypersensitive), comparator type (active and non-active) and follow-up duration (short- and long-term).

Comparing ibuprofen with active comparators, there was no evidence of a higher risk associated with ibuprofen over both the short and long term in either the general or asthmatic population. Comparing ibuprofen use with no active alternative over a short-term follow-up, ibuprofen may provide protection against asthma-like symptoms in the general population when used to ease symptoms of fever or bronchiolitis. In contrast, it may cause asthma exacerbation for those with pre-existing asthma. However, in both populations, there were no clear long-term follow-up effects.

Conclusions

Ibuprofen use in children had no elevated risk relative to active comparators. However, use in children with asthma may lead to asthma exacerbation. The results are driven by a very small number of influential studies, and research in several key clinical contexts is limited to single studies. Both clinical trials and observational studies are needed to understand the potential role of ibuprofen in childhood asthma pathogenesis.

Asthma is a noncommunicable disease affecting approximately 235 million people worldwide and is characterised by inflammation and narrowing of the small airways in the lungs, leading to any combination of cough, wheeze, shortness of breath, and chest tightness [ 1 ]. The prevalence of asthma has increased in many countries in recent decades, especially among children, making asthma a serious global public health problem [ 2 , 3 ]. The reason for increasing asthma prevalence in children is uncertain, but there is likely a complex interaction of multiple risk factors, including environmental (e.g., increased air pollution, changes to housing conditions) and lifestyle factors (e.g., decreased physical activity, changes in diet, increased childhood obesity) [ 4 ].

Increased early-life use of pharmacological agents, such as analgesics and antipyretics, could be causal factors in childhood asthma pathogenesis. Due to fears of a causal relationship between aspirin use and Reye’s syndrome [ 5 ] and the risk of aspirin-induced asthma [ 6 ], aspirin use in children has dramatically decreased in recent decades. Consequently, drugs such as ibuprofen and paracetamol have become increasingly popular for treating fever and pain in children. In the United Kingdom, the National Health Service describes both paracetamol and ibuprofen as safe for treating pain and high temperature in babies and children [ 7 ]. However, caution is advised for ibuprofen use in children with asthma [ 8 ], while no such warning is supplied for paracetamol [ 9 ], suggesting that ibuprofen may be linked to asthma development or exacerbation in those with pre-existing asthma.

Ibuprofen is a non-steroidal anti-inflammatory drug (NSAID) that is frequently prescribed or administered over-the-counter (OTC) to treat fever and pain. Links between childhood ibuprofen use and asthma development or exacerbation are being investigated [ 10 , 11 , 12 , 13 , 14 , 15 , 16 ]. Ibuprofen’s inhibition of the cyclooxygenase system can lead to activation of the lipoxygenase system, resulting in bronchospasm [ 6 , 17 ], which could precipitate asthma. Additionally, empirical evidence exists demonstrating ibuprofen-induced asthma exacerbation in children with asthma and self-reported aspirin allergy [ 18 ].

Despite these points, two recent systematic reviews did not identify a risk difference between ibuprofen and paracetamol in asthma development or exacerbation in children [ 14 , 16 ]. However, one of these reviews limited the scope to randomised controlled trials (RCTs) [ 14 ], and the other to a relatively narrow age range of less than 2 years [ 16 ], restricting the generalisability of the findings.

We conducted a systematic review to assess the association between ibuprofen administration in children and the risk of developing or exacerbating asthma. The aim was to expand on previous reviews by looking across the entire age range of childhood from 0 to 18 years, including both interventional and observational studies, and assessing the association separately for clinically distinct paediatric subpopulations: general, asthmatic, and ibuprofen-hypersensitive.

Protocol development

We registered our review on PROSPERO on 8 July 2022 (CRD42022344838). The protocol was written according to PRISMA-P guidelines [ 19 , 20 ] and made publicly available on OSF prior to registration with PROSPERO. Further methodological details can be found in our online protocol ( https://doi.org/10.17605/OSF.IO/Z37KW ).

Eligibility criteria

A full list of eligibility criteria is provided in Supplementary Methods S1.1 (Supplementary Tables 1–2). The numeric results from studies included in our review were grouped by population for synthesis: (i) the general population of children; (ii) children with asthma; and (iii) children with ibuprofen hypersensitivity. The general population comprised studies that did not limit eligibility to a specific clinical subpopulation. Some study-specific exclusions will always occur in such studies, for example: children with severe asthma, ibuprofen hypersensitivity, or other contraindications, excluded for safety reasons; children with conditions that could interfere with ibuprofen administration or absorption, such as inability to swallow or frequent vomiting; and children receiving treatments that could interfere with outcome assessment, such as leukotriene receptor antagonists and other anti-asthmatic treatments.

Search strategy

We searched six bibliographic databases (MEDLINE, Embase, Cochrane Library, CINAHL, Web of Science, Scopus) on 21 May 2022 to identify records. Our searches were independently peer-reviewed using the PRESS Checklist [ 21 , 22 ] by an outreach librarian at the Bodleian Health Care Libraries, University of Oxford ( https://doi.org/10.17605/OSF.IO/R3AV6 ). All search strategies are provided in full in Supplementary Methods S1.2. Additional information sources included relevant reviews that were identified during screening [ 10 , 11 , 12 , 13 , 14 , 15 , 16 ] and backwards citation searching using the citationchaser tool [ 23 ]. EPPI-Reviewer [ 24 ] was used for de-duplication, and screening was performed independently in duplicate, with disagreements settled by discussion between both reviewers.

Data extraction and bias assessment

Data extraction and bias assessment were performed by one reviewer and then verified by a second reviewer, with disagreements settled by discussion. Our primary outcomes of interest were asthma, asthma-like symptoms, or asthma exacerbation [ 2 ]. For risk of bias assessment, the Cochrane risk of bias tool (RoB2) was used for RCTs [ 25 ], and the Newcastle-Ottawa Scale (NOS) [ 26 ] was used for observational studies. The results from these assessments were used to decide which studies to include in primary syntheses (Supplementary Figs. 1–2). Our approach to assessing meta-biases (outcome reporting and publication biases) is detailed in Supplementary Methods S1.3.

Data synthesis

A narrative synthesis was performed when outcomes were too heterogeneous to synthesise quantitatively. Otherwise, meta-analysis was performed using the R package meta [ 27 ]. Given the sparsity of the data for quantitative synthesis, we report the common effect model results as primary results. For completeness, we report additional analysis outputs, e.g., both odds and risk ratios; both common and random effects model effect sizes; and I², τ², and χ² for heterogeneity. Due to the sparsity of the results, subgroup analyses were not performed.

For meta-analysis of dichotomous data, ORs were pooled using Peto’s method [ 28 ] due to zero events in some arms. Where multiple outcomes from a study were available, the primary analysis was performed by selecting the outcomes with the expected lowest risk of bias. To test the robustness of the primary analysis, sensitivity analyses were performed using alternative combinations of studies’ numeric results.
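To make the pooling step concrete, the following is a minimal Python sketch of Peto's one-step method for combining 2×2 tables, which remains defined when one arm has zero events. It is not the authors' code (the review used the R package meta); the function name `peto_pooled_or`, the dictionary keys, and the example counts are hypothetical and chosen purely for illustration.

```python
import math


def peto_pooled_or(tables, z=1.96):
    """Pool 2x2 tables with Peto's one-step method.

    Each table is (events_a, total_a, events_b, total_b), where arm "a"
    is the treatment of interest and arm "b" is the comparator.
    """
    sum_oe = 0.0      # sum of (observed - expected) events in arm a
    sum_v = 0.0       # sum of hypergeometric variances
    sum_oe2_v = 0.0   # sum of (O - E)^2 / V, used for the heterogeneity Q
    k = 0             # number of informative studies
    for events_a, total_a, events_b, total_b in tables:
        n = total_a + total_b
        events = events_a + events_b
        non_events = n - events
        if events == 0 or non_events == 0:
            continue  # zero variance: the table carries no information
        expected = total_a * events / n
        variance = total_a * total_b * events * non_events / (n**2 * (n - 1))
        oe = events_a - expected
        sum_oe += oe
        sum_v += variance
        sum_oe2_v += oe**2 / variance
        k += 1
    if k == 0:
        raise ValueError("no informative tables to pool")
    log_or = sum_oe / sum_v
    se = 1 / math.sqrt(sum_v)
    q = sum_oe2_v - sum_oe**2 / sum_v   # heterogeneity statistic
    df = k - 1
    i2 = (q - df) / q if df > 0 and q > df else 0.0
    return {
        "or": math.exp(log_or),
        "ci_low": math.exp(log_or - z * se),
        "ci_high": math.exp(log_or + z * se),
        "q": q,
        "i2": i2,
    }


# Hypothetical counts, NOT taken from the reviewed studies.
example_tables = [(0, 50, 2, 50), (5, 200, 8, 200), (3, 100, 3, 100)]
result = peto_pooled_or(example_tables)
print(f"OR = {result['or']:.2f}; 95% CI = [{result['ci_low']:.2f}, {result['ci_high']:.2f}]")
```

Note that the first hypothetical table has zero events in one arm yet still contributes, because Peto's method works with observed-minus-expected counts rather than per-study odds ratios.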

Study selection and characteristics

Of the 820 records screened, 18 relevant studies were identified, with a further 6 from relevant reviews (Supplementary Fig. 3). The study characteristics for all 24 studies are summarised in Table  1 . Relevant numeric results were grouped by population: (i) general population of children (Table  2 ), (ii) children with asthma (Table  3 ), and (iii) children with ibuprofen hypersensitivity (Table  4 ). For the general population and children with asthma, data synthesis was performed for (i) ibuprofen versus an active comparator (Fig.  1 ) and (ii) ibuprofen versus baseline (i.e., children not taking an alternative antipyretic or analgesic). To increase homogeneity, the results were also grouped based on the duration of follow-up, in line with a recent similar systematic review [ 16 ]: short duration of ≤ 28 days or long duration of > 28 days.

Fig. 1. Synthesis of results of ibuprofen versus active comparators. The active comparator for Kokki 2010 was ketoprofen; for all other studies, the active comparator was paracetamol. ( a ) General population of children over a short duration. ( b ) Children with asthma over a long duration. Abbreviations: OR = odds ratio; 95% CI = 95% confidence interval.

General population

In total, 13 numeric results from 9 studies relevant to assessing ibuprofen use in a general population of children were identified [ 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 ] (Table  2 ).

Ibuprofen versus active comparator

There were six results from six interventional studies (all RCTs) and two results from one observational cohort study that compared ibuprofen use with an active comparator in the general population. The main active comparator was paracetamol, with one study [ 29 ] using ketoprofen (Table 2 ). The study durations for the interventional RCTs were all short (≤ 28 days). Two of these results were from publications based on the same dataset, the Boston University Fever Study [ 30 , 31 ], of which the original publication was selected for primary analysis.

The synthesis of five results comparing ibuprofen with active comparators (four paracetamol, one ketoprofen) resulted in a common effect OR = 0.87; 95% CI = [0.55, 1.37], indicating no significant difference between ibuprofen and active comparators (Fig. 1 a). Our sensitivity analyses were in agreement with this primary result (Supplementary Fig. 4).

A single observational study [ 36 ] assessed ibuprofen relative to paracetamol over both short and long durations (Table 2 ) in a general population of children. Over a short duration (14 days), no significant difference in wheezing was identified, but over a long duration (1 year), the authors observed a significant advantage of ibuprofen over paracetamol, with a reduction in health care practitioner visits for wheezing illness consistent with bronchiolitis or asthma.

Taken together, these interventional and observational results suggest that there is no difference between ibuprofen and active comparators in the general population over a short duration (≤ 28 days). This finding is driven largely by a single study, the Boston University Fever Study [ 31 ], conducted almost 30 years ago on a large sample ( n  = 83,915) of children aged 6 months to 12 years. Over longer follow-up durations of one year, there is evidence from only a single cohort study [ 36 ] to suggest that there may be a reduction in wheezing when ibuprofen is prescribed, rather than paracetamol, for a first episode of bronchiolitis in children aged 0–12 months.

Ibuprofen versus baseline

Five numeric results from three studies relevant to assessing ibuprofen relative to baseline (children not taking an alternative antipyretic or analgesic) in the general population were identified (Table  2 ). All outcomes were from observational studies. Due to the sparsity and substantive heterogeneity of the results, quantitative synthesis was not possible.

Two studies looked at general populations over short durations (≤ 28 days) [ 33 , 36 ]. Both studies suggest that ibuprofen might decrease wheezing when taken for either acute febrile illness or bronchiolitis (Table  2 ).

Two studies looked at general populations of children over long durations [ 35 , 36 ] and produced conflicting results. One study [ 36 ] compared those prescribed ibuprofen for a first episode of bronchiolitis with those not prescribed ibuprofen (or another drug) and followed up participants over 1 year, observing a positive impact of ibuprofen prescription. The second study [ 35 ] compared children administered ibuprofen with those not administered ibuprofen during the first postnatal year. It followed up participants at 3–5 years, observing a negative impact of ibuprofen on asthma development, and at 7–10 years, observing no difference between cohorts (Table 2 ).

Taken together, ibuprofen use in the general population of children during acute febrile illness or bronchiolitis might decrease wheezing when assessed in the short-term (≤ 28 days), with both observational studies reporting strong significant effects (Table  2 ). Over longer durations, the two observational studies identified in this review have substantive heterogeneity in design, analysis, and outcome, preventing meaningful synthesis. Additionally, their numeric findings are inconsistent (Table  2 ).

Asthmatic population

Five numeric results from four studies relevant to assessing ibuprofen in asthmatic paediatric populations were identified [ 38 , 39 , 40 , 41 ] (Table  3 ).

Three results across three studies compared ibuprofen with an active comparator (paracetamol in all cases) in asthmatic populations (Table  3 ). One interventional study assessed outcomes over a short duration [ 39 ] and found no difference between treatments. While further analyses in this paper did suggest a favourable outcome for ibuprofen relative to paracetamol, the results from this second post-hoc Boston Fever Study report are at very high risk of bias in the selection of the reported result (Supplementary Fig. 2).

Two studies compared ibuprofen with paracetamol in asthmatic populations over long durations [ 38 , 41 ]. The RCT [ 41 ] identified no difference between drugs (OR = 0.90 [0.57, 1.41]). In contrast, the observational cohort study [ 38 ] identified a significant disadvantage for ibuprofen relative to paracetamol in asthmatic populations (aOR = 2.10 [1.17, 3.76]). These conflicting results for ibuprofen relative to paracetamol in asthmatic populations over long durations are challenging to resolve due to the different experimental designs. However, there are also several similarities in their designs: use of the same active comparator, inclusion of asthmatic populations of children with similar age ranges (Sheehan: 1–4.9 years; Fu: 1–5 years) over similar follow-up durations (Sheehan: 46 weeks; Fu: 52 weeks), and use of asthma exacerbation as the outcome. As an exploratory analysis, we synthesised these results, which resulted in a common effect OR = 1.24; 95% CI = [0.87, 1.77], suggesting an overall non-significant effect, which is consistent with the RCT result alone (Fig. 1 b).
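The arithmetic behind a common effect synthesis of two reported odds ratios can be sketched in a few lines of Python. The sketch assumes generic inverse-variance weighting on the log scale, with standard errors back-calculated from the reported 95% confidence intervals; the text does not state that this is the exact procedure used for the exploratory analysis, so treat it as an illustration rather than the authors' method. The function names are hypothetical.

```python
import math


def log_or_and_se(or_value, ci_low, ci_high, z=1.96):
    """Convert an OR and its 95% CI to a log OR and a standard error."""
    log_or = math.log(or_value)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se


def common_effect_pool(estimates, z=1.96):
    """Inverse-variance common effect pooling of (log OR, SE) pairs."""
    weights = [1 / se**2 for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (
        math.exp(pooled),
        math.exp(pooled - z * pooled_se),
        math.exp(pooled + z * pooled_se),
    )


# Values as reported in the text for the RCT [41] and the cohort study [38].
studies = [
    log_or_and_se(0.90, 0.57, 1.41),
    log_or_and_se(2.10, 1.17, 3.76),
]
pooled_or, low, high = common_effect_pool(studies)
print(f"OR = {pooled_or:.2f}; 95% CI = [{low:.2f}, {high:.2f}]")
# prints "OR = 1.24; 95% CI = [0.87, 1.77]"
```

Under these assumptions the sketch gives the same rounded values as the exploratory synthesis. The RCT receives the larger weight because its confidence interval is narrower on the log scale, which is why the pooled estimate sits closer to 0.90 than to 2.10.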

Taken together, these interventional and observational results suggest that there is no difference in asthma exacerbation between ibuprofen and paracetamol in asthmatic populations over short or long durations.

Only a single study compared ibuprofen with baseline in an asthmatic population, assessing both short and long durations [ 40 ]. Over a short duration, this study found that ibuprofen increased asthma exacerbation. Over a long duration, it found no effect of ibuprofen on asthma exacerbation.

Ibuprofen hypersensitive population

Four drug provocation studies were identified in ibuprofen-hypersensitive children, in which ibuprofen was ingested and adverse events were reported as part of hypersensitivity diagnosis [ 42 , 43 , 44 , 45 ]. The respiratory adverse events reported included asthma, coughing, wheezing, dyspnoea, and respiratory distress (Table 4 ). Across the four studies, respiratory adverse events were reported in 10 of 80 children. Thus, in children with ibuprofen hypersensitivity, the overall rate of respiratory adverse events following ibuprofen ingestion was 12.5%.
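The 12.5% figure is the ratio of total events to total children (10/80), which is not the same as the unweighted mean of the per-study rates. The short Python sketch below illustrates the distinction. The per-study split in `hypothetical_studies` is invented for illustration only (the actual per-study counts are in Table 4); only the totals of 10 and 80 come from the text.

```python
def pooled_proportion(counts):
    """Total events divided by total participants across studies."""
    events = sum(e for e, _ in counts)
    total = sum(n for _, n in counts)
    return events / total


def mean_of_proportions(counts):
    """Unweighted mean of per-study event rates."""
    return sum(e / n for e, n in counts) / len(counts)


# Totals reported in the text: 10 events among 80 children.
print(f"{pooled_proportion([(10, 80)]):.1%}")  # prints "12.5%"

# Hypothetical (events, children) split across four studies, NOT the real data.
hypothetical_studies = [(2, 10), (3, 20), (1, 30), (4, 20)]
pooled = pooled_proportion(hypothetical_studies)
unweighted = mean_of_proportions(hypothetical_studies)
print(f"pooled = {pooled:.3f}, unweighted mean = {unweighted:.3f}")
```

With unequal study sizes the two summaries diverge, so it matters which one a reported "average" refers to.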

Unsynthesised papers

Seven studies were identified that reported the relationship between ibuprofen and asthma in children, which were not synthesised in this review [ 18 , 46 , 47 , 48 , 49 , 50 , 51 ]: five studies reported on single cases, and two group analysis studies had substantive differences in methodology and outcomes relative to other studies included in this review.

One crossover RCT [ 46 ] assessed the prevalence of ibuprofen-sensitive asthma in children with mild or moderate persistent asthma using bronchoprovocation challenge and found a prevalence of 2%. Another non-randomised controlled study [ 18 ] assessed the impact of short-term ibuprofen treatment on pulmonary function in children with mild to moderate stable asthma and self-reported aspirin allergy. Relative to a healthy control group, the asthmatic group exhibited a drop in FEV1 (forced expiratory volume in the first second) of 18.85% and an increase in FeNO (fractional exhaled nitric oxide) of 20.76 ppb. A summary of the results from these two studies is provided in Supplementary Table 3.

Four case reports of severe adverse events to ibuprofen were identified [ 47 , 48 , 50 , 51 ], and in all cases, the children had pre-existing asthma. Last, in a case series of fatal asthma in Finland, a single death due to ibuprofen ingestion was reported in a child with severe asthma and a known allergy to ibuprofen [ 49 ].

Here, we assessed the association between ibuprofen use and asthma in children aged 0–18 years. Both observational and interventional studies were reviewed in the general population as well as the asthmatic population. Studies that benchmarked ibuprofen against an active comparator almost exclusively used paracetamol, and in both populations of children, the combined evidence suggested no difference in asthma-related adverse events between ibuprofen and paracetamol (or ketoprofen) use. A single observational study suggested a potential benefit of ibuprofen over paracetamol prescription in response to bronchiolitis in the general paediatric population after a one-year follow-up. When ibuprofen use was assessed relative to no alternative drug administration, differences emerged between the general and asthmatic populations. In the short-term follow-up (1–14 days) to ibuprofen use, two observational studies reported favourable effects in the general population, while one observational and one interventional study observed unfavourable effects in the asthmatic population. Over a longer follow-up period (12 weeks to 10 years), no clear effect emerged for either population.

The majority of research on the association between ibuprofen use and asthma-related adverse events in children has been conducted in the general population, benchmarked relative to paracetamol, with participants followed up over a short duration [ 29 , 30 , 31 , 32 , 34 , 36 , 37 ]. The aggregate result from five RCTs conducted in this context is driven primarily by the Boston University Fever Study [ 31 ], conducted almost 30 years ago on children aged 6 months to 12 years. While a single observational study [ 36 ] conducted five years ago corroborates this finding, research is sparse. Furthermore, only a single study comparing ibuprofen with paracetamol use with a short-term follow-up was conducted in children with asthma [ 39 ], and this study was a second post-hoc analysis publication of the same Boston University Fever Study dataset. Given the increased vulnerability of the asthmatic population to respiratory adverse events from ibuprofen use that was observed in our review, there is a clear lack of research comparing the short-term effects of ibuprofen relative to alternative analgesics and antipyretics such as paracetamol in children with asthma.

Two studies [ 38 , 41 ] assessing differences between ibuprofen and paracetamol use over longer follow-up periods in asthmatic populations report conflicting results. Due to several study similarities, we tentatively synthesised the two results, and no aggregate difference between ibuprofen and paracetamol was observed. However, in the RCT [ 41 ], the median number of doses of trial medication (ibuprofen or paracetamol) was 5.5 (IQR = 1–15) and was matched between trial arms. In the retrospective cohort study [ 38 ], the original investigators could not determine whether patients took the medication prescribed. Additionally, the observational study did not control for upper respiratory tract infections, a well-documented source of confounding by indication [ 35 , 52 ], which were not well-matched between the ibuprofen and paracetamol cohorts. For these reasons, the RCT finding alone or the synthesised outcome of no difference between drugs seems most justifiable.

Comparing the asthmatic and general populations for short-term asthma-relevant outcomes after ibuprofen use, no conflicts in results were observed. The two observational studies in the general population [ 33 , 36 ] both observed reductions in asthma-related outcomes, while one observational [ 40 ] and one interventional [ 18 ] study in the asthmatic population both observed increases in asthma-related outcomes. These findings highlight the importance of avoiding naïve pooling of results from studies in these different paediatric populations.

It is noteworthy that all RCTs reviewed compared ibuprofen with an active comparator. Of the studies comparing ibuprofen with a baseline of no alternative drug, three were cohort studies [ 35 , 36 , 40 ], and one was cross-sectional [ 33 ]. One non-randomised interventional study [ 18 ] compared an asthmatic sample with a healthy control sample. This highlights one of the limitations of the RCT design approach in assessing adverse events in the youngest children [ 53 , 54 ]. As a recent RCT feasibility study found [ 55 ], almost three quarters of parents surveyed described the use of a placebo comparator treatment as unacceptable for treating their child’s fever or pain. This ethical unacceptability of using a placebo arm in clinical trials for treating pain and fever in young children [ 55 , 56 ] introduces an ambiguity into these active comparator RCT studies, as a lack of difference among active comparators does not exclude the possibility that both ibuprofen and active comparator use may be associated with parallel increases in asthma exacerbations [ 41 , 56 ]. It has been argued that, given that ibuprofen and paracetamol have different mechanisms of action, it is unlikely that their use could be associated with similar increases in the rate of asthma-related complications that are known to be determined by disparate mechanisms of disease [ 41 , 56 ]. However, this speculation requires careful examination and empirical support. Observational studies with comparator groups in which an active treatment was not prescribed or taken can be used as a baseline control to assess the impact of ibuprofen alone, acknowledging the challenges of inferring causality in observational studies. It is these advantages and disadvantages of both RCTs and observational designs that require a review of the association between ibuprofen use and asthma-related outcomes in children to consider and attempt to synthesise all study design types. 
This feature of our review adds substantially to two recent systematic reviews in this area [ 14 , 16 ] that either limited the study designs to RCTs [ 14 ] or limited the population to those under 2 years [ 16 ].

We identified four drug provocation trials in which ibuprofen hypersensitivity was confirmed in children by controlled administration of ibuprofen [ 42 , 43 , 44 , 45 ] and respiratory adverse events were recorded. The average percentage of children with confirmed ibuprofen hypersensitivity who displayed respiratory adverse events was 12.5%. Relative to other adverse events, such as angio-oedema and urticaria (which were by far the most common adverse events), asthma and asthma-like respiratory events were less commonly reported. While adverse respiratory reactions to ibuprofen ingestion in those with ibuprofen hypersensitivity can be quite severe, as reported in a handful of case reports [ 47 , 48 , 50 , 51 ], fatalities appear to be very rare. In this review, only a single case of ibuprofen-induced asthma fatality was identified [ 49 ].

The number of studies in this review that were relevant to important clinical populations and contexts was unfortunately small. Only a single publication was identified for each of the following three contexts: the general population where ibuprofen is compared with an active comparator with a follow-up duration longer than 1 month [ 36 ]; the asthmatic population where ibuprofen is compared with an active comparator with a short-term follow-up [ 39 ]; and the asthmatic population where ibuprofen is compared with a baseline of no active comparator with a follow-up duration longer than 1 month [ 40 ]. These limitations hinder the generalisability of findings to several important clinical contexts and are an ongoing issue to be addressed.

Here, we found that research is most lacking for children with pre-existing asthma, the population at greatest risk of respiratory adverse events following ibuprofen use. Our review highlights the importance of assessing both interventional and observational studies and analysing the general population and asthmatic population separately. Continued investigation into the role of early-life ibuprofen use and its short-term and long-term impact on childhood asthma is needed.

Data availability

All data (data collection form, risk of bias assessment forms, and data used for all analyses) are publicly available on the project’s OSF site: https://doi.org/10.17605/OSF.IO/ZBDS7 . All code used for the meta-analysis is publicly available on Zenodo: https://doi.org/10.5281/zenodo.11258287 .

WHO. Asthma [Internet]. 2023 [cited 2023 Apr 4]. https://www.who.int/news-room/fact-sheets/detail/asthma

Reddel HK, Bacharier LB, Bateman ED, Brightling CE, Brusselle GG, Buhl R et al. Global Initiative for Asthma Strategy 2021: executive summary and rationale for key changes. European Respiratory Journal [Internet]. 2022 Jan 1 [cited 2022 Apr 27];59(1). https://erj.ersjournals.com/content/59/1/2102730

Vos T, Murray CJL, GBD 2016 Disease and Injury Incidence and Prevalence Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990–2016: a systematic analysis for the global burden of Disease Study 2016. Lancet. 2017;390(10100):1211–59.

Beasley R, Semprini A, Mitchell EA. Risk factors for asthma: is prevention possible? Lancet. 2015;386(9998):1075–85.

Schrör K. Aspirin and Reye syndrome: a review of the evidence. Paediatr Drugs. 2007;9(3):195–204.

Szczeklik A, Stevenson DD. Aspirin-induced asthma: advances in pathogenesis and management. J Allergy Clin Immunol. 1999;104(1):5–13.

NHS. nhs.uk. 2020 [cited 2023 Apr 4]. Medicines for babies and children. https://www.nhs.uk/conditions/baby/health/medicines-for-babies-and-children/

NHS. nhs.uk. 2022 [cited 2023 Apr 4]. Who can and cannot take ibuprofen for children. https://www.nhs.uk/medicines/ibuprofen-for-children/who-can-and-cannot-take-ibuprofen-for-children/

NHS. nhs.uk. 2022 [cited 2023 Apr 4]. Who can and cannot take paracetamol for children. https://www.nhs.uk/medicines/paracetamol-for-children/who-can-and-cannot-take-paracetamol-for-children/

Kanabar D, Dale S, Rawat M. A review of ibuprofen and acetaminophen use in febrile children and the occurrence of asthma-related symptoms. Clin Ther. 2007;29(12):2716–23.

Kanabar DJ. A clinical and safety review of Paracetamol and Ibuprofen in children. Inflammopharmacology. 2017;25(1):1–9.

Kauffman RE, Lieh-Lai M. Ibuprofen and increased morbidity in children with asthma: fact or fiction? Paediatr Drugs. 2004;6(5):267–72.

Pierce CA, Voss B. Efficacy and safety of ibuprofen and acetaminophen in children and adults: a meta-analysis and qualitative review. Ann Pharmacother. 2010;44(3):489–506.

Sherbash M, Furuya-Kanamori L, Nader JD, Thalib L. Risk of wheezing and asthma exacerbation in children treated with Paracetamol versus Ibuprofen: a systematic review and meta-analysis of randomised controlled trials. BMC Pulm Med. 2020;20(1):72.

Southey ER, Soares-Weiser K, Kleijnen J. Systematic review and meta-analysis of the clinical safety and tolerability of ibuprofen compared with paracetamol in paediatric pain and fever. Current medical research and opinion [Internet]. 2009 Sep [cited 2022 May 19];25(9). https://pubmed.ncbi.nlm.nih.gov/19606950/

Tan E, Braithwaite I, McKinlay CJD, Dalziel SR. Comparison of Acetaminophen (paracetamol) with Ibuprofen for Treatment of Fever or Pain in children younger than 2 years: a systematic review and Meta-analysis. JAMA Netw Open. 2020;3(10):e2022398.

Szczeklik A. The cyclooxygenase theory of aspirin-induced asthma. Eur Respir J. 1990;3(5):588–93.

Su YM, Huang CS, Wan KS. Short-term ibuprofen treatment and pulmonary function in children with asthma. Indian Pediatr. 2015;52(8):691–3.

Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Reviews. 2015;4(1):1.

Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;349:g7647.

McGowan J, Sampson M, Salzwedel DM, Cogo E, Foerster V, Lefebvre C. PRESS peer review of electronic search strategies: 2015 Guideline Statement. J Clin Epidemiol. 2016;75:40–6.

McGowan J, Sampson M, Salzwedel DM, Cogo E, Lefebvre C. PRESS Peer Review of Electronic Search Strategies: 2015 Guideline Explanation and Elaboration (PRESS E&E). CADTH Methods and Guidelines [Internet]. 2016 [cited 2022 Mar 3]. https://www.cadth.ca/press-peer-review-electronic-search-strategies-2015-guideline-explanation-and-elaboration

Haddaway NR, Grainger MJ, Gray CT. citationchaser: An R package and Shiny app for forward and backward citations chasing in academic searching [Internet]. Zenodo; 2021 [cited 2023 Jan 17]. https://zenodo.org/record/4543513

Thomas J, Graziosi S, Brunton J, Ghouze Z, O’Driscoll P, Bond M et al. EPPI-Reviewer: advanced software for systematic reviews, maps and evidence synthesis [Internet]. EPPI-Centre, UCL Social Research Institute, University College London; 2022. http://eppi.ioe.ac.uk/cms/Default.aspx?tabid=2914

Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898.

Wells G, Shea B, O’Connell D, Peterson J, Welch V, Losos M et al. The Newcastle–Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses [Internet]. 2008 [cited 2022 May 23]. http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp

Balduzzi S, Rücker G, Schwarzer G. How to perform a meta-analysis with R: a practical tutorial. Evid Based Ment Health. 2019;22(4):153–60.

Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc Dis. 1985;27(5):335–71.

Kokki H, Kokki M. Ketoprofen versus Paracetamol (acetaminophen) or ibuprofen in the management of fever: results of two randomized, double-blind, double-dummy, parallel-group, repeated-dose, multicentre, phase III studies in children. Clin Drug Investig. 2010;30(6):375–86.

Lesko SM, Mitchell AA. The safety of acetaminophen and ibuprofen among children younger than two years old. Pediatrics. 1999;104(4):e39.

Lesko SM, Mitchell AA. An assessment of the safety of pediatric ibuprofen. A practitioner-based randomized clinical trial. JAMA. 1995;273(12):929–33.

Luo S, Ran M, Luo Q, Shu M, Guo Q, Zhu Y, et al. Alternating Acetaminophen and Ibuprofen versus monotherapies in improvements of distress and reducing refractory fever in Febrile children: a Randomized Controlled Trial. Paediatr Drugs. 2017;19(5):479–86.

Matok I, Elizur A, Perlman A, Ganor S, Levine H, Kozer E. Association of Acetaminophen and Ibuprofen Use with Wheezing in Children with Acute Febrile illness. Ann Pharmacother. 2017;51(3):239–44.

McIntyre J, Hull D. Comparing efficacy and tolerability of Ibuprofen and Paracetamol in fever. Arch Dis Child. 1996;74(2):164–7.

Sordillo JE, Scirica CV, Rifas-Shiman SL, Gillman MW, Bunyavanich S, Camargo CA, et al. Prenatal and infant exposure to acetaminophen and ibuprofen and the risk for wheeze and asthma in children. J Allergy Clin Immunol. 2015;135(2):441–8.

Walsh P, Rothenberg SJ. Wheezing after the use of acetaminophen and or ibuprofen for first episode of bronchiolitis or respiratory tract infection. PLoS ONE. 2018;13(9):e0203770.

Wong A, Sibbald A, Ferrero F, Plager M, Santolaya ME, Escobar AM, et al. Antipyretic effects of dipyrone versus ibuprofen versus acetaminophen in children: results of a multinational, randomized, modified double-blind study. Clin Pediatr (Phila). 2001;40(6):313–24.

Fu LS, Lin CC, Wei CY, Lin CH, Huang YC. Risk of acute exacerbation between acetaminophen and ibuprofen in children with asthma. PeerJ. 2019;7:e6760.

Lesko SM, Louik C, Vezina RM, Mitchell AA. Asthma morbidity after the short-term use of ibuprofen in children. Pediatrics. 2002;109(2):E20.

Lo PC, Tsai YT, Lin SK, Lai JN. Risk of asthma exacerbation associated with nonsteroidal anti-inflammatory drugs in childhood asthma: a nationwide population-based cohort study in Taiwan. Med (Baltim). 2016;95(41):e5109.

Sheehan WJ, Mauger DT, Paul IM, Moy JN, Boehmer SJ, Szefler SJ, et al. Acetaminophen versus Ibuprofen in Young children with mild persistent asthma. N Engl J Med. 2016;375(7):619–30.

Acknowledgements

We thank Imran Lodhi, Fiona Murray-Zmijewski, Frederic Esclassan, and Bill Laughey for reviewing and advising on improvements for this systematic review. We thank Carolyn Smith, the Outreach Librarian at University of Oxford’s Bodleian Libraries, who performed the search strategy PRESS Peer Review.

This work was funded by Reckitt. Employees of Reckitt were involved in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of a manuscript; and decision regarding where to submit the manuscript for publication.

Author information

Authors and affiliations.

Department of Paediatrics, University of Oxford, Oxford, UK

Luke Baxter, Maria M. Cobo, Aomesh Bhatt & Rebeccah Slater

Colegio de Ciencias Biologicas y Ambientales, Universidad San Francisco de Quito USFQ, Quito, Ecuador

Maria M. Cobo

Reckitt, Dansom Lane, Hull, HU8 7DS, UK

Olutoba Sanni

Reckitt (Global Headquarters), Turner House, 103-105 Bath Road, Slough, Berkshire, SL1 3UH, UK

Nutan Shinde

Contributions

LB: conceptualization, methodology, data curation, formal analysis, investigation, visualization, writing – original draft, writing – review & editing. MC: methodology, data curation, investigation, validation, writing – review & editing. AB: conceptualization, methodology, data curation, investigation, writing – review & editing. RS: conceptualization, methodology, data curation, investigation, writing – review & editing. OS: conceptualization, methodology, project administration, writing – review & editing. NS: conceptualization, methodology, project administration, writing – review & editing.

Corresponding author

Correspondence to Luke Baxter.

Ethics declarations

Ethics approval and consent to participate.

Not Applicable.

Consent for publication

Competing interests.

OS and NS are current employees of Reckitt and may hold equity interest in Reckitt. LB and RS were compensated by Reckitt for activities related to execution of the study. MC and AB declare no competing interests. No other disclosures were reported.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions.

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article.

Baxter, L., Cobo, M.M., Bhatt, A. et al. The association between ibuprofen administration in children and the risk of developing or exacerbating asthma: a systematic review and meta-analysis. BMC Pulm Med 24, 412 (2024). https://doi.org/10.1186/s12890-024-03179-3

Received: 24 October 2023

Accepted: 22 July 2024

Published: 26 August 2024

DOI: https://doi.org/10.1186/s12890-024-03179-3

  • Hypersensitivity
  • Bronchospasm
  • Bronchoconstriction

BMC Pulmonary Medicine

ISSN: 1471-2466

research bias in peer review

American Psychological Association

Style and Grammar Guidelines

APA Style provides a foundation for effective scholarly communication because it helps writers present their ideas in a clear, concise, and inclusive manner. When style works best, ideas flow logically, sources are credited appropriately, and papers are organized predictably. People are described using language that affirms their worth and dignity. Authors plan for ethical compliance and report critical details of their research protocol to allow readers to evaluate findings and other researchers to potentially replicate the studies. Tables and figures present information in an engaging, readable manner.

The style and grammar guidelines pages present information about APA Style as described in the Publication Manual of the American Psychological Association, Seventh Edition and the Concise Guide to APA Style, Seventh Edition. Any updates to APA Style are noted on the applicable topic pages. If you are still using the sixth edition, helpful resources are available in the sixth edition archive.

  • Accessibility of APA Style
  • Line Spacing
  • Order of Pages
  • Page Header
  • Paragraph Alignment and Indentation
  • Sample Papers
  • Title Page Setup
  • Appropriate Level of Citation
  • Basic Principles of Citation
  • Classroom or Intranet Sources
  • Paraphrasing
  • Personal Communications
  • Quotations From Research Participants
  • Secondary Sources
  • Abbreviations
  • Capitalization
  • Italics and Quotation Marks
  • Punctuation
  • Spelling and Hyphenation
  • General Principles for Reducing Bias
  • Historical Context
  • Intersectionality
  • Participation in Research
  • Racial and Ethnic Identity
  • Sexual Orientation
  • Socioeconomic Status
  • Accessible Use of Color in Figures
  • Figure Setup
  • Sample Figures
  • Sample Tables
  • Table Setup
  • Archival Documents and Collections
  • Basic Principles of Reference List Entries
  • Database Information in References
  • DOIs and URLs
  • Elements of Reference List Entries
  • Missing Reference Information
  • Reference Examples
  • References in a Meta-Analysis
  • Reference Lists Versus Bibliographies
  • Works Included in a Reference List
  • Active and Passive Voice
  • Anthropomorphism
  • First-Person Pronouns
  • Logical Comparisons
  • Plural Nouns
  • Possessive Adjectives
  • Possessive Nouns
  • Singular “They”
  • Adapting a Dissertation or Thesis Into a Journal Article
  • Correction Notices
  • Cover Letters
  • Journal Article Reporting Standards (JARS)
  • Open Science
  • Response to Reviewers

COMMENTS

  1. Moving towards less biased research

    Hence, even if one were to confer it a hybrid status wherein it can both prevent and detect bias, the extent of bias that has long been documented in peer-reviewed journals reveals major weaknesses in peer review. Recent high-profile COVID-19-related retractions [31] and commentary [32] further confirm these weaknesses. Consequently, we need to ...

  2. Best Available Evidence or Truth for the Moment: Bias in Research

    The subject of this column is the nature of bias in both quantitative and qualitative research. To that end, bias will be defined, and the processes by which it enters research will be examined, along with ways to ameliorate the problem.

  3. Reducing bias and improving transparency in medical research: a

    Underpowered research or questionable methodological or statistical decisions can be identified and addressed through peer-review prior to study conduct. Since journals commit to publication upon review of study plans, rather than finished papers, registered reports may reduce the incentive for authors to 'spin' and for reviewers to request ...

  4. Peer Review Bias: A Critical Review

    Conceptually, the peer review process can lead to distortion of the results from the viewpoint of the evidence user, akin to bias. Peer review bias can be defined as a violation of impartiality in the evaluation of a submission. We propose that this transgression of neutrality standards can affect the dissemination of research (ie, denying ...

  5. CSR Initiatives to Address Bias in Peer Review

    As of Dec 2022, 16,646 reviewers have completed the training. 91% of reviewers thought that the training substantially improved their ability to identify bias in peer review. 93% of reviewers stated the training made them substantially more comfortable intervening against bias. Data for the Jan 2022 Advisory Council cycle.

  6. Identifying and Avoiding Bias in Research

    Abstract. This narrative review provides an overview on the topic of bias as part of Plastic and Reconstructive Surgery's series of articles on evidence-based medicine. Bias can occur in the planning, data collection, analysis, and publication phases of research. Understanding research bias allows readers to critically and independently review ...

  7. Revisiting Bias in Qualitative Research: Reflections on Its

    Stories of research funding bodies and journal peer reviewers rejecting proposed qualitative methods or study findings due to "bias" are not uncommon. Usually, I find this relates to a perception by peer reviewers that the way data have/will be collected or analyzed is too closely aligned with the personal agenda of the researcher(s).

  8. Quantifying and addressing the prevalence and bias of study ...

    Future research is needed to refine our methodology, but our empirically grounded form of bias-adjusted meta-analysis could be implemented as follows: 1.) collate studies for the same true effect ...

  9. Exploring Bias in Scientific Peer Review: An ASCO Initiative

    With regard to gender bias in publications, data suggest that women are under-represented in peer review and same-gender preference exists strongly for males and females [9]. For racial bias, we found data in National Institutes of Health research awards, which showed that African Americans/Black and Asians are less likely to receive National ...

  10. PDF Tackling Bias in Peer Review Guidance for Peer Reviewers

    Reducing and challenging bias in peer review is critically important to ensure the integrity of the process and to help advance equity, diversity, and inclusion in our scientific ... "The big consequences of small biases: A simulation of peer review." Research Policy, 44(6): 1266-1270. [online] Available at: <https://www.sciencedirect.com ...

  11. Understanding Bias in Peer Review

    In the 1600s, a series of practices came into being known collectively as the "scientific method." These practices encoded verifiable experimentation as a path to establishing scientific fact. Scientific literature arose as a mechanism to validate and disseminate findings, and standards of scientific ...

  12. Peer Review Bias: A Critical Review

    The peer review process can also introduce bias. A compelling ethical and moral rationale necessitates improving the peer review process. A double-blind peer review system is supported on equipoise and fair-play principles. Triple- and quadruple-blind systems have also been described but are not commonly used.

  13. Peer Review Bias: A Critical Review

    Peer Review as a Source of Bias. Scientific experiments in general, and biomedical research specifically, are subject to bias and confounding. Domains for the different types of bias and confounding have been established as to their role before, during, and after the intervention is delivered [6]. Dedicated tools have been developed to evaluate ...

  14. Quantitative bias analysis methods for summary level ...

    Plain Language Summary. Quantitative bias analysis methods can be used to evaluate the impact of biases on observational study results. However, little is known about the full range and characteristics of available methods in the peer-reviewed literature that can be used to conduct quantitative bias analysis using information reported in manuscripts and other publicly available sources without ...

  15. Bias in peer review

    Research on bias in peer review examines scholarly communication and funding processes to assess the epistemic and social legitimacy of the mechanisms by which knowledge communities vet and self-regulate their work. Despite vocal concerns, a closer look at the empirical and methodological limitations of research on bias raises questions about ...

  16. Working toward reducing bias in peer review

    Reducing bias in peer review is critically important to ensure the integrity of our editorial process and embrace diversity, equity, and inclusion in our scientific communities. As a practical matter, we will encourage associate editors to take gender, racial, and geographical diversity into consideration when selecting reviewers for submitted ...

  17. Peer review and gender bias: A study on 145 scholarly journals

    Abstract. Scholarly journals are often blamed for a gender gap in publication rates, but it is unclear whether peer review and editorial processes contribute to it. This article examines gender bias in peer review with data for 145 journals in various fields of research, including about 1.7 million authors and 740,000 referees.

  18. PDF Bias in peer review

    This review provides a brief description of the function, history, and scope of peer review; articulates and critiques the conception of bias unifying research on bias in peer review; characterizes and examines the empirical, methodological, and normative claims of bias in peer review research; and assesses possible alternatives to the status ...

  19. The good, the bad, and the ugly of implicit bias

    The concept of implicit bias, also termed unconscious bias, and the related Implicit Association Test (IAT) rests on the belief that people act on the basis of internalised schemas of which they are unaware and thus can, and often do, engage in discriminatory behaviours without conscious intent [1]. This idea increasingly features in public discourse and scholarly inquiry with regard to ...

  20. How to minimise bias during peer review?

    Blinding can help reduce bias in peer review. In double-blind peer review, the identities of authors and reviewers are concealed from each other. Some journals have even introduced triple-blind peer review, where the authors' identity is also hidden from the journal editors. These types of blinding efforts help reviewers focus on the content ...

  21. Research: Gender bias in scholarly peer review

    Peer review has an important role in improving the quality of research papers. It is the "lifeblood of research in academia […] the social structure that subjects research to the critical assessment of other researchers" (Bourdieu, 1975). This structure relies on self-regulated interactions within the scientific community, in which a journal editor appoints peer reviewers with expertise ...

  22. Public attitudes towards personal health data sharing in long-term

    Background: Loss to follow-up in long-term epidemiological studies is well known and often substantial. Consequently, there is a risk of bias in the results. The motivation to take part in an epidemiological study can change over time, but the ways to minimize loss to follow-up are not well studied. The Citizen Science approach allows researchers to engage in direct discussions with study ...

  23. Suicide rates among physicians compared with the general ...

    Objectives: To estimate age standardised suicide rate ratios in male and female physicians compared with the general population, and to examine heterogeneity across study results. Design: Systematic review and meta-analysis. Data sources: Studies published between 1960 and 31 March 2024 were retrieved from Embase, Medline, and PsycINFO. There were no language restrictions. Forward and backwards ...

  24. Eliminating Explicit and Implicit Biases in Health Care: Evidence and

    1. INTRODUCTION. Although expressions of explicit bias have declined in the United States over time, implicit bias has remained unrelenting. Health care providers hold negative explicit and implicit biases against many marginalized groups of people, including racial and ethnic minoritized populations, disabled populations, and gender and sexual minorities, among others (29, 63).

  25. A systematic review and meta-analysis of randomized trials of

    We conducted a systematic review and meta-analysis of 17 trials that examined the effect of substituting soymilk (median dose of 22 g/day or 6.6 g/250 mL serving of soy protein per day and 17.2 g/day or 6.9 g/250 mL of total [added] sugars in the sweetened soymilk) for cow's milk (median dose of 24 g/day or 8.3 g/250 mL of milk protein and 24 g/day or 12 g/250 mL of total sugars [lactose ...

  26. Physical Review E's Chief Editor on Interdisciplinary Research and Peer

    Biologists, chemists, mathematicians, and engineers may have different approaches, but they are not alien to each other. Lots of the most interesting research these days is at the interfaces of these different fields. At Physical Review E, we encourage submissions on physics-adjacent research for this reason. If a variety of tangentially ...

  27. University Libraries purchases Sage Research Methods package

    The OHIO package also includes peer-reviewed case studies with accompanying discussion questions and multiple-choice quiz questions and can be embedded into Canvas courses. Further, the collection includes a Diversifying and Decolonizing Research subcollection that highlights the importance of inclusive research, perspectives from marginalized ...

  28. Sexual side effects of antipsychotic drugs in...

    The Cochrane Risk of Bias tool 1 and ROBINS-I will be employed to evaluate the risk of bias for RCTs and NRS, respectively. ... [version 1; peer review: awaiting peer review]. F1000Research 2024, 13:973 (https://doi.org ... They contributed in identifying the research idea and developing this review protocol from their perspective as people ...

  29. The association between ibuprofen administration in children and the

    Ibuprofen is one of the most commonly used analgesic and antipyretic drugs in children. However, its potential causal role in childhood asthma pathogenesis remains uncertain. In this systematic review, we assessed the association between ibuprofen administration in children and the risk of developing or exacerbating asthma. We searched MEDLINE, Embase, Cochrane Library, CINAHL, Web of Science ...

  30. Style and Grammar Guidelines

    People are described using language that affirms their worth and dignity. Authors plan for ethical compliance and report critical details of their research protocol to allow readers to evaluate findings and other researchers to potentially replicate the studies. Tables and figures present information in an engaging, readable manner.