Science News Explores

When a study can’t be replicated.

Many factors can prevent scientists from repeating research and confirming results

Sometimes the findings of research that was done well can’t be replicated — confirmed by other scientists. The reasons may vary or never be fully understood, new studies find. 

ViktorCap / iStockphoto

By Janet Raloff

September 11, 2015 at 6:00 am

In the world of science, the gold standard for accepting a finding is seeing it “replicated.” To achieve this, researchers must repeat a study and find the same conclusion. Doing so helps confirm that the original finding wasn’t a fluke — one due to chance.

Yet try as they might, many research teams cannot replicate, or match, an original study’s results. Sometimes that occurs because the original scientists faked the study. Indeed, a 2012 study looked at more than 2,000 published papers that had to be retracted — eventually labeled by the publisher as too untrustworthy to believe. Of these, more than 65 percent involved cases of misconduct, including fraud.

But even when research teams act honorably, their studies may still prove hard to replicate, a new study finds. Yet a second new analysis shows how important it is to try to replicate studies. It also shows what researchers can learn from the mistakes of others.

The first study focused on 100 human studies in the field of psychology. That field examines how animals or people respond to certain conditions and why. The second study looked at 38 research papers reporting possible explanations for global warming. The papers presented explanations for global warming that run contrary to those of the vast majority of the world’s climate scientists.

Both new studies set out to replicate the earlier ones. Both had great trouble doing so. Yet neither found evidence of fraud. These studies point to how challenging it can be to replicate research. Yet without that replication, the research community may find it hard to trust a study’s data or know how to interpret what those data mean.

Trying to make sense of the numbers

Brian Nosek led the first new study. He is a psychologist at the University of Virginia in Charlottesville. His research team recruited 270 scientists. Their mission: to reproduce the findings of 100 previously published studies. All of the studies had appeared in one of three major psychology journals in 2008. In the end, only 35 of the studies could be replicated by this group. The researchers described their efforts in the August 28 issue of Science.

Two types of findings proved hardest to confirm. The first were those that originally had been described as unexpected. The second were ones that had barely achieved statistical significance. That raises concerns, Nosek told Science News, about the common practice of publishing attention-grabbing results. Many of those types of findings appear to have come from data that had not been statistically strong. Such studies may have included too few individuals. Or they may have turned up only weak signs of an effect. There is a greater likelihood that such findings are the result of random chance.
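To make that point concrete, here is a minimal simulation sketch in Python. The effect size, sample size and thresholds are illustrative assumptions, not values from Nosek’s project. It mimics an underpowered experiment with a small true effect and shows that results which only barely reach p < 0.05 tend not to reach it again in a same-sized repeat.

```python
# Illustrative sketch only: why barely significant results from small studies
# often fail to replicate. Effect size and sample size are assumed values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.2      # small true group difference, in standard-deviation units
n_per_group = 20       # few individuals per group (low statistical power)
n_trials = 5000

barely_significant = 0
replicated = 0
for _ in range(n_trials):
    # "Original" study
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(true_effect, 1.0, n_per_group)
    p_original = stats.ttest_ind(a, b).pvalue
    if 0.01 < p_original < 0.05:               # just barely significant
        barely_significant += 1
        # Attempted replication with the same design and sample size
        a2 = rng.normal(0.0, 1.0, n_per_group)
        b2 = rng.normal(true_effect, 1.0, n_per_group)
        if stats.ttest_ind(a2, b2).pvalue < 0.05:
            replicated += 1

if barely_significant:
    print(f"Barely significant originals that replicated: "
          f"{replicated / barely_significant:.0%}")
```

With these assumed numbers, only a minority of the “barely significant” findings reach significance again, consistent with the pattern described above.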

No one can say why the tests by Nosek’s team failed to confirm findings in 65 percent of their tries. It’s possible the initial studies were not done well. But even if they had been done well, conflicting conclusions raise doubts about the original findings. For instance, they may not be applicable to groups other than the ones initially tested.

Rasmus Benestad works at the Norwegian Meteorological Institute in Oslo. He led the second new study. It focused on climate research.

In climate science, some 97 percent of reports and scientists have come to a similar conclusion: that human activities, mostly the burning of fossil fuels, are a major driver of recent global warming. The 97 percent figure came from the United Nations’ Intergovernmental Panel on Climate Change. This is a group of researchers active in climate science. The group reviewed nearly 12,000 abstracts of published research findings. It also received some 1,200 ratings by climate scientists of what the published data and analyses had concluded about climate change. Nearly all pointed to the same source: us.

But what about the other 3 percent? Was there something different about those studies? Or could there be something different about the scientists who felt that humans did not play a big role in global warming? That’s what this new study sought to probe. It took a close look at 38 of these “contrarian” papers.

Benestad’s team attempted to replicate the original analyses in these papers. In doing so, the team pored over the details of each study. Along the way, they identified several common problems. Many started with false assumptions, the new analysis says. Some used a faulty analysis. Others set up an improper hypothesis for testing. Still others used “incorrect statistics” for making their analyses, Benestad’s group reports. Several papers also set up a false either/or situation. They had argued that if one thing influenced global warming, then the other must not have. In fact, Benestad’s group noted, that logic was sometimes faulty. In many cases, both explanations for global warming might work together.

Mistakes or an incomplete understanding of previous work by others could lead to faulty assessments, Benestad’s team concluded. Its new analysis appeared August 20 in Theoretical and Applied Climatology.

What to make of this?

It might seem like it should be easy to copy a study and come up with similar findings. As the two new studies show, it’s not. And there can be a host of reasons why.

Some investigators have concluded that it may be next to impossible to redo a study exactly. This can be true especially when a study works with subjects or materials that vary greatly. Cells, animals and people are all things that have a lot of variation. Due to genetic or developmental differences, one cell or individual may respond differently to stimuli than another will. Stimuli might include foods, drugs, infectious germs or some other aspect of the environment.

Similarly, some studies involve conditions that are quite complicated. Examples can include the weather or how crowds of people behave. Consider climate studies. Computers are not yet big enough and fast enough to account for everything that affects climate, scientists note. Many of these factors will vary broadly over time and distance. So climate scientists choose to analyze the conditions that seem the most important. They may concentrate on those for which they have the best or the most data. If the next group of researchers uses a different set of data, their findings may not match the earlier ones.

Eventually, time and more data may show why the findings of an original study and a repeated one differ. One of the studies may be found weak or somewhat flawed. Perhaps both will be.

This points to what can make advancing science so challenging. “Science is never settled, and both the scientific consensus and alternative hypotheses should be subject to ongoing questioning,” Benestad’s group argues.

Researchers should try to prove or disprove even those things that have been considered common knowledge, they add. Resolving differences in an understanding of science and data is essential, they argue. That is true in climate science, psychology and every other field. After all, without a good understanding of science, they say, society won’t be able to make sound decisions on how to create a safer, healthier and more sustainable world.

Reproducibility vs Replicability | Difference & Examples

Published on August 19, 2022 by Kassiani Nikolopoulou. Revised on June 22, 2023.

The terms reproducibility, repeatability, and replicability are sometimes used interchangeably, but they mean different things.

  • A research study is reproducible when the existing data is reanalysed using the same research methods and yields the same results. This shows that the analysis was conducted fairly and correctly.
  • A research study is replicable (or repeatable) when the entire research process is conducted again, using the same methods but new data, and still yields the same results. This shows that the results of the original study are reliable.

A survey of 60 children between the ages of 12 and 16 shows that football and hockey are the most popular sports. Football received 20 votes and hockey 18.

An independent researcher reanalyses the survey data and also finds that 20 children chose football and 18 children chose hockey. This makes the research reproducible.
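As a minimal sketch of that distinction (hypothetical numbers in the spirit of the survey example above, not real data), the snippet below treats reproduction as rerunning the same analysis on the shared dataset and replication as running the same analysis on newly collected data.

```python
# Sketch only: reproduction reanalyses the SAME data; replication uses NEW data.
from collections import Counter

# Raw data from the original survey of 60 children (hypothetical)
original_votes = ["football"] * 20 + ["hockey"] * 18 + ["tennis"] * 12 + ["other"] * 10

def analyse(votes):
    """The published analysis: count votes and rank sports by popularity."""
    return Counter(votes).most_common()

# Reproduction: an independent researcher reruns the analysis on the shared data
reproduced = analyse(original_votes)
assert reproduced[0] == ("football", 20) and reproduced[1] == ("hockey", 18)

# Replication: a new survey of 60 different children, same method, new data
new_votes = ["football"] * 22 + ["hockey"] * 17 + ["tennis"] * 11 + ["other"] * 10
print("Original ranking:   ", analyse(original_votes)[:2])
print("Replication ranking:", analyse(new_votes)[:2])  # same top two -> replicable
```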

Why reproducibility and replicability matter in research

Reproducibility and replicability enhance the reliability of results. This allows researchers to check the quality of their own work or that of others, which in turn increases the chance that the results are valid and not suffering from research bias.

On the other hand, reproduction alone does not show whether the results are correct. As it does not involve collecting new data, reproducibility is a minimum necessary condition – showing that findings are transparent and informative.

In order to make research reproducible, it is important to provide all the necessary raw data. That way, anyone can run the analysis again, ideally recreating the same results. Omitted variables, missing data, or mistakes leading to information bias can lead to your research not being reproducible.

To make your research reproducible, you describe step by step how you collected and analysed your data. You also include all the raw data in the appendix: a list of the interview questions, the interview transcripts, and the coding sheet you used to analyse your interviews.

Sometimes researchers also conduct replication studies. These studies investigate whether researchers can arrive at the same scientific findings as an existing study while collecting new data and completing new analyses.

In one example, a study of household food waste, the researchers administered an online survey, and the majority of the participants (58%) reported that they waste 10% or less of procured food. They also found that guilt and setting a good example were the main motivators for reducing food waste, rather than economic or environmental factors.

Together with your research group, you decide to conduct a replication study: you collect new data in the same state, this time via focus groups. Your findings are consistent with the initial study, which makes it replicable.

Overall, repeatability and reproducibility ensure that scientists remain honest and do not invent or distort results to get better outcomes. In particular, testing for reproducibility can also be a way to catch any mistakes, biases , or inconsistencies in your data.

Unfortunately, findings from many scientific fields – such as psychology, medicine, or economics – often prove impossible to replicate. When other research teams try to repeat a study, they get a different result, suggesting that the initial study’s findings are not reliable.

Some factors contributing to this phenomenon include:

  • Unclear definition of key terms
  • Poor description of research methods
  • Lack of transparency in the discussion section
  • Unclear presentation of raw data
  • Poor description of data analysis undertaken

Publication bias can also play a role. Scientific journals are more likely to accept original (non-replicated) studies that report positive, statistically significant results that support the hypothesis .

To make your research reproducible and replicable, it is crucial to describe, step by step, how to conduct the research. You can do so by focusing on writing a clear and transparent methodology section, using precise language and avoiding vague writing.

Transparent methodology section

In your methodology section, you explain in detail what steps you have taken to answer the research question. As a rule of thumb, someone who has nothing to do with your research should be able to repeat what you did based solely on your explanation.

For example, you can describe:

  • What type of research (quantitative, qualitative, mixed methods) you conducted
  • Which research method you used (interviews, surveys, etc.)
  • Who your participants or respondents are (e.g., their age or education level)
  • What materials you used (audio clips, video recordings, etc.)
  • What procedure you used
  • What data analysis method you chose (such as the type of statistical analysis)
  • How you ensured reliability and validity
  • Why you drew certain conclusions, and on the basis of which results
  • In which appendix the reader can find any survey questions, interviews, or transcripts

Sometimes, parts of the research may turn out differently than you expected, or you may accidentally make mistakes. This is all part of the process! It’s important to mention these problems and limitations so that they can be prevented next time. You can do this in the discussion or conclusion, depending on the requirements of your study program.

Use of clear and unambiguous language

You can also increase the reproducibility and replicability/repeatability of your research by always using crystal-clear writing. Avoid using vague language, and ensure that your text can only be understood in one way. Careful description shows that you have thought in depth about the method you chose and that you have confidence in the research and its results.

Here are a few examples of a vague phrasing and a more precise alternative.

  • Vague: The participants of this study were children from a school.
  • Precise: The 67 participants of this study were elementary school children between the ages of 6 and 10.
  • Vague: The interviews were transcribed and then coded.
  • Precise: The semi-structured interviews were first summarised, transcribed, and then open-coded.
  • Vague: The results were compared with a t test.
  • Precise: The results were compared with an unpaired t test.

Reproducibility and replicability are related terms.

  • Reproducing research entails reanalyzing the existing data in the same manner.
  • Replicating (or repeating) the research entails reconducting the entire analysis, including the collection of new data.
  • A successful reproduction shows that the data analyses were conducted in a fair and honest manner.
  • A successful replication shows that the reliability of the results is high.

The reproducibility and replicability of a study can be ensured by writing a transparent, detailed method section and using clear, unambiguous language.

It bears repeating: how scientists are addressing the ‘reproducibility problem’

Deborah Berry, Assistant Professor and Co-Director of the Histopathology and Tissue Shared Resource, Georgetown University

Disclosure statement

Deborah Berry is a member of the Laboratory Advisory Board for Science Exchange.

Recently a friend of mine on Facebook posted a link whose headline quoted a scientist saying “Most cancer research is largely a fraud.” The quote is both out of context and many decades old. But its appearance still makes a strong point: the general public has a growing distrust of science and research.

Recent reports in the Washington Post and the Economist, among others, raise the concern that relatively few scientists’ experimental findings can be replicated. This is worrying: replicating an experiment is a main foundation of the scientific method.

As scientists, we build on knowledge gained and published by others. We develop new experiments and questions based on the knowledge we gain from those published reports. If those papers are valid, our work is supported and knowledge advances.

On the other hand, if published research is not actually valid, if it can’t be replicated, it delivers only an incidental finding, not scientific knowledge. Any subsequent questions will either be wrong or flawed in important ways. Identifying which reports are invalid is critical to prevent wasting money and time pursuing an incorrect idea based on bad data. How can we know which findings to trust?

Why would a repeat fail?

Repeating a result is not always a simple task. Say you flip a coin three times and get heads each time. You may conclude that coins always land on heads. As an independent test, your friend flips a coin five more times and gets four tails and one heads. The friend concludes your results were incorrect, not reproducible and that coins usually land on tails. Repeating the research can both correct inaccuracies and deepen our understanding of the real truth: the coin lands on heads and tails equally.
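A tiny simulation (illustrative only) makes the same point: with only a handful of flips, two honest experimenters can easily reach opposite conclusions, while a much larger number of flips converges on the truth.

```python
# Illustrative coin-flip sketch: small samples mislead, large samples converge.
import random

random.seed(1)

def heads(n_flips):
    """Count heads in n_flips of a fair coin."""
    return sum(random.random() < 0.5 for _ in range(n_flips))

print("Your 3 flips:    ", heads(3), "heads")        # may well be 3 of 3
print("Friend's 5 flips:", heads(5), "heads")        # may well be 1 of 5
print("10,000 flips:    ", heads(10_000), "heads")   # close to 5,000
```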

This is much harder in studies that are more complex than coin-flipping. In a recent commentary in Science, lead author and Harvard psychologist Daniel Gilbert notes that the 2015 study that reported low reproducibility of psychology research did not correctly replicate the methods or approaches of the original studies. For example, a study of race and affirmative action performed at Stanford University was “replicated” at the University of Amsterdam in the Netherlands, in another country with different racial diversity. When the study was later repeated at Stanford, the original published results were indeed replicated.

Gilbert’s analysis suggests that the reproducibility “problem” may be more complex. Perhaps some studies cannot be repeated due to problems with the initial study, while others aren’t replicable because the follow-up research did not follow the methods or use the same tools as the original study. Likely both contribute to the reproducibility problem.

Focusing on the details

The scientific community is addressing this challenge in several ways. For example, scholarly journals are requiring much more detailed explanations of how we did our experiments. More detail allows scholars to better evaluate and understand what parts of the experiment could influence the result.

Also, when reviewing requests for government research grant money, the National Institutes of Health now requires scientists to detail both the tools they will use and the tests they used to confirm the tools are exactly what they should be.

One way scientists can get results that can’t be reproduced is if one or more of the tools used doesn’t work as the researchers assume or intend. Researchers have found that tools such as cell lines can become contaminated, mislabeled or mixed up. Antibodies used to identify one protein may actually identify the wrong protein or more than one protein. Even variations in the type of food given to lab mice have been shown to significantly change experiment results.

To combat this type of problem, researchers have begun sequencing DNA to ensure they are working with the cell lines they intend to use. Some lab supply companies are testing their antibodies in-house to confirm they work as expected. Other companies are using the online lab-services marketplace Science Exchange to find expert labs like mine to independently test their antibodies. (I am on Science Exchange’s Lab Advisory Board, but have no financial interest in the company.) The results of those tests can “validate” an antibody as good or bad for a particular experiment, letting future scientists know which antibodies are the best tools for their research.

Finding time to reproduce important studies

Those steps address future and ongoing research. But how do we know which already published experiments are reproducible and which are not? Most journals focus on publishing new and groundbreaking findings, rather than publishing a replication of a previous study. Further, research that finds a study’s results can’t be replicated – getting what are called “negative results” – can also be difficult for scientists and journals to publish. Collaboration and support from colleagues are key to academic success; publishing data that contradict a fellow researcher’s results risks alienating peers.

In 2012, the biopharmaceutical company Amgen reported that it had been unable to reproduce 47 of 53 “landmark” cancer papers. For confidentiality reasons, however, the company did not release which papers it could not replicate and thus did not provide details about how it repeated the experiments. As with the psychology studies, this leaves the possibility that Amgen got different results because the experiments were not performed the same way as the original study. It opens the door to doubt about which result – the first or the repeat test – was correct.

Several initiatives are addressing this problem in multiple disciplines. Science Exchange; the Center for Open Science, a group dedicated to “openness, integrity and reproducibility of scientific research”; and F1000Research, a team focused on immediate and transparent publishing, have all introduced initiatives along this line.

Science Exchange and the Center for Open Science have launched a specific effort in this direction regarding cancer research. Their effort, the Reproducibility Project: Cancer Biology , has received US$1.3 million from the Arnold Foundation to repeat selected experiments from a number of high-profile cancer biology papers. The project will publish comprehensive details of how scientists attempted to reproduce each study, and will report results whether they confirm, contradict or change the findings of the study being repeated.

In addition, Science Exchange, the open-access publisher PLoS, the data management site figshare and the reference management site Mendeley joined forces in 2012 to identify and document high-quality reproducible research. This effort, called the Reproducibility Initiative, allows scientists to apply to have key parts of their projects repeated in independent expert labs identified by Science Exchange.

The results of the repeat tests can be published in the special PLoS reproducibility collection. The data are made openly available through figshare, and the impact the work has on future studies and publications can be tracked in the Mendeley reproducibility collection. Many journals have agreed to add an “Independently Validated” badge to original articles that are successfully repeated, indicating their high quality.

Doing it right again and again

To prevent problems in the repetition of the experiments, the Reproducibility Initiative spends months reviewing the details of an experiment with the original author to ensure the project is repeated accurately. Once reviewed, Science Exchange splits the project into types of experiments and outsources each type to a lab with that expertise. By dividing and outsourcing the project, the testing labs do not know the original paper, results, or authors, eliminating chances for bias in testing.

Testing labs like mine create a detailed report of the experiments to be done. Every step, every reagent down to the catalog number and company, is carefully documented and published in an independent report in “PLoS One.” That way, whether the result of the repetition is positive or negative, the full details of the experiment are available for review. Upon completion of the repeat testing, the results are published in “PLoS One,” whether they validate or contradict the original findings. The results of the first full replication of a study are expected to be published later this year.

As scientists, we are working to dispel concerns about scientific research like those raised by my Facebook friend. With improved reporting and tools for future research, the science community can counter and reduce existing problems of reproducibility, which will help us build a strong and valid foundation for future scientific studies.

National Academies of Sciences, Engineering, and Medicine; Policy and Global Affairs; Committee on Science, Engineering, Medicine, and Public Policy; Board on Research Data and Information; Division on Engineering and Physical Sciences; Committee on Applied and Theoretical Statistics; Board on Mathematical Sciences and Analytics; Division on Earth and Life Studies; Nuclear and Radiation Studies Board; Division of Behavioral and Social Sciences and Education; Committee on National Statistics; Board on Behavioral, Cognitive, and Sensory Sciences; Committee on Reproducibility and Replicability in Science. Reproducibility and Replicability in Science. Washington (DC): National Academies Press (US); 2019 May 7.

7 Confidence in Science

The committee was asked to “draw conclusions and make recommendations for improving rigor and transparency in scientific and engineering research.” Certainly, reproducibility and replicability play an important role in achieving rigor and transparency, and for some lines of scientific inquiry, replication is one way to gain confidence in scientific knowledge. For other lines of inquiry, however, direct replications may be impossible due to the characteristics of the phenomena being studied. The robustness of science is less well represented by the replications between two individual studies than by a more holistic web of knowledge reinforced through multiple lines of examination and inquiry. In this chapter, the committee illustrates a spectrum of pathways to attain rigor and confidence in scientific knowledge, beginning with an overview of research synthesis and meta-analysis, and then citing illustrative approaches and perspectives from geoscience, genetics, psychology, and big data in social sciences. The chapter concludes with a consideration of public understanding and confidence in science.

When results are computationally reproduced or replicated, confidence in robustness of the knowledge derived from that particular study is increased. However, reproducibility and replicability are focused on the comparison between individual studies. By looking more broadly and using other techniques to gain confidence in results, multiple pathways can be found to consistently support certain scientific concepts and theories while rejecting others. Research synthesis is a widely accepted and practiced method for gauging the reliability and validity of bodies of research, although like all research methods, it can be used in ways that are more or less valid ( de Vrieze, 2018 ). The common principles of science—gathering evidence, developing theories and/or hypotheses, and applying logic—allow us to explore and predict systems that are inherently non-replicable. We use several of these systems below to highlight how scientists gain confidence when direct assessments of reproducibility or replicability are not feasible.

RESEARCH SYNTHESIS

As we note throughout this report, studies purporting to investigate similar scientific questions can produce inconsistent or contradictory results. Research synthesis addresses the central question of how the results of studies relate to each other, what factors may be contributing to variability across studies, and how study results coalesce or not in developing the knowledge network for a particular science domain. In current use, the term research synthesis describes the ensemble of research activities involved in identifying, retrieving, evaluating, synthesizing, interpreting, and contextualizing the available evidence from studies on a particular topic and comprises both systematic reviews and meta-analyses. For example, a research synthesis may classify studies based on some feature and then test whether the effect size is larger for studies with or without the feature compared with the other studies. The term meta-analysis is reserved for the quantitative analysis conducted as part of research synthesis.

Although the terms used to describe research synthesis vary, the practice is widely used, in fields ranging from medicine to physics. In medicine, Cochrane reviews are systematic reviews that are performed by a body of experts who examine and synthesize the results of medical research. 1 These reviews provide an overview of the best available evidence on a wide variety of topics, and they are updated periodically as needed. In physics, the Task Group on Fundamental Constants performs research syntheses as part of its task to adjust the values of the fundamental constants of physics. The task group compares new results to each other and to the current estimated value, and uses this information to calculate an adjusted value ( Mohr et al., 2016 ). The exact procedure for research synthesis varies by field and by the scientific question at hand; the following is a general description of the approach.

Research synthesis begins with formal definitions of the scientific issues and the scope of the investigation and proceeds to search for published and unpublished sources of potentially relevant information (e.g., study results). The ensemble of studies identified by the search is evaluated for relevance to the central scientific question, and the resulting subset of studies undergoes review for methodological quality, typically using explicit criteria and the assignment of quality scores. The next step is the extraction of qualitative and quantitative information from the selected studies. The former includes study-level characteristics of design and study processes; the latter includes quantitative results, such as study-level estimates of effects and variability overall as well as by subsets of study participants or units or individual-level data on study participants or units ( Institute of Medicine, 2011 , Chapter 4 ).

Using summary statistics or individual-level data, meta-analysis provides estimates of overall central tendencies, effect sizes, or association magnitudes, along with estimates of the variance or uncertainty in those estimates. For example, the meta-analysis of the comparative efficacy of two treatments for a particular condition can provide estimates of an overall effect in the target clinical population. Replicability of an effect is reflected in the consistency of effect sizes across the studies, especially when a variety of methods, each with different weaknesses, converge on the same conclusion. As a tool for testing whether patterns of results across studies are anomalous, meta-analyses have, for example, suggested that well-accepted results in a scientific field are or could plausibly be largely due to publication bias.

Meta-analyses also test for variation in effect sizes and, as a result, can suggest potential causes of non-replicability in existing research. Meta-analyses can quantify the extent to which results appear to vary from study to study solely due to random sampling variation or to varying in a systematic way by subgroups (including sociodemographic, clinical, genetic, and other subject characteristics), as well as by characteristics of the individual studies (such as important aspects of the design of studies, the treatments used, and the time period and context in which studies were conducted). Of course, these features of the original studies need to be described sufficiently to be retrieved from the research reports.
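As a minimal sketch of what such a quantitative synthesis computes (the effect sizes and standard errors below are made up, not taken from any cited review), the snippet pools study-level estimates with inverse-variance weights and uses Cochran's Q and I² to gauge how much variation exceeds what sampling error alone would produce.

```python
# Sketch of a fixed-effect, inverse-variance meta-analysis with a simple
# heterogeneity check. All numbers are hypothetical.
import numpy as np

effects = np.array([0.30, 0.12, 0.45, 0.05, 0.25])  # study-level effect sizes
ses     = np.array([0.10, 0.15, 0.20, 0.08, 0.12])  # their standard errors

weights   = 1.0 / ses**2                             # precision weights
pooled    = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

# Cochran's Q and I^2: share of variation beyond random sampling error
Q  = np.sum(weights * (effects - pooled) ** 2)
df = len(effects) - 1
I2 = max(0.0, (Q - df) / Q) * 100

print(f"Pooled effect: {pooled:.2f} (SE {pooled_se:.2f})")
print(f"Q = {Q:.1f} on {df} df, I^2 = {I2:.0f}% unexplained heterogeneity")
```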

For example, a meta-analytic aggregation across 200 meta-analyses published in the top journal for reviews in psychology, Psychological Bulletin , showed that only 8 percent of studies had adequate statistical power; variation across studies testing the same hypothesis was very high, with 74 percent of variation due to unexplained heterogeneity; and reporting bias overall was low ( Stanley et al., 2018 ).

In social psychology, Malle (2006) conducted a meta-analysis of studies comparing how actors explain their own behavior with how observers explain it and identified an unrecognized confounder—the positivity of the behavior. In studies that tested positive behaviors, actors took credit for the action and attributed it more to themselves than did observers. In studies that tested negative behaviors, actors justified the behavior and viewed it as due to the situation they were in more than did observers. Similarly, meta-analyses have often shown that the association of obesity with various outcomes (e.g., dementia) depend on the age in life at which the obesity is considered.

Systematic reviews and meta-analyses are typically conducted as retrospective investigations, in the sense that they search and evaluate the evidence from studies that have been conducted. Systematic reviews and meta-analyses are susceptible to biased datasets, for example, if the scientific literature on which a systematic review or a meta-analysis is biased due to publication bias of positive results. However, the potential for a prospective formulation of evidence synthesis is clear and is beginning to transform the landscape. Some research teams are beginning to monitor the scientific literature on a particular topic and conduct periodic updates of systematic reviews on the topic. 2 Prospective research synthesis may offer a partial solution to the challenge of biased datasets.

Meta-research is a new field that involves evaluating and improving the practice of research. Meta-research encompasses and goes beyond meta-analysis. As Ioannidis et al. (2015) aptly argued, meta-research can go beyond single substantive questions to examine factors that affect rigor, reproducibility, replicability, and, ultimately, the truth of research results across many topics.

CONCLUSION 7-1: Further development in and the use of meta-research would facilitate learning from scientific studies. These developments would include the study of research practices such as research on the quality and effects of peer review of journal manuscripts or grant proposals, research on the effectiveness and side effects of proposed research practices, and research on the variation in reproducibility and replicability between fields or over time.

What distinguishes geoscience from much of chemistry, biology, and physics is its focus on phenomena that emerge out of uncontrolled natural environments, as well as its special concern with understanding past events documented in the geologic record. Emergent phenomena on a global scale include climate variations at Earth's surface, tectonic motions of its lithospheric plates, and the magnetic field generated in its iron-rich core. The geosystems responsible for these phenomena have been active for billions of years, and the geologic record indicates that many of the terrestrial processes in the distant geologic past were similar to those that are occurring today. Geoscientists seek to understand the geosystems that produced these past behaviors and to draw implications regarding the future of the planet and its human environment. While one cannot replicate geologic events, such as earthquakes or hurricanes, scientific methods are used to generate increasingly accurate forecasts and predictions.

Emergent phenomena from complex natural systems are infinite in their variety; no two events are identical, and in this sense, no event repeats itself. Events can be categorized according to their statistical properties, however, such as the parameters of their space, time, and size distributions. The satisfactory explanation of an emergent phenomenon requires building a geosystem model (usually a numerical code) that can replicate the statistics of the phenomenon by simulating the causal processes and interactions. In this context, replication means achieving sufficient statistical agreement between the simulated and observed phenomena.

Understanding of a geosystem and its defining phenomena is often measured by scientists' ability to replicate behaviors that were previously observed (i.e., retrospective testing) and predict new ones that can be subsequently observed (i.e., prospective testing). These evaluations can be in the form of null-hypothesis significance tests (e.g., expressed in terms of p-values) or in terms of skill scores relative to a prediction baseline (e.g., weather forecasts relative to mean-climate forecasts).

In the study of geosystems, reproducibility and replicability are closely tied to verification and validation. 3 Verification confirms the correctness of the model by checking that the numerical code correctly solves the mathematical equations. Validation is the process of deciding whether a model replicates the data-generating process accurately enough to warrant some specific application, such as the forecasting of natural hazards.

Hazard forecasting is an area of applied geoscience in which the issues of reproducibility and replicability are sharply framed by the operational demands for delivering high-quality information to a variety of users in a timely manner. Federal agencies tasked with providing authoritative hazard information to the public have undertaken substantial programs to improve reproducibility and replicability standards in operational forecasting. The cyberinfrastructure constructed to support operational forecasting also enhances capabilities for exploratory science in geosystems.

Natural hazards—from windstorms, droughts, floods, and wildfires to earthquakes, landslides, tsunamis, and volcanic eruptions—are notoriously difficult to predict because of the scale and complexity of the geosystems that produce them. Predictability is especially problematic for extreme events of low probability but high consequence that often dominate societal risk, such as the “500-year flood” or “2,500-year earthquake.” Nevertheless, across all sectors of society, expectations are rising for timely, reliable predictions of natural hazards based on the best available science. 4 A substantial part of applied geoscience now concerns the scientific forecasting of hazards and their consequences. A forecast is deemed scientific if it meets five criteria:

  • formulated to predict measurable events
  • respectful of physical laws
  • calibrated against past observations
  • as reliable and skillful as practical, given the available information
  • testable against future observations

To account for the unavoidable sources of non-replicability (i.e., the randomness of nature and lack of knowledge about this variability), scientific forecasts must be expressed as probabilities. The goal of probabilistic forecasting is to develop forecasts of natural events that are statistically ideal—the best forecasts possible given the available information. Progress toward this goal requires the iterated development of forecasting models over many cycles of data gathering, model calibration, verification, simulation, and testing.

In some fields, such as weather and hydrological forecasting, the natural cycles are rapid enough and the observations are dense and accurate enough to permit the iterated development of system-level models with high explanatory and predictive power. Through steady advances in data collection and numerical modeling over the past several decades, the skill of the ensemble forecasting models developed and maintained by the weather prediction centers has been steadily improved ( Bauer et al., 2015 ). For example, forecasting skill in the range from 3 to 10 days ahead has been increasing by about 1 day per decade; that is, today's 6-day forecast is as accurate as the 5-day forecast was 10 years ago. This is a familiar illustration of gaining confidence in scientific knowledge without doing repeat experiments.
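The skill scores mentioned above can be illustrated with a short sketch (toy numbers, not real forecasts): a probabilistic forecast is compared with a "mean-climate" baseline using the Brier score, and the skill score measures the fractional improvement over that baseline.

```python
# Toy Brier skill score: forecast probabilities versus a climatology baseline.
import numpy as np

outcomes = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1, 0], dtype=float)  # event occurred?
forecast = np.array([0.8, 0.2, 0.1, 0.7, 0.6, 0.3, 0.2, 0.1, 0.9, 0.4])
baseline = np.full_like(forecast, outcomes.mean())  # constant "mean-climate" forecast

def brier(prob, obs):
    """Mean squared error of probabilistic forecasts (lower is better)."""
    return np.mean((prob - obs) ** 2)

skill = 1.0 - brier(forecast, outcomes) / brier(baseline, outcomes)
print(f"Brier skill score vs. climatology: {skill:.2f} (1 = perfect, 0 = no skill)")
```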

One of the principal tools to gain knowledge about genetic risk factors for disease is a genome-wide association study (GWAS). A GWAS is an observational study of a genome-wide set of genetic variants with the aim of detecting which variants may be associated with the development of a disease, or more broadly, associated with any expressed trait. These studies can be complex to mount, involve massive data collection, and require application of a range of sophisticated statistical methods for correct interpretation.

The community of investigators undertaking GWASs have adopted a series of practices and standards to improve the reliability of their results. These practices include a wide range of activities, such as:

  • efforts to ensure consistency in data generation and extensive quality control steps to ensure the reliability of genotype data;
  • genotype and phenotype harmonization;
  • a push for large sample sizes through the establishment of large international disease consortia;
  • rigorous study design and standardized statistical analysis protocols, including consensus building on controlling for key confounders, such as genetic ancestry/population stratification, the use of stringent criteria to account for multiple testing, and the development of norms for conducting independent replication studies and meta-analyzing multiple cohorts;
  • a culture of large-scale international collaboration and sharing of data, results, and tools, empowered by strong infrastructure support; and
  • an incentive system, which is created to meet scientific needs and is recognized and promoted by funding agencies and journals, as well as grant and paper reviewers, for scientists to perform reproducible, replicable, and accurate research.

For a description of the general approach taken by this community of investigators, see Lin (2018) .
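One of those practices, the stringent correction for multiple testing, comes down to simple arithmetic. The sketch below uses assumed numbers (roughly one million independent common variants is a conventional rule of thumb, not a figure from this report) to show how a Bonferroni-style correction of the usual 0.05 threshold yields the familiar genome-wide significance cutoff of about 5 x 10^-8.

```python
# Sketch of the genome-wide significance threshold used in GWAS.
# The count of independent tests is an assumed, conventional figure.
n_independent_tests = 1_000_000
alpha = 0.05
per_test_threshold = alpha / n_independent_tests   # Bonferroni-style correction
print(f"Genome-wide significance threshold: {per_test_threshold:.0e}")  # 5e-08
```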

The idea that there is a “replication crisis” in psychology has received a good deal of attention in professional and popular media, including The New York Times, The Atlantic, National Review, and Slate. However, there is no consensus within the field on this point. Some researchers believe that the field is rife with lax methods that threaten validity, including low statistical power, failure to distinguish between a priori and a posteriori hypothesis testing, and the potential for p-hacking (e.g., Pashler and Wagenmakers, 2012; Simmons et al., 2011). Other researchers disagree with this characterization and have discussed the costs of what they see as misportraying psychology as a field in crisis, such as the possible chilling effects of such claims on young investigators, an overemphasis on Type I errors (i.e., false positives) at the expense of Type II errors (i.e., false negatives), and the failure to discover important new phenomena (Fanelli, 2018; Fiedler et al., 2012). Yet others have noted that psychology has long been concerned with improving its methodology, and the current discussion of reproducibility is part of the normal progression of science. An analysis of experimenter bias in the 1960s is a good example, especially as it spurred the use of double-blind methods in experiments (Rosenthal, 1979). In this view, the current concerns can be situated within a history of continuing methodological improvements as psychological scientists continue to develop better understanding and implementation of statistical and other methods and reporting practices.

One reason to believe in the fundamental soundness of psychology as a science is that a great deal of useful and reliable knowledge is being produced. Researchers are making numerous replicable discoveries about the causes of human thought, emotion, and behavior ( Shiffrin et al., 2018 ). To give but a few examples, research on human memory has documented the fallibility of eyewitness testimony, leading to the release of many wrongly convicted prisoners ( Loftus, 2017 ). Research on “overjustification” shows that rewarding children can undermine their intrinsic interest in desirable activities ( Lepper and Henderlong, 2000 ). Research on how decisions are framed has found that more people participate in social programs, such as retirement savings or organ donation, when they are automatically enrolled and have to make a decision to leave (i.e., opt out), compared with when they have to make a decision to join (i.e., opt in) ( Jachimowicz et al., 2018 ). Increasingly, researchers and governments are using such psychological knowledge to meet social needs and solve problems, including improving educational outcomes, reducing government waste from ineffective programs, improving people's health, and reducing stereotyping and prejudice ( Walton and Wilson, 2018 ; Wood and Neal, 2016 ).

It is possible that accompanying this progress are lower levels of reproducibility than would be desirable. As discussed throughout this report, no field of science produces perfectly replicable results, but it may be useful to estimate the current level of replicability of published psychology results and ask whether that level is as high as the field believes it needs to be. Indeed, psychology has been at the forefront of empirical attempts to answer this question with large-scale replication projects, in which researchers from different labs attempt to reproduce a set of studies (refer to Table 5-1 in Chapter 5 ).

The replication projects themselves have proved to be controversial, however, generating wide disagreement about the attributes used to assess replication and the interpretation of the results. Some view the results of these projects as cause for alarm. In his remarks to the committee, for example, Brian Nosek observed: “The evidence for reproducibility [replicability] has fallen short of what one might expect or what one might desire.” ( Nosek, 2018 ). Researchers who agree with this perspective offer a range of evidence. 5

First, many of the replication attempts had similar or higher levels of rigor (e.g., sample size, transparency, preregistration) as the original studies, and yet many were not able to reproduce the original results ( Cheung et al., 2016 ; Ebersole et al., 2016a ; Eerland et al., 2016 ; Hagger et al., 2016 ; Klein et al., 2018 ; O'Donnell et al., 2018 ; Wagenmakers et al., 2016 ). Given the high degree of scrutiny on replication studies ( Zwaan et al., 2018 ), it is unlikely that most failed replications are the result of sloppy research practices.

Second, some of the replication attempts have focused specifically on results that have garnered a lot of attention, are taught in textbooks, and are in other ways high profile—results that one might expect have a high chance of being robust. Some of these replication attempts were successful, but many were not (e.g., Hagger et al., 2016 ; O'Donnell et al., 2018 ; Wagenmakers et al., 2016 ).

Third, a number of the replication attempts were collaborative, with researchers closely tied to the original result (e.g., the authors of the original studies or people with a great deal of expertise on the phenomenon) playing an active role in vetting the replication design and procedure ( Cheung et al., 2016 ; Eerland et al., 2016 ; Hagger et al., 2016 ; O'Donnell et al., 2018 ; Wagenmakers et al., 2016 ). This has not consistently led to positive replication results.

Fourth, when potential mitigating factors have been identified for the failures to replicate, these are often speculative and yet to be tested empirically. For example, failures to replicate have been attributed to context sensitivity and that some phenomena are simply more difficult to recreate in another time and place ( Van Bavel et al., 2016 ). However, without prospective empirical tests of this or other proposed mitigating factors, the possibility that the original result is not replicable remains a real possibility.

And fifth, even if a substantial portion (say, one-third) of failures to replicate are false negatives, it would still lead to the conclusion that the replicability of psychology results falls short of the ideal. Thus, to conclude that replicability rates are acceptable (say, near 80%), one would need to have confidence that most failed replications have significant flaws.
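The arithmetic behind that point is easy to make explicit. The sketch below uses illustrative inputs only: a roughly 35-percent observed success rate, as in the 35-of-100 replication result described earlier in this compilation, and the assumed one-third share of false negatives.

```python
# Back-of-the-envelope illustration; all inputs are assumptions, not estimates
# endorsed by the committee.
observed_success = 0.35            # e.g., 35 of 100 attempted replications succeeded
failures = 1.0 - observed_success
false_negative_share = 1.0 / 3.0   # suppose a third of failures are false negatives
implied_replicability = observed_success + false_negative_share * failures
print(f"Implied replicability: {implied_replicability:.0%}")  # about 57%, still < 80%
```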

Others, however, have a quite different view of the results of the replication projects that have been conducted so far and offer their own arguments and evidence. First, some replication projects have found relatively high rates of replication: for example, Klein et al. (2014) replicated 10 of 13 results. Second, some high-profile replication projects (e.g., Open Science Collaboration, 2015 ) may have underestimated the replication rate by failing to correct for errors and by introducing changes in the replications that were not in the original studies (e.g., Bench et al., 2017 ; Etz and Vandekerckhove, 2016 ; Gilbert et al., 2015 ; Van Bavel et al., 2016 ). Moreover, several cases have come to light in which studies failed to replicate because of methodological changes in the replications, rather than problems with the original studies, and when these changes were corrected, the study replicated successfully (e.g., Alogna et al., 2014 ; Luttrell et al., 2017 ; Noah et al., 2018 ). Finally, the generalizability of the replication results is unknown, because no project randomly selected the studies to be replicated, and many were quite selective in the studies they chose to try to replicate.

An unresolved question in any analysis of replicability is what criteria to use to determine success or failure. Meta-analysis across a set of results may be a more promising technique to assess replicability, because it can evaluate moderators of effects as well as uniformity of results. However, meta-analysis may not achieve sufficient power given only a few studies.

Despite opposing views about how to interpret large-scale replication projects, there seems to be an emerging consensus that it is not helpful, or justified, to refer to psychology as being in a state of “crisis.” Nosek put it this way in his comments to the committee: “How extensive is the lack of reproducibility in research results in science and engineering in general? The easy answer is that we don't know. We don't have enough information to provide an estimate with any certainty for any individual field or even across fields in general.” He added, “I don't like the term crisis because it implies a lot of things that we don't know are true.”

Moreover, even if there were a definitive estimate of replicability in psychology, no one knows the expected level of non-replicability in a healthy science. Empirical results in psychology, like science in general, are inherently probabilistic, meaning that some failures to replicate are inevitable. As we stress throughout this report, innovative research will likely produce inconsistent results as it pushes the boundaries of knowledge. Ambitious research agendas that, for example, link brain to behavior, genetic to environmental influences, computational models to empirical results, and hormonal fluctuations to emotions necessarily yield some dead ends and failures. In short, some failures to replicate can reflect normal progress in science, and they can also highlight a lack of theoretical understanding or methodological limitations.

Whatever the extent of the problem, scientific methods and data analytic techniques can always be improved, and this discussion follows a long tradition in psychology of methodological innovation. New practices, such as checks on the efficacy of experimental manipulations, are now accepted in the field. Funding proposals now include power analyses as a matter of course. Longitudinal studies no longer just note attrition (i.e., participant dropout), but instead routinely estimate its effects (e.g., intention-to-treat analyses). At the same time, not all researchers have adopted best practices, sometimes failing to keep pace with current knowledge ( Sedlmeier and Gigerenzer, 1989 ). Only recently are researchers starting to systematically use power calculations in research reports or to provide online access to data and materials. Pressures on researchers to improve practices and to increase transparency have been heightened in the past decade by new developments in information technology that increase public access to information and scrutiny of science ( Lupia, 2017 ).

  • SOCIAL SCIENCE RESEARCH USING BIG DATA

With close to 7 in 10 Americans now using social media as a regular news source ( Pew, 2018 ), social scientists in communication research, psychology, sociology, and political science routinely analyze a variety of information disseminated on commercial social media platforms, such as Twitter and Facebook, how that information flows through social networks, and how it influences attitudes and behaviors.

Analyses of data from these commercial platforms may rely on publicly available data that can be scraped and collected by any researcher without input from or collaboration with industry partners (model 1). Alternatively, industry staff may collaborate with researchers and provide access to proprietary data for analysis (such as code or underlying algorithms) that may not be made available to others (model 2). Variations on these two basic models will depend on the type of intellectual property being used in the research.

Both models raise challenges for reproducibility and replicability. In terms of reproducibility, when data are proprietary and undisclosed, the computation by definition is not reproducible by others. This might put this kind of research at odds with publication requirements of journals and other academic outlets. An inability to publish results from such industry partnerships may in the long term create a disincentive for work on datasets that cannot be made publicly available and increase pressure from within the scientific community on industry partners for more openness. This process may be accelerated if funding agencies only support research that follows the standards for full documentation and openness detailed in this report.

Both models also raise issues with replicability. Social media platforms, such as Twitter and Facebook, regularly modify their application programming interfaces (APIs) and other modalities of data access, which influences the ability of researchers to access, document, and archive data consistently. In addition, data are likely confounded by ongoing A/B testing 6 and tweaks to underlying algorithms. In model 1, these confounds are not transparent to researchers and therefore cannot be documented or controlled for in the original data collections or attempts to replicate the work. In model 2, they are known to the research team, but because they are proprietary they cannot be shared publicly. In both models, changes implemented by social media platforms in algorithms, APIs, and other internal characteristics over time make it impossible to computationally reproduce analytic models and to have confidence that equivalent data for reproducibility can be collected over time.
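As the footnote on A/B testing later in this chapter notes, such a test is just a randomized experiment with two variants evaluated by a two-sample hypothesis test. A minimal sketch of that comparison, using simulated rather than real platform data (all values hypothetical), looks like this:

```python
# Hypothetical A/B comparison: simulated engagement times for two interface variants,
# evaluated with a two-sample (Welch's) t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
variant_a = rng.normal(loc=5.0, scale=2.0, size=500)  # e.g., minutes on site, variant A
variant_b = rng.normal(loc=5.3, scale=2.0, size=500)  # variant B

t_stat, p_value = stats.ttest_ind(variant_a, variant_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Because platforms run many such experiments continuously, the data a researcher collects at one time may not be comparable to data collected later, which is the confound described above.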

In summary, the considerations for social science using big data of the type discussed above illustrate a spectrum of challenges and approaches toward gaining confidence in scientific studies. In these and other scientific domains, science progresses through growing consensus in the scientific community of what counts as scientific knowledge. At the same time, public trust in science is premised on public confidence in the ability of scientists to demonstrate and validate what they assert is scientific knowledge.

In the examples above, diverse fields of science have developed methods for investigating phenomena that are difficult or impossible to replicate. Yet, as in the case of hazard prediction, scientific progress has been made as evidenced by forecasts with increased accuracy. This progress is built from the results of many trials and errors. Differentiating a success from a failure of a single study cannot be done without looking more broadly at the other lines of evidence. As noted by Goodman and colleagues (2016 , p. 3): “[A] preferred way to assess the evidential meaning of two or more results with substantive stochastic variability is to evaluate the cumulative evidence they provide.”

CONCLUSION 7-2: Multiple channels of evidence from a variety of studies provide a robust means for gaining confidence in scientific knowledge over time. The goal of science is to understand the overall effect or inference from a set of scientific studies, not to strictly determine whether any one study has replicated any other.

  • PUBLIC PERCEPTIONS OF REPRODUCIBILITY AND REPLICABILITY

The congressional mandate that led to this study expressed the view that “there is growing concern that some published research results cannot be replicated, which can negatively affect the public's trust in science.” The statement of task for this report reflected this concern, asking the committee to “consider if the lack of replicability and reproducibility impacts . . . the public's perception” of science (refer to Box 1-1 in Chapter 1 ). This committee is not aware of any data that have been collected that specifically address how non-reproducibility and non-replicability have affected the public's perception of science. However, there are data about topics that may shed some light on how the public views these issues. These include data about the public's understanding of science, the public's trust in science, and the media's coverage of science.

Public Understanding of Science

When examining public understanding of science for the purposes of this report, at least four areas are particularly relevant: factual knowledge, understanding of the scientific process, awareness of scientific consensus, and understanding of uncertainty.

Factual knowledge about scientific terms and concepts in the United States has been fairly stable in recent years. In 2016, Americans correctly answered an average of 5.6 of the 9 true-or-false or multiple-choice items asked on the Science & Engineering Indicators surveys. This number was similar to the averages from data gathered over the past decade. In other words, there is no indication that knowledge of scientific facts and terms has decreased in recent years. It is clear from the data, however, that “factual knowledge of science is strongly related to individuals' level of formal schooling and the number of science and mathematics courses completed” ( National Science Foundation, 2018e , p. 7-35).

Americans' understanding of the scientific process is mixed. The Science & Engineering Indicators surveys ask respondents about their understanding of three aspects related to the scientific process. In 2016, 64 percent could correctly answer two questions related to the concept of probability, 51 percent provided a correct description of a scientific experiment, and 23 percent were able to describe the idea of a scientific study. While these numbers have not been declining over time, they nonetheless indicate relatively low levels of understanding of the scientific process and suggest an inability of “[m]any members of the public . . . to differentiate a sound scientific study from a poorly conducted one and to understand the scientific process more broadly” ( Scheufele, 2013 , p. 14041).

Another area in which the public lacks a clear understanding of science is the idea of scientific consensus on a topic. There are widespread perceptions that no scientific consensus has emerged in areas that are supported by strong and consistent bodies of research. In a 2014 U.S. survey (Funk and Raine, 2015, p. 8), for instance, two-thirds of respondents (67%) thought that scientists did “not have a clear understanding about the health effects of GM [genetically modified] crops,” in spite of more than 1,500 peer-refereed studies showing that there is no difference between genetically modified and traditionally grown crops in terms of their health effects for human consumption (National Academies of Sciences, Engineering, and Medicine, 2016a). Similarly, even though there is broad consensus among scientists, one-half of Americans (52%) thought “scientists are divided” on whether the universe was created in a single, violent event often called the big bang, and about one-third thought that scientists are divided on the human causes of climate change (37%) and on evolution (29%).

For the fourth area, the public's understanding about uncertainty, its role in scientific inquiry, and how uncertainty ought to be evaluated, research is sparse. Some data are available on uncertainties surrounding public opinion poll results. In a 2007 Harris interactive poll, 7 for instance, only about 1 in 10 Americans (12%) could correctly identify the source of error quantified by margin-of-error estimates. Yet slightly more than one-half (52%) agreed that pollsters should use the phrase “margin of error” when reporting on survey results.
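For readers who want the arithmetic behind that survey question: the margin of error quantifies random sampling variation around a reported proportion at a stated confidence level. A minimal sketch with hypothetical poll numbers:

```python
# Hypothetical poll: 52% of 1,000 respondents agree with a statement.
# The 95% margin of error reflects random sampling variation only,
# not nonresponse, question wording, or other survey errors.
import math

p = 0.52   # observed proportion (hypothetical)
n = 1000   # sample size (hypothetical)

margin = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"margin of error: +/- {100 * margin:.1f} percentage points")
```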

Some research has shown that scientists believe that the public is unable to understand or contend with uncertainty in science ( Besley and Nisbet, 2013 ; Davies, 2008 ; Ecklund et al., 2012 ) and that providing information related to uncertainty creates distrust, panic, and confusion ( Frewer et al., 2003 ). However, people appear to expect some level of uncertainty in scientific information, and seem to have a relatively high tolerance for scientific uncertainty ( Howell, 2018 ). Currently, research is being done to explore how best to communicate uncertainties to the public and how to help people accurately process uncertain information.

Public Trust in Science

Despite a sometimes shaky understanding of science and the scientific process, the public continues largely to trust the scientific community. In its biennial Science & Engineering Indicators reports, the National Science Board (National Science Foundation, 2018e) tracks public confidence in a range of institutions (see Figure 7-1). Over time, trust in science has remained stable—in contrast to other institutions, such as Congress, major corporations, and the press, which have all shown significant declines in public confidence over the past 50 years. Science has been eclipsed in public confidence only by the military, during Operation Desert Storm in the early 1990s and since the 9/11 terrorist attacks.

FIGURE 7-1 Levels of public confidence in selected U.S. institutions over time. SOURCE: National Science Foundation (2018e, Figure 7-16) and General Social Survey (2018 data from http://gss.norc.org/Get-The-Data).

In the most recent iteration of the Science & Engineering Indicators surveys ( National Science Foundation, 2018e ), almost 9 in 10 (88%) Americans also “strongly agreed” or “agreed” with the statement that “[m]ost scientists want to work on things that will make life better for the average person.” A similar proportion (89%) “strongly agreed” or “agreed” that “[s]cientific researchers are dedicated people who work for the good of humanity.” Even for potentially controversial issues, such as climate change, levels of trust in scientists as information sources remain relatively high, with 71 percent in a 2015 Yale University Project on Climate Change survey saying that they trust climate scientists “as a source of information about global warming,” compared with 60 percent trusting television weather reporters as information sources, and 41 percent trusting mainstream news media. Controversies around scientific conduct, such as “climategate,” have not led to significant shifts in public trust. In fact, “more than a decade of public opinion research on global warming . . . [shows] that these controversies . . . had little if any measurable impact on relevant opinions of the nation as a whole” ( MacInnis and Krosnick, 2016 , p. 509).

In recent years, some scholars have raised concerns that unwarranted attention on emerging areas of science can lead to misperceptions or even declining trust among public audiences, especially if science is unable to deliver on early claims or subsequent research fails to replicate initial results ( Scheufele, 2014 ). Public opinion surveys show that these concerns are not completely unfounded. In national surveys, one in four Americans (27%) think that it is a “big problem” and almost one-half of Americans (47%) think it is at least a “small problem” that “[s]cience researchers overstate the implications of their research”; only one in four (24%) see no problem ( Funk et al., 2017 ). In other words, “science may run the risk of undermining its position in society in the long term if it does not navigate this area of public communication carefully and responsibly” ( Scheufele and Krause, 2019 , p. 7667).

Media Coverage of Science

The concerns noted above are exacerbated by the fact that the public's perception of science—and of reproducibility and replicability issues—is heavily influenced by the media's coverage of science. News reporting is an inherently event-driven profession. Research on news values (Galtung and Ruge, 1965) and journalistic norms (Shoemaker and Reese, 1996) has shown that rare, unexpected, or novel events and topics are much more likely to be covered by news media than recurring or seemingly routine issues. As a result, scientific news coverage often tends to favor articles about single-study, breakthrough results over stories that summarize cumulative evidence, describe the process of scientific discovery, or distinguish among the systemic, application-focused, and intrinsic uncertainties surrounding science, as discussed throughout this report. In addition to being event driven, news is also subject to audience demand. Experimental studies have demonstrated that respondents prefer conflict-laden debates over deliberative exchanges (Mutz and Reeves, 2005). Audience demand may drive news organizations to cover scientific stories that emphasize conflict—for example, studies that contradict previous work—rather than reporting on studies that support the consensus view or make incremental additions to existing knowledge.

In addition to what is covered by the media, there are also concerns about how the media cover scientific stories. There is some evidence that media stories contain exaggerations or make causal statements or inferences that are not warranted when reporting on scientific studies. For example, a study that looked at journal articles, press releases about these articles, and the subsequent news stories found that more than one-third of press releases contained exaggerated advice, causal claims, or inferences from animals to humans ( Sumner et al., 2016 ). When the press release contained these exaggerations, the news stories that followed were far more likely also to contain exaggerations in comparison with news stories based on press releases that did not exaggerate.

Public confidence in science journalism reflects this concern about coverage, with 73 percent of Americans saying that the “biggest problem with news about scientific research findings is the way news reporters cover it,” and 43 percent saying it is a “big problem” that the news media are “too quick to report research findings that may not hold up” ( Funk et al., 2017 ). Implicit in discussions of sensationalizing and exaggeration of research results is the concept of uncertainty. While scientific publications almost always include at least a brief discussion of the uncertainty in the results—whether presented in error bars, confidence intervals, or other metrics—this discussion of uncertainty does not always make it into news stories. When results are presented without the context of uncertainty, it can contribute to the perception of hyping or exaggerating a study's results.

In recent years, the term “replication crisis” has been used in both academic writing (e.g., Shrout and Rodgers, 2018 ) and in the mainstream media (see, e.g., Yong, 2016 ), despite a lack of reliable data about the existence of such a “crisis.” Some have raised concerns that highly visible instances of media coverage of the issue of replicability and reproducibility have contributed to a larger narrative in public discourse around science being “broken” ( Jamieson, 2018 ). The frequency and prominence with which an issue is covered in the media can influence the perceived importance among audiences about that issue relative to other topics and ultimately how audiences evaluate actors in their performance on the issue ( National Academies of Sciences, Engineering, and Medicine, 2016b ). However, large-scale analyses suggest that widespread media coverage of the issue is not the case. A preliminary analysis of print and online news outlets, for instance, shows that overall media coverage on reproducibility and replicability remains low, with fewer than 200 unique, on-topic articles captured for a 10-year period, from June 1, 2008, to April 30, 2018 ( Howell, 2018 ). Thus, there is currently limited evidence that media coverage of a replication crisis has significantly influenced public opinion.

Scientists also bear some responsibility for misrepresentation in the public's eye: many members of the public believe that scientists overstate the implications of their research. The purported existence of a replication crisis has been reported in several high-profile articles in the mainstream media; however, overall coverage remains low, and it is unclear whether the issue has registered with the general population.

CONCLUSION 7-3: Based on evidence from well-designed and long-standing surveys of public perceptions, the public largely trusts scientists. Understanding of the scientific process and methods has remained stable over time, though is not widespread. The National Science Foundation's most recent Science & Engineering Indicators survey shows that 51 percent of Americans understand the logic of experiments and 23 percent understand the idea of a scientific study.

As discussed throughout this report, uncertainty is an inherent part of science. Unfortunately, while people show some tolerance for uncertainty in science, it is often not well communicated by researchers or the media. There is, however, a large and growing body of research outlining evidence-based approaches for scientists to more effectively communicate different dimensions of scientific uncertainty to nonexpert audiences (for an overview, see Fischhoff and Davis, 2014 ). Similarly, journalism teachers and scholars have long examined how journalists cover scientific uncertainty (e.g., Stocking, 1999 ) and best practices for communicating uncertainty in science news coverage (e.g., Blum et al., 2005 ).

Broader trends in how science is promoted and covered in modern news environments may indirectly influence public trust in science related to replicability and reproducibility. Examples include concerns about hyperbolic claims in university press releases (for a summary, see Weingart, 2017) and false balance in reporting, especially when scientific topics are covered by nonscience journalists: in these cases, the established scientific consensus around issues such as climate change is put on equal footing with nonfactual claims by nonscientific organizations or interest groups for the sake of “showing both sides” (Boykoff and Boykoff, 2004).

RECOMMENDATION 7-1: Scientists should take care to avoid overstating the implications of their research and also exercise caution in their review of press releases, especially when the results bear directly on matters of keen public interest and possible action.

RECOMMENDATION 7-2: Journalists should report on scientific results with as much context and nuance as the medium allows. In covering issues related to replicability and reproducibility, journalists should help their audiences understand the differences between non-reproducibility and non-replicability due to fraudulent conduct of science and instances in which the failure to reproduce or replicate may be due to evolving best practices in methods or inherent uncertainty in science. Particular care in reporting on scientific results is warranted when

  • the scientific system under study is complex, with limited control over alternative explanations or confounding influences;
  • a result is particularly surprising or at odds with existing bodies of research;
  • the study deals with an emerging area of science that is characterized by significant disagreement or contradictory results within the scientific community; and
  • research involves potential conflicts of interest, such as work funded by advocacy groups, affected industry, or others with a stake in the outcomes.

Finally, members of the public and policy makers have a role to play in improving reproducibility and replicability. When reports of a new discovery appear in the media, one needs to ask about the uncertainties associated with the results and about what other evidence exists against which the discovery might be weighed.

RECOMMENDATION 7-3: Anyone making personal or policy decisions based on scientific evidence should be wary of making a serious decision based on the results, no matter how promising, of a single study. Similarly, no one should take a new, single contrary study as refutation of scientific conclusions supported by multiple lines of previous evidence.

For an overview of Cochrane, see http://www.cochrane.org.

In the broad area of health care research, for example, this approach has been adopted by Cochrane, an international group for systematic reviews, and by U.S. government organizations such as the Agency for Healthcare Research and Quality and the U.S. Preventive Services Task Force.

The meanings of the terms verification and validation, like reproducibility and replicability, differ among fields. Here we conform to the usage in computer and information science. In weather forecasting, a model is verified by its agreement with data—what is here called validation.

For example, the 2015 Paris Agreement adopted by the U.N. Framework Convention on Climate Change specifies that “adaptation action . . . should be based on and guided by the best available science.” And the California Earthquake Authority is required by law to establish residential insurance rates that are based on “the best available science” ( Marshall, 2018 , p. 106).

For a list of replication studies in psychology, see http://curatescience.org/#replicationssection.

A/B testing is a randomized experiment with two variants that includes application of statistical hypothesis testing or “two-sample hypothesis testing” as used in the field of statistics.

See https://theharrispoll.com/wp-content/uploads/2017/12/Harris-Interactive-Poll-ResearchMargin-of-Error-2007-11.pdf.

SOURCE: National Academies of Sciences, Engineering, and Medicine. Reproducibility and Replicability in Science. Washington (DC): National Academies Press (US); 2019 May 7. Chapter 7, Confidence in Science.

Stanford University


Rigorous research practices improve scientific replication

Science has suffered a crisis of replication—too few scientific studies can be repeated by peers. A new study from Stanford and three leading research universities shows that using rigorous research practices can boost the replication rate of studies.

Science has a replication problem. In recent years, it has come to light that the findings of many studies, particularly those in social psychology, cannot be reproduced by other scientists. When this happens, the data, methods, and interpretation of the study’s results are often called into question, creating a crisis of confidence.

“When people don’t trust science, that’s bad for society,” said Jon Krosnick, the Frederic O. Glover Professor of Humanities and Social Sciences in the Stanford School of Humanities and Sciences. Krosnick is one of four co-principal investigators on a study that explored ways scientists in fields ranging from physics to psychology can improve the replicability of their research. The study, published Nov. 9 in Nature Human Behaviour, found that using rigorous methodology can yield near-perfect rates of replication.


“Replicating others’ scientific results is fundamental to the scientific process,” Krosnick argues. According to a paper published in 2015 in Science, fewer than half of the findings of psychology studies could be replicated—and only 30 percent of findings in the field of social psychology. Such findings “damage the credibility of all scientists, not just those whose findings cannot be replicated,” Krosnick explained.

Publish or perish

“Scientists are people, too,” said Krosnick, who is a professor of communication and of political science in H&S and of social sciences in the Stanford Doerr School of Sustainability. “Researchers want to make their funders happy and to publish head-turning results. Sometimes, that inspires researchers to make up or misrepresent data.

“Almost every day, I see a new story about a published study being retracted—in physics, neuroscience, medicine, you name it. Showing that scientific findings can be replicated is the only pathway to solving the credibility problem.”

Accordingly, Krosnick added, the publish-or-perish environment creates the temptation to fake data, or to analyze and reanalyze data with various methods until a desired but spurious result finally pops out—a practice known as p-hacking.


In an effort to assess the true potential of rigorous social science findings to be replicated, Krosnick’s lab at Stanford and labs at the University of California, Santa Barbara; the University of Virginia; and the University of California, Berkeley set out to discover new experimental effects using best practices and to assess how often they could be reproduced. The four teams attempted to replicate the results of 16 studies using rigor-enhancing practices.

“The results reassure me that painstakingly rigorous methods pay off,” said Bo MacInnis , a Stanford lecturer and study co-author whose research on political communication was conducted under the parameters of the replicability study. “Scientific researchers can effectively and reliably govern themselves in a way that deserves and preserves the public’s highest trust.”

Matthew DeBell, director of operations at the American National Election Studies program at the Stanford Institute for Research in the Social Sciences, is also a co-author.

“The quality of scientific evidence depends on the quality of the research methods,” DeBell said. “Research findings do hold up when everything is done as well as possible, underscoring the importance of adhering to the highest standards in science.”


Transparent methods

In the end, the team found that when four “rigor-enhancing” practices were implemented, the replication rate was almost 90 percent. Although the recommended steps place additional burdens on the researchers, those practices are relatively straightforward and not particularly onerous.

These practices call for researchers to run confirmatory tests on their own studies to corroborate results prior to publication. Data should be collected from a sufficiently large sample of participants. Scientists should preregister all studies, committing to the hypotheses to be tested and the methods to be used to test them before data are collected, to guard against p-hacking. And researchers must fully document their procedures to ensure that peers can precisely repeat them.
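As one concrete illustration of the “sufficiently large sample” practice, researchers often run an a priori power analysis before collecting data. The sketch below is hypothetical (the target effect size, power, and significance level are assumptions, not values from the study) and uses statsmodels for the calculation:

```python
# Hypothetical a priori power analysis for a two-group comparison:
# how many participants per group are needed to detect a standardized
# effect of d = 0.3 with 90% power at alpha = 0.05?
import math
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.3, alpha=0.05, power=0.9, alternative="two-sided"
)
print(f"required participants per group: {math.ceil(n_per_group)}")
```

Preregistering such a calculation, along with the hypotheses and analysis plan, is part of what guards against p-hacking.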

The four labs conducted original research using these recommended rigor-enhancing practices. Then they submitted their work to the other labs for replication. Overall, of the 16 studies produced by the four labs during the five-year project, replication was successful in 86 percent of the attempts.

“The bottom line in this study is that when science is done well, it produces believable, replicable, and generalizable findings,” Krosnick said. “What I and the other authors of this study hope will be the takeaway is a wake-up call to other disciplines to doubt their own work, to develop and adopt their own best practices, and to change how we all publish by building in replication routinely. If we do these things, we can restore confidence in the scientific process and in scientific findings.”

Acknowledgements

Krosnick is also a professor, by courtesy, of psychology in H&S. Additional authors include lead author John Protzko of Central Connecticut State University; Leif Nelson, a principal investigator from the University of California, Berkeley; Brian Nosek, a principal investigator from the University of Virginia; Jordan Axt of McGill University; Matt Berent of Matt Berent Consulting; Nicholas Buttrick and Charles R. Ebersole of the University of Virginia; Sebastian Lundmark of the University of Gothenburg, Gothenburg, Sweden; Michael O’Donnell of Georgetown University; Hannah Perfecto of Washington University, St. Louis; James E. Pustejovsky of the University of Wisconsin, Madison; Scott Roeder of the University of South Carolina; Jan Walleczek of the Fetzer Franklin Fund; and senior author and project principal investigator Jonathan Schooler of the University of California, Santa Barbara.

This research was funded by the Fetzer Franklin Fund of the John E. Fetzer Memorial Trust.

Competing Interests

Nosek is the executive director of the nonprofit Center for Open Science. Walleczek was the scientific director of the Fetzer Franklin Fund that sponsored this research, and Nosek was on the fund’s scientific advisory board. Walleczek made substantive contributions to the design and execution of this research but as a funder did not have controlling interest in the decision to publish or not. All other authors declared no conflicts of interest.



Hidden Brain

Scientific findings often fail to be replicated, researchers say.


Shankar Vedantam

A massive effort to test the validity of 100 psychology experiments finds that more than 50 percent of the studies fail to replicate. This is based on a new study published in the journal "Science."

STEVE INSKEEP, HOST:

People seeking answers to science questions face a constant reality. Many science experiments come up with fascinating results. But the results cannot be replicated as often as you'd think. David Greene spoke with NPR's Shankar Vedantam.

DAVID GREENE, HOST:

So here's the deal. Researchers recently tried to replicate a hundred experiments in psychology that were published in three leading journals. And Shankar's here to talk about that. Shankar, what did they find?

SHANKAR VEDANTAM, BYLINE: They found something very disappointing, David. Nearly two-thirds of the experiments did not replicate, meaning that scientists repeated these studies but could not obtain the results that were found by the original research team.

GREENE: Two-thirds of these original studies, which presumably at least some of them drew some attention, actually turned out to be false when this replication was tried.

VEDANTAM: Yeah, so calling them false is one explanation, David. In fact, there have been some really big scandals recently where researchers have been found to have fabricated the evidence in data. So that's, you know, one possibility. But I was speaking with Brian Nosek. He is a psychologist at the University of Virginia. He organized this massive new replication effort. He offered a more nuanced way to think about the findings.

BRIAN NOSEK: Our best methodologies to try to figure out truth mostly reveal to us that figuring out truth is really hard. And we're going to get contradictions. One year, we're going to learn that coffee is good for us. The next year, we're going to learn that it's bad for us. The next year, we're going to learn we don't know.

VEDANTAM: When you fail to reproduce a result, David, you know, the first thing we think about is that OK, this means the first study was wrong. But there are other explanations. It could be the second study was wrong. It could be that they're both wrong. Nosek said it's also possible that both studies are actually right. To use his example, maybe coffee has effects only under certain conditions. When you meet those conditions, you see an effect. When you don't meet those conditions, you don't see an effect. So Nosek says when we can't reproduce a study, it's a sign of uncertainty, not a sign of untrustworthiness. It's a signal there's something going on that we don't understand.

GREENE: Well, Shankar, how do scientists respond when their work is checked and, in some cases, disproven?

VEDANTAM: You know, they respond defensively, David. And perhaps it's not surprising. Nosek told me that one of his own studies was tested for replication, and the replication didn't work. I asked him how he felt about his earlier work being shot down.

NOSEK: We are invested in our findings because they feel like personal possessions, right? I discovered that. I'm proud of it. I have some motivation to even feel like I should defend it. But of course, all of those things are not the scientific ideal. There isn't really an easy way to not feel bad about those things because we're human. And these are the contributions that we as individual scientists make.

GREENE: You know, Shankar, if this is a healthy process for scientists to be constantly checking one another, I mean, one problem I see is that doesn't happen that often because most scientists want to sort of be doing original research and not spending a career looking at other research and trying to see if it was right or not.

VEDANTAM: That's exactly right, David. That's one of the big goals that Nosek is focusing on with this new initiative. Research journals also have a big incentive to publish new findings, not necessarily to publish reproductions of earlier findings. Many science organizations are trying to figure out ways to change the incentives so that both researchers and science journals publish more reproductions of earlier work, including results that are mixed or confusing.

GREENE: OK, I mean, this is all well and good if we're to understand that each time a new study comes out, maybe we should view it as sort of an ongoing search for the truth. But does that mean that when we see a big headline about some study, we should just ignore it?

VEDANTAM: Well, this is the problem, David. I think many of us look to science to provide us with answers and certainty when science really is in the business of producing questions and producing more uncertainty. You know, as I was listening to Nosek talk about science, David, I realized there are parallels between the practice of science and the practice of what we do as journalists. You know, we paint a picture of the world every day, whether that's a war zone or financial markets. But we're always doing it in the context of imperfect information. And especially when we're covering things we don't know much about - you know, a big breaking story, what we discover in the first few days is likely to get revised down the road. Now, you can throw up your hands and say, let's not waste time reading or listening to the first draft of history. Let me just wait a month or a year for the whole picture to emerge. But I think most people would say the best information is still valuable, even if it's going to get updated tomorrow. We need to think about scientific studies the same way.

GREENE: Shankar, thanks, as always.

INSKEEP: A new study finds that that is our social science correspondent, Shankar Vedantam, talking with David Greene on MORNING EDITION from NPR News.

Copyright © 2015 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.


Understanding Science

How science REALLY works...

  • Understanding Science 101
  • Scientists aim for their studies to be replicable — meaning that another researcher could perform a similar investigation and obtain the same basic results.
  • When a study cannot be replicated, it suggests that our current understanding of the study system or our methods of testing are insufficient.

Copycats in science: The role of replication

Scientists aim for their studies’ findings to be replicable — so that, for example, an experiment testing ideas about the attraction between electrons and protons should yield the same results when repeated in different labs. Similarly, two different researchers studying the same dinosaur bone in the same way should come to the same conclusions regarding its measurements and composition—though they may interpret that evidence differently (e.g., regarding what it means about dinosaur growth patterns). This goal of replicability makes sense. After all, science aims to reconstruct the unchanging rules by which the universe operates, and those same rules apply, 24 hours a day, seven days a week, from Sweden to Saturn, regardless of who is studying them. If a finding can’t be replicated, it suggests that our current understanding of the study system or our methods of testing are insufficient.

Does this mean that scientists are constantly repeating what others before them have already done? No, of course not — or we would never get anywhere at all. The process of science doesn’t require that every experiment and every study be repeated, but many are, especially those that produce surprising or particularly important results. In some fields, it is standard procedure for a scientist to replicate his or her own results before publication in order to ensure that the findings were not due to some fluke or factors outside the experimental design.

The desire for replicability is part of the reason that scientific papers almost always include a methods section, which describes exactly how the researchers performed the study. That information allows other scientists to replicate the study and to evaluate its quality, helping ensure that occasional cases of fraud or sloppy scientific work are weeded out and corrected.

  • Science in action

When a finding can’t be replicated, it spells trouble for the idea supported by that piece of evidence. Lack of replicability has challenged the idea of cold fusion. Read the full story:  Cold fusion: A case study for scientific behavior .


Science News

A massive 8-year effort finds that much cancer research can’t be replicated.

Unreliable preclinical studies could impede drug development later on


An effort to replicate nearly 200 preclinical cancer experiments that generated buzz from 2010 to 2012 found that only about a quarter could be reproduced. Prostate cancer cells are shown in this artist’s illustration.

Dr_Microbe/iStock/Getty Images Plus


By Tara Haelle

December 7, 2021 at 8:00 am

After eight years, a project that tried to reproduce the results of key cancer biology studies has finally concluded. And its findings suggest that like research in the social sciences, cancer research has a replication problem.

Researchers with the Reproducibility Project: Cancer Biology aimed to replicate 193 experiments from 53 top cancer papers published from 2010 to 2012. But only a quarter of those experiments were able to be reproduced , the team reports in two papers published December 7 in eLife .

The researchers couldn’t complete the majority of the experiments because the team couldn’t gather enough information from the original papers or their authors about the methods used, or obtain the materials needed to attempt replication.

What’s more, of the 50 experiments from 23 papers that were reproduced, effect sizes were, on average, 85 percent lower than those reported in the original experiments. Effect sizes indicate how big the effect found in a study is. For example, two studies might find that a certain chemical kills cancer cells, but the chemical kills 30 percent of cells in one experiment and 80 percent of cells in a different experiment. The first experiment has less than half the effect size seen in the second one. 

The team also measured if a replication was successful using five criteria. Four focused on effect sizes, and the fifth looked at whether both the original and replicated experiments had similarly positive or negative results, and if both sets of results were statistically significant. The researchers were able to apply those criteria to 112 tested effects from the experiments they could reproduce. Ultimately, just 46 percent, or 51, met more criteria than they failed, the researchers report.
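A minimal sketch of the fifth criterion described above, which asks whether the original and replicated results point in the same direction and are both statistically significant (the effect sizes and p-values in the example calls are hypothetical):

```python
# Check one replication criterion: same direction of effect and both
# results statistically significant at the chosen alpha level.
def direction_and_significance_match(orig_effect, orig_p, rep_effect, rep_p, alpha=0.05):
    same_direction = (orig_effect > 0) == (rep_effect > 0)
    both_significant = (orig_p < alpha) and (rep_p < alpha)
    return same_direction and both_significant

# Hypothetical examples:
print(direction_and_significance_match(0.80, 0.001, 0.30, 0.030))   # True: smaller but consistent
print(direction_and_significance_match(0.80, 0.001, -0.10, 0.400))  # False: direction flipped
```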

“The report tells us a lot about the culture and realities of the way cancer biology works, and it’s not a flattering picture at all,” says Jonathan Kimmelman, a bioethicist at McGill University in Montreal. He coauthored a commentary on the project exploring the ethical aspects of the findings.

It’s worrisome if experiments that cannot be reproduced are used to launch clinical trials or drug development efforts, Kimmelman says. If it turns out that the science on which a drug is based is not reliable, “it means that patients are needlessly exposed to drugs that are unsafe and that really don’t even have a shot at making an impact on cancer,” he says.

At the same time, Kimmelman cautions against overinterpreting the findings as suggesting that the current cancer research system is broken. “We actually don’t know how well the system is working,” he says. One of the many questions left unresolved by the project is what an appropriate rate of replication is in cancer research, since replicating all studies perfectly isn’t possible. “That’s a moral question,” he says. “That’s a policy question. That’s not really a scientific question.”

The overarching lessons of the project suggest that substantial inefficiency in preclinical research may be hampering the drug development pipeline later on, says Tim Errington, who led the project. He is the director of research at the Center for Open Science in Charlottesville, Va., which cosponsored the research.

As many as 14 out of 15 cancer drugs that enter clinical trials never receive approval from the U.S. Food and Drug Administration. Sometimes that’s because the drugs lack commercial potential, but more often it is because they do not show the level of safety and effectiveness needed for licensure.


Much of that failure is expected. “We’re humans trying to understand complex disease, we’re never going to get it right,” Errington says. But given the cancer reproducibility project’s findings, perhaps “we should have known that we were failing earlier, or maybe we don’t understand actually what’s causing [an] exciting finding,” he says.

Still, it’s not that failure to replicate means that a study was wrong or that replicating it means that the findings are correct, says Shirley Wang, an epidemiologist at Brigham and Women’s Hospital in Boston and Harvard Medical School. “It just means that you’re able to reproduce,” she says, a point that the reproducibility project also stresses.

Scientists still have to evaluate whether a study’s methods are unbiased and rigorous, says Wang, who was not involved in the project but reviewed its findings. And if the results of original experiments and their replications do differ, it’s a learning opportunity to find out why and the implications, she adds.

Errington and his colleagues have reported on subsets of the cancer reproducibility project’s findings before , but this is the first time that the effort’s entire analysis has been released ( SN: 1/18/17 ).

During the project, the researchers faced a number of obstacles, particularly that none of the original experiments included enough details in their published studies about methods to attempt reproduction. So the reproducibility researchers contacted the studies’ authors for additional information.

While authors for 41 percent of the experiments were extremely or very helpful, authors for another third of the experiments did not reply to requests for more information or were not otherwise helpful, the project found. For example, one of the experiments that the group was unable to replicate required the use of a mouse model specifically bred for the original experiment. Errington says that the scientists who conducted that work refused to share some of these mice with the reproducibility project, and without those rodents, replication was impossible.


Some researchers were outright hostile to the idea that independent scientists wanted to attempt to replicate their work, says Brian Nosek, executive director at the Center for Open Science and a coauthor on both studies. That attitude is a product of a research culture that values innovation over replication, and that prizes the academic publish-or-perish system over cooperation and data sharing, Nosek says.

Some scientists may feel threatened by replication because it is uncommon. “If replication is normal and routine, people wouldn’t see it as a threat,” Nosek says. But replication may also feel intimidating because scientists’ livelihoods and even identities are often so deeply rooted in their findings, he says. “Publication is the currency of advancement, a key reward that turns into chances for funding, chances for a job and chances for keeping that job,” Nosek says. “Replication doesn’t fit neatly into that rewards system.”

Even authors who wanted to help couldn’t always share their data for various reasons, including lost hard drives or intellectual property restrictions or data that only former graduate students had.

Calls from some experts about science’s “ reproducibility crisis ” have been growing for years, perhaps most notably in psychology (SN: 8/27/18 ) . Then in 2011 and 2012, pharmaceutical companies Bayer and Amgen reported difficulties in replicating findings from preclinical biomedical research.

But not everyone agrees on solutions, including whether replication of key experiments is actually useful or possible , or even what exactly is wrong with the way science is done or what needs to improve ( SN: 1/13/15 ).   

At least one clear, actionable conclusion emerged from the new findings, says Yvette Seger, director of science policy at the Federation of American Societies for Experimental Biology. That’s the need to provide scientists with as much opportunity as possible to explain exactly how they conducted their research.

“Scientists should aspire to include as much information about their experimental methods as possible to ensure understanding about results on the other side,” says Seger, who was not involved in the reproducibility project.

Ultimately, if science is to be a self-correcting discipline, there needs to be plenty of opportunities not only for making mistakes but also for discovering those mistakes, including by replicating experiments, the project’s researchers say.

“In general, the public understands science is hard, and I think the public also understands that science is going to make errors,” Nosek says. “The concern is and should be, is science efficient at catching its errors?” The cancer project’s findings don’t necessarily answer that question, but they do highlight the challenges of trying to find out.


  • Open access
  • Published: 09 January 2019

Replicability and replication in the humanities

  • Rik Peels   ORCID: orcid.org/0000-0001-8107-5992 1  

Research Integrity and Peer Review, volume 4, Article number: 2 (2019)


A large number of scientists and several news platforms have, over the last few years, been speaking of a replication crisis in various academic disciplines, especially the biomedical and social sciences. This paper answers the novel question of whether we should also pursue replication in the humanities. First, I create more conceptual clarity by defining, in addition to the term “humanities,” various key terms in the debate on replication, such as “reproduction” and “replicability.” In doing so, I pay attention to what is supposed to be the object of replication: certain studies, particular inferences, or specific results. After that, I spell out three reasons for thinking that replication in the humanities is not possible and argue that they are unconvincing. Subsequently, I give a more detailed case for thinking that replication in the humanities is possible. Finally, I explain why such replication in the humanities is not only possible, but also desirable.


Scientists and various news platforms have, over the last few years, increasingly been speaking of a replication crisis in various academic disciplines, especially the biomedical Footnote 1 and social sciences. Footnote 2 The main reason for this is that it turns out that large numbers of studies cannot be replicated, that is (roughly), they yield results that appear not to support, or to count against, the validity of the original finding. Footnote 3 This has been and still is an important impulse for composing and adapting various codes of research integrity. Moreover, in December 2017, the National American Academies convened the first meeting of a new study committee that will, for a period of 18 months, study “Reproducibility and Replicability in Science,” a project funded by the National Science Foundation. Footnote 4 Finally, over the last few years, various official reports on replication have been published. At least five of them come to mind:

The 2015 report by the National Science Foundation: Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science

The 2015 symposium report by the Academy of Medical Sciences: Reproducibility and Reliability of Biomedical Research

The 2016 workshop report by the National Academies of Sciences: Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results [ 1 ]

The 2016 report by the Interacademy Partnership for Health: A Call for Action to Improve the Reproducibility of Biomedical Research

The 2018 advisory report by the Royal Netherlands Academy of Arts and Sciences Replication Studies Footnote 5

These documents state what the problem regarding replication is, they explain how we should think of the nature and value of replication, and they make various recommendations as to how to improve upon replicability. There are many causes for lack of replicability and failure to successfully replicate upon attempting to do so. Among them are (i) fraud, falsification, and plagiarism, (ii) questionable research practices, partly due to unhealthy research systems with perverse publication incentives, (iii) human error, (iv) changes in conditions and circumstances, (v) lack of effective peer review, and (vi) lack of rigor. Footnote 6 Thus, we also need a wide variety of measures to improve on replicability. In this article, I will take each of the reports mentioned above into consideration, but pay special attention to the KNAW report, since it is the most recent one and it has taken the findings of the other reports into account.

The issue of replicability and replication in academic research is important for various reasons. Let me mention four of them: (i) results that are consistently replicated are likely to be true, all else being equal, that is, controlling for such phenomena as publication bias and assuming that the other assumptions in the relevant theory or model are valid, (ii) replicability prevents the waste of (financial, time, etc.) resources, since studies that cannot be consistently replicated are less likely to be true, (iii) results that are not replicable are, if they are applied, more likely to cause harm to individuals, animals, and society (e.g., by leading to mistaken economic measures or medicine that is detrimental to people’s health), and (iv) if too many results turn out not to be replicable, upon attempting to replicate them, that will gradually erode public trust in science. Footnote 7

Now, reports about replication focus on various quantitative empirical sciences. Footnote 8 The KNAW Advisory Report, for instance, makes explicit that it is confined to the medical sciences, life sciences, and psychology. Footnote 9 These reports, though, invite researchers from other disciplines to consider the relevance of these documents and recommendations for their own fields. That is precisely the purpose of this paper: to explore to what extent replication is possible and desirable in another important field of scholarly activity, namely the humanities. After all, many humanistic disciplines, such as history, archeology, linguistics, and art theory, are thoroughly empirical: they are based on the collection of data (as opposed to the deductive lines of reasoning that we find in mathematics, logic, parts of ethics, and metaphysics). This naturally leads to the question of whether replication is also possible in the humanities.

How we should think of replication in the humanities is something that has not received any attention so far, except for a couple of articles that I co-authored with Lex Bouter. Footnote 10 Maybe this is because it is questionable whether replication is even possible in the humanities. There are various reasons for this. First, the study objects in the humanities are often unique phenomena, such as historical events, so that it is not clear in what sense one could replicate a study. Second, one might think that various methods in the humanities, such as the hermeneutical method in studying a text, do not lend themselves well to replication—at least not as well as certain methods in the quantitative empirical sciences, where one can carry out an experiment with similar data under similar circumstances. Third, the objects of humanistic research are often objects with meaning and value, such as paintings, texts, statues, and buildings, in opposition to, say, such objects as atoms and viruses that are studied in the natural sciences. One might think that the inevitably normative nature of these humanistic objects makes replication impossible. It remains to be seen, though, whether these objections hold water. I return to each of them below.


In order to answer the question of whether replication is possible and, if so, desirable in the humanities, I first create more conceptual clarity by defining, in addition to the term “humanities,” various key terms in the debate on replication, such as “reproduction” and “replicability.” In doing so, I pay attention to what is supposed to be the object of replication: certain studies, particular inferences, or specific results. After that, I lay out three reasons for thinking that replication in the humanities is not possible and argue that they are unconvincing. Subsequently, I give a more detailed case for thinking that replication in the humanities is possible. Finally, I explain why such replication in the humanities is not only possible, but also desirable.

Defining the key terms

We can be rather brief about the term “humanities.” There is a debate on what should count as a humanistic discipline and what not. Rather than entering that debate here, I will simply stipulate that, for the sake of argument, I take the following disciplines to belong to the humanities: anthropology; archeology; classics; history; linguistics and languages; law and politics; literature; the study of the performing arts, such as music, theater, and dance; the study of the visual arts, such as drawing, painting, and film; philosophy; theology; and religious studies. This captures what most people take to fall under the umbrella of “humanities” and that will do for the purposes of this paper. Footnote 11

Let us now move on to replication. There are at least two complicating factors when it comes to the issue of replication in the humanities: there is a wide variety of terms and many of these terms have no definition that is widely agreed upon. I have the following eight terms in mind: “replication study,” “replicability,” “replication,” “reproduction,” “reproducibility,” “robustness,” “reliability,” and “verifiability.” Here, I will put the final three terms, namely “robustness,” “reliability,” and “verifiability” aside, since the points I want to make about replication in the humanities do not depend on them. Footnote 12 Also, I take “replication” and “reproduction” to be synonyms, as I do “replicability” and “reproducibility.” Footnote 13 I will, therefore, focus on the three remaining terms, to wit “replication studies,” “replicability,” and “replication.”

Let us define “replication study” as follows:

Replication study

A replication study is a study that is an independent repetition of an earlier, published study, using sufficiently similar methods (along the appropriate dimensions) and conducted under sufficiently similar circumstances. Footnote 14

Clearly, this definition requires some explanation. First, it counts both studies that are meant as close or exact replication and studies designed as conceptual replication as replication studies. There are, of course, crucial differences between these kinds of replication, but they both count as replication studies and that is exactly what the above definition is meant to capture. A recent call for replication studies by the Netherlands Organization for Scientific Research (NWO), for instance, distinguishes three kinds of replication Footnote 15 :

Replication with existing data and the same research protocol and the same research question: repeated analysis of the datasets from the original study with the original research question (sometimes more narrowly referred to as a “reproduction”).

Replication with a new data collection and with the same research protocol and the same research question as the original study (often referred to as a “direct replication”).

Replication with new data and with a new or revised research protocol Footnote 16 : new data collection with a different design from the original study in which the research question remains unchanged compared to that of the original study (often referred to as a “conceptual replication”). Footnote 17

An advantage of the above definition of “replication study” is that it captures these three varieties of replication studies. It is, of course, perfectly compatible with my definition to make these further distinctions among varieties of replication studies.

Second, the definition states that the new study should in some sense be independent from the original study. Unfortunately, reports on replication usually do not define what it is for a study to be independent from an earlier one. Footnote 18 It seems to me that the right way to understand “independence” here is that the new study should not in any way depend on the results of the original study .

However, can we be more precise about how the results of the new study should not depend on those of the original study? The most obvious meaning of this phrase is that the new study should not take all the original results for granted—that is, it should not assume their truth or correctness in its line of reasoning (even though it can, of course, do so merely for the sake of argument). Dependence, however, is a matter of degree: one can, for instance, assume certain results or certain aspects of certain results in order to replicate other results or other aspects of results. Below, we return to the issue of degrees when we consider in what sense results of the new study should agree with the results of the original study.

This means that various other kinds of dependence are perfectly legitimate for a replication study. For example, the new study can depend on the same instruments as those used in the original study, on the same research protocol (e.g., in a repetition of an earlier study), and, in some cases, even on the original researchers, or at least partly so, as when a collaborative team consists of the original researchers and new researchers. It can perfectly well depend on these things in the sense that it is no problem if the original study and the new study use the same instruments, follow the same research protocol, and are carried out by the same group of researchers—at least for some kinds of replication.

Third and finally, the definition states that the methods used and the circumstances in which the study is carried out should be “sufficiently similar.” That means that they need not be identical—that may be the case (or something very close to that), but that is not required for a replication study. It also means that they should not be completely different—that is excluded by its being a replication study. But exactly when are they “sufficiently similar?”

This is a complex issue that others have addressed in detail. For instance, Etienne LeBel and others provide a replication taxonomy that understands replication as a graded phenomenon: it ranges from an exact replication (all facets that are under the researchers’ control are the same) to a very far replication (independent variable (IV) or dependent variable (DV) constructs are different), with, respectively, very close replication, close replication, and far replication in between. The design facets that their taxonomy pays attention to are such things as effect or hypothesis, IV construct, DV construct, operationalization, population (e.g., size), IV stimuli, DV stimuli, procedural details, such as task instructions and font size, physical setting, and contextual variables (they indicate that the list can be extended). Footnote 19 What this goes to show is that replication is a matter of degree and that, in assessing the epistemic status of a replication, one should try to locate it on a replication continuum.
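To make the graded character of such a taxonomy concrete, here is a minimal sketch in Python. The facet names and the category boundaries are simplifying assumptions of mine, introduced purely for illustration; they do not reproduce LeBel and colleagues’ exact criteria.

# Illustrative sketch of a graded replication taxonomy, loosely inspired by
# LeBel et al. Facet names and category boundaries are simplified assumptions.
CORE_FACETS = {"effect_or_hypothesis", "iv_construct", "dv_construct"}
MIDDLE_FACETS = {"operationalization", "population", "iv_stimuli", "dv_stimuli"}
PERIPHERAL_FACETS = {"procedural_details", "physical_setting", "contextual_variables"}

def classify_replication(differing_facets):
    """Place a replication study on a (simplified) closeness continuum,
    given the set of design facets on which it differs from the original study."""
    if differing_facets & CORE_FACETS:
        return "very far replication"        # hypothesis or IV/DV constructs differ
    if differing_facets & MIDDLE_FACETS:
        return "far replication"             # operationalization, population, or stimuli differ
    if differing_facets & PERIPHERAL_FACETS:
        return "close replication"           # only procedural or contextual facets differ
    return "exact or very close replication" # all controllable facets are the same

# Example: new stimuli and a different population, same constructs and hypothesis.
print(classify_replication({"population", "iv_stimuli"}))  # -> "far replication"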

This brings us to the second key term, “replicability.” It seems to me that this term is used in two crucially different ways, in the KNAW Advisory Report as well as in the broader literature on replication studies: sometimes it refers to a study’s being such that a replication study of it could be carried out, and sometimes to a study’s actually having been successfully replicated. In order to keep things clear, I would like to distinguish the two and will refer to the former as “replicability” and to the latter as “replication.” I define them as follows:

  • Replicability

A study’s having certain features such that a replication study of it could be carried out.

  • Replication

A study’s being such that a repetition of it has successfully been carried out, producing results that agree with those of the original study. Footnote 20

Some philosophers of science and scholars in research integrity use the term “transparency” for what I dub “replicability” here. Footnote 21 Clearly, replicability, as I understand it here, has much to do with transparency: a study can be replicated only if the researchers are sufficiently transparent about the data, the method, the inferences, and so on. Still, I prefer to use the term “replicability” rather than “transparency,” given the purposes of this paper. This is because some humanistic scholars, as we shall see below, think that studies can be perfectly transparent and yet such that they cannot be replicated. If so, they are not replicable, but not because of any scholarly shortcoming. Rather, it would be the nature of the beast (a humanistic study, or a particular kind of humanistic study, such as one about value or meaning) that prevents the possibility of replication.

Thus, replicability is a desideratum for at least many studies in the quantitative empirical sciences (I return to the humanities below): we want them to be set up and described in such a way that, in principle, we could carry out a replication study. Precise definitions, a clear description of the methodology (in the research protocol), a clear overview of the raw data, a lucid analysis of the data, and so on, all contribute to the replicability of a study. One of the things the replication crisis has made clear is that many studies in the empirical sciences fail to meet the criterion of replicability: we cannot carry out a replication study of them, since the key terms are not even sufficiently clearly defined, the method is underdescribed, the discussion is not transparent, the raw data are not presented in a lucid way, or the analysis of the data is not clearly described.

Replicability should be clearly distinguished from replication . Replication entails replicability (you cannot replicate what is not replicable), but requires significantly more, namely that a successful replication has actually taken place, producing results that agree with the results of the original study. Thus, in a way this distinction is similar to Karl Popper’s famous distinction between falsifiability and falsification. Footnote 22 Falsifiability is a desideratum for any scientific theory: very roughly, a theory should be such that it is in principle falsifiable. Falsification entails falsifiability, but goes a step further, because a falsified theory is a theory that is not only falsifiable, but that has in fact also been falsified. I said “roughly,” because, as Brian Earp has argued in more detail, things are never so simple when it comes to falsification: even if an attempt at falsification has taken place and the new data seem to count against the original hypothesis, one might often just as well, say, question an auxiliary assumption, consider whether a mistake was made in the original study, or wonder whether perhaps the original effect is a genuine effect but one that can only be obtained under specific conditions. Footnote 23 Nevertheless, falsification is often still considered as a useful heuristic in judging the strength of a hypothesis. Footnote 24 Now, the obvious difference with the issue at hand is that, even though both falsifiability and replicability are desiderata, replication is a good thing, because it makes it, all else being equal, likely that results are true, whereas falsification is in a sense a bad thing, because it makes it likely that a theory is false. Footnote 25

A replication study, then, is a study that aims at replication. Such replication may fail either because the original study turns out not to be replicable in the first place or because, even though it is replicable, a successful replication does not occur. A successful replication occurs if the results of the new study agree with those of the original study or, slightly more precisely, if the results of the two studies are commensurate. Exactly what is it, though, for results to be commensurate? As several reports on replication point out, Footnote 26 it is not required that the results are identical—that would be too demanding in, say, many biomedical sciences. Again, it seems that “agreeing” is a property of results that comes in degrees. More precisely, we can distinguish at least the following senses, in order of increasing strength:

The studies’ conclusions have the same direction (e.g., both studies show a positive correlation between X and Y);

The studies’ conclusions have the same direction and the studies have a similar effect size (e.g., in both studies, Y is three times as large with X as it is with non-X; in some disciplines: the relative risk is three (RR = 3));

The studies’ conclusions have the same direction, and the studies have a similar effect size and a similar p value, confidence interval, or Bayes factor (e.g., for both studies, RR = 3 (1.5–5.0)). Footnote 27

The stronger the criterion for the sense in which the studies’ results “agree,” the lower—ceteris paribus—the percentage of successful replications will be, at least when it comes to quantitative empirical research.
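To illustrate these three increasingly strong senses of agreement, here is a minimal sketch in Python for two studies that each report a relative risk (RR) with a 95% confidence interval. The numerical thresholds, such as the factor of 1.5 for a “similar effect size” and the requirement of overlapping intervals, are assumptions of mine for illustration only; they are not criteria laid down in the reports discussed here.

from dataclasses import dataclass

@dataclass
class Result:
    rr: float        # relative risk point estimate
    ci_low: float    # lower bound of the 95% confidence interval
    ci_high: float   # upper bound of the 95% confidence interval

def agreement_level(original, replication):
    """Return the strongest (illustrative) sense in which two results 'agree'."""
    if (original.rr > 1) != (replication.rr > 1):
        return "no agreement"                      # effects point in opposite directions
    level = "same direction"                       # sense (1): weakest
    ratio = max(original.rr, replication.rr) / min(original.rr, replication.rr)
    if ratio <= 1.5:                               # sense (2): similar effect size
        level = "same direction + similar effect size"
        overlap = (original.ci_low <= replication.ci_high
                   and replication.ci_low <= original.ci_high)
        if overlap:                                # sense (3): intervals agree as well
            level = "same direction + similar effect size + overlapping intervals"
    return level

# Example: original RR = 3.0 (1.5-5.0), replication RR = 2.5 (1.2-4.8).
print(agreement_level(Result(3.0, 1.5, 5.0), Result(2.5, 1.2, 4.8)))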

Now, what does a typical replication study look like? The aforementioned KNAW Advisory Report sketches four characteristics: it “(a) is carried out by a team of independent investigators; (b) generates new data; (c) follows the original protocol closely and justifies any deviations; and (d) attempts to explain the resulting degree of reproducibility.” Footnote 28 Thus, even though, as I pointed out above, independence does not require that the replication study be carried out by researchers other than those who carried out the original study, this is nonetheless often the case. Below, we will explore to what extent we encounter the combination of these characteristics in the humanities.

Before we move on to replicability and replication in the humanities, I would like to make two preliminary points. First, we should note that it follows from the definitions of “replicability” and “replication” given in this section that both replicability and replication are a matter of degree . Footnote 29 Replication studies can be pretty much identical to the original study, but very often there are slight or even somewhat larger alterations in samples, instruments, conditions, researcher skills, the body of researchers, and sometimes even changes in the method. One can change the method, for instance, in order to explore whether a similar finding can be obtained by way of a rather different method, or a finding that would similarly support one of the relevant underlying hypotheses, at least if the auxiliary assumptions are also met. Every replication study can be located on a continuum that goes from being a replication almost identical to the original study to hardly being a replication at all. The closer the replication study topic is to the topic of the original study, the more it counts as a replication study, and, similarly, for method, samples, conditions, and so on. How we ought to balance these various factors in assessing how much of a replication a particular study is, is a complicated matter that we need not settle here; all we need to realize is that replication is something that comes in degrees. As I briefly spelled out above, in laying out Etienne LeBel’s replication taxonomy, a study can be more or less of a replication of an original study. Footnote 30

Second, exactly what is it that should be replicable in a good replication study? There are at least three candidates here: the study as a whole, the inferences involved in the study, and the results of the study. Footnote 31 I will focus on the replicability of a study’s results . After all, as suggested in our discussion above, we want to leave room for the possibility of a direct replication (which uses new data, so that the study as a whole is not replicated), and a conceptual replication (which uses new data and a new research protocol, so that neither the study as a whole nor its specific inferences are replicated). This means that a study is replicable if a new study can be carried out, producing results that might agree with those of the original study in the sense specified above.

Potential obstacles to replication in the humanities

Now, one might think that, in opposition to the quantitative empirical sciences, such as the biomedical sciences, the humanities are not really suited for the phenomenon of replication. In this section, I discuss three arguments in support of this claim.

1. The first objection to the idea that replication is possible in the humanities is that, frequently, the study object in the humanities is unique Footnote 32 : there was one French Revolution in 1789–1799, there is one novel by Virginia Woolf titled To the Lighthouse (1927), pieces of architecture, such as Magdalen College’s library in Oxford, are unique, and so on. Viruses, atoms, leg fractures, Borneo’s rhinos, economic measures, and many other study objects in the empirical sciences have multiple instances. In a replication study one can investigate a different instance or token than the one studied in the original study, that is, an instance or token of the same type.

However, this objection fails for two reasons. On the one hand, many study objects in the humanities do have multiple instances. On the other hand, quite a few study objects in the empirical sciences are unique. As to the former: Virginia Woolf’s To the Lighthouse is unique, but it is also one of many instances of novels using a stream-of-consciousness-narrative technique; the French Revolution is unique, but it is an instance of a social revolution, of which the American Revolution in 1775–1783 and the Russian Revolution in 1917 are other examples. Magdalen College library can be compared to other college libraries in Oxford, to other libraries across the country, and to other buildings in the late fifteenth century. And so on. Parts of linguistics study grammatical structures that, by definition, have many instances, as will be clear from any introduction to morphosyntax. Footnote 33 As to the quantitative empirical sciences: the big bang, the coming into existence of life on earth, space-time itself, and many other phenomena studied in the empirical sciences are unique phenomena: there is only one instance of them. Thus, the idea that the empirical sciences study phenomena that have multiple instances, whereas the humanities study unique phenomena is, as a general claim, untenable.

Second and more importantly, whether or not the object of study is unique is immaterial to the issue of the replicability of a study on that object. After all, one may study an object several times, and studying it several times may even generate new data (a typical property of many replication studies, as we noted in the previous section). For example, even though the French Revolution was a unique historical event (or a unique series of events), that event comprises so many data, laid down in artifacts, literary accounts, paintings, and so on, that it is possible to repeat a particular method—say, studying a text—and even discover new things about that unique event. Footnote 34

2. A second argument against the idea that replication is possible in the humanities is that many methodologies that are employed in the humanities do not lend themselves well to replication. By replicating an empirical study, say, one on whether or not patients with incident migraine, in comparison with the general population, have higher absolute risks of suffering from myocardial infarction, stroke, peripheral artery disease, atrial fibrillation, and heart failure, Footnote 35 one can, in principle, apply the same method or a similar method to new patients (say, a population from a different country). One can generate new data, thus making it likely—if replications consistently deliver sufficiently similar results—that the original results are true. One might think that no such thing takes place when one employs the methods of the humanities.

In response to this objection, I think it is important to note that there is a wide variety of methods used in the humanities. Among them are: more or less formal logic (in philosophy, theology, and law), literary analysis (in literary studies, philosophy, and theology), historical analysis (in historical studies, philosophy, and theology) and various narrative approaches Footnote 36 (in historical studies), constructivism (in art theory, for instance), Socratic questioning (in philosophy), methods involving empathy (in literary studies and art studies), conceptual analysis (in philosophy and theology), the hermeneutical method (in any humanistic discipline that involves careful reading of texts, such as law, history, and theology), interviews (e.g., in anthropology), and phenomenology (in philosophy). This is important to note, because, as I pointed out above, I only want to argue that replication is possible in the humanities to the extent that they are empirical . Replication may not be possible in disciplines that primarily use a deductive method and that do not collect and analyze data, such as logic, mathematics, certain parts of ethics, and metaphysics. This leaves plenty of room for replication in disciplines that are empirical, such as literary studies, linguistics, history, and the study of the arts.

Take the hermeneutical method. Does reading a text again make it, all else being equal, more likely that one’s interpretation is correct? It seems to me the answer here has to be positive. There are at least two reasons for that. First, one may have made certain mistakes in one’s original reading and interpretation: faulty reading, sloppy analysis, forgetting relevant passages, and so on, may have played a role on the first occasion. If one’s second interpretation differs from the first, one will normally realize that and revisit the relevant passage, comparing which of the two interpretations is more plausible. This will generally increase the likelihood that one comes to a correct interpretation of, say, the relevant passage in Ovid. Second, if one re-reads certain passages, one will do so with new background beliefs, given that humanistic scholars gradually acquire more knowledge in the course of their lives. That may lead to a new interpretation. Unless one thinks that new beliefs are as likely to be false as true—which seems implausible—carefully re-reading a passage with relevant new background beliefs and coming to the same result increases the likelihood that one’s interpretation is true. These two points apply a fortiori when humanistic scholars other than the original ones apply the same method of interpretation (the hermeneutical approach or a historical-critical methodology) to the same text. They will come to an interpretation and compare it with the original one; if it differs, they are likely to revisit relevant passages and, thereby, filter out forgetting, sloppiness, and mistakes. Footnote 37 And, of course, they bring new background knowledge to a text. That, too, supports the claim that when a study is consistently replicated, then, all else being equal, the original study results are likely to be true.

3. A third objection to the idea that replication is possible in the humanities is that many of the study objects in the humanities are normative in the sense that they are objects of value and meaning, whereas this is not the case in many of the natural and biomedical sciences. René van Woudenberg, for instance, has argued in a recent paper that the objects of the humanities are such meaningful and/or valuable things as words, sentences, perlocutionary acts, buildings and paintings, music, and all sorts of artifacts. Molecules, laws of nature, diseases, and the like lack that specific sort of meaning and value. Footnote 38

In reply, let me say that I will grant the assumption that the humanities are concerned with objects of value and meaning, whereas the sciences are not (or at least not with those aspects of those objects). I think this is not entirely true: some humanistic disciplines, such as metaphysics, are also concerned with objects that do not have meaning or value, such as numbers or the nature of space-time. It will still be true for most humanistic disciplines, though.

However, this point is not relevant for the issue of replication. This can be seen by considering, on the one hand, a scenario in which knowledge about value and meaning is not possible and, on the other, a scenario in which knowledge about value and meaning is possible. First, imagine that it is impossible to uncover knowledge about objects with value and meaning and specifically about those aspects of those objects that concern value and meaning. One may think, for instance, that there are no such facts about value and meaning Footnote 39 or that they are all socially constructed, so that it would not be right to say that the humanities can uncover them. Footnote 40 This is, of course, a controversial issue. Here, I will not delve into this complex issue, which would merit a paper or more of its own. Rather, I would like to point out that if it is indeed impossible to uncover knowledge about value and meaning, then that is a problem for the humanities in general , and not specifically for the issue of replication in the humanities. For, if there is no value and meaning, or if all value and meaning is socially constructed and the humanities can, therefore, not truly uncover value and meaning, one may rightly wonder to what extent humanistic scholarship as an academic discipline is still possible.

Now, imagine, on the other hand, that it is possible to uncover knowledge about objects with value and meaning and even about those aspects of those objects that specifically concern value and meaning. Then, it seems possible to uncover such knowledge and understanding about the aspects that involve value and meaning multiple times for the same or similar objects. And that would mean that in that case, it would very well be possible to carry out a replication study that involves conclusions about value and meaning. Of course, given the fact that the objects have value and meaning, it might sometimes be harder to reach agreement among scholars. After all, background assumptions bear heavily on issues concerning value and meaning. However, as several examples below show, agreement about issues concerning value and meaning is still quite often possible in the humanities.

I conclude that the three main reasons for thinking that replication is not possible in the humanities do not hold water.

A positive case for the possibility of replication in the humanities

So far, I have primarily deflected three objections to the possibility of replication in the humanities. Is there actually also a more detailed, positive case to be made for the possibility of replication in the humanities? Yes. In this section, I shall provide such a case.

My positive, more substantive case is an inductive one: there are many cases of replication in the humanities in the sense stipulated above, that is, cases of a study’s being such that a replication of it has successfully been carried out, producing results that agree with those of the original study. Moreover, they often meet the four stereotypical properties mentioned above: (a) they are carried out by a team of independent investigators; (b) they generate new data; (c) they follow the original protocol (or, at least, method description) closely and justify any deviations; and (d) they attempt to explain the resulting degree of reproducibility.

Here is an example: re-interpreting Aurelius Augustine’s (354–430 AD) writings in order to see to what extent he continued to embrace or came to reject Gnosticism. Using the hermeneutical method Footnote 41 —with such principles as that one should generally opt for interpretations of passages that make the text internally coherent, that one should, in interpreting a text, take its genre into account, and so on—and relevant historical background knowledge, it has time and again been confirmed that Augustine came to reject the basic tenets of Gnosticism, such as the Manichaean idea that good and evil are two equally powerful forces in the world, but that Gnosticism continued to exercise influence upon his thought—for instance, when it comes to his assessment of the extent to which we can enjoy things in themselves (frui) or merely for the sake of some higher good, namely God (uti). Footnote 42 Various independent researchers have argued this; in doing so, they came up with new data (new passages or new historical background knowledge), they used the same hermeneutical or historical-critical method, and they explained the consonance with the original results (and thus the successful replication, even though they would not have used that word) by sketching a larger picture of Augustine’s thought that made sense of his relation to Gnosticism.

Here is another example of a study that employs the hermeneutical method. The crucial difference with the previous example is that this is still a hotly debated issue and that it is not clear exactly what counts as a replication, since it is not clear that advocates and opponents share enough background beliefs to properly execute a replication study; only the future will tell us whether that is indeed the case. What I have in mind is the so-called New Perspective on Paul in New Testament theology. Since the 1960s, Protestant scholars have interpreted the New Testament letters of Paul differently from how they had been understood by Protestants until then. Historically, Lutheran and Reformed theologians had understood Paul as arguing that good works do not factor into a believer’s salvation—only faith itself would (in a slogan: sola fide). The New Perspective, advocated by Ed Parish Sanders and Tom Wright, Footnote 43 however, has it that Paul was not so much addressing good works in general as specific Jewish laws regarding circumcision, dietary laws, Sabbath laws, and other laws the observance of which set Jews apart from other nations. The New Perspective has been embraced by most Roman Catholic and Orthodox theologians and a substantial number of Protestant theologians, but is still very much under debate. Thus, we should not conclude from the fact that some studies that employ the hermeneutical method are replicable that all of them are: some of them may involve too many controversial background assumptions for a fairly straightforward replication to be possible.

However, it is easy to add examples of studies from other humanistic fields that meet the criterion of replicability. Here are two of them that use a different method than the hermeneutical one:

The granodiorite stele that was named the Rosetta Stone and that was found in 1799 carries versions of the same text in Ancient Egyptian, in both hieroglyphic and Demotic script, and in Ancient Greek. The differences in the content of these three texts are minor. The stone has turned out to be the key in deciphering Egyptian hieroglyphs. A large number of scholars have studied the stone in detail and the most important results have been replicated multiple times. Footnote 44

It was established in 2013, by way of various methods—such as a study of the materials, the chemical composition, the painting style, and Van Gogh’s letters—that the painting Sunset at Montmajour is a genuine Van Gogh. It was painted on July 4, 1888. If one has the right background knowledge and skills, one can fairly easily study the same data or collect further data in order to replicate this study. Footnote 45

I take the examples given so far to be representative and, therefore, to provide an inductive argument for the possibility of replication in the humanities: it turns out that, in a variety of humanistic fields that employ different methods, replication is possible.

Now, the KNAW Advisory Report Replication Studies mentions three things to pay attention to in carrying out a replication study: (i) look at the raw data, the final outcomes (results/conclusions), and/or everything in between, (ii) take a rigorous statistical approach or a more qualitative approach in making the comparison between the original study and the replication study, and (iii) define how much similarity is required for a successful replication. This is important, for it means that even the specific way in which a replication study is supposed to be carried out can be copied in a replication study in the humanities. After all, it is possible (i) to compare the original data (say, certain texts, archeological findings, the occurrence of certain verbs, and so on), the conclusions of the original study and the replication study, and everything in between, (ii) to take a qualitative approach and sometimes even, if not a rigorous statistical approach, at least a more quantitative approach, e.g., by counting the number of verbs in Shakespeare’s plays that end in “th” or “st” (a sketch of such a count is given below), and (iii) to define how much similarity between the original results and the results in the replication study is required for something’s being a successful replication, even though this will be harder or impossible to quantify, in contrast to many studies in, say, psychology and economics.
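To illustrate how easily such a quantitative humanistic analysis can be repeated, here is a minimal sketch in Python. The filename “hamlet.txt” is a hypothetical local plain-text copy of a play, and matching word endings is, of course, only a crude proxy for identifying archaic verb forms such as “doth” or “hast”; an actual study would use proper morphological tagging.

import re
from collections import Counter

def count_endings(path):
    """Count words ending in 'th' or 'st' in a plain-text file (a crude proxy
    for archaic second- and third-person verb forms)."""
    with open(path, encoding="utf-8") as f:
        text = f.read().lower()
    words = re.findall(r"[a-z']+", text)   # very simple tokenization
    return Counter(
        "ends in 'th'" if w.endswith("th") else "ends in 'st'"
        for w in words
        if w.endswith("th") or w.endswith("st")
    )

if __name__ == "__main__":
    print(count_endings("hamlet.txt"))     # "hamlet.txt" is a hypothetical input file

Because both the data (the text of the play) and the procedure are fully explicit, an independent researcher can rerun exactly this analysis and check whether the counts agree.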

The desirability of replicability and replication in the humanities

It is widely agreed that replicability is a desideratum and replication an epistemically positive feature of a study in the quantitative empirical sciences. Given that, as we have seen in the preceding sections, replication is possible in the humanities, is it something we should pursue? Should we desire that studies be replicable and that a significant number of them be replicated, that is, that they are indeed replicated with a positive, confirming outcome?

The answer has to be: Yes. After all, if, as I argued, replication is possible in the humanities and consistent replication makes it likely that the results of the original study are true, then carrying out such replication studies contributes to such core epistemic aims of the academic enterprise as knowledge, insight, and understanding—which all require truth. Of course, one will have to find the right balance between carrying out new research—with, possibly or likely, stumbling upon new truths, never found before—and replicating a study and thereby making it likely that the original study results are true. However, there is nothing special about the humanities when it comes to the fact that we need to find the right balance between various intellectual goals: we need to find the right balance in any discipline—medicine, psychology, and economics included. This is not to deny that there may be important differences between various fields. Research indicates that as many as 70% of studies in social psychology turn out not to be replicated upon attempting to replicate them. Footnote 46 This gives us both an epistemic reason—it decreases the likelihood that the original studies are true—and a pragmatic reason—it undermines public trust in science as a source of knowledge—to carry out more replication studies. Thus, how much replication is needed depends on the epistemic state a particular discipline is in.

Certainly, it is not at all common to speak of a “replication crisis” in the case of the humanities, in contrast to some of the quantitative empirical sciences. As various philosophers, such as Martha Nussbaum, Footnote 47 have argued, though, there is at least a crisis in the humanities in the sense that they are relatively widely thought of as having a low epistemic status. They are thought to be not nearly as reliable as the sciences and not to provide any robust knowledge. To give just one example, according to American philosopher of science Alex Rosenberg:

When it comes to real understanding, the humanities are nothing we have to take seriously, except as symptoms. But they are everything we need to take seriously when it comes to entertainment, enjoyment, and psychological satisfaction. Just don’t treat them as knowledge or wisdom. Footnote 48

Another well-known example is the recent so-called grievance studies affair (or hoax). This was an attempt in 2017–2018 by three scholars—James Lindsay, Peter Boghossian, and Helen Pluckrose—to test the editorial and peer review process of various fields in the humanities. They did so by trying to get bogus papers published in influential academic journals in fields such as feminism studies, gender studies, race studies, and sexuality studies. They managed to publish a significant number of papers (which were all retracted after the hoax was revealed), and got an even larger number accepted (without yet being published). However, it is rather controversial exactly what this hoax shows about the epistemic status of these fields in the humanities. Footnote 49 Some have argued that the results would have been similar in pretty much any other empirical discipline, Footnote 50 and others that we cannot conclude anything from this hoax, since there was no control group. Footnote 51

In any case, there may well be a crisis in how the humanities are perceived. Yet, there does not seem to be a replication crisis —at least, it is usually not framed as such. There may, therefore, be somewhat less of a social and epistemic urge to carry out replication studies in the humanities. However, given the epistemic and pragmatic reasons to do so, carrying out at least some replication studies would be good for the humanities and for how they are publicly perceived.

We should also realize that one of the reasons that people started to talk about a replication crisis in certain empirical sciences in the first place was that, apart from problems with replicability (some studies did not even meet that desideratum), for some studies an attempt at replication took place but was unsuccessful, so they met replicability as a desideratum, but not the positive property of replication. That showed the need for more replication studies. Thus, one way to discover the need for replication studies is, paradoxically, to carry out such replication studies. This means that, in order to establish the extent to which replication studies are needed in various fields in the humanities, we should simply carry them out.

Before we move on, I would like to discuss an objection against the desirability of replication in the humanities. The objection is that even though replication may well be possible in the humanities, it is not particularly desirable—not something to aim at or to invest research money in—because there is simply too much disagreement in the humanities for there to be successful replications sufficiently often. Thus, even though many humanistic studies would be replicable, carrying out a replication study would in the majority of cases lead to different results. In philosophy, for instance, there is a rather radical divide between scholars in the analytic tradition and scholars in the continental tradition. One might think it likely that a replication of any study by members of the one group would lead to substantially different results if carried out by members of the other group.

We should not forget, though, that we find radically different sorts of schools within, say, economics or physics. In economics, for instance, we find the economics of the Saltwater school, the economics of the Freshwater school, and, more rarely, institutional economics, Austrian economics, feminist economics, Marxian economics, and ecological economics. In quantum mechanics, we find a wide variety of different interpretations with different ideas about randomness and determinacy, the nature of measurement, and which elements in quantum mechanics can be considered real: the Standard or Copenhagen interpretation, the consistent histories interpretation, the many worlds interpretation, the transactional interpretation, and so on.

The problem that this objection draws our attention to, then, is a general one: if a study from one school of thought is replicated by members of a different school of thought, it is much more likely that relevant background assumptions will be different and that various auxiliary hypotheses will play an important role. This may make it easier for the researchers of the original study to reject the results of the new study if they differ from those of the original one: they may point to different background assumptions and different auxiliary hypotheses. That does not necessarily undermine the value of those replication studies, though: a revision in background assumptions or a change in auxiliary hypotheses may be widely considered an improvement in comparison with the original study, or a legitimate change for other reasons. Also, even if the study’s background assumptions are different and various auxiliary hypotheses differ, the study may still be successfully replicated.

The most important point to note here, though, is that, to the extent that this is a problem (and we have seen that it is not necessarily a problem at all), it is a general problem and not one that is unique to the humanities.

This is not to deny that there may be situations in which there is too much divergence on background assumptions, method, relevant auxiliary hypotheses, and so on, to carry out a replication study. This will be the case for some humanistic studies and research groups, as it will be the case for some scientific studies and research groups. What this means is that in some humanistic disciplines, replicability is still a desideratum and replication surely is still a positive property, but the absence of replicability, due to severe limits on the possibility of replication, is not necessarily a reason to discard a study. In other words, in balancing the theoretical virtues of various hypotheses and studies in the humanities, replicability will sometimes not be weighed as heavily as, say, consistency with background knowledge, simplicity, internal coherence, and other intellectual virtues. That is, of course, as such not a problem at all, as the weight of various intellectual virtues differs from discipline to discipline anyway; predictive power, for instance, is crucial in much of physics, but carries much less weight in economics and evolutionary biology.

Conclusions

I conclude that replication is possible in the humanities. By that, I mean that empirical studies in the humanities are often such that an independent repetition of them, using similar or different methods and conducted under similar circumstances, can be carried out. I also conclude that replicability is desirable in the humanities: by that, I mean that many empirical studies in the humanities should indeed be such that an independent repetition of them, using similar or different methods and conducted under similar circumstances, can be carried out. And I conclude that carrying out replication studies in the humanities is desirable: we should actually frequently carry out such independent repetitions of published studies. Exactly how desirable replication in the humanities is remains to be seen; paradoxically, carrying out replication studies in the humanities may tell us more about exactly how desirable doing so is.

See Begley [ 2 ].

See Open Science Collaboration [ 3 ].

See Baker [ 4 ]; Ioannidis [ 5 ]; Nuzzo [ 6 ]; Munafò and Smith [ 7 ].

See http://www8.nationalacademies.org/cp/projectview.aspx?key=49906 , last visited May 1, 2018.

For full bibliographical details, see the list in the “References” section.

For overviews of such causes, see AMS [ 8 ], 5, 16–21; IAP [ 9 ], 1; KNAW [ 10 ], 23–24; Munafò et al. [ 11 ], 2. Bouter [ 12 ] further analyzes the causes for various kinds of questionable research practices. For the issue of lack of effective peer review, see, for instance, [ 13 ]. In this paper, Smith argues that there is actually no systematically acquired evidence for thinking that peer review is a good quality assurance mechanism and that we do have good evidence for thinking that peer review has a large number of downsides.

For these points, see also KNAW [ 10 ], 4, 20–22.

Note the KNAW Advisory Report’s subtitle: Improving Reproducibility in the Empirical Sciences .

KNAW [ 10 ], 16.

See [ 14 , 15 ], and a recent co-authored blog: [ 16 ]. The paper at hand provides a much more in-depth exploration of the ideas about replication in the humanities advocated in these three pieces.

The humanities are to be distinguished from the sciences, where I take the sciences to include the applied sciences, such as medicine, engineering, computer science, and applied physics; the formal sciences, such as decision theory, statistics, systems theory, theoretical computer science, and mathematics; the natural sciences, such as physics, chemistry, earth science, ecology, oceanography, geology, meteorology, astronomy, life science, biology, zoology, and botany; and the social sciences, such as criminology, economics, and psychology.

I would be happy, though, to embrace the definitions given of these terms in the KNAW Advisory Report, viz. for “robustness”: the extent to which the conclusions depend on minor changes in the procedures and assumptions, for “reliability of measurements”: the measurement error due to variation, and for “verifiability of results”: the extent to which the study documentation provides enough information on how results have been attained to assess compliance with relevant standards (cf. [ 10 ], 19). As will become clear from what follows in this section, the phenomena of robustness, reliability, and verifiability, thus understood, are in interesting ways related to , but nevertheless clearly conceptually distinct from replication, replicability, reproduction, and reproducibility.

Some people use the word “reproduction” somewhat more narrowly, namely merely for a study that re-analyzes the same data of the original study and scrutinizes whether they lead to the same results. I will use a broader definition here.

For a similar definition, see KNAW [ 10 ], 18; NSF [ 17 ], 4–5. IAP [ 9 ], unfortunately, provides no definition.

A fourth option, not mentioned in the report, is to carry out a replication with the same data and a new or revised research protocol .

For the purposes of the paper, I take a “research protocol” to be primarily a description of the study design: a description of which data are taken to be relevant and which method is used.

Italics are mine. See https://www.nwo.nl/en/funding/our-funding-instruments/sgw/replication-studies/replication-studies.html , last visited August 30, 2018. For the different kinds of replication, see also [ 18 ].

The KNAW [ 10 ] report, for instance, does not.

See LeBel et al. [ 19 ], 9; see also Earp and Trafimow [ 20 ].

Thus, for instance, KNAW [ 10 ], 18: “reproducibility concerns the extent to which the results of a replication study agree with those of the earlier study.”

For example, LeBel et al. [ 19 ].

See, for instance, Popper [ 21 ].

See [ 22 ]. As to the role of auxiliary assumptions, such as ones about the role of language, he also gives a particular example that illustrates this claim—one about walking speed in response to being primed with the elderly stereotype (the original study being [ 23 ]). For further illustrations of the fact that direct falsification of a theory is virtually impossible, see [ 20 , 24 ].

See Earp and Trafimow [ 20 ].

This is not to deny that Popper himself thought falsification to be a good thing, since he believed scientific progress to consist of instances of falsification (see [ 25 ], 215–250).

For example, AMS [ 8 ], 9.

This approach to agreement on results squares well with that of [ 26 ], and that of [ 27 ]. I thank Lex Bouter for helpful suggestions on this point.

KNAW [ 10 ], 33

This is also noted in the KNAW [ 10 ] Report, 18, 25, and spelled out in more detail in [ 19 ], 14. For more on the nature of degrees, that is, for what I take it to be for something to come in degrees, see [ 28 ].

See LeBel et al. [ 19 ]

Thus, also KNAW [ 10 ], 4, 19.

This worry is also found in the KNAW Advisory Report: KNAW [ 10 ], 17, 29, even though it is pointed out that this worry might not be decisive. We find the same idea among certain neo-Kantians; see, for instance, [ 29 ].

See, for instance, Payne [ 30 ].

Moreover, one may wonder whether there are such things as unique historical events studied by historians. One might think, for instance, that the French revolution is not a unique historical event, but just a series of (virtually) infinitely many smaller events, and that history always studies a combination of those events rather than a single, unique event.

This was concluded by a recent study; see [ 31 ].

For example, Lorenz [ 32 ].

In a way, then, replication—including replication in the humanities—is like what mathematicians do in checking a proof and lay people in checking a particular calculation (say, splitting the bill in a restaurant); if a large number of competent people come to the same result, then, all else being equal, the result is likely to be true.

See [ 33 ], 112–122. This is not to deny that they may have meaning or significance in some sense; the double-helix structure of DNA may be of special significance to, say, James Watson, Francis Crick, and Rosalind Franklin.

For a defense of this position, see [ 34 ].

For an exploration and discussion, see, for instance, [ 35 ].

For a more detailed exposition and discussion of the hermeneutical method, see [ 36 , 37 ].

For this interpretation, see various essays in Van den Berg et al. [ 38 ] and [ 39 ].

See Sanders [ 40 ] and Wright [ 41 ].

For an overview of much research on the Rosetta stone, see [ 42 ].

See Van Tilborgh, Meedendorp, Van Maanen [ 43 ].

See, for instance, Klein [ 44 ].

See Nussbaum [ 45 ], chapter 1.

Rosenberg [ 34 ], 307.

See Lindsay et al. [ 46 ]. For a more positive assessment, see Mounk 2018 [ 47 ].

See Engber 2018 [ 48 ].

See [ 49 ]. For a defense of the hoax on these two points, see Essig, Moorti 2018 [ 50 ].

NAS: National Academies of Sciences, Engineering, and Medicine. Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results: Summary of a Workshop: National Academies Press; 2016. https://www.nap.edu/catalog/21915/statistical-challenges-in-assessing-and-fostering-the-reproducibility-of-scientific-results , last visited May 1 st 2018

Begley CG, Ellis LM. Raise standards for preclinical cancer research. Nature. 2012;483:531–3.


Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;349(6251):aac4716. https://doi.org/10.1126/science.aac4716.

Baker M. Is there a replicability crisis? Nature. 2016;533:452–4.

Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2(8):e124.

Nuzzo R. Fooling ourselves. Nature. 2015;526:182–5.

Munafò MR, Smith D. Repeating experiments is not enough. Nature. 2018;553:399–401.

AMS: The Academy of Medical Sciences. Reproducibility and reliability of biomedical research: improving research practice. Symposium report. 2015 https://acmedsci.ac.uk/file-download/38189-56531416e2949.pdf, last visited May 1 st 2018 .

IAP: Interacademy Partnership for Health. A call for action to improve the reproducibility of biomedical research. 2016 http://www.interacademies.org/39535/Improving-the-reproducibility-of-biomedical-research-a-call-for-action , last visited May 1 st 2018.

KNAW: Royal Dutch Academy of Arts and Sciences. Replication studies: improving reproducibility in the empirical sciences, Amsterdam. 2018 https://knaw.nl/en/news/publications/replication-studies , last visited May 1 st 2018.

Munafò MR, et al. A Manifesto for Reproducible Science. Nat Hum Behav. 2017;1(art. 0021):1–9. https://doi.org/10.1038/s41562-016-0021 .

Bouter LM. Fostering responsible research practices is a shared responsibility of multiple stakeholders. J Clin Epidemiol. 2018;96:143–6.

Smith R. Classical peer review: an empty gun. Breast Cancer Res. 2010;12(4):S13.

Peels R, Bouter L. Replication drive for humanities. Nature. 2018a;558:372.

Peels R, Bouter L. The possibility and desirability for replication in the humanities. Palgrave Commun. 2018b;4:95. https://doi.org/10.1057/s41599-018-0149-x .

Peels, Rik, Lex Bouter. Replication is both possible and desirable in the humanities, just as it is in the sciences, London School of Economics and Political Science Impact Blog, 10 October. 2018c http://blogs.lse.ac.uk/impactofsocialsciences/2018/10/01/replication-is-both-possible-and-desirable-in-the-humanities-just-as-it-is-in-the-sciences/ .

NSF: National Science Foundation. (2015). Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science: Report of the Subcommittee on Replicability in Science Advisory Committee to the National Science Foundation Directorate for Social, Behavioral, and Economic Sciences, https://www.nsf.gov/sbe/AC_Materials/SBE_Robust_and_Reliable_Research_Report.pd , last visited May 1 st 2018.


Radder H. The material realization of science: from Habermas to experimentation and referential realism. Dordrecht: Springer; 2012.


LeBel EP, McCarthy RJ, Earp BD, Elson M, Vanpaemel W. A unified framework to quantify the credibility of scientific findings. Adv Methods Pract Psychol Sci. 2018 forthcoming. https://doi.org/10.1177/2515245918787489 .

Earp BD, Trafimow D. Replication, falsification, and the crisis of confidence in social psychology. Front Psychol. 2015;6:621.

Popper KR. Zwei Bedeutungen von Falsifizierbarkeit [Two Meanings of Falsifiability]. In: Seiffert H, Radnitzky G, editors. Handlexikon der Wissenschaftstheorie. München: Deutscher Taschenbuch Verlag; 1994. p. 82–5.

Earp BD. Falsification: How Does It Relate to Reproducibility? In: Morin J-F, Olsson C, Atikcan EO, editors. Key Concepts in Research Methods. Abingdon, New York: Routledge; 2018. Available online ahead of print at https://www.academia.edu/36659820/Falsification_How_does_it_relate_to_reproducibility/ .

Bargh JA, Chen M, Burrows L. Automaticity of social behavior: direct effects of trait construct and stereotype activation on action. J Pers Soc Psychol. 1996;71(2):230–44.

Trafimow D, Earp BD. Badly specified theories are not responsible for the replication crisis in social psychology: comment on Klein. Theory Psychol. 2016;26(4):540–8.

Popper KR. Conjectures and Refutations. New York: Harper; 1965.

Goodman SN, Fanelli D, Ioannidis JPA. What does reproducibility really mean? Sci Transl Med. 2016;8(341):ps12.

Nosek BA, Errington TM. Making sense of replications. eLIFE. 2017;6:e23383.

Van Woudenberg R, Peels R. The metaphysics of degrees. Eur J Philos. 2018;26(1):46–65.

Windelband W, Oakes G. History and Natural Science. History and Theory. 1980;19(2):165–8 (originally published in 1924).

Payne T. Describing Morphosyntax: a guide for field linguists. Cambridge: Cambridge University; 1997.

Adelborg K, et al. Migraine and risk of cardiovascular diseases: Danish population based matched cohort study. Br Med J. 2018;360:k96. https://doi.org/10.1136/bmj.k96 published January 31 st .

Lorenz C. Constructing the past. Princeton: Princeton University Press; 2008.

Van Woudenberg R. The nature of the humanities. Philosophy. 2017;93(1):109–40.

Rosenberg A. The Atheist’s guide to reality. New York: Norton; 2012.

Kukla A. Social Constructivism and the Philosophy of Science. Oxford: Routledge; 2000.

Malpas J, Gander H-H. The Routledge Companion to Hermeneutics. New York: Routledge; 2015.

Nial K, Lawn C, editors. The Blackwell companion to hermeneutics. Oxford: Blackwell; 2016.

Van den Berg JA, Kotzé A, Nicklas T, Scopello M, editors. In Search of Truth: Augustine, Manichaeism and Other Gnosticism: Studies for Johannes van Oort at Sixty, Nag Hammadi and Manichaean Studies 74. Leiden: Brill; 2010.

Meconi DV, Stump E, editors. The Cambridge Companion to Augustine. Cambridge: Cambridge University Press; 2014.

Sanders EP. Paul and Palestinian Judaism: a comparison of patterns of religion. Philadelphia: Fortress Press; 1977.

Wright NT. Paul and his recent interpreters. Minneapolis: Augsburg Fortress; 2014.

Ray JD. The Rosetta Stone and the Rebirth of Ancient Egypt. Cambridge, Mass: Harvard University Press; 2007.

Van Tilborgh L, Meedendorp T, Van Maanen O. ‘Sunset at Montmajour’: a newly discovered painting by Vincent van Gogh. Burlingt Mag. 2013;155(1327).

Klein RA, Ratliff KA, Vianello M, Adams RB Jr, Bahnik S, Bernstein MJ, Bocian K, Bary Kappes H, Nosek BA. Investigating variation in replicability. Soc Psychol. 2014;45:142–52.

Nussbaum M. Not for profit: why democracy needs the humanities. Princeton: Princeton University Press; 2010.

Lindsay, JA., P Boghossian, H Pluckrose. Academic Grievance Studies and the Corruption of Scholarship, Areo Magazine, October 2 nd . 2018 https://areomagazine.com/2018/10/02/academic-grievance-studies-and-the-corruption-of-scholarship/ .

Mounk, Y. The Circling of the Academic Wagons, The Chronicle of Higher Education, 9 October. 2018 https://web.archive.org/web/20181010122828/ ; https://www.chronicle.com/article/What-the-Grievance/244753 .

Engber, Daniel. What the “Grievance Studies” Hoax Actually Reveals. Slate. 2018. https://slate.com/technology/2018/10/grievance-studieshoax-not-academic-scandal.html .

Hughes, V, P Aldhous. Here’s what critics say about that big new hoax on gender studies, Buzzfeed News, 10-09-2018. 2018 https://www.buzzfeednews.com/article/virginiahughes/grievance-studies-sokal-hoax .

Essig L, Moorti S. Only a Rube Would Believe Gender Studies Has Produced Nothing of Value: The Chronicle of Higher Education; 2018.

Download references

Acknowledgements

For their helpful comments on an earlier version of this paper, I would like to thank Lieke Asma, Valentin Arts, Wout Bisschop, Lex Bouter, Jeroen de Ridder, Tamarinde Haven, Thirza Lagewaard, Chris Ranalli, Joeri Tijdink, and René van Woudenberg. I also thank various audience members for their constructive suggestions at the meeting on replicability of the KNAW (Royal Netherlands Academy of Arts and Sciences), Reproduceerbaarheid van wetenschappelijk onderzoek: Wetenschapsbreed van belang? ("Reproducibility of scientific research: important across all of science?"), on March 5, 2018. Finally, I thank Brian Nosek and an anonymous referee for their constructive review of the paper for this journal.

Funding

This publication was made possible through the support of a grant from the Templeton World Charity Foundation: "The Epistemic Responsibilities of the University" (2016–2019). The opinions expressed in this publication are those of the author and do not necessarily reflect the views of the Templeton World Charity Foundation.

Availability of data and materials

Not applicable

Author information

Authors and Affiliations

Rik Peels, Philosophy Department, Faculty of Humanities, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands

Contributions

The author read and approved the final manuscript.

Corresponding author

Correspondence to Rik Peels.

Ethics declarations

Competing interests

The author declares that he has no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Peels, R. Replicability and replication in the humanities. Res Integr Peer Rev 4, 2 (2019). https://doi.org/10.1186/s41073-018-0060-4


Received: 03 October 2018

Accepted: 13 December 2018

Published: 09 January 2019

DOI: https://doi.org/10.1186/s41073-018-0060-4


Keywords

  • Normativity
  • Replication crisis

Research Integrity and Peer Review

ISSN: 2058-8615


A New Replication Crisis: Research that is Less Likely to be True is Cited More

Papers that cannot be replicated are cited 153 times more because their findings are interesting, according to a new UC San Diego study.

Image credit: Dilok Klaisataporn/iStock.

Media contact: Christine Clark

Papers in leading psychology, economics and science journals that fail to replicate, and are therefore less likely to be true, are often the most cited papers in academic research, according to a new study by the University of California San Diego's Rady School of Management.

Published in Science Advances , the paper explores the ongoing “replication crisis” in which researchers have discovered that many findings in the fields of social sciences and medicine don’t hold up when other researchers try to repeat the experiments.

The paper reveals that findings from studies that cannot be verified when the experiments are repeated have a bigger influence over time. The unreliable research tends to be cited as if the results were true long after the publication failed to replicate.  

“We also know that experts can predict well which papers will be replicated,” write the authors Marta Serra-Garcia, assistant professor of economics and strategy at the Rady School and Uri Gneezy, professor of behavioral economics also at the Rady School. “Given this prediction, we ask ‘why are non-replicable papers accepted for publication in the first place?’”

Their possible answer is that review teams of academic journals face a trade-off. When the results are more “interesting,” they apply lower standards regarding their reproducibility.

The link between interesting findings and nonreplicable research can also explain why such work is cited at a much higher rate: the authors found that papers that replicated successfully accumulated, on average, 153 fewer citations than those that failed to replicate.

“Interesting or appealing findings are also covered more by media or shared on platforms like Twitter, generating a lot of attention, but that does not make them true,” Gneezy said. 

Serra-Garcia and Gneezy analyzed data from three influential replication projects which tried to systematically replicate the findings in top psychology, economics and general science journals (Nature and Science). In psychology, only 39 percent of the 100 experiments successfully replicated. In economics, 61 percent of the 18 studies replicated, as did 62 percent of the 21 studies published in Nature/Science.
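As a rough sanity check on those figures, the three project-level rates can be pooled into an overall replication rate. The short sketch below reconstructs approximate counts by rounding the reported percentages, so the result is illustrative rather than taken from the underlying paper.

```python
# Rough aggregation of the replication rates quoted above. The counts are
# reconstructed by rounding the reported percentages, so they are approximate.
projects = {
    "psychology": (39, 100),     # 39% of 100 experiments replicated
    "economics": (11, 18),       # ~61% of 18
    "nature_science": (13, 21),  # ~62% of 21
}

replicated = sum(successes for successes, _ in projects.values())
total = sum(n for _, n in projects.values())
print(f"Pooled: {replicated}/{total} = {replicated / total:.0%} replicated")
```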

With the findings from these three replication projects, the authors used Google Scholar to test whether papers that failed to replicate are cited significantly more often than those that were successfully replicated, both before and after the replication projects were published. The largest gap was in papers published in Nature/Science: non-replicable papers accumulated roughly 300 more citations than replicable ones.

When the authors took into account several characteristics of the studies replicated—such as the number of authors, the rate of male authors, the details of the experiment (location, language and online implementation) and the field in which the paper was published—the relationship between replicability and citations was unchanged.

They also show that the impact of such citations grows over time. Yearly citation counts reveal a pronounced gap between papers that replicated and those that did not. On average, papers that failed to replicate pick up about 16 more citations per year. This gap remains even after the replication project is published.

“Remarkably, only 12 percent of post-replication citations of non-replicable findings acknowledge the replication failure,” the authors write.

The influence of an inaccurate paper published in a prestigious journal can have repercussions for decades. For example, the study Andrew Wakefield published in The Lancet in 1998 turned tens of thousands of parents around the world against the measles, mumps and rubella vaccine because of an implied link between vaccinations and autism. The incorrect findings were retracted by The Lancet 12 years later, but the claims that autism is linked to vaccines continue. 

The authors added that journals may feel pressure to publish interesting findings, and so do academics. For example, in promotion decisions, most academic institutions use citations as an important metric in the decision of whether to promote a faculty member.

This may be the source of the "replication crisis," first identified in the early 2010s.

“We hope our research encourages readers to be cautious if they read something that is interesting and appealing,” Serra-Garcia said. “Whenever researchers cite work that is more interesting or has been cited a lot, we hope they will check if replication data is available and what those findings suggest.”

Gneezy added, "We care about the field and producing quality research and we want it to be true."


What Does It Mean to Replicate a Study?

Replication studies put researchers’ conclusions to the test by creating new versions of the original experiment

In principle, a well-designed and well-documented experiment will accurately describe a real phenomenon and provide other researchers with everything they need to reproduce those results in a comparable study.

To replicate an experiment, researchers use the same methods of data collection and apply the same analysis, even though they have to use new subjects, often in somewhat different situations. Subjects may be generally older or younger, for example, or from a different country.

While later tests can’t possibly be identical to the original, they need to be similar enough to yield comparable data.

Concepts like "similar enough" and "comparable data" involve a level of subjectivity that adds to the controversy around replication. When a study fails a replication attempt, defenders often question whether the follow-up truly reflected the original. Researchers help to resolve such debates when they openly share their data and methods.
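To make "same methods, new subjects" concrete, here is a minimal, purely hypothetical sketch: an "original" study and a "replication" each draw a fresh sample from the same population and apply the same analysis. The effect size and sample size are invented for illustration.

```python
# Hypothetical illustration of "same methods, new subjects": an original
# study and a replication each draw a fresh sample from the same population
# and apply the same analysis. Effect size and sample size are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def run_study(n_per_arm=40, true_effect=0.4):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treatment = rng.normal(true_effect, 1.0, n_per_arm)
    return stats.ttest_ind(treatment, control).pvalue

p_original = run_study()
p_replication = run_study()  # identical methods, new simulated subjects
print(f"Original p = {p_original:.3f}; replication p = {p_replication:.3f}")
# Sampling variation alone can push one of the two past the 0.05 cutoff
# while the other misses it, even though nothing about the method changed.
```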


About the author: Patchen Barss


New Research

Scientists Replicated 100 Psychology Studies, and Fewer Than Half Got the Same Results

The massive project shows that reproducibility problems plague even top scientific journals

Brian Handwerk

Science Correspondent


Academic journals and the press regularly serve up fresh helpings of fascinating psychological research findings. But how many of those experiments would produce the same results a second time around?

According to work presented today in Science, fewer than half of 100 studies published in 2008 in three top psychology journals could be replicated successfully. The international effort included 270 scientists who re-ran other people's studies as part of The Reproducibility Project: Psychology, led by Brian Nosek of the University of Virginia.

The eye-opening results don't necessarily mean that those original findings were incorrect or that the scientific process is flawed. When one study finds an effect that a second study can't replicate, there are several possible reasons, says co-author Cody Christopherson of Southern Oregon University. Study A's result may be false, or Study B's results may be false—or there may be some subtle differences in the way the two studies were conducted that impacted the results.

“This project is not evidence that anything is broken. Rather, it's an example of science doing what science does,” says Christopherson. “It's impossible to be wrong in a final sense in science. You have to be temporarily wrong, perhaps many times, before you are ever right.”

Across the sciences, research is considered reproducible when an independent team can conduct a published experiment, following the original methods as closely as possible, and get the same results. It's one key part of the process for building evidence to support theories. Even today, 100 years after Albert Einstein presented his general theory of relativity, scientists regularly repeat tests of its predictions and look for cases where his famous description of gravity does not apply.

"Scientific evidence does not rely on trusting the authority of the person who made the discovery," team member Angela Attwood , a psychology professor at the University of Bristol, said in a statement "Rather, credibility accumulates through independent replication and elaboration of the ideas and evidence."

The Reproducibility Project, a community-based crowdsourcing effort, kicked off in 2011 to test how well this measure of credibility applies to recent research in psychology. Scientists, some recruited and some volunteers, reviewed a pool of studies and selected one for replication that matched their own interest and expertise. Their data and results were shared online and reviewed and analyzed by other participating scientists for inclusion in the large Science study.

To help improve future research, the project analysis attempted to determine which kinds of studies fared the best, and why. They found that surprising results were the hardest to reproduce, and that the experience or expertise of the scientists who conducted the original experiments had little to do with successful replication.

The findings also offered some support for the oft-criticized statistical tool known as the P value, which gauges how likely it is that a result at least as extreme as the one observed would arise by chance alone. A high P value means the result could easily be a fluke of chance, while a low value means the result is unlikely to be due to chance alone and is considered statistically significant.

The project analysis showed that a low P value was fairly predictive of which psychology studies could be replicated. Twenty of the 32 original studies with a P value of less than 0.001 could be replicated, for example, while just 2 of the 11 papers with a value greater than 0.04 were successfully replicated.

But Christopherson suspects that most of his co-authors would not want the study to be taken as a ringing endorsement of P values, because they recognize the tool's limitations. And at least one P value problem was highlighted in the research: The original studies had relatively little variability in P value, because most journals have established a cutoff of 0.05 for publication. The trouble is that value can be reached by being selective about data sets, which means scientists looking to replicate a result should also carefully consider the methods and the data used in the original study.
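The pattern described above, in which barely significant original results replicate less often, can be illustrated with a small simulation. The sketch below uses invented parameters (a mix of real and null effects, modest sample sizes), not the project's data, so the exact percentages it prints are only illustrative.

```python
# Illustrative simulation: some studied effects are real, some are null.
# Among "significant" originals, same-sized replications succeed less often
# when the original p-value was only marginally below 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_study(effect, n=30):
    a = rng.normal(0, 1, n)
    b = rng.normal(effect, 1, n)
    return stats.ttest_ind(b, a).pvalue

strong, marginal = [], []   # replication outcomes, binned by original p-value
for _ in range(4000):
    effect = 0.5 if rng.random() < 0.5 else 0.0   # half the effects are real
    p_orig = one_study(effect)
    if p_orig >= 0.05:
        continue                                  # only "published" results
    p_rep = one_study(effect)                     # same design, new sample
    (strong if p_orig < 0.001 else marginal).append(p_rep < 0.05)

print(f"Replication rate, original p < .001 : {np.mean(strong):.0%}")
print(f"Replication rate, original p >= .001: {np.mean(marginal):.0%}")
```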

It's also not yet clear whether psychology might be a particularly difficult field for reproducibility—a similar study is currently underway on cancer biology research. In the meantime, Christopherson hopes that the massive effort will spur more such double-checks and revisitations of past research to aid the scientific process.

“Getting it right means regularly revisiting past assumptions and past results and finding new ways to test them. The only way science is successful and credible is if it is self-critical,” he notes. 

Unfortunately there are disincentives to pursuing this kind of research, he says: “To get hired and promoted in academia, you must publish original research, so direct replications are rarer. I hope going forward that the universities and funding agencies responsible for incentivizing this research—and the media outlets covering them—will realize that they've been part of the problem, and that devaluing replication in this way has created a less stable literature than we'd like.”


Brian Handwerk is a science correspondent based in Amherst, New Hampshire.


Basic research crisis? Many results cannot be replicated

Many scientific studies cannot be replicated, and this is symptomatic of a wider crisis in basic research, say scientists.


When scientists write up their results in a scientific article, they should describe their methods very precisely so that other scientists can replicate them.

And in research, it’s not enough to test something once. A study should preferably be replicated many times by independent groups of scientists before they can be confident that their results are correct. This process is called reproduction.

Reproduction eliminates erroneous results or bad science. It also ensures that scientists do not waste time and money pursuing flawed research, and reassures others that incorrect conclusions are filtered out of the system.


But far too much of today’s published research cannot be reproduced. And it is a serious problem, say scientists.


One-time results are widespread

A survey published in Nature in 2016 asked researchers “is there a reproducibility crisis in research?”


Over half of 1,576 respondents, 52 per cent, said yes.

More than 70 per cent had tried to reproduce a colleague's experiment without success, and over half had tried, and failed, to reproduce their own experiments.

It is also easy to find examples of studies that could not be reproduced. In 2015, a group of researchers showed that only 36 per cent of studies from three respected psychology journals could be reproduced.

Only 47 per cent of the reproduced studies reported the same results as the original study.

A similar study in 2012 showed that just six out of 53 landmark studies in cancer could be reproduced.


Mistakes are reinforced

Lack of reproducibility is especially problematic because researchers today depend on their research being cited by other scientists, says Associate Professor David Budtz Pedersen from the Department of Communication, Aalborg University, Denmark.

"Today, we're mutually dependent on other groups' research, because we cite each other and build upon each other's work. In this way, a single mistake can multiply and create a cascade of bad research that is very hard to detect and takes a long time to correct," he says.

The ideals have changed over time, says psychologist Dorthe Berntsen from the Department of Psychology and Behavioural Sciences at Aarhus University, Denmark.

"When I started in science, the ideal was to conduct three to four experiments that could show the same effect in different ways. You replicated your experiments and then presented the three or four studies together in one article and submitted it to a journal," says Berntsen.

"Later on, there was a change so that one or perhaps two studies were enough, but it had to be a novel and exciting discovery, which should be really entertaining and fascinating to a wider public. This is the focus that you have today in Nature and Science, for example," she says, but adds that there is also a movement in the opposite direction.

For example, the Association for Psychological Science recently launched a number of quality assurance initiatives, such as the Registered Replication Reports initiative , which aims to encourage replication studies.

Lack of reproduction is not always a problem

But just because a study cannot be reproduced, it does not necessarily mean that the research is bad, says Associate Professor Mikkel Johansen from the Department of Science Education at the University of Copenhagen, Denmark.

While he agrees that scientists should try to replicate results as often as possible, it is not always bad science or questionable research practice that means that a result cannot be reproduced.

"We should really take care when we talk about reproducibility in the empirical sciences, because what does it mean exactly to reproduce an experiment? Reproduction is built partially on the false premise that a scientific experiment is a well-defined and standardised machine that always spits out the same result from the same input," says Johansen.


Genuine basic research is difficult to reproduce

We should be especially cautious of ringing the alarm when we fail to reproduce basic research, says Johansen.

"There are clearly some cases where it's reasonable to require reproduction, because we understand the context really well and the experiment can be very precisely described. For example, medical research. But when it comes to basic research, there are some places where you don't know which conditions influence [the outcome]. So when you reproduce [the experiment] you may inadvertently change certain crucial details," he says.

“A lack of reproducibility may also indicate that we don’t understand parts of the experiment well enough yet and so it’s difficult to reproduce all of the relevant parts, because we don’t know what they are, and then we change a couple of parameters that we thought didn’t matter.”


Read the Danish version of this article on Videnskab.dk  

Translated by: Catherine Jex


Open Access | Published: 09 April 2024

Anger is eliminated with the disposal of a paper written because of provocation

Yuta Kanaya & Nobuyuki Kawai (ORCID: 0000-0003-0372-1703)

Scientific Reports, volume 14, Article number: 7490 (2024)

Subject: Human behaviour

Anger suppression is important in our daily lives, as its failure can sometimes lead to the breakdown of family relationships. Thus, effective strategies to suppress or neutralise anger have been examined. This study shows that physically disposing of a piece of paper containing one's written thoughts on the cause of a provocative event neutralises anger, whereas keeping the paper does not. In this study, participants wrote brief opinions about social problems and received a handwritten, insulting comment consisting of low evaluations of their composition from a confederate. Then, the participants wrote down the cause of the provocative event and their thoughts about it. Half of the participants (disposal group) disposed of the paper in a trash can (Experiment 1) or a shredder (Experiment 2), while the other half (retention group) kept it in a file on the desk. All participants showed an increased subjective rating of anger after receiving the insulting feedback. However, subjective anger in the disposal group decreased to baseline levels, while that in the retention group remained higher than at baseline in both experiments. We propose this method as a powerful and simple way to eliminate anger.


Experiment 1

Introduction

The need to control anger has been of importance for a long time in human societies, as inferred by a philosopher in Imperium Romanum who had already explored how to cease being angry 1 . However, it can still be challenging to suppress anger effectively. Frequent, unregulated anger often leads to violence towards children 2 , which has become an increasingly prevalent issue. One study found that the global estimate for children experiencing any form of violence (physical, sexual, emotional, or a combination) in the past year is one billion children aged 2–17 years 3 . The number of child abuse cases in Japan has reportedly doubled in the past decade 4 . Children learn about appropriate emotional expression and behaviour from their parents 5 , and children who have been maltreated may lack the opportunity to learn how to regulate anger. Consequently, these maltreated children may have difficulty controlling their own anger 6 , recognising anger in others 7 , and tend to exhibit externalizing behaviour problems 8 . These studies suggest that parental anger regulation issues negatively affect children’s emotional competence. Therefore, an effective way of reducing anger has been examined throughout the years 9 .

However, simply attempting to suppress anger is usually not effective 10 . Both cognitive reappraisal and distraction (i.e., thinking about something other than provocative comments) could reduce anger; however, distraction could suppress anger only for a transient period of time 11 . Cognitive reappraisal refers to the reinterpretation or modification of the meaning of an unpleasant situation. Although reappraisal is considered as an effective way to reduce anger 12 , it requires greater cognitive effort 13 , 14 . Therefore, reappraisal under stressful situations which require cognitive load was not found to be effective in reducing anger as compared to non-stressful situations 15 . Self-distancing, which may be responsible for the anger-reducing effect of reappraisal 12 is also considered as an effective way to reduce anger. Nevertheless, self-distancing or reflection on one’s provocation from a distance is often not feasible, especially in the heat of the moment 13 .

Failure to reduce anger can lead an individual to think about a provocative event repeatedly. Such ruminations are often produced in a self-immersed, experiential manner 16 . Self-immersed experiential rumination can lead to reliving past provocative events 17 , thus maintaining or even increasing subjective anger and vascular responses 18 .

However, among the types of ruminations, writing down a provocation event does not always maintain or increase anger; instead, anger is suppressed depending on the way of writing. For instance, anger was suppressed when participants wrote down the anger-inducing event in a detached, informational, ‘cool’ manner. However, their anger was not suppressed (and was maintained or even increased) when they failed to write down the event in an analytical manner, and wrote it down in a ‘hot’ (emotional) manner 12 . Somewhat relevant here is the expressive writing technique 19 , which is frequently used in emotion-focused psychotherapy treatment 20 . It is believed to be effective in suppressing anger in clinical settings. However, only one experimental study using this technique has been conducted, wherein it was found that there was a significant likelihood of reduced anger when sentences about the emotion were written in the past tense 21 . These studies suggest that anger may be successfully suppressed if individuals are able to separate their internal experience of provocative events from their sense of self 22 . Healy et al. 23 reported that negative self-referential statements (‘my life is pointless’), when presented in a defused format (‘I am having a thought that my life is pointless’), could decrease the emotional discomfort related to that statement.

These previous studies emphasised the cognitive processes (such as goals or valuations) that occur almost entirely inside individuals’ heads 24 . However, if we look at the literature more broadly, studies on emotion regulation (a situated cognitive approach) have demonstrated successful emotion control through dynamic interplay between the person and the situation 24 , 25 . From this situated cognition perspective, people perceive their environment in terms of the possibilities for the kinds of actions that they would pursue. These functional features of the environment (affordances) do not solely exist inside an individual’s mind but instead have a physical reality that exists in the individual’s relationship with the environment. For instance, people frequently use physical substances to modify their moods. People may take a hot shower when they feel lonely 26 , 27 or hold a teddy bear when they feel afraid 28 . Such access to physical objects can significantly modify individuals’ ability to manage their emotions.

In this study, we developed a new anger reduction strategy inspired by the situated cognition approach to emotion regulation 24 . Relevant to this approach, the notion of a grounded procedure of separation 29 also assumes that mental representations and functions are grounded in one’s own experiences and interactions with physical reality. For instance, if people want to take revenge through permanent removal (e.g. hatred for ex), they may destroy a related entity such that it is no longer recognisable (burn, melt, or tear related). In a related study, Briñol et al. 30 reported that writing down negative thoughts about a Mediterranean diet on a piece of paper and disposing of the paper in a trash can result in lower negative (more positive) evaluations of the diet, compared to a group that kept the paper in a booklet. These attitude changes may derive from the cognitive fusion that people often fuse with physical objects, such as jewellery, cars, and family heirlooms 31 . Such fused objects are valued more and are less likely to be abandoned because doing so means losing a part of themselves 32 , 33 . Specifically, throwing an object associated with negative emotions (anger) may result in losing the negative emotions (anger). However, to the best of our knowledge, no study has tested whether the disposal of anger-written paper can reduce or even eliminate anger.

Previous studies from a situated cognitive approach to anger management have changed the external environment of the individual in anger. Tool (object) use has received scant attention in these situated cognition approaches to anger management, except for a few studies, such as hitting a punching bag 34 and playing a video game 35 . This study examined a method in which the disposal of a paper (object) on which participants wrote down their descriptions or thoughts about a provocative event could neutralise anger. Participants threw the anger-written paper into a trash box in Experiment 1, and put the paper into a shredder in Experiment 2. If the action of disposal is crucial to modifying emotions, anger would be reduced only in participants in Experiment 1 but not in Experiment 2, as predicted by the grounded separation procedure 29 . Nevertheless, if anger was modified by the meaning of disposal, the subjective ratings of anger would be eliminated in both experiments. The disposal of the paper with the written descriptions would remove the psychological existence of anger for the provoked participants along with the disposal of paper by the dynamic interactions with the object 24 . This simple method of eliminating anger could potentially contribute to effective parental anger management toward their children.

Materials and method

Participants

A total of 57 students (women = 21, mean age = 21.11, SD  = 1.05) from a local university participated in this experiment. The data from seven participants were excluded from the final analysis because they correctly guessed the purpose of the experiment and they did not express induced anger by insult (subjective ratings of anger were lower or the same compared to those of the baseline), as was the case in a previous study 36 . Our final analysis included 50 participants (women = 16, mean age = 21.10, SD  = 1.08). A sample size of 50 participants was determined by G*Power 3.1.9.4 37 using the a priori procedure for repeated measures ANOVA, within (periods)—between (disposal and retention) interaction with the parameters of 95% power, an expected effect size of 0.25 (defined as a medium effect by Cohen 38 ), alpha level of 0.05, a within-subjects measurement correlation of 0.5, and a nonsphericity correction ε of 1. The calculation suggested a sample size of 22 participants in each group. Based on these analyses, we concluded that the sample size was appropriate for this study.
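A simulation-based sketch of a sample-size check for this design is shown below. It is not the analytic G*Power procedure reported above: the cell means are hypothetical values chosen to produce a group-by-period interaction in which the groups diverge only at the final period, so the printed power estimate applies only to those made-up parameters.

```python
# Simulation-based sanity check for the 2 (group) x 3 (period) mixed design.
# NOT the analytic G*Power calculation reported in the paper; the cell means
# are hypothetical and only loosely mirror the expected pattern of results.
import numpy as np
import pandas as pd
import pingouin as pg  # pip install pingouin

rng = np.random.default_rng(0)
PERIODS = ["baseline", "post_provocation", "post_writing"]
# Hypothetical cell means: both groups rise after provocation; only the
# disposal group returns to baseline after writing.
MEANS = {"disposal": [0.0, 1.0, 0.0], "retention": [0.0, 1.0, 1.0]}

def interaction_significant(n_per_group=25, corr=0.5, sd=1.0, alpha=0.05):
    # Compound-symmetric covariance across the three repeated measures
    cov = sd ** 2 * (np.full((3, 3), corr) + (1 - corr) * np.eye(3))
    rows = []
    for group, mu in MEANS.items():
        scores = rng.multivariate_normal(mu, cov, size=n_per_group)
        for i, subj_scores in enumerate(scores):
            for period, value in zip(PERIODS, subj_scores):
                rows.append({"subject": f"{group}_{i}", "group": group,
                             "period": period, "anger": value})
    df = pd.DataFrame(rows)
    aov = pg.mixed_anova(data=df, dv="anger", within="period",
                         subject="subject", between="group")
    p_interaction = aov.loc[aov["Source"] == "Interaction", "p-unc"].item()
    return p_interaction < alpha

power = np.mean([interaction_significant() for _ in range(300)])
print(f"Estimated power for the group x period interaction: {power:.2f}")
```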

Angry feelings were assessed with five adjective items: angry, bothered, annoyed, hostile, and irritated. These adjectives were previously used as measures of self-reported anger 39. In this study, each response scale ranged from 1 (not at all) to 6 (extremely). As in a previous study on anger 40, scores on these five adjectives were averaged to form an anger experience composite, which was the score used in the analysis (Cronbach's α = 0.90). We also used the Positive and Negative Affect Schedule (PANAS) as a subjective scale to assess mainly negative feelings 38. We used the Japanese version of the 6-point PANAS scale 41.
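A minimal sketch of how such an adjective composite and its internal consistency could be computed, using the pingouin package; the ratings below are invented placeholders on the 1-to-6 scale, not the study's data.

```python
# Sketch: averaging the five anger adjectives into a composite and checking
# internal consistency with Cronbach's alpha. Placeholder data, 1-6 scale.
import pandas as pd
import pingouin as pg

adjectives = ["angry", "bothered", "annoyed", "hostile", "irritated"]
ratings = pd.DataFrame(
    [[5, 4, 5, 3, 4],
     [2, 2, 1, 1, 2],
     [6, 5, 6, 4, 5],
     [3, 3, 2, 2, 3],
     [4, 4, 5, 3, 4],
     [1, 2, 1, 1, 1]],
    columns=adjectives,
)

anger_composite = ratings[adjectives].mean(axis=1)  # score used in the analyses
alpha, ci = pg.cronbach_alpha(data=ratings[adjectives])
print("Anger composite per (fake) participant:", list(anger_composite.round(1)))
print(f"Cronbach's alpha = {alpha:.2f}, 95% CI = {ci}")
```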

In this experiment, participants' subjective emotional states were measured at three time points (baseline, post-provocation, and post-writing). The participants were told to write an essay on social problems (e.g., smoking in public) for which they would receive feedback from a doctoral student assessing the quality of the essay. They had seen the doctoral student before entering the experimental room. After the participants wrote the essay, they completed the PANAS and anger questionnaires for the baseline. The evaluation by the fictitious doctoral student was then provided to the participants. The evaluation included ratings of the essay on six characteristics using a 9-point scale (e.g. for intelligence, 1 = unintelligent, 9 = intelligent). All participants were given the following ratings: intelligence = 3, interest = 3, friendliness = 2, logic = 3, respectability = 4, and rationality = 3. Each essay was also provided with the following comment: ‘I cannot believe an educated person would think like this. I hope this person learns something while at the university’ 40 , 42 . All of these manipulations were successfully used in our previous study 40 . The participants were required to read the feedback ratings and comments silently for two minutes. Then, they filled out the subjective emotional questionnaires (PANAS and anger adjectives) for the post-provocation period.

Then, the participants were asked to write down every thought they had on receiving the feedback and were given three minutes to do so. The instruction was: 'Think about the event from your own perspective. Concentrate especially on the things that originally triggered the emotions and your reactions'. We added guide questions ('Why were you feeling this way?', 'What made you feel this way?') to induce analytical rumination. To allow the participants to write about their honest feelings, they were informed that the written paper would not be seen by anyone, including the experimenter. After writing, the participants were asked to review the sentences carefully for 30 s. For the retention group, the paper was turned over, put in a clear plastic folder, and placed on the right side of the desk. The participants in the disposal group rolled up the paper into a crumpled ball, stood up, threw the paper into the trash can held by the experimenter, and sat back in the chair. Finally, both groups of participants filled out the subjective emotional questionnaires (anger adjectives and PANAS) for the post-writing period. At the end of the experiment, all participants were debriefed and informed of the truth. They were also assured that the evaluations of their essays had been prepared in advance.

Data analyses

Angry feelings were analysed using a 2 (group: disposal or retention) × 3 (period: at baseline, post-provocation, and post-writing) ANOVA. All significance levels were set at p  < 0.05. We used the Greenhouse–Geisser correction when Mauchly’s test of sphericity was violated. When the interaction was significant, multiple comparisons using the Bonferroni correction method were used to assess the differences.
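An analysis of this shape could be run along the following lines; the sketch uses pingouin (0.5.2 or later for pairwise_tests) with random placeholder data in long format, rather than whatever software the authors actually used.

```python
# Sketch of a 2 (group) x 3 (period) mixed ANOVA with Bonferroni-corrected
# follow-up comparisons. The data are random placeholders, not the study's.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(1)
rows = [{"subject": f"{g}_{i}", "group": g, "period": p,
         "anger": rng.uniform(1, 6)}
        for g in ("disposal", "retention")
        for i in range(25)
        for p in ("baseline", "post_provocation", "post_writing")]
df = pd.DataFrame(rows)

# Mixed-model ANOVA; sphericity correction is applied automatically if needed
aov = pg.mixed_anova(data=df, dv="anger", within="period", subject="subject",
                     between="group", correction="auto")
print(aov[["Source", "F", "p-unc", "np2"]])

# Bonferroni-adjusted pairwise comparisons (would normally be examined only
# if the group x period interaction were significant)
posthoc = pg.pairwise_tests(data=df, dv="anger", within="period",
                            between="group", subject="subject",
                            padjust="bonf")
print(posthoc[["Contrast", "A", "B", "p-corr"]])
```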

We also report Bayes factors (BFs) from the Bayesian repeated measures ANOVA in JASP 43. BF10 values quantify the evidence for the alternative hypothesis relative to the null hypothesis. BFs greater than 3 indicate support for the alternative hypothesis, and a BF10 over 10 offers strong evidence for it. Values less than 0.33 indicate support for the null hypothesis, and values between 0.33 and 3 indicate that the data are insensitive. We also report 95% confidence intervals.
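The toy helper below simply encodes the interpretation thresholds just described; the example values are illustrative.

```python
# Toy helper encoding the BF10 interpretation thresholds described above.
def interpret_bf10(bf10: float) -> str:
    if bf10 > 10:
        return "strong evidence for the alternative hypothesis"
    if bf10 > 3:
        return "support for the alternative hypothesis"
    if bf10 < 1 / 3:
        return "support for the null hypothesis"
    return "data are insensitive"

for bf in (0.31, 1.17, 100):  # illustrative BF10 values
    print(f"BF10 = {bf}: {interpret_bf10(bf)}")
```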

We aimed to examine (1) whether angry feelings resumed in the disposal group, and (2) whether angry feelings were different between the groups after the disposal or retention treatments. Our main interest was angry feelings, while we also verified PANAS scores using a 2 (group: disposal or retention) × 3 (period: at baseline, post-provocation, and post-writing) ANOVA.

Ethics statement

All participants were paid for their participation and had provided written informed consent in accordance with the procedures before participation. The study was approved by the Ethics Committee of the Department of Cognitive and Psychological Sciences at Nagoya University (201104-C-02–02). All methods were carried out in accordance with the ethical guidelines of the Declaration of Helsinki. All participants provided their written and informed consent prior to starting the study.

Results

Anger experience

The left panel of Fig. 1 shows the mean subjective ratings of anger for the disposal and retention groups at three time points (baseline, post-provocation, and post-writing). Subjective ratings of anger in both groups increased at post-provocation (M disposal = 3.34, SD = 1.20, 95% CI [2.86, 3.82]; M retention = 3.45, SD = 1.11, 95% CI [3.00, 3.89]) from the baseline (M disposal = 1.59, SD = 0.50, 95% CI [1.39, 1.79]; M retention = 1.78, SD = 0.71, 95% CI [1.50, 2.07]). Subjective ratings at post-writing decreased from post-provocation; however, those of the retention group were still higher than the baseline (M retention = 2.64, SD = 0.95, 95% CI [2.26, 3.02]), while those of the disposal group returned to baseline levels (M disposal = 1.87, SD = 0.71, 95% CI [1.59, 2.16]). A 2 (group: disposal or retention) × 3 (period: baseline, post-provocation, and post-writing) mixed-model analysis of variance (ANOVA) revealed a significant main effect of period [F(2, 96) = 73.36, p < 0.001, partial η² = 0.60, BF10 > 100], while the main effect of group was not significant [F(1, 48) = 3.21, p > 0.05, partial η² = 0.06, BF10 = 0.66]. The interaction between group and period was significant [F(2, 96) = 3.12, p < 0.05, partial η² = 0.06, BF10 = 1.17]. Multiple comparisons with the Bonferroni method revealed that subjective anger was significantly higher at post-provocation than at baseline (p < 0.05), indicating that the provocation manipulation worked. Subjective ratings of anger at post-writing decreased significantly compared to post-provocation (p < 0.05). Importantly, however, the ratings of the retention group at post-writing were still significantly higher than those at baseline (p < 0.05), whereas those of the disposal group at post-writing fell back to baseline levels (p > 0.05). At post-writing, ratings in the disposal group were significantly lower than in the retention group (p < 0.01).

Figure 1. Self-reported anger during Experiment 1 (left) and Experiment 2 (right). Significant differences emerged at the final time point owing to the experimental manipulations. Possible values for anger range from 1 to 6. Each vertical line illustrates the 95% confidence interval for each group.

Negative and positive affect

The negative affect subscale of the PANAS at post-provocation (M disposal = 3.10, SD = 1.00, 95% CI [2.70, 3.49]; M retention = 3.06, SD = 1.03, 95% CI [2.64, 3.47]) was higher than at baseline (M disposal = 2.45, SD = 0.66, 95% CI [2.18, 2.71]; M retention = 2.50, SD = 0.84, 95% CI [2.16, 2.83]) and post-writing (M disposal = 2.06, SD = 0.65, 95% CI [1.80, 2.32]; M retention = 2.39, SD = 0.88, 95% CI [2.04, 2.73]). The 95% CIs of the disposal group overlapped slightly between the post-provocation [2.70, 3.49] and baseline [2.18, 2.71] periods, and those of the retention group overlapped between post-provocation [2.64, 3.47] and baseline [2.16, 2.83]. The 95% CIs for the post-writing means partially overlapped between the groups. A 2 (group) × 3 (period) mixed ANOVA revealed a significant main effect of period [F(2, 96) = 28.64, p < 0.001, partial η² = 0.37, BF10 > 100]. However, the main effect of group [F(1, 48) = 0.29, p > 0.05, partial η² = 0.01, BF10 = 0.32] and the interaction between group and period [F(2, 96) = 1.35, p > 0.05, partial η² = 0.03, BF10 = 0.31] were not significant. Multiple comparisons with the Bonferroni method revealed that subjective negative affect at post-provocation was significantly higher than at baseline and post-writing (ps < 0.05).

The PANAS positive affect subscale showed little variation across the three periods: baseline (M disposal = 2.33, SD = 0.80, 95% CI [2.01, 2.65]; M retention = 2.32, SD = 0.75, 95% CI [2.01, 2.62]), post-provocation (M disposal = 2.44, SD = 0.76, 95% CI [2.13, 2.75]; M retention = 2.42, SD = 0.89, 95% CI [2.06, 2.78]), and post-writing (M disposal = 2.38, SD = 0.87, 95% CI [2.03, 2.73]; M retention = 2.27, SD = 0.83, 95% CI [1.93, 2.60]). A 2 × 3 mixed ANOVA revealed that neither the main effects nor the interaction was significant (Fs < 0.90, ps > 0.41, BF10s < 0.14).

Discussion

This study examined whether writing about the provocative event and disposing of the paper in a trash can would suppress anger. The provocation treatment evoked anger in both groups to a similar degree. Nevertheless, the retention group still showed significantly higher anger than at baseline, while the disposal group completely eliminated their anger after disposing of the anger-written paper. These results suggest that disposing of the paper containing the ruminated anger in the trash can neutralises anger. Our interpretation is that the act of throwing the paper with the ruminated anger into the trash produces a feeling akin to the psychological entity (anger) itself being discarded, leading to the elimination of anger, since the psychological entity (anger) was disposed of along with the physical object (the anger-written paper).

One may argue that it was not the disposal itself but the physical distance from the paper that played the critical role in reducing anger, because the paper was moved away from participants in the disposal group whereas it remained beside participants in the retention group. However, Zhang et al. 44 showed that engaging in an avoidance action, rather than creating physical distance, was critical for reversing the perceived effect of negative thoughts. In their study (Experiment 5), participants in the avoidance-action conditions either threw a ball to the opposite corner of the room (creating physical distance between themselves and the ball) or pretended to throw it (creating no distance). Participants in the no-avoidance-action conditions either carried the ball to the opposite corner of the room and left it there (creating physical distance without a throwing action) or held the ball in their non-dominant hand (creating no distance). Participants in both avoidance-action conditions reversed the negative thoughts, whereas participants in both no-avoidance conditions did not; the avoidance action, not the distance, was crucial. By this account, physical distance is unlikely to explain the anger reduction observed here, but the disposal (throwing) action itself might. We assume, however, that the meaning (i.e. interpretation) of the disposal matters more than the action itself, and other studies have likewise suggested that the meaning of an action, not the action per se, determines its impact 30,45. Because Experiment 1 could not rule out a contribution of the throwing action to neutralising anger, we conducted a second experiment to exclude that contribution as far as possible, to confirm the effectiveness of the disposal method, and to explore a variation of the method.

Experiment 2

Experiment 1 indicated that disposing of a piece of paper containing a description of an anger-inducing experience into a trash can neutralises anger. However, it remained unclear which aspect of the paper's disposal neutralised anger. Although we interpreted the meaning of the action as critical, the physical distance between the participant and the paper, or the action itself (i.e. embodied cognition), might have played a critical role. We therefore designed a second experiment (1) to replicate the results of Experiment 1, (2) to exclude the embodied explanation as far as possible, and (3) to explore another version of the disposal method using a shredder placed on the desk. In this experiment, participants put the paper containing their anger into the shredder instead of throwing it into a trash can kept at some distance from them. We also made a small change for the retention group: its participants put the paper into a clear box on the desk, while the disposal group put the paper into the shredder. Thus, the distance between the participants and the paper, and the type of action, were matched between the two groups. If the sensorimotor experience of throwing the paper were critical to neutralising anger, we would not be able to replicate the results of Experiment 1; if, instead, the meaning of disposing of a physical entity plays the critical role, we should obtain similar results. Consistent with this reasoning, a previous study found that attitudes changed when a paper was placed in a box labelled 'trash can' (implying that its contents were mentally discarded) compared with a box labelled 'safety box' 46, suggesting that the perceived meaning of an action, and not the action per se, drives attitude change. Hence, we designed Experiment 2 to confirm whether the perceived meaning of the action eliminates anger. We predicted that putting the paper into a shredder would reduce negative emotion (anger) compared with keeping the paper.

A total of 48 participants (24 women; mean age = 26.81 years, SD = 9.42) were recruited through a worker-dispatching (temporary staffing) company and a local university. There was no overlap between the participants of the two experiments. The sample size was determined with G*Power 3.1.9.4 37 using the a priori procedure for a repeated-measures ANOVA, within-between (period × group) interaction, with 95% power, an expected effect size of 0.25 (defined as a medium effect by Cohen 38), an alpha level of 0.05, a correlation among repeated measures of 0.5, and a nonsphericity correction ε of 1. The calculation suggested a sample size of 22 participants per group; on this basis, we concluded that the sample size was appropriate for this study. As in Experiment 1, the data of two participants were excluded from the final analysis because they correctly guessed the purpose of the experiment and did not become angry in response to the insult (their subjective anger ratings were equal to or lower than at baseline). The final analysis therefore included 46 participants (23 women; mean age = 26.39 years, SD = 9.14).
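The a priori calculation can be approximated in Python. The sketch below is not the authors' G*Power session but a rough reimplementation that assumes the noncentrality formula G*Power is generally described as using for repeated-measures within-between interactions, λ = f²·N·m·ε/(1 − ρ), with numerator df (k − 1)(m − 1)ε and denominator df (N − k)(m − 1)ε; with the parameters above it should return a power of roughly 0.95 for 22 participants per group.

```python
# Rough sketch (assumption: this mirrors the noncentrality formula commonly
# attributed to G*Power for "ANOVA: repeated measures, within-between interaction"):
# lambda = f^2 * N * m * eps / (1 - rho), df1 = (k-1)(m-1)eps, df2 = (N-k)(m-1)eps.
from scipy import stats

f, alpha = 0.25, 0.05          # expected effect size and alpha level
k, m = 2, 3                    # 2 groups (disposal/retention), 3 periods
rho, eps = 0.5, 1.0            # correlation among repeated measures, nonsphericity

def interaction_power(n_total):
    df1 = (k - 1) * (m - 1) * eps
    df2 = (n_total - k) * (m - 1) * eps
    ncp = f**2 * n_total * m * eps / (1 - rho)
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(f_crit, df1, df2, ncp)

print(round(interaction_power(44), 3))   # ~0.95 for 22 participants per group
```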

As in Experiment 1, angry feelings were assessed using five adjectives: angry, bothered, annoyed, hostile, and irritated. Responses ranged from 1 (not at all) to 6 (extremely). Scores on these five adjectives were averaged to form an anger-experience composite, which was the score used in the analyses. We also used the Japanese version of the 6-point PANAS as a subjective scale to assess mainly negative feelings 40,41.
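A minimal sketch of this scoring step is given below, assuming a data frame with one column per adjective rated on the 1-6 scale; the column names and example values are illustrative, not taken from the authors' materials.

```python
# Minimal sketch: average the five anger adjectives (rated 1-6) into a single
# anger-experience composite per participant. Column names and values are
# illustrative only.
import pandas as pd

ratings = pd.DataFrame({
    "angry":     [2, 5, 4],
    "bothered":  [1, 4, 5],
    "annoyed":   [2, 6, 4],
    "hostile":   [1, 3, 3],
    "irritated": [2, 5, 4],
})
ratings["anger_composite"] = ratings[
    ["angry", "bothered", "annoyed", "hostile", "irritated"]
].mean(axis=1)
print(ratings["anger_composite"])
```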

For the disposal group, a dustbin-type shredder (ACCO Brands Japan Corp, GSHA26MB) was used. This shredder (30 cm × 10 cm × 28 cm) cuts paper into 2 mm × 14 mm pieces when the paper is fed in from the top. The lower part of the shredder holds a transparent dustbin, so the shredded pieces can be seen from the outside. For the retention group, a hand-made clear plastic box (23 cm × 5 cm × 30 cm) was used. Paper is inserted from the top, as with the shredder, and the box is likewise transparent so that the paper inside can be seen from the outside.

This experiment followed the same method as Experiment 1 with slight changes. First, the words "while at university" were removed from the provocative comment ('I cannot believe an educated person would think like this. I hope this person learns something while at university' 40,42) because non-students participated in this study. The second change concerned the method of disposing of or retaining the paper containing the description of the anger-inducing experience. After participants wrote about the provocative event in an analytical manner, the transparent box or the transparent shredder was placed on the desk in front of them (Fig. 2), and they were asked to review their sentences carefully for 30 s. Participants were then required to put the paper into the shredder or the box, according to their group, with the written side of the paper facing them. Participants in the disposal group watched the paper being cut in the shredder for five seconds. Participants in the retention group enclosed the paper in a clear file folder and placed it in the transparent box with their written sentences visible; they then observed the paper carefully for five seconds, after which the box was turned around so that only the blank side of the paper was visible. All participants rated their anger and completed the PANAS after these treatments.

Figure 2. Pictures of the experimental manipulations in Experiment 2. The disposal group (left) put the paper into the shredder, while the retention group (right) put the paper into the transparent box.

The right panel of Fig. 1 shows the mean subjective anger ratings for the disposal and retention groups at the three time points (baseline, post-provocation, and post-writing). The pattern of results is similar to that of Experiment 1. Subjective anger in both groups increased from baseline (M disposal = 1.57, SD = 0.75, 95% CI [1.25, 1.88]; M retention = 1.64, SD = 0.59, 95% CI [1.40, 1.89]) to post-provocation (M disposal = 3.14, SD = 1.38, 95% CI [2.56, 3.72]; M retention = 3.24, SD = 1.04, 95% CI [2.80, 3.67]). Ratings at post-writing decreased from post-provocation; however, those of the retention group remained higher than baseline (M retention = 2.75, SD = 1.05, 95% CI [2.31, 3.19]), whereas those of the disposal group returned to baseline levels (M disposal = 1.98, SD = 0.87, 95% CI [1.62, 2.35]). Only a small overlap (0.04) was observed between the groups' 95% CIs for the post-writing means. A 2 (group: disposal or retention) × 3 (period: baseline, post-provocation, and post-writing) mixed-model ANOVA revealed a significant main effect of period [F(2, 88) = 56.93, p < 0.001, partial η² = 0.56, BF10 > 100], whereas the main effect of group was not significant [F(1, 44) = 1.68, p > 0.05, partial η² = 0.04, BF10 = 0.46]. The interaction between group and period was significant [F(2, 88) = 3.49, p < 0.05, partial η² = 0.07, BF10 = 1.62]. Multiple comparisons with the Bonferroni method revealed that subjective anger was significantly higher at post-provocation than at baseline (p < 0.05), indicating that the provocation manipulation was effective. Subjective anger at post-writing decreased significantly compared with post-provocation (p < 0.05). However, the retention group's ratings at post-writing remained at the same level of anger as at post-provocation (p > 0.05), whereas the disposal group's ratings at post-writing were significantly lower than at post-provocation (p < 0.05).

Additionally, as in Experiment 1, the retention group's ratings at post-writing were significantly higher than at baseline (p < 0.05), whereas the disposal group's ratings at post-writing had returned to baseline levels (p > 0.05). The disposal group's post-writing ratings were also significantly lower than those of the retention group (p < 0.05).

The negative affect subscale of the PANAS at post-provocation (M disposal = 3.34, SD = 1.09, 95% CI [2.88, 3.79]; M retention = 3.35, SD = 0.89, 95% CI [2.98, 3.73]) was higher than at baseline (M disposal = 2.60, SD = 0.78, 95% CI [2.27, 2.93]; M retention = 2.73, SD = 0.92, 95% CI [2.34, 3.11]) and post-writing (M disposal = 2.45, SD = 0.96, 95% CI [2.05, 2.85]; M retention = 2.57, SD = 0.87, 95% CI [2.20, 2.93]). The 95% CIs of the disposal group overlapped only slightly between the post-provocation [2.88, 3.79] and baseline [2.27, 2.93] periods, as did those of the retention group (post-provocation [2.98, 3.73] vs. baseline [2.34, 3.11]). A 2 (group) × 3 (period) mixed ANOVA revealed a significant main effect of period [F(2, 88) = 20.19, p < 0.01, partial η² = 0.68, BF10 > 100]. However, neither the main effect of group [F(1, 44) = 0.15, p > 0.05, partial η² = 0.06, BF10 = 0.33] nor the group × period interaction [F(2, 88) = 1.35, p > 0.05, partial η² = 0.05, BF10 = 0.13] was significant. Multiple comparisons with the Bonferroni method revealed that negative affect at post-provocation was significantly higher than at baseline and post-writing (ps < 0.05).

The positive affect subscale of the PANAS showed little variation across the three time points: baseline (M disposal = 2.88, SD = 1.03, 95% CI [2.44, 3.31]; M retention = 2.57, SD = 0.89, 95% CI [2.19, 2.94]), post-provocation (M disposal = 2.49, SD = 0.86, 95% CI [2.13, 2.85]; M retention = 2.51, SD = 0.94, 95% CI [2.12, 2.90]), and post-writing (M disposal = 2.49, SD = 0.97, 95% CI [2.08, 2.89]; M retention = 2.64, SD = 1.02, 95% CI [2.21, 3.06]). A 2 × 3 mixed ANOVA revealed that neither the main effects nor the interaction was significant (Fs < 2.28, ps > 0.11, BF10s < 0.70).

The results were essentially the same as those of Experiment 1. The disposal group's anger decreased significantly after they disposed of the anger-written paper in the shredder, whereas the retention group showed significantly higher anger than both its baseline level and the disposal group. These results suggest that the findings of Experiment 1 can be attributed neither to the physical distance between the participant and the paper nor to the action itself (i.e. embodied cognition). Experiment 2 thus replicated Experiment 1 while largely excluding the embodied explanation (the sensorimotor experience of throwing the paper): the disposal and retention groups performed very similar actions, and the distance between participant and paper was the same in both groups, as the transparent box and the shredder were both placed on the desk.

General discussion

This study aimed to determine whether disposing of papers on which anger had been written could eliminate, or at least reduce, subjective anger. The disposal manipulation eliminated anger, whether the paper was thrown into a trash can or fed into a shredder. This anger-reduction method proved quite effective, in that subjective anger returned to baseline levels. We believe this method can be used in daily life, especially by populations characterised by extreme levels of anger and aggression at home; its use may also contribute to emotion socialization, as parents are the primary models for their children.

These results indicate that the sensorimotor experience of throwing the paper plays only a small role in reducing subjective anger 44; instead, the meaning (interpretation) of the disposal plays the critical role. This is consistent with other studies showing that the meaning of an action, not the action itself, determines its impact 30,45. However, these results are partially inconsistent with those reported by Zhang et al. 44. Their experiments tested whether certain behaviours could lower the perceived likelihood of bad luck, as is often the case with jinxes: participants who threw a ball believed that a jinxed negative outcome was less likely than those who held the ball, demonstrating that engaging in an avoidant action, rather than creating physical distance, was critical for reversing the perceived effect of the jinx. The results of our Experiment 1 are consistent with their findings, but we further demonstrated that neither an avoidance action nor physical distance was crucial for reducing subjective anger.

Our results may be related to the phenomenon of 'backward magical contagion' 47, the belief that actions taken on an object associated with an individual (e.g. their hair) can affect the individual themselves. Rozin et al. 48 found that people experience strong negative emotions when their personal objects come into the possession of negative others (such as rapists or enemies), but that these emotions are reduced when the objects are destroyed, for example by throwing them into a septic tank or burning them. 'Magical contagion' (or 'celebrity contagion') refers to the belief that a person's 'essence' can be transferred to their possessions; backward magical contagion reverses this process, so that manipulating an object associated with a person is thought to affect the person themselves. Although the present study did not involve the direct mediation of other individuals, its findings may be understood in similar terms: the neutralization of subjective anger through the disposal of an object may occur because the physical entity (the piece of paper) is recognised as having been destroyed or diminished, causing the original emotion to disappear with it.

Some limitations of this disposal method should, however, be addressed in future studies. First, the findings rest on the assumption that participants identified their subjective anger with the paper, so that the anger was discarded together with the anger-written paper upon its disposal. Participants were asked to review their sentences carefully for 30 s to strengthen this identification between thought and paper, but it is not clear whether this review process is actually necessary for the identification to occur.

Another limitation is that we tested only paper, not digital devices such as a word processor or smartphone. We believe the present disposal method can be generalised to digital devices, although the empirical data so far are limited to physical entities (paper, a trash can, and a shredder). If the disposal method proves effective on digital devices, it could be adopted in a wider range of situations, such as business meetings or daily conversations at school, by writing about the event on a smartphone and then disposing of it.

Furthermore, although the disposal method had a substantial effect, with subjective anger returning to baseline levels, its effectiveness was not directly compared with other anger-reduction methods, such as self-distancing; other methods may be as effective or even more effective. Personality traits may also modulate the effects of anger suppression, although this has not been examined for the techniques used in this or other studies. Individuals with high (versus low) levels of trait anger tend to experience lapses in effortful control when exposed to anger-relevant stimuli 49,50. As mentioned above, although cognitive reappraisal (reinterpreting the meaning of an unpleasant event) is considered an effective way to reduce anger 12, it requires greater cognitive effort 13,14, and self-distancing is not always feasible, particularly in the heat of the moment 13. Conversely, the low-effort disposal method used in this study may be more effective for individuals with lower levels of trait self-control than for those with higher levels. Future research should examine whether personality traits moderate the relationship between the disposal method and its outcomes.

Individuals with higher levels of trait anger also tend to experience induced state anger for longer 51. Experimental research on anger-regulation strategies has predominantly emphasized the effectiveness of immediate control 10,11,12, without investigating whether these strategies remain effective for anger that persists over time. In everyday life, however, it is not always feasible to implement an anger-regulation strategy immediately after anger arises. To establish its practical utility in real-world settings, it is therefore necessary to examine whether the effectiveness of the disposal method varies with the duration of anger.

Moreover, it should be tested whether the disposal method can suppress subjective anger when participants write about the provocative event in an experiential manner rather than in the analytical rumination manner used in this study. Previous studies suggest that experiential anger rumination can maintain 52 or even increase 53 the original level of anger. As writing analytically may be difficult, especially in the heat of the moment, the disposal method would gain further strength if it were shown to work with experiential rumination as well.

It should be noted that, although the provocation affected both the subjective anger score and the PANAS negative affect score, the emotion-regulation effect revealed in this study appears specific to anger, as no significant interaction was observed for the PANAS negative affect score. Kubo et al. 40 reported that the increase in approach-motivated state anger (aggression) produced by provocation, measured using the STAXI and asymmetry of prefrontal brain activity, was reduced by an apologetic comment, whereas the increase in subjective negative emotion (assessed using the PANAS) remained unchanged regardless of whether an apology was given. They proposed that anger is not a unitary process but comprises multiple independent components (subjective anger and negative feelings). If the anger scale used in this study reflects the approach-motivation component of anger, as the STAXI does, the disposal method appears to specifically suppress the approach-motivation (aggression) component of anger and could be used as a clinical technique to reduce aggression.

Despite these limitations, this is the first study designed to eliminate subjective anger conveniently by interacting with physical entities. It offers a cost-effective and easy-to-use way to reduce anger caused by rumination about a provocative event, which otherwise lasts longer. Anyone with a pen and a piece of paper can use this method: someone who keeps a diary or personal log, for example, can write down the provocative event of the day on a memo pad and then throw it into the trash. This action may help neutralize the negative emotions associated with the event and, in a family context, potentially protect children's emotional socialization.

This study presents a new and convenient method for eliminating subjective anger. It offers a cost-effective way to reduce anger in various situations, including business meetings, childcare, and clinical applications, particularly for people who have difficulty controlling their anger at home. Building on this method, for example by applying it to a digital device or developing a dedicated application, could make it useful in many daily situations as well as in behavioural therapies.

Data availability

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

References

Seneca, L. A. Seneca Moral Essays, Vol. 1. De ira [Anger] (J. W. Basore, Ed. and Trans.) (Original work published 45) (W. Heinemann, 1928).

Rodriguez, C. M. & Green, A. J. Parenting stress and anger expression as predictors of child abuse potential. Child Abuse Negl. 21 (4), 367–377. https://doi.org/10.1016/s0145-2134(96)00177-9 (1997).

Moody, G., Cannings-John, R., Hood, K., Kemp, A. & Robling, M. Establishing the international prevalence of self-reported child maltreatment: A systematic review by maltreatment type and gender. BMC Public Health 18 , 1164. https://doi.org/10.1186/s12889-018-6044-y (2018).

Child and Family Policy Bureau, Ministry of Health, Labour and Welfare. Trends in the Number of Cases of Child Abuse at Child Guidance Center (accessed 25 January 2023). https://www.mhlw.go.jp/content/001040752.pdf . ( in Japanese ).

Denham, S. & Kochanoff, A. T. Parental contributions to preschoolers’ understanding of emotion. Marriage Fam. Rev. 34 (3–4), 311–343. https://doi.org/10.1300/J002v34n03_06 (2002).

Heleniak, C., Jenness, J. L., Vander Stoep, A., McCauley, E. & McLaughlin, K. A. Childhood maltreatment exposure and disruptions in emotion regulation: A transdiagnostic pathway to adolescent internalizing and externalizing psychopathology. Cogn. Ther. Res. 40 (3), 394–415. https://doi.org/10.1007/s10608-015-9735-z (2016).

Pollak, S. D., Cicchetti, D., Hornung, K. & Reed, A. Recognizing emotion in faces: Developmental effects of child abuse and neglect. Dev. Psychol. 36 (5), 679–688. https://doi.org/10.1037//0012-1649.36.5.679 (2000).

Denham, S. A. et al. Prediction of externalizing behavior problems from early to middle childhood: The role of parental socialization and emotion expression. Dev. Psychopathol. 12 (1), 23–45. https://doi.org/10.1017/s0954579400001024 (2000).

Beames, J. R., O’Dean, S. M., Grisham, J. R., Moulds, M. L. & Denson, T. F. Anger regulation in interpersonal contexts: Anger experience, aggressive behavior, and cardiovascular reactivity. J. Soc. Pers. Relationsh. 36 (5), 1441–1458. https://doi.org/10.1177/0265407518819295 (2019).

Szasz, P. L., Szentagotai, A. & Hofmann, S. G. The effect of emotion regulation strategies on anger. Behav. Res. Ther. 49 (2), 114–119. https://doi.org/10.1016/j.brat.2010.11.011 (2011).

Fabiansson, E. C. & Denson, T. F. The effects of intrapersonal anger and its regulation in economic bargaining. Plos One 7 (12), e51595. https://doi.org/10.1371/journal.pone.0051595 (2012).

Denson, T. F., Moulds, M. L. & Grisham, J. R. The effects of analytical rumination, reappraisal, and distraction on anger experience. Behav. Ther. 43 (2), 355–364. https://doi.org/10.1016/j.beth.2011.08.001 (2012).

Kross, E. & Ayduk, O. Self-distancing: Theory, research, and current directions. Adv. Exp. Soc. Psychol. 55 (55), 81–136. https://doi.org/10.1016/bs.aesp.2016.10.002 (2017).

Orvell, A., Ayduk, O., Moser, J. S., Gelman, S. A. & Kross, E. Linguistic shifts: A relatively effortless route to emotion regulation?. Curr. Dir. Psychol. Sci. 28 (6), 567–573. https://doi.org/10.1177/0963721419861411 (2019).

Zhan, J. et al. Regulating anger under stress via cognitive reappraisal and sadness. Front. Psychol. 8 , 1372. https://doi.org/10.3389/fpsyg.2017.01372 (2017).

Ayduk, O. & Kross, E. From a distance: Implications of spontaneous self-distancing for adaptive self-reflection. J. Pers. Soc. Psychol. 98 (5), 809–829. https://doi.org/10.1037/a0019205 (2010).

Kross, E., Ayduk, O. & Mischel, W. When asking “why” does not hurt - Distinguishing rumination from reflective processing of negative emotions. Psychol. Sci. 16 (9), 709–715. https://doi.org/10.1111/j.1467-9280.2005.01600.x (2005).

Glynn, L. M., Christenfeld, N. & Gerin, W. Recreating cardiovascular responses with rumination: The effects of a delay between harassment and its recall. Int. J. Psychophysiol. 66 (2), 135–140. https://doi.org/10.1016/j.ijpsycho.2007.03.018 (2007).

Pennebaker, J. W. Expressive writing in psychological science. Perspect. Psychol. Sci. 13 (2), 226–229. https://doi.org/10.1177/1745691617707315 (2018).

Fuentes, A. M. M., Kahn, J. H. & Lannin, D. G. Emotional disclosure and emotion change during an expressive-writing task: Do pronouns matter?. Curr. Psychol. 40 , 1672–1679. https://doi.org/10.1007/s12144-018-0094-2 (2021).

Pasupathi, M., Wainryb, C., Mansfield, C. D. & Bourne, S. The feeling of the story: Narrating to regulate anger and sadness. Cogn. Emotion 31 (3), 444–461. https://doi.org/10.1080/02699931.2015.1127214 (2017).

Bernstein, A. et al. Decentering and related constructs: A critical review and metacognitive processes model. Perspect. Psychol. Sci. 10 (5), 599–617. https://doi.org/10.1177/1745691615594577 (2015).

Healy, H. A. et al. An experimental test of a cognitive defusion exercise: Coping with negative and positive self-statements. Psychol. Record 58 (4), 623–640. https://doi.org/10.1007/bf03395641 (2008).

Koole, S. L. & Veenstra, L. Does emotion regulation occur only inside people’s heads? Toward a situated cognition analysis of emotion-regulatory dynamics. Psychol. Inq. 26 (1), 61–68. https://doi.org/10.1080/1047840x.2015.964657 (2015).

Gross, J. J. Emotion regulation: Current status and future prospects. Psychol. Inq. 26 (1), 1–26. https://doi.org/10.1080/1047840x.2014.940781 (2015).

Bargh, J. A. & Shalev, I. The substitutability of physical and social warmth in daily life. Emotion 12 (1), 154–162. https://doi.org/10.1037/a0023527 (2012).

Shalev, I. & Bargh, J. On the association between loneliness and physical warmth-seeking through bathing: Reply to Donnellan et al. (2014) and three further replications of Bargh and Shalev (2012) study 1. Emotion 15 (1), 120–123. https://doi.org/10.1037/emo0000014 (2015).

Koole, S. L., Sin, M. T. A. & Schneider, I. K. Embodied terror management: Interpersonal touch alleviates existential concerns among individuals with low self-esteem. Psychol. Sci. 25 (1), 30–37. https://doi.org/10.1177/0956797613483478 (2013).

Lee, S. W. S. & Schwarz, N. Grounded procedures: A proximate mechanism for the psychology of cleansing and other physical actions. Behav. Brain Sci. 44 , e1. https://doi.org/10.1017/s0140525x20000308 (2021).

Brinol, P., Gasco, M., Petty, R. E. & Horcajo, J. Treating thoughts as material objects can increase or decrease their impact on evaluation. Psychol. Sci. 24 (1), 41–47. https://doi.org/10.1177/0956797612449176 (2013).

Hatvany, T., Burkley, E. & Curtis, J. Becoming part of me: Examining when objects, thoughts, goals, and people become fused with the self-concept. Soc. Personal. Psychol. Compass 12 (1), e12369. https://doi.org/10.1111/spc3.12369 (2018).

Belk, R. W. Possessions and the extended self. J. Consum. Res. 15 (2), 139–168. https://doi.org/10.1086/209154 (1988).

Reb, J. & Connolly, T. Possession, feelings of ownership and the endowment effect. Judgm. Decis. Mak. J. 2 (2), 107–114 (2007).

Bushman, B. J. Does venting anger feed or extinguish the flame? Catharsis, rumination, distraction, anger, and aggressive responding. Personal. Soc. Psychol. Bull. 28 (6), 724–731. https://doi.org/10.1177/0146167202289002 (2002).

Denzler, M., Hafner, M. & Forster, J. He just wants to play: How goals determine the influence of violent computer games on aggression. Personal. Soc. Psychol. Bull. 37 (12), 1644–1654. https://doi.org/10.1177/0146167211421176 (2011).

Mauss, I. B., Cook, C. L. & Gross, J. J. Automatic emotion regulation during anger provocation. J. Exp. Soc. Psychol. 43 (5), 698–711. https://doi.org/10.1016/j.jesp.2006.07.003 (2007).

Faul, F., Erdfelder, E., Lang, A. G. & Buchner, A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39 (2), 175–191. https://doi.org/10.3758/bf03193146 (2007).

Cohen, J. Statistical Power Analysis for the Behavioral Sciences 2nd edn. (Routledge, 1988). https://doi.org/10.4324/9780203771587 .

Mischkowski, D., Kross, E. & Bushman, B. J. Flies on the wall are less aggressive: Self-distancing “in the heat of the moment” reduces aggressive thoughts, angry feelings and aggressive behavior. J. Exp. Soc. Psychol. 48 (5), 1187–1191. https://doi.org/10.1016/j.jesp.2012.03.012 (2012).

Kubo, K., Okanoya, K. & Kawai, N. Apology isn’t good enough: An apology suppresses an approach motivation but not the physiological and psychological anger. Plos One 7 (3), e33006. https://doi.org/10.1371/journal.pone.0033006 (2012).

Sato, T. & Yasuda, A. Development of the Japanese version of positive and negative affect schedule (PANAS) scales. Japan J. Pers. 9 (2), 138–139. https://doi.org/10.2132/jjpjspp.9.2_138 (2001) ( in Japanese ).

Harmon-Jones, E. & Sigelman, J. State anger and prefrontal brain activity: Evidence that insult-related relative left-prefrontal activation is associated with experienced anger and aggression. J. Personal. Soc. Psychol. 80 (5), 797–803. https://doi.org/10.1037/0022-3514.80.5.797 (2001).

JASP Team. JASP (Version 0.7) [Computer software]. https://jasp-stats.org (2020).

Zhang, Y., Risen, J. L. & Hosey, C. Reversing one’s fortune by pushing away bad luck. J. Exp. Psychol. General 143 (3), 1171–1184. https://doi.org/10.1037/a0034023 (2014).

Brinol, P., Petty, R. E. & Wagner, B. Body posture effects on self-evaluation: A self-validation approach. Eur. J. Soc. Psychol. 39 (6), 1053–1064. https://doi.org/10.1002/ejsp.607 (2009).

Kim, T. W., Duhachek, A., Briñol, P. & Petty, R. E. Protect or hide your thoughts: The meanings associated with actions matter. In NA-Advances in Consumer Research (eds Cotte, J. & Wood, S.) 96–100 (ACR, 2014).

Rozin, P., Nemeroff, C., Wane, M. & Sherrod, A. Operation of the sympathetic magical law of contagion in interpersonal attitudes among Americans. Bull. Psychon. Soc. 27 (4), 367–370 (1989).

Rozin, P., Dunn, C. & Fedotova, N. Reversing the causal arrow: Incidence and properties of negative backward magical contagion in Americans. Judgm. Decis. Mak. 13 (5), 441–450. https://doi.org/10.1017/S1930297500008718 (2018).

Denny, K. G. & Siemer, M. Trait aggression is related to anger-modulated deficits in response inhibition. J. Res. Person. 46 (4), 450–454 (2012).

Wilkowski, B. M. & Robinson, M. D. Keeping one’s cool: Trait anger, hostile thoughts, and the recruitment of limited capacity control. Person. Soc. Psychol. Bull. 33 (9), 1201–1213 (2007).

Veenstra, L., Bushman, B. J. & Koole, S. L. The facts on the furious: A brief review of the psychology of trait anger. Curr. Opin. Psychol. https://doi.org/10.1016/j.copsyc.2017.03.014 (2018).

Peuters, C., Kalokerinos, E. K., Pe, M. L. & Kuppens, P. Sequential effects of reappraisal and rumination on anger during recall of an anger-provoking event. Plos One 14 (1), e0209029. https://doi.org/10.1371/journal.pone.0209029 (2019).

Lievaart, M., Huijding, J., van der Veen, F. M., Hovens, J. E. & Franken, I. H. A. The impact of angry rumination on anger-primed cognitive control. J. Behav. Ther. Exp. Psychiatry 54 , 135–142. https://doi.org/10.1016/j.jbtep.2016.07.016 (2017).

Acknowledgements

This study was supported by JSPS KAKENHI Grant Numbers 21K18552 and 21H04421, by an Aoyama Gakuin University grant for 'Projection Science,' and by JST SPRING, Grant Number JPMJSP2125.

Author information

Nobuyuki Kawai

Present address: Department of Cognitive and Psychological Sciences, Nagoya University, Nagoya, 464-8601, Japan

Authors and Affiliations

Department of Cognitive and Psychological Sciences, Nagoya University, Nagoya, 464-8601, Japan

Yuta Kanaya

Academy of Emerging Science, Chubu University, Kasugai City, 487-8501, Japan

Contributions

N.K.: Conceptualization, Methodology, Writing - original draft preparation, Writing - review and editing, Supervision, Validation. Y.K.: Data collection and curation, Writing - original draft preparation, Visualization, Investigation.

Corresponding author

Correspondence to Nobuyuki Kawai .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Kanaya, Y., Kawai, N. Anger is eliminated with the disposal of a paper written because of provocation. Sci Rep 14 , 7490 (2024). https://doi.org/10.1038/s41598-024-57916-z

Received : 09 February 2023

Accepted : 22 March 2024

Published : 09 April 2024

DOI : https://doi.org/10.1038/s41598-024-57916-z
