Types of Bias in Research | Definition & Examples

Research bias results from any deviation from the truth, causing distorted results and wrong conclusions. Bias can occur at any phase of your research, including during data collection, data analysis, interpretation, or publication. Research bias can occur in both qualitative and quantitative research.

Understanding research bias is important for several reasons.

  • Bias exists in all research, across research designs, and is difficult to eliminate.
  • Bias can occur at any stage of the research process.
  • Bias impacts the validity and reliability of your findings, leading to misinterpretation of data.

It is almost impossible to conduct a study without some degree of research bias. It’s crucial for you to be aware of the potential types of bias, so you can minimize them.

For example, suppose you are studying the effectiveness of a weight-loss program. The program’s success rate will likely be affected if participants start to drop out (attrition). Participants who become disillusioned due to not losing weight may drop out, while those who succeed in losing weight are more likely to continue. This in turn may bias the findings towards more favorable results.

Table of contents

  • Information bias
  • Interviewer bias
  • Publication bias
  • Researcher bias
  • Response bias
  • Selection bias
  • Cognitive bias
  • How to avoid bias in research
  • Other types of research bias
  • Frequently asked questions about research bias

Information bias, also called measurement bias, arises when key study variables are inaccurately measured or classified. Information bias occurs during the data collection step and is common in research studies that involve self-reporting and retrospective data collection. It can also result from poor interviewing techniques or differing levels of recall from participants.

The main types of information bias are:

  • Recall bias
  • Observer bias
  • Performance bias
  • Regression to the mean (RTM)

For example, suppose you are investigating whether heavy smartphone use is linked to physical symptoms. Over a period of four weeks, you ask students to keep a journal, noting how much time they spent on their smartphones along with any symptoms like muscle twitches, aches, or fatigue. Because the journals rely on self-reporting, the data are vulnerable to information bias.

Recall bias is a type of information bias. It occurs when respondents are asked to recall events in the past and is common in studies that involve self-reporting.

As a rule of thumb, infrequent events (e.g., buying a house or a car) will be memorable for longer periods of time than routine events (e.g., daily use of public transportation). You can reduce recall bias by running a pilot survey and carefully testing recall periods. If possible, test both shorter and longer periods, checking for differences in recall.

For example, suppose you are investigating whether childhood diet is associated with a later cancer diagnosis. You ask the parents of two groups of children to recall what their children generally ate in their first years of life:

  • A group of children who have been diagnosed, called the case group
  • A group of children who have not been diagnosed, called the control group

Since the parents are being asked to recall what their children generally ate over a period of several years, there is high potential for recall bias in the case group.

The best way to reduce recall bias is by ensuring your control group will have similar levels of recall bias to your case group. Parents of children who have childhood cancer, which is a serious health problem, are likely to be quite concerned about what may have contributed to the cancer.

Thus, if asked by researchers, these parents are likely to think very hard about what their child ate or did not eat in their first years of life. Parents of children with other serious health problems (aside from cancer) are also likely to be quite concerned about any diet-related question that researchers ask about.

Observer bias is the tendency of researchers to see what they expect or want to see, rather than what is actually occurring. Observer bias can affect the results in observational and experimental studies, where subjective judgment (such as assessing a medical image) or measurement (such as rounding blood pressure readings up or down) is part of the data collection process.

Observer bias leads to over- or underestimation of true values, which in turn compromises the validity of your findings. You can reduce observer bias by using double-blinded and single-blinded research methods.

For example, suppose you and a colleague are conducting an observational study of how medical staff exchange patient information. Based on discussions you had with other researchers before starting your observations, you are inclined to think that medical staff tend to simply call each other when they need specific patient details or have questions about treatments.

At the end of the observation period, you compare notes with your colleague. Your conclusion was that medical staff tend to favor phone calls when seeking information, while your colleague noted down that medical staff mostly rely on face-to-face discussions. Seeing that your expectations may have influenced your observations, you and your colleague decide to conduct semi-structured interviews with medical staff to clarify the observed events.

Note: Observer bias and actor–observer bias are not the same thing.

Performance bias is unequal care between study groups. Performance bias occurs mainly in medical research experiments, when participants have knowledge of the planned intervention, therapy, or drug trial before it begins.

Studies about nutrition, exercise outcomes, or surgical interventions are very susceptible to this type of bias. It can be minimized by using blinding , which prevents participants and/or researchers from knowing who is in the control or treatment groups. If blinding is not possible, then using objective outcomes (such as hospital admission data) is the best approach.

When the subjects of an experimental study change or improve their behavior because they are aware they are being studied, this is called the Hawthorne effect (or observer effect). Similarly, the John Henry effect occurs when members of a control group are aware they are being compared to the experimental group. This causes them to alter their behavior in an effort to compensate for their perceived disadvantage.

Regression to the mean (RTM) is a statistical phenomenon that refers to the fact that a variable that shows an extreme value on its first measurement will tend to be closer to the center of its distribution on a second measurement.

Medical research is particularly sensitive to RTM. Here, interventions aimed at a group or a characteristic that is very different from the average (e.g., people with high blood pressure) will appear to be successful because of the regression to the mean. This can lead researchers to misinterpret results, describing a specific intervention as causal when the change in the extreme groups would have happened anyway.

For example, suppose you are studying an intervention for depression. In general, among people with depression, certain physical and mental characteristics have been observed to deviate from the population mean.

This could lead you to think that the intervention was effective when those treated showed improvement on measured post-treatment indicators, such as reduced severity of depressive episodes.

However, given that such characteristics deviate more from the population mean in people with depression than in people without depression, this improvement could be attributed to RTM.
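To make the mechanism concrete, here is a minimal simulation sketch (my own, not from the article; it assumes Python with NumPy, and all numbers are illustrative): a stable underlying trait is measured twice with independent noise and no treatment in between, yet the group selected for extreme baseline scores appears to "improve" at follow-up.

```python
# Minimal sketch of regression to the mean (assumptions and numbers are mine,
# not the article's): a stable underlying trait is measured twice with
# independent noise and no intervention in between.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_severity = rng.normal(50, 10, n)              # underlying trait, unchanged
measure_1 = true_severity + rng.normal(0, 10, n)   # baseline measurement
measure_2 = true_severity + rng.normal(0, 10, n)   # follow-up, no treatment given

# "Enroll" only people with extreme baseline scores (e.g., severe symptoms).
extreme = measure_1 > 70

print("baseline mean, extreme group:  %.1f" % measure_1[extreme].mean())
print("follow-up mean, extreme group: %.1f" % measure_2[extreme].mean())
# The follow-up mean is noticeably lower even though nothing changed, which an
# uncontrolled before-after comparison could misread as a treatment effect.
```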

Interviewer bias stems from the person conducting the research study. It can result from the way they ask questions or react to responses, but also from any aspect of their identity, such as their sex, ethnicity, social class, or perceived attractiveness.

Interviewer bias distorts responses, especially when the characteristics relate in some way to the research topic. Interviewer bias can also affect the interviewer’s ability to establish rapport with the interviewees, causing them to feel less comfortable giving their honest opinions about sensitive or personal topics.

For example, suppose you ask an interviewee about their hobbies.

Participant: “I like to solve puzzles, or sometimes do some gardening.”

You: “I love gardening, too!”

In this case, seeing your enthusiastic reaction could lead the participant to talk more about gardening.

Establishing trust between you and your interviewees is crucial in order to ensure that they feel comfortable opening up and revealing their true thoughts and feelings. At the same time, being overly empathetic can influence the responses of your interviewees, as seen above.

Publication bias occurs when the decision to publish research findings is based on their nature or the direction of their results. Studies reporting results that are perceived as positive, statistically significant, or favoring the study hypotheses are more likely to be published due to publication bias.

Publication bias is related to data dredging (also called p-hacking), where statistical tests on a set of data are run until a statistically significant result turns up. As academic journals tend to prefer publishing statistically significant results, this can pressure researchers to only submit statistically significant results. P-hacking can also involve excluding participants or stopping data collection once a p value of 0.05 is reached. However, this leads to false-positive results and an overrepresentation of positive results in the published academic literature.
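As a rough illustration of why stopping data collection at the first p value below 0.05 inflates false positives, here is a minimal simulation sketch (my own, not from the article; it assumes Python with NumPy and SciPy, and both groups are drawn from the same distribution, so every "significant" result is a false positive):

```python
# Minimal sketch (assumptions mine): both groups come from the same normal
# distribution, so the null hypothesis is true and every "significant" result
# is a false positive. We compare a fixed-sample t-test against peeking at the
# p value after every few participants and stopping as soon as p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def fixed_sample_test(n=100):
    a, b = rng.normal(size=n), rng.normal(size=n)
    return stats.ttest_ind(a, b).pvalue < 0.05

def optional_stopping_test(n_max=100, start=10, step=5):
    a, b = list(rng.normal(size=start)), list(rng.normal(size=start))
    while len(a) <= n_max:
        if stats.ttest_ind(a, b).pvalue < 0.05:
            return True                    # stop as soon as it looks significant
        a.extend(rng.normal(size=step))
        b.extend(rng.normal(size=step))
    return False

n_sim = 2000
print("fixed-sample false-positive rate:      %.3f"
      % np.mean([fixed_sample_test() for _ in range(n_sim)]))
print("optional-stopping false-positive rate: %.3f"
      % np.mean([optional_stopping_test() for _ in range(n_sim)]))
```

Running this typically shows the fixed-sample test near the nominal 5% rate, while the peek-and-stop strategy produces several times as many false positives.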

Researcher bias occurs when the researcher’s beliefs or expectations influence the research design or data collection process. Researcher bias can be deliberate (such as claiming that an intervention worked even if it didn’t) or unconscious (such as letting personal feelings, stereotypes, or assumptions influence research questions).

The unconscious form of researcher bias is associated with the Pygmalion effect (or Rosenthal effect), where the researcher’s high expectations (e.g., that patients assigned to a treatment group will succeed) lead to better performance and better outcomes.

Researcher bias is also sometimes called experimenter bias, but it applies to all types of investigative projects, rather than only to experimental designs.

For example, when phrasing interview questions about alcohol consumption:

  • Good question: What are your views on alcohol consumption among your peers?
  • Bad question: Do you think it’s okay for young people to drink so much?

Response bias is a general term used to describe a number of different situations where respondents tend to provide inaccurate or false answers to self-report questions, such as those asked on surveys or in structured interviews.

This happens because when people are asked a question (e.g., during an interview), they integrate multiple sources of information to generate their responses. Because of that, any aspect of a research study may potentially bias a respondent. Examples include the phrasing of questions in surveys, how participants perceive the researcher, or the desire of the participant to please the researcher and to provide socially desirable responses.

Response bias also occurs in experimental medical research. When outcomes are based on patients’ reports, a placebo effect can occur. Here, patients report an improvement despite having received a placebo, not an active medical treatment.

For example, while interviewing a student, you ask them:

“Do you think it’s okay to cheat on an exam?”

Phrased this way, the question invites a socially desirable “no,” regardless of the student’s actual behavior.

Common types of response bias are:

  • Acquiescence bias
  • Demand characteristics
  • Social desirability bias
  • Courtesy bias
  • Question-order bias
  • Extreme responding

Acquiescence bias is the tendency of respondents to agree with a statement when faced with binary response options like “agree/disagree,” “yes/no,” or “true/false.” Acquiescence is sometimes referred to as “yea-saying.”

This type of bias occurs either due to the participant’s personality (i.e., some people are more likely to agree with statements than disagree, regardless of their content) or because participants perceive the researcher as an expert and are more inclined to agree with the statements presented to them.

Q: Are you a social person? (Yes/No)

People who are inclined to agree with statements presented to them are at risk of answering “yes,” even if that answer isn’t fully supported by their lived experiences.

In order to control for acquiescence, consider tweaking your phrasing to encourage respondents to make a choice truly based on their preferences. Here’s an example:

Q: What would you prefer?

  • A quiet night in
  • A night out with friends

Demand characteristics are cues that could reveal the research agenda to participants, risking a change in their behaviors or views. Ensuring that participants are not aware of the research objectives is the best way to avoid this type of bias.

For example, suppose you interview patients several times after an operation intended to reduce their pain. On each occasion, patients reported their pain as being less than prior to the operation. While at face value this seems to suggest that the operation does indeed lead to less pain, there is a demand characteristic at play. During the interviews, the researcher would unconsciously frown whenever patients reported more post-op pain. This increased the risk of patients figuring out that the researcher was hoping that the operation would have an advantageous effect.

Social desirability bias is the tendency of participants to give responses that they believe will be viewed favorably by the researcher or other participants. It often affects studies that focus on sensitive topics, such as alcohol consumption or sexual behavior.

Suppose you are conducting face-to-face semi-structured interviews with a number of employees from different departments. When asked whether they would be interested in a smoking cessation program, there is widespread enthusiasm for the idea. However, this apparent enthusiasm may partly reflect social desirability bias: in a face-to-face interview, employees may be reluctant to admit that they are not interested in quitting smoking.

Note that while social desirability and demand characteristics may sound similar, there is a key difference between them. Social desirability is about conforming to social norms, while demand characteristics revolve around the purpose of the research.

Courtesy bias stems from a reluctance to give negative feedback, so as to be polite to the person asking the question. Small-group interviewing where participants relate in some way to each other (e.g., a student, a teacher, and a dean) is especially prone to this type of bias.

Question-order bias

Question order bias occurs when the order in which interview questions are asked influences the way the respondent interprets and evaluates them. This occurs especially when previous questions provide context for subsequent questions.

When answering subsequent questions, respondents may orient their answers to previous questions (called a halo effect ), which can lead to systematic distortion of the responses.

Extreme responding is the tendency of a respondent to answer in the extreme, choosing the lowest or highest response available, even if that is not their true opinion. Extreme responding is common in surveys using Likert scales, and it distorts people’s true attitudes and opinions.

Both a respondent’s disposition towards the survey and cultural factors can contribute to extreme responding. For example, people coming from collectivist cultures tend to exhibit extreme responses in terms of agreement, while respondents indifferent to the questions asked may exhibit extreme responses in terms of disagreement.

Selection bias is a general term describing situations where bias is introduced into the research from factors affecting the study population.

Common types of selection bias are:

  • Sampling (or ascertainment) bias
  • Attrition bias
  • Self-selection (or volunteer) bias
  • Survivorship bias
  • Nonresponse bias
  • Undercoverage bias

Sampling bias occurs when your sample (the individuals, groups, or data you obtain for your research) is selected in a way that is not representative of the population you are analyzing. Sampling bias threatens the external validity of your findings and influences the generalizability of your results.

The easiest way to prevent sampling bias is to use a probability sampling method. This way, each member of the population you are studying has an equal chance of being included in your sample.

Sampling bias is often referred to as ascertainment bias in the medical field.
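The effect of non-probability sampling can be illustrated with a minimal sketch (hypothetical population and recruitment weights, assumptions entirely mine; Python with NumPy): a simple random sample recovers the population mean, while a convenience sample that over-recruits one subgroup does not.

```python
# Minimal sketch (hypothetical population and recruitment weights, assumptions
# mine): estimate mean daily screen time from (a) a simple random sample and
# (b) a convenience sample that over-recruits heavy internet users.
import numpy as np

rng = np.random.default_rng(2)
population = np.concatenate([
    rng.normal(2.0, 0.5, 70_000),   # lighter users (hours/day)
    rng.normal(6.0, 1.0, 30_000),   # heavier users
])

# (a) Probability sampling: every member has an equal chance of selection.
random_sample = rng.choice(population, size=500, replace=False)

# (b) Convenience sampling: recruiting through an online forum makes heavy
# users several times more likely to be included (weights are illustrative).
weights = np.where(population > 4, 5.0, 1.0)
weights /= weights.sum()
convenience_sample = rng.choice(population, size=500, replace=False, p=weights)

print("population mean:         %.2f" % population.mean())
print("random-sample mean:      %.2f" % random_sample.mean())
print("convenience-sample mean: %.2f" % convenience_sample.mean())
```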

Attrition bias occurs when participants who drop out of a study systematically differ from those who remain in the study. Attrition bias is especially problematic in randomized controlled trials for medical research because participants who do not like the experience or have unwanted side effects can drop out and affect your results.

You can minimize attrition bias by offering incentives for participants to complete the study (e.g., a gift card if they successfully attend every session). It’s also a good practice to recruit more participants than you need, or minimize the number of follow-up sessions or questions.

For example, suppose you are evaluating a program’s effectiveness. You provide a treatment group with weekly one-hour sessions over a two-month period, while a control group attends sessions on an unrelated topic. You complete five waves of data collection to compare outcomes: a pretest survey, three surveys during the program, and a posttest survey. If participants who drop out before the posttest differ systematically from those who complete it, your results will suffer from attrition bias.
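A minimal simulation sketch of attrition bias (synthetic data and dropout probabilities, assumptions mine; Python with NumPy) shows how analyzing only completers can make a program with no true effect look beneficial when dropout depends on the outcome:

```python
# Minimal sketch of attrition bias (synthetic data, assumptions mine): the
# program has no true effect on average, but participants who are not
# improving are much more likely to drop out before the posttest.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
change = rng.normal(0, 5, n)                  # true change after the program

p_dropout = np.where(change < 0, 0.6, 0.1)    # dropout depends on the outcome
completed = rng.random(n) > p_dropout

print("true mean change (all participants): %.2f" % change.mean())
print("observed mean change (completers):   %.2f" % change[completed].mean())
# A completers-only analysis makes a null program look beneficial.
```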

Self-selection or volunteer bias

Self-selection bias (also called volunteer bias ) occurs when individuals who volunteer for a study have particular characteristics that matter for the purposes of the study.

Volunteer bias leads to biased data, as the respondents who choose to participate will not represent your entire target population. You can avoid this type of bias by using random assignment, i.e., placing participants in a control group or a treatment group after they have volunteered to participate in the study.

Closely related to volunteer bias is nonresponse bias, which occurs when a research subject declines to participate in a particular study or drops out before the study’s completion.

For example, suppose you recruit volunteers for a study at a particular hospital. Considering that the hospital is located in an affluent part of the city, these volunteers are more likely to have a higher socioeconomic standing, higher education, and better nutrition than the general population.

Survivorship bias occurs when you do not evaluate your data set in its entirety: for example, by only analyzing the patients who survived a clinical trial.

This strongly increases the likelihood that you draw (incorrect) conclusions based upon those who have passed some sort of selection process—focusing on “survivors” and forgetting those who went through a similar process and did not survive.

Note that “survival” does not always refer to whether participants literally lived or died. Rather, it signifies passing some sort of selection step: participants who did not successfully complete the intervention are the “non-survivors” missing from the analyzed data.

For example, you might conclude that dropping out of college is a promising path to entrepreneurial success because several famous billionaires did exactly that. However, most college dropouts do not become billionaires. In fact, there are many more aspiring entrepreneurs who dropped out of college to start companies and failed than succeeded.

Nonresponse bias occurs when those who do not respond to a survey or research project are different from those who do in ways that are critical to the goals of the research. This is very common in survey research, when participants are unable or unwilling to participate due to factors like lack of the necessary skills, lack of time, or guilt or shame related to the topic.

You can mitigate nonresponse bias by offering the survey in different formats (e.g., an online survey, but also a paper version sent via post), ensuring confidentiality, and sending respondents reminders to complete the survey.

For example, suppose you survey the residents of a neighborhood by visiting their homes. You notice that your surveys were conducted during business hours, when working-age residents were less likely to be home.

Undercoverage bias occurs when you only sample from a subset of the population you are interested in. Online surveys can be particularly susceptible to undercoverage bias. Despite being more cost-effective than other methods, they can introduce undercoverage bias as a result of excluding people who do not use the internet.

Cognitive bias refers to a set of predictable (i.e., nonrandom) errors in thinking that arise from our limited ability to process information objectively. Rather, our judgment is influenced by our values, memories, and other personal traits. These create “mental shortcuts” that help us process information intuitively and decide faster. However, cognitive bias can also cause us to misunderstand or misinterpret situations, information, or other people.

Because of cognitive bias, people often perceive events to be more predictable after they happen.

Although there is no general agreement on how many types of cognitive bias exist, some common types are:

  • Anchoring bias
  • Framing effect
  • Actor–observer bias
  • Availability heuristic (or availability bias)
  • Confirmation bias
  • Halo effect
  • The Baader–Meinhof phenomenon

Anchoring bias

Anchoring bias is people’s tendency to fixate on the first piece of information they receive, especially when it concerns numbers. This piece of information becomes a reference point or anchor. Because of that, people base all subsequent decisions on this anchor. For example, initial offers have a stronger influence on the outcome of negotiations than subsequent ones.

Framing effect

Framing effect refers to our tendency to decide based on how the information about the decision is presented to us. In other words, our response depends on whether the option is presented in a negative or positive light, e.g., gain or loss, reward or punishment, etc. This means that the same information can be more or less attractive depending on the wording or what features are highlighted.

Actor–observer bias

Actor–observer bias occurs when you attribute the behavior of others to internal factors, like skill or personality, but attribute your own behavior to external or situational factors.

In other words, when you are the actor in a situation, you are more likely to link events to external factors, such as your surroundings or environment. However, when you are observing the behavior of others, you are more likely to associate behavior with their personality, nature, or temperament.

One interviewee recalls a morning when it was raining heavily. They were rushing to drop off their kids at school in order to get to work on time. As they were driving down the highway, another car cut them off as they were trying to merge. They tell you how frustrated they felt and exclaim that the other driver must have been a very rude person.

At another point, the same interviewee recalls that they did something similar: accidentally cutting off another driver while trying to take the correct exit. However, this time, the interviewee claimed that they always drive very carefully, blaming their mistake on poor visibility due to the rain.

Availability heuristic

Availability heuristic (or availability bias) describes the tendency to evaluate a topic using the information we can quickly recall, i.e., whatever is most readily available to us. However, this is not necessarily the best information; it is merely the most vivid or recent. Even so, because of this mental shortcut, we tend to assume that what we can easily recall must be right and ignore other information.

Confirmation bias

Confirmation bias is the tendency to seek out information in a way that supports our existing beliefs while also rejecting any information that contradicts those beliefs. Confirmation bias is often unintentional but still results in skewed results and poor decision-making.

Let’s say you grew up with a parent in the military. Chances are that you have a lot of complex emotions around overseas deployments. This can lead you to over-emphasize findings that “prove” that your lived experience is shared by most military families, neglecting other explanations and experiences.

Halo effect

The halo effect refers to situations whereby our general impression about a person, a brand, or a product is shaped by a single trait. It happens, for instance, when we automatically make positive assumptions about people based on something positive we notice, while in reality, we know little about them.

The Baader–Meinhof phenomenon

The Baader–Meinhof phenomenon (or frequency illusion) occurs when something that you recently learned seems to appear “everywhere” soon after it was first brought to your attention. However, this is not the case. What has increased is your awareness of it, such as a new word or an old song you never knew existed, not its actual frequency.

While very difficult to eliminate entirely, research bias can be mitigated through proper study design and implementation. Here are some tips to keep in mind as you get started.

  • Clearly explain in your methodology section how your research design will help you meet the research objectives and why this is the most appropriate research design.
  • In quantitative studies, make sure that you use probability sampling to select the participants. If you’re running an experiment, make sure you use random assignment to assign your control and treatment groups.
  • Account for participants who withdraw or are lost to follow-up during the study. If they are withdrawing for a particular reason, it could bias your results. This applies especially to longer-term or longitudinal studies.
  • Use triangulation to enhance the validity and credibility of your findings.
  • Phrase your survey or interview questions in a neutral, non-judgmental tone. Be very careful that your questions do not steer your participants in any particular direction.
  • Consider using a reflexive journal. Here, you can log the details of each interview, paying special attention to any influence you may have had on participants. You can include these in your final analysis.

Other types of research bias include:

  • Baader–Meinhof phenomenon
  • Sampling bias
  • Ascertainment bias
  • Self-selection bias
  • Hawthorne effect
  • Omitted variable bias
  • Pygmalion effect
  • Placebo effect

Frequently asked questions about research bias

Research bias affects the validity and reliability of your research findings, leading to false conclusions and a misinterpretation of the truth. This can have serious implications in areas like medical research where, for example, a new form of treatment may be evaluated.

Observer bias occurs when the researcher’s assumptions, views, or preconceptions influence what they see and record in a study, while actor–observer bias refers to situations where respondents attribute internal factors (e.g., bad character) to justify others’ behavior and external factors (difficult circumstances) to justify the same behavior in themselves.

Response bias is a general term used to describe a number of different conditions or factors that cue respondents to provide inaccurate or false answers during surveys or interviews. These factors range from the interviewer’s perceived social position or appearance to the phrasing of questions in surveys.

Nonresponse bias occurs when the people who complete a survey are different from those who did not, in ways that are relevant to the research topic. Nonresponse can happen because people are either not willing or not able to participate.


Handout: Case Studies in Research and Bias

Williams, Kristina E.

This handout lists case studies that highlight different types of biases and their impact on research strategies and design.

  • Library research
  • Research--Moral and ethical aspects
  • Academic libraries


  • Open access
  • Published: 17 February 2021

Temporal bias in case-control design: preventing reliable predictions of the future

William Yuan, Brett K. Beaulieu-Jones, Kun-Hsing Yu, Scott L. Lipnick, Nathan Palmer, Joseph Loscalzo, Tianxi Cai & Isaac S. Kohane

Nature Communications, volume 12, Article number: 1107 (2021)


Subjects: Epidemiology, Machine learning

Abstract

One of the primary tools that researchers use to predict risk is the case-control study. We identify a flaw, temporal bias, that is specific to and uniquely associated with these studies that occurs when the study period is not representative of the data that clinicians have during the diagnostic process. Temporal bias acts to undermine the validity of predictions by over-emphasizing features close to the outcome of interest. We examine the impact of temporal bias across the medical literature, and highlight examples of exaggerated effect sizes, false-negative predictions, and replication failure. Given the ubiquity and practical advantages of case-control studies, we discuss strategies for estimating the influence of and preventing temporal bias where it exists.


Introduction

The ability to predict disease risk is a foundational aspect of medicine, and is instrumental for early intervention, clinician decision support, and improving patient outcomes. One of the main tools utilized by researchers for identifying predictive associations or constructing models from observational data is the case-control study 1 . By measuring differing exposure patterns between the case and control groups, exposures can be interpreted as predictors or risk factors for case status 2 , 3 . With the proliferation of observational datasets and novel machine learning techniques, the potential for these studies to play a direct role in personalized medicine has begun to be explored 4 . However, we have identified a structural flaw, seen widely in basic case-control study designs, which we call temporal bias. At its core, temporal bias represents a mismatch between the data used in the study and the data that a clinician would have access to when making a diagnostic decision. A clinician must evaluate all patients in real time, without the luxury of knowing that they have been pre-selected according to their future status. Case-control studies, as popularly implemented, are uniquely unable to make prospectively valid predictions. This temporal bias not only amplifies reported effect sizes relative to what would be observed in practice, but also obfuscates the prospective use of findings.

A classic example of temporal bias and its impacts can be seen through the initial discovery of Lyme disease, a tick-borne bacterial infection. Lyme disease is characterized by (i) an initial bite, (ii) an expanding ring rash, and (iii) arthritic symptoms, in that order 5 . However, the original 1976 discovery of Lyme disease (then termed Lyme arthritis) focused exclusively on patients who manifested with arthritic symptoms 6 . This enabled researchers to definitively identify the prognostic value of a ring rash towards arthritis, but not tick bites, due to the latter symptom’s temporal distance from the researcher’s focus. By focusing on predictive features immediately prior to the event in question, researchers capture a biased representation of the full trajectory from healthy-to-diseased. A contemporaneous doctor aware of Lyme arthritis examining a patient presenting with a tick bite would miss the possibility of disease until further symptoms developed. Similarly, a predictive model for Lyme arthritis focused on ring rashes would report false negatives if it were deployed in practice: patients who had yet to develop ring rashes would contract arthritis at a future time. These errors stem from the incomplete picture of symptoms that was captured.

However, temporal bias is not a problem of the past. The central flaw, an overemphasis on features collected near the case event, still occurs in the literature today. Within the medical domain, there are numerous examples of temporal bias in both clinical medicine and machine learning 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 . Despite increasing interest in machine learning risk prediction, few tools for use on individual patients have become standard practice 17 , 18 . As algorithms trained using large datasets and advanced machine learning methods become more popular, understanding limitations in the way they were generated is critical. In this article, we describe the basis for temporal bias and examine three representative instances of temporal bias in the medical, machine learning, and nutritional literature to identify the impact that this phenomenon has on effect sizes and predictive power.

Of interest are the expansive set of studies that focus on predicting future events in real time and obey the following general conditions. First, events to be predicted take the form of state transitions (healthy-to-diseased, stable-to-failed, control-to-case, etc.). This implies that there exists a bulk population of controls, from which cases differentiate themselves. Soon-to-be cases progress along a trajectory away from the control population at varying speeds. This trajectory terminates at the occurrence of the case event, but the position of control individuals along this trajectory cannot be reliably determined.

Second, we consider that the risk-of-event is equivalent to measuring progress along a control-to-case trajectory in time. Because risk prediction utilizes features from the present to assess the chance of a future event occurring, an event that is truly random would not be appropriate for a risk prediction algorithm. The trajectory represents the ground truth progression along a pathway towards the event in question and are defined relative to the specific populations chosen for the study. This assumes that the researchers have taken the exchangeability 19 of their case and control populations into account: if members of the control population are chosen poorly and cannot experience the case event, then there can be no trajectory.

Third, at the population level, the trajectory commences when the to-be-diseased population first begins to diverge from the non-diseased population and reaches a maximum when the disease event actually occurs. This requires that the trajectory is aligned to the event in question. Diseased individuals must consequently be referred to using terms such as days to disease, while control individuals exist in an undefined point along this timeline, because their days to disease is unknown. This is only required due to the retrospective nature of these studies and is a major departure from prospective deployment.

Finally, the features actually measured by a study represent proxies for an individual’s position along the trajectory. Regardless of their positive or negative association with the event, features subject to temporal bias will tend to diverge between cases and controls with a continuous trajectory, and become better at differentiating the controls from cases as case individuals get closer to their event. This divergence provides the mechanism of action for temporal bias to act. If a model does not possess time varying features (such as a GWAS), temporal bias cannot occur, but predicted risk will also be static with respect to time-to-case-event.

As a result, we can distill prediction studies into a common structure (Fig.  1 ): the members of the diseased population begin as controls at a point in the past, and progress along a trajectory until the disease occurs. Most case-control studies apply a dichotomous framework over this continuous trajectory.

Figure 1

Red and green zones represent positions on the trajectory corresponding to outward definitions of diseased and non-diseased status. Vertical arrows represent sampling of population at a particular point of a trajectory. A The (single-class) case-control paradigm often imposes a dichotomous (binary) framework onto a continuous trajectory. B Experiments utilizing observations of cases that are concentrated at the time when the case event occurs cannot capture any information regarding the transition trajectory, resulting in temporal bias. C In order to predict a patient’s position along the trajectory, experiments capturing the entire transition from non-diseased to diseased are necessary.

Temporal bias occurs when cases are sampled unevenly in time across this trajectory (Fig.  1B ). (A theoretical basis for temporal bias is presented in Supplementary Note  1 .) This is a separate but analogous effect compared to selection bias: the control population may be exchangeable with the diseased population but must tautologically exist at a prior point along the disease trajectory compared to cases. Rather than operating over the selection of which patients to include in the study, temporal bias acts over the selection of when each subject is observed.

This important temporal feature yields two implications:

If the features of diseased subjects are evaluated based on a point or window that is defined relative to the case event (a future event, from the perspective of the feature measurements), features in the end of the trajectory will be oversampled. For example, a study that compares individuals one year prior to disease diagnosis to healthy controls will oversample the trajectory one year prior to disease, and undersample the trajectory further out.

The resulting model cannot be prospectively applied because the study design implicitly leaked information from the future: a prospective evaluator has no way of knowing if a particular subject is within the observation window defined by the study. It cannot be known if an individual is one year away from a disease diagnosis in real time.
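A minimal simulation sketch of these two implications (synthetic trajectories, assumptions entirely mine, not the paper's data or code; Python with NumPy): cases whose biomarker drifts upward toward their event look far more separable from controls when sampled relative to the event than when sampled at an arbitrary point in time, which is all a prospective evaluator can do.

```python
# Minimal sketch of temporal bias (synthetic trajectories, assumptions mine;
# not the paper's data or code): a biomarker rises linearly over the 36 months
# before a case event and stays flat, at baseline, for controls.
import numpy as np

rng = np.random.default_rng(4)
n = 3_000

def biomarker(months_to_event):
    # Linear control-to-case trajectory: baseline 0, rising to +3 at the event.
    return 3.0 * np.clip(1 - months_to_event / 36.0, 0, 1) + rng.normal(0, 1, n)

controls = rng.normal(0, 1, n)                        # controls stay at baseline
cases_event_indexed = biomarker(np.full(n, 1.0))      # sampled 1 month pre-event
cases_random_time = biomarker(rng.uniform(0, 36, n))  # sampled at a random point

def auc(case_vals, control_vals):
    # Probability that a random case measurement exceeds a random control one.
    return (case_vals[:, None] > control_vals[None, :]).mean()

print("apparent AUC, event-indexed sampling: %.2f" % auc(cases_event_indexed, controls))
print("apparent AUC, random-time sampling:   %.2f" % auc(cases_random_time, controls))
```

The event-indexed estimate corresponds to the first implication (oversampling the end of the trajectory); the gap between the two numbers is what a prospective evaluator would lose, per the second.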

Temporal bias is intuitively understood within certain epidemiological circles. In fact, recall bias, caused by the tendency for survey respondents to remember recent events at a higher rate relative to past events, can be interpreted as a specific instance of temporal bias. Similarly, it is understood that case-control studies represent a lower level of evidence relative to other study designs 20 . Methodologies have been proposed that, while not explicitly designed to address temporal bias, happen to be immune to it (density-based sampling, among others 21 ). However, these tend to focus on point exposures or necessitate impractically exact sampling strategies. Despite this important shortcoming, the ease of the case-control framework has allowed temporal bias to proliferate across many fields. We examine three examples, in cardiology, medical machine learning, and nutrition, below.

Temporal bias can inflate observed associations and effect sizes

The INTERHEART study 22 examined the association between various risk factors and myocardial infarction (MI) using a matched case-control design among a global cohort. Individuals presenting at hospitals with characteristic MI were defined as cases, and subjected to interviews and blood tests, while matched controls were identified from relatives of MI patients or cardiovascularly healthy individuals presenting with unrelated disorders. One risk factor of interest included lipoprotein (a) [Lp(a)], a blood protein 23 , 24 . While Lp(a) levels are thought to be influenced by inheritance, significant intra-individual biological variance with time has been reported 25 , 26 .

One recent analysis utilized data from this study to examine the positive association between blood levels of Lp(a) and MI across different ethnicities and evaluate the possible efficacy of Lp(a) as a risk prediction feature 27 . However, because cases were only sampled at the time of the MI event, the resulting effect sizes are difficult to interpret prospectively. Indexing case patients by their case status leaks information regarding their status that a physician prospectively examining a patient would not have access to. Intuitively, if Lp(a) were static until a spike immediately prior to an MI event, it could not be used as a prospective risk predictor, even though a significant association would be observed given this experimental design. This limitation cannot be overcome using only the data that was collected, as information regarding the dynamics of Lp(a) over time is missing. To evaluate the influence of temporal bias, we estimated the size of the Lp(a)-MI association had the experiment been done prospectively. This analysis was done by simulating control-to-case trajectories using INTERHEART case/control population Lp(a) distributions by imputing the missing data. We conducted extensive sensitivity testing over different possible trajectories to evaluate the range of possible effect sizes. This approach allowed for the recalculation of the association strength as if the study had been conducted in a prospective manner from the beginning.

Table  1 summarizes the observed effect size in the simulated prospective trials compared to the reported baseline. In all cases, the simulated raw odds ratio between Lp(a) and MI was significantly lower than the observed raw odds ratio due to temporal bias present in the latter measurement. This is intuitive, since case individuals as a group will be more similar to controls (healthier) when sampled at random points in time rather than when they experience an MI event (Fig.  1B ). Although it cannot be proven that prospective effect sizes would be smaller, as this would require longitudinal data that do not exist, this experiment suggests that the degree of temporal bias scales with area under the imputed trajectory. In order to observe the reported odds ratio, the underlying trajectory would need to resemble a Heaviside step function in which cases spontaneously experience a spike in Lp(a) levels at the point of their divergence from the controls, an assumption that is neither explicitly made in the study nor has a basis in biology. We repeated the imputation process with Heaviside step function-based trajectories, varying the position of the impulse in the trajectory (Table  1 ). As the impulse location approaches the beginning of the trajectory, the effect size relative to the baseline approaches 1. This observation illustrates the assumption intrinsic in the original INTERHEART experimental design: that MI individuals had static Lp(a) measurements during the runup to their hospitalizations.

To characterize these findings in a real-world dataset, we examined the Lp(a) test values and MI status of 7128 patients seen at hospitals and clinics within the Partners Healthcare System (representing Brigham and Women’s Hospital and Massachusetts General Hospital, among others) who had indications of more than one Lp(a) reading over observed records. This dataset included 28,313 individual Lp(a) tests and 2587 individuals with indications of myocardial infarction. We identified significant intra-individual variation in Lp(a) values in this population: the mean intra-individual standard deviation between tests was 12.2 mg/dl, compared to a mean test result of 49.4 mg/dl. These results are consistent with literature findings of significant intra-individual variance of Lp(a) values 25 , 28 , 29 , challenging the assumption that individuals could have static levels in the runup to MI. Furthermore, in this dataset, biased Lp(a) measurement selection among case exposure values varied the observed association strength between Lp(a) and MI by between 51.9% (preferential selection of lower values) and 137% (preferential selection of higher values) of what would have been observed with random timepoint selection. On the upper end, this is a conservative estimate: we would expect the deviation to increase upon correcting for ascertainment bias in the dataset. Control individuals would be less healthy than true controls, while cases would typically not be sampled immediately prior to an MI, and consequently appear to be healthier than INTERHEART cases. These findings suggest that temporal bias was likely to act in this study design as executed, in a manner that would reduce the observed utility of Lp(a) as a risk predictor for future MI.
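To illustrate the step-function argument with synthetic numbers (assumptions entirely mine; this is not the INTERHEART data or the paper's imputation procedure), the sketch below gives cases a Heaviside-step biomarker trajectory and compares the odds ratio observed under random-timepoint sampling, for different step positions, with the odds ratio observed under event-time sampling:

```python
# Minimal sketch (synthetic numbers, assumptions mine; not the INTERHEART data
# or the paper's imputation procedure): cases follow a Heaviside-step biomarker
# trajectory, jumping from the control level to an elevated level at some
# fraction of the way along their control-to-case trajectory.
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
threshold = 1.5                              # cutoff defining "high biomarker"

def odds_ratio(case_vals, control_vals):
    a, b = (case_vals > threshold).sum(), (case_vals <= threshold).sum()
    c, d = (control_vals > threshold).sum(), (control_vals <= threshold).sum()
    return (a * d) / (b * c)

controls = rng.normal(0, 1, n)
elevated = rng.normal(3, 1, n)               # case biomarker level after the step

for step_position in [0.0, 0.25, 0.5, 0.75, 0.95]:
    # Random-timepoint sampling: each case is measured at a uniformly random
    # fraction of its trajectory and is elevated only after the step.
    t = rng.uniform(0, 1, n)
    cases_random_time = np.where(t >= step_position, elevated, rng.normal(0, 1, n))
    print("step at %2.0f%% of trajectory -> OR %6.1f"
          % (100 * step_position, odds_ratio(cases_random_time, controls)))

# Event-time sampling always sees the post-step level, whatever the step position.
print("event-time sampling          -> OR %6.1f" % odds_ratio(elevated, controls))
```

Only when the step sits at the very start of the trajectory does random-time sampling reproduce the event-indexed odds ratio, mirroring the assumption implicit in the original design.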

Prospective prediction failure due to temporal bias

As the availability of observational data has skyrocketed, event prediction has become a popular task in machine learning. Because of this focus on prediction, many methods utilize the idea of a prediction window: a gap between when an event is observed and when features are collected 12 , 13 . A model that differentiates patients six months prior to MI onset from healthy matched controls may be said to detect MI six months in advance. However, because the window is defined relative to a case event, it represents an uneven sampling of the disease trajectory. Consequently, this prediction requires unfounded assumptions regarding the trajectory of MI onset. For example, if the trajectory is such that patients’ risk in the year prior to the MI is approximately uniform and significantly elevated from the control risk, a model trained in this way would provide many false positive 6-month MI predictions by falsely implicating patients more than 6 months away from an MI. Because window sizes are often chosen without respect to the underlying transition trajectory, significant potential for temporal bias still exists, driven by factors such as differential diagnosis periods or missed exposures.

To illustrate the impact of temporal bias in this case, we constructed predictors for childbirth: a phenotype that was chosen because of its well-defined trajectory. While the trajectory for delivery is a rare example of a step function, we demonstrated in the previous section that the use of a case-control design effectively imposes a step function regardless of the true shape of the underlying trajectory. Rather than being a toy example, this is intended to represent the extreme case of the potential consequences of releasing a predictive model trained in this manner.

In this system, cases and controls are significantly more difficult to distinguish more than nine or ten months prior to delivery compared to later in pregnancy because the case population is not yet pregnant. Features collected while the case population is pregnant are far more informative regarding delivery status. A case-control study that uses a window defined three months prior to delivery will capture these informative, pregnancy related features. In contrast, a cohort study examining all patients in January of a given year will capture largely uninformative features when the case individual’s delivery takes place late in the year (Fig.  2A ).

Figure 2

A The ground truth trajectory for delivery (orange) is composed of parts: an informative period, 9–10 months prior to the delivery, and a largely uninformative period prior. Case-control windows (blue) are indexed to delivery/baseline date, and so only sample a single (informative) slice of the trajectory. Cohort windows (green) always occur in January, and so uniformly sample the trajectory. B Model performance (Validation AUROC) for deep recurrent neural networks and logistic regression for each study design. Error bars represent the 95% confidence intervals. Each box represents the results of 10 independently trained models. Box bounds represent upper quartile, lower quartile, and mean. Whiskers represent maxima and minima. C Comparison of confusion matrices for CC-CC (left) and CC-Cohort (right) models. Color intensity corresponds to matrix value. D CC-Cohort validation model confidence distributions for late (Oct/Nov/Dec) deliveries given January features.

Using 2015 data from a de-identified nationwide medical insurance claims dataset, we simulated three studies:

CC-CC: models trained and evaluated under the case-control (CC) paradigm: one month of records, three months prior to the delivery date (cases) or matched baseline date (controls) are used.

CC-Cohort: models trained under the case-control paradigm, but evaluated under the cohort paradigm, where records from January are used to predict delivery in 2015.

Cohort-Cohort: models trained and evaluated under the cohort paradigm.

For each simulated study, records within the observation window of diagnoses, procedures, and prescriptions ordered were fed into both deep recurrent neural nets (RNN) and logistic regression (LR) models.

The significant difference in performance (Fig.  2B ) between CC-CC and CC-Cohort models illustrates a central trait of temporally-biased sampling. Uneven sampling across the transition trajectory improves validation AUC under artificial validation conditions, but model performance collapses when deployed in a prospective manner. In contrast, models designed with the prospective task from the outset (Cohort-Cohort) had intermediate performance that reflected the inherent ambiguity of the available observations. These findings were robust across both RNN and LR-based models. In fact, while the more complex RNN performed better than the logistic regression model for the CC-CC task, it performed worse than the LR on the CC-Cohort task. In this case, methodological improvements on an unrealistic task led to more significant declines in performance on a more realistic task.
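The gap between apparent and deployed performance can be reproduced with a toy version of this setup (entirely synthetic data and a single made-up feature, assumptions mine; Python with NumPy and scikit-learn, not the paper's claims data or models): a logistic regression trained on windows indexed to the delivery date looks excellent under case-control validation but degrades when evaluated on January windows, with late-year deliveries scored like controls.

```python
# Minimal sketch (entirely synthetic data and a single toy feature, assumptions
# mine; not the paper's claims dataset or models): the feature counts
# "pregnancy-related" codes in a one-month window. A logistic regression is
# trained under the case-control design (window indexed three months before
# delivery) and evaluated on both case-control and cohort (January) windows.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
n_cases = n_controls = 5_000

def pregnancy_codes(pregnant_in_window):
    # Poisson counts: higher rate while pregnant, low background rate otherwise.
    return rng.poisson(np.where(pregnant_in_window, 3.0, 0.2))

delivery_month = rng.integers(1, 13, n_cases)   # deliveries spread across the year
conception_month = delivery_month - 9           # roughly nine months earlier

# Case-control windows are indexed to the delivery date, so every sampled case
# is pregnant during its window; controls never are.
X_cc = np.concatenate([pregnancy_codes(np.ones(n_cases, bool)),
                       pregnancy_codes(np.zeros(n_controls, bool))]).reshape(-1, 1)

# Cohort windows are fixed to January, so only cases already pregnant by then
# (conception before January) carry any signal; late-year deliveries look
# just like controls.
X_cohort = np.concatenate([pregnancy_codes(conception_month <= 0),
                           pregnancy_codes(np.zeros(n_controls, bool))]).reshape(-1, 1)

y = np.concatenate([np.ones(n_cases), np.zeros(n_controls)])
model = LogisticRegression().fit(X_cc, y)       # trained under case-control

print("apparent AUC (case-control validation): %.2f"
      % roc_auc_score(y, model.predict_proba(X_cc)[:, 1]))
print("deployed AUC (January, cohort-style):   %.2f"
      % roc_auc_score(y, model.predict_proba(X_cohort)[:, 1]))

late = delivery_month >= 10                     # Oct/Nov/Dec deliveries
p_jan = model.predict_proba(X_cohort[:n_cases])[:, 1]
print("mean predicted risk, Oct-Dec deliveries: %.2f" % p_jan[late].mean())
print("mean predicted risk, Jan-Sep deliveries: %.2f" % p_jan[~late].mean())
```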

For women with October/November/December deliveries, claims data from January are mostly uninformative, and a reliable prediction at that point is not possible at the population level, especially when using a model trained on features collected during pregnancy. The confusion matrices produced by CC-CC and CC-Cohort models revealed that much of the performance collapse can be traced to false negatives (Fig.  2C ). We examined the confidence that the deep learning models assigned to October/November/December deliveries when evaluated on cohort-structured data (Fig.  2D ). Models trained under the case-control paradigm incorrectly label these individuals as high-confidence controls, while models trained using cohorts more appropriately capture the intrinsic ambiguity of the prediction task. Clinicians do not have the luxury of examining only patients three months/six months/one year prior to disease incidence: they must assess risk in real time. These studies are common in the machine learning literature; one study even described the act of aligning patients by disease diagnosis time as a feature, and a major reason why their framework was better able to stratify risk 14 . However, aligning patients in this way requires waiting until disease diagnosis, and so the superior risk stratification comes too late to be useful.

It is critical to note that this is a problem that cannot be solved methodologically. As evidenced by the comparison of the performance of the RNN and LR models, novel or exotic machine learning techniques cannot compensate for the fact that the data fed into the models represent a distorted view of the actual population distribution that would be encountered prospectively. Even with perfect measurement and modeling, temporal bias and the issues that result would still be present: the underlying trajectory would still be unobserved.

Temporal bias-induced replication failure

Studies that identify disease risk factors through nutrition data enjoy a particularly high profile among the public 30 . As an example, the Mediterranean diet (characterized by consumption of olive oil, fruits, vegetables, among other factors) has been implicated as a protective factor against coronary heart disease, but the mechanism for this association is unclear. One paper set out to examine whether olive oil consumption specifically was associated with MI using patients from a Spanish hospital 31 . MI patients and matched controls were interviewed regarding their olive oil consumption over the past year, and a protective effect against MI was observed among the highest quintile of olive oil consumers. In response, another group analyzed data from an Italian case-control study and were unable to identify the same association between the upper quintile of olive oil consumption and MI 32 . Crucially, these analyses differed in the size of the observation window used: one year and two years respectively. As a result, not only were these studies sampling the MI trajectory unevenly, they sampled different parts of the MI trajectory. To examine the degree to which differing amounts of temporal bias present in each study could have influenced the results of the study, we utilized longitudinal data from nearly 100,000 individuals from the Nurses’ Health Study (NHS) regarding olive oil consumption patterns and MI to provide a baseline ground truth. We simulated retrospective case-control studies that considered different lookback periods to determine if the presence or magnitude of a protective effect was sensitive to the manner in which an experiment was conducted. Figure  3A details the simulation setup: longitudinal records (Fig.  3A ) were used to identify case (red) and control (green) individuals. MI dates were identified for cases, and baseline dates for controls were selected to match the age distribution of the cases. For each patient, exposures during the lookback time are recorded. The association between MI and the observed exposures were then calculated and the influence of the lookback time on association strength was assessed.

figure 3

A Over a particular time period, longitudinal data on olive oil consumption are available continuously for all cohort members. Circles represent MI events, while diamonds represent matched, but otherwise arbitrarily chosen, baseline points for controls. B Case-control studies arbitrarily align MI patients at the date of the MI. As a result, the time dimension is inverted and anchored to the MI date, and the position of controls along this axis is lost. C Strength of the olive oil consumption-MI association as a function of the number of years of consumption prior to baseline considered. Effect size is normalized to the average 1-year association strength. Points are colored based on statistical significance after FDR correction. Each box plot represents 200 repeated trials. Box bounds represent upper quartile, lower quartile, and mean. Whiskers represent maxima and minima.

The simulated studies that examined one year of past olive oil consumption relative to the MI/baseline date detected a protective effect, as originally observed. However, the magnitude and statistical significance of this effect decayed as the size of the lookback period was increased, consistent with the results of the failed replication. When a two-year lookback period was used, only 41% of simulated studies observed a statistically significant result (Fig. 3C). The observed protective effect in these cases is an artifact of methodology, rather than medicine, physiology, or society. The act of looking back from the MI date/matched baseline has the effect of inverting the time axis to time-from-MI and aligning the case individuals (Fig. 3B). However, no such treatment is possible for control individuals, and their position along the new temporal axis is unknown. As a result, there is no functional basis for comparing healthy individuals to individuals artificially indexed to a future event (MI), because these represent groups that can only be identified retrospectively, after the MI has already occurred. A mismatch exists between the information utilized in the study and the information that patients or physicians would have access to when making dietary decisions. While there may indeed be a prospective association between olive oil and MI, protective or otherwise, the data to observe such an effect were not collected. Because both olive oil consumption and MI risk are time-varying features, the strength of the instantaneous association between the two will naturally depend on when each feature is measured.

Temporal bias can be thought of as a flaw in the application of case-control experiments to real-world diagnostic or prognostic tasks. Because these experiments do not uniformly sample the control-to-case trajectory, features and observations in certain parts of the trajectory are oversampled and assigned disproportionate weight. These observations also do not match the observations that physicians or patients have when assessing risk in real time. Because the case observations that are model-applicable can only be identified after the case event actually occurs, the resulting experimental findings are impossible to use prospectively. Temporal bias serves to amplify differences between the healthy and diseased populations, improving apparent predictive accuracy and exaggerating effect sizes of predictors. In other cases, it may result in researchers failing to discover predictive signals that lie outside the window considered. Because the magnitude of its effects is a function of an often-unobserved trajectory, temporal bias is poorly controlled for and can lead to replication failures between studies. The relative impact of temporal bias will scale with the dynamic range of the trajectory: a trajectory that contains large, dramatic changes is susceptible to bias, while trajectories composed of static features (genotype, demographics, etc.) will be largely immune.

Temporal bias has existed alongside case-control studies since they were first utilized. The first documented case-control study in the medical literature was Reverend Henry Whitehead's follow-up 33 to John Snow's famous report 34 on the Broad Street cholera outbreak. Whitehead aimed to evaluate Snow's hypothesis that consuming water from the Broad Street pump led to infection. Whitehead surveyed both the families of the infected and deceased and individuals without cholera regarding their consumption of pump water during the time deaths were observed 35, 36.

The outbreak began on August 31st, 1854 34, with deaths occurring in the days that immediately followed. Whitehead's efforts in identifying pump-water exposure among outbreak victims focused on the period between August 30th and September 8th, corresponding to a lookback time of between 1 and 10 days, depending on when the victim died. This would normally result in temporal bias towards the end of the cholera trajectory. Although Whitehead's conclusions were ultimately correct, the brief incubation period of cholera (2 h to 5 days 37) contributed to the success of the experiment and to Whitehead's later ability to identify the index patient. The rapid transition from healthy to diseased ensured that Whitehead's chosen lookback time would uniformly sample the disease trajectory, though this is something Whitehead could not have known at the time. Had Whitehead instead been faced with an outbreak of another waterborne disease such as typhoid fever, which can have an incubation period as long as 30 days 38, his chosen window would have oversampled exposure status in the run-up to death, leading to temporal bias that would overemphasize features in the latter portion of the disease trajectory (Fig. 4A). Because the disease etiology and trajectory were unknown at the time, the association between Broad Street water and death would be much less clear in the case of a hypothetical typhoid fever epidemic. (In another instance with unclear etiology, a recent survey of COVID-19 predictive algorithms found a significant number utilizing case-control sampling 39.) Figure 4B summarizes hypothetical interview data given Whitehead's study design in the case of both a cholera and a typhoid fever outbreak. In the unshaded columns, which represent information he would have had access to, the association between pump water consumption and mortality is only clear in the case of cholera.

figure 4

A Whitehead's cholera study benefited from the short period between infection and death. Had Whitehead been faced with an outbreak of typhoid fever, his sampling strategy would have oversampled late-stage features. B Hypothetical interview data from Whitehead's case-control study. Lacking underlying knowledge regarding disease etiology, Whitehead's experimental design would have experienced temporal bias given a disease with a longer incubation period. Shaded columns represent information hidden to the investigator. C Randomizing the lookback window among case patients can uniformly sample the trajectory, provided the lookback times go far back enough. D Evaluating person-days, person-weeks, or person-months allows the entire trajectory to be considered. E Conducting a cohort study by defining a well-specified date from which a look-forward window is deployed does not uniformly sample the trajectory in all individuals, but is still prospectively implementable since the starting date can be determined in real time.

Many factors have contributed to the unconscious adoption of bias-susceptible experimental designs. From a data efficiency perspective, case-control studies are often motivated by large class imbalances: a case-control experiment is one of the only ways to take efficient advantage of all minority-class observations in a model. The analogous cohort experiment would require identifying a starting alignment date common to all study subjects. Furthermore, longitudinal observational data are often expensive or difficult to acquire compared to one-shot, non-temporal case-control datasets. Through the use of retrospective observations, a case-control study is also one of the only study types that can be conducted immediately after the study is conceived, rather than waiting for observations to be generated, as in prospective studies.

More concerningly, publication biases towards larger effect sizes and higher accuracy may have driven researchers towards methods that accentuate the differences between cases and controls. Temporal bias can be interpreted as a relatively invisible symptom of this subconscious aversion towards ambiguity in prognostic models. Strong predictive models (in terms of accuracy) are naturally easier to create when structural differences between the two groups are used to provide additional signal. The increasing popularity of large data sets and difficult-to-interpret deep learning techniques facilitates this strategy.

This is not to say that case-control studies should be abandoned wholesale. For practical reasons (data efficiency, cost, ease of deployment), these studies have contributed countless discoveries across fields. However, a systematic understanding of where and why temporal bias exists is critical in the transition of research findings to applications in the clinic and beyond. There are several strategies to minimize temporal bias where it exists and to evaluate its effects otherwise (Fig. 4C–E; examples are provided in Supplementary Note 2).

Assuming that a suitable control population can be identified, the following two conditions enable uniform sampling of the control-to-case trajectory: (i) the lookback time is randomized, and (ii) the maximum lookback time plus the length of the observation window exceeds the transition period (see the sketch below).
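As an illustration only, the following minimal Python sketch shows what conditions (i) and (ii) look like in practice. The transition length, lookback bound, and case dates are hypothetical values chosen for the example, not quantities from any study described here.

    # Minimal sketch of a randomized-lookback sampling scheme (hypothetical values).
    # Condition (i): the lookback time is drawn at random for each case.
    # Condition (ii): max_lookback + window length exceeds the assumed transition period.
    import numpy as np

    rng = np.random.default_rng(0)

    transition_years = 10      # assumed length of the control-to-case transition
    max_lookback_years = 12    # chosen so that max lookback + window > transition
    window_years = 1

    def randomized_window(case_year):
        """Return (start, end) of one observation window for a case event in case_year."""
        lookback = rng.uniform(0, max_lookback_years)  # randomized lookback time
        end = case_year - lookback
        return end - window_years, end

    # Hypothetical case events; each gets a window drawn from a different part of its trajectory
    for case_year in [2005, 2008, 2011, 2015]:
        start, end = randomized_window(case_year)
        print(f"case in {case_year}: exposures measured over [{start:.1f}, {end:.1f}]")

Because each case's window is placed at a random offset before the event, the collection of windows covers the whole transition period in expectation rather than always oversampling its final stretch.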

Person-time classification or prediction tasks, in which multiple windows are drawn from sufficiently extended case observations, can also uniformly sample the trajectory in question. This approach amounts to sampling case trajectories more than once and weighting them according to prevalence, as illustrated in the sketch that follows. It can be facilitated through careful definition of control criteria, since the selection of sicker controls shortens the trajectory considered in the experiment, likely at the cost of model discriminative ability.
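A sketch of the person-time idea, using an entirely made-up table and column names of our own choosing: each patient-month becomes one observation, and a per-class weight keeps the contribution of cases proportional to their population prevalence rather than to the number of windows drawn from them.

    # Sketch: expanding longitudinal records into person-month observations with
    # prevalence-based weights (all data and column names are hypothetical).
    import pandas as pd

    records = pd.DataFrame({
        "patient_id": [1, 1, 1, 2, 2, 2, 3, 3],
        "month":      [1, 2, 3, 1, 2, 3, 1, 2],
        "exposure":   [0.2, 0.5, 0.9, 0.1, 0.1, 0.2, 0.3, 0.4],
        "is_case":    [1, 1, 1, 0, 0, 0, 0, 0],  # 1 = patient eventually experiences the event
    })

    # Population prevalence of each class, measured in patients (not patient-months)
    prevalence = records.groupby("is_case")["patient_id"].nunique() / records["patient_id"].nunique()

    # Number of person-month observations contributed by each class
    n_windows = records["is_case"].value_counts()

    # Weight so each class contributes in proportion to its prevalence, not its window count
    records["weight"] = records["is_case"].map(prevalence / n_windows)

    print(records[["patient_id", "month", "is_case", "weight"]])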

The use of well-defined baseline dates in cohort studies can eliminate temporal bias. Assessing exposure after a particular birthday, at the start of a particular month/year, or after a well-defined event makes the prospective deployment population easier to identify.

Finally, sensitivity analyses combined with researchers’ background domain knowledge regarding the state transition trajectory in question can be used to estimate effects of prospective deployment. An increasing focus on considering the deployability of a given model, the nature of the underlying trajectory, or even whether a particular feature can realistically be predicted from features at hand can also serve to prevent temporal bias from infiltrating a study.

While temporal bias is common and has far reaching implications, it is unique among experimental or epistemological flaws in that once understood, it is fairly easy to detect. As experiments grow broader in scope, transparency regarding the extent to which temporal bias influences findings is key to ensuring the consistency of associations and predictions.

Lipoprotein(a) trajectory imputation

Centiles of lipoprotein(a) [Lp(a)] values for myocardial infarction (MI) in 4441 Chinese patients (cases) and healthy matched controls published by Paré et al. 27 were used to construct log-normal distributions of Lp(a) values for each cohort. One hundred fifty thousand case and control measurements were drawn, and a linear model was fit to establish the baseline coefficient of association between Lp(a) and MI in the presence of temporal bias. For trajectory imputation, a starting Lp(a) value was generated for each case patient using one of three methods: (i) random sampling from the control distribution such that the drawn value is smaller than the case value, (ii) percentile matching (if the case value fell in the Nth percentile of the case distribution, the Nth percentile value from the control distribution was drawn), and (iii) a uniform shift of 15% (representing the observation that the median control value was 15% lower than the median case value). This starting value is understood to represent the Lp(a) measurement of the case patient in the distant past, when they were cardiovascularly healthy. The case-ending value was drawn directly from the published distributions. For each pair of case-starting and case-ending values, a linear/logarithmic/logistic/step function was fit using the two values as starting and ending points. New case observations were generated by randomly selecting a point along the generated trajectory, allowing for the computation of a prospective effect size. All individual experiments were repeated 100 times with newly drawn sample cohorts.
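For readers who find code clearer than prose, the following sketch illustrates the percentile-matching variant of this imputation in Python. The log-normal parameters, sample size, and use of scikit-learn's logistic regression are our assumptions for illustration; they are not the published centiles or the authors' original R implementation.

    # Sketch of percentile-matched trajectory imputation (assumed, not published, parameters).
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 50_000

    # Assumed log-normal Lp(a) distributions for controls and for cases at the time of MI
    control = rng.lognormal(mean=2.5, sigma=1.0, size=n)
    case_end = rng.lognormal(mean=2.8, sigma=1.0, size=n)

    # Percentile matching: each case's imputed "healthy" starting value is the control
    # value at the same percentile as that case's observed end-of-trajectory value.
    case_percentile = stats.rankdata(case_end) / (n + 1)
    case_start = np.quantile(control, case_percentile)

    # Fit a linear trajectory between start and end, then sample one point uniformly along it,
    # mimicking a measurement taken at an arbitrary (prospective) moment in the transition.
    t = rng.uniform(0.0, 1.0, size=n)
    case_prospective = case_start + t * (case_end - case_start)

    # Compare the biased (end-of-trajectory) and prospective coefficients of association.
    y = np.concatenate([np.zeros(n), np.ones(n)])
    x_biased = np.log(np.concatenate([control, case_end])).reshape(-1, 1)
    x_prospective = np.log(np.concatenate([control, case_prospective])).reshape(-1, 1)

    beta_biased = LogisticRegression().fit(x_biased, y).coef_[0, 0]
    beta_prospective = LogisticRegression().fit(x_prospective, y).coef_[0, 0]
    print(beta_biased, beta_prospective)  # the prospective coefficient is expected to be attenuated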

To examine the potential impact of inadvertent selection bias on the observed association between Lp(a) and MI, the Lp(a) values and MI status of all patients with more than one Lp(a) observation prior to the first recorded MI event were extracted from the Partners Research Patient Data Registry database in a deidentified manner. This work was approved by the Partners Institutional Review Board (Protocol #2018P000016). Case and control patients were defined based on MI status, and for each patient in each cohort, the (i) largest available, (ii) smallest available, and (iii) mean Lp(a) values were computed and used to identify the observed effect size under each selection scheme by fitting a logistic regression model. All calculations were conducted in R (version 3.4.4) using the glmnet package, version 2.0-16.
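The registry data are not public, so the sketch below mimics the max/min/mean comparison on simulated repeated measurements; the noise level, sample size, and risk model are assumptions made purely for illustration and do not reproduce the registry analysis.

    # Sketch: how summarizing repeated Lp(a) measurements by their max, min, or mean
    # changes the apparent Lp(a)-MI association (all parameters are assumed).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n_patients, n_repeats = 5000, 4

    true_lpa = rng.lognormal(mean=2.5, sigma=1.0, size=n_patients)
    # Repeated measurements: true value perturbed by intra-individual variability
    measured = true_lpa[:, None] * rng.lognormal(mean=0.0, sigma=0.3, size=(n_patients, n_repeats))

    # Assumed risk model: MI probability increases with the (log) true Lp(a) value
    p_mi = 1.0 / (1.0 + np.exp(-(np.log(true_lpa) - 2.5)))
    mi = rng.binomial(1, 0.3 * p_mi)

    for label, summary in [("largest", measured.max(axis=1)),
                           ("smallest", measured.min(axis=1)),
                           ("mean", measured.mean(axis=1))]:
        beta = LogisticRegression().fit(np.log(summary).reshape(-1, 1), mi).coef_[0, 0]
        print(f"{label:8s} available value: coefficient = {beta:.3f}")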

Delivery prediction from sequential claims data

Records of health insurance claims in 2015 from a deidentified national database from Aetna, a commercial managed health care company, were utilized for this study. The Harvard Medical School Institutional Review Board waived the requirement for patient consent for analysis of this database, as it was deemed not to be human subjects research. Delivery events were identified based on International Classification of Diseases (ICD9/10) diagnostic codes, Current Procedural Terminology (CPT) codes, or the birth year of newly born members linked by subscriber-parent annotations. Cases were defined as individuals who experienced a delivery between February and December 2015, while controls were defined as individuals who did not experience a delivery during any of 2015. Thirty thousand cases were randomly selected and matched to 30,000 controls based on age and ZIP code. For each individual, case-control and cohort feature windows were defined. Case-control windows were set as the month of records three months prior to the delivery/matched baseline date for cases and controls, respectively. Cohort windows were set as the month of records from January 2015. Three studies were simulated: (1) the CC-CC study consisted of model training using case-control windows and model evaluation using case-control windows; (2) the CC-Cohort study consisted of model training using case-control windows and model evaluation using cohort windows; (3) the Cohort-Cohort study consisted of model training using cohort windows and model evaluation using cohort windows. For each study, deep recurrent neural networks and logistic regression models were trained over the features present in each window. For deep recurrent neural network-based models, the linear sequence of features inside the window was provided in the form of International Classification of Diseases (ICD9) codes for diagnoses, Current Procedural Terminology (CPT) codes for procedures, and National Drug Codes (NDC) for prescriptions. The sequence length was set to 20 events; individual sequences were either padded or clipped to meet this requirement. Logistic regression models utilized binary occurrence matrices for all events as features. Both models included demographic information in the form of age. Sex was excluded as a feature because all cohort members were female. All calculations were conducted in Python 2.7.3 using the Keras 2.2.0 and scikit-learn 0.18.1 packages.
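To make the two window definitions concrete, here is a small Python sketch over a hypothetical claims table. The member IDs, dates, and codes are invented; only the windowing logic follows the description above, and it is not the authors' original pipeline.

    # Sketch of case-control vs. cohort feature windows over a hypothetical claims table.
    import pandas as pd

    claims = pd.DataFrame({
        "member_id": [1, 1, 1, 2, 2],
        "date": pd.to_datetime(["2015-01-10", "2015-04-02", "2015-06-20",
                                "2015-01-15", "2015-05-05"]),
        "code": ["Z34.00", "59400", "650", "I10", "99213"],  # illustrative diagnosis/procedure codes
    })
    # Member 1 is a case with a July delivery; member 2 is a matched control that
    # inherits the same baseline date.
    anchor_date = {1: pd.Timestamp("2015-07-05"), 2: pd.Timestamp("2015-07-05")}

    def case_control_window(member_id):
        """One month of records beginning three months before the delivery / matched baseline."""
        start = anchor_date[member_id] - pd.DateOffset(months=3)
        end = start + pd.DateOffset(months=1)
        rows = claims[claims["member_id"] == member_id]
        return rows[(rows["date"] >= start) & (rows["date"] < end)]

    def cohort_window(member_id):
        """The month of records from January 2015, an anchor knowable prospectively."""
        rows = claims[claims["member_id"] == member_id]
        return rows[(rows["date"] >= "2015-01-01") & (rows["date"] < "2015-02-01")]

    print(case_control_window(1))  # features anchored to an event known only retrospectively
    print(cohort_window(1))        # features anchored to a calendar date known in real time

The contrast between the two functions is the entire point: the case-control anchor cannot be computed until the delivery has occurred, while the cohort anchor is available to any deployed model on January 1st.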

Simulation of olive oil/myocardial infarction case-control study

Data from the Nurses' Health Study (NHS) were used for this analysis. All nutrition and disease incidence surveys between 1994 and 2010 were considered. Internal NHS definitions of first MI were used to define the case population. Case individuals were only considered if they had at least two consecutive nutritional surveys with answers to all olive oil-related questions prior to the first MI event. Individuals with any history of cardiovascular disease, including MI and angina, were excluded from the control population. Control individuals were only considered if they had at least two consecutive nutritional surveys with answers to all olive oil-related questions. In total, 3188 qualifying MI individuals and 94,893 controls were identified. A baseline date for each control individual was defined based on the availability of consecutive nutrition surveys. For each case, a matched control was identified based on age at baseline and sex. For all individuals, total cumulative yearly olive oil consumption was computed by summing olive oil added to food and olive oil salad dressing consumption, as validated by Guasch-Ferré et al. 40. For each experiment, a lookback time between 1 and 4 years was selected, and the cumulative total olive oil consumed during the lookback time relative to the MI date/baseline was calculated. For each lookback time, the effect size between the top quintile (based on total consumption) and the remaining population, along with its statistical significance, was calculated using a two-sided t-test. Each experiment, including case-control matching, was repeated 200 times. All calculations were conducted in R (version 3.4.4) using the glmnet package, version 2.0-16.
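As a sketch of the core lookback computation: the NHS data are access-restricted, so the long-format layout, column names, and the exact form of the quintile comparison below are assumptions for illustration rather than the actual NHS variables or the authors' R code.

    # Sketch: association strength between olive oil consumption and MI for one lookback time.
    import pandas as pd
    from scipy import stats

    def lookback_association(surveys, anchor_year, lookback_years):
        """surveys: DataFrame with columns [person_id, year, olive_oil, is_case];
        anchor_year: dict person_id -> MI year (cases) or matched baseline year (controls)."""
        df = surveys.copy()
        df["anchor"] = df["person_id"].map(anchor_year)
        in_window = (df["year"] < df["anchor"]) & (df["year"] >= df["anchor"] - lookback_years)

        totals = (df[in_window]
                  .groupby("person_id")
                  .agg(total_olive_oil=("olive_oil", "sum"), is_case=("is_case", "first")))

        # Compare MI status between the top quintile of consumers and everyone else
        top_quintile = totals["total_olive_oil"] >= totals["total_olive_oil"].quantile(0.8)
        effect = (totals.loc[top_quintile, "is_case"].mean()
                  - totals.loc[~top_quintile, "is_case"].mean())
        _, p_value = stats.ttest_ind(totals.loc[top_quintile, "is_case"],
                                     totals.loc[~top_quintile, "is_case"])
        return effect, p_value

In a simulation of the kind described above, a function like this would be called once per repetition and per lookback time, with the matched case-control pairs drawn anew each time.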

Reporting summary

Further information on research design is available in the  Nature Research Reporting Summary linked to this article.

Data availability

The data that support the findings of this study are available from Aetna Insurance, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Please contact N. Palmer ([email protected]) for inquiries about the Aetna dataset. Summary data are, however, available from the authors upon reasonable request and with permission of Aetna Insurance. All data utilized in the study from the Nurses’ Health Study (NHS) is available upon request with the permission of the NHS and can be accessed at https://www.nurseshealthstudy.org/researchers . All data utilized in the study from the Partners Research Patient Data Registry is available upon request with the permission of Partners Healthcare and can be accessed at https://rc.partners.org/research-apps-and-services/identify-subjects-request-data#research-patient-data-registry .

Code availability

Auxiliary code is available at https://github.com/william-yuan/temporalbias

References

1. Song, J. W. & Chung, K. C. Observational studies: cohort and case-control studies. Plast. Reconstr. Surg. 126, 2234–2242 (2010).
2. Marshall, T. What is a case-control study? Int. J. Epidemiol. 33, 612–613 (2004).
3. Lewallen, S. & Courtright, P. Epidemiology in practice: case-control studies. Community Eye Health 11, 57–58 (1998).
4. Weiss, J. C., Natarajan, S., Peissig, P. L., McCarty, C. A. & Page, D. Machine learning for personalized medicine: predicting primary myocardial infarction from electronic health records. AI Mag. 33, 33 (2012).
5. Steere, A. C. et al. Lyme borreliosis. Nat. Rev. Dis. Prim. 2, 16090 (2016).
6. Steere, A. C. et al. Lyme arthritis: an epidemic of oligoarticular arthritis in children and adults in three Connecticut communities. Arthritis Rheum. 20, 7–17 (1977).
7. Norgeot, B. et al. Assessment of a deep learning model based on electronic health record data to forecast clinical outcomes in patients with rheumatoid arthritis. JAMA Netw. Open 2, e190606 (2019).
8. Chou, R. C., Kane, M., Ghimire, S., Gautam, S. & Gui, J. Treatment for rheumatoid arthritis and risk of Alzheimer's disease: a nested case-control analysis. CNS Drugs 30, 1111–1120 (2016).
9. Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) Consortium for Early Detection of Lung Cancer. Assessment of lung cancer risk on the basis of a biomarker panel of circulating proteins. JAMA Oncol. 4, e182078 (2018).
10. Himes, B. E., Dai, Y., Kohane, I. S., Weiss, S. T. & Ramoni, M. F. Prediction of chronic obstructive pulmonary disease (COPD) in asthma patients using electronic medical records. J. Am. Med. Inform. Assoc. 16, 371–379 (2009).
11. Rand, L. I. et al. Multiple factors in the prediction of risk of proliferative diabetic retinopathy. N. Engl. J. Med. 313, 1433–1438 (1985).
12. Choi, E., Schuetz, A., Stewart, W. F. & Sun, J. Using recurrent neural network models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 24, 361–370 (2017).
13. Wang, X., Wang, F., Hu, J. & Sorrentino, R. Exploring joint disease risk prediction. AMIA Annu. Symp. Proc. 2014, 1180–1187 (2014).
14. Ranganath, R., Perotte, A., Elhadad, N. & Blei, D. Deep survival analysis. Proceedings of the 1st Machine Learning for Healthcare Conference, PMLR 56, 101–114 (2016).
15. Masino, A. J. et al. Machine learning models for early sepsis recognition in the neonatal intensive care unit using readily available electronic health record data. PLoS One 14, e0212665 (2019).
16. Mayhew, M. B. et al. A generalizable 29-mRNA neural-network classifier for acute bacterial and viral infections. Nat. Commun. 11, 1177 (2020).
17. Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56 (2019).
18. Rajkomar, A., Dean, J. & Kohane, I. Machine learning in medicine. N. Engl. J. Med. 380, 1347–1358 (2019).
19. Hernan, M. A. Estimating causal effects from epidemiological data. J. Epidemiol. Community Health 60, 578–586 (2006).
20. Burns, P. B., Rohrich, R. J. & Chung, K. C. The levels of evidence and their role in evidence-based medicine. Plast. Reconstr. Surg. 128, 305–310 (2011).
21. Rothman, K. J. Epidemiology: An Introduction (Oxford University Press, 2012).
22. Yusuf, S. et al. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. Lancet 364, 937–952 (2004).
23. Jacobson, T. A. Lipoprotein(a), cardiovascular disease, and contemporary management. Mayo Clin. Proc. 88, 1294–1311 (2013).
24. Hippe, D. S. et al. Lp(a) (lipoprotein(a)) levels predict progression of carotid atherosclerosis in subjects with atherosclerotic cardiovascular disease on intensive lipid therapy: an analysis of the AIM-HIGH carotid magnetic resonance imaging substudy-brief report. Arterioscler. Thromb. Vasc. Biol. 38, 673–678 (2018).
25. Garnotel, R., Monier, F., Lefèvre, F. & Gillery, P. Long-term variability of serum lipoprotein(a) concentrations in healthy fertile women. Clin. Chem. Lab. Med. 36, 317–321 (1998).
26. Nazir, D. J., Roberts, R. S., Hill, S. A. & McQueen, M. J. Monthly intra-individual variation in lipids over a 1-year period in 22 normal subjects. Clin. Biochem. 32, 381–389 (1999).
27. Paré, G. et al. Lipoprotein(a) levels and the risk of myocardial infarction among 7 ethnic groups. Circulation 139, 1472–1482 (2019).
28. Hoffmann, M. M., Schäfer, L., Winkler, K. & König, B. Intraindividual variability of lipoprotein(a) and implications for the decision-making process for lipoprotein(a) lowering therapy. Atherosclerosis 263, e27 (2017).
29. Nazir, D. J. & McQueen, M. J. Monthly intra-individual variation in lipoprotein(a) in 22 normal subjects over 12 months. Clin. Biochem. 30, 163–170 (1997).
30. Goldberg, J. P. & Hellwig, J. P. Nutrition research in the media: the challenge facing scientists. J. Am. Coll. Nutr. 16, 544–550 (1997).
31. Fernández-Jarne, E. et al. Risk of first non-fatal myocardial infarction negatively associated with olive oil consumption: a case-control study in Spain. Int. J. Epidemiol. 31, 474–480 (2002).
32. Bertuzzi, M., Tavani, A., Negri, E. & La Vecchia, C. Olive oil consumption and risk of non-fatal myocardial infarction in Italy. Int. J. Epidemiol. 31, 1274–1277 (2002).
33. Paneth, N., Susser, E. & Susser, M. Origins and early development of the case-control study: Part 1, Early evolution. Soz. Praventivmed. 47, 282–288 (2002).
34. Snow, J. On the mode of communication of cholera. Edinb. Med. J. 1, 668–670 (1856).
35. Whitehead, H. The Broad Street pump: an episode in the cholera epidemic of 1854. Macmillan's Magazine, 113–122 (1865).
36. Newsom, S. W. B. Pioneers in infection control: John Snow, Henry Whitehead, the Broad Street pump, and the beginnings of geographical epidemiology. J. Hosp. Infect. 64, 210–216 (2006).
37. Centers for Disease Control and Prevention. Cholera – Vibrio cholerae infection. Information for Public Health & Medical Professionals. https://www.cdc.gov/cholera/healthprofessionals.html (2020).
38. Mintz, E., Slayton, R. & Walters, M. Typhoid fever and paratyphoid fever. Control of Communicable Diseases Manual (2015). https://doi.org/10.2105/ccdm.2745.149
39. Wynants, L. et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 369, m1328 (2020).
40. Guasch-Ferré, M. et al. Olive oil consumption and risk of type 2 diabetes in US women. Am. J. Clin. Nutr. 102, 479–486 (2015).


Acknowledgements

W.Y. was supported by the NVIDIA Graduate Fellowship, the T32HD040128 from the NICHD/NIH, and received support from the AWS Cloud Credits for Research and NVIDIA GPU Grant Program.

Author information

Authors and affiliations.

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA

William Yuan, Brett K. Beaulieu-Jones, Kun-Hsing Yu, Scott L. Lipnick, Nathan Palmer, Tianxi Cai & Isaac S. Kohane

Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA

Scott L. Lipnick

Center for Assessment Technology and Continuous Health, Massachusetts General Hospital, Boston, MA, USA

Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA

Joseph Loscalzo

Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA

Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA


Contributions

Conceptualization: W.Y., B.K.B-J., K-H.Y., T.C., I.S.K.; Methodology, Investigation, Writing – Original Draft: W.Y.; Writing – Review and Editing: All authors; Supervision: I.S.K.

Corresponding authors

Correspondence to William Yuan or Isaac S. Kohane .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks Augusto Di Castelnuovo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Yuan, W., Beaulieu-Jones, B.K., Yu, KH. et al. Temporal bias in case-control design: preventing reliable predictions of the future. Nat Commun 12 , 1107 (2021). https://doi.org/10.1038/s41467-021-21390-2


Received : 12 September 2020

Accepted : 22 January 2021

Published : 17 February 2021

DOI : https://doi.org/10.1038/s41467-021-21390-2



Grad Coach

Research Bias 101: What You Need To Know

By: Derek Jansen (MBA) | Expert Reviewed By: Dr Eunice Rautenbach | September 2022

If you’re new to academic research, research bias (also sometimes called researcher bias) is one of the many things you need to understand to avoid compromising your study. If you’re not careful, research bias can ruin the credibility of your study. 

In this post, we'll unpack the thorny topic of research bias. We'll explain what it is, look at some common types of research bias and share some tips to help you minimise the potential sources of bias in your research.

Overview: Research Bias 101

  • What is research bias (or researcher bias)?
  • Bias #1 – Selection bias
  • Bias #2 – Analysis bias
  • Bias #3 – Procedural (admin) bias

So, what is research bias?

Well, simply put, research bias is when the researcher – that’s you – intentionally or unintentionally skews the process of a systematic inquiry , which then of course skews the outcomes of the study . In other words, research bias is what happens when you affect the results of your research by influencing how you arrive at them.

For example, if you planned to research the effects of remote working arrangements across all levels of an organisation, but your sample consisted mostly of management-level respondents , you’d run into a form of research bias. In this case, excluding input from lower-level staff (in other words, not getting input from all levels of staff) means that the results of the study would be ‘biased’ in favour of a certain perspective – that of management.

Of course, if your research aims and research questions were only interested in the perspectives of managers, this sampling approach wouldn’t be a problem – but that’s not the case here, as there’s a misalignment between the research aims and the sample .

Now, it’s important to remember that research bias isn’t always deliberate or intended. Quite often, it’s just the result of a poorly designed study, or practical challenges in terms of getting a well-rounded, suitable sample. While perfect objectivity is the ideal, some level of bias is generally unavoidable when you’re undertaking a study. That said, as a savvy researcher, it’s your job to reduce potential sources of research bias as much as possible.

To minimize potential bias, you first need to know what to look for. So, next up, we'll unpack three common types of research bias we see at Grad Coach when reviewing students' projects. These include selection bias, analysis bias, and procedural bias. Keep in mind that there are many different forms of bias that can creep into your research, so don't take this as a comprehensive list – it's just a useful starting point.

Research bias definition

Bias #1 – Selection Bias

First up, we have selection bias. The example we looked at earlier (about only surveying management as opposed to all levels of employees) is a prime example of this type of research bias. In other words, selection bias occurs when your study's design automatically excludes a relevant group from the research process and, therefore, negatively impacts the quality of the results.

With selection bias, the results of your study will be biased towards the group that it includes or favours, meaning that you're likely to arrive at prejudiced results. For example, research into government policies that only includes participants who voted for a specific party is going to produce skewed results, as the views of those who voted for other parties will be excluded.

Selection bias commonly occurs in quantitative research, as the sampling strategy adopted can have a major impact on the statistical results. That said, selection bias does of course also come up in qualitative research, as there's still plenty of room for skewed samples. So, it's important to pay close attention to the makeup of your sample and make sure that you adopt a sampling strategy that aligns with your research aims. Of course, you'll seldom achieve a perfect sample, and that's okay. But you need to be aware of how your sample may be skewed and factor this into your thinking when you analyse the resultant data.


Bias #2 – Analysis Bias

Next up, we have analysis bias. Analysis bias occurs when the analysis itself emphasises or discounts certain data points, so as to favour a particular result (often the researcher's own expected result or hypothesis). In other words, analysis bias happens when you prioritise the presentation of data that supports a certain idea or hypothesis, rather than presenting all the data indiscriminately.

For example, if your study was looking into consumer perceptions of a specific product, you might present more analysis of data that reflects positive sentiment toward the product, and give less real estate to the analysis that reflects negative sentiment. In other words, you’d cherry-pick the data that suits your desired outcomes and as a result, you’d create a bias in terms of the information conveyed by the study.

Although this kind of bias is common in quantitative research, it can just as easily occur in qualitative studies, given the amount of interpretive power the researcher has. This may not be intentional or even noticed by the researcher, given the inherent subjectivity in qualitative research. As humans, we naturally search for and interpret information in a way that confirms or supports our prior beliefs or values (in psychology, this is called "confirmation bias"). So, don't make the mistake of thinking that analysis bias is always intentional and you don't need to worry about it because you're an honest researcher – it can creep up on anyone.

To reduce the risk of analysis bias, a good starting point is to determine your data analysis strategy in as much detail as possible before you collect your data. In other words, decide, in advance, how you'll prepare the data and which analysis method you'll use, and be aware of how different analysis methods can favour different types of data. Also, take the time to reflect on your own preconceived notions and expectations regarding the analysis outcomes (in other words, what you expect to find in the data), so that you're fully aware of the potential influence you may have on the analysis – and therefore, hopefully, can minimise it.

Analysis bias

Bias #3 – Procedural Bias

Last but definitely not least, we have procedural bias, which is also sometimes referred to as administration bias. Procedural bias is easy to overlook, so it's important to understand what it is and how to avoid it. This type of bias occurs when the administration of the study, especially the data collection aspect, has an impact on either who responds or how they respond.

A practical example of procedural bias would be when participants in a study are required to provide information under some form of constraint. For example, participants might be given insufficient time to complete a survey, resulting in incomplete or hastily-filled out forms that don’t necessarily reflect how they really feel. This can happen really easily, if, for example, you innocently ask your participants to fill out a survey during their lunch break.

Another form of procedural bias can happen when you improperly incentivise participation in a study. For example, offering a reward for completing a survey or interview might incline participants to provide false or inaccurate information just to get through the process as fast as possible and collect their reward. It could also potentially attract a particular type of respondent (a freebie seeker), resulting in a skewed sample that doesn’t really reflect your demographic of interest.

The format of your data collection method can also potentially contribute to procedural bias. If, for example, you decide to host your survey or interviews online, this could unintentionally exclude people who are not particularly tech-savvy, don’t have a suitable device or just don’t have a reliable internet connection. On the flip side, some people might find in-person interviews a bit intimidating (compared to online ones, at least), or they might find the physical environment in which they’re interviewed to be uncomfortable or awkward (maybe the boss is peering into the meeting room, for example). Either way, these factors all result in less useful data.

Although procedural bias is more common in qualitative research, it can come up in any form of fieldwork where you’re actively collecting data from study participants. So, it’s important to consider how your data is being collected and how this might impact respondents. Simply put, you need to take the respondent’s viewpoint and think about the challenges they might face, no matter how small or trivial these might seem. So, it’s always a good idea to have an informal discussion with a handful of potential respondents before you start collecting data and ask for their input regarding your proposed plan upfront.

Procedural bias

Let’s Recap

Ok, so let's do a quick recap. Research bias refers to any instance where the researcher, or the research design, negatively influences the quality of a study's results, whether intentionally or not.

The three common types of research bias we looked at are:

  • Selection bias – where a skewed sample leads to skewed results
  • Analysis bias – where the analysis method and/or approach leads to biased results – and,
  • Procedural bias – where the administration of the study, especially the data collection aspect, has an impact on who responds and how they respond.

As I mentioned, there are many other forms of research bias, but we can only cover a handful here. So, be sure to familiarise yourself with as many potential sources of bias as possible to minimise the risk of research bias in your study.


Incorporate STEM journalism in your classroom

  • Exercise type: Activity
  • Topic: Science & Society
  • Category: Research & Design
  • Category: Diversity in STEM

How bias affects scientific research

  • Download Student Worksheet

Purpose: Students will work in groups to evaluate bias in scientific research and engineering projects and to develop guidelines for minimizing potential biases.

Procedural overview: After reading the Science News for Students article "Think you're not biased? Think again," students will discuss types of bias in scientific research and how to identify it. Students will then search the Science News archive for examples of different types of bias in scientific and medical research. Students will read the National Institutes of Health's Policy on Sex as a Biological Variable and analyze how this policy works to reduce bias in scientific research on the basis of sex and gender. Based on their exploration of bias, students will discuss the benefits and limitations of research guidelines for minimizing particular types of bias and develop guidelines of their own.

Approximate class time: 2 class periods

How Bias Affects Scientific Research student guide

Computer with access to the Science News archive

Interactive meeting and screen-sharing application for virtual learning (optional)

Directions for teachers:

One of the guiding principles of scientific inquiry is objectivity. Objectivity is the idea that scientific questions, methods and results should not be affected by the personal values, interests or perspectives of researchers. However, science is a human endeavor, and experimental design and analysis of information are products of human thought processes. As a result, biases may be inadvertently introduced into scientific processes or conclusions.

In scientific circles, bias is described as any systematic deviation between the results of a study and the "truth." Bias is sometimes described as a tendency to prefer one thing over another, or to favor one person, thing or explanation in a way that prevents objectivity or that influences the outcome of a study or the understanding of a phenomenon. Bias can be introduced at multiple points during scientific research: in the framing of the scientific question, in the experimental design, in the development or implementation of processes used to conduct the research, during collection or analysis of data, or during the reporting of conclusions.

Researchers generally recognize several different sources of bias, each of which can strongly affect the results of STEM research. Three types of bias that often occur in scientific and medical studies are researcher bias, selection bias and information bias.

Researcher bias occurs when the researcher conducting the study is in favor of a certain result. Researchers can influence outcomes through their study design choices, including who they choose to include in a study and how data are interpreted. Selection bias can be described as an experimental error that occurs when the subjects of the study do not accurately reflect the population to whom the results of the study will be applied. This commonly happens as unequal inclusion of subjects of different races, sexes or genders, ages or abilities. Information bias occurs as a result of systematic errors during the collection, recording or analysis of data.

When bias occurs, a study’s results may not accurately represent phenomena in the real world, or the results may not apply in all situations or equally for all populations. For example, if a research study does not address the full diversity of people to whom the solution will be applied, then the researchers may have missed vital information about whether and how that solution will work for a large percentage of a target population.

Bias can also affect the development of engineering solutions. For example, a new technology product tested only with teenagers or young adults who are comfortable using new technologies may have user experience issues when placed in the hands of older adults or young children.

Want to make it a virtual lesson? Post the links to the Science News for Students article "Think you're not biased? Think again" and the National Institutes of Health information on sickle-cell disease. A link to additional resources can be provided for the students who want to know more. After students have reviewed the information at home, discuss the four questions in the setup and the sickle-cell research scenario as a class. When the students have a general understanding of bias in research, assign students to breakout rooms to look for examples of different types of bias in scientific and medical research, to discuss the Science News article "Biomedical studies are including more female subjects (finally)" and the National Institutes of Health's Policy on Sex as a Biological Variable, and to develop bias guidelines of their own. Make sure the students have links to all articles they will need to complete their work. Bring the groups back together for an all-class discussion of the bias guidelines they write.

Assign the Science News for Students article “ Think you’re not biased? Think again ” as homework reading to introduce students to the core concepts of scientific objectivity and bias. Request that they answer the first two questions on their guide before the first class discussion on this topic. In this discussion, you will cover the idea of objective truth and introduce students to the terminology used to describe bias. Use the background information to decide what level of detail you want to give to your students.

As students discuss bias, help them understand objective and subjective data and discuss the importance of gathering both kinds of data. Explain to them how these data differ. Some phenomena — for example, body temperature, blood type and heart rate — can be objectively measured. These data tend to be quantitative. Other phenomena cannot be measured objectively and must be considered subjectively. Subjective data are based on perceptions, feelings or observations and tend to be qualitative rather than quantitative. Subjective measurements are common and essential in biomedical research, as they can help researchers understand whether a therapy changes a patient’s experience. For instance, subjective data about the amount of pain a patient feels before and after taking a medication can help scientists understand whether and how the drug works to alleviate pain. Subjective data can still be collected and analyzed in ways that attempt to minimize bias.

Try to guide student discussion to include a larger context for bias by discussing the effects of bias on understanding of an “objective truth.” How can someone’s personal views and values affect how they analyze information or interpret a situation?

To help students understand potential effects of biases, present them with the following scenario based on information from the National Institutes of Health :

Sickle-cell disease is a group of inherited disorders that cause abnormalities in red blood cells. Most of the people who have sickle-cell disease are of African descent; it also appears in populations from the Mediterranean, India and parts of Latin America. Males and females are equally likely to inherit the condition. Imagine that a therapy was developed to treat the condition, and clinical trials enlisted only male subjects of African descent. How accurately would the results of that study reflect the therapy’s effectiveness for all people who suffer from sickle-cell disease?

In the sickle-cell scenario described above, scientists will have a good idea of how the therapy works for males of African descent. But they may not be able to accurately predict how the therapy will affect female patients or patients of different races or ethnicities. Ask the students to consider how they would devise a study that addressed all the populations affected by this disease.

Before students move on, have them answer the following questions. The first two should be answered for homework and discussed in class along with the remaining questions.

1. What is bias?

In common terms, bias is a preference for or against one idea, thing or person. In scientific research, bias is a systematic deviation between observations or interpretations of data and an accurate description of a phenomenon.

2. How can biases affect the accuracy of scientific understanding of a phenomenon? How can biases affect how those results are applied?

Bias can cause the results of a scientific study to be disproportionately weighted in favor of one result or group of subjects. This can cause misunderstandings of natural processes that may make conclusions drawn from the data unreliable. Biased procedures, data collection or data interpretation can affect the conclusions scientists draw from a study and the application of those results. For example, if the subjects that participate in a study testing an engineering design do not reflect the diversity of a population, the end product may not work as well as desired for all users.

3. Describe two potential sources of bias in a scientific, medical or engineering research project. Try to give specific examples.

Researchers can intentionally or unintentionally introduce biases as a result of their attitudes toward the study or its purpose or toward the subjects or a group of subjects. Bias can also be introduced by methods of measuring, collecting or reporting data. Examples of potential sources of bias include testing a small sample of subjects, testing a group of subjects that is not diverse and looking for patterns in data to confirm ideas or opinions already held.

4. How can potential biases be identified and eliminated before, during or after a scientific study?

Students should brainstorm ways to identify sources of bias in the design of research studies. They may suggest conducting implicit bias testing or interviews before a study can be started, developing guidelines for research projects, peer review of procedures and samples/subjects before beginning a study, and peer review of data and conclusions after the study is completed and before it is published. Students may focus on the ideals of transparency and replicability of results to help reduce biases in scientific research.

Obtain and evaluate information about bias

Students will now work in small groups to select and analyze articles for different types of bias in scientific and medical research. Students will start by searching the Science News or Science News for Students archives and selecting articles that describe scientific studies or engineering design projects. If the Science News or Science News for Students articles chosen by students do not specifically cite and describe a study, students should consult the Citations at the end of the article for links to related primary research papers. Students may need to read the methods section and the conclusions of the primary research paper to better understand the project’s design and to identify potential biases. Do not assume that every scientific paper features biased research.

Student groups should evaluate the study or engineering design project outlined in the article to identify any biases in the experimental design, data collection, analysis or results. Students may need additional guidance for identifying biases. Remind them of the prior discussion about sources of bias and task them to review information about indicators of bias. Possible indicators include extreme language such as all, none or nothing; emotional appeals rather than logical arguments; proportions of study subjects with specific characteristics such as gender, race or age; arguments that support or refute one position over another; and oversimplifications or overgeneralizations. Students may also want to look for clues related to the researchers' personal identity, such as race, religion or gender. Information on political or religious points of view, sources of funding or professional affiliations may also suggest biases.

Students should also identify any deliberate attempts to reduce or eliminate bias in the project or its results. Then groups should come back together and share the results of their analysis with the class.

If students need support in searching the archives for appropriate articles, encourage groups to brainstorm search terms that may turn up related articles. Some potential search terms include bias, study, studies, experiment, engineer, new device, design, gender, sex, race, age, aging, young, old, weight, patients, survival or medical.

If you are short on time or students do not have access to the Science News or Science News for Students archive, you may want to provide articles for students to review. Some suggested articles are listed in the additional resources  below.

Once groups have selected their articles, students should answer the following questions in their groups.

1. Record the title and URL of the article and write a brief summary of the study or project.

Answers will vary, but students should accurately cite the article evaluated and summarize the study or project described in the article. Sample answer: We reviewed the Science News article “Even brain images can be biased,” which can be found at www.sciencenews.org/blog/scicurious/even-brain-images-can-be-biased. This article describes how scientific studies of human brains that involve electronic images of brains tend to include study subjects from wealthier and more highly educated households and how researchers set out to collect new data to make the database of brain images more diverse.

2. What sources of potential bias (if any) did you identify in the study or project? Describe any procedures or policies deliberately included in the study or project to eliminate biases.

The article “Even brain images can be biased” describes how scientists identified a sampling bias in studies of brain images that resulted from the way subjects were recruited. Most of these studies were conducted at universities, so many college students volunteer to participate, which resulted in the samples being skewed toward wealthier, educated, white subjects. Scientists identified a database of pediatric brain images and evaluated the diversity of the subjects in that database. They found that although the subjects in that database were more ethnically diverse than the U.S. population, the subjects were generally from wealthier households and the parents of the subjects tended to be more highly educated than average. Scientists applied statistical methods to weight the data so that study samples from the database would more accurately reflect American demographics.

3. How could any potential biases in the study or design project have affected the results or application of the results to the target population?

Scientists studying the rate of brain development in children were able to recognize the sampling bias in the brain image database. When scientists were able to apply statistical methods to ensure a better representation of socioeconomically diverse samples, they saw a different pattern in the rate of brain development in children. Scientists learned that, in general, children's brains matured more quickly than they had previously thought. They were able to draw new conclusions about how certain factors, such as family wealth and education, affected the rate at which children's brains developed. But the scientists also suggested that they needed to perform additional studies with a deliberately selected group of children to ensure true diversity in the samples.
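One common way to perform the kind of reweighting mentioned in these answers is post-stratification: each group's weight is its share of the target population divided by its share of the sample. The short sketch below uses invented proportions purely to illustrate the arithmetic; it is not the method or the data of the study discussed in the article.

    # Post-stratification sketch with made-up proportions (not the study's actual data).
    # weight(group) = population share of group / sample share of group
    population_share = {"group_A": 0.60, "group_B": 0.25, "group_C": 0.15}
    sample_share     = {"group_A": 0.80, "group_B": 0.15, "group_C": 0.05}

    weights = {g: population_share[g] / sample_share[g] for g in population_share}
    print(weights)  # under-sampled groups get weights > 1, over-sampled groups < 1

    # A weighted mean of some measurement then better reflects the target population:
    sample = [("group_A", 10.0), ("group_A", 12.0), ("group_B", 20.0), ("group_C", 30.0)]
    weighted_mean = (sum(weights[g] * x for g, x in sample) /
                     sum(weights[g] for g, _ in sample))
    print(weighted_mean)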

In this phase, students will review the Science News article “Biomedical studies are including more female subjects (finally)” and the NIH Policy on Sex as a Biological Variable, including the “guidance document.” Students will identify how sex and gender biases may have affected the results of biomedical research before NIH instituted its policy. The students will then work with their group to recommend other policies to minimize biases in biomedical research.

To guide their development of proposed guidelines, students should answer the following questions in their groups.

1. How have sex and gender biases affected the value and application of biomedical research?

Gender and sex biases in biomedical research have diminished the accuracy and quality of research studies and reduced the applicability of results to the entire population. When girls and women are not included in research studies, the responses and therapeutic outcomes of approximately half of the target population for potential therapies remain unknown.

2. Why do you think the NIH created its policy to reduce sex and gender biases?

In the guidance document, the NIH states that “There is a growing recognition that the quality and generalizability of biomedical research depends on the consideration of key biological variables, such as sex.” The document goes on to state that many diseases and conditions affect people of both sexes, and restricting diversity of biological variables, notably sex and gender, undermines the “rigor, transparency, and generalizability of research findings.”

3. What impact has the NIH Policy on Sex as a Biological Variable had on biomedical research?

The NIH policy requiring that sex be factored into research designs, analyses, and reporting helps ensure that researchers and institutions address potential biases while biomedical studies are being planned and funded, which reduces or eliminates those biases in the final study. Including females in biomedical research helps ensure that results apply to a larger proportion of the population, expands the therapies available to girls and women, and improves their health care outcomes.
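As a rough illustration of what “factoring sex into analyses and reporting” can look like in practice, the sketch below estimates a treatment effect separately for female and male participants instead of pooling them. The data, column names, and effect sizes are invented for the example and are not drawn from any NIH study.

```python
import pandas as pd

# Made-up trial data: response to a therapy, recorded with sex as a variable.
trial = pd.DataFrame({
    "sex":      ["F", "F", "F", "F", "M", "M", "M", "M"],
    "treated":  [1,   0,   1,   0,   1,   0,   1,   0],
    "response": [7.1, 5.0, 6.8, 5.3, 6.0, 5.8, 6.2, 5.7],
})

# A pooled estimate can hide a sex difference in the treatment effect.
pooled = (trial[trial.treated == 1].response.mean()
          - trial[trial.treated == 0].response.mean())
print(f"Pooled treatment effect: {pooled:.2f}")

# Disaggregating by sex, as the NIH policy encourages, reports both effects.
for sex, grp in trial.groupby("sex"):
    effect = (grp[grp.treated == 1].response.mean()
              - grp[grp.treated == 0].response.mean())
    print(f"Treatment effect for sex={sex}: {effect:.2f}")
```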

4. What other policies do you think the NIH could institute to reduce biases in biomedical research? If you were to recommend one set of additional guidelines for reducing bias in biomedical research, what guidelines would you propose? Why?

Students could suggest that the NIH should have similar policies related to race, gender identity, wealth/economic status and age. Students should identify a category of bias or an underserved segment of the population that they think needs to be addressed in order to improve biomedical research and health outcomes for all people and should recommend guidelines to reduce bias related to that group. Students recommending guidelines related to race might suggest that some populations, such as African Americans, are historically underserved in terms of access to medical services and health care, and they might suggest guidelines to help reduce the disparity. Students might recommend that a certain percentage of each biomedical research project’s sample include patients of diverse racial and ethnic backgrounds.
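If students want to make a quota-style guideline concrete, a small script like the one below can compare a study’s enrollment against target percentages and flag under-represented groups. All counts, group labels, and thresholds here are hypothetical placeholders, not recommended values.

```python
# Hypothetical enrollment counts for one study and target shares from a guideline.
enrollment = {"White": 140, "Black": 25, "Hispanic": 20, "Asian": 10, "Other": 5}
target_share = {"White": 0.58, "Black": 0.14, "Hispanic": 0.19, "Asian": 0.06, "Other": 0.03}

total = sum(enrollment.values())
for group, count in enrollment.items():
    actual = count / total
    gap = actual - target_share[group]
    # Flag any group that falls more than 2 percentage points below its target.
    flag = "UNDER-REPRESENTED" if gap < -0.02 else "ok"
    print(f"{group:9s} actual {actual:5.1%}  target {target_share[group]:5.1%}  {flag}")
```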

5. What biases would your suggested policy help eliminate? How would it accomplish that goal?

Students should describe how their proposed policy would address a discrepancy in the application of biomedical research to the entire human population. Race can be considered a biological variable, like sex, and race has been connected to higher or lower incidence of certain characteristics or medical conditions, such as blood types or diabetes, which sometimes affect how the body responds to infectious agents, drugs, procedures or other therapies. By ensuring that people from diverse racial and ethnic groups are included in biomedical research studies, scientists and medical professionals can provide better medical care to members of those populations.

Class discussion about bias guidelines

Allow each group time to present its proposed bias-reducing guideline to another group and to receive feedback. Then provide groups with time to revise their guidelines, if necessary. Act as a facilitator while students conduct the class discussion. Use this time to assess individual and group progress. Students should demonstrate an understanding of different biases that may affect patient outcomes in biomedical research studies and in practical medical settings. As part of the group discussion, have students answer the following questions.

1. Why is it important to identify and eliminate biases in research and engineering design?

The goal of most scientific research and engineering projects is to improve quality of life and deepen our understanding of the world we live in. By eliminating biases, we can better serve the entire human population and the planet.

2. Were there any guidelines that were suggested by multiple groups? How do those actions or policies help reduce bias?

Answers will depend on the guidelines developed and recommended by other groups. Groups could suggest policies related to race, gender identity, wealth/economic status and age. Each group should clearly identify how its guidelines are designed to reduce bias and improve the quality of human life.

3. Which guidelines developed by your classmates do you think would most reduce the effects of bias on research results or engineering designs? Support your selection with evidence and scientific reasoning.

Answers will depend on the guidelines developed and recommended by other groups. Students should agree that guidelines that minimize inequities and improve health care outcomes for a larger group are preferred. Guidelines addressing inequities of race and wealth/economic status are likely to expand access to improved medical care for the largest percentage of the population. People who grow up in less economically advantaged settings have specific health issues related to nutrition and their access to clean water, for instance. Ensuring that people from the lowest economic brackets are represented in biomedical research improves their access to medical care and can dramatically change the length and quality of their lives.

Possible extension

Challenge students to honestly evaluate any biases they may have. Encourage them to take an Implicit Association Test (IAT) to identify any implicit biases they may not recognize. Harvard University has an online IAT platform where students can participate in different assessments to identify preferences and biases related to sex and gender, race, religion, age, weight and other factors. You may want to challenge students to take a test before they begin the activity, and then assign students to take a test after completing the activity to see if their preferences have changed. Students can report their results to the class if they want to discuss how awareness affects the expression of bias.

Additional resources

If you want additional resources for the discussion or to provide resources for student groups, check out the links below.

Additional Science News articles:

Even brain images can be biased

Data-driven crime prediction fails to erase human bias

What we can learn from how a doctor’s race can affect Black newborns’ survival

Bias in a common health care algorithm disproportionately hurts black patients

Female rats face sex bias too

There’s no evidence that a single ‘gay gene’ exists

Positive attitudes about aging may pay off in better health

What male bias in the mammoth fossil record says about the animal’s social groups

The man flu struggle might be real, says one researcher

Scientists may work to prevent bias, but they don’t always say so

The Bias Finders

Showdown at Sex Gap

University resources:

Project Implicit (Take an Implicit Association Test)

Catalogue of Bias

Understanding Health Research


Turning a light on our implicit biases

By Brett Milano, Harvard Correspondent

Social psychologist details research at University-wide faculty seminar

Few people would readily admit that they’re biased when it comes to race, gender, age, class, or nationality. But virtually all of us have such biases, even if we aren’t consciously aware of them, according to Mahzarin Banaji, Cabot Professor of Social Ethics in the Department of Psychology, who studies implicit biases. The trick is figuring out what they are so that we can interfere with their influence on our behavior.

Banaji was the featured speaker at an online seminar Tuesday, “Blindspot: Hidden Biases of Good People,” which was also the title of Banaji’s 2013 book, written with Anthony Greenwald. The presentation was part of Harvard’s first-ever University-wide faculty seminar.

“Precipitated in part by the national reckoning over race, in the wake of George Floyd, Breonna Taylor and others, the phrase ‘implicit bias’ has almost become a household word,” said moderator Judith Singer, Harvard’s senior vice provost for faculty development and diversity. Owing to the high interest on campus, Banaji was slated to present her talk on three different occasions, with the final one at 9 a.m. Thursday.

Banaji opened on Tuesday by recounting the “implicit association” experiments she had done at Yale and at Harvard. The assumptions underlying the research on implicit bias derive from well-established theories of learning and memory and the empirical results are derived from tasks that have their roots in experimental psychology and neuroscience. Banaji’s first experiments found, not surprisingly, that New Englanders associated good things with the Red Sox and bad things with the Yankees.

She then went further by replacing the sports teams with gay and straight, thin and fat, and Black and white. The responses were sometimes surprising: shown a group of white and Asian faces, a test group at Yale associated the former more strongly with American symbols, even though all the images were of U.S. citizens. In a further study, the faces of American-born celebrities of Asian descent were judged less American than those of white celebrities who were in fact European. “This shows how discrepant our implicit bias is from even factual information,” she said.

Banaji asked how an institution that is almost 400 years old could not reveal a history of biases, citing President Charles Eliot’s words on Dexter Gate, “Depart to serve better thy country and thy kind,” and asking the audience to think about what he may have meant by the last two words.

She cited Harvard’s current admissions strategy of seeking geographic and economic diversity as an example of clear progress — if, as she said, “we are truly interested in bringing the best to Harvard.” She added, “We take these actions consciously, not because they are easy but because they are in our interest and in the interest of society.”

Moving beyond racial issues, Banaji suggested that we sometimes see only what we believe we should see. To illustrate she showed a video clip of a basketball game and asked the audience to count the number of passes between players. Then the psychologist pointed out that something else had occurred in the video — a woman with an umbrella had walked through — but most watchers failed to register it. “You watch the video with a set of expectations, one of which is that a woman with an umbrella will not walk through a basketball game. When the data contradicts an expectation, the data doesn’t always win.”

Expectations, based on experience, may create associations, such as equating “Valley Girl uptalk” with “not too bright.” But when a quirky way of speaking spreads to a large number of young people from certain generations, it stops being a useful guide. And yet, Banaji said, she has caught herself dismissing a great idea because it was presented in uptalk. She stressed that the appropriate course of action is not to ask the person to change the way she speaks, but rather for her and other decision makers to recognize that using language and accents to judge ideas is something people do at their own peril.

Banaji closed the talk with a personal story that showed how subtler biases work: She’d once turned down an interview because she had issues with the magazine for which the journalist worked.

The writer accepted this and mentioned she’d been at Yale when Banaji taught there. The professor then surprised herself by agreeing to the interview based on this fragment of shared history that ought not to have influenced her. She urged her colleagues to think about how even positive actions, such as helping, can perpetuate the status quo.

“You and I don’t discriminate the way our ancestors did,” she said. “We don’t go around hurting people who are not members of our own group. We do it in a very civilized way: We discriminate by who we help. The question we should be asking is, ‘Where is my help landing? Is it landing on the most deserved, or just on the one I shared a ZIP code with for four years?’”

To subscribe to short educational modules that help to combat implicit biases, visit outsmartinghumanminds.org.


What Is Meant by ‘Bias’ in Psychological Science?

Craig L. Frisby, in C. L. Frisby, R. E. Redding, W. T. O’Donohue, & S. O. Lilienfeld (Eds.), Ideological and Political Bias in Psychology (Springer, 2023). https://doi.org/10.1007/978-3-031-29148-7_2

Frisby begins by characterizing science as a social construct guided by standardized values, procedures, and the need to be conducted in an honest manner. This characterization is exemplified by the Mertonian norms of universalism, disinterestedness, communality, and organized skepticism. The construct of bias, as it has been applied to a wide variety of academic disciplines and applied professions, is defined generally as a near-universal human phenomenon that easily influences psychological science. Various types of bias studied by psychologists are discussed briefly, with the conclusion that bias essentially represents an undermining of truth that has both internal and external sources, misrepresents and/or promulgates falsehoods, is a poor model for students in training, and represents hypocrisy among psychologists who champion unbiased research methods.


TeachEpi

The B-Files (Bias Case Studies)

Case studies of bias in real-life epidemiologic studies, by Madhukar Pai & Jay S. Kaufman

Can the Bias in Algorithms Help Us See Our Own?


New research by Questrom’s Carey Morewedge shows that people recognize more of their biases in algorithms’ decisions than they do in their own—even when those decisions are the same

By Molly Callahan

Algorithms were supposed to make our lives easier and fairer: help us find the best job applicants, help judges impartially assess the risks of bail and bond decisions, and ensure that healthcare is delivered to the patients with the greatest need. By now, though, we know that algorithms can be just as biased as the human decision-makers they inform and replace. 

What if that weren’t a bad thing? 

New research by Carey Morewedge, a Boston University Questrom School of Business professor of marketing and Everett W. Lord Distinguished Faculty Scholar, found that people recognize more of their biases in algorithms’ decisions than they do in their own—even when those decisions are the same. The research, published in the Proceedings of the National Academy of Sciences, suggests ways that awareness might help human decision-makers recognize and correct for their biases.


“A social problem is that algorithms learn and, at scale, roll out biases in the human decisions on which they were trained,” says Morewedge, who also chairs Questrom’s marketing department. For example: In 2015, Amazon tested (and soon scrapped) an algorithm to help its hiring managers filter through job applicants. They found that the program boosted résumés it perceived to come from male applicants, and downgraded those from female applicants, a clear case of gender bias.

But that same year, just 39 percent of Amazon’s workforce were women. If the algorithm had been trained on Amazon’s existing hiring data, it’s no wonder it prioritized male applicants—Amazon already was. If its algorithm had a gender bias, “it’s because Amazon’s managers were biased in their hiring decisions,” Morewedge says.

“Algorithms can codify and amplify human bias, but algorithms also reveal structural biases in our society,” he says. “Many biases cannot be observed at an individual level. It’s hard to prove bias, for instance, in a single hiring decision. But when we add up decisions within and across persons, as we do when building algorithms, it can reveal structural biases in our systems and organizations.”
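Morewedge’s point that bias is hard to see in a single decision but visible in the aggregate can be illustrated with a toy simulation. The sketch below invents 10,000 hiring decisions with a small hidden preference for one group; no single decision looks obviously biased, but the group-level selection rates reveal the disparity. The groups, rates, and code are illustrative only and are not part of the study.

```python
import random

random.seed(42)

# Simulate 10,000 hiring decisions in which group "A" candidates are
# slightly favored. Any one decision is noisy and hard to call biased.
decisions = []
for _ in range(10_000):
    group = random.choice(["A", "B"])
    base_rate = 0.30 if group == "A" else 0.25  # hidden structural bias
    hired = random.random() < base_rate
    decisions.append((group, hired))

# Aggregating across decisions makes the disparity visible.
for g in ("A", "B"):
    outcomes = [hired for group, hired in decisions if group == g]
    print(f"Group {g}: selection rate {sum(outcomes) / len(outcomes):.3f} "
          f"over {len(outcomes)} decisions")
```

This is the same logic by which an algorithm trained on many such decisions would absorb, and expose, the underlying pattern.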

Morewedge and his collaborators—Begüm Çeliktutan and Romain Cadario, both at Erasmus University in the Netherlands—devised a series of experiments designed to tease out people’s social biases (including racism, sexism, and ageism). The team then compared research participants’ recognition of how those biases colored their own decisions versus decisions made by an algorithm. In the experiments, participants sometimes saw the decisions of real algorithms. But there was a catch: other times, the decisions attributed to algorithms were actually the participants’ choices, in disguise. 


Across the board, participants were more likely to see bias in the decisions they thought came from algorithms than in their own decisions. Participants also saw as much bias in the decisions of algorithms as they did in the decisions of other people. (People generally better recognize bias in others than in themselves, a phenomenon called the bias blind spot.) Participants were also more likely to correct for bias in those decisions after the fact, a crucial step for minimizing bias in the future. 

Algorithms Remove the Bias Blind Spot

The researchers ran sets of participants, more than 6,000 in total, through nine experiments. In the first, participants rated a set of Airbnb listings, which included a few pieces of information about each listing: its average star rating (on a scale of 1 to 5) and the host’s name. The researchers assigned these fictional listings to hosts with names that were “distinctively African American or white,” based on previous research identifying racial bias, according to the paper. The participants rated how likely they were to rent each listing.

In the second half of the experiment, participants were told about a research finding that explained how the host’s race might bias the ratings. Then, the researchers showed participants a set of ratings and asked them to assess (on a scale of 1 to 7) how likely it was that bias had influenced the ratings. 

Participants saw either their own rating reflected back to them, their own rating under the guise of an algorithm’s, their own rating under the guise of someone else’s, or an actual algorithm rating based on their preferences. 

The researchers repeated this setup several times, testing for race, gender, age, and attractiveness bias in the profiles of Lyft drivers and Airbnb hosts. Each time, the results were consistent. Participants who thought they saw an algorithm’s ratings or someone else’s ratings (whether or not they actually were) were more likely to perceive bias in the results. 
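A minimal way to quantify the kind of bias these experiments probe is to compare mean ratings across the name groups. The sketch below uses invented ratings loosely modeled on the Airbnb-style setup described above; the researchers’ actual analyses are more sophisticated, so treat this only as an illustration of the basic comparison.

```python
import statistics

# Invented ratings (1-5) for listings whose hosts' names signal different races,
# loosely mirroring the structure of the experiments described above.
ratings = {
    "white-sounding name": [4.6, 4.4, 4.7, 4.5, 4.3, 4.6],
    "Black-sounding name": [4.1, 4.3, 4.0, 4.2, 4.4, 3.9],
}

means = {group: statistics.mean(vals) for group, vals in ratings.items()}
for group, mean in means.items():
    print(f"{group}: mean rating {mean:.2f}")

# A simple bias measure: the gap between group means.
gap = means["white-sounding name"] - means["Black-sounding name"]
print(f"Rating gap attributable to host name: {gap:.2f}")
```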

Morewedge attributes this to the different evidence we use to assess bias in others and bias in ourselves. Since we have insight into our own thought process, he says, we’re more likely to trace back through our thinking and decide that it wasn’t biased, perhaps driven by some other factor that went into our decisions. When analyzing the decisions of other people, however, all we have to judge is the outcome.

“Let’s say you’re organizing a panel of speakers for an event,” Morewedge says. “If all those speakers are men, you might say that the outcome wasn’t the result of gender bias because you weren’t even thinking about gender when you invited these speakers. But if you were attending this event and saw a panel of all-male speakers, you’re more likely to conclude that there was gender bias in the selection.” 

Indeed, in one of their experiments, the researchers found that participants who were more prone to this bias blind spot were also more likely to see bias in decisions attributed to algorithms or others than in their own decisions. In another experiment, they discovered that people more easily saw their own decisions influenced by factors that were fairly neutral or reasonable, such as an Airbnb host’s star rating, compared to a prejudicial bias, such as race—perhaps because admitting to preferring a five-star rental isn’t as threatening to one’s sense of self or how others might view us, Morewedge suggests. 

Algorithms as Mirrors: Seeing and Correcting Human Bias

In the researchers’ final experiment, they gave participants a chance to correct bias in either their ratings or the ratings of an algorithm (real or not). People were more likely to correct the algorithm’s decisions, which reduced the actual bias in its ratings. 

This is the crucial step for Morewedge and his colleagues, he says. For anyone motivated to reduce bias, being able to see it is the first step. Their research presents evidence that algorithms can be used as mirrors—a way to identify bias even when people can’t see it in themselves. 

“Right now, I think the literature on algorithmic bias is bleak,” Morewedge says. “A lot of it says that we need to develop statistical methods to reduce prejudice in algorithms. But part of the problem is that prejudice comes from people. We should work to make algorithms better, but we should also work to make ourselves less biased.

“What’s exciting about this work is that it shows that algorithms can codify or amplify human bias, but algorithms can also be tools to help people better see their own biases and correct them,” he says. “Algorithms are a double-edged sword. They can be a tool that amplifies our worst tendencies. And algorithms can be a tool that can help better ourselves.”

Financial support for this research came from Questrom’s Digital Business Institute and the Erasmus Research Institute of Management.


Journal reference:

Begum Celiktutan, Romain Cadario, Carey K. Morewedge. People see more of their biases in algorithms. Proceedings of the National Academy of Sciences, 2024; 121 (16). DOI: 10.1073/pnas.2317602121


share this!

April 9, 2024

This article has been reviewed according to Science X's editorial process and policies . Editors have highlighted the following attributes while ensuring the content's credibility:

fact-checked

peer-reviewed publication

trusted source

Can the bias in algorithms help us see our own?

by Molly Callahan, Boston University

AI algorithm

Algorithms were supposed to make our lives easier and fairer: help us find the best job applicants, help judges impartially assess the risks of bail and bond decisions, and ensure that health care is delivered to the patients with the greatest need. By now, though, we know that algorithms can be just as biased as the human decision-makers they inform and replace.

What if that weren't a bad thing?

New research by Carey Morewedge, a Boston University Questrom School of Business professor of marketing and Everett W. Lord Distinguished Faculty Scholar, found that people recognize more of their biases in algorithms' decisions than they do in their own—even when those decisions are the same. The research, published in the Proceedings of the National Academy of Sciences , suggests ways that awareness might help human decision-makers recognize and correct for their biases.

"A social problem is that algorithms learn and, at scale, roll out biases in the human decisions on which they were trained," says Morewedge, who also chairs Questrom's marketing department. For example: In 2015, Amazon tested (and soon scrapped ) an algorithm to help its hiring managers filter through job applicants. They found that the program boosted résumés it perceived to come from male applicants, and downgraded those from female applicants, a clear case of gender bias.

But that same year, just 39 percent of Amazon's workforce were women. If the algorithm had been trained on Amazon's existing hiring data, it's no wonder it prioritized male applicants—Amazon already was. If its algorithm had a gender bias, "it's because Amazon's managers were biased in their hiring decisions," Morewedge says.

"Algorithms can codify and amplify human bias, but algorithms also reveal structural biases in our society," he says. "Many biases cannot be observed at an individual level. It's hard to prove bias, for instance, in a single hiring decision. But when we add up decisions within and across persons, as we do when building algorithms, it can reveal structural biases in our systems and organizations."

Morewedge and his collaborators—Begüm Çeliktutan and Romain Cadario, both at Erasmus University in the Netherlands—devised a series of experiments designed to tease out people's social biases (including racism, sexism, and ageism).

The team then compared research participants' recognition of how those biases colored their own decisions versus decisions made by an algorithm. In the experiments, participants sometimes saw the decisions of real algorithms. But there was a catch: other times, the decisions attributed to algorithms were actually the participants' choices, in disguise.

Across the board, participants were more likely to see bias in the decisions they thought came from algorithms than in their own decisions. Participants also saw as much bias in the decisions of algorithms as they did in the decisions of other people. (People generally better recognize bias in others than in themselves, a phenomenon called the bias blind spot.) Participants were also more likely to correct for bias in those decisions after the fact, a crucial step for minimizing bias in the future.

Algorithms remove the bias blind spot

The researchers ran sets of participants, more than 6,000 in total, through nine experiments. In the first, participants rated a set of Airbnb listings, which included a few pieces of information about each listing: its average star rating (on a scale of 1 to 5) and the host's name. The researchers assigned these fictional listings to hosts with names that were "distinctively African American or white," based on previous research identifying racial bias, according to the paper. The participants rated how likely they were to rent each listing.

In the second half of the experiment, participants were told about a research finding that explained how the host's race might bias the ratings. Then, the researchers showed participants a set of ratings and asked them to assess (on a scale of 1 to 7) how likely it was that bias had influenced the ratings.

Participants saw either their own rating reflected back to them, their own rating under the guise of an algorithm's, their own rating under the guise of someone else's, or an actual algorithm rating based on their preferences.

The researchers repeated this setup several times, testing for race, gender, age, and attractiveness bias in the profiles of Lyft drivers and Airbnb hosts. Each time, the results were consistent. Participants who thought they saw an algorithm's ratings or someone else's ratings (whether or not they actually were) were more likely to perceive bias in the results.

Morewedge attributes this to the different evidence we use to assess bias in others and bias in ourselves. Since we have insight into our own thought process, he says, we're more likely to trace back through our thinking and conclude that it wasn't bias, but some other factor, that went into our decisions. When analyzing the decisions of other people, however, all we have to judge is the outcome.

"Let's say you're organizing a panel of speakers for an event," Morewedge says. "If all those speakers are men, you might say that the outcome wasn't the result of gender bias because you weren't even thinking about gender when you invited these speakers. But if you were attending this event and saw a panel of all-male speakers, you're more likely to conclude that there was gender bias in the selection."

Indeed, in one of their experiments, the researchers found that participants who were more prone to this bias blind spot were also more likely to see bias in decisions attributed to algorithms or others than in their own decisions. In another experiment, they discovered that people more easily saw their own decisions influenced by factors that were fairly neutral or reasonable, such as an Airbnb host's star rating, compared to a prejudicial bias, such as race—perhaps because admitting to preferring a five-star rental isn't as threatening to one's sense of self or how others might view us, Morewedge suggests.

Algorithms as mirrors: Seeing and correcting human bias

In the researchers' final experiment, they gave participants a chance to correct bias in either their ratings or the ratings of an algorithm (real or not). People were more likely to correct the algorithm's decisions, which reduced the actual bias in its ratings.

This is the crucial step for Morewedge and his colleagues, he says. For anyone motivated to reduce bias, being able to see it is the first step. Their research presents evidence that algorithms can be used as mirrors—a way to identify bias even when people can't see it in themselves.

"Right now, I think the literature on algorithmic bias is bleak," Morewedge says. "A lot of it says that we need to develop statistical methods to reduce prejudice in algorithms. But part of the problem is that prejudice comes from people. We should work to make algorithms better, but we should also work to make ourselves less biased.

"What's exciting about this work is that it shows that algorithms can codify or amplify human bias, but algorithms can also be tools to help people better see their own biases and correct them," he says. "Algorithms are a double-edged sword. They can be a tool that amplifies our worst tendencies. And algorithms can be a tool that can help better ourselves."

Journal information: Proceedings of the National Academy of Sciences

Provided by Boston University

  • Open access
  • Published: 12 April 2024

Risk of conversion to mild cognitive impairment or dementia among subjects with amyloid and tau pathology: a systematic review and meta-analysis

  • Zsolt Huszár 1,2,
  • Marie Anne Engh 1,
  • Márk Pavlekovics 1,3,
  • Tomoya Sato 1,
  • Yalea Steenkamp 1,
  • Bernard Hanseeuw 4,5,
  • Tamás Terebessy 1,
  • Zsolt Molnár 1,6,7,
  • Péter Hegyi 1,8,9,10 &
  • Gábor Csukly 1,2

Alzheimer's Research & Therapy, volume 16, Article number: 81 (2024)

Background

Measurement of beta-amyloid (Aβ) and phosphorylated tau (p-tau) levels offers the potential for early detection of neurocognitive impairment. Still, the probability of developing a clinical syndrome in the presence of these protein changes (A+ and T+) remains unclear. By performing a systematic review and meta-analysis, we investigated the risk of mild cognitive impairment (MCI) or dementia in the non-demented population with A+ and A- alone and in combination with T+ and T- as confirmed by PET or cerebrospinal fluid examination.

Methods

A systematic search of prospective and retrospective studies investigating the association of Aβ and p-tau with cognitive decline was performed in three databases (MEDLINE via PubMed, EMBASE, and CENTRAL) on January 9, 2024. The risk of bias was assessed using the Cochrane QUIPS tool. Odds ratios (OR) and hazard ratios (HR) were pooled using a random-effects model. The effect of neurodegeneration was not studied due to its non-specific nature.

Results

A total of 18,162 records were found, and at the end of the selection process, data from 36 cohorts were pooled (n = 7,793). Compared to the unexposed group, the odds ratio (OR) for conversion to dementia in A+ MCI patients was 5.18 [95% CI 3.93; 6.81]. In A+ cognitively unimpaired (CU) subjects, the OR for conversion to MCI or dementia was 5.79 [95% CI 2.88; 11.64]. Cerebrospinal fluid Aβ42 or Aβ42/40 analysis and amyloid PET imaging showed consistent results. The OR for conversion in A+T+ MCI subjects (11.60 [95% CI 7.96; 16.91]) was significantly higher than in A+T- subjects (2.73 [95% CI 1.65; 4.52]). The OR for A-T+ MCI subjects was non-significant (1.47 [95% CI 0.55; 3.92]). CU subjects with A+T+ status had a significantly higher OR for conversion (13.46 [95% CI 3.69; 49.11]) than A+T- subjects (2.04 [95% CI 0.70; 5.97]). Meta-regression showed that the ORs for Aβ exposure decreased with age in MCI (beta = -0.04 [95% CI -0.03 to -0.083]).

Conclusions

Identifying Aβ-positive individuals, irrespective of the measurement technique employed (CSF or PET), enables the detection of the most at-risk population before disease onset, or at least at a mild stage. The inclusion of tau status in addition to Aβ, especially in A+T+ cases, further refines the risk assessment. Notably, the higher odds ratio associated with Aβ decreases with age.

Trial registration

The study was registered in PROSPERO (ID: CRD42021288100).

Background

Affecting 55 million people worldwide, dementia is one of the leading causes of years lived with disability and one of the costliest long-term illnesses for society. The most common cause of dementia is Alzheimer's disease (AD), responsible for 60-80% of cases [ 1 , 2 ].

Two specific protein aggregates play a crucial role in the pathophysiology of AD. One is amyloid plaque formation in the extracellular space, caused predominantly by Aβ aggregation; among other pathological effects, these plaques inhibit the signaling function of neurons [ 3 ]. The other is the appearance of neurofibrillary tangles within neurons, which are formed by the phosphorylation of tau proteins (p-tau) and inhibit axonal transport inside the cell [ 4 ]. Whereas the specific pathology could only be confirmed by autopsy in the past, in vivo tests are available today. In parallel with this development, the diagnostic definitions of AD have evolved significantly over time, moving from purely clinical assessments and post-mortem examinations to the integration of in vivo amyloid and later p-tau biomarkers, emphasizing the role of preclinical stages [ 5 , 6 , 7 , 8 ]. Accordingly, researchers are increasingly trying to link the diagnosis of the disease to biological parameters. In general clinical practice, however, an AD diagnosis is still established from the nature of the dementia symptoms and from neurodegeneration confirmed by imaging.

The International Working Group (IWG) [ 5 ] emphasizes that diagnosis should align with clinical symptoms. However, for researchers in the field, the U.S. National Institute on Aging – Alzheimer’s Association (NIA-AA) has issued a new framework recommendation [ 6 ]. This recommendation defines AD purely in terms of specific biological changes based on the Aβ (A) and p-tau (T) protein status, while neurodegeneration (N) is considered a non-specific marker that can be used for staging. In the recommendation, the category ‘Alzheimer’s disease continuum’ is proposed for all A+ cases, ‘Alzheimer’s pathological changes’ for A+T- cases, and ‘Alzheimer’s disease’ for A+T+ cases. A-(TN)+ cases are classified as ‘non-Alzheimer pathological changes’.

Aβ and p-tau proteins have long been known to be associated with the development of AD, and their accumulation can begin up to 15-20 years before the onset of cognitive symptoms [ 9 ]. Pathological amyloid changes are highly prevalent in dementia: according to a meta-analysis, 88% of those clinically diagnosed with AD, and between 12 and 51% of those with non-AD dementia, are A+ [ 10 ]. At the same time, the specificity of abnormal beta-amyloid levels for AD, and their central role in its pathomechanism, have been questioned [ 11 ], and their use as a preventive screening target is a subject of ongoing discourse [ 12 ]. It also remains unclear to what extent these protein changes accelerate cognitive decline. What are the predictive prospects for an individual with abnormal protein levels who is otherwise cognitively healthy or has only mild cognitive impairment (MCI), i.e., a detectable decline in cognitive ability with preserved independence in most activities of daily living [ 13 ]? Research on non-demented populations shows substantial variation; for example, reported OR values for conversion to dementia range from 2.25 [95% CI 0.71; 7.09] [ 14 ] to 137.5 [95% CI 17.8; 1059.6] [ 15 ]. Comparing conversion data systematically is necessary to provide a clearer picture.

In the cognitively unimpaired (CU) population over 50 years of age, the prevalence of A+ ranges from 10 to 44%, while in MCI it ranges from 27 to 71%, depending on age [ 16 ]. Taking this into consideration, we aim to investigate the effect of Aβ alone and in combination with p-tau on conversion to MCI and dementia through a systematic review and meta-analysis of the available literature. Knowing the prognostic effect can highlight the clinical potential of the current research framework, given that present therapies for MCI or dementia can only slow down the decline. Prevention starting at an early stage, or even before symptoms appear, provides the best chance against the disease.

Methods

Study registration

Our study was registered in the PROSPERO database (ID: CRD42021288100) with a pre-defined research plan and detailed objectives. It is reported strictly in accordance with the PRISMA 2020 guideline and was performed following the guidance of the Cochrane Handbook [ 17 ].

We aimed to determine the change in odds of progression to MCI or dementia among non-demented subjects based on abnormal Aβ levels alone, or in combination with abnormal p-tau levels.

Search and selection

We included longitudinal prospective and retrospective studies that used the NIA-AA 2018 recommended measurements of Aβ and p-tau (for Aβ: amyloid PET, CSF Aβ42, or the Aβ42/40 ratio; for p-tau: tau PET or CSF p-tau) and investigated the role of Aβ, alone or in combination with p-tau, in the progression of CU and MCI subjects to MCI or dementia. Case reports and case series were excluded. Overlapping populations were taken into account during data extraction. Our search key was run in the MEDLINE, EMBASE, and CENTRAL databases on 31 October 2021, and the search was updated on 9 January 2024 (see Supplementary Material, Appendix 1 ). After removing duplicates, we screened publications by title and abstract, and in a second round by full text. Two independent reviewers conducted the selection (ZH, MP), and a third reviewer (GC) resolved disagreements. The degree of agreement was quantified with Cohen's kappa statistic at each selection stage.
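To make the agreement statistic concrete, the sketch below computes Cohen's kappa for two reviewers' include/exclude decisions from the standard formula (observed versus chance-expected agreement). It is a minimal Python illustration, not the software actually used for the selection, and the reviewer labels are hypothetical.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for agreement between two reviewers' screening decisions."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                     for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical include/exclude decisions for five records (illustration only).
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "include", "include", "exclude"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # prints 0.62
```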

As part of the selection process, articles that only examined the ADNI database [ 18 ] were excluded, as patient-level data were used instead (see Supplementary Material Appendix 2 for details of the patient-level data analysis of the ADNI).

A standardized Excel (Microsoft Corporation, Redmond, Washington, USA) sheet was used for data extraction (for one special case of data extraction, see Supplementary Material Appendix 3 ). Where data were available in graphical form only, we used an online tool (Plot Digitizer) [ 19 , 20 ]. The following data were extracted: the source of the data used in each study (site of the clinical trial or name of the database), baseline characteristics of the population (age, gender, APOE status, and education level), the type of exposure (Aβ, p-tau, and neurodegeneration), the measurement technique of the exposure, and data on cognitive impairment, separately for the different exposure groups.

Data synthesis

Generally, where several studies used the same population sample or cohort, only data from the study with the largest sample size were used. Conversion to Alzheimer’s dementia and to unspecified dementia was assessed together, as the definition of Alzheimer’s dementia varied between the studies, and the diagnosis was based on neurocognitive tests. If conversion to both types of dementia was given, the value of the conversion to unspecified dementia was used. The population with subjective cognitive symptoms was scored jointly with the CU population, as these subpopulations could not be differentiated objectively.

Odds ratio and hazard ratio values were used or calculated based on the available information (for details on the methodology, see Supplementary Material Appendix 4 ). Considering that studies report their results on different age groups, a meta-regression analysis was performed to investigate how age affects the likelihood of developing dementia based on Aβ levels.
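As an illustration of how an odds ratio and its confidence interval can be derived from reported conversion counts, the following Python sketch applies the standard 2x2-table formula with a Wald-type interval on the log scale. It is a simplified stand-in for the calculations detailed in Supplementary Material Appendix 4, and the counts shown are hypothetical.

```python
import math

def odds_ratio_ci(conv_exposed, n_exposed, conv_unexposed, n_unexposed):
    """Odds ratio with a Wald 95% CI (computed on the log scale) from a 2x2 conversion table."""
    a = conv_exposed                      # exposed, converted
    b = n_exposed - conv_exposed          # exposed, not converted
    c = conv_unexposed                    # unexposed, converted
    d = n_unexposed - conv_unexposed      # unexposed, not converted
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), (math.exp(log_or - 1.96 * se),
                              math.exp(log_or + 1.96 * se))

# Hypothetical counts for illustration only.
or_value, ci = odds_ratio_ci(conv_exposed=30, n_exposed=100,
                             conv_unexposed=10, n_unexposed=120)
print(f"OR = {or_value:.2f}, 95% CI [{ci[0]:.2f}; {ci[1]:.2f}]")
```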

Studies applied different analysis methods to identify Aβ positivity. Where a study reported amyloid status by more than one method, amyloid PET was preferred. When relying on CSF analysis, the Aβ42/40 ratio was given precedence over Aβ42, since the 42/40 ratio has a higher concordance with amyloid PET [ 21 ]. To estimate the confounding effect caused by different amyloid measurement techniques, a subgroup analysis was performed. For the assessment of p-tau, studies measured p-tau181 levels from CSF samples or employed tau PET. While a limited number of tau PET measurements are also available in the ADNI, to ensure consistency in the analyses we used exclusively the CSF p-tau181 levels from the ADNI database.

For the OR analysis, studies with varying follow-up times were pooled. To estimate the resulting bias, a meta-regression analysis was performed to explore how follow-up time affected the results.
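A meta-regression of study-level log-ORs on a covariate such as mean age or follow-up time can be sketched as a weighted least-squares fit, with weights combining the within-study variance and the between-study variance tau². The Python sketch below (using statsmodels) illustrates the idea only; it does not reproduce the mixed-effects meta-regression with the Knapp-Hartung adjustment implemented in the R 'meta' package, and all input values are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

def meta_regression(log_or, se, covariate, tau2):
    """Weighted least-squares regression of study log-ORs on a study-level covariate,
    with weights 1 / (SE^2 + tau^2). A simplified sketch, not the Knapp-Hartung-adjusted
    routine of the R 'meta' package."""
    y = np.asarray(log_or, dtype=float)
    X = sm.add_constant(np.asarray(covariate, dtype=float))
    weights = 1.0 / (np.asarray(se, dtype=float) ** 2 + tau2)
    fit = sm.WLS(y, X, weights=weights).fit()
    return fit.params[1], fit.bse[1]      # slope (beta) for the covariate and its SE

# Hypothetical study-level inputs for illustration only.
beta, beta_se = meta_regression(
    log_or=[1.8, 1.6, 1.4, 1.2],          # study log-ORs
    se=[0.30, 0.25, 0.35, 0.28],          # their standard errors
    covariate=[68, 71, 74, 78],           # mean age of each study
    tau2=0.05,                            # between-study variance from the pooling step
)
print(f"beta = {beta:.3f} (SE {beta_se:.3f})")
```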

Statistical analysis

Statistical analyses were performed in the R programming environment (version 4.1.2) using the “meta” software package version 5.2-0. To visualize synthesized data, we used forest plots showing ORs or HRs and corresponding confidence intervals for each individual study and pooled effect sizes in terms of ORs and HRs. For dichotomous outcomes, odds ratios and hazard ratios with 95% confidence intervals (CI) were used as effect measures. To calculate odds ratios, the total number of patients in each study and the number of patients with the event of interest in each group were extracted from each study. Raw data from the selected studies were pooled using a random-effects model with the Mantel-Haenszel method [ 22 , 23 , 24 ]. The random-effects model was used as we assumed that the true effect would vary between studies due to differences in demographics and clinical measures, such as age or baseline cognitive impairment.
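For intuition, the sketch below pools study log-ORs with a DerSimonian-Laird random-effects model using inverse-variance weights. This is a simplified stand-in for the Mantel-Haenszel random-effects routine of the R 'meta' package (version 5.2-0) used in the paper, so pooled values and weights would differ slightly in practice; the input estimates are hypothetical.

```python
import numpy as np

def pool_random_effects(log_or, se):
    """DerSimonian-Laird random-effects pooling of study log-ORs.
    Inverse-variance sketch, not the Mantel-Haenszel routine of the R 'meta' package."""
    y = np.asarray(log_or, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    w = 1.0 / v                                   # fixed-effect (inverse-variance) weights
    k = len(y)
    fixed = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - fixed) ** 2)              # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)            # between-study variance (tau^2)
    w_star = 1.0 / (v + tau2)                     # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se_pooled = np.sqrt(1.0 / np.sum(w_star))
    i2 = max(0.0, (Q - (k - 1)) / Q) * 100 if Q > 0 else 0.0
    ci = (np.exp(pooled - 1.96 * se_pooled), np.exp(pooled + 1.96 * se_pooled))
    return np.exp(pooled), ci, tau2, i2

# Hypothetical study log-ORs and standard errors for illustration only.
print(pool_random_effects(log_or=[1.7, 1.5, 1.9, 1.3], se=[0.35, 0.30, 0.45, 0.25]))
```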

Heterogeneity was assessed by calculating I², tau², and the prediction interval. I² is defined as the percentage of variability in the effect sizes that is not caused by sampling error, whereas tau² is the variance of the true effect sizes (and tau their standard deviation). As I² is heavily dependent on the precision of the studies and tau² is sometimes hard to interpret (it is insensitive to the number of studies and their precision), the prediction interval was also calculated. Its great advantage is that it is easy to interpret: if the interval does not include the null effect (an OR of 1), further studies are expected to show an effect in the same direction.
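One common formulation of the 95% prediction interval (following Higgins and colleagues) places it at the pooled effect plus or minus t(k-2) times the square root of tau² plus the squared standard error of the pooled effect. The Python sketch below implements that formula on the log-OR scale; it is an illustration under this assumption, not necessarily the exact variant computed by the 'meta' package, and the example inputs are hypothetical.

```python
import math
from scipy import stats

def prediction_interval(pooled_log_or, se_pooled, tau2, k):
    """Approximate 95% prediction interval on the OR scale:
    pooled +/- t(k-2) * sqrt(tau^2 + SE^2)."""
    t_crit = stats.t.ppf(0.975, df=k - 2)
    half_width = t_crit * math.sqrt(tau2 + se_pooled ** 2)
    return (math.exp(pooled_log_or - half_width),
            math.exp(pooled_log_or + half_width))

# Hypothetical pooled values for illustration only.
print(prediction_interval(pooled_log_or=1.64, se_pooled=0.14, tau2=0.10, k=22))
```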

Sensitivity analysis

We performed outlier detection according to Viechtbauer et al. [ 25 ]. A study is considered an outlier if its confidence interval does not overlap with the confidence interval of the pooled effect; the idea is to detect effect sizes that differ markedly from the overall effect. As a sensitivity analysis, we repeated the analyses after removing any outliers and compared the pooled effects before and after the exclusion, in order to detect whether outliers had a substantial impact on the overall effect.
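The screening rule above translates directly into code: a study is flagged when its confidence interval does not overlap the confidence interval of the pooled effect. The Python sketch below does exactly that; the study names and intervals are hypothetical, while the pooled interval shown is the one reported for the Aβ-positive MCI analysis.

```python
def flag_outliers(study_cis, pooled_ci):
    """Flag studies whose 95% CI does not overlap the pooled effect's 95% CI."""
    pooled_lo, pooled_hi = pooled_ci
    return [label for label, (lo, hi) in study_cis.items()
            if hi < pooled_lo or lo > pooled_hi]

# Hypothetical study ORs (95% CIs) for illustration only.
studies = {"Study A": (3.1, 7.4), "Study B": (1.2, 2.1), "Study C": (9.5, 40.0)}
print(flag_outliers(studies, pooled_ci=(3.93, 6.81)))   # -> ['Study B', 'Study C']
```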

Risk of bias assessment

The risk of bias was assessed according to the recommendation of the Cochrane Collaboration, using the QUIPS tool [ 26 ]: two investigators (ZH and YS) independently assessed the quality of the studies, and a third author resolved disagreements. Publication bias was examined using Peters' regression test [ 27 ] and visual inspection of the adjusted funnel plots.
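A funnel plot of this kind can be drawn by plotting each study's effect estimate against its standard error, with pseudo 95% confidence limits converging toward the pooled effect. The matplotlib sketch below is a generic version for log-ORs; it is not the adjusted funnel plot produced by the authors' R workflow, Peters' regression test itself is not reproduced here, and the plotted estimates are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def funnel_plot(log_or, se, pooled_log_or):
    """Minimal funnel plot: study log-ORs against their standard errors,
    with pseudo 95% confidence limits around the pooled effect."""
    se = np.asarray(se, dtype=float)
    se_grid = np.linspace(1e-3, se.max() * 1.1, 100)
    plt.scatter(log_or, se)
    plt.plot(pooled_log_or - 1.96 * se_grid, se_grid, "k--")
    plt.plot(pooled_log_or + 1.96 * se_grid, se_grid, "k--")
    plt.axvline(pooled_log_or, color="k", linewidth=0.8)
    plt.gca().invert_yaxis()               # precise (small-SE) studies plotted at the top
    plt.xlabel("log odds ratio")
    plt.ylabel("standard error")
    plt.show()

# Hypothetical study estimates for illustration only.
funnel_plot(log_or=[1.7, 1.5, 1.9, 1.3, 2.4],
            se=[0.35, 0.30, 0.45, 0.25, 0.60],
            pooled_log_or=1.64)
```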

Results

Search results

During the systematic search (Fig. 1 ), 18,162 records were found, and finally 46 eligible articles were obtained (Supplementary Material eTable 1 ). While some of the articles analyzed the same cohorts, we were able to pool data from 36 different cohorts or centres. Cohen's kappa was 0.91 for the title and abstract selection and 0.86 for the full-text selection. Given the amount of data found, we decided to examine the targeted outcomes separately and to focus only on the conversion data in this report.

Figure 1. PRISMA flowchart of the study selection process, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement.

The investigated studies expressed their results in different ways: they calculated unadjusted or adjusted hazard ratios or presented the number of conversions for the different follow-up periods. In the latter case, we calculated odds ratios for the defined time periods. The measured exposures also differed: data were given for Aβ alone or in combination with p-tau or neurodegeneration. There were also differences in the techniques used to measure exposure, with CSF samples used in some cases and PET scans in others.

During data extraction, one [ 28 ] article was excluded because of inconsistently reported conversion data, and four [ 15 , 29 , 30 , 31 ] were excluded from the A/T analysis because the definition of the pathologic Aβ and p-tau was based on Aβ/p-tau ratio, which did not comply with the NIA-AA 2018 recommendation.

The eligible studies investigated three groups: CU, MCI, and mixed - in which the results were collectively expressed for both the MCI and CU groups. The CU group comprised either cognitively healthy subjects or individuals with only subjective cognitive complaints. To define the MCI group, all studies followed the Petersen criteria [ 32 ]. Four studies examined mixed groups. Since all of them studied large samples ( n >180), it was considered more valuable to jointly analyze them with MCI, since the outcome was also the conversion to dementia. As a result of the joint analysis, our findings are based on a substantially larger sample. To support this decision, we performed a subgroup analysis comparing the Aβ positive MCI and mixed population studies. The OR differed significantly from the unexposed group in both the MCI (OR 5.83 [3.80; 8.93]) and the mixed (4.64 [95% CI 1.16; 18.61]) subgroups, and there was no significant difference between the two subgroups ( p =0.55) (Supplementary Material eFigure 1 ).

Conversion from MCI to dementia

The effect of Aβ exposure in terms of OR

Based on a mixed-model meta-analysis of 3,576 subjects (Table 1 ), we observed a significant association between Aβ positivity and higher conversion rates. Compared to the unexposed, the OR for conversion to dementia in the amyloid-positive group was 5.18 [95% CI 3.93; 6.81] (t(21) = 12.47; p < 0.0001). The I² test for heterogeneity revealed that 44.8% of the variance across studies was due to heterogeneity (Fig. 2 A). As a result of the outlier detection, we excluded the Balasa study and found a very similar overall effect with reduced heterogeneity (5.05 [95% CI 3.98; 6.40]; t(20) = 14.2; p < 0.0001; I² = 31.4%). Meta-regression analysis of mean age showed a statistically significant decrease in OR values with increasing age (R² = 59.05%, beta = -0.04, SE = 0.019, [95% CI -0.03 to -0.083], df = 18, t = -2.27, p = 0.036) (Fig. 2 B). The Hartung-Knapp method was applied to adjust test statistics and confidence intervals to reduce the risk of false positives.

Figure 2. Conversion of Aβ-exposed MCI groups to dementia, in OR. (A) ORs for Aβ exposure; (B) meta-regression of mean age against ORs for conversion regarding Aβ exposure. Squares and bars represent the mean effect sizes and their 95% CIs, with the square area reflecting study weight; diamonds represent the combined effects, and the vertical dotted line marks no association. In panel B, circle size is proportional to each study's weight, and the slope of the regression line (beta) represents the change in OR by mean age.

Beta-amyloid was determined by CSF Aβ42, the CSF Aβ42/40 ratio, or amyloid PET. When the three groups were compared in a subgroup analysis, the OR was 5.87 (2.83; 12.19) for CSF Aβ42, 5.00 (3.31; 7.55) for the CSF Aβ42/40 ratio, and 5.32 (2.53; 11.18) for amyloid PET. The difference between the subgroups was not significant (p = 0.88) (Supplementary Material eFigure 2 ).

The meta-regression analysis performed to examine the role of follow-up time showed no association with the ORs (R² = 0%, beta = -0.002, SE = 0.07, [95% CI -0.02 to 0.01], df = 11, p = 0.77) (Supplementary Material eFigure 3 A).

We used a funnel plot to examine publication bias (Supplementary Material eFigure 4 A). Most of the studies with large sample sizes lie close to the midline, which suggests that the pooled effect size is valid. However, visual inspection of the plot raised the possibility of some publication bias in two ways: (1) studies in the bottom right corner of the plot have significant results despite having large standard errors; and (2) the absence of studies in the bottom left corner (blank area in the figure) may indicate that studies with nonsignificant results were not published. To quantify funnel plot asymmetry, Peters' regression test was applied. The test result was not significant (t = 1.7, df = 20, p = 0.11), so no asymmetry was demonstrated in the funnel plot.

The effect of Aβ exposure in terms of HR

Several studies reported their results as HRs instead of, or in addition to, ORs (Supplementary Material eTable 2 ). The advantage of the HR is that it is independent of the length of follow-up of the studies. For these reasons, we also considered it important to analyze the results expressed as HRs. Based on pooled data from the patients studied (n = 1,888), the HR for conversion to dementia was 3.16 [95% CI 2.07; 4.83], p < 0.001 (Fig. 3 A).

Figure 3. Conversion of Aβ-exposed MCI groups to dementia, in HR. (A) HRs for Aβ exposure; (B) subgroup analysis of studies reporting adjusted versus unadjusted HR values. Squares and bars represent the mean effect sizes and their 95% CIs, with the square area reflecting study weight; diamonds represent the combined effects, and the vertical dotted line marks no association.

To investigate the effect of adjustment, we conducted a subgroup analysis of the unadjusted and adjusted measurements. Although there was a trend toward higher unadjusted HR values compared to the adjusted HRs, the difference did not reach statistical significance (unadjusted HR: 5.07 [95% CI 2.77; 9.26]; adjusted HR: 2.86 [95% CI 1.70; 4.83]; p = 0.055) (Fig. 3 B). We could not analyze HRs in the A+T-, A+T+, and A-T+ subgroups due to the low number of available studies.

The effect of Aβ and p-tau exposure in terms of OR

We examined the combined effect of p-tau and Aβ (Table 2 ) and compared A+T+, A+T-, and A-T+ exposures to A-T-. Based on pooled data for the patients studied (n = 1,327), the OR for conversion to dementia in A+T- was 2.73 [95% CI 1.65; 4.52], and the odds ratio was significantly higher in the presence of both exposures (A+T+) (p < 0.001), with an OR of 11.60 [95% CI 7.96; 16.91]. The effect of A-T+ exposure on conversion was not significant (OR 1.47 [0.55; 3.92]) (Fig. 4 A).

Figure 4. Conversion of Aβ- and p-tau-exposed MCI groups to dementia, in OR. (A) ORs for the Aβ and p-tau exposure groups; (B) subgroup comparison of the A+T+ and A+T- groups; (C) subgroup comparison of the A+T- and A-T+ groups. Squares and bars represent the mean effect sizes and their 95% CIs, with the square area reflecting study weight; diamonds represent the combined effects, and the vertical dotted line marks no association.

Subgroup analyses showed that the A+T+ group had significantly higher odds of conversion than the A+T- group (p < 0.001), while the A+T- and A-T+ groups did not differ significantly (p = 0.15) (Fig. 4 B and C).

Conversion from CU to MCI or dementia

The effect of Aβ exposure in terms of OR

Analyses of the CU population (n = 4,217) yielded very similar results to the MCI sample. The OR for conversion to MCI or dementia was 5.79 [95% CI 2.88; 11.64] (t(13) = 5.43; p = 0.0001), although the results of the studies showed a high degree of heterogeneity (I² = 73% [55%; 84%]) (Table 3 , Fig. 5 A). As a result of the outlier detection, we removed the Arruda study and found a very similar overall effect (6.33 [95% CI 3.42; 11.71]; t(12) = 6.54; p < 0.0001; I² = 72.1%).

Figure 5. Conversion of Aβ- and p-tau-exposed CU groups to MCI or dementia, in OR. (A) ORs for Aβ exposure; (B) ORs for the Aβ and p-tau exposure groups. Squares and bars represent the mean effect sizes and their 95% CIs, with the square area reflecting study weight; diamonds represent the combined effects, and the vertical dotted line marks no association.

Meta-regression analysis of mean age did not show a significant association with the ORs (R² = 8.22%, beta = -0.05, SE = 0.05, [95% CI -0.17 to 0.7], df = 11, p = 0.37).

Meta-regression analysis also showed no association between follow-up time and the ORs (R² = 0.35%, beta = -0.014, SE = 0.024, [95% CI -0.07 to 0.04], df = 8, p = 0.58) (Supplementary Material eFigure 3 B).

We applied a funnel plot to examine publication bias (Supplementary Material eFigure 4 B). Most of the studies with large sample sizes lie close to the midline, which reaffirms the validity of the pooled effect size. To quantify funnel plot asymmetry, Peters' regression test was applied. The test result was not significant (t = 0.9, df = 12, p = 0.31), indicating that no asymmetry was demonstrated in the funnel plot.

Four cohorts provided HRs for the CU population (n = 2,700), with one cohort (ADNI) representing 55.3% of the total sample (weight: 78.5%) (Supplementary Material eTable 3 ). The pooled HR for conversion was 2.33 [95% CI 1.88; 2.88] (p = 0.001) (Supplementary Material eFigure 5 ).

The combined effect of Aβ and p-tau exposure in terms of OR

Using data from a total of 2,228 subjects, we investigated the effect of p-tau in combination with Aβ (Table 4 ) in the CU population. Compared to the A-T- group, the OR for conversion was 2.04 [95% CI 0.70; 5.97] for A+T- and 13.46 [95% CI 3.69; 49.11] for A+T+. The OR shows a trend-level increase in risk for the A+T- group compared to the A-T- group (t = 2.1, p = 0.12).

Similarly to the MCI population, subgroup analyses showed that the A+T+ group had a significantly higher OR for conversion than the A+T- group (p < 0.01). The analysis could not be performed for A-T+ due to the low number of such cases.

Risk of bias assessment

The risk of bias was assessed separately for the analyses discussed above. The overall risk of bias ranged from low to moderate, except in three cases: twice we found a high risk of bias due to attrition above 50% [ 59 , 60 ], and once due to a focus on monozygotic twins [ 61 ] (Supplementary Material, eFigure 6 ). These articles (n = 197) were excluded from all analyses.

Discussion

Summary and context

A pathological Aβ state is strongly correlated with the risk of clinical progression: the odds ratio for conversion is 5.18 in the MCI population and 5.79 in the CU population. Therefore, measuring Aβ levels alone can identify a population at high risk. The OR for conversion to dementia differs significantly between the A+T+ and A+T- groups in both the MCI and CU populations: while the OR is 2.73 [95% CI 1.65; 4.52] for MCI and 2.04 [95% CI 0.70; 5.97] for CU subjects in the A+T- group, it increases to 11.60 [95% CI 7.96; 16.91] for MCI and 13.46 [95% CI 3.69; 49.11] for CU in the A+T+ group. Note that in the A+T- CU group, only a trend-level statistical association was observed.

The results of the meta-regression show a decrease in OR with mean age (Fig. 2 B), suggesting that the impact of amyloid positivity on conversion decreases with age. A possible explanation is that age itself is a risk factor for dementia and that vascular and other neurodegenerative damage become more frequent in older age. Our findings, combined with the results of Rodrigue et al. [ 62 ], suggest that amyloid burden increases with age while its impact on conversion rates slightly decreases.

The appearance of Aβ is assumed to be one of the earliest signs of AD [ 63 , 64 ]. Our results fit this picture: only the A+T+ and A+T- groups showed an increased risk of conversion compared to A-T-; the A-T+ group did not. Thus, Aβ alone is suitable for detecting the population at risk, while p-tau alone is not as effective in predicting conversion. Our result is in line with previous studies showing that the A-T+ group has a weaker association with cognitive decline than the A+T- or A+T+ groups [ 65 , 66 ]. However, it is important to emphasize previous findings that T+ status is closely associated with neurodegeneration and that the A-T+ group is related to frontotemporal dementia [ 67 ]. More research is needed to fully explain the significance of the A-T+ group.

PET is known to be a more sensitive tool for detecting amyloid positivity than CSF sampling [ 68 ]. However, from a prognostic point of view, our results did not show a significant difference (p = 0.73) between PET measurements (OR: 6.02) and the more cost-effective but invasive CSF Aβ42 measurements (OR: 5.11). It is important to note that the present meta-analysis is underpowered for detecting prognostic differences between these methods. Due to the heterogeneity among studies and the impact of confounding factors, standardised studies are required to evaluate the comparative prognostic value of these biomarkers accurately.

Our results based on ORs are further strengthened by the HR analyses, which gave similar results for Aβ exposure in the MCI (HR: 3.16) and CU (HR: 2.33) populations. It should be noted that in the HR analysis of the CU group, ADNI accounts for 78.5% of the weight, which is a limitation of this meta-analysis; this disproportionate representation may affect the overall result. Regarding the trend-level difference toward higher unadjusted HRs, it should be noted that when other risk factors (e.g., baseline MMSE score or educational level) are randomly distributed, the unadjusted value may overestimate the HR, just as, when their distribution is non-random, the adjusted value may underestimate it. With this in mind, we recommend reporting both values in the future.

Our analyses were performed on CU and MCI populations. Including mixed populations with the MCI population was a practical simplification, as several studies with a large number of cases gave their results combining MCI subjects with CU subjects, and we aimed to answer the set of questions based on the largest population. To investigate the potential bias of this method, we performed subgroup analysis comparing the mixed and MCI populations, and the result was not significant. The Aβ OR based on the mixed-only group is 4.64 [95% CI 1.16; 18.61], and the OR calculated on the MCI-only studies is 5.83 [95% CI 3.80; 8.93]. Thus, the inclusion of the mixed population in the pool decreases the OR of the main analysis (5.21 [95% CI 3.93; 6.90]) slightly (Supplementary Material eFigure 1 ).

Strengths and limitations

There are several limitations to consider when interpreting our results. The study populations differ in several respects; in terms of cognitive status, they range from subjects with no cognitive symptoms through those with subjective cognitive symptoms (these two groups were considered CU) to MCI groups. Therefore, the distance from the cognitive state corresponding to MCI or dementia also varies. Due to the different cut-offs used in the studies, subjects with grey-area scores may oscillate between the A- and A+ groups, increasing heterogeneity. Our study could not examine the role of other risk factors such as education, cardiovascular status, obesity, diabetes, depression, social and physical activity [ 69 ], or genetic status [ 70 , 71 ], which may also contribute to heterogeneity. Furthermore, there is considerable heterogeneity by mean age, and our meta-regression analysis of the MCI group showed a significant decreasing effect of mean age on the ORs.

In the OR analysis of Aβ in the CU group, the outlier value of the Arruda study may represent a statistical extreme caused by the small number of A+ subjects relative to the much larger A- group. Similarly, in the Grøntvedt [ 14 ] and Hanseeuw [ 41 ] studies, which show exceptionally high values, the A+ and A- groups show a similarly uneven distribution, and the outliers in the MCI amyloid OR analysis are also associated with small sample sizes. For the Aβ HR analysis in the CU group, the interpretability of the result is strongly influenced by one specific cohort (ADNI), which accounts for 78% of the overall weight. In the A+T+/A+T-/A-T+ analyses, no outliers were found in either the MCI or the CU group.

Furthermore, we note that although the Aβ OR analyses could be confirmed by also calculating the HRs, the inability to analyze the effect of p-tau on HR due to the low number of studies limits the completeness of the A/T analysis.

We pooled studies reporting AD-type dementia conversion and studies reporting conversion to unspecified dementia. This simplification was necessary because different studies defined Alzheimer’s dementia differently, generally considering the amnestic clinical symptoms rather than biomarkers.

The fact that the studies used different neuropsychological tests to define MCI may contribute to the heterogeneity of the pooled sample. Another contributing factor would be heterogeneity in the definition of MCI; however, among the studies in our pool, only one, by Riemenschneider et al. [ 48 ] (sample size = 28), precedes the 2003 'Key Symposium' [ 72 ] that transformed the MCI concept. All other studies were published subsequent to it. While MCI subgroups were defined after the 2003 Symposium, the definition of MCI itself (objective cognitive impairment, essentially preserved general cognitive functioning, preserved independence in functional abilities) did not change afterwards. Furthermore, most of the studies pooled in the analyses were published after 2010.

Another source of heterogeneity is the relatively small sample size of some studies, leading to a higher variability of results. However, we thought that including studies with lower sample sizes was also important to get a complete picture.

It is essential to discuss the differences in follow-up times between studies, which ranged from 20 months to more than 10 years. Follow-up times were given in different ways, either as a mean, a median, or up to a certain time point. While the odds of conversion naturally increase over time, our meta-regression analysis suggests that there is no significant difference in the odds ratios over (follow-up) time; the moderate heterogeneity of the studies also points in this direction. We also note that the hazard ratios, which are independent of follow-up time, showed results similar to the OR analyses. Finally, yet importantly, we would like to point out that pathological protein changes can begin up to 20 years before the appearance of symptoms [ 6 ]. Such an extended follow-up is very difficult to carry out; therefore, all studies were shorter than that.

The results for Aβ are based on 7,793 individuals, and the combined analyses of Aβ and p-tau are based on data from over 3,500 individuals. Studies using CSF sampling or amyloid/tau PET to detect Aβ and p-tau were pooled together despite using different kits and thresholds for positivity, contributing to the heterogeneity of the results. This variation is acknowledged in Tables 1, 2, 3 and 4, where the cut-off values are provided. Previous large population studies have indicated that amyloid and tau PET scans exhibit slightly higher sensitivity than CSF sampling techniques [ 73 , 74 , 68 ]; nonetheless, the concordance between these diagnostic methods remains substantial. Moreover, findings from prior research (Lee et al. [ 75 ], Toledo et al. [ 76 ], Palmqvist et al. [ 77 ]) demonstrating high concordance across different amyloid CSF and amyloid PET measurements suggest that the impact of methodological differences on heterogeneity may be limited. All techniques are recommended for measurement by the National Institute on Aging-Alzheimer's Association (NIA-AA) [ 6 ].

Future directions

Conversion to Alzheimer’s disease could not be analyzed specifically, as most of the articles examining conversion either did not define Alzheimer’s disease or the definition was based on neuropsychological testing but not on biomarkers (i.e., Aβ and p-tau status were assessed only at baseline). According to the NIA-AA guideline [ 6 ] and our results, we recommend biomarker-based studies to assess conversion rates to Alzheimer’s disease.

In view of Aβ and p-tau status, the population most at risk can be identified before the appearance of cognitive symptoms, or at least at a mild stage. While the significance of Aβ in conversion is clear, its ability to predict onset appears to decrease with age. Considering the current therapeutic limitations and the importance of early prevention, we believe that the initiation of non-pharmacological and pharmacological treatments should be guided by Aβ and p-tau status rather than by cognitive status.

Identifying the population most at risk also makes research more effective. The efficacy of different dementia prevention approaches can be assessed more accurately by knowing the Aβ and p-tau status of the patient: as the population targeted by the interventions becomes more homogeneous, effectiveness can be measured more precisely in the group most at risk of conversion.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

A-: Non-pathologic levels of beta-amyloid
A+: Pathologic levels of beta-amyloid
Aβ: Beta-amyloid
AD: Alzheimer's disease
ADNI: Alzheimer's Disease Neuroimaging Initiative
CI: Confidence interval
CU: Cognitively unimpaired
CSF: Cerebrospinal fluid
HR: Hazard ratio
MCI: Mild cognitive impairment
N-: Absence of neurodegeneration
N+: Presence of neurodegeneration
NIA-AA: National Institute on Aging-Alzheimer's Association
PET: Positron emission tomography
p-tau: Phosphorylated tau
T-: Non-pathologic levels of phosphorylated tau
T+: Pathologic levels of phosphorylated tau

References

Risk Reduction of Cognitive Decline and Dementia: WHO Guidelines. Geneva: World Health Organization; 2019. Available from: https://www.ncbi.nlm.nih.gov/books/NBK542796/.

Gauthier S, Rosa-Neto P, Morais JA, & Webster C. 2021. World Alzheimer Report 2021: Journey through the diagnosis of dementia. London: Alzheimer’s Disease International.

De Strooper B. The cellular phase of Alzheimer’s disease. Cell. 2016;164(4):603–15. https://doi.org/10.1016/j.cell.2015.12.056 .

Scheltens P, De Strooper B, Kivipelto M, et al. Alzheimer’s disease. The Lancet. 2021;397(10284):1577–90. https://doi.org/10.1016/s0140-6736(20)32205-4 .

Dubois B, Villain N, Frisoni GB, et al. Clinical diagnosis of Alzheimer’s disease: recommendations of the international working group. Lancet Neurol. 2021;20(6):484–96. https://doi.org/10.1016/s1474-4422(21)00066-1 .

Jack CR Jr, Bennett DA, Blennow K, et al. NIA-AA Research framework: toward a biological definition of alzheimer’s disease. Alzheimers Dement. 2018;14(4):535–62. https://doi.org/10.1016/j.jalz.2018.02.018 .

McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA work group under the auspices of department of health and human services task force on alzheimer’s disease. Neurology. 1984;34(7):939–44. https://doi.org/10.1212/wnl.34.7.939 .

McKhann GM, Knopman DS, Chertkow H, et al. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the national institute on aging-Alzheimer’s association workgroups on diagnostic guidelines for alzheimer’s disease. Alzheimers Dement. 2011;7(3):263–9. https://doi.org/10.1016/j.jalz.2011.03.005 .

Rowe CC, Ellis KA, Rimajova M, et al. Amyloid imaging results from the australian imaging, biomarkers and lifestyle (AIBL) study of aging. Neurobiol Aging. 2010;31(8):1275–83. https://doi.org/10.1016/j.neurobiolaging.2010.04.007 .

Ossenkoppele R, Jansen WJ, Rabinovici GD, et al. Prevalence of amyloid PET positivity in dementia syndromes. JAMA. 2015;313(19):1939. https://doi.org/10.1001/jama.2015.4669 .

Morris GP, Clark IA, Vissel B. Questions concerning the role of amyloid-β in the definition, aetiology and diagnosis of Alzheimer’s disease. Acta Neuropathol. 2018;136(5):663–89. https://doi.org/10.1007/s00401-018-1918-8 .

Van Der Flier WM, Scheltens P. The ATN framework—moving preclinical Alzheimer disease to clinical relevance. JAMA Neurology. 2022;79(10):968. https://doi.org/10.1001/jamaneurol.2022.2967 .

Petersen RC, Smith GE, Waring SC, Ivnik RJ, Tangalos EG, Kokmen E. Mild cognitive impairment: clinical characterization and outcome. Arch Neurol. 1999;56(3):303–8. https://doi.org/10.1001/archneur.56.3.303 .

Grøntvedt GR, Lauridsen C, Berge G, et al. The amyloid, tau, and neurodegeneration (A/T/N) classification applied to a clinical research cohort with long-term follow-up. J Alzheimers Dis. 2020;74(3):829–37. https://doi.org/10.3233/jad-191227 .

Balasa M, Sánchez-Valle R, Antonell A, et al. Usefulness of biomarkers in the diagnosis and prognosis of early-onset cognitive impairment. J Alzheimer’s Di. 2014;40(4):919–27. https://doi.org/10.3233/JAD-132195 .

Jansen WJ, Ossenkoppele R, Knol DL, et al. Prevalence of cerebral amyloid pathology in persons without dementia: a meta-analysis. Jama. 2015;313(19):1924–38. https://doi.org/10.1001/jama.2015.4668 .

Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021:n71. https://doi.org/10.1136/bmj.n71.

Weiner MW. Alzheimer’s disease neuroimaging initiative. Available from: https://adni.loni.usc.edu/ .

Aydin O, Yassikaya MY. Validity and reliability analysis of the plotdigitizer software program for data extraction from single-case graphs. Perspect Behav Sci. 2022;45(1):239–57. https://doi.org/10.1007/s40614-021-00284-0 .

Huwaldt JA, Steinhorst S. Plot Digitizer 2.6.9 [software]. 2020. http://plotdigitizer.sourceforge.net/.

Lewczuk P, Matzen A, Blennow K, et al. Cerebrospinal Fluid Aβ42/40 Corresponds better than Aβ42 to amyloid PET in Alzheimer’s disease. J Alzheimers Dis. 2017;55(2):813–22. https://doi.org/10.3233/jad-160722 .

Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22(4):719–48.

Robins J, Greenland S, Breslow NE. A general estimator for the variance of the Mantel-Haenszel odds ratio. Am J Epidemiol. 1986;124(5):719–23. https://doi.org/10.1093/oxfordjournals.aje.a114447 .

Thompson SG, Turner RM, Warn DE. Multilevel models for meta-analysis, and their application to absolute risk differences. Stat Methods Med Res. 2001;10(6):375–92. https://doi.org/10.1177/096228020101000602 .

Viechtbauer W, Cheung MW. Outlier and influence diagnostics for meta-analysis. Res Synth Methods. 2010;1(2):112–25. https://doi.org/10.1002/jrsm.11 .

Hayden JA, van der Windt DA, Cartwright JL, Côté P, Bombardier C. Assessing bias in studies of prognostic factors. Ann Intern Med. 2013;158(4):280–6. https://doi.org/10.7326/0003-4819-158-4-201302190-00009 .

Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Comparison of two methods to detect publication bias in meta-analysis. Jama. 2006;295(6):676–80. https://doi.org/10.1001/jama.295.6.676 .



Bias in research

By writing scientific articles we communicate science among colleagues and peers. In doing so, we have a responsibility to adhere to basic principles such as transparency and accuracy. Authors, journal editors and reviewers need to be concerned about the quality of the work submitted for publication and ensure that only studies designed, conducted and reported transparently, honestly and without any deviation from the truth are published. Any trend or deviation from the truth in data collection, analysis, interpretation or publication is called bias. Bias in research can occur either intentionally or unintentionally. Bias causes false conclusions and is potentially misleading. It is therefore immoral and unethical to conduct biased research. Every scientist should be aware of all potential sources of bias and take all possible steps to reduce or minimize the deviation from the truth. This article describes some basic issues related to bias in research.

Introduction

Scientific papers are tools for communicating science between colleagues and peers. Every research study needs to be designed, conducted and reported transparently, honestly and without any deviation from the truth. Research that does not comply with these basic principles is misleading. Such studies create distorted impressions and false conclusions and can therefore lead to wrong medical decisions, harm to patients and substantial financial losses. This article provides insight into recognizing sources of bias and avoiding bias in research.

Definition of bias

Bias is any trend or deviation from the truth in data collection, data analysis, interpretation or publication that can cause false conclusions. Bias can occur either intentionally or unintentionally (1). Intentionally introducing bias into research is immoral. Nevertheless, considering the possible consequences of biased research, it is almost equally irresponsible to conduct and publish biased research unintentionally.

It is worth pointing out that every study has its confounding variables and limitations. Confounding effects cannot be completely avoided. Every scientist should therefore be aware of all potential sources of bias and take all possible steps to reduce and minimize the deviation from the truth. If some deviation remains, authors should acknowledge it in their articles by declaring the known limitations of their work.

It is also the responsibility of editors and reviewers to detect any potential bias. If such bias exists, it is up to the editor to decide whether it has an important effect on the study conclusions. If it does, the article needs to be rejected for publication, because its conclusions are not valid.

Bias in data collection

A population consists of all individuals with a characteristic of interest. Since studying an entire population is often impossible due to limited time and money, we usually study a phenomenon of interest in a representative sample. By doing this, we hope that what we have learned from the sample can be generalized to the entire population (2). To be able to do so, the sample needs to be representative of the population. If it is not, conclusions will not be generalizable, i.e. the study will lack external validity.

Sampling is therefore a crucial step in every research study. While collecting data, there are numerous ways in which researchers can introduce bias into a study. If, for example, during patient recruitment some patients are more or less likely to enter the study than others, the sample will not be representative of the population in which the research is done. In that case, subjects who are less likely to enter the study will be under-represented, and those who are more likely to enter will be over-represented, relative to the general population to which the study conclusions are to be applied. This is what we call selection bias. To ensure that a sample is representative of a population, sampling should be random, i.e. every subject needs to have an equal probability of being included in the study. It should be noted that sampling bias can also occur if the sample is too small to represent the target population (3). The sketch below illustrates how non-random sampling distorts an estimate.
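As a minimal illustration (a sketch in Python with NumPy, using made-up numbers rather than data from any real study), the simulation below compares an estimate from a simple random sample with one from a sample that systematically under-represents part of the population:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population made of two subgroups with different mean values
# of some biomarker (arbitrary units; all numbers are made up for illustration).
group_a = rng.normal(loc=1.0, scale=0.5, size=80_000)   # 80% of the population
group_b = rng.normal(loc=3.0, scale=1.0, size=20_000)   # 20% of the population
population = np.concatenate([group_a, group_b])

true_mean = population.mean()

# Random sampling: every subject has the same probability of inclusion.
random_sample = rng.choice(population, size=500, replace=False)

# Biased sampling: subjects from group_b rarely enter the study,
# so they are heavily under-represented in the sample.
biased_sample = np.concatenate([
    rng.choice(group_a, size=480, replace=False),
    rng.choice(group_b, size=20, replace=False),
])

print(f"True population mean:   {true_mean:.2f}")
print(f"Random-sample estimate: {random_sample.mean():.2f}")  # close to the truth
print(f"Biased-sample estimate: {biased_sample.mean():.2f}")  # systematically too low
```

The random sample recovers the true mean to within sampling error, whereas the biased estimate stays too low no matter how large the sample is, because the error is systematic rather than random.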

For example, if the aim of a study is to assess the average hsCRP (high-sensitivity C-reactive protein) concentration in the healthy population in Croatia, the way to go would be to recruit healthy individuals from the general population during their regular annual health check-ups. A biased study, by contrast, would be one that recruits only volunteer blood donors, because blood donors are usually individuals who feel healthy and are not suffering from any condition or illness that might cause changes in hsCRP concentration. By recruiting only healthy blood donors we might conclude that hsCRP is much lower than it really is. This is a kind of sampling bias, which we call volunteer bias.

Another example of volunteer bias occurs when colleagues from a laboratory or clinical department are invited to participate in a study of some new marker for anemia. Such a study would very likely preferentially include participants who suspect they are anemic and are curious to find out from the new test. In this way, anemic individuals might be over-represented. The research would then be biased, and its conclusions could not be generalized to the rest of the population.

Generally speaking, whenever cross-sectional or case-control studies are done exclusively in hospital settings, there is a good chance that the study will be biased. This is called admission bias. The bias exists because the population studied does not reflect the general population.

Another example of sampling bias is so-called survivor bias, which usually occurs in cross-sectional studies. If a study aims to assess the association of altered KLK6 (human kallikrein-6) expression with the 10-year incidence of Alzheimer's disease, subjects who died before the study end point might be missing from the study.

Misclassification bias is a kind of sampling bias that occurs when a disease of interest is poorly defined, when there is no gold standard for its diagnosis, or when the disease is not easily detectable. Some subjects are then falsely classified as cases or controls when they should have been in the other group. Say a researcher wants to study the accuracy of a new test for the early detection of prostate cancer in asymptomatic men. In the absence of a reliable reference test for early prostate cancer, some early cancer cases may be misclassified as disease-free, causing under- or over-estimation of the accuracy of the new marker, as the sketch below illustrates.
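To see how such misclassification can distort accuracy estimates, here is a rough sketch (hypothetical prevalence, test performance and miss rate, chosen only for illustration) in which true cases missed by an imperfect reference standard end up in the control group and make the new marker look worse than it is:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200_000
prevalence = 0.10            # hypothetical disease prevalence
true_sens, true_spec = 0.85, 0.90   # assumed true performance of the new marker
reference_miss_rate = 0.30   # fraction of true cases the imperfect reference misses

disease = rng.random(n) < prevalence

# New marker result, generated from its assumed true sensitivity/specificity.
marker_pos = np.where(disease,
                      rng.random(n) < true_sens,
                      rng.random(n) < (1 - true_spec))

# Imperfect reference standard: a share of true cases is labelled disease-free.
labelled_case = disease & (rng.random(n) >= reference_miss_rate)
labelled_control = ~labelled_case

# Specificity of the marker against the truth vs. against the flawed labels.
true_specificity = (~marker_pos[~disease]).mean()
apparent_specificity = (~marker_pos[labelled_control]).mean()

print(f"Specificity against the truth:        {true_specificity:.3f}")
print(f"Apparent specificity (flawed labels): {apparent_specificity:.3f}")
# The missed cases sit in the 'control' group and often test positive,
# so the marker looks less specific than it really is.
```

Depending on which group the misclassified subjects fall into, and on whether the misclassification is related to the marker itself, the apparent accuracy can be pushed either below or above the true value.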

As a general rule, the research question needs to be considered with great care, and every effort should be made to ensure that the sample matches the target population as closely as possible.

Bias in data analysis

A researcher can introduce bias in data analysis by analyzing the data in a way that favors conclusions supporting the research hypothesis. Bias can be introduced during data analysis in various ways, such as by fabricating, abusing or manipulating the data. Some examples are:

  • reporting non-existent data from experiments that were never done (data fabrication);
  • eliminating data that do not support your hypothesis (outliers, or even whole subgroups);
  • using inappropriate statistical tests on your data;
  • performing multiple testing ("fishing for P") through pair-wise comparisons (4), testing multiple endpoints, and performing secondary or subgroup analyses that were not part of the original plan, in order "to find" a statistically significant difference regardless of the hypothesis.

For example, if the study aim is to show that one biomarker is associated with another in a group of patients, and this association is not significant in the total cohort, researchers may start "torturing the data" by dividing it into various subgroups until the association becomes statistically significant. If this sub-classification of the study population was not part of the original research hypothesis, such behavior is considered data manipulation and is neither acceptable nor ethical. Such studies quite often produce meaningless conclusions such as:

  • CRP was statistically significantly higher in a subgroup of women under 37 years with a cholesterol concentration > 6.2 mmol/L;
  • lactate concentration was negatively associated with albumin concentration in a subgroup of male patients with a body mass index in the lowest quartile and a total leukocyte count below 4.00 × 10^9/L.

Besides being biased, invalid and illogical, those conclusions are also useless, since they cannot be generalized to the entire population.

There is an often-quoted saying, attributed to Ronald Coase (but unpublished to the best of my knowledge): "If you torture the data long enough, it will confess to anything." In other words, there is a good chance that statistical significance will be reached simply by increasing the number of hypotheses tested. The question then is: is this significant difference real, or did it occur by pure chance?

Indeed, if 20 independent tests are performed on the same data set at α = 0.05, the probability of at least one Type 1 error is 1 − 0.95^20 ≈ 64%, so roughly one false positive is to be expected. Therefore, the number of hypotheses to be tested in a study needs to be determined in advance. If multiple hypotheses are tested, a correction for multiple testing should be applied, or the study should be declared exploratory. The sketch below makes this concrete.
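As a quick numerical check of this point (a sketch only; α = 0.05 and 20 tests are arbitrary illustrative choices), the following Python snippet computes the family-wise error rate analytically, confirms it by simulating p-values under the null hypothesis, and shows how a simple Bonferroni correction restores the intended error rate:

```python
import numpy as np

rng = np.random.default_rng(1)

alpha, n_tests, n_experiments = 0.05, 20, 10_000

# Analytical probability of at least one false positive across 20
# independent tests when every null hypothesis is true.
print(f"P(>=1 false positive) = {1 - (1 - alpha) ** n_tests:.2f}")   # ~0.64

# Simulation: p-values under the null are uniform on [0, 1]; count how often
# at least one of the 20 tests comes out "significant" per experiment.
p_values = rng.random((n_experiments, n_tests))
uncorrected = (p_values < alpha).any(axis=1).mean()
bonferroni = (p_values < alpha / n_tests).any(axis=1).mean()

print(f"Simulated family-wise error, uncorrected: {uncorrected:.2f}")  # ~0.64
print(f"Simulated family-wise error, Bonferroni:  {bonferroni:.2f}")   # ~0.05
```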

Bias in data interpretation

When interpreting results, one needs to make sure that the proper statistical tests were used, that the results are presented correctly, and that relationships are interpreted only when they are statistically significant (5). Otherwise, bias may be introduced into the research.

However, wishful thinking is not rare in scientific research. Some researchers believe so strongly in their original hypotheses that they neglect the actual findings and interpret them in favor of their beliefs. Examples are:

  • discussing observed differences and associations even if they are not statistically significant (often described as "borderline significance");
  • discussing differences which are statistically significant but are not clinically meaningful;
  • drawing conclusions about causality even if the study was not designed as an experiment;
  • drawing conclusions about values outside the range of observed data (extrapolation);
  • overgeneralizing the study conclusions to the entire general population, even if the study was confined to a population subset;
  • Type I errors (the expected effect is found significant when actually there is none) and Type II errors (the expected effect is not found significant when it is actually present) (6); a small simulation of these two error types is sketched below.

Even if this happens as an honest error or through negligence, it is still considered serious misconduct.
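As a minimal sketch of those two error types (assuming NumPy and SciPy are available; the effect size of 0.4 standard deviations and the group sizes are purely hypothetical), the simulation below estimates how often a two-sample t-test commits a Type I error when there is no true effect, and a Type II error when a real but modest effect is studied with too few subjects:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

alpha, n_sim = 0.05, 5_000

def rejection_rate(effect, n_per_group):
    """Fraction of simulated two-sample t-tests with p < alpha."""
    rejections = 0
    for _ in range(n_sim):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect, 1.0, n_per_group)
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / n_sim

# No true effect: every rejection is a Type I error (rate should be ~alpha).
print(f"Type I error rate (no effect):        {rejection_rate(0.0, 30):.3f}")

# Real but modest effect (0.4 SD) with a small sample: low power, so the
# complement of the rejection rate is the Type II error rate.
power_small_n = rejection_rate(0.4, 20)
print(f"Type II error rate (n=20 per group):  {1 - power_small_n:.3f}")

power_large_n = rejection_rate(0.4, 100)
print(f"Type II error rate (n=100 per group): {1 - power_large_n:.3f}")
```

Increasing the sample size reduces the Type II error rate (i.e., increases power), while the Type I error rate stays near the chosen α.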

Publication bias

Unfortunately, scientific journals are much more likely to accept for publication a study reporting positive findings than one with negative findings. Such behavior creates a false impression in the literature and may have long-term consequences for the entire scientific community. Moreover, if negative results were not so difficult to publish, other scientists would not unnecessarily waste their time and financial resources re-running the same experiments.

Journal editors bear the most responsibility for this phenomenon. Ideally, a study should have an equal opportunity to be published regardless of the nature of its findings, provided it is properly designed, based on valid scientific assumptions, with well-conducted experiments and adequate data analysis, presentation and conclusions. In reality, however, this is not the case. To enable the publication of studies reporting negative findings, several journals have been launched, such as the Journal of Pharmaceutical Negative Results, the Journal of Negative Results in Biomedicine and the Journal of Interesting Negative Results, among others. The aim of such journals is to counterbalance the ever-increasing pressure in the scientific literature to publish only positive results.

It is our policy at Biochemia Medica to give equal consideration to submitted articles, regardless of the nature of their findings.

One sort of publication bias is so-called funding bias, which occurs when a prevailing number of studies on the same scientific question are funded by the same company and support the interests of that sponsoring company. It is perfectly acceptable to receive funding from a company to perform research, as long as the study is run independently, is not influenced in any way by the sponsor, and the funding source is declared as a potential conflict of interest to journal editors, reviewers and readers.

It is the policy of our Journal to require such a declaration from authors during submission and to publish it in the final article (7). In this way, we believe, the scientific community is given an opportunity to judge whether any potential bias is present in the published work.

There are many potential sources of bias in research. Bias can cause distorted results and wrong conclusions. Biased studies can lead to unnecessary costs, wrong clinical practice and, eventually, harm to patients. It is therefore the responsibility of all stakeholders involved in scientific publishing to ensure that only valid and unbiased research, conducted in a highly professional and competent manner, is published (8).

Potential conflict of interest

None declared.

