
Validity – Types, Examples and Guide


Validity

Definition:

Validity refers to the extent to which a concept, measure, or study accurately represents the phenomenon it is intended to capture. It is a fundamental concept in research and assessment that concerns the soundness and appropriateness of the conclusions, inferences, or interpretations drawn from the data or evidence collected.

Research Validity

Research validity refers to the degree to which a study accurately measures or reflects what it claims to measure. In other words, research validity concerns whether the conclusions drawn from a study are based on accurate, reliable and relevant data.

Validity is a concept used in logic and research methodology to assess the strength of an argument or the quality of a research study. It refers to the extent to which a conclusion or result is supported by evidence and reasoning.

How to Ensure Validity in Research

Ensuring validity in research involves several steps and considerations throughout the research process. Here are some key strategies to help maintain research validity:

Clearly Define Research Objectives and Questions

Start by clearly defining your research objectives and formulating specific research questions. This helps focus your study and ensures that you are addressing relevant and meaningful research topics.

Use an Appropriate Research Design

Select a research design that aligns with your research objectives and questions. Different types of studies, such as experimental, observational, qualitative, or quantitative, have specific strengths and limitations. Choose the design that best suits your research goals.

Use Reliable and Valid Measurement Instruments

If you are measuring variables or constructs, ensure that the measurement instruments you use are reliable and valid. This involves using established and well-tested tools or developing your own instruments through rigorous validation processes.

Ensure a Representative Sample

When selecting participants or subjects for your study, aim for a sample that is representative of the population you want to generalize to. Consider factors such as age, gender, socioeconomic status, and other relevant demographics to ensure your findings can be generalized appropriately.

Address Potential Confounding Factors

Identify potential confounding variables or biases that could impact your results. Implement strategies such as randomization, matching, or statistical control to minimize the influence of confounding factors and increase internal validity.

Minimize Measurement and Response Biases

Be aware of measurement biases and response biases that can occur during data collection. Use standardized protocols, clear instructions, and trained data collectors to minimize these biases. Employ techniques like blinding or double-blinding in experimental studies to reduce bias.

Conduct Appropriate Statistical Analyses

Ensure that the statistical analyses you employ are appropriate for your research design and data type. Select statistical tests that are relevant to your research questions and use robust analytical techniques to draw accurate conclusions from your data.
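To make this concrete, here is a minimal sketch (not from the original article) of matching a statistical test to a design: an independent-samples comparison of two groups using SciPy. The group names and scores are hypothetical.

```python
# Hypothetical two-group comparison: an independent-samples t-test is one
# appropriate choice for comparing means between two unrelated groups.
from scipy import stats

treatment = [24, 27, 31, 29, 26, 30, 28]  # hypothetical outcome scores
control = [22, 25, 23, 26, 24, 21, 25]

# Welch's t-test (equal_var=False) does not assume equal group variances.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

A nonparametric alternative (e.g., the Mann-Whitney U test) would be the more defensible choice if the scores were ordinal or clearly non-normal.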

Consider External Validity

While it may not always be possible to achieve high external validity, be mindful of the generalizability of your findings. Clearly describe your sample and study context to help readers understand the scope and limitations of your research.

Peer Review and Replication

Submit your research for peer review by experts in your field. Peer review helps identify potential flaws, biases, or methodological issues that can impact validity. Additionally, encourage replication studies by other researchers to validate your findings and enhance the overall reliability of the research.

Transparent Reporting

Clearly and transparently report your research methods, procedures, data collection, and analysis techniques. Provide sufficient details for others to evaluate the validity of your study and replicate your work if needed.

Types of Validity

There are several types of validity that researchers consider when designing and evaluating studies. Here are some common types of validity:

Internal Validity

Internal validity relates to the degree to which a study accurately identifies causal relationships between variables. It addresses whether the observed effects can be attributed to the manipulated independent variable rather than confounding factors. Threats to internal validity include selection bias, history effects, maturation of participants, and instrumentation issues.

External Validity

External validity concerns the generalizability of research findings to the broader population or real-world settings. It assesses the extent to which the results can be applied to other individuals, contexts, or timeframes. Factors that can limit external validity include sample characteristics, research settings, and the specific conditions under which the study was conducted.

Construct Validity

Construct validity examines whether a study adequately measures the intended theoretical constructs or concepts. It focuses on the alignment between the operational definitions used in the study and the underlying theoretical constructs. Construct validity can be threatened by issues such as poor measurement tools, inadequate operational definitions, or a lack of clarity in the conceptual framework.

Content Validity

Content validity refers to the degree to which a measurement instrument or test adequately covers the entire range of the construct being measured. It assesses whether the items or questions included in the measurement tool represent the full scope of the construct. Content validity is often evaluated through expert judgment, reviewing the relevance and representativeness of the items.

Criterion Validity

Criterion validity determines the extent to which a measure or test is related to an external criterion or standard. It assesses whether the results obtained from a measurement instrument align with other established measures or outcomes. Criterion validity can be divided into two subtypes: concurrent validity, which examines the relationship between the measure and the criterion at the same time, and predictive validity, which investigates the measure’s ability to predict future outcomes.
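As a hedged illustration of this idea, the sketch below estimates criterion validity as the correlation between a new measure and an established criterion; all scores are hypothetical. Whether this counts as concurrent or predictive validity depends only on when the criterion is measured.

```python
# Hypothetical data: scores on a new instrument and on an established
# criterion measure for the same eight people.
from scipy import stats

new_test = [55, 62, 70, 48, 80, 66, 74, 59]
criterion = [50, 60, 72, 45, 85, 63, 78, 55]

# The correlation between measure and criterion is the validity coefficient.
r, p = stats.pearsonr(new_test, criterion)
print(f"criterion validity (Pearson r) = {r:.2f}, p = {p:.3f}")
```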

Face Validity

Face validity refers to the degree to which a measurement or test appears, on the surface, to measure what it intends to measure. It is a subjective assessment based on whether the items seem relevant and appropriate to the construct being measured. Face validity is often used as an initial evaluation before conducting more rigorous validity assessments.

Importance of Validity

Validity is crucial in research for several reasons:

  • Accurate Measurement: Validity ensures that the measurements or observations in a study accurately represent the intended constructs or variables. Without validity, researchers cannot be confident that their results truly reflect the phenomena they are studying. Validity allows researchers to draw accurate conclusions and make meaningful inferences based on their findings.
  • Credibility and Trustworthiness: Validity enhances the credibility and trustworthiness of research. When a study demonstrates high validity, it indicates that the researchers have taken appropriate measures to ensure the accuracy and integrity of their work. This strengthens the confidence of other researchers, peers, and the wider scientific community in the study’s results and conclusions.
  • Generalizability: Validity helps determine the extent to which research findings can be generalized beyond the specific sample and context of the study. By addressing external validity, researchers can assess whether their results can be applied to other populations, settings, or situations. This information is valuable for making informed decisions, implementing interventions, or developing policies based on research findings.
  • Sound Decision-Making: Validity supports informed decision-making in various fields, such as medicine, psychology, education, and social sciences. When validity is established, policymakers, practitioners, and professionals can rely on research findings to guide their actions and interventions. Validity ensures that decisions are based on accurate and trustworthy information, which can lead to better outcomes and more effective practices.
  • Avoiding Errors and Bias: Validity helps researchers identify and mitigate potential errors and biases in their studies. By addressing internal validity, researchers can minimize confounding factors and alternative explanations, ensuring that the observed effects are genuinely attributable to the manipulated variables. Validity assessments also highlight measurement errors or shortcomings, enabling researchers to improve their measurement tools and procedures.
  • Progress of Scientific Knowledge: Validity is essential for the advancement of scientific knowledge. Valid research contributes to the accumulation of reliable and valid evidence, which forms the foundation for building theories, developing models, and refining existing knowledge. Validity allows researchers to build upon previous findings, replicate studies, and establish a cumulative body of knowledge in various disciplines. Without validity, the scientific community would struggle to make meaningful progress and establish a solid understanding of the phenomena under investigation.
  • Ethical Considerations: Validity is closely linked to ethical considerations in research. Conducting valid research ensures that participants’ time, effort, and data are not wasted on flawed or invalid studies. It upholds the principle of respect for participants’ autonomy and promotes responsible research practices. Validity is also important when making claims or drawing conclusions that may have real-world implications, as misleading or invalid findings can have adverse effects on individuals, organizations, or society as a whole.

Examples of Validity

Here are some examples of validity in different contexts:

  • Logical validity (Example 1): All men are mortal. John is a man. Therefore, John is mortal. This argument is logically valid because the conclusion follows from the premises.
  • Logical validity (Example 2): If it is raining, then the ground is wet. The ground is wet. Therefore, it is raining. This argument is not logically valid because there could be other reasons for the ground being wet, such as watering the plants.
  • Construct validity (Example 1): In a study examining the relationship between caffeine consumption and alertness, the researchers use established measures of both variables, ensuring that they are accurately capturing the concepts they intend to measure.
  • Construct validity (Example 2): A researcher develops a new questionnaire to measure anxiety levels. They administer it to a group of participants and find that it correlates highly with other established anxiety measures, indicating good construct validity for the new questionnaire.
  • External validity (Example 1): A study on the effects of a particular teaching method is conducted in a controlled laboratory setting. The findings may lack external validity because the conditions in the lab may not accurately reflect real-world classroom settings.
  • External validity (Example 2): A research study on the effects of a new medication includes participants from diverse backgrounds and age groups, increasing the external validity of the findings to a broader population.
  • Internal validity (Example 1): In an experiment, a researcher manipulates the independent variable (e.g., a new drug) and controls for other variables to ensure that any observed effects on the dependent variable (e.g., symptom reduction) are indeed due to the manipulation. This establishes internal validity.
  • Internal validity (Example 2): A researcher examines the relationship between exercise and mood by administering questionnaires to participants. The study lacks internal validity because it does not control for other potential factors that could influence mood, such as diet or stress levels.
  • Face validity (Example 1): A teacher develops a new test to assess students' knowledge of a particular subject. The items on the test appear relevant to the topic and align with what one would expect to find on such a test, so the test appears to measure what it intends to measure.
  • Face validity (Example 2): A company develops a new customer satisfaction survey. The questions seem to address key aspects of the customer experience and capture the relevant information, so the survey appears appropriate for assessing customer satisfaction.
  • Content validity (Example 1): A team of experts reviews a comprehensive curriculum for a high school biology course, evaluating whether it covers all the essential topics and concepts necessary for a thorough understanding of biology. This demonstrates content validity, as the curriculum is representative of the domain it intends to cover.
  • Content validity (Example 2): A researcher develops a questionnaire to assess career satisfaction. Its questions encompass various dimensions of job satisfaction, such as salary, work-life balance, and career growth, so the questionnaire adequately represents the different aspects of the construct.
  • Criterion validity (Example 1): A company evaluates a new employee selection test by administering it to job applicants and later assessing the job performance of those hired. A strong correlation between test scores and subsequent job performance suggests criterion validity, indicating that the test is predictive of job success.
  • Criterion validity (Example 2): A researcher compares the results of a new medical diagnostic tool with the gold standard diagnostic method and finds a high level of agreement. This demonstrates criterion validity, indicating that the new tool accurately diagnoses the disease.

Where to Write About Validity in A Thesis

In a thesis, discussions related to validity are typically included in the methodology and results sections. Here are some specific places where you can address validity within your thesis:

Research Design and Methodology

In the methodology section, provide a clear and detailed description of the measures, instruments, or data collection methods used in your study. Discuss the steps taken to establish or assess the validity of these measures. Explain the rationale behind the selection of specific validity types relevant to your study, such as content validity, criterion validity, or construct validity. Discuss any modifications or adaptations made to existing measures and their potential impact on validity.

Measurement Procedures

In the methodology section, elaborate on the procedures implemented to ensure the validity of measurements. Describe how potential biases or confounding factors were addressed, controlled, or accounted for to enhance internal validity. Provide details on how you ensured that the measurement process accurately captures the intended constructs or variables of interest.

Data Collection

In the methodology section, discuss the steps taken to collect data and ensure data validity. Explain any measures implemented to minimize errors or biases during data collection, such as training of data collectors, standardized protocols, or quality control procedures. Address any potential limitations or threats to validity related to the data collection process.

Data Analysis and Results

In the results section, present the analysis and findings related to validity. Report any statistical tests, correlations, or other measures used to assess validity. Provide interpretations and explanations of the results obtained. Discuss the implications of the validity findings for the overall reliability and credibility of your study.

Limitations and Future Directions

In the discussion or conclusion section, reflect on the limitations of your study, including limitations related to validity. Acknowledge any potential threats or weaknesses to validity that you encountered during your research. Discuss how these limitations may have influenced the interpretation of your findings and suggest avenues for future research that could address these validity concerns.

Applications of Validity

Validity is applicable in various areas and contexts where research and measurement play a role. Here are some common applications of validity:

Psychological and Behavioral Research

Validity is crucial in psychology and behavioral research to ensure that measurement instruments accurately capture constructs such as personality traits, intelligence, attitudes, emotions, or psychological disorders. Validity assessments help researchers determine if their measures are truly measuring the intended psychological constructs and if the results can be generalized to broader populations or real-world settings.

Educational Assessment

Validity is essential in educational assessment to determine if tests, exams, or assessments accurately measure students’ knowledge, skills, or abilities. It ensures that the assessment aligns with the educational objectives and provides reliable information about student performance. Validity assessments help identify if the assessment is valid for all students, regardless of their demographic characteristics, language proficiency, or cultural background.

Program Evaluation

Validity plays a crucial role in program evaluation, where researchers assess the effectiveness and impact of interventions, policies, or programs. By establishing validity, evaluators can determine if the observed outcomes are genuinely attributable to the program being evaluated rather than extraneous factors. Validity assessments also help ensure that the evaluation findings are applicable to different populations, contexts, or timeframes.

Medical and Health Research

Validity is essential in medical and health research to ensure the accuracy and reliability of diagnostic tools, measurement instruments, and clinical assessments. Validity assessments help determine if a measurement accurately identifies the presence or absence of a medical condition, measures the effectiveness of a treatment, or predicts patient outcomes. Validity is crucial for establishing evidence-based medicine and informing medical decision-making.

Social Science Research

Validity is relevant in various social science disciplines, including sociology, anthropology, economics, and political science. Researchers use validity to ensure that their measures and methods accurately capture social phenomena, such as social attitudes, behaviors, social structures, or economic indicators. Validity assessments support the reliability and credibility of social science research findings.

Market Research and Surveys

Validity is important in market research and survey studies to ensure that the survey questions effectively measure consumer preferences, buying behaviors, or attitudes towards products or services. Validity assessments help researchers determine if the survey instrument is accurately capturing the desired information and if the results can be generalized to the target population.

Limitations of Validity

Here are some limitations of validity:

  • Construct Validity: Limitations of construct validity include the potential for measurement error, inadequate operational definitions of constructs, or the failure to capture all aspects of a complex construct.
  • Internal Validity: Limitations of internal validity may arise from confounding variables, selection bias, or the presence of extraneous factors that could influence the study outcomes, making it difficult to attribute causality accurately.
  • External Validity: Limitations of external validity can occur when the study sample does not represent the broader population, when the research setting differs significantly from real-world conditions, or when the study lacks ecological validity, i.e., the findings do not reflect real-world complexities.
  • Measurement Validity: Limitations of measurement validity can arise from measurement error, inadequately designed or flawed measurement scales, or limitations inherent in self-report measures, such as social desirability bias or recall bias.
  • Statistical Conclusion Validity: Limitations in statistical conclusion validity can occur due to sampling errors, inadequate sample sizes, or improper statistical analysis techniques, leading to incorrect conclusions or generalizations.
  • Temporal Validity: Limitations of temporal validity arise when the study results become outdated due to changes in the studied phenomena, interventions, or contextual factors.
  • Researcher Bias: Researcher bias can affect the validity of a study. Biases can emerge through the researcher’s subjective interpretation, influence of personal beliefs, or preconceived notions, leading to unintentional distortion of findings or failure to consider alternative explanations.
  • Ethical Validity: Limitations can arise if the study design or methods involve ethical concerns, such as the use of deceptive practices, inadequate informed consent, or potential harm to participants.

Also see: Reliability Vs Validity

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer



Internal Validity vs. External Validity in Research

Both help determine how meaningful the results of the study are

Arlin Cuncic, MA, is the author of The Anxiety Workbook and founder of the website About Social Anxiety. She has a Master's degree in clinical psychology.


Rachel Goldman, PhD FTOS, is a licensed psychologist, clinical assistant professor, speaker, wellness expert specializing in eating behaviors, stress management, and health behavior change.



Internal validity is a measure of how well a study is conducted (its structure) and how accurately its results reflect the studied group.

External validity relates to how applicable the findings are in the real world. These two concepts help researchers gauge if the results of a research study are trustworthy and meaningful.

Internal validity:

  • Conclusions are warranted
  • Controls extraneous variables
  • Eliminates alternative explanations
  • Focus on accuracy and strong research methods

External validity:

  • Findings can be generalized
  • Outcomes apply to practical situations
  • Results apply to the world at large
  • Results can be translated into another context

What Is Internal Validity in Research?

Internal validity is the extent to which a research study establishes a trustworthy cause-and-effect relationship. This type of validity depends largely on the study's procedures and how rigorously it is performed.

Internal validity is important because once established, it makes it possible to eliminate alternative explanations for a finding. If you implement a smoking cessation program, for instance, internal validity ensures that any improvement in the subjects is due to the treatment administered and not something else.

Internal validity is not a "yes or no" concept. Instead, we consider how confident we can be with study findings based on whether the research avoids traps that may make those findings questionable. The less chance there is for "confounding," the higher the internal validity and the more confident we can be.

Confounding refers to uncontrollable variables that come into play and can confuse the outcome of a study, making us unsure of whether we can trust that we have identified the cause-and-effect relationship.

In short, you can only be confident that a study is internally valid if you can rule out alternative explanations for the findings. Three criteria are required to assume cause and effect in a research study:

  • The cause preceded the effect in terms of time.
  • The cause and effect vary together.
  • There are no other likely explanations for the relationship observed.

Factors That Improve Internal Validity

To ensure the internal validity of a study, you want to consider aspects of the research design that will increase the likelihood that you can reject alternative hypotheses. Many factors can improve internal validity in research, including:

  • Blinding: Participants (and sometimes researchers) are unaware of what intervention they are receiving, such as using a placebo on some subjects in a medication study, so that this knowledge does not bias their perceptions and behaviors and thus the study's outcome
  • Experimental manipulation: Manipulating an independent variable in a study (for instance, giving smokers a cessation program) instead of just observing an association without conducting any intervention (such as examining the relationship between exercise and smoking behavior)
  • Random selection: Choosing participants at random or in a manner in which they are representative of the population that you wish to study
  • Randomization or random assignment: Randomly assigning participants to treatment and control groups, ensuring that there is no systematic bias between the research groups (a minimal sketch follows this list)
  • Strict study protocol: Following specific procedures during the study so as not to introduce any unintended effects, such as doing things differently with one group of study participants than with another
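The following minimal sketch (an illustration, not a prescribed procedure) shows the random-assignment idea: shuffle the participant list, then split it, so that group membership cannot be systematically related to participant characteristics. The participant IDs are hypothetical.

```python
# Randomly assign twenty hypothetical participants to two equal groups.
import random

participants = [f"P{i:02d}" for i in range(1, 21)]

rng = random.Random(42)  # fixed seed makes the assignment reproducible
rng.shuffle(participants)

midpoint = len(participants) // 2
treatment_group = participants[:midpoint]
control_group = participants[midpoint:]

print("treatment:", treatment_group)
print("control:  ", control_group)
```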

Internal Validity Threats

Just as there are many ways to ensure internal validity, there is also a list of potential threats that should be considered when planning a study.

  • Attrition: Participants dropping out or leaving a study, which means that the results are based on a biased sample of only the people who did not choose to leave (and who possibly have something in common, such as higher motivation)
  • Confounding: A situation in which changes in an outcome variable can be thought to have resulted from some outside variable that was not measured or manipulated in the study
  • Diffusion: The results of one group transferring to another through the groups interacting, talking with, or observing one another; this can also lead to resentful demoralization, in which a control group tries less hard because its members resent the group that they are in
  • Experimenter bias: An experimenter behaving differently with different groups in a study, which can impact the results (and is eliminated through blinding)
  • Historical events: Events such as a change in political leadership or a natural disaster that occur while a study runs over a period of time, influencing how study participants feel and act
  • Instrumentation: "Priming" participants in certain ways with the measures used, causing them to react differently than they otherwise would have
  • Maturation: The impact of time as a variable; if a study takes place over a period in which participants naturally change in some way (e.g., they grow older or become tired), it may be impossible to rule out whether effects seen in the study were simply due to the passage of time
  • Statistical regression: The tendency of participants who score at the extreme ends of a measure to score closer to the mean when measured again, independent of any intervention (simulated in the sketch after this list)
  • Testing: Repeatedly testing participants using the same measures influences outcomes; for example, someone given the same test three times is likely to do better as they learn the test or become used to the testing process, causing them to answer differently
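Statistical regression is easy to see in simulation. The hedged sketch below (hypothetical numbers, no real data) selects the top scorers on a noisy pretest and shows their mean falling back toward the population mean on retest, with no intervention at all.

```python
# Simulate regression to the mean: stable ability plus measurement noise.
import numpy as np

rng = np.random.default_rng(0)
true_ability = rng.normal(100, 10, size=1000)           # stable trait
pretest = true_ability + rng.normal(0, 10, size=1000)   # trait + noise
posttest = true_ability + rng.normal(0, 10, size=1000)  # fresh noise, no treatment

top = pretest > np.percentile(pretest, 90)  # top 10% on the pretest
print(f"top group, pretest mean:  {pretest[top].mean():.1f}")
print(f"top group, posttest mean: {posttest[top].mean():.1f}")  # closer to 100
```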

What Is External Validity in Research?

External validity refers to how well the outcome of a research study can be expected to apply to other settings. This is important because, if external validity is established, it means that the findings can be generalizable to similar individuals or populations.

External validity answers the question: do the findings apply to similar people, settings, situations, and time periods?

Population validity and ecological validity are two types of external validity. Population validity refers to whether you can generalize the research outcomes to other populations or groups. Ecological validity refers to whether a study's findings can be generalized to additional situations or settings.

A related term, transferability, refers to whether results transfer to situations with similar characteristics. Transferability relates to external validity and is typically used in qualitative research designs.

Factors That Improve External Validity

If you want to improve the external validity of your study, there are many ways to achieve this goal. Factors that can enhance external validity include:

  • Field experiments: Conducting a study outside the laboratory, in a natural setting
  • Inclusion and exclusion criteria: Setting criteria as to who can be involved in the research, ensuring that the population being studied is clearly defined
  • Psychological realism: Making sure participants experience the events of the study as real, for example by telling them a "cover story" about the aim of the study so they don't behave differently than they would in real life based on knowing what to expect or knowing the study's goal
  • Replication: Conducting the study again with different samples or in different settings to see if you get the same results; when many studies have been conducted on the same topic, a meta-analysis can be used to determine whether the effect of an independent variable replicates, making it more reliable
  • Reprocessing or calibration: Using statistical methods to adjust for external validity issues, such as reweighting groups if a study had uneven groups for a particular characteristic such as age (see the sketch after this list)
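As a hedged sketch of the reweighting idea (the age groups and proportions are hypothetical), each group's weight is its population share divided by its sample share, so over-represented groups count less and under-represented groups count more.

```python
# Post-stratification style weights for an age-skewed sample.
population_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
sample_share = {"18-34": 0.50, "35-54": 0.35, "55+": 0.15}  # skewed sample

# weight = population proportion / sample proportion
weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # 18-34 weighted down (0.6), 55+ weighted up (2.0)
```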

External Validity Threats

External validity is threatened when a study does not take into account the interaction of variables in the real world. Threats to external validity include:

  • Pre- and post-test effects: When the pre- or post-test is in some way related to the effect seen in the study, such that the cause-and-effect relationship disappears without these added tests
  • Sample features: When some feature of the sample used was responsible for the effect (or partially responsible), leading to limited generalizability of the findings
  • Selection bias: Also considered a threat to internal validity, selection bias describes differences between groups in a study that may relate to the independent variable, like motivation or willingness to take part in the study, or specific demographics of individuals being more likely to take part in an online survey
  • Situational factors: Factors such as the time of day of the study, its location, noise, researcher characteristics, and the number of measures used may affect the generalizability of findings

While rigorous research methods can ensure internal validity, external validity may be limited by these methods.

Internal Validity vs. External Validity

Internal validity and external validity are two research concepts that share a few similarities while also having several differences.

Similarities

One of the similarities between internal validity and external validity is that both factors should be considered when designing a study. This is because both have implications in terms of whether the results of a study have meaning.

Both internal validity and external validity are not "either/or" concepts. Therefore, you always need to decide to what degree a study performs in terms of each type of validity.

Each of these concepts is also typically reported in research articles published in scholarly journals. This is so that other researchers can evaluate the study and make decisions about whether the results are useful and valid.

Differences

The essential difference between internal validity and external validity is that internal validity refers to the structure of a study (and its variables) while external validity refers to the universality of the results. But there are further differences between the two as well.

For instance, internal validity focuses on showing that a difference is due to the independent variable alone, whereas external validity concerns whether the results can be translated to the world at large.

Internal validity and external validity aren't mutually exclusive. You can have a study with good internal validity that is nonetheless irrelevant to the real world. You could also conduct a field study that is highly relevant to the real world but doesn't produce trustworthy results in terms of knowing what variables caused the outcomes.

Examples of Validity

Perhaps the best way to understand internal validity and external validity is with examples.

Internal Validity Example

An example of a study with good internal validity would be if a researcher hypothesizes that using a particular mindfulness app will reduce negative mood. To test this hypothesis, the researcher randomly assigns a sample of participants to one of two groups: those who will use the app over a defined period and those who engage in a control task.

The researcher ensures that there is no systematic bias in how participants are assigned to the groups. They do this by blinding the research assistants so they don't know which groups the subjects are in during the experiment.

A strict study protocol is also used to outline the procedures of the study. Potential confounding variables are measured along with mood, such as the participants' socioeconomic status, gender, age, and other factors. If participants drop out of the study, their characteristics are examined to make sure there is no systematic bias in terms of who stays in.

External Validity Example

An example of a study with good external validity would be if, in the above example, the participants used the mindfulness app at home rather than in the laboratory. This shows that results appear in a real-world setting.

To further ensure external validity, the researcher clearly defines the population of interest and chooses a representative sample. They might also replicate the study's results using different technological devices.

A Word From Verywell

Setting up an experiment so that it has both sound internal validity and external validity involves being mindful from the start about factors that can influence each aspect of your research.

It's best to spend extra time designing a structurally sound study that has far-reaching implications rather than to quickly rush through the design phase only to discover problems later on. Only when both internal validity and external validity are high can strong conclusions be made about your results.



Validity In Psychology Research: Types & Examples

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD, is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

In psychology research, validity refers to the extent to which a test or measurement tool accurately measures what it’s intended to measure. It ensures that the research findings are genuine and not due to extraneous factors.

Validity can be categorized into different types based on internal and external validity .

The concept of validity was formulated by Kelley (1927, p. 14), who stated that a test is valid if it measures what it claims to measure. For example, a test of intelligence should measure intelligence and not something else (such as memory).

Internal and External Validity In Research

Internal validity refers to whether the effects observed in a study are due to the manipulation of the independent variable and not some other confounding factor.

In other words, there is a causal relationship between the independent and dependent variables .

Internal validity can be improved by controlling extraneous variables, using standardized instructions, counterbalancing, and eliminating demand characteristics and investigator effects.

External validity refers to the extent to which the results of a study can be generalized to other settings (ecological validity), other people (population validity), and over time (historical validity).

External validity can be improved by setting experiments more naturally and using random sampling to select participants.

Types of Validity In Psychology

Two main categories of validity are used to assess the validity of a test (i.e., questionnaire, interview, IQ test, etc.): content and criterion.

  • Content validity refers to the extent to which a test or measurement represents all aspects of the intended content domain. It assesses whether the test items adequately cover the topic or concept.
  • Criterion validity assesses the performance of a test based on its correlation with a known external criterion or outcome. It can be further divided into concurrent (measured at the same time) and predictive (measuring future performance) validity.

[Table: the different types of validity]

Face Validity

Face validity is simply whether the test appears (at face value) to measure what it claims to. This is the least sophisticated measure of content-related validity, and is a superficial and subjective assessment based on appearance.

Tests wherein the purpose is clear, even to naïve respondents, are said to have high face validity. Accordingly, tests wherein the purpose is unclear have low face validity (Nevo, 1985).

A direct measurement of face validity is obtained by asking people to rate the validity of a test as it appears to them. Raters could use a Likert scale to assess face validity.

For example:

  • The test is extremely suitable for a given purpose
  • The test is very suitable for that purpose
  • The test is adequate
  • The test is inadequate
  • The test is irrelevant and, therefore, unsuitable

It is important to select suitable people to rate a test (e.g., questionnaire, interview, IQ test, etc.). For example, individuals who actually take the test would be well placed to judge its face validity.

Also, people who work with the test could offer their opinion (e.g., employers, university administrators). Finally, the researcher could use members of the general public with an interest in the test (e.g., parents of testees, politicians, teachers, etc.).

The face validity of a test can be considered a robust construct only if a reasonable level of agreement exists among raters.
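As a minimal sketch of checking that agreement (the ratings are hypothetical and the decision rule is one possible convention, not a standard), suppose five raters scored a test on the 5-point scale above:

```python
# Summarize hypothetical face-validity ratings from five raters
# (5 = extremely suitable ... 1 = irrelevant).
import statistics

ratings = [4, 5, 4, 4, 3]

mean = statistics.mean(ratings)
spread = statistics.stdev(ratings)
print(f"mean rating = {mean:.1f}, SD = {spread:.2f}")
# One possible reading: a high mean with a small SD suggests raters agree
# that the test looks suitable for its stated purpose.
```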

It should be noted that the term face validity should be avoided when the rating is done by an “expert,” as content validity is more appropriate.

Having face validity does not mean that a test really measures what the researcher intends to measure, but only that, in the judgment of raters, it appears to do so. Consequently, it is a crude and basic measure of validity.

A test item such as “ I have recently thought of killing myself ” has obvious face validity as an item measuring suicidal cognitions and may be useful when measuring symptoms of depression.

However, the implication of items on tests with clear face validity is that they are more vulnerable to social desirability bias. Individuals may manipulate their responses to deny or hide problems or exaggerate behaviors to present a positive image of themselves.

It is possible for a test item to lack face validity but still have general validity and measure what it claims to measure. This is good because it reduces demand characteristics and makes it harder for respondents to manipulate their answers.

For example, the test item “ I believe in the second coming of Christ ” would lack face validity as a measure of depression (as the purpose of the item is unclear).

This item appeared on the first version of The Minnesota Multiphasic Personality Inventory (MMPI) and loaded on the depression scale.

Because most of the original normative sample of the MMPI were good Christians, only a depressed Christian would think Christ is not coming back. Thus, for this particular religious sample, the item does have general validity but not face validity.

Construct Validity

Construct validity assesses how well a test or measure represents and captures an abstract theoretical concept, known as a construct. It indicates the degree to which the test accurately reflects the construct it intends to measure, often evaluated through relationships with other variables and measures theoretically connected to the construct.

The concept of construct validity was introduced by Cronbach and Meehl (1955). This type of content-related validity refers to the extent to which a test captures a specific theoretical construct or trait, and it overlaps with some of the other aspects of validity.

Construct validity does not concern the simple, factual question of whether a test measures an attribute.

Instead, it is about the complex question of whether test score interpretations are consistent with a nomological network involving theoretical and observational terms (Cronbach & Meehl, 1955).

To test for construct validity, it must be demonstrated that the phenomenon being measured actually exists. So, the construct validity of a test for intelligence, for example, depends on a model or theory of intelligence.

Construct validity entails demonstrating the power of such a construct to explain a network of research findings and to predict further relationships.

The more evidence a researcher can demonstrate for a test’s construct validity, the better. However, there is no single method of determining the construct validity of a test.

Instead, different methods and approaches are combined to present the overall construct validity of a test. For example, factor analysis and correlational methods can be used.

Convergent Validity

Convergent validity is a subtype of construct validity. It assesses the degree to which two measures that theoretically should be related are related.

It demonstrates that measures of similar constructs are highly correlated. It helps confirm that a test accurately measures the intended construct by showing its alignment with other tests designed to measure the same or similar constructs.

For example, suppose there are two different scales used to measure self-esteem: Scale A and Scale B. If both scales effectively measure self-esteem, then individuals who score high on Scale A should also score high on Scale B, and those who score low on Scale A should score similarly low on Scale B.

If the scores from these two scales show a strong positive correlation, then this provides evidence for convergent validity because it indicates that both scales seem to measure the same underlying construct of self-esteem.
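A hedged sketch of this example, with hypothetical scores: a strong positive correlation between the two scales is the quantitative evidence for convergent validity.

```python
# Hypothetical self-esteem scores on Scale A and Scale B for eight people.
import numpy as np

scale_a = np.array([32, 28, 40, 25, 36, 30, 38, 22])
scale_b = np.array([30, 27, 41, 24, 35, 31, 37, 25])

r = np.corrcoef(scale_a, scale_b)[0, 1]
print(f"convergent validity (r) = {r:.2f}")  # close to +1 supports convergence
```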

Concurrent Validity (i.e., occurring at the same time)

Concurrent validity evaluates how well a test’s results correlate with the results of a previously established and accepted measure, when both are administered at the same time.

It helps in determining whether a new measure is a good reflection of an established one without waiting to observe outcomes in the future.

If the new test is validated by comparison with a currently existing criterion, we have concurrent validity.

Very often, a new IQ or personality test might be compared with an older but similar test known to have good validity already.

Predictive Validity

Predictive validity assesses how well a test predicts a criterion that will occur in the future. It measures the test’s ability to foresee the performance of an individual on a related criterion measured at a later point in time. It gauges the test’s effectiveness in predicting subsequent real-world outcomes or results.

For example, a prediction may be made on the basis of a new intelligence test that high scorers at age 12 will be more likely to obtain university degrees several years later. If the prediction is borne out, then the test has predictive validity.
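A hedged sketch of that prediction (all data hypothetical): because the later criterion here is binary (degree obtained or not), a point-biserial correlation is one suitable statistic.

```python
# Hypothetical age-12 test scores and later degree attainment (1 = yes, 0 = no).
from scipy import stats

iq_at_12 = [95, 110, 120, 88, 130, 105, 115, 99, 125, 92]
degree = [0, 1, 1, 0, 1, 0, 1, 0, 1, 0]

r, p = stats.pointbiserialr(degree, iq_at_12)
print(f"predictive validity (point-biserial r) = {r:.2f}, p = {p:.3f}")
```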

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Hathaway, S. R., & McKinley, J. C. (1943). Manual for the Minnesota Multiphasic Personality Inventory. New York: Psychological Corporation.

Kelley, T. L. (1927). Interpretation of educational measurements. New York: Macmillan.

Nevo, B. (1985). Face validity revisited. Journal of Educational Measurement, 22(4), 287-293.


The 4 Types of Validity | Types, Definitions & Examples

Published on 3 May 2022 by Fiona Middleton. Revised on 10 October 2022.

In quantitative research , you have to consider the reliability and validity of your methods and measurements.

Validity tells you how accurately a method measures something. If a method measures what it claims to measure, and the results closely correspond to real-world values, then it can be considered valid. There are four main types of validity:

  • Construct validity : Does the test measure the concept that it’s intended to measure?
  • Content validity : Is the test fully representative of what it aims to measure?
  • Face validity : Does the content of the test appear to be suitable to its aims?
  • Criterion validity : Do the results accurately measure the concrete outcome they are designed to measure?

Note that this article deals with types of test validity, which determine the accuracy of the actual components of a measure. If you are doing experimental research, you also need to consider internal and external validity , which deal with the experimental design and the generalisability of results.

Construct validity

Construct validity evaluates whether a measurement tool really represents the thing we are interested in measuring. It’s central to establishing the overall validity of a method.

What is a construct?

A construct refers to a concept or characteristic that can’t be directly observed but can be measured by observing other indicators that are associated with it.

Constructs can be characteristics of individuals, such as intelligence, obesity, job satisfaction, or depression; they can also be broader concepts applied to organisations or social groups, such as gender equality, corporate social responsibility, or freedom of speech.

What is construct validity?

Construct validity is about ensuring that the method of measurement matches the construct you want to measure. If you develop a questionnaire to diagnose depression, you need to know: does the questionnaire really measure the construct of depression? Or is it actually measuring the respondent’s mood, self-esteem, or some other construct?

To achieve construct validity, you have to ensure that your indicators and measurements are carefully developed based on relevant existing knowledge. The questionnaire must include only relevant questions that measure known indicators of depression.

The other types of validity described below can all be considered as forms of evidence for construct validity.

Content validity

Content validity assesses whether a test is representative of all aspects of the construct.

To produce valid results, the content of a test, survey, or measurement method must cover all relevant parts of the subject it aims to measure. If some aspects are missing from the measurement (or if irrelevant aspects are included), the validity is threatened.

Face validity

Face validity considers how suitable the content of a test seems to be on the surface. It's similar to content validity, but face validity is a more informal and subjective assessment.

As face validity is a subjective measure, it’s often considered the weakest form of validity. However, it can be useful in the initial stages of developing a method.

Criterion validity

Criterion validity evaluates how well a test can predict a concrete outcome, or how well the results of your test approximate the results of another test.

What is a criterion variable?

A criterion variable is an established and effective measurement that is widely considered valid, sometimes referred to as a ‘gold standard’ measurement. Criterion variables can be very difficult to find.

What is criterion validity?

To evaluate criterion validity, you calculate the correlation between the results of your measurement and the results of the criterion measurement. If there is a high correlation, this gives a good indication that your test is measuring what it intends to measure.
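As a minimal sketch of that calculation (values hypothetical), the correlation between your measurement and the criterion measurement can be computed directly:

```python
# Correlate a new measurement with a "gold standard" criterion measurement.
import numpy as np

your_measure = [12, 15, 9, 20, 17, 11, 18]
gold_standard = [14, 16, 8, 21, 18, 10, 19]

r = np.corrcoef(your_measure, gold_standard)[0, 1]
print(f"r = {r:.2f}")  # a high correlation supports criterion validity
```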


Validity in research: a guide to measuring the right things

Last updated 27 February 2023. Reviewed by Cathy Heath.

Validity is necessary for all types of studies ranging from market validation of a business or product idea to the effectiveness of medical trials and procedures. So, how can you determine whether your research is valid? This guide can help you understand what validity is, the types of validity in research, and the factors that affect research validity.


What is validity?

In the most basic sense, validity is the quality of being based on truth or reason. Valid research strives to eliminate the effects of unrelated information and the circumstances under which evidence is collected. 

Validity in research is the ability to conduct an accurate study with the right tools and conditions to yield acceptable and reliable data that can be reproduced. Researchers rely on carefully calibrated tools for precise measurements. However, collecting accurate information can be more of a challenge.

To achieve and maintain validity, studies must be conducted in environments that don't sway the results. Validity can be compromised by asking the wrong questions or relying on limited data.

Why is validity important in research?

Research is used to improve human life. Every product and discovery, from innovative medical breakthroughs to advanced new products, depends on accurate research to be dependable. Without it, results couldn't be trusted, products would likely fail, businesses would lose money, and patients couldn't rely on medical treatments.

While wasting money on a lousy product is a concern, a lack of validity paints a much grimmer picture in fields like medicine or the production of automobiles and airplanes. Whether you're launching an exciting new product or conducting scientific research, validity can determine success and failure.

What is reliability?

Reliability is the ability of a method to yield consistency. If the same result can be consistently achieved by using the same method to measure something, the measurement method is said to be reliable. For example, a thermometer that shows the same temperatures each time in a controlled environment is reliable.

While high reliability is a part of measuring validity, it's only part of the puzzle. If the reliable thermometer hasn't been properly calibrated and reliably measures temperatures two degrees too high, it doesn't provide a valid (accurate) measure of temperature. 

Similarly, if a researcher uses a thermometer to measure weight, the results won't be accurate because it's the wrong tool for the job. 
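The thermometer example can be simulated directly. In this hedged sketch (hypothetical numbers), the instrument is reliable because its readings barely vary, yet invalid because a calibration error biases every reading by two degrees.

```python
# A reliable but invalid thermometer: low spread, constant +2 degree bias.
import numpy as np

rng = np.random.default_rng(1)
true_temp = 20.0
readings = true_temp + 2.0 + rng.normal(0, 0.05, size=10)

print(f"spread (reliability): {readings.std():.3f}")          # tiny -> reliable
print(f"bias (validity): {readings.mean() - true_temp:+.2f}")  # ~ +2 -> invalid
```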

How are reliability and validity assessed?

While measuring reliability is a part of measuring validity, there are distinct ways to assess both measurements for accuracy. 

How is reliability measured?

These measures of consistency and stability help assess reliability (a split-half sketch follows this list):

  • Consistency and stability of the same measure when repeated multiple times under the same conditions
  • Consistency and stability of the measure across different test subjects
  • Consistency and stability of results from different parts of a test designed to measure the same thing
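As a hedged sketch of the third bullet (hypothetical half-test scores), split-half reliability correlates scores from two halves of one test, then applies the Spearman-Brown correction to estimate full-test reliability:

```python
# Split-half reliability with the Spearman-Brown correction.
import numpy as np

odd_items = np.array([10, 14, 9, 16, 12, 15, 11, 13])   # per-person half scores
even_items = np.array([11, 13, 8, 15, 13, 16, 10, 12])

r_half = np.corrcoef(odd_items, even_items)[0, 1]
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown formula
print(f"split-half r = {r_half:.2f}, corrected = {r_full:.2f}")
```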

How is validity measured?

Since validity refers to how accurately a method measures what it is intended to measure, its accuracy can be difficult to assess. Validity can be estimated by comparing research results to other relevant data or theories, considering:

  • The adherence of a measure to existing knowledge of how the concept is measured
  • The ability to cover all aspects of the concept being measured
  • The relation of the result in comparison with other valid measures of the same concept

What are the types of validity in a research design?

Research validity is broadly grouped into two categories: internal and external. Yet this grouping doesn't clearly define all the different types of validity, so research validity can be further divided into seven distinct types:

  • Face validity: A test that appears valid simply because of the appropriateness or relativity of the testing method, included information, or tools used.
  • Content validity: The determination that the measure used in research covers the full domain of the content.
  • Construct validity: The assessment of the suitability of the measurement tool to measure the activity being studied.
  • Internal validity: The assessment of how your research environment affects measurement results; this is where other factors can't explain the extent of an observed cause-and-effect response.
  • External validity: The extent to which the study will be accurate beyond the sample and the level to which it can be generalized in other settings, populations, and measures.
  • Statistical conclusion validity: The determination of whether a relationship exists between procedures and outcomes (appropriate sampling and measuring procedures along with appropriate statistical tests).
  • Criterion-related validity: A measurement of the quality of your testing methods against a criterion measure (like a "gold standard" test) that is measured at the same time.

  • Examples of validity

Just as types of research and ways to measure validity vary, examples of validity vary widely. These include:

A questionnaire may be considered valid because each question addresses specific and relevant aspects of the study subject.

In a brand assessment study, researchers can use comparison testing to verify the results of an initial study. For example, the results from a focus group response about brand perception are considered more valid when they match those of a questionnaire answered by current and potential customers.

A test to measure a class of students' understanding of the English language contains reading, writing, listening, and speaking components to cover the full scope of how language is used.

  • Factors that affect research validity

Certain factors can affect research validity in both positive and negative ways. By understanding the factors that improve validity and those that threaten it, you can enhance the validity of your study. These include:

Random selection of participants vs. deliberate selection of participants who are representative of your study criteria

Blinding, where participants are unaware of certain interventions (like the use of placebos)

Manipulating the experiment by inserting a variable that will change the results

Randomly assigning participants to treatment and control groups to avoid bias (see the sketch after this list)

Following specific procedures during the study to avoid unintended effects

Conducting a study in the field instead of a laboratory for more accurate results

Replicating the study with different factors or settings to compare results

Using statistical methods to adjust for inconclusive data
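
To make the random-assignment item in the list above concrete, the sketch below (with hypothetical participant IDs) shuffles a participant list and splits it into treatment and control groups, taking the assignment decision out of human hands:

```python
import random

# Hypothetical participant IDs; in practice this would be the recruited sample.
participants = [f"P{i:02d}" for i in range(1, 13)]

random.seed(42)  # seeded only so the assignment can be reproduced and audited
random.shuffle(participants)

half = len(participants) // 2
treatment, control = participants[:half], participants[half:]
print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))
```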

What are the common validity threats in research, and how can their effects be minimized or nullified?

Research validity can be difficult to achieve because of internal and external threats that produce inaccurate results. The following factors can jeopardize validity:

History: Events that occur between an early and later measurement

Maturation: Natural changes in participants over the course of the study (growing older, gaining experience, becoming fatigued) that can be mistaken for effects of the study

Repeated testing: Exposure to earlier tests can change participants' performance on subsequent tests

Selection of subjects: Unconscious bias that can result in comparison groups that are not equivalent

Statistical regression: When subjects are chosen for their extreme scores, their later scores tend to move back toward the mean regardless of the study's effects

Attrition: When the sample group is diminished significantly during the course of the study

While some validity threats can be minimized or wholly nullified, removing all threats from a study is impossible. For example, random selection can minimize unconscious selection bias and, by avoiding recruitment based on extreme scores, reduce statistical regression.

Researchers may also hope to limit attrition by using smaller study groups, yet smaller groups could affect the research in other ways. The best practice for preventing validity threats is careful environmental planning paired with reliable data-gathering methods.

  • How to ensure validity in your research

Researchers should be mindful of the importance of validity in the early planning stages of any study to avoid inaccurate results. They must take the time to consider their tools and methods as well as how closely the testing environment matches the natural environment in which results will be applied.

The following steps can be used to ensure validity in research:

Choose appropriate methods of measurement

Use appropriate sampling to choose test subjects

Create an accurate testing environment

How do you maintain validity in research?

Accurate research is usually conducted over a period of time with different test subjects. To maintain validity across an entire study, you must take specific steps to ensure that gathered data has the same levels of accuracy. 

Consistency is crucial for maintaining validity in research. When researchers apply methods consistently and standardize the circumstances under which data is collected, validity can be maintained across the entire study.

Is there a need for validation of the research instrument before its implementation?

An essential part of validity is choosing the right research instrument or method for accurate results. Consider the thermometer that is reliable but still produces inaccurate results: you're unlikely to achieve research validity without activities like instrument calibration and checks of content and construct validity.

  • Understanding research validity for more accurate results

Without validity, research can't provide the accuracy necessary to deliver a useful study. By getting a clear understanding of validity in research, you can take steps to improve your research skills and achieve more accurate results.

What is the Significance of Validity in Research?

Introduction

In qualitative research, validity refers to an evaluation metric for the trustworthiness of study findings. Within the expansive landscape of research methodologies, the qualitative approach, with its rich, narrative-driven investigations, demands unique criteria for ensuring validity.

Unlike its quantitative counterpart, which often leans on numerical robustness and statistical veracity, the essence of validity in qualitative research delves deep into the realms of credibility, dependability, and the richness of the data.

The importance of validity in qualitative research cannot be overstated. Establishing validity means ensuring that the research findings genuinely reflect the phenomena they are intended to represent. It reinforces the researcher's responsibility to present an authentic representation of study participants' experiences and insights.

This article will examine validity in qualitative research, exploring its characteristics, techniques to bolster it, and the challenges that researchers might face in establishing validity.

At its core, validity in research speaks to the degree to which a study accurately reflects or assesses the specific concept that the researcher is attempting to measure or understand. It's about ensuring that the study investigates what it purports to investigate. While this seems like a straightforward idea, the way validity is approached can vary greatly between qualitative and quantitative research.

Quantitative research often hinges on numerical, measurable data. In this paradigm, validity might refer to whether a specific tool or method measures the correct variable, without interference from other variables. It's about numbers, scales, and objective measurements. For instance, if one is studying personalities by administering surveys, a valid instrument could be a survey that has been rigorously developed and tested to verify that the survey questions are referring to personality characteristics and not other similar concepts, such as moods, opinions, or social norms.

Conversely, qualitative research is more concerned with understanding human behavior and the reasons that govern such behavior. It's less about measuring in the strictest sense and more about interpreting the phenomenon that is being studied. The questions become: "Are these interpretations true representations of the human experience being studied?" and "Do they authentically convey participants' perspectives and contexts?"

Differentiating between qualitative and quantitative validity is crucial because the research methods to ensure validity differ between these research paradigms. In quantitative realms, validity might involve test-retest reliability or examining the internal consistency of a test.

In the qualitative sphere, however, the focus shifts to ensuring that the researcher's interpretations align with the actual experiences and perspectives of their subjects.

This distinction is fundamental because it impacts how researchers engage in research design, gather data, and draw conclusions. Ensuring validity in qualitative research is like weaving a tapestry: every strand of data must be carefully interwoven with the interpretive threads of the researcher, creating a cohesive and faithful representation of the studied experience.

While internal and external validity are terms associated more closely with quantitative research, they remain relevant concepts within the context of qualitative inquiries. Grasping these notions can help qualitative researchers better navigate the challenges of ensuring their findings are both credible and applicable in wider contexts.

Internal validity

Internal validity refers to the authenticity and truthfulness of the findings within the study itself. In qualitative research, this might involve asking: Do the conclusions drawn genuinely reflect the perspectives and experiences of the study's participants?

Internal validity revolves around the depth of understanding, ensuring that the researcher's interpretations are grounded in participants' realities. Techniques like member checking, where participants review and verify the researcher's interpretations, can bolster internal validity.

External validity

External validity refers to the extent to which the findings of a study can be generalized or applied to other settings or groups. For qualitative researchers, the emphasis isn't on statistical generalizability, as often seen in quantitative studies. Instead, it's about transferability.

It becomes a matter of determining how and where the insights gathered might be relevant in other contexts. This doesn't mean that every qualitative study's findings will apply universally, but qualitative researchers should provide enough detail (through rich, thick descriptions) to allow readers or other researchers to determine the potential for transfer to other contexts.

Looking deeper into the realm of validity, it's crucial to recognize and understand its various types. Each type offers distinct criteria and methods of evaluation, ensuring that research remains robust and genuine. Here's an exploration of some of these types.

Construct validity

Construct validity is a cornerstone in research methodology. It pertains to ensuring that the tools or methods used in a research study genuinely capture the intended theoretical constructs.

In qualitative research, the challenge lies in the abstract nature of many constructs. For example, if one were to investigate "emotional intelligence" or "social cohesion," the definitions might vary, making them hard to pin down.

To bolster construct validity, it is important to clearly and transparently define the concepts being studied. In addition, researchers may triangulate data from multiple sources, ensuring that different viewpoints converge towards a shared understanding of the construct. Furthermore, they might delve into iterative rounds of data collection, refining their methods with each cycle to better align with the conceptual essence of their focus.

Content validity

Content validity's emphasis is on the breadth and depth of the content being assessed. In other words, content validity refers to capturing all relevant facets of the phenomenon being studied. Within qualitative paradigms, ensuring comprehensive representation is paramount. If, for instance, a researcher is using interview protocols to understand community perceptions of a local policy, it's crucial that the questions encompass all relevant aspects of that policy. This could range from its implementation and impact to public awareness and opinion variations across demographic groups.

Enhancing content validity can involve expert reviews, where subject matter experts evaluate tools or methods for comprehensiveness. Another strategy might involve pilot studies, where preliminary data collection reveals gaps or overlooked aspects that can be addressed in the main study.

Ecological validity

Ecological validity refers to the genuine reflection of real-world situations in research findings. For qualitative researchers, this means their observations, interpretations, and conclusions should resonate with the participants and context being studied.

If a study explores classroom dynamics, for example, studying students and teachers in a controlled research setting would have lower ecological validity than studying real classroom settings. Ecological validity is important to consider because it helps ensure the research is relevant to the people being studied. Individuals might behave entirely differently in a controlled environment than in their everyday natural settings.

Ecological validity tends to be stronger in qualitative research compared to quantitative research, because qualitative researchers are typically immersed in their study context and explore participants' subjective perceptions and experiences. Quantitative research, in contrast, can sometimes be more artificial if behavior is being observed in a lab or participants have to choose from predetermined options to answer survey questions.

Qualitative researchers can further bolster ecological validity through immersive fieldwork, where researchers spend extended periods in the studied environment. This immersion helps them capture the nuances and intricacies that might be missed in brief or superficial engagements.

Face validity

Face validity, while seemingly straightforward, holds significant weight in the preliminary stages of research. It serves as a litmus test, gauging the apparent appropriateness and relevance of a tool or method. If a researcher is developing a new interview guide to gauge employee satisfaction, for instance, a quick assessment from colleagues or a focus group can reveal if the questions intuitively seem fit for the purpose.

While face validity is more subjective and lacks the depth of other validity types, it's a crucial initial step, ensuring that the research starts on the right foot.

Criterion validity

Criterion validity evaluates how well the results obtained from one method correlate with those from another, more established method. In many research scenarios, establishing high criterion validity involves statistical analysis. For instance, a researcher might utilize the appropriate statistical tests to determine the strength and direction of the linear relationship between two sets of data.

If a new measurement tool or method is being introduced, its validity might be established by statistically correlating its outcomes with those of a gold standard or previously validated tool. Correlational statistics can estimate the strength of the relationship between the new instrument and the previously established instrument, and regression analyses can also be useful to predict outcomes based on established criteria.

While these methods are traditionally aligned with quantitative research, qualitative researchers, particularly those using mixed methods, may also find value in these statistical approaches, especially when wanting to quantify certain aspects of their data for comparative purposes. More broadly, qualitative researchers could compare their operationalizations and findings to those of similar qualitative studies to confirm that they are indeed examining what they intend to study.
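
As a minimal sketch of the correlational and regression approach described above (hypothetical paired scores; the scipy library is assumed), a new tool's outcomes can be regressed on those of a gold-standard instrument:

```python
from scipy.stats import linregress

# Hypothetical data: twelve participants scored on a newly developed
# screening tool and on an established "gold standard" instrument.
new_tool = [31, 45, 28, 52, 39, 60, 35, 48, 42, 55, 30, 50]
gold_standard = [33, 47, 30, 55, 38, 62, 34, 50, 45, 57, 29, 52]

fit = linregress(new_tool, gold_standard)
print(f"r = {fit.rvalue:.2f}, R^2 = {fit.rvalue**2:.2f}")  # strength of the linear relationship
print(f"Predicted gold-standard score for a new-tool score of 40: "
      f"{fit.intercept + fit.slope * 40:.1f}")
```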

In the realm of qualitative research, the role of the researcher is not just that of an observer but often that of an active participant in the meaning-making process. This unique positioning means the researcher's perspectives and interactions can significantly influence the data collected and its interpretation. Here's a deep dive into the researcher's pivotal role in upholding validity.

Reflexivity

A key concept in qualitative research, reflexivity requires researchers to continually reflect on their worldviews, beliefs, and potential influence on the data. By maintaining a reflexive journal or engaging in regular introspection, researchers can identify and address their own biases, ensuring a more genuine interpretation of participant narratives.

Building rapport

The depth and authenticity of information shared by participants often hinge on the rapport and trust established with the researcher. By cultivating genuine, non-judgmental, and empathetic relationships with participants, researchers can enhance the validity of the data collected.

Positionality

Every researcher brings to the study their own background, including their culture, education, socioeconomic status, and more. Recognizing how this positionality might influence interpretations and interactions is crucial. By acknowledging and transparently sharing their positionality, researchers can offer context to their findings and interpretations.

Active listening

The ability to listen without imposing one's own judgments or interpretations is vital. Active listening ensures that researchers capture the participants' experiences and emotions without distortion, enhancing the validity of the findings.

Transparency in methods

To ensure validity, researchers should be transparent about every step of their process. From how participants were selected to how data was analyzed, clear documentation offers others a chance to understand and evaluate the research's authenticity and rigor.

Member checking

Once data is collected and interpreted, revisiting participants to confirm the researcher's interpretations can be invaluable. This process, known as member checking, ensures that the researcher's understanding aligns with the participants' intended meanings, bolstering validity.

Embracing ambiguity

Qualitative data can be complex and sometimes contradictory. Instead of trying to fit data into preconceived notions or frameworks, researchers must embrace ambiguity, acknowledging areas of uncertainty or multiple interpretations.

5.3 Experimentation and Validity

Learning Objectives

  • Explain what internal validity is and why experiments are considered to be high in internal validity.
  • Explain what external validity is and evaluate studies in terms of their external validity.
  • Explain the concepts of construct and statistical validity.

Four Big Validities

When we read about psychology experiments with a critical view, one question to ask is “is this study valid?” However, that question is not as straightforward as it seems because, in psychology, there are many different kinds of validities. Researchers have focused on four validities to help assess whether an experiment is sound (Judd & Kenny, 1981; Morling, 2014)[1][2]: internal validity, external validity, construct validity, and statistical validity. We will explore each validity in depth.

Internal Validity

Two variables being statistically related does not necessarily mean that one causes the other. “Correlation does not imply causation.” For example, if it were the case that people who exercise regularly are happier than people who do not exercise regularly, this implication would not necessarily mean that exercising increases people’s happiness. It could mean instead that greater happiness causes people to exercise or that something like better physical health causes people to exercise and be happier.

The purpose of an experiment, however, is to show that two variables are statistically related and to do so in a way that supports the conclusion that the independent variable caused any observed differences in the dependent variable. The logic is based on this assumption: If the researcher creates two or more highly similar conditions and then manipulates the independent variable to produce just one difference between them, then any later difference between the conditions must have been caused by the independent variable. For example, because the only difference between Darley and Latané’s conditions was the number of students that participants believed to be involved in the discussion, this difference in belief must have been responsible for differences in helping between the conditions.

An empirical study is said to be high in internal validity if the way it was conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Thus experiments are high in internal validity because the way they are conducted—with the manipulation of the independent variable and the control of extraneous variables—provides strong support for causal conclusions. In contrast, nonexperimental research designs (e.g., correlational designs), in which variables are measured but are not manipulated by an experimenter, are low in internal validity.

External Validity

At the same time, the way that experiments are conducted sometimes leads to a different kind of criticism. Specifically, the need to manipulate the independent variable and control extraneous variables means that experiments are often conducted under conditions that seem artificial (Bauman, McGraw, Bartels, & Warren, 2014)[3]. In many psychology experiments, the participants are all undergraduate students and come to a classroom or laboratory to fill out a series of paper-and-pencil questionnaires or to perform a carefully designed computerized task. Consider, for example, an experiment in which researcher Barbara Fredrickson and her colleagues had undergraduate students come to a laboratory on campus and complete a math test while wearing a swimsuit (Fredrickson, Roberts, Noll, Quinn, & Twenge, 1998)[4]. At first, this manipulation might seem silly. When will undergraduate students ever have to complete math tests in their swimsuits outside of this experiment?

The issue we are confronting is that of external validity. An empirical study is high in external validity if the way it was conducted supports generalizing the results to people and situations beyond those actually studied. As a general rule, studies are higher in external validity when the participants and the situation studied are similar to those that the researchers want to generalize to and that participants encounter every day, a quality often described as mundane realism. Imagine, for example, that a group of researchers is interested in how shoppers in large grocery stores are affected by whether breakfast cereal is packaged in yellow or purple boxes. Their study would be high in external validity and have high mundane realism if they studied the decisions of ordinary people doing their weekly shopping in a real grocery store. If the shoppers bought much more cereal in purple boxes, the researchers would be fairly confident that this increase would be true for other shoppers in other stores. Their study would be relatively low in external validity, however, if they studied a sample of undergraduate students in a laboratory at a selective university who merely judged the appeal of various colors presented on a computer screen. That laboratory study would still have high psychological realism, though, because the same mental process is used in both the laboratory and the real world. If the students judged purple to be more appealing than yellow, the researchers would not be very confident that this preference is relevant to grocery shoppers’ cereal-buying decisions because of low external validity, but they could be confident that the visual processing of colors has high psychological realism.

We should be careful, however, not to draw the blanket conclusion that experiments are low in external validity. One reason is that experiments need not seem artificial. Consider that Darley and Latané’s experiment provided a reasonably good simulation of a real emergency situation. Or consider field experiments that are conducted entirely outside the laboratory. In one such experiment, Robert Cialdini and his colleagues studied whether hotel guests choose to reuse their towels for a second day as opposed to having them washed as a way of conserving water and energy (Cialdini, 2005)[5]. These researchers manipulated the message on a card left in a large sample of hotel rooms. One version of the message emphasized showing respect for the environment, another emphasized that the hotel would donate a portion of their savings to an environmental cause, and a third emphasized that most hotel guests choose to reuse their towels. The result was that guests who received the message that most hotel guests choose to reuse their towels reused their own towels substantially more often than guests receiving either of the other two messages. Given the way they conducted their study, it seems very likely that their result would hold true for other guests in other hotels.

A second reason not to draw the blanket conclusion that experiments are low in external validity is that they are often conducted to learn about psychological processes that are likely to operate in a variety of people and situations. Let us return to the experiment by Fredrickson and colleagues. They found that the women in their study, but not the men, performed worse on the math test when they were wearing swimsuits. They argued that this gender difference was due to women’s greater tendency to objectify themselves—to think about themselves from the perspective of an outside observer—which diverts their attention away from other tasks. They argued, furthermore, that this process of self-objectification and its effect on attention is likely to operate in a variety of women and situations—even if none of them ever finds herself taking a math test in her swimsuit.

Construct Validity

In addition to the generalizability of the results of an experiment, another element to scrutinize in a study is the quality of the experiment’s manipulations, or its construct validity. The research question that Darley and Latané started with is “does helping behavior become diffused?” They hypothesized that participants in a lab would be less likely to help when they believed there were more potential helpers besides themselves. This conversion from research question to experiment design is called operationalization (see Chapter 4 for more information about the operational definition). Darley and Latané operationalized the independent variable of diffusion of responsibility by increasing the number of potential helpers. In evaluating this design, we would say that the construct validity was very high because the experiment’s manipulations very clearly speak to the research question: there was a crisis, a way for the participant to help, and, by increasing the number of other students involved in the discussion, a way to test for diffusion.

What if the number of conditions in Darley and Latané’s study changed? Consider if there were only two conditions: one student involved in the discussion or two. Even though we may see a decrease in helping by adding another person, it may not be a clear demonstration of diffusion of responsibility; it may merely reflect the presence of others, and we might think it was a form of Bandura’s social inhibition. The construct validity would be lower. However, had there been five conditions, perhaps we would see the decrease continue with more people in the discussion, or perhaps it would plateau after a certain number of people. In that situation, we may not necessarily be learning more about diffusion of responsibility, or it may become a different phenomenon. Adding more conditions, then, does not necessarily raise construct validity. When designing your own experiment, consider how well the research question is operationalized in your study.

Statistical Validity

Statistical validity concerns the proper statistical treatment of data and the soundness of the researchers’ statistical conclusions. There are many different types of inferential statistics tests (e.g., t-tests, ANOVA, regression, correlation), and statistical validity concerns the use of the proper type of test to analyze the data. When considering the proper type of test, researchers must consider the scale of measure of their dependent variable and the design of their study. Further, many inferential statistics tests carry certain assumptions (e.g., that the data are normally distributed), and statistical validity is threatened when these assumptions are not met but the statistics are used nonetheless.
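
As an illustration of checking a test's assumptions before relying on it (hypothetical scores; the scipy library is assumed), one can test the normality assumption carried by an independent-samples t-test before interpreting its result:

```python
from scipy.stats import shapiro, ttest_ind

# Hypothetical scores from a two-group experiment.
treatment = [78, 85, 91, 74, 88, 82, 95, 80, 86, 79]
control = [72, 75, 83, 70, 79, 77, 84, 71, 80, 76]

# Check the normality assumption before trusting a parametric test.
for name, group in [("treatment", treatment), ("control", control)]:
    stat, p = shapiro(group)
    print(f"Shapiro-Wilk for {name}: p = {p:.2f}")  # p > .05: no strong evidence of non-normality

t_stat, p_value = ttest_ind(treatment, control)  # independent-samples t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```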

One common critique of experiments is that a study did not have enough participants. The main reason for this criticism is that it is difficult to generalize about a population from a small sample. At the outset, it seems as though this critique is about external validity, but there are studies where small sample sizes are not a problem (subsequent chapters will discuss how small samples, even of only 1 person, are still very illuminating for psychology research). Therefore, small sample sizes are actually a critique of statistical validity. The statistical validity speaks to whether the statistics conducted in the study are sound and support the conclusions that are made.

The proper statistical analysis should be conducted on the data to determine whether the difference or relationship that was predicted was found. The number of conditions, the total number of participants, and the expected size of the effect together determine a study’s statistical power. With this information, a power analysis can be conducted to ascertain whether you are likely to detect a real difference. When designing a study, it is best to run the power analysis in advance so that the appropriate number of participants can be recruited and tested. To design a statistically valid experiment, thinking about the statistical tests at the beginning of the design will help ensure the results can be believed.
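
As an illustration, a prospective power analysis for a two-group experiment takes only a few lines. This is a sketch under assumed planning values (a medium effect of Cohen's d = 0.5, alpha = .05, desired power = .80), using the statsmodels library:

```python
from statsmodels.stats.power import TTestIndPower

# Assumed planning values for a two-group (independent-samples t-test) design.
effect_size = 0.5  # Cohen's d: the smallest effect worth detecting
alpha = 0.05       # acceptable Type I error rate
power = 0.80       # desired probability of detecting a real effect

n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=alpha, power=power,
                                          alternative="two-sided")
print(f"Participants needed per group: {n_per_group:.0f}")  # about 64
```

Running the analysis before recruitment, as the paragraph above suggests, is what allows the appropriate number of participants to be planned for rather than discovered too late.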

Prioritizing Validities

These four big validities (internal, external, construct, and statistical) are useful to keep in mind both when reading about other experiments and when designing your own. However, researchers must prioritize, and often it is not possible to have high validity in all four areas. In Cialdini’s study on towel usage in hotels, the external validity was high but the statistical validity was more modest. This discrepancy does not invalidate the study, but it shows where there may be room for improvement in future follow-up studies (Goldstein, Cialdini, & Griskevicius, 2008)[6]. Morling (2014) points out that most psychology studies have high internal and construct validity but sometimes sacrifice external validity.

Key Takeaways

  • Studies are high in internal validity to the extent that the way they are conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Experiments are generally high in internal validity because of the manipulation of the independent variable and control of extraneous variables.
  • Studies are high in external validity to the extent that the result can be generalized to people and situations beyond those actually studied. Although experiments can seem “artificial”—and low in external validity—it is important to consider whether the psychological processes under study are likely to operate in other people and situations.
  • Judd, C. M., & Kenny, D. A. (1981). Estimating the effects of social interventions. Cambridge, MA: Cambridge University Press.
  • Morling, B. (2014, April). Teach your students to be better consumers. APS Observer. Retrieved from http://www.psychologicalscience.org/index.php/publications/observer/2014/april-14/teach-your-students-to-be-better-consumers.html
  • Bauman, C. W., McGraw, A. P., Bartels, D. M., & Warren, C. (2014). Revisiting external validity: Concerns about trolley problems and other sacrificial dilemmas in moral psychology. Social and Personality Psychology Compass, 8/9, 536-554.
  • Fredrickson, B. L., Roberts, T.-A., Noll, S. M., Quinn, D. M., & Twenge, J. M. (1998). The swimsuit becomes you: Sex differences in self-objectification, restrained eating, and math performance. Journal of Personality and Social Psychology, 75, 269–284.
  • Cialdini, R. (2005, April). Don’t throw in the towel: Use social influence research. APS Observer. Retrieved from http://www.psychologicalscience.org/index.php/publications/observer/2005/april-05/dont-throw-in-the-towel-use-social-influence-research.html
  • Goldstein, N. J., Cialdini, R. B., & Griskevicius, V. (2008). A room with a viewpoint: Using social norms to motivate environmental conservation in hotels. Journal of Consumer Research, 35, 472–482.

Validity and Validation

1 Validity and Validation in Research and Assessment

  • Published: October 2013

This chapter first sets out the book's purpose, namely to further define validity and to explore the factors that should be considered when evaluating claims from research and assessment. It then discusses validity theory and its philosophical foundations, with connections between the philosophical foundations and specific ways validation is considered in research and measurement. An overview of the subsequent chapters is also presented.

Grad Coach

Validity & Reliability In Research

A Plain-Language Explanation (With Examples)

By: Derek Jansen (MBA) | Expert Reviewer: Kerryn Warren (PhD) | September 2023

Validity and reliability are two related but distinctly different concepts within research. Understanding what they are and how to achieve them is critically important to any research project. In this post, we’ll unpack these two concepts as simply as possible.

This post is based on our popular online course, Research Methodology Bootcamp. In the course, we unpack the basics of methodology using straightforward language and loads of examples.

Overview: Validity & Reliability

  • The big picture
  • Validity 101
  • Reliability 101 
  • Key takeaways

First, The Basics…

First, let’s start with a big-picture view and then we can zoom in to the finer details.

Validity and reliability are two incredibly important concepts in research, especially within the social sciences. Both validity and reliability have to do with the measurement of variables and/or constructs – for example, job satisfaction, intelligence, productivity, etc. When undertaking research, you’ll often want to measure these types of constructs and variables and, at the simplest level, validity and reliability are about ensuring the quality and accuracy of those measurements.

As you can probably imagine, if your measurements aren’t accurate or there are quality issues at play when you’re collecting your data, your entire study will be at risk. Therefore, validity and reliability are very important concepts to understand (and to get right). So, let’s unpack each of them.


What Is Validity?

In simple terms, validity (also called “construct validity”) is all about whether a research instrument accurately measures what it’s supposed to measure.

For example, let’s say you have a set of Likert scales that are supposed to quantify someone’s level of overall job satisfaction. If this set of scales focused on only one dimension of job satisfaction, say pay satisfaction, it would not be a valid measurement, as it captures just one aspect of a multidimensional construct. In other words, pay satisfaction is only one contributing factor toward overall job satisfaction, so on its own it’s not a valid way to measure someone’s job satisfaction.


Oftentimes in quantitative studies, the way in which the researcher or survey designer interprets a question or statement can differ from how the study participants interpret it. Given that respondents don’t have the opportunity to ask clarifying questions when taking a survey, it’s easy for these sorts of misunderstandings to crop up. Naturally, if the respondents are interpreting a question in the wrong way, the data they provide will be pretty useless. Therefore, ensuring that a study’s measurement instruments are valid – in other words, that they are measuring what they intend to measure – is incredibly important.

There are various types of validity and we’re not going to go down that rabbit hole in this post, but it’s worth quickly highlighting the importance of making sure that your research instrument is tightly aligned with the theoretical construct you’re trying to measure. In other words, you need to pay careful attention to how the key theories within your study define the thing you’re trying to measure – and then make sure that your survey presents it in the same way.

For example, sticking with the “job satisfaction” construct we looked at earlier, you’d need to clearly define what you mean by job satisfaction within your study (and this definition would of course need to be underpinned by the relevant theory). You’d then need to make sure that your chosen definition is reflected in the types of questions or scales you’re using in your survey. Simply put, you need to make sure that your survey respondents are perceiving your key constructs in the same way you are. Or, even if they’re not, that your measurement instrument is capturing the necessary information that reflects your definition of the construct at hand.

If all of this talk about constructs sounds a bit fluffy, be sure to check out Research Methodology Bootcamp, which will provide you with a rock-solid foundational understanding of all things methodology-related.


What Is Reliability?

As with validity, reliability is an attribute of a measurement instrument – for example, a survey, a weight scale or even a blood pressure monitor. But while validity is concerned with whether the instrument is measuring the “thing” it’s supposed to be measuring, reliability is concerned with consistency and stability. In other words, reliability reflects the degree to which a measurement instrument produces consistent results when applied repeatedly to the same phenomenon, under the same conditions.

As you can probably imagine, a measurement instrument that achieves a high level of consistency is naturally more dependable (or reliable) than one that doesn’t – in other words, it can be trusted to provide consistent measurements. And that, of course, is what you want when undertaking empirical research. If you think about it within a more domestic context, just imagine if you found that your bathroom scale gave you a different number every time you hopped on and off of it – you wouldn’t feel too confident in its ability to measure the variable that is your body weight 🙂

It’s worth mentioning that reliability also extends to the person using the measurement instrument. For example, if two researchers use the same instrument (let’s say a measuring tape) and they get different measurements, there’s likely an issue in terms of how one (or both) of them are using the measuring tape. So, when you think about reliability, consider both the instrument and the researcher as part of the equation.

As with validity, there are various types of reliability and various tests that can be used to assess the reliability of an instrument. A popular one that you’ll likely come across for survey instruments is Cronbach’s alpha, which is a statistical measure that quantifies the degree to which items within an instrument (for example, a set of Likert scales) measure the same underlying construct. In other words, Cronbach’s alpha indicates how closely related the items are and whether they consistently capture the same concept.
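
To make this concrete, here is a minimal sketch of how Cronbach’s alpha can be computed from a respondents-by-items score matrix using NumPy. The formula is the standard one, alpha = (k/(k-1))·(1 − sum of item variances / variance of total scores); the job-satisfaction data below are entirely hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items (e.g., Likert scales)
    item_variances = items.var(axis=0, ddof=1)  # variance of each individual item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Five hypothetical respondents answering three job-satisfaction items (1-5 Likert)
scores = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")  # ~0.95 here; ~0.7+ is often treated as acceptable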


Recap: Key Takeaways

Alright, let’s quickly recap to cement your understanding of validity and reliability:

  • Validity is concerned with whether an instrument (e.g., a set of Likert scales) is measuring what it’s supposed to measure
  • Reliability is concerned with whether that measurement is consistent and stable when measuring the same phenomenon under the same conditions.

In short, validity and reliability are both essential to ensuring that your data collection efforts deliver high-quality, accurate data that help you answer your research questions. So, be sure to always pay careful attention to the validity and reliability of your measurement instruments when collecting and analysing data. As the adage goes, “rubbish in, rubbish out” – make sure that your data inputs are rock-solid.


Understanding Reliability and Validity

These related research issues ask us to consider whether we are studying what we think we are studying and whether the measures we use are consistent.

Reliability

Reliability is the extent to which an experiment, test, or any measuring procedure yields the same result on repeated trials. Without the agreement of independent observers able to replicate research procedures, or the ability to use research tools and procedures that yield consistent measurements, researchers would be unable to satisfactorily draw conclusions, formulate theories, or make claims about the generalizability of their research. In addition to its important role in research, reliability is critical for many parts of our lives, including manufacturing, medicine, and sports.

Reliability is such an important concept that it has been defined in terms of its application to a wide range of activities. For researchers, four key types of reliability are:

Equivalency Reliability

Equivalency reliability is the extent to which two items measure identical concepts at an identical level of difficulty. Equivalency reliability is determined by relating two sets of test scores to one another to highlight the degree of relationship or association. In quantitative studies, and particularly in experimental studies, a correlation coefficient, statistically referred to as r, is used to show the strength of the correlation between a dependent variable (the subject under study) and one or more independent variables, which are manipulated to determine effects on the dependent variable. An important consideration is that equivalency reliability is concerned with correlational, not causal, relationships.

For example, a researcher studying university English students happened to notice that when some students were studying for finals, their holiday shopping began. Intrigued by this, the researcher attempted to observe how often, or to what degree, these two behaviors co-occurred throughout the academic year. The researcher used the results of the observations to assess the correlation between studying throughout the academic year and shopping for gifts, and concluded there was poor equivalency reliability between the two actions. In other words, studying was not a reliable predictor of shopping for gifts.
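
As an illustration only (the weekly counts below are invented), here is how the correlation coefficient r in an example like this might be computed:

```python
import numpy as np

# Invented weekly observations: hours spent studying vs. gifts purchased
studying = np.array([10, 25, 3, 18, 30, 5, 12])
shopping = np.array([2, 3, 2, 1, 2, 3, 2])

r = np.corrcoef(studying, shopping)[0, 1]  # Pearson correlation coefficient
print(f"r = {r:.2f}")  # |r| near 0 indicates poor equivalency reliability
```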

Stability Reliability

Stability reliability (sometimes called test-retest reliability) is the agreement of measuring instruments over time. To determine stability, a measure or test is repeated on the same subjects at a future date. Results are compared and correlated with the initial test to give a measure of stability.

An example of stability reliability would be the method of maintaining weights used by the U.S. Bureau of Standards. Platinum objects of fixed weight (one kilogram, one pound, etc...) are kept locked away. Once a year they are taken out and weighed, allowing scales to be reset so they are "weighing" accurately. Keeping track of how much the scales are off from year to year establishes a stability reliability for these instruments. In this instance, the platinum weights themselves are assumed to have a perfectly fixed stability reliability.
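
A minimal sketch of such a test-retest check, using made-up readings in the spirit of the weights example:

```python
import numpy as np

# Invented readings of the same reference objects taken one year apart
first_weighing  = np.array([1.000, 0.454, 2.000, 5.000])  # kilograms
second_weighing = np.array([1.001, 0.455, 1.999, 5.002])

r = np.corrcoef(first_weighing, second_weighing)[0, 1]   # test-retest correlation
drift = np.abs(second_weighing - first_weighing).max()   # worst-case instrument drift
print(f"test-retest r = {r:.4f}, max drift = {drift:.3f} kg")
```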

Internal Consistency

Internal consistency is the extent to which tests or procedures assess the same characteristic, skill or quality. It is a measure of the precision between the observers or of the measuring instruments used in a study. This type of reliability often helps researchers interpret data and predict the value of scores and the limits of the relationship among variables.

For example, a researcher designs a questionnaire to find out about college students' dissatisfaction with a particular textbook. Analyzing the internal consistency of the survey items dealing with dissatisfaction will reveal the extent to which items on the questionnaire focus on the notion of dissatisfaction.
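
One simple way to probe this kind of internal consistency is a corrected item-total correlation: each survey item is correlated with the sum of the remaining items. The sketch below uses invented questionnaire responses, and item 4 is deliberately constructed to be inconsistent with the other three.

```python
import numpy as np

# Invented responses (1-5) to four "dissatisfaction" items on the textbook survey;
# item 4 is deliberately inconsistent with the rest
responses = np.array([
    [5, 4, 5, 2],
    [2, 1, 2, 4],
    [4, 4, 5, 1],
    [1, 2, 1, 5],
    [3, 3, 4, 3],
])

# Correlate each item with the sum of the remaining items; a low or negative
# value flags an item that may not be tapping the same underlying notion
for i in range(responses.shape[1]):
    rest = np.delete(responses, i, axis=1).sum(axis=1)
    r = np.corrcoef(responses[:, i], rest)[0, 1]
    print(f"item {i + 1}: corrected item-total r = {r:.2f}")
```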

Interrater Reliability

Interrater reliability is the extent to which two or more individuals (coders or raters) agree. Interrater reliability addresses the consistency of the implementation of a rating system.

A test of interrater reliability would be the following scenario: Two or more researchers are observing a high school classroom. The class is discussing a movie that they have just viewed as a group. The researchers have a sliding rating scale (1 being most positive, 5 being most negative) with which they are rating the students' oral responses. Interrater reliability assesses the consistency of how the rating system is implemented. For example, if one researcher gives a "1" to a student response while another researcher gives a "5," the interrater reliability would obviously be inconsistent. Interrater reliability is dependent upon the ability of two or more individuals to be consistent. Training, education and monitoring skills can enhance interrater reliability.
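
Raw percent agreement is the simplest index of interrater reliability, but it ignores agreement expected by chance. A common refinement is Cohen's kappa, sketched below from first principles with hypothetical ratings on the 1-5 scale described above.

```python
import numpy as np

def cohens_kappa(rater_a, rater_b, categories):
    """Cohen's kappa: observed agreement between two raters, corrected for chance."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    observed = np.mean(a == b)  # raw proportion of identical ratings
    # chance agreement from each rater's marginal category frequencies
    expected = sum((a == c).mean() * (b == c).mean() for c in categories)
    return (observed - expected) / (1 - expected)

# Two hypothetical observers rating ten student responses on the 1-5 scale above
rater_1 = [1, 2, 1, 3, 5, 2, 1, 4, 2, 3]
rater_2 = [1, 2, 2, 3, 5, 2, 1, 4, 1, 3]
print(f"kappa = {cohens_kappa(rater_1, rater_2, range(1, 6)):.2f}")  # ~0.74
```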

Related Information: Reliability Example

An example of the importance of reliability is the use of measuring devices in Olympic track and field events. For the vast majority of people, ordinary measuring rulers and their degree of accuracy are reliable enough. However, for an Olympic event, such as the discus throw, the slightest variation in a measuring device -- whether it is a tape, clock, or other device -- could mean the difference between the gold and silver medals. Additionally, it could mean the difference between a new world record and outright failure to qualify for an event. Olympic measuring devices, then, must be reliable from one throw or race to another and from one competition to another. They must also be reliable when used in different parts of the world, as temperature, air pressure, humidity, interpretation, or other variables might affect their readings.

Validity refers to the degree to which a study accurately reflects or assesses the specific concept that the researcher is attempting to measure. While reliability is concerned with the consistency of the actual measuring instrument or procedure, validity is concerned with the study's success at measuring what the researchers set out to measure.

Researchers should be concerned with both external and internal validity. External validity refers to the extent to which the results of a study are generalizable or transferable. (Most discussions of external validity focus solely on generalizability; see Campbell and Stanley, 1963. We include a reference here to transferability because many qualitative research studies are not designed to be generalized.)

Internal validity refers to (1) the rigor with which the study was conducted (e.g., the study's design, the care taken to conduct measurements, and decisions concerning what was and wasn't measured) and (2) the extent to which the designers of a study have taken into account alternative explanations for any causal relationships they explore (Huitt, 1998). In studies that do not explore causal relationships, only the first of these definitions should be considered when assessing internal validity.

Scholars discuss several types of internal validity; brief discussions of several of them follow:

Face Validity

Face validity is concerned with how a measure or procedure appears. Does it seem like a reasonable way to gain the information the researchers are attempting to obtain? Does it seem well designed? Does it seem as though it will work reliably? Unlike content validity, face validity does not depend on established theories for support (Fink, 1995).

Criterion-Related Validity

Criterion-related validity, also referred to as instrumental validity, is used to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure which has already been demonstrated to be valid.

For example, imagine a hands-on driving test has been shown to be an accurate test of driving skills. A newly designed written driving test can then be validated using a criterion-related strategy: scores on the written test are compared with scores on the hands-on test, which serves as the criterion.
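
In practice, the comparison often comes down to correlating paired scores on the two tests. A minimal sketch with invented scores:

```python
import numpy as np

# Invented paired scores: the new written test vs. the validated hands-on test
written  = np.array([72, 85, 60, 90, 78, 55, 88])
hands_on = np.array([70, 88, 65, 92, 75, 58, 85])

r = np.corrcoef(written, hands_on)[0, 1]  # validity coefficient
print(f"criterion-related validity r = {r:.2f}")  # a high r supports the written test
```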

Construct Validity

Construct validity seeks agreement between a theoretical concept and a specific measuring device or procedure. For example, a researcher inventing a new IQ test might spend a great deal of time attempting to "define" intelligence in order to reach an acceptable level of construct validity.

Construct validity can be broken down into two sub-categories: convergent validity and discriminant validity. Convergent validity is the general agreement among ratings, gathered independently of one another, where the measures should be theoretically related. Discriminant validity is the lack of a relationship among measures which theoretically should not be related.

To understand whether a piece of research has construct validity, three steps should be followed. First, the theoretical relationships must be specified. Second, the empirical relationships between the measures of the concepts must be examined. Third, the empirical evidence must be interpreted in terms of how it clarifies the construct validity of the particular measure being tested (Carmines & Zeller, 1991, p. 23).
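
The second step, examining empirical relationships, often amounts to inspecting a correlation matrix: measures of the same construct should correlate strongly (convergent validity), while measures of theoretically unrelated constructs should not (discriminant validity). The simulation below is purely illustrative, with shoe size standing in for any theoretically unrelated measure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
ability = rng.normal(size=n)                         # latent construct ("intelligence")
iq_test_a = ability + rng.normal(scale=0.5, size=n)  # two independent measures of it
iq_test_b = ability + rng.normal(scale=0.5, size=n)
shoe_size = rng.normal(size=n)                       # theoretically unrelated measure

corr = np.corrcoef([iq_test_a, iq_test_b, shoe_size])
print(f"convergent r(A, B)           = {corr[0, 1]:.2f}")  # expected to be high
print(f"discriminant r(A, shoe size) = {corr[0, 2]:.2f}")  # expected to be near zero
```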

Content Validity

Content validity is based on the extent to which a measurement reflects the specific intended domain of content (Carmines & Zeller, 1991, p. 20).

Content validity is illustrated using the following examples: Researchers aim to study mathematical learning and create a survey to test for mathematical skill. If these researchers only tested for multiplication and then drew conclusions from that survey, their study would not show content validity because it excludes other mathematical functions. Although the establishment of content validity for placement-type exams seems relatively straightforward, the process becomes more complex as it moves into the more abstract domain of socio-cultural studies. For example, a researcher needing to measure an attitude like self-esteem must decide what constitutes a relevant domain of content for that attitude. For socio-cultural studies, content validity forces the researchers to define the very domains they are attempting to study.

Related Information: Validity Example

Many recreational activities of high school students involve driving cars. A researcher, wanting to measure whether recreational activities have a negative effect on grade point average in high school students, might conduct a survey asking how many students drive to school and then attempt to find a correlation between these two factors. Because many students might use their cars for purposes other than or in addition to recreation (e.g., driving to work after school, driving to school rather than walking or taking a bus), this research study might prove invalid. Even if a strong correlation was found between driving and grade point average, driving to school in and of itself would seem to be an invalid measure of recreational activity.

The challenges of achieving reliability and validity are among the most difficult faced by researchers. In this section, we offer commentaries on these challenges.

Difficulties of Achieving Reliability

It is important to understand some of the problems concerning reliability which might arise. It would be ideal to reliably measure, every time, exactly those things which we intend to measure. However, researchers can go to great lengths and make every attempt to ensure accuracy in their studies, and still deal with the inherent difficulties of measuring particular events or behaviors. Sometimes, and particularly in studies of natural settings, the only measuring device available is the researcher's own observations of human interaction or human reaction to varying stimuli. As these methods are ultimately subjective in nature, results may be unreliable and multiple interpretations are possible. Three of these inherent difficulties are quixotic reliability, diachronic reliability and synchronic reliability.

Quixotic reliability refers to the situation where a single manner of observation consistently, yet erroneously, yields the same result. It is often a problem when research appears to be going well. This consistency might seem to suggest that the experiment was demonstrating perfect stability reliability. This, however, would not be the case.

For example, if a measuring device used in an Olympic competition always read 100 meters for every discus throw, this would be an example of an instrument consistently, yet erroneously, yielding the same result. However, quixotic reliability is often more subtle in its occurrences than this. For example, suppose a group of German researchers doing an ethnographic study of American attitudes ask questions and record responses. Parts of their study might produce responses which seem reliable, yet turn out to measure felicitous verbal embellishments required for "correct" social behavior. Asking Americans, "How are you?" for example, would in most cases, elicit the token, "Fine, thanks." However, this response would not accurately represent the mental or physical state of the respondents.

Diachronic reliability refers to the stability of observations over time. It is similar to stability reliability in that it deals with time. While this type of reliability is appropriate to assess features that remain relatively unchanged over time, such as landscape benchmarks or buildings, the same level of reliability is more difficult to achieve with socio-cultural phenomena.

For example, in a follow-up study one year later of reading comprehension in a specific group of school children, diachronic reliability would be hard to achieve. If the test were given to the same subjects a year later, many confounding variables would have impacted the researchers' ability to reproduce the same circumstances present at the first test. The final results would almost assuredly not reflect the degree of stability sought by the researchers.

Synchronic reliability refers to the similarity of observations within the same time frame; it is not about the similarity of things observed. Synchronic reliability, unlike diachronic reliability, rarely involves observations of identical things. Rather, it concerns itself with particularities of interest to the researcher.

For example, a researcher studies the actions of a duck's wing in flight and the actions of a hummingbird's wing in flight. Despite the fact that the researcher is studying two distinctly different kinds of wings, the action of the wings and the phenomenon produced are the same.

Comments on a Flawed, Yet Influential Study

An example of the dangers of generalizing from research that is inconsistent, invalid, unreliable, and incomplete is found in the Time magazine article, "On A Screen Near You: Cyberporn" (De Witt, 1995). This article relies on a study done at Carnegie Mellon University to determine the extent and implications of online pornography. Inherent to the study are methodological problems of unqualified hypotheses and conclusions, unsupported generalizations and a lack of peer review.

Ignoring the functional problems that manifest themselves later in the study, it seems that there are a number of ethical problems within the article. The article claims to be an exhaustive study of pornography on the Internet, but it was anything but exhaustive; it resembles a case study more than anything else. Marty Rimm, author of the undergraduate paper that Time used as a basis for the article, claims the paper was an "exhaustive study" of online pornography when, in fact, the study based most of its conclusions about pornography on the Internet on the "descriptions of slightly more than 4,000 images" (Meeks, 1995, p. 1). Some USENET groups see hundreds of postings in a day.

Considering the thousands of USENET groups, 4,000 images no longer carries the authoritative weight that its author intended. The real problem is that the study (an undergraduate paper similar to a second-semester composition assignment) was based not on pornographic images themselves, but on the descriptions of those images. This kind of reduction detracts significantly from the integrity of the final claims made by the author. In fact, this kind of research is commensurate with doing a study of the content of pornographic movies based on the titles of the movies, then making sociological generalizations based on what those titles indicate. (This is obviously a problem with a number of types of validity, because Rimm is not studying what he thinks he is studying, but instead something quite different.)

The author of the Time article, Philip Elmer De Witt, writes, "The research team at CMU has undertaken the first systematic study of pornography on the Information Superhighway" (Godwin, 1995, p. 1). His statement is problematic in at least three ways. First, the research team actually consisted of a few of Rimm's undergraduate friends with no methodological training whatsoever. Additionally, no mention of the degree of interrater reliability is made. Second, this "systematic" study is actually merely a "non-randomly selected subset of commercial bulletin-board systems that focus on selling porn" (Godwin, p. 6). As pornography vending is actually just a small part of the whole concerning the use of pornography on the Internet, the entire premise of this study's content validity is firmly called into question. Finally, the use of the term "Information Superhighway" is a false assessment of what in actuality is only a few USENET groups and BBSs (Bulletin Board Systems), which make up only a small fraction of the entire "Information Superhighway" traffic. Essentially, what we have here is yet another violation of content validity.

De Witt is quoted as saying: "In an 18-month study, the team surveyed 917,410 sexually-explicit pictures, descriptions, short-stories and film clips. On those USENET newsgroups where digitized images are stored, 83.5 percent of the pictures were pornographic" (De Witt, p. 40).

Statistically, some interesting contradictions arise. The figure 917,410 was taken from adult-oriented BBSs--none came from actual USENET groups or the Internet itself. This is a glaring discrepancy. Out of the 917,410 files, 212,114 are only descriptions (Hoffman & Novak, 1995, p. 2). The question is, how many actual images did the "researchers" see?

"Between April and July 1994, the research team downloaded all available images (3,254)...the team encountered technical difficulties with 13 percent of these images...This left a total of 2,830 images for analysis" (p. 2). This means that out of 917,410 files discussed in this study, 914,580 of them were not even pictures! As for the 83.5 percent figure, this is actually based on "17 alt.binaries groups that Rimm considered pornographic" (p. 2).

In real terms, 17 USENET groups is a fraction of a percent of all USENET groups available. Worse yet, Time claimed that "...only about 3 percent of all messages on the USENET [represent pornographic material], while the USENET itself represents 11.5 percent of the traffic on the Internet" (De Witt, p. 40).

Time neglected to carry the interpretation of this data out to its logical conclusion, which is that less than half of 1 percent (3 percent of 11 percent) of the images on the Internet are associated with newsgroups that contain pornographic imagery. Furthermore, of this half percent, an unknown but even smaller percentage of the messages in newsgroups that are 'associated with pornographic imagery', actually contained pornographic material (Hoffman & Novak, p. 3).
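
The arithmetic behind Hoffman and Novak's conclusion is easy to verify:

```python
usenet_share_of_internet = 0.115  # USENET as a share of Internet traffic (Time's figure)
porn_share_of_usenet = 0.03       # share of USENET messages called pornographic

# "less than half of 1 percent" of Internet traffic:
print(f"{porn_share_of_usenet * usenet_share_of_internet:.3%}")  # -> 0.345%
```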

Another blunder can be seen in the avoidance of peer review, which suggests that there were political interests being served in having the study become a Time cover story. Marty Rimm contracted with the Georgetown Law Review and Time in an agreement to publish his study as long as they kept it under lock and key. During the months before publication, many interested scholars and professionals tried in vain to obtain a copy of the study in order to check it for flaws. De Witt justified not letting such peer review take place, and also justified the reliability and validity of the study, on the grounds that because the Georgetown Law Review had accepted it, it was therefore reliable and valid and needed no peer review. What he didn't know was that law reviews are not edited by professionals, but by "third year law students" (Godwin, p. 4).

There are many consequences of the failure to subject such a study to the scrutiny of peer review. If it was Rimm's desire to publish an article about online pornography in a manner that legitimized his article, yet escaped the kind of critical review the piece would have to undergo if published in a scholarly journal of computer science, engineering, marketing, psychology, or communications, what better venue than a law journal? A law journal article would have the added advantage of being taken seriously by law professors, lawyers, and legally-trained policymakers. By virtue of where it appeared, it would automatically be catapulted into the center of the policy debate surrounding online censorship and freedom of speech (Godwin).

Herein lies the dangerous implication of such a study: Because the questions surrounding pornography are of such immediate political concern, the study was placed in the forefront of the U.S. domestic policy debate over censorship on the Internet, (an integral aspect of current anti-First Amendment legislation) with little regard for its validity or reliability.

On June 26, the day the article came out, Senator Grassley (co-sponsor of the anti-porn bill, along with Senator Dole) began drafting a speech that was to be delivered that very day in the Senate, using the study as evidence. The same day, at the same time, Mike Godwin posted on WELL (Whole Earth 'Lectronic Link, a forum for professionals on the Internet) what turned out to be the understatement of the year: "Philip's story is an utter disaster, and it will damage the debate about this issue because we will have to spend lots of time correcting misunderstandings that are directly attributable to the story" (Meeks, p. 7).

As Godwin was writing this, Senator Grassley was speaking to the Senate: "Mr. President, I want to repeat that: 83.5 percent of the 900,000 images reviewed--these are all on the Internet--are pornographic, according to the Carnegie-Mellon study" (p. 7). Several days later, Senator Dole was waving the magazine in front of the Senate like a battle flag.

Donna Hoffman, professor at Vanderbilt University, summed up the dangerous political implications by saying, "The critically important national debate over First Amendment rights and restrictions of information on the Internet and other emerging media requires facts and informed opinion, not hysteria" (p. 1).

In addition to the hysteria, Hoffman sees a plethora of other problems with the study. "Because the content analysis and classification scheme are 'black boxes,'" Hoffman said, "because no reliability and validity results are presented, because no statistical testing of the differences both within and among categories for different types of listings has been performed, and because not a single hypothesis has been tested, formally or otherwise, no conclusions should be drawn until the issues raised in this critique are resolved" (p. 4).

However, the damage has already been done. This questionable research by an undergraduate engineering major has been generalized to such an extent that even the U.S. Senate, and in particular Senators Grassley and Dole, have been duped, albeit through the strength of their own desires to see only what they wanted to see.

Annotated Bibliography

American Psychological Association. (1985). Standards for educational and psychological testing. Washington, DC: Author.

This work focuses on reliability, validity, and the standards that testers need to achieve in order to ensure accuracy.

Babbie, E.R. & Huitt, R.E. (1979). The practice of social research (2nd ed.). Belmont, CA: Wadsworth Publishing.

An overview of social research and its applications.

Beauchamp, T.L., Faden, R.R., Wallace, Jr., R.J. & Walters, L. (1982). Ethical issues in social science research. Baltimore and London: The Johns Hopkins University Press.

A systematic overview of ethical issues in Social Science Research written by researchers with firsthand familiarity with the situations and problems researchers face in their work. This book raises several questions of how reliability and validity can be affected by ethics.

Borman, K.M. et al. (1986). Ethnographic and qualitative research design and why it doesn't work. American Behavioral Scientist, 30, 42-57.

The authors pose questions concerning threats to qualitative research and suggest solutions.

Bowen, K. A. (1996, Oct. 12). The sin of omission - punishable by death to internal validity: An argument for integration of quantitative research methods to strengthen internal validity. Available: http://trochim.human.cornell.edu/gallery/bowen/hss691.htm

An entire Web site that examines the merits of integrating qualitative and quantitative research methodologies through triangulation. The author argues that improving the internal validity of social science will be the result of such a union.

Brinberg, D. & McGrath, J.E. (1985). Validity and the research process. Beverly Hills: Sage Publications.

The authors investigate validity as value and propose the Validity Network Schema, a process by which researchers can infuse validity into their research.

Bussières, J-F. (1996, Oct.12). Reliability and validity of information provided by museum Web sites. Available: http://www.oise.on.ca/~jfbussieres/issue.html

This Web page examines the validity of museum Web sites, which calls into question the validity of Web-based resources in general. It addresses the issue that all Web sites should be examined with skepticism about the validity of the information contained within them.

Campbell, D. T. & Stanley, J.C. (1963). Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin.

An overview of experimental research that includes pre-experimental designs, controls for internal validity, and tables listing sources of invalidity in quasi-experimental designs. Reference list and examples.

Carmines, E. G. & Zeller, R.A. (1991). Reliability and validity assessment. Newbury Park: Sage Publications.

An introduction to research methodology that includes classical test theory, validity, and methods of assessing reliability.

Carroll, K. M. (1995). Methodological issues and problems in the assessment of substance use. Psychological Assessment, 7(3), 349-358.

Discusses methodological issues in research involving the assessment of substance abuse. Introduces strategies for avoiding problems with the reliability and validity of methods.

Connelly, F. M. & Clandinin, D.J. (1990). Stories of experience and narrative inquiry. Educational Researcher, 19(5), 2-12.

A survey of narrative inquiry that outlines criteria, methods, and writing forms. It includes a discussion of risks and dangers in narrative studies, as well as a research agenda for curricula and classroom studies.

De Witt, P.E. (1995, July 3). On a screen near you: Cyberporn. Time, 38-45.

The Time cover story based on Marty Rimm's Carnegie Mellon study of online pornography, critiqued at length above.

Fink, A., ed. (1995). The survey handbook (Vol. 1). Thousand Oaks, CA: Sage.

A guide to surveys; this is the first in a series referred to as the "survey kit". It includes bibliographical references and addresses survey design, analysis, reporting, and how to measure the validity and reliability of surveys.

Fink, A., ed. (1995). How to measure survey reliability and validity (Vol. 7). Thousand Oaks, CA: Sage.

This volume addresses how to select and apply reliability criteria and how to select and apply validity criteria. The fundamental principles of scaling and scoring are considered.

Godwin, M. (1995, July). JournoPorn, dissection of the Time article. Available: http://www.hotwired.com

A detailed critique of Time magazine's Cyberporn, outlining flaws of methodology as well as exploring the underlying assumptions of the article.

Hambleton, R.K. & Zaal, J.N., eds. (1991). Advances in educational and psychological testing. Boston: Kluwer Academic.

Information on the concepts of reliability and validity in psychology and education.

Harnish, D.L. (1992). Human judgment and the logic of evidence: A critical examination of research methods in special education transition literature. In D.L. Harnish et al., eds., Selected readings in transition.

This article investigates threats to validity in special education research.

Haynes, N. M. (1995). How skewed is 'the bell curve'? Book Product Reviews, 1-24.

This paper claims that R.J. Herrnstein and C. Murray's The Bell Curve: Intelligence and Class Structure in American Life does not have scientific merit and claims that the bell curve is an unreliable measure of intelligence.

Healey, J. F. (1993). Statistics: A tool for social research (3rd ed.). Belmont: Wadsworth Publishing.

Inferential statistics, measures of association, and multivariate techniques in statistical analysis for social scientists are addressed.

Helberg, C. (1996, Oct. 12). Pitfalls of data analysis (or how to avoid lies and damned lies). Available: http://maddog/fammed.wisc.edu/pitfalls/

A discussion of things researchers often overlook in their data analysis and how statistics are often used to skew reliability and validity for the researcher's purposes.

Hoffman, D. L. and Novak, T.P. (1995, July). A detailed critique of the Time article: Cyberporn. Available: http://www.hotwired.com

A methodological critique of the Time article that uncovers some of the fundamental flaws in the statistics and the conclusions made by De Witt.

Huitt, William G. (1998). Internal and external validity. Available: http://www.valdosta.peachnet.edu/~whuitt/psy702/intro/valdgn.html

A Web document addressing key issues of external and internal validity.

Jones, J. E. & Bearley, W.L. (1996, Oct 12). Reliability and validity of training instruments. Organizational Universe Systems. Available: http://ous.usa.net/relval.htm

The authors discuss the reliability and validity of training design in a business setting. Basic terms are defined and examples provided.

Cultural Anthropology Methods Journal. (1996, Oct. 12). Available: http://www.lawrence.edu/~bradleyc/cam.html

An online journal containing articles on the practical application of research methods when conducting qualitative and quantitative research. Reliability and validity are addressed throughout.

Kirk, J. & Miller, M. M. (1986). Reliability and validity in qualitative research. Beverly Hills: Sage Publications.

This text describes objectivity in qualitative research by focusing on the issues of validity and reliability in terms of their limitations and applicability in the social and natural sciences.

Krakower, J. & Niwa, S. (1985). An assessment of validity and reliability of the institutional performance survey. Boulder, CO: National Center for Higher Education Management Systems.

Addresses educational surveys, higher education research, and the effectiveness of organizations.

Lauer, J. M. & Asher, J.W. (1988). Composition Research. New York: Oxford University Press.

A discussion of empirical designs in the context of composition research as a whole.

Laurent, J. et al. (1992, Mar.). Review of validity research on the Stanford-Binet Intelligence Scale: Fourth Edition. Psychological Assessment, 102-112.

This paper looks at the results of construct and criterion-related validity studies to determine if the SB:FE is a valid measure of intelligence.

LeCompte, M. D., Millroy, W.L., & Preissle, J. eds. (1992). The handbook of qualitative research in education. San Diego: Academic Press.

A compilation of the range of methodological and theoretical qualitative inquiry in the human sciences and education research. Numerous contributing authors apply their expertise to discussing a wide variety of issues pertaining to educational and humanities research as well as suggestions about how to deal with problems when conducting research.

McDowell, I. & Newell, C. (1987). Measuring health: A guide to rating scales and questionnaires. New York: Oxford University Press.

This gives a variety of examples of health measurement techniques and scales and discusses the validity and reliability of important health measures.

Meeks, B. (1995, July). Muckraker: How Time failed. Available: http://www.hotwired.com

A step-by-step outline of the events which took place during the researching, writing, and negotiating of the Time article of 3 July, 1995 titled On A Screen Near You: Cyberporn.

Merriam, S. B. (1995). What can you tell from an N of 1?: Issues of validity and reliability in qualitative research. Journal of Lifelong Learning, 4, 51-60.

Addresses issues of validity and reliability in qualitative research for education. Discusses philosophical assumptions underlying the concepts of internal validity, reliability, and external validity or generalizability. Presents strategies for ensuring rigor and trustworthiness when conducting qualitative research.

Morris, L.L., Fitzgibbon, C.T., & Lindheim, E. (1987). How to measure performance and use tests. In J.L. Herman (Ed.), Program evaluation kit (2nd ed.). Newbury Park, CA: Sage.

Discussion of reliability and validity as it pertains to measuring students' performance.

Murray, S., et al. (1979, April). Technical issues as threats to internal validity of experimental and quasi-experimental designs. San Francisco: University of California. 8-12.

(From Yang et al. bibliography--unavailable as of this writing.)

Russ-Eft, D. F. (1980). Validity and reliability in survey research. American Institutes for Research in the Behavioral Sciences, August, 227 151.

An investigation of validity and reliability in survey research, with an overview of the concepts of reliability and validity. Specific procedures for measuring sources of error are suggested, as well as general suggestions for improving the reliability and validity of survey data. An extensive annotated bibliography is provided.

Ryser, G. R. (1994). Developing reliable and valid authentic assessments for the classroom: Is it possible? Journal of Secondary Gifted Education, Fall, 6(1), 62-66.

Defines the meanings of reliability and validity as they apply to standardized measures of classroom assessment. This article defines reliability as scorability and stability, while validity is seen as students' ability to use knowledge authentically in the field.

Schmidt, W., et al. (1982). Validity as a variable: Can the same certification test be valid for all students? Institute for Research on Teaching, July, ED 227 151.

A technical report that presents specific criteria for judging content, instructional and curricular validity as related to certification tests in education.

Scholfield, P. (1995). Quantifying language: A researcher's and teacher's guide to gathering language data and reducing it to figures. Bristol: Multilingual Matters.

A guide to categorizing, measuring, testing, and assessing aspects of language. A source for language-related practitioners and researchers in conjunction with other resources on research methods and statistics. Questions of reliability and validity are also explored.

Scriven, M. (1993). Hard-Won Lessons in Program Evaluation. San Francisco: Jossey-Bass Publishers.

A common sense approach for evaluating the validity of various educational programs and how to address specific issues facing evaluators.

Shou, P. (1993, Jan.). The Singer-Loomis Inventory of Personality: A review and critique. [Paper presented at the Annual Meeting of the Southwest Educational Research Association.]

Evidence for reliability and validity is reviewed. A summary evaluation suggests that the SLIP (developed by two Jungian analysts to allow examination of personality from the perspective of Jung's typology) appears to be a useful tool for educators and counselors.

Sutton, L.R. (1992). Community college teacher evaluation instrument: A reliability and validity study. Diss. Colorado State University.

Studies of reliability and validity in occupational and educational research.

Thompson, B. & Daniel, L.G. (1996, Oct.). Seminal readings on reliability and validity: A "hit parade" bibliography. Educational and Psychological Measurement, 56, 741-745.

Editorial board members of Educational and Psychological Measurement generated this bibliography of definitive publications in measurement research. Many articles are directly related to reliability and validity.

Thompson, E. Y., et al. (1995). Overview of qualitative research. Diss. Colorado State University.

A discussion of strengths and weaknesses of qualitative research and its evolution and adaptation. Appendices and annotated bibliography.

Traver, C. et al. (1995). Case study. Diss. Colorado State University.

This presentation gives an overview of case study research, providing definitions and a brief history and explanation of how to design research.

Trochim, William M. K. (1996). External validity. Available: http://trochim.human.cornell.edu/kb/EXTERVAL.htm

A comprehensive treatment of external validity found in William Trochim's online text about research methods and issues.

Trochim, William M. K. (1996). Introduction to validity. Available: http://trochim.human.cornell.edu/kb/INTROVAL.htm

An introduction to validity found in William Trochim's online text about research methods and issues.

Trochim, William M. K. (1996). Reliability. Available: http://trochim.human.cornell.edu/kb/reltypes.htm

A comprehensive treatment of reliability found in William Trochim's online text about research methods and issues.

Validity. (1996, Oct. 12). Available: http://vislab-www.nps.navy.mil/~haga/validity.html

A source for definitions of various forms and types of reliability and validity.

Vinsonhaler, J. F., et al. (1983, July). Improving diagnostic reliability in reading through training. Institute for Research on Teaching ED 237 934.

This technical report investigates the practical application of a program intended to improve the diagnoses of reading deficient students. Here, reliability is assumed and a pragmatic answer to a specific educational problem is suggested as a result.

Wentland, E. J. & Smith, K.W. (1993). Survey responses: An evaluation of their validity. San Diego: Academic Press.

This book looks at the factors affecting response validity (or the accuracy of self-reports in surveys) and provides several examples with varying accuracy levels.

Wiget, A. (1996). Father Juan Greyrobe: Reconstructing tradition histories, and the reliability and validity of uncorroborated oral tradition. Ethnohistory, 43(3), 459-482.

This paper presents a convincing argument for the validity of oral histories in ethnographic research where at least some of the evidence can be corroborated through written records.

Yang, G. H., et al. (1995). Experimental and quasi-experimental educational research. Diss. Colorado State University.

This discussion defines experimentation and considers the rhetorical issues and advantages and disadvantages of experimental research. Annotated bibliography.

Yarroch, W. L. (1991, Sept.). The implications of content versus validity on science tests. Journal of Research in Science Teaching, 619-629.

The use of content validity as the primary assurance of the measurement accuracy for science assessment examinations is questioned. An alternative accuracy measure, item validity, is proposed to look at qualitative comparisons between different factors.

Yin, R. K. (1989). Case study research: Design and methods. London: Sage Publications.

This book discusses the design process of case study research, including collection of evidence, composing the case study report, and designing single and multiple case studies.

Related Links

Internal Validity Tutorial. An interactive tutorial on internal validity.

http://server.bmod.athabascau.ca/html/Validity/index.shtml

Howell, Jonathan, Paul Miller, Hyun Hee Park, Deborah Sattler, Todd Schack, Eric Spery, Shelley Widhalm, & Mike Palmquist. (2005). Reliability and Validity. Writing@CSU . Colorado State University. https://writing.colostate.edu/guides/guide.cfm?guideid=66

A qualitative simulation checking approach of programmed grounded theory and its application in workers’ involvement: extending Corbin and Strauss’ grounded theory checking mechanism

  • Published: 29 April 2024


Haoran Wang, Bin Hu & Yanting Duan

The validity and reliability of grounded theory research based on interpretivism involve four dimensions: credibility, transferability, dependability, and confirmability. In order to enhance the credibility of a qualitative study, the findings of the grounded theory need to be checked for consistency with reality. Traditional checking approaches lack universal applicability and are difficult for researchers to implement. This paper proposes a qualitative simulation checking approach for programmed grounded theory research, which can enhance the validity and reliability of programmed grounded theory research in the credibility, transferability, and dependability dimensions. This approach is not only a more generally applicable checking approach, but also provides a virtual experiment platform for qualitative research. To overcome the deficiencies caused by following a single approach, a checking framework that integrates programmed grounded theory, qualitative simulation checking, and member checking is proposed. The methodology of this paper is validated by applying it to a case of sanitation workers’ involvement in the Internet of Things environment.



Acknowledgements

The authors would like to thank the participants in the study discussed in this paper as well as the managers of the sanitation work in Shenzhen, China.

This work was supported by the National Natural Science Foundation of China (grant numbers 72371110, 71971093, and 72132001) and the Fundamental Research Funds for the Central Universities (grant number 2023WKZDJC007).

Author information

Authors and affiliations

School of Management, Huazhong University of Science and Technology, 1037 Luoyu Road, Hongshan District, Wuhan, Hubei, People’s Republic of China

Haoran Wang, Bin Hu & Yanting Duan


Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by HW, BH and YD. The first draft of the manuscript was written by HW and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Bin Hu .

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

The study does not include any procedure which involves danger, harm, distress or discomfort to research participants. Participants requested anonymity, and we only required the identification of their job type as well as their position in our study. We used codes to identify individuals. All participants gave verbal consent for us to disclose the codes representing their personal information and the results of this study.


Supplementary Information

Supplementary file 1 (DOCX 137 KB)


About this article

Wang, H., Hu, B. & Duan, Y. A qualitative simulation checking approach of programmed grounded theory and its application in workers’ involvement: extending Corbin and Strauss’ grounded theory checking mechanism. Qual Quant (2024). https://doi.org/10.1007/s11135-024-01864-3


Accepted: 19 February 2024

Published: 29 April 2024

DOI: https://doi.org/10.1007/s11135-024-01864-3

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords

  • Programmed grounded theory
  • Qualitative simulation
  • Member checking
  • Work involvement


Perception of when old age starts has increased over time, shows study

As people get older, they revise the age they consider to be old upwards

None of us are getting any younger, but it appears the age at which we are considered old has moved upwards over the generations.

What’s more, as adults get older, they shift the goalposts further still, a study has shown.

The researchers behind the study said the upward shift could be down to increases in life expectancy and retirement age, as well as other factors.

“We should be aware that conceptions and perceptions of ‘old’ change across historical time, and that people are quite different regarding when they think old age begins, dependent on their age, their birth cohort, but also their health etc,” said Dr Markus Wettstein, co-author of the study, from the Humboldt University of Berlin.

Writing in the journal Psychology and Aging, Wettstein and colleagues report how they analysed responses to the question: “At what age would you describe someone as old?”, which is part of the ongoing German Ageing Survey that follows people born between 1911 and 1974.

The results from 14,056 middle-aged and older adults who answered the question between one and eight times over a 25-year period from 1996, when they were between 40 and 100 years old, reveal that the point at which old age is thought to begin has increased.

“For those born in 1931, the perceived onset of old age is 74 when they are 65. For those born in 1944 it is about 75 years when they are 65 years old,” said Wettstein, adding that while the study could not ask 65-year-olds born in 1911 when they thought old age began, models suggest it would have been at 71.

However, it seems perceptions are stabilising: while the team found people born after 1935 perceived old age as beginning later in life than those born between 1911 and 1935, there was no noticeable difference between those born between 1936 and 1951 and those born between 1952 and 1974.

Further, as people get older, they revise the age they consider to be old upwards.

“This could have to do with the fact that many people do not want to be old, so they postpone the onset of old age,” said Wettstein, adding that that could be related to age stereotypes.

However, it seems those born in later cohorts shift the goalposts to a greater extent: while people born in 1944 revised their notion of old age upwards by 1.9 years on ageing from 64 to 74, those born in 1934 shifted their view by less than a month between these ages.

The team add that while the perceived onset of old age was higher for women than men, and lower for those who had poor health or were more lonely, neither these factors nor education level nor how old participants felt fully explained the findings.

Caroline Abrahams, the charity director at Age UK, said it was well known that people tended to judge “old” as meaning at least a few years beyond their chronological age, even in their 70s and 80s, and that probably reflects the bad image of “old” in western cultures.

“This is a shame if it holds us back from living as full and happy lives as we could and should in our later years, because of us self-limiting our activities and aspirations,” she said.

Instead, Abrahams said the idea that we are “as old as we feel” is a lot more supportive.

“The truth is that chronological age is rarely a good proxy for anything and the sooner we realise that in our society, the better,” she said.




Published on 29.4.2024 in Vol 26 (2024)

Exploring the Impact of In Basket Metrics on the Adoption of a New Electronic Health Record System Among Specialists in a Tertiary Hospital in Alberta: Descriptive Study

Authors of this article:


Original Paper

  • Melita Avdagovska 1, PhD;
  • Craig Kuziemsky 2, PhD;
  • Helia Koosha 1, MSc;
  • Maliheh Hadizadeh 1, PhD;
  • Robert P Pauly 3, MSc, MD;
  • Timothy Graham 4, MSc, MD, CCFP, CHE;
  • Tania Stafinski 1, PhD;
  • David Bigam 5, MD;
  • Narmin Kassam 3, MD;
  • Devidas Menon 1, PhD

1 School of Public Health, University of Alberta, Edmonton, AB, Canada

2 Office of Research Services and School of Business, MacEwan University, Edmonton, AB, Canada

3 Medicine Department, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada

4 Department of Emergency Medicine, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada

5 Surgery Department, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada

Corresponding Author:

Melita Avdagovska, PhD

School of Public Health

University of Alberta

11405 87 Avenue

Edmonton Clinic Health Academy

Edmonton, AB, T6G 1C9

Phone: 1 780 908 3334

Email: [email protected]

Background: Health care organizations implement electronic health record (EHR) systems with the expectation of improved patient care and enhanced provider performance. However, while these technologies hold the potential to create improved care and system efficiencies, they can also lead to unintended negative consequences, such as patient safety issues, communication problems, and provider burnout.

Objective: This study aims to document metrics related to the In Basket communication hub (time in In Basket per day, time in In Basket per appointment, In Basket messages received per day, and turnaround time) of Connect Care (Epic Systems), the EHR system implemented by Alberta Health Services, the province-wide health delivery system. The objective was to identify how the newly implemented EHR system was used, the timing of its use, and the duration of use specifically related to In Basket activities.

Methods: A descriptive study was conducted. Due to the diversity of specialties, the providers were grouped into medical and surgical groups based on previous similar studies. The participants were further subgrouped based on their self-reported clinical full-time equivalent (FTE) measure. This resulted in 3 subgroups for analysis: medical FTE ≤0.5, medical FTE >0.5, and surgical (all of whom reported FTE >0.5). The analysis was limited to outpatient clinical interactions and explicitly excluded inpatient activities.

Results: A total of 72 participants from 19 different specialties enrolled in this study. The providers had, on average, 8.31 appointments per day during the reporting periods. The providers received, on average, 21.93 messages per day; they spent an average of 7.61 minutes per day in In Basket (time in In Basket per day) and 1.84 minutes per appointment (time in In Basket per appointment). The time for the providers to mark messages as done (turnaround time) was, on average, 11.45 days during the reporting period. Although the surgical group had, on average, approximately twice as many appointments per scheduled day, they spent considerably less connected time (based on almost all time metrics) than the medical group. However, the surgical group took much longer than the medical group to mark messages as done (turnaround time).

Conclusions: We observed a range of patterns with no consistent direction. There does not seem to be evidence of a “learning curve,” which would have shown a consistent reduction in time spent on the system over time due to familiarity and experience. While this study does not show how the included metrics could be used as predictors of providers’ satisfaction or feelings of burnout, the use trends could be used to start discussions about future Canadian studies needed in this area.

Introduction

Electronic health record (EHR) systems have been implemented with many goals including streamlining information sharing among providers, empowering patients to be active partners in their care, supporting evidence-based individualized care, and monitoring population health. Health care organizations implement EHR systems with the expectation of improved patient care and enhanced provider performance [ 1 , 2 ].

EHR systems are not new in Canada [ 3 ]; however, their implementation has been faced with delays, changes in vendors, and reluctant adoption by users [ 4 ]. Canada continues to see activity in EHR implementation including in British Columbia [ 5 ], Saskatchewan [ 6 ], Ontario [ 7 ], Alberta (Connect Care; Epic Systems) [ 8 ], and Nova Scotia [ 9 ].

While EHR systems hold the potential to improve care delivery, they can also contribute to unintended negative consequences, such as patient safety issues, communication problems, and provider burnout [ 10 - 13 ]. Rather than implementing EHR systems and waiting to identify unintended consequences, we should proactively identify metrics to measure the impact of these EHR systems on the work of health care providers and enable ways to improve the diffusion and the subsequent adoption of EHR systems in Canada [ 9 - 12 ].

Advantages of the EHR System

EHRs are systems designed “to collect patient data in real time to enhance care by providing data at the provider’s fingertips and enabling decision-making where it needs to occur” [14]. These systems provide functions such as viewing (eg, laboratory or test results); documenting (eg, entering data and notes); ordering (eg, referrals, prescriptions, and tests); web-based messaging (eg, notifying patients of test results); care management (eg, disease-specific tools and allergy alerts); analysis and reporting; and patient-directed engagement capabilities (eg, access to one’s own laboratory values and web-based messaging with care providers) [15-20]. EHRs can provide benefits such as easy access to accurate and timely point-of-care data, easy navigation to enhance workflow, automation of mundane tasks, evidence-based management pathways to individualize care, convenient sharing of data across organizations, and population health monitoring [13,19,21]. Furthermore, when fully implemented, EHRs have in some instances comprehensively replaced traditional paper charts [13].

Burden of EHR Systems

Although EHR systems are designed to deliver positive outcomes, unintended and technology-specific negative outcomes have also been described related to the workflow, patient-provider interactions (technology seen as impersonal), and the challenges of implementing these technologies within current health care systems [ 10 ]. There is increasing awareness that physician well-being has an important impact on the health system, and concerns exist over increasing rates of burnout [ 11 ], job dissatisfaction, intention to leave practice, and job turnover [ 22 ]. Among the consistently reported drivers of burnout and job dissatisfaction are adverse clinician interactions with EHRs. While EHRs are intended to streamline workflows, they are cited as increasing the inefficiency of clinical work, adding to user frustration [ 23 ].

Studies suggest that in clinical environments with an integrated EHR system, physicians spend an additional 1.5 hours [16] using the EHR system for every 1 hour of direct patient interaction, plus a cumulative 6 to 30 hours [17,24] per month on EHR documentation and inbox management outside routine working hours [25]. Furthermore, studies have demonstrated a negative impact on providers’ time from managing test results and communications within In Basket, the EHR system’s communication hub [24,26], and from administrative, clerical, and documentation functions and after-hours activities required to complete tasks [23]. In Basket is where health care providers receive and manage various tasks, web-based messages, and notifications such as appointment requests, medication refill requests, laboratory or imaging results, consultation requests, and web-based messages from other health care team members [27]. Many of these tasks, previously performed by administrative staff, have been increasingly offloaded to providers.

In a primary care setting, a health care provider may receive anywhere from a few to several dozen In Basket messages per day [ 28 ]. In specialized settings, such as a hospital or specialized clinic, the volume of In Basket messages can be higher, especially for providers who are involved in complex cases or have a larger patient population [ 29 , 30 ]. Specialists may receive additional types of web-based messages, such as interdepartmental consult requests or referrals from primary care providers. Due to these challenges, the time spent in In Basket activities can vary depending on factors such as the volume of web-based messages, complexity of tasks, and individual work practices [ 26 ]. The number of In Basket messages received per day can vary significantly depending on several factors, such as the size of the health care organization, the specialty or department, and the individual provider’s practice [ 31 ]. Managing In Basket time requires a balance between efficiency, prioritization, and effective communication to ensure timely and appropriate handling of tasks while delivering quality patient care [ 26 ]. Health care organizations have attempted to manage these challenges with one-on-one provider training, optimization and upgrade of processes, increased availability of technical support, added or expanded use of scribes, voice recognition, and improved EHR governance [ 32 , 33 ].

While physician burnout and distress, more broadly, are prevalent issues in Canada [ 34 ], to our knowledge, research on EHR use and physician well-being in a Canadian context has been limited, so the relationship of EHRs to surrogate outcomes of well-being (eg, burnout) in the Canadian context is unknown. This knowledge gap is significant, given the substantial disparities between the Canadian and American health care systems, particularly concerning documentation requirements for billing, insurance, and medicolegal purposes, which EHR systems are designed to streamline [ 32 ]. Notably, while EHR system implementation is underway in Canadian hospitals, most US hospitals have already adopted these EHR systems [ 35 , 36 ].

Canada operates under a federated system in which health governance is federal, but health care delivery and associated tasks, such as EHR implementation, are managed provincially. Moreover, unlike in the United States, where much of the charting is focused on billing, the Canadian health care landscape differs substantially: billing is not a primary driver of EHR implementation [35,36]. These contextual distinctions between the 2 health care systems mean that research findings from one setting cannot simply be extrapolated to the other [13,21,23,24,26,37].

The extent to which EHRs contribute to physician dissatisfaction in Canada, akin to their presumed impact in the United States, remains uncertain. This study lays the groundwork for addressing this gap in knowledge by studying the use metrics of an EHR system implementation in Alberta. This study provides essential insights that not only pave the way for future investigations into the correlation between clinician well-being and EHRs in Canadian contexts but also inform interventional studies aimed at enhancing the user experience. In addition, our findings contribute to the development of best practices for EHR system implementation and use.

This study documented In Basket metrics of an EHR system implemented by Alberta Health Services (AHS), the province-wide health delivery system in Alberta branded as Connect Care. Understanding and documenting granular use metrics of Connect Care in Alberta is foundational for future studies examining the relationship between clinician wellness and EHRs in a Canadian setting, for example, understanding how the EHR-clinician interface contributes to adverse unintended consequences such as burnout. Through gaining an in-depth understanding of how the EHR system in Alberta captures In Basket metrics, this study was designed as a precursor to forthcoming studies examining the association between provider wellness and EHR systems in Canadian settings and to studies focused on improving the EHR system’s user experiences and developing best practices for the EHR system rollout and subsequent use.

Study Design

This was a descriptive study of a volunteer cohort of multidisciplinary specialists working at the University of Alberta Hospital (UAH) [38]. As the goal was to measure trends in In Basket use by UAH specialists over 33 months from the launch of Connect Care, this design allowed us to begin identifying how, when, and for how long the EHR system was used for In Basket activities. To the best of our knowledge, this study is the first to explore the In Basket use of Connect Care by specialists in Alberta.

Explored In Basket Domain Measures (Metrics)

In Basket metrics refer to performance indicators that assess the efficiency and effectiveness of managing tasks, web-based messages, or alerts within the EHR system. These metrics assess various aspects of workflow management and communication within the EHR environment. In this study, we used the following In Basket metrics to capture the use of the In Basket toolbar by study participants: time in In Basket per day, time in In Basket per appointment, In Basket messages received per day, and turnaround time. The time in In Basket per day metric is defined as the average number of minutes a provider spends in In Basket per day. The time in In Basket per appointment metric is the average number of minutes a provider spends in In Basket per scheduled appointment. The In Basket messages received per day metric is the average number of In Basket messages a provider receives per day. The turnaround time metric is the average number of days a provider takes to mark a message of a specific type as done. Furthermore, the In Basket metrics included appointments per day, the average number of appointments per day within the reporting period, used to compare workload with use across the participating specialists.
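To make these definitions concrete, the sketch below computes the four In Basket metrics plus appointments per day from a toy activity log. The table layout and column names are assumptions for illustration only; they are not the schema that Signal exports.

```python
# Hypothetical sketch of the In Basket metrics defined above; the DataFrame
# layout and column names are assumed for illustration, not Signal's schema.
import pandas as pd

days = pd.DataFrame({
    "provider": ["A", "A", "B"],
    "inbasket_minutes": [9.0, 6.0, 4.0],   # minutes in In Basket that day
    "messages_received": [25, 18, 12],     # In Basket messages received that day
    "appointments": [7, 5, 14],            # scheduled appointments that day
})
messages = pd.DataFrame({
    "provider": ["A", "A", "B"],
    "days_to_done": [3, 10, 22],           # days until the message was marked "done"
})

metrics = days.groupby("provider").agg(
    time_in_inbasket_per_day=("inbasket_minutes", "mean"),
    inbasket_messages_received_per_day=("messages_received", "mean"),
    appointments_per_day=("appointments", "mean"),
)
# Time in In Basket per appointment: total In Basket minutes over total appointments.
totals = days.groupby("provider")[["inbasket_minutes", "appointments"]].sum()
metrics["time_in_inbasket_per_appointment"] = (
    totals["inbasket_minutes"] / totals["appointments"]
)
# Turnaround time: average number of days to mark a message as done.
metrics["turnaround_time_days"] = messages.groupby("provider")["days_to_done"].mean()
print(metrics)
```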

Ethics Approval

Ethics approval was received from the University of Alberta Health Research Ethics Board (study ID Pro00119194), and operational approval was received from AHS (OA60778, OA60779, and OA60780).

Connect Care is a comprehensive EHR system that allows users to access, generate, and manage documents, laboratory results, text reports, radiology images, notes, prescriptions, referrals, and web-based messages. Furthermore, Connect Care contains advanced auditing capabilities that record the actions of users when accessing the EHR system.

Study Setting

AHS is Canada’s largest integrated provincial health system and is responsible for delivering health services to >4.3 million people. Health care programs and services are offered at >900 facilities throughout the province (eg, hospitals, clinics, continuing care facilities, cancer centers, mental health facilities, and community health sites) [ 39 ]. The UAH is a quaternary care research and teaching hospital in Edmonton, Alberta. This hospital provides a wide range of inpatient and outpatient diagnostic and treatment services [ 40 ]. Study sites within the UAH were selected based on the length of time that they had been using the EHR system. The departments of medicine and surgery at the UAH were part of the first wave of the AHS Connect Care implementation. The specialists in these departments were considered to have used Connect Care for a time period that would provide sufficient use data required for this study.

Study Sample Recruitment

We decided on the following inclusion criteria for potential study participants: (1) any specialist located at the UAH and (2) ≥7 months of Connect Care use.

We used a purposive sampling method to recruit specialists. The clinical coinvestigators (RPP, DB, and NK) introduced and explained the project at departmental meetings. RPP developed a PowerPoint (Microsoft Corp) presentation, which was adapted by DB and NK to fit the context of their respective departments. During these presentations, the coinvestigators started by describing the potential impact of EHRs on provider well-being, the lack of Canadian use data, the need to understand the user experience, and the opportunity for EHR improvement driven by users. Furthermore, potential participants were informed that their individual results from the study would be shared with them. The clinical leads emailed all attending specialists asking them to complete the consent form (using REDCap [Research Electronic Data Capture]; Vanderbilt University) and provide the required information (eg, department, EHR login ID, clinical workload defined by the self-reported fraction of a full-time equivalent [FTE] measure, and work position) for data access.

Data Source

The raw In Basket data were obtained from Signal (an analytical platform developed by Epic Systems Corporation) using EHR user action log data (Epic Systems Corporation, unpublished data, April 2023). The user action log measures the time that the user interacted with the EHR system. The metrics captured in Signal are defined, quantifiable measurements used in reports to summarize information about processes or outcomes (Epic Systems Corporation, unpublished data, September 2020). Information about time spent in particular ambulatory (outpatient) In Basket activities (user action logs) was obtained for each participant from their first login to the EHR system. The analysis was limited to outpatient clinical interactions and explicitly excluded inpatient activities.

Once a specialist agreed to participate in the study, their name, login ID, and study ID were stored in a zipped and encrypted file and sent to the AHS Connect Care and Epic data team through REDCap to retrieve the required event logs data. REDCap is a secure web-based platform hosted by the Women and Children’s Health Research Institute in collaboration with the Northern Alberta Clinical Trials and Research Centre at the University of Alberta. Once the Epic data team reviewed the requested information, data were pulled and transferred to the AHS Connect Care team. The anonymized data were zipped and encrypted before being transferred to the principal investigator for analysis.

Data Description

Participants

A total of 72 participants from 19 different specialties enrolled in this study. Of the 72 providers, 1 (1%) provider was excluded due to an absence of In Basket outpatient ambulatory Signal data. Due to the diversity of specialties, the providers were grouped into a medical group and a surgical group based on previous similar studies and the fact that these categories have similar EHR workflows [ 41 , 42 ].

The participants were further subgrouped based on their self-reported clinical FTE measure. Clinical FTE is a measure used in health care to quantify the work hours of health care providers or clinical staff in relation to a full-time position. This resulted in 3 subgroups for analysis: the medical FTE ≤0.5, medical FTE >0.5, and surgical (all of whom reported FTE >0.5) groups.

In this study, providers in each group are independent of each other (ie, each provider contributes to the weighted means of only 1 group). However, for each In Basket metric, various subsets of providers in a group (ie, the medical FTE ≤0.5, medical FTE >0.5, or surgical group) contribute to the weighted means of various reporting periods.

Missing Values

Once the EHR data for each of the 72 providers were received, we identified missing values. As this study is one of the first to explore providers’ use of Connect Care in the Alberta context, we wanted to gain an in-depth understanding of the missing In Basket outpatient ambulatory provider-related Signal data.

On the basis of discussions with the Epic team, the study team identified 3 reasons for missing values in the data. The first reason was that a participant must be “registered” with Connect Care (AHS), be active, and have logged in to the EHR system and seen at least 1 patient in the reporting period [38]. Second, for the time in In Basket per appointment metric, there was an additional inclusion criterion: the provider needed at least 5 appointments scheduled per week within the reporting period for Signal to capture user interactions in the EHR system [2]. We identified this as an issue because many part-time specialists might have ≤4 appointments per week; for example, if they were on ward duties, they would be managing only inpatients during that time. Although they interacted with the EHR system, no data would be recorded for these metrics. Since inpatient data were not studied, the true extent of EHR system use might be underestimated. The third reason for missing data was that the EHR system did not capture any data for certain metrics for all participating providers during certain months, such as the In Basket messages received per day (missing data for all providers during April 2021, May 2021, July 2021, August 2021, and September 2021) and time in In Basket per appointment (April 2021) metrics. Neither we nor the analysts from Epic Systems could determine the root cause of the missing data.

On the basis of these findings, we used a complete case analysis to address missing values [29]. Observations with a denominator of 0 were excluded, and missing values did not enter the weighted averages. As each In Basket metric was considered individually, a provider had to have at least 1 month of data for a particular In Basket metric to be included in that metric’s analysis.
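As a sketch of this complete case rule, under the same hypothetical table layout as above, observations with a zero denominator are dropped first, and a provider then remains in a metric’s analysis only if at least 1 month of data survives:

```python
# Sketch of the complete case rule (column names assumed, as before): drop
# observations with denominator = 0, then keep a provider in a metric's
# analysis only if at least 1 monthly observation remains.
import pandas as pd

obs = pd.DataFrame({
    "provider": ["A", "A", "B", "C"],
    "month": ["2021-01", "2021-02", "2021-01", "2021-01"],
    "numerator": [54.0, 48.0, 0.0, 30.0],    # eg, total In Basket minutes
    "denominator": [6.0, 6.0, 0.0, 5.0],     # eg, scheduled days that month
})

complete = obs[obs["denominator"] > 0]                       # exclude denominator = 0
months = complete.groupby("provider")["month"].nunique()     # months of usable data
eligible = months[months >= 1].index                         # at least 1 month required
complete = complete[complete["provider"].isin(eligible)]
print(complete)  # provider B drops out of this metric's analysis
```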

Data Ranges

The start date was November 1, 2019 (the date of launch of Connect Care), and the end date was July 30, 2022, for the In Basket metrics.

Depending on the available data for the metrics, the monthly reporting periods included in the analysis ranged between 14 and 33 months. The overall amount of data varied between 1528 (15.92%) observations for the time in In Basket per appointment metric and 2203 (22.95%) observations for the time in In Basket per day metric. The total number of observations for all included metrics was 9598.

Statistical Analysis

Data aggregation, analysis, and visualization were performed using SAS (version 9.4; SAS Institute) and Tableau (version 2021.4.3; Tableau Software, LLC) [ 43 , 44 ]. The numerator and denominator from each metric were used to calculate the weighted daily means of all participants and each group.

A 2-sample t test (2-tailed) was used to compare the weighted daily mean of every metric between the medical FTE >0.5 and medical FTE ≤0.5 groups and between the medical FTE >0.5 and surgical FTE >0.5 groups. A weighted average calculates the mean of a data set while giving each value an importance proportional to its weight. This approach is commonly used in statistical analysis to address fluctuations, manage uneven or distorted data, and ensure that similar data points are represented fairly according to their respective weights.
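A minimal sketch of this group comparison, assuming the per-provider weighted daily means have already been computed; scipy’s ttest_ind stands in here for the SAS procedure actually used, and the numbers are invented:

```python
# Sketch of the two-sample comparison, assuming per-provider weighted daily
# means are already in hand; scipy stands in for the SAS procedure used.
from scipy import stats

medical_fte_high = [9.7, 8.9, 10.2, 9.1]   # hypothetical weighted daily means (minutes)
medical_fte_low = [8.0, 7.6, 8.3, 7.9]

t, p = stats.ttest_ind(medical_fte_high, medical_fte_low)  # two-tailed by default
print(f"t = {t:.2f}, p = {p:.4f}")  # compare p against the .05 threshold
```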

In time-series analysis, such as the one we have conducted, time-weighted averages were used because the time series was not evenly sampled. Ideally, data points in a time series are evenly spaced, such as hourly, daily, or monthly intervals, where each point carries equal weight. However, in our data set, reporting periods were irregular, with varying lengths ranging from 27 to 35 days. Consequently, these reporting periods had different weights. To address this, we converted the reporting periods to a daily scale, ensuring each data point carried equal weight. In summary, a time-weighted average assigns weight to each value based on its duration relative to surrounding points, leading to significantly improved accuracy in the final calculation.
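As a worked example of the time weighting described here, each reporting period’s value is weighted by the number of days it covers; the period lengths and values below are hypothetical:

```python
# Time-weighted average: weight each reporting period's value by the number of
# days the period covers (hypothetical periods of 27-35 days, as in the text).
periods = [
    {"days": 27, "value": 7.2},   # eg, minutes in In Basket per day
    {"days": 35, "value": 8.4},
    {"days": 30, "value": 7.9},
]

weighted_mean = (
    sum(p["days"] * p["value"] for p in periods) / sum(p["days"] for p in periods)
)
print(round(weighted_mean, 2))  # 7.88, vs 7.83 for the unweighted mean
```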

Trend analysis was used to evaluate the use trends over time to determine changes in Connect Care use by the participating providers. A simple moving average (SMA) curve was used to explore the learning curves (changes over time) for each metric [ 45 , 46 ]. A linear trend line was fitted to the SMA curve for each group (ie, medical FTE <0.5, medical FTE >0.5, and surgical groups) based on each included metric to determine the changes in trends (ie, whether the slope increased, decreased, or remained unchanged).
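The sketch below illustrates this trend analysis: smooth a monthly series with a simple moving average, fit a linear trend to the smoothed curve, and classify the slope as increasing, decreasing, or unchanged against the .05 cutoff. The monthly values and window size are invented for illustration:

```python
# Sketch of the trend analysis: smooth a monthly series with a simple moving
# average (SMA), then fit a linear trend to the smoothed curve and test its slope.
import numpy as np
from scipy import stats

series = np.array([7.1, 7.4, 7.2, 7.8, 8.0, 7.9, 8.3, 8.5])  # hypothetical monthly means
window = 3
sma = np.convolve(series, np.ones(window) / window, mode="valid")

x = np.arange(len(sma))
fit = stats.linregress(x, sma)   # slope, intercept, and p value for slope != 0
direction = ("increasing" if fit.slope > 0 else "decreasing") if fit.pvalue < .05 else "unchanged"
print(f"slope = {fit.slope:.4f}, p = {fit.pvalue:.4f}, trend: {direction}")
```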

In all these analyses, a P value of <.05 was considered statistically significant.

Participant Characteristics

In total, 71 providers were included in the analysis. Of these providers, 29 (40%) were women and 43 (60%) were men. The analysis did not compare results by age or gender because the numbers were small. The largest specialty group was internal medicine (n=14, 20%), followed by nephrology (n=10, 14%) and general surgery (n=9, 13%). The least represented specialties were dermatology, intensive care, neurosurgery, and cardiac surgery, at about 1% (n=1) each. Due to the diversity of specialties, the providers were grouped into a medical group (n=53, 75%) and a surgical group (n=18, 25%) based on previous similar studies (Multimedia Appendix 1) [41,42].

Furthermore, the self-reported FTE was used to further subgroup participants. Of the 53 participants in the medical group, 27 (51%) reported FTE ≤0.5 and 26 (49%) reported FTE >0.5. All 18 (100%) surgical specialists reported FTE >0.5. This resulted in 3 subgroups: the medical FTE ≤0.5, medical FTE >0.5, and surgical (all FTE >0.5) groups.

Overall Results

Table 1 shows the weighted daily means for all participating providers (including weighted daily means for the medical and surgical groups) for each metric in this study. Weighted daily means provide a more precise estimate of average appointments per day than a simple average based solely on the number of providers and reporting periods. Because the reporting periods in this study varied in duration, they were assigned different weights based on the number of days within each period, ensuring a more accurate representation of daily appointment averages.

On the basis of the weighted daily means adjustment, each provider had, on average, 8.31 appointments per day during the entire reporting period. The providers received, on average, 21.93 web-based messages per day and spent an average of 7.61 minutes per day in In Basket (time in In Basket per day) and 1.84 minutes per appointment (time in In Basket per appointment). The time for the providers to mark messages as “done” (meaning that they had completed the tasks associated with them; turnaround time) was, on average, 11.45 days during the reporting period. Although the surgical group had, on average, approximately twice as many appointments per scheduled day, they spent considerably less “connected time” (based on almost all time metrics) than the medical group. However, the surgical group took much longer than the medical group to mark messages as done (turnaround time; Table 1).

Table 2 shows the weighted daily means per provider group (ie, the medical FTE ≤0.5, medical FTE >0.5, and surgical groups) for each metric in this study. According to the raw data, the medical FTE ≤0.5 and surgical groups had, on average, more appointments per day during the reporting period than the medical FTE >0.5 group. In addition, all the time metrics indicate that the medical FTE ≤0.5 group spent less time on Connect Care than the medical FTE >0.5 group. The same was observed between the medical FTE >0.5 and surgical groups, except for the turnaround time metric (Table 2).

a Surgical group versus medical FTE >0.5 group comparison: P value=.07.

Trend Analysis

Table 3 presents the results of the trend analysis. All 3 groups had a statistically significant increase in the appointments per day and turnaround time metrics over the study period.

As presented in Table 3, for the medical FTE ≤0.5 group, the appointments per day, In Basket messages received per day, time in In Basket per appointment, and turnaround time metrics showed statistically significant increasing trends over time, while the time in In Basket per day metric remained unchanged. The largest slope for this group was observed for the turnaround time metric, with a value of 0.0055.

For the medical FTE >0.5 group, all metrics showed statistically significant changes (Table 3). This group showed the largest number of statistically significant trend changes among the 3 studied groups. Three metrics (ie, appointments per day, In Basket messages received per day, and time in In Basket per day) had statistically significant increasing trends, while the time in In Basket per appointment metric showed a statistically significant change with a negative slope (decreasing trend).

a Increasing: positive slope and P value is statistically significant.

b Unchanged: P value is not statistically significant.

c Decreasing: negative slope and P value is statistically significant.

For the surgical group, the appointments per day and turnaround time metrics showed a statistically significant increasing trend, while the time in In Basket per day and time in In Basket per appointment metrics showed a statistically significant decreasing trend.

Although there were increasing and decreasing patterns among the included metrics, there were no obvious patterns across metrics and among groups. Therefore, there does not seem to be evidence of a “learning curve,” which would have shown a consistent reduction in time spent in the EHR system over time due to familiarity and experience.

Findings by Metric

The following sections describe the findings for each metric.

Appointments Per Day

During the reporting period, the weighted daily average number of appointments per day was 8.31 (95% CI 8.27-8.35) for all providers. For the medical group, the daily weighted average was 6.41 (95% CI 6.39-6.44), while for the surgical group, this number was 14.01 (95% CI 13.93-14.10) appointments per day. The weighted daily mean for the medical FTE ≤0.5 group (mean 6.47, 95% CI 6.44-6.49), compared to the mean for the medical FTE >0.5 group (mean 6.36, 95% CI 6.33-6.39), was significantly different ( Multimedia Appendix 2 ).

Although the slope changes were subtle, the SMA trends for the appointments per day metric for all 3 groups were statistically increasing over time ( Multimedia Appendix 2 ).

In Basket Messages Received Per Day

The weighted daily mean of web-based messages received was 21.93 (95% CI 21.64-22.22) messages for all 71 providers. The weighted daily mean for the medical FTE >0.5 group was significantly larger than that for the medical FTE ≤0.5 group. Furthermore, the difference between the weighted daily mean values of the medical FTE >0.5 group (mean 23.29, 95% CI 22.70-23.57) and the surgical group (mean 21.70, 95% CI 21.45-21.94; P<.001) was statistically significant (Multimedia Appendix 2).

In June 2021, Signal data recorded that 1 particular specialist received an unusually large number of In Basket messages. After an examination, it was determined that this was due to the EHR system sending a batch of all laboratory results from many patients to this particular medical specialist, who was probably on call. The spike from this individual’s data is reflected in the 2 graphs related to the medical group (FTE >0.5) and the graph for all providers ( Multimedia Appendix 2 ). While this particular case may be seen as an outlier, it serves as an illustration of what can potentially happen within an EHR system. Instances like this one may not be uncommon.

According to the SMA trend analysis, both medical groups experienced statistically significant increasing trends in this metric, while the surgical group’s trend remained statistically unchanged. The trend change was much more pronounced for the medical FTE >0.5 group (slope=0.0194) than for the medical FTE ≤0.5 group (slope=0.0047). Notably, for the medical FTE >0.5 group, this metric had the largest slope and changed the fastest over time (Multimedia Appendix 2). This might have been because of the “anomaly” of a single physician in the medical FTE >0.5 group receiving a very large number of messages, as described in the previous paragraph. Furthermore, these results illustrate situations that can arise within the EHR system and need to be recognized.

Time in In Basket Per Day

The weighted daily mean for all providers was 7.61 (95% CI 7.59-7.64) minutes in In Basket per day. The weighted daily mean for the medical group was 8.86 (95% CI 8.84-8.89) minutes in In Basket per day, while that for the surgical group was 3.95 (95% CI 3.92-3.97) minutes per day. The medical FTE ≤0.5 group’s weighted daily mean was 7.98 (95% CI 7.95-8.02) minutes in In Basket per day, and the weighted daily mean for the medical FTE >0.5 group ( P <.001) was 9.73 (95% CI 9.68-9.77) minutes per day. The surgical group spent less time in In Basket per day than the medical FTE >0.5 group (mean 3.95, 95% CI 3.92-3.97, vs mean 9.73, 95% CI 9.68-9.77; P <.001; Multimedia Appendix 2 ).

On the basis of the trend analysis, the medical FTE >0.5 group showed a statistically significant increasing trend for this metric, while the surgical group showed a statistically significant decreasing trend and the medical FTE ≤0.5 group’s trend remained statistically unchanged. While the trend analysis result for this metric differed by group, it is important to note that the slopes for each group were very small and clinically insignificant (Multimedia Appendix 2).

Time in In Basket Per Appointment

After analyzing 1528 observations related to the time that providers spent in In Basket per appointment, the total average time for the surgical and medical groups combined was 1.84 (95% CI 1.83-1.85) minutes. The surgical group spent 0.60 (95% CI 0.59-0.61) minutes, while the medical group spent 2.78 (95% CI 2.77-2.79) minutes. The weighted daily mean for the medical FTE ≤0.5 group was significantly different from that for the medical FTE >0.5 group (mean 2.69, 95% CI 2.68-2.71, vs mean 2.88, 95% CI 2.86-2.90; P<.001). Furthermore, a significant difference was observed when comparing the medical FTE >0.5 group (mean 2.88, 95% CI 2.86-2.90) and the surgical group (mean 0.60, 95% CI 0.59-0.61; P<.001; Multimedia Appendix 2).

For the time in In Basket per appointment metric, the medical FTE ≤0.5 group was the only group that showed a statistically significant increase in use over time. The other 2 groups showed a statistically significant decrease in use over time for this metric (Multimedia Appendix 2).

Turnaround Time

Turnaround time is a metric group under the In Basket category within Signal. It reports the average number of days a provider takes to mark a message of a specific type as “done.” According to the data, the surgical group spent 16.22 (95% CI 14.69-17.76) days on average to mark messages as done. The medical group spent, on average, 9.72 (95% CI 9.21-10.23) days to mark messages as done ( Multimedia Appendix 2 ). For this metric, a significant difference was observed when comparing the 2 medical groups and between the medical FTE >0.5 and the surgical group. The study team was unable to identify the reasons for the delays.

For this metric, spikes in recorded data were observed ( Multimedia Appendix 2 ) for 2 study participants (1 medical and 1 surgical specialist) over several months, indicating extremely long delays in marking received messages as “done.” An explanation for these anomalies in data capture within the turnaround time metric remains elusive. Once more, we encounter an outlier; nonetheless, it serves as an example of potential EHR system use scenarios.

On the basis of the SMA trend analysis, all 3 groups experienced statistically increasing trends over time for their turnaround time metric ( Multimedia Appendix 2 ). The largest slope (0.0175) belonged to the surgical group and the smallest slope (0.0055) belonged to the medical FTE ≤0.5 group for this metric.

Principal Findings

Implementing Connect Care by AHS has transformed how providers capture and share information by establishing changes to workflows, processes, and charting approaches [ 47 ]. While the overall objective is to establish uniformity in the EHR system’s use, this study has revealed disparities in the timing of task completion within the EHR system. Furthermore, in certain cases, outliers have emerged whose use patterns are not easily explained with the existing data. This study revealed significant gaps in our understanding of EHRs and In Basket management, highlighting the need for further exploration and comprehension in these areas.

Khairat et al [48] evaluated the time spent by general and specialist pediatricians performing clinical documentation and In Basket tasks outside work hours. Specialists spent more time in the EHR system, and “this may be because specialists see more complex patients and, therefore, need more time to review the patient chart and to respond to In Basket messages” [48]. Although we cannot say what percentage of their workload the providers in our study spent on In Basket activities, we identified that they spent 7.61 minutes per day in In Basket and 1.84 minutes per appointment. According to the raw data, the medical FTE ≤0.5 and surgical groups had, on average, more appointments per day during the reporting period than the medical FTE >0.5 group. It would be valuable to explore the main workflow drivers of In Basket time and try to optimize efficiency in this area for all specialties.

The proportion of time spent in the EHR system based on the included metrics was similar between the providers within the medical groups (FTE ≤0.5 and FTE >0.5); however, little can be concluded about the similarities or differences in use due to the high variability within the specialties. Although the data analysis showed statistical significance for all metrics, it is apparent that FTE made little practical difference to the workload of medical providers working part time (FTE ≤0.5) or full time (FTE >0.5). Without first comparing part-time with full-time medical providers, we could not have definitively attributed the observed differences between the medical and surgical groups to some medical providers working part time or to all surgical providers working full time. Our study did not reveal important differences in In Basket metrics among medical specialists regardless of clinical FTE, whereas significant differences were observed between medical and surgical colleagues. Presumably, these differences relate to broad differences between medical and surgical consultation and their associated workflows.

When comparing the 3 groups, the medical FTE >0.5 group was more “connected” than the medical FTE ≤0.5 group and the surgical group when considering the time in In Basket per day and time in In Basket per appointment metrics. Although the surgical group treated more patients (on average, 14.01 appointments per day during the reporting period), they spent less time in In Basket per day and per appointment, so they were “less connected” than the 2 medical groups. Nonetheless, while the data do not provide a direct explanation for these differences, they do provide insight into the structuring of workloads. This insight is crucial for comprehending how various professionals use the EHR system and identifying areas where workflow enhancements could prove beneficial.

We identified several providers whose data were outliers in terms of EHR use. For example, 2 providers took inordinately long to mark received web-based messages as done (turnaround time), which affected the data on between-group differences. Furthermore, one medical provider received an extremely high number of web-based messages in June 2021. Such outliers demonstrate that certain scenarios can significantly influence the averages of various metrics, leading to skewed results. This underscores the possibility that data generated by the EHR system may not always be accurate, emphasizing the need for discussions with EHR system vendors regarding EHR functionality and measures to reduce outlier occurrences. Future research with a more robust statistical approach should be conducted to delve deeper into addressing and mitigating anomalies in the data.
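One possible robust approach, offered purely as a sketch and not as the authors’ method, is to flag observations whose robust z-score (based on the median absolute deviation) exceeds a threshold before computing group averages; the daily message counts below are invented:

```python
# One possible (not the authors') way to flag anomalies such as the June 2021
# message spike: a robust z-score based on the median absolute deviation (MAD).
import numpy as np

def flag_outliers(values, threshold=3.5):
    """Return a boolean mask marking values whose robust z-score exceeds threshold."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    robust_z = 0.6745 * (values - med) / mad   # 0.6745 scales MAD to sigma under normality
    return np.abs(robust_z) > threshold

daily_messages = [22, 19, 25, 21, 240, 23]     # hypothetical; 240 mimics the batch spike
print(flag_outliers(daily_messages))           # [False False False False  True False]
```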

One factor that we identified in our study is that Connect Care did not capture all interactions due to various vendor-imposed rules (eg, the requirement of at least 5 scheduled appointments per week within a reporting period). Similarly, Cohen et al [49] identified issues with vendor-derived metrics: different vendors calculated the same activities in different ways, and not all EHR vendors drew information from audit log data, which made it impossible to capture the whole picture of a provider’s interaction with the EHR system [49]. Therefore, using only vendor-derived metrics may miss important aspects of the true impact of the EHR system on users. In the study by Cohen et al [49], 1 participant stated that “if different EHR (vendors) are attacking the issue differently, you will get variations not related to burden but just how the math is done.” Documentation time for In Basket use must be captured completely with the intent to understand how In Basket contributes to the overall workload of providers. If EHR systems are being associated with burnout, In Basket messages could be a starting point for common ground around the discussion of how web-based messages should be delivered and managed [50].

Future Directions

On the basis of the results of this study, we identified several future studies that could build on this work. This study was descriptive and did not explore the correlation between the included metrics and provider satisfaction or burnout due to EHR system use. The next step would be to conduct a study exploring the circumstances behind individual providers’ EHR data. It would be valuable to identify the main workflow drivers of In Basket time and to optimize efficiency in this area for all specialties. A qualitative study should also be conducted to explore the variance between actual and perceived EHR system use.

Furthermore, future studies should focus on the difference between providers with part-time and full-time clinical schedules and how that translates into EHR use. This insight is key to understanding how various professions use the EHR system and to identifying areas where workflow enhancements could prove beneficial. Moreover, future research should explore EHR use across specialties and whether specialty shapes EHR use habits, and studies should examine the association between other metrics and quality outcomes. Finally, future studies need to develop strategies for EHR data quality appraisal. We identified that the data generated by the EHR system may not always be accurate, which emphasizes the need for discussions with EHR vendors regarding EHR functionality; future studies with a more robust statistical approach may be able to delve deeper into addressing and mitigating anomalies in the data.
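As a hedged illustration of what automated EHR data quality appraisal could look like, the sketch below runs a few basic checks (missingness, range, plausibility, duplicates) over a table of per-provider metrics. The column names and limits are hypothetical assumptions, not actual Connect Care fields.

```python
# Minimal sketch of EHR metric quality checks; column names and limits
# are hypothetical, not actual Connect Care fields.
import pandas as pd

def appraise(df: pd.DataFrame) -> dict:
    """Return simple quality flags for a table of per-provider metrics."""
    fte_known = df["fte"].notna()
    return {
        "missing_fte": int((~fte_known).sum()),
        "fte_out_of_range": int((fte_known & ~df["fte"].between(0.0, 1.0)).sum()),
        "negative_inbasket_minutes": int((df["inbasket_min_per_day"] < 0).sum()),
        # Assumed plausibility cutoff: more than 12 h/day in the In Basket.
        "implausible_inbasket_minutes": int((df["inbasket_min_per_day"] > 720).sum()),
        "duplicate_provider_months": int(df.duplicated(["provider_id", "month"]).sum()),
    }

df = pd.DataFrame({
    "provider_id": [1, 1, 2, 2],
    "month": ["2021-06", "2021-07", "2021-06", "2021-06"],
    "fte": [0.5, 0.5, None, 1.2],
    "inbasket_min_per_day": [45.0, 50.0, 900.0, -5.0],
})
print(appraise(df))
```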

Limitations

This study has several limitations. The analyzed data were only the participating providers’ ambulatory (outpatient) data. Inpatient data were not included and might have provided additional information on some of the metrics (inpatient data were unavailable for all metrics included in this study). Another limitation is the underestimation of some metrics based on how Epic defines and captures activity (eg, a provider needs at least 5 appointments scheduled per week within the reporting period, and inbox activities related to phone calls or chart review may not be captured). Furthermore, to address some In Basket issues, a provider may need to access other parts of the EHR system to gather more information or complete some other task (eg, write a prescription) and only then return to the In Basket to sign off. Therefore, the recorded time in the In Basket is a systematic underestimation of the actual time it took to complete a task.

Due to the high variability of specialties (19 in total) and the low number of recruited providers per specialty (between 1 and 14), we were unable to explore and compare differences in EHR use between the specialties. The small number of participants might also have introduced bias related to the reasons for participation. Another limitation is that FTE was self-reported, so providers may have over- or underestimated their clinical schedules. Finally, we did not evaluate the types of web-based messages that the providers received in the In Basket; as this is one of the first studies evaluating Connect Care, we deemed that the focus should be on the overall metrics rather than the submetrics or categories.

Conclusions

This study demonstrated the considerable promise of harvesting EHR data that describe system use and its potential impact on physician workflow. To take full advantage of this, there must be an appropriate understanding of how EHR systems capture and measure provider use. Such an understanding is foundational to forthcoming studies examining the association between provider wellness and EHR systems in Canadian settings, as well as to studies focused on improving EHR user experiences, developing best practices for EHR rollout and subsequent use, and understanding how users and EHR systems interrelate. Although this study does not show how the included metrics could be used as predictors of providers’ satisfaction or burnout, the observed use trends can start discussions about the future Canadian studies needed in this area.

Acknowledgments

The authors would like to acknowledge and thank the providers who participated in this study. Without their help, this research would not have been possible. The authors would also like to thank the staff from Alberta Health Services and Epic Corporation for their willingness to provide the required data. Furthermore, the authors would like to thank the Canadian Institutes of Health Research (grant 180993) for providing funding for this work; without this support, the study would not have been possible.

Conflicts of Interest

None declared.

Appendices: Grouping of included providers according to specialties. Findings by metric.

  • Donnelly C, Janssen A, Vinod S, Stone E, Harnett P, Shaw T. A systematic review of electronic medical record driven quality measurement and feedback systems. Int J Environ Res Public Health. Dec 23, 2022;20(1):200. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Negro-Calduch E, Azzopardi-Muscat N, Krishnamurthy RS, Novillo-Ortiz D. Technological progress in electronic health record system optimization: systematic review of systematic literature reviews. Int J Med Inform. Aug 2021;152:104507. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Boothe C, Bhullar J, Chahal N, Chai A, Hayre K, Park M, et al. The history of technology in nursing: the implementation of electronic health records in Canadian healthcare settings. Can J Nurs Inform. Sep 23, 2023;18(3). [ FREE Full text ]
  • Chang F, Gupta N. Progress in electronic medical record adoption in Canada. Can Fam Physician. Dec 2015;61(12):1076-1084. [ FREE Full text ] [ Medline ]
  • Electronic medical and health records. HealthLink BC. URL: https://tinyurl.com/mrxyuky6 [accessed 2021-03-27]
  • MySaskHealthRecord. eHealth Saskatchewan. URL: https://www.ehealthsask.ca/MySaskHealthRecord/MySaskHealthRecord [accessed 2021-03-27]
  • What’s an EHR? eHealth Ontario. URL: https://ehealthontario.on.ca/en/patients-and-families/ehrs-explained [accessed 2021-10-08]
  • Connect care. Alberta Health Services. URL: https://www.albertahealthservices.ca/cis/cis.aspx [accessed 2021-10-08]
  • Primary health information management and electronic medical records. Government of Nova Scotia. URL: https://novascotia.ca/dhw/primaryhealthcare/PHIM-EMR.asp [accessed 2021-10-08]
  • Buntin MB, Burke MF, Hoaglin MC, Blumenthal D. The benefits of health information technology: a review of the recent literature shows predominantly positive results. Health Aff (Millwood). Mar 2011;30(3):464-471. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Rotenstein LS, Torre M, Ramos MA, Rosales RC, Guille C, Sen S, et al. Prevalence of burnout among physicians: a systematic review. JAMA. Sep 18, 2018;320(11):1131-1150. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Physician burnout. Agency for Healthcare Research and Quality. URL: https://www.ahrq.gov/prevention/clinician/ahrq-works/burnout/index.html [accessed 2022-01-13]
  • National Academies of Sciences, Engineering, and Medicine, Committee on Systems Approaches to Improve Patient Care by Supporting Clinician Well-Being, National Academy of Medicine. Taking Action Against Clinician Burnout: A Systems Approach to Professional Well-being. Washington, DC. National Academies Press; 2019.
  • McGonigle D, Mastrian K. Nursing Informatics and the Foundation of Knowledge. Burlington, MA. Jones & Bartlett Learning; 2017.
  • Ouyang D, Chen JH, Hom J, Chi J. Internal medicine resident computer usage: an electronic audit of an inpatient service. JAMA Intern Med. Feb 2016;176(2):252-254. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Sinsky C, Colligan L, Li L, Prgomet M, Reynolds S, Goeders L, et al. Allocation of physician time in ambulatory practice: a time and motion study in 4 specialties. Ann Intern Med. Dec 06, 2016;165(11):753-760. [ CrossRef ] [ Medline ]
  • Arndt BG, Beasley JW, Watkinson MD, Temte JL, Tuan WJ, Sinsky CA, et al. Tethered to the EHR: primary care physician workload assessment using EHR event log data and time-motion observations. Ann Fam Med. Sep 2017;15(5):419-426. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Islam MM, Poly TN, Li YC. Recent advancement of clinical information systems: opportunities and challenges. Yearb Med Inform. Aug 2018;27(1):83-90. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Piscotty RJ, Tzeng HM. Exploring the clinical information system implementation readiness activities to support nursing in hospital settings. Comput Inform Nurs. Nov 2011;29(11):648-656. [ CrossRef ] [ Medline ]
  • Sheikh A, Sood HS, Bates DW. Leveraging health information technology to achieve the "triple aim" of healthcare reform. J Am Med Inform Assoc. Jul 2015;22(4):849-856. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Menachemi N, Collum TH. Benefits and drawbacks of electronic health record systems. Risk Manag Healthc Policy. 2011;4:47-55. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Degen C, Li J, Angerer P. Physicians' intention to leave direct patient care: an integrative review. Hum Resour Health. Sep 08, 2015;13:74. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Eschenroeder HC, Manzione LC, Adler-Milstein J, Bice C, Cash R, Duda C, et al. Associations of physician burnout with organizational electronic health record support and after-hours charting. J Am Med Inform Assoc. Apr 23, 2021;28(5):960-966. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Akbar F, Mark G, Warton EM, Reed ME, Prausnitz S, East JA, et al. Physicians' electronic inbox work patterns and factors associated with high inbox work duration. J Am Med Inform Assoc. Apr 23, 2021;28(5):923-930. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Berg S. Family doctors spend 86 minutes of “pajama time” with EHRs nightly. American Medical Association. Sep 11, 2017. URL: https:/​/www.​ama-assn.org/​practice-management/​digital/​family-doctors-spend-86-minutes-pajama-time-ehrs-nightly [accessed 2021-10-12]
  • Tai-Seale M, Dillon E, Yang Y, Nordgren R, Steinberg R, Nauenberg T, et al. Physicians' well-being linked to in-basket messages generated by algorithms in electronic health records. Health Aff (Millwood). Jul 2019;38(7):1073-1078. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Baxter SL, Saseendrakumar BR, Cheung M, Savides TJ, Longhurst CA, Sinsky CA, et al. Association of electronic health record Inbasket message characteristics with physician burnout. JAMA Netw Open. Nov 01, 2022;5(11):e2244363. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Lieu TA, Altschuler A, Weiner JZ, East JA, Moeller MF, Prausnitz S, et al. Primary care physicians' experiences with and strategies for managing electronic messages. JAMA Netw Open. Dec 02, 2019;2(12):e1918287. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Nguyen OT, Turner K, Apathy NC, Magoc T, Hanna K, Merlo LJ, et al. Primary care physicians' electronic health record proficiency and efficiency behaviors and time interacting with electronic health records: a quantile regression analysis. J Am Med Inform Assoc. Jan 29, 2022;29(3):461-471. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Cutrona SL, Fouayzi H, Burns L, Sadasivam RS, Mazor KM, Gurwitz JH, et al. Primary care providers' opening of time-sensitive alerts sent to commercial electronic health record InBaskets. J Gen Intern Med. Nov 14, 2017;32(11):1210-1219. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Akbar F, Mark G, Prausnitz S, Warton EM, East JA, Moeller MF, et al. Physician stress during electronic health record inbox work: in situ measurement with wearable sensors. JMIR Med Inform. Apr 28, 2021;9(4):e24014. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Brunken M, Bice C. Achieving EHR satisfaction in any specialty: impact report. KLAS Research. 2019. URL: https://klasresearch.com/archcollaborative/report/achieving-ehr-satisfaction-in-any-specialty/310 [accessed 2024-04-08]
  • Kroth PJ, Morioka-Douglas N, Veres S, Babbott S, Poplau S, Qeadan F, et al. Association of electronic health record design and use factors with clinician stress and burnout. JAMA Netw Open. Aug 02, 2019;2(8):e199609. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • CMA national physician health survey: a national snapshot. Canadian Medical Association. 2018. URL: https://www.cma.ca/cma-national-physician-health-survey-national-snapshot [accessed 2024-04-08]
  • Holmgren AJ, Downing NL, Bates DW, Shanafelt TD, Milstein A, Sharp CD, et al. Assessment of electronic health record use between US and non-US health systems. JAMA Intern Med. Feb 01, 2021;181(2):251-259. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Essén A, Stern AD, Haase CB, Car J, Greaves F, Paparova D, et al. Health app policy: international comparison of nine countries' approaches. NPJ Digit Med. Mar 18, 2022;5(1):31. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Moy AJ, Schwartz JM, Chen R, Sadri S, Lucas E, Cato KD, et al. Measurement of clinical documentation burden among physicians and nurses using electronic health records: a scoping review. J Am Med Inform Assoc. Apr 23, 2021;28(5):998-1008. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Aggarwal R, Ranganathan P. Study designs: part 2 – descriptive studies. Perspect Clin Res. 2019;10(1):34-36. [ CrossRef ]
  • About AHS. Alberta Health Services. URL: https://www.albertahealthservices.ca/about/about.aspx [accessed 2020-08-19]
  • University of Alberta Hospital. University Hospital Foundation. URL: https://givetouhf.ca/university-of-alberta-hospital/ [accessed 2023-01-16]
  • Nath B, Williams B, Jeffery MM, O'Connell R, Goldstein R, Sinsky CA, et al. Trends in electronic health record inbox messaging during the COVID-19 pandemic in an ambulatory practice network in New England. JAMA Netw Open. Oct 01, 2021;4(10):e2131490. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Saag HS, Shah K, Jones SA, Testa PA, Horwitz LI. Pajama time: working after work in the electronic health record. J Gen Intern Med. Sep 9, 2019;34(9):1695-1696. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • SAS/STAT® 14.3 user’s guide high-performance procedures. SAS Institute Inc. 2017. URL: https://support.sas.com/documentation/onlinedoc/stat/143/stathpug.pdf [accessed 2024-04-08]
  • What is tableau? Tableau. URL: https://www.tableau.com/why-tableau/what-is-tableau [accessed 2023-01-16]
  • Dulku K, Toll E, Kwun J, van der Meer G. The learning curve of BiZact™ tonsillectomy. Int J Pediatr Otorhinolaryngol. Jul 2022;158:111155. [ CrossRef ] [ Medline ]
  • Zou YM, Ma Y, Liu JH, Shi J, Fan T, Shan YY, et al. Trends and correlation of antibacterial usage and bacterial resistance: time series analysis for antibacterial stewardship in a Chinese teaching hospital (2009-2013). Eur J Clin Microbiol Infect Dis. Apr 10, 2015;34(4):795-803. [ CrossRef ] [ Medline ]
  • Frequently asked questions. Alberta Health Services. URL: https://www.albertahealthservices.ca/info/Page15938.aspx [accessed 2024-04-08]
  • Khairat S, Zalla L, Gartland A, Seashore C. Association between proficiency and efficiency in electronic health records among pediatricians at a major academic health system. Front Digit Health. Sep 6, 2021;3:689646. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Cohen GR, Boi J, Johnson C, Brown L, Patel V. Measuring time clinicians spend using EHRs in the inpatient setting: a national, mixed-methods study. J Am Med Inform Assoc. Jul 30, 2021;28(8):1676-1682. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Tai-Seale M, Olson CW, Li J, Chan AS, Morikawa C, Durbin M, et al. Electronic health record logs indicate that physicians split time evenly between seeing patients and desktop medicine. Health Aff (Millwood). Apr 01, 2017;36(4):655-662. [ FREE Full text ] [ CrossRef ] [ Medline ]

Abbreviations

EHR: electronic health record
FTE: full-time equivalent

Edited by K Williams; submitted 10.10.23; peer-reviewed by M Zaidi, N Shaw; comments to author 26.01.24; revised version received 12.03.24; accepted 19.03.24; published 29.04.24.

©Melita Avdagovska, Craig Kuziemsky, Helia Koosha, Maliheh Hadizadeh, Robert P Pauly, Timothy Graham, Tania Stafinski, David Bigam, Narmin Kassam, Devidas Menon. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 29.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.


How to Conduct Responsible Research: A Guide for Graduate Students

Alison L. Antes

1 Department of Medicine, Division of General Medical Sciences, Washington University School of Medicine, St. Louis, Missouri, 314-362-6006

Leonard B. Maggi, Jr.

2 Department of Medicine, Division of Molecular Oncology, Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri, 314-362-4102

Researchers must conduct research responsibly for it to have an impact and to safeguard trust in science. Essential responsibilities of researchers include using rigorous, reproducible research methods, reporting findings in a trustworthy manner, and giving the researchers who contributed appropriate authorship credit. This “how-to” guide covers strategies and practices for doing reproducible research and being a responsible author. The article also covers how to utilize decision-making strategies when uncertain about the best way to proceed in a challenging situation. The advice focuses especially on graduate students but is appropriate for undergraduates and experienced researchers. The article begins with an overview of the responsible conduct of research, research misconduct, and ethical behavior in the scientific workplace. The takeaway message is that responsible conduct of research requires a thoughtful approach to doing research to ensure trustworthy results and conclusions and that researchers receive fair credit.

INTRODUCTION

Doing research is stimulating and fulfilling work. Scientists make discoveries to build knowledge and solve problems, and they work with other dedicated researchers. Research is a highly complex activity, so it takes years for beginning researchers to learn everything they need to know to do science well. Part of this large body of knowledge is learning how to do research responsibly. Our purpose in this article is to provide graduate students a guide for how to perform responsible research. Our advice is also relevant to undergraduate researchers and for principal investigators (PIs), postdocs, or other researchers who mentor beginning researchers and wish to share our advice.

We begin by introducing some fundamentals about the responsible conduct of research (RCR), research misconduct, and ethical behavior. We focus on how to do reproducible science and be a responsible author. We provide practical advice for these topics and present scenarios to practice thinking through challenges in research. Our article concludes with decision-making strategies for addressing complex problems.

What is the responsible conduct of research?

To be committed to RCR means upholding the highest standards of honesty, accuracy, efficiency, and objectivity ( Steneck, 2007 ). Each day, RCR requires engaging in research in a conscientious, intentional fashion that yields the best science possible ( “Research Integrity is Much More Than Misconduct,” 2019 ). We adopt a practical, “how-to” approach, discussing the behaviors and habits that yield responsible research. However, some background knowledge about RCR is helpful to frame our discussion.

The scientific community uses many terms to refer to ethical and responsible behavior in research: responsible conduct of research, research integrity, scientific integrity, and research ethics ( National Academies of Science, 2009 ; National Academies of Sciences Engineering and Medicine, 2017 ; Steneck, 2007 ). A helpful way to think about these concepts is “doing good science in a good manner” ( DuBois & Antes, 2018 ). This means that the way researchers do their work, from experimental procedures to data analysis and interpretation, research reporting, and so on, leads to trustworthy research findings and conclusions. It also includes respectful interactions among researchers both within research teams (e.g., between peers, mentors and trainees, and collaborators) and with researchers external to the team (e.g., peer reviewers). We expand on trainee-mentor relationships and interpersonal dynamics with labmates in a companion article ( Antes & Maggi, 2021 ). When research involves human or animal research subjects, RCR includes protecting the well-being of research subjects.

We do not cover all potential RCR topics but focus on what we consider fundamentals for graduate students. Common topics covered in texts and courses on RCR include the following: authorship and publication; collaboration; conflicts of interest; data management, sharing, and ownership; intellectual property; mentor and trainee responsibilities; peer review; protecting human subjects; protecting animal subjects; research misconduct; the role of researchers in society; and laboratory safety. A number of topics prominently discussed among the scientific community in recent years are also relevant to RCR. These include the reproducibility of research ( Baker, 2016 ; Barba, 2016 ; Winchester, 2018 ), diversity and inclusion in science ( Asplund & Welle, 2018 ; Hofstra et al., 2020 ; Meyers, Brown, Moneta-Koehler, & Chalkley, 2018 ; National Academies of Sciences Engineering and Medicine, 2018a ; Roper, 2019 ), harassment and bullying ( Else, 2018 ; National Academies of Sciences Engineering and Medicine, 2018b ; “ No Place for Bullies in Science,” 2018 ), healthy research work environments ( Norris, Dirnagl, Zigmond, Thompson-Peer, & Chow, 2018 ; “ Research Institutions Must Put the Health of Labs First,” 2018 ), and the mental health of graduate students ( Evans, Bira, Gastelum, Weiss, & Vanderford, 2018 ).

The National Institutes of Health (NIH) ( National Institutes of Health, 2009 ) and the National Science Foundation ( National Science Foundation, 2017 ) have formal policies indicating research trainees must receive education in RCR. Researchers are accountable to these funding agencies and the public which supports research through billions in tax dollars annually. The public stands to benefit from, or be harmed by, research. For example, the public may be harmed if medical treatments or social policies are based on untrustworthy research findings. Funding for research, participation in research, and utilization of the fruits of research all rely on public trust ( Resnik, 2011 ). Trustworthy findings are also essential for good stewardship of scarce resources ( Emanuel, Wendler, & Grady, 2000 ). Researchers are further accountable to their peers, colleagues, and scientists more broadly. Trust in the work of other researchers is essential for science to advance. Finally, researchers are accountable for complying with the rules and policies of their universities or research institutions, such as rules about laboratory safety, bullying and harassment, and the treatment of animal research subjects.

What is research misconduct?

When researchers intentionally misrepresent or manipulate their results, these cases of scientific fraud often make the news headlines ( Chappell, 2019 ; O’Connor, 2018 ; Park, 2012 ), and they can seriously undermine public trust in research. These cases also harm trust within the scientific community.

The U.S. defines research misconduct as fabrication, falsification, and plagiarism (FFP) ( Department of Health and Human Services, 2005 ). FFP violate the fundamental ethical principle of honesty. Fabrication is making up data, and falsification is manipulating or changing data or results so they are no longer truthful. Plagiarism is a form of dishonesty because it includes using someone’s words or ideas and portraying them as your own. When brought to light, misconduct involves lengthy investigations and serious consequences, such as ineligibility to receive federal research funding, loss of employment, paper retractions, and, for students, withdrawal of graduate degrees.

One aspect of responsible behavior includes addressing misconduct if you observe it. We suggest a guide titled “Responding to Research Wrongdoing: A User-Friendly Guide” that provides advice for thinking about your options if you think you have observed misconduct ( Keith-Spiegel, Sieber, & Koocher, 2010 ). Your university will have written policies and procedures for investigating allegations of misconduct. Making an allegation is very serious. As Keith-Spiegel et al.’s guide indicates, it is important to know the evidence that supports your claim, and what to expect in the process. We encourage, if possible, talking to the persons involved first. For example, one of us knew of a graduate student who reported to a journal editor their suspicion of falsified data in a manuscript. It turned out that the student was incorrect. Going above the PI directly to the editor ultimately led to the PI leaving the university, and the student had a difficult time finding a new lab to complete their degree. If the student had first spoken to the PI and lab members, they could have learned that their assumptions about the data in the paper were wrong. In turn, they could have avoided accusing the PI of a serious form of scientific misconduct—making up data—and harming everyone’s scientific career.

What shapes ethical behavior in the scientific workplace?

Responsible conduct of research and research misconduct are two sides of a continuum of behavior—RCR upholds the ideals of research and research misconduct violates them. Problematic practices that fall in the middle but are not defined formally as research misconduct have been labeled as detrimental research practices ( National Academies of Sciences Engineering and Medicine, 2017 ). Researchers conducting misleading statistical analyses or PIs providing inadequate supervision are examples of the latter. Research suggests that characteristics of individual researchers and research environments explain (un)ethical behavior in the scientific workplace ( Antes et al., 2007 ; Antes, English, Baldwin, & DuBois, 2018 ; Davis, Riske-Morris, & Diaz, 2007 ; DuBois et al., 2013 ).

These two influences on ethical behavior are helpful to keep in mind when thinking about your behavior. When people think about their ethical behavior, they think about their personal values and integrity and tend to overlook the influence of their environment. While “being a good person” and having the right intentions are essential to ethical behavior, the environment also has an influence. In addition, knowledge of standards for ethical research is important for ethical behavior, and graduate students new to research do not yet know everything they need to. They also have not fully refined their ethical decision-making skills for solving professional problems. We discuss strategies for ethical decision-making in the final section of this article ( McIntosh, Antes, & DuBois, 2020 ).

The research environment influences ethical behavior in a number of ways. For example, if a research group explicitly discusses high standards for research, people will be more likely to prioritize these ideals in their behavior ( Plemmons et al., 2020 ). A mentor who sets a good example is another important factor ( Anderson et al., 2007 ). Research labs must also provide individuals with adequate training, supervision and feedback, opportunities to discuss data, and the psychological safety to feel comfortable communicating about problems, including mistakes ( Antes, Kuykendall, & DuBois, 2019a , 2019b ). On the other hand, unfair research environments, inadequate supervision, poor communication, and severe stress and anxiety may undermine ethical decision-making and behavior; particularly when many of these factors exist together. Thus, (un)ethical behavior is a complex interplay of individual factors (e.g., personality, stress, decision-making skills) and the environment.

For graduate students, it is important to attend to what you are learning and how the environment around you might influence your behavior. You do not know what you do not know, and you necessarily rely on others to teach you responsible practices. So, it is important to be aware. Ultimately, you are accountable for your behavior. You cannot just say “I didn’t know.” Rather, just like you are curious about your scientific questions, maintain a curiosity about responsible behavior as a researcher. If you feel uncomfortable with something, pay attention to that feeling, speak to someone you trust, and seek out information about how to handle the situation. In what follows, we cover key tips for responsible behavior in the areas of reproducibility and authorship that we hope will help you as you begin.

HOW TO DO REPRODUCIBLE SCIENCE

The foremost responsibility of scientists is to ensure they conduct research in such a manner that the findings are trustworthy. Reproducibility is the ability to duplicate results ( Goodman, Fanelli, & Ioannidis, 2016 ). The scientific community has called for greater openness, transparency, and rigor as key remedies for lack of reproducibility ( Munafò et al., 2017 ). As a graduate student, the rigor of your approach to doing experiments and handling data is essential to fostering reproducibility. We discuss how to utilize research protocols, document experiments in a lab notebook, and handle data responsibly.

Utilize research protocols

1. Learn and utilize the lab’s protocols

Research protocols describe the step-by-step procedures for doing an experiment. They are critical for the quality and reproducibility of experiments. Lab members must learn and follow the lab’s protocols with the understanding that they may need to make adjustments based on the requirements of a specific experiment.

Also, it is important to distinguish between the experiment you are performing and analyzing the data from that experiment. For example, the experiment you want to perform might be to determine whether loss of a gene blocks cell growth. Several protocols, each with pros and cons, will allow you to examine “cell growth.” Using the wrong experimental protocol can produce data that lead to muddled conclusions: in this example, the gene does block cell growth, but the protocol used to generate the data is inappropriate for the question, yielding a false-negative result.

When first joining a lab, it is essential to commit to learning the protocols necessary for your assigned research project. Researchers must ensure they are proficient in executing a protocol and can perform their experiments reliably. If you do not feel confident with a protocol, you should do practice runs if possible. Repetition is the best way to work through difficulties with protocols. Often it takes several attempts to work through the steps of a protocol before you will be comfortable performing it. Asking to watch another lab member perform the protocol is also helpful. Be sure to watch closely how steps are performed, as often there are minor steps taken that are not written down. Also, experienced lab members may do things as second nature and not think to explicitly mention them when working through the protocol. Ask questions of other lab members so that you can improve your knowledge and gain confidence with a protocol. It is better to ask a question than potentially ruin a valuable or hard-to-get sample.

Be cautious of differences in the standing protocols in the lab and how you actually perform the experiment. Even the most minor deviations can seriously impact the results and reproducibility of an experiment. As mentioned above, often there are minor things that are done that might not be listed in the protocol. Paying attention and asking questions are the best ways to learn, in addition to adding notes to the protocol if you find minor details are missing.

2. Develop your own protocols

Often you will find that a project requires a protocol that has not been performed in the lab. If you are performing a new experiment in the lab and no protocol exists, find a protocol and try it. Protocols can be obtained from many different sources. A great source is other labs on campus, as you can speak directly to the person who performs the experiment. There are many journal sources as well, such as Current Protocols, Nature Protocols, Nature Methods, and Cell STAR Methods. These methods journals provide the most detailed protocols for experiments, often with troubleshooting tips. Scientific papers are the most common source of protocols. However, keep in mind that due to the brevity of most methods sections, they often omit crucial details or reference other papers that may not contain a complete description of the protocol.

3. Handle mistakes or problems promptly

At some point, everyone encounters problems with a protocol or realizes they made a mistake. You should be prepared to handle this situation by being able to detail exactly how you performed the experiment. Did you skip a step? Shorten or lengthen a time point? Did you have to make a new buffer or borrow a labmate’s buffer? There are too many ways an experiment can go wrong to list here, but being able to recount all the steps you performed in detail will help you work through the problem. Keep in mind that often the best way to understand how to perform an experiment is learning from when something goes wrong. This situation requires you to think critically through what was done and understand the steps taken. When everything works perfectly, it is easy to pay less attention to the details, which can lead to problems down the line.

It is up to you to be attentive and meticulous in the lab. Paying attention to the details may feel like a pain at first, or even seem overwhelming. Practice and repetition will help this focus on details become a natural part of your lab work. Ultimately, this skill will be essential to being a responsible scientist.

Document experiments in a lab notebook

1. Recognize the importance of a lab notebook

Maintaining detailed documentation in a lab notebook allows researchers to keep track of their experiments and generation of data. This detailed documentation helps you communicate about your research with others in the lab, and serves as a basis for preparing publications. It also provides a lasting record for the lab that exists beyond your time in the lab. After graduate students leave the lab, sometimes it is necessary to go back to the results of older experiments. A complete and detailed notebook is essential, or all of the time, effort, and resources are lost.

2. Learn the note-keeping practices in your lab

When you enter a new lab, it is important to understand how the lab keeps notebooks and the expectations for documentation. Being conscientious about documentation will make you a better scientist. In some labs, the PI might routinely examine your notebook, while in other labs you may be expected to maintain a notebook, but it may not be regularly viewed by others. It is tempting to become relaxed in documentation if you think your notebook may not be reviewed. Avoid this temptation; documentation of your ideas and process will improve your ability to think critically about research. Further, even if the PI or lab members do not physically view your notebook, you will need to communicate with them about your experiments. This documentation is necessary to communicate effectively about your work.

3. Organize your lab notebook

Different labs use different formats; some use electronic notebooks while others handwritten notebooks. The contents of a good notebook include the purpose of the experiment, the details of the experimental procedure, the data, and thoughts about the results. To effectively document your experiment, there are 5 critical questions that the information you record should be able to answer.

  • Why am I doing this experiment? (purpose)
  • What did I do to perform the experiment? (protocol)
  • What are the results of what I did? (data, graphs)
  • What do I think about the results?
  • What do I think are the next steps?

We also recommend a table of contents. It will make the information more useful to you and the lab in the future. The table of contents should list the title of the experiment, the date(s) it was performed, and the page numbers on which it is recorded. Also, make sure that you write clearly and provide a legend or explanation of any shorthand or non-standard abbreviation you use. Often labs will have a combination of written lab notebooks and electronic data. It is important to reference where electronic data are located that go with each experiment. The idea is to make it as easy as possible to understand what you did and where to find all the data (electronic and hard copy) that accompanies your experiment.
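One way to put the five questions and the cross-referencing advice into practice is a short entry template. The sketch below renders such a skeleton in Python; the field names and example content are illustrative, not a required format.

```python
# Sketch: render a lab-notebook entry skeleton covering the five questions
# above. Fields and example content are illustrative, not a required format.
from datetime import date

def notebook_entry(title: str, purpose: str, protocol_ref: str,
                   results: str, interpretation: str, next_steps: str) -> str:
    return "\n".join([
        f"== {title} ({date.today().isoformat()}) ==",
        f"Purpose: {purpose}",                # why am I doing this experiment?
        f"Protocol: {protocol_ref}",          # what did I do to perform it?
        f"Results: {results}",                # what are the results? (data, graphs)
        f"Interpretation: {interpretation}",  # what do I think about the results?
        f"Next steps: {next_steps}",          # what do I think comes next?
    ])

print(notebook_entry(
    title="siRNA knockdown, growth assay",
    purpose="Test whether loss of GENE-X blocks cell growth",
    protocol_ref="Lab protocol GP-12, v3; raw data in /data/2024-05-02/",
    results="Growth curves in growth_curves_v1.xlsx",
    interpretation="Knockdown slows growth ~40% vs control",
    next_steps="Repeat with a second siRNA to rule out off-target effects",
))
```

Note how the protocol line points to where the electronic data live, per the advice above.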

Keeping a lab notebook becomes easier with practice. It can be thought of almost like journaling about your experiment. Sometimes people think of it as just a place to paste their protocol and a graph or data. We strongly encourage you to include your thoughts about why you made the decisions you made when conducting the experiment and to document your thoughts about next steps.

4. Commit to doing it the right way

A common reason to become lax in documentation is feeling rushed for time. Although documentation takes time, it saves time in the long run and fosters good science. Without good notes, you will waste time trying to recall precisely what you did, reproduce your findings, and remember what you thought would be important next steps. The lab notebook helps you think about your research critically and keep your thoughts together. It can also save you time later when writing up results for publication. Further, well-documented data will help you draft a cogent and rigorous dissertation.

Handle data responsibly

1. Keep all data

Data are the product of research. Data include raw data, processed data, analyzed data, figures, and tables. Many data today are electronic, but not all. Generating data requires a lot of time and resources and researchers must treat data with care. The first essential tip is to keep all data. Do not discard data just because the experiment did not turn out as expected. A lot of experiments do not turn out to yield publishable data, but the results are still important for informing next steps.

Always keep the original, raw data. That is, as you process and analyze data, always maintain an unprocessed version of the original data.

Universities and funding agencies have data retention policies. These policies specify the number of years beyond a grant that data must be kept. Some policies also indicate researchers need to retain original data that served as the basis for a publication for a certain number of years. Therefore, your data will be important well beyond your time in graduate school. Most labs require you to keep samples for reanalysis until a paper is published; after that, the analyzed data are enough. If you leave a lab before a paper is accepted for publication, you are responsible for ensuring your data and original samples are well documented for others to find and use.

2. Document all data

In addition to keeping all data, data must be well-organized and documented. This means that no matter the way you keep your data (e.g., electronic or in written lab notebooks), there is a clear guide—in your lab notebook, a binder, or on a lab hard drive—to finding the data for a particular experiment. For example, it must be clear which data produced a particular graph. Version control of data is also critical. Your documentation should include “metadata” (data about your data) that tracks versions of the data. For example, as you edit data for a table, you should save separate versions of the tables, name the files sequentially, and note the changes that were made to each version.
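A minimal sketch of sequential versioning with a metadata log, under assumed file names and fields, might look like this:

```python
# Sketch: save numbered versions of a data file and log what changed.
# Paths and metadata fields are assumptions for illustration.
import json
import shutil
from datetime import datetime
from pathlib import Path

def save_version(src: Path, log: Path, change_note: str) -> Path:
    """Copy src to the next numbered version and record the change."""
    entries = json.loads(log.read_text()) if log.exists() else []
    version = len(entries) + 1
    dst = src.with_name(f"{src.stem}_v{version:03d}{src.suffix}")
    shutil.copy2(src, dst)  # the working file itself stays untouched
    entries.append({
        "version": version,
        "file": dst.name,
        "saved": datetime.now().isoformat(timespec="seconds"),
        "change": change_note,  # the "metadata": what changed and why
    })
    log.write_text(json.dumps(entries, indent=2))
    return dst

# Hypothetical usage:
# save_version(Path("table1.csv"), Path("table1_versions.json"),
#              "Recalculated column 3 after fixing unit conversion")
```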

3. Backup your data

You should back up electronic data regularly. Ideally, your lab has a shared server or cloud storage for backing up data. If you are supposed to put your data there, make sure you do it! When you leave the lab, it must be possible to find your data.
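For instance, a periodic mirror of a data folder could be as simple as the sketch below; the paths are placeholders, and many labs will instead rely on a managed server or cloud sync.

```python
# Sketch: mirror a data folder into a backup location (placeholder paths).
import shutil
from pathlib import Path

def backup(data_dir: Path, backup_root: Path) -> Path:
    """Copy data_dir into backup_root, updating any previous copy."""
    dest = backup_root / data_dir.name
    shutil.copytree(data_dir, dest, dirs_exist_ok=True)  # Python 3.8+
    return dest

# Hypothetical usage:
# backup(Path.home() / "lab_data", Path("/mnt/lab_server/backups"))
```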

4. Perform data analysis honestly and competently

Inappropriate use of statistics is a major concern in the scientific community, as the results and conclusions will be misleading if done incorrectly ( DeMets, 1999 ). Some practices are clearly an abuse of statistics, while other inappropriate practices stem from lack of knowledge. For example, a practice called “p-hacking” describes when researchers “collect or select data or statistical analyses until nonsignificant results become significant” ( Head, Holman, Lanfear, Kahn, & Jennions, 2015 ). In addition to avoiding such misbehavior, it is essential to be proficient with statistics to ensure you do statistical procedures appropriately. Learning statistical procedures and analyzing data takes many years of practice, and your statistics courses may only cover the basics. You will need to know when to consult others for help. In addition to consulting members in your lab or your PI, your university may have statistical experts who can provide consultations.
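The cost of such “p-hacking” can be made vivid with a small simulation: when two groups are drawn from the same distribution but you keep adding data and re-testing until p < .05, far more than 5% of “experiments” come out significant. This is a sketch under simple assumptions (a two-sample t test and up to 10 looks per experiment), not a formal analysis.

```python
# Sketch: "optional stopping" (re-testing while adding data until p < .05)
# inflates the false-positive rate. Both groups are drawn from the same
# distribution, so every "significant" result here is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
trials, false_positives = 2000, 0
for _ in range(trials):
    a = list(rng.normal(size=10))
    b = list(rng.normal(size=10))        # same distribution: no true effect
    for _ in range(10):                  # look at the data up to 10 times
        if stats.ttest_ind(a, b).pvalue < 0.05:
            false_positives += 1         # "significant" despite no effect
            break
        a.extend(rng.normal(size=5))     # collect 5 more per group, retest
        b.extend(rng.normal(size=5))

print(f"False-positive rate with optional stopping: {false_positives/trials:.0%}")
```

Run as written, the rate should come out well above the nominal 5%, which is exactly why pre-specifying sample sizes and analyses matters.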

5. Manage pressure to obtain favored results

When you conduct an experiment, the results are the results. As a beginning researcher, it is important to be prepared to manage the frustration of experiments not turning out as expected. It is also important to manage the real or perceived pressure to produce favored results. Investigators can become wedded to a hypothesis, and they can have a difficult time accepting the results. Sometimes you may feel this pressure coming from yourself; for example, if you want to please your PI, or if you want to get results for a certain publication. It is important to always follow the data no matter where it leads.

If you do feel pressure, this situation can be uncomfortable and stressful. If you have been meticulous and followed the above recommendations, this can be one great safeguard. You will be better able to confidently communicate your results to the PI because of your detailed documentation, and you will be more confident in your procedures if the possibility of error is suggested. Typically, with enough evidence that the unexpected results are real, the PI will concede. We recommend seeking the support of friends or colleagues to vent and cope with stress. In the rare case that the PI does not relent, you could turn to an advisor outside the lab if you need advice about how to proceed. They can help you look at the data objectively and also help you think about the interpersonal aspects of navigating this situation.

6. Communicate about your data in the lab

A critical element of reproducible research is communication in the lab. Ideally, there are weekly or bi-weekly meetings to discuss data. You need to develop your communication skills for writing and speaking about data. Often you and your labmates will discuss experimental issues and results informally during the course of daily work. This is an excellent way to hone critical thinking and communication skills about data.

Scenario 1 – The Protocol is Not Working

At the beginning of a rotation during their first year, a graduate student is handed a lab notebook and a pen and is told to keep track of their work. There does not appear to be a specific format to follow. There are standard lab protocols that everyone follows, but minor tweaks to the protocols do not seem to be tracked from experiment to experiment in the standard lab protocol nor in other lab notebooks. After two weeks of trying to follow one of the standard lab protocols, the student still cannot get the experiment to work. The student has included the appropriate positive and negative controls which are failing, making the experiment uninterpretable. After asking others in the lab for help, the graduate student learns that no one currently in the lab has performed this particular experiment. The former lab member who had performed the experiment only lists the standard protocol in their lab notebook.

How should the graduate student start to solve the problem?

Speaking to the PI would be the next logical step. As a first-year student in a lab rotation, the PI should expect this type of situation and provide additional troubleshooting guidance. It is possible that the PI may want to see how the new graduate student thinks critically and handles adversity in the lab. Rather than giving an answer, the PI might ask the student to work through the problem. The PI should give guidance, but it may not be an immediate fix for the problem. If the PI’s suggestions fail to correct the problem, asking a labmate or the PI for the contact information of the former lab member who most recently performed the experiment would be a reasonable next step. The graduate student’s conversations with the PI and labmates in this situation will help them learn a lot about how the people in the lab interact.

Most of the answers for these types of problems will require you as a graduate student to take the initiative to answer. They will require your effort and ingenuity to talk to other lab members, other labs at the university, and even scour the literature for alternatives. While labs have standard protocols, there are multiple ways to do many experiments, and working out an alternative will teach you more than when everything works. Having to troubleshoot problems will result in better standard protocols in the lab and better science.

HOW TO BE A RESPONSIBLE AUTHOR

Researchers communicate their findings via peer-reviewed publications, and publications are important for advancing in a research career. Many graduate students will first author or co-author publications in graduate school. For good advice on how to write a research manuscript, consult the Current Protocols article “How to write a research manuscript” ( Frank, 2018 ). We focus on the issues of assigning authors and reporting your findings responsibly. First, we describe some important basics: journal impact factors, predatory journals, and peer review.

What are journal impact factors?

It is helpful to understand journal impact factors. There is criticism about an overemphasis on impact factors for evaluating the quality or importance of researchers’ work ( DePellegrin & Johnston, 2015 ), but they remain common for this purpose. A journal’s impact factor for a given year is the average number of citations received that year by the articles the journal published in the previous two years. Higher impact factors place journals at a higher rank. Approximately 2% of journals have an impact factor of 10 or higher. For example, Cell, Science, and Nature have impact factors of approximately 39, 42, and 43, respectively. Journals can be great journals but have lower impact factors; often this is because they focus on a smaller specialty field. For example, Journal of Immunology and Oncogene are respected journals, but their impact factors are about 4 and 7, respectively.
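As a worked example of the two-year calculation (with invented counts, not real journal statistics):

```python
# Worked example of the 2-year impact factor; the counts are invented.
citations_in_2023_to_2021_22_articles = 1200   # citations received in 2023
citable_items_published_2021_22 = 300          # articles published in 2021-22

impact_factor_2023 = citations_in_2023_to_2021_22_articles / citable_items_published_2021_22
print(f"2023 impact factor = {impact_factor_2023:.1f}")  # -> 4.0
```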

Research trainees often want to publish in journals with the highest possible impact factor because they expect this to be viewed favorably when applying to future positions. We encourage you to bear in mind that many different journals publish excellent science and focus on publishing where your work will reach the desired audience. Also, keep in mind that while a high impact factor can direct you to respectable, high-impact science, it does not guarantee that the science in the paper is good or even correct. You must critically evaluate all papers you read no matter the impact factor.

What are predatory journals?

Predatory journals have flourished over the past few years as publishing science has moved online. An international panel defined predatory journals as follows ( Grudniewicz et al., 2019 ):

Predatory journals and publishers are entities that prioritize self-interest at the expense of scholarship and are characterized by false or misleading information, deviation from best editorial and publication practices, a lack of transparency, and/or the use of aggressive and indiscriminate solicitation practices. (p. 211)

Often young researchers receive emails soliciting them to submit their work to a journal. Typically, small fees (around US $99) are requested; these are much lower than the open-access fees of reputable journals (often around US $2000). A warning sign of a predatory journal is outlandish promises, such as 24-hour peer review or immediate publication. You can find a list of predatory journals, created by a postdoc in Europe, at BeallsList.net ( “Beall’s List of Potential Predatory Journals and Publishers,” 2020 ).

What is peer review?

Peer reviewers are other scientists who have the expertise to evaluate a manuscript. Typically 2 or 3 reviewers evaluate a manuscript. First, an editor performs an initial screen of the manuscript to ensure its appropriateness for the journal and that it meets basic quality standards. At this stage, an editor can decide to reject the manuscript and not send it to review. Not sending a paper for peer review is common in the highest impact journals that receive more submissions per year than can be reviewed and published. For average-impact journals and specialty journals, typically your paper will be sent for peer review.

In general, peer review focuses on three aspects of a manuscript: research design and methods, validity of the data and conclusions, and significance. Peer reviewers assess the merit and rigor of the research design and methodology, and they evaluate the overall validity of the results, interpretations, and conclusions. Essentially, reviewers want to ensure that the data support the claims. Additionally, reviewers evaluate the overall significance, or contribution, of the findings, which involves the novelty of the research and the likelihood that the findings will advance the field. Significance standards vary between journals. Some journals are open to publishing findings that are incremental advancements in a field, while others want to publish only what they deem as major advancements. This feature can distinguish the highest impact journals which seek the most significant advancements and other journals that tend to consider a broader range of work as long as it is scientifically sound. It is important to keep in mind that determining at the stage of review and publication whether a paper is “high impact” is quite subjective. In reality, this can only really be determined in retrospect.

The key ethical issues in peer review are fairness, objectivity, and confidentiality ( Shamoo & Resnik, 2015 ). Peer reviewers are to evaluate the manuscript on its merits and not based on biases related to the authors or the science itself. If reviewers have a conflict of interest, this should be disclosed to the editor. Confidentiality in peer review means that reviewers should keep the information private; they should not share it with others or use it to their own benefit. Reviewers can ultimately recommend that the manuscript be rejected, revised and resubmitted (with major or minor revisions), or accepted. The editor weighs the reviewers’ feedback and makes a judgment about rejecting, accepting, or requesting a revision. Sometimes PIs will ask experienced graduate students to assist with peer reviewing a manuscript. This is a good learning opportunity; the PI should disclose to the editor that a trainee helped prepare the review.

Assign authorship fairly

Authorship gives credit to the people who contributed to the research, including thinking of the ideas, designing and performing experiments, interpreting the results, and writing the paper. Two key questions arise: (1) who will be an author, and (2) in what order will the authors be listed? These questions seem simple on the surface but can become quite complex.

1. Know authorship guidelines

Authorship guidelines published by journals, professional societies, and universities communicate key principles of authorship and standards for earning it. The core ethical principle in assigning authorship is fairness in who receives credit for the work: the people who contributed should get credit. This seems simple enough, but determining authorship can, and often does, create conflict.

Many universities have authorship guidelines, and you should know the policies at your university. The International Committee of Medical Journal Editors (ICMJE) provides four criteria for determining who should be an author (International Committee of Medical Journal Editors, 2020). These criteria indicate that an author should do all of the following: 1) make "substantial contributions" to the development of the idea or research design, or to acquiring, analyzing, or interpreting the data; 2) write the manuscript or revise it in a substantive way; 3) approve the final manuscript (i.e., before it is submitted for review, and after it is revised, if necessary); and 4) agree to be accountable for any questions about the accuracy or integrity of the research.

Several types of authorship violate these guidelines and should be avoided. Guest authorship is adding respected researchers out of appreciation, or so the manuscript will be perceived more favorably and thus be easier to publish or have greater impact. Gift authorship is granting authorship to reward an individual or as a favor. Ghost authorship is omitting someone who made significant contributions to the paper from the author list. To increase transparency, some journals require authors to state how each individual contributed to the research and manuscript.

2. Apply the guidelines

Conflicts often arise from disagreements about how much people contributed to the research and whether those contributions merit authorship. The best approach is an open, honest, and ongoing discussion about authorship, which we discuss in #3 below. To have effective, informed conversations about authorship, you must understand how to apply the guidelines to your specific situation. A simple rule of thumb identifies three components of authorship, listed below. (We do not list giving final approval of the manuscript or agreeing to be accountable, but we consider these essential to authorship as well.)

  • Thinking – this means contributing to the ideas leading to the hypothesis of the work, designing experiments to address the hypothesis, and/or analyzing the results in the larger context of the literature in the field.
  • Doing – this means performing and analyzing the experiments.
  • Writing – this means editing a draft, or writing the entire paper. The first author often writes the entire first draft.

In our experience, a first author would typically do all three. They also usually coordinate the writing and editing process. Co-authors are typically very involved in at least two of the three, and are somewhat involved in the other. The PI, who oversees and contributes to all three, is often the last, or “senior author.” The “senior author” is typically the “corresponding author”—the person listed as the individual to contact about the paper. The other co-authors are listed between the first and senior author either alphabetically, or more commonly, in order from the largest to smallest contribution.

Problems in assigning authorship typically arise from people's interpretations of the first two components, thinking and doing: what and how much each individual contributed to a project's design, execution, and analysis. Different fields or PIs may have their own slight variations on these guidelines. The potential for conflict in assigning authorship leads to the most common recommendation for doing it responsibly: discuss authorship expectations early and revisit them during the project.

3. Discuss authorship with your collaborators

Publications are important for career advancement, so you can see why people might worry about fairness in assigning authorship. If the problem arises from a lack of shared understanding about contributions to the research, the only way to resolve it is an open discussion. This discussion should ideally begin early in a project and continue throughout. Hopefully you work in a laboratory that makes these discussions a natural part of the research process; this makes it much easier to understand expectations upfront.

We encourage you to speak up about your interest in making a contribution that would merit authorship, especially if you want to earn first authorship. Sometimes lab norms make it clear that you are expected to first-author and co-author publications, but it is still best to communicate your interest in earning authorship. If the project is not yours but you wish to collaborate, you can ask what you might contribute that would merit authorship.

If it is not a norm in your lab to discuss authorship throughout the life of projects, then as a graduate student you may feel reluctant to speak up. You could initiate a conversation with a more senior graduate student, a postdoc, or your PI, depending on the dynamics in the group. You could ask generally about how the lab approaches assignment of authorship, but discussing a specific project and paper may be best. It may feel awkward to ask, but asking early is less uncomfortable than waiting until the end of the project. If the group is already drafting a manuscript and you are told that your contribution is insufficient for authorship, this situation is much more discouraging than if you had asked earlier about what is expected to earn authorship.

How to report findings responsibly

The most significant responsibility of authors is to present their research accurately and honestly. Deliberately presenting misleading information is clearly unethical, but there are also significant judgment calls in how to present your findings. For example, an author can mislead by overstating conclusions beyond what the data support.

1. Commit to presenting your findings honestly

Any good scientific manuscript writer will tell you that you need to "tell a good story." This means that your paper is organized and framed to draw the reader into the research and convince them of the importance of the findings. But this story must be sound and justified by the data. Other authors present their findings in the best, most "publishable" light, so it is a balancing act: be persuasive, but remain responsible and trustworthy in how you present your findings. To do so, you must be conscious of how you interpret your data and state your conclusions so that they are accurate and not overstated.

One misbehavior, known as "HARKing" (Hypothesizing After the Results are Known), occurs when hypotheses are created after seeing the results of an experiment but are presented as if they were defined prior to collecting the data (Munafò et al., 2017). This practice should be avoided. HARKing may be driven, in part, by a concern in scientific publishing known as publication bias: the preference that reviewers, editors, and researchers have for papers describing positive findings rather than negative findings (Carroll, Toumpakari, Johnson, & Betts, 2017). This preference can lead researchers to manipulate their practices, such as by HARKing, so that positive findings can be reported.
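To see concretely why HARKing overstates evidence, consider the arithmetic of multiple comparisons. When an experiment measures many outcomes and none has a true effect, the chance that at least one looks "significant" grows rapidly with the number of outcomes, so presenting the one that happened to cross the threshold as a planned hypothesis misrepresents the strength of the evidence. The following minimal sketch in Python illustrates this; the outcome count, alpha level, and number of simulated studies are illustrative assumptions, not figures from any study cited here.

```python
import random

# Illustrative sketch: a study measures N_OUTCOMES unrelated endpoints,
# none of which has a true effect. Under the null hypothesis each p-value
# is uniform on [0, 1), so each endpoint is "significant" by chance with
# probability ALPHA.
ALPHA = 0.05          # conventional significance threshold (assumption)
N_OUTCOMES = 20       # endpoints measured per study (assumption)
N_STUDIES = 10_000    # number of simulated null studies

random.seed(1)
studies_with_false_positive = 0
for _ in range(N_STUDIES):
    p_values = [random.random() for _ in range(N_OUTCOMES)]
    if any(p < ALPHA for p in p_values):
        studies_with_false_positive += 1

rate = studies_with_false_positive / N_STUDIES
print(f"Null studies with at least one 'significant' endpoint: {rate:.0%}")
# Analytically: 1 - (1 - ALPHA)**N_OUTCOMES ≈ 64%. HARKing reports that
# endpoint as if it were the sole, pre-specified hypothesis, hiding the
# multiplicity that produced it.
```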

It is important to note that, in addition to avoiding misbehaviors such as HARKing, all researchers are susceptible to a number of more subtle traps in judgment. Even the most well-intentioned researcher may jump to conclusions, discount alternative explanations, or accept results that seem correct without further scrutiny (Nuzzo, 2015). Therefore, researchers must not only commit to presenting their findings honestly but also counteract these traps by slowing down and increasing their skepticism toward their own findings.

2. Provide an appropriate amount of detail

Providing enough detail in a manuscript can be a challenge given the word limits imposed by most journals. You will need to determine which details to include, which to exclude, and which to move to supplemental materials. Methods sections can be long and are often the first to be shortened, but complete methods are important so others can evaluate the research and repeat it in other studies. Even more significant are decisions about which experimental data to include in, or exclude from, the manuscript. Researchers must determine what data are required to create a complete scientific story that supports the central hypothesis of the paper. On the other hand, it is neither necessary nor helpful to include so much data in the manuscript, or in supplemental material, that the central point becomes difficult to discern. It is a tricky balance.

3. Follow proper citation practices

Of course, responsible authorship requires avoiding plagiarism. Many researchers think plagiarism is not a concern for them because they assume it is always intentional "copying and pasting" of someone else's words. However, poor writing practices, such as taking notes from references without distinguishing between direct quotes and paraphrased material, can lead to including material that is not properly quoted. More broadly, proper citation practices include accurately and completely referencing prior studies to provide appropriate context for your manuscript.

4. Attend to the other important details

The journal will require several pieces of additional information, such as disclosure of funding sources and potential conflicts of interest. Typically, graduate students do not have relationships that constitute conflicts of interest, but a PI who is a co-author may. When submitting a manuscript, also be sure to acknowledge individuals who contributed to the work but are not listed as authors.

5. Share data and promote transparency

Data sharing is a key facet of promoting transparency in science (Nosek et al., 2015). It is important to know the expectations of the journals in which you wish to publish. Many top journals now require data sharing, for example, depositing your data files in an online repository so others have access to them for secondary use. Funding agencies like NIH also increasingly require data sharing. To further foster transparency and public trust in research, researchers must deposit final peer-reviewed manuscripts reporting NIH-funded research in PubMed Central, a free online database that makes biomedical and life science research publicly accessible.

Scenario 2 – Authors In Conflict

To prepare a manuscript for publication, a postdoc's data is added to a graduate student's thesis project. After working together to combine the data and write the paper, the postdoc requests co-first authorship. The graduate student balks at this request on the basis that it is their thesis project. In a weekly meeting with the lab's PI to discuss the status of the paper, the graduate student proposes dividing the data between the authors to prove that the graduate student should be the sole first author. The PI agrees to this attempt to quantify how much data each person contributed to the manuscript. All parties agree the writing and thinking were shared equally. After this assessment, the graduate student sees that the postdoc actually contributed more than half of the data in the paper; the graduate student and a second graduate student contributed the remainder. However, the graduate student remains adamant that they must be the sole first author because it is their thesis project.

Is the graduate student correct in insisting that it is their project, so they are entitled to be the sole first author?

Co-first authorship became popular about 10 years ago as a way to acknowledge authors who worked together and contributed equally to a paper. If the postdoc contributed half of the data and worked with the graduate student to combine their interpretations and write the first draft, then the postdoc made a substantial contribution. If the graduate student wrote much of the first draft, contributed a significant share of the remaining data, and played a major role in the thesis concept and design, this is also a major contribution. We summarized authorship requirements as contributing to thinking, doing, and writing, and we noted that a first author usually contributes to all three. The graduate student has met all three elements to claim first authorship. However, it appears the postdoc has also met them. Thus, it is at least reasonable for the postdoc to ask about co-first authorship.

The best way to move forward is for the two to discuss their perspectives openly. Both the graduate student and the postdoc want first authorship on papers to advance their careers. The postdoc feels they contributed more to the overall concept and design than the graduate student recognizes, and the postdoc did contribute half of the data; this is likely frustrating and upsetting for the postdoc. On the other hand, perhaps the postdoc is forgetting how much a thesis becomes like "your baby," so to speak. The work is the graduate student's thesis, so it is easy to see why they feel a sense of ownership and may find it hard to accept sharing first-author recognition. Yet the graduate student should consider that the manuscript would not be possible without the postdoc's contribution. Further, if the postdoc were truly being unreasonable, they could have made a case for sole first authorship based on contributing the most data, in addition to contributing ideas and writing. The graduate student should consider that the postdoc may be suggesting co-first authorship in good faith.

As with any interpersonal conflict, clear communication is key. While it might be temporarily uncomfortable to voice their views and address the disagreement, doing so is critical to avoiding permanent damage to their working relationship. The pair should consider each other's perspectives and potential alternatives. For example, if the graduate student is first author and the postdoc second, they could at a minimum include an author note in the manuscript describing each author's contribution; this would make the scope of the postdoc's contribution clear if they decide against co-first authorship. The graduate student should also examine their assumptions about co-first authorship. They may assume it makes them appear to have contributed less, but co-first authorship may instead highlight their collaborative approach to science. Collaboration is a quality many (although arguably not all) research organizations look for when hiring.

The pair will also need to seek advice from others. They should certainly speak with the PI, who can provide input about how such cases have been handled in the past. Ultimately, if they cannot reach an agreement, the PI, who is likely to be the last or "senior" author, may make the final decision. They should also speak with the other graduate student who is an author.

If either individual is upset, they should discuss the situation only after they have had time to cool down. This might mean taking a day before talking, or speaking with someone outside the lab for support. Ideally, all authors on this paper would have initiated this conversation earlier, and the lab's standards for first authorship would be discussed routinely. Clear communication might have avoided the conflict altogether.

HOW TO USE DECISION-MAKING STRATEGIES TO NAVIGATE CHALLENGES

We have provided advice on some specific challenges you might encounter in research. This final section covers our overarching recommendation: adopt a set of ethical decision-making strategies. These strategies help researchers address challenges by guiding them to think through a problem and possible alternatives (McIntosh et al., 2020). They encourage you to gather information, examine possible outcomes, consider your assumptions, and address emotional reactions before acting. They are especially helpful when you are uncertain how to proceed, face a new problem, or when the consequences of a decision could negatively impact you or others. The strategies also help people be honest with themselves, such as when they are discounting important factors or have competing goals, by encouraging them to seek outside perspectives and test their motivations. You can remember the strategies using the acronym SMART.

1. Seek Help

Obtain input from others who can be objective and whom you trust. They can assist you in assessing the situation, predicting possible outcomes, and identifying potential options, and they can provide support. Individuals to consult may be peers, other faculty, or people in your personal life. It is important that you trust the people you talk with, but it is also valuable when they challenge your perspective or encourage you to think about the problem in a new way. Keep in mind that people such as program directors and university ombudsmen are often available for confidential, objective advice.

2. Manage Emotions

Consider your emotional reaction to the situation and how it might influence your assessment, decisions, and actions. In particular, identify negative emotions, like frustration, anxiety, fear, and anger, as these especially tend to diminish the quality of decision-making and of interactions with others. Take time to address these emotions before acting, for example, by exercising, listening to music, or simply taking a day before responding.

3. Anticipate Consequences

Think about how the situation could turn out for you, for the research team, and for anyone else involved. Consider the short-, medium-, and long-term impacts of the problem and of your potential approach to addressing it. Ideally, you can identify win-win outcomes. Often, however, in tough professional situations you may need to select the best option from among several that are not ideal.

4. Recognize Rules and Context

Determine whether any ethical principles, professional policies, or rules apply that might help guide your choices. For instance, if the problem involves an authorship dispute, consider the authorship guidelines that apply. Recognizing the context means considering the situational factors that could affect your options and how you proceed, for example, the reality that the PI may ultimately have the final decision about authorship.

5. Test Assumptions and Motives

Examine your beliefs about the situation and whether any of your thoughts may not be justified. This includes critically examining the personal motivations and goals that are driving your interpretation of the problem and thoughts about how to resolve it.

These strategies need not be applied in order, and they are interrelated. For example, seeking help can help you manage emotions, test assumptions, and anticipate consequences. Look back at the scenarios and the advice throughout this article, and you will see that many of our suggestions align with these strategies. Practice applying the SMART strategies when you encounter a problem, and they will become more natural.

Learning practices for responsible research will be the foundation for your success in graduate school and your career. We encourage you to be reflective and intentional as you learn and hope that our advice helps you along the way.

ACKNOWLEDGEMENTS

This work was supported by the National Human Genome Research Institute (Antes, K01HG008990) and the National Center for Advancing Translational Sciences (UL1 TR002345).

LITERATURE CITED

  • Anderson MS, Horn AS, Risbey KR, Ronning EA, De Vries R, & Martinson BC (2007). What Do Mentoring and Training in the Responsible Conduct of Research Have To Do with Scientists' Misbehavior? Findings from a National Survey of NIH-Funded Scientists. Academic Medicine, 82(9), 853–860. doi: 10.1097/ACM.0b013e31812f764c
  • Antes AL, Brown RP, Murphy ST, Waples EP, Mumford MD, Connelly S, & Devenport LD (2007). Personality and Ethical Decision-Making in Research: The Role of Perceptions of Self and Others. Journal of Empirical Research on Human Research Ethics, 2(4), 15–34. doi: 10.1525/jer.2007.2.4.15
  • Antes AL, English T, Baldwin KA, & DuBois JM (2018). The Role of Culture and Acculturation in Researchers' Perceptions of Rules in Science. Science and Engineering Ethics, 24(2), 361–391. doi: 10.1007/s11948-017-9876-4
  • Antes AL, Kuykendall A, & DuBois JM (2019a). The Lab Management Practices of "Research Exemplars" that Foster Research Rigor and Regulatory Compliance: A Qualitative Study of Successful Principal Investigators. PloS One, 14(4), e0214595. doi: 10.1371/journal.pone.0214595
  • Antes AL, Kuykendall A, & DuBois JM (2019b). Leading for Research Excellence and Integrity: A Qualitative Investigation of the Relationship-Building Practices of Exemplary Principal Investigators. Accountability in Research, 26(3), 198–226. doi: 10.1080/08989621.2019.1611429
  • Antes AL, & Maggi LB Jr. (2021). How to Navigate Trainee-Mentor Relationships and Interpersonal Dynamics in the Lab. Current Protocols Essential Laboratory Techniques.
  • Asplund M, & Welle CG (2018). Advancing Science: How Bias Holds Us Back. Neuron, 99(4), 635–639. doi: 10.1016/j.neuron.2018.07.045
  • Baker M (2016). Is There a Reproducibility Crisis? Nature, 533, 452–454. doi: 10.1038/533452a
  • Barba LA (2016). The Hard Road to Reproducibility. Science, 354(6308), 142. doi: 10.1126/science.354.6308.142
  • Beall's List of Potential Predatory Journals and Publishers. (2020). Retrieved from https://beallslist.net/#update
  • Carroll HA, Toumpakari Z, Johnson L, & Betts JA (2017). The Perceived Feasibility of Methods to Reduce Publication Bias. PloS One, 12(10), e0186472. doi: 10.1371/journal.pone.0186472
  • Chappell B (2019). Duke Whistleblower Gets More Than $33 Million in Research Fraud Settlement. NPR. Retrieved from https://www.npr.org/2019/03/25/706604033/duke-whistleblower-gets-more-than-33-million-in-research-fraud-settlement
  • Davis MS, Riske-Morris M, & Diaz SR (2007). Causal Factors Implicated in Research Misconduct: Evidence from ORI Case Files. Science and Engineering Ethics, 13(4), 395–414. doi: 10.1007/s11948-007-9045-2
  • DeMets DL (1999). Statistics and Ethics in Medical Research. Science and Engineering Ethics, 5(1), 97–117. doi: 10.1007/s11948-999-0059-9
  • Department of Health and Human Services. (2005). 42 CFR Parts 50 and 93: Public Health Service Policies on Research Misconduct; Final Rule. Retrieved from https://ori.hhs.gov/sites/default/files/42_cfr_parts_50_and_93_2005.pdf
  • DePellegrin TA, & Johnston M (2015). An Arbitrary Line in the Sand: Rising Scientists Confront the Impact Factor. Genetics, 201(3), 811–813.
  • DuBois JM, Anderson EE, Chibnall J, Carroll K, Gibb T, Ogbuka C, & Rubbelke T (2013). Understanding Research Misconduct: A Comparative Analysis of 120 Cases of Professional Wrongdoing. Accountability in Research, 20(5–6), 320–338. doi: 10.1080/08989621.2013.822248
  • DuBois JM, & Antes AL (2018). Five Dimensions of Research Ethics: A Stakeholder Framework for Creating a Climate of Research Integrity. Academic Medicine, 93(4), 550–555. doi: 10.1097/ACM.0000000000001966
  • Else H (2018). Does Science have a Bullying Problem? Nature, 563, 616–618. doi: 10.1038/d41586-018-07532-5
  • Emanuel EJ, Wendler D, & Grady C (2000). What Makes Clinical Research Ethical? Journal of the American Medical Association, 283(20), 2701–2711.
  • Evans TM, Bira L, Gastelum JB, Weiss LT, & Vanderford NL (2018). Evidence for a Mental Health Crisis in Graduate Education. Nature Biotechnology, 36(3), 282–284. doi: 10.1038/nbt.4089
  • Frank DJ (2018). How to Write a Research Manuscript. Current Protocols Essential Laboratory Techniques, 16(1), e20. doi: 10.1002/cpet.20
  • Goodman SN, Fanelli D, & Ioannidis JPA (2016). What Does Research Reproducibility Mean? Science Translational Medicine, 8(341), 341ps12. doi: 10.1126/scitranslmed.aaf5027
  • Grudniewicz A, Moher D, Cobey KD, Bryson GL, Cukier S, Allen K, … Lalu MM (2019). Predatory journals: no definition, no defence. Nature, 576(7786), 210–212. doi: 10.1038/d41586-019-03759-y
  • Head ML, Holman L, Lanfear R, Kahn AT, & Jennions MD (2015). The Extent and Consequences of P-Hacking in Science. PLoS Biology, 13(3), e1002106. doi: 10.1371/journal.pbio.1002106
  • Hofstra B, Kulkarni VV, Munoz-Najar Galvez S, He B, Jurafsky D, & McFarland DA (2020). The Diversity–Innovation Paradox in Science. Proceedings of the National Academy of Sciences, 117(17), 9284. doi: 10.1073/pnas.1915378117
  • International Committee of Medical Journal Editors. (2020). Defining the Role of Authors and Contributors. Retrieved from http://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html
  • Keith-Spiegel P, Sieber J, & Koocher GP (2010). Responding to Research Wrongdoing: A User-Friendly Guide. Retrieved from http://users.neo.registeredsite.com/1/4/0/20883041/assets/RRW_11-10.pdf
  • McIntosh T, Antes AL, & DuBois JM (2020). Navigating Complex, Ethical Problems in Professional Life: A Guide to Teaching SMART Strategies for Decision-Making. Journal of Academic Ethics. doi: 10.1007/s10805-020-09369-y
  • Meyers LC, Brown AM, Moneta-Koehler L, & Chalkley R (2018). Survey of Checkpoints along the Pathway to Diverse Biomedical Research Faculty. PloS One, 13(1), e0190606. doi: 10.1371/journal.pone.0190606
  • Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Percie du Sert N, … Ioannidis JPA (2017). A Manifesto for Reproducible Science. Nature Human Behaviour, 1(1), 0021. doi: 10.1038/s41562-016-0021
  • National Academies of Science. (2009). On Being a Scientist: A Guide to Responsible Conduct in Research. Washington, DC: National Academies Press.
  • National Academies of Sciences, Engineering, and Medicine. (2017). Fostering Integrity in Research. Washington, DC: The National Academies Press.
  • National Academies of Sciences, Engineering, and Medicine. (2018a). An American Crisis: The Growing Absence of Black Men in Medicine and Science: Proceedings of a Joint Workshop. Washington, DC: The National Academies Press.
  • National Academies of Sciences, Engineering, and Medicine. (2018b). Sexual Harassment of Women: Climate, Culture, and Consequences in Academic Sciences, Engineering, and Medicine. Washington, DC: The National Academies Press.
  • National Institutes of Health. (2009). Update on the Requirement for Instruction in the Responsible Conduct of Research. NOT-OD-10-019. Retrieved from https://grants.nih.gov/grants/guide/notice-files/NOT-OD-10-019.html
  • National Science Foundation. (2017). Important Notice No. 140: Training in Responsible Conduct of Research – A Reminder of the NSF Requirement. Retrieved from https://www.nsf.gov/pubs/issuances/in140.jsp
  • No Place for Bullies in Science. (2018). Nature, 559(7713), 151. doi: 10.1038/d41586-018-05683-z
  • Norris D, Dirnagl U, Zigmond MJ, Thompson-Peer K, & Chow TT (2018). Health Tips for Research Groups. Nature, 557, 302–304. doi: 10.1038/d41586-018-05146-5
  • Nosek BA, Alter G, Banks GC, Borsboom D, Bowman SD, Breckler SJ, … Yarkoni T (2015). Promoting an Open Research Culture. Science, 348(6242), 1422–1425. doi: 10.1126/science.aab2374
  • Nuzzo R (2015). How Scientists Fool Themselves – and How They Can Stop. Nature, 526, 182–185.
  • O'Connor A (2018). More Evidence that Nutrition Studies Don't Always Add Up. The New York Times. Retrieved from https://www.nytimes.com/2018/09/29/sunday-review/cornell-food-scientist-wansink-misconduct.html
  • Park A (2012). Great Science Frauds. Time. Retrieved from https://healthland.time.com/2012/01/13/great-science-frauds/slide/the-baltimore-case/
  • Plemmons DK, Baranski EN, Harp K, Lo DD, Soderberg CK, Errington TM, … Esterling KM (2020). A Randomized Trial of a Lab-embedded Discourse Intervention to Improve Research Ethics. Proceedings of the National Academy of Sciences, 117(3), 1389. doi: 10.1073/pnas.1917848117
  • Research Institutions Must Put the Health of Labs First. (2018). Nature, 557(7705), 279–280. doi: 10.1038/d41586-018-05159-0
  • Research Integrity is Much More Than Misconduct. (2019). Nature, 570. doi: 10.1038/d41586-019-01727-0
  • Resnik DB (2011). Scientific Research and the Public Trust. Science and Engineering Ethics, 17(3), 399–409. doi: 10.1007/s11948-010-9210-x
  • Roper RL (2019). Does Gender Bias Still Affect Women in Science? Microbiology and Molecular Biology Reviews, 83(3), e00018-19. doi: 10.1128/MMBR.00018-19
  • Shamoo AE, & Resnik DB (2015). Responsible Conduct of Research (3rd ed.). New York: Oxford University Press.
  • Steneck NH (2007). ORI Introduction to the Responsible Conduct of Research (Updated ed.). Washington, DC: U.S. Government Printing Office.
  • Winchester C (2018). Give Every Paper a Read for Reproducibility. Nature, 557, 281. doi: 10.1038/d41586-018-05140-x



