PHILO-notes

Importance of Quantitative Research Across Fields

First of all, research is necessary and valuable in society because, among other things, 1) it is an important tool for building knowledge and facilitating learning; 2) it serves as a means of understanding social and political issues and of increasing public awareness; 3) it helps people succeed in business; 4) it enables us to disprove lies and support truths; and 5) it serves as a means to find, gauge, and seize opportunities, and it helps in finding solutions to social and health problems (in fact, the development of COVID-19 vaccines is a product of research).

Now, quantitative research is a type of research that explains phenomena through numerical data analyzed by mathematically based methods, especially statistics. It is very important because it relies on hard facts and numerical data to gain as objective a picture as possible of people’s opinions or of reality. Hence, quantitative research enables us to map out and understand the world in which we live.

In addition, quantitative research is important because it enables us to conduct research on a large scale; it can reveal insights about broader groups of people or the population as a whole; it enables researchers to compare different groups to understand similarities and differences; and it helps businesses understand the size of a new opportunity. As we can see, quantitative research is important across fields and disciplines.

Let me now briefly discuss the importance of quantitative research across fields and disciplines. For brevity’s sake, the discussion that follows will focus only on the importance of quantitative research in psychology, economics, education, environmental science and sustainability, and business.

First, on the importance of quantitative research in psychology.

We know for a fact that one of the major goals of psychology is to understand all the elements that propel human (as well as animal) behavior. Here, one of the most frequent tasks of psychologists is to represent a series of observations or measurements by a concise and suitable formula. Such a formula may either express a physical hypothesis, or on the other hand be merely empirical, that is, it may enable researchers in the field of psychology to represent by a few well selected constants a wide range of experimental or observational data. In the latter case it serves not only for purposes of interpolation, but frequently suggests new physical concepts or statistical constants. Indeed, quantitative research is very important for this purpose.

It is also important to note that psychology researchers often seek to discern cause-effect relationships, as in a study that determines the effect of drugs on teenagers. But cause-effect relationships cannot be elucidated without hard statistical data gathered through observations and empirical research. Hence, again, quantitative research is very important in the field of psychology because it allows researchers to accumulate facts and eventually create theories that help them understand the human condition and perhaps diminish suffering and allow the human race to flourish.

Second, on the importance of quantitative research in economics.

In general, economists have long used quantitative methods to provide us with theories and explanations of why certain things happen in the market. Through quantitative research, economists have also been able to explain why a given economic system behaves the way it does. It is also important to note that the application of quantitative methods, models, and the corresponding algorithms makes research on complex economic phenomena and their interdependence more accurate and efficient, with the aim of making decisions and forecasting future economic trends and processes.

Third, on the importance of quantitative research in education.

Again, quantitative research deals with the collection of numerical data for some type of analysis. Whether a teacher is trying to assess the average scores on a classroom test or determine which teaching standard was most commonly missed on the classroom assessment, or a principal wants to assess how attendance rates correlate with students’ performance on government assessments, quantitative research is useful and appropriate.
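As a minimal sketch of two of the tasks just mentioned, the following Python snippet computes a class average and the Pearson correlation between attendance and test scores. The data values and the `pearson_r` helper are hypothetical and were made up only for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical data for six students (illustrative only).
attendance_rate = [0.95, 0.80, 0.99, 0.70, 0.88, 0.92]  # share of days attended
test_score = [88, 72, 93, 61, 79, 85]                   # score on an assessment


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


average_score = mean(test_score)
r = pearson_r(attendance_rate, test_score)
print(f"Average score: {average_score:.1f}")
print(f"Attendance-score correlation: {r:.2f}")
```

A coefficient near 1 or -1 indicates a strong linear association. It does not, by itself, show that attendance causes higher scores.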

In many cases too, school districts use quantitative data to evaluate teacher effectiveness from a number of measures, including stakeholder perception surveys, students’ performance and growth on standardized government assessments, and percentages on their levels of professionalism. Quantitative research is also good for informing instructional decisions, measuring the effectiveness of the school climate based on survey data issued to teachers and school personnel, and discovering students’ learning preferences.

Fourth, on the importance of quantitative research in Environmental Science and Sustainability.

Addressing environmental problems requires solid evidence to persuade decision makers of the necessity of change. This makes quantitative literacy essential for sustainability professionals to interpret scientific data and implement management procedures. Indeed, with our world facing increasingly complex environmental issues, quantitative techniques reduce the numerous uncertainties by providing a reliable representation of reality, enabling policy makers to proceed toward potential solutions with greater confidence. For this purpose, a wide range of statistical tools and approaches are now available for sustainability scientists to measure environmental indicators and inform responsible policymaking. As we can see, quantitative research is very important in environmental science and sustainability.

But how does quantitative research provide the context for environmental science and sustainability?

Environmental science brings a transdisciplinary systems approach to analyzing sustainability concerns. As the intrinsic concept of sustainability can be interpreted according to diverse values and definitions, quantitative methods based on rigorous scientific research are crucial for establishing an evidence-based consensus on pertinent issues that provide a foundation for meaningful policy implementation.

And fifth, on the importance of quantitative research in business.

As is well known, market research plays a key role in determining the factors that lead to business success. Whether one wants to estimate the size of a potential market or understand the competition for a particular product, it is very important to apply methods that will yield measurable results in conducting a market research assignment. Quantitative research can make this happen by employing data capture methods and statistical analysis. Quantitative market research is used for estimating consumer attitudes and behaviors, market sizing, segmentation, and identifying drivers for brand recall and product purchase decisions.

Indeed, quantitative data open a lot of doors for businesses. Regression analysis, simulations, and hypothesis testing are examples of tools that might reveal trends that business leaders might not have noticed otherwise. Business leaders can use this data to identify areas where their company could improve its performance.
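To illustrate how regression analysis can reveal a trend, here is a minimal sketch that fits a straight line by ordinary least squares. The advertising and sales figures and the `fit_line` helper are hypothetical, chosen only to show the mechanics.

```python
from statistics import mean

# Hypothetical monthly figures (illustrative only).
ad_spend = [10, 15, 20, 25, 30, 35]      # advertising spend, in thousands
sales = [120, 150, 170, 210, 230, 260]   # units sold


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


slope, intercept = fit_line(ad_spend, sales)
predicted = intercept + slope * 40  # extrapolated sales at a spend of 40
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.1f}")
```

The slope estimates how many additional units are sold per extra thousand spent. The prediction at 40 lies outside the observed range, so it should be read with caution.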

J Korean Med Sci. 2022 Apr 25; 37(16)

A Practical Guide to Writing Quantitative and Qualitative Research Questions and Hypotheses in Scholarly Articles

Edward Barroga

1 Department of General Education, Graduate School of Nursing Science, St. Luke’s International University, Tokyo, Japan.

Glafera Janet Matanguihan

2 Department of Biological Sciences, Messiah University, Mechanicsburg, PA, USA.

The development of research questions and the subsequent hypotheses are prerequisites to defining the main research purpose and specific objectives of a study. Consequently, these objectives determine the study design and research outcome. The development of research questions is a process based on knowledge of current trends, cutting-edge studies, and technological advances in the research field. Excellent research questions are focused and require a comprehensive literature search and in-depth understanding of the problem being investigated. Initially, research questions may be written as descriptive questions which could be developed into inferential questions. These questions must be specific and concise to provide a clear foundation for developing hypotheses. Hypotheses are more formal predictions about the research outcomes. These specify the possible results that may or may not be expected regarding the relationship between groups. Thus, research questions and hypotheses clarify the main purpose and specific objectives of the study, which in turn dictate the design of the study, its direction, and outcome. Studies developed from good research questions and hypotheses will have trustworthy outcomes with wide-ranging social and health implications.

INTRODUCTION

Scientific research is usually initiated by posing evidence-based research questions which are then explicitly restated as hypotheses. 1 , 2 The hypotheses provide directions to guide the study, solutions, explanations, and expected results. 3 , 4 Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the inception of novel studies and the ethical testing of ideas. 5 , 6

It is crucial to have knowledge of both quantitative and qualitative research 2 as both types of research involve writing research questions and hypotheses. 7 However, these crucial elements of research are sometimes overlooked or, if not overlooked, framed without the forethought and meticulous attention they need. Planning and careful consideration are needed when developing quantitative or qualitative research, particularly when conceptualizing research questions and hypotheses. 4

There is a continuing need to support researchers in the creation of innovative research questions and hypotheses, as well as for journal articles that carefully review these elements. 1 When research questions and hypotheses are not carefully thought out, unethical studies and poor outcomes usually ensue. Carefully formulated research questions and hypotheses define well-founded objectives, which in turn determine the appropriate design, course, and outcome of the study. This article therefore aims to discuss in detail the various aspects of crafting research questions and hypotheses, with the goal of guiding researchers as they develop their own. Examples from the authors and peer-reviewed scientific articles in the healthcare field are provided to illustrate key points.

DEFINITIONS AND RELATIONSHIP OF RESEARCH QUESTIONS AND HYPOTHESES

A research question is what a study aims to answer after data analysis and interpretation. The answer is written at length in the discussion section of the paper. Thus, the research question gives a preview of the different parts and variables of the study meant to address the problem posed in the research question. 1 An excellent research question clarifies the research writing while facilitating understanding of the research topic, objective, scope, and limitations of the study. 5

On the other hand, a research hypothesis is an educated statement of an expected outcome. This statement is based on background research and current knowledge. 8 , 9 The research hypothesis makes a specific prediction about a new phenomenon 10 or a formal statement on the expected relationship between an independent variable and a dependent variable. 3 , 11 It provides a tentative answer to the research question to be tested or explored. 4

Hypotheses employ reasoning to predict a theory-based outcome. 10 These can also be developed from theories by focusing on components of theories that have not yet been observed. 10 The validity of hypotheses is often based on the testability of the prediction made in a reproducible experiment. 8

Conversely, hypotheses can also be rephrased as research questions. Several hypotheses based on existing theories and knowledge may be needed to answer a research question. Developing ethical research questions and hypotheses creates a research design that has logical relationships among variables. These relationships serve as a solid foundation for the conduct of the study. 4 , 11 Haphazardly constructed research questions can result in poorly formulated hypotheses and improper study designs, leading to unreliable results. Thus, the formulations of relevant research questions and verifiable hypotheses are crucial when beginning research. 12

CHARACTERISTICS OF GOOD RESEARCH QUESTIONS AND HYPOTHESES

Excellent research questions are specific and focused. These integrate collective data and observations to confirm or refute the subsequent hypotheses. Well-constructed hypotheses are based on previous reports and verify the research context. These are realistic, in-depth, sufficiently complex, and reproducible. More importantly, these hypotheses can be addressed and tested. 13

There are several characteristics of well-developed hypotheses. Good hypotheses are 1) empirically testable 7 , 10 , 11 , 13 ; 2) backed by preliminary evidence 9 ; 3) testable by ethical research 7 , 9 ; 4) based on original ideas 9 ; 5) supported by evidence-based logical reasoning 10 ; and 6) predictable. 11 Good hypotheses can infer ethical and positive implications, indicating the presence of a relationship or effect relevant to the research theme. 7 , 11 These are initially developed from a general theory and branch into specific hypotheses by deductive reasoning. In the absence of a theory on which to base the hypotheses, inductive reasoning based on specific observations or findings forms more general hypotheses. 10

TYPES OF RESEARCH QUESTIONS AND HYPOTHESES

Research questions and hypotheses are developed according to the type of research, which can be broadly classified into quantitative and qualitative research. We provide a summary of the types of research questions and hypotheses under quantitative and qualitative research categories in Table 1 .

Table 1

Quantitative research questions
- Descriptive research questions
- Comparative research questions
- Relationship research questions

Quantitative research hypotheses
- Simple hypothesis
- Complex hypothesis
- Directional hypothesis
- Non-directional hypothesis
- Associative hypothesis
- Causal hypothesis
- Null hypothesis
- Alternative hypothesis
- Working hypothesis
- Statistical hypothesis
- Logical hypothesis
- Hypothesis-testing

Qualitative research questions
- Contextual research questions
- Descriptive research questions
- Evaluation research questions
- Explanatory research questions
- Exploratory research questions
- Generative research questions
- Ideological research questions
- Ethnographic research questions
- Phenomenological research questions
- Grounded theory questions
- Qualitative case study questions

Qualitative research hypotheses
- Hypothesis-generating

Research questions in quantitative research

In quantitative research, research questions inquire about the relationships among variables being investigated and are usually framed at the start of the study. These are precise and typically linked to the subject population, dependent and independent variables, and research design. 1 Research questions may also attempt to describe the behavior of a population in relation to one or more variables, or describe the characteristics of variables to be measured (descriptive research questions). 1 , 5 , 14 These questions may also aim to discover differences between groups within the context of an outcome variable (comparative research questions), 1 , 5 , 14 or elucidate trends and interactions among variables (relationship research questions). 1 , 5 We provide examples of descriptive, comparative, and relationship research questions in quantitative research in Table 2.

Table 2. Quantitative research questions
Descriptive research question
- Measures responses of subjects to variables
- Presents variables to measure, analyze, or assess
What is the proportion of resident doctors in the hospital who have mastered ultrasonography (response of subjects to a variable) as a diagnostic technique in their clinical training?
Comparative research question
- Clarifies difference between one group with outcome variable and another group without outcome variable
Is there a difference in the reduction of lung metastasis in osteosarcoma patients who received the vitamin D adjunctive therapy (group with outcome variable) compared with osteosarcoma patients who did not receive the vitamin D adjunctive therapy (group without outcome variable)?
- Compares the effects of variables
How does the vitamin D analogue 22-Oxacalcitriol (variable 1) mimic the antiproliferative activity of 1,25-Dihydroxyvitamin D (variable 2) in osteosarcoma cells?
Relationship research question
- Defines trends, association, relationships, or interactions between dependent variable and independent variable
Is there a relationship between the number of medical student suicides (dependent variable) and the level of medical student stress (independent variable) in Japan during the first wave of the COVID-19 pandemic?
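A descriptive research question such as the one about resident doctors in Table 2 is typically answered with a proportion. The sketch below computes a sample proportion and a normal-approximation 95% confidence interval. The counts and the `proportion_with_ci` helper are hypothetical and do not come from any study.

```python
from math import sqrt

# Hypothetical survey result (illustrative only).
residents_surveyed = 120
residents_mastered = 78


def proportion_with_ci(successes, n, z=1.96):
    """Return the sample proportion and a normal-approximation 95% interval."""
    p = successes / n
    margin = z * sqrt(p * (1 - p) / n)
    # Clamp the interval to the valid range for a proportion.
    return p, max(0.0, p - margin), min(1.0, p + margin)


p, low, high = proportion_with_ci(residents_mastered, residents_surveyed)
print(f"Proportion: {p:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The normal approximation is a simplifying assumption. It works reasonably well for samples of this size but is less reliable when the proportion is close to 0 or 1.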

Hypotheses in quantitative research

In quantitative research, hypotheses predict the expected relationships among variables. 15 Relationships among variables that can be predicted include 1) between a single dependent variable and a single independent variable (simple hypothesis) or 2) between two or more independent and dependent variables (complex hypothesis). 4 , 11 Hypotheses may also specify the expected direction to be followed and imply an intellectual commitment to a particular outcome (directional hypothesis). 4 On the other hand, hypotheses may not predict the exact direction and are used in the absence of a theory, or when findings contradict previous studies (non-directional hypothesis). 4 In addition, hypotheses can 1) define interdependency between variables (associative hypothesis), 4 2) propose an effect on the dependent variable from manipulation of the independent variable (causal hypothesis), 4 3) state that no relationship or difference exists between two variables (null hypothesis), 4 , 11 , 15 4) replace the working hypothesis if rejected (alternative hypothesis), 15 5) explain the relationship of phenomena to possibly generate a theory (working hypothesis), 11 6) involve quantifiable variables that can be tested statistically (statistical hypothesis), 11 or 7) express a relationship whose interlinks can be verified logically (logical hypothesis). 11 We provide examples of simple, complex, directional, non-directional, associative, causal, null, alternative, working, statistical, and logical hypotheses in quantitative research, as well as the definition of quantitative hypothesis-testing research, in Table 3.

Table 3. Quantitative research hypotheses
Simple hypothesis
- Predicts relationship between single dependent variable and single independent variable
If the dose of the new medication (single independent variable) is high, blood pressure (single dependent variable) is lowered.
Complex hypothesis
- Foretells relationship between two or more independent and dependent variables
The higher the use of anticancer drugs, radiation therapy, and adjunctive agents (3 independent variables), the higher would be the survival rate (1 dependent variable).
Directional hypothesis
- Identifies study direction based on theory towards particular outcome to clarify relationship between variables
Privately funded research projects will have a larger international scope (study direction) than publicly funded research projects.
Non-directional hypothesis
- Nature of relationship between two variables or exact study direction is not identified
- Does not involve a theory
Women and men are different in terms of helpfulness. (Exact study direction is not identified)
Associative hypothesis
- Describes variable interdependency
- Change in one variable causes change in another variable
A larger number of people vaccinated against COVID-19 in the region (change in independent variable) will reduce the region’s incidence of COVID-19 infection (change in dependent variable).
Causal hypothesis
- An effect on dependent variable is predicted from manipulation of independent variable
A change into a high-fiber diet (independent variable) will reduce the blood sugar level (dependent variable) of the patient.
Null hypothesis
- A negative statement indicating no relationship or difference between 2 variables
There is no significant difference in the severity of pulmonary metastases between the new drug (variable 1) and the current drug (variable 2).
Alternative hypothesis
- Following a null hypothesis, an alternative hypothesis predicts a relationship between 2 study variables
The new drug (variable 1) is better on average in reducing the level of pain from pulmonary metastasis than the current drug (variable 2).
Working hypothesis
- A hypothesis that is initially accepted for further research to produce a feasible theory
Dairy cows fed with concentrates of different formulations will produce different amounts of milk.
Statistical hypothesis
- Assumption about the value of population parameter or relationship among several population characteristics
- Validity tested by a statistical experiment or analysis
The mean recovery rate from COVID-19 infection (value of population parameter) is not significantly different between population 1 and population 2.
There is a positive correlation between the level of stress at the workplace and the number of suicides (population characteristics) among working people in Japan.
Logical hypothesis
- Offers or proposes an explanation with limited or no extensive evidence
If healthcare workers provide more educational programs about contraception methods, the number of adolescent pregnancies will be less.
Hypothesis-testing (Quantitative hypothesis-testing research)
- Quantitative research uses deductive reasoning.
- This involves the formation of a hypothesis, collection of data in the investigation of the problem, analysis and use of the data from the investigation, and drawing of conclusions to validate or nullify the hypotheses.
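The hypothesis-testing steps in Table 3 (form a hypothesis, collect data, analyze, conclude) can be sketched with a permutation test of the null hypothesis that two treatment groups do not differ in mean outcome. This is one possible test among many and is not the method prescribed by the authors. The pain scores, the `permutation_p_value` helper, and the 0.05 threshold are assumptions made for illustration.

```python
import random
from statistics import mean

# Hypothetical pain scores (lower is better); illustrative only.
new_drug = [3.1, 2.8, 3.5, 2.9, 3.0, 2.6, 3.2, 2.7]
current_drug = [4.0, 3.8, 4.4, 3.6, 4.1, 3.9, 4.3, 3.7]


def permutation_p_value(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis, group labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_resamples + 1)


p_value = permutation_p_value(new_drug, current_drug)
alpha = 0.05  # conventional threshold, assumed here
print(f"p = {p_value:.4f}; reject null hypothesis: {p_value < alpha}")
```

A small p-value means a difference as large as the observed one would rarely arise if the null hypothesis were true, which counts as evidence in favor of the alternative hypothesis.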

Research questions in qualitative research

Unlike research questions in quantitative research, research questions in qualitative research are usually continuously reviewed and reformulated. The central question and associated subquestions are stated more often than hypotheses. 15 The central question broadly explores a complex set of factors surrounding the central phenomenon, aiming to present the varied perspectives of participants. 15

There are varied goals for which qualitative research questions are developed. These questions can function in several ways, such as to 1) identify and describe existing conditions (contextual research questions); 2) describe a phenomenon (descriptive research questions); 3) assess the effectiveness of existing methods, protocols, theories, or procedures (evaluation research questions); 4) examine a phenomenon or analyze the reasons or relationships between subjects or phenomena (explanatory research questions); or 5) focus on unknown aspects of a particular topic (exploratory research questions). 5 In addition, some qualitative research questions provide new ideas for the development of theories and actions (generative research questions) or advance specific ideologies of a position (ideological research questions). 1 Other qualitative research questions may build on a body of existing literature and become working guidelines (ethnographic research questions). Research questions may also be broadly stated without specific reference to the existing literature or a typology of questions (phenomenological research questions), may be directed towards generating a theory of some process (grounded theory questions), or may address a description of the case and the emerging themes (qualitative case study questions). 15 We provide examples of contextual, descriptive, evaluation, explanatory, exploratory, generative, ideological, ethnographic, phenomenological, grounded theory, and qualitative case study research questions in qualitative research in Table 4, and the definition of qualitative hypothesis-generating research in Table 5.

Table 4. Qualitative research questions
Contextual research question
- Asks about the nature of what already exists
- Individuals or groups function to further clarify and understand the natural context of real-world problems
What are the experiences of nurses working night shifts in healthcare during the COVID-19 pandemic? (natural context of real-world problems)
Descriptive research question
- Aims to describe a phenomenon
What are the different forms of disrespect and abuse (phenomenon) experienced by Tanzanian women when giving birth in healthcare facilities?
Evaluation research question
- Examines the effectiveness of existing practice or accepted frameworks
How effective are decision aids (effectiveness of existing practice) in helping decide whether to give birth at home or in a healthcare facility?
Explanatory research question
- Clarifies a previously studied phenomenon and explains why it occurs
Why is there an increase in teenage pregnancy (phenomenon) in Tanzania?
Exploratory research question
- Explores areas that have not been fully investigated to have a deeper understanding of the research problem
What factors affect the mental health of medical students (areas that have not yet been fully investigated) during the COVID-19 pandemic?
Generative research question
- Develops an in-depth understanding of people’s behavior by asking ‘how would’ or ‘what if’ to identify problems and find solutions
How would the extensive research experience of the behavior of new staff impact the success of the novel drug initiative?
Ideological research question
- Aims to advance specific ideas or ideologies of a position
Are Japanese nurses who volunteer in remote African hospitals able to promote humanized care of patients (specific ideas or ideologies) in the areas of safe patient environment, respect of patient privacy, and provision of accurate information related to health and care?
Ethnographic research question
- Clarifies peoples’ nature, activities, their interactions, and the outcomes of their actions in specific settings
What are the demographic characteristics, rehabilitative treatments, community interactions, and disease outcomes (nature, activities, their interactions, and the outcomes) of people in China who are suffering from pneumoconiosis?
Phenomenological research question
- Seeks to know more about the phenomena that have impacted an individual
What are the lived experiences of parents who have been living with and caring for children with a diagnosis of autism? (phenomena that have impacted an individual)
Grounded theory question
- Focuses on social processes asking about what happens and how people interact, or uncovering social relationships and behaviors of groups
What are the problems that pregnant adolescents face in terms of social and cultural norms (social processes), and how can these be addressed?
Qualitative case study question
- Assesses a phenomenon using different sources of data to answer “why” and “how” questions
- Considers how the phenomenon is influenced by its contextual situation.
How does quitting work and assuming the role of a full-time mother (phenomenon assessed) change the lives of women in Japan?

Table 5. Qualitative research hypotheses
Hypothesis-generating (Qualitative hypothesis-generating research)
- Qualitative research uses inductive reasoning.
- This involves data collection from study participants or the literature regarding a phenomenon of interest, using the collected data to develop a formal hypothesis, and using the formal hypothesis as a framework for testing the hypothesis.
- Qualitative exploratory studies explore areas deeper, clarifying subjective experience and allowing formulation of a formal hypothesis potentially testable in a future quantitative approach.

Qualitative studies usually pose at least one central research question and several subquestions starting with How or What. These research questions use exploratory verbs such as explore or describe. These also focus on one central phenomenon of interest, and may mention the participants and research site. 15

Hypotheses in qualitative research

Hypotheses in qualitative research are stated in the form of a clear statement concerning the problem to be investigated. Unlike in quantitative research where hypotheses are usually developed to be tested, qualitative research can lead to both hypothesis-testing and hypothesis-generating outcomes. 2 When studies require both quantitative and qualitative research questions, this suggests an integrative process between both research methods wherein a single mixed-methods research question can be developed. 1

FRAMEWORKS FOR DEVELOPING RESEARCH QUESTIONS AND HYPOTHESES

Research questions followed by hypotheses should be developed before the start of the study. 1 , 12 , 14 It is crucial to develop feasible research questions on a topic that is interesting to both the researcher and the scientific community. This can be achieved by a meticulous review of previous and current studies to establish a novel topic. Specific areas are subsequently focused on to generate ethical research questions. The relevance of the research questions is evaluated in terms of clarity of the resulting data, specificity of the methodology, objectivity of the outcome, depth of the research, and impact of the study. 1 , 5 These aspects constitute the FINER criteria (i.e., Feasible, Interesting, Novel, Ethical, and Relevant). 1 Clarity and effectiveness are achieved if research questions meet the FINER criteria. In addition to the FINER criteria, Ratan et al. described focus, complexity, novelty, feasibility, and measurability for evaluating the effectiveness of research questions. 14

The PICOT and PEO frameworks are also used when developing research questions. 1 These frameworks address the following elements. PICOT: P-population/patients/problem, I-intervention or indicator being studied, C-comparison group, O-outcome of interest, and T-timeframe of the study. PEO: P-population being studied, E-exposure to preexisting conditions, and O-outcome of interest. 1 Research questions are also considered good if they meet the “FINERMAPS” framework: Feasible, Interesting, Novel, Ethical, Relevant, Manageable, Appropriate, Potential value/publishable, and Systematic. 14
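One way to keep the PICOT elements explicit while drafting is to record them in a small data structure. The following sketch is only an illustration: the `PICOT` class, the sentence template in `as_question`, and the example values are assumptions, not part of the framework itself.

```python
from dataclasses import dataclass


@dataclass
class PICOT:
    """Elements of a PICOT research question, as listed in the text."""
    population: str
    intervention: str
    comparison: str
    outcome: str
    timeframe: str

    def as_question(self) -> str:
        # The sentence template is an assumption, not part of the framework.
        return (
            f"In {self.population}, does {self.intervention}, compared with "
            f"{self.comparison}, affect {self.outcome} over {self.timeframe}?"
        )


# Hypothetical example values (illustrative only).
question = PICOT(
    population="adults with hypertension",
    intervention="a new medication",
    comparison="standard care",
    outcome="blood pressure",
    timeframe="12 weeks",
)
print(question.as_question())
```

Because every field is required, the constructor raises an error when an element is missing, which may help catch incompletely stated comparison groups or outcomes early.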

As we indicated earlier, research questions and hypotheses that are not carefully formulated result in unethical studies or poor outcomes. To illustrate this, we provide some examples of ambiguous research questions and hypotheses that result in unclear and weak research objectives in quantitative research (Table 6) 16 and qualitative research (Table 7) 17 , and show how to transform these ambiguous research questions and hypotheses into clear and good statements.

Table 6 (quantitative research)

  • Research question

Unclear and weak statement (Statement 1): Which is more effective between smoke moxibustion and smokeless moxibustion?

Clear and good statement (Statement 2): “Moreover, regarding smoke moxibustion versus smokeless moxibustion, it remains unclear which is more effective, safe, and acceptable to pregnant women, and whether there is any difference in the amount of heat generated.”

Points to avoid: 1) Vague and unfocused questions; 2) Closed questions simply answerable by yes or no; 3) Questions requiring a simple choice

  • Hypothesis

Unclear and weak statement (Statement 1): The smoke moxibustion group will have higher cephalic presentation.

Clear and good statement (Statement 2): “Hypothesis 1. The smoke moxibustion stick group (SM group) and smokeless moxibustion stick group (-SLM group) will have higher rates of cephalic presentation after treatment than the control group. Hypothesis 2. The SM group and SLM group will have higher rates of cephalic presentation at birth than the control group. Hypothesis 3. There will be no significant differences in the well-being of the mother and child among the three groups in terms of the following outcomes: premature birth, premature rupture of membranes (PROM) at < 37 weeks, Apgar score < 7 at 5 min, umbilical cord blood pH < 7.1, admission to neonatal intensive care unit (NICU), and intrauterine fetal death.”

Points to avoid: 1) Unverifiable hypotheses; 2) Incompletely stated groups of comparison; 3) Insufficiently described variables or outcomes

  • Research objective

Unclear and weak statement (Statement 1): To determine which is more effective between smoke moxibustion and smokeless moxibustion.

Clear and good statement (Statement 2): “The specific aims of this pilot study were (a) to compare the effects of smoke moxibustion and smokeless moxibustion treatments with the control group as a possible supplement to ECV for converting breech presentation to cephalic presentation and increasing adherence to the newly obtained cephalic position, and (b) to assess the effects of these treatments on the well-being of the mother and child.”

Points to avoid: 1) Poor understanding of the research question and hypotheses; 2) Insufficient description of population, variables, or study outcomes

a These statements were composed for comparison and illustrative purposes only.

b These statements are direct quotes from Higashihara and Horiuchi. 16

Table 7 (qualitative research)

  • Research question

Unclear and weak statement (Statement 1): Does disrespect and abuse (D&A) occur in childbirth in Tanzania?

Clear and good statement (Statement 2): How does disrespect and abuse (D&A) occur and what are the types of physical and psychological abuses observed in midwives’ actual care during facility-based childbirth in urban Tanzania?

Points to avoid: 1) Ambiguous or oversimplistic questions; 2) Questions unverifiable by data collection and analysis

  • Hypothesis

Unclear and weak statement (Statement 1): Disrespect and abuse (D&A) occur in childbirth in Tanzania.

Clear and good statement (Statement 2): Hypothesis 1: Several types of physical and psychological abuse by midwives in actual care occur during facility-based childbirth in urban Tanzania. Hypothesis 2: Weak nursing and midwifery management contribute to the D&A of women during facility-based childbirth in urban Tanzania.

Points to avoid: 1) Statements simply expressing facts; 2) Insufficiently described concepts or variables

  • Research objective

Unclear and weak statement (Statement 1): To describe disrespect and abuse (D&A) in childbirth in Tanzania.

Clear and good statement (Statement 2): “This study aimed to describe from actual observations the respectful and disrespectful care received by women from midwives during their labor period in two hospitals in urban Tanzania.”

Points to avoid: 1) Statements unrelated to the research question and hypotheses; 2) Unattainable or unexplorable objectives

a This statement is a direct quote from Shimoda et al. 17

The other statements were composed for comparison and illustrative purposes only.

CONSTRUCTING RESEARCH QUESTIONS AND HYPOTHESES

To construct effective research questions and hypotheses, it is very important to 1) clarify the background and 2) identify the research problem at the outset of the research, within a specific timeframe. 9 Then, 3) review or conduct preliminary research to collect all available knowledge about the possible research questions by studying theories and previous studies. 18 Afterwards, 4) construct research questions to investigate the research problem. Identify variables to be assessed from the research questions 4 and make operational definitions of constructs from the research problem and questions. Thereafter, 5) construct specific deductive or inductive predictions in the form of hypotheses. 4 Finally, 6) state the study aims. This general flow for constructing effective research questions and hypotheses prior to conducting research is shown in Fig. 1.

Research questions are used more frequently in qualitative research than objectives or hypotheses. 3 These questions seek to discover, understand, explore or describe experiences by asking “What” or “How.” The questions are open-ended to elicit a description rather than to relate variables or compare groups. The questions are continually reviewed, reformulated, and changed during the qualitative study. 3 In quantitative research, research questions are used more frequently in survey projects, whereas hypotheses are used more frequently in experiments, to compare variables and their relationships.

Hypotheses are constructed from the variables identified and are written as if-then statements, following the template, ‘If a specific action is taken, then a certain outcome is expected.’ At this stage, some ideas regarding expectations from the research to be conducted must be drawn. 18 Then, the variables to be manipulated (independent) and influenced (dependent) are defined. 4 Thereafter, the hypothesis is stated and refined, and reproducible data tailored to the hypothesis are identified, collected, and analyzed. 4 The hypotheses must be testable and specific, 18 and should describe the variables and their relationships, the specific group being studied, and the predicted research outcome. 18 Hypothesis construction involves deducing a testable proposition from theory and separating the independent and dependent variables so that each is measured separately. 3 Therefore, good hypotheses must be based on good research questions constructed at the start of a study or trial. 12

In summary, research questions are constructed after establishing the background of the study. Hypotheses are then developed based on the research questions. Thus, it is crucial to have excellent research questions to generate superior hypotheses. In turn, these would determine the research objectives and the design of the study, and ultimately, the outcome of the research. 12 Algorithms for building research questions and hypotheses are shown in Fig. 2 for quantitative research and in Fig. 3 for qualitative research.

EXAMPLES OF RESEARCH QUESTIONS FROM PUBLISHED ARTICLES

  • EXAMPLE 1. Descriptive research question (quantitative research)
  • - Presents research variables to be assessed (distinct phenotypes and subphenotypes)
  • “BACKGROUND: Since COVID-19 was identified, its clinical and biological heterogeneity has been recognized. Identifying COVID-19 phenotypes might help guide basic, clinical, and translational research efforts.
  • RESEARCH QUESTION: Does the clinical spectrum of patients with COVID-19 contain distinct phenotypes and subphenotypes?” 19
  • EXAMPLE 2. Relationship research question (quantitative research)
  • - Shows interactions between dependent variable (static postural control) and independent variable (peripheral visual field loss)
  • “Background: Integration of visual, vestibular, and proprioceptive sensations contributes to postural control. People with peripheral visual field loss have serious postural instability. However, the directional specificity of postural stability and sensory reweighting caused by gradual peripheral visual field loss remain unclear.
  • Research question: What are the effects of peripheral visual field loss on static postural control?” 20
  • EXAMPLE 3. Comparative research question (quantitative research)
  • - Clarifies the difference among groups with an outcome variable (patients enrolled in COMPERA with moderate PH or severe PH in COPD) and another group without the outcome variable (patients with idiopathic pulmonary arterial hypertension (IPAH))
  • “BACKGROUND: Pulmonary hypertension (PH) in COPD is a poorly investigated clinical condition.
  • RESEARCH QUESTION: Which factors determine the outcome of PH in COPD?
  • STUDY DESIGN AND METHODS: We analyzed the characteristics and outcome of patients enrolled in the Comparative, Prospective Registry of Newly Initiated Therapies for Pulmonary Hypertension (COMPERA) with moderate or severe PH in COPD as defined during the 6th PH World Symposium who received medical therapy for PH and compared them with patients with idiopathic pulmonary arterial hypertension (IPAH) .” 21
  • EXAMPLE 4. Exploratory research question (qualitative research)
  • - Explores areas that have not been fully investigated (perspectives of families and children who receive care in clinic-based child obesity treatment) to have a deeper understanding of the research problem
  • “Problem: Interventions for children with obesity lead to only modest improvements in BMI and long-term outcomes, and data are limited on the perspectives of families of children with obesity in clinic-based treatment. This scoping review seeks to answer the question: What is known about the perspectives of families and children who receive care in clinic-based child obesity treatment? This review aims to explore the scope of perspectives reported by families of children with obesity who have received individualized outpatient clinic-based obesity treatment.” 22
  • EXAMPLE 5. Relationship research question (quantitative research)
  • - Defines interactions between dependent variable (use of ankle strategies) and independent variable (changes in muscle tone)
  • “Background: To maintain an upright standing posture against external disturbances, the human body mainly employs two types of postural control strategies: “ankle strategy” and “hip strategy.” While it has been reported that the magnitude of the disturbance alters the use of postural control strategies, it has not been elucidated how the level of muscle tone, one of the crucial parameters of bodily function, determines the use of each strategy. We have previously confirmed using forward dynamics simulations of human musculoskeletal models that an increased muscle tone promotes the use of ankle strategies. The objective of the present study was to experimentally evaluate a hypothesis: an increased muscle tone promotes the use of ankle strategies. Research question: Do changes in the muscle tone affect the use of ankle strategies ?” 23

EXAMPLES OF HYPOTHESES IN PUBLISHED ARTICLES

  • EXAMPLE 1. Working hypothesis (quantitative research)
  • - A hypothesis that is initially accepted for further research to produce a feasible theory
  • “As fever may have benefit in shortening the duration of viral illness, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response when taken during the early stages of COVID-19 illness .” 24
  • “In conclusion, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response . The difference in perceived safety of these agents in COVID-19 illness could be related to the more potent efficacy to reduce fever with ibuprofen compared to acetaminophen. Compelling data on the benefit of fever warrant further research and review to determine when to treat or withhold ibuprofen for early stage fever for COVID-19 and other related viral illnesses .” 24
  • EXAMPLE 2. Exploratory hypothesis (qualitative research)
  • - Explores particular areas deeper to clarify subjective experience and develop a formal hypothesis potentially testable in a future quantitative approach
  • “We hypothesized that when thinking about a past experience of help-seeking, a self distancing prompt would cause increased help-seeking intentions and more favorable help-seeking outcome expectations .” 25
  • “Conclusion
  • Although a priori hypotheses were not supported, further research is warranted as results indicate the potential for using self-distancing approaches to increasing help-seeking among some people with depressive symptomatology.” 25
  • EXAMPLE 3. Hypothesis-generating research to establish a framework for hypothesis testing (qualitative research)
  • “We hypothesize that compassionate care is beneficial for patients (better outcomes), healthcare systems and payers (lower costs), and healthcare providers (lower burnout). ” 26
  • “Compassionomics is the branch of knowledge and scientific study of the effects of compassionate healthcare. Our main hypotheses are that compassionate healthcare is beneficial for (1) patients, by improving clinical outcomes, (2) healthcare systems and payers, by supporting financial sustainability, and (3) HCPs, by lowering burnout and promoting resilience and well-being. The purpose of this paper is to establish a scientific framework for testing the hypotheses above. If these hypotheses are confirmed through rigorous research, compassionomics will belong in the science of evidence-based medicine, with major implications for all healthcare domains.” 26
  • EXAMPLE 4. Statistical hypothesis (quantitative research)
  • - An assumption is made about the relationship among several population characteristics (gender differences in sociodemographic and clinical characteristics of adults with ADHD). Validity is tested by statistical experiment or analysis (chi-square test, Student's t-test, and logistic regression analysis)
  • “Our research investigated gender differences in sociodemographic and clinical characteristics of adults with ADHD in a Japanese clinical sample. Due to unique Japanese cultural ideals and expectations of women's behavior that are in opposition to ADHD symptoms, we hypothesized that women with ADHD experience more difficulties and present more dysfunctions than men . We tested the following hypotheses: first, women with ADHD have more comorbidities than men with ADHD; second, women with ADHD experience more social hardships than men, such as having less full-time employment and being more likely to be divorced.” 27
  • “Statistical Analysis
  • ( text omitted ) Between-gender comparisons were made using the chi-squared test for categorical variables and Students t-test for continuous variables…( text omitted ). A logistic regression analysis was performed for employment status, marital status, and comorbidity to evaluate the independent effects of gender on these dependent variables.” 27
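The statistical hypothesis in Example 4 relies on a chi-squared test for comparing a categorical variable between two groups. The following is a minimal sketch of how the Pearson chi-square statistic is computed from a contingency table. The function name `chi_square_statistic` is hypothetical and the counts are made up for illustration; they are not taken from the cited study.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if group and category were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected
    return statistic


# Made-up counts: rows are two groups, columns are a yes/no category.
observed = [[30, 20],
            [20, 30]]
chi2 = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 3), dof)
```

The statistic is then compared with the chi-square distribution for the given degrees of freedom. With one degree of freedom, the conventional 5% critical value is about 3.84.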

EXAMPLES OF HYPOTHESIS AS WRITTEN IN PUBLISHED ARTICLES IN RELATION TO OTHER PARTS

  • EXAMPLE 1. Background, hypotheses, and aims are provided
  • “Pregnant women need skilled care during pregnancy and childbirth, but that skilled care is often delayed in some countries …( text omitted ). The focused antenatal care (FANC) model of WHO recommends that nurses provide information or counseling to all pregnant women …( text omitted ). Job aids are visual support materials that provide the right kind of information using graphics and words in a simple and yet effective manner. When nurses are not highly trained or have many work details to attend to, these job aids can serve as a content reminder for the nurses and can be used for educating their patients (Jennings, Yebadokpo, Affo, & Agbogbe, 2010) ( text omitted ). Importantly, additional evidence is needed to confirm how job aids can further improve the quality of ANC counseling by health workers in maternal care …( text omitted )” 28
  • “ This has led us to hypothesize that the quality of ANC counseling would be better if supported by job aids. Consequently, a better quality of ANC counseling is expected to produce higher levels of awareness concerning the danger signs of pregnancy and a more favorable impression of the caring behavior of nurses .” 28
  • “This study aimed to examine the differences in the responses of pregnant women to a job aid-supported intervention during ANC visit in terms of 1) their understanding of the danger signs of pregnancy and 2) their impression of the caring behaviors of nurses to pregnant women in rural Tanzania.” 28
  • EXAMPLE 2. Background, hypotheses, and aims are provided
  • “We conducted a two-arm randomized controlled trial (RCT) to evaluate and compare changes in salivary cortisol and oxytocin levels of first-time pregnant women between experimental and control groups. The women in the experimental group touched and held an infant for 30 min (experimental intervention protocol), whereas those in the control group watched a DVD movie of an infant (control intervention protocol). The primary outcome was salivary cortisol level and the secondary outcome was salivary oxytocin level.” 29
  • “ We hypothesize that at 30 min after touching and holding an infant, the salivary cortisol level will significantly decrease and the salivary oxytocin level will increase in the experimental group compared with the control group .” 29
  • EXAMPLE 3. Background, aim, and hypothesis are provided
  • “In countries where the maternal mortality ratio remains high, antenatal education to increase Birth Preparedness and Complication Readiness (BPCR) is considered one of the top priorities [1]. BPCR includes birth plans during the antenatal period, such as the birthplace, birth attendant, transportation, health facility for complications, expenses, and birth materials, as well as family coordination to achieve such birth plans. In Tanzania, although increasing, only about half of all pregnant women attend an antenatal clinic more than four times [4]. Moreover, the information provided during antenatal care (ANC) is insufficient. In the resource-poor settings, antenatal group education is a potential approach because of the limited time for individual counseling at antenatal clinics.” 30
  • “This study aimed to evaluate an antenatal group education program among pregnant women and their families with respect to birth-preparedness and maternal and infant outcomes in rural villages of Tanzania.” 30
  • “ The study hypothesis was if Tanzanian pregnant women and their families received a family-oriented antenatal group education, they would (1) have a higher level of BPCR, (2) attend antenatal clinic four or more times, (3) give birth in a health facility, (4) have less complications of women at birth, and (5) have less complications and deaths of infants than those who did not receive the education .” 30

Research questions and hypotheses are crucial components to any type of research, whether quantitative or qualitative. These questions should be developed at the very beginning of the study. Excellent research questions lead to superior hypotheses, which, like a compass, set the direction of research, and can often determine the successful conduct of the study. Many research studies have floundered because the development of research questions and subsequent hypotheses was not given the thought and meticulous attention needed. The development of research questions and hypotheses is an iterative process based on extensive knowledge of the literature and insightful grasp of the knowledge gap. Focused, concise, and specific research questions provide a strong foundation for constructing hypotheses which serve as formal predictions about the research outcomes. Research questions and hypotheses are crucial elements of research that should not be overlooked. They should be carefully thought of and constructed when planning research. This avoids unethical studies and poor outcomes by defining well-founded objectives that determine the design, course, and outcome of the study.

Disclosure: The authors have no potential conflicts of interest to disclose.

Author Contributions:

  • Conceptualization: Barroga E, Matanguihan GJ.
  • Methodology: Barroga E, Matanguihan GJ.
  • Writing - original draft: Barroga E, Matanguihan GJ.
  • Writing - review & editing: Barroga E, Matanguihan GJ.

What Is Quantitative Research? | Definition, Uses & Methods

Published on June 12, 2020 by Pritha Bhandari. Revised on June 22, 2023.

Quantitative research is the process of collecting and analyzing numerical data. It can be used to find patterns and averages, make predictions, test causal relationships, and generalize results to wider populations.

Quantitative research is the opposite of qualitative research, which involves collecting and analyzing non-numerical data (e.g., text, video, or audio).

Quantitative research is widely used in the natural and social sciences: biology, chemistry, psychology, economics, sociology, marketing, etc.

Examples of quantitative research questions:

  • What is the demographic makeup of Singapore in 2020?
  • How has the average temperature changed globally over the last century?
  • Does environmental pollution affect the prevalence of honey bees?
  • Does working from home increase productivity for people with long commutes?

You can use quantitative research methods for descriptive, correlational or experimental research.

  • In descriptive research, you simply seek an overall summary of your study variables.
  • In correlational research, you investigate relationships between your study variables.
  • In experimental research, you systematically examine whether there is a cause-and-effect relationship between variables.

Correlational and experimental research can both be used to formally test hypotheses, or predictions, using statistics. The results may be generalized to broader populations based on the sampling method used.

To collect quantitative data, you will often need to use operational definitions that translate abstract concepts (e.g., mood) into observable and quantifiable measures (e.g., self-ratings of feelings and energy levels).

Quantitative research methods

  • Experiment

How to use: Control or manipulate an independent variable to measure its effect on a dependent variable.

Example: To test whether an intervention can reduce procrastination in college students, you give equal-sized groups either a procrastination intervention or a comparable task. You compare self-ratings of procrastination behaviors between the groups after the intervention.

  • Survey

How to use: Ask questions of a group of people in person, over the phone, or online.

Example: You distribute questionnaires with rating scales to first-year international college students to investigate their experiences of culture shock.

  • (Systematic) observation

How to use: Identify a behavior or occurrence of interest and monitor it in its natural setting.

Example: To study college classroom participation, you sit in on classes to observe them, counting and recording the prevalence of active and passive behaviors by students from different backgrounds.

  • Secondary research

How to use: Collect data that has been gathered for other purposes, e.g., national surveys or historical records.

Example: To assess whether attitudes towards climate change have changed since the 1980s, you collect relevant, widely available questionnaire data.

Note that quantitative research is at risk for certain research biases, including information bias, omitted variable bias, sampling bias, or selection bias. Be sure that you’re aware of potential biases as you collect and analyze your data to prevent them from impacting your work too much.

Once data is collected, you may need to process it before it can be analyzed. For example, survey and test data may need to be transformed from words to numbers. Then, you can use statistical analysis to answer your research questions.
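As a rough sketch of the "words to numbers" step, survey answers given as text can be mapped onto numeric codes before analysis. The five-point scale, the `LIKERT_CODES` mapping, and the `encode_responses` helper below are assumptions made for illustration, not a prescribed coding scheme.

```python
# Hypothetical 5-point agreement scale; the labels and codes are assumptions.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def encode_responses(responses):
    """Map text answers to numeric codes; unknown or blank answers become None."""
    return [LIKERT_CODES.get(answer.strip().lower()) for answer in responses]


# Made-up raw answers with inconsistent capitalization and spacing.
raw = ["Agree", "strongly agree ", "Neutral", "", "disagree"]
coded = encode_responses(raw)
print(coded)
```

Keeping unrecognized answers as `None` rather than guessing a value makes missing data visible at the analysis stage.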

Descriptive statistics will give you a summary of your data and include measures of averages and variability. You can also use graphs, scatter plots and frequency tables to visualize your data and check for any trends or outliers.

Using inferential statistics, you can make predictions or generalizations based on your data. You can test your hypothesis or use your sample data to estimate the population parameter.

For example, in the procrastination experiment described earlier, you first use descriptive statistics to get a summary of the data. You find the mean (average) and the mode (most frequent rating) of procrastination of the two groups, and plot the data to see if there are any outliers.
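The descriptive step can be sketched with Python's standard `statistics` module. The ratings below are made up, the 1 to 7 scale is an assumption, and the 1.5 × IQR rule in the hypothetical `flag_outliers` helper is one common rule of thumb for spotting outliers, used here in place of a plot.

```python
import statistics

# Made-up procrastination self-ratings (1 = low, 7 = high) for two groups.
ratings = {
    "intervention": [3, 4, 3, 2, 3, 4, 3, 7],
    "comparison": [5, 4, 5, 6, 5, 5, 4, 5],
}


def flag_outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - 1.5 * spread or v > q3 + 1.5 * spread]


summary = {
    group: {
        "mean": statistics.mean(values),
        "mode": statistics.mode(values),
        "outliers": flag_outliers(values),
    }
    for group, values in ratings.items()
}
print(summary)
```

A flagged value is a prompt to look at the raw data, not a reason to discard it automatically.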

You can also assess the reliability and validity of your data collection methods to indicate how consistently and accurately your methods actually measured what you wanted them to.

Quantitative research is often used to standardize data collection and generalize findings. Strengths of this approach include:

  • Replication

Repeating the study is possible because of standardized data collection protocols and tangible definitions of abstract concepts.

  • Direct comparisons of results

The study can be reproduced in other cultural settings, times or with different groups of participants. Results can be compared statistically.

  • Large samples

Data from large samples can be processed and analyzed using reliable and consistent procedures through quantitative data analysis.

  • Hypothesis testing

Using formalized and established hypothesis testing procedures means that you have to carefully consider and report your research variables, predictions, data collection and testing methods before coming to a conclusion.

Despite the benefits of quantitative research, it is sometimes inadequate in explaining complex research topics. Its limitations include:

  • Superficiality

Using precise and restrictive operational definitions may inadequately represent complex concepts. For example, the concept of mood may be represented with just a number in quantitative research, but explained with elaboration in qualitative research.

  • Narrow focus

Predetermined variables and measurement procedures can mean that you ignore other relevant observations.

  • Structural bias

Despite standardized procedures, structural biases can still affect quantitative research. Missing data, imprecise measurements, or inappropriate sampling methods can introduce biases that lead to the wrong conclusions.

  • Lack of context

Quantitative research often uses unnatural settings like laboratories or fails to consider historical and cultural contexts that may affect data collection and results.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses. Qualitative methods allow you to explore concepts and experiences in more detail.

In mixed methods research, you use both qualitative and quantitative data collection and analysis methods to answer your research question.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

Operationalization means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data, it’s important to consider how you will operationalize the variables that you want to measure.

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
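One way to make "how likely it is that a pattern could have arisen by chance" concrete is a permutation test. The text does not name this procedure; it is used here only as an illustration. The sketch repeatedly reshuffles the group labels and counts how often the reshuffled difference in means is at least as large as the observed one. The function name, the number of shuffles, the fixed seed, and the scores are all assumptions.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Share of random relabelings whose absolute mean difference is at
    least as large as the observed absolute mean difference."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        new_a = pooled[:len(group_a)]
        new_b = pooled[len(group_a):]
        if abs(statistics.mean(new_a) - statistics.mean(new_b)) >= observed:
            hits += 1
    return hits / n_shuffles


# Made-up scores for two groups; not taken from any study.
treated = [3, 4, 3, 2, 3, 4, 3, 4]
control = [5, 4, 5, 6, 5, 5, 4, 5]
p_value = permutation_p_value(treated, control)
print(p_value)
```

A small value means that random relabeling rarely produces a difference as large as the one observed, which is the intuition behind rejecting a null hypothesis of no difference.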

Cite this Scribbr article

Bhandari, P. (2023, June 22). What Is Quantitative Research? | Definition, Uses & Methods. Scribbr. Retrieved August 27, 2024, from https://www.scribbr.com/methodology/quantitative-research/

Research Method

Quantitative Research – Methods, Types and Analysis

What is Quantitative Research

Quantitative research is a type of research that collects and analyzes numerical data to test hypotheses and answer research questions . This research typically involves a large sample size and uses statistical analysis to make inferences about a population based on the data collected. It often involves the use of surveys, experiments, or other structured data collection methods to gather quantitative data.

Quantitative Research Methods

Quantitative Research Methods are as follows:

Descriptive Research Design

Descriptive research design is used to describe the characteristics of a population or phenomenon being studied. This research method is used to answer the questions of what, where, when, and how. Descriptive research designs use a variety of methods such as observation, case studies, and surveys to collect data. The data is then analyzed using statistical tools to identify patterns and relationships.

Correlational Research Design

Correlational research design is used to investigate the relationship between two or more variables. Researchers use correlational research to determine whether a relationship exists between variables and to what extent they are related. This research method involves collecting data from a sample and analyzing it using statistical tools such as correlation coefficients.
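As a small worked sketch, the Pearson correlation coefficient, one widely used correlation coefficient, can be computed directly from its definition. The function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: weekly study hours and exam scores for six students.
hours = [2, 4, 5, 7, 8, 10]
scores = [55, 60, 62, 70, 74, 80]
r = pearson_r(hours, scores)
print(round(r, 3))
```

The coefficient ranges from -1 to 1. Values near either end indicate a strong linear relationship, while values near 0 indicate little or no linear relationship. A strong correlation on its own does not establish cause and effect.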

Quasi-experimental Research Design

Quasi-experimental research design is used to investigate cause-and-effect relationships between variables. This research method is similar to experimental research design, but it lacks full control over the independent variable. Researchers use quasi-experimental research designs when it is not feasible or ethical to manipulate the independent variable.

Experimental Research Design

Experimental research design is used to investigate cause-and-effect relationships between variables. This research method involves manipulating the independent variable and observing the effects on the dependent variable. Researchers use experimental research designs to test hypotheses and establish cause-and-effect relationships.

Survey Research

Survey research involves collecting data from a sample of individuals using a standardized questionnaire. This research method is used to gather information on attitudes, beliefs, and behaviors of individuals. Researchers use survey research to collect data quickly and efficiently from a large sample size. Survey research can be conducted through various methods such as online, phone, mail, or in-person interviews.

Quantitative Research Analysis Methods

Here are some commonly used quantitative research analysis methods:

Statistical Analysis

Statistical analysis is the most common quantitative research analysis method. It involves using statistical tools and techniques to analyze the numerical data collected during the research process. Statistical analysis can be used to identify patterns, trends, and relationships between variables, and to test hypotheses and theories.

Regression Analysis

Regression analysis is a statistical technique used to analyze the relationship between one dependent variable and one or more independent variables. Researchers use regression analysis to identify and quantify the impact of independent variables on the dependent variable.
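As a minimal sketch of the simplest case (one independent variable, fitted by ordinary least squares), the following shows how a slope and intercept can be estimated. The function name `fit_line` and the study-hours data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: hours studied (independent) and exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
```

The slope is read as the estimated change in the dependent variable for each one-unit increase in the independent variable. With several independent variables, a statistics package would normally be used instead of hand-written formulas.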

Factor Analysis

Factor analysis is a statistical technique used to identify underlying factors that explain the correlations among a set of variables. Researchers use factor analysis to reduce a large number of variables to a smaller set of factors that capture the most important information.

Structural Equation Modeling

Structural equation modeling is a statistical technique used to test complex relationships between variables. It involves specifying a model that includes both observed and unobserved variables, and then using statistical methods to test the fit of the model to the data.

Time Series Analysis

Time series analysis is a statistical technique used to analyze data that is collected over time. It involves identifying patterns and trends in the data, as well as any seasonal or cyclical variations.
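One simple way to expose a trend in data collected over time is a moving average, sketched below. This is only one of many time series techniques, and the `moving_average` helper and the monthly sales figures are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; returns one value per full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical monthly sales figures.
monthly_sales = [10, 12, 11, 15, 14, 18]

# Averaging over three months smooths short-term fluctuations.
trend = moving_average(monthly_sales, window=3)
print(trend)
```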

Multilevel Modeling

Multilevel modeling is a statistical technique used to analyze data that is nested within multiple levels. For example, researchers might use multilevel modeling to analyze data that is collected from individuals who are nested within groups, such as students nested within schools.

Applications of Quantitative Research

Quantitative research has many applications across a wide range of fields. Here are some common examples:

  • Market Research: Quantitative research is used extensively in market research to understand consumer behavior, preferences, and trends. Researchers use surveys, experiments, and other quantitative methods to collect data that can inform marketing strategies, product development, and pricing decisions.
  • Health Research: Quantitative research is used in health research to study the effectiveness of medical treatments, identify risk factors for diseases, and track health outcomes over time. Researchers use statistical methods to analyze data from clinical trials, surveys, and other sources to inform medical practice and policy.
  • Social Science Research: Quantitative research is used in social science research to study human behavior, attitudes, and social structures. Researchers use surveys, experiments, and other quantitative methods to collect data that can inform social policies, educational programs, and community interventions.
  • Education Research: Quantitative research is used in education research to study the effectiveness of teaching methods, assess student learning outcomes, and identify factors that influence student success. Researchers use experimental and quasi-experimental designs, as well as surveys and other quantitative methods, to collect and analyze data.
  • Environmental Research: Quantitative research is used in environmental research to study the impact of human activities on the environment, assess the effectiveness of conservation strategies, and identify ways to reduce environmental risks. Researchers use statistical methods to analyze data from field studies, experiments, and other sources.

Characteristics of Quantitative Research

Here are some key characteristics of quantitative research:

  • Numerical data: Quantitative research involves collecting numerical data through standardized methods such as surveys, experiments, and observational studies. This data is analyzed using statistical methods to identify patterns and relationships.
  • Large sample size: Quantitative research often involves collecting data from a large sample of individuals or groups in order to increase the reliability and generalizability of the findings.
  • Objective approach: Quantitative research aims to be objective and impartial in its approach, focusing on the collection and analysis of data rather than personal beliefs, opinions, or experiences.
  • Control over variables: Quantitative research often involves manipulating variables to test hypotheses and establish cause-and-effect relationships. Researchers aim to control for extraneous variables that may impact the results.
  • Replicable: Quantitative research aims to be replicable, meaning that other researchers should be able to conduct similar studies and obtain similar results using the same methods.
  • Statistical analysis: Quantitative research involves using statistical tools and techniques to analyze the numerical data collected during the research process. Statistical analysis allows researchers to identify patterns, trends, and relationships between variables, and to test hypotheses and theories.
  • Generalizability: Quantitative research aims to produce findings that can be generalized to larger populations beyond the specific sample studied. This is achieved through the use of random sampling methods and statistical inference.
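The link between random sampling and statistical inference can be sketched as follows: draw a simple random sample, then report an estimate with a confidence interval. The population, the sample size of 500, and the `proportion_ci` helper are assumptions made for illustration. The interval uses the normal approximation, which is reasonable only for fairly large samples.

```python
import math
import random


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical population of 10,000 people, 30% of whom hold some opinion.
rng = random.Random(42)
population = [1] * 3000 + [0] * 7000

# Simple random sample without replacement.
sample = rng.sample(population, 500)

estimate = sum(sample) / len(sample)
low, high = proportion_ci(sum(sample), len(sample))
print(f"estimated share: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```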

Examples of Quantitative Research

Here are some examples of quantitative research in different fields:

  • Market Research: A company conducts a survey of 1000 consumers to determine their brand awareness and preferences. The data is analyzed using statistical methods to identify trends and patterns that can inform marketing strategies.
  • Health Research: A researcher conducts a randomized controlled trial to test the effectiveness of a new drug for treating a particular medical condition. The study involves collecting data from a large sample of patients and analyzing the results using statistical methods.
  • Social Science Research: A sociologist conducts a survey of 500 people to study attitudes toward immigration in a particular country. The data is analyzed using statistical methods to identify factors that influence these attitudes.
  • Education Research: A researcher conducts an experiment to compare the effectiveness of two different teaching methods for improving student learning outcomes. The study involves randomly assigning students to different groups and collecting data on their performance on standardized tests.
  • Environmental Research: A team of researchers conducts a study to investigate the impact of climate change on the distribution and abundance of a particular species of plant or animal. The study involves collecting data on environmental factors and population sizes over time and analyzing the results using statistical methods.
  • Psychology: A researcher conducts a survey of 500 college students to investigate the relationship between social media use and mental health. The data is analyzed using statistical methods to identify correlations, which can suggest hypotheses about causal relationships for further study.
  • Political Science: A team of researchers conducts a study to investigate voter behavior during an election. They use survey methods to collect data on voting patterns, demographics, and political attitudes, and analyze the results using statistical methods.

How to Conduct Quantitative Research

Here is a general overview of how to conduct quantitative research:

  • Develop a research question: The first step in conducting quantitative research is to develop a clear and specific research question. This question should be based on a gap in existing knowledge, and should be answerable using quantitative methods.
  • Develop a research design: Once you have a research question, you will need to develop a research design. This involves deciding on the appropriate methods to collect data, such as surveys, experiments, or observational studies. You will also need to determine the appropriate sample size, data collection instruments, and data analysis techniques.
  • Collect data: The next step is to collect data. This may involve administering surveys or questionnaires, conducting experiments, or gathering data from existing sources. It is important to use standardized methods to ensure that the data is reliable and valid.
  • Analyze data: Once the data has been collected, it is time to analyze it. This involves using statistical methods to identify patterns, trends, and relationships between variables. Common statistical techniques include correlation analysis, regression analysis, and hypothesis testing.
  • Interpret results: After analyzing the data, you will need to interpret the results. This involves identifying the key findings, determining their significance, and drawing conclusions based on the data.
  • Communicate findings: Finally, you will need to communicate your findings. This may involve writing a research report, presenting at a conference, or publishing in a peer-reviewed journal. It is important to clearly communicate the research question, methods, results, and conclusions to ensure that others can understand and replicate your research.

When to use Quantitative Research

Here are some situations when quantitative research can be appropriate:

  • To test a hypothesis: Quantitative research is often used to test a hypothesis or a theory. It involves collecting numerical data and using statistical analysis to determine if the data supports or refutes the hypothesis.
  • To generalize findings: If you want to generalize the findings of your study to a larger population, quantitative research can be useful. This is because it allows you to collect numerical data from a representative sample of the population and use statistical analysis to make inferences about the population as a whole.
  • To measure relationships between variables: If you want to measure the relationship between two or more variables, such as the relationship between age and income, or between education level and job satisfaction, quantitative research can be useful. It allows you to collect numerical data on both variables and use statistical analysis to determine the strength and direction of the relationship.
  • To identify patterns or trends: Quantitative research can be useful for identifying patterns or trends in data. For example, you can use quantitative research to identify trends in consumer behavior or to identify patterns in stock market data.
  • To quantify attitudes or opinions: If you want to measure attitudes or opinions on a particular topic, quantitative research can be useful. It allows you to collect numerical data using surveys or questionnaires and analyze the data using statistical methods to determine the prevalence of certain attitudes or opinions.

Purpose of Quantitative Research

The purpose of quantitative research is to systematically investigate and measure the relationships between variables or phenomena using numerical data and statistical analysis. The main objectives of quantitative research include:

  • Description: To provide a detailed and accurate description of a particular phenomenon or population.
  • Explanation: To explain the reasons for the occurrence of a particular phenomenon, such as identifying the factors that influence a behavior or attitude.
  • Prediction: To predict future trends or behaviors based on past patterns and relationships between variables.
  • Control: To identify the best strategies for controlling or influencing a particular outcome or behavior.

Quantitative research is used in many different fields, including social sciences, business, engineering, and health sciences. It can be used to investigate a wide range of phenomena, from human behavior and attitudes to physical and biological processes. The purpose of quantitative research is to provide reliable and valid data that can be used to inform decision-making and improve understanding of the world around us.

Advantages of Quantitative Research

There are several advantages of quantitative research, including:

  • Objectivity: Quantitative research is based on objective data and statistical analysis, which reduces the potential for bias or subjectivity in the research process.
  • Reproducibility: Because quantitative research involves standardized methods and measurements, it is more likely to be reproducible and reliable.
  • Generalizability: Quantitative research allows for generalizations to be made about a population based on a representative sample, which can inform decision-making and policy development.
  • Precision: Quantitative research allows for precise measurement and analysis of data, which can provide a more accurate understanding of phenomena and relationships between variables.
  • Efficiency: Quantitative research can be conducted relatively quickly and efficiently, especially when compared to qualitative research, which may involve lengthy data collection and analysis.
  • Large sample sizes: Quantitative research can accommodate large sample sizes, which can increase the representativeness and generalizability of the results.

Limitations of Quantitative Research

There are several limitations of quantitative research, including:

  • Limited understanding of context: Quantitative research typically focuses on numerical data and statistical analysis, which may not provide a comprehensive understanding of the context or underlying factors that influence a phenomenon.
  • Simplification of complex phenomena: Quantitative research often involves simplifying complex phenomena into measurable variables, which may not capture the full complexity of the phenomenon being studied.
  • Potential for researcher bias: Although quantitative research aims to be objective, there is still the potential for researcher bias in areas such as sampling, data collection, and data analysis.
  • Limited ability to explore new ideas: Quantitative research is often based on pre-determined research questions and hypotheses, which may limit the ability to explore new ideas or unexpected findings.
  • Limited ability to capture subjective experiences: Quantitative research is typically focused on objective data and may not capture the subjective experiences of individuals or groups being studied.
  • Ethical concerns: Quantitative research may raise ethical concerns, such as invasion of privacy or the potential for harm to participants.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

What is quantitative research?

Quantitative research is an important part of market research that relies on hard facts and numerical data to gain as objective a picture of people’s opinions as possible.

It’s different from qualitative research in a number of important ways and is a highly useful tool for researchers.

Quantitative research is a systematic empirical approach used in the social sciences and various other fields to gather, analyze, and interpret numerical data. It focuses on obtaining measurable data and applying statistical methods to generalize findings to a larger population.

Researchers use structured instruments such as surveys, questionnaires, or experiments to collect data from a representative sample in quantitative research. The data collected is typically numerical values or categorical responses that can be analyzed using statistical techniques. These statistical analyses help researchers identify patterns, relationships, trends, or associations among variables.

Quantitative research aims to generate objective and reliable information about a particular phenomenon, population, or group. It aims to better understand the subject under investigation by employing statistical measures such as means, percentages, correlations, or regression analyses.

Quantitative research provides a numerical understanding of social phenomena, allowing researchers to make:

  • Generalizations.
  • Predictions.
  • Comparisons based on numerical data.

It is widely used in psychology, sociology, economics, marketing, and many other disciplines to explore and gain insights into various research questions.

In this article, we’ll take a deep dive into quantitative research, why it’s important, and how to use it effectively.

How is quantitative research different from qualitative research?

Although they’re both extremely useful, there are a number of key differences between quantitative and qualitative market research strategies. A solid market research strategy will make use of both qualitative and quantitative research.

  • Quantitative research relies on gathering numerical data points. Qualitative research, on the other hand, seeks to gather qualitative data by speaking to people in individual or group settings.
  • Quantitative research normally uses closed questions, while qualitative research uses open questions more frequently.
  • Quantitative research is great for establishing trends and patterns of behavior, whereas qualitative methods are great for explaining the “why” behind them.

Why is quantitative research useful?

Quantitative research has a crucial role to play in any market research strategy for a range of reasons:

  • It enables you to conduct research at scale
  • When quantitative research is conducted in a representative way, it can reveal insights about broader groups of people or the population as a whole
  • It enables us to easily compare different groups (e.g. by age, gender or market) to understand similarities or differences 
  • It can help businesses understand the size of a new opportunity 
  • It can be helpful for reducing a complex problem or topic to a limited number of variables

Quantitative Research Design

Quantitative research design refers to the overall plan and structure that guides the collection, analysis, and interpretation of numerical data in a quantitative research study. It outlines the specific steps, procedures, and techniques used to address research questions or test hypotheses systematically and rigorously. A well-designed quantitative research study ensures that the data collected is reliable, valid, and capable of answering the research objectives.

There are several key components involved in designing a quantitative research study:

  • Research Questions or Hypotheses: The research design begins with clearly defined research questions or hypotheses articulating the study’s objectives. These questions guide the selection of variables and the development of research instruments.
  • Sampling: A critical aspect of quantitative research design is selecting a representative sample from the target population. The sample should be carefully chosen to ensure it adequately represents the population of interest, allowing for the generalizability of the findings.
  • Variables and Operationalization: Quantitative research involves the measurement of variables. In the research design phase, researchers identify the variables they will study and determine how to operationalize them into measurable and observable forms. This includes defining the indicators or measures used to assess each variable.
  • Data Collection Methods: Quantitative research typically involves collecting data through structured instruments, such as surveys, questionnaires, or tests. The research design specifies the data collection methods, including the procedures for administering the instruments, the timing of data collection, and the strategies for maximizing response rates.
  • Data Analysis: Quantitative research design includes decisions about the statistical techniques and analyses applied to the collected data. This may involve descriptive statistics (e.g., means, percentages) and inferential statistics (e.g., t-tests, regression analyses) to examine variables’ relationships, differences, or associations.
  • Validity and Reliability: Ensuring the validity and reliability of the data is a crucial consideration in quantitative research design. Validity refers to the extent to which a measurement instrument or procedure accurately measures what it intends to measure. Reliability refers to the consistency and stability of the measurement over time and across different conditions. Researchers employ pilot testing, validity checks, and statistical measures to enhance validity and reliability.
  • Ethical Considerations: Quantitative research design also includes ethical considerations, such as obtaining informed consent from participants, protecting their privacy and confidentiality, and ensuring the study adheres to ethical guidelines and regulations.

By carefully designing a quantitative research study, researchers can ensure their investigations are methodologically sound, reliable, and valid. 

Well-designed research provides a solid foundation for collecting and analyzing numerical data, allowing researchers to draw meaningful conclusions and contribute to the body of knowledge in their respective fields.
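As one illustration of a statistical measure of reliability, the sketch below computes Cronbach's alpha, a widely used index of how consistently a set of questionnaire items measures the same construct. The source does not name a specific measure, so this choice is an assumption, and the item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical data: three survey items answered by five respondents.
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 5],
]

alpha = cronbach_alpha(items)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together. What counts as acceptable depends on the field and the purpose of the instrument.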

Quantitative research data collection methods

When collecting and analyzing the data you need for quantitative research, you have a number of possibilities available to you. Each has its own pros and cons, and it might be best to use a mix. Here are some of the main research methods:

Survey research

This involves sending out surveys to your target audience to collect information before statistically analyzing the results to draw conclusions and insights. It’s a great way to better understand your target customers or explore a new market and can be turned around quickly. 

There are a number of different ways of conducting surveys, such as:

  • Email: this is a quick way of reaching a large number of people and can be more affordable than the other methods described below.
  • Phone: not everyone has access to the internet, so if you’re looking to reach a particular demographic that may struggle to engage in this way (e.g. older consumers), telephone can be a better approach. That said, it can be expensive and time-consuming.
  • Post or Mail: as with the phone, you can reach a wide segment of the population, but it’s expensive and takes a long time. As organizations look to identify and react to changes in consumer behavior at speed, postal surveys have become somewhat outdated.
  • In-person: in some instances it makes sense to conduct quantitative research in person. Examples include intercepts, where you need to collect quantitative data about the customer experience in the moment, and taste tests or central location tests, where you need consumers to physically interact with a product to provide useful feedback. Conducting research in this way can be expensive and logistically challenging to organize and carry out.

Survey questions for quantitative research usually include closed-ended questions rather than the open-ended questions used in qualitative research. For example, instead of asking

“How do you feel about our delivery policy?”

You might ask…

“How satisfied are you with our delivery policy?” (Very satisfied / Satisfied / Don’t Know / Dissatisfied / Very Dissatisfied)

This way, you’ll gain data that can be categorized and analyzed in a quantitative, numbers-based way.
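To illustrate how closed-ended answers become numbers, the sketch below tallies responses to the satisfaction question and converts them to scores. The numeric coding, the decision to treat "Don't Know" as missing, and the responses themselves are assumptions made for this example.

```python
from collections import Counter

# Assumed coding scheme: higher numbers mean more satisfied.
# "Don't Know" is left out, so it is treated as missing data.
CODES = {
    "Very satisfied": 4,
    "Satisfied": 3,
    "Dissatisfied": 2,
    "Very Dissatisfied": 1,
}

# Hypothetical survey responses.
responses = [
    "Satisfied",
    "Very satisfied",
    "Don't Know",
    "Dissatisfied",
    "Satisfied",
    "Very Dissatisfied",
]

# Share of respondents choosing each option.
counts = Counter(responses)
shares = {answer: count / len(responses) for answer, count in counts.items()}

# Mean score over respondents who gave a codable answer.
scores = [CODES[r] for r in responses if r in CODES]
mean_score = sum(scores) / len(scores)

print(shares)
print(f"mean satisfaction score: {mean_score:.1f}")
```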

Correlational Research

Correlational research is a specific type of quantitative research that examines the relationship between two or more variables. It focuses on determining whether there is a statistical association or correlation between variables without establishing causality. In other words, correlational research helps to understand how changes in one variable correspond to changes in another.

One of the critical features of correlational research is that it allows researchers to analyze data from existing sources or collect data through surveys or questionnaires. By measuring the variables of interest, researchers can calculate a correlation coefficient, such as Pearson’s r, to quantify the strength and direction of the relationship. The correlation coefficient ranges from -1 to +1, where a positive value indicates a positive relationship, a negative value indicates a negative relationship, and a value close to zero suggests little or no linear relationship.

Correlational research is valuable in various fields, such as psychology, sociology, and economics, as it helps researchers explore connections between variables that may not be feasible to manipulate in an experimental setting. For example, a psychologist might use correlational research to investigate the relationship between sleep duration and students’ academic performance. By collecting data on these variables, they can determine whether there is a correlation between the two factors and to what extent they are related.

It is important to note that correlation does not imply causation. While a correlation suggests an association between variables, it does not provide evidence for a cause-and-effect relationship. Other factors, known as confounding variables, may be influencing the observed relationship. Therefore, researchers must exercise caution in interpreting correlational findings and consider additional research methods, such as experimental studies, to establish causality.

Correlational research plays a vital role in quantitative research and analysis by investigating relationships between variables. It provides valuable insights into the strength and direction of associations and helps researchers generate hypotheses for further investigation. By understanding the limitations of correlational research, researchers can use this method effectively to explore connections between variables in various disciplines.
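A minimal sketch of computing Pearson's correlation coefficient for the sleep and academic performance example is shown below. The `pearson_r` function name and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    x_spread = math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
    y_spread = math.sqrt(sum((yi - y_bar) ** 2 for yi in y))
    return cross / (x_spread * y_spread)


# Hypothetical data: nightly sleep (hours) and exam score for five students.
sleep_hours = [5, 6, 7, 8, 9]
exam_scores = [60, 65, 70, 72, 78]

r = pearson_r(sleep_hours, exam_scores)
print(f"r = {r:.2f}")
```

A value near +1 here would describe a strong positive association in this small sample. It would not show that more sleep causes higher scores.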

Experimental Research

Experimental research is a fundamental approach within quantitative research that aims to establish cause-and-effect relationships between variables. It involves manipulating an independent variable and measuring its effects on a dependent variable while controlling for potential confounding variables. Experimental research is highly regarded for its ability to provide rigorous evidence and draw conclusions about causal relationships.

The hallmark of experimental research is the presence of at least two groups: the experimental and control groups. The experimental group is exposed to the manipulated independent variable, while the control group is not. By comparing the outcomes or responses of the two groups, researchers can attribute any differences observed to the effects of the independent variable.

Several key components are employed to ensure the reliability and validity of experimental research. Random assignment is a crucial step that involves assigning participants to either the experimental or control group in a random and unbiased manner. This minimizes the potential for pre-existing differences between groups and strengthens the study’s internal validity.

Another essential feature of experimental research is the ability to control extraneous variables. By carefully designing the study environment and procedures, researchers can minimize the influence of factors other than the independent variable on the dependent variable. This control enhances the ability to isolate the manipulated variable’s effects and increases the study’s internal validity.

Quantitative data is typically collected in experimental research through objective and standardized measurements. Researchers use instruments such as surveys, tests, observations, or physiological measurements to gather numerical data that can be analyzed statistically. This allows for applying various statistical techniques, such as t-tests or analysis of variance (ANOVA), to determine the significance of the observed effects and draw conclusions about the relationship between variables.

Experimental research is widely used across psychology, medicine, education, and the natural sciences. It enables researchers to test hypotheses, evaluate interventions or treatments, and provide evidence-based recommendations. Experimental research offers valuable insights into the effectiveness or impact of specific variables, interventions, or strategies by establishing cause-and-effect relationships.

Despite its strengths, experimental research also has limitations. The artificial nature of laboratory settings and the need for control may reduce the generalizability of findings to real-world contexts. Ethical considerations also play a crucial role in experimental research, as researchers must ensure participants’ well-being and informed consent.

Experimental research is a powerful tool in the quantitative research arsenal. It enables researchers to establish cause-and-effect relationships, control extraneous variables, and gather objective numerical data. Experimental research contributes to evidence-based decision-making and advances knowledge in various fields by employing rigorous methods.
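The two mechanics described above, random assignment and a comparison of group means, can be sketched as follows. The participant IDs, the scores, and the helper names are hypothetical. The example computes Welch's t statistic only; turning it into a p-value requires the t distribution, which a statistics package would normally supply.

```python
import math
import random
from statistics import mean, variance


def random_assignment(participants, seed=None):
    """Shuffle participants and split them into two groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Assign 20 hypothetical participant IDs to the two groups at random.
experimental, control = random_assignment(range(1, 21), seed=1)

# Hypothetical outcome scores measured after the intervention.
experimental_scores = [78, 82, 85, 80, 90]
control_scores = [72, 75, 70, 78, 74]

t = welch_t(experimental_scores, control_scores)
print(f"t = {t:.2f}")
```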

Analyzing results

Once you have your results, the next step — and one of the most important overall — is to categorize and analyze them.

There are many ways to do this. One powerful method is cross-tabulation, where you separate your results into categories based on demographic subgroups. For example, of the people who answered ‘yes’ to a question, how many of them were business leaders and how many were entry-level employees?
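A cross-tabulation of this kind can be sketched with a simple counter keyed on (subgroup, answer) pairs. The survey rows below are hypothetical.

```python
from collections import Counter

# Hypothetical survey rows: (job level, answer to a yes/no question).
rows = [
    ("business leader", "yes"),
    ("business leader", "no"),
    ("entry-level", "yes"),
    ("entry-level", "yes"),
    ("business leader", "yes"),
    ("entry-level", "no"),
]

# Count how often each (job level, answer) combination occurs.
table = Counter(rows)

levels = sorted({level for level, _ in rows})
answers = sorted({answer for _, answer in rows})

# Print the counts as a small table: one row per job level, one column per answer.
print(f"{'':<16}" + "".join(f"{answer:>6}" for answer in answers))
for level in levels:
    print(f"{level:<16}" + "".join(f"{table[(level, answer)]:>6}" for answer in answers))
```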

You’ll also need to take time to clean the data (for example, removing respondents who sped through the survey or selected the same answer throughout) to make sure you can confidently draw conclusions. This can all be taken care of by the right team of experts.
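A minimal sketch of this kind of cleaning is shown below. It drops respondents who finished implausibly fast or gave the same answer to every question. The record layout, the 60-second threshold, and the choice to apply either rule on its own are assumptions; real projects set such rules per survey.

```python
# Hypothetical respondent records: completion time and answers on a 1-5 scale.
respondents = [
    {"id": 1, "seconds": 310, "answers": [4, 2, 5, 3, 4]},
    {"id": 2, "seconds": 45, "answers": [3, 4, 2, 5, 1]},   # finished too fast
    {"id": 3, "seconds": 280, "answers": [3, 3, 3, 3, 3]},  # same answer throughout
    {"id": 4, "seconds": 260, "answers": [5, 4, 4, 2, 3]},
]

MIN_SECONDS = 60  # assumed minimum plausible completion time


def is_valid(respondent):
    """Keep respondents who neither sped through nor gave one answer throughout."""
    sped = respondent["seconds"] < MIN_SECONDS
    straight_lined = len(set(respondent["answers"])) == 1
    return not (sped or straight_lined)


clean = [r for r in respondents if is_valid(r)]
print([r["id"] for r in clean])
```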

The importance of quantitative research

Quantitative research is a powerful tool for anyone looking to learn more about their market and customers. It allows you to gain reliable, objective insights from data and clearly understand trends and patterns.

Where quantitative research falls short is in explaining the ‘why’. This is where you need to turn to other methods, like qualitative research, where you’ll actually talk to your audience and delve into the more subjective factors driving their decision-making.

At Kadence, it’s our job to help you with every aspect of your research strategy. We’ve done this with countless businesses, and we’d love to do it with you. To find out more, get in touch with us .

SMU Simmons School of Education & Human Development

Qualitative vs. quantitative data analysis: How do they differ?

Learning analytics have become the cornerstone for personalizing student experiences and enhancing learning outcomes. In this data-informed approach to education there are two distinct methodologies: qualitative and quantitative analytics. These methods, which are typical to data analytics in general, are crucial to the interpretation of learning behaviors and outcomes. This blog will explore the nuances that distinguish qualitative and quantitative research, while uncovering their shared roles in learning analytics, program design and instruction.

What is qualitative data?

Qualitative data is descriptive and includes information that is non-numerical. Qualitative research is used to gather in-depth insights that can't be easily measured on a scale, like opinions, anecdotes and emotions. In learning analytics, qualitative data could include in-depth interviews, text responses to a prompt, or a video of a class period. 1

What is quantitative data?

Quantitative data is information that has a numerical value. Quantitative research is conducted to gather measurable data used in statistical analysis. Researchers can use quantitative studies to identify patterns and trends. In learning analytics, quantitative data could include test scores, student demographics, or the amount of time spent in a lesson. 2

Key differences between qualitative and quantitative data

It's important to understand the differences between qualitative and quantitative data to both determine the appropriate research methods for studies and to gain insights that you can be confident in sharing.

Data Types and Nature

Examples of qualitative data types in learning analytics:

  • Observational data of human behavior from classroom settings such as student engagement, teacher-student interactions, and classroom dynamics
  • Textual data from open-ended survey responses, reflective journals, and written assignments
  • Feedback and discussions from focus groups or interviews
  • Content analysis from various media

Examples of quantitative data types:

  • Standardized test, assessment, and quiz scores
  • Grades and grade point averages
  • Attendance records
  • Time spent on learning tasks
  • Data gathered from learning management systems (LMS), including login frequency, online participation, and completion rates of assignments

Methods of Collection

Qualitative and quantitative research methods for data collection can occasionally seem similar, so it's important to note the differences to make sure you're creating a consistent data set and will be able to draw reliable conclusions from your data.

Qualitative research methods

Because of the nature of qualitative data (complex, detailed information), the research methods used to collect it are more involved. Qualitative researchers might do the following to collect data:

  • Conduct interviews to learn about subjective experiences
  • Host focus groups to gather feedback and personal accounts
  • Observe in person, or use audio or video recordings, to capture nuances of human behavior in a natural setting
  • Distribute surveys with open-ended questions

Quantitative research methods

Quantitative data collection methods are more diverse and more likely to be automated because of the objective nature of the data. A quantitative researcher could employ methods such as:

  • Surveys with closed-ended questions that gather numerical data, like birthdates or preferences
  • Observational research that records measurable information, like the number of students in a classroom
  • Automated numerical data collection, such as button clicks and page views logged on the backend of a computer system
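
To make the last item concrete, here is a minimal sketch of how automatically logged events might be tallied per student. The event log, the student identifiers, and the function name `count_events` are all hypothetical. A real learning management system would export its own format.

```python
from collections import defaultdict

# Hypothetical backend event log: (student identifier, event type).
events = [
    ("s01", "page_view"), ("s01", "button_click"), ("s02", "page_view"),
    ("s01", "page_view"), ("s02", "page_view"), ("s02", "button_click"),
    ("s03", "page_view"),
]


def count_events(log):
    """Tally how many events of each type every student generated."""
    counts = defaultdict(lambda: defaultdict(int))
    for student, event_type in log:
        counts[student][event_type] += 1
    return counts


per_student = count_events(events)
for student in sorted(per_student):
    print(student, dict(per_student[student]))
```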

Analysis techniques

Qualitative and quantitative data can both be very informative. However, research studies require critical thinking for productive analysis.

Qualitative data analysis methods

Analyzing qualitative data takes a number of steps. When you first get all your data in one place, you can review it and note the trends you think you're seeing or your initial reactions. Next, you'll want to organize all the qualitative data you've collected by assigning it categories. Your central research question will guide your data categorization, whether it's by date, location, type of collection method (interview vs. focus group, etc.), the specific question asked or something else. Next, you'll code your data. Whereas categorizing organizes data by how and where it was collected, coding is the process of identifying and labeling themes within the data to get closer to answering your research questions. Finally comes data interpretation. To interpret the data, you'll look at the information gathered, including your coding labels, and see which results occur frequently or what other conclusions you can draw. 3
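
The coding and interpretation steps can be loosely illustrated with a keyword-based sketch. This is a deliberate simplification, since real qualitative coding relies on a researcher's judgment rather than keyword matching. The codebook, the responses, and the function name `code_response` are hypothetical.

```python
from collections import Counter

# Hypothetical codebook: theme label -> keywords taken to signal that theme.
CODEBOOK = {
    "workload": ["homework", "too much", "deadline"],
    "engagement": ["fun", "interesting", "boring"],
    "support": ["help", "office hours", "feedback"],
}

# Hypothetical open-ended survey responses.
responses = [
    "The labs were fun but there was too much homework.",
    "I wish I got feedback sooner.",
    "Office hours were a big help before each deadline.",
]


def code_response(text, codebook=CODEBOOK):
    """Return the sorted theme labels whose keywords appear in the text."""
    lowered = text.lower()
    # Plain substring matching: crude, and it can over-match (e.g. "fun" in "function").
    return sorted(
        theme for theme, keywords in codebook.items()
        if any(keyword in lowered for keyword in keywords)
    )


coded = [code_response(r) for r in responses]
# Interpretation step: see which themes occur most often.
theme_counts = Counter(theme for themes in coded for theme in themes)
print(theme_counts.most_common())
```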

Quantitative analysis techniques

Analyzing quantitative data can be time-consuming because of the large volume of data that can be collected. When approaching a quantitative data set, start by focusing on the purpose of your evaluation. Without making a conclusion, determine how you will use the information gained from analysis; for example: The answers to this survey about study habits will help determine what type of exam review session will be most useful to a class. 4

Next, you need to decide who is analyzing the data and set parameters for analysis. For example, if two different researchers are evaluating survey responses that rank preferences on a scale from 1 to 5, they need to be operating with the same understanding of the rankings. You wouldn't want one researcher to classify a value of 3 as a positive preference while the other considers it a negative preference. It's also ideal to have some type of data management system to store and organize your data, such as a spreadsheet or database. Within the database, or via an export to data analysis software, the collected data needs to be cleaned of things like responses left blank, duplicate answers from respondents, and questions that are no longer considered relevant. Finally, you can use statistical software to analyze data (or complete a manual analysis) to find patterns and summarize your findings. 4
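
The steps above (agreeing on what each rating means, removing blanks and duplicates, then summarizing) might look roughly like this in code. The data, the rating-to-label mapping, and the function names are assumptions for illustration only.

```python
from statistics import mean, median

# Hypothetical responses: (respondent id, rating on a 1-5 scale, None if left blank).
raw = [
    ("r1", 4), ("r2", None), ("r3", 3), ("r1", 4),
    ("r4", 5), ("r5", 2), ("r6", 3),
]


def classify(rating):
    """Shared interpretation of the scale, agreed before analysis (assumed mapping)."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def clean(rows):
    """Drop blank responses and keep only the first answer from each respondent."""
    seen = set()
    kept = []
    for respondent_id, rating in rows:
        if rating is None or respondent_id in seen:
            continue
        seen.add(respondent_id)
        kept.append((respondent_id, rating))
    return kept


cleaned = clean(raw)
ratings = [rating for _, rating in cleaned]
summary = {"n": len(ratings), "mean": mean(ratings), "median": median(ratings)}
labels = [classify(rating) for rating in ratings]
print(summary, labels)
```

Writing the mapping down as a single function is one way to make sure every analyst treats a rating of 3 the same way.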

Qualitative and quantitative research tools

From the nuanced, thematic exploration enabled by tools like NVivo and ATLAS.ti, to the statistical precision of SPSS and R for quantitative analysis, each suite of data analysis tools offers tailored functionalities that cater to the distinct natures of different data types.

Qualitative research software:

NVivo: NVivo is qualitative data analysis software that can do everything from transcribing recordings to creating word clouds and evaluating uploads for different sentiments and themes. NVivo is just one tool from the company Lumivero, which offers whole suites of data processing software. 5

ATLAS.ti: Similar to NVivo, ATLAS.ti allows researchers to upload and import data from a variety of sources. The data can then be tagged and refined using machine learning and presented as visualizations ready to insert into reports. 6

Quantitative research software:

SPSS: SPSS is a statistical analysis tool for quantitative research, appreciated for its user-friendly interface and comprehensive statistical tests, which makes it ideal for educators and researchers. With SPSS researchers can manage and analyze large quantitative data sets, use advanced statistical procedures and modeling techniques, predict customer behaviors, forecast market trends and more. 7

R: R is a versatile and dynamic open-source tool for quantitative analysis. With a vast repository of packages tailored to specific statistical methods, researchers can perform anything from basic descriptive statistics to complex predictive modeling. R is especially useful for its ability to handle large datasets, making it ideal for educational institutions that generate substantial amounts of data. The programming language offers flexibility in customizing analysis and creating publication-quality visualizations to effectively communicate results. 8

Applications in Educational Research

Both quantitative and qualitative data can be employed in learning analytics to drive informed decision-making and pedagogical enhancements. In the classroom, quantitative data like standardized test scores and online course analytics create a foundation for assessing and benchmarking student performance and engagement. Qualitative insights gathered from surveys, focus group discussions, and reflective student journals offer a more nuanced understanding of learners' experiences and contextual factors influencing their education. Additionally, feedback and practical engagement metrics blend these data types, providing a holistic view that informs curriculum development, instructional strategies, and personalized learning pathways. Through these varied data sets and uses, educators can piece together a more complete narrative of student success and the impacts of educational interventions.

Master Data Analysis with an M.S. in Learning Sciences From SMU

Whether it is the detailed narratives unearthed through qualitative data or the informative patterns derived from quantitative analysis, both qualitative and quantitative data can provide crucial information for educators and researchers to better understand and improve learning. Dive deeper into the art and science of learning analytics with SMU's online Master of Science in the Learning Sciences program. At SMU, innovation and inquiry converge to empower the next generation of educators and researchers. Choose the Learning Analytics Specialization to learn how to harness the power of data science to illuminate learning trends, devise impactful strategies, and drive educational innovation. You could also find out how advanced technologies like augmented reality (AR), virtual reality (VR), and artificial intelligence (AI) can revolutionize education, and develop the insight to apply embodied cognition principles to enhance learning experiences in the Learning and Technology Design Specialization, or choose your own electives to build a specialization unique to your interests and career goals.

For more information on our curriculum and to become part of a community where data drives discovery, visit SMU's MSLS program website or schedule a call with our admissions outreach advisors for any queries or further discussion. Take the first step towards transforming education with data today.

  1. Retrieved on August 8, 2024, from nnlm.gov/guides/data-glossary/qualitative-data
  2. Retrieved on August 8, 2024, from nnlm.gov/guides/data-glossary/quantitative-data
  3. Retrieved on August 8, 2024, from cdc.gov/healthyyouth/evaluation/pdf/brief19.pdf
  4. Retrieved on August 8, 2024, from cdc.gov/healthyyouth/evaluation/pdf/brief20.pdf
  5. Retrieved on August 8, 2024, from lumivero.com/solutions/
  6. Retrieved on August 8, 2024, from atlasti.com/
  7. Retrieved on August 8, 2024, from ibm.com/products/spss-statistics
  8. Retrieved on August 8, 2024, from cran.r-project.org/doc/manuals/r-release/R-intro.html#Introduction-and-preliminaries

Identification of multiple genetic loci and candidate genes determining seed size and weight in soybean.

  • 1. Introduction
  • 2. Materials and methods
  • 2.1. Plant materials and phenotyping
  • 2.2. DNA extraction and sequencing
  • 2.3. Analysis of genomic variants between the two parents
  • 2.4. BSA-seq data analysis
  • 2.5. Naming principle for QTLs
  • 2.6. RNA extraction and sequencing
  • 2.7. Transcriptome data analysis
  • 2.8. RT-qPCR assay
  • 2.9. InDel/CAPS marker design and linkage mapping
  • 3.1. Phenotype distribution and correlations among different seed traits in the segregating populations
  • 3.2. Identification of genetic loci for seed traits through BSA-seq
  • 3.3. Linkage mapping of stable QTLs
  • 3.4. Gene expression difference and genetic variation in candidate genes within qSS4-1, qSS20-1, and qSW14-1
  • 4. Discussion
  • 4.1. Identification of novel QTLs for seed size and weight in soybean
  • 4.2. The pleiotropic loci for soybean seed size and weight
  • 4.3. Prediction of candidate genes in qSS4-1, qSS20-1, and qSW14-1
  • 5. Conclusions
  • Supplementary materials
  • Author contributions
  • Data availability statement
  • Acknowledgments
  • Conflicts of interest

  • Zhang, M.; Dong, R.; Huang, P.; Lu, M.; Feng, X.; Fu, Y.; Zhang, X. Novel Seed Size: A Novel Seed-Developing Gene in Glycine max . Int. J. Mol. Sci. 2023 , 24 , 4189. [ Google Scholar ] [ CrossRef ]
  • Tang, X.; Su, T.; Han, M.; Wei, L.; Wang, W.; Yu, Z.; Xue, Y.; Wei, H.; Du, Y.; Greiner, S.; et al. Suppression of extracellular invertase inhibitor gene expression improves seed weight in soybean ( Glycine max ). J. Exp. Bot. 2016 , 68 , 469–482. [ Google Scholar ] [ CrossRef ]
  • Hu, Y.; Liu, Y.; Tao, J.-J.; Lu, L.; Jiang, Z.-H.; Wei, J.-J.; Wu, C.-M.; Yin, C.-C.; Li, W.; Bi, Y.-D.; et al. GmJAZ3 interacts with GmRR18a and GmMYC2a to regulate seed traits in soybean. J. Integr. Plant Biol. 2023 , 65 , 1983–2000. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Wang, X.; Li, Y.; Zhang, H.; Sun, G.; Zhang, W.; Qiu, L. Evolution and association analysis of GmCYP78A10 gene with seed size/weight and pod number in soybean. Mol. Biol. Rep. 2015 , 42 , 489–496. [ Google Scholar ] [ CrossRef ]
  • Singh, A.K.; Fu, D.-Q.; El-Habbak, M.; Navarre, D.; Ghabrial, S.; Kachroo, A. Silencing Genes Encoding Omega-3 Fatty Acid Desaturase Alters Seed Size and Accumulation of Bean pod mottle virus in Soybean. Mol. Plant Microbe Interact. 2011 , 24 , 506–515. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Wang, S.; Liu, S.; Wang, J.; Yokosho, K.; Zhou, B.; Yu, Y.-C.; Liu, Z.; Frommer, W.B.; Ma, J.F.; Chen, L.-Q.; et al. Simultaneous changes in seed size, oil content and protein content driven by selection of SWEET homologues during soybean domestication. Natl. Sci. Rev. 2020 , 7 , 1776–1786. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Hu, Y.; Liu, Y.; Lu, L.; Tao, J.-J.; Cheng, T.; Jin, M.; Wang, Z.-Y.; Wei, J.-J.; Jiang, Z.-H.; Sun, W.-C.; et al. Global analysis of seed transcriptomes reveals a novel PLATZ regulator for seed size and weight control in soybean. New Phytol. 2023 , 240 , 2436–2454. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Zhu, W.; Yang, C.; Yong, B.; Wang, Y.; Li, B.; Gu, Y.; Wei, S.; An, Z.; Sun, W.; Qiu, L.; et al. An enhancing effect attributed to a nonsynonymous mutation in SOYBEAN SEED SIZE 1 , a SPINDLY -like gene, is exploited in soybean domestication and improvement. New Phytol. 2022 , 236 , 1375–1392. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Zhao, B.; Dai, A.; Wei, H.; Yang, S.; Wang, B.; Jiang, N.; Feng, X. Arabidopsis KLU homologue GmCYP78A72 regulates seed size in soybean. Plant Mol. Biol. 2016 , 90 , 33–47. [ Google Scholar ] [ CrossRef ]
  • Yu, L.; Liu, Y.; Zeng, S.; Yan, J.; Wang, E.; Luo, L. Expression of a novel PSK-encoding gene from soybean improves seed growth and yield in transgenic plants. Planta 2019 , 249 , 1239–1250. [ Google Scholar ] [ CrossRef ]
  • Ge, L.; Yu, J.; Wang, H.; Luth, D.; Bai, G.; Wang, K.; Chen, R. Increasing seed size and quality by manipulating BIG SEEDS1 in legume species. Proc. Natl. Acad. Sci. USA 2016 , 113 , 12414–12419. [ Google Scholar ] [ CrossRef ]
  • Zhang, Y.; Zhang, Y.-J.; Yang, B.-J.; Yu, X.-X.; Wang, D.; Zu, S.-H.; Xue, H.-W.; Lin, W.-H. Functional characterization of GmBZL2 ( AtBZR1 like gene) reveals the conserved BR signaling regulation in Glycine max . Sci. Rep. 2016 , 6 , 31134. [ Google Scholar ] [ CrossRef ]
  • Collins, C.; Dewitte, W.; Murray, J.A.H. D-type cyclins control cell division and developmental rate during Arabidopsis seed development. J. Exp. Bot. 2012 , 63 , 3571–3586. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Li, N.; Li, Y. Maternal control of seed size in plants. J. Exp. Bot. 2015 , 66 , 1087–1097. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Di Marzo, M.; Herrera-Ubaldo, H.; Caporali, E.; Novak, O.; Strnad, M.; Balanza, V.; Ezquer, I.; Mendes, M.A.; de Folter, S.; Colombo, L. SEEDSTICK Controls Arabidopsis Fruit Size by Regulating Cytokinin Levels and FRUITFULL . Cell Rep. 2020 , 30 , 2846–2857.e2843. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Zhang, C.; Wei, L.; Wang, W.; Qi, W.; Cao, Z.; Li, H.; Bao, M.; He, Y. Identification, characterization and functional analysis of AGAMOUS subfamily genes associated with floral organs and seed development in Marigold ( Tagetes erecta ). BMC Plant Biol. 2020 , 20 , 439. [ Google Scholar ] [ CrossRef ] [ PubMed ]

QTL | Chr | Trait | LOD | PVE (%) | Left marker | Right marker | Position (bp)
qSS4-1 | 4 | SL | 7.37 | 15.04 | Caps4-1 | Caps4-2 | 3,294,059–4,511,350
qSS4-1 | 4 | SW | 6.14 | 12.51 | Caps4-1 | Caps4-2 | 3,294,059–4,511,350
qSS4-1 | 4 | HSW | 5.24 | 10.75 | Caps4-1 | Caps4-2 | 3,294,059–4,511,350
qSS20-1 | 20 | SL | 10.37 | 19.87 | Marker20-4 | Marker20-5 | 37,158,491–38,379,617
qSS20-1 | 20 | SW | 4.04 | 13.73 | Marker20-4 | Marker20-5 | 37,158,491–38,379,617
qSS20-1 | 20 | ST | 3.33 | 6.86 | Marker20-4 | Marker20-5 | 37,158,491–38,379,617
qSS20-1 | 20 | HSW | 10.39 | 19.84 | Marker20-4 | Marker20-5 | 37,158,491–38,379,617
qSW14-1 | 14 | SW | 20.31 | 8.64 | Marker14-4 | Marker14-6 | 14,344,083–28,212,214

QTL | ID | At locus | Name | DEG | Variation
qSS4-1 | Glyma.04G042000 | AT4G34160 | CYCD3;1 | |
qSS4-1 | Glyma.04G045500 | AT4G37630 | CYCD5;1 | R3 pods |
qSS4-1 | Glyma.04G046600 | AT4G33800 | T16L1.290 | |
qSS4-1 | Glyma.04G047900 | AT4G37750 | ANT | R5 seeds |
qSS4-1 | Glyma.04G055600 | AT5G21482 | CKX7 | |
qSW14-1 | Glyma.14G128400 | AT1G15550 | GA3OX1 | NA | upstream/downstream
qSW14-1 | Glyma.14G132500 | AT3G49600 | SUP32 | | intron/upstream/downstream
qSW14-1 | Glyma.14G138700 | AT2G24400 | SAUR37 | NA | upstream/downstream
qSW14-1 | Glyma.14G144200 | AT1G10010 | AAP8 | R5 seeds | intron
qSW14-1 | Glyma.14G144400 | AT1G10010 | AAP8 | NA | upstream
qSW14-1 | Glyma.14G144700 | AT1G10010 | AAP8 | | 3′-UTR/upstream/downstream
qSS20-1 | Glyma.20G135400 | AT1G70210 | CYCD1;1 | | intron/upstream/downstream
qSS20-1 | Glyma.20G135700 | AT3G49780 | PSK4 | R3 pods | synonymous
qSS20-1 | Glyma.20G136400 | AT5G60440 | AGL62 | NA |
qSS20-1 | Glyma.20G136500 | AT5G60440 | AGL62 | NA |
qSS20-1 | Glyma.20G136600 | AT5G60440 | AGL62 | | upstream
qSS20-1 | Glyma.20G136700 | AT5G60440 | AGL62 | NA | synonymous/upstream/downstream
qSS20-1 | Glyma.20G136800 | AT2G24840 | AGL61 | NA | missense (Asn-Ser)/intron/upstream/downstream

Wang, M.; Ding, X.; Zeng, Y.; Xie, G.; Yu, J.; Jin, M.; Liu, L.; Li, P.; Zhao, N.; Dong, Q.; et al. Identification of Multiple Genetic Loci and Candidate Genes Determining Seed Size and Weight in Soybean. Agronomy 2024 , 14 , 1957. https://doi.org/10.3390/agronomy14091957

Published on 26.8.2024 in Vol 26 (2024)

This is a member publication of University College London (Jisc)

Sex-Based Performance Disparities in Machine Learning Algorithms for Cardiac Disease Prediction: Exploratory Study

Authors of this article:

Original Paper

  • Isabel Straw, BMedSci, BMBS, MPH, MRES
  • Geraint Rees, BMBCh, PhD
  • Parashkev Nachev, BMBCh, PhD

University College London, London, United Kingdom

Corresponding Author:

Isabel Straw, BMedSci, BMBS, MPH, MRES

University College London

222 Euston Road

London, NW1 2DA

United Kingdom

Phone: 44 020 3549 5969

Email: [email protected]

Background: The presence of bias in artificial intelligence has garnered increased attention, with inequities in algorithmic performance being exposed across the fields of criminal justice, education, and welfare services. In health care, the inequitable performance of algorithms across demographic groups may widen health inequalities.

Objective: Here, we identify and characterize bias in cardiology algorithms, looking specifically at algorithms used in the management of heart failure.

Methods: Stage 1 involved a literature search of PubMed and Web of Science for key terms relating to cardiac machine learning (ML) algorithms. Papers that built ML models to predict cardiac disease were evaluated for their focus on demographic bias in model performance, and open-source data sets were retained for our investigation. Two open-source data sets were identified: (1) the University of California Irvine Heart Failure data set and (2) the University of California Irvine Coronary Artery Disease data set. We reproduced existing algorithms that have been reported for these data sets, tested them for sex biases in algorithm performance, and assessed a range of remediation techniques for their efficacy in reducing inequities. Particular attention was paid to the false negative rate (FNR), due to the clinical significance of underdiagnosis and missed opportunities for treatment.

Results: In stage 1, our literature search returned 127 papers, with 60 meeting the criteria for a full review and only 3 papers highlighting sex differences in algorithm performance. In the papers that reported sex, there was a consistent underrepresentation of female patients in the data sets. No papers investigated racial or ethnic differences. In stage 2, we reproduced algorithms reported in the literature, achieving mean accuracies of 84.24% (SD 3.51%) for data set 1 and 85.72% (SD 1.75%) for data set 2 (random forest models). For data set 1, the FNR was significantly higher for female patients in 13 out of 16 experiments (–17.81% to –3.37%; P<.05). A smaller disparity in the false positive rate was significant for male patients in 13 out of 16 experiments (–0.48% to +9.77%; P<.05). We observed an overprediction of disease for male patients (higher false positive rate) and an underprediction of disease for female patients (higher FNR). Sex differences in feature importance suggest that feature selection needs to be demographically tailored.

Conclusions: Our research exposes a significant gap in cardiac ML research, highlighting that the underperformance of algorithms for female patients has been overlooked in the published literature. Our study quantifies sex disparities in algorithmic performance and explores several sources of bias. We found an underrepresentation of female patients in the data sets used to train algorithms, identified sex biases in model error rates, and demonstrated that a series of remediation techniques were unable to address the inequities present.

Introduction

Artificial intelligence (AI) has been proposed as an effective solution to many health care challenges and depends on the construction of machine learning (ML) algorithms from health care data. Recent research has drawn attention to the possibility that algorithms may exhibit bias when applied to different demographic groups [ 1 - 6 ]. Such biases may widen health inequalities and negatively impact marginalized patients, such as female patients, minoritized racial and ethnic groups, and other neglected subpopulations [ 1 - 7 ].

Over the past 5 years, an increasing number of studies have quantified disparities in algorithmic performance for underserved populations [ 2 - 7 ]. Daneshjou and colleagues [ 2 ] demonstrated that state-of-the-art dermatology algorithms tend to perform worse on darker skin tones; Seyyed-Kalantari and colleagues [ 3 ] exposed biases in radiology algorithms; and Thompson and colleagues [ 4 ] reported increased false negative errors when classifying opioid misuse disorder for Black patients compared to White patients. Beyond specific diagnoses, researchers have demonstrated that infrastructural AI systems used in hospital settings can be subject to referral bias, demonstrated by Obermeyer and colleagues [ 5 ] who highlighted a hospital treatment allocation algorithm that overlooked the health needs of Black patients. Yet despite the increasing number of papers describing this issue, most of the current uses of biomedical AI technologies do not account for the problem of bias [ 5 - 8 ]. Here, we evaluate algorithmic inequity in ML algorithms used for predicting cardiac disease, focusing on heart failure (HF).

HF is a clinical syndrome in which the heart is unable to maintain a cardiac output adequate to meet the metabolic demands of the body [ 9 ]. Traditionally, algorithmic tools capable of identifying at-risk patients have played a key role in informing decisions on HF management and end-of-life care [ 10 - 12 ]. In recent years, ML algorithms that leverage biochemical data have been proposed as a superior alternative to traditional statistical models for identifying at-risk patients with HF [ 13 ]. A range of ML techniques outperforms traditional risk scores in forecasting HF-related events [ 13 ]. Yet given that existing medical research has described sex differences in both the presentation and management of HF, algorithms trained on existing data may perform differently for male versus female patients [ 14 , 15 ].

Sex Differences in HF

HF presents differently in female patients compared with male patients [14]. Female patients experience a wider range of symptoms, including higher fluid overload and lower health-related quality of life [14,15]. Moreover, female patients who present with HF are on average older, sustain a higher ejection fraction (EF) throughout later stages of the disease, and have a lower incidence of previous ischemic heart disease [15]. Furthermore, the biochemical tests used to detect cardiac disease have been demonstrated to perform less well for female patients [16]. Troponin, a key biomarker used to predict disease, has been demonstrated to be less sensitive in female patients [16]. Standard troponin criteria fail to detect 1 out of 5 acute myocardial infarcts occurring in female patients [16]. Historically, the neglect of sex differences in cardiac pathophysiology has disadvantaged female patients, and if not considered during ML development, these inequities may manifest in the novel algorithms being integrated into cardiac care [14-19].

In our research, we scope the published literature reporting algorithms that predict HF and investigate whether existing papers give attention to bias in ML algorithms. Furthermore, we examine the data sets of existing models for demographic representation, evaluate demographic inequities in algorithmic performance, and assess the efficacy of a series of bias-mitigation techniques.

Study Design

Our analysis consists of two stages: (1) a literature review of papers describing ML models used to predict HF and (2) a quantitative analysis of identified models, evaluating inequities in algorithm performance. The flowchart in Figure 1 provides an overview of our approach.

Stage 1 Literature Review: Qualitative Evaluation of Published Papers

We searched PubMed and Web of Science between April 1, 2022, and May 22, 2022, to identify ML algorithms used to predict cardiac disease, adhering to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for systematic reviews (Figure 2 [20] and Tables S1 and S2 in Multimedia Appendix 1 [21,22]). All abstracts were reviewed, and papers were included for full-text review if they met the following criteria: (1) the target diagnosis was HF, (2) the model used biochemical markers to predict disease, and (3) the computational methods involved an ML approach (including supervised, unsupervised, and deep learning).

Of the retained papers, full texts were then reviewed to evaluate whether authors (1) reported the demographic make-up of data sets and (2) evaluated demographic inequities in algorithm performance, meaning that the authors specifically examined differences in algorithmic performance by demographic groups defined by protected characteristics [ 17 ].

Throughout the literature review, any open-source data sets identified were retained for use in stage 2.

Stage 2: Quantitative Evaluation of Model Performance

Two open-source data sets were uncovered in our literature review: (1) data set 1: the University of California Irvine data set for heart failure prediction [21] and (2) data set 2: the University of California Irvine Cleveland Heart Disease data set for identifying coronary artery disease (CAD) [22]. Descriptive statistics were computed for both data sets, evaluating the mean and variance of each variable separately by sex and by outcome (disease or death; Table 1 and Tables S3-S5 in Multimedia Appendix 1).

Variables | Female (sex=0; n=105), survived (HF death=0) | Female (sex=0; n=105), death (HF death=1) | Male (sex=1; n=194), survived (HF death=0) | Male (sex=1; n=194), death (HF death=1)
Total count, n (%) | 71 (67.62) | 34 (32.38) | 132 (68.04) | 62 (31.96)
Age (years), mean (SD) | 58.6 (10.6) | 62.2 (12.3) | 58.8 (10.7) | 66.9 (13.5)
Anemia (Boolean), mean (SD) | 0.5 (0.5) | 0.6 (0.5) | 0.4 (0.5) | 0.4 (0.5)
Creatinine phosphokinase (mcg/L), mean (SD) | 462.0 (517.7) | 507.7 (779.7) | 582.8 (853.2) | 759.3 (1532.3)
Diabetes mellitus (Boolean), mean (SD) | 0.5 (0.5) | 0.6 (0.5) | 0.4 (0.5) | 0.3 (0.5)
Ejection fraction (percentage), mean (SD) | 41.9 (11.6) | 37.5 (14.6) | 39.4 (10.4) | 31.2 (10.7)
High blood pressure (Boolean), mean (SD) | 0.4 (0.5) | 0.5 (0.5) | 0.3 (0.5) | 0.4 (0.5)
Platelets (kiloplatelets/mL), mean (SD) | 289,757.6 (98,655.9) | 259,512.7 (107,588.6) | 254,232.4 (94,985.6) | 254,663.7 (94,060.8)
Serum creatinine (mg/dL), mean (SD) | 1.1 (0.6) | 1.9 (1.6) | 1.2 (0.7) | 1.8 (1.4)
Serum sodium (mEq/L), mean (SD) | 137.4 (3.6) | 135.5 (6.7) | 137.1 (4.2) | 135.3 (3.8)
Smoking (Boolean), mean (SD) | 0.0 (0.1) | 0.1 (0.3) | 0.5 (0.5) | 0.4 (0.5)

a Full details of data set variables are available in Tanvir et al [ 21 ].

b For the death variable, a value of 1 indicates mortality.

c HF: heart failure.

Using these data sets, we rebuilt the ML algorithms described in the published literature and performed an additional analysis exploring inequities in algorithmic performance for demographic subgroups. As the only protected characteristic reported was sex, we focus on sex disparities in performance. Despite our initial aim to focus on HF, we retained an uncovered CAD data set to investigate whether trends identified for HF generalized to patients with CAD [ 22 ]. Tables S3 and S4 in Multimedia Appendix 1 provide details on data set 1 and data set 2, respectively.

Model Reproduction

We rebuilt the models described in the existing literature for these data sets, focusing on random forest (RF) algorithms, which have been widely reported to be the most effective models [23]. For both data sets, data were split into training and test subsets (0.7:0.3), RF models were built using scikit-learn, and RF parameters were tuned using GridSearchCV (scikit-learn). We adopted a bootstrapping approach to quantify uncertainty: models were built, trained, and tested 100 times, and mean results were reported with SDs.
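
The repeated train and test procedure described above can be sketched as follows. This is a minimal illustration only: `fit_and_score` and the toy `threshold_rule` are hypothetical stand-ins for the tuned RF models, the rows are synthetic, and the resampling is written as repeated random 70:30 splits, which is one possible reading of the bootstrapping description rather than the authors' code.

```python
import random
import statistics


def repeated_split_scores(rows, fit_and_score, n_runs=100, train_frac=0.7, seed=0):
    """Repeat a random train/test split n_runs times; return mean and SD of the scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, test = shuffled[:cut], shuffled[cut:]
        scores.append(fit_and_score(train, test))
    return statistics.mean(scores), statistics.stdev(scores)


def threshold_rule(train, test):
    """Hypothetical stand-in for the tuned random forest (assumption).

    Learns a cut-off halfway between the class means of a single feature on
    the training rows and returns the accuracy on the test rows.
    """
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cutoff = (statistics.mean(pos) + statistics.mean(neg)) / 2
    correct = sum((x > cutoff) == (y == 1) for x, y in test)
    return correct / len(test)


# Synthetic (feature, label) rows, for illustration only.
_rng = random.Random(42)
rows = [(_rng.gauss(1.0, 1.0), 1) for _ in range(100)]
rows += [(_rng.gauss(-1.0, 1.0), 0) for _ in range(200)]

mean_acc, sd_acc = repeated_split_scores(rows, threshold_rule)
print(f"accuracy: {mean_acc:.3f} (SD {sd_acc:.3f})")
```

Passing the model as a callable keeps the resampling loop independent of the classifier, so the same loop could wrap a tuned RF pipeline.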

Statistical Analysis

Across the 100 runs, sex differences in each algorithm evaluation metric (equations 1-10) were calculated and averaged, and accompanying statistical tests were performed to evaluate the statistical significance of any identified sex disparities. Our method for examining differences in algorithmic error rates builds on the foundational work of Buolamwini and Gebru [24], who demonstrated that a range of ML algorithms for facial recognition performed poorly on darker-skinned women. To evaluate statistical significance, independent 2-tailed t tests were performed where the data were normally distributed, and Mann-Whitney U tests were performed where the data were not normally distributed. Kolmogorov-Smirnov tests were used to assess normality [25].

Variations in Model Development

We then introduced a variety of changes to the model development, to evaluate the impact on the identified sex disparities in performance.

Changes to Model Training Data

One widely proposed bias mitigation technique is preprocessing the training data of a model to account for demographic representation, with previous research highlighting the benefit of training on demographically balanced or demographically stratified data sets [26]. We therefore created a range of data sets with varied sex representation and assessed the impact on algorithm performance disparities. To form the sex-balanced data set, we used the synthetic minority oversampling technique (SMOTE), which has been proposed as an effective method for improving the representation of underserved populations in ML data sets [27]. SMOTE generates new minority data points by linear interpolation between existing minority samples [26,27]. Models were rebuilt as per the Model Reproduction section, using 4 different training data sets (sex-imbalanced, sex-balanced, and sex-specific; Tables S6 and S7 in Multimedia Appendix 1): (1) original sex-imbalanced training data, (2) sex-balanced training data, (3) female-only training data, and (4) male-only training data.
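
The interpolation idea behind SMOTE can be illustrated with a short sketch. This is a simplified, assumption-laden version written for this explanation: the function name `smote_like_oversample`, the choice of `k`, and the example rows are all hypothetical, and it is not the implementation used in the study.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority rows (e.g., age and ejection fraction).
female_rows = [(58.0, 42.0), (62.0, 38.0), (65.0, 35.0), (70.0, 30.0)]
new_rows = smote_like_oversample(female_rows, n_new=5)
for row in new_rows:
    print(tuple(round(v, 1) for v in row))
```

Because every synthetic point lies on a segment between two real minority samples, the new rows stay within the range of the observed minority data.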

Changes to Feature Selection

To understand why models make certain decisions, researchers in the domain of “explainable AI” have demonstrated how feature evaluation may provide important information regarding model performance for different subpopulations [26,28]. For this purpose, Shapley value-based explanations have been widely accepted as a unified measure of feature importance since their proposal in 2017 [29].

In our experiments, we first perform an exploratory analysis, comparing feature importance for models trained on the male versus female data sets. Second, we create 4 feature subsets from the original data sets, to evaluate the impact of changing the feature selection on performance disparities. As described in the introduction, existing clinical research has described demographic differences in the biochemical and clinical markers of HF disease (eg, sex differences in EF and troponin levels) [ 16 ]. Thus, we delineate 4 different feature subsets that vary in this information, to examine whether certain feature subsets perform better for different demographic groups. These four feature subsets are described in detail in Tables S8 and S9 in Multimedia Appendix 1 and include (1) features with sex, (2) features without sex, (3) biochemical features, and (4) clinical features.

Our final series of experiments is therefore performed across the four training data sets (sex-imbalanced, sex-balanced, and sex-specific) and the four feature subsets, giving 16 experiments in total: (1) original sex-imbalanced training data experiments (across four feature subsets), (2) sex-balanced training data experiments (across four feature subsets), (3) female training data experiments (across four feature subsets), and (4) male training data experiments (across four feature subsets).
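
The 16 experiments are the cross product of the training data sets and the feature subsets. The sketch below simply enumerates that grid; the string labels are shorthand chosen for this illustration.

```python
from itertools import product

training_sets = ["sex-imbalanced", "sex-balanced", "female-only", "male-only"]
feature_subsets = [
    "features with sex",
    "features without sex",
    "biochemical features",
    "clinical features",
]

# Every training data set is paired with every feature subset.
experiments = list(product(training_sets, feature_subsets))
for i, (train, feats) in enumerate(experiments, start=1):
    print(f"{i:2d}. train on {train} data using {feats}")
```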

Model Evaluation and Identification of Performance Disparities

Models are evaluated using global evaluation metrics (eg, accuracy) and specific error rates (eg, false negative rate [FNR]; equations 1-10). The difference between male and female scores is calculated to give a model’s “sex performance disparity” (equation 10). To evaluate statistical significance, Kolmogorov-Smirnov tests were used to assess the normality of the data, following which independent 2-tailed t tests were performed where the data were normally distributed, and Mann-Whitney U tests were performed where the data were not normally distributed.

Our choice of evaluation metrics is guided by the clinical consequence of each of these scores.

The existing research on algorithmic bias has highlighted the importance of examining error rates, particularly in medicine, where a false negative translates clinically to a missed diagnosis or a missed opportunity for treatment [3-6,26]. As described by Afrose and colleagues [26], focusing on global metrics of performance, such as area under the receiver operating characteristic curve scores, can neglect subtler disparities arising from differences in error rates affecting subgroups. When selecting a bias assessment metric, previous studies have chosen to focus on FNR and false positive rate (FPR), due to the clinical implications of these errors [4,30,31]. Equations 5-8 place the error rates in their clinical context, demonstrating that the FNR represents missed diagnoses and potentially missed treatment. For the error rates, we use a classification threshold of 0.5, as we are investigating performance inequities in the existing reported models, which used this default setting.

Error rate definitions are as follows:

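True Positive Rate = True Positives / (True Positives + False Negatives) (1)
False Positive Rate = False Positives / (False Positives + True Negatives) (2)
True Negative Rate = True Negatives / (True Negatives + False Positives) (3)
False Negative Rate = False Negatives / (False Negatives + True Positives) (4)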

Clinical implications of error rates are as follows:

True Positive Rate = Correct diagnosis that patient has disease (5)
False Positive Rate = Misdiagnosis of disease when patient is healthy (6)
True Negative Rate = Correct diagnosis that patient is healthy (7)
False Negative Rate = Misdiagnosis that patient is healthy when patient has disease (8)
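
As a minimal sketch, the four error rates can be computed from true labels and predicted probabilities using the standard confusion-matrix definitions. The function name, the example values, and the convention that a probability equal to the threshold counts as a positive prediction are assumptions made for this illustration; the sketch also assumes both classes are present.

```python
def error_rates(y_true, y_prob, threshold=0.5):
    """Return TPR, FPR, TNR, and FNR for binary labels and predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "TPR": tp / (tp + fn),  # correct diagnosis of disease
        "FPR": fp / (fp + tn),  # disease predicted for a healthy patient
        "TNR": tn / (tn + fp),  # correct diagnosis of health
        "FNR": fn / (fn + tp),  # missed diagnosis
    }


# Hypothetical labels (1 = disease) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.4, 0.3, 0.2]
rates = error_rates(y_true, y_prob)
print(rates)
```

In this example, 2 of the 4 patients with disease fall below the threshold, so the FNR is 0.5 even though only 3 of the 10 predictions are wrong.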

The accuracy evaluation metric is calculated as follows:

Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives) (9)

Sex performance disparity is calculated as follows:

Sex performance disparity = Score for male patients (mean) – Score for female patients (mean) (10)
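
Equation 10 can be expressed directly in code. The per-run FNR values below are hypothetical numbers chosen for illustration, not results from the study.

```python
import statistics


def sex_performance_disparity(male_scores, female_scores):
    """Mean score for male patients minus mean score for female patients."""
    return statistics.mean(male_scores) - statistics.mean(female_scores)


# Hypothetical FNR values (%) from four repeated runs.
male_fnr = [20.0, 22.0, 18.0, 21.0]
female_fnr = [28.0, 30.0, 27.0, 29.0]

disparity = sex_performance_disparity(male_fnr, female_fnr)
print(f"FNR disparity: {disparity:+.2f} percentage points")
```

Here the male mean is 20.25 and the female mean is 28.50, giving a disparity of –8.25. Under this sign convention, a negative FNR disparity means that the FNR is higher for female patients.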

Fairness Techniques: Fair Adversarial Gradient Tree Boosting

We implemented a recent fairness technique to evaluate whether such approaches can address bias in HF algorithms. Fair Adversarial Gradient Tree Boosting (FAGTB) is a technique proposed by Grari et al [8] for mitigating bias in decision tree classifiers; the authors demonstrate its success on 4 data sets. They focus on 2 definitions of fairness: demographic parity and equalized odds [8]. Because the equalized odds metric concerns model FPR and FNR, it is the definition we focus on in this paper. A summary of these fairness metrics is provided in Section S1 in Multimedia Appendix 2.

The definition of equalized odds is as follows:

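P(Ŷ = 1 | Y = y, S = 0) = P(Ŷ = 1 | Y = y, S = 1), for y ∈ {0, 1}

where Ŷ is the predicted label, Y is the true label, and S is the sensitive attribute (here, sex).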

To assess equalized odds, the authors measure disparate mistreatment, which computes the absolute difference in the FPR and in the FNR between the 2 demographic groups.

The disparate FPR is calculated as follows:

Disparate FPR = | P(Ŷ = 1 | Y = 0, S = 1) – P(Ŷ = 1 | Y = 0, S = 0) |

The disparate FNR is calculated as follows:

Disparate FNR = | P(Ŷ = 0 | Y = 1, S = 1) – P(Ŷ = 0 | Y = 1, S = 0) |

We compare the performance of the FAGTB algorithm to a standard Gradient Tree Algorithm. As per the original FAGTB paper, we repeat 10 experiments randomly sampling 2 subsets (0.8:0.2) and report evaluation metrics for the test set.
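
The disparate mistreatment measures described in this section can be sketched as the absolute gap in FPR and in FNR between two groups. The helper names, the group coding (0 and 1), and the example data are assumptions for this illustration and are not taken from the FAGTB implementation.

```python
def group_rates(y_true, y_pred, group, value):
    """FPR and FNR restricted to rows whose group label equals `value`."""
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, group) if g == value]
    fp = sum(1 for t, p in rows if t == 0 and p == 1)
    tn = sum(1 for t, p in rows if t == 0 and p == 0)
    fn = sum(1 for t, p in rows if t == 1 and p == 0)
    tp = sum(1 for t, p in rows if t == 1 and p == 1)
    return fp / (fp + tn), fn / (fn + tp)


def disparate_mistreatment(y_true, y_pred, group):
    """Absolute between-group differences in FPR and FNR."""
    fpr0, fnr0 = group_rates(y_true, y_pred, group, 0)
    fpr1, fnr1 = group_rates(y_true, y_pred, group, 1)
    return abs(fpr1 - fpr0), abs(fnr1 - fnr0)


# Hypothetical data: group 0 and group 1 each have 3 positives and 3 negatives.
sex    = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_true = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0]

d_fpr, d_fnr = disparate_mistreatment(y_true, y_pred, sex)
print(f"disparate FPR: {d_fpr:.3f}, disparate FNR: {d_fnr:.3f}")
```

Both values are 0 when a classifier satisfies equalized odds exactly; unlike the signed sex performance disparity, they do not indicate which group is disadvantaged.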

Ethical Considerations

Ethical approval was not required for this study as all data used were sourced from publicly available open-source data sets [ 21 , 22 ] under a CC-BY 4.0 license. No direct patient contact or sensitive personal data was involved, ensuring compliance with research standards.

Literature Review Search Results

Our search returned 127 papers, of which 60 met the criteria for full review and 3 highlighted sex differences in model performance. In the papers that reported sex, there was a consistent underrepresentation of female patients. No papers investigated racial or ethnic differences. One paper focused specifically on female patients with HF: Tison et al [32] highlighted that HF was more common in people who were older, were White, had a higher mean number of pregnancies, had a higher BMI, and were less likely to have Medicare.

Descriptive Statistics and Feature Importance

Data Set 1 (HF)

The mean descriptive statistics for each feature present in the HF data set are provided in Table 1 , which demonstrates subtle sex differences in the presentation of the disease. For HF deaths, male patients tend to be older than their female counterparts, with a higher creatinine phosphokinase, lower likelihood of diabetes, lower EF, and lower blood pressure.

Our exploratory analysis of feature importance identified further sex differences. Figure 3 compares the rankings of feature importance for ML models trained to predict HF on the female data set with those trained on the male data set. These differences are important, as existing ML algorithms built on mixed-sex cohorts suggest that EF can be used alone for modeling, an approach that may disadvantage female patients [23].

Data Set 2 (CAD)

Table S5 in Multimedia Appendix 1 provides details of the CAD data set and demonstrates that female patients with CAD have higher resting blood pressure and higher cholesterol compared to male patients. The categorical variable “resting electrocardiogram” is also higher for female patients, due to a higher incidence of left ventricular hypertrophy.

Model Results and Performance Disparities

We replicated the algorithms described in the existing literature, reproducing the previously reported mean predictive accuracies of 84.24% (SD 3.51%) for data set 1 and 85.72% (SD 1.75%) for data set 2 [23]. In Tables 2 and 3, we present the disparity in performance between the sexes, where a positive value indicates a higher score for male patients (see equation 10).

For data set 1, Table 2 demonstrates that in 13 out of 16 experiments, the FNR is higher for female patients, meeting the threshold of statistical significance (mean difference of –17.81% to –3.37%; P <.05). Figure 4 represents this disparity in performance graphically, providing the point estimates of FNR for the sexes separately and highlighting that the disparity in FNR persisted across the variations in training data and selected features.

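Table 2. Data set 1 (heart failure): disparity in model performance (score for male patients – score for female patients) by feature subset used in model training.

| Training data and metric | Features with sex | P valueᵃ | Features without sex | P valueᵃ | Biochemical features | P valueᵃ | Clinical features | P valueᵃ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sex-imbalanced training data | | | | | | | | |
| Accuracy disparity (%) | 1.63 | .03 | –0.72 | .30 | 0.10 | .88 | –0.50 | .49 |
| ROC_AUCᵇ disparity (%) | 3.14 | <.01 | 0.43 | .61 | 1.51 | .09 | 0.47 | .60 |
| FNRᶜ disparity (%) | –7.53 | <.01 | –3.84 | .02 | –5.15 | .01 | –3.49 | .049 |
| FPRᵈ disparity (%) | 1.26 | .07 | 2.97 | <.01 | 2.11 | <.01 | 2.56 | <.01 |
| Sex-balanced training data | | | | | | | | |
| Accuracy disparity (%) | –4.78 | <.01 | –7.25 | <.01 | –9.42 | <.01 | –3.63 | <.01 |
| ROC_AUC disparity (%) | 7.0 | <.01 | 4.27 | <.01 | 0.15 | .83 | 8.32 | <.01 |
| FNR disparity (%) | –17.81 | <.01 | –13.91 | <.01 | –3.37 | .04 | –16.09 | <.01 |
| FPR disparity (%) | 3.90 | <.01 | 5.37 | <.01 | 3.07 | <.001 | –0.54 | .24 |
| Female training data | | | | | | | | |
| Accuracy disparity (%) | –10.95 | <.01 | –9.75 | <.01 | –12.32 | <.01 | –9.64 | <.01 |
| ROC_AUC disparity (%) | 0.60 | .57 | 0.57 | .23 | –2.92 | <.01 | –0.53 | .07 |
| FNR disparity (%) | –7.42 | <.01 | –10.91 | <.01 | –2.24 | .27 | 1.55 | .01 |
| FPR disparity (%) | 8.61 | <.01 | 9.77 | <.01 | 8.08 | <.01 | –0.48 | .04 |
| Male training data | | | | | | | | |
| Accuracy disparity (%) | –5.46 | <.01 | –5.73 | <.01 | –8.73 | <.01 | –2.46 | <.01 |
| ROC_AUC disparity (%) | 4.98 | <.01 | 4.54 | <.01 | –1.59 | .049 | 8.32 | <.01 |
| FNR disparity (%) | –13.96 | <.01 | –13.32 | <.01 | –1.68 | .33 | –16.58 | <.01 |
| FPR disparity (%) | 4.00 | <.01 | 4.24 | <.01 | 4.86 | <.01 | –0.06 | .35 |

ᵃ P<.05 indicates a statistically significant difference between the model’s performance on male versus female patients.

ᵇ ROC_AUC: area under the receiver operating characteristic curve.

ᶜ FNR: false negative rate.

ᵈ FPR: false positive rate.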

Table 3. Data set 2 (coronary artery disease): disparity in model performance (score for male patients – score for female patients) by feature subset used in model training.

| Training data and metric | Features with sex | P valueᵃ | Features without sex | P valueᵃ | Biochemical features | P valueᵃ | Clinical features | P valueᵃ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sex-imbalanced training data | | | | | | | | |
| Accuracy disparity (%) | 0.32 | .50 | 0.64 | .17 | 0.13 | .80 | 0.25 | .61 |
| ROC_AUCᵇ disparity (%) | 3.86 | <.01 | 4.24 | <.01 | 3.05 | <.01 | 3.91 | <.01 |
| FNRᶜ disparity (%) | –11.66 | <.01 | –12.52 | <.01 | –10.81 | <.01 | –12.38 | <.01 |
| FPRᵈ disparity (%) | 3.94 | <.01 | 4.04 | <.01 | 4.71 | <.01 | 4.57 | <.01 |
| Sex-balanced training data | | | | | | | | |
| Accuracy disparity (%) | –4.01 | <.01 | –5.12 | <.01 | –7.32 | <.01 | –2.86 | <.01 |
| ROC_AUC disparity (%) | –3.89 | .01 | –4.91 | .01 | –7.18 | <.001 | –2.75 | <.01 |
| FNR disparity (%) | 7.69 | <.01 | 10.54 | <.01 | 15.59 | <.01 | 6.61 | <.01 |
| FPR disparity (%) | 0.10 | .87 | –0.72 | .19 | –1.23 | .29 | –1.11 | .06 |
| Female training data | | | | | | | | |
| Accuracy disparity (%) | –9.25 | <.01 | –11.34 | <.01 | –11.49 | <.01 | –8.69 | <.01 |
| ROC_AUC disparity (%) | –8.97 | <.01 | –10.95 | <.01 | –11.10 | <.01 | –8.45 | <.01 |
| FNR disparity (%) | 18.98 | <.01 | 22.60 | <.01 | 27.23 | <.01 | 17.86 | <.01 |
| FPR disparity (%) | –1.04 | .07 | –0.70 | .20 | –5.02 | <.01 | –0.96 | .09 |
| Male training data | | | | | | | | |
| Accuracy disparity (%) | 6.38 | <.01 | 5.66 | <.01 | –1.66 | .02 | 6.10 | <.01 |
| ROC_AUC disparity (%) | 6.30 | <.01 | 5.57 | <.01 | 1.52 | .07 | 5.86 | .01 |
| FNR disparity (%) | –10.12 | <.01 | –10.10 | <.001 | 1.67 | .17 | –12.64 | <.01 |
| FPR disparity (%) | –2.48 | <.01 | –1.04 | .07 | 1.38 | .24 | 0.92 | .15 |

ᵃ P<.05 indicates a statistically significant difference between the model’s performance on male versus female patients. To determine statistical significance, Kolmogorov-Smirnov tests were first run on the sex-stratified results to determine the distribution of data (normal or not). Independent 2-tailed t tests were used where data were normally distributed, and Mann-Whitney U tests were used when data were not normally distributed.

ᵇ ROC_AUC: area under the receiver operating characteristic curve.

ᶜ FNR: false negative rate.

ᵈ FPR: false positive rate.

The disparity in the FPR was smaller and reached statistical significance in 13 out of 16 experiments (–0.48% to +9.77%; P<.05), with the FPR generally higher for male patients. The sex performance disparities in accuracy and area under the receiver operating characteristic curve varied depending on the underlying shifts in the error rates for each sex (Table 2 and Figure 5). On examining the individual error rates, we see consistencies in the sex disparities across feature sets, most notably an overprediction of disease for male patients (higher FPR) and an underprediction of disease for female patients (higher FNR; Table 2).

Our findings for data set 2 were similar to those for data set 1: models built on the original sex-imbalanced data set demonstrated a higher FNR for female patients (mean difference of –12.52% to –10.81%; P<.05; Table 3) and a higher FPR for male patients (3.94% to 4.71%; P<.05; Table 3). Figure 6 shows this disparity graphically and demonstrates that, unlike in data set 1, the disparity in error rates reversed when training on sex-balanced data and female-only data. Figure 7 illustrates the disparity in accuracy between the sexes, where the direction of the disparity varies depending on the training data and feature set.

Variations in Training Data

Sex-Balanced Training Data

Training on sex-balanced data led to a fall in mean accuracy for all patients in data set 1 (76%, SD 3.46% vs 84.24%, SD 3.51%), with a more substantial drop in mean accuracy for male patients (73.61%, SD 4.84% vs 84.84%, SD 4.16%; Table 4 and Figure 5). The opposite trend was seen in data set 2, with models trained on sex-balanced data outperforming models trained on sex-imbalanced data for all patients (87.65%, SD 1.77% vs 85.72%, SD 1.75%) and for female patients (89.66%, SD 2.44% vs 85.48%, SD 4.12%; Table 4). The models trained on sex-balanced data in data set 2 reduced the FNR for both sexes when using the full feature set (female patients: 4.79%, SD 2.58% vs 24.86%, SD 11.35%; male patients: 12.48%, SD 4.11% vs 13.19%, SD 3.26%; Table 4 and Figure 6). The differences between the data sets may relate to underlying differences in the 2 cardiac conditions. Further, the failure to improve performance with sex-balanced training data in data set 1 may reflect the issues of mixing data that have conflicting indicators for disease.
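As a rough illustration of what sex-balancing a training set involves, the sketch below uses plain random oversampling with replacement, bringing the smaller group up to the size of the larger one. This is a deliberately simpler stand-in and should not be read as the study's procedure, which the discussion describes as oversampling with synthetic instances. The record layout, the `sex` key, and the function name are hypothetical.

```python
import random
from collections import Counter


def oversample_to_balance(records, key="sex", seed=0):
    """Duplicate randomly chosen records of smaller groups until every
    group is as large as the largest one (random oversampling)."""
    rng = random.Random(seed)
    by_group = {}
    for record in records:
        by_group.setdefault(record[key], []).append(record)
    target = max(len(rows) for rows in by_group.values())
    balanced = []
    for rows in by_group.values():
        balanced.extend(rows)
        # Sample with replacement to fill the gap; k is 0 for the largest group.
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    rng.shuffle(balanced)
    return balanced


# Toy training set, invented for illustration: 6 male and 2 female records.
train = [{"sex": "M", "id": i} for i in range(6)] + \
        [{"sex": "F", "id": i} for i in range(6, 8)]

balanced = oversample_to_balance(train)
counts = Counter(r["sex"] for r in balanced)
print(len(balanced), counts["M"], counts["F"])  # 12 6 6
```

Because the duplicated rows carry no new information, a sketch like this also makes concrete why balancing alone may not remove a performance disparity.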

Table 4. Results by training data for data set 1 (heart failure) and data set 2 (coronary artery disease).

| Results | Data set 1: sex-imbalanced training data (n=209) | Data set 1: sex-balanced training data (n=272) | Data set 1: female training data (n=136) | Data set 1: male training data (n=136) | Data set 2: sex-imbalanced training data (n=522) | Data set 2: sex-balanced training data (n=715) | Data set 2: female training data (n=358) | Data set 2: male training data (n=358) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All patients, mean accuracy (SD) | 84.24 (3.51) | 76.0 (3.46) | 74.68 (3.53) | 75.12 (3.71) | 85.72 (1.75) | 87.65 (1.77) | 86.06 (1.67) | 82.63 (1.94) |
| Female patients, mean accuracy (SD) | 83.21 (6.37) | 78.39 (19.68) | 80.15 (4.43) | 77.85 (5.21) | 85.48 (4.12) | 89.66 (2.44) | 90.69 (2.38) | 79.44 (3.20) |
| Male patients, mean accuracy (SD) | 84.84 (4.16) | 73.61 (4.84) | 69.20 (5.96) | 72.39 (5.32) | 85.80 (2.14) | 85.65 (2.23) | 81.44 (3.02) | 85.82 (2.30) |
| Female patients, mean FNRᵃ (SD) | 35.98 (16.72) | 85.25 (14.58) | 74.04 (17.68) | 78.66 (14.0) | 24.86 (11.35) | 4.79 (2.58) | 4.00 (2.74) | 22.32 (5.25) |
| Male patients, mean FNRᵃ (SD) | 28.45 (10.41) | 67.43 (16.6) | 66.62 (17.32) | 64.70 (14.9) | 13.19 (3.26) | 12.48 (4.11) | 22.97 (5.20) | 12.20 (3.41) |

ᵃ FNR: false negative rate.
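As a worked check of how Table 4 relates to Tables 2 and 3, the sketch below subtracts the female mean FNR from the male mean FNR for each training configuration and compares the result with the FNR disparity reported in the "Features with sex" column. All numbers are copied from the tables above; the dictionary layout, the configuration labels, and the assumption that the four row blocks of Tables 2 and 3 follow the column order of Table 4 are mine.

```python
# Mean FNR (%) from Table 4, stored as (female, male) per training configuration.
table4_fnr = {
    ("data set 1", "sex-imbalanced"): (35.98, 28.45),
    ("data set 1", "sex-balanced"): (85.25, 67.43),
    ("data set 1", "female-only"): (74.04, 66.62),
    ("data set 1", "male-only"): (78.66, 64.70),
    ("data set 2", "sex-imbalanced"): (24.86, 13.19),
    ("data set 2", "sex-balanced"): (4.79, 12.48),
    ("data set 2", "female-only"): (4.00, 22.97),
    ("data set 2", "male-only"): (22.32, 12.20),
}

# FNR disparity (%) from the "Features with sex" column of Tables 2 and 3.
reported_disparity = {
    ("data set 1", "sex-imbalanced"): -7.53,
    ("data set 1", "sex-balanced"): -17.81,
    ("data set 1", "female-only"): -7.42,
    ("data set 1", "male-only"): -13.96,
    ("data set 2", "sex-imbalanced"): -11.66,
    ("data set 2", "sex-balanced"): 7.69,
    ("data set 2", "female-only"): 18.98,
    ("data set 2", "male-only"): -10.12,
}

# Disparity = score for male patients minus score for female patients.
computed_disparity = {
    key: round(male - female, 2) for key, (female, male) in table4_fnr.items()
}

for key, value in computed_disparity.items():
    gap = abs(value - reported_disparity[key])
    print(key, value, reported_disparity[key], "match" if gap < 0.02 else "differs")
```

Every configuration agrees to within 0.01 percentage points, a gap that is plausibly explained by rounding of the published means.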

Sex-Specific Training Data

For data set 1, mean accuracy for all patients when trained on sex-imbalanced data (84.24%, SD 3.51%) falls when training on both female-specific data (74.68%, SD 3.53%) and male-specific data (75.12%, SD 3.71%), likely related to the smaller training data. For data set 2, mean accuracy for all patients when trained on sex-imbalanced data (85.72%, SD 1.75%) improves when training on female-specific data (86.06%, SD 1.67%) and falls when training on male-specific data (82.62%, SD 1.94%). The overall improvement seen in the data set 2 models when trained on female data relates to the increase in accuracy for female patients (90.69%, SD 2.38% vs 85.48%, SD 4.12%) co-occurring with a smaller decrease in accuracy for male patients (81.44%, SD 3.02% vs 85.80%, SD 2.14%; Table 4 and Figure 7).

Unsurprisingly, performance for each sex is lowest when trained on the opposing sex (Table 4, Figures 4-7). In data set 1, same-sex training was preferable to opposite-sex training; however, this did not improve results compared to the models built from sex-imbalanced and sex-balanced training data, likely relating to the smaller sample size (Table 4). In contrast, data set 2 had more training data available and demonstrated that sex-specific training benefits both sexes relative to the sex-imbalanced models (Table 4).

Variations in Feature Sets

Models built on the biochemical features subset gave the worst performance in terms of accuracy and FNR ( Figures 4 - 7 ). For data set 2, biochemical features included just cholesterol and fasting blood sugar, and so, the fall in performance may relate to information loss. Additionally, Table S5 in Multimedia Appendix 1 highlights the different biochemical profiles for male and female patients who were sick, with female patients who were sick demonstrating a far higher cholesterol level than their male counterparts (mean values: 279.2 female patients who were sick vs 247.5 male patients who were sick).

FAGTB Model

The disparity in false negative rate (DispFNR) was consistently higher than the disparity in false positive rate (DispFPR; Table 5). Compared to the Gradient Boosting Classifier, the FAGTB reduced the DispFNR for both data sets (data set 1: 0.20 vs 0.21; data set 2: 0.19 vs 0.28); however, the DispFNR that disadvantaged female patients persisted. The fall in DispFNR and DispFPR that occurred with FAGTB was associated with a fall in overall accuracy for both data sets.

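Table 5. Results on test set, averaged over 10 experiments.

| Data set and metric | Gradient boosting classifier | FAGTB |
| --- | --- | --- |
| Data set 1 | | |
| Accuracy | 71.3 | 71.2 |
| DispFPRᵃ | 0.08 | 0.08 |
| DispFNRᵇ | 0.21 | 0.20 |
| Data set 2 | | |
| Accuracy | 86.3 | 82.9 |
| DispFPR | 0.06 | 0.06 |
| DispFNR | 0.28 | 0.19 |

ᵃ DispFPR: disparity in false positive rate.

ᵇ DispFNR: disparity in false negative rate.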

Principal Findings

Our study sheds light on an important gap in existing cardiac ML research, with significant implications for digital health equity. We find that the majority of published ML studies predicting HF fail to acknowledge the underrepresentation of female patients in their data sets and do not perform stratified model evaluations, thus failing to assess sex disparities in algorithmic performance. Our secondary evaluation of 2 cardiac data sets exposed a neglected sex disparity in model performance, highlighting the importance of integrating these methods into future studies that use ML methods for cardiac modeling. In our approach, we identified several potential sources of algorithmic bias.

First, we detected the underrepresentation of female patients in training data sets, which may produce inequalities in model fidelity. Despite introducing oversampling techniques to address this omission, the disparities in performance persisted, suggesting that addressing data set representation alone is not a sufficient measure for mitigating bias. Further, our experiments demonstrated that oversampling could reduce overall performance, which may result from the mixing of conflicting data (ie, male vs female feature rankings). In addition, oversampling with synthetic instances solely from the data set at hand does not provide the machine with more information; it simply redirects attention and therefore cannot easily compensate for demographic underrepresentation [33]. When balancing the data set, our methods did not include undersampling due to our small data sets; however, this may be a potential avenue for future research.

Second, we considered featurization and highlighted sex differences in the biochemical manifestation of disease. In current clinical practice, the diagnostic parameters used for identifying pathology are drawn from research trials dominated by male physiology; it is perhaps unsurprising, therefore, that algorithms built from these data tend to underperform in female disease. There is a growing body of research that critiques the use of unisex thresholds in medicine for biochemical tests; our sex-stratified analysis of the cardiac data sets and the identified sex differences in feature rankings support these proposals [16].

There are further sources of inequitable performance that our evaluation cannot distinguish between. It may be that the sex differences in the physiological expression of disease mean that the prediction is harder to extract from 1 population. As a result, 1 sex may require more complex models than another, with differing architecture and degrees of flexibility. It may also simply be that there are differences in the predictability of 1 group compared with another, such that if the physiology of 1 group is more opaque, it may ultimately not be possible to resolve the observed disparities. McCradden and colleagues [ 34 ] detail this challenge further in their review, highlighting that differences across groups may not always indicate inequity. There are complex causal relationships between biological, environmental, and social factors that underpin the differences in disease rates seen across population subgroups [ 34 ]. While models must not promote different standards of care according to protected characteristics, differences between groups may not necessarily reflect discriminatory practice [ 34 ].

Our research was limited by the available information in the data sets. The absence of race or ethnicity data precluded the evaluation of their effects. Furthermore, the absence of other demographic data in the studies we identified prevented the investigation of health inequities that might impact the LGBTQ+ (lesbian, gay, bisexual, transgender, queer) community, disadvantaged socioeconomic groups, or other subgroups. Previous research has described historic and institutional biases that contribute to worse health outcomes for these groups, and evolving AI systems require the same scrutiny to ensure these harms do not become embedded within digital systems [ 35 - 37 ].

Throughout this paper, we have used the terms male and female to reference biological sex, so as not to conflate sex and gender. With the ongoing problematic conflation of sex and gender in medicine, stratification of model performance by either sex or gender is often impossible, which was noted in our own work [ 35 - 37 ]. Beyond the features discussed above, there is a wide range of additional factors that we cannot account for. For example, creatinine phosphokinase was a key feature in HF modeling yet existing studies have demonstrated the variation in these levels for manual laborers and athletes, illustrating how occupation may impact a patient’s physiology [ 38 ].

To account for the complex interactions that potentiate disease, and the heterogeneous nature of patient cohorts, we require more complex modeling capable of capturing the full range of intersecting factors influencing patient health (eg, sex differences may be mediated by income). Unsupervised high-dimensional representation learning may be the path forward for this purpose [ 39 ]. In addition to improving representation, unsupervised techniques enable us to detect neglected subpopulations without predetermining a characteristic of interest, facilitating the identification of the previously overlooked disadvantaged. In this sense, AI may provide a route forward to uncovering and addressing bias, by deploying more complex modeling that can improve patient representation and by revealing previously neglected disparities in the provision of care.

Conclusions and Limitations

In our paper, we have identified inequities in the performance of cardiac ML algorithms. Our findings are limited by the small size of the uncovered data sets, reducing their potential generalizability, and hence we propose that larger studies focused on this issue are required. These data sets also came from the same source, as we found a limited number of open-access databases due to the confidential nature of patient data and issues of proprietary ownership. In addition, we focused on RF models to replicate the papers uncovered in our literature search; however, ML models may differ in their degrees of performance disparity, and an evaluation across the range of ML model options is an important next step.

In our paper we did not attempt to solve bias; instead, we highlighted a problem that exists throughout cardiology that requires further attention. The issue we have identified in these ML models is a foundational problem across medical modeling, in any instance where the use of an “average” is applied to a diverse population. It is possible that unsupervised ML and complex representational modeling may be a route forward for capturing heterogeneity in a previously unattainable manner and addressing issues of bias [ 39 ]. Our findings demonstrate that examining performance inequities across demographic subgroups is an essential approach for identifying biases in AI and preventing the perpetuation of inequalities in digital health systems.

Acknowledgments

The data sets analyzed during this study are publicly available. Data set 1 is available from the University of California Irvine Machine Learning Repository [ 21 ]. Data set 2 is available from the IEEE Dataport Repository [ 22 ]. This work was supported by UK Research and Innovation (UKRI; EP/S021612/1).

Conflicts of Interest

None declared.

Multimedia Appendix 1: Details of literature search and data sets.

Multimedia Appendix 2: Details of Fair Adversarial Gradient Tree Boosting.

  1. O'Neil C. Weapons of math destruction: how big data increases inequality and threatens democracy. New York City, U.S. Crown; 2017.
  2. Daneshjou R, Vodrahalli K, Novoa RA, Jenkins M, Liang W, Rotemberg V, et al. Disparities in dermatology AI performance on a diverse, curated clinical image set. Sci Adv. 2022;8(32):eabq6147.
  3. Seyyed-Kalantari L, Liu G, McDermott M, Chen IY, Ghassemi M. CheXclusion: fairness gaps in deep chest X-ray classifiers. Biocomputing. 2021:232-243.
  4. Thompson HM, Sharma B, Bhalla S, Boley R, McCluskey C, Dligach D, et al. Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups. J Am Med Inform Assoc. 2021;28(11):2393-2403.
  5. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453.
  6. Cirillo D, Catuara-Solarz S, Morey C, Guney E, Subirats L, Mellino S, et al. Sex and gender differences and biases in artificial intelligence for biomedicine and healthcare. NPJ Digit Med. 2020;3:81.
  7. Liu X, Hu P, Yeung W, Zhang Z, Ho V, Liu C, et al. Illness severity assessment of older adults in critical illness using machine learning (ELDER-ICU): an international multicentre study with subgroup bias evaluation. Lancet Digit Health. 2023;5(10):e657-e667.
  8. Grari V, Ruf B, Lamprier S, Detyniecki M. Fair adversarial gradient tree boosting. 2019. Presented at: 2019 IEEE International Conference on Data Mining (ICDM); November 8-11, 2019:1060-1065; Beijing, China. URL: https://ieeexplore.ieee.org/document/8970941
  9. Savarese G, Becher PM, Lund LH, Seferovic P, Rosano GMC, Coats AJS. Global burden of heart failure: a comprehensive and updated review of epidemiology. Cardiovasc Res. 2022;118(17):3272-3287.
  10. Goldraich L, Beck-da-Silva L, Clausell N. Are scores useful in advanced heart failure? Expert Rev Cardiovasc Ther. 2009;7(8):985-997.
  11. Treece J, Chemchirian H, Hamilton N, Jbara M, Gangadharan V, Paul T, et al. A review of prognostic tools in heart failure. Am J Hosp Palliat Med. 2018;35(3):514-522.
  12. Thorvaldsen T, Benson L, Ståhlberg M, Dahlström U, Edner M, Lund LH. Triage of patients with moderate to severe heart failure: who should be referred to a heart failure center? J Am Coll Cardiol. 2014;63(7):661-671.
  13. Escamilla AKG, Hassani AHE, Andres E. A comparison of machine learning techniques to predict the risk of heart failure. In: Machine Learning Paradigms: Applications of Learning and Analytics in Intelligent Systems. Switzerland AG. Springer; 2019:9-26.
  14. Sullivan K, Doumouras BS, Santema BT, Walsh MN, Douglas PS, Voors AA, et al. Sex-specific differences in heart failure: pathophysiology, risk factors, management, and outcomes. Can J Cardiol. 2021;37(4):560-571.
  15. Walsh MN, Jessup M, Lindenfeld J. Women with heart failure: unheard, untreated, and unstudied. J Am Coll Cardiol. 2019;73(1):41-43.
  16. Sobhani K, Castro DKN, Fu Q, Gottlieb RA, Van Eyk JE, Merz CNB. Sex differences in ischemic heart disease and heart failure biomarkers. Biol Sex Differ. 2018;9(1):43.
  17. Straw I. The automation of bias in medical Artificial Intelligence (AI): decoding the past to create a better future. Artif Intell Med. 2020;110:101965.
  18. Hamberg K. Gender bias in medicine. Womens Health (Lond). 2008;4(3):237-243.
  19. Krieger N, Fee E. Man-made medicine and women's health: the biopolitics of sex/gender and race/ethnicity. Int J Health Serv. 1994;24(2):265-283.
  20. PRISMA flow diagram. PRISMA. URL: https://www.prisma-statement.org/prisma-2020-flow-diagram [accessed 2024-07-09]
  21. Tanvir AAM, Bhatti SH, Aftab M, Raza MA. Heart failure clinical records data set F. University of California Irvine Machine Learning Repository. 2020. URL: https://archive.ics.uci.edu/dataset/519/heart+failure+clinical+records [accessed 2024-05-17]
  22. Siddhartha M. Heart disease dataset (comprehensive). IEEE Dataport. 2020. URL: https://ieee-dataport.org/open-access/heart-disease-dataset-comprehensive [accessed 2024-05-17]
  23. Chicco D, Jurman G. Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Med Inform Decis Mak. 2020;20(1):16.
  24. Buolamwini J, Gebru T. Gender shades: intersectional accuracy disparities in commercial gender classification. 2018. Presented at: 1st Conference on Fairness, Accountability and Transparency, PMLR 81; February 23-24, 2018:77-91; New York, NY. URL: https://proceedings.mlr.press/v81/buolamwini18a.html
  25. Mishra P, Pandey CM, Singh U, Gupta A, Sahu C, Keshri A. Descriptive statistics and normality tests for statistical data. Ann Card Anaesth. 2019;22(1):67-72.
  26. Afrose S, Song W, Nemeroff CB, Lu C, Yao DD. Subpopulation-specific machine learning prognosis for underrepresented patients with double prioritized bias correction. Commun Med (Lond). 2022;2:111.
  27. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16(3):321-357.
  28. Islam SR, Eberle W, Ghafoor SK, Ahmed M. Explainable artificial intelligence approaches: a survey. arXiv. Preprint posted online on January 23, 2021.
  29. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. 2017. Presented at: NIPS'17: 31st International Conference on Neural Information Processing Systems; December 4-9, 2017:4768-4777; Long Beach, CA. URL: https://dl.acm.org/doi/proceedings/10.5555/3295222
  30. Borgese M, Joyce C, Anderson EE, Churpek MM, Afshar M. Bias assessment and correction in machine learning algorithms: a use-case in a natural language processing algorithm to identify hospitalized patients with unhealthy alcohol use. AMIA Annu Symp Proc. 2022;2021:247-254.
  31. Allen A, Mataraso S, Siefkas A, Burdick H, Braden G, Dellinger RP, et al. A racially unbiased, machine learning approach to prediction of mortality: algorithm development study. JMIR Public Health Surveill. 2020;6(4):e22400.
  32. Tison GH, Avram R, Nah G, Klein L, Howard BV, Allison MA, et al. Predicting incident heart failure in women with machine learning: the women's health initiative cohort. Can J Cardiol. 2021;37(11):1708-1714.
  33. Pombo G, Gray R, Cardoso MJ, Ourselin S, Rees G, Ashburner J, et al. Equitable modelling of brain imaging by counterfactual augmentation with morphologically constrained 3D deep generative models. Med Image Anal. 2023;84:102723.
  34. McCradden MD, Joshi S, Mazwi M, Anderson JA. Ethical limitations of algorithmic fairness solutions in health care machine learning. Lancet Digit Health. 2020;2(5):e221-e223.
  35. Safer JD, Coleman E, Feldman J, Garofalo R, Hembree W, Radix A, et al. Barriers to healthcare for transgender individuals. Curr Opin Endocrinol Diabetes Obes. 2016;23(2):168-171.
  36. Rutherford L, Stark A, Ablona A, Klassen BJ, Higgins R, Jacobsen H, et al. Health and well-being of trans and non-binary participants in a community-based survey of gay, bisexual, and queer men, and non-binary and two-spirit people across Canada. PLoS One. 2021;16(2):e0246525.
  37. Beckwith N, McDowell MJ, Reisner SL, Zaslow S, Weiss RD, Mayer KH, et al. Psychiatric epidemiology of transgender and nonbinary adult patients at an urban health center. LGBT Health. 2019;6(2):51-61.
  38. Vejjajiva A, Teasdale GM. Serum creatine kinase and physical exercise. Br Med J. 1965;1(5451):1653-1654.
  39. Carruthers R, Straw I, Ruffle JK, Herron D, Nelson A, Bzdok D, et al. Representational ethical model calibration. NPJ Digit Med. 2022;5(1):170.

Abbreviations

AI: artificial intelligence
CAD: coronary artery disease
DispFNR: disparity in false negative rate
EF: ejection fraction
FAGTB: Fair Adversarial Gradient Tree Boosting
FNR: false negative rate
FPR: false positive rate
HF: heart failure
LGBTQ+: lesbian, gay, bisexual, transgender, queer
ML: machine learning
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RF: random forest

Edited by A Mavragani; submitted 03.03.23; peer-reviewed by J Zeng, S Antani, E van der Velde, L Guo; comments to author 16.06.23; revised version received 13.10.23; accepted 04.05.24; published 26.08.24.

©Isabel Straw, Geraint Rees, Parashkev Nachev. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 26.08.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

IMAGES

  1. Importance of Quantitative Research Across Different Fields

    quantitative research importance across fields

  2. Lesson 2- The importance of Quantitative Research across fields-2.pdf

    quantitative research importance across fields

  3. Importance of Quantitative Research Across Fields

    quantitative research importance across fields

  4. SOLUTION: Importance of quantitative research across fields

    quantitative research importance across fields

  5. Chapter 3

    quantitative research importance across fields

  6. SOLUTION: Importance of quantitative research across fields

    quantitative research importance across fields

VIDEO

  1. Importance of Quantitative Research Across Fields

  2. Quantitative Research Purposes: Updating the Previous Theories

  3. Quantitative Research:Importance, Characteristics

  4. Types of Quantitative Research

  5. What is Quantitative Research

  6. Research Paradigms: From Measurements to Social Liberation

COMMENTS

  1. What Is Quantitative Research? An Overview and Guidelines

    The necessity, importance, relevance, and urgency of quantitative research are articulated, establishing a strong foundation for the subsequent discussion, which delineates the scope, objectivity, goals, data, and methods that distinguish quantitative research, alongside a balanced inspection of its strengths and shortcomings, particularly in ...

  2. The Importance of Quantitative Research Across Fields ...

    This video lecture discusses the importance of Quantitative Research across fields. This is part of the course in Pratical Research 2, which is different fro...

  3. Importance of Quantitative Research Across Fields

    Importance of Quantitative Research Across Fields. First of all, research is necessary and valuable in society because, among other things, 1) it is an important tool for building knowledge and facilitating learning; 2) it serves as a means in understanding social and political issues and in increasing public awareness; 3) it helps people ...

  4. A Practical Guide to Writing Quantitative and Qualitative Research

    INTRODUCTION. Scientific research is usually initiated by posing evidenced-based research questions which are then explicitly restated as hypotheses.1,2 The hypotheses provide directions to guide the study, solutions, explanations, and expected results.3,4 Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the ...

  5. Quantitative Research Methods: Maximizing Benefits, Addressing

    Quantitative and qualitative methods are the engine behind evidence-based outcomes. For decades, one of the popular phenomena that troubled young researchers is that which appropriate research ...

  6. Why Is Quantitative Research Important?

    Advantages of Quantitative Research. Quantitative researchers aim to create a general understanding of behavior and other phenomena across different settings and populations. Quantitative studies are often fast, focused, scientific and relatable. 4. The speed and efficiency of the quantitative method are attractive to many researchers.

  7. What Is Quantitative Research?

    Revised on June 22, 2023. Quantitative research is the process of collecting and analyzing numerical data. It can be used to find patterns and averages, make predictions, test causal relationships, and generalize results to wider populations. Quantitative research is the opposite of qualitative research, which involves collecting and analyzing ...

  8. (PDF) Quantitative Research: A Successful Investigation in Natural and

    Quantitative research explains phenomena by collecting numerical unchanging d etailed data t hat. are analyzed using mathematically based methods, in particular statistics that pose questions of ...

  9. Quantitative Research

    Quantitative research has many applications across a wide range of fields. Here are some common examples: Market Research: Quantitative research is used extensively in market research to understand consumer behavior, preferences, and trends. Researchers use surveys, experiments, and other quantitative methods to collect data that can inform ...

  10. Advances in quantitative research within the psychological sciences

    We believe that a focus on statistical advancements will strengthen and enhance research inquiries across the diverse subdisciplines comprising the psychological sciences. We hope that this special issue will provide a springboard for discussion about the importance of quantitative methodology within the field.

  11. Definition of quantitative research and its importance

    Quantitative research is a systematic empirical approach used in the social sciences and various other fields to gather, analyze, and interpret numerical data. It focuses on obtaining measurable data and applying statistical methods to generalize findings to a larger population. Researchers use structured instruments such as surveys ...

  12. Importance of Quantitative Research Across Fields

    Module 1, Lesson 2: Illustrates the importance of quantitative research across fields (CS_RS12-LA-C-2). Basically, quantitative research is important because it ...

  13. MODULE 1 (Lesson 2

    Video Lesson (PRACTICAL RESEARCH 2 - MELC's). Quarter 1 - Module 1: Nature of Inquiry and Research

  14. Chapter 3

    The Importance of Quantitative Research across Fields. What communicative behaviors are used to respond to co-workers displaying emotional stress? (Allen, Titsworth, Hunt, 2009) 3. Quantitative Research and Sports Medicine: Quantitative research is used to analyze how sports may be used as an alternative way of medicating an illness. An ...

  15. Module 2-Importance of Quantitative Research across

    Welcome to the Practical Research 2 Alternative Delivery Mode (ADM) Module on Importance of Quantitative Research Across the Fields. The hand is one of the most symbolized parts of the human body. It is often used to depict skill, action and purpose. Through our hands we may learn, create and accomplish.

  16. Module 2

    Importance of Quantitative Research Across Fields. Politics, Governance and Public Administration. Quantitative research can be used in political surveys by political candidates and voters to assess chances of winning and to determine the areas where they need to develop support.

  17. Importance of Quantitative Research Across Different Fields

    Quantitative research is important across many fields such as business, political science, psychology, medicine, economics, demographics, and education. It measures attributes and behaviors in populations to understand consumer attitudes, clinical standards, economic policies, population trends, educational programs and more. The results of quantitative research provide insights to improve ...

  18. Importance of Quantitative Research Across Fields

    Quantitative research uses statistical, mathematical, or computational techniques and is important across many fields. It is often used to develop and test hypotheses, investigate observable phenomena, and draw generalizations from samples to populations. Some examples of how quantitative research is used include studying the effects of interventions, comparing experimental and control groups ...

  19. Importance of Quantitative Research Across Fields

    This document discusses the importance of quantitative research across various fields such as natural sciences, social sciences, anthropology, communication, medicine, behavioral science, education, and psychology. It provides examples of quantitative research methods like experiments, surveys, and mathematical modeling that are used to study observable phenomena, human ...

  20. IMPORTANCE OF QUANTITATIVE RESEARCH ACROSS FIELDS

    SHS PRACTICAL RESEARCH 2. GRADE 11: IMPORTANCE OF QUANTITATIVE RESEARCH ACROSS FIELDS. GRADE 11 PLAYLISTS: General Mathematics, First Quarter: https://t...

  21. Qualitative vs. Quantitative Data Analysis in Education

    Quantitative data is information that has a numerical value. Quantitative research is conducted to gather measurable data used in statistical analysis. Researchers can use quantitative studies to identify patterns and trends. In learning analytics, quantitative data could include test scores, student demographics, or amount of time spent in a ...

  22. Importance of Quantitative Research Across Fields

    This document discusses the importance of quantitative research across various fields including STEM, ABM, HUMSS, and TVL. In STEM fields, quantitative research provides significant information about disease trends and health outcomes. It also helps evaluate clinical practices and develop new structural designs. In ABM, quantitative research helps design marketing strategies and determine the ...

  23. Agronomy

    Soybean is a primary source of plant-based oil and protein for human diets. Seed size and weight are important agronomic traits that significantly influence soybean yield. Despite their importance, the genetic mechanisms underlying soybean seed size and weight remain to be fully elucidated. In order to identify additional, major quantitative trait loci (QTL) associated with seed size and ...

  24. Journal of Medical Internet Research

    Background: The presence of bias in artificial intelligence has garnered increased attention, with inequities in algorithmic performance being exposed across the fields of criminal justice, education, and welfare services. In health care, the inequitable performance of algorithms across demographic groups may widen health inequalities.

  25. Importance of Quantitative Research Across The Fields

    This document discusses the importance of quantitative research across different fields. It provides examples of different types of quantitative research designs, including: 1. Descriptive research design which aims to describe phenomena and explore causes. This includes correlational, survey, status, analysis, and classification research.