Empirical Research: Definition, Methods, Types and Examples

Content Index

  • Empirical research: Definition
  • Empirical research: Origin
  • Quantitative research methods
  • Qualitative research methods
  • Steps for conducting empirical research
  • Empirical research methodology cycle
  • Advantages of empirical research
  • Disadvantages of empirical research
  • Why is there a need for empirical research?

Empirical research: Definition

Empirical research is any research in which the conclusions of the study are drawn strictly from concrete empirical evidence, and therefore from “verifiable” evidence.

This empirical evidence can be gathered using quantitative market research and qualitative market research methods.

For example: A study is conducted to find out whether listening to happy music while working promotes creativity. An experiment is run using a music website survey on one set of participants who are exposed to happy music and another set who listen to no music at all, and the subjects are then observed. The results of such a study provide empirical evidence of whether happy music promotes creativity.

Empirical research: Origin

You must have heard the quote “I will not believe it unless I see it”. This came from the ancient empiricists, and it is a fundamental understanding that powered the emergence of medieval science during the Renaissance period and laid the foundation of modern science as we know it today. The word itself has its roots in Greek: it is derived from the Greek word empeirikos, which means “experienced”.

In today’s world, the word empirical refers to the collection of data using evidence gathered through observation or experience, or by using calibrated scientific instruments. All of the above origins have one thing in common: a dependence on observation and experiments to collect data and test it in order to reach conclusions.

Types and methodologies of empirical research

Empirical research can be conducted and analysed using qualitative or quantitative methods.

  • Quantitative research: Quantitative research methods are used to gather information through numerical data. They are used to quantify opinions, behaviors, or other defined variables. These methods are predetermined and follow a more structured format. Some of the commonly used methods are surveys, longitudinal studies, and polls.
  • Qualitative research: Qualitative research methods are used to gather non-numerical data. They are used to find meanings, opinions, or the underlying reasons from the subjects. These methods are unstructured or semi-structured. The sample size for such research is usually small, and the method is conversational in order to provide more insight or in-depth information about the problem. Some of the most popular methods are focus groups, experiments, and interviews.

Data collected from these methods needs to be analysed. Empirical evidence can be analysed either quantitatively or qualitatively. Using this evidence, the researcher can answer empirical questions, which have to be clearly defined and answerable with the findings obtained. The type of research design used will vary depending on the field in which it is going to be used. Many researchers choose a combined design involving both quantitative and qualitative methods to better answer questions that cannot be studied in a laboratory setting.

Quantitative research methods

Quantitative research methods aid in analyzing the empirical evidence gathered. Using them, a researcher can find out whether a hypothesis is supported.

  • Survey research: Survey research generally involves a large audience in order to collect a large amount of data. It is a quantitative method with a predetermined set of closed questions that are easy to answer. Because of the simplicity of the method, high response rates are achieved. It is one of the most commonly used methods for all kinds of research in today’s world.

Previously, surveys were conducted only face to face, perhaps with a recorder. However, with advances in technology and for ease of use, new mediums such as email and social media have emerged.

For example: Depletion of energy resources is a growing concern, and hence there is a need for awareness about renewable energy. According to recent studies, fossil fuels still account for around 80% of energy consumption in the United States. Even though the use of green energy rises every year, certain factors still keep the general population from opting for it. To understand why, a survey can be conducted to gather the opinions of the general population about green energy and the factors that influence their choice of switching to renewable energy. Such a survey can help institutions or governing bodies promote appropriate awareness and incentive schemes to push the use of greener energy.
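
As a rough illustration of how answers to one closed survey question might be summarized, the sketch below tallies responses and reports each option's share. The question, the answer options, and the `tally` helper are hypothetical and are not taken from any real survey or survey tool.

```python
from collections import Counter

# Hypothetical answers to one closed question:
# "What is the main factor keeping you from switching to renewable energy?"
responses = [
    "cost", "availability", "cost", "awareness", "cost",
    "availability", "cost", "awareness", "cost", "reliability",
]

def tally(answers):
    """Return {option: (count, share of all answers in percent)}."""
    counts = Counter(answers)
    total = len(answers)
    return {
        option: (n, round(100 * n / total, 1))
        for option, n in counts.most_common()
    }

summary = tally(responses)
for option, (n, pct) in summary.items():
    print(f"{option:<14}{n:>3}{pct:>7}%")
```

Because every question is closed, each answer falls into a known option, which is what makes this kind of counting straightforward.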

  • Experimental research: In experimental research, an experiment is set up and a hypothesis is tested by creating a situation in which one of the variables is manipulated. This method is also used to check cause and effect: the experiment shows what happens to the dependent variable when the independent variable is removed or altered. The process usually consists of proposing a hypothesis, experimenting on it, analyzing the findings, and reporting them to show whether they support the theory.

For example: A product company is trying to find out why it has not been able to capture the market. So the organisation makes changes in each of its processes, such as manufacturing, marketing, sales, and operations. Through the experiment, it learns that sales training directly impacts the market coverage of its product: if the salesperson is trained well, the product will have better coverage.
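
One possible way to analyze a simple two-group experiment like this is a permutation test, sketched below. It asks how often a random relabeling of the subjects produces a difference in means at least as large as the one observed. The coverage scores and the `permutation_test` function are made up for illustration; a real study would also need random assignment and a larger sample.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(treated, control, n_permutations=5000, seed=0):
    """Return (observed difference in means, estimated two-sided p-value)."""
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations

# Hypothetical market-coverage scores for trained and untrained sales staff.
trained = [62, 70, 68, 75, 66, 71]
untrained = [55, 60, 58, 63, 57, 61]

observed_diff, p_value = permutation_test(trained, untrained)
print(f"difference in means: {observed_diff:.2f}, p-value: {p_value:.4f}")
```

A small p-value indicates that the observed difference would be unusual if training had no effect; it supports the hypothesis but does not prove it.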

  • Correlational research: Correlational research is used to find the relationship between two sets of variables. Regression analysis is generally used to predict outcomes in such a method. The correlation can be positive, negative, or neutral.

For example: Individuals with more education tend to get higher-paying jobs. This is a positive correlation: higher education goes together with higher-paying jobs, and less education goes together with lower-paying jobs.
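
To make the idea of positive and negative correlation concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The education and salary figures are invented for illustration, and `pearson_r` is a hypothetical helper name, not part of any particular tool.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

# Hypothetical data: years of education and annual salary (in thousands).
years_of_education = [10, 12, 12, 14, 16, 16, 18, 20]
salary = [28, 35, 33, 41, 52, 48, 60, 67]

r = pearson_r(years_of_education, salary)
print(f"r = {r:.3f}")
```

A value near +1 indicates a positive correlation, a value near -1 a negative one, and a value near 0 little or no linear relationship.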

  • Longitudinal study: A longitudinal study is used to understand the traits or behavior of a subject under observation by repeatedly testing the subject over a period of time. Data collected with this method can be qualitative or quantitative in nature.

For example: A study is conducted to find out the benefits of exercise. The subject is asked to exercise every day for a particular period of time, and the results show higher endurance, stamina, and muscle growth. This supports the claim that exercise benefits an individual’s body.

  • Cross sectional: A cross sectional study is an observational method in which a set of subjects is observed at a given point in time. The subjects are chosen so that they are similar in all variables except the one being researched. This type of study does not let the researcher establish a cause and effect relationship, because the subjects are not observed over a continuous time period. It is mostly used in the healthcare sector and the retail industry.

For example: A medical study is conducted to find the prevalence of under-nutrition disorders in the children of a given population. This involves looking at a wide range of parameters such as age, ethnicity, location, income, and social background. If a significant number of children from poor families show under-nutrition disorders, the researcher can investigate further. Usually a cross sectional study is followed by a longitudinal study to find out the exact reason.
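
A cross sectional comparison like this often comes down to computing prevalence within each group at one point in time. The sketch below shows one way to do that; the records, the income groups, and the `prevalence_by_group` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical snapshot: (income group, whether the child shows the disorder).
records = [
    ("low", True), ("low", True), ("low", False), ("low", True),
    ("middle", False), ("middle", True), ("middle", False), ("middle", False),
    ("high", False), ("high", False), ("high", False), ("high", True),
]

def prevalence_by_group(rows):
    """Return {group: share of subjects in that group with the condition}."""
    cases = defaultdict(int)
    totals = defaultdict(int)
    for group, has_condition in rows:
        totals[group] += 1
        cases[group] += int(has_condition)
    return {group: cases[group] / totals[group] for group in totals}

prevalence = prevalence_by_group(records)
for group, share in prevalence.items():
    print(f"{group:<8}{share:.0%}")
```

A higher prevalence in one group flags where to look next, but as noted above, a snapshot alone cannot establish cause and effect.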

  • Causal-Comparative research: This method is based on comparison. It is mainly used to find a cause-effect relationship between two or more variables.

For example: A researcher measured the productivity of employees in a company that gave its employees breaks during work and compared it with the productivity of employees in a company that gave no breaks at all.

Qualitative research methods

Some research questions need to be analysed qualitatively, as quantitative methods are not applicable to them. In many cases in-depth information is needed, or a researcher may need to observe the behavior of a target audience, so the results are needed in descriptive form. Qualitative research results are descriptive rather than predictive. They enable the researcher to build or support theories for potential future quantitative research. In such situations, qualitative research methods are used to derive a conclusion that supports the theory or hypothesis being studied.

  • Case study: The case study method is used to find more information by carefully analyzing existing cases. It is very often used for business research or to gather empirical evidence for investigation purposes. It is a method of investigating a problem within its real-life context through existing cases. The researcher has to analyse carefully, making sure that the parameters and variables in the existing case are the same as in the case being investigated. Using the findings from the case study, conclusions can be drawn about the topic being studied.

For example: A report describes the solution a company provided to its client: the challenges faced during initiation and deployment, the findings of the case, and the solutions offered for the problems. Most companies use such case studies because they form empirical evidence that the company can promote in order to get more business.

  • Observational method: The observational method is a process of observing and gathering data from its target. Since it is a qualitative method, it is time consuming and very personal. Observational research can be said to be a part of ethnographic research, which is also used to gather empirical evidence. It is usually a qualitative form of research; however, in some cases it can be quantitative as well, depending on what is being studied.

For example: A study is set up to observe a particular animal in the rainforests of the Amazon. Such research usually takes a lot of time, as observation has to be done for a set amount of time to study the patterns or behavior of the subject. Another example widely used nowadays is observing people shopping in a mall to figure out the buying behavior of consumers.

  • One-on-one interview: This method is purely qualitative and one of the most widely used, because it enables a researcher to get precise, meaningful data if the right questions are asked. It is a conversational method in which in-depth data can be gathered depending on where the conversation leads.

For example: A one-on-one interview with the finance minister to gather data on the financial policies of the country and their implications for the public.

  • Focus groups: Focus groups are used when a researcher wants to find answers to why, what, and how questions. A small group is generally chosen for this method, and it is not necessary to interact with the group in person. A moderator is generally needed when the group is addressed in person. Focus groups are widely used by product companies to collect data about their brands and products.

For example: A mobile phone manufacturer wants feedback on the dimensions of one of its models that is yet to be launched. Such studies help the company meet customer demand and position the model appropriately in the market.

  • Text analysis: The text analysis method is relatively new compared to the other types. It is used to analyse social life by going through the images or words used by individuals. In today’s world, with social media playing a major part in everyone’s life, this method enables the researcher to follow the patterns that relate to the study.

For example: Many companies ask customers for detailed feedback on how satisfied they are with the customer support team. Such data enables the researcher to make appropriate decisions to improve the support team.
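
A very simple form of text analysis is counting which terms come up most often in open-ended feedback. The sketch below assumes a handful of invented customer comments and a small hand-picked stopword list; `top_terms` is a hypothetical helper, and real text analysis usually involves far more careful language processing.

```python
import re
from collections import Counter

# Small, hand-picked list of words to ignore (an assumption for this sketch).
STOPWORDS = {"the", "a", "was", "and", "to", "but", "very", "for", "too"}

def top_terms(comments, k=3):
    """Return the k most frequent non-stopword terms with their counts."""
    words = []
    for comment in comments:
        tokens = re.findall(r"[a-z']+", comment.lower())
        words.extend(t for t in tokens if t not in STOPWORDS)
    return Counter(words).most_common(k)

# Hypothetical open-ended feedback about a support team.
comments = [
    "The support agent was slow to respond",
    "Slow response, but the agent was helpful",
    "Very helpful agent and quick fix",
    "Waited too long for a slow reply",
]

for term, count in top_terms(comments):
    print(term, count)
```

Frequent terms such as "slow" point the researcher toward themes worth reading in full, rather than replacing a close reading of the comments.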

Sometimes a combination of methods is needed for questions that cannot be answered using only one type of method, especially when a researcher needs to gain a complete understanding of a complex subject matter.

Steps for conducting empirical research

Since empirical research is based on observation and capturing experiences, it is important to plan the steps for conducting the experiment and analysing it. This enables the researcher to resolve problems or obstacles that can occur during the experiment.

Step #1: Define the purpose of the research

In this step the researcher has to answer questions such as: What exactly do I want to find out? What is the problem statement? Are there any issues in terms of the availability of knowledge, data, time, or resources? Will this research be more beneficial than what it will cost?

Before going ahead, a researcher has to clearly define the purpose of the research and set up a plan to carry out further tasks.

Step #2: Supporting theories and relevant literature

The researcher needs to find out whether there are theories that can be linked to the research problem, and whether any theory can help support the findings. All kinds of relevant literature help the researcher find out whether others have researched this before and what problems they faced. The researcher will also have to set up assumptions and find out whether there is any history regarding the research problem.

Step #3: Creation of Hypothesis and measurement

Before beginning the actual research, the researcher needs a working hypothesis, or a guess at the probable result. The researcher has to set up variables, decide the environment for the research, and find out how the variables relate to each other.

The researcher will also need to define the units of measurement and the tolerable degree of error, and find out whether the chosen measurement will be acceptable to others.

Step #4: Methodology, research design and data collection

In this step, the researcher has to define a strategy for conducting the research and set up experiments to collect the data needed to test the hypothesis. The researcher decides whether an experimental or non-experimental method is needed. The type of research design will vary depending on the field in which the research is being conducted. Last but not least, the researcher has to identify the parameters that will affect the validity of the research design. Data collection needs to be done by choosing appropriate samples depending on the research question, using one of the many sampling techniques. Once data collection is complete, the researcher has empirical data that needs to be analysed.
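
Two of the sampling techniques a researcher might choose from are simple random sampling and stratified sampling. The sketch below illustrates both using only Python's standard `random` module; the population of respondents, the region labels, and the helper names are hypothetical.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction from every stratum (at least one unit each)."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 urban and 40 rural respondents.
population = [(i, "urban") for i in range(60)] + [(i, "rural") for i in range(60, 100)]

random_draw = simple_random_sample(population, 10, seed=1)
stratified_draw = stratified_sample(population, lambda unit: unit[1], 0.1, seed=1)
print(len(random_draw), len(stratified_draw))
```

Stratifying guarantees that each region is represented in proportion to its size, whereas a simple random draw may over- or under-represent a group by chance.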

Step #5: Data Analysis and result

Data analysis can be done in two ways: qualitatively and quantitatively. The researcher needs to decide whether a qualitative method, a quantitative method, or a combination of both is needed. Depending on the analysis of the data, the researcher will know whether the hypothesis is supported or rejected. Analyzing this data is the most important part of supporting the hypothesis.

Step #6: Conclusion

A report needs to be made with the findings of the research. The researcher can cite the theories and literature that support the research and make suggestions or recommendations for further research on the topic.

Empirical research methodology cycle

A.D. de Groot, a famous Dutch psychologist and chess expert, conducted some of the most notable experiments using chess in the 1940s. During his study, he came up with a cycle that is consistent and now widely used to conduct empirical research. It consists of 5 phases, each as important as the next. The empirical cycle captures the process of coming up with hypotheses about how certain subjects work or behave and then testing these hypotheses against empirical data in a systematic and rigorous way. It can be said to characterize the deductive approach to science. The empirical cycle is as follows.

  • Observation: In this phase an idea is sparked for proposing a hypothesis, and empirical data is gathered through observation. For example: A particular species of flower blooms in a different color only during a specific season.
  • Induction: Inductive reasoning is then carried out to form a general conclusion from the data gathered through observation. For example: As stated above, it is observed that the species of flower blooms in a different color during a specific season. A researcher may ask, “Does the temperature in the season cause the color change in the flower?” The researcher can assume that this is the case, but it is mere conjecture, so an experiment needs to be set up to support the hypothesis. The researcher therefore tags a few sets of flowers, keeps them at a different temperature, and observes whether they still change color.
  • Deduction: This phase helps the researcher deduce a conclusion from the experiment. It has to be based on logic and rationality in order to arrive at specific, unbiased results. For example: If the tagged flowers kept in a different temperature environment do not change color, it can be concluded that temperature plays a role in changing the color of the bloom.
  • Testing: In this phase the researcher returns to empirical methods to put the hypothesis to the test. The researcher now needs to make sense of the data and hence uses statistical analysis plans to determine the relationship between temperature and bloom color. If most flowers bloom in a different color when exposed to a certain temperature and the others do not when the temperature is different, the researcher has found support for the hypothesis. Please note that this is not proof, only support for the hypothesis.
  • Evaluation: This phase is generally forgotten by most but is an important one for continuing to gain knowledge. During this phase the researcher puts forth the data collected, the supporting argument, and the conclusion. The researcher also states the limitations of the experiment and the hypothesis, and suggests how others can pick it up and continue with more in-depth research in the future.
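
For the testing phase, one statistical analysis that could fit the flower example is a chi-square test on a 2x2 table of temperature against color change. The sketch below computes the statistic by hand, without a continuity correction; the flower counts are invented and `chi_square_2x2` is a hypothetical helper. The value 3.841 is the standard critical value for one degree of freedom at the 5% level.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

# Hypothetical counts of tagged flowers.
#                 changed color   did not change
# warm setting         18               2
# cool setting          5              15
statistic = chi_square_2x2(18, 2, 5, 15)

CRITICAL_VALUE_5_PERCENT = 3.841  # chi-square, 1 degree of freedom
supported = statistic > CRITICAL_VALUE_5_PERCENT
print(f"chi-square = {statistic:.2f}, hypothesis supported: {supported}")
```

A statistic above the critical value suggests that color change and temperature are related in this sample, which is support for the hypothesis rather than proof of it.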

Advantages of empirical research

There is a reason why empirical research is one of the most widely used methods. A few of its advantages are listed below.

  • It is used to authenticate traditional research through various experiments and observations.
  • This research methodology makes the research being conducted more competent and authentic.
  • It enables a researcher to understand the dynamic changes that can happen and to change strategy accordingly.
  • The level of control in such research is high, so the researcher can control multiple variables.
  • It plays a vital role in increasing internal validity.

Disadvantages of empirical research

Even though empirical research makes the research more competent and authentic, it does have a few disadvantages. A few of them are listed below.

  • Such research needs patience, as it can be very time consuming. The researcher has to collect data from multiple sources, and quite a few parameters are involved, which makes the research time consuming.
  • Most of the time, a researcher needs to conduct research at different locations or in different environments, which can be expensive.
  • Experiments can be performed only under certain rules, and hence permissions are needed. It is often very difficult to get the permissions required to carry out certain methods of this research.
  • Collection of data can sometimes be a problem, as it has to be collected from a variety of sources through different methods.

Why is there a need for empirical research?

Empirical research is important in today’s world because most people believe only in what they can see, hear, or experience. It is used to validate multiple hypotheses, to increase human knowledge, and to keep advancing in various fields.

For example: Pharmaceutical companies use empirical research to try out a specific drug on controlled groups or random groups and study its cause and effect. In this way they find support for the theories they had proposed for the drug. Such research is very important, as it can sometimes lead to a cure for a disease that has existed for many years. It is useful in science and in many other fields, such as history, social sciences, and business.

With the advancements of today’s world, empirical research has become critical and a norm in many fields for supporting hypotheses and gaining more knowledge. The methods mentioned above are very useful for carrying out such research. However, new methods will keep emerging as the nature of new investigative questions keeps changing.

What is Empirical Research? Definition, Methods, Examples

Appinio Research · 09.02.2024 · 36min read

Ever wondered how we gather the facts, unveil hidden truths, and make informed decisions in a world filled with questions? Empirical research holds the key.

In this guide, we'll delve deep into the art and science of empirical research, unraveling its methods, mysteries, and manifold applications. From defining the core principles to mastering data analysis and reporting findings, we're here to equip you with the knowledge and tools to navigate the empirical landscape.

What is Empirical Research?

Empirical research is the cornerstone of scientific inquiry, providing a systematic and structured approach to investigating the world around us. It is the process of gathering and analyzing empirical or observable data to test hypotheses, answer research questions, or gain insights into various phenomena. This form of research relies on evidence derived from direct observation or experimentation, allowing researchers to draw conclusions based on real-world data rather than purely theoretical or speculative reasoning.

Characteristics of Empirical Research

Empirical research is characterized by several key features:

  • Observation and Measurement : It involves the systematic observation or measurement of variables, events, or behaviors.
  • Data Collection : Researchers collect data through various methods, such as surveys, experiments, observations, or interviews.
  • Testable Hypotheses : Empirical research often starts with testable hypotheses that are evaluated using collected data.
  • Quantitative or Qualitative Data : Data can be quantitative (numerical) or qualitative (non-numerical), depending on the research design.
  • Statistical Analysis : Quantitative data often undergo statistical analysis to determine patterns, relationships, or significance.
  • Objectivity and Replicability : Empirical research strives for objectivity, minimizing researcher bias. It should be replicable, allowing other researchers to conduct the same study to verify results.
  • Conclusions and Generalizations : Empirical research generates findings based on data and aims to make generalizations about larger populations or phenomena.

Importance of Empirical Research

Empirical research plays a pivotal role in advancing knowledge across various disciplines. Its importance extends to academia, industry, and society as a whole. Here are several reasons why empirical research is essential:

  • Evidence-Based Knowledge : Empirical research provides a solid foundation of evidence-based knowledge. It enables us to test hypotheses, confirm or refute theories, and build a robust understanding of the world.
  • Scientific Progress : In the scientific community, empirical research fuels progress by expanding the boundaries of existing knowledge. It contributes to the development of theories and the formulation of new research questions.
  • Problem Solving : Empirical research is instrumental in addressing real-world problems and challenges. It offers insights and data-driven solutions to complex issues in fields like healthcare, economics, and environmental science.
  • Informed Decision-Making : In policymaking, business, and healthcare, empirical research informs decision-makers by providing data-driven insights. It guides strategies, investments, and policies for optimal outcomes.
  • Quality Assurance : Empirical research is essential for quality assurance and validation in various industries, including pharmaceuticals, manufacturing, and technology. It ensures that products and processes meet established standards.
  • Continuous Improvement : Businesses and organizations use empirical research to evaluate performance, customer satisfaction, and product effectiveness. This data-driven approach fosters continuous improvement and innovation.
  • Human Advancement : Empirical research in fields like medicine and psychology contributes to the betterment of human health and well-being. It leads to medical breakthroughs, improved therapies, and enhanced psychological interventions.
  • Critical Thinking and Problem Solving : Engaging in empirical research fosters critical thinking skills, problem-solving abilities, and a deep appreciation for evidence-based decision-making.

Empirical research empowers us to explore, understand, and improve the world around us. It forms the bedrock of scientific inquiry and drives progress in countless domains, shaping our understanding of both the natural and social sciences.

How to Conduct Empirical Research?

So, you've decided to dive into the world of empirical research. Let's begin by exploring the crucial steps involved in getting started with your research project.

1. Select a Research Topic

Selecting the right research topic is the cornerstone of a successful empirical study. It's essential to choose a topic that not only piques your interest but also aligns with your research goals and objectives. Here's how to go about it:

  • Identify Your Interests : Start by reflecting on your passions and interests. What topics fascinate you the most? Your enthusiasm will be your driving force throughout the research process.
  • Brainstorm Ideas : Engage in brainstorming sessions to generate potential research topics. Consider the questions you've always wanted to answer or the issues that intrigue you.
  • Relevance and Significance : Assess the relevance and significance of your chosen topic. Does it contribute to existing knowledge? Is it a pressing issue in your field of study or the broader community?
  • Feasibility : Evaluate the feasibility of your research topic. Do you have access to the necessary resources, data, and participants (if applicable)?

2. Formulate Research Questions

Once you've narrowed down your research topic, the next step is to formulate clear and precise research questions. These questions will guide your entire research process and shape your study's direction. To create effective research questions:

  • Specificity : Ensure that your research questions are specific and focused. Vague or overly broad questions can lead to inconclusive results.
  • Relevance : Your research questions should directly relate to your chosen topic. They should address gaps in knowledge or contribute to solving a particular problem.
  • Testability : Ensure that your questions are testable through empirical methods. You should be able to gather data and analyze it to answer these questions.
  • Avoid Bias : Craft your questions in a way that avoids leading or biased language. Maintain neutrality to uphold the integrity of your research.

3. Review Existing Literature

Before you embark on your empirical research journey, it's essential to immerse yourself in the existing body of literature related to your chosen topic. This step, often referred to as a literature review, serves several purposes:

  • Contextualization : Understand the historical context and current state of research in your field. What have previous studies found, and what questions remain unanswered?
  • Identifying Gaps : Identify gaps or areas where existing research falls short. These gaps will help you formulate meaningful research questions and hypotheses.
  • Theory Development : If your study is theoretical, consider how existing theories apply to your topic. If it's empirical, understand how previous studies have approached data collection and analysis.
  • Methodological Insights : Learn from the methodologies employed in previous research. What methods were successful, and what challenges did researchers face?

4. Define Variables

Variables are fundamental components of empirical research. They are the factors or characteristics that can change or be manipulated during your study. Properly defining and categorizing variables is crucial for the clarity and validity of your research. Here's what you need to know:

  • Independent Variables : These are the variables that you, as the researcher, manipulate or control. They are the "cause" in cause-and-effect relationships.
  • Dependent Variables : Dependent variables are the outcomes or responses that you measure or observe. They are the "effect" influenced by changes in independent variables.
  • Operational Definitions : To ensure consistency and clarity, provide operational definitions for your variables. Specify how you will measure or manipulate each variable.
  • Control Variables : In some studies, controlling for other variables that may influence your dependent variable is essential. These are known as control variables.

Understanding these aspects of empirical research gives you a solid foundation for the rest of your journey. With the essentials of getting started in place, let's look more closely at research design.

Empirical Research Design

Now that you've selected your research topic, formulated research questions, and defined your variables, it's time to turn to the heart of your empirical research journey: research design. This step determines how you will collect data and which methods you'll use to answer your research questions. Let's explore the main facets of research design in detail.

Types of Empirical Research

Empirical research can take on several forms, each with its own unique approach and methodologies. Understanding the different types of empirical research will help you choose the most suitable design for your study. Here are some common types:

  • Experimental Research : In this type, researchers manipulate one or more independent variables to observe their impact on dependent variables. It's highly controlled and often conducted in a laboratory setting.
  • Observational Research : Observational research involves the systematic observation of subjects or phenomena without intervention. Researchers are passive observers, documenting behaviors, events, or patterns.
  • Survey Research : Surveys are used to collect data through structured questionnaires or interviews. This method is efficient for gathering information from a large number of participants.
  • Case Study Research : Case studies focus on in-depth exploration of one or a few cases. Researchers gather detailed information through various sources such as interviews, documents, and observations.
  • Qualitative Research : Qualitative research aims to understand behaviors, experiences, and opinions in depth. It often involves open-ended questions, interviews, and thematic analysis.
  • Quantitative Research : Quantitative research collects numerical data and relies on statistical analysis to draw conclusions. It involves structured questionnaires, experiments, and surveys.

Your choice of research type should align with your research questions and objectives. Experimental research, for example, is ideal for testing cause-and-effect relationships, while qualitative research is more suitable for exploring complex phenomena.

Experimental Design

Experimental research is a systematic approach to studying causal relationships. It's characterized by the manipulation of one or more independent variables while controlling for other factors. Here are some key aspects of experimental design:

  • Control and Experimental Groups : Participants are randomly assigned to either a control group or an experimental group. The independent variable is manipulated for the experimental group but not for the control group.
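  • Randomization : Random assignment is crucial for minimizing systematic bias in group assignment. It gives each participant an equal chance of being placed in either group, so that pre-existing differences between participants are spread evenly across groups on average.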
  • Hypothesis Testing : Experimental research often involves hypothesis testing. Researchers formulate hypotheses about the expected effects of the independent variable and use statistical analysis to test these hypotheses.
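
Random assignment to a control and an experimental group can be sketched in a few lines. This is a minimal illustration, not a full allocation procedure: the function name `assign_groups`, the participant IDs, and the fixed seed are all assumptions made for the example.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into a control and an experimental group."""
    rng = random.Random(seed)      # seeded generator so the split can be reproduced
    shuffled = list(participants)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experimental": shuffled[half:]}

# Hypothetical participant IDs, for illustration only
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = assign_groups(participants, seed=42)
print(len(groups["control"]), len(groups["experimental"]))  # prints "10 10"
```

Recording the seed alongside the study materials lets others reproduce the exact allocation.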

Observational Design

Observational research entails careful and systematic observation of subjects or phenomena. It's advantageous when you want to understand natural behaviors or events. Key aspects of observational design include:

  • Participant Observation : Researchers immerse themselves in the environment they are studying. They become part of the group being observed, allowing for a deep understanding of behaviors.
  • Non-Participant Observation : In non-participant observation, researchers remain separate from the subjects. They observe and document behaviors without direct involvement.
  • Data Collection Methods : Observational research can involve various data collection methods, such as field notes, video recordings, photographs, or coding of observed behaviors.

Survey Design

Surveys are a popular choice for collecting data from a large number of participants. Effective survey design is essential to ensure the validity and reliability of your data. Consider the following:

  • Questionnaire Design : Create clear and concise questions that are easy for participants to understand. Avoid leading or biased questions.
  • Sampling Methods : Decide on the appropriate sampling method for your study, whether it's random, stratified, or convenience sampling.
  • Data Collection Tools : Choose the right tools for data collection, whether it's paper surveys, online questionnaires, or face-to-face interviews.

Case Study Design

Case studies are an in-depth exploration of one or a few cases to gain a deep understanding of a particular phenomenon. Key aspects of case study design include:

  • Single Case vs. Multiple Case Studies : Decide whether you'll focus on a single case or multiple cases. Single case studies are intensive and allow for detailed examination, while multiple case studies provide comparative insights.
  • Data Collection Methods : Gather data through interviews, observations, document analysis, or a combination of these methods.

Qualitative vs. Quantitative Research

In empirical research, you'll often encounter the distinction between qualitative and quantitative research. Here's a closer look at these two approaches:

  • Qualitative Research : Qualitative research seeks an in-depth understanding of human behavior, experiences, and perspectives. It involves open-ended questions, interviews, and the analysis of textual or narrative data. Qualitative research is exploratory and often used when the research question is complex and requires a nuanced understanding.
  • Quantitative Research : Quantitative research collects numerical data and employs statistical analysis to draw conclusions. It involves structured questionnaires, experiments, and surveys. Quantitative research is well suited to testing hypotheses and, when combined with an experimental design, to establishing cause-and-effect relationships.

Understanding the various research design options is crucial in determining the most appropriate approach for your study. Your choice should align with your research questions, objectives, and the nature of the phenomenon you're investigating.

Data Collection for Empirical Research

Now that you've established your research design, it's time to roll up your sleeves and collect the data that will fuel your empirical research. Effective data collection is essential for obtaining accurate and reliable results.

Sampling Methods

Sampling methods are critical in empirical research, as they determine the subset of individuals or elements from your target population that you will study. Here are some standard sampling methods:

  • Random Sampling : Random sampling ensures that every member of the population has an equal chance of being selected. It minimizes bias and is often used in quantitative research.
  • Stratified Sampling : Stratified sampling involves dividing the population into subgroups or strata based on specific characteristics (e.g., age, gender, location). Samples are then randomly selected from each stratum, ensuring representation of all subgroups.
  • Convenience Sampling : Convenience sampling involves selecting participants who are readily available or easily accessible. While it's convenient, it may introduce bias and limit the generalizability of results.
  • Snowball Sampling : Snowball sampling is especially useful when studying hard-to-reach or hidden populations. One participant leads you to another, creating a "snowball" effect. This method is common in qualitative research.
  • Purposive Sampling : In purposive sampling, researchers deliberately select participants who meet specific criteria relevant to their research questions. It's often used in qualitative studies to gather in-depth information.

The choice of sampling method depends on the nature of your research, available resources, and the degree of precision required. It's crucial to carefully consider your sampling strategy to ensure that your sample accurately represents your target population.
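
The difference between simple random and stratified sampling is easier to see in code. The sketch below uses only Python's standard library; the helper names, the urban/rural population, and the 10% sampling fraction are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, seed=None):
    """Every unit has the same chance of being selected."""
    return random.Random(seed).sample(population, n)

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)   # group units by stratum
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 urban and 40 rural respondents
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_sample(population, stratum_of=lambda unit: unit[0],
                           fraction=0.1, seed=7)
print(len(sample))  # prints "10": 6 urban and 4 rural respondents
```

With simple random sampling, a small sample could by chance contain very few rural respondents; stratifying guarantees each subgroup is represented in proportion to its size.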

Data Collection Instruments

Data collection instruments are the tools you use to gather information from your participants or sources. These instruments should be designed to capture the data you need accurately. Here are some popular data collection instruments:

  • Questionnaires : Questionnaires consist of structured questions with predefined response options. When designing questionnaires, consider the clarity of questions, the order of questions, and the response format (e.g., Likert scale, multiple-choice).
  • Interviews : Interviews involve direct communication between the researcher and participants. They can be structured (with predetermined questions) or unstructured (open-ended). Effective interviews require active listening and probing for deeper insights.
  • Observations : Observations entail systematically and objectively recording behaviors, events, or phenomena. Researchers must establish clear criteria for what to observe, how to record observations, and when to observe.
  • Surveys : Surveys are a common data collection instrument for quantitative research. They can be administered through various means, including online surveys, paper surveys, and telephone surveys.
  • Documents and Archives : In some cases, data may be collected from existing documents, records, or archives. Ensure that the sources are reliable, relevant, and properly documented.

Data Collection Procedures

Data collection procedures outline the step-by-step process for gathering data. These procedures should be meticulously planned and executed to maintain the integrity of your research.

  • Training : If you have a research team, ensure that they are trained in data collection methods and protocols. Consistency in data collection is crucial.
  • Pilot Testing : Before launching your data collection, conduct a pilot test with a small group to identify any potential problems with your instruments or procedures. Make necessary adjustments based on feedback.
  • Data Recording : Establish a systematic method for recording data. This may include timestamps, codes, or identifiers for each data point.
  • Data Security : Safeguard the confidentiality and security of collected data. Ensure that only authorized individuals have access to the data.
  • Data Storage : Properly organize and store your data in a secure location, whether in physical or digital form. Back up data to prevent loss.

Ethical Considerations

Ethical considerations are paramount in empirical research, as they ensure the well-being and rights of participants are protected.

  • Informed Consent : Obtain informed consent from participants, providing clear information about the research purpose, procedures, risks, and their right to withdraw at any time.
  • Privacy and Confidentiality : Protect the privacy and confidentiality of participants. Ensure that data is anonymized and sensitive information is kept confidential.
  • Beneficence : Ensure that your research benefits participants and society while minimizing harm. Consider the potential risks and benefits of your study.
  • Honesty and Integrity : Conduct research with honesty and integrity. Report findings accurately and transparently, even if they are not what you expected.
  • Respect for Participants : Treat participants with respect, dignity, and sensitivity to cultural differences. Avoid any form of coercion or manipulation.
  • Institutional Review Board (IRB) : If required, seek approval from an IRB or ethics committee before conducting your research, particularly when working with human participants.

Adhering to ethical guidelines is not only essential for the ethical conduct of research but also crucial for the credibility and validity of your study. Ethical research practices build trust between researchers and participants and contribute to the advancement of knowledge with integrity.

With a solid understanding of data collection, including sampling methods, instruments, procedures, and ethical considerations, you are now well-equipped to gather the data needed to answer your research questions.

Empirical Research Data Analysis

Now comes the exciting phase of data analysis, where the raw data you've diligently collected starts to yield insights and answers to your research questions. We will explore the various aspects of data analysis, from preparing your data to drawing meaningful conclusions through statistics and visualization.

Data Preparation

Data preparation is the crucial first step in data analysis. It involves cleaning, organizing, and transforming your raw data into a format that is ready for analysis. Effective data preparation ensures the accuracy and reliability of your results.

  • Data Cleaning : Identify and rectify errors, missing values, and inconsistencies in your dataset. This may involve correcting typos, investigating outliers (and removing them only when they are clearly errors), and imputing missing data.
  • Data Coding : Assign numerical values or codes to categorical variables to make them suitable for statistical analysis. For example, converting "Yes" and "No" to 1 and 0.
  • Data Transformation : Transform variables as needed to meet the assumptions of the statistical tests you plan to use. Common transformations include logarithmic or square root transformations.
  • Data Integration : If your data comes from multiple sources, integrate it into a unified dataset, ensuring that variables match and align.
  • Data Documentation : Maintain clear documentation of all data preparation steps, as well as the rationale behind each decision. This transparency is essential for replicability.

Effective data preparation lays the foundation for accurate and meaningful analysis. It allows you to trust the results that will follow in the subsequent stages.
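
The cleaning, coding, and transformation steps above can be sketched as follows. The rows, field names, and the choice of median imputation are assumptions made for this example; a real study would pick an imputation strategy suited to its data and document it.

```python
import math
import statistics

# Hypothetical raw survey rows; None marks a missing answer
raw = [
    {"id": 1, "owns_car": "Yes", "income": 42000},
    {"id": 2, "owns_car": "no ", "income": None},
    {"id": 3, "owns_car": "YES", "income": 58000},
    {"id": 4, "owns_car": "No", "income": 31000},
]

CODES = {"yes": 1, "no": 0}  # coding scheme for the categorical variable

def prepare(rows):
    # Median of the observed incomes, used to fill in missing values
    observed = [r["income"] for r in rows if r["income"] is not None]
    median_income = statistics.median(observed)
    cleaned = []
    for r in rows:
        answer = r["owns_car"].strip().lower()  # cleaning: normalize case and spaces
        missing = r["income"] is None
        income = median_income if missing else r["income"]  # imputation
        cleaned.append({
            "id": r["id"],
            "owns_car": CODES[answer],        # coding; KeyError on unexpected answers
            "income_imputed": missing,        # documentation: flag imputed values
            "log_income": math.log(income),   # transformation: natural logarithm
        })
    return cleaned

prepared = prepare(raw)
print([row["owns_car"] for row in prepared])  # prints "[1, 0, 1, 0]"
```

Keeping an explicit flag for imputed values is one simple way to preserve the documentation trail that replication depends on.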

Descriptive Statistics

Descriptive statistics help you summarize and make sense of your data by providing a clear overview of its key characteristics. These statistics are essential for understanding the central tendencies, variability, and distribution of your variables. Descriptive statistics include:

  • Measures of Central Tendency : These include the mean (average), median (middle value), and mode (most frequent value). They help you understand the typical or central value of your data.
  • Measures of Dispersion : Measures like the range, variance, and standard deviation provide insights into the spread or variability of your data points.
  • Frequency Distributions : Creating frequency distributions or histograms allows you to visualize the distribution of your data across different values or categories.

Descriptive statistics provide the initial insights needed to understand your data's basic characteristics, which can inform further analysis.
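
Python's standard `statistics` module covers the measures listed above. The scores below are made-up values used only to show the calls.

```python
import statistics
from collections import Counter

scores = [2, 3, 3, 4, 4, 4, 5, 5, 7, 9]  # hypothetical test scores

summary = {
    "mean": statistics.mean(scores),          # arithmetic average
    "median": statistics.median(scores),      # middle value of the sorted data
    "mode": statistics.mode(scores),          # most frequent value
    "range": max(scores) - min(scores),       # spread between the extremes
    "variance": statistics.variance(scores),  # sample variance (n - 1 denominator)
    "stdev": statistics.stdev(scores),        # sample standard deviation
}
frequencies = Counter(scores)                 # frequency distribution

print(summary["mean"], summary["mode"], summary["range"])  # prints "4.6 4 7"
print(frequencies[4])                                      # prints "3"
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `pvariance` and `pstdev` are the population versions.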

Inferential Statistics

Inferential statistics take your analysis to the next level by allowing you to make inferences or predictions about a larger population based on your sample data. These methods help you test hypotheses and draw meaningful conclusions. Key concepts in inferential statistics include:

  • Hypothesis Testing : Hypothesis tests (e.g., t-tests, chi-squared tests) help you judge whether observed differences or associations in your data are statistically significant, that is, unlikely to have arisen by chance alone.
  • Confidence Intervals : Confidence intervals provide a range of plausible values for a population parameter (e.g., the population mean), estimated from your sample data at a chosen confidence level such as 95%.
  • Regression Analysis : Regression models (linear, logistic, etc.) help you explore relationships between variables and make predictions.
  • Analysis of Variance (ANOVA) : ANOVA tests are used to compare means between multiple groups, allowing you to assess whether differences are statistically significant.

Inferential statistics are powerful tools for drawing conclusions from your data and assessing the generalizability of your findings to the broader population.
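
Two of these ideas can be sketched with the standard library alone. The example below is illustrative rather than a recommended analysis: it uses a permutation test in place of a t-test (the standard library has no t-distribution), and a normal-approximation confidence interval, which is somewhat too narrow for small samples compared with a t-based interval. The group values and function names are assumptions.

```python
import random
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    mean = statistics.fmean(sample)
    margin = z * statistics.stdev(sample) / len(sample) ** 0.5
    return mean - margin, mean + margin

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided p-value for a difference in means under random relabeling."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend group labels were assigned at random
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(statistics.fmean(a) - statistics.fmean(b)) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical outcome scores for two groups
control = [12, 15, 11, 14, 13, 12, 16, 14]
treatment = [18, 17, 19, 16, 20, 18, 17, 19]

low, high = mean_confidence_interval(control)
p_value = permutation_test(control, treatment)
print(f"95% CI for control mean: ({low:.2f}, {high:.2f})")
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value here says that a difference this large would rarely appear if group labels were unrelated to the outcome; it does not by itself say how large or important the effect is.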

Qualitative Data Analysis

Qualitative data analysis is employed when working with non-numerical data, such as text, interviews, or open-ended survey responses. It focuses on understanding the underlying themes, patterns, and meanings within qualitative data. Qualitative analysis techniques include:

  • Thematic Analysis : Identifying and analyzing recurring themes or patterns within textual data.
  • Content Analysis : Categorizing and coding qualitative data to extract meaningful insights.
  • Grounded Theory : Developing theories or frameworks based on emergent themes from the data.
  • Narrative Analysis : Examining the structure and content of narratives to uncover meaning.

Qualitative data analysis provides a rich and nuanced understanding of complex phenomena and human experiences.
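
The counting step of content analysis can be illustrated with a small keyword-based codebook. This is a deliberately crude sketch: in practice, codes are usually assigned by trained human coders, and the codebook, keywords, and responses below are invented for the example.

```python
from collections import Counter

# Hypothetical codebook: theme -> keywords that signal it
CODEBOOK = {
    "price": {"expensive", "cheap", "cost", "price"},
    "usability": {"easy", "confusing", "intuitive"},
    "support": {"support", "help", "response"},
}

def code_response(text):
    """Return the set of themes whose keywords appear in the response."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    return {theme for theme, keywords in CODEBOOK.items() if words & keywords}

# Hypothetical open-ended survey answers
responses = [
    "Too expensive for what it does.",
    "Easy to set up, but support was slow.",
    "The price is fine and the app is intuitive!",
]

# Count how many responses mention each theme
theme_counts = Counter(theme for r in responses for theme in code_response(r))
print(theme_counts["price"], theme_counts["usability"], theme_counts["support"])
# prints "2 2 1"
```

Keyword matching misses synonyms, negation, and context, so counts like these are best treated as a starting point for closer reading rather than as findings.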

Data Visualization

Data visualization is the art of representing data graphically to make complex information more understandable and accessible. Effective data visualization can reveal patterns, trends, and outliers in your data. Common types of data visualization include:

  • Bar Charts and Histograms : Bar charts display counts or values for categorical or discrete data, while histograms display the distribution of continuous numerical data grouped into bins.
  • Line Charts : Ideal for showing trends and changes in data over time.
  • Scatter Plots : Visualize relationships and correlations between two variables.
  • Pie Charts : Display the composition of a whole in terms of its parts.
  • Heatmaps : Depict patterns and relationships in multidimensional data through color-coding.
  • Box Plots : Provide a summary of the data distribution, including outliers.
  • Interactive Dashboards : Create dynamic visualizations that allow users to explore data interactively.

Data visualization not only enhances your understanding of the data but also serves as a powerful communication tool to convey your findings to others.
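
Behind a histogram is a simple binning step: numeric values are grouped into intervals and counted. The sketch below prints a text version so that it runs without a plotting library; the ages, the bin width, and the function name are assumptions for illustration, and the printed labels assume whole-number values.

```python
def histogram(values, bin_width):
    """Count numeric values per bin; keys are the lower edge of each bin."""
    counts = {}
    for v in values:
        lower = (v // bin_width) * bin_width   # lower edge of the bin containing v
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

ages = [19, 22, 23, 25, 31, 34, 35, 38, 41, 47, 52]  # hypothetical respondent ages

for lower, count in histogram(ages, 10).items():
    # One '#' per observation in the bin
    print(f"{lower}-{lower + 9}: {'#' * count}")
```

Changing `bin_width` changes the shape the reader sees, which is why the bin choice should be reported alongside the chart.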

As you embark on the data analysis phase of your empirical research, remember that the specific methods and techniques you choose will depend on your research questions, data type, and objectives. Effective data analysis transforms raw data into valuable insights, bringing you closer to the answers you seek.

How to Report Empirical Research Results?

At this stage, you get to share your empirical research findings with the world. Effective reporting and presentation of your results are crucial for communicating your research's impact and insights.

1. Write the Research Paper

Writing a research paper is the culmination of your empirical research journey. It's where you synthesize your findings, provide context, and contribute to the body of knowledge in your field.

  • Title and Abstract : Craft a clear and concise title that reflects your research's essence. The abstract should provide a brief summary of your research objectives, methods, findings, and implications.
  • Introduction : In the introduction, introduce your research topic, state your research questions or hypotheses, and explain the significance of your study. Provide context by discussing relevant literature.
  • Methods : Describe your research design, data collection methods, and sampling procedures. Be precise and transparent, allowing readers to understand how you conducted your study.
  • Results : Present your findings in a clear and organized manner. Use tables, graphs, and statistical analyses to support your results. Avoid interpreting your findings in this section; focus on reporting what you found and leave the interpretation for the discussion.
  • Discussion : Interpret your findings and discuss their implications. Relate your results to your research questions and the existing literature. Address any limitations of your study and suggest avenues for future research.
  • Conclusion : Summarize the key points of your research and its significance. Restate your main findings and their implications.
  • References : Cite all sources used in your research following a specific citation style (e.g., APA, MLA, Chicago). Ensure accuracy and consistency in your citations.
  • Appendices : Include any supplementary material, such as questionnaires, data coding sheets, or additional analyses, in the appendices.

Writing a research paper is a skill that improves with practice. Ensure clarity, coherence, and conciseness in your writing to make your research accessible to a broader audience.

2. Create Visuals and Tables

Visuals and tables are powerful tools for presenting complex data in an accessible and understandable manner.

  • Clarity : Ensure that your visuals and tables are clear and easy to interpret. Use descriptive titles and labels.
  • Consistency : Maintain consistency in formatting, such as font size and style, across all visuals and tables.
  • Appropriateness : Choose the most suitable visual representation for your data. Bar charts, line graphs, and scatter plots work well for different types of data.
  • Simplicity : Avoid clutter and unnecessary details. Focus on conveying the main points.
  • Accessibility : Make sure your visuals and tables are accessible to a broad audience, including those with visual impairments.
  • Captions : Include informative captions that explain the significance of each visual or table.

Compelling visuals and tables enhance the reader's understanding of your research and can be the key to conveying complex information efficiently.

3. Interpret Findings

Interpreting your findings is where you bridge the gap between data and meaning. It's your opportunity to provide context, discuss implications, and offer insights. When interpreting your findings:

  • Relate to Research Questions : Discuss how your findings directly address your research questions or hypotheses.
  • Compare with Literature : Analyze how your results align with or deviate from previous research in your field. What insights can you draw from these comparisons?
  • Discuss Limitations : Be transparent about the limitations of your study. Address any constraints, biases, or potential sources of error.
  • Practical Implications : Explore the real-world implications of your findings. How can they be applied or inform decision-making?
  • Future Research Directions : Suggest areas for future research based on the gaps or unanswered questions that emerged from your study.

Interpreting findings goes beyond simply presenting data; it's about weaving a narrative that helps readers grasp the significance of your research in the broader context.

With your research paper written, structured, and enriched with visuals, and your findings expertly interpreted, you are now prepared to communicate your research effectively. Sharing your insights and contributing to the body of knowledge in your field is a significant accomplishment in empirical research.

Examples of Empirical Research

To solidify your understanding of empirical research, let's delve into some real-world examples across different fields. These examples will illustrate how empirical research is applied to gather data, analyze findings, and draw conclusions.

Social Sciences

In the realm of social sciences, consider a sociological study exploring the impact of socioeconomic status on educational attainment. Researchers gather data from a diverse group of individuals, including their family backgrounds, income levels, and academic achievements.

Through statistical analysis, they can identify correlations and trends, revealing whether individuals from lower socioeconomic backgrounds are less likely to attain higher levels of education. This empirical research helps shed light on societal inequalities and informs policymakers on potential interventions to address disparities in educational access.
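
A correlation of this kind is often summarized with Pearson's correlation coefficient. The sketch below computes it from its definition; the income and education values are made up for illustration and are not real survey data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration only
household_income = [20, 35, 50, 65, 80, 95]    # thousands per year
years_of_education = [10, 12, 12, 14, 16, 17]

r = pearson_r(household_income, years_of_education)
print(f"r = {r:.2f}")
```

A coefficient near 1 or -1 indicates a strong linear association, but a correlation alone does not show that one variable causes the other.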

Environmental Science

Environmental scientists often employ empirical research to assess the effects of environmental changes. For instance, researchers studying the impact of climate change on wildlife might collect data on animal populations, weather patterns, and habitat conditions over an extended period.

By analyzing this empirical data, they can identify correlations between climate fluctuations and changes in wildlife behavior, migration patterns, or population sizes. This empirical research is crucial for understanding the ecological consequences of climate change and informing conservation efforts.

Business and Economics

In the business world, empirical research is essential for making data-driven decisions. Consider a market research study conducted by a business seeking to launch a new product. They collect data through surveys, focus groups, and consumer behavior analysis.

By examining this empirical data, the company can gauge consumer preferences, demand, and potential market size. Empirical research in business helps guide product development, pricing strategies, and marketing campaigns, increasing the likelihood of a successful product launch.

Psychology

Psychological studies frequently rely on empirical research to understand human behavior and cognition. For instance, a psychologist interested in examining the impact of stress on memory might design an experiment. Participants are exposed to stress-inducing situations, and their memory performance is assessed through various tasks.

By analyzing the data collected, the psychologist can determine whether stress has a significant effect on memory recall. This empirical research contributes to our understanding of the complex interplay between psychological factors and cognitive processes.

These examples highlight the versatility and applicability of empirical research across diverse fields. Whether in the social sciences, environmental science, business, or psychology, empirical research serves as a fundamental tool for gaining insights, testing hypotheses, and driving advancements in knowledge and practice.

Conclusion for Empirical Research

Empirical research is a powerful tool for gaining insights, testing hypotheses, and making informed decisions. By following the steps outlined in this guide, you've learned how to select research topics, collect data, analyze findings, and effectively communicate your research to the world. Remember, empirical research is a journey of discovery, and each step you take brings you closer to a deeper understanding of the world around you. Whether you're a scientist, a student, or someone curious about the process, the principles of empirical research empower you to explore, learn, and contribute to the ever-expanding realm of knowledge.

Purdue University

Research: Overview & Approaches

Introduction to Empirical Research

  • Introductory Video This video covers what empirical research is, what kinds of questions and methods empirical researchers use, and some tips for finding empirical research articles in your discipline.

Video Tutorial

  • Guided Search: Finding Empirical Research Articles This is a hands-on tutorial that will allow you to use your own search terms to find resources.

Google Scholar Search

  • Study on radiation transfer in human skin for cosmetics
  • Long-Term Mobile Phone Use and the Risk of Vestibular Schwannoma: A Danish Nationwide Cohort Study
  • Emissions Impacts and Benefits of Plug-In Hybrid Electric Vehicles and Vehicle-to-Grid Services
  • Review of design considerations and technological challenges for successful development and deployment of plug-in hybrid electric vehicles
  • Endocrine disrupters and human health: could oestrogenic chemicals in body care cosmetics adversely affect breast cancer incidence in women?

  • Last Updated: Apr 25, 2024 4:11 PM
  • URL: https://guides.lib.purdue.edu/research_approaches

Philosophy Institute

Understanding the Empirical Method in Research Methodology

Have you ever wondered how scientists gather evidence to support their theories? Or what steps researchers take to ensure that their findings are reliable and not just based on speculation? The answer lies in a cornerstone of scientific investigation known as the empirical method. This approach to research is all about collecting data and observing the world to form solid, evidence-based conclusions. Let’s dive into the empirical method’s fascinating world and understand why it’s so critical in research methodology.

What is the empirical method?

The empirical method is a way of gaining knowledge by means of direct and indirect observation or experience. It’s fundamentally based on the idea that knowledge comes from sensory experience and can be acquired through observation and experimentation. This method stands in contrast to approaches that rely solely on theoretical or logical means.

The role of observation in the empirical method

Observation is at the heart of the empirical method. It involves using your senses to gather information about the world. This could be as simple as noting the color of a flower or as complex as using advanced technology to observe the behavior of microscopic organisms. The key is that the observations must be systematic and replicable, providing reliable data that can be used to draw conclusions.

Data collection: qualitative and quantitative

Different types of data can be collected using the empirical method:

  • Qualitative data – This data type is descriptive and conceptual, often collected through interviews, observations, and case studies.
  • Quantitative data – This involves numerical data collected through methods like surveys, experiments, and statistical analysis.

Empirical vs. experimental methods

While the empirical method is often associated with experimentation, it’s important to distinguish between the two. Experimental methods involve controlled tests where the researcher manipulates one variable to observe the effect on another. In contrast, the empirical method doesn’t necessarily involve manipulation. Instead, it focuses on observing and collecting data in natural settings, offering a broader understanding of phenomena as they occur in real life.

Why the distinction matters

Understanding the difference between empirical and experimental methods is crucial because it affects how research is conducted and how results are interpreted. Empirical research can provide a more naturalistic view of the subject matter, whereas experimental research can offer more control over variables and potentially more precise outcomes.

The significance of experiential learning

The empirical method has deep roots in experiential learning, which emphasizes learning through experience. This connection is vital because it underlines the importance of engaging with the subject matter at a practical level, rather than just theoretically. It’s a hands-on approach to knowledge that has been valued since the time of Aristotle.

Developing theories from empirical research

One of the most significant aspects of the empirical method is its role in theory development. Researchers collect and analyze data, and from these findings, they can formulate or refine theories. Theories that are supported by empirical evidence tend to be more robust and widely accepted in the scientific community.

Applying the empirical method in various fields

The empirical method is not limited to the natural sciences. It’s used across a range of disciplines, from social sciences to humanities, to understand different aspects of the world. For instance:

  • In psychology, researchers might use the empirical method to observe and record behaviors to understand the underlying mental processes.
  • In sociology, it could involve studying social interactions to draw conclusions about societal structures.
  • In economics, empirical data might be used to test the validity of economic theories or to measure market trends.

Challenges and limitations

Despite its importance, the empirical method has its challenges and limitations. One major challenge is ensuring that observations and data collection are unbiased. Additionally, not all phenomena are easily observable, and some may require more complex or abstract approaches.

The empirical method is a fundamental aspect of research methodology that has stood the test of time. By relying on observation and data collection, it allows researchers to ground their theories in reality, providing a solid foundation for knowledge. Whether it’s used in the hard sciences, social sciences, or humanities, the empirical method continues to be a critical tool for understanding our complex world.

How do you think the empirical method affects the credibility of research findings? And can you think of a situation where empirical methods might be difficult to apply but still necessary for advancing knowledge? Let’s discuss these thought-provoking questions and consider the breadth of the empirical method’s impact on the pursuit of understanding.



  • Review Article
  • Published: 01 June 2023

Data, measurement and empirical methods in the science of science

  • Lu Liu 1 , 2 , 3 , 4 ,
  • Benjamin F. Jones   ORCID: orcid.org/0000-0001-9697-9388 1 , 2 , 3 , 5 , 6 ,
  • Brian Uzzi   ORCID: orcid.org/0000-0001-6855-2854 1 , 2 , 3 &
  • Dashun Wang   ORCID: orcid.org/0000-0002-7054-2206 1 , 2 , 3 , 7  

Nature Human Behaviour volume 7, pages 1046–1058 (2023)


The advent of large-scale datasets that trace the workings of science has encouraged researchers from many different disciplinary backgrounds to turn scientific methods into science itself, cultivating a rapidly expanding ‘science of science’. This Review considers this growing, multidisciplinary literature through the lens of data, measurement and empirical methods. We discuss the purposes, strengths and limitations of major empirical approaches, seeking to increase understanding of the field’s diverse methodologies and expand researchers’ toolkits. Overall, new empirical developments provide enormous capacity to test traditional beliefs and conceptual frameworks about science, discover factors associated with scientific productivity, predict scientific outcomes and design policies that facilitate scientific progress.


Scientific advances are a key input to rising standards of living, health and the capacity of society to confront grand challenges, from climate change to the COVID-19 pandemic 1 , 2 , 3 . A deeper understanding of how science works and where innovation occurs can help us to more effectively design science policy and science institutions, better inform scientists’ own research choices, and create and capture enormous value for science and humanity. Building on these key premises, recent years have witnessed substantial development in the ‘science of science’ 4 , 5 , 6 , 7 , 8 , 9 , which uses large-scale datasets and diverse computational toolkits to unearth fundamental patterns behind scientific production and use.

The idea of turning scientific methods into science itself is long-standing. Since the mid-20th century, researchers from different disciplines have asked central questions about the nature of scientific progress and the practice, organization and impact of scientific research. Building on these rich historical roots, the field of the science of science draws upon many disciplines, ranging from information science to the social, physical and biological sciences to computer science, engineering and design. The science of science closely relates to several strands and communities of research, including metascience, scientometrics, the economics of science, research on research, science and technology studies, the sociology of science, metaknowledge and quantitative science studies 5 . There are noticeable differences between some of these communities, mostly around their historical origins and the initial disciplinary composition of researchers forming these communities. For example, metascience has its origins in the clinical sciences and psychology, and focuses on rigour, transparency, reproducibility and other open science-related practices and topics. The scientometrics community, born in library and information sciences, places a particular emphasis on developing robust and responsible measures and indicators for science. Science and technology studies engage the history of science and technology, the philosophy of science, and the interplay between science, technology and society. The science of science, which has its origins in physics, computer science and sociology, takes a data-driven approach and emphasizes questions on how science works. Each of these communities has made fundamental contributions to understanding science. While they differ in their origins, these differences pale in comparison to the overarching, common interest in understanding the practice of science and its societal impact.

Three major developments have encouraged rapid advances in the science of science. The first is in data 9 : modern databases include millions of research articles, grant proposals, patents and more. This windfall of data traces scientific activity in remarkable detail and at scale. The second development is in measurement: scholars have used data to develop many new measures of scientific activities and examine theories that have long been viewed as important but difficult to quantify. The third development is in empirical methods: thanks to parallel advances in data science, network science, artificial intelligence and econometrics, researchers can study relationships, make predictions and assess science policy in powerful new ways. Together, new data, measurements and methods have revealed fundamental new insights about the inner workings of science and scientific progress itself.

With multiple approaches, however, comes a key challenge. As researchers adhere to norms respected within their disciplines, their methods vary, with results often published in venues with non-overlapping readership, fragmenting research along disciplinary boundaries. This fragmentation challenges researchers’ ability to appreciate and understand the value of work outside of their own discipline, much less to build directly on it for further investigations.

Recognizing these challenges and the rapidly developing nature of the field, this paper reviews the empirical approaches that are prevalent in this literature. We aim to provide readers with an up-to-date understanding of the available datasets, measurement constructs and empirical methodologies, as well as the value and limitations of each. Owing to space constraints, this Review does not cover the full technical details of each method, referring readers to related guides to learn more. Instead, we will emphasize why a researcher might favour one method over another, depending on the research question.

Beyond a positive understanding of science, a key goal of the science of science is to inform science policy. While this Review mainly focuses on empirical approaches, with its core audience being researchers in the field, the studies reviewed are also germane to key policy questions. For example, what is the appropriate scale of scientific investment, in what directions and through what institutions 10 , 11 ? Are public investments in science aligned with public interests 12 ? What conditions produce novel or high-impact science 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 ? How do the reward systems of science influence the rate and direction of progress 13 , 21 , 22 , 23 , 24 , and what governs scientific reproducibility 25 , 26 , 27 ? How do contributions evolve over a scientific career 28 , 29 , 30 , 31 , 32 , and how may diversity among scientists advance scientific progress 33 , 34 , 35 ? These sit alongside other questions relevant to science policy 36 , 37 .

Overall, this review aims to facilitate entry to science of science research, expand researcher toolkits and illustrate how diverse research approaches contribute to our collective understanding of science. Section 2 reviews datasets and data linkages. Section 3 reviews major measurement constructs in the science of science. Section 4 considers a range of empirical methods, focusing on one study to illustrate each method and briefly summarizing related examples and applications. Section 5 concludes with an outlook for the science of science.

Historically, data on scientific activities were difficult to collect and were available in limited quantities. Gathering data could involve manually tallying statistics from publications 38 , 39 , interviewing scientists 16 , 40 , or assembling historical anecdotes and biographies 13 , 41 . Analyses were typically limited to a specific domain or group of scientists. Today, massive datasets on scientific production and use are at researchers’ fingertips 42 , 43 , 44 . Armed with big data and advanced algorithms, researchers can now probe questions previously not amenable to quantification and with enormous increases in scope and scale, as detailed below.

Publication datasets cover papers from nearly all scientific disciplines, enabling analyses of both general and domain-specific patterns. Commonly used datasets include the Web of Science (WoS), PubMed, CrossRef, ORCID, OpenCitations, Dimensions and OpenAlex. Datasets incorporating papers’ text (CORE) 45 , 46 , 47 , data entities (DataCite) 48 , 49 and peer review reports (Publons) 33 , 50 , 51 have also become available. These datasets further enable novel measurement, for example, representations of a paper’s content 52 , 53 , novelty 15 , 54 and interdisciplinarity 55 .

Notably, databases today capture more diverse aspects of science beyond publications, offering a richer and more encompassing view of research contexts and of researchers themselves (Fig. 1 ). For example, some datasets trace research funding to the specific publications these investments support 56 , 57 , allowing high-scale studies of the impact of funding on productivity and the return on public investment. Datasets incorporating job placements 58 , 59 , curriculum vitae 21 , 59 and scientific prizes 23 offer rich quantitative evidence on the social structure of science. Combining publication profiles with mentorship genealogies 60 , 61 , dissertations 34 and course syllabi 62 , 63 provides insights on mentoring and cultivating talent.

Fig. 1: This figure presents commonly used data types in science of science research, information contained in each data type and examples of data sources. Datasets in the science of science research have not only grown in scale but have also expanded beyond publications to integrate upstream funding investments and downstream applications that extend beyond science itself.

Finally, today’s scope of data extends beyond science to broader aspects of society. Altmetrics 64 captures news media and social media mentions of scientific articles. Other databases incorporate marketplace uses of science, including through patents 10 , pharmaceutical clinical trials and drug approvals 65 , 66 . Policy documents 67 , 68 help us to understand the role of science in the halls of government 69 and policy making 12 , 68 .

While datasets of the modern scientific enterprise have grown exponentially, they are not without limitations. As is often the case for data-driven research, drawing conclusions from specific data sources requires scrutiny and care. Datasets are typically based on published work, which may favour easy-to-publish topics over important ones (the streetlight effect) 70 , 71 . The publication of negative results is also rare (the file drawer problem) 72 , 73 . Meanwhile, English language publications account for over 90% of articles in major data sources, with limited coverage of non-English journals 74 . Publication datasets may also reflect biases in data collection across research institutions or demographic groups. Despite the open science movement, many datasets require paid subscriptions, which can create inequality in data access. Creating more open datasets for the science of science, such as OpenAlex, may not only improve the robustness and replicability of empirical claims but also increase entry to the field.

As today’s datasets become larger in scale and continue to integrate new dimensions, they offer opportunities to unveil the inner workings and external impacts of science in new ways. They can enable researchers to reach beyond previous limitations while conducting original studies of new and long-standing questions about the sciences.

Measurement

Here we discuss prominent measurement approaches in the science of science, including their purposes and limitations.

Modern publication databases typically include data on which articles and authors cite other papers and scientists. These citation linkages have been used to engage core conceptual ideas in scientific research. Here we consider two common measures based on citation information: citation counts and knowledge flows.

First, citation counts are commonly used indicators of impact. The term ‘indicator’ implies that it only approximates the concept of interest. A citation count is defined as how many times a document is cited by subsequent documents and can proxy for the importance of research papers 75 , 76 as well as patented inventions 77 , 78 , 79 . Rather than treating each citation equally, measures may further weight the importance of each citation, for example by using the citation network structure to produce centrality 80 , PageRank 81 , 82 or Eigenfactor indicators 83 , 84 .
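As a rough illustration of the difference between raw citation counts and network-weighted indicators, the sketch below computes both on a tiny, made-up citation graph. The edge list, the function names and the damping value of 0.85 are assumptions chosen for illustration; the PageRank routine is a plain power iteration, not the implementation used in any of the cited studies.

```python
from collections import Counter

# Hypothetical toy citation edges: (citing_paper, cited_paper).
CITATIONS = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B"), ("E", "D")]


def citation_counts(edges):
    """Count how many times each paper is cited."""
    return Counter(cited for _, cited in edges)


def pagerank(edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a citation edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Papers that cite nothing spread their rank evenly over all papers.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each paper passes its rank, split equally, to the papers it cites.
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank


counts = citation_counts(CITATIONS)
ranks = pagerank(CITATIONS)
for paper in sorted(ranks, key=ranks.get, reverse=True):
    print(paper, counts[paper], round(ranks[paper], 3))
```

In this toy graph, papers B and D each receive one citation, yet their PageRank scores differ because the weight of a citation depends on who is citing and how many other papers that source cites.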

Citation-based indicators have also faced criticism 84 , 85 . Citation indicators necessarily oversimplify the construct of impact, often ignoring heterogeneity in the meaning and use of a particular reference, the variations in citation practices across fields and institutional contexts, and the potential for reputation and power structures in science to influence citation behaviour 86 , 87 . Researchers have started to understand more nuanced citation behaviours ranging from negative citations 86 to citation context 47 , 88 , 89 . Understanding what a citation actually measures matters in interpreting and applying many research findings in the science of science. Evaluations relying on citation-based indicators rather than expert judgements raise questions regarding misuse 90 , 91 , 92 . Given the importance of developing indicators that can reliably quantify and evaluate science, the scientometrics community has been working to provide guidance for responsible citation practices and assessment 85 .

Second, scientists use citations to trace knowledge flows. Each citation in a paper is a link to specific previous work from which we can proxy how new discoveries draw upon existing ideas 76 , 93 and how knowledge flows between fields of science 94 , 95 , research institutions 96 , regions and nations 97 , 98 , 99 , and individuals 81 . Combinations of citation linkages can also approximate novelty 15 , disruptiveness 17 , 100 and interdisciplinarity 55 , 95 , 101 , 102 . A rapidly expanding body of work further examines citations to scientific articles from other domains (for example, patents, clinical drug trials and policy documents) to understand the applied value of science 10 , 12 , 65 , 66 , 103 , 104 , 105 .
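One simple way to picture a knowledge-flow measure is to tally citations by the fields of the cited and citing papers. The sketch below does this for a small invented dataset; the paper identifiers, field labels and the `field_flows` helper are hypothetical, and real analyses would need a field classification scheme and normalization for field size.

```python
from collections import Counter

# Hypothetical toy data: each paper's field, and (citing, cited) pairs.
PAPER_FIELD = {
    "p1": "physics",
    "p2": "biology",
    "p3": "biology",
    "p4": "economics",
}
CITATIONS = [("p2", "p1"), ("p3", "p1"), ("p3", "p2"), ("p4", "p3")]


def field_flows(citations, paper_field):
    """Tally flows keyed as (source field, destination field).

    Knowledge is assumed to flow from the cited paper's field
    to the citing paper's field.
    """
    flows = Counter()
    for citing, cited in citations:
        flows[(paper_field[cited], paper_field[citing])] += 1
    return flows


for (source, destination), n in sorted(field_flows(CITATIONS, PAPER_FIELD).items()):
    print(f"{source} -> {destination}: {n}")
```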

Individuals

Analysing individual careers allows researchers to answer questions such as: How do we quantify individual scientific productivity? What is a typical career lifecycle? How are resources and credits allocated across individuals and careers? A scholar’s career can be examined through the papers they publish 30 , 31 , 106 , 107 , 108 , with attention to career progression and mobility, publication counts and citation impact, as well as grant funding 24 , 109 , 110 and prizes 111 , 112 , 113 .

Studies of individual impact focus on output, typically approximated by the number of papers a researcher publishes and citation indicators. A popular measure for individual impact is the h-index 114 , which takes both volume and per-paper impact into consideration. Specifically, a scientist is assigned the largest value h such that they have h papers that were each cited at least h times. Later studies build on the idea of the h-index and propose variants to address its limitations 115 . These variants range from emphasizing highly cited papers in a career 116 , to field differences 117 and normalizations 118 , to the relative contribution of an individual in collaborative works 119 .
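The h-index definition translates directly into a few lines of code. The sketch below is a minimal version that assumes a researcher's record is available as a plain list of per-paper citation counts; the function name and the sample numbers are illustrative only.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one researcher's five papers.
print(h_index([10, 8, 5, 4, 3]))
```

Here four papers have at least four citations each, but there are not five papers with at least five, so the function returns 4. The example also shows a property the variants try to address: raising the top paper from 10 to 1,000 citations would leave the value unchanged.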

To study dynamics in output over the lifecycle, individuals can be studied according to age, career age or the sequence of publications. A long-standing literature has investigated the relationship between age and the likelihood of outstanding achievement 28 , 106 , 111 , 120 , 121 . Recent studies further decouple the relationship between age, publication volume and per-paper citation, and measure the likelihood of producing highly cited papers in the sequence of works one produces 30 , 31 .

As simple as it sounds, representing careers using publication records is difficult. Collecting the full publication list of a researcher is the foundation to study individuals yet remains a key challenge, requiring name disambiguation techniques to match specific works to specific researchers. Although algorithms are increasingly capable at identifying millions of career profiles 122 , they vary in accuracy and robustness. ORCID can help to alleviate the problem by offering researchers the opportunity to create, maintain and update individual profiles themselves, and it goes beyond publications to collect broader outputs and activities 123 . A second challenge is survivorship bias. Empirical studies tend to focus on careers that are long enough to afford statistical analyses, which limits the applicability of the findings to scientific careers as a whole. A third challenge is the breadth of scientists’ activities, where focusing on publications ignores other important contributions such as mentorship and teaching, service (for example, refereeing papers, reviewing grant proposals and editing journals) or leadership within their organizations. Although researchers have begun exploring these dimensions by linking individual publication profiles with genealogical databases 61 , 124 , dissertations 34 , grants 109 , curriculum vitae 21 and acknowledgements 125 , scientific careers beyond publication records remain under-studied 126 , 127 . Lastly, citation-based indicators only serve as an approximation of individual performance with similar limitations as discussed above. The scientific community has called for more appropriate practices 85 , 128 , ranging from incorporating expert assessment of research contributions to broadening the measures of impact beyond publications.

Over many decades, science has exhibited a substantial and steady shift away from solo authorship towards coauthorship, especially among highly cited works 18 , 129 , 130 . In light of this shift, a research field, the science of team science 131 , 132 , has emerged to study the mechanisms that facilitate or hinder the effectiveness of teams. Team size can be proxied by the number of coauthors on a paper, which has been shown to predict distinctive types of advance: whereas larger teams tend to develop ideas, smaller teams tend to disrupt current ways of thinking 17 . Team characteristics can be inferred from coauthors’ backgrounds 133 , 134 , 135 , allowing quantification of a team’s diversity in terms of field, age, gender or ethnicity. Collaboration networks based on coauthorship 130 , 136 , 137 , 138 , 139 offer nuanced network-based indicators to understand individual and institutional collaborations.
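To make the coauthorship-based measures concrete, the sketch below derives team sizes and a weighted collaboration network from author lists. The paper identifiers, author names and helper functions are hypothetical, and the code assumes author names have already been disambiguated.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: author lists per paper.
PAPERS = {
    "paper1": ["Ada", "Ben", "Cy"],
    "paper2": ["Ada", "Ben"],
    "paper3": ["Dee"],
}


def team_sizes(papers):
    """Proxy team size by the number of distinct coauthors on each paper."""
    return {paper: len(set(authors)) for paper, authors in papers.items()}


def coauthor_edges(papers):
    """Weighted collaboration network: edge weight = number of joint papers."""
    edges = Counter()
    for authors in papers.values():
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


print(team_sizes(PAPERS))
print(dict(coauthor_edges(PAPERS)))
```

Solo-authored papers contribute a team size of one and no edges, which is one reason network indicators and team-size indicators can tell different stories about the same corpus.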

However, there are limitations to using coauthorship alone to study teams 132 . First, coauthorship can obscure individual roles 140 , 141 , 142 , which has prompted institutional responses to help to allocate credit, including authorship order and individual contribution statements 56 , 143 . Second, coauthorship does not reflect the complex dynamics and interactions between team members that are often instrumental for team success 53 , 144 . Third, collaborative contributions can extend beyond coauthorship in publications to include members of a research laboratory 145 or co-principal investigators (co-PIs) on a grant 146 . Initiatives such as CRediT may help to address some of these issues by recording detailed roles for each contributor 147 .

Institutions

Research institutions, such as departments, universities, national laboratories and firms, encompass wider groups of researchers and their corresponding outputs. Institutional membership can be inferred from affiliations listed on publications or patents 148 , 149 , and the output of an institution can be aggregated over all its affiliated researchers 150 . Institutional research information systems (CRIS) contain more comprehensive research outputs and activities from employees.

Some research questions consider the institution as a whole, investigating the returns to research and development investment 104 , inequality of resource allocation 22 and the flow of scientists 21 , 148 , 149 . Other questions focus on institutional structures as sources of research productivity by looking into the role of peer effects 125 , 151 , 152 , 153 , how institutional policies impact research outcomes 154 , 155 and whether interdisciplinary efforts foster innovation 55 . Institution-oriented measurement faces similar limitations as with analyses of individuals and teams, including name disambiguation for a given institution and the limited capacity of formal publication records to characterize the full range of relevant institutional outcomes. It is also unclear how to allocate credit among multiple institutions associated with a paper. Moreover, relevant institutional employees extend beyond publishing researchers: interns, technicians and administrators all contribute to research endeavours 130 .
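The credit-allocation problem can be seen by comparing two counting conventions on the same data. The sketch below contrasts full counting (every listed institution gets one full paper) with fractional counting (each paper's single unit of credit is split equally among its institutions). Both are offered as common illustrative conventions rather than as the approach of any cited study, and the affiliation data and function name are invented.

```python
from collections import defaultdict

# Hypothetical toy data: institutions listed on each paper.
PAPER_AFFILIATIONS = {
    "paper1": ["Univ A", "Univ B"],
    "paper2": ["Univ A"],
    "paper3": ["Univ A", "Univ B", "Lab C", "Lab C"],
}


def institution_output(paper_affiliations, fractional=False):
    """Aggregate paper credit per institution.

    Full counting gives each distinct institution 1 per paper;
    fractional counting gives each 1 / (number of distinct institutions).
    """
    totals = defaultdict(float)
    for affiliations in paper_affiliations.values():
        institutions = set(affiliations)  # count an institution once per paper
        credit = 1.0 / len(institutions) if fractional else 1.0
        for institution in institutions:
            totals[institution] += credit
    return dict(totals)


print(institution_output(PAPER_AFFILIATIONS))
print(institution_output(PAPER_AFFILIATIONS, fractional=True))
```

Under full counting the totals add up to more than the number of papers, while fractional totals add up to exactly three here, so rankings of institutions can shift depending on the convention chosen.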

In sum, measurements allow researchers to quantify scientific production and use across numerous dimensions, but they also raise questions of construct validity: Does the proposed metric really reflect what we want to measure? Testing the construct’s validity is important, as is understanding a construct’s limits. Where possible, using alternative measurement approaches, or qualitative methods such as interviews and surveys, can improve measurement accuracy and the robustness of findings.

Empirical methods

In this section, we review two broad categories of empirical approaches (Table 1 ), each with distinctive goals: (1) to discover, estimate and predict empirical regularities; and (2) to identify causal mechanisms. For each method, we give a concrete example to help to explain how the method works, summarize related work for interested readers, and discuss contributions and limitations.

Descriptive and predictive approaches

Empirical regularities and generalizable facts.

The discovery of empirical regularities in science has had a key role in driving conceptual developments and the directions of future research. By observing empirical patterns at scale, researchers unveil central facts that shape science and present core features that theories of scientific progress and practice must explain. For example, consider citation distributions. de Solla Price first proposed that citation distributions are fat-tailed 39 , indicating that a few papers have extremely high citations while most papers have relatively few or even no citations at all. de Solla Price proposed that citation distribution was a power law, while researchers have since refined this view to show that the distribution appears log-normal, a nearly universal regularity across time and fields 156 , 157 . The fat-tailed nature of citation distributions and its universality across the sciences has in turn sparked substantial theoretical work that seeks to explain this key empirical regularity 20 , 156 , 158 , 159 .

Empirical regularities are often surprising and can contest previous beliefs of how science works. For example, it has been shown that the age distribution of great achievements peaks in middle age across a wide range of fields 107 , 121 , 160 , rejecting the common belief that young scientists typically drive breakthroughs in science. A closer look at the individual careers also indicates that productivity patterns vary widely across individuals 29 . Further, a scholar’s highest-impact papers come at a remarkably constant rate across the sequence of their work 30 , 31 .

The discovery of empirical regularities has had important roles in shaping beliefs about the nature of science 10 , 45 , 161 , 162 , sources of breakthrough ideas 15 , 163 , 164 , 165 , scientific careers 21 , 29 , 126 , 127 , the network structure of ideas and scientists 23 , 98 , 136 , 137 , 138 , 139 , 166 , gender inequality 57 , 108 , 126 , 135 , 143 , 167 , 168 , and many other areas of interest to scientists and science institutions 22 , 47 , 86 , 97 , 102 , 105 , 134 , 169 , 170 , 171 . At the same time, care must be taken to ensure that findings are not merely artefacts due to data selection or inherent bias. To differentiate meaningful patterns from spurious ones, it is important to stress test the findings through different selection criteria or across non-overlapping data sources.

Regression analysis

When investigating correlations among variables, a classic method is regression, which estimates how one set of variables explains variation in an outcome of interest. Regression can be used to test explicit hypotheses or predict outcomes. For example, researchers have investigated whether a paper’s novelty predicts its citation impact 172 . By adding control variables to the regression, one can further examine the robustness of the focal relationship.

Although regression analysis is useful for hypothesis testing, it bears substantial limitations. If the question one wishes to ask concerns a ‘causal’ rather than a correlational relationship, regression is poorly suited to the task as it is impossible to control for all the confounding factors. Failing to account for such ‘omitted variables’ can bias the regression coefficient estimates and lead to spurious interpretations. Further, regression models often have low goodness of fit (small R²), indicating that the variables considered explain little of the outcome variation. As regressions typically focus on a specific relationship in simple functional forms, regressions tend to emphasize interpretability rather than overall predictability. The advent of predictive approaches powered by large-scale datasets and novel computational techniques offers new opportunities for modelling complex relationships with stronger predictive power.
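The omitted-variable problem can be illustrated with a small sketch. The data below are invented for illustration only: a hypothetical confounder (`talent`) raises both a paper's `novelty` and its `citations`, and the true effect of novelty is set to 1.0 by construction. The two-variable regression is computed by partialling out the control from both variables, which for ordinary least squares yields the same coefficient as including the control directly.

```python
from statistics import fmean

def slope(x, y):
    """OLS slope of y on a single regressor x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, z):
    """Residuals of v after regressing it on the control z."""
    b = slope(z, v)
    a = fmean(v) - b * fmean(z)
    return [vi - (a + b * zi) for vi, zi in zip(v, z)]

# Hypothetical toy data: talent confounds novelty and citations.
talent = [0, 0, 1, 1, 2, 2, 3, 3]
shock = [-1, 1, -1, 1, -1, 1, -1, 1]          # variation unrelated to talent
novelty = [t + s for t, s in zip(talent, shock)]
citations = [1.0 * n + 3.0 * t for n, t in zip(novelty, talent)]  # true effect = 1.0

naive = slope(novelty, citations)              # omits talent, so it is biased upward
controlled = slope(residualize(novelty, talent),
                   residualize(citations, talent))  # controls for talent

print(f"naive slope: {naive:.3f}, controlled slope: {controlled:.3f}")
```

In this constructed example the naive coefficient overstates the effect because talent is left out. In real data the relevant confounders are rarely all observed, which is the limitation described above.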

Mechanistic models

Mechanistic modelling is an important approach to explaining empirical regularities, drawing from methods primarily used in physics. Such models predict macro-level regularities of a system by modelling micro-level interactions among basic elements with interpretable and modifiable formulas. While theoretical by nature, mechanistic models in the science of science are often empirically grounded, and this approach has developed together with the advent of large-scale, high-resolution data.

Simplicity is the core value of a mechanistic model. Consider, for example, why citations follow a fat-tailed distribution. de Solla Price modelled the citing behaviour as a cumulative advantage process on a growing citation network 159 and found that if the probability a paper is cited grows linearly with its existing citations, the resulting distribution would follow a power law, broadly aligned with empirical observations. The model is intentionally simplified, ignoring myriad factors. Yet the simple cumulative advantage process is by itself sufficient to explain a power law distribution of citations. In this way, mechanistic models can help to reveal key mechanisms that can explain observed patterns.
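A cumulative advantage process of this kind can be simulated in a few lines. The sketch below is a simplified illustration, not the original model specification: it assumes each new paper cites exactly one earlier paper, and that the chance of being cited is proportional to a paper's existing citations plus one (the offset is an assumption that allows uncited papers to receive a first citation).

```python
import random

def simulate_cumulative_advantage(n_papers, seed=0):
    """Grow a citation network one paper at a time.

    Each new paper cites one existing paper, chosen with probability
    proportional to (existing citations + 1).
    """
    rng = random.Random(seed)
    citations = [0]   # paper 0 starts with no citations
    urn = [0]         # each paper appears once, plus once per citation received
    for new_paper in range(1, n_papers):
        cited = rng.choice(urn)   # uniform draw from the urn = linear preference
        citations[cited] += 1
        urn.append(cited)         # the cited paper gains one more ticket
        citations.append(0)
        urn.append(new_paper)     # the new paper enters with its baseline ticket
    return citations

citations = simulate_cumulative_advantage(2000)
mean_citations = sum(citations) / len(citations)
uncited_share = citations.count(0) / len(citations)

print(f"mean: {mean_citations:.2f}, max: {max(citations)}, "
      f"share uncited: {uncited_share:.2f}")
```

Even in this stripped-down version, most simulated papers remain uncited while a few accumulate many citations, which is the qualitative fat-tailed pattern the model is meant to explain.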

Moreover, mechanistic models can be refined as empirical evidence evolves. For example, later investigations showed that citation distributions are better characterized as log-normal 156 , 173 , prompting researchers to introduce a fitness parameter to encapsulate the inherent differences in papers’ ability to attract citations 174 , 175 . Further, older papers are less likely to be cited than expected 176 , 177 , 178 , motivating more recent models 20 to introduce an additional aging effect 179 . By combining the cumulative advantage, fitness and aging effects, one can already achieve substantial predictive power not just for the overall properties of the system but also the citation dynamics of individual papers 20 .
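How the three ingredients combine can be sketched as a single attachment weight. The functional forms here are assumptions chosen for illustration (a multiplicative combination, a plus-one offset on citations, and a log-normal aging curve with hypothetical parameters `mu` and `sigma`); the cited models should be consulted for their exact specifications.

```python
import math

def lognormal_aging(age, mu, sigma):
    """Log-normal density over paper age (assumed form of the aging effect)."""
    return math.exp(-((math.log(age) - mu) ** 2) / (2 * sigma ** 2)) / (
        age * sigma * math.sqrt(2 * math.pi)
    )

def attachment_weight(fitness, citations, age, mu=1.0, sigma=1.0):
    """Unnormalized chance that a paper attracts the next citation.

    Combines fitness, cumulative advantage (citations + 1, where the offset
    is an assumption) and aging in a simple product.
    """
    return fitness * (citations + 1) * lognormal_aging(age, mu, sigma)

# Same citation count, different ages: the older paper is less attractive.
young = attachment_weight(fitness=1.0, citations=20, age=1.0)
old = attachment_weight(fitness=1.0, citations=20, age=50.0)
# Same age and citations, higher fitness: proportionally more attractive.
fitter = attachment_weight(fitness=2.0, citations=20, age=1.0)

print(young, old, fitter)
```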

In addition to citations, mechanistic models have been developed to understand the formation of collaborations 136 , 180 , 181 , 182 , 183 , knowledge discovery and diffusion 184 , 185 , topic selection 186 , 187 , career dynamics 30 , 31 , 188 , 189 , the growth of scientific fields 190 and the dynamics of failure in science and other domains 178 .

At the same time, some observers have argued that mechanistic models are too simplistic to capture the essence of complex real-world problems 191 . While it has been a cornerstone for the natural sciences, representing social phenomena in a limited set of mathematical equations may miss complexities and heterogeneities that make social phenomena interesting in the first place. Such concerns are not unique to the science of science, as they represent a broader theme in computational social sciences 192 , 193 , ranging from social networks 194 , 195 to human mobility 196 , 197 to epidemics 198 , 199 . Other observers have questioned the practical utility of mechanistic models and whether they can be used to guide decisions and devise actionable policies. Nevertheless, despite these limitations, several complex phenomena in the science of science are well captured by simple mechanistic models, showing a high degree of regularity beneath complex interacting systems and providing powerful insights about the nature of science. Mixing such modelling with other methods could be particularly fruitful in future investigations.

Machine learning

The science of science seeks in part to forecast promising directions for scientific research 7 , 44 . In recent years, machine learning methods have substantially advanced predictive capabilities 200 , 201 and are playing increasingly important parts in the science of science. In contrast to the previous methods, machine learning does not emphasize hypotheses or theories. Rather, it leverages complex relationships in data and optimizes goodness of fit to make predictions and categorizations.

Traditional machine learning models include supervised, semi-supervised and unsupervised learning. The model choice depends on data availability and the research question, ranging from supervised models for citation prediction 202 , 203 to unsupervised models for community detection 204 . Take for example mappings of scientific knowledge 94 , 205 , 206 . The unsupervised method applies network clustering algorithms to map the structures of science. Related visualization tools make sense of clusters from the underlying network, allowing observers to see the organization, interactions and evolution of scientific knowledge. More recently, supervised learning, and deep neural networks in particular, have witnessed especially rapid developments 207 . Neural networks can generate high-dimensional representations of unstructured data such as images and texts, which encode complex properties difficult for human experts to perceive.

Take text analysis as an example. A recent study 52 utilizes 3.3 million paper abstracts in materials science to predict the thermoelectric properties of materials. The intuition is that the words currently used to describe a material may predict its hitherto undiscovered properties (Fig. 2 ). Compared with a random material, the materials predicted by the model are eight times more likely to be reported as thermoelectric in the next 5 years, suggesting that machine learning has the potential to substantially speed up knowledge discovery, especially as data continue to grow in scale and scope. Indeed, predicting the direction of new discoveries represents one of the most promising avenues for machine learning models, with neural networks being applied widely to biology 208 , physics 209 , 210 , mathematics 211 , chemistry 212 , medicine 213 and clinical applications 214 . Neural networks also offer a quantitative framework to probe the characteristics of creative products ranging from scientific papers 53 , journals 215 , organizations 148 , to paintings and movies 32 . Neural networks can also help to predict the reproducibility of papers from a variety of disciplines at scale 53 , 216 .

figure 2

This figure illustrates the word2vec skip-gram methods 52 , where the goal is to predict useful properties of materials using previous scientific literature. a , The architecture and training process of the word2vec skip-gram model, where the 3-layer, fully connected neural network learns the 200-dimensional representation (hidden layer) from the sparse vector for each word and its context in the literature (input layer). b , The top two principal components of the word embedding. Materials with similar features are close in the 2D space, allowing prediction of a material’s properties. Different targeted words are shown in different colours. Reproduced with permission from ref. 52 , Springer Nature Ltd.
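The prediction step in the text-analysis example can be sketched as a similarity ranking over word embeddings. The sketch assumes that embeddings have already been trained and that candidates are ranked by cosine similarity to a property word. The three-dimensional vectors and material names below are invented placeholders, far smaller than the 200-dimensional representations described in the caption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings; real ones would be learned from abstracts.
target = [1.0, 0.0, 0.2]  # stands in for the vector of the word "thermoelectric"
materials = {
    "material_A": [0.9, 0.1, 0.1],
    "material_B": [0.1, 1.0, 0.0],
    "material_C": [0.5, 0.5, 0.5],
}

# Rank candidate materials from most to least similar to the target word.
ranked = sorted(materials,
                key=lambda name: cosine(materials[name], target),
                reverse=True)
print(ranked)
```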

While machine learning can offer high predictive accuracy, successful applications to the science of science face challenges, particularly regarding interpretability. Researchers may value transparent and interpretable findings for how a given feature influences an outcome, rather than a black-box model. The lack of interpretability also raises concerns about bias and fairness. In predicting reproducible patterns from data, machine learning models inevitably include and reproduce biases embedded in these data, often in non-transparent ways. The fairness of machine learning 217 is heavily debated in applications ranging from the criminal justice system to hiring processes. Effective and responsible use of machine learning in the science of science therefore requires thoughtful partnership between humans and machines 53 to build a reliable system accessible to scrutiny and modification.

Causal approaches

The preceding methods can reveal core facts about the workings of science and develop predictive capacity. Yet, they fail to capture causal relationships, which are particularly useful in assessing policy interventions. For example, how can we test whether a science policy boosts or hinders the performance of individuals, teams or institutions? The overarching idea of causal approaches is to construct some counterfactual world where two groups are identical to each other except that one group experiences a treatment that the other group does not.

Towards causation

Before engaging in causal approaches, it is useful to first consider the interpretative challenges of observational data. As observational data emerge from mechanisms that are not fully known or measured, an observed correlation may be driven by underlying forces that were not accounted for in the analysis. This challenge makes causal inference fundamentally difficult in observational data. An awareness of this issue is the first step in confronting it. It further motivates intermediate empirical approaches, including the use of matching strategies and fixed effects, that can help to confront (although not fully eliminate) the inference challenge. We first consider these approaches before turning to more fully causal methods.

Matching. Matching utilizes rich information to construct a control group that is similar to the treatment group on as many observable characteristics as possible before the treatment group is exposed to the treatment. Inferences can then be made by comparing the treatment and the matched control groups. Exact matching applies to categorical values, such as country, gender, discipline or affiliation 35 , 218 . Coarsened exact matching considers percentile bins of continuous variables and matches observations in the same bin 133 . Propensity score matching estimates the probability of receiving the ‘treatment’ on the basis of the controlled variables and uses the estimates to match treatment and control groups, which reduces the matching task from comparing the values of multiple covariates to comparing a single value 24 , 219 . Dynamic matching is useful for longitudinally matching variables that change over time 220 , 221 .
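A minimal sketch of exact and coarsened matching follows. The records are invented, and career age is coarsened into fixed five-year bins rather than percentile bins to keep the example short. Treated and control observations are compared only within strata that contain both, and strata are weighted by their number of treated observations.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical records: (treated, field, career_age, outcome)
records = [
    (1, "physics", 3, 12), (0, "physics", 4, 10), (0, "physics", 2, 8),
    (1, "biology", 12, 30), (0, "biology", 11, 25),
    (1, "biology", 27, 60),   # no comparable control: dropped by matching
    (0, "physics", 22, 20),   # no comparable treated unit: dropped by matching
]

def matched_effect(records, bin_width=5):
    """Average treated-minus-control difference within matched strata."""
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for treated, field, career_age, outcome in records:
        key = (field, career_age // bin_width)   # exact on field, coarsened on age
        strata[key]["treated" if treated else "control"].append(outcome)
    total, n_treated = 0.0, 0
    for groups in strata.values():
        if groups["treated"] and groups["control"]:
            diff = fmean(groups["treated"]) - fmean(groups["control"])
            total += diff * len(groups["treated"])
            n_treated += len(groups["treated"])
    return total / n_treated

naive = (fmean([r[3] for r in records if r[0] == 1])
         - fmean([r[3] for r in records if r[0] == 0]))
matched = matched_effect(records)
print(f"naive difference: {naive:.2f}, matched difference: {matched:.2f}")
```

The gap between the two numbers comes entirely from units without a comparable counterpart, which the matched comparison discards.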

Fixed effects. Fixed effects are a powerful and now standard tool in controlling for confounders. A key requirement for using fixed effects is that there are multiple observations on the same subject or entity (person, field, institution and so on) 222 , 223 , 224 . The fixed effect works as a dummy variable that accounts for the role of any fixed characteristic of that entity. Consider the finding where gender-diverse teams produce higher-impact papers than same-gender teams do 225 . A confounder may be that individuals who tend to write high-impact papers may also be more likely to work in gender-diverse teams. By including individual fixed effects, one accounts for any fixed characteristics of individuals (such as IQ, cultural background or previous education) that might drive the relationship of interest.
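The logic of individual fixed effects can be sketched with the within transformation, which subtracts each individual's own averages before estimating the slope. The data are invented: two hypothetical authors differ in baseline impact, the author with higher baseline impact works in gender-diverse teams more often, and the true effect of team diversity is set to 2.0.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical papers: (author, diverse_team, impact)
papers = [
    ("author_A", 0, 2.0), ("author_A", 0, 2.0), ("author_A", 1, 4.0),
    ("author_B", 1, 12.0), ("author_B", 1, 12.0), ("author_B", 0, 10.0),
]

def slope(x, y):
    """OLS slope of y on a single regressor x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def within_transform(papers):
    """Demean diversity and impact within each author."""
    by_author = defaultdict(list)
    for author, diverse, impact in papers:
        by_author[author].append((diverse, impact))
    x_dm, y_dm = [], []
    for rows in by_author.values():
        mx = fmean([r[0] for r in rows])
        my = fmean([r[1] for r in rows])
        x_dm.extend(r[0] - mx for r in rows)
        y_dm.extend(r[1] - my for r in rows)
    return x_dm, y_dm

pooled = slope([p[1] for p in papers], [p[2] for p in papers])
fixed_effects = slope(*within_transform(papers))
print(f"pooled: {pooled:.3f}, with author fixed effects: {fixed_effects:.3f}")
```

The pooled estimate mixes the team effect with stable differences between authors, whereas the demeaned estimate uses only variation within each author's own record.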

In sum, matching and fixed effects methods reduce potential sources of bias in interpreting relationships between variables. Yet, confounders may persist in these studies. For instance, fixed effects do not control for unobserved factors that change with time within the given entity (for example, access to funding or new skills). Identifying causal effects convincingly will then typically require distinct research methods that we turn to next.

Quasi-experiments

Researchers in economics and other fields have developed a range of quasi-experimental methods to construct treatment and control groups. The key idea here is exploiting randomness from external events that differentially expose subjects to a particular treatment. Here we review three quasi-experimental methods: difference-in-differences, instrumental variables and regression discontinuity (Fig. 3 ).

figure 3

a – c , This figure presents illustrations of ( a ) difference-in-differences, ( b ) instrumental variables and ( c ) regression discontinuity methods. The solid line in b represents causal links and the dashed line represents the relationships that are not allowed if the IV method is to produce causal inference.

Difference-in-differences. Difference-in-differences regression (DiD) investigates the effect of an unexpected event, comparing the affected group (the treated group) with an unaffected group (the control group). The control group is intended to provide the counterfactual path: what would have happened were it not for the unexpected event. Ideally, the treated and control groups are on virtually identical paths before the treatment event, but DiD can also work if the groups are on parallel paths (Fig. 3a ). For example, one study 226 examines how the premature death of superstar scientists affects the productivity of their previous collaborators. The control group are collaborators of superstars who did not die in the time frame. The two groups do not show significant differences in publications before a death event, yet upon the death of a star scientist, the treated collaborators on average experience a 5–8% decline in their quality-adjusted publication rates compared with the control group. DiD has wide applicability in the science of science, having been used to analyse the causal effects of grant design 24 , access costs to previous research 155 , 227 , university technology transfer policies 154 , intellectual property 228 , citation practices 229 , evolution of fields 221 and the impacts of paper retractions 230 , 231 , 232 . The DiD literature has grown especially rapidly in the field of economics, with substantial recent refinements 233 , 234 .
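In its simplest two-group, two-period form, the DiD estimate is the change in the treated group minus the change in the control group. The publication rates below are invented and are not taken from the cited study. The two groups start at different levels but are assumed to follow parallel paths absent the event.

```python
from statistics import fmean

# Hypothetical publication rates before and after the event.
treated_pre = [11.0, 12.0, 13.0]
treated_post = [10.0, 11.0, 12.0]
control_pre = [7.0, 8.0, 9.0]
control_post = [7.5, 8.5, 9.5]

# Change over time within each group.
treated_change = fmean(treated_post) - fmean(treated_pre)
control_change = fmean(control_post) - fmean(control_pre)

# The control group's change stands in for the treated group's counterfactual.
did_estimate = treated_change - control_change
print(f"DiD estimate: {did_estimate:.2f}")
```

A simple before-and-after comparison of the treated group alone would miss the upward drift that the control group reveals.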

Instrumental variables. Another quasi-experimental approach utilizes ‘instrumental variables’ (IV). The goal is to determine the causal influence of some feature X on some outcome Y by using a third, instrumental variable. This instrumental variable is a quasi-random event that induces variation in X and, except for its impact through X , has no other effect on the outcome Y (Fig. 3b ). For example, consider a study of astronomy that seeks to understand how telescope time affects career advancement 235 . Here, one cannot simply look at the correlation between telescope time and career outcomes because many confounds (such as talent or grit) may influence both telescope time and career opportunities. Now consider the weather as an instrumental variable. Cloudy weather will, at random, reduce an astronomer’s observational time. Yet, the weather on particular nights is unlikely to correlate with a scientist’s innate qualities. The weather can then provide an instrumental variable to reveal a causal relationship between telescope time and career outcomes. Instrumental variables have been used to study local peer effects in research 151 , the impact of gender composition in scientific committees 236 , patents on future innovation 237 and taxes on inventor mobility 238 .
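The telescope example can be sketched with the simplest IV estimator, the ratio of the instrument's covariance with the outcome to its covariance with the feature of interest. All numbers are invented: `cloudy` is the instrument, `talent` is a confounder that the analyst is assumed not to observe, and the true effect of telescope time on the career outcome is set to 0.5.

```python
from statistics import fmean

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return fmean([(x - ma) * (y - mb) for x, y in zip(a, b)])

# Hypothetical data: weather is unrelated to talent by construction.
cloudy = [0, 0, 0, 0, 1, 1, 1, 1]
talent = [0, 1, 2, 3, 0, 1, 2, 3]        # unobserved in practice
telescope_time = [10 - 4 * z + 2 * u for z, u in zip(cloudy, talent)]
career = [0.5 * x + 3 * u for x, u in zip(telescope_time, talent)]

# Naive regression slope: contaminated by talent.
naive = cov(telescope_time, career) / cov(telescope_time, telescope_time)
# IV estimate: uses only the variation in telescope time induced by weather.
iv_estimate = cov(cloudy, career) / cov(cloudy, telescope_time)

print(f"naive: {naive:.3f}, IV: {iv_estimate:.3f}")
```

The estimate is only as credible as the assumption that weather affects careers through telescope time alone, which cannot be verified from the data.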

Regression discontinuity. In regression discontinuity, policies with an arbitrary threshold for receiving some benefit can be used to construct treatment and control groups (Fig. 3c ). Take the funding paylines for grant proposals as an example. Proposals with scores increasingly close to the payline are increasingly similar in both their observable and unobservable characteristics, yet only those projects with scores above the payline receive the funding. For example, a study 110 examines the effect of winning an early-career grant on the probability of winning a later, mid-career grant. The probability has a discontinuous jump across the initial grant’s payline, providing the treatment and control groups needed to estimate the causal effect of receiving a grant. This example utilizes the ‘sharp’ regression discontinuity that assumes treatment status to be fully determined by the cut-off. If we assume treatment status is only partly determined by the cut-off, we can use ‘fuzzy’ regression discontinuity designs. Here the probability of receiving a grant is used to estimate the future outcome 11 , 110 , 239 , 240 , 241 .
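A sharp regression discontinuity can be sketched by fitting a separate line on each side of the cut-off within a bandwidth and comparing the two fitted values at the cut-off. The scores, payline, bandwidth and outcome values are all invented, with a jump of 0.3 built in at the payline.

```python
from statistics import fmean

def fit_line(x, y):
    """Return (intercept, slope) of an OLS line of y on x."""
    mx, my = fmean(x), fmean(y)
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return my - b * mx, b

def rd_jump(scores, outcomes, cutoff, bandwidth):
    """Difference between the two local fits, evaluated at the cut-off."""
    below = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    # Scores are centred on the cut-off, so each intercept is the fit at the cut-off.
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_above - intercept_below

cutoff, bandwidth = 50, 5
scores = list(range(40, 61))
funded = [s >= cutoff for s in scores]            # sharp design: score decides funding
later_grant = [0.2 + 0.01 * (s - cutoff) + (0.3 if f else 0.0)
               for s, f in zip(scores, funded)]   # hypothetical later-grant probability

jump = rd_jump(scores, later_grant, cutoff, bandwidth)
print(f"estimated jump at the payline: {jump:.3f}")
```

The bandwidth governs a trade-off: narrower windows compare more similar proposals but leave fewer observations.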

Although quasi-experiments are powerful tools, they face their own limitations. First, these approaches identify causal effects within a specific context and often engage small numbers of observations. How representative the samples are for broader populations or contexts is typically left as an open question. Second, the validity of the causal design is typically not ironclad. Researchers usually conduct different robustness checks to verify whether observable confounders have significant differences between the treated and control groups, before treatment. However, unobservable features may still differ between treatment and control groups. The quality of instrumental variables, and the specific claim that they have no effect on the outcome except through the variable of interest, is also difficult to assess. Ultimately, researchers must rely partly on judgement to tell whether appropriate conditions are met for causal inference.

This section emphasized popular econometric approaches to causal inference. Other empirical approaches, such as graphical causal modelling 242 , 243 , also represent an important stream of work on assessing causal relationships. Such approaches usually represent causation as a directed acyclic graph, with nodes as variables and arrows between them as suspected causal relationships. In the science of science, the directed acyclic graph approach has been applied to quantify the causal effect of journal impact factor 244 and gender or racial bias 245 on citations. Graphical causal modelling has also triggered discussions on strengths and weaknesses compared to the econometrics methods 246 , 247 .

Experiments

In contrast to quasi-experimental approaches, laboratory and field experiments conduct direct randomization in assigning treatment and control groups. These methods engage explicitly in the data generation process, manipulating interventions to observe counterfactuals. These experiments are crafted to study mechanisms of specific interest and, by designing the experiment and formally randomizing, can produce especially rigorous causal inference.

Laboratory experiments. Laboratory experiments build counterfactual worlds in well-controlled laboratory environments. Researchers randomly assign participants to the treatment or control group and then manipulate the laboratory conditions to observe different outcomes in the two groups. For example, consider laboratory experiments on team performance and gender composition 144 , 248 . The researchers randomly assign participants into groups to perform tasks such as solving puzzles or brainstorming. Teams with a higher proportion of women are found to perform better on average, offering evidence that gender diversity is causally linked to team performance. Laboratory experiments can allow researchers to test forces that are otherwise hard to observe, such as how competition influences creativity 249 . Laboratory experiments have also been used to evaluate how journal impact factors shape scientists’ perceptions of rewards 250 and gender bias in hiring 251 .

Laboratory experiments allow for precise control of settings and procedures to isolate causal effects of interest. However, participants may behave differently in synthetic environments than in real-world settings, raising questions about the generalizability and replicability of the results 252 , 253 , 254 . To assess causal effects in real-world settings, researchers use randomized controlled trials.

Randomized controlled trials. A randomized controlled trial (RCT), or field experiment, is a staple for causal inference across a wide range of disciplines. RCTs randomly assign participants into the treatment and control conditions 255 and can be used not only to assess mechanisms but also to test real-world interventions such as policy change. The science of science has witnessed growing use of RCTs. For instance, a field experiment 146 investigated whether lower search costs for collaborators increased collaboration in grant applications. The authors randomly allocated principal investigators to face-to-face sessions in a medical school, and then measured participants’ chance of writing a grant proposal together. RCTs have also offered rich causal insights on peer review 256 , 257 , 258 , 259 , 260 and gender bias in science 261 , 262 , 263 .
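The core of an RCT, random assignment followed by a comparison of group means, can be sketched as follows. The sample size, the simulated baseline outcomes and the true effect of 1.0 are assumptions made purely for illustration.

```python
import random
from statistics import fmean

rng = random.Random(42)
n = 1000
true_effect = 1.0   # assumed effect of the intervention, for illustration

# Randomly assign half of the participants to treatment.
ids = list(range(n))
rng.shuffle(ids)
treated = set(ids[: n // 2])

# Hypothetical outcomes: individual baseline plus the effect if treated.
baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]
outcome = [baseline[i] + (true_effect if i in treated else 0.0)
           for i in range(n)]

# Randomization makes the groups comparable on average, so a simple
# difference in means estimates the causal effect.
estimate = (fmean([outcome[i] for i in range(n) if i in treated])
            - fmean([outcome[i] for i in range(n) if i not in treated]))
print(f"estimated effect: {estimate:.2f}")
```

Because assignment is random, no matching, fixed effects or instruments are needed. The remaining uncertainty comes from sampling noise, which shrinks as the number of participants grows.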

While powerful, RCTs are difficult to conduct in the science of science, mainly for two reasons. The first concerns potential risks in a policy intervention. For instance, while randomizing funding across individuals could generate crucial causal insights for funders, it may also inadvertently harm participants’ careers 264 . Second, key questions in the science of science often require a long time horizon to trace outcomes, which makes RCTs costly. A long horizon also makes findings harder to replicate. A relative advantage of the quasi-experimental methods discussed earlier is that one can identify causal effects over potentially long periods of time in the historical record. On the other hand, quasi-experiments must be found as opposed to designed, and they often are not available for many questions of interest. While the best approaches are context dependent, a growing community of researchers is building platforms to facilitate RCTs for the science of science, aiming to lower their costs and increase their scale. Performing RCTs in partnership with science institutions can also contribute to timely, policy-relevant research that may substantially improve science decision-making and investments.

Research in the science of science has been empowered by the growth of high-scale data, new measurement approaches and an expanding range of empirical methods. These tools provide enormous capacity to test conceptual frameworks about science, discover factors impacting scientific productivity, predict key scientific outcomes and design policies that better facilitate future scientific progress. A careful appreciation of empirical techniques can help researchers to choose effective tools for questions of interest and propel the field. A better and broader understanding of these methodologies may also build bridges across diverse research communities, facilitating communication and collaboration, and better leveraging the value of diverse perspectives. The science of science is about turning scientific methods on the nature of science itself. The fruits of this work, with time, can guide researchers and research institutions to greater progress in discovery and understanding across the landscape of scientific inquiry.

Bush, V. Science–the Endless Frontier: A Report to the President on a Program for Postwar Scientific Research (National Science Foundation, 1990).

Mokyr, J. The Gifts of Athena (Princeton Univ. Press, 2011).

Jones, B. F. in Rebuilding the Post-Pandemic Economy (eds Kearney, M. S. & Ganz, A.) 272–310 (Aspen Institute Press, 2021).

Wang, D. & Barabási, A.-L. The Science of Science (Cambridge Univ. Press, 2021).

Fortunato, S. et al. Science of science. Science 359 , eaao0185 (2018).

Azoulay, P. et al. Toward a more scientific science. Science 361 , 1194–1197 (2018).

Clauset, A., Larremore, D. B. & Sinatra, R. Data-driven predictions in the science of science. Science 355 , 477–480 (2017).

Zeng, A. et al. The science of science: from the perspective of complex systems. Phys. Rep. 714 , 1–73 (2017).

Lin, Z., Yin, Y., Liu, L. & Wang, D. SciSciNet: a large-scale open data lake for the science of science research. Sci. Data, https://doi.org/10.1038/s41597-023-02198-9 (2023).

Ahmadpoor, M. & Jones, B. F. The dual frontier: patented inventions and prior scientific advance. Science 357 , 583–587 (2017).

Azoulay, P., Graff Zivin, J. S., Li, D. & Sampat, B. N. Public R&D investments and private-sector patenting: evidence from NIH funding rules. Rev. Econ. Stud. 86 , 117–152 (2019).

Yin, Y., Dong, Y., Wang, K., Wang, D. & Jones, B. F. Public use and public funding of science. Nat. Hum. Behav. 6 , 1344–1350 (2022).

Merton, R. K. The Sociology of Science: Theoretical and Empirical Investigations (Univ. Chicago Press, 1973).

Kuhn, T. The Structure of Scientific Revolutions (Princeton Univ. Press, 2021).

Uzzi, B., Mukherjee, S., Stringer, M. & Jones, B. Atypical combinations and scientific impact. Science 342 , 468–472 (2013).

Zuckerman, H. Scientific Elite: Nobel Laureates in the United States (Transaction Publishers, 1977).

Wu, L., Wang, D. & Evans, J. A. Large teams develop and small teams disrupt science and technology. Nature 566 , 378–382 (2019).

Wuchty, S., Jones, B. F. & Uzzi, B. The increasing dominance of teams in production of knowledge. Science 316 , 1036–1039 (2007).

Foster, J. G., Rzhetsky, A. & Evans, J. A. Tradition and innovation in scientists’ research strategies. Am. Sociol. Rev. 80 , 875–908 (2015).

Wang, D., Song, C. & Barabási, A.-L. Quantifying long-term scientific impact. Science 342 , 127–132 (2013).

Clauset, A., Arbesman, S. & Larremore, D. B. Systematic inequality and hierarchy in faculty hiring networks. Sci. Adv. 1 , e1400005 (2015).

Ma, A., Mondragón, R. J. & Latora, V. Anatomy of funded research in science. Proc. Natl Acad. Sci. USA 112 , 14760–14765 (2015).

Ma, Y. & Uzzi, B. Scientific prize network predicts who pushes the boundaries of science. Proc. Natl Acad. Sci. USA 115 , 12608–12615 (2018).

Azoulay, P., Graff Zivin, J. S. & Manso, G. Incentives and creativity: evidence from the academic life sciences. RAND J. Econ. 42 , 527–554 (2011).

Schor, S. & Karten, I. Statistical evaluation of medical journal manuscripts. JAMA 195 , 1123–1128 (1966).

Platt, J. R. Strong inference: certain systematic methods of scientific thinking may produce much more rapid progress than others. Science 146 , 347–353 (1964).

Ioannidis, J. P. Why most published research findings are false. PLoS Med. 2 , e124 (2005).

Simonton, D. K. Career landmarks in science: individual differences and interdisciplinary contrasts. Dev. Psychol. 27 , 119 (1991).

Way, S. F., Morgan, A. C., Clauset, A. & Larremore, D. B. The misleading narrative of the canonical faculty productivity trajectory. Proc. Natl Acad. Sci. USA 114 , E9216–E9223 (2017).

Sinatra, R., Wang, D., Deville, P., Song, C. & Barabási, A.-L. Quantifying the evolution of individual scientific impact. Science 354 , aaf5239 (2016).

Liu, L. et al. Hot streaks in artistic, cultural, and scientific careers. Nature 559 , 396–399 (2018).

Liu, L., Dehmamy, N., Chown, J., Giles, C. L. & Wang, D. Understanding the onset of hot streaks across artistic, cultural, and scientific careers. Nat. Commun. 12 , 5392 (2021).

Squazzoni, F. et al. Peer review and gender bias: a study on 145 scholarly journals. Sci. Adv. 7 , eabd0299 (2021).

Hofstra, B. et al. The diversity–innovation paradox in science. Proc. Natl Acad. Sci. USA 117 , 9284–9291 (2020).

Huang, J., Gates, A. J., Sinatra, R. & Barabási, A.-L. Historical comparison of gender inequality in scientific careers across countries and disciplines. Proc. Natl Acad. Sci. USA 117 , 4609–4616 (2020).

Gläser, J. & Laudel, G. Governing science: how science policy shapes research content. Eur. J. Sociol. 57 , 117–168 (2016).

Stephan, P. E. How Economics Shapes Science (Harvard Univ. Press, 2012).

Garfield, E. & Sher, I. H. New factors in the evaluation of scientific literature through citation indexing. Am. Doc. 14 , 195–201 (1963).

de Solla Price, D. J. Networks of scientific papers. Science 149 , 510–515 (1965).

Etzkowitz, H., Kemelgor, C. & Uzzi, B. Athena Unbound: The Advancement of Women in Science and Technology (Cambridge Univ. Press, 2000).

Simonton, D. K. Scientific Genius: A Psychology of Science (Cambridge Univ. Press, 1988).

Khabsa, M. & Giles, C. L. The number of scholarly documents on the public web. PLoS ONE 9 , e93949 (2014).

Xia, F., Wang, W., Bekele, T. M. & Liu, H. Big scholarly data: a survey. IEEE Trans. Big Data 3 , 18–35 (2017).

Evans, J. A. & Foster, J. G. Metaknowledge. Science 331 , 721–725 (2011).

Milojević, S. Quantifying the cognitive extent of science. J. Informetr. 9 , 962–973 (2015).

Rzhetsky, A., Foster, J. G., Foster, I. T. & Evans, J. A. Choosing experiments to accelerate collective discovery. Proc. Natl Acad. Sci. USA 112 , 14569–14574 (2015).

Poncela-Casasnovas, J., Gerlach, M., Aguirre, N. & Amaral, L. A. Large-scale analysis of micro-level citation patterns reveals nuanced selection criteria. Nat. Hum. Behav. 3 , 568–575 (2019).

Hardwicke, T. E. et al. Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition. R. Soc. Open Sci. 5 , 180448 (2018).

Nagaraj, A., Shears, E. & de Vaan, M. Improving data access democratizes and diversifies science. Proc. Natl Acad. Sci. USA 117 , 23490–23498 (2020).

Bravo, G., Grimaldo, F., López-Iñesta, E., Mehmani, B. & Squazzoni, F. The effect of publishing peer review reports on referee behavior in five scholarly journals. Nat. Commun. 10 , 322 (2019).

Tran, D. et al. An open review of open review: a critical analysis of the machine learning conference review process. Preprint at https://doi.org/10.48550/arXiv.2010.05137 (2020).

Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571 , 95–98 (2019).

Yang, Y., Wu, Y. & Uzzi, B. Estimating the deep replicability of scientific findings using human and artificial intelligence. Proc. Natl Acad. Sci. USA 117 , 10762–10768 (2020).

Mukherjee, S., Uzzi, B., Jones, B. & Stringer, M. A new method for identifying recombinations of existing knowledge associated with high‐impact innovation. J. Prod. Innov. Manage. 33 , 224–236 (2016).

Leahey, E., Beckman, C. M. & Stanko, T. L. Prominent but less productive: the impact of interdisciplinarity on scientists’ research. Adm. Sci. Q. 62 , 105–139 (2017).

Sauermann, H. & Haeussler, C. Authorship and contribution disclosures. Sci. Adv. 3 , e1700404 (2017).

Oliveira, D. F. M., Ma, Y., Woodruff, T. K. & Uzzi, B. Comparison of National Institutes of Health grant amounts to first-time male and female principal investigators. JAMA 321 , 898–900 (2019).

Yang, Y., Chawla, N. V. & Uzzi, B. A network’s gender composition and communication pattern predict women’s leadership success. Proc. Natl Acad. Sci. USA 116 , 2033–2038 (2019).

Way, S. F., Larremore, D. B. & Clauset, A. Gender, productivity, and prestige in computer science faculty hiring networks. In Proc. 25th International Conference on World Wide Web 1169–1179. (ACM 2016)

Malmgren, R. D., Ottino, J. M. & Amaral, L. A. N. The role of mentorship in protege performance. Nature 465 , 622–626 (2010).

Ma, Y., Mukherjee, S. & Uzzi, B. Mentorship and protégé success in STEM fields. Proc. Natl Acad. Sci. USA 117 , 14077–14083 (2020).

Börner, K. et al. Skill discrepancies between research, education, and jobs reveal the critical need to supply soft skills for the data economy. Proc. Natl Acad. Sci. USA 115 , 12630–12637 (2018).

Biasi, B. & Ma, S. The Education-Innovation Gap (National Bureau of Economic Research Working papers, 2020).

Bornmann, L. Do altmetrics point to the broader impact of research? An overview of benefits and disadvantages of altmetrics. J. Informetr. 8 , 895–903 (2014).

Cleary, E. G., Beierlein, J. M., Khanuja, N. S., McNamee, L. M. & Ledley, F. D. Contribution of NIH funding to new drug approvals 2010–2016. Proc. Natl Acad. Sci. USA 115 , 2329–2334 (2018).

Spector, J. M., Harrison, R. S. & Fishman, M. C. Fundamental science behind today’s important medicines. Sci. Transl. Med. 10 , eaaq1787 (2018).

Haunschild, R. & Bornmann, L. How many scientific papers are mentioned in policy-related documents? An empirical investigation using Web of Science and Altmetric data. Scientometrics 110 , 1209–1216 (2017).

Yin, Y., Gao, J., Jones, B. F. & Wang, D. Coevolution of policy and science during the pandemic. Science 371 , 128–130 (2021).

Sugimoto, C. R., Work, S., Larivière, V. & Haustein, S. Scholarly use of social media and altmetrics: a review of the literature. J. Assoc. Inf. Sci. Technol. 68 , 2037–2062 (2017).

Dunham, I. Human genes: time to follow the roads less traveled? PLoS Biol. 16 , e3000034 (2018).

Kustatscher, G. et al. Understudied proteins: opportunities and challenges for functional proteomics. Nat. Methods 19 , 774–779 (2022).

Rosenthal, R. The file drawer problem and tolerance for null results. Psychol. Bull. 86 , 638 (1979).

Franco, A., Malhotra, N. & Simonovits, G. Publication bias in the social sciences: unlocking the file drawer. Science 345 , 1502–1505 (2014).

Vera-Baceta, M.-A., Thelwall, M. & Kousha, K. Web of Science and Scopus language coverage. Scientometrics 121 , 1803–1813 (2019).

Waltman, L. A review of the literature on citation impact indicators. J. Informetr. 10 , 365–391 (2016).

Garfield, E. & Merton, R. K. Citation Indexing: Its Theory and Application in Science, Technology, and Humanities (Wiley, 1979).

Kelly, B., Papanikolaou, D., Seru, A. & Taddy, M. Measuring Technological Innovation Over the Long Run Report No. 0898-2937 (National Bureau of Economic Research, 2018).

Kogan, L., Papanikolaou, D., Seru, A. & Stoffman, N. Technological innovation, resource allocation, and growth. Q. J. Econ. 132 , 665–712 (2017).

Hall, B. H., Jaffe, A. & Trajtenberg, M. Market value and patent citations. RAND J. Econ. 36 , 16–38 (2005).

Google Scholar  

Yan, E. & Ding, Y. Applying centrality measures to impact analysis: a coauthorship network analysis. J. Am. Soc. Inf. Sci. Technol. 60 , 2107–2118 (2009).

Radicchi, F., Fortunato, S., Markines, B. & Vespignani, A. Diffusion of scientific credits and the ranking of scientists. Phys. Rev. E 80 , 056103 (2009).

Bollen, J., Rodriquez, M. A. & Van de Sompel, H. Journal status. Scientometrics 69 , 669–687 (2006).

Bergstrom, C. T., West, J. D. & Wiseman, M. A. The eigenfactor™ metrics. J. Neurosci. 28 , 11433–11434 (2008).

Cronin, B. & Sugimoto, C. R. Beyond Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact (MIT Press, 2014).

Hicks, D., Wouters, P., Waltman, L., De Rijcke, S. & Rafols, I. Bibliometrics: the Leiden Manifesto for research metrics. Nature 520 , 429–431 (2015).

Catalini, C., Lacetera, N. & Oettl, A. The incidence and role of negative citations in science. Proc. Natl Acad. Sci. USA 112 , 13823–13826 (2015).

Alcacer, J. & Gittelman, M. Patent citations as a measure of knowledge flows: the influence of examiner citations. Rev. Econ. Stat. 88 , 774–779 (2006).

Ding, Y. et al. Content‐based citation analysis: the next generation of citation analysis. J. Assoc. Inf. Sci. Technol. 65 , 1820–1833 (2014).

Teufel, S., Siddharthan, A. & Tidhar, D. Automatic classification of citation function. In Proc. 2006 Conference on Empirical Methods in Natural Language Processing, 103–110 (Association for Computational Linguistics 2006)

Seeber, M., Cattaneo, M., Meoli, M. & Malighetti, P. Self-citations as strategic response to the use of metrics for career decisions. Res. Policy 48 , 478–491 (2019).

Pendlebury, D. A. The use and misuse of journal metrics and other citation indicators. Arch. Immunol. Ther. Exp. 57 , 1–11 (2009).

Biagioli, M. Watch out for cheats in citation game. Nature 535 , 201 (2016).

Jo, W. S., Liu, L. & Wang, D. See further upon the giants: quantifying intellectual lineage in science. Quant. Sci. Stud. 3 , 319–330 (2022).

Boyack, K. W., Klavans, R. & Börner, K. Mapping the backbone of science. Scientometrics 64 , 351–374 (2005).

Gates, A. J., Ke, Q., Varol, O. & Barabási, A.-L. Nature’s reach: narrow work has broad impact. Nature 575 , 32–34 (2019).

Börner, K., Penumarthy, S., Meiss, M. & Ke, W. Mapping the diffusion of scholarly knowledge among major US research institutions. Scientometrics 68 , 415–426 (2006).

King, D. A. The scientific impact of nations. Nature 430 , 311–316 (2004).

Pan, R. K., Kaski, K. & Fortunato, S. World citation and collaboration networks: uncovering the role of geography in science. Sci. Rep. 2 , 902 (2012).

Jaffe, A. B., Trajtenberg, M. & Henderson, R. Geographic localization of knowledge spillovers as evidenced by patent citations. Q. J. Econ. 108 , 577–598 (1993).

Funk, R. J. & Owen-Smith, J. A dynamic network measure of technological change. Manage. Sci. 63 , 791–817 (2017).

Yegros-Yegros, A., Rafols, I. & D’este, P. Does interdisciplinary research lead to higher citation impact? The different effect of proximal and distal interdisciplinarity. PLoS ONE 10 , e0135095 (2015).

Larivière, V., Haustein, S. & Börner, K. Long-distance interdisciplinarity leads to higher scientific impact. PLoS ONE 10 , e0122565 (2015).

Fleming, L., Greene, H., Li, G., Marx, M. & Yao, D. Government-funded research increasingly fuels innovation. Science 364 , 1139–1141 (2019).

Bowen, A. & Casadevall, A. Increasing disparities between resource inputs and outcomes, as measured by certain health deliverables, in biomedical research. Proc. Natl Acad. Sci. USA 112 , 11335–11340 (2015).

Li, D., Azoulay, P. & Sampat, B. N. The applied value of public investments in biomedical research. Science 356 , 78–81 (2017).

Lehman, H. C. Age and Achievement (Princeton Univ. Press, 2017).

Simonton, D. K. Creative productivity: a predictive and explanatory model of career trajectories and landmarks. Psychol. Rev. 104 , 66 (1997).

Duch, J. et al. The possible role of resource requirements and academic career-choice risk on gender differences in publication rate and impact. PLoS ONE 7 , e51332 (2012).

Wang, Y., Jones, B. F. & Wang, D. Early-career setback and future career impact. Nat. Commun. 10 , 4331 (2019).

Bol, T., de Vaan, M. & van de Rijt, A. The Matthew effect in science funding. Proc. Natl Acad. Sci. USA 115 , 4887–4890 (2018).

Jones, B. F. Age and great invention. Rev. Econ. Stat. 92 , 1–14 (2010).

Newman, M. Networks (Oxford Univ. Press, 2018).

Mazloumian, A., Eom, Y.-H., Helbing, D., Lozano, S. & Fortunato, S. How citation boosts promote scientific paradigm shifts and nobel prizes. PLoS ONE 6 , e18975 (2011).

Hirsch, J. E. An index to quantify an individual’s scientific research output. Proc. Natl Acad. Sci. USA 102 , 16569–16572 (2005).

Alonso, S., Cabrerizo, F. J., Herrera-Viedma, E. & Herrera, F. h-index: a review focused in its variants, computation and standardization for different scientific fields. J. Informetr. 3 , 273–289 (2009).

Egghe, L. An improvement of the h-index: the g-index. ISSI Newsl. 2 , 8–9 (2006).

Kaur, J., Radicchi, F. & Menczer, F. Universality of scholarly impact metrics. J. Informetr. 7 , 924–932 (2013).

Majeti, D. et al. Scholar plot: design and evaluation of an information interface for faculty research performance. Front. Res. Metr. Anal. 4 , 6 (2020).

Sidiropoulos, A., Katsaros, D. & Manolopoulos, Y. Generalized Hirsch h-index for disclosing latent facts in citation networks. Scientometrics 72 , 253–280 (2007).

Jones, B. F. & Weinberg, B. A. Age dynamics in scientific creativity. Proc. Natl Acad. Sci. USA 108 , 18910–18914 (2011).

Dennis, W. Age and productivity among scientists. Science 123 , 724–725 (1956).

Sanyal, D. K., Bhowmick, P. K. & Das, P. P. A review of author name disambiguation techniques for the PubMed bibliographic database. J. Inf. Sci. 47 , 227–254 (2021).

Haak, L. L., Fenner, M., Paglione, L., Pentz, E. & Ratner, H. ORCID: a system to uniquely identify researchers. Learn. Publ. 25 , 259–264 (2012).

Malmgren, R. D., Ottino, J. M. & Amaral, L. A. N. The role of mentorship in protégé performance. Nature 465 , 662–667 (2010).

Oettl, A. Reconceptualizing stars: scientist helpfulness and peer performance. Manage. Sci. 58 , 1122–1140 (2012).

Morgan, A. C. et al. The unequal impact of parenthood in academia. Sci. Adv. 7 , eabd1996 (2021).

Morgan, A. C. et al. Socioeconomic roots of academic faculty. Nat. Hum. Behav. 6 , 1625–1633 (2022).

San Francisco Declaration on Research Assessment (DORA) (American Society for Cell Biology, 2012).

Falk‐Krzesinski, H. J. et al. Advancing the science of team science. Clin. Transl. Sci. 3 , 263–266 (2010).

Cooke, N. J. et al. Enhancing the Effectiveness of Team Science (National Academies Press, 2015).

Börner, K. et al. A multi-level systems perspective for the science of team science. Sci. Transl. Med. 2 , 49cm24 (2010).

Leahey, E. From sole investigator to team scientist: trends in the practice and study of research collaboration. Annu. Rev. Sociol. 42 , 81–100 (2016).

AlShebli, B. K., Rahwan, T. & Woon, W. L. The preeminence of ethnic diversity in scientific collaboration. Nat. Commun. 9 , 5163 (2018).

Hsiehchen, D., Espinoza, M. & Hsieh, A. Multinational teams and diseconomies of scale in collaborative research. Sci. Adv. 1 , e1500211 (2015).

Koning, R., Samila, S. & Ferguson, J.-P. Who do we invent for? Patents by women focus more on women’s health, but few women get to invent. Science 372 , 1345–1348 (2021).

Barabâsi, A.-L. et al. Evolution of the social network of scientific collaborations. Physica A 311 , 590–614 (2002).

Newman, M. E. Scientific collaboration networks. I. Network construction and fundamental results. Phys. Rev. E 64 , 016131 (2001).

Newman, M. E. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys. Rev. E 64 , 016132 (2001).

Palla, G., Barabási, A.-L. & Vicsek, T. Quantifying social group evolution. Nature 446 , 664–667 (2007).

Ross, M. B. et al. Women are credited less in science than men. Nature 608 , 135–145 (2022).

Shen, H.-W. & Barabási, A.-L. Collective credit allocation in science. Proc. Natl Acad. Sci. USA 111 , 12325–12330 (2014).

Merton, R. K. Matthew effect in science. Science 159 , 56–63 (1968).

Ni, C., Smith, E., Yuan, H., Larivière, V. & Sugimoto, C. R. The gendered nature of authorship. Sci. Adv. 7 , eabe4639 (2021).

Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N. & Malone, T. W. Evidence for a collective intelligence factor in the performance of human groups. Science 330 , 686–688 (2010).

Feldon, D. F. et al. Postdocs’ lab engagement predicts trajectories of PhD students’ skill development. Proc. Natl Acad. Sci. USA 116 , 20910–20916 (2019).

Boudreau, K. J. et al. A field experiment on search costs and the formation of scientific collaborations. Rev. Econ. Stat. 99 , 565–576 (2017).

Holcombe, A. O. Contributorship, not authorship: use CRediT to indicate who did what. Publications 7 , 48 (2019).

Murray, D. et al. Unsupervised embedding of trajectories captures the latent structure of mobility. Preprint at https://doi.org/10.48550/arXiv.2012.02785 (2020).

Deville, P. et al. Career on the move: geography, stratification, and scientific impact. Sci. Rep. 4 , 4770 (2014).

Edmunds, L. D. et al. Why do women choose or reject careers in academic medicine? A narrative review of empirical evidence. Lancet 388 , 2948–2958 (2016).

Waldinger, F. Peer effects in science: evidence from the dismissal of scientists in Nazi Germany. Rev. Econ. Stud. 79 , 838–861 (2012).

Agrawal, A., McHale, J. & Oettl, A. How stars matter: recruiting and peer effects in evolutionary biology. Res. Policy 46 , 853–867 (2017).

Fiore, S. M. Interdisciplinarity as teamwork: how the science of teams can inform team science. Small Group Res. 39 , 251–277 (2008).

Hvide, H. K. & Jones, B. F. University innovation and the professor’s privilege. Am. Econ. Rev. 108 , 1860–1898 (2018).

Murray, F., Aghion, P., Dewatripont, M., Kolev, J. & Stern, S. Of mice and academics: examining the effect of openness on innovation. Am. Econ. J. Econ. Policy 8 , 212–252 (2016).

Radicchi, F., Fortunato, S. & Castellano, C. Universality of citation distributions: toward an objective measure of scientific impact. Proc. Natl Acad. Sci. USA 105 , 17268–17272 (2008).

Waltman, L., van Eck, N. J. & van Raan, A. F. Universality of citation distributions revisited. J. Am. Soc. Inf. Sci. Technol. 63 , 72–77 (2012).

Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286 , 509–512 (1999).

de Solla Price, D. A general theory of bibliometric and other cumulative advantage processes. J. Am. Soc. Inf. Sci. 27 , 292–306 (1976).

Cole, S. Age and scientific performance. Am. J. Sociol. 84 , 958–977 (1979).

Ke, Q., Ferrara, E., Radicchi, F. & Flammini, A. Defining and identifying sleeping beauties in science. Proc. Natl Acad. Sci. USA 112 , 7426–7431 (2015).

Bornmann, L., de Moya Anegón, F. & Leydesdorff, L. Do scientific advancements lean on the shoulders of giants? A bibliometric investigation of the Ortega hypothesis. PLoS ONE 5 , e13327 (2010).

Mukherjee, S., Romero, D. M., Jones, B. & Uzzi, B. The nearly universal link between the age of past knowledge and tomorrow’s breakthroughs in science and technology: the hotspot. Sci. Adv. 3 , e1601315 (2017).

Packalen, M. & Bhattacharya, J. NIH funding and the pursuit of edge science. Proc. Natl Acad. Sci. USA 117 , 12011–12016 (2020).

Zeng, A., Fan, Y., Di, Z., Wang, Y. & Havlin, S. Fresh teams are associated with original and multidisciplinary research. Nat. Hum. Behav. 5 , 1314–1322 (2021).

Newman, M. E. The structure of scientific collaboration networks. Proc. Natl Acad. Sci. USA 98 , 404–409 (2001).

Larivière, V., Ni, C., Gingras, Y., Cronin, B. & Sugimoto, C. R. Bibliometrics: global gender disparities in science. Nature 504 , 211–213 (2013).

West, J. D., Jacquet, J., King, M. M., Correll, S. J. & Bergstrom, C. T. The role of gender in scholarly authorship. PLoS ONE 8 , e66212 (2013).

Gao, J., Yin, Y., Myers, K. R., Lakhani, K. R. & Wang, D. Potentially long-lasting effects of the pandemic on scientists. Nat. Commun. 12 , 6188 (2021).

Jones, B. F., Wuchty, S. & Uzzi, B. Multi-university research teams: shifting impact, geography, and stratification in science. Science 322 , 1259–1262 (2008).

Chu, J. S. & Evans, J. A. Slowed canonical progress in large fields of science. Proc. Natl Acad. Sci. USA 118 , e2021636118 (2021).

Wang, J., Veugelers, R. & Stephan, P. Bias against novelty in science: a cautionary tale for users of bibliometric indicators. Res. Policy 46 , 1416–1436 (2017).

Stringer, M. J., Sales-Pardo, M. & Amaral, L. A. Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal. J. Assoc. Inf. Sci. Technol. 61 , 1377–1385 (2010).

Bianconi, G. & Barabási, A.-L. Bose-Einstein condensation in complex networks. Phys. Rev. Lett. 86 , 5632 (2001).

Bianconi, G. & Barabási, A.-L. Competition and multiscaling in evolving networks. Europhys. Lett. 54 , 436 (2001).

Yin, Y. & Wang, D. The time dimension of science: connecting the past to the future. J. Informetr. 11 , 608–621 (2017).

Pan, R. K., Petersen, A. M., Pammolli, F. & Fortunato, S. The memory of science: Inflation, myopia, and the knowledge network. J. Informetr. 12 , 656–678 (2018).

Yin, Y., Wang, Y., Evans, J. A. & Wang, D. Quantifying the dynamics of failure across science, startups and security. Nature 575 , 190–194 (2019).

Candia, C. & Uzzi, B. Quantifying the selective forgetting and integration of ideas in science and technology. Am. Psychol. 76 , 1067 (2021).

Milojević, S. Principles of scientific research team formation and evolution. Proc. Natl Acad. Sci. USA 111 , 3984–3989 (2014).

Guimera, R., Uzzi, B., Spiro, J. & Amaral, L. A. N. Team assembly mechanisms determine collaboration network structure and team performance. Science 308 , 697–702 (2005).

Newman, M. E. Coauthorship networks and patterns of scientific collaboration. Proc. Natl Acad. Sci. USA 101 , 5200–5205 (2004).

Newman, M. E. Clustering and preferential attachment in growing networks. Phys. Rev. E 64 , 025102 (2001).

Iacopini, I., Milojević, S. & Latora, V. Network dynamics of innovation processes. Phys. Rev. Lett. 120 , 048301 (2018).

Kuhn, T., Perc, M. & Helbing, D. Inheritance patterns in citation networks reveal scientific memes. Phys. Rev. 4 , 041036 (2014).

Jia, T., Wang, D. & Szymanski, B. K. Quantifying patterns of research-interest evolution. Nat. Hum. Behav. 1 , 0078 (2017).

Zeng, A. et al. Increasing trend of scientists to switch between topics. Nat. Commun. https://doi.org/10.1038/s41467-019-11401-8 (2019).

Siudem, G., Żogała-Siudem, B., Cena, A. & Gagolewski, M. Three dimensions of scientific impact. Proc. Natl Acad. Sci. USA 117 , 13896–13900 (2020).

Petersen, A. M. et al. Reputation and impact in academic careers. Proc. Natl Acad. Sci. USA 111 , 15316–15321 (2014).

Jin, C., Song, C., Bjelland, J., Canright, G. & Wang, D. Emergence of scaling in complex substitutive systems. Nat. Hum. Behav. 3 , 837–846 (2019).

Hofman, J. M. et al. Integrating explanation and prediction in computational social science. Nature 595 , 181–188 (2021).

Lazer, D. et al. Computational social science. Science 323 , 721–723 (2009).

Lazer, D. M. et al. Computational social science: obstacles and opportunities. Science 369 , 1060–1062 (2020).

Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74 , 47 (2002).

Newman, M. E. The structure and function of complex networks. SIAM Rev. 45 , 167–256 (2003).

Song, C., Qu, Z., Blumm, N. & Barabási, A.-L. Limits of predictability in human mobility. Science 327 , 1018–1021 (2010).

Alessandretti, L., Aslak, U. & Lehmann, S. The scales of human mobility. Nature 587 , 402–407 (2020).

Pastor-Satorras, R. & Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86 , 3200 (2001).

Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys. 87 , 925 (2015).

Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).

Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).

Dong, Y., Johnson, R. A. & Chawla, N. V. Will this paper increase your h-index? Scientific impact prediction. In Proc. 8th ACM International Conference on Web Search and Data Mining, 149–158 (ACM 2015)

Xiao, S. et al. On modeling and predicting individual paper citation count over time. In IJCAI, 2676–2682 (IJCAI, 2016)

Fortunato, S. Community detection in graphs. Phys. Rep. 486 , 75–174 (2010).

Chen, C. Science mapping: a systematic review of the literature. J. Data Inf. Sci. 2 , 1–40 (2017).

CAS   Google Scholar  

Van Eck, N. J. & Waltman, L. Citation-based clustering of publications using CitNetExplorer and VOSviewer. Scientometrics 111 , 1053–1070 (2017).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436–444 (2015).

Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577 , 706–710 (2020).

Krenn, M. & Zeilinger, A. Predicting research trends with semantic and neural networks with an application in quantum physics. Proc. Natl Acad. Sci. USA 117 , 1910–1916 (2020).

Iten, R., Metger, T., Wilming, H., Del Rio, L. & Renner, R. Discovering physical concepts with neural networks. Phys. Rev. Lett. 124 , 010508 (2020).

Guimerà, R. et al. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Sci. Adv. 6 , eaav6971 (2020).

Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555 , 604–610 (2018).

Ryu, J. Y., Kim, H. U. & Lee, S. Y. Deep learning improves prediction of drug–drug and drug–food interactions. Proc. Natl Acad. Sci. USA 115 , E4304–E4311 (2018).

Kermany, D. S. et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172 , 1122–1131.e9 (2018).

Peng, H., Ke, Q., Budak, C., Romero, D. M. & Ahn, Y.-Y. Neural embeddings of scholarly periodicals reveal complex disciplinary organizations. Sci. Adv. 7 , eabb9004 (2021).

Youyou, W., Yang, Y. & Uzzi, B. A discipline-wide investigation of the replicability of psychology papers over the past two decades. Proc. Natl Acad. Sci. USA 120 , e2208863120 (2023).

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR) 54 , 1–35 (2021).

Way, S. F., Morgan, A. C., Larremore, D. B. & Clauset, A. Productivity, prominence, and the effects of academic environment. Proc. Natl Acad. Sci. USA 116 , 10729–10733 (2019).

Li, W., Aste, T., Caccioli, F. & Livan, G. Early coauthorship with top scientists predicts success in academic careers. Nat. Commun. 10 , 5170 (2019).

Hendry, D. F., Pagan, A. R. & Sargan, J. D. Dynamic specification. Handb. Econ. 2 , 1023–1100 (1984).

Jin, C., Ma, Y. & Uzzi, B. Scientific prizes and the extraordinary growth of scientific topics. Nat. Commun. 12 , 5619 (2021).

Azoulay, P., Ganguli, I. & Zivin, J. G. The mobility of elite life scientists: professional and personal determinants. Res. Policy 46 , 573–590 (2017).

Slavova, K., Fosfuri, A. & De Castro, J. O. Learning by hiring: the effects of scientists’ inbound mobility on research performance in academia. Organ. Sci. 27 , 72–89 (2016).

Sarsons, H. Recognition for group work: gender differences in academia. Am. Econ. Rev. 107 , 141–145 (2017).

Campbell, L. G., Mehtani, S., Dozier, M. E. & Rinehart, J. Gender-heterogeneous working groups produce higher quality science. PLoS ONE 8 , e79147 (2013).

Azoulay, P., Graff Zivin, J. S. & Wang, J. Superstar extinction. Q. J. Econ. 125 , 549–589 (2010).

Furman, J. L. & Stern, S. Climbing atop the shoulders of giants: the impact of institutions on cumulative research. Am. Econ. Rev. 101 , 1933–1963 (2011).

Williams, H. L. Intellectual property rights and innovation: evidence from the human genome. J. Polit. Econ. 121 , 1–27 (2013).

Rubin, A. & Rubin, E. Systematic Bias in the Progress of Research. J. Polit. Econ. 129 , 2666–2719 (2021).

Lu, S. F., Jin, G. Z., Uzzi, B. & Jones, B. The retraction penalty: evidence from the Web of Science. Sci. Rep. 3 , 3146 (2013).

Jin, G. Z., Jones, B., Lu, S. F. & Uzzi, B. The reverse Matthew effect: consequences of retraction in scientific teams. Rev. Econ. Stat. 101 , 492–506 (2019).

Azoulay, P., Bonatti, A. & Krieger, J. L. The career effects of scandal: evidence from scientific retractions. Res. Policy 46 , 1552–1569 (2017).

Goodman-Bacon, A. Difference-in-differences with variation in treatment timing. J. Econ. 225 , 254–277 (2021).

Callaway, B. & Sant’Anna, P. H. Difference-in-differences with multiple time periods. J. Econ. 225 , 200–230 (2021).

Hill, R. Searching for Superstars: Research Risk and Talent Discovery in Astronomy Working Paper (Massachusetts Institute of Technology, 2019).

Bagues, M., Sylos-Labini, M. & Zinovyeva, N. Does the gender composition of scientific committees matter? Am. Econ. Rev. 107 , 1207–1238 (2017).

Sampat, B. & Williams, H. L. How do patents affect follow-on innovation? Evidence from the human genome. Am. Econ. Rev. 109 , 203–236 (2019).

Moretti, E. & Wilson, D. J. The effect of state taxes on the geographical location of top earners: evidence from star scientists. Am. Econ. Rev. 107 , 1858–1903 (2017).

Jacob, B. A. & Lefgren, L. The impact of research grant funding on scientific productivity. J. Public Econ. 95 , 1168–1177 (2011).

Li, D. Expertise versus bias in evaluation: evidence from the NIH. Am. Econ. J. Appl. Econ. 9 , 60–92 (2017).

Pearl, J. Causal diagrams for empirical research. Biometrika 82 , 669–688 (1995).

Pearl, J. & Mackenzie, D. The Book of Why: The New Science of Cause and Effect (Basic Books, 2018).

Traag, V. A. Inferring the causal effect of journals on citations. Quant. Sci. Stud. 2 , 496–504 (2021).

Traag, V. & Waltman, L. Causal foundations of bias, disparity and fairness. Preprint at https://doi.org/10.48550/arXiv.2207.13665 (2022).

Imbens, G. W. Potential outcome and directed acyclic graph approaches to causality: relevance for empirical practice in economics. J. Econ. Lit. 58 , 1129–1179 (2020).

Heckman, J. J. & Pinto, R. Causality and Econometrics (National Bureau of Economic Research, 2022).

Aggarwal, I., Woolley, A. W., Chabris, C. F. & Malone, T. W. The impact of cognitive style diversity on implicit learning in teams. Front. Psychol. 10 , 112 (2019).

Balietti, S., Goldstone, R. L. & Helbing, D. Peer review and competition in the Art Exhibition Game. Proc. Natl Acad. Sci. USA 113 , 8414–8419 (2016).

Paulus, F. M., Rademacher, L., Schäfer, T. A. J., Müller-Pinzler, L. & Krach, S. Journal impact factor shapes scientists’ reward signal in the prospect of publication. PLoS ONE 10 , e0142537 (2015).

Williams, W. M. & Ceci, S. J. National hiring experiments reveal 2:1 faculty preference for women on STEM tenure track. Proc. Natl Acad. Sci. USA 112 , 5360–5365 (2015).

Collaboration, O. S. Estimating the reproducibility of psychological science. Science 349 , aac4716 (2015).

Camerer, C. F. et al. Evaluating replicability of laboratory experiments in economics. Science 351 , 1433–1436 (2016).

Camerer, C. F. et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat. Hum. Behav. 2 , 637–644 (2018).

Duflo, E. & Banerjee, A. Handbook of Field Experiments (Elsevier, 2017).

Tomkins, A., Zhang, M. & Heavlin, W. D. Reviewer bias in single versus double-blind peer review. Proc. Natl Acad. Sci. USA 114 , 12708–12713 (2017).

Blank, R. M. The effects of double-blind versus single-blind reviewing: experimental evidence from the American Economic Review. Am. Econ. Rev. 81 , 1041–1067 (1991).

Boudreau, K. J., Guinan, E. C., Lakhani, K. R. & Riedl, C. Looking across and looking beyond the knowledge frontier: intellectual distance, novelty, and resource allocation in science. Manage. Sci. 62 , 2765–2783 (2016).

Lane, J. et al. When Do Experts Listen to Other Experts? The Role of Negative Information in Expert Evaluations for Novel Projects Working Paper #21-007 (Harvard Business School, 2020).

Teplitskiy, M. et al. Do Experts Listen to Other Experts? Field Experimental Evidence from Scientific Peer Review (Harvard Business School, 2019).

Moss-Racusin, C. A., Dovidio, J. F., Brescoll, V. L., Graham, M. J. & Handelsman, J. Science faculty’s subtle gender biases favor male students. Proc. Natl Acad. Sci. USA 109 , 16474–16479 (2012).

Forscher, P. S., Cox, W. T., Brauer, M. & Devine, P. G. Little race or gender bias in an experiment of initial review of NIH R01 grant proposals. Nat. Hum. Behav. 3 , 257–264 (2019).

Dennehy, T. C. & Dasgupta, N. Female peer mentors early in college increase women’s positive academic experiences and retention in engineering. Proc. Natl Acad. Sci. USA 114 , 5964–5969 (2017).

Azoulay, P. Turn the scientific method on ourselves. Nature 484 , 31–32 (2012).

Download references

Acknowledgements

The authors thank all members of the Center for Science of Science and Innovation (CSSI) for invaluable comments. This work was supported by the Air Force Office of Scientific Research under award number FA9550-19-1-0354, National Science Foundation grant SBE 1829344, and the Alfred P. Sloan Foundation G-2019-12485.

Author information

Authors and affiliations.

Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA

Lu Liu, Benjamin F. Jones, Brian Uzzi & Dashun Wang

Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA

Kellogg School of Management, Northwestern University, Evanston, IL, USA

College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, USA

National Bureau of Economic Research, Cambridge, MA, USA

Benjamin F. Jones

Brookings Institution, Washington, DC, USA

McCormick School of Engineering, Northwestern University, Evanston, IL, USA


Corresponding author

Correspondence to Dashun Wang.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Human Behaviour thanks Ludo Waltman, Erin Leahey and Sarah Bratt for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article.

Liu, L., Jones, B.F., Uzzi, B. et al. Data, measurement and empirical methods in the science of science. Nat Hum Behav 7 , 1046–1058 (2023). https://doi.org/10.1038/s41562-023-01562-4


Received : 30 June 2022

Accepted : 17 February 2023

Published : 01 June 2023

Issue Date : July 2023

DOI : https://doi.org/10.1038/s41562-023-01562-4




Tulane University Libraries

Library Guides

The Empirical Research Paper: A Guide

What is empirical research, useful related guides.

  • Reading the Empirical Paper
  • Designing Empirical Research
  • Writing the Empirical Paper


Empirical Research consists of experiments that rely on observation and measurement to provide evidence about phenomena. Empirical research employs rigorous methods to test out theories and hypotheses (expectations) using real data instead of hunches or anecdotal observations. This type of research is easily identifiable as it always consists of the following pieces of information:

  • Introduction

This Guide offers a basic understanding of how to approach empirical research via Reading the Empirical Research Paper, Designing Empirical Research, and Writing an Empirical Paper.

  • Writing (Publishing, Research & Data) for the Health Sciences: A Guide by Mary Holt
  • Last Updated: May 31, 2023 1:37 PM
  • URL: https://libguides.tulane.edu/empirical

Creative Commons License

Penn State University Libraries

Empirical research in the social sciences and education.

  • What is Empirical Research and How to Read It
  • Finding Empirical Research in Library Databases
  • Designing Empirical Research
  • Ethics, Cultural Responsiveness, and Anti-Racism in Research
  • Citing, Writing, and Presenting Your Work

Contact the Librarian at your campus for more help!

Ellysa Cahoy

Introduction: What is Empirical Research?

Empirical research is based on observed and measured phenomena and derives knowledge from actual experience rather than from theory or belief. 

How do you know if a study is empirical? Read the subheadings within the article, book, or report and look for a description of the research "methodology."  Ask yourself: Could I recreate this study and test these results?

Key characteristics to look for:

  • Specific research questions to be answered
  • Definition of the population, behavior, or phenomena being studied
  • Description of the process used to study this population or phenomena, including selection criteria, controls, and testing instruments (such as surveys)

Another hint: some scholarly journals use a specific layout, called the "IMRaD" format, to communicate empirical research findings. Such articles typically have 4 components:

  • Introduction : sometimes called "literature review" -- what is currently known about the topic -- usually includes a theoretical framework and/or discussion of previous studies
  • Methodology: sometimes called "research design" -- how to recreate the study -- usually describes the population, research process, and analytical tools used in the present study
  • Results : sometimes called "findings" -- what was learned through the study -- usually appears as statistical data or as substantial quotations from research participants
  • Discussion : sometimes called "conclusion" or "implications" -- why the study is important -- usually describes how the research results influence professional practices or future studies

Reading and Evaluating Scholarly Materials

Reading research can be a challenge. However, the tutorials and videos below can help. They explain what scholarly articles look like, how to read them, and how to evaluate them:

  • CRAAP Checklist A frequently-used checklist that helps you examine the currency, relevance, authority, accuracy, and purpose of an information source.
  • IF I APPLY A newer model of evaluating sources which encourages you to think about your own biases as a reader, as well as concerns about the item you are reading.
  • Credo Video: How to Read Scholarly Materials (4 min.)
  • Credo Tutorial: How to Read Scholarly Materials
  • Credo Tutorial: Evaluating Information
  • Credo Video: Evaluating Statistics (4 min.)
  • Last Updated: Feb 18, 2024 8:33 PM
  • URL: https://guides.libraries.psu.edu/emp


Empirical Research: Quantitative & Qualitative

  • Empirical Research

Introduction: What is Empirical Research?


Empirical research  is based on phenomena that can be observed and measured. Empirical research derives knowledge from actual experience rather than from theory or belief. 

Key characteristics of empirical research include:

  • Specific research questions to be answered;
  • Definitions of the population, behavior, or phenomena being studied;
  • Description of the methodology or research design used to study this population or phenomena, including selection criteria, controls, and testing instruments (such as surveys);
  • Two basic research processes or methods in empirical research: quantitative methods and qualitative methods (see the rest of the guide for more about these methods).

(based on the original from the Connelly Library of LaSalle University)


Empirical Research: Qualitative vs. Quantitative

Learn about common types of journal articles that use APA Style, including empirical studies; meta-analyses; literature reviews; and replication, theoretical, and methodological articles.


Quantitative Research

A quantitative research project is characterized by having a population about which the researcher wants to draw conclusions, but it is not possible to collect data on the entire population.

  • For an observational study, it is necessary to select a proper, statistical random sample and to use methods of statistical inference to draw conclusions about the population. 
  • For an experimental study, it is necessary to have a random assignment of subjects to experimental and control groups in order to use methods of statistical inference.

Statistical methods are used in all three stages of a quantitative research project.

For observational studies, the data are collected using statistical sampling theory. Then, the sample data are analyzed using descriptive statistical analysis. Finally, generalizations are made from the sample data to the entire population using statistical inference.
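The three stages for an observational study can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is synthetic, the function name `sample_and_estimate` is hypothetical, and the 95% interval uses a normal approximation, which is one of several possible inference methods.

```python
import random
import statistics

def sample_and_estimate(population, n, seed=0):
    """Draw a simple random sample and estimate the population mean."""
    rng = random.Random(seed)
    # Stage 1: data collection by simple random sampling (without replacement).
    sample = rng.sample(population, n)
    # Stage 2: descriptive statistics of the sample.
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    # Stage 3: inference about the population (normal-approximation 95% interval).
    se = sd / n ** 0.5
    z = statistics.NormalDist().inv_cdf(0.975)
    return {"mean": mean, "sd": sd, "ci": (mean - z * se, mean + z * se)}

# Hypothetical population of 10,000 synthetic scores (not real data).
_rng = random.Random(42)
population = [_rng.gauss(50, 10) for _ in range(10_000)]

result = sample_and_estimate(population, n=200)
print(result)
```

Fixing the seed makes the sample reproducible, which matches the guide's emphasis on being able to recreate a study.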

For experimental studies, the subjects are allocated to experimental and control group using randomizing methods. Then, the experimental data are analyzed using descriptive statistical analysis. Finally, just as for observational data, generalizations are made to a larger population.
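For an experimental study, the same three stages might look like the sketch below. It assumes an equal split into two groups and uses a permutation test for the inference step; the source does not name a specific test, so that choice, the function names, and the simulated outcome scores are all illustrative assumptions.

```python
import random
import statistics

def assign_groups(subjects, seed=0):
    """Randomly split subjects into experimental and control groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def permutation_p_value(treated, control, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = statistics.mean(treated) - statistics.mean(control)
    pooled = list(treated) + list(control)
    k = len(treated)
    extreme = 0
    for _ in range(n_perm):
        # Re-label outcomes at random, as if the treatment had no effect.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:k]) - statistics.mean(pooled[k:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical subjects and simulated outcome scores (not real data).
subjects = list(range(40))
experimental, control = assign_groups(subjects)
_rng = random.Random(1)
scores_exp = [_rng.gauss(55, 10) for _ in experimental]
scores_ctl = [_rng.gauss(50, 10) for _ in control]

observed, p_value = permutation_p_value(scores_exp, scores_ctl)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.3f}")
```

The permutation test is used here because its logic follows directly from random assignment: if the treatment had no effect, any re-labeling of subjects would be equally likely.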

Iversen, G. (2004). Quantitative research . In M. Lewis-Beck, A. Bryman, & T. Liao (Eds.), Encyclopedia of social science research methods . (pp. 897-898). Thousand Oaks, CA: SAGE Publications, Inc.

Qualitative Research

What makes a work deserving of the label qualitative research is the demonstrable effort to produce richly and relevantly detailed descriptions and particularized interpretations of people and the social, linguistic, material, and other practices and events that shape and are shaped by them.

Qualitative research typically includes, but is not limited to, discerning the perspectives of these people, or what is often referred to as the actor’s point of view. Although both philosophically and methodologically a highly diverse entity, qualitative research is marked by certain defining imperatives that include its case (as opposed to its variable) orientation, sensitivity to cultural and historical context, and reflexivity. 

In its many guises, qualitative research is a form of empirical inquiry that typically entails some form of purposive sampling for information-rich cases; in-depth interviews and open-ended interviews, lengthy participant/field observations, and/or document or artifact study; and techniques for analysis and interpretation of data that move beyond the data generated and their surface appearances. 

Sandelowski, M. (2004).  Qualitative research . In M. Lewis-Beck, A. Bryman, & T. Liao (Eds.),  Encyclopedia of social science research methods . (pp. 893-894). Thousand Oaks, CA: SAGE Publications, Inc.

  • Last Updated: Mar 22, 2024 10:47 AM
  • URL: https://library.piedmont.edu/empirical-research


Empirical Research: A Comprehensive Guide for Academics 


Empirical research relies on gathering and studying real, observable data. The term ‘empirical’ comes from the Greek word ‘empeirikos,’ meaning ‘experienced’ or ‘based on experience.’ So, what is empirical research? Instead of using theories or opinions, empirical research depends on real data obtained through direct observation or experimentation.

Why Empirical Research?

Empirical research plays a key role in checking or improving current theories, providing a systematic way to grow knowledge across different areas. By focusing on objectivity, it makes research findings more trustworthy, which is critical in research fields like medicine, psychology, economics, and public policy. In the end, the strengths of empirical research lie in deepening our awareness of the world and improving our capacity to tackle problems wisely. 1,2  

Qualitative and Quantitative Methods

There are two main types of empirical research methods – qualitative and quantitative. 3,4 Qualitative research delves into intricate phenomena using non-numerical data, such as interviews or observations, to offer in-depth insights into human experiences. In contrast, quantitative research analyzes numerical data to spot patterns and relationships, aiming for objectivity and the ability to apply findings to a wider context. 

Steps for Conducting Empirical Research

When it comes to conducting research, there are some simple steps that researchers can follow. 5,6  

  • Create Research Hypothesis:  Clearly state the specific question you want to answer or the hypothesis you want to explore in your study. 
  • Examine Existing Research:  Read and study existing research on your topic. Understand what’s already known, identify existing gaps in knowledge, and create a framework for your own study based on what you learn. 
  • Plan Your Study:  Decide how you’ll conduct your research—whether through qualitative methods, quantitative methods, or a mix of both. Choose suitable techniques like surveys, experiments, interviews, or observations based on your research question. 
  • Develop Research Instruments:  Create reliable research collection tools, such as surveys or questionnaires, to help you collate data. Ensure these tools are well-designed and effective. 
  • Collect Data:  Systematically gather the information you need for your research according to your study design and protocols using the chosen research methods. 
  • Data Analysis:  Analyze the collected data using suitable statistical or qualitative methods that align with your research question and objectives. 
  • Interpret Results:  Understand and explain the significance of your analysis results in the context of your research question or hypothesis. 
  • Draw Conclusions:  Summarize your findings and draw conclusions based on the evidence. Acknowledge any study limitations and propose areas for future research. 

Advantages of Empirical Research

Empirical research is valuable because it stays objective by relying on observable data, lessening the impact of personal biases. This objectivity boosts the trustworthiness of research findings. Also, using precise quantitative methods helps in accurate measurement and statistical analysis. This precision ensures researchers can draw reliable conclusions from numerical data, strengthening our understanding of the studied phenomena. 4  

Disadvantages of Empirical Research

While empirical research has notable strengths, researchers must also be aware of its limitations when deciding on the right research method for their study. 4 One significant drawback of empirical research is the risk of oversimplifying complex phenomena, especially when relying solely on quantitative methods. These methods may struggle to capture the richness and nuances present in certain social, cultural, or psychological contexts. Another challenge is the potential for confounding variables or biases during data collection, impacting result accuracy.

Tips for Empirical Writing

In empirical research, the writing is usually done in research papers, articles, or reports. The empirical writing follows a set structure, and each section has a specific role. Here are some tips for your empirical writing. 7   

  • Define Your Objectives:  When you write about your research, start by making your goals clear. Explain what you want to find out or prove in a simple and direct way. This helps guide your research and lets others know what you have set out to achieve. 
  • Be Specific in Your Literature Review:  In the part where you talk about what others have studied before you, focus on research that directly relates to your research question. Keep it short and pick studies that help explain why your research is important. This part sets the stage for your work. 
  • Explain Your Methods Clearly : When you talk about how you did your research (Methods), explain it in detail. Be clear about your research plan, who took part, and what you did; this helps others understand and trust your study. Also, be honest about any rules you follow to make sure your study is ethical and reproducible. 
  • Share Your Results Clearly : After doing your empirical research, share what you found in a simple way. Use tables or graphs to make it easier for your audience to understand your research. Also, talk about any numbers you found and clearly state if they are important or not. Ensure that others can see why your research findings matter. 
  • Talk About What Your Findings Mean:  In the part where you discuss your research results, explain what they mean. Discuss why your findings are important and if they connect to what others have found before. Be honest about any problems with your study and suggest ideas for more research in the future. 
  • Wrap It Up Clearly:  Finally, end your empirical research paper by summarizing what you found and why it’s important. Remind everyone why your study matters. Keep your writing clear and fix any mistakes before you share it. Ask someone you trust to read it and give you feedback before you finish. 

References:  

  1. Empirical Research in the Social Sciences and Education, Penn State University Libraries. Available online at https://guides.libraries.psu.edu/emp
  2. How to conduct empirical research, Emerald Publishing. Available online at https://www.emeraldgrouppublishing.com/how-to/research-methods/conduct-empirical-research
  3. Empirical Research: Quantitative & Qualitative, Arrendale Library, Piedmont University. Available online at https://library.piedmont.edu/empirical-research
  4. Bouchrika, I. What Is Empirical Research? Definition, Types & Samples in 2024. Research.com, January 2024. Available online at https://research.com/research/what-is-empirical-research
  5. Quantitative and Empirical Research vs. Other Types of Research. California State University, April 2023. Available online at https://libguides.csusb.edu/quantitative
  6. Empirical Research, Definitions, Methods, Types and Examples, Studocu.com website. Available online at https://www.studocu.com/row/document/uganda-christian-university/it-research-methods/emperical-research-definitions-methods-types-and-examples/55333816
  7. Writing an Empirical Paper in APA Style. Psychology Writing Center, University of Washington. Available online at https://psych.uw.edu/storage/writing_center/APApaper.pdf



  • University of Memphis Libraries
  • Research Guides

Empirical Research: Defining, Identifying, & Finding

  • Defining Empirical Research
  • Introduction

The Methods Section


The Methods section exists to explain to the reader how the researchers collected and analyzed their data. The authors need to convince the reader that the methods used can provide an answer to the research question and that the reader can trust the results. 

What Criteria to Look For

Since the "Methods" section describes how the research is being conducted, it is probably the most important section for identifying empirical research. It is where you are likely to find many criteria, including the

  • Methodology,  and

The Methods section is also connected to how to  recreate  the study. By providing sufficient information about their design, methodology, and sample, the authors of the research let other researchers know what they would need to do to recreate it. Additionally, the authors may also discuss how the methodology affects how much you can  generalize  from their results. 

Finding the Criteria

When looking for the sample, look for where the authors discuss who was included in the research and why the authors wanted that group. Design and methodology tend to mix together. Look for where the researchers discuss how they identified the sample, how they performed the research, and how they analyzed their results. 

What the Section Might Be Called

The Methods section has a few common heading labels: 

  • Methodology

You might see "research" or "study" added to any of the above headings. The section may also be broken down into headings or subheadings for specific aspects of the methods, such as "participants," "sample," "measures," or "data analysis."  

  • Has a single "Methods" heading for the section which starts on page 1004.
  • The first paragraph of the section discusses the sample of "24 LGBTQ students attending a public university in the southeastern United States" with a detailed breakdown of the demographics of that sample, which is then also conveyed in Table 1 on page 1005. 
  • The first paragraph also introduces broadly that the methodology will be interviews, and then in the second paragraph, which stretches from page 1004 to page 1005, the authors further explain methodology and design decisions including how participants were recruited, how the interviews were conducted, what types of questions were asked, how the data was collected, and how the authors analyzed that data. 
  • The last paragraph of the section, spanning the end of page 1005 and beginning of page 1006, begins to discuss how readers might be able to generalize the results based on limitations in the sample. 
  • Section labeled "Method" and begins on page 540.
  • Has subheadings: Sample (page 540), Measures (page 541), and Data Analysis (page 541).
  • "Sample" subheading covers the sample while "Measures" and "Data Analysis" cover methodology and design. 
  • Section labeled "Method" and begins on page 573.
  • Has subheadings: Participants, Procedure, and Measures.
  • "Participants" subheading covers sample while "Procedure" and "Measures" cover methodology and design. 
  • Last Updated: Apr 2, 2024 11:25 AM
  • URL: https://libguides.memphis.edu/empirical-research
  • What is Empirical Research Study? [Examples & Method]

busayo.longe

The bulk of human decisions relies on evidence, that is, what can be measured or proven as valid. In choosing between plausible alternatives, individuals are more likely to tilt towards the option that is proven to work, and this is the same approach adopted in empirical research. 

In empirical research, the researcher arrives at outcomes by testing his or her empirical evidence using qualitative or quantitative methods of observation, as determined by the nature of the research. An empirical research study is set apart from other research approaches by its methodology and features; hence, it is important for every researcher to know what constitutes this investigation method.

What is Empirical Research? 

Empirical research is a type of research methodology that makes use of verifiable evidence in order to arrive at research outcomes. In other words, this  type of research relies solely on evidence obtained through observation or scientific data collection methods. 

Empirical research can be carried out using qualitative or quantitative observation methods, depending on the data sample, that is, quantifiable data or non-numerical data. Unlike theoretical research that depends on preconceived notions about the research variables, empirical research carries out a scientific investigation to measure the experimental probability of the research variables.

Characteristics of Empirical Research

  • Research Questions

An empirical research study begins with a set of research questions that guide the investigation. In many cases, these research questions constitute the research hypothesis, which is tested using qualitative and quantitative methods as dictated by the nature of the research.

In an empirical research study, the research questions are built around the core of the research, that is, the central issue which the research seeks to resolve. They also determine the course of the research by highlighting the specific objectives and aims of the systematic investigation. 

  • Definition of the Research Variables

The research variables are clearly defined in terms of their population, types, characteristics, and behaviors. In other words, the data sample is clearly delimited and placed within the context of the research. 

  • Description of the Research Methodology

An empirical research study also clearly outlines the methods adopted in the systematic investigation. Here, the research process is described in detail, including the selection criteria for the data sample, the qualitative or quantitative research methods, and the testing instruments.

An empirical research study is usually divided into 4 parts: the introduction, methodology, findings, and discussions. The introduction provides a background of the empirical study, while the methodology describes the research design, processes, and tools for the systematic investigation.

The findings refer to the research outcomes and they can be outlined as statistical data or in the form of information obtained through the qualitative observation of research variables. The discussions highlight the significance of the study and its contributions to knowledge. 

Uses of Empirical Research

Without any doubt, empirical research is one of the most useful methods of systematic investigation. It can be used for validating multiple research hypotheses in different fields including Law, Medicine, and Anthropology. 

  • Empirical Research in Law : In Law, empirical research is used to study institutions, rules, procedures, and personnel of the law, with a view to understanding how they operate and what effects they have. It makes use of direct methods rather than secondary sources, and this helps you to arrive at more valid conclusions.
  • Empirical Research in Medicine : In medicine, empirical research is used to test and validate multiple hypotheses and increase human knowledge.
  • Empirical Research in Anthropology : In anthropology, empirical research is used as an evidence-based systematic method of inquiry into patterns of human behaviors and cultures. This helps to validate and advance human knowledge.

The Empirical Research Cycle

The empirical research cycle is a 5-phase cycle that outlines the systematic processes for conducting empirical research. It was developed by the Dutch psychologist A.D. de Groot in the 1940s, and it aligns 5 important stages that can be viewed as deductive approaches to empirical research.

In the empirical research methodological cycle, all processes are interconnected and none of the processes is more important than the other. This cycle clearly outlines the different phases involved in generating the research hypotheses and testing these hypotheses systematically using the empirical data. 

  • Observation: This is the process of gathering empirical data for the research. At this stage, the researcher gathers relevant empirical data using qualitative or quantitative observation methods, and this goes ahead to inform the research hypotheses.
  • Induction: At this stage, the researcher makes use of inductive reasoning in order to arrive at a general probable research conclusion based on his or her observation. The researcher generates a general assumption that attempts to explain the empirical data and s/he goes on to observe the empirical data in line with this assumption.
  • Deduction: This is the deductive reasoning stage. This is where the researcher generates hypotheses by applying logic and rationality to his or her observation.
  • Testing: Here, the researcher puts the hypotheses to test using qualitative or quantitative research methods. In the testing stage, the researcher combines relevant instruments of systematic investigation with empirical methods in order to arrive at objective results that support or negate the research hypotheses.
  • Evaluation: Evaluation is the final stage in an empirical research study. Here, the researcher outlines the empirical data, the research findings, and the supporting arguments, plus any challenges encountered during the research process.

This information is useful for further research. 


Examples of Empirical Research 

  • An empirical research study can be carried out to determine if listening to happy music improves the mood of individuals. The researcher may need to conduct an experiment that involves exposing individuals to happy music to see if this improves their moods.

The findings from such an experiment will provide empirical evidence that confirms or refutes the hypotheses. 

  • An empirical research study can also be carried out to determine the effects of a new drug on specific groups of people. The researcher may expose the research subjects to controlled quantities of the drug and observe the effects over a specific period of time to gather empirical data.
  • Another example of empirical research is measuring the levels of noise pollution found in an urban area to determine the average levels of sound exposure experienced by its inhabitants. Here, the researcher may have to administer questionnaires or carry out a survey in order to gather relevant data based on the experiences of the research subjects.
  • Empirical research can also be carried out to determine the relationship between seasonal migration and the body mass of flying birds. A researcher may need to observe the birds and carry out necessary observation and experimentation in order to arrive at objective outcomes that answer the research question.

Empirical Research Data Collection Methods

Empirical data can be gathered using qualitative and quantitative data collection methods. Quantitative data collection methods are used for numerical data gathering while qualitative data collection processes are used to gather empirical data that cannot be quantified, that is, non-numerical data. 

The following are common methods of gathering data in empirical research

  • Survey/ Questionnaire

A survey is a method of data gathering that is typically employed by researchers to gather large sets of data from a specific number of respondents with regard to a research subject. This method of data gathering is often used for quantitative data collection, although it can also be deployed during qualitative research.

A survey contains a set of questions that can range from close-ended to open-ended questions together with other question types that revolve around the research subject. A survey can be administered physically or with the use of online data-gathering platforms like Formplus. 
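As a rough sketch of how the two question types differ at analysis time, the snippet below tallies a closed-ended question into counts and proportions and keeps open-ended answers as text for later qualitative coding. The function name, the answer options, and the responses are hypothetical and are not tied to any particular survey platform.

```python
from collections import Counter

def summarize_closed_question(responses, options):
    """Return {option: (count, share of valid answers)} for a closed-ended question."""
    # Ignore skipped or invalid answers so shares are based on valid responses only.
    counts = Counter(r for r in responses if r in options)
    answered = sum(counts.values())
    return {
        opt: (counts[opt], counts[opt] / answered if answered else 0.0)
        for opt in options
    }

# Hypothetical responses to one closed-ended and one open-ended question.
options = ["Agree", "Neutral", "Disagree"]
closed = ["Agree", "Agree", "Neutral", "Disagree", "Agree", None]
open_ended = ["It helps me focus.", "", "Too distracting in meetings."]

summary = summarize_closed_question(closed, options)
comments = [text for text in open_ended if text.strip()]  # keep non-empty answers

print(summary)
print(comments)
```

Closed-ended answers reduce to numbers that can feed statistical analysis, while the open-ended comments remain non-numerical data that would need qualitative interpretation.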

  • Experiment

Empirical data can also be collected by carrying out an experiment. An experiment is a controlled simulation in which one or more of the research variables is manipulated using a set of interconnected processes in order to confirm or refute the research hypotheses.

An experiment is a useful method of measuring causality, that is, the cause-and-effect relationship between independent and dependent variables in a research environment. It is an integral data gathering method in an empirical research study because it involves testing calculated assumptions in order to arrive at the most valid data and research outcomes.
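One simple way to check whether an experimental treatment made a difference is a permutation test on the two group means. The sketch below is only an illustration under stated assumptions: the function name is made up, the scores are invented, and a real study would choose its test based on the design and the data.

```python
import random
from statistics import mean


def permutation_test(treatment, control, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (treatment minus control) and the
    share of random relabelings whose absolute difference is at least
    as large as the observed one (an approximate p-value).
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Made-up outcome scores for illustration only.
treatment_scores = [7.1, 6.8, 7.9, 8.2, 7.5, 6.9, 8.0, 7.7]
control_scores = [6.0, 6.4, 5.9, 6.8, 6.1, 6.6, 5.7, 6.3]

diff, p_value = permutation_test(treatment_scores, control_scores)
print(f"mean difference = {diff:.2f}, approximate p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to come from chance alone, but it only supports a causal reading when subjects were assigned to the groups at random.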

  • Case Study

The case study method is another common data gathering method in an empirical research study. It involves sifting through and analyzing relevant cases and real-life experiences about the research subject or research variables in order to discover in-depth information that can serve as empirical data.

  • Observation

The observational method is a method of qualitative data gathering that requires the researcher to study the behaviors of research variables in their natural environments in order to gather relevant information that can serve as empirical data.

How to collect Empirical Research Data with Questionnaire

With Formplus, you can create a survey or questionnaire for collecting empirical data from your research subjects. Formplus also offers multiple form sharing options so that you can share your empirical research survey to research subjects via a variety of methods.

Here is a step-by-step guide of how to collect empirical data using Formplus:

Sign in to Formplus

In the Formplus builder, you can easily create your empirical research survey by dragging and dropping preferred fields into your form. To access the Formplus builder, you will need to create an account on Formplus. 

Once you do this, sign in to your account and click on “Create Form” to begin.

Edit Form Title

Click on the field provided to input your form title, for example, “Empirical Research Survey”.

Edit Form  

  • Click on the edit button to edit the form.
  • Add Fields: Drag and drop preferred form fields into your form in the Formplus builder inputs column. There are several field input options for survey forms in the Formplus builder.
  • Edit fields
  • Click on “Save”
  • Preview form.

Customize Form

Formplus allows you to add unique features to your empirical research survey form. You can personalize your survey using various customization options. Here, you can add background images, your organization’s logo, and use other styling options. You can also change the display theme of your form. 

  • Share your Form Link with Respondents

Formplus offers multiple form sharing options that enable you to easily share your empirical research survey form with respondents. You can use the direct social media sharing buttons to share your form link to your organization’s social media pages.

You can send out your survey form as email invitations to your research subjects too. If you wish, you can share your form’s QR code or embed it on your organization’s website for easy access. 

Empirical vs Non-Empirical Research

Empirical and non-empirical research are common methods of systematic investigation employed by researchers. Unlike empirical research that tests hypotheses in order to arrive at valid research outcomes, non-empirical research theorizes the logical assumptions of research variables. 

Definition: Empirical research is a research approach that makes use of evidence-based data while non-empirical research is a research approach that makes use of theoretical data. 

Method: In empirical research, the researcher arrives at valid outcomes by mainly observing research variables, creating a hypothesis and experimenting on research variables to confirm or refute the hypothesis. In non-empirical research, the researcher relies on inductive and deductive reasoning to theorize logical assumptions about the research subjects.

The major difference between the research methodology of empirical and non-empirical research is that while the assumptions are tested in empirical research, they are entirely theorized in non-empirical research.

Data Sample: Empirical research makes use of empirical data while non-empirical research does not make use of empirical data. Empirical data refers to information that is gathered through experience or observation. 

Unlike empirical research, theoretical or non-empirical research does not rely on data gathered through evidence. Rather, it works with logical assumptions and beliefs about the research subject. 

Data Collection Methods: Empirical research makes use of quantitative and qualitative data gathering methods which may include surveys, experiments, and methods of observation. This helps the researcher to gather empirical data, that is, data backed by evidence.

Non-empirical research, on the other hand, does not make use of qualitative or quantitative methods of data collection. Instead, the researcher gathers relevant data through critical studies, systematic review and meta-analysis.

Advantages of Empirical Research 

  • Empirical research is flexible. In this type of systematic investigation, the researcher can adjust the research methodology including the data sample size, data gathering methods plus the data analysis methods as necessitated by the research process.
  • It helps the researcher to understand how the research outcomes can be influenced by different research environments.
  • Empirical research study helps the researcher to develop relevant analytical and observation skills that can be useful in dynamic research contexts.
  • This type of research approach allows the researcher to control multiple research variables in order to arrive at the most relevant research outcomes.
  • Empirical research is widely considered one of the most authentic and competent research designs.
  • It improves the internal validity of traditional research using a variety of experiments and research observation methods.

Disadvantages of Empirical Research 

  • An empirical research study is time-consuming because the researcher needs to gather the empirical data from multiple sources, which typically takes a lot of time.
  • It is not a cost-effective research approach. Usually, this method of research incurs a lot of cost because of the monetary demands of the field research.
  • It may be difficult to gather the needed empirical data sample because of the multiple data gathering methods employed in an empirical research study.
  • It may be difficult to gain access to some communities and firms during the data gathering process and this can affect the validity of the research.
  • The report from an empirical research study is intensive and can be very lengthy in nature.

Conclusion 

Empirical research is an important method of systematic investigation because it gives the researcher the opportunity to test the validity of different assumptions, in the form of hypotheses, before arriving at any findings. Hence, it is a more research approach. 

Different quantitative and qualitative methods of data gathering are employed during an empirical research study, depending on the purpose of the research. These include surveys, experiments, and various observational methods. Surveys are one of the most common methods of empirical data collection and they can be administered online or physically.

You can use Formplus to create and administer your online empirical research survey. Formplus allows you to create survey forms that you can share with target respondents in order to obtain valuable feedback about your research context, question or subject. 

In the form builder, you can add different fields to your survey form and you can also modify these form fields to suit your research process. Sign up to Formplus to access the form builder and start creating powerful online empirical research survey forms. 

  • busayo.longe

What is empirical research: Methods, types & examples

Defne Çobanoğlu

Having opinions on matters based on observation is okay sometimes. The same goes for having theories about the problem you want to solve. However, some theories need to be tested. As Robert Oppenheimer says, “Theory will take you only so far.”

In that case, when you have your research question ready and you want to make sure your answer is correct, the next step is experimentation, because only then can you test your ideas and collect tangible information. Now, let us start with the empirical research definition:

  • What is empirical research?

Empirical research is a research type where the aim of the study is to find concrete and provable evidence. A researcher using this method to draw conclusions can use both quantitative and qualitative methods. Unlike theoretical research, empirical research uses scientific experimentation and investigation.

Using experimentation makes sense when you need to have tangible evidence to act on whatever you are planning to do. As the researcher, you can be a marketer who is planning on creating a new ad for the target audience, or you can be an educator who wants the best for the students. No matter how big or small, data gathered from the real world using this research helps break down the question at hand. 

  • When to use empirical research?

Empirical research methods are used when the researcher needs to gather and analyze direct, observable, and measurable data. Research findings are a great way to ground ideas in evidence. Here are some situations when one may need to do empirical research:

1. When quantitative or qualitative data is needed

There are times when a researcher, marketer, or producer needs to gather data on specific research questions to make an informed decision. And the concrete data gathered in the research process gives a good starting point.

2. When you need to test a hypothesis

When you have a hypothesis on a subject, you can test the hypothesis through observation or experiment. Making a planned study is a great way to collect information and test whether or not your hypothesis is correct.

3. When you want to establish causality

Experimental research is a good way to explore whether a change in one variable causes a change in another, rather than merely correlating with it. Researchers usually establish causality by changing the independent variable and observing whether the dependent variable changes accordingly.
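Random assignment is what lets an experiment separate cause from coincidence, because it spreads unknown differences between people evenly across groups on average. Here is a minimal Python sketch of that step. The function name and participant IDs are hypothetical.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups
    (the treatment group gets the extra person if the count is odd)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


participants = [f"P{i:02d}" for i in range(1, 11)]  # hypothetical IDs
groups = randomly_assign(participants, seed=42)
print(groups)
```

Fixing the seed makes the assignment reproducible, which helps when the procedure has to be documented or audited later.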

  • Types of empirical research

The aim of empirical research is to collect information about a subject from the people by doing experimentation and other data collection methods. However, the methods and data collected are divided into two groups: one collects numerical data, and the other one collects opinion-like data. Let us see the difference between these two types:

Quantitative research

Quantitative research methods are used to collect data in a numerical way. Therefore, the results gathered by these methods will be numbers, statistics, charts, etc. The results can be used to quantify behaviors, opinions, and other variables. Quantitative research methods are surveys, questionnaires, and experimental research.
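To show what "numbers, statistics, charts" can look like in practice, here is a small Python sketch that summarizes answers to a single close-ended survey item. The responses are invented and the `summarize` helper is a hypothetical name used only for this example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical answers to one close-ended item on a 1-5 agreement scale.
responses = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]


def summarize(values):
    """Return basic descriptive statistics for numeric survey answers."""
    counts = Counter(values)
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "counts": dict(sorted(counts.items())),
    }


summary = summarize(responses)
print(summary)
```

The frequency counts are the kind of output that would typically feed a bar chart of the answers.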

Qualitative research

Qualitative research methods are not used to collect numerical answers. Instead, they are used to collect the participants’ reasons, opinions, and other meaningful aspects. Qualitative research methods include case studies, observations, interviews, focus groups, and text analysis.

  • 5 steps to conduct empirical research

Necessary steps for empirical research

When you want to collect direct and concrete data on a subject, empirical research is a great way to go. And, just like every other project and research, it is best to have a clear structure in mind. This is even more important in studies that may take a long time, such as experiments that take years. Let us look at a clear plan on how to do empirical research:

1. Define the research question

The very first step of every study is to have the question you will explore ready, because you do not want to change your mind in the middle of the study after investing time and money in the experimentation.

2. Go through relevant literature

This is the step where you sit down and do desk research: you gather relevant data and see if other researchers have tried to explore similar research questions. If so, you can see how well they were able to answer the question or what kind of difficulties they faced during the research process.

3. Decide on the methodology

Once you are done going through the relevant literature, you can decide on which method or methods you can use. The appropriate methods are observation, experimentation, surveys, interviews, focus groups, etc.

4. Do data analysis

When you get to this step, it means you have successfully gathered enough data to analyze. Now, all you need to do is look at the data you collected and make an informed analysis.

5. Conclusion

This is the last step, where you are finished with the experimentation and data analysis process. Now, it is time to decide what to do with this information. You can publish a paper and make informed decisions about whatever your goal is.

  • Empirical research methodologies

Some essential methodologies to conduct empirical research

The aim of this type of research is to explore brand-new evidence and facts. Therefore, the data should be primary and gathered in real life, directly from people. There is more than one method for this goal, and it is up to the researcher to decide which one(s) to use. Let us see the methods of empirical research:

  • Observation

The method of observation is a great way to collect information on people without the effect of interference. The researcher can choose the appropriate area, time, or situation and observe the people and their interactions with one another. The researcher can be just an outside observer or can be a participant as an observer or a full participant.

  • Experimentation

The experimentation process can be done in the real world by intervening in some elements to unify the environment for all participants. This method can also be done in a laboratory environment. The experimentation process is good for being able to change the variables according to the aim of the study.

  • Case study

The case study method is done by making an in-depth analysis of already existing cases. When the parameters and variables are similar to the research question at hand, it is wise to go through what was researched before.

  • Focus groups

The focus group method is done by using a group of individuals or multiple groups and using their opinions, characteristics, and responses. The scientists gather the data from this group and generalize it to the whole population.

  • Surveys

Surveys are an effective way to gather data directly from people. It is a systematic approach to collecting information. If it is done in an online setting as an online survey, it would be even easier to reach out to people and ask their opinions in open-ended or close-ended questions.

  • Interviews

Interviews are similar to surveys as you are using questions to collect information and opinions of the people. Unlike a survey, this process is done face-to-face, as a phone call, or as a video call.

  • Advantages of empirical research

Empirical research is effective for many reasons, and helps researchers from numerous fields. Here are some advantages of empirical research to have in mind for your next research:

  • Empirical research improves the internal validity of the study.
  • Empirical evidence gathered from the study is used to authenticate the research question.
  • Collecting provable evidence is important for the success of the study.
  • The researcher is able to make informed decisions based on the data collected using empirical research.

  • Disadvantages of empirical research

After learning about the positive aspects of empirical research, it is time to mention the negative aspects, because this type may not be suitable for everyone and the researcher should be mindful of its drawbacks. Here are the disadvantages of empirical research:

  • Like other types of primary research, a study where experimentation is included will be time-consuming no matter what. It has more steps and variables than secondary research.
  • There are a lot of variables that need to be controlled and considered. Therefore, it may be a challenging task to be mindful of all the details.
  • Doing evidence-based research can be expensive if you need to complete it on a large scale.
  • When you are conducting an experiment, you may need some waivers and permissions.

  • Frequently asked questions about empirical research

Empirical research is one of the many research types, and there may be some questions in mind about its similarities and differences to other research types.

Is empirical research qualitative or quantitative?

The data collected by empirical research can be qualitative, quantitative, or a mix of both. It is up to the researcher to decide what kind of data is needed, depending on the aim of the study.

Is empirical research the same as quantitative research?

As quantitative research heavily relies on data collection methods of observation and experimentation, it is, in nature, an empirical study. Some professors may even use the terms interchangeably. However, that does not mean that empirical research is only a quantitative one.

What is the difference between theoretical and empirical research?

Empirical studies are based on data collection to prove theories or answer questions, and it is done by using methods such as observation and experimentation. Therefore, empirical research relies on finding evidence that backs up theories. On the other hand, theoretical research relies on theorizing on empirical research data and trying to make connections and correlations.

What is the difference between conceptual and empirical research?

Conceptual research is about thoughts and ideas and does not involve any kind of experimentation. Empirical research, on the other hand, works with provable data and hard evidence.

What is the difference between empirical vs applied research?

Some scientists may use these two terms interchangeably; however, there is a difference between them. Applied research involves applying theories to solve real-life problems. On the other hand, empirical research involves obtaining and analyzing data to test hypotheses and theories.

  • Final words

Empirical research is a good means when the goal of your study is to find concrete data to go with. You may need to do empirical research when you need to test a theory, establish causality, or need qualitative/quantitative data. For example, you are a scientist and want to know if certain colors have an effect on people’s moods, or you are a marketer and want to test your theory on ad places on websites. 

In both scenarios, you can collect information by using empirical research methods and make informed decisions afterward. These are just two examples of empirical research. This research type can be applied to many areas of work life and social sciences. Lastly, for all your research needs, you can visit forms.app to use its many useful features and over 1000 form and survey templates!

Defne is a content writer at forms.app. She is also a translator specializing in literary translation. Defne loves reading, writing, and translating professionally and as a hobby. Her expertise lies in survey research, research methodologies, content writing, and translation.

APS

New Content From Advances in Methods and Practices in Psychological Science

A Practical Guide to Conversation Research: How to Study What People Say to Each Other
Michael Yeomans, F. Katelynn Boland, Hanne Collins, Nicole Abi-Esber, and Alison Wood Brooks

Conversation—a verbal interaction between two or more people—is a complex, pervasive, and consequential human behavior. Conversations have been studied across many academic disciplines. However, advances in recording and analysis techniques over the last decade have allowed researchers to more directly and precisely examine conversations in natural contexts and at a larger scale than ever before, and these advances open new paths to understand humanity and the social world. Existing reviews of text analysis and conversation research have focused on text generated by a single author (e.g., product reviews, news articles, and public speeches) and thus leave open questions about the unique challenges presented by interactive conversation data (i.e., dialogue). In this article, we suggest approaches to overcome common challenges in the workflow of conversation science, including recording and transcribing conversations, structuring data (to merge turn-level and speaker-level data sets), extracting and aggregating linguistic features, estimating effects, and sharing data. This practical guide is meant to shed light on current best practices and empower more researchers to study conversations more directly—to expand the community of conversation scholars and contribute to a greater cumulative scientific understanding of the social world. 
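To make the idea of merging turn-level and speaker-level data concrete, here is a minimal Python sketch. It is not taken from the article. The records, field names and helper functions are hypothetical, and word count stands in for whatever linguistic feature a study actually extracts.

```python
# Hypothetical speaker-level records, keyed by speaker ID.
speakers = {
    "A": {"role": "interviewer"},
    "B": {"role": "participant"},
}

# Hypothetical turn-level records in the order they were spoken.
turns = [
    {"turn": 1, "speaker": "A", "text": "How was your week?"},
    {"turn": 2, "speaker": "B", "text": "Busy, but good overall."},
    {"turn": 3, "speaker": "A", "text": "What kept you busy?"},
]


def merge_turns_with_speakers(turns, speakers):
    """Attach each speaker's attributes to every turn they produced."""
    return [{**turn, **speakers[turn["speaker"]]} for turn in turns]


def words_per_speaker(turns):
    """Aggregate a simple turn-level feature (word count) per speaker."""
    totals = {}
    for turn in turns:
        n_words = len(turn["text"].split())
        totals[turn["speaker"]] = totals.get(turn["speaker"], 0) + n_words
    return totals


merged = merge_turns_with_speakers(turns, speakers)
print(merged[1])
print(words_per_speaker(turns))
```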

Open-Science Guidance for Qualitative Research: An Empirically Validated Approach for De-Identifying Sensitive Narrative Data
Rebecca Campbell, McKenzie Javorka, Jasmine Engleton, Kathryn Fishwick, Katie Gregory, and Rachael Goodman-Williams

The open-science movement seeks to make research more transparent and accessible. To that end, researchers are increasingly expected to share de-identified data with other scholars for review, reanalysis, and reuse. In psychology, open-science practices have been explored primarily within the context of quantitative data, but demands to share qualitative data are becoming more prevalent. Narrative data are far more challenging to de-identify fully, and because qualitative methods are often used in studies with marginalized, minoritized, and/or traumatized populations, data sharing may pose substantial risks for participants if their information can be later reidentified. To date, there has been little guidance in the literature on how to de-identify qualitative data. To address this gap, we developed a methodological framework for remediating sensitive narrative data. This multiphase process is modeled on common qualitative-coding strategies. The first phase includes consultations with diverse stakeholders and sources to understand reidentifiability risks and data-sharing concerns. The second phase outlines an iterative process for recognizing potentially identifiable information and constructing individualized remediation strategies through group review and consensus. The third phase includes multiple strategies for assessing the validity of the de-identification analyses (i.e., whether the remediated transcripts adequately protect participants’ privacy). We applied this framework to a set of 32 qualitative interviews with sexual-assault survivors. We provide case examples of how blurring and redaction techniques can be used to protect names, dates, locations, trauma histories, help-seeking experiences, and other information about dyadic interactions. 

Impossible Hypotheses and Effect-Size Limits
Wijnand van Tilburg and Lennert van Tilburg

Psychological science is moving toward further specification of effect sizes when formulating hypotheses, performing power analyses, and considering the relevance of findings. This development has sparked an appreciation for the wider context in which such effect sizes are found because the importance assigned to specific sizes may vary from situation to situation. We add to this development a crucial but in psychology hitherto underappreciated contingency: There are mathematical limits to the magnitudes that population effect sizes can take within the common multivariate context in which psychology is situated, and these limits can be far more restrictive than typically assumed. The implication is that some hypothesized or preregistered effect sizes may be impossible. At the same time, these restrictions offer a way of statistically triangulating the plausible range of unknown effect sizes. We explain the reason for the existence of these limits, illustrate how to identify them, and offer recommendations and tools for improving hypothesized effect sizes by exploiting the broader multivariate context in which they occur. 
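A well-known special case of such limits involves three variables: once two of the three pairwise correlations are fixed, the third can only fall inside a restricted range, because the correlation matrix must stay positive semidefinite. The Python sketch below computes that range. It illustrates the general idea only and is not the authors' own tool; the function name is hypothetical.

```python
import math


def correlation_bounds(r_xz, r_yz):
    """Range of r_xy that keeps the 3x3 correlation matrix of
    (x, y, z) positive semidefinite, given r_xz and r_yz."""
    center = r_xz * r_yz
    half_width = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return center - half_width, center + half_width


low, high = correlation_bounds(0.8, 0.8)
print(f"r_xy must lie between {low:.2f} and {high:.2f}")
```

With both known correlations at 0.8, a hypothesized correlation of 0.1 between the remaining pair would be impossible, since the lower bound is 0.28.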

It’s All About Timing: Exploring Different Temporal Resolutions for Analyzing Digital-Phenotyping Data
Anna Langener, Gert Stulp, Nicholas Jacobson, Andrea Costanzo, Raj Jagesar, Martien Kas, and Laura Bringmann

The use of smartphones and wearable sensors to passively collect data on behavior has great potential for better understanding psychological well-being and mental disorders with minimal burden. However, there are important methodological challenges that may hinder the widespread adoption of these passive measures. A crucial one is the issue of timescale: The chosen temporal resolution for summarizing and analyzing the data may affect how results are interpreted. Despite its importance, the choice of temporal resolution is rarely justified. In this study, we aim to improve current standards for analyzing digital-phenotyping data by addressing the time-related decisions faced by researchers. For illustrative purposes, we use data from 10 students whose behavior (e.g., GPS, app usage) was recorded for 28 days through the Behapp application on their mobile phones. In parallel, the participants actively answered questionnaires on their phones about their mood several times a day. We provide a walk-through on how to study different timescales by doing individualized correlation analyses and random-forest prediction models. By doing so, we demonstrate how choosing different resolutions can lead to different conclusions. Therefore, we propose conducting a multiverse analysis to investigate the consequences of choosing different temporal resolutions. This will improve current standards for analyzing digital-phenotyping data and may help combat the replications crisis caused in part by researchers making implicit decisions. 
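The effect of temporal resolution is easy to see with a toy example. The Python sketch below averages the same hourly series over 12-hour and 24-hour windows. The data are invented, the `aggregate` helper is a hypothetical name, and the sketch is not drawn from the article's own walk-through.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical passive-sensing samples: (hour since study start, minutes of app use).
samples = [(h, 10 if h % 24 < 12 else 30) for h in range(48)]


def aggregate(samples, window_hours):
    """Average the measured values within consecutive windows of the given width."""
    bins = defaultdict(list)
    for hour, value in samples:
        bins[hour // window_hours].append(value)
    return [mean(bins[k]) for k in sorted(bins)]


half_day = aggregate(samples, 12)  # four 12-hour windows
full_day = aggregate(samples, 24)  # two 24-hour windows
print(half_day)
print(full_day)
```

At the 12-hour resolution the series alternates between low and high use, while at the 24-hour resolution it looks flat, so the same data could support different conclusions.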

Calculating Repeated-Measures Meta-Analytic Effects for Continuous Outcomes: A Tutorial on Pretest–Posttest-Controlled Designs
David R. Skvarc, Matthew Fuller-Tyszkiewicz

Meta-analysis is a statistical technique that combines the results of multiple studies to arrive at a more robust and reliable estimate of an overall effect or estimate of the true effect. Within the context of experimental study designs, standard meta-analyses generally use between-groups differences at a single time point. This approach fails to adequately account for preexisting differences that are likely to threaten causal inference. Meta-analyses that take into account the repeated-measures nature of these data are uncommon, and so this article serves as an instructive methodology for increasing the precision of meta-analyses by attempting to estimate the repeated-measures effect sizes, with particular focus on contexts with two time points and two groups (a between-groups pretest–posttest design)—a common scenario for clinical trials and experiments. In this article, we summarize the concept of a between-groups pretest–posttest meta-analysis and its applications. We then explain the basic steps involved in conducting this meta-analysis, including the extraction of data and several alternative approaches for the calculation of effect sizes. We also highlight the importance of considering the presence of within-subjects correlations when conducting this form of meta-analysis.   
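One commonly used effect size for a two-group pretest-posttest design is the difference in mean change between groups, divided by the pooled pretest standard deviation. The Python sketch below computes it from summary statistics. This is one of several possible formulations and may not match the tutorial's recommended approach; it also leaves out small-sample bias correction, and the function names and numbers are made up.

```python
import math


def pooled_pretest_sd(sd_pre_t, n_t, sd_pre_c, n_c):
    """Pool the two groups' pretest standard deviations."""
    numerator = (n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2
    return math.sqrt(numerator / (n_t + n_c - 2))


def pretest_posttest_control_d(m_pre_t, m_post_t, sd_pre_t, n_t,
                               m_pre_c, m_post_c, sd_pre_c, n_c):
    """Difference in mean change (treatment minus control),
    standardized by the pooled pretest standard deviation."""
    change_t = m_post_t - m_pre_t
    change_c = m_post_c - m_pre_c
    return (change_t - change_c) / pooled_pretest_sd(sd_pre_t, n_t, sd_pre_c, n_c)


# Made-up summary statistics for a single study.
d = pretest_posttest_control_d(
    m_pre_t=20.0, m_post_t=26.0, sd_pre_t=5.0, n_t=30,
    m_pre_c=21.0, m_post_c=23.0, sd_pre_c=5.0, n_c=30,
)
print(round(d, 2))  # 0.8
```

Subtracting the control group's change is what accounts for preexisting differences and for change that would have happened without the treatment.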

Reliability and Feasibility of Linear Mixed Models in Fully Crossed Experimental Designs
Michele Scandola, Emmanuele Tidoni

The use of linear mixed models (LMMs) is increasing in psychology and neuroscience research. In this article, we focus on the implementation of LMMs in fully crossed experimental designs. A key aspect of LMMs is choosing a random-effects structure according to the experimental needs. To date, opposite suggestions are present in the literature, spanning from keeping all random effects (maximal models), which produces several singularity and convergence issues, to removing random effects until the best fit is found, with the risk of inflating Type I error (reduced models). However, defining the random structure to fit a nonsingular and convergent model is not straightforward. Moreover, the lack of a standard approach may lead the researcher to make decisions that potentially inflate Type I errors. After reviewing LMMs, we introduce a step-by-step approach to avoid convergence and singularity issues and control for Type I error inflation during model reduction of fully crossed experimental designs. Specifically, we propose the use of complex random intercepts (CRIs) when maximal models are overparametrized. CRIs are multiple random intercepts that represent the residual variance of categorical fixed effects within a given grouping factor. We validated CRIs and the proposed procedure by extensive simulations and a real-case application. We demonstrate that CRIs can produce reliable results and require less computational resources. Moreover, we outline a few criteria and recommendations on how and when scholars should reduce overparametrized models. Overall, the proposed procedure provides clear solutions to avoid overinflated results using LMMs in psychology and neuroscience.

Understanding Meta-Analysis Through Data Simulation With Applications to Power Analysis Filippo Gambarota, Gianmarco Altoè  

Meta-analysis is a powerful tool to combine evidence from existing literature. Despite several introductory and advanced materials about organizing, conducting, and reporting a meta-analysis, to our knowledge, there are no introductory materials about simulating the most common meta-analysis models. Data simulation is essential for developing and validating new statistical models and procedures. Furthermore, data simulation is a powerful educational tool for understanding a statistical method. In this tutorial, we show how to simulate equal-effects, random-effects, and metaregression models and illustrate how to estimate statistical power. Simulations for multilevel and multivariate models are available in the Supplemental Material available online. All materials associated with this article can be accessed on OSF ( https://osf.io/54djn/ ).
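To make the idea of simulation-based power estimation concrete, here is a minimal sketch of a random-effects simulation. It is not taken from the tutorial's materials. It assumes that each study reports a standardized mean difference with sampling variance approximated as 2 / n per group, that heterogeneity is estimated with the DerSimonian–Laird method, and that the pooled effect is tested with a two-sided z test at the 5% level. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_studies(rng, k, mu, tau, n_per_group):
    """Draw k observed effects: true effect ~ N(mu, tau^2), then sampling error."""
    variances = [2.0 / n_per_group] * k  # rough variance of a standardized mean difference
    effects = []
    for v in variances:
        theta = rng.gauss(mu, tau)                      # study-specific true effect
        effects.append(rng.gauss(theta, math.sqrt(v)))  # observed effect
    return effects, variances


def random_effects_z(effects, variances):
    """z statistic of the pooled effect using the DerSimonian-Laird tau^2 estimate."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    standard_error = math.sqrt(1.0 / sum(w_star))
    return pooled / standard_error


def estimate_power(k, mu, tau, n_per_group, n_sims=2000, seed=1):
    """Proportion of simulated meta-analyses in which |z| exceeds 1.96."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        effects, variances = simulate_studies(rng, k, mu, tau, n_per_group)
        if abs(random_effects_z(effects, variances)) > 1.96:
            hits += 1
    return hits / n_sims
```

Calling `estimate_power(k=10, mu=0.3, tau=0.1, n_per_group=30)` gives the estimated power for that scenario, while setting `mu=0.0` gives the empirical Type I error rate of the same procedure. Setting `tau=0.0` turns the data-generating model into an equal-effects model.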


A new sample-size planning approach for person-specific VAR(1) studies: Predictive accuracy analysis

  • Original Manuscript
  • Published: 08 May 2024


  • Jordan Revol   ORCID: orcid.org/0000-0001-5511-3617 1 ,
  • Ginette Lafit   ORCID: orcid.org/0000-0002-8227-128X 2 &
  • Eva Ceulemans   ORCID: orcid.org/0000-0002-7611-4683 1  


Researchers increasingly study short-term dynamic processes that evolve within single individuals using N = 1 studies. The processes of interest are typically captured by fitting a VAR(1) model to the resulting data. A crucial question is how to perform sample-size planning and thus decide on the number of measurement occasions that are needed. The most popular approach is to perform a power analysis, which focuses on detecting the effects of interest. We argue that performing sample-size planning based on out-of-sample predictive accuracy yields additional important information regarding potential overfitting of the model. Predictive accuracy quantifies how well the estimated VAR(1) model will allow predicting unseen data from the same individual. We propose a new simulation-based sample-size planning method called predictive accuracy analysis (PAA), and an associated Shiny app. This approach makes use of a novel predictive accuracy metric that accounts for the multivariate nature of the prediction problem. We showcase how the values of the different VAR(1) model parameters impact power and predictive accuracy-based sample-size recommendations using simulated data sets and real data applications. The range of recommended sample sizes is smaller for predictive accuracy analysis than for power analysis.
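The logic of planning sample size around out-of-sample predictive accuracy can be sketched as follows. This is a deliberately simplified stand-in and not the PAA method itself: it uses a univariate AR(1) model instead of a VAR(1) model, and plain mean squared prediction error instead of the article's multivariate accuracy metric, which is not described in this abstract. All function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_ar1(rng, n, phi, sigma=1.0, burn_in=1000):
    """Generate n points of a zero-mean AR(1) series, discarding a burn-in phase."""
    y = 0.0
    kept = []
    for t in range(burn_in + n):
        y = phi * y + rng.gauss(0.0, sigma)
        if t >= burn_in:
            kept.append(y)
    return kept


def fit_ar1(series):
    """Ordinary least squares for y[t] = c + phi * y[t-1] + e[t]."""
    x, y = series[:-1], series[1:]
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi_hat = sxy / sxx
    return mean_y - phi_hat * mean_x, phi_hat


def out_of_sample_mse(rng, n_train, phi, n_test=500):
    """Fit on one simulated series, then predict a fresh series from the same process."""
    train = simulate_ar1(rng, n_train, phi)
    test = simulate_ar1(rng, n_test, phi)
    c_hat, phi_hat = fit_ar1(train)
    errors = [(test[t] - (c_hat + phi_hat * test[t - 1])) ** 2
              for t in range(1, n_test)]
    return statistics.fmean(errors)


def mean_mse_by_length(lengths, phi, n_reps=200, seed=1):
    """Monte Carlo average of the out-of-sample error for each candidate length."""
    rng = random.Random(seed)
    return {n: statistics.fmean([out_of_sample_mse(rng, n, phi) for _ in range(n_reps)])
            for n in lengths}
```

A call such as `mean_mse_by_length([25, 50, 100, 200], phi=0.5)` returns the average prediction error per candidate number of measurement occasions. With unit innovation variance, the error cannot fall below about 1, so one possible planning rule is to pick the smallest length whose average error is within a chosen tolerance of that floor. The tolerance is a design decision left to the researcher.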


We used the following packages: DataFrames (version 1.6.1), DataTables (version 0.1.0), DataAPI (version 1.1.0), CSV (version 0.10.12) to handle the data; LinearAlgebra (version 0.5.1), GLM (version 1.9.0), HypothesisTests (version 0.11.0), StatsBase (version 0.34.2) to estimate the model and extract the estimated parameters; Distributions (version 0.25.107) and Distances (version 0.10.11) to handle statistical distributions.

When generating (V)AR(1) time series, we have to use starting values, that is, the variable scores at the first time point. To remove the influence of these starting values, we removed the first 1000 time points (known as the burn-in phase).
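The burn-in idea in this note can be sketched in a few lines. The example below is written in Python rather than the Julia packages listed above, and its function name, transition matrix, and starting values are illustrative assumptions only. It simulates a VAR(1) process with independent Gaussian innovations and keeps only the points after the burn-in phase.

```python
import random


def simulate_var1(phi, n, start, burn_in=1000, sigma=1.0, seed=1):
    """Simulate y[t] = phi * y[t-1] + e[t] and drop the first burn_in points.

    phi is a square transition matrix given as a list of rows; start holds
    the arbitrary starting values whose influence the burn-in removes.
    """
    rng = random.Random(seed)
    k = len(phi)
    y = list(start)
    kept = []
    for t in range(burn_in + n):
        # Each new score is a weighted sum of the previous scores plus noise.
        y = [sum(phi[i][j] * y[j] for j in range(k)) + rng.gauss(0.0, sigma)
             for i in range(k)]
        if t >= burn_in:
            kept.append(y)
    return kept
```

With a stationary matrix such as `[[0.5, 0.1], [0.2, 0.4]]` and deliberately extreme starting values such as `[50.0, 50.0]`, the first simulated points are still far from the long-run mean of zero when `burn_in=0`, whereas the series kept after 1000 discarded points no longer reflects where the process started.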


The research presented in this article was supported by research grants from the Fund for Scientific Research-Flanders (FWO; Project No. G0C9821N) and from the Research Council of KU Leuven (C14/23/062; iBOF/21/090) awarded to E. Ceulemans.

Author information

Authors and affiliations.

Research Group of Quantitative Psychology and Individual Differences, KU Leuven, Leuven, Belgium

Jordan Revol & Eva Ceulemans

Methodology of Educational Sciences Research Group, KU Leuven, Leuven, Belgium

Ginette Lafit


Contributions

The authors made the following contributions. Jordan Revol: Conceptualization, Formal Analysis, Methodology, Visualization, Software, Writing - Original Draft Preparation, Review & Editing; Ginette Lafit: Conceptualization, Methodology, Supervision, Writing - Original Draft Preparation, Review & Editing. Eva Ceulemans: Conceptualization, Methodology, Funding acquisition, Supervision, Writing - Original Draft Preparation, Review & Editing.

Corresponding author

Correspondence to Jordan Revol .

Ethics declarations

Conflict of interest.

The authors declare that there are no conflicts of interest with respect to the authorship or the publication of this article.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Revol, J., Lafit, G. & Ceulemans, E. A new sample-size planning approach for person-specific VAR(1) studies: Predictive accuracy analysis. Behav Res (2024). https://doi.org/10.3758/s13428-024-02413-4


Accepted : 28 March 2024

Published : 08 May 2024

DOI : https://doi.org/10.3758/s13428-024-02413-4


  • Predictive accuracy
  • Sample-size planning method
  • Power analysis
  • Monte Carlo simulation
  • Intensive longitudinal designs
  • Autoregressive models

Multi-omics analysis reveals key regulatory defense pathways and genes involved in salt tolerance of rose plants

These authors contributed equally to this work.


Haoran Ren, Wenjing Yang, Weikun Jing, Muhammad Owais Shahid, Yuming Liu, Xianhan Qiu, Patrick Choisy, Tao Xu, Nan Ma, Junping Gao, Xiaofeng Zhou, Multi-omics analysis reveals key regulatory defense pathways and genes involved in salt tolerance of rose plants, Horticulture Research , Volume 11, Issue 5, May 2024, uhae068, https://doi.org/10.1093/hr/uhae068


Salinity stress causes serious damage to crops worldwide, limiting plant production. However, the metabolic and molecular mechanisms underlying the response to salt stress in rose (Rosa spp.) remain poorly studied. We therefore performed a multi-omics investigation of Rosa hybrida cv. Jardin de Granville (JDG) and Rosa damascena Mill. (DMS) under salt stress to determine the mechanisms underlying rose adaptability to salinity stress. Salt treatment of both JDG and DMS led to the buildup of reactive oxygen species (H2O2). Palisade tissue was more severely damaged in DMS than in JDG, while the relative electrolyte permeability was lower and the soluble protein content was higher in JDG than in DMS. Metabolome profiling revealed significant alterations in phenolic acid, lipids, and flavonoid metabolite levels in JDG and DMS under salt stress. Proteome analysis identified enrichment of flavone and flavonol pathways in JDG under salt stress. RNA sequencing showed that salt stress influenced primary metabolism in DMS, whereas it substantially affected secondary metabolism in JDG. Integrating these datasets revealed that the phenylpropane pathway, especially the flavonoid pathway, is strongly enhanced in rose under salt stress. Consistent with this, weighted gene coexpression network analysis (WGCNA) identified the key regulatory gene chalcone synthase 1 (CHS1), which is important in the phenylpropane pathway. Moreover, luciferase assays indicated that the bHLH74 transcription factor binds to the CHS1 promoter to block its transcription. These results clarify the role of the phenylpropane pathway, especially flavonoid and flavonol metabolism, in the response to salt stress in rose.

Rose (Rosa spp.) is a popular ornamental crop that is also used in cosmetics, perfume, and medicine. Rose plants contain various bioactive substances, including flavonoids, fragrant components, and hydrolysable and condensed tannins, which have high value and market potential [ 1 ]. However, soil salinization is common in many rose-growing regions, and high salt concentrations in soil can severely inhibit rose plant growth, reduce flower quality, and cause significant economic losses [ 2 ]. Additionally, salt stress can increase the levels of secondary metabolites in roses, such as citronellol, geraniol, and phenyl ethyl alcohol [ 3 , 4 ]. Such alterations in secondary metabolites may help to regulate the salt tolerance of rose. Research on roses has focused mainly on flower quality, petal development, and flower bloom [ 5–7 ], and there are limited data available regarding signaling pathways linking plant development and secondary metabolites associated with salt stress.

In plants, salt stress induces osmotic imbalances, which lead to the closure of leaf stomata, limit photosynthesis, and affect plant growth and metabolism [ 8 ]. To alleviate osmotic stress and protect themselves from its adverse effects, plants accumulate numerous compatible solutes (such as soluble proteins, soluble sugars, and proline), known collectively as osmoprotectants [ 9 ]. Moreover, plants generate reactive oxygen species (ROS) to cope with salt stress [ 10 ]. Nevertheless, excessive ROS accumulation can lead to oxidative DNA damage, affect protein biosynthesis, and ultimately result in cell damage and death [ 11 , 12 ]. Plant cells utilize both enzymatic and nonenzymatic antioxidant mechanisms to diminish ROS levels and prevent oxidative damage. Superoxide dismutase (SOD), peroxidase (POD), ascorbate peroxidase (APX), catalase (CAT), and glutathione peroxidase (GPX) are antioxidant enzymes that work as O2− and H2O2 scavengers [ 13 , 14 ]. Nonenzymatic antioxidants, such as ascorbate, glutathione, phenols, and flavonoids, also play vital roles in ROS scavenging [ 15 , 16 ].

Flavonoids are naturally occurring bioactive substances found in fruits, vegetables, tea, and medicinal plants [ 17 ]. Flavonoids comprise more than 9000 compounds and constitute a substantial category of plant secondary metabolites [ 18 ]. They have diverse biological functions in the growth and development of plants, including improving pollen fertility, imparting color, and influencing seed dormancy and germination [ 19 , 20 ]. In addition, flavonoids have protective roles against biotic and abiotic stresses, such as pathogen infections, ultraviolet (UV)-B, cold, drought, and salinity [ 21–23 ]. Flavonoids have also received widespread attention due to their possible benefits for human health [ 24 ].

The molecular mechanism of flavonoid biosynthesis has been elucidated in many plants [ 25 ]. Chalcone synthase (CHS) mediates the first step in flavonoid production, catalyzing the formation of naringenin chalcone from three molecules of malonyl CoA and one molecule of 4-coumaroyl CoA. Chalcone isomerase (CHI) then quickly converts naringenin chalcone into naringenin (flavanone), which is further biosynthesized into different flavonoids by the subsequent enzymes in this pathway [ 26 ]. Although the biosynthesis of flavonoids has attracted increasing attention from scholars, current research does not fully explain the effects of regulatory factors on the transcription and activity of the major enzymes in flavonoid metabolism. Therefore, further research on the signaling molecules and regulatory pathways associated with flavonoids, as well as their regulatory mechanisms, is needed to elucidate the physiological activity of flavonoids.

Rosa hybrida cv. Jardin de Granville (JDG) is a new hybrid rose developed by 'Les Roses Anciennes André Eve' for the Prestige range of Christian Dior skin care products. JDG possesses twice the vitality of a traditional rose and grows and blooms vigorously in the salty air and harsh winds of coastal climates. JDG is also rich in beneficial bioactive substances that are mainly used in cosmetics and anti-aging skin care creams [ 27 , 28 ]. Rosa damascena Mill. (DMS) is one of the most common fragrant roses in the Rosaceae family. Its essential oils and aromatic compounds are used extensively in the cosmetic and food industries worldwide [ 29 ]. DMS is considered an excellent rose throughout the world due to its high resistance to abiotic stress and abundance of beneficial secondary metabolites [ 30 ].

Here, we conducted an integrated analysis on the transcriptomes, proteomes, and metabolomes of JDG and DMS to explore the relationship between plant development and secondary metabolites of rose under salt stress. We used WGCNA and Cytoscape software to decipher the similarities and differences in the complex metabolic pathways and regulatory genes of JDG and DMS under salt stress. These results provide comprehensive information on the metabolic and molecular mechanisms of the response to salt stress in rose, promoting the cultivation of excellent new rose varieties that are both salt tolerant and rich in beneficial secondary metabolites.

JDG is more tolerant than DMS to salt stress

To explore the salt tolerance of rose, plants of JDG and DMS were treated with 400 mM NaCl for 2 weeks. DMS plants showed typical damage with yellowing and death of leaves, while JDG leaves only exhibited slight wilting ( Fig. 1A ). Additionally, detached rose leaves were treated with salt for 4 days; DMS leaves showed significantly more necrosis than JDG leaves ( Fig. 1B ). To observe the response of rose cultivars to salt stress quickly and to simplify sampling, subsequent experiments mainly used detached rose leaves. To examine the overall anatomy and morphology of leaves treated for 2 days with NaCl, we stained treated and control leaves with toluidine blue and prepared thin sections. Palisade tissue damage in response to salt treatment was more severe in DMS than in JDG (indicated by red arrowheads in Fig. 1C ). To investigate ROS accumulation in response to salt stress, we performed 3,3'-diaminobenzidine (DAB) staining. DMS leaves accumulated substantially more ROS (deeper staining) than JDG leaves after salt stress, whereas there was no difference in ROS content between these two cultivars under normal conditions ( Fig. 1D, E ). Soluble protein content was higher in JDG leaves after 4 days of salt stress than after 2 days, whereas in DMS leaves it was much higher after 2 days than before treatment and had decreased by 4 days of salt treatment ( Fig. 1F ). The relative electrolyte permeability of JDG leaves increased slightly after 2 days of salt treatment and more substantially after 4 days of treatment, while relative electrolyte permeability was much higher in DMS than in JDG at both time points ( Fig. 1G ). Together, these phenotypic and physiological analyses indicate that JDG is more salt tolerant than DMS.

Phenotypes of JDG and DMS under salt stress. (A) Phenotypes of JDG and DMS plants after 2 weeks of treatment with 400 mM NaCl. Left, phenotype of the whole plant; right, enlarged image of the protruding part indicated by the red circle. Bars, 3 cm. (B) Detached leaves of rose on different days after onset of salt stress (400 mM NaCl). (C) Anatomical analysis of leaves in (B). Red arrowheads indicate the palisade tissue. Mock (0 mM NaCl); NaCl (400 mM NaCl). Bars, 50 μm. (D) Tissue staining of rose leaves under salt stress using DAB. (E) Quantification of the relative staining intensity in (D). The brown-stained area and the total leaf area were measured using ImageJ software; their ratio is the relative staining intensity. (F) Soluble protein content of rose leaves on different days of salt treatment. (G) Relative electrolyte permeability of rose leaves on different days of salt treatment. Data are the mean ± SE of at least three biological replicates.


Flavonoid metabolites play an important role in the salinity tolerance of rose

To better understand how salt stress affects rose metabolites, we performed a comprehensive untargeted analysis of metabolites using ultra-performance liquid chromatography/mass spectrometry (UPLC/MS). Fig. S1A shows the different metabolites detected, and Fig. S1B shows the curves of the quality control samples, indicating that the mass spectral data were highly reproducible and reliable. Principal component analysis (PCA) was used to reduce the data dimensions and clarify the relationships among the samples. The first two principal components, PC1 and PC2, explained 50.07% and 23.36% of the variance, respectively. Moreover, PC1 captured variance between genotypes, while PC2 captured differences in time of exposure to salt stress. Thus, the metabolite-based PCA revealed obvious differences in salt tolerance between the two cultivars ( Fig. S2A ).
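For readers unfamiliar with how "percentage of variance explained" is obtained, the sketch below computes it for the first components of a small data matrix. It is a toy illustration on hypothetical data, not the analysis pipeline used in the study: it finds the leading eigenvalues of the sample covariance matrix by power iteration with deflation and divides each by the total variance. A real analysis would normally rely on an established PCA routine, and power iteration can fail if the starting vector happens to be orthogonal to the leading component.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a data matrix (rows are samples, columns are features)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            return 0.0, v
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v


def explained_variance_ratios(rows, n_components=2):
    """Share of total variance captured by each of the first n_components."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    ratios = []
    for _ in range(n_components):
        eigenvalue, v = leading_eigenpair(cov)
        ratios.append(eigenvalue / total)
        # Deflation: remove the component just found before looking for the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return ratios
```

In the study, the rows would correspond to leaf samples and the columns to metabolite abundances, and the first two ratios would correspond to the PC1 and PC2 percentages reported above.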

Our screening for differentially accumulated metabolites (DAMs) identified hundreds of metabolites with significantly altered accumulation under salt stress ( Fig. 2A , Table S1 ). Preliminary analysis indicated that DAMs included amino acids and their derivatives, nucleotides and their derivatives, phenolic acids, flavonoids, lipids, tannins, lignans and coumarins, organic acids, alkaloids, and terpenoids, and most of the DAMs were upregulated under salt stress ( Fig. 2B ). Phenolic acids, lipids, and flavonoid metabolites showed significantly altered accumulation under salt stress in both JDG and DMS. Compared with their levels in DMS, flavonoid metabolites, phenolic acid metabolites, and lipids were differentially accumulated in JDG leaves under both control conditions and salt stress ( Table S1 ). These results indicate that flavonoid metabolites, phenolic acid metabolites, and lipids may play important roles in the salt tolerance of rose.

Metabolomic analysis of JDG and DMS under salt stress. (A) Number of DAMs in different comparison groups. (B) Classification of DAMs in each comparison. (C) Classification of DAMs upregulated in both JDG and DMS under salt treatment. (D) Classification of DAMs upregulated in JDG compared with DMS under both control and salt treatments. (E, F) KEGG pathway enrichment of DAMs under salt stress: (E) JDG-NaCl vs JDG-Mock and (F) DMS-NaCl vs DMS-Mock.

To determine how metabolites differ between JDG and DMS, we summarized the differences in metabolite accumulation in the different comparison groups using Venn diagrams. Groups JDG-NaCl vs JDG-Mock and DMS-NaCl vs DMS-Mock shared 109 DAMs, of which 79 increased and 15 decreased. Among the upregulated metabolites, phenolic acids and flavonoids accounted for 21.52% and 7.59%, respectively. These metabolites included ferulic acid, coniferaldehyde, pinocembrin (dihydrochrysin), eucalyptin (5-hydroxy-7,4'-dimethoxy-6,8-dimethylflavone), patuletin (quercetagetin-6-methyl ether), naringenin-7- O -rutinoside-4'- O -glucoside, naringin (naringenin-7- O -neohesperidoside), and sudachitin ( Fig. 2C , Fig. S2B–D , Table S1 ). Notably, 5,7,8,4'-tetramethoxyflavone, vanillic acid-4- O -glucoside, and 3',4',5',5,7-pentamethoxyflavone were upregulated in JDG and downregulated in DMS under salt stress, while kaempferol-3- O -arabinoside-7- O -rhamnoside was upregulated in DMS and downregulated in JDG. Groups JDG-Mock vs DMS-Mock and JDG-NaCl vs DMS-NaCl shared 408 metabolites showing the same tendency, of which 188 showed increased accumulation and 202 showed decreased accumulation. Among the upregulated metabolites, phenolic acids and flavonoids accounted for 29.26% and 33.51%, respectively ( Fig. 2D ). Notably, the genkwanin (apigenin 7-methyl ether) content was 12.74-fold higher, the 5,7-dihydroxy-6,3′,4′,5′-tetramethoxyflavone (arteanoflavone) content was 15.64-fold higher, the naringenin-4′,7-dimethyl ether content was 13-fold higher, and the naringin dihydrochalcone content was 13.30-fold higher in JDG than in DMS under control conditions; all of these are flavonoid metabolites. Venn analysis also showed that many metabolites displaying changes under salt stress were genotype specific, indicating that the cultivars have different mechanisms of response to salinity.
There were 77 metabolites that specifically accumulated in JDG under salt stress, which may represent the major metabolites in the salt stress response of JDG. Notably, four metabolites—ethylsalicylate (a phenolic acid), salidroside (a phenolic acid), L-ornithine (amino acids and derivatives), and epiafzelechin (a flavonoid)—accumulated specifically in JDG after salt treatment and were also highly accumulated under control conditions in JDG compared with DMS ( Fig. S2B–D , Table S1 ).
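The Venn comparison described above amounts to set arithmetic on two lists of DAMs plus a check of the direction of change. The following is a minimal sketch of that bookkeeping, assuming each comparison is stored as a mapping from metabolite ID to direction; the IDs, directions, and the function name `compare_dam_sets` are hypothetical and are not values from Table S1. The `shared_opposite` class illustrates how a metabolite that rises in one cultivar and falls in the other (as reported for 5,7,8,4'-tetramethoxyflavone) would be sorted.

```python
# Hypothetical DAM calls: metabolite ID -> direction of change ("up" or "down").
# The IDs and directions are illustrative, not values from Table S1.
jdg_salt_vs_mock = {"m01": "up", "m02": "up", "m03": "down", "m04": "up"}
dms_salt_vs_mock = {"m02": "up", "m03": "down", "m04": "down", "m05": "up"}


def compare_dam_sets(a, b):
    """Split two DAM dictionaries into Venn regions and direction classes."""
    shared = set(a) & set(b)
    return {
        "only_a": sorted(set(a) - set(b)),
        "only_b": sorted(set(b) - set(a)),
        "shared_up": sorted(m for m in shared if a[m] == b[m] == "up"),
        "shared_down": sorted(m for m in shared if a[m] == b[m] == "down"),
        "shared_opposite": sorted(m for m in shared if a[m] != b[m]),
    }


regions = compare_dam_sets(jdg_salt_vs_mock, dms_salt_vs_mock)
for name, members in regions.items():
    print(f"{name}: {members}")
```

Keeping the opposite-direction class separate is one way to account for shared metabolites that are neither increased nor decreased in both groups.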

All DAMs were analyzed using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment ( Fig. 2E, F , Fig. S3A, B ). In JDG (JDG-NaCl vs JDG-Mock group), the metabolite changes induced by salt stress mainly involved 'purine metabolism,' 'phenylpropanoid biosynthesis,' 'linoleic acid metabolism,' and 'alpha-linolenic acid metabolism' ( Fig. 2E ). In DMS (DMS-NaCl vs DMS-Mock group), the DAMs in leaves under salt stress were mainly associated with 'phenylpropanoid biosynthesis,' 'alpha-linolenic acid metabolism,' 'linoleic acid metabolism,' and 'pentose and glucuronate interconversions' ( Fig. 2F ). In the JDG-Mock vs DMS-Mock group, DAMs between leaves of DMS and JDG were mostly associated with 'flavonoid biosynthesis,' 'flavone and flavonol biosynthesis,' and 'phenylpropanoid biosynthesis' ( Fig. S3A ). Meanwhile, in the JDG-NaCl vs DMS-NaCl group, DAMs were largely involved in 'flavonoid biosynthesis,' 'flavone and flavonol biosynthesis,' and 'linoleic acid metabolism' ( Fig. S3B ). KEGG enrichment analysis showed that 'linoleic acid metabolism,' 'alpha-linolenic acid metabolism,' and 'phenylpropanoid biosynthesis' were significantly enriched under salt stress in both cultivars, indicating that these pathways play important roles under salt stress in rose. Regardless of the presence of salt stress, DAMs between DMS and JDG were concentrated in the flavone, flavonoid, and flavonol biosynthetic pathways, indicating that differential accumulation of these metabolites may be the main reason for different salt sensitivities among rose cultivars. Notably, 'caffeine metabolism' was enriched in JDG, while 'starch and sucrose metabolism' was significantly enriched in DMS.
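The text does not state which statistic underlies the KEGG enrichment. A common choice for over-representation analysis is the one-sided hypergeometric test, sketched below as an assumption rather than as the authors' method. All counts are hypothetical, and the enrichment factor is computed here as the ratio of the pathway proportion among DAMs to the proportion in the background, which is only one of several definitions in use.

```python
from math import comb


def hypergeom_enrichment_p(total, in_pathway, selected, hits):
    """P(X >= hits) when `selected` items are drawn without replacement
    from `total` items, of which `in_pathway` belong to the pathway."""
    upper = min(in_pathway, selected)
    tail = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total, selected)


# Hypothetical numbers: 1000 annotated metabolites, 40 in a pathway,
# 100 DAMs, 12 of which fall in that pathway (4 expected by chance).
p_value = hypergeom_enrichment_p(total=1000, in_pathway=40, selected=100, hits=12)
enrichment_factor = (12 / 100) / (40 / 1000)
print(f"p = {p_value:.3g}, enrichment factor = {enrichment_factor:.1f}")
```

In practice the p values from all tested pathways would also be corrected for multiple testing before calling a pathway significantly enriched.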

Salt stress causes dynamic changes in distinct sets of proteins

To delve deeper into the molecular mechanisms of the salt stress response in rose plants, we performed a proteome profiling analysis under the same salt treatment and control conditions as the metabolome analysis and characterized proteins on the basis of fold changes in their accumulation level. We identified 119 (87 upregulated and 32 downregulated) and 163 (80 upregulated and 83 downregulated) proteins with significantly differential accumulation under salt stress in JDG and DMS, respectively ( Fig. 3A, B ). Only 18 differentially accumulated proteins (DAPs) overlapped between the two cultivars, of which 13 were upregulated and 4 were downregulated in both JDG and DMS, while one DUF1279 domain–containing protein was upregulated in JDG and downregulated in DMS. Moreover, 101 DAPs were unique to JDG, whereas 145 DAPs were unique to DMS ( Table S2 ).

Proteomic analysis of rose under salt stress. (A) Number of DAPs in JDG and DMS. (B) Venn diagram of the DAPs in JDG and DMS. (C) Localizations of DAPs identified in JDG. (D) Functional categorization of DAPs unique to JDG. (E, F) KEGG enrichment analysis of DAPs in JDG (upregulated, E) and DMS (upregulated, F).

We predicted that most of the DAPs are located in chloroplasts in rose, according to the WoLFPSORT database ( Fig. 3C , Fig. S4A ). Gene Ontology (GO) and KEGG analyses were performed to analyze and annotate protein functions. The 20 most highly enriched GO terms associated with the DAPs are depicted in a circle diagram ( Fig. S5A, B , Table S2 ). Among them, GO:0046658 (anchored component of plasma membrane), GO:0051554 (flavonol metabolic process), GO:0047893 (flavonol 3- O -glucosyltransferase activity), and GO:0051555 (flavonol biosynthetic process) were highly enriched in JDG under salt stress. In DMS, GO:0006720 (isoprenoid catabolic process), GO:0005764 (lysosome), and GO:0004602 (glutathione peroxidase activity) were the most enriched among all GO terms. In addition, the GO data indicated that the DAPs specific to JDG were highly involved in the 'icosanoid metabolic process,' 'diterpenoid metabolic process,' and 'diterpenoid biosynthetic process' ( Fig. 3D ), whereas the DAPs specific to DMS were enriched in 'cellular hyperosmotic salinity response,' 'monocarboxylic acid catabolic process,' 'terpenoid catabolic process,' 'sesquiterpenoid catabolic process,' and 'apocarotenoid catabolic process' functions ( Fig. S4B ). DAPs shared by JDG and DMS included Q2VA35 (xyloglucan endotransglucosylase/hydrolase) and A0A2P6P708 (glutathione peroxidase), which are present only in extracellular regions ( Table S2 ). The DAPs in different comparison groups were classified and then clustered according to enrichment of their associated GO terms ( Fig. S4C ). We determined that salinity mainly influences flavone and flavonol metabolism pathways in JDG. Flavones and flavonols are antioxidants and bioactive reagents [ 24 ]. In DMS, salt mainly influences the osmotic response, water stimulus response, and salt stress response pathways, most of which are stress related [ 31 ]. 
We used KEGG enrichment to determine the metabolic pathways associated with the DAPs in JDG and DMS under salt stress ( Fig. 3E, F ). Many DAPs in JDG were associated with phenylpropanoid biosynthesis and alpha-linolenic acid metabolism, with examples including lipoxygenase (A0A2P6S713), 12-oxophytodienoate reductase (A0A2P6PFD8), peroxidase (A0A2P6R8H8), and flavone 3′- O -methyltransferase (A0A2P6RK21). The DAPs upregulated in DMS under salt stress were frequently associated with alpha-linolenic acid metabolism and glutathione metabolism, whereas the DAPs that were downregulated were associated with ribosomes ( Table S2 ). Notably, alpha-linolenic acid metabolism was significantly upregulated in both JDG and DMS under salt stress. Collectively, the GO and KEGG enrichment results show that salt stress causes dynamic changes in distinct sets of proteins in rose.

Salt stress differentially alters the transcriptomes of JDG and DMS

To identify the genes involved in salt stress and explore the molecular mechanisms of salt tolerance in DMS and JDG, we sequenced the transcriptomes of JDG and DMS leaves by RNA sequencing (RNA-seq). We obtained high-quality reads for transcriptome analysis ( Table S3 ). PCA showed a distinct difference between the two cultivars along PC1, and PC2 separated the treatment from the control. The three biological replicates in the ordination space were mostly clustered together, suggesting an acceptable correlation between replicates ( Fig. 4A ).

Transcriptomic analysis of JDG and DMS under salt stress. (A) PCA score plot of transcriptomic profiles from different cultivars. (B) Number of DEGs in JDG and DMS. (C–E) Venn diagrams of DEGs in JDG and DMS: (C) total DEGs, (D) upregulated DEGs, and (E) downregulated DEGs. (F, G) KEGG enrichment analysis of DEGs in JDG (F) and DMS (G).

Correlation analysis of transcriptome, proteome, and metabolome data. (A, B) KEGG enrichment analysis of combined transcriptome, proteome, and metabolome data: (A) JDG-NaCl vs JDG-Mock, and (B) DMS-NaCl vs DMS-Mock. The x-axis shows the enrichment factor of the pathway in each omics dataset, and the y-axis shows the name of the KEGG pathway; the color from red to green represents the significance of enrichment from high to low (indicated by the P value). The size of each bubble indicates the number of DEGs, DAPs, or DAMs; the larger the number, the larger the symbol. The shape of each bubble indicates the omics dataset: circles represent genes (transcriptome), triangles represent metabolites (metabolome), and squares represent proteins (proteome). (C) Co-expression network of major genes, proteins, and metabolites in the phenylpropanoid pathway. Different colors indicate the value of log2 fold change (NaCl/Mock), with red for upregulated and blue for downregulated genes, proteins, or metabolites.

We analyzed differentially expressed genes (DEGs) in JDG and DMS under control and salt stress conditions. We detected 10,662 DEGs in DMS under salt stress, of which 4651 were upregulated and 6011 were downregulated. However, only 1990 genes were differentially expressed in JDG: 1102 upregulated and 888 downregulated ( Fig. 4B ). The smaller number of DEGs in JDG than in DMS under salt stress implies that JDG is less affected by salt stress. We used a Venn diagram to display the differences between various genes in DMS and JDG under salt stress. Group DMS-NaCl vs DMS-Mock and group JDG-NaCl vs JDG-Mock shared 1120 DEGs under salt stress, with 577 upregulated genes and 433 downregulated genes ( Fig. 4C–E ).
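The DEG counts in this paragraph can be checked with simple arithmetic. The sketch below uses only the numbers quoted in the text; the variable names are illustrative. It confirms that the up- and downregulated counts add up to each cultivar's total and shows that the two shared classes (577 and 433) do not account for all 1120 shared DEGs. The text does not say how the remainder behaves; genes regulated in opposite directions in the two cultivars would be one possible explanation.

```python
# DEG counts as reported in the text (salt vs. mock within each cultivar).
deg_counts = {
    "DMS": {"total": 10662, "up": 4651, "down": 6011},
    "JDG": {"total": 1990, "up": 1102, "down": 888},
}
shared = {"total": 1120, "up": 577, "down": 433}

for cultivar, c in deg_counts.items():
    # Up- and downregulated genes should add up to the reported total.
    assert c["up"] + c["down"] == c["total"], cultivar

# Shared DEGs not covered by the "up in both" / "down in both" counts.
shared_unclassified = shared["total"] - shared["up"] - shared["down"]

jdg_to_dms_ratio = deg_counts["JDG"]["total"] / deg_counts["DMS"]["total"]
shared_fraction_of_jdg = shared["total"] / deg_counts["JDG"]["total"]

print(f"shared DEGs outside both classes: {shared_unclassified}")
print(f"JDG/DMS DEG ratio: {jdg_to_dms_ratio:.3f}")
print(f"fraction of JDG DEGs shared with DMS: {shared_fraction_of_jdg:.3f}")
```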

Next, we performed GO analysis of DEGs in the categories cellular component (CC), biological process (BP), and molecular function (MF). The top 21 most enriched GO terms associated with DEGs of JDG-NaCl vs JDG-Mock and DMS-NaCl vs DMS-Mock are presented in circle diagrams ( Fig. S6 , Table S4 ). Seven GO terms associated with the JDG-NaCl vs JDG-Mock group were highly involved in the BP category, among which GO:0016052 (carbohydrate catabolic process), GO:0009813 (flavonoid biosynthetic process), and GO:0009812 (flavonoid metabolic process) contained the most DEGs (43, 26, and 27, respectively), and most of these enriched genes were upregulated. Thirteen GO terms were highly involved in the MF category, among which GO:0010427 (abscisic acid binding), GO:0016832 (aldehyde-lyase activity), and GO:0019840 (isoprenoid binding) were highly significant. One GO term was highly involved in the CC category: GO:0031226 (intrinsic component of plasma membrane). Moreover, 19 GO terms associated with the DMS-NaCl vs DMS-Mock group were enriched in the BP category, among which GO:0036294 (cellular response to decreased oxygen levels), GO:0048511 (rhythmic process), and GO:0048585 (negative regulation of response to stimulus) contained the most DEGs (85, 95, and 146, respectively), and most of these enriched genes were downregulated. One GO term was enriched in the MF category: GO:0016854 (racemase and epimerase activity). Similarly, one GO term was enriched in the CC category: GO:0009501 (amyloplast). KEGG pathway enrichment analysis for JDG-NaCl vs JDG-Mock revealed that the DEGs were mainly involved in metabolic pathways, plant hormone signal transduction, biosynthesis of secondary metabolites, and glycolysis/gluconeogenesis ( Fig. 4F , Table S4 ). In the DMS-NaCl vs DMS-Mock group, the DEGs were chiefly enriched in metabolic pathways, plant hormone signal transduction, the MAPK signaling pathway, biosynthesis of cofactors, and ubiquitin-mediated proteolysis ( Fig. 4G , Table S4 ). These findings indicate that the biosynthesis of secondary metabolites is substantially enhanced under salt stress in JDG, but not in DMS. However, the biosynthesis of cofactors associated with primary metabolism is enhanced under salt stress in DMS. Therefore, we speculate that salinity results in large changes in primary metabolism in DMS, while it influences secondary metabolism in JDG.

Transcription factors (TFs) are essential for regulating the expression of stress response genes. Among the DEGs, we identified 114 TFs in JDG and 491 TFs in DMS, covering 39 TF families ( Table S4 ). The most abundant genes belonged to the AP2/ERF-ERF, MYB, NAC, bHLH, and C2C2 families ( Fig. S7A, B ). Moreover, 64 TFs were differentially expressed in both cultivars in response to salinity. We speculate that these TFs form a highly complex transcriptional regulatory network and could perform critical functions in the mechanism of salt tolerance in rose.

Expression of phenylpropanoid-related genes is correlated with proteins and metabolites affected by salt stress

Integrated analysis of multi-omics data provides a powerful tool for identifying significantly different pathways and crucial metabolites in biological processes. Here, we integrated our transcriptome, proteome, and metabolome data to determine the performance of the two rose cultivars under salt stress. Pathways associated with alpha-linolenic acid metabolism, phenylpropanoid biosynthesis, and starch and sucrose metabolism were significantly enriched in JDG under salt stress ( Fig. 5A ), while the pathways enriched in DMS were involved in starch and sucrose metabolism, cyanoamino acid metabolism, and phenylpropanoid biosynthesis ( Fig. 5B ). Starch and sucrose metabolism represent primary metabolic functions common to different cultivars [ 32 ], while alpha-linolenic acid metabolism is related to the biosynthesis of jasmonic acid, which is a phytohormone involved in fungal invasion and senescence [ 7 ]. The phenylpropanoid biosynthesis pathway comprises multiple secondary metabolites, which confer a range of colors, flavors, nutritional components, and bioactivities in plants. Flavonoids are an important type of phenylpropanoid that play key roles in resistance against biotic and abiotic stresses [ 24 ]. Thus, we focused on the phenylpropanoid pathway.

Gene–protein–metabolite correlation networks can be used to elucidate functional relationships and identify regulatory factors. Therefore, we analyzed the regulatory networks of the DEGs, DAPs, and DAMs related to phenylpropanoid metabolism. We identified 14 DEGs that were strongly correlated with one DAP and six DAMs in JDG under salt stress. Similarly, 25 DEGs were strongly correlated with one DAP and eight DAMs in DMS under salt stress ( Table S5 ). For example, in JDG, there was a strong correlation between the expression of one gene (RchiOBHmChr4g0430951) and the abundance of one protein (A0A2P6PM56) and two metabolites [coniferyl alcohol (mws0093) and sinapyl alcohol (mws0853)]. Epiafzelechin (mws1422) was also significantly associated with the expression of the gene RchiOBHmChr2g0092641. In DMS, there was a close association between the expression of three genes (RchiOBHmChr2g0092671, RchiOBHmChr3g0480401, and RchiOBHmChr5g0041231) and the abundance of one protein (A0A2P6QM41) and one metabolite [L-tyrosine (mws0250)]. The strong association of particular genes with phenylpropanoid proteins or metabolites suggests that these genes play a major role in phenylpropanoid biosynthesis under salt stress.
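The passage does not specify how correlation strength was measured. As an illustration only, the sketch below computes a Pearson correlation coefficient between a gene's expression profile and a metabolite's abundance across samples. The data are invented, the function name `pearson_r` is hypothetical, and the |r| ≥ 0.8 cutoff is an assumed convention rather than the authors' threshold.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical profiles across six samples (3 mock + 3 NaCl replicates).
gene_fpkm = [4.1, 3.8, 4.4, 9.7, 10.2, 9.1]
metabolite_abundance = [1.0, 0.9, 1.1, 2.6, 2.9, 2.4]

r = pearson_r(gene_fpkm, metabolite_abundance)
# An |r| cutoff such as 0.8 is a common convention, assumed here.
print(f"r = {r:.3f}, strongly correlated: {abs(r) >= 0.8}")
```

With only a handful of samples, a high coefficient mostly reflects the treatment effect shared by both variables, so such correlations suggest candidates rather than demonstrate regulation.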

We selected 20 important genes in the phenylpropanoid biosynthetic pathway and compared their expression between rose cultivars ( Table S6 ). The transcript levels of many genes ( 4CL1 , CCR1 , HCT1 , HCT2 , HCT3 , HCT4 , CHS1 , CHS2 , CHI , DFR , F3H , and ANR ) were higher in JDG than in DMS, which may be valuable for salt tolerance by stimulating JDG to produce more flavonoids. Our multi-omics analysis revealed that ferulic acid, sinapic acid, and coniferaldehyde accumulated to high levels in JDG under salt stress ( Fig. 5C , Table S1 ). We also compared the flavonoid compounds in the two cultivars. Quercetin-3,3′-dimethyl ether, 5,7-dihydroxy-6,3′,4′,5′-tetramethoxyflavone (arteanoflavone), naringenin-4′,7-dimethyl ether, naringin dihydrochalcone, genkwanin (apigenin 7-methyl ether), and mearnsetin accumulated to greater levels in JDG than in DMS under control conditions. Correspondingly, the flavonoids brickellin, 3- O -methylquercetin, 5,2′,5′-trihydroxy-3,7,4′-trimethoxyflavone-2′- O -glucoside, and kaempferol-3- O -(6′′-acetyl)glucosyl-(1→3)-galactoside were more abundant in JDG than in DMS under salt stress. By contrast, naringenin-4′,7-dimethyl ether, aromadendrin (dihydrokaempferol), pinocembrin-7- O -(6′′- O -malonyl)glucoside, and quercetin-3- O -(2′′- O -glucosyl)glucuronide specifically accumulated in DMS. Moreover, 3′,4′,5′,5,7-pentamethoxyflavone, 3,5,7,3′4′-pentamethoxyflavone, and 5,7,8,4′-tetramethoxyflavone were abundant in JDG under salt stress but were decreased in DMS ( Table S7 ). Overall, the integration of the three omics datasets indicated that the phenylpropanoid pathway, especially the flavonoid pathway, is strongly enhanced under salinity conditions and that this contributes to salt tolerance in roses, especially in the JDG genotype.

Networks of co-expressed genes associated with phenylpropanoid biosynthesis are involved in the salt stress response

To identify candidate genes associated with phenylpropanoid biosynthesis, we constructed co-expression gene network modules via weighted gene correlation network analysis (WGCNA). We constructed a cluster tree based on correlation between expression levels (indicated by fragments per kilobase of transcript per million fragments mapped, FPKM), which partitioned the genes into 11 different gene modules ( Fig. 6A, B ). To identify candidate genes that play significant roles within the gene networks, we extracted annotation information for all these genes from the Rosa chinensis 'Old Blush' reference genome annotation database. We selected 16 genes contributing to phenylpropanoid biosynthesis and four genes associated with flavonoid biosynthesis. Table S8 lists the annotated genes participating in flavonoid-related pathways in JDG. Among the 11 modules, the green module contained 10 of these genes: CHS1 , CHS2 , CCR1 , HCT3 , HCT4 , CCoAOMT , F3H , DFR , ANR , and CHI . The turquoise module contained three genes: CCR2 , HCT1 , and CAD2 . The blue module contained three genes: PRDX1 , 4CL1 , and ANS . The red, yellow, brown, and black modules each contained one gene: CAD1 , PRDX2 , HCT2 , and 4CL2 , respectively ( Table S8 ). After combining certain genes in modules and comparing them with the DEGs, we checked and confirmed these results using reverse-transcription quantitative PCR (RT-qPCR). The expression trends of eight DEGs from phenylpropanoid and flavonoid biosynthesis pathways matched the results of RNA-seq ( Fig. S8 ).
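FPKM, the expression unit used for the clustering above, normalizes a fragment count by transcript length (in kilobases) and by library size (in millions of mapped fragments). A minimal sketch of the standard formula follows; the function name and the example numbers are hypothetical and do not come from the study's data.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million fragments mapped."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments on a 2,000 bp transcript
# in a library of 20 million mapped fragments.
value = fpkm(500, 2_000, 20_000_000)
print(value)  # 12.5
```

The factor of 1e9 combines the two scalings: 1e3 for bases to kilobases and 1e6 for fragments to millions of fragments.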

Co-expression network related to flavonoid biosynthesis. (A) Clustering tree based on the correlation between gene expression levels. (B) Module–sample relationships. Each row represents a gene module, with the same colors as in (A); each column represents a sample; the boxes within the chart contain corresponding correlations and P values. (C–E) Networks built from correlations among structural genes and TFs. Circles represent genes, and the size of the circle represents the number of relationships between genes in the network and surrounding genes. Lines represent regulatory relationships between genes, and different colored lines represent different connection strengths: red, strong connections; green, weak connections. (F) Heat map depicting the expression profiles of 15 TF genes. The scale bar denotes the fold change/(mean expression levels across the three treatment groups). The color indicates relative levels of gene expression, horizontal rows represent the different treatments in JDG, and vertical columns show the TFs. (G) Representative images of transient expression of bHLH74 and LUC driven by the CHS1 promoter in Nicotiana benthamiana leaves. The color scale represents the signal level. High represents a strong signal, and low represents a weak signal. (H) Relative value of LUC/REN. Data are based on the mean ± SE of at least three repeated biological experiments. Significance was determined using Student’s t-test (**P < 0.01).

To determine the regulatory genes involved in phenylpropanoid biosynthesis in JDG, we constructed three subnetworks from the different modules using the 20 phenylpropanoid biosynthesis–related DEGs as the nodes ( Table S9 ). In the regulatory networks of phenylpropanoid biosynthesis, we identified 15 TF genes from seven TF families: AP2/ERF-ERF (5 unigenes), bHLH (3 unigenes), MYB (3 unigenes), Alfin-like (1 unigene), SBP (1 unigene), C2C2-GATA (1 unigene), and TCP (1 unigene). bHLH62 and bHLH74 were strongly associated with CHS1 , CHS2 , CHI , CCR1 , and F3H ; ERF81 was strongly associated with 4CL1 ; and ERF110 and MYB-related were strongly associated with 4CL2 ( Fig. 6C–E ), indicating that CHS and 4CL are the major target genes in phenylpropanoid biosynthesis. Therefore, we speculated that the abundance of flavonoids is increased by enhancing the expression of upstream flavonoid biosynthesis genes. Fig. 6F shows a heat map of expression of the 15 TF genes after NaCl treatment. The green module contained a substantial number of phenylpropanoid biosynthesis genes, among which CHS1 was closely related to the TFs bHLH74 and bHLH62. Therefore, dual-luciferase reporter assays were conducted to determine their regulatory relationship ( Fig. 6G, H ). We used bHLH74 and bHLH62 driven by the CaMV35S promoter as effectors in a transient expression system, with the CHS1 promoter fused with LUC as a reporter. When we cotransformed Nicotiana benthamiana leaves with the effectors and the reporter, the LUC/REN ratio of CHS1 was 0.3/1, which was drastically lower than those of the controls ( Fig. 6G, H , Fig. S9A, B ). These results indicate that bHLH74, but not bHLH62, inhibits the expression of CHS1 .
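The dual-luciferase readout can be illustrated with a short calculation. In the sketch below, each replicate's firefly luciferase (LUC) signal is divided by its Renilla luciferase (REN) signal, the effector mean is expressed relative to a control set to 1 (one reading of the "0.3/1" value above), and a pooled-variance Student's t statistic is computed, as the Fig. 6 legend names Student's t-test. All luminescence readings and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def luc_ren_ratios(luc, ren):
    """Per-replicate firefly/Renilla luciferase ratios."""
    return [f / r for f, r in zip(luc, ren)]


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical raw luminescence readings for three biological replicates.
control = luc_ren_ratios(luc=[980, 1020, 1000], ren=[100, 100, 100])
effector = luc_ren_ratios(luc=[290, 310, 300], ren=[100, 100, 100])

# Express the effector relative to the empty-vector control (control = 1).
relative = mean(effector) / mean(control)
t_stat = student_t(effector, control)
df = len(control) + len(effector) - 2
print(f"relative LUC/REN = {relative:.2f}, t = {t_stat:.1f}, df = {df}")
```

For 4 degrees of freedom, a two-tailed P < 0.01 requires |t| above roughly 4.6, so a strongly negative t statistic on data like these would be consistent with repression of the reporter.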

Salt stress damages the structure and osmotic potential of rose leaves

Roses belong to the Rosaceae family and are one of the most important commercial flower crops. Extracts from various parts of the rose plant have also been shown to have excellent biological activity and are used in industries such as cosmetics, perfume and medicine [ 1 ]. Meanwhile, an increasing number of wild rose varieties with significant health benefits are being domesticated and brought into mainstream cultivation [ 33 ]. Salt stress is one of the most widespread abiotic constraints for rose cultivation. Salt stress threatens plant survival and growth but can stimulate an increase in the biosynthesis of secondary metabolites [ 34 ]. Previous studies have shown that optimal coordination between leaf structure and photosynthetic processes is essential for enabling plants to tolerate salt stress [ 35 ]. When exposed to salt treatment, leaves become thicker and smaller while the palisade tissue and spongy tissue become loose and jumbled and the intercellular space of the mesophyll becomes thinner [ 36–39 ]. We observed that the palisade tissue of DMS was loose, disordered, and severely damaged compared with that in JDG under salt stress ( Fig. 1C ). This indicates that DMS is more sensitive to salt stress than JDG. Typically, excessive ROS accumulate under stress conditions, which can lead to membrane oxidative damage (lipid peroxidation) [ 40 ]. Silencing of the gene GmNAC06 in soybean ( Glycine max ) leads to accumulation of ROS under salt stress, which in turn leads to significant losses in soybean production [ 41 ]. In Arabidopsis , the sibp1 mutant accumulates more ROS than wild-type plants or AtSIBP1-overexpressing plants, resulting in a lower survival rate under salt treatment [ 42 ]. In this study, salinity led to a greater accumulation of ROS in DMS compared with JDG, as detected by DAB staining ( Fig. 1D, E ). This indicates that DMS suffers greater damage under salinity stress. 
Excessive accumulation of ROS in cells can lead to membrane oxidative damage and trigger the production of enzymatic or nonenzymatic free radical scavengers to cope with oxidative damage [ 10 ]. Here, antioxidant enzymes such as peroxidase (A0A2P6R8H8) and glutathione peroxidase (A0A2P6P708) were upregulated in roses under salt treatment ( Table S2 ). This suggests that rose plants maintain lower ROS levels by upregulating antioxidant enzymes, thereby protecting the photosynthetic machinery and maintaining plant growth under salt stress. Among the nonenzymatic antioxidants, phenols and flavonoids accumulate in various tissues and contribute to free radical scavenging that enhances plant salt tolerance [ 43 ]. Indeed, we identified significant differences in the contents of phenolic acids, lipids, and flavonoid metabolites in JDG and DMS under control and salt stress conditions ( Table S1 ). Moreover, our transcriptomic and proteomic analysis revealed the activation of genes and proteins within the phenylpropanoid and flavonol pathways. This activation results in the accumulation of various phenolic compounds, potentially enhancing the capacity of the plants to scavenge ROS.

Flavonoids are beneficial for improving salt stress tolerance in rose

Phenolic compounds, such as flavonoids, are among the most widespread secondary metabolites observed throughout the plant kingdom [ 44 ]. These compounds fulfill various biochemical and molecular functions within plants, encompassing roles in plant defense, signal transduction, antioxidant action, and the scavenging of free radicals [ 45 ]. Environmental changes commonly trigger the flavonoid pathway, which aids in shielding plants from the harmful effects of ultraviolet radiation, salt, heat, and drought [ 23 , 46 , 47 ]. Moreover, flavonoids demonstrate potent biological activity and serve as significant antioxidants [ 48 ]. Recently, researchers and consumers have been interested in plant-based polyphenols and flavonoids for their antioxidant potential, their dietary accessibility, and their role in preventing fatal diseases such as cardiovascular disease and cancer [ 49 ]. Our transcriptomics analysis showed that salinity causes significant alterations in the secondary metabolism of JDG, while affecting the primary metabolism of DMS. Proteomics showed that phenylpropanoid biosynthesis is significantly enhanced in JDG under salt stress, especially through the flavonoid pathway. In DMS, glutathione metabolism is significantly enhanced under salt stress, indicating differences in salt tolerance pathways between the two cultivars. Our metabolome data indicated that the abundance of phenolic acid and flavonoid metabolites was significantly altered in both JDG and DMS under salt stress. Furthermore, by comparing their contents in leaves under salt stress and control conditions, we found that more flavonoids accumulated in DMS than in JDG under salt stress. This evidence suggests that DMS requires an increased presence of flavones to withstand the damage caused by salinity. 
By contrast, salinity stress did not trigger a substantial buildup of flavonoids in JDG, possibly due to the adequate levels of flavonoids already present under normal conditions, which provided ample tolerance to salt-induced stress. This observation could also explain the higher tolerance of JDG to salt stress ( Table S1 ). When we compared the flavonoid metabolites of the phenylpropanoid pathway to identify flavonoid metabolites associated with salt tolerance, we found that 17 phenolic acid metabolites and 6 flavonoid metabolites were significantly differentially accumulated in both genotypes. Of these compounds, ferulic acid serves as a free radical scavenger, while simultaneously serving as an inhibitor for enzymes engaged in generating free radicals and boosting the activity of scavenger enzymes [ 49 ]. Sinapic acid is a bioactive phenolic acid with anti-inflammatory and anti-anxiety effects [ 50 ]. Pinocembrin, a naturally occurring flavonoid found in fruits, vegetables, nuts, seeds, flowers, and tea, is an anti-inflammatory, antimicrobial, and antioxidant agent [ 51 ]. This indicates that these two rose cultivars contain beneficial metabolites with some economic value. We investigated the possible effects of these metabolites in conferring salt tolerance in rose by comparing specific DAMs between JDG and DMS. Among these DAMs, eight metabolites were upregulated and six metabolites were downregulated under salt treatment in JDG compared to DMS. Among these eight upregulated DAMs, the contents of 3- O -methylquercetin, brickellin, 5,2′,5′-trihydroxy-3,7,4′-trimethoxyflavone-2′- O -glucoside, and kaempferol-3- O -(6′′-acetyl)glucosyl-(1→3)-galactoside accumulated significantly with salinity ( Table S7 ). These metabolites have important functions. For example, 3- O -methylquercetin has potent anticancer, antioxidant, antiallergy, and antimicrobial activities and shows strong antiviral activity against tomato ringspot virus [ 52 ]. 
Kaempferol, a biologically active compound found in numerous fruits, vegetables, and herbs, demonstrates various pharmacological benefits, such as antimicrobial, antioxidant, and anticancer properties [ 53 ]. This indicates that JDG is an excellent rose cultivar that is both salt tolerant and rich in beneficial bioactive substances.

bHLH74 regulates flavonoid biosynthesis

The biosynthesis of flavonoids is initiated from the amino acid phenylalanine, giving rise to phenylpropanoids that subsequently enter the flavonoid-anthocyanin pathway [ 25 ]. The CHS enzyme is situated at a crucial regulatory position preceding the flavonoid biosynthetic pathway, directing the flow of the phenylpropanoid pathway towards flavonoid production, which has been extensively documented in many plant species [ 54 , 55 ]. In rice ( Oryza sativa ), defects in the flavonoid biosynthesis gene CHS can alter the distribution of flavonoids and lignin [ 56 ]. In eggplant ( Solanum melongena L.), CHS regulates the content of anthocyanins in eggplant skin under heat stress [ 57 ]. In apple ( Malus domestica ), overexpression of CHS increases the accumulation of flavonoids and enhances nitrogen absorption [ 58 ]. We identified a positive correlation between flavonoid accumulation and the expression of CHS genes, in agreement with previous reports. The bHLH TFs involved in regulating flavonoid biosynthesis work in a MYB-dependent or -independent manner. For example, DvIVS, a bHLH transcription factor in dahlia ( Dahlia variabilis ), activates flavonoid biosynthesis by regulating the expression of Chalcone synthase 1 ( CHS1 ) [ 59 ]. The Arabidopsis bHLH proteins TRANSPARENT TESTA 8 (AtTT8) and ENHANCER OF GLABRA 3 (AtEGL3) are all involved in the biosynthesis of various flavonoids [ 60–62 ]. In Chrysanthemum ( Chrysanthemum morifolium ), CmbHLH2 significantly activates CmDFR transcription, leading to anthocyanin accumulation, especially when in coordination with CmMYB6 [ 63 ]. In blueberry ( Vaccinium sect. Cyanococcus ), the bHLH25 and bHLH74 TFs potentially engage with MYB or directly hinder the expression of genes responsible for flavonoid biosynthesis, thereby regulating flavonoid accumulation [ 64 ]. 
In apple ( Malus domestica ), expression of bHLH62, bHLH74, and bHLH162 is significantly negatively correlated with anthocyanin content and has been shown to inhibit anthocyanin biosynthesis [ 65 ]. In apple fruit skin, hypermethylation of bHLH74 in the mCG context leads to transcriptional inhibition of downstream anthocyanin biosynthesis genes [ 66 ]. In rose, our co-expression network revealed a strong correlation between CHS and genes encoding TFs such as bHLH74 and bHLH62 in the key gene network. bHLH proteins can bind to the promoter regions of pivotal genes encoding enzymes, playing important roles in regulating DAMs under salt stress. Dual-luciferase reporter assays showed that LUC bioluminescence was strongly suppressed in Nicotiana benthamiana leaves infiltrated with pCHS1:LUC plus 35S:bHLH74, but not in leaves infiltrated with pCHS1:LUC plus 35S:bHLH62 ( Fig. 6G, H , Fig. S9A, B ). These results suggest that bHLH74 negatively regulates flavonoid biosynthesis by repressing the expression of CHS1, which is involved in the flavonoid biosynthetic pathway.

We examined the morphological phenotypes, transcriptomes, proteomes, and widely targeted metabolomes of JDG and DMS under salt stress. Multi-omics analysis revealed that the phenylpropanoid pathway, especially the flavonoid pathway, contributes strongly to salt tolerance in rose, particularly JDG. Meanwhile, the bHLH74 TF negatively regulates flavonoid biosynthesis by repressing the expression of the CHS1 gene involved in the flavonoid biosynthetic pathway. This research advances our understanding of the regulatory mechanisms of plant development and secondary metabolism underlying salt stress responses in rose, offering insights that could be used to develop new strategies for improving plant tolerance to salinity.

Plant materials and growth conditions

Rosa hybrida cv. Jardin de Granville (JDG) and Rosa damascena Mill. (DMS) were planted in the Science and Technology Park of China Agricultural University (40°03′N, 116°29′E). Rose plants were propagated by cutting culture. Rose shoots with at least two nodes and approximately 6 cm in length were used as cuttings and inserted into square flowerpots (diameter 8 cm) containing a mixture of vermiculite and peat soil [1:1 (v/v)]. Cuttings were soaked in 0.15% (v/v) indole-3-butyric acid (IBA) before insertion into pots and then grown in a growth chamber at 25°C with 50% relative humidity and a cycle of 8 hours of darkness/16 hours of light for 1 month until rooting [ 67 ].

Nicotiana benthamiana plants were used for measurement of transient expression. Seeds were sown in square flowerpots (diameter 8 cm); after 1 week, seedlings were transplanted into different pots. The soil and cultivation conditions for N. benthamiana cultivation were the same as those for roses.

Salt treatment

Twenty JDG and 20 DMS rose cuttings displaying good rooting and uniform appearance were selected for salt treatment experiments. JDG or DMS plants were randomly divided into two groups watered with either 0 or 400 mM NaCl. Phenotypes were recorded after 2 weeks. This process was repeated three times [ 68 ].

Salt treatment of rose leaves was described previously [ 68 ]. Thirty JDG and 30 DMS rose cuttings with good rooting and uniform appearance were selected, and mature leaves of similar size were collected. The leaves were divided into two treatment groups, each containing 30 leaves: group A, immersed in deionized water treatment, and group B, immersed in 400 mM NaCl treatment. Phenotypes were observed after 0, 2, and 4 days. On the second day of treatment, leaves showed obvious differences. By the fourth day of treatment, the leaves had become soft or had died. Therefore, sequencing data from the second day were used. Three independent biological replicates were assayed.

Relative electrolyte permeability

Relative electrolyte permeability was determined as previously reported [ 69 ] with the following modifications. Salt-treated leaves (0.1 g) were weighed, placed in a 50-ml centrifuge tube, and covered with 20 ml deionized water. The conductivity of the deionized water was measured and defined as EC0. After shaking for 20 minutes at 60 rpm on an orbital shaker, the conductivity at room temperature was measured and defined as EC1. The centrifuge tube was then placed in boiling water for 10 minutes and cooled to room temperature, and the conductivity of the solution was measured as EC2. The relative permeability of the electrolytes (as a percentage) was determined as (EC1 − EC0) / (EC2 − EC0) × 100%.
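The calculation above can be sketched in a few lines of Python. This is a minimal illustration of the stated formula only; the function name and the example conductivity readings are hypothetical and are not taken from the study.

```python
def relative_electrolyte_permeability(ec0: float, ec1: float, ec2: float) -> float:
    """Return relative electrolyte permeability as a percentage.

    ec0: conductivity of the water blank
    ec1: conductivity after shaking the leaf sample at room temperature
    ec2: conductivity after boiling and cooling (total electrolytes released)
    """
    total = ec2 - ec0
    if total <= 0:
        raise ValueError("EC2 must exceed EC0 for the ratio to be meaningful")
    return (ec1 - ec0) / total * 100.0


# Hypothetical readings (same units for all three), for illustration only.
example = relative_electrolyte_permeability(ec0=2.0, ec1=52.0, ec2=202.0)
print(f"{example:.1f}%")
```

Subtracting EC0 from both the numerator and the denominator removes the contribution of the water itself, so the ratio reflects only electrolytes released from the leaf tissue.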

Soluble protein content

Soluble protein content was determined following the method of Bradford (1976) [ 70 ]. Leaf samples (0.5 g) were placed in a mortar with 8 ml distilled water and a small amount of quartz sand, crushed thoroughly, and incubated at room temperature for 0.5 hours. After centrifugation at 3,000 g for 20 minutes at 4 °C, the supernatant was transferred to a 10-ml volumetric flask and the volume was adjusted to 10 ml with distilled water. Two 1.0-ml aliquots of this sample extraction solution (or distilled water as a control) were transferred to clean test tubes, 5 ml of Coomassie Brilliant Blue reagent was added, and the tubes were shaken well. After 2 minutes, when the reaction was complete, the absorbance and chromaticity at 595 nm were measured, and the protein content was determined using a standard curve.
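As an illustration of how a standard curve converts absorbance into protein content, the sketch below fits a straight line by least squares and back-calculates the protein in a 1.0-ml aliquot of a 10-ml extract from 0.5 g of leaf tissue, following the volumes given above. The standard-curve points, the assumption that readings are already blank-corrected, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def protein_mg_per_g(a595, slope, intercept,
                     extract_ml=10.0, aliquot_ml=1.0, fresh_weight_g=0.5):
    """Convert a blank-corrected A595 reading to mg protein per g fresh weight."""
    ug_in_aliquot = (a595 - intercept) / slope          # read off the standard curve
    total_ug = ug_in_aliquot * extract_ml / aliquot_ml  # scale to the whole extract
    return total_ug / 1000.0 / fresh_weight_g           # ug -> mg, per gram of leaf


# Hypothetical standards: micrograms of protein vs. blank-corrected A595.
standards_ug = [0, 20, 40, 60, 80, 100]
standards_a595 = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
slope, intercept = fit_line(standards_ug, standards_a595)
print(round(protein_mg_per_g(0.25, slope, intercept), 3))
```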

Leaf anatomical structure

Paraffin sections were prepared as described previously with some modifications [ 71 ]. Leaves from the control and NaCl treatments were collected, gently washed with deionized water at room temperature, and stored at 4°C until further use. A 3-mm × 5-mm sample was cut from the same part of each leaf, and these leaf samples were fixed in 2.5% (v/v) glutaraldehyde. Samples were dehydrated using acetone through a concentration gradient of 30%, 50%, 70%, 80%, 95%, and 100% (v/v) and then embedded in paraffin. The embedded tissues were cut into 3-μm sections using a Leica RM2265 rotary slicer (Leica Microsystems, Wetzlar, Germany). Slides were stained with 0.02% (v/v) toluidine blue for 5 minutes, and the residual toluidine blue was removed using distilled water. Slides were allowed to dry and then observed under a microscope (OLYMPUS BH-2, Tokyo, Japan). Three independent biological replicates were examined.

DAB (3,3′-diaminobenzidine) staining for H₂O₂

H₂O₂ content was detected using the DAB staining method [ 72 ]. Leaves treated with NaCl or control leaves were rinsed clean with distilled water, immersed in DAB solution (1 mg/ml, pH 3.8), and placed under vacuum at approximately 0.8 MPa for 5 minutes; this process was repeated three to six times until the leaves were completely infiltrated. Leaves were then incubated in a box in the dark for 8 hours until a brown sediment was observed. Chlorophyll was removed by repeatedly washing with eluent (ethanol:lactic acid:glycerol, 3:1:1, v/v/v). Decolorized leaves were photographed to record their phenotypes. ImageJ was used to quantify the stained areas.

UPLC-QQQ-based widely targeted metabolome analysis

Metabolomics analysis was performed on four groups of samples: JDG-Mock, JDG-NaCl, DMS-Mock, and DMS-NaCl. Extraction and determination of metabolites were performed with the assistance of Wuhan Metware Biotechnology Co., Ltd. Samples were crushed using a mixer mill with zirconia beads (MM 400, Retsch). Freeze-dried samples (0.1 g) were incubated overnight with 1.2 ml 70% (v/v) methanol solution at 4 °C, then centrifuged at 13,400 g for 10 minutes. The extracts were filtered and subjected to LC-MS/MS analysis [ 73 ]. A previously described procedure [ 74 ] was followed for the analytical conditions and for quantifying metabolites using an LC-ESI-Q TRAP-MS/MS in multiple reaction monitoring (MRM) mode. The prcomp function in R was used for PCA, significantly different metabolites were determined by |log2 Fold Change| ≥ 1, and annotated metabolites were mapped to the KEGG pathway database ( http://www.kegg.jp/kegg/pathway.html ). Comparisons are described as follows: e.g., JDG-NaCl vs JDG-Mock, indicating that the treated sample is being compared with the untreated sample and that metabolites are upregulated or downregulated in the NaCl sample compared with the Mock sample.
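The fold-change criterion and the direction convention (NaCl relative to Mock) can be illustrated with a short Python sketch. It applies only the |log2 Fold Change| ≥ 1 rule stated above; any additional filters used by the service provider are not modeled, and the function name and abundance values are hypothetical.

```python
import math


def classify_metabolite(treated_mean: float, control_mean: float,
                        threshold: float = 1.0) -> str:
    """Label a metabolite by log2(treated / control) against a symmetric cutoff."""
    log2_fc = math.log2(treated_mean / control_mean)
    if log2_fc >= threshold:
        return "up"          # more abundant in the NaCl sample
    if log2_fc <= -threshold:
        return "down"        # less abundant in the NaCl sample
    return "unchanged"


# Hypothetical mean abundances for a JDG-NaCl vs JDG-Mock comparison.
print(classify_metabolite(treated_mean=200.0, control_mean=100.0))
print(classify_metabolite(treated_mean=100.0, control_mean=400.0))
print(classify_metabolite(treated_mean=150.0, control_mean=100.0))
```

A cutoff of 1 on the log2 scale corresponds to at least a twofold increase or a twofold decrease in abundance.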

Tandem mass tag-based proteomic analysis

Experiments were carried out with the assistance of Hangzhou Jingjie Biotechnology Co., Ltd. Samples were thoroughly ground into powder in liquid nitrogen, and proteins were extracted using the phenol extraction method. Proteins were digested with trypsin overnight, and the resulting peptides were labeled with TMT tags. LC-MS/MS analysis was performed using an EASY-nLC 1200 UPLC system (ThermoFisher Scientific) and a Q Exactive™ HF-X (ThermoFisher Scientific) [ 75 ]. A fold-change value of 1.3 was used as the threshold for significant changes. GO ( http://www.ebi.ac.uk/GOA/ ) and KEGG categories were used to annotate DAPs; WoLF PSORT software was used to predict subcellular localization ( https://wolfpsort.hgc.jp/ ).

Transcriptome sequencing

We constructed 12 cDNA libraries (three biological replicates for each of JDG and DMS under each treatment) for RNA-seq. Transcriptome sequencing was completed at Wuhan Metware Biotechnology Co., Ltd. RNA purity and RNA integrity were determined using a NanoPhotometer spectrophotometer and an Agilent 2100 bioanalyzer, respectively. The RNA library was then sequenced on the Illumina HiSeq platform. Raw data were filtered using fastp v0.19.3 and aligned to the reference genome ( https://lipm-browsers.toulouse.inra.fr/pub/RchiOBHm-V2/ ). FPKM (fragments per kilobase of transcript per million fragments mapped) was used as an indicator to measure gene expression levels, with the threshold for significant differential expression being |log2 Fold Change| ≥ 1 and False Discovery Rate < 0.05. GO and KEGG categories were used to annotate DEGs [ 76 ].
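For readers unfamiliar with the FPKM metric and the two-part DEG threshold, the following Python sketch shows the definitional formula and the filter. It is not the pipeline used in the study, whose software for quantification and differential testing is not named in this section; the function names and numbers are hypothetical.

```python
def fpkm(fragments: int, transcript_length_bp: int,
         total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions of fragments)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_deg(log2_fold_change: float, fdr: float,
           lfc_cutoff: float = 1.0, fdr_cutoff: float = 0.05) -> bool:
    """Apply both thresholds: |log2FC| >= 1 and FDR < 0.05."""
    return abs(log2_fold_change) >= lfc_cutoff and fdr < fdr_cutoff


# Hypothetical gene: 500 fragments on a 2,000-bp transcript, 25 million mapped fragments.
print(fpkm(500, 2000, 25_000_000))
print(is_deg(log2_fold_change=-1.4, fdr=0.01))
print(is_deg(log2_fold_change=2.3, fdr=0.20))
```

Normalizing by transcript length and by library size makes expression values comparable between genes of different lengths and between libraries of different sequencing depths.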

To identify modules with high gene correlation, co-expression network analysis was performed using the R-based WGCNA package (v.1.69) with default parameters [ 77 ]. The varFilter function of the R language genefilter package was used to remove genes with low or stable expression levels in all samples. Modules based on the correlation between gene expression levels were identified, and a correlation matrix between each module and the sample was calculated using the R-based WGCNA software package. The module network was visualized using Cytoscape software (v.3.7.2).

RT-qPCR was performed on eight DEGs in the phenylpropanoid pathway to verify the accuracy of the data obtained from high-throughput sequencing. Total RNA was extracted using the hot borate method [ 72 ] and reverse transcribed using HiScript III All-in-one RT SuperMix (R333-01, Vazyme Biotech Co., Ltd., Nanjing, China). Subsequently, 2 × ChamQ SYBR qPCR Master Mix (Q331, Vazyme Biotech Co., Ltd., Nanjing, China) was used for quantitative detection of gene expression. The relative expression of genes was calculated using the 2^(−ΔΔCt) method [ 76 ]. GAPDH was used as an endogenous control, and primers for RT-qPCR are listed in Table S10.
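The relative-expression calculation can be written out as a small Python function. This is a generic sketch of the ΔΔCt arithmetic with GAPDH as the reference gene; the Ct values are hypothetical, and the calculation assumes roughly 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene (treated vs control), normalized to a reference gene."""
    delta_ct_treated = ct_target_treated - ct_ref_treated   # normalize treated sample
    delta_ct_control = ct_target_control - ct_ref_control   # normalize control sample
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: target gene vs GAPDH in NaCl-treated and Mock leaves.
fold = relative_expression(ct_target_treated=22.0, ct_ref_treated=18.0,
                           ct_target_control=25.0, ct_ref_control=18.0)
print(fold)
```

Because a lower Ct means more template, a negative ΔΔCt yields a fold change above 1 (upregulation under treatment).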

Dual-LUC reporter assay

A transactivation assay was designed to evaluate the effect of BHLH74/BHLH62 on the CHS1 promoter using methods described previously [ 78 ]. Initially, a 2000-bp segment of the CHS1 promoter was cloned into the pGreenII 0800-LUC vector, generating the ProCHS1:LUC reporter plasmid. Concurrently, the coding sequences of BHLH74/BHLH62 were inserted into the pGreenII0029 62-SK vector, resulting in the construction of Pro35S: BHLH74/BHLH62 effector plasmids. pGreenII 0800-LUC vector containing REN under control of the 35S promoter was used as a positive control.

Following plasmid construction, these constructs were introduced into Agrobacterium tumefaciens strain GV3101, which harbored the pSoup plasmid. Subsequently, A. tumefaciens containing different combinations of effector and reporter plasmids was infiltrated into N. benthamiana plants with six to eight young leaves. After a 3-day incubation period, the ratios of LUC to REN were quantified using the Bio-Lite Luciferase Assay System (DD1201, Vazyme Biotech Co., Ltd., Nanjing, China). Images capturing LUC signals were acquired using a CCD camera (Night Shade LB 985, Germany). Primer sequences are listed in Table S10 .
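The LUC/REN normalization can be sketched as follows. Dividing firefly luciferase (LUC) by Renilla luciferase (REN) corrects for differences in infiltration efficiency between leaves. The comparison against a reporter-only or empty-effector control is an assumption made for illustration, since the control combination is not spelled out here; the readings and function names are hypothetical.

```python
from statistics import mean


def luc_ren_ratios(readings):
    """Return the LUC/REN ratio for each (luc, ren) pair of luminescence readings."""
    return [luc / ren for luc, ren in readings]


def relative_promoter_activity(effector_readings, control_readings):
    """Mean LUC/REN with the effector, relative to the mean LUC/REN of the control."""
    return mean(luc_ren_ratios(effector_readings)) / mean(luc_ren_ratios(control_readings))


# Hypothetical luminescence readings (LUC, REN) from three leaves per combination.
control = [(1000.0, 500.0), (1200.0, 600.0), (800.0, 400.0)]   # reporter + assumed empty effector
with_tf = [(500.0, 500.0), (600.0, 600.0), (400.0, 400.0)]     # reporter + effector
print(relative_promoter_activity(with_tf, control))
```

A value below 1 would indicate that the effector reduces promoter-driven LUC activity relative to the control, which is the pattern expected for a repressor.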

Statistical analysis

Statistical analyses of data were conducted using IBM SPSS Statistics, while graphical representations were created using GraphPad Prism 8.0.1. Paired data comparisons were assessed through Student's t-tests (* P < 0.05, ** P < 0.01, *** P < 0.001). Each experiment was performed using a minimum of three biological replicates, and error bars depicted on graphs denote the standard error (SE) of the mean value. The Metware Cloud platform ( https://cloud.metware.cn ) and OmicShare tools ( https://www.chiplot.online/ ) were used for bioinformatics analyses and mapping.
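The error-bar and asterisk conventions described above can be expressed as a short Python sketch. It does not reproduce the SPSS analysis; it only shows how the SE of the mean is computed from replicates and how a P value maps to the asterisk notation. The "ns" label for non-significant results and the replicate values are assumptions for illustration.

```python
from math import sqrt
from statistics import stdev


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def significance_stars(p_value: float) -> str:
    """Map a P value to the asterisk notation used in the figures."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"  # assumed label for P >= 0.05


# Hypothetical measurements from three biological replicates.
print(round(standard_error([1.0, 2.0, 3.0]), 4))
print(significance_stars(0.004))
```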

This work was supported by the Consult of Flower Industry of Jinning District (202204BI090022), General Project of Shenzhen Science and Technology and Innovation Commission (Grant No. 6020330006K0).

ZX, MN conceived and designed the experiments. RH and YW conducted the experiments. RH, YW, ZX analyzed the data. LY, JW, QX, CP, XT, GJ and MN performed the research. RH, SM and ZX wrote the manuscript. All authors read and approved the manuscript. RH and YW contributed equally to this work.

The datasets generated and analyzed during the current study are available in the Biological Research Project Data (BioProject), National Center for Biotechnology Information (NCBI) repository, accession: PRJNA1030783.

The authors declare that they have no competing interests.

Mileva M, Ilieva Y, Jovtchev G. et al. Rose flowers—a delicate perfume or a natural healer? Biomol Ther. 2021;11:127

Katsoulas N, Kittas C, Dimokas G. et al. Effect of irrigation frequency on rose flower production and quality. Biosyst Eng. 2006;93:237–44

Isah T. Stress and defense responses in plant secondary metabolites production. Biol Res. 2019;52:39

Feng D, Zhang H, Qiu X. et al. Comparative transcriptomic and metabonomic analysis revealed the relationships between biosynthesis of volatiles and flavonoid metabolites in Rosa rugosa. Ornam Plant Res. 2021;1:1–10

Wang X, Zhao F, Wu Q. et al. Physiological and transcriptome analyses to infer regulatory networks in flowering transition of Rosa rugosa. Ornam Plant Res. 2023;3:1–12

Jia Y, Chen C, Gong F. et al. An aux/IAA family member, RhIAA14, involved in ethylene-inhibited petal expansion in rose (Rosa hybrida). Genes. 2022;13:1041

Ren H, Bai M, Sun J. et al. RcMYB84 and RcMYB123 mediate jasmonate-induced defense responses against Botrytis cinerea in rose (Rosa chinensis). Plant J. 2020;103:1839–49

Chaves MM, Flexas J, Pinheiro C. Photosynthesis under drought and salt stress: regulation mechanisms from whole plant to cell. Ann Bot. 2009;103:551–60

Askari Kelestani A, Ramezanpour S, Borzouei A. et al. Application of gamma rays on salinity tolerance of wheat (Triticum aestivum L.) and expression of genes related to biosynthesis of proline, glycine betaine and antioxidant enzymes. Physiol Mol Biol Plants. 2021;27:2533–47

Qi S, Wang X, Wu Q. et al. Morphological, physiological and transcriptomic analyses reveal potential candidate genes responsible for salt stress in Rosa rugosa. Ornam Plant Res. 2023;3:21

Gill SS, Tuteja N. Reactive oxygen species and antioxidant machinery in abiotic stress tolerance in crop plants. Plant Physiol Biochem. 2010;48:909–30

Ye C, Zheng S, Jiang D. et al. Initiation and execution of programmed cell death and regulation of reactive oxygen species in plants. Int J Mol Sci. 2021;22:12942

He L, He T, Farrar S. et al. Antioxidants maintain cellular redox homeostasis by elimination of reactive oxygen species. Cell Physiol Biochem. 2017;44:532–53

Challabathula D, Analin B, Mohanan A. et al. Differential modulation of photosynthesis, ROS and antioxidant enzyme activities in stress-sensitive and -tolerant rice cultivars during salinity and drought upon restriction of COX and AOX pathways of mitochondrial oxidative electron transport. J Plant Physiol. 2022;268:153583

Li C, Mur LAJ, Wang Q. et al. ROS scavenging and ion homeostasis is required for the adaptation of halophyte Karelinia caspia to high salinity. Front Plant Sci. 2022;13:

Ren G, Yang P, Cui J. et al. Multiomics analyses of two sorghum cultivars reveal the molecular mechanism of salt tolerance. Front Plant Sci. 2022;13:

Petrussa E, Braidot E, Zancani M. et al. Plant Flavonoids--Biosynthesis, Transport and Involvement in Stress Responses. Int J Mol Sci. 2013;14:14950–73

Das S, Rosazza JPN. Microbial and enzymatic transformations of flavonoids. J Nat Prod. 2006;69:499–508

Gao Y, Liu J, Chen Y. et al. Tomato SlAN11 regulates flavonoid biosynthesis and seed dormancy by interaction with bHLH proteins but not with MYB proteins. Hortic Res. 2018;5:

Zhang Z, Liu Y, Yuan Q. et al. The bHLH1-DTX35/DFR module regulates pollen fertility by promoting flavonoid biosynthesis in Capsicum annuum L. Hortic Res. 2022;9:

Ramaroson M, Koutouan C, Helesbeux JJ. et al. Role of Phenylpropanoids and flavonoids in plant resistance to pests and diseases. Molecules. 2022;27:8371

Schulz E, Tohge T, Winkler JB. et al. Natural variation among Arabidopsis accessions in the regulation of flavonoid metabolism and stress gene expression by combined UV radiation and cold. Plant Cell Physiol. 2021;62:502–14

Wang F, Zhu H, Kong W. et al. The antirrhinum AmDEL gene enhances flavonoids accumulation and salt and drought tolerance in transgenic Arabidopsis. Planta. 2016;244:59–73

Shen N, Wang T, Gan Q. et al. Plant flavonoids: classification, distribution, biosynthesis, and antioxidant activity. Food Chem. 2022;383:132531

Liu W, Feng Y, Yu S. et al. The flavonoid biosynthesis network in plants. Int J Mol Sci. 2021;22:12824

Zhang X, Abrahan C, Colquhoun TA. et al. A proteolytic regulator controlling chalcone synthase stability and flavonoid biosynthesis in Arabidopsis. Plant Cell. 2017;29:1157–74

Riffault-Valois L, Blanchot L, Colas C. et al. Molecular fingerprint comparison of closely related rose varieties based on UHPLC-HRMS analysis and chemometrics. Phytochem Anal. 2017;28:42–9

Riffault L, Destandau E, Pasquier L. et al. Phytochemical analysis of Rosa hybrida cv. 'Jardin de Granville' by HPTLC, HPLC-DAD and HPLC-ESI-HRMS: polyphenolic fingerprints of six plant organs. Phytochemistry. 2014;99:127–34

Omidi M, Khandan-Mirkohi A, Kafi M. et al. Biochemical and molecular responses of Rosa damascena mill. cv. Kashan to salicylic acid under salinity stress. BMC Plant Biol. 2022;22:373

Azizi S, Seyed Hajizadeh H, Aghaee A. et al. In vitro assessment of physiological traits and ROS detoxification pathways involved in tolerance of damask rose genotypes under salt stress. Sci Rep. 2023;13:17795

Zhao S, Zhang Q, Liu M. et al. Regulation of plant responses to salt stress. Int J Mol Sci. 2021;22:4609

Zhang C, Zhang H, Zhan Z. et al. Transcriptome analysis of sucrose metabolism during bulb swelling and development in onion (Allium cepa L.). Front Plant Sci. 2016;7:1425

Kumari P, Raju DVS, Prasad KV. et al. Characterization of anthocyanins and their antioxidant activities in Indian rose varieties (Rosa × hybrida) using HPLC. Antioxidants. 2022;11:2032

Akula R, Ravishankar GA. Influence of abiotic stress signals on secondary metabolites in plants. Plant Signal Behav. 2011;6:1720–31

Barhoumi Z, Djebali W, Chaïbi W. et al. Salt impact on photosynthesis and leaf ultrastructure of Aeluropus littoralis. J Plant Res. 2007;120:529–37

Jiang D, Lu B, Liu L. et al. Exogenous melatonin improves the salt tolerance of cotton by removing active oxygen and protecting photosynthetic organs. BMC Plant Biol. 2021;21:331

Liu D, Dong S, Miao H. et al. A large-scale genomic association analysis identifies the candidate genes regulating salt tolerance in cucumber (Cucumis sativus L.) seedlings. Int J Mol Sci. 2022;23:8260

Garrido Y, Tudela JA, Marín A. et al. Physiological, phytochemical and structural changes of multi-leaf lettuce caused by salt stress. J Sci Food Agric. 2014;94:1592–9

Yao X, Meng L, Zhao W. et al. Changes in the morphology traits, anatomical structure of the leaves and transcriptome in Lycium barbarum L. under salt stress. Front Plant Sci. 2023;14:1090366

Tan Y, Duan Y, Chi Q. et al. The role of reactive oxygen species in plant response to radiation. Int J Mol Sci. 2023;24:3346

Li M, Chen R, Jiang Q. et al. GmNAC06, a NAC domain transcription factor enhances salt stress tolerance in soybean. Plant Mol Biol. 2021;105:333–45

Wan X, Peng L, Xiong J. et al. AtSIBP1, a novel BTB domain-containing protein, positively regulates salt signaling in Arabidopsis thaliana. Plants. 2019;8:573

Rezayian M, Niknam V, Ebrahimzadeh H. Oxidative damage and antioxidative system in algae. Toxicol Rep. 2019;6:1309–13

Liu X, Cheng X, Cao J. et al. GOLDEN 2-LIKE transcription factors regulate chlorophyll biosynthesis and flavonoid accumulation in response to UV-B in tea plants. Hortic Plant J. 2023;9:1055–66

Barreca D, Gattuso G, Bellocco E. et al. Flavanones: citrus phytochemical with health-promoting properties. Biofactors. 2017;43:495–506

Zhang F, Huang J, Guo H. et al. OsRLCK160 contributes to flavonoid accumulation and UV-B tolerance by regulating OsbZIP48 in rice. Sci China Life Sci. 2022;65:1380–94

Cui M, Liang Z, Liu Y. et al. Flavonoid profile of Anoectochilus roxburghii (wall.) Lindl. Under short-term heat stress revealed by integrated metabolome, transcriptome, and biochemical analyses. Plant Physiol Biochem. 2023;201:107896

Dias MC, Pinto DCGA, Silva AMS. Plant flavonoids: chemical characteristics and biological activity. Molecules. 2021;26:5377

Kumar S, Pandey AK. Chemistry and biological activities of flavonoids: an overview. Sci World J. 2013;2013:1–16

Chen C. Sinapic acid and its derivatives as medicine in oxidative stress-induced diseases and aging. Oxidative Med Cell Longev. 2016;2016:1–10

Rasul A, Millimouno FM, Ali Eltayb W. et al. Pinocembrin: a novel natural compound with versatile pharmacological and biological activities. Biomed Res Int. 2013;2013:1–9

Doneda E, Bianchi SE, Pittol V. et al. 3-O-methylquercetin from Achyrocline satureioides - cytotoxic activity against A375-derived human melanoma cell lines and its incorporation into cyclodextrins-hydrogels for topical administration. Drug Deliv Transl Res. 2021;11:2151–68

Alam W, Khan H, Shah MA. et al. Kaempferol as a dietary anti-inflammatory agent: current therapeutic standing. Molecules. 2020;25:4073

Chen Y, Mao Y, Liu H. et al. Transcriptome analysis of differentially expressed genes relevant to variegation in peach flowers. PLoS One. 2014;9:e90842

Duan B, Tan X, Long J. et al. Integrated transcriptomic-metabolomic analysis reveals that cinnamaldehyde exposure positively regulates the phenylpropanoid pathway in postharvest Satsuma mandarin (Citrus unshiu). Pestic Biochem Physiol. 2023;189:105312

Lam PY, Wang L, Lui ACW. et al. Deficiency in flavonoid biosynthesis genes CHS, CHI, and CHIL alters rice flavonoid and lignin profiles. Plant Physiol. 2022;188:1993–2011

Wu X, Zhang S, Liu X. et al. Chalcone synthase (CHS) family members analysis from eggplant (Solanum melongena L.) in the flavonoid biosynthetic pathway and expression patterns in response to heat stress. PLoS One. 2020;15:e0226537

Wang X, Chai X, Gao B. et al. Multi-omics analysis reveals the mechanism of bHLH130 responding to low-nitrogen stress of apple rootstock. Plant Physiol. 2023;191:1305–23

Ohno S, Hosokawa M, Hoshino A. et al. A bHLH transcription factor, DvIVS, is involved in regulation of anthocyanin synthesis in dahlia (Dahlia variabilis). J Exp Bot. 2011;62:5105–16

Baudry A, Caboche M, Lepiniec L. TT8 controls its own expression in a feedback regulation involving TTG1 and homologous MYB and bHLH factors, allowing a strong and cell-specific accumulation of flavonoids in Arabidopsis thaliana. Plant J. 2006;46:768–79

Gao C, Guo Y, Wang J. et al. Brassica napus GLABRA3-1 promotes anthocyanin biosynthesis and trichome formation in true leaves when expressed in Arabidopsis thaliana. Plant Biol (Stuttg). 2018;20:3–9

Feyissa DN, Løvdal T, Olsen KM. et al. The endogenous GL3, but not EGL3, gene is necessary for anthocyanin accumulation as induced by nitrogen depletion in Arabidopsis rosette stage leaves. Planta. 2009;230:747–54

Lim S, Kim D, Jung J. et al. Alternative splicing of the basic helix-loop-helix transcription factor gene CmbHLH2 affects anthocyanin biosynthesis in ray florets of chrysanthemum (Chrysanthemum morifolium). Front Plant Sci. 2021;12:

Song Y, Ma B, Guo Q. et al. UV-B induces the expression of flavonoid biosynthetic pathways in blueberry (Vaccinium corymbosum) calli. Front Plant Sci. 2022;13:

Li W, Mao J, Yang SJ. et al. Anthocyanin accumulation correlates with hormones in the fruit skin of 'Red Delicious' and its four generation bud sport mutants. BMC Plant Biol. 2018;18:363

Li W, Ning GX, Mao J. et al. Whole-genome DNA methylation patterns and complex associations with gene expression associated with anthocyanin biosynthesis in apple fruit skin. Planta. 2019;250:1833–47

Sun J, Lu J, Bai M. et al. Phytochrome-interacting factors interact with transcription factor CONSTANS to suppress flowering in rose. Plant Physiol. 2021;186:1186–201

Su L, Zhang Y, Yu S. et al. RcbHLH59-RcPRs module enhances salinity stress tolerance by balancing Na+/K+ through callose deposition in rose (Rosa chinensis). Hortic Res. 2023;10:

Liu W, Zhang R, Xiang C. et al. Transcriptomic and physiological analysis reveal that α-linolenic acid biosynthesis responds to early chilling tolerance in pumpkin rootstock varieties. Front Plant Sci. 2021;12:

Bradford MM. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem. 1976;72:248–54

Cheng C, Yu Q, Wang Y. et al. Ethylene-regulated asymmetric growth of the petal base promotes flower opening in rose (Rosa hybrida). Plant Cell. 2021;33:1229–51

Zhang Y, Wu Z, Feng M. et al. The circadian-controlled PIF8-BBX28 module regulates petal senescence in rose flowers by governing mitochondrial ROS homeostasis at night. Plant Cell. 2021;33:2716–35

Meng Y, Zhang H, Fan Y. et al. Anthocyanins accumulation analysis of correlated genes by metabolome and transcriptome in green and purple peppers (Capsicum annuum). BMC Plant Biol. 2022;22:358

Deng H, Wu G, Zhang R. et al. Comparative nutritional and metabolic analysis reveals the taste variations during yellow rambutan fruit maturation. Food Chem X. 2023;17:100580

Liu D, Pan Y, Li K. et al. Proteomics reveals the mechanism underlying the inhibition of Phytophthora sojae by propyl gallate. J Agric Food Chem. 2020;68:8151–62

Yang B, He S, Liu Y. et al. Transcriptomics integrated with metabolomics reveals the effect of regulated deficit irrigation on anthocyanin biosynthesis in cabernet sauvignon grape berries. Food Chem. 2020;314:126170

Umer MJ, Bin Safdar L, Gebremeskel H. et al. Identification of key gene networks controlling organic acid and sugar metabolism during watermelon fruit development by integrating metabolic phenotypes and gene expression profiles. Hortic Res. 2020;7:193

Liang Y, Jiang C, Liu Y. et al. Auxin regulates sucrose transport to repress petal abscission in rose (Rosa hybrida). Plant Cell. 2020;32:3485–99

  • Online ISSN 2052-7276
  • Print ISSN 2662-6810
  • Copyright © 2024 Nanjing Agricultural University
COMMENTS

  1. Empirical Research: Definition, Methods, Types and Examples

    Empirical research is defined as any research where the conclusions of the study are strictly drawn from concrete empirical evidence, and therefore "verifiable" evidence. ... Text analysis: The text analysis method is relatively new compared to the other types. Such a method is used to analyse social life by going through images or words used by the ...

  2. Empirical Research: Defining, Identifying, & Finding

    Quantitative research -- an approach to documenting reality that relies heavily on numbers both for the measurement of variables and for data analysis (p. 33). Qualitative research -- an approach to documenting reality that relies on words and images as the primary data source (p. 33). Both quantitative and qualitative methods are empirical. If ...

  3. Empirical research

    Accurate analysis of data using standardized statistical methods in scientific studies is critical to determining the validity of empirical research. Statistical formulas such as regression, uncertainty coefficient, t-test, chi-square, and various types of ANOVA (analyses of variance) are fundamental to forming logical, valid conclusions.

  4. What is Empirical Research? Definition, Methods, Examples

    Empirical research is characterized by several key features: Observation and Measurement: It involves the systematic observation or measurement of variables, events, or behaviors. Data Collection: Researchers collect data through various methods, such as surveys, experiments, observations, or interviews.

  5. Empirical Research

    Strategies for Empirical Research in Writing is a particularly accessible approach to both qualitative and quantitative empirical research methods, helping novices appreciate the value of empirical research in writing while easing their fears about the research process. This comprehensive book covers research methods ranging from traditional ...

  6. Understanding the Empirical Method in Research Methodology

    The empirical method, central to scientific inquiry, relies on data collection and observation over theoretical speculation. It contrasts with experimental methods by focusing on natural data aggregation rather than controlled experiments, highlighting its roots in experiential learning and its significance in developing theories or conclusions.

  7. Empirical Research

    Hence, empirical research is a method of uncovering empirical evidence. Through the process of gathering valid empirical data, scientists from a variety of fields, ranging from the social to the natural sciences, have to carefully design their methods. This helps to ensure quality and accuracy of data collection and treatment.

  8. Data, measurement and empirical methods in the science of science

    Liu and coauthors review the major data sources, measures and analysis methods in the science of science, discussing how recent developments in these fields can help researchers to better predict ...

  9. The Empirical Research Paper: A Guide

    Empirical research employs rigorous methods to test out theories and hypotheses (expectations) using real data instead of hunches or anecdotal observations. This type of research is easily identifiable as it always consists of the following pieces of information: This Guide will serve to offer a basic understanding on how to approach empirical ...

  10. Introduction to Empirical Data Analysis

    A more detailed differentiation of the various methods of multivariate analysis is provided in Sect. 1.1.3. Today, methods of multivariate analysis are one of the foundations of empirical research in science. So, not surprisingly, the methods are still undergoing rapid development.

  11. Empirical Research in the Social Sciences and Education

    Another hint: some scholarly journals use a specific layout, called the "IMRaD" format, to communicate empirical research findings. Such articles typically have 4 components: Introduction : sometimes called "literature review" -- what is currently known about the topic -- usually includes a theoretical framework and/or discussion of previous ...

  12. What is empirical analysis and how does it work?

    Empirical analysis is an evidence-based approach to the study and interpretation of information. The empirical approach relies on real-world data, metrics and results rather than theories and concepts.

  13. Empirical Research: Quantitative & Qualitative

    Two basic research processes or methods in empirical research: quantitative methods and qualitative methods (see the rest of the guide for more about these methods). ... Then, the sample data are analyzed using descriptive statistical analysis. Finally, generalizations are made from the sample data to the entire population using statistical ...

  14. Empirical Research: A Comprehensive Guide for Academics

    Tips for Empirical Writing. In empirical research, the writing is usually done in research papers, articles, or reports. The empirical writing follows a set structure, and each section has a specific role. Here are some tips for your empirical writing. 7. Define Your Objectives: When you write about your research, start by making your goals clear.

  15. Methods

    Since the "Methods" section describes how the research is being conducted, it is probably the most important section for identifying empirical research. It is where you are likely to find many criteria, including the design, ... "Sample" subheading covers the sample while "Measures" and "Data Analysis" cover methodology and design. Sosoo et al ...

  16. What Is Empirical Research? Definition, Types & Samples in 2024

    A relatively new research method, textual analysis is often used nowadays to elaborate on the trends and patterns of media content, especially social media. ... This article was able to discuss the different empirical research methods, the steps for conducting empirical research, the empirical research cycle, and notable examples. All of these ...

  17. What is Empirical Research Study? [Examples & Method]

    Empirical research is a type of research methodology that makes use of verifiable evidence in order to arrive at research outcomes. In other words, this type of research relies solely on evidence obtained through observation or scientific data collection methods. Empirical research can be carried out using qualitative or quantitative ...

  18. What is empirical research: Methods, types & examples

    Empirical research methods are used when the researcher needs to gather and analyze direct, observable, and measurable data. Research findings are a great way to ground ideas. Here are some situations when one may need to do empirical research: 1. When quantitative or qualitative data is needed.

  19. Introduction to systematic review and meta-analysis

    It is easy to confuse systematic reviews and meta-analyses. A systematic review is an objective, reproducible method to find answers to a certain research question, by collecting all available studies related to that question and reviewing and analyzing their results. A meta-analysis differs from a systematic review in that it uses statistical ...

  20. Empirical Research Method

    Empirical Research Methods in the Economics of Education. P.J. McEwan, in International Encyclopedia of Education (Third Edition), 2010 Conclusions. This article has described empirical research methods used to estimate the causal effect of education investments on outcomes. Some, like regression analysis with nonexperimental data, are common but not always capable of delivering strong causal ...

  21. Conduct empirical research

    Share this content. Empirical research is research that is based on observation and measurement of phenomena, as directly experienced by the researcher. The data thus gathered may be compared against a theory or hypothesis, but the results are still based on real life experience. The data gathered is all primary data, although secondary data ...

  22. PDF Introduction to Empirical Data Analysis

    the methods of multivariate analysis more knowledgeably. Therefore, all methods are explained independently of each other, i.e., the different chapters may be read individually and in any order. 1.1.1 Empirical Studies and Quantitative Data Analysis Empirical research involves the collection of data and their evaluation using qualitative

  23. Choosing Empirical Study Methods in Business Management

    Data needs are central to selecting an empirical study method. Qualitative research typically generates text-based data like interview transcripts, while quantitative research produces numerical data.

  24. New Content From Advances in Methods and Practices in Psychological

    Participants included 103 research-methods instructors, academics, students, and nonacademic psychologists. Of 78 items included in the consensus process, 34 reached consensus. We coupled these results with a qualitative analysis of 707 open-ended text responses to develop nine recommendations for organizations that accredit undergraduate ...

  25. Global Governance of Artificial Intelligence: Next Steps for Empirical

    The example of autonomous weapons further illustrates how the global governance of AI raises urgent empirical and normative questions for research. On the empirical side, these developments invite researchers to map emerging regulatory initiatives, such as those within the CCW, and to explain why these particular frameworks become dominant.

  26. The performance of socially responsible investments: A meta‐analysis

    The results of empirical research that analyzes whether ESG portfolios underperform the market are mixed. In support of the underperformance hypothesis, Dorfleitner and Grebler ... a publication bias can occur in our meta-analysis. A common method to identify the presence of a potential publication or selection bias in a meta-study is funnel ...

  27. Consumers' Risk Perception of Triploid Food: Empirical Research Based

    Choose the appropriate analysis of the variance method. Set assumptions. Conduct variance analysis. Interpretation of the results. ... Fanjie Hao, Yutong Wei, Shangjie Ge-Zhang, and Jingang Cui. 2024. "Consumers' Risk Perception of Triploid Food: Empirical Research Based on Variance Analysis and Structural Equation Modeling ...

  28. Protecting human subjects participating in research

    Introduction. Institutional review boards (IRBs) and comparable entities, such as research ethics committees and ethics review boards, have been established for the primary purpose of protecting human subjects participating in research. Since the establishment of the IRB system in the 1970s, research institutions have delegated the authorities and responsibilities of protecting human ...

  29. A new sample-size planning approach for person-specific VAR ...

    We propose a new simulation-based sample-size planning method called predictive accuracy analysis (PAA), and an associated Shiny app. ... If the empirical power is too low, the sample size and thus the number of time points is increased until the empirical power is sufficiently high. ... Capturing context-related change in emotional dynamics ...

  30. Multi-omics analysis reveals key regulatory defense pathways and genes

    Introduction. Rose (Rosa spp.) is a popular ornamental crop that is also used in cosmetics, perfume, and medicine. Rose plants contain various bioactive substances, including flavonoids, fragrant components, and hydrolysable and condensed tannins, which have high value and market potential. However, soil salinization is common in many rose-growing regions, and high salt concentrations in ...