Grad Coach

Quant Analysis 101: Descriptive Statistics

Everything You Need To Get Started (With Examples)

By: Derek Jansen (MBA) | Reviewers: Kerryn Warren (PhD) | October 2023

If you’re new to quantitative data analysis, one of the first terms you’re likely to hear being thrown around is descriptive statistics. In this post, we’ll unpack the basics of descriptive statistics, using straightforward language and loads of examples. So grab a cup of coffee and let’s crunch some numbers!

Overview: Descriptive Statistics

  • What are descriptive statistics?
  • Descriptive vs inferential statistics
  • Why the descriptives matter
  • The “Big 7” descriptive statistics
  • Key takeaways

At the simplest level, descriptive statistics summarise and describe relatively basic but essential features of a quantitative dataset – for example, a set of survey responses. They provide a snapshot of the characteristics of your dataset and allow you to better understand, roughly, how the data are “shaped” (more on this later). For example, a descriptive statistic could include the proportion of males and females within a sample or the percentages of different age groups within a population.

Another common descriptive statistic is the humble average (which in statistics-talk is called the mean). For example, if you undertook a survey and asked people to rate their satisfaction with a particular product on a scale of 1 to 10, you could then calculate the average rating. This is a very basic statistic, but as you can see, it gives you some idea of how the data are shaped.

Descriptive statistics summarise and describe relatively basic but essential features of a quantitative dataset, including its “shape”

What about inferential statistics?

Now, you may have also heard the term inferential statistics being thrown around, and you’re probably wondering how that’s different from descriptive statistics. Simply put, descriptive statistics describe and summarise the sample itself, while inferential statistics use the data from a sample to make inferences or predictions about a population.

Put another way, descriptive statistics help you understand your dataset, while inferential statistics help you make broader statements about the population, based on what you observe within the sample. If you’re keen to learn more, we cover inferential stats in another post.

Why do descriptive statistics matter?

While descriptive statistics are relatively simple from a mathematical perspective, they play a very important role in any research project. All too often, students skim over the descriptives and run ahead to the seemingly more exciting inferential statistics, but this can be a costly mistake.

The reason for this is that descriptive statistics help you, as the researcher, comprehend the key characteristics of your sample without getting lost in vast amounts of raw data. In doing so, they provide a foundation for your quantitative analysis. Additionally, they enable you to quickly identify potential issues within your dataset – for example, suspicious outliers, missing responses and so on. Just as importantly, descriptive statistics inform the decision-making process when it comes to choosing which inferential statistics you’ll run, as each inferential test has specific requirements regarding the shape of the data.

Long story short, it’s essential that you take the time to dig into your descriptive statistics before looking at more “advanced” inferential statistics. It’s also worth noting that, depending on your research aims and questions, descriptive stats may be all that you need in any case. So, don’t discount the descriptives!


The “Big 7” descriptive statistics

With the what and why out of the way, let’s take a look at the most common descriptive statistics. Beyond the counts, proportions and percentages we mentioned earlier, we have what we call the “Big 7” descriptives. These can be divided into two categories – measures of central tendency and measures of dispersion.

Measures of central tendency

True to the name, measures of central tendency describe the centre or “middle section” of a dataset. In other words, they provide some indication of what a “typical” data point looks like within a given dataset. The three most common measures are:

The mean, which is the mathematical average of a set of numbers – in other words, the sum of all numbers divided by the count of all numbers.
The median, which is the middlemost number in a set of numbers when those numbers are ordered from lowest to highest (if the set has an even count, the median is the average of the two middle numbers).
The mode, which is the most frequently occurring number in a set of numbers (in any order). Naturally, a dataset can have one mode, no mode (no number occurs more than once) or multiple modes.

To make this a little more tangible, let’s look at a sample dataset, along with the corresponding mean, median and mode. This dataset reflects the service ratings (on a scale of 1 – 10) from 15 customers.

Example set of descriptive stats

As you can see, the mean of 5.8 is the average rating across all 15 customers. Meanwhile, 6 is the median. In other words, if you were to list all the responses in order from low to high, Customer 8 would be in the middle (with their service rating being 6). Lastly, the number 5 is the most frequent rating (appearing 3 times), making it the mode.
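To make these calculations concrete, here’s a minimal sketch in Python using the standard library’s `statistics` module. The ratings below are hypothetical (the article’s raw data isn’t reproduced here), chosen so that the mean, median and mode match the figures discussed above:

```python
import statistics

# Hypothetical service ratings from 15 customers (1-10 scale); not the
# article's exact data, but built to give a mean of 5.8, median of 6
# and mode of 5
ratings = [2, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10]

mean = statistics.mean(ratings)      # sum of all numbers / count of numbers
median = statistics.median(ratings)  # middle value once sorted
mode = statistics.mode(ratings)      # most frequently occurring value

print(mean, median, mode)  # → 5.8 6 5
```

Note that `statistics.median` automatically averages the two middle values when the count is even, so the same three lines work for any dataset size.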

Together, these three descriptive statistics give us a quick overview of how these customers feel about the service levels at this business. In other words, most customers feel rather lukewarm and there’s certainly room for improvement. From a more statistical perspective, this also means that the data tend to cluster around the 5-6 mark, since the mean and the median are fairly close to each other.

To take this a step further, let’s look at the frequency distribution of the responses. In other words, let’s count how many times each rating was received, and then plot these counts onto a bar chart.

Example frequency distribution of descriptive stats
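The counting step behind a frequency chart like this can be sketched with Python’s `collections.Counter` (the ratings below are hypothetical, for illustration only):

```python
from collections import Counter

# Hypothetical 1-10 service ratings from 15 customers (illustrative data)
ratings = [2, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10]

# Count how many times each rating was received
counts = Counter(ratings)

# Print a quick text-based "bar chart", one row per possible rating
for rating in range(1, 11):
    print(f"{rating:2d} | {'#' * counts[rating]}")
```

Each `#` represents one response, so the tallest row is the mode and the overall silhouette shows the shape of the distribution.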

As you can see, the responses tend to cluster toward the centre of the chart, creating something of a bell-shaped curve. In statistical terms, this is called a normal distribution.

As you delve into quantitative data analysis, you’ll find that normal distributions are very common, but they’re certainly not the only type of distribution. In some cases, the data can lean toward the left or the right of the chart (i.e., toward the low end or high end). This lean is reflected by a measure called skewness, and it’s important to pay attention to this when you’re analysing your data, as this will have an impact on what types of inferential statistics you can use on your dataset.

Example of skewness
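For the curious, the (Fisher-Pearson) skewness coefficient can be computed by hand: it’s the average cubed deviation from the mean, divided by the cube of the population standard deviation. A minimal sketch, with made-up datasets:

```python
import statistics

def skewness(data):
    """Fisher-Pearson coefficient of skewness: the mean of the cubed
    deviations divided by the cubed (population) standard deviation.
    Negative => longer tail on the low end (data lean toward high values);
    positive => longer tail on the high end (data lean toward low values)."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum((x - mean) ** 3 for x in data) / (n * sd ** 3)

print(round(skewness([4, 5, 5, 6, 6, 6, 7, 7, 8]), 2))    # symmetric → 0.0
print(round(skewness([1, 7, 8, 8, 9, 9, 9, 10, 10]), 2))  # long low tail → negative
```

In practice you’d usually let your stats package report skewness for you, but seeing the formula makes it clear why a lean toward the high end produces a negative value.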

Measures of dispersion

While the measures of central tendency provide insight into how “centred” the dataset is, it’s also important to understand how dispersed that dataset is – in other words, to what extent the data cluster toward the centre (specifically, the mean). In some cases, the majority of the data points will sit very close to the centre, while in other cases, they’ll be scattered all over the place. Enter the measures of dispersion, of which there are three:

Range, which measures the difference between the largest and smallest number in the dataset. In other words, it indicates how spread out the dataset really is.

Variance, which measures how much each number in a dataset varies from the mean (average). More technically, it calculates the average of the squared differences between each number and the mean. A higher variance indicates that the data points are more spread out, while a lower variance suggests that the data points are closer to the mean.

Standard deviation, which is the square root of the variance. It serves the same purposes as the variance, but is a bit easier to interpret as it presents a figure that is in the same unit as the original data. You’ll typically present this statistic alongside the means when describing the data in your research.

Again, let’s look at our sample dataset to make this all a little more tangible.

descriptive formula in research

As you can see, the range of 8 reflects the difference between the highest rating (10) and the lowest rating (2). The standard deviation of 2.18 tells us that, on average, results within the dataset sit about 2.18 away from the mean (of 5.8), reflecting a relatively dispersed set of data.
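The three dispersion measures can be sketched in Python with the standard `statistics` module. The ratings below are hypothetical (not the article’s exact dataset, so the variance and standard deviation differ slightly from the figures above), but the range matches:

```python
import statistics

# Hypothetical 1-10 service ratings from 15 customers (illustrative data,
# not the article's exact dataset)
ratings = [2, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10]

data_range = max(ratings) - min(ratings)  # highest minus lowest
variance = statistics.variance(ratings)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(ratings)       # square root of the variance

print(data_range)           # → 8
print(round(variance, 2))   # → 5.89
print(round(std_dev, 2))    # → 2.43
```

Note that `statistics.variance`/`stdev` compute the sample versions (dividing by n − 1); for the population versions, the module provides `pvariance` and `pstdev`.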

For the sake of comparison, let’s look at another much more tightly grouped (less dispersed) dataset.

Example of skewed data

As you can see, all the ratings lie between 5 and 8 in this dataset, resulting in a much smaller range, variance and standard deviation. You might also notice that the data are clustered toward the right side of the graph – in other words, the data are skewed. If we calculate the skewness for this dataset, we get a result of -0.12; the negative value confirms this lean toward the higher end (with the longer tail on the low side).

In summary, range, variance and standard deviation all provide an indication of how dispersed the data are. These measures are important because they help you interpret the measures of central tendency within context. In other words, if your measures of dispersion are all fairly high numbers, you need to interpret your measures of central tendency with some caution, as the results are not particularly centred. Conversely, if the data are all tightly grouped around the mean (i.e., low dispersion), the mean becomes a much more “meaningful” statistic.

Key Takeaways

We’ve covered quite a bit of ground in this post. Here are the key takeaways:

  • Descriptive statistics, although relatively simple, are a critically important part of any quantitative data analysis.
  • Measures of central tendency include the mean (average), median and mode.
  • Skewness indicates whether a dataset leans to one side or another.
  • Measures of dispersion include the range, variance and standard deviation.

If you’d like hands-on help with your descriptive statistics (or any other aspect of your research project), check out our private coaching service, where we hold your hand through each step of the research journey.



Descriptive Research: Definition, Characteristics, Methods + Examples


Suppose an apparel brand wants to understand the fashion purchasing trends among New York’s buyers. It would need to conduct a demographic survey of the region, gather population data, and then conduct descriptive research on this demographic segment.

The study would then uncover details on “what is the purchasing pattern of New York buyers,” but would not cover any investigative information about “why” the patterns exist. That’s because, for an apparel brand trying to break into this market, understanding the nature of the market is the study’s main goal. Let’s talk about it.

What is descriptive research?

Descriptive research is a research method describing the characteristics of the population or phenomenon studied. This descriptive methodology focuses more on the “what” of the research subject than the “why” of the research subject.

The method primarily focuses on describing the nature of a demographic segment without focusing on “why” a particular phenomenon occurs. In other words, it “describes” the research subject without covering “why” it happens.

Characteristics of descriptive research

The term descriptive research refers to the research questions, the design of the study, and the data analysis conducted on that topic. We call it an observational research method because none of the research study variables are influenced in any capacity.

Some distinctive characteristics of descriptive research are:

  • Quantitative research: It is a quantitative research method that attempts to collect quantifiable information for statistical analysis of the population sample. It is a popular market research tool that allows us to collect and describe the demographic segment’s nature.
  • Uncontrolled variables: None of the variables are influenced in any way; the research relies on observational methods, so the nature of the variables and their behavior is not in the researcher’s hands.
  • Cross-sectional studies: It is generally a cross-sectional study where different sections belonging to the same group are studied.
  • The basis for further research: The data collected and analyzed in descriptive research can be investigated further using different research techniques. It can also help point toward the types of research methods to use in subsequent studies.

Applications of descriptive research with examples

A descriptive research method can be used in multiple ways and for various reasons. Before getting into any survey, though, it’s crucial to define the survey’s goals and design; even after following these steps, there is no guarantee that the research outcome will be met. So how is descriptive research used? Below are some ways organizations use it today:

  • Define respondent characteristics: The aim of using close-ended questions is to draw concrete conclusions about the respondents. This could be the need to derive patterns, traits, and behaviors of the respondents, or to understand a respondent’s attitude or opinion about the phenomenon. For example, understanding how many hours per week millennials spend browsing the internet. All this information helps the organization make informed business decisions.
  • Measure data trends: Researchers measure data trends over time with a descriptive research design’s statistical capabilities. Consider an apparel company that researches different age groups, say 24-35 and 36-45, on a new range launch of autumn wear. If one of those groups doesn’t take well to the new launch, it provides insight into which clothes are liked and which are not, and the brand can drop the items customers don’t like.
  • Conduct comparisons: Organizations also use a descriptive research design to understand how different groups respond to a specific product or service. For example, an apparel brand creates a survey asking general questions that measure the brand’s image. The same study also asks demographic questions like age, income, gender and geographical location. This consumer research helps the organization understand what aspects of the brand appeal to the population and what aspects do not. It also helps make product or marketing fixes, or even create a new product line to cater to high-growth potential groups.
  • Validate existing conditions: Researchers widely use descriptive research to help ascertain the research object’s prevailing conditions and underlying patterns. Thanks to the non-invasive research method and the use of quantitative observation (and some aspects of qualitative observation), researchers can observe each variable and conduct an in-depth analysis. Researchers also use it to validate any existing conditions that may be prevalent in a population.
  • Conduct research at different times: The analysis can be conducted at different periods to ascertain any similarities or differences. This also allows any number of variables to be evaluated. For verification, studies on prevailing conditions can also be repeated to draw trends.

Advantages of descriptive research

Some of the significant advantages of descriptive research are:

Advantages of descriptive research

  • Data collection: A researcher can conduct descriptive research using specific methods like the observational method, case study method, and survey method. Among these three, all primary data collection methods are covered, which provides a lot of information. This can be used for future research or even for developing a hypothesis for your research object.
  • Varied: Since the data collected is qualitative and quantitative, it gives a holistic understanding of a research topic. The information is varied, diverse, and thorough.
  • Natural environment: Descriptive research allows for the research to be conducted in the respondent’s natural environment, which ensures that high-quality and honest data is collected.
  • Quick to perform and cheap: As the sample size is generally large in descriptive research, the data collection is quick to conduct and is inexpensive.

Descriptive research methods

There are three distinctive methods to conduct descriptive research. They are:

Observational method

The observational method is the most effective method to conduct this research, and researchers make use of both quantitative and qualitative observations.

A quantitative observation is the objective collection of data primarily focused on numbers and values. It suggests “associated with, of or depicted in terms of a quantity.” Results of quantitative observation are derived using statistical and numerical analysis methods. It implies observation of any entity associated with a numeric value, such as age, shape, weight, volume, scale, etc. For example, the researcher can track whether current customers will refer the brand using a simple Net Promoter Score question.
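As a rough sketch of the Net Promoter Score calculation mentioned above (the standard formula, not tied to any particular survey platform): the score is the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6), with the responses below made up for illustration:

```python
def net_promoter_score(scores):
    """Standard NPS formula: % promoters (9-10) minus % detractors (0-6),
    on a 0-10 'how likely are you to recommend us?' scale."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Hypothetical responses from 10 customers
responses = [10, 9, 9, 8, 7, 7, 6, 5, 9, 10]
print(net_promoter_score(responses))  # → 30.0
```

Here 5 promoters minus 2 detractors out of 10 respondents gives a score of 30; the 7s and 8s (“passives”) count toward the total but not toward either group.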

Qualitative observation doesn’t involve measurements or numbers; instead, it involves monitoring characteristics. In this case, the researcher observes the respondents from a distance. Since the respondents are in a comfortable environment, the characteristics observed are natural and authentic. In a descriptive research design, the researcher can choose to be a complete observer, an observer as a participant, a participant as an observer, or a full participant. For example, in a supermarket, a researcher can monitor and track the customers’ selection and purchasing trends from afar. This offers a more in-depth insight into the customer’s purchasing experience.

Case study method

Case studies involve in-depth research and study of individuals or groups. They can generate hypotheses and widen the scope for studying a phenomenon further. However, case studies should not be used to determine cause and effect, as they can’t make accurate predictions and there could be bias on the researcher’s part. Another reason case studies are not a reliable way of conducting descriptive research is that an atypical respondent may end up in the study; describing them leads to weak generalizations and a loss of external validity.

Survey research

In survey research, respondents answer through surveys, questionnaires, or polls. These are a popular market research tool to collect feedback from respondents. A study to gather useful data should have the right survey questions: a balanced mix of open-ended and closed-ended questions. The survey method can be conducted online or offline, making it the go-to option for descriptive research where the sample size is enormous.

Examples of descriptive research

Some examples of descriptive research are:

  • A specialty food group launching a new range of barbecue rubs would like to understand which flavors of rubs are favored by different people. To understand the preferred flavor palette, they conduct this type of research study using methods like observation in supermarkets. Surveying while also collecting in-depth demographic information offers insights into the preferences of different markets, and can help tailor the rubs to the meats preferred in each demographic. Conducting this type of research helps the organization tweak its business model and amplify marketing in core markets.
  • Another example of where this research can be used is if a school district wishes to evaluate teachers’ attitudes about using technology in the classroom. By conducting surveys and observing teachers’ comfort with technology through observational methods, the researcher can gauge whether a full-fledged implementation would face issues. This also helps in understanding whether the students are impacted in any way by the change.

Some other research problems and research questions that can lead to descriptive research are:

  • Market researchers want to observe the habits of consumers.
  • A company wants to evaluate the morale of its staff.
  • A school district wants to understand if students will access online lessons rather than textbooks.
  • An organization wants to understand if its wellness programs enhance the overall health of its employees.


Descriptive research: what it is and how to use it.

Understanding the who, what and where of a situation or target group is an essential part of effective research and making informed business decisions.

For example, you might want to understand what percentage of CEOs have a bachelor’s degree or higher. Or you might want to understand what percentage of low-income families receive government support – or what kind of support they receive.

Descriptive research is what will be used in these types of studies.

In this guide we’ll look through the main issues relating to descriptive research to give you a better understanding of what it is, and how and why you can use it.


What is descriptive research?

Descriptive research is a research method used to try and determine the characteristics of a population or particular phenomenon.

Using descriptive research, you can identify patterns in the characteristics of a group and essentially establish everything you need to understand apart from why something has happened.

Market researchers use descriptive research for a range of commercial purposes to guide key decisions.

For example, you could use descriptive research to understand fashion trends in a given city when planning your clothing collection for the year. Using descriptive research, you can conduct in-depth analysis of the demographic makeup of your target area and use the data analysis to establish buying patterns.

Conducting descriptive research wouldn’t, however, tell you why shoppers are buying a particular type of fashion item.

Descriptive research design

Descriptive research design uses a range of both qualitative and quantitative data (although quantitative research is the primary method) to gather the information needed to accurately describe a particular problem or phenomenon.

As a survey method, descriptive research designs will help researchers identify characteristics in their target market or particular population.

These characteristics in the population sample can be identified, observed and measured to guide decisions.

Descriptive research characteristics

While there are a number of descriptive research methods you can deploy for data collection, descriptive research does have a number of predictable characteristics.

Here are a few of the things to consider:

Measure data trends with statistical outcomes

Descriptive research is often popular for survey research because it generates answers in a statistical form, which makes it easy for researchers to carry out a simple statistical analysis to interpret what the data is saying.

Descriptive research design is ideal for further research

Because the data collection for descriptive research produces statistical outcomes, it can also be used as secondary data for another research study.

Plus, the data collected from descriptive research can be subjected to other types of data analysis .

Uncontrolled variables

A key component of the descriptive research method is that it relies on variables that are observed rather than controlled by the researchers. This is because descriptive research aims to understand the natural behavior of the research subject.

It’s carried out in a natural environment

Descriptive research is often carried out in a natural environment. This is because researchers aim to gather data in a natural setting to avoid swaying respondents.

Data can be gathered using survey questions or online surveys.

For example, if you want to understand the fashion trends we mentioned earlier, you would set up a study in which a researcher observes people in the respondent’s natural environment to understand their habits and preferences.

Descriptive research allows for cross sectional study

Because of the nature of descriptive research design and the randomness of the sample group being observed, descriptive research is ideal for cross sectional studies – essentially the demographics of the group can vary widely and your aim is to gain insights from within the group.

This can be highly beneficial when you’re looking to understand the behaviors or preferences of a wider population.

Descriptive research advantages

There are many advantages to using descriptive research, some of them include:

Cost effectiveness

Because the elements needed for descriptive research design are not specific or highly targeted (and occur within the respondent’s natural environment) this type of study is relatively cheap to carry out.

Multiple types of data can be collected

A big advantage of this research type is that you can use it to collect both quantitative and qualitative data. This means you can use the stats gathered to easily identify underlying patterns in your respondents’ behavior.

Descriptive research disadvantages

Potential reliability issues

When conducting descriptive research it’s important that the initial survey questions are properly formulated.

If not, it could make the answers unreliable and risk the credibility of your study.

Potential limitations

As we’ve mentioned, descriptive research design is ideal for understanding the what, who or where of a situation or phenomenon.

However, it can’t help you understand the cause or effect of the behavior. This means you’ll need to conduct further research to get a more complete picture of a situation.

Descriptive research methods

Because descriptive research methods include a range of quantitative and qualitative research, there are several research methods you can use.

Use case studies

Case studies in descriptive research involve conducting in-depth and detailed studies in which researchers get a specific person or case to answer questions.

Case studies shouldn’t be used to generate results; rather, they should be used to build or establish a hypothesis that you can expand into further market research.

For example, you could gather detailed data about a specific business phenomenon, and then use this deeper understanding of that specific case as a starting point for broader study.

Use observational methods

This type of study uses qualitative observations to understand human behavior within a particular group.

By understanding how the different demographics respond within your sample you can identify patterns and trends.

As an observational method, descriptive research will not tell you the cause of any particular behaviors, but that could be established with further research.

Use survey research

Surveys are one of the most cost effective ways to gather descriptive data.

An online survey or questionnaire can be used in descriptive studies to gather quantitative information about a particular problem.

Survey research is ideal if you’re using descriptive research as your primary research.

Descriptive research examples

Descriptive research is used for a number of commercial purposes or when organizations need to understand the behaviors or opinions of a population.

One of the biggest examples of descriptive research, used in every democratic country, is election polling.

Using descriptive research, researchers will use surveys to understand who voters are more likely to choose out of the parties or candidates available.

Using the data provided, researchers can analyze the data to understand what the election result will be.

In a commercial setting, retailers often use descriptive research to figure out trends in shopping and buying decisions.

By gathering information on the habits of shoppers, retailers can get a better understanding of the purchases being made.

Another example that is widely used around the world is the national census, which governments conduct to understand their populations.

The research will provide a more accurate picture of a population’s demographic makeup and help to understand changes over time in areas like population age, health and education level.

Where Qualtrics helps with descriptive research

Whatever type of research you want to carry out, there’s a survey type that will work.

Qualtrics can help you determine the appropriate method and ensure you design a study that will deliver the insights you need.

Our experts can help you with your market research needs , ensuring you get the most out of Qualtrics market research software to design, launch and analyze your data to guide better, more accurate decisions for your organization.



Descriptive Research 101: Definition, Methods and Examples

Parvathi Vijayamohan

8 April 2024

Table Of Contents

  • Descriptive Research 101: The Definitive Guide
  • What is Descriptive Research?
  • Key Characteristics of Descriptive Research
  • Descriptive Research Methods: The 3 You Need to Know!
  • Surveys
  • Observation
  • Case Studies
  • 7 Types of Descriptive Research
  • Descriptive Research: Examples to Build Your Next Study
  • Tips to Excel at Descriptive Research

Imagine you are a detective called to a crime scene. Your job is to study the scene and report whatever you find: whether that’s the half-smoked cigarette on the table or the large “RACHE” written in blood on the wall. That, in a nutshell, is  descriptive research .

Researchers often need to do descriptive research on a problem before they attempt to solve it. So in this guide, we’ll take you through:

  • What is descriptive research + characteristics
  • Descriptive research methods
  • Types of descriptive research
  • Descriptive research examples
  • Tips to excel at the descriptive method

Click to jump to the section that interests you.

Definition: As its name says, descriptive research  describes  the characteristics of the problem, phenomenon, situation, or group under study.

So the goal of all descriptive studies is to  explore  the background, details, and existing patterns in the problem to fully understand it. In other words, preliminary research.

However, descriptive research can be both  preliminary and conclusive . You can use the data from a descriptive study to make reports and get insights for further planning.

What descriptive research isn’t: Descriptive research finds the  what/when/where  of a problem, not the  why/how .

Because of this, we can’t use the descriptive method to explore cause-and-effect relationships where one variable (like a person’s job role) affects another variable (like their monthly income).

  • Answers the “what,” “when,” and “where”  of a research problem. For this reason, it is popularly used in  market research ,  awareness surveys , and  opinion polls .
  • Sets the stage  for a research problem. As an early part of the research process, descriptive studies help you dive deeper into the topic.
  • Opens the door  for further research. You can use descriptive data as the basis for more profound research, analysis and studies.
  • Qualitative and quantitative . It is possible to get a balanced mix of numerical responses and open-ended answers from the descriptive method.
  • No control or interference with the variables. The researcher simply observes and reports on them. However, specific research software has filters that let you zoom in on one variable.
  • Done in natural settings. You can get the best results from descriptive research by talking to people, surveying them, or observing them in a suitable environment. For example, suppose you are beta testing an app feature on your website. In that case, descriptive research invites users to try the feature, tracks their behavior, and then asks for their opinions.
  • Can be applied to many research methods and areas. Examples include healthcare, SaaS, psychology, political studies, education, and pop culture.

Descriptive Research Methods: The Top Three You Need to Know!

In short, survey research is a brief interview or conversation with a set of prepared questions about a topic.

So you create a questionnaire, share it, and analyze the data you collect for further action. Learn about the differences between surveys and questionnaires  here .

You can access free survey templates, more than 20 question types, and data passing to 1,500+ applications with survey software like SurveySparrow. It enables you to create surveys, share them, and capture data with very little effort.


  • Surveys can be hyper-local, regional, or global, depending on your objectives.
  • Share surveys in-person, offline, via SMS, email, or QR codes – so many options!
  • Easy to automate if you want to conduct many surveys over a period.

The observational method is a type of descriptive research in which you, the researcher, observe ongoing behavior.

Now, there are several (non-creepy) ways you can observe someone. In fact, observational research has three main approaches:

  • Covert observation: In true spy fashion, the researcher mixes in with the group undetected or observes from a distance.
  • Overt observation : The researcher identifies himself as a researcher – “The name’s Bond. J. Bond.” – and explains the purpose of the study.
  • Participatory observation : The researcher participates in what he is observing to understand his topic better.
  • Observation is one of the most accurate ways to get data on a subject’s behavior in a natural setting.
  • You don’t need to rely on people’s willingness to share information.
  • Observation is a universal method that can be applied to any area of research.

In the case study method, you do a detailed study of a specific group, person, or event over a period.

This brings us to a frequently asked question: “What’s the difference between case studies and longitudinal studies?”

A case study will go  very in-depth into the subject with one-on-one interviews, observations, and archival research. They are also qualitative, though sometimes they will use numbers and stats.

An example of longitudinal research would be a study of the health of night shift employees vs. general shift employees over a decade. An example of a case study would involve in-depth interviews with Casey, an assistant director of nursing who’s handled the night shift at the hospital for ten years now.

  • Due to the focus on a few people, case studies can give you a tremendous amount of information.
  • Because of the time and effort involved, a case study engages both researchers and participants.
  • Case studies are helpful for ethically investigating unusual, complex, or challenging subjects. An example would be a study of the habits of long-term cocaine users.

1. Case Study: Airbnb’s Growth Strategy

In an excellent case study, Tam Al Saad, Principal Consultant, Strategy + Growth at Webprofits, deep dives into how Airbnb attracted and retained 150 million users .

“What Airbnb offers isn’t a cheap place to sleep when you’re on holiday; it’s the opportunity to experience your destination as a local would. It’s the chance to meet the locals, experience the markets, and find non-touristy places.

Sure, you can visit the Louvre, see Buckingham Palace, and climb the Empire State Building, but you can do it as if it were your hometown while staying in a place that has character and feels like a home.” – Tam al Saad, Principal Consultant, Strategy + Growth at Webprofits

2. Observation – Better Tech Experiences for the Elderly

We often think that our elders are hopeless with technology. But we’re not getting any younger either, and tech is changing at breakneck speed! This article by Annemieke Hendricks shares a wonderful example where researchers compare the levels of technological familiarity between age groups and how that familiarity influences usage.

“It is generally assumed that older adults have difficulty using modern electronic devices, such as mobile telephones or computers. Because this age group is growing in most countries, changing products and processes to adapt to their needs is increasingly more important. “ – Annemieke Hendricks, Marketing Communication Specialist, Noldus

3. Surveys – Decoding Sleep with SurveySparrow

SRI International (formerly Stanford Research Institute) – an independent, non-profit research center – wanted to investigate the impact of stress on an adolescent’s sleep. To get those insights, two actions were essential: tracking sleep patterns through wearable devices and sending surveys at a pre-set time –  the pre-sleep period.

“With SurveySparrow’s recurring surveys feature, SRI was able to share engaging surveys with their participants exactly at the time they wanted and at the frequency they preferred.”

Read more about this project : How SRI International decoded sleep patterns with SurveySparrow

#1: Answer the six Ws –

  • Who should we consider?
  • What information do we need?
  • When should we collect the information?
  • Where should we collect the information?
  • Why are we obtaining the information?
  • Way to collect the information (the “how”)

#2: Introduce and explain your methodological approach

#3: Describe your methods of data collection and/or selection.

#4: Describe your methods of analysis.

#5: Explain the reasoning behind your choices.

#6: Collect data.

#7: Analyze the data. Use software to speed up the process and reduce overthinking and human error.

#8: Report your conclusions and how you drew the results.
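The analysis and reporting steps above can be sketched with Python’s standard library. The 1-to-10 satisfaction ratings below are hypothetical, and the stdlib `statistics` module stands in for whatever analysis software you use:

```python
from statistics import mean, median

# Step #6: collect data -- here, hypothetical 1-10 satisfaction ratings
ratings = [7, 8, 6, 9, 7, 10, 8, 7, 5, 9]

# Step #7: analyze with software to reduce overthinking and human error
avg = mean(ratings)    # 7.6
mid = median(ratings)  # 7.5

# Step #8: report your conclusions and how you drew them
print(f"n={len(ratings)}, mean={avg}, median={mid}")
```

Even for a sample this small, letting software do step #7 keeps the arithmetic consistent and repeatable.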

Wrapping Up

That’s all, folks!

Growth Marketer at SurveySparrow



  • What is descriptive research?

Last updated

5 February 2023

Reviewed by

Cathy Heath

Descriptive research is a common investigatory model used by researchers in various fields, including social sciences, linguistics, and academia.

Read on to understand the characteristics of descriptive research and explore its underlying techniques, processes, and procedures.


Descriptive research is an exploratory research method. It enables researchers to precisely and methodically describe a population, circumstance, or phenomenon.

As the name suggests, descriptive research describes the characteristics of the group, situation, or phenomenon being studied without manipulating variables or testing hypotheses . This can be reported using surveys , observational studies, and case studies. You can use both quantitative and qualitative methods to compile the data.

Besides making observations and then comparing and analyzing them, descriptive studies often develop knowledge concepts and provide solutions to critical issues. It always aims to answer how the event occurred, when it occurred, where it occurred, and what the problem or phenomenon is.

  • Characteristics of descriptive research

The following are some of the characteristics of descriptive research:

Quantitativeness

Descriptive research can be quantitative as it gathers quantifiable data to statistically analyze a population sample. These numbers can show patterns, connections, and trends over time and can be discovered using surveys, polls, and experiments.

Qualitativeness

Descriptive research can also be qualitative. It gives meaning and context to the numbers supplied by quantitative descriptive research .

Researchers can use tools like interviews, focus groups, and ethnographic studies to illustrate why things are what they are and help characterize the research problem. This is because it’s more explanatory than exploratory or experimental research.

Uncontrolled variables

Descriptive research differs from experimental research in that researchers cannot manipulate the variables. They are recognized, scrutinized, and quantified instead. This is one of its most prominent features.

Cross-sectional studies

Descriptive research is often cross-sectional, examining several areas of the same group at once. It involves obtaining data on multiple variables at the individual level during a certain period. It’s helpful when trying to understand a larger community’s habits or preferences.

Carried out in a natural environment

Descriptive studies are usually carried out in the participants’ everyday environment, which allows researchers to avoid influencing responders by collecting data in a natural setting. You can use online surveys or survey questions to collect data or observe.

Basis for further research

You can further dissect descriptive research’s outcomes and use them for different types of investigation. The outcomes also serve as a foundation for subsequent investigations and can guide future studies. For example, you can use the data obtained in descriptive research to help determine future research designs.

  • Descriptive research methods

There are three basic approaches for gathering data in descriptive research: observational, case study, and survey.

You can use surveys to gather data in descriptive research. This involves gathering information from many people using a questionnaire and interview .

Surveys remain the dominant research tool for descriptive research design. Researchers can conduct various investigations and collect multiple types of data (quantitative and qualitative) using surveys with diverse designs.

You can conduct surveys over the phone, online, or in person. Your survey might be a brief interview or conversation with a set of prepared questions intended to obtain quick information from the primary source.

Observation

This descriptive research method involves observing and gathering data on a population or phenomena without manipulating variables. It is employed in psychology, market research , and other social science studies to track and understand human behavior.

Observation is an essential component of descriptive research. It entails gathering data and analyzing it to see whether there is a relationship between the two variables in the study. This strategy usually allows for both qualitative and quantitative data analysis.

Case studies

A case study can outline a specific topic’s traits. The topic might be a person, group, event, or organization.

It involves using a subset of a larger group as a sample to characterize the features of that larger group.

You can generalize knowledge gained from studying a case study to benefit a broader audience.

This approach entails carefully examining a particular group, person, or event over time. You can learn something new about the study topic by using a small group to better understand the dynamics of the entire group.

  • Types of descriptive research

There are several types of descriptive study. The most well-known include cross-sectional studies, census surveys, sample surveys, case reports, and comparison studies.

Case reports and case series

In the healthcare and medical fields, a case report is used to explain a patient’s circumstances when suffering from an uncommon illness or displaying certain symptoms, while a case series is a collection of related case reports. Both have aided the advancement of medical knowledge on countless occasions.

Descriptive survey

This descriptive type of research employs surveys to collect information on various topics. This data aims to determine the degree to which certain conditions may be attained.

Descriptive–normative survey

The normative component is an addition to the descriptive survey. In a descriptive–normative survey, you compare the study’s results to an established norm.

Sample survey

You can extrapolate or generalize the information you obtain from sample surveys to the larger group being researched.

Correlative survey

Correlative surveys help establish if there is a positive, negative, or neutral connection between two variables.

Census survey

Performing census surveys involves gathering relevant data on several aspects of a given population. These units include individuals, families, organizations, objects, characteristics, and properties.

Cross-sectional studies

In a cross-sectional study, you gather data on the variables of interest from a specific population at a single point in time. Cross-sectional studies provide a snapshot of a phenomenon’s prevalence and features in a population. They pose few ethical challenges and are quite simple and inexpensive to carry out.

Comparative studies

These surveys compare the conditions or characteristics of two subjects. The subjects may include research variables, organizations, plans, and people.

Comparison points, assumption of similarities, and criteria of comparison are three important variables that affect how well and accurately comparative studies are conducted.

For instance, descriptive research can help determine how many CEOs hold a bachelor’s degree and what proportion of low-income households receive government help.

  • Pros and cons

The primary advantage of descriptive research designs is that researchers can create a reliable and beneficial database for additional study. To conduct any inquiry, you need access to reliable information sources that can give you a firm understanding of a situation.

Quantitative studies are time- and resource-intensive, so knowing the hypotheses viable for testing is crucial. The basic overview of descriptive research provides helpful hints as to which variables are worth quantitatively examining. This is why it’s employed as a precursor to quantitative research designs.

Some experts view this research as untrustworthy and unscientific: because no variables are manipulated, there is no statistical way to assess the findings.

Cause-and-effect relationships also can’t be established through descriptive investigations. Additionally, observational findings are difficult to reproduce, which prevents the results from being reviewed and replicated.

The absence of statistical and in-depth analysis, and the rather superficial character of the investigative procedure, are further drawbacks of this research approach.

  • Descriptive research examples and applications

Several descriptive research examples are emphasized based on their types, purposes, and applications. Research questions often begin with “What is …” These studies help find solutions to practical issues in social science, physical science, and education.

Here are some examples and applications of descriptive research:

Determining consumer perception and behavior

Organizations use descriptive research designs to determine how various demographic groups react to a certain product or service.

For example, a business looking to sell to its target market should research the market’s behavior first. When researching human behavior in response to a cause or event, the researcher pays attention to the traits, actions, and responses before drawing a conclusion.

Scientific classification

Scientific descriptive research enables the classification of organisms and their traits and constituents.

Measuring data trends

A descriptive study design’s statistical capabilities allow researchers to track data trends over time. It’s frequently used to determine the study target’s current circumstances and underlying patterns.

Conduct comparison

Organizations can use a descriptive research approach to learn how various demographics react to a certain product or service. For example, you can study how the target market responds to a competitor’s product and use that information to infer their behavior.

  • Bottom line

A descriptive research design is suitable for exploring certain topics and serving as a prelude to larger quantitative investigations. It provides a comprehensive understanding of the “what” of the group or thing you’re investigating.

This research type acts as the cornerstone of other research methodologies . It is distinctive because it can use quantitative and qualitative research approaches at the same time.

What is descriptive research design?

Descriptive research design aims to systematically obtain information to describe a phenomenon, situation, or population. More specifically, it helps answer the what, when, where, and how questions regarding the research problem rather than the why.

How does descriptive research compare to qualitative research?

Despite certain parallels, descriptive research concentrates on describing phenomena, while qualitative research aims to understand people better.

How do you analyze descriptive research data?

Data analysis involves using various methodologies, enabling the researcher to evaluate and provide results regarding validity and reliability.


Descriptive Research Design | Definition, Methods & Examples

Published on 5 May 2022 by Shona McCombes . Revised on 10 October 2022.

Descriptive research aims to accurately and systematically describe a population, situation or phenomenon. It can answer what , where , when , and how   questions , but not why questions.

A descriptive research design can use a wide variety of research methods  to investigate one or more variables . Unlike in experimental research , the researcher does not control or manipulate any of the variables, but only observes and measures them.

Table of contents

  • When to use a descriptive research design
  • Descriptive research methods

When to use a descriptive research design

Descriptive research is an appropriate choice when the research aim is to identify characteristics, frequencies, trends, and categories.

It is useful when not much is known yet about the topic or problem. Before you can research why something happens, you need to understand how, when, and where it happens.

  • How has the London housing market changed over the past 20 years?
  • Do customers of company X prefer product Y or product Z?
  • What are the main genetic, behavioural, and morphological differences between European wildcats and domestic cats?
  • What are the most popular online news sources among under-18s?
  • How prevalent is disease A in population B?


Descriptive research methods

Descriptive research is usually defined as a type of quantitative research , though qualitative research can also be used for descriptive purposes. The research design should be carefully developed to ensure that the results are valid and reliable .

Surveys

Survey research allows you to gather large volumes of data that can be analysed for frequencies, averages, and patterns. Common uses of surveys include:

  • Describing the demographics of a country or region
  • Gauging public opinion on political and social topics
  • Evaluating satisfaction with a company’s products or an organisation’s services
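As a minimal sketch of that kind of analysis, with made-up 1-to-5 satisfaction responses, frequencies and averages take only a few lines of Python:

```python
from collections import Counter
from statistics import mean

# Hypothetical answers to "How satisfied are you with product Y? (1-5)"
responses = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]

freq = Counter(responses)  # how often each rating was given
print("frequencies:", dict(sorted(freq.items())))
print("average rating:", mean(responses))
```

The frequency table answers “how many chose each option”, while the mean summarizes overall satisfaction in a single number.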

Observations

Observations allow you to gather data on behaviours and phenomena without having to rely on the honesty and accuracy of respondents. This method is often used by psychological, social, and market researchers to understand how people act in real-life situations.

Observation of physical entities and phenomena is also an important part of research in the natural sciences. Before you can develop testable hypotheses , models, or theories, it’s necessary to observe and systematically describe the subject under investigation.

Case studies

A case study can be used to describe the characteristics of a specific subject (such as a person, group, event, or organisation). Instead of gathering a large volume of data to identify patterns across time or location, case studies gather detailed data to identify the characteristics of a narrowly defined subject.

Rather than aiming to describe generalisable facts, case studies often focus on unusual or interesting cases that challenge assumptions, add complexity, or reveal something new about a research problem .

Cite this Scribbr article


McCombes, S. (2022, October 10). Descriptive Research Design | Definition, Methods & Examples. Scribbr. Retrieved 29 April 2024, from https://www.scribbr.co.uk/research-methods/descriptive-research-design/


Statistics - explanations and formulas

Descriptive statistics


Descriptive statistics are techniques used for describing, graphing, organizing and summarizing quantitative data . They describe something, either visually or statistically, about individual variables or the association among two or more variables. For instance, a social researcher may want to know how many people in his/her study are male or female, what the average age of the respondents is, or what the median income is. Researchers often need to know how closely their data represent the population from which it is drawn so that they can assess the data’s representativeness.

Common descriptive statistics include the mean, standard deviation, mode, and median.

Descriptive information gives researchers a general picture of their data, as opposed to an explanation for why certain variables may be associated with each other. Descriptive statistics are often contrasted with inferential statistics, which are used to make inferences about, or to explain factors in, the population. Data can be summarized at the univariate level with visual pictures, such as graphs, histograms, and pie charts. Statistical techniques used to describe individual variables include frequencies, the mean, median, mode, cumulative percent, percentile, standard deviation, variance, and interquartile range. Data can also be summarized at the bivariate level. Measures of association between two variables include calculations of eta, gamma, lambda, Pearson’s r, Kendall’s tau, Spearman’s rho, and chi-square, among others. Bivariate relationships can also be illustrated in visual graphs that describe the association between two variables.

(from Oxford Reference Online )
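Most of the univariate measures listed above are a single call each in Python’s standard `statistics` module. The income figures below are invented for illustration:

```python
import statistics as st

# Hypothetical incomes (in $1000s) for a small sample of respondents
incomes = [32, 35, 35, 40, 48, 52, 60]

print("mean:     ", st.mean(incomes))
print("median:   ", st.median(incomes))
print("mode:     ", st.mode(incomes))
print("variance: ", st.variance(incomes))        # sample variance
print("std dev:  ", st.stdev(incomes))           # sample standard deviation
print("quartiles:", st.quantiles(incomes, n=4))  # cut points for the interquartile range
```

Frequencies and cumulative percents can be added with `collections.Counter`; the bivariate measures (Pearson’s r and the like) live in `statistics.correlation` and in dedicated packages.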

  • Last Updated: Apr 1, 2024 8:54 AM
  • URL: https://libguides.und.edu/statistics

Chapter 1. Descriptive Statistics and Frequency Distributions

This chapter is about describing populations and samples, a subject known as descriptive statistics. This will all make more sense if you keep in mind that the information you want to produce is a description of the population or sample as a whole, not a description of one member of the population. The first topic in this chapter is a discussion of distributions , essentially pictures of populations (or samples). Second will be the discussion of descriptive statistics. The topics are arranged in this order because the descriptive statistics can be thought of as ways to describe the picture of a population, the distribution.

Distributions

The first step in turning data into information is to create a distribution. The most primitive way to present a distribution is to simply list, in one column, each value that occurs in the population and, in the next column, the number of times it occurs. It is customary to list the values from lowest to highest. This simple listing is called a frequency distribution . A more elegant way to turn data into information is to draw a graph of the distribution. Customarily, the values that occur are put along the horizontal axis and the frequency of the value is on the vertical axis.

Ann is the equipment manager for the Chargers athletic teams at Camosun College, located in Victoria, British Columbia. She called the basketball and volleyball team managers and collected the following data on sock sizes used by their players. Ann found out that last year the basketball team used 14 pairs of size 7 socks, 18 pairs of size 8, 15 pairs of size 9, and 6 pairs of size 10 were used. The volleyball team used 3 pairs of size 6, 10 pairs of size 7, 15 pairs of size 8, 5 pairs of size 9, and 11 pairs of size 10. Ann arranged her data into a distribution and then drew a graph called a histogram. Ann could have created a relative frequency distribution as well as a frequency distribution. The difference is that instead of listing how many times each value occurred, Ann would list what proportion of her sample was made up of socks of each size.
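Ann's tallying can be sketched in a few lines of Python, using the counts above (a `collections.Counter` merge does the combining, and dividing each count by the total gives the relative frequency distribution):

```python
from collections import Counter

# Ann's data: sock sizes used by each team, as {size: number of pairs}.
basketball = {7: 14, 8: 18, 9: 15, 10: 6}
volleyball = {6: 3, 7: 10, 8: 15, 9: 5, 10: 11}

freq = Counter(basketball) + Counter(volleyball)    # combined frequency distribution
n = sum(freq.values())                              # 97 pairs in total
rel_freq = {size: count / n for size, count in freq.items()}

for size in sorted(freq):                           # list values lowest to highest
    print(f"size {size}: {freq[size]:2d} pairs ({rel_freq[size]:.3f})")
```

The relative frequencies sum to one, which is exactly the "total area under the graph is one" idea discussed below.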

You can use the Excel template below (Figure 1.1) to see all the histograms and frequencies she has created. You may also change her numbers in the yellow cells to see how the graphs will change automatically.

Notice that Ann has drawn the graphs differently. In the first graph, she has used bars for each value, while in the second, she has drawn a point for the relative frequency of each size and then “connected the dots”. While both methods are correct, when you have values that are continuous, you will want to do something more like the “connect the dots” graph. Sock sizes are discrete: they take on only a limited number of values. Other things have continuous values; they can take on an infinite number of values, though we are often in the habit of rounding them off. An example is how much students weigh. While we usually give our weight in whole kilograms in Canada (“I weigh 60 kilograms”), few have a weight that is exactly so many kilograms. When you say “I weigh 60”, you actually mean that you weigh between 59 1/2 and 60 1/2 kilograms.

We are heading toward a graph of a distribution of a continuous variable, where the relative frequency of any exact value is very small but the relative frequency of observations between two values is measurable. What we want to do is get used to the idea that the total area under a “connect the dots” relative frequency graph, from the lowest to the highest possible value, is one. Then the part of the area under the graph between two values is the relative frequency of observations with values within that range. The height of the line above any particular value has lost any direct meaning, because it is now the area under the line between two values that gives the relative frequency of an observation between those two values.

You can get some idea of how this works if you go back to the bar graph of the distribution of sock sizes, but draw it with relative frequency on the vertical axis. If you arbitrarily decide that each bar has a width of one, then the area under the curve between 7.5 and 8.5 is simply the height times the width of the bar for sock size 8: (33/97) × 1 ≈ 0.34. If you wanted to find the relative frequency of sock sizes between 6.5 and 8.5, you could simply add together the area of the bar for size 7 (that's between 6.5 and 7.5) and the bar for size 8 (between 7.5 and 8.5).

Descriptive statistics

Now that you see how a distribution is created, you are ready to learn how to describe one. There are two main things that need to be described about a distribution: its location and its shape. Generally, it is best to give a single measure as the description of the location and a single measure as the description of the shape.

To describe the location of a distribution, statisticians use a typical value from the distribution. There are a number of different ways to find the typical value, but by far the most used is the arithmetic mean , usually simply called the mean . You already know how to find the arithmetic mean, you are just used to calling it the average . Statisticians use average more generally — the arithmetic mean is one of a number of different averages. Look at the formula for the arithmetic mean:

[latex]\mu = \dfrac{\sum{x}}{N}[/latex]

All you do is add up all of the members of the population, [latex]\sum{x}[/latex], and divide by how many members there are, N . The only trick is to remember that if there is more than one member of the population with a certain value, to add that value once for every member that has it. To reflect this, the equation for the mean sometimes is written:

[latex]\mu = \dfrac{\sum{f_i(x_i)}}{N}[/latex]

where f i is the frequency of members of the population with the value x i .

This is really the same formula as above. If there are seven members with a value of ten, the first formula would have you add seven ten times. The second formula simply has you multiply seven by ten — the same thing as adding together ten sevens.
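Both versions of the formula give the same answer, which is easy to check in Python with the sock-size counts from Ann's example:

```python
# Ann's combined sock counts, expanded so each pair is one population member.
values = [6]*3 + [7]*24 + [8]*33 + [9]*20 + [10]*17

# First formula: add up every member and divide by N.
N = len(values)
mu = sum(values) / N

# Second formula: weight each distinct value x_i by its frequency f_i.
freq = {6: 3, 7: 24, 8: 33, 9: 20, 10: 17}
mu_weighted = sum(f * x for x, f in freq.items()) / sum(freq.values())

print(round(mu, 2), round(mu_weighted, 2))   # 8.25 8.25
```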

Other measures of location are the median and the mode. The median is the value of the member of the population that is in the middle when the members are sorted from smallest to largest. Half of the members of the population have values higher than the median, and half have values lower. The median is a better measure of location if there are one or two members of the population that are a lot larger (or a lot smaller) than all the rest. Such extreme values can make the mean a poor measure of location, while they have little effect on the median. If there are an odd number of members of the population, there is no problem finding which member has the median value. If there are an even number of members of the population, then there is no single member in the middle. In that case, just average together the values of the two members that share the middle.

The third common measure of location is the mode . If you have arranged the population into a frequency or relative frequency distribution, the mode is easy to find because it is the value that occurs most often. While in some sense, the mode is really the most typical member of the population, it is often not very near the middle of the population. You can also have multiple modes. I am sure you have heard someone say that “it was a bimodal distribution “. That simply means that there were two modes, two values that occurred equally most often.
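Python's standard library computes all three measures of location directly; a quick sketch using the combined sock-size data (note how the median of an even-sized list averages the two middle members, and how `multimode` reports a bimodal case):

```python
import statistics

values = [6]*3 + [7]*24 + [8]*33 + [9]*20 + [10]*17   # 97 members, an odd count

print(statistics.median(values))              # the middle (49th) member: 8
print(statistics.mode(values))                # the most frequent value: 8
print(statistics.median([1, 2, 3, 4]))        # even count: average of 2 and 3 is 2.5
print(statistics.multimode([1, 1, 2, 2, 3]))  # a bimodal example: [1, 2]
```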

If you think about it, you should not be surprised to learn that for bell-shaped distributions, the mean, median, and mode will be equal. Most of what statisticians do when describing or inferring the location of a population is done with the mean. Another thing to think about is using a spreadsheet program, like Microsoft Excel, when arranging data into a frequency distribution or when finding the median or mode. By using the sort and frequency commands in Excel or a similar program, data can quickly be arranged in order or placed into value classes and the number in each class found. Excel also has a function, =AVERAGE(...), for finding the arithmetic mean, and you can have the spreadsheet program draw your frequency or relative frequency distribution.

One of the reasons that the arithmetic mean is the most used measure of location is because the mean of a sample is an unbiased estimator of the population mean. Because the sample mean is an unbiased estimator of the population mean, the sample mean is a good way to make an inference about the population mean. If you have a sample from a population, and you want to guess what the mean of that population is, you can legitimately guess that the population mean is equal to the mean of your sample. This is a legitimate way to make this inference because the mean of all the sample means equals the mean of the population, so if you used this method many times to infer the population mean, on average you’d be correct.

All of these measures of location can be found for samples as well as populations, using the same formulas. Generally, μ is used for a population mean, and x̄ (read "x-bar") is used for sample means. Upper-case N, really a Greek nu, is used for the size of a population, while lower-case n is used for sample size. Though it is not universal, statisticians tend to use the Greek alphabet for population characteristics and the Roman alphabet for sample characteristics.

Measuring population shape

Measuring the shape of a distribution is more difficult. Location has only one dimension ("where?"), but shape has a lot of dimensions. We will talk about two, and you will find that most of the time only one dimension of shape is measured. The two dimensions of shape discussed here are the width and symmetry of the distribution. The simplest way to measure the width is to do just that: the range is the distance between the lowest and highest members of the population. The range is obviously affected by one or two population members that are much higher or lower than all the rest.

The most common measures of distribution width are the standard deviation and the variance. The standard deviation is simply the square root of the variance, so if you know one (and have a calculator that does squares and square roots) you know the other. The standard deviation is just a strange measure of the mean distance between the members of a population and the mean of the population. This is easiest to see if you start out by looking at the formula for the variance:

[latex]\sigma^2 = \dfrac{\sum{(x-\mu)^2}}{N}[/latex]

Look at the numerator. To find the variance, the first step (after you have the mean, μ ) is to take each member of the population, and find the difference between its value and the mean; you should have N differences. Square each of those, and add them together, dividing the sum by N , the number of members of the population. Since you find the mean of a group of things by adding them together and then dividing by the number in the group, the variance is simply the mean of the squared distances between members of the population and the population mean.

Notice that this is the formula for a population characteristic, so we use the Greek σ and write the variance as σ², or sigma squared. Because the standard deviation is simply the square root of the variance, its symbol is simply sigma, σ.
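As a sketch, the variance formula can be computed step by step on the sock-size population and checked against the standard library's population-variance function:

```python
import statistics

values = [6]*3 + [7]*24 + [8]*33 + [9]*20 + [10]*17   # Ann's combined data
N = len(values)
mu = sum(values) / N

# The variance: the mean of the squared distances from the population mean.
sigma_sq = sum((x - mu) ** 2 for x in values) / N
sigma = sigma_sq ** 0.5          # the standard deviation is its square root

# The standard library's population functions agree with the formula.
assert abs(sigma_sq - statistics.pvariance(values)) < 1e-9
print(round(sigma_sq, 2), round(sigma, 2))   # 1.22 1.1
```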

One of the things statisticians have discovered is that at least 75 per cent of the members of any population are within two standard deviations of the mean of the population. This is known as Chebyshev's theorem. If the mean of a population of shoe sizes is 9.6 and the standard deviation is 1.1, then at least 75 per cent of the shoe sizes are between 7.4 (two standard deviations below the mean) and 11.8 (two standard deviations above the mean). The same theorem can be stated in probability terms: the probability that anything is within two standard deviations of the mean of its population is at least .75.
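A quick check on the sock-size data (the theorem only promises a lower bound; real populations usually do much better than 75 per cent):

```python
import statistics

values = [6]*3 + [7]*24 + [8]*33 + [9]*20 + [10]*17
mu = statistics.mean(values)
sigma = statistics.pstdev(values)     # population standard deviation

low, high = mu - 2 * sigma, mu + 2 * sigma
within = sum(low <= x <= high for x in values)
share = within / len(values)

print(f"{share:.0%} of the population is within 2 standard deviations")
assert share >= 0.75                  # Chebyshev guarantees at least 75%
```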

It is important to be careful when dealing with variances and standard deviations. In later chapters, there are formulas using the variance, and formulas using the standard deviation. Be sure you know which one you are supposed to be using. Here again, spreadsheet programs will figure out the standard deviation for you. In Excel, there is a function, =STDEVP(…), that does all of the arithmetic. Most calculators will also compute the standard deviation. Read the little instruction booklet, and find out how to have your calculator do the numbers before you do any homework or have a test.

The other measure of shape we will discuss here is the measure of skewness. Skewness is simply a measure of whether or not the distribution is symmetric or if it has a long tail on one side, but not the other. There are a number of ways to measure skewness, with many of the measures based on a formula much like the variance. The formula looks a lot like that for the variance, except the distances between the members and the population mean are cubed, rather than squared, before they are added together:

[latex]sk = \dfrac{\sum{(x-\mu)^3}}{N}[/latex]

At first, it might not seem that cubing rather than squaring those distances would make much difference. Remember, however, that when you square either a positive or negative number, you get a positive number, but when you cube a positive, you get a positive and when you cube a negative you get a negative. Also remember that when you square a number, it gets larger, but that when you cube a number, it gets a whole lot larger. Think about a distribution with a long tail out to the left. There are a few members of that population much smaller than the mean, members for which (x – μ) is large and negative. When these are cubed, you end up with some really big negative numbers. Because there are no members with such large, positive (x – μ) , there are no corresponding really big positive numbers to add in when you sum up the (x – μ) 3 , and the sum will be negative. A negative measure of skewness means that there is a tail out to the left, a positive measure means a tail to the right. Take a minute and convince yourself that if the distribution is symmetric, with equal tails on the left and right, the measure of skew is zero.
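A small sketch with made-up numbers shows the sign behaviour: a population with a long left tail gets a negative skew measure, while a symmetric one gets zero because the cubes cancel:

```python
# A made-up population with a long tail to the left (one very small member).
left_tailed = [1, 7, 8, 8, 9, 9, 9, 10]
N = len(left_tailed)
mu = sum(left_tailed) / N

# Cube (rather than square) the distances from the mean, then average.
sk = sum((x - mu) ** 3 for x in left_tailed) / N
print(sk)         # negative: the tail is to the left

# A symmetric population: equal tails, so the cubes cancel out.
symmetric = [1, 2, 3, 4, 5]
mu_s = sum(symmetric) / len(symmetric)
sk_s = sum((x - mu_s) ** 3 for x in symmetric) / len(symmetric)
print(sk_s)       # 0.0
```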

To be really complete, there is one more thing to measure, kurtosis or peakedness . As you might expect by now, it is measured by taking the distances between the members and the mean and raising them to the fourth power before averaging them together.

Measuring sample shape

Measuring the location of a sample is done in exactly the way that the location of a population is done. However, measuring the shape of a sample is done a little differently than measuring the shape of a population. The reason behind the difference is the desire to have the sample measurement serve as an unbiased estimator of the population measurement. If we took all of the possible samples of a certain size, n , from a population and found the variance of each one, and then found the mean of those sample variances, that mean would be a little smaller than the variance of the population.

You can see why this is so if you think it through. If you knew the population mean, you could find [latex]\sum{\dfrac{(x-\mu)^2}{n}}[/latex] for each sample, and have an unbiased estimate for σ 2 . However, you do not know the population mean, so you will have to infer it. The best way to infer the population mean is to use the sample mean x . The variance of a sample will then be found by averaging together all of the [latex]\sum{\dfrac{(x-\bar{x})^2}{n}}[/latex].

The mean of a sample is obviously determined by where the members of that sample lie. If you have a sample that is mostly from the high (or right) side of a population's distribution, then the sample mean will almost certainly be greater than the population mean. For such a sample, [latex]\sum{\dfrac{(x-\bar{x})^2}{n}}[/latex] would underestimate σ 2 . The same is true for samples that are mostly from the low (or left) side of the population. If you think about what kind of samples will have [latex]\sum{\dfrac{(x-\bar{x})^2}{n}}[/latex] that is greater than the population σ 2 , you will come to the realization that it is only those samples with a few very high members and a few very low members — and there are not very many samples like that. By now you should have convinced yourself that [latex]\sum{\dfrac{(x-\bar{x})^2}{n}}[/latex] will result in a biased estimate of σ 2 . You can see that, on average, it is too small.

How can an unbiased estimate of the population variance, σ 2 , be found? If [latex]\sum{\dfrac{(x-\bar{x})^2}{n}}[/latex] is on average too small, we need to do something to make it a little bigger. We want to keep the [latex]\sum{(x-\bar{x})^2}[/latex], but if we divide it by something a little smaller, the result will be a little larger. Statisticians have found out that the following way to compute the sample variance results in an unbiased estimator of the population variance:

[latex]s^2 = \dfrac{\sum{(x-\bar{x})^2}}{n-1}[/latex]

If we took all of the possible samples of some size, n , from a population, and found the sample variance for each of those samples, using this formula, the mean of those sample variances would equal the population variance, σ 2 .
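This claim can be verified exhaustively for a tiny population. The sketch below assumes sampling with replacement and enumerates every possible sample of size n = 2 from the population {1, 2, 3}: averaging the sample variances computed with the n − 1 divisor recovers σ² exactly, while dividing by n comes out too small:

```python
from itertools import product
from statistics import pvariance, variance

population = [1, 2, 3]
sigma_sq = pvariance(population)                 # the true population variance

# Every possible sample of size 2, drawn with replacement: 9 samples.
samples = list(product(population, repeat=2))

# Average the sample variances computed with the n - 1 divisor...
mean_unbiased = sum(variance(s) for s in samples) / len(samples)

# ...and, for comparison, with the n divisor (the "too small" version).
def var_n(s):
    m = sum(s) / len(s)
    return sum((x - m) ** 2 for x in s) / len(s)

mean_biased = sum(var_n(s) for s in samples) / len(samples)

print(mean_unbiased, sigma_sq)   # equal: the n - 1 formula is unbiased
print(mean_biased)               # smaller: dividing by n underestimates
```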

Note that we use s 2 instead of σ 2 , and n instead of N (really nu , not en ) since this is for a sample and we want to use the Roman letters rather than the Greek letters, which are used for populations.

There is another way to see why you divide by n-1 . We also have to address something called degrees of freedom before too long, and the degrees of freedom are the key in the other explanation. As we go through this explanation, you should be able to see that the two explanations are related.

Imagine that you have a sample with 10 members, n=10 , and you want to use it to estimate the variance of the population from which it was drawn. You write each of the 10 values on a separate scrap of paper. If you know the population mean, you could start by computing all 10 (x – μ) 2 . However, in the usual case, you do not know μ , and you must start by finding x̄ from the values on the 10 scraps to use as an estimate of μ. Once you have found x̄, you could lose any one of the 10 scraps and still be able to find the value that was on the lost scrap from the other 9 scraps. If you are going to use x̄ in the formula for sample variance, only 9 (or n-1 ) of the x 's are free to take on any value. Because only n-1 of the x 's can vary freely, you should divide [latex]\sum{(x-\bar{x})^2}[/latex] by n-1 , the number of x 's that are really free. Once you use x̄ in the formula for sample variance, you use up one degree of freedom, leaving only n-1 . Generally, whenever you use something you have previously computed from a sample within a formula, you use up a degree of freedom.

A little thought will link the two explanations. The first explanation is based on the idea that x̄, the estimator of μ, varies with the sample. It is because x̄ varies with the sample that a degree of freedom is used up in the second explanation.

The sample standard deviation is found simply by taking the square root of the sample variance:

[latex]s=\sqrt{\dfrac{\sum{(x-\bar{x})^2}}{n-1}}[/latex]

While the sample variance is an unbiased estimator of population variance, the sample standard deviation is not an unbiased estimator of the population standard deviation — the square root of the average is not the same as the average of the square roots. This causes statisticians to use variance where it seems as though they are trying to get at standard deviation. In general, statisticians tend to use variance more than standard deviation. Be careful with formulas using sample variance and standard deviation in the following chapters. Make sure you are using the right one. Also note that many calculators will find standard deviation using both the population and sample formulas. Some use σ and s to show the difference between population and sample formulas, some use s n and s n-1 to show the difference.

If Ann wanted to infer what the population distribution of the players' sock sizes looked like, she could do so from her sample. If she is going to send coaches packages of socks for the players to try, she will want the packages to contain an assortment of sizes that allows each player to have a pair that fits. To do that, she wants to know the mean and variance of that distribution. Her data, again, are shown in Table 1.1.

The mean sock size can be found: [latex]\bar{x}=\dfrac{3*6+24*7+33*8+20*9+17*10}{97} = \dfrac{800}{97} \approx 8.25[/latex]

To find the sample variance and standard deviation, Ann decides to use Excel. She lists the sock sizes that were in the sample in column A (see Table 1.2) and the frequency of each of those sizes in column B. For column C, she has the computer find the squared deviation [latex](x-\bar{x})^2[/latex] for each sock size, using the formula =(A1-8.25)^2 in the first row and then copying it down to the other four rows. In D1, she multiplies C1 by the frequency using the formula =B1*C1, copying it down into the other rows. Finally, she finds the sample variance by adding up the five numbers in column D and dividing by n-1 = 96, using the Excel formula =SUM(D1:D5)/96. The spreadsheet appears like this when she is done:

Ann now has an estimate of the variance of the sizes of socks worn by basketball and volleyball players, approximately 1.23; its square root, about 1.11, is the sample standard deviation. She has inferred that the population of Chargers players' sock sizes has a mean of 8.25 and a variance of about 1.23.
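Ann's spreadsheet steps translate directly into Python. Carried through without intermediate rounding, the sample variance comes out at about 1.23 and the sample standard deviation at about 1.11:

```python
sizes = [6, 7, 8, 9, 10]
freqs = [3, 24, 33, 20, 17]      # combined basketball and volleyball counts
n = sum(freqs)                   # 97

# The sample mean, using the frequency-weighted formula.
x_bar = sum(f * x for x, f in zip(sizes, freqs)) / n

# Column D of Ann's sheet: frequency times squared deviation, summed, then / (n - 1).
s_sq = sum(f * (x - x_bar) ** 2 for x, f in zip(sizes, freqs)) / (n - 1)
s = s_sq ** 0.5

print(round(x_bar, 2), round(s_sq, 2), round(s, 2))   # 8.25 1.23 1.11
```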

Ann’s collected data can simply be added to the following Excel template. The calculations of both variance and standard deviation have been shown below. You can change her numbers to see how these two measures change.

To describe a population you need to describe the picture or graph of its distribution. The two things that need to be described about the distribution are its location and its shape. Location is measured by an average, most often the arithmetic mean. The most important measure of shape is a measure of dispersion, roughly width, most often the variance or its square root the standard deviation.

Samples need to be described, too. If all we wanted to do with sample descriptions was describe the sample, we could use exactly the same measures for sample location and dispersion that are used for populations. However, we want to use the sample describers for dual purposes: (a) to describe the sample, and (b) to make inferences about the description of the population that sample came from. Because we want to use them to make inferences, we want our sample descriptions to be unbiased estimators . Our desire to measure sample dispersion with an unbiased estimator of population dispersion means that the formula we use for computing sample variance is a little different from the one used for computing population variance.

Introductory Business Statistics with Interactive Spreadsheets - 1st Canadian Edition Copyright © 2015 by Mohammad Mahbobi and Thomas K. Tiemann is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.


  • Descriptive Research Designs: Types, Examples & Methods

busayo.longe

One of the components of research is getting enough information about the research problem—the what, how, when and where answers, which is why descriptive research is an important type of research. It is very useful when conducting research whose aim is to identify characteristics, frequencies, trends, correlations, and categories.

This research method takes a problem with little to no relevant information and gives it a befitting description using qualitative and quantitative research methods. Descriptive research aims to accurately describe a research problem.

In the subsequent sections, we will be explaining what descriptive research means, its types, examples, and data collection methods.

What is Descriptive Research?

Descriptive research is a type of research that describes a population, situation, or phenomenon that is being studied. It focuses on answering the how, what, when, and where questions of a research problem, rather than the why.

This is mainly because it is important to have a proper understanding of what a research problem is about before investigating why it exists in the first place. 

For example, an investor considering an investment in the ever-changing Amsterdam housing market needs to understand what the current state of the market is, how it changes (increasing or decreasing), and when it changes (time of the year) before asking for the why. This is where descriptive research comes in.

What Are The Types of Descriptive Research?

Descriptive research is classified into different types according to the kind of approach that is used in conducting descriptive research. The different types of descriptive research are highlighted below:

  • Descriptive-survey

Descriptive survey research uses surveys to gather data about varying subjects. The aim is to determine the extent to which different conditions hold among these subjects.

For example, a researcher wants to determine the qualifications of employed professionals in Maryland. He uses a survey as his research instrument, and each item on the survey related to qualifications takes a Yes/No answer.

This way, the researcher can describe the qualifications possessed by the employed demographics of this community. 

  • Descriptive-normative survey

This is an extension of the descriptive survey, with the addition being the normative element. In the descriptive-normative survey, the results of the study should be compared with the norm.

For example, an organization that wishes to test the skills of its employee teams may have them take a skills test. The skills test is the evaluation tool in this case, and the result of the test is compared with the norm of each role.

If the score of the team is one standard deviation or more above the mean, it is very satisfactory; if within one standard deviation of the mean, satisfactory; and if one standard deviation or more below the mean, unsatisfactory.
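That rule can be sketched as a small function. The function name, the reading of the thresholds, and the norm and scores below are all illustrative, not from the article:

```python
def rate_team(score, norm_mean, norm_sd):
    """Classify a team's test score against the norm (one reading of the rule)."""
    z = (score - norm_mean) / norm_sd     # standardized distance from the norm
    if z >= 1:
        return "very satisfactory"
    if z <= -1:
        return "unsatisfactory"
    return "satisfactory"

# Hypothetical norm (mean 70, standard deviation 10) and team scores:
print(rate_team(85, 70, 10))   # very satisfactory
print(rate_team(72, 70, 10))   # satisfactory
print(rate_team(55, 70, 10))   # unsatisfactory
```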

  • Descriptive-status

This is a quantitative description technique that seeks to answer questions about real-life situations. For example, a researcher researching the income of the employees in a company, and the relationship with their performance.

A survey will be carried out to gather enough data about the income of the employees, then their performance will be evaluated and compared to their income. This will help determine whether a higher income means better performance and low income means lower performance or vice versa.

  • Descriptive-analysis

The descriptive-analysis method of research describes a subject by further analyzing it, which in this case involves dividing it into 2 parts. For example, the HR personnel of a company that wishes to analyze the job role of each employee of the company may divide the employees into the people that work at the Headquarters in the US and those that work from Oslo, Norway office.

A questionnaire is devised to analyze the job role of employees with similar salaries and who work in similar positions.

  • Descriptive classification

This method is employed in biological sciences for the classification of plants and animals. A researcher who wishes to classify the sea animals into different species will collect samples from various search stations, then classify them accordingly.

  • Descriptive-comparative

In descriptive-comparative research, the researcher considers 2 variables that are not manipulated and establishes a formal procedure to conclude that one is better than the other. For example, an examination body wants to determine the better method of conducting tests between paper-based and computer-based tests.

A random sample of potential participants of the test may be asked to use the 2 different methods, and factors like failure rates, time factors, and others will be evaluated to arrive at the best method.

  • Correlative Survey

Correlative surveys are used to determine whether the relationship between 2 variables is positive, negative, or neutral. That is, if 2 variables say X and Y are directly proportional, inversely proportional or are not related to each other.

Examples of Descriptive Research

There are different examples of descriptive research, that may be highlighted from its types, uses, and applications. However, we will be restricting ourselves to only 3 distinct examples in this article.

  • Comparing Student Performance:

An academic institution may wish to compare the performance of its junior high school students in English language and Mathematics. This may be used to classify students into 2 major groups, with one group going on to study Science courses while the other studies courses in the Arts & Humanities field.

Students who are more proficient in mathematics will be encouraged to go into STEM and vice versa. Institutions may also use this data to identify students’ weak points and work on ways to assist them.

  • Scientific Classification

During the major scientific classification of plants, animals, and periodic table elements, the characteristics and components of each subject are evaluated and used to determine how they are classified.

For example, living things may be classified into kingdom Plantae or kingdom Animalia depending on their nature. Further classification may group animals into mammals, Pisces (fish), vertebrates, invertebrates, etc.

All these classifications are made as a result of descriptive research, which describes what they are.

  • Human Behavior

When studying human behaviour based on a factor or event, the researcher observes the characteristics, behaviours, and reactions, then uses them to draw conclusions. A company willing to sell to its target market needs to first study the behaviour of the market.

This may be done by observing how its target market reacts to a competitor's product, then using those observations to determine its behaviour.

What are the Characteristics of Descriptive Research?  

The characteristics of descriptive research can be highlighted from its definition, applications, data collection methods, and examples. Some characteristics of descriptive research are:

  • Quantitativeness

Descriptive research uses a quantitative research method by collecting quantifiable information to be used for statistical analysis of the population sample. This is very common when dealing with research in the physical sciences.

  • Qualitativeness

It can also be carried out using the qualitative research method, to properly describe the research problem. This is because descriptive research is more explanatory than exploratory or experimental.

  • Uncontrolled variables

In descriptive research, researchers cannot control the variables like they do in experimental research.

  • The basis for further research

The results of descriptive research can be further analyzed and used in other research methods. It can also inform the next line of research, including the research method that should be used.

This is because it provides basic information about the research problem, which may give birth to other questions like why a particular thing is the way it is.

Why Use Descriptive Research Design?  

Descriptive research can be used to investigate the background of a research problem and get the required information needed to carry out further research. It is used in multiple ways by different organizations, and especially when getting the required information about their target audience.

  • Define subject characteristics :

It is used to determine the characteristics of the subjects, including their traits, behaviour, opinion, etc. This information may be gathered with the use of surveys, which are shared with the respondents who in this case, are the research subjects.

For example, a survey measuring how many hours per week millennials in a community spend on the internet will help a service provider make informed business decisions about the market potential of that community.

  • Measure Data Trends

It helps to measure changes in data over a period of time through statistical methods. Consider individuals who want to invest in the stock market: they evaluate changes in the prices of available stocks to make an investment decision.

In this case, brokerage companies typically carry out the descriptive research process, while individual investors view the resulting data trends and make their decisions.

Descriptive research is also used to compare how different demographics respond to certain variables. For example, an organization may study how people with different income levels react to the launch of a new Apple phone.

This kind of research may use a survey to determine which groups of individuals are purchasing the new Apple phone. Do low-income earners also purchase the phone, or do only high-income earners buy it?

Further research using another technique can then explain why low-income earners purchase the phone even though they can barely afford it. This will help inform strategies to attract other low-income earners and increase company sales.

  • Validate existing conditions

When you are not sure about the validity of an existing condition, you can use descriptive research to ascertain the underlying patterns of the research object. This is because descriptive research methods make an in-depth analysis of each variable before making conclusions.

  • Conducted Over Time

Descriptive research is conducted over a period of time to ascertain the changes observed at each point in time. The more often it is conducted, the more reliable the conclusions will be.

What are the Disadvantages of Descriptive Research?  

  • Response and Non-response Bias

Respondents may either decide not to respond to questions or give incorrect responses if they feel the questions are too confidential. When researchers use observational methods, respondents may also decide to behave in a particular manner because they feel they are being watched.

  • Researcher bias: The researcher may decide to influence the result of the research due to personal opinion or bias towards a particular subject. For example, a stockbroker who also has a business of his own may try to lure investors into investing in his own company by manipulating results.
  • Unrepresentative samples: A case study or sample taken from a large population may not be representative of the whole population.
  • Limited scope: The scope of descriptive research is limited to the "what" of a research problem, with no information on the "why", which limits the depth of the findings.

What are the Data Collection Methods in Descriptive Research?  

There are three main data collection methods in descriptive research, namely: the observational method, the case study method, and survey research.

1. Observational Method

The observational method allows researchers to collect data based on their view of the behaviour and characteristics of the respondent, with the respondents themselves not directly having an input. It is often used in market research, psychology, and some other social science research to understand human behaviour.

It is also an important aspect of physical science research, being one of the most effective methods of conducting descriptive research. Observation can be either quantitative or qualitative.

Quantitative observation involves the objective collection of numerical data, whose results can be analyzed using numerical and statistical methods.

Qualitative observation, on the other hand, involves monitoring characteristics rather than measuring numbers. The researcher observes from a distance, records what is seen, and uses those records to inform conclusions.

2. Case Study Method

A case study is a sample group (an individual, a group of people, organizations, events, etc.) whose characteristics are used to describe the characteristics of a larger group in which the case study is a subgroup. The information gathered from investigating a case study may be generalized to serve the larger group.

This generalization may, however, be risky, because a single case study is rarely sufficient to make accurate predictions about a larger group; case studies are a weak basis for generalization.

3. Survey Research

This is a very popular data collection method in research designs. In survey research, researchers create a survey or questionnaire and distribute it to respondents who give answers.

Generally, it is used to obtain quick information directly from the primary source and also to conduct rigorous quantitative and qualitative research. In some cases, survey research uses a blend of both qualitative and quantitative strategies.

Survey research can be carried out both online and offline using the following methods:

  • Online Surveys: This is a cheap method of carrying out surveys that can reach a large number of respondents. It can be carried out using an online survey builder such as Formplus, whose tools and features can help increase response rates.
  • Offline Surveys: This includes paper forms, mobile offline forms , and SMS-based forms.

What Are The Differences Between Descriptive and Correlational Research?  

Before going into the differences between descriptive and correlational research, we need a proper understanding of what correlational research is about. Therefore, a summary of correlational research is given below.

Correlational research is a type of descriptive research which is used to measure the relationship between two variables, with the researcher having no control over them. It aims to find whether there is a positive correlation (both variables change in the same direction), a negative correlation (the variables change in opposite directions), or zero correlation (there is no relationship between the variables).

Correlational research may be used in two situations:

(i) when trying to find out if there is a relationship between two variables, and

(ii) when a causal relationship is suspected between two variables, but it is impractical or unethical to conduct experimental research that manipulates one of the variables. 
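As a rough illustration of how a correlation is measured, the sketch below computes Pearson's correlation coefficient from paired observations using plain Python. The data (hours studied vs. exam score) are invented for illustration; in practice a statistics package would normally be used.

```python
# Pearson correlation sketch (hypothetical data, pure-Python implementation).
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 68, 74]   # rises with hours: positive correlation
print(round(pearson_r(hours_studied, exam_score), 3))  # → 0.992
```

A value near +1 indicates a positive correlation, near −1 a negative correlation, and near 0 no linear relationship, matching the three cases described above.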

Below are some of the differences between correlational and descriptive research:

  • Definitions :

Descriptive research is a type of research that provides an in-depth understanding of the study population, while correlational research measures the relationship between two variables.

  • Characteristics :

Descriptive research provides descriptive data explaining what the research subject is about, while correlational research explores the relationships between variables rather than describing them.

  • Predictions :

Predictions cannot be made from descriptive research, while correlational research accommodates the possibility of making predictions.

Descriptive Research vs. Causal Research

Descriptive research and causal research are both research methodologies; however, the former focuses on a subject's characteristics and behaviors, while the latter focuses on cause-and-effect relationships. To expand on this point: descriptive research aims to describe and document the characteristics, behaviors, or phenomena of a specific population or situation.

It focuses on providing an accurate and detailed account of an already existing state of affairs between variables. Descriptive research answers the questions of “what,” “where,” “when,” and “how” without attempting to establish any causal relationships or explain any underlying factors that might have caused the behavior.

Causal research, on the other hand, seeks to determine cause-and-effect relationships between variables. It aims to point out the factors that influence or cause a particular result or behavior. Causal research involves manipulating variables, controlling conditions or a subgroup, and observing the resulting effects. The primary objective of causal research is to establish a cause-effect relationship and provide insights into why certain phenomena happen the way they do.

Descriptive Research vs. Analytical Research

Descriptive research provides a detailed and comprehensive account of a specific situation or phenomenon. It focuses on describing and summarizing data without making inferences or attempting to explain underlying factors or the cause of the factor. 

It is primarily concerned with providing an accurate and objective representation of the subject of research. Analytical research, by contrast, goes beyond describing phenomena and seeks to analyze and interpret data to discover patterns, relationships, or underlying factors.

It examines the data critically, applies statistical techniques or other analytical methods, and draws conclusions based on the discovery. Analytical research also aims to explore the relationships between variables and understand the underlying mechanisms or processes involved.

Descriptive Research vs. Exploratory Research

Descriptive research is a research method that focuses on providing a detailed and accurate account of a specific situation, group, or phenomenon. This type of research describes the characteristics, behaviors, or relationships within the given context without looking for an underlying cause. 

Descriptive research typically involves collecting and analyzing quantitative or qualitative data to generate descriptive statistics or narratives. Exploratory research differs from descriptive research because it aims to explore and gain firsthand insights or knowledge into a relatively unexplored or poorly understood topic. 

It focuses on generating ideas, hypotheses, or theories rather than providing definitive answers. Exploratory research is often conducted at the early stages of a research project to gather preliminary information and identify key variables or factors for further investigation. It involves open-ended interviews, observations, or small-scale surveys to gather qualitative data.


Descriptive Research vs. Experimental Research

Descriptive research aims to describe and document the characteristics, behaviors, or phenomena of a particular population or situation. It focuses on providing an accurate and detailed account of the existing state of affairs. 

Descriptive research typically involves collecting data through surveys, observations, or existing records and analyzing the data to generate descriptive statistics or narratives. It does not involve manipulating variables or establishing cause-and-effect relationships.

Experimental research, on the other hand, involves manipulating variables and controlling conditions to investigate cause-and-effect relationships. It aims to establish causal relationships by introducing an intervention or treatment and observing the resulting effects. 

Experimental research typically involves randomly assigning participants to different groups, such as control and experimental groups, and measuring the outcomes. It allows researchers to control for confounding variables and draw causal conclusions.


Descriptive Research vs. Explanatory Research

Descriptive research focuses on providing a detailed and accurate account of a specific situation, group, or phenomenon. It aims to describe the characteristics, behaviors, or relationships within the given context. 

Descriptive research is primarily concerned with providing an objective representation of the subject of study without explaining underlying causes or mechanisms. Explanatory research seeks to explain the relationships between variables and uncover the underlying causes or mechanisms. 

It goes beyond description and aims to understand the reasons or factors that influence a particular outcome or behavior. Explanatory research involves analyzing data, conducting statistical analyses, and developing theories or models to explain the observed relationships.

Descriptive Research vs. Inferential Research

Descriptive research focuses on describing and summarizing data without making inferences or generalizations beyond the specific sample or population being studied. It aims to provide an accurate and objective representation of the subject of study. 

Descriptive research typically involves analyzing data to generate descriptive statistics, such as means, frequencies, or percentages, to describe the characteristics or behaviors observed.

Inferential research, however, involves making inferences or generalizations about a larger population based on a smaller sample. 

It aims to draw conclusions about the population characteristics or relationships by analyzing the sample data. Inferential research uses statistical techniques to estimate population parameters, test hypotheses, and determine the level of confidence or significance in the findings.
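To make the contrast concrete, the sketch below first computes descriptive statistics for a sample, then takes the inferential step of estimating a 95% confidence interval for the population mean. The sample values and the use of a normal approximation are illustrative assumptions, not a prescribed method.

```python
# Descriptive summary of a hypothetical sample, followed by an inferential step:
# a normal-approximation 95% confidence interval for the population mean.
import statistics
from statistics import NormalDist

sample = [14, 17, 13, 16, 15, 18, 14, 16, 15, 17]

mean = statistics.mean(sample)            # descriptive: sample mean
sd = statistics.stdev(sample)             # descriptive: sample standard deviation
se = sd / len(sample) ** 0.5              # standard error of the mean
z = NormalDist().inv_cdf(0.975)           # ≈ 1.96 for a 95% interval

low, high = mean - z * se, mean + z * se  # inferential: interval estimate
print(round(mean, 2), (round(low, 2), round(high, 2)))
```

The first two lines of computation only describe the sample; the interval generalises beyond it, which is exactly the descriptive/inferential boundary discussed above.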


Conclusion  

The uniqueness of descriptive research partly lies in its ability to draw on both quantitative and qualitative research methods. Therefore, when conducting descriptive research, researchers have the opportunity to use a wide variety of techniques that aid the research process.

Descriptive research explores research problems in depth, beyond the surface level, thereby giving a detailed description of the research subject. In that way, it can aid further research in the field, including research that uses other methods.

It is also very useful in solving real-life problems in various fields of social science, physical science, and education.

By: busayo.longe (Formplus)


Methods and formulas for Descriptive Statistics (Tables)

This topic covers the methods and formulas for the descriptive statistics available for tables: mean, median, minimum, maximum, sum, standard deviation, N nonmissing, N missing, count, row percent, column percent, and total percent.

Mean

The mean is the sum of all observations divided by the number of (non-missing) observations. Use the following formula to calculate the mean for each cell or margin using the data corresponding to that cell or margin:

x̄ = (Σ xᵢ) / n

where the sum runs over the n non-missing observations in that cell or margin.

Median

The median is the middle value in an ordered data set. Thus, at least half the observations are less than or equal to the median, and at least half the observations are greater than or equal to the median.

If the number of observations in a data set is odd, the median is the value in the middle. If the number of observations in a data set is even, the median is the average of the two middle values.
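A minimal sketch of this odd/even rule, using hypothetical values:

```python
# Median: middle value if the count is odd, average of the two middle
# values if the count is even (hypothetical data for illustration).
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: mean of the two middle values

print(median([7, 1, 3]))     # odd count → 3
print(median([7, 1, 3, 5]))  # even count → 4.0
```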


Minimum

The smallest data value that is in a table cell or margin.

Maximum

The largest data value that is in a table cell or margin.

Sum

The sum is the total of all the data values that are in a table cell or margin.

Standard deviation

The standard deviation is the most common measure of dispersion, or how spread out the data are about the mean. The more widely the values are spread out, the larger the standard deviation. The standard deviation is calculated by taking the square root of the variance.

Use this formula to calculate the standard deviation for each cell or margin using the data from that cell or margin:

s = √( Σ (xᵢ − x̄)² / (n − 1) )
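As a rough cross-check of the mean and standard deviation described above, the sketch below computes both for a hypothetical cell of data and compares the results against Python's standard statistics module:

```python
# Mean and sample standard deviation, computed directly from the formulas
# and cross-checked against the statistics module (hypothetical cell data).
import math
import statistics

data = [4, 8, 6, 5, 7]

mean = sum(data) / len(data)                                # x̄ = Σxᵢ / n
var = sum((x - mean) ** 2 for x in data) / (len(data) - 1)  # s² = Σ(xᵢ − x̄)² / (n − 1)
sd = math.sqrt(var)                                         # s = √s²

assert mean == statistics.mean(data)
assert abs(sd - statistics.stdev(data)) < 1e-12
print(mean, round(sd, 4))  # → 6.0 1.5811
```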

N nonmissing

The number of non-missing observations that are in a table cell or margin.

N missing

The number of missing observations that are in a table cell or margin.

Count

The count is the number of times each combination of categories occurs.

Row percent

The row percent is obtained by multiplying the ratio of a cell count to the corresponding row total by 100 and is given by:

row percent = (cell count / row total) × 100

Column percent

The column percent is obtained by multiplying the ratio of a cell count to the corresponding column total by 100 and is given by:

column percent = (cell count / column total) × 100

Total percent

The total percent is obtained by multiplying the ratio of a cell count to the total number of observations by 100 and is given by:

total percent = (cell count / total count) × 100
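The three percent calculations described above can be illustrated on a hypothetical 2×2 table of counts:

```python
# Row, column, and total percents for one cell of a 2×2 table of counts
# (hypothetical counts for illustration).
table = [[20, 30],   # row 1
         [10, 40]]   # row 2

total = sum(sum(row) for row in table)          # 100
row_totals = [sum(row) for row in table]        # [50, 50]
col_totals = [sum(col) for col in zip(*table)]  # [30, 70]

cell = table[0][1]                              # cell count = 30
row_pct = cell / row_totals[0] * 100            # 30/50 × 100 = 60.0
col_pct = cell / col_totals[1] * 100            # 30/70 × 100 ≈ 42.86
total_pct = cell / total * 100                  # 30/100 × 100 = 30.0
print(row_pct, round(col_pct, 2), total_pct)
```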


Descriptive Analytics – Methods, Tools and Examples

Definition:

Descriptive analytics focuses on describing or summarizing raw data and making it interpretable. This type of analytics provides insight into what has happened in the past. It involves the analysis of historical data to identify patterns, trends, and insights. Descriptive analytics often uses visualization tools to represent the data in a way that is easy to interpret.

Descriptive Analytics in Research

Descriptive analytics plays a crucial role in research, helping investigators understand and describe the data collected in their studies. Here’s how descriptive analytics is typically used in a research setting:

  • Descriptive Statistics: In research, descriptive analytics often takes the form of descriptive statistics . This includes calculating measures of central tendency (like mean, median, and mode), measures of dispersion (like range, variance, and standard deviation), and measures of frequency (like count, percent, and frequency). These calculations help researchers summarize and understand their data.
  • Visualizing Data: Descriptive analytics also involves creating visual representations of data to better understand and communicate research findings . This might involve creating bar graphs, line graphs, pie charts, scatter plots, box plots, and other visualizations.
  • Exploratory Data Analysis: Before conducting any formal statistical tests, researchers often conduct an exploratory data analysis, which is a form of descriptive analytics. This might involve looking at distributions of variables, checking for outliers, and exploring relationships between variables.
  • Initial Findings: Descriptive analytics are often reported in the results section of a research study to provide readers with an overview of the data. For example, a researcher might report average scores, demographic breakdowns, or the percentage of participants who endorsed each response on a survey.
  • Establishing Patterns and Relationships: Descriptive analytics helps in identifying patterns, trends, or relationships in the data, which can guide subsequent analysis or future research. For instance, researchers might look at the correlation between variables as a part of descriptive analytics.
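As a minimal illustration of the descriptive statistics listed above (central tendency, dispersion, and frequency), the sketch below applies Python's standard statistics module to hypothetical survey scores:

```python
# Descriptive statistics on hypothetical survey scores, using only
# the standard library.
import statistics
from collections import Counter

scores = [3, 5, 4, 4, 2, 5, 4]

# Central tendency
print(statistics.mean(scores))      # mean
print(statistics.median(scores))    # median → 4
print(statistics.mode(scores))      # mode → 4

# Dispersion
print(max(scores) - min(scores))    # range → 3
print(statistics.variance(scores))  # sample variance
print(statistics.stdev(scores))     # sample standard deviation

# Frequency
print(Counter(scores))              # count of each response value
```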

Descriptive Analytics Techniques

Descriptive analytics involves a variety of techniques to summarize, interpret, and visualize historical data. Some commonly used techniques include:

Statistical Analysis

This includes basic statistical methods like mean, median, mode (central tendency), standard deviation, variance (dispersion), correlation, and regression (relationships between variables).

Data Aggregation

It is the process of compiling and summarizing data to obtain a general perspective. It can involve methods like sum, count, average, min, max, etc., often applied to a group of data.
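A minimal group-and-aggregate sketch, assuming hypothetical sales records keyed by region (all names and values are invented for illustration):

```python
# Group records by region, then compute sum, count, average, min, max
# per group, using only the standard library.
from collections import defaultdict

sales = [("north", 120), ("south", 80), ("north", 150),
         ("south", 95), ("north", 130)]

groups = defaultdict(list)
for region, amount in sales:
    groups[region].append(amount)

summary = {
    region: {
        "count": len(vals),
        "sum": sum(vals),
        "avg": sum(vals) / len(vals),
        "min": min(vals),
        "max": max(vals),
    }
    for region, vals in groups.items()
}
print(summary["north"])  # e.g. north: count 3, sum 400, min 120, max 150
```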

Data Mining

This involves analyzing large volumes of data to discover patterns, trends, and insights. Techniques used in data mining can include clustering (grouping similar data), classification (assigning data into categories), association rules (finding relationships between variables), and anomaly detection (identifying outliers).

Data Visualization

This involves presenting data in a graphical or pictorial format to provide clear and easy understanding of the data patterns, trends, and insights. Common data visualization methods include bar charts, line graphs, pie charts, scatter plots, histograms, and more complex forms like heat maps and interactive dashboards.

Reporting

This involves organizing data into informational summaries to monitor how different areas of a business are performing. Reports can be generated manually or automatically and can be presented in tables, graphs, or dashboards.

Cross-tabulation (or Pivot Tables)

It involves displaying the relationship between two or more variables in a tabular form. It can provide a deeper understanding of the data by allowing comparisons and revealing patterns and correlations that may not be readily apparent in raw data.
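A simple cross-tabulation can be built by counting category combinations. The sketch below assumes hypothetical survey responses pairing a gender with a yes/no answer:

```python
# Cross-tabulation sketch: count how often each combination of two
# categorical variables occurs (hypothetical survey responses).
from collections import Counter

responses = [
    ("male", "yes"), ("female", "no"), ("female", "yes"),
    ("male", "no"), ("female", "yes"), ("male", "yes"),
]

crosstab = Counter(responses)  # keys are (gender, answer) pairs

for gender in ("male", "female"):
    row = {answer: crosstab[(gender, answer)] for answer in ("yes", "no")}
    print(gender, row)
```

In practice, a tool such as a spreadsheet pivot table would produce the same counts; the point is only that each table cell is the frequency of one category combination.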

Descriptive Modeling

Some techniques use complex algorithms to interpret data. Examples include decision tree analysis, which provides a graphical representation of decision-making situations, and neural networks, which are used to identify correlations and patterns in large data sets.

Descriptive Analytics Tools

Some common Descriptive Analytics Tools are as follows:

Excel: Microsoft Excel is a widely used tool that can be used for simple descriptive analytics. It has powerful statistical and data visualization capabilities. Pivot tables are a particularly useful feature for summarizing and analyzing large data sets.

Tableau: Tableau is a data visualization tool that is used to represent data in a graphical or pictorial format. It can handle large data sets and allows for real-time data analysis.

Power BI: Power BI, another product from Microsoft, is a business analytics tool that provides interactive visualizations with self-service business intelligence capabilities.

QlikView: QlikView is a data visualization and discovery tool. It allows users to analyze data and use this data to support decision-making.

SAS: SAS is a software suite that can mine, alter, manage and retrieve data from a variety of sources and perform statistical analysis on it.

SPSS: SPSS (Statistical Package for the Social Sciences) is a software package used for statistical analysis. It’s widely used in social sciences research but also in other industries.

Google Analytics: For web data, Google Analytics is a popular tool. It allows businesses to analyze detailed information about the visitors to their website, providing valuable insights that can help shape a business's success strategy.

R and Python: Both are programming languages that have robust capabilities for statistical analysis and data visualization. With packages like pandas, matplotlib, seaborn in Python and ggplot2, dplyr in R, these languages are powerful tools for descriptive analytics.

Looker: Looker is a modern data platform that can take data from any database and let you start exploring and visualizing.

When to use Descriptive Analytics

Descriptive analytics forms the base of the data analysis workflow and is typically the first step in understanding your business or organization’s data. Here are some situations when you might use descriptive analytics:

Understanding Past Behavior: Descriptive analytics is essential for understanding what has happened in the past. If you need to understand past sales trends, customer behavior, or operational performance, descriptive analytics is the tool you’d use.

Reporting Key Metrics: Descriptive analytics is used to establish and report key performance indicators (KPIs). It can help in tracking and presenting these KPIs in dashboards or regular reports.

Identifying Patterns and Trends: If you need to identify patterns or trends in your data, descriptive analytics can provide these insights. This might include identifying seasonality in sales data, understanding peak operational times, or spotting trends in customer behavior.

Informing Business Decisions: The insights provided by descriptive analytics can inform business strategy and decision-making. By understanding what has happened in the past, you can make more informed decisions about what steps to take in the future.

Benchmarking Performance: Descriptive analytics can be used to compare current performance against historical data. This can be used for benchmarking and setting performance goals.

Auditing and Regulatory Compliance: In sectors where compliance and auditing are essential, descriptive analytics can provide the necessary data and trends over specific periods.

Initial Data Exploration: When you first acquire a dataset, descriptive analytics is useful to understand the structure of the data, the relationships between variables, and any apparent anomalies or outliers.

Examples of Descriptive Analytics

Examples of Descriptive Analytics are as follows:

Retail Industry: A retail company might use descriptive analytics to analyze sales data from the past year. They could break down sales by month to identify any seasonality trends. For example, they might find that sales increase in November and December due to holiday shopping. They could also break down sales by product to identify which items are the most popular. This analysis could inform their purchasing and stocking decisions for the next year. Additionally, data on customer demographics could be analyzed to understand who their primary customers are, guiding their marketing strategies.

Healthcare Industry: In healthcare, descriptive analytics could be used to analyze patient data over time. For instance, a hospital might analyze data on patient admissions to identify trends in admission rates. They might find that admissions for certain conditions are higher at certain times of the year. This could help them allocate resources more effectively. Also, analyzing patient outcomes data can help identify the most effective treatments or highlight areas where improvement is needed.

Finance Industry: A financial firm might use descriptive analytics to analyze historical market data. They could look at trends in stock prices, trading volume, or economic indicators to inform their investment decisions. For example, analyzing the price-earnings ratios of stocks in a certain sector over time could reveal patterns that suggest whether the sector is currently overvalued or undervalued. Similarly, credit card companies can analyze transaction data to detect any unusual patterns, which could be signs of fraud.

Advantages of Descriptive Analytics

Descriptive analytics plays a vital role in the world of data analysis, providing numerous advantages:

  • Understanding the Past: Descriptive analytics provides an understanding of what has happened in the past, offering valuable context for future decision-making.
  • Data Summarization: Descriptive analytics is used to simplify and summarize complex datasets, which can make the information more understandable and accessible.
  • Identifying Patterns and Trends: With descriptive analytics, organizations can identify patterns, trends, and correlations in their data, which can provide valuable insights.
  • Inform Decision-Making: The insights generated through descriptive analytics can inform strategic decisions and help organizations to react more quickly to events or changes in behavior.
  • Basis for Further Analysis: Descriptive analytics lays the groundwork for further analytical activities. It’s the first necessary step before moving on to more advanced forms of analytics like predictive analytics (forecasting future events) or prescriptive analytics (advising on possible outcomes).
  • Performance Evaluation: It allows organizations to evaluate their performance by comparing current results with past results, enabling them to see where improvements have been made and where further improvements can be targeted.
  • Enhanced Reporting and Dashboards: Through the use of visualization techniques, descriptive analytics can improve the quality of reports and dashboards, making the data more understandable and easier to interpret for stakeholders at all levels of the organization.
  • Immediate Value: Unlike some other types of analytics, descriptive analytics can provide immediate insights, as it doesn’t require complex models or deep analytical capabilities to provide value.

Disadvantages of Descriptive Analytics

While descriptive analytics offers numerous benefits, it also has certain limitations or disadvantages. Here are a few to consider:

  • Limited to Past Data: Descriptive analytics primarily deals with historical data and provides insights about past events. It does not predict future events or trends and can’t help you understand possible future outcomes on its own.
  • Lack of Deep Insights: While descriptive analytics helps in identifying what happened, it does not answer why it happened. For deeper insights, you would need to use diagnostic analytics, which analyzes data to understand the root cause of a particular outcome.
  • Can Be Misleading: If not properly executed, descriptive analytics can sometimes lead to incorrect conclusions. For example, correlation does not imply causation, but descriptive analytics might tempt one to make such an inference.
  • Data Quality Issues: The accuracy and usefulness of descriptive analytics are heavily reliant on the quality of the underlying data. If the data is incomplete, incorrect, or biased, the results of the descriptive analytics will be too.
  • Over-reliance on Descriptive Analytics: Businesses may rely too much on descriptive analytics and not enough on predictive and prescriptive analytics. While understanding past and present data is important, it’s equally vital to forecast future trends and make data-driven decisions based on those predictions.
  • Doesn’t Provide Actionable Insights: Descriptive analytics is used to interpret historical data and identify patterns and trends, but it doesn’t provide recommendations or courses of action. For that, prescriptive analytics is needed.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer



14 Quantitative analysis: Descriptive statistics

Numeric data collected in a research project can be analysed quantitatively using statistical tools in two different ways. Descriptive analysis refers to statistically describing, aggregating, and presenting the constructs of interest or associations between these constructs. Inferential analysis refers to the statistical testing of hypotheses (theory testing). In this chapter, we will examine statistical techniques used for descriptive analysis, and the next chapter will examine statistical techniques for inferential analysis. Much of today’s quantitative data analysis is conducted using software programs such as SPSS or SAS. Readers are advised to familiarise themselves with one of these programs for understanding the concepts described in this chapter.

Data preparation

In research projects, data may be collected from a variety of sources: postal surveys, interviews, pretest or posttest experimental data, observational data, and so forth. These data must be converted into a machine-readable, numeric format, such as a spreadsheet or a text file, so that they can be analysed by computer programs like SPSS or SAS. Data preparation usually follows these steps:

Data coding. Coding is the process of converting data into numeric format. A codebook should be created to guide the coding process. A codebook is a comprehensive document containing a detailed description of each variable in a research study, items or measures for that variable, the format of each item (numeric, text, etc.), the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale, and whether this scale is a five-point, seven-point scale, etc.), and how to code each value into a numeric format. For instance, if we have a measurement item on a seven-point Likert scale with anchors ranging from ‘strongly disagree’ to ‘strongly agree’, we may code that item as 1 for strongly disagree, 4 for neutral, and 7 for strongly agree, with the intermediate anchors in between. Nominal data such as industry type can be coded in numeric form using a coding scheme such as: 1 for manufacturing, 2 for retailing, 3 for financial, 4 for healthcare, and so forth (of course, these numeric codes are merely labels, so nominal data cannot be treated as quantities or averaged). Ratio scale data such as age, income, or test scores can be coded as entered by the respondent. Sometimes, data may need to be aggregated into a different form than the format used for data collection. For instance, if a survey measuring a construct such as ‘benefits of computers’ provided respondents with a checklist of benefits that they could select from, and respondents were encouraged to choose as many of those benefits as they wanted, then the total number of checked items could be used as an aggregate measure of benefits. Note that many other forms of data—such as interview transcripts—cannot be converted into a numeric format for statistical analysis.
Codebooks are especially important for large complex studies involving many variables and measurement items, where the coding process is conducted by different people, to help the coding team code data in a consistent manner, and also to help others understand and interpret the coded data.
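The codebook-driven coding step described above can be sketched in a few lines of Python. The response options and numeric codes below are hypothetical examples mirroring the seven-point Likert coding and the nominal industry coding in the text, not codes from any particular study.

```python
# Minimal codebook-driven coding sketch with hypothetical codes.
likert_codes = {
    "strongly disagree": 1, "disagree": 2, "somewhat disagree": 3,
    "neutral": 4, "somewhat agree": 5, "agree": 6, "strongly agree": 7,
}
# Nominal codes: the numbers are labels only, not quantities.
industry_codes = {"manufacturing": 1, "retailing": 2, "financial": 3, "healthcare": 4}

raw_responses = [
    {"satisfaction": "strongly agree", "industry": "retailing"},
    {"satisfaction": "neutral", "industry": "healthcare"},
]

# Apply the codebook mappings to each raw response.
coded = [
    {"satisfaction": likert_codes[r["satisfaction"]],
     "industry": industry_codes[r["industry"]]}
    for r in raw_responses
]
print(coded)
```

Keeping the mappings in one place, as a codebook would, makes the coding scheme easy to audit and apply consistently across coders.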

Data entry. Coded data can be entered into a spreadsheet, database, text file, or directly into a statistical program like SPSS. Most statistical programs provide a data editor for entering data. However, these programs store data in their own native format—e.g., SPSS stores data as .sav files—which makes it difficult to share that data with other statistical programs. Hence, it is often better to enter data into a spreadsheet or database where it can be reorganised as needed, shared across programs, and subsets of data can be extracted for analysis. Smaller datasets can be stored in a spreadsheet created using a program such as Microsoft Excel (versions prior to Excel 2007 were limited to 65,536 rows and 256 columns; current versions allow roughly a million rows), while larger datasets with many millions of observations will require a database. Each observation can be entered as one row in the spreadsheet, and each measurement item can be represented as one column. Data should be checked for accuracy during and after entry via occasional spot checks on a set of items or observations. Furthermore, while entering data, the coder should watch out for obvious evidence of bad data, such as the respondent selecting the ‘strongly agree’ response to all items irrespective of content, including reverse-coded items. If so, such data can be entered but should be excluded from subsequent analysis.
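The "strongly agree to everything" pattern mentioned above can also be screened for programmatically. The sketch below is a minimal illustration with invented respondent IDs and item names; real screening would typically use more nuanced criteria than a single rule.

```python
# Flag respondents who give an identical rating to every item, including
# reverse-coded ones -- a common symptom of careless "straight-lining".
def is_straight_liner(responses):
    """True if all items received exactly the same rating."""
    return len(set(responses.values())) == 1

# Hypothetical data: R1 answers 7 to everything, even the reversed item.
respondents = {
    "R1": {"q1": 7, "q2_reversed": 7, "q3": 7},  # suspicious
    "R2": {"q1": 6, "q2_reversed": 2, "q3": 5},  # plausible variation
}

flagged = [rid for rid, r in respondents.items() if is_straight_liner(r)]
print(flagged)
```

As the chapter notes, such cases can still be entered but should be excluded from subsequent analysis.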


Data transformation. Sometimes, it is necessary to transform data values before they can be meaningfully interpreted. For instance, reverse coded items—where items convey the opposite meaning of that of their underlying construct—should be reversed (e.g., in a 1-7 interval scale, 8 minus the observed value will reverse the value) before they can be compared or combined with items that are not reverse coded. Other kinds of transformations may include creating scale measures by adding individual scale items, creating a weighted index from a set of observed measures, and collapsing multiple values into fewer categories (e.g., collapsing incomes into income ranges).
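The reverse-coding rule described above (on a 1-7 scale, 8 minus the observed value) can be sketched as follows; the item names and values are hypothetical:

```python
# Reverse-code an item on a 1..scale_max interval scale:
# reversed value = (scale_max + 1) - observed value.
def reverse_code(value, scale_max=7):
    return (scale_max + 1) - value

items = {"q1": 2, "q2_reversed": 6, "q3": 3}               # q2 is reverse-coded
items["q2_reversed"] = reverse_code(items["q2_reversed"])  # 8 - 6 = 2

# A simple summated scale score after reversal: 2 + 2 + 3 = 7
scale_score = sum(items.values())
print(items["q2_reversed"], scale_score)
```

Only after reversal can the item be meaningfully combined with the non-reversed items into a scale score, as the text notes.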

Univariate analysis

Univariate analysis—or analysis of a single variable—refers to a set of statistical techniques that can describe the general properties of one variable. Univariate statistics include: frequency distribution, central tendency, and dispersion. The frequency distribution of a variable is a summary of the frequency—or percentages—of individual values or ranges of values for that variable. For instance, we can measure how many times a sample of respondents attend religious services—as a gauge of their ‘religiosity’—using a categorical scale: never, once per year, several times per year, about once a month, several times per month, several times per week, and an optional category for ‘did not answer’. If we count the number or percentage of observations within each category—except ‘did not answer’ which is really a missing value rather than a category—and display it in the form of a table, as shown in Figure 14.1, what we have is a frequency distribution. This distribution can also be depicted in the form of a bar chart, as shown on the right panel of Figure 14.1, with the horizontal axis representing each category of that variable and the vertical axis representing the frequency or percentage of observations within each category.
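To make the idea concrete, here is a minimal Python sketch of a frequency distribution for the religiosity example. The raw responses below are invented for illustration, not taken from Figure 14.1.

```python
from collections import Counter

# Hypothetical categorical responses; 'did not answer' is treated as missing.
responses = ["never", "once per year", "never", "about once a month",
             "several times per week", "never", "did not answer"]

valid = [r for r in responses if r != "did not answer"]  # drop missing values
freq = Counter(valid)                                    # counts per category
n = len(valid)
percentages = {cat: 100 * count / n for cat, count in freq.items()}
print(freq["never"], percentages["never"])
```

The `freq` table is exactly the tabular form of the frequency distribution; plotting its categories against counts gives the bar chart described in the text.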

Figure 14.1. Frequency distribution of religiosity

With very large samples, where observations are independent and random, the frequency distribution tends to follow a plot that looks like a bell-shaped curve—a smoothed bar chart of the frequency distribution—similar to that shown in Figure 14.2. Here most observations are clustered toward the centre of the range of values, with fewer and fewer observations clustered toward the extreme ends of the range. Such a curve is called a normal distribution .
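The bell-curve intuition can be checked numerically: for a normal distribution, roughly 68% of observations fall within one standard deviation of the mean. A quick simulation sketch, with arbitrary parameters and a fixed seed for repeatability:

```python
import random

random.seed(42)  # fixed seed so the simulation is repeatable
mu, sigma, n = 0.0, 1.0, 100_000
draws = [random.gauss(mu, sigma) for _ in range(n)]

# Proportion of draws within one standard deviation of the mean.
within_one_sd = sum(1 for x in draws if abs(x - mu) <= sigma) / n
print(round(within_one_sd, 2))  # close to the theoretical 0.6827
```

Most draws cluster near the centre and the proportion in the tails falls off quickly, which is exactly the shape Figure 14.2 depicts.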

Central tendency is an estimate of the centre of a distribution of values. The three major estimates of central tendency are the mean, the median, and the mode. The arithmetic mean is the simple average of all values in a distribution. For example, the mean of a set of eight test scores (15, 20, 21, 20, 36, 15, 25, and 15) is:

(15 + 20 + 21 + 20 + 36 + 15 + 25 + 15)/8 = 20.875

The median is the middle value within a range of values arranged in ascending order. For the eight scores above, the sorted values are 15, 15, 15, 20, 20, 21, 25, and 36, and the median is the average of the two middle values: (20 + 20)/2 = 20.

Lastly, the mode is the most frequently occurring value in a distribution of values. In the previous example, the most frequently occurring value is 15, which is the mode of the above set of test scores. Note that any value that is estimated from a sample, such as mean, median, mode, or any of the later estimates are called a statistic .

Dispersion refers to the way values are spread around the central tendency. The simplest measure of dispersion is the range, which is the difference between the highest and lowest values in a distribution. For the test scores above, the range is 36 − 15 = 21.
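These univariate statistics can be reproduced with Python's standard `statistics` module, using the same eight test scores:

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

mean = statistics.mean(scores)            # (15+20+21+20+36+15+25+15)/8 = 20.875
median = statistics.median(scores)        # middle of the sorted values
mode = statistics.mode(scores)            # most frequently occurring value: 15
value_range = max(scores) - min(scores)   # 36 - 15 = 21

print(mean, median, mode, value_range)
```

The same module also provides `statistics.stdev` and `statistics.variance` for the other common dispersion measures.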

Bivariate analysis

Bivariate analysis examines how two variables are related to one another. The most common bivariate statistic is the bivariate correlation—often simply called ‘correlation’—which is a number between −1 and +1 denoting the strength of the relationship between two variables. Say that we wish to study how age is related to self-esteem in a sample of 20 respondents: as age increases, does self-esteem increase, decrease, or remain unchanged? If self-esteem increases, we have a positive correlation between the two variables; if self-esteem decreases, we have a negative correlation; and if it remains the same, we have a zero correlation. To calculate the value of this correlation, consider the hypothetical dataset shown in Table 14.1.
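A Pearson correlation can be computed directly from its definition: the covariance of the two variables scaled by both standard deviations. The sketch below uses a made-up age/self-esteem dataset (not the chapter's Table 14.1) in which both variables rise together, so r should come out close to +1.

```python
from math import sqrt

# Hypothetical data: self-esteem rises with age in this invented sample.
age        = [21, 25, 30, 35, 40, 45, 50, 55]
selfesteem = [3.2, 3.4, 3.9, 4.1, 4.4, 4.5, 4.9, 5.1]

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(age, selfesteem)
print(round(r, 3))  # strongly positive, near +1
```

Swapping in data where self-esteem falls with age would drive r toward −1, and unrelated data toward 0, matching the three cases described above.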

Figure 14.2. Normal distribution

After computing a bivariate correlation, researchers are often interested in knowing whether the correlation is significant (i.e., reflects a real relationship in the population) or is merely a product of chance. Answering this question requires testing the following null hypothesis:

\[H_0:\quad r = 0 \]
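One standard way to test this null hypothesis (assuming the data are roughly bivariate normal) is to convert r into a t statistic with n − 2 degrees of freedom and compare it with a critical value. The numbers below are illustrative, not drawn from the chapter's dataset.

```python
from math import sqrt

r, n = 0.45, 20                          # sample correlation, sample size
t = r * sqrt((n - 2) / (1 - r ** 2))     # t statistic with n - 2 = 18 df

t_crit = 2.101   # two-tailed critical t at alpha = .05 for 18 df (from a t table)
significant = abs(t) > t_crit
print(round(t, 2), significant)
```

Here |t| just exceeds the critical value, so H0 would be rejected at the .05 level; statistical packages report the equivalent exact p-value instead of a table lookup.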

Social Science Research: Principles, Methods and Practices (Revised edition) Copyright © 2019 by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



J Korean Med Sci. 2022 Apr 25;37(16)

A Practical Guide to Writing Quantitative and Qualitative Research Questions and Hypotheses in Scholarly Articles

Edward Barroga

1 Department of General Education, Graduate School of Nursing Science, St. Luke’s International University, Tokyo, Japan.

Glafera Janet Matanguihan

2 Department of Biological Sciences, Messiah University, Mechanicsburg, PA, USA.

The development of research questions and the subsequent hypotheses are prerequisites to defining the main research purpose and specific objectives of a study. Consequently, these objectives determine the study design and research outcome. The development of research questions is a process based on knowledge of current trends, cutting-edge studies, and technological advances in the research field. Excellent research questions are focused and require a comprehensive literature search and in-depth understanding of the problem being investigated. Initially, research questions may be written as descriptive questions which could be developed into inferential questions. These questions must be specific and concise to provide a clear foundation for developing hypotheses. Hypotheses are more formal predictions about the research outcomes. These specify the possible results that may or may not be expected regarding the relationship between groups. Thus, research questions and hypotheses clarify the main purpose and specific objectives of the study, which in turn dictate the design of the study, its direction, and outcome. Studies developed from good research questions and hypotheses will have trustworthy outcomes with wide-ranging social and health implications.

INTRODUCTION

Scientific research is usually initiated by posing evidence-based research questions which are then explicitly restated as hypotheses. 1 , 2 The hypotheses provide directions to guide the study, solutions, explanations, and expected results. 3 , 4 Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the inception of novel studies and the ethical testing of ideas. 5 , 6

It is crucial to have knowledge of both quantitative and qualitative research 2 as both types of research involve writing research questions and hypotheses. 7 However, these crucial elements of research are sometimes overlooked; when they are not overlooked, they are often framed without the forethought and meticulous attention they need. Planning and careful consideration are needed when developing quantitative or qualitative research, particularly when conceptualizing research questions and hypotheses. 4

There is a continuing need to support researchers in the creation of innovative research questions and hypotheses, as well as for journal articles that carefully review these elements. 1 When research questions and hypotheses are not carefully thought of, unethical studies and poor outcomes usually ensue. Carefully formulated research questions and hypotheses define well-founded objectives, which in turn determine the appropriate design, course, and outcome of the study. This article then aims to discuss in detail the various aspects of crafting research questions and hypotheses, with the goal of guiding researchers as they develop their own. Examples from the authors and peer-reviewed scientific articles in the healthcare field are provided to illustrate key points.

DEFINITIONS AND RELATIONSHIP OF RESEARCH QUESTIONS AND HYPOTHESES

A research question is what a study aims to answer after data analysis and interpretation. The answer is written at length in the discussion section of the paper. Thus, the research question gives a preview of the different parts and variables of the study meant to address the problem posed in the research question. 1 An excellent research question clarifies the research writing while facilitating understanding of the research topic, objective, scope, and limitations of the study. 5

On the other hand, a research hypothesis is an educated statement of an expected outcome. This statement is based on background research and current knowledge. 8 , 9 The research hypothesis makes a specific prediction about a new phenomenon 10 or a formal statement on the expected relationship between an independent variable and a dependent variable. 3 , 11 It provides a tentative answer to the research question to be tested or explored. 4

Hypotheses employ reasoning to predict a theory-based outcome. 10 These can also be developed from theories by focusing on components of theories that have not yet been observed. 10 The validity of hypotheses is often based on the testability of the prediction made in a reproducible experiment. 8

Conversely, hypotheses can also be rephrased as research questions. Several hypotheses based on existing theories and knowledge may be needed to answer a research question. Developing ethical research questions and hypotheses creates a research design that has logical relationships among variables. These relationships serve as a solid foundation for the conduct of the study. 4 , 11 Haphazardly constructed research questions can result in poorly formulated hypotheses and improper study designs, leading to unreliable results. Thus, the formulations of relevant research questions and verifiable hypotheses are crucial when beginning research. 12

CHARACTERISTICS OF GOOD RESEARCH QUESTIONS AND HYPOTHESES

Excellent research questions are specific and focused. These integrate collective data and observations to confirm or refute the subsequent hypotheses. Well-constructed hypotheses are based on previous reports and verify the research context. These are realistic, in-depth, sufficiently complex, and reproducible. More importantly, these hypotheses can be addressed and tested. 13

There are several characteristics of well-developed hypotheses. Good hypotheses are 1) empirically testable 7 , 10 , 11 , 13 ; 2) backed by preliminary evidence 9 ; 3) testable by ethical research 7 , 9 ; 4) based on original ideas 9 ; 5) supported by evidence-based logical reasoning 10 ; and 6) predictive of an outcome. 11 Good hypotheses can infer ethical and positive implications, indicating the presence of a relationship or effect relevant to the research theme. 7 , 11 These are initially developed from a general theory and branch into specific hypotheses by deductive reasoning. In the absence of a theory on which to base the hypotheses, inductive reasoning based on specific observations or findings forms more general hypotheses. 10

TYPES OF RESEARCH QUESTIONS AND HYPOTHESES

Research questions and hypotheses are developed according to the type of research, which can be broadly classified into quantitative and qualitative research. We provide a summary of the types of research questions and hypotheses under quantitative and qualitative research categories in Table 1 .

Research questions in quantitative research

In quantitative research, research questions inquire about the relationships among variables being investigated and are usually framed at the start of the study. These are precise and typically linked to the subject population, dependent and independent variables, and research design. 1 Research questions may also attempt to describe the behavior of a population in relation to one or more variables, or describe the characteristics of variables to be measured ( descriptive research questions ). 1 , 5 , 14 These questions may also aim to discover differences between groups within the context of an outcome variable ( comparative research questions ), 1 , 5 , 14 or elucidate trends and interactions among variables ( relationship research questions ). 1 , 5 We provide examples of descriptive, comparative, and relationship research questions in quantitative research in Table 2 .

Hypotheses in quantitative research

In quantitative research, hypotheses predict the expected relationships among variables. 15 Relationships among variables that can be predicted include 1) between a single dependent variable and a single independent variable ( simple hypothesis ) or 2) between two or more independent and dependent variables ( complex hypothesis ). 4 , 11 Hypotheses may also specify the expected direction to be followed and imply an intellectual commitment to a particular outcome ( directional hypothesis ). 4 On the other hand, hypotheses may not predict the exact direction and are used in the absence of a theory, or when findings contradict previous studies ( non-directional hypothesis ). 4 In addition, hypotheses can 1) define interdependency between variables ( associative hypothesis ), 4 2) propose an effect on the dependent variable from manipulation of the independent variable ( causal hypothesis ), 4 3) state the absence of a relationship between two variables ( null hypothesis ), 4 , 11 , 15 4) replace the null hypothesis when it is rejected ( alternative hypothesis ), 15 5) explain the relationship of phenomena to possibly generate a theory ( working hypothesis ), 11 6) involve quantifiable variables that can be tested statistically ( statistical hypothesis ), 11 or 7) express a relationship whose interlinks can be verified logically ( logical hypothesis ). 11 We provide examples of simple, complex, directional, non-directional, associative, causal, null, alternative, working, statistical, and logical hypotheses in quantitative research, as well as the definition of quantitative hypothesis-testing research, in Table 3 .

Research questions in qualitative research

Unlike research questions in quantitative research, research questions in qualitative research are usually continuously reviewed and reformulated. The central question and associated subquestions are stated more than the hypotheses. 15 The central question broadly explores a complex set of factors surrounding the central phenomenon, aiming to present the varied perspectives of participants. 15

There are varied goals for which qualitative research questions are developed. These questions can function in several ways, such as to 1) identify and describe existing conditions ( contextual research questions ); 2) describe a phenomenon ( descriptive research questions ); 3) assess the effectiveness of existing methods, protocols, theories, or procedures ( evaluation research questions ); 4) examine a phenomenon or analyze the reasons or relationships between subjects or phenomena ( explanatory research questions ); or 5) focus on unknown aspects of a particular topic ( exploratory research questions ). 5 In addition, some qualitative research questions provide new ideas for the development of theories and actions ( generative research questions ) or advance specific ideologies of a position ( ideological research questions ). 1 Other qualitative research questions may build on a body of existing literature and become working guidelines ( ethnographic research questions ). Research questions may also be broadly stated without specific reference to the existing literature or a typology of questions ( phenomenological research questions ), may be directed towards generating a theory of some process ( grounded theory questions ), or may address a description of the case and the emerging themes ( qualitative case study questions ). 15 We provide examples of contextual, descriptive, evaluation, explanatory, exploratory, generative, ideological, ethnographic, phenomenological, grounded theory, and qualitative case study research questions in qualitative research in Table 4 , and the definition of qualitative hypothesis-generating research in Table 5 .

Qualitative studies usually pose at least one central research question and several subquestions starting with How or What . These research questions use exploratory verbs such as explore or describe . These also focus on one central phenomenon of interest, and may mention the participants and research site. 15

Hypotheses in qualitative research

Hypotheses in qualitative research are stated in the form of a clear statement concerning the problem to be investigated. Unlike in quantitative research where hypotheses are usually developed to be tested, qualitative research can lead to both hypothesis-testing and hypothesis-generating outcomes. 2 When studies require both quantitative and qualitative research questions, this suggests an integrative process between both research methods wherein a single mixed-methods research question can be developed. 1

FRAMEWORKS FOR DEVELOPING RESEARCH QUESTIONS AND HYPOTHESES

Research questions followed by hypotheses should be developed before the start of the study. 1 , 12 , 14 It is crucial to develop feasible research questions on a topic that is interesting to both the researcher and the scientific community. This can be achieved by a meticulous review of previous and current studies to establish a novel topic. Specific areas are subsequently focused on to generate ethical research questions. The relevance of the research questions is evaluated in terms of clarity of the resulting data, specificity of the methodology, objectivity of the outcome, depth of the research, and impact of the study. 1 , 5 Research questions are clear and effective when they meet the FINER criteria: Feasible, Interesting, Novel, Ethical, and Relevant. 1 In addition to the FINER criteria, Ratan et al. described focus, complexity, novelty, feasibility, and measurability for evaluating the effectiveness of research questions. 14

The PICOT and PEO frameworks are also used when developing research questions. 1 These frameworks address the following elements. PICOT: P (population/patients/problem), I (intervention or indicator being studied), C (comparison group), O (outcome of interest), and T (timeframe of the study). PEO: P (population being studied), E (exposure to preexisting conditions), and O (outcome of interest). 1 Research questions are also considered good if they meet the “FINERMAPS” framework: Feasible, Interesting, Novel, Ethical, Relevant, Manageable, Appropriate, Potential value/publishable, and Systematic. 14

As we indicated earlier, research questions and hypotheses that are not carefully formulated result in unethical studies or poor outcomes. To illustrate this, we provide some examples of ambiguous research question and hypotheses that result in unclear and weak research objectives in quantitative research ( Table 6 ) 16 and qualitative research ( Table 7 ) 17 , and how to transform these ambiguous research question(s) and hypothesis(es) into clear and good statements.

a These statements were composed for comparison and illustrative purposes only.

b These statements are direct quotes from Higashihara and Horiuchi. 16

a This statement is a direct quote from Shimoda et al. 17

The other statements were composed for comparison and illustrative purposes only.

CONSTRUCTING RESEARCH QUESTIONS AND HYPOTHESES

To construct effective research questions and hypotheses, it is very important to 1) clarify the background and 2) identify the research problem at the outset of the research, within a specific timeframe. 9 Then, 3) review or conduct preliminary research to collect all available knowledge about the possible research questions by studying theories and previous studies. 18 Afterwards, 4) construct research questions to investigate the research problem. Identify variables to be assessed from the research questions 4 and make operational definitions of constructs from the research problem and questions. Thereafter, 5) construct specific deductive or inductive predictions in the form of hypotheses. 4 Finally, 6) state the study aims. This general flow for constructing effective research questions and hypotheses prior to conducting research is shown in Fig. 1 .

[Fig. 1 image: jkms-37-e121-g001.jpg]

Research questions are used more frequently than objectives or hypotheses in qualitative research. 3 These questions seek to discover, understand, explore, or describe experiences by asking “What” or “How.” The questions are open-ended to elicit a description rather than to relate variables or compare groups, and they are continually reviewed, reformulated, and changed during the qualitative study. 3 In quantitative research, research questions are used more frequently in survey projects, while hypotheses are used more frequently in experiments that compare variables and their relationships.

Hypotheses are constructed based on the variables identified and as an if-then statement, following the template, ‘If a specific action is taken, then a certain outcome is expected.’ At this stage, some ideas regarding expectations from the research to be conducted must be drawn. 18 Then, the variables to be manipulated (independent) and influenced (dependent) are defined. 4 Thereafter, the hypothesis is stated and refined, and reproducible data tailored to the hypothesis are identified, collected, and analyzed. 4 The hypotheses must be testable and specific, 18 and should describe the variables and their relationships, the specific group being studied, and the predicted research outcome. 18 Hypothesis construction involves a testable proposition to be deduced from theory, with independent and dependent variables to be separated and measured separately. 3 Therefore, good hypotheses must be based on good research questions constructed at the start of a study or trial. 12

In summary, research questions are constructed after establishing the background of the study. Hypotheses are then developed based on the research questions. Thus, it is crucial to have excellent research questions to generate superior hypotheses. In turn, these would determine the research objectives and the design of the study, and ultimately, the outcome of the research. 12 Algorithms for building research questions and hypotheses are shown in Fig. 2 for quantitative research and in Fig. 3 for qualitative research.

[Fig. 2 image: jkms-37-e121-g002.jpg]

EXAMPLES OF RESEARCH QUESTIONS FROM PUBLISHED ARTICLES

  • EXAMPLE 1. Descriptive research question (quantitative research)
  • - Presents research variables to be assessed (distinct phenotypes and subphenotypes)
  • “BACKGROUND: Since COVID-19 was identified, its clinical and biological heterogeneity has been recognized. Identifying COVID-19 phenotypes might help guide basic, clinical, and translational research efforts.
  • RESEARCH QUESTION: Does the clinical spectrum of patients with COVID-19 contain distinct phenotypes and subphenotypes? ” 19
  • EXAMPLE 2. Relationship research question (quantitative research)
  • - Shows interactions between dependent variable (static postural control) and independent variable (peripheral visual field loss)
  • “Background: Integration of visual, vestibular, and proprioceptive sensations contributes to postural control. People with peripheral visual field loss have serious postural instability. However, the directional specificity of postural stability and sensory reweighting caused by gradual peripheral visual field loss remain unclear.
  • Research question: What are the effects of peripheral visual field loss on static postural control ?” 20
  • EXAMPLE 3. Comparative research question (quantitative research)
  • - Clarifies the difference among groups with an outcome variable (patients enrolled in COMPERA with moderate PH or severe PH in COPD) and another group without the outcome variable (patients with idiopathic pulmonary arterial hypertension (IPAH))
  • “BACKGROUND: Pulmonary hypertension (PH) in COPD is a poorly investigated clinical condition.
  • RESEARCH QUESTION: Which factors determine the outcome of PH in COPD?
  • STUDY DESIGN AND METHODS: We analyzed the characteristics and outcome of patients enrolled in the Comparative, Prospective Registry of Newly Initiated Therapies for Pulmonary Hypertension (COMPERA) with moderate or severe PH in COPD as defined during the 6th PH World Symposium who received medical therapy for PH and compared them with patients with idiopathic pulmonary arterial hypertension (IPAH) .” 21
  • EXAMPLE 4. Exploratory research question (qualitative research)
  • - Explores areas that have not been fully investigated (perspectives of families and children who receive care in clinic-based child obesity treatment) to have a deeper understanding of the research problem
  • “Problem: Interventions for children with obesity lead to only modest improvements in BMI and long-term outcomes, and data are limited on the perspectives of families of children with obesity in clinic-based treatment. This scoping review seeks to answer the question: What is known about the perspectives of families and children who receive care in clinic-based child obesity treatment? This review aims to explore the scope of perspectives reported by families of children with obesity who have received individualized outpatient clinic-based obesity treatment.” 22
  • EXAMPLE 5. Relationship research question (quantitative research)
  • - Defines interactions between dependent variable (use of ankle strategies) and independent variable (changes in muscle tone)
  • “Background: To maintain an upright standing posture against external disturbances, the human body mainly employs two types of postural control strategies: “ankle strategy” and “hip strategy.” While it has been reported that the magnitude of the disturbance alters the use of postural control strategies, it has not been elucidated how the level of muscle tone, one of the crucial parameters of bodily function, determines the use of each strategy. We have previously confirmed using forward dynamics simulations of human musculoskeletal models that an increased muscle tone promotes the use of ankle strategies. The objective of the present study was to experimentally evaluate a hypothesis: an increased muscle tone promotes the use of ankle strategies. Research question: Do changes in the muscle tone affect the use of ankle strategies ?” 23

EXAMPLES OF HYPOTHESES IN PUBLISHED ARTICLES

  • EXAMPLE 1. Working hypothesis (quantitative research)
  • - A hypothesis that is initially accepted for further research to produce a feasible theory
  • “As fever may have benefit in shortening the duration of viral illness, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response when taken during the early stages of COVID-19 illness .” 24
  • “In conclusion, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response . The difference in perceived safety of these agents in COVID-19 illness could be related to the more potent efficacy to reduce fever with ibuprofen compared to acetaminophen. Compelling data on the benefit of fever warrant further research and review to determine when to treat or withhold ibuprofen for early stage fever for COVID-19 and other related viral illnesses .” 24
  • EXAMPLE 2. Exploratory hypothesis (qualitative research)
  • - Explores particular areas deeper to clarify subjective experience and develop a formal hypothesis potentially testable in a future quantitative approach
  • “We hypothesized that when thinking about a past experience of help-seeking, a self distancing prompt would cause increased help-seeking intentions and more favorable help-seeking outcome expectations .” 25
  • “Conclusion
  • Although a priori hypotheses were not supported, further research is warranted as results indicate the potential for using self-distancing approaches to increasing help-seeking among some people with depressive symptomatology.” 25
  • EXAMPLE 3. Hypothesis-generating research to establish a framework for hypothesis testing (qualitative research)
  • “We hypothesize that compassionate care is beneficial for patients (better outcomes), healthcare systems and payers (lower costs), and healthcare providers (lower burnout). ” 26
  • Compassionomics is the branch of knowledge and scientific study of the effects of compassionate healthcare. Our main hypotheses are that compassionate healthcare is beneficial for (1) patients, by improving clinical outcomes, (2) healthcare systems and payers, by supporting financial sustainability, and (3) HCPs, by lowering burnout and promoting resilience and well-being. The purpose of this paper is to establish a scientific framework for testing the hypotheses above . If these hypotheses are confirmed through rigorous research, compassionomics will belong in the science of evidence-based medicine, with major implications for all healthcare domains.” 26
  • EXAMPLE 4. Statistical hypothesis (quantitative research)
  • - An assumption is made about the relationship among several population characteristics (gender differences in sociodemographic and clinical characteristics of adults with ADHD). Validity is tested by statistical experiment or analysis (chi-square test, Student's t-test, and logistic regression analysis)
  • “Our research investigated gender differences in sociodemographic and clinical characteristics of adults with ADHD in a Japanese clinical sample. Due to unique Japanese cultural ideals and expectations of women's behavior that are in opposition to ADHD symptoms, we hypothesized that women with ADHD experience more difficulties and present more dysfunctions than men . We tested the following hypotheses: first, women with ADHD have more comorbidities than men with ADHD; second, women with ADHD experience more social hardships than men, such as having less full-time employment and being more likely to be divorced.” 27
  • “Statistical Analysis
  • ( text omitted ) Between-gender comparisons were made using the chi-squared test for categorical variables and Students t-test for continuous variables…( text omitted ). A logistic regression analysis was performed for employment status, marital status, and comorbidity to evaluate the independent effects of gender on these dependent variables.” 27

EXAMPLES OF HYPOTHESIS AS WRITTEN IN PUBLISHED ARTICLES IN RELATION TO OTHER PARTS

  • EXAMPLE 1. Background, hypotheses, and aims are provided
  • “Pregnant women need skilled care during pregnancy and childbirth, but that skilled care is often delayed in some countries …( text omitted ). The focused antenatal care (FANC) model of WHO recommends that nurses provide information or counseling to all pregnant women …( text omitted ). Job aids are visual support materials that provide the right kind of information using graphics and words in a simple and yet effective manner. When nurses are not highly trained or have many work details to attend to, these job aids can serve as a content reminder for the nurses and can be used for educating their patients (Jennings, Yebadokpo, Affo, & Agbogbe, 2010) ( text omitted ). Importantly, additional evidence is needed to confirm how job aids can further improve the quality of ANC counseling by health workers in maternal care …( text omitted )” 28
  • “ This has led us to hypothesize that the quality of ANC counseling would be better if supported by job aids. Consequently, a better quality of ANC counseling is expected to produce higher levels of awareness concerning the danger signs of pregnancy and a more favorable impression of the caring behavior of nurses .” 28
  • “This study aimed to examine the differences in the responses of pregnant women to a job aid-supported intervention during ANC visit in terms of 1) their understanding of the danger signs of pregnancy and 2) their impression of the caring behaviors of nurses to pregnant women in rural Tanzania.” 28
  • EXAMPLE 2. Background, hypotheses, and aims are provided
  • “We conducted a two-arm randomized controlled trial (RCT) to evaluate and compare changes in salivary cortisol and oxytocin levels of first-time pregnant women between experimental and control groups. The women in the experimental group touched and held an infant for 30 min (experimental intervention protocol), whereas those in the control group watched a DVD movie of an infant (control intervention protocol). The primary outcome was salivary cortisol level and the secondary outcome was salivary oxytocin level.” 29
  • “ We hypothesize that at 30 min after touching and holding an infant, the salivary cortisol level will significantly decrease and the salivary oxytocin level will increase in the experimental group compared with the control group .” 29
  • EXAMPLE 3. Background, aim, and hypothesis are provided
  • “In countries where the maternal mortality ratio remains high, antenatal education to increase Birth Preparedness and Complication Readiness (BPCR) is considered one of the top priorities [1]. BPCR includes birth plans during the antenatal period, such as the birthplace, birth attendant, transportation, health facility for complications, expenses, and birth materials, as well as family coordination to achieve such birth plans. In Tanzania, although increasing, only about half of all pregnant women attend an antenatal clinic more than four times [4]. Moreover, the information provided during antenatal care (ANC) is insufficient. In the resource-poor settings, antenatal group education is a potential approach because of the limited time for individual counseling at antenatal clinics.” 30
  • “This study aimed to evaluate an antenatal group education program among pregnant women and their families with respect to birth-preparedness and maternal and infant outcomes in rural villages of Tanzania.” 30
  • “ The study hypothesis was if Tanzanian pregnant women and their families received a family-oriented antenatal group education, they would (1) have a higher level of BPCR, (2) attend antenatal clinic four or more times, (3) give birth in a health facility, (4) have less complications of women at birth, and (5) have less complications and deaths of infants than those who did not receive the education .” 30

Research questions and hypotheses are crucial components to any type of research, whether quantitative or qualitative. These questions should be developed at the very beginning of the study. Excellent research questions lead to superior hypotheses, which, like a compass, set the direction of research, and can often determine the successful conduct of the study. Many research studies have floundered because the development of research questions and subsequent hypotheses was not given the thought and meticulous attention needed. The development of research questions and hypotheses is an iterative process based on extensive knowledge of the literature and insightful grasp of the knowledge gap. Focused, concise, and specific research questions provide a strong foundation for constructing hypotheses which serve as formal predictions about the research outcomes. Research questions and hypotheses are crucial elements of research that should not be overlooked. They should be carefully thought of and constructed when planning research. This avoids unethical studies and poor outcomes by defining well-founded objectives that determine the design, course, and outcome of the study.

Disclosure: The authors have no potential conflicts of interest to disclose.

Author Contributions:

  • Conceptualization: Barroga E, Matanguihan GJ.
  • Methodology: Barroga E, Matanguihan GJ.
  • Writing - original draft: Barroga E, Matanguihan GJ.
  • Writing - review & editing: Barroga E, Matanguihan GJ.

Enago Academy

Bridging the Gap: Overcome these 7 flaws in descriptive research design


Descriptive research design is a powerful tool used by scientists and researchers to gather information about a particular group or phenomenon. This type of research provides a detailed and accurate picture of the characteristics and behaviors of a particular population or subject. By observing and collecting data on a given topic, descriptive research helps researchers gain a deeper understanding of a specific issue and provides valuable insights that can inform future studies.

In this blog, we will explore the definition, characteristics, and common flaws in descriptive research design, and provide tips on how to avoid these pitfalls to produce high-quality results. Whether you are a seasoned researcher or a student just starting, understanding the fundamentals of descriptive research design is essential to conducting successful scientific studies.


What Is Descriptive Research Design?

The descriptive research design involves observing and collecting data on a given topic without attempting to infer cause-and-effect relationships. The goal of descriptive research is to provide a comprehensive and accurate picture of the population or phenomenon being studied and to describe the relationships, patterns, and trends that exist within the data.

Descriptive research methods can include surveys, observational studies , and case studies, and the data collected can be qualitative or quantitative . The findings from descriptive research provide valuable insights and inform future research, but do not establish cause-and-effect relationships.

Importance of Descriptive Research in Scientific Studies

1. Understanding of a Population or Phenomenon

Descriptive research provides a comprehensive picture of the characteristics and behaviors of a particular population or phenomenon, allowing researchers to gain a deeper understanding of the topic.

2. Baseline Information

The information gathered through descriptive research can serve as a baseline for future research and provide a foundation for further studies.

3. Informative Data

Descriptive research can provide valuable information and insights into a particular topic, which can inform future research, policy decisions, and programs.

4. Sampling Validation

Descriptive research can be used to validate sampling methods and to help researchers determine the best approach for their study.

5. Cost Effective

Descriptive research is often less expensive and less time-consuming than other research methods , making it a cost-effective way to gather information about a particular population or phenomenon.

6. Easy to Replicate

Descriptive research is straightforward to replicate, making it a reliable way to gather and compare information from multiple sources.

Key Characteristics of Descriptive Research Design

1. Purpose

The primary purpose of descriptive research is to describe the characteristics, behaviors, and attributes of a particular population or phenomenon.

2. Participants and Sampling

Descriptive research studies a particular population or sample that is representative of the larger population being studied. Furthermore, sampling methods can include convenience, stratified, or random sampling.

3. Data Collection Techniques

Descriptive research typically involves the collection of both qualitative and quantitative data through methods such as surveys, observational studies, case studies, or focus groups.

4. Data Analysis

Descriptive research data is analyzed to identify patterns, relationships, and trends within the data. Statistical techniques , such as frequency distributions and descriptive statistics, are commonly used to summarize and describe the data.

5. Focus on Description

Descriptive research is focused on describing and summarizing the characteristics of a particular population or phenomenon. It does not make causal inferences.

6. Non-Experimental

Descriptive research is non-experimental, meaning that the researcher does not manipulate variables or control conditions. The researcher simply observes and collects data on the population or phenomenon being studied.

When Can a Researcher Conduct Descriptive Research?

A researcher can conduct descriptive research in the following situations:

  • To better understand a particular population or phenomenon
  • To describe the relationships between variables
  • To describe patterns and trends
  • To validate sampling methods and determine the best approach for a study
  • To compare data from multiple sources.

Types of Descriptive Research Design

1. Survey Research

Surveys are a type of descriptive research that involves collecting data through self-administered or interviewer-administered questionnaires. Additionally, they can be administered in-person, by mail, or online, and can collect both qualitative and quantitative data.

2. Observational Research

Observational research involves observing and collecting data on a particular population or phenomenon without manipulating variables or controlling conditions. It can be conducted in naturalistic settings or controlled laboratory settings.

3. Case Study Research

Case study research is a type of descriptive research that focuses on a single individual, group, or event. It involves collecting detailed information on the subject through a variety of methods, including interviews, observations, and examination of documents.

4. Focus Group Research

Focus group research involves bringing together a small group of people to discuss a particular topic or product. Furthermore, the group is usually moderated by a researcher and the discussion is recorded for later analysis.

5. Ethnographic Research

Ethnographic research involves conducting detailed observations of a particular culture or community. It is often used to gain a deep understanding of the beliefs, behaviors, and practices of a particular group.

Advantages of Descriptive Research Design

1. Provides a Comprehensive Understanding

Descriptive research provides a comprehensive picture of the characteristics, behaviors, and attributes of a particular population or phenomenon, which can be useful in informing future research and policy decisions.

2. Non-invasive

Descriptive research is non-invasive and does not manipulate variables or control conditions, making it suitable for studies involving sensitive topics or ethical constraints.

3. Flexibility

Descriptive research allows for a wide range of data collection methods , including surveys, observational studies, case studies, and focus groups, making it a flexible and versatile research method.

4. Cost-effective

Descriptive research is often less expensive and less time-consuming than other research methods, making it a cost-effective option for many researchers.

5. Easy to Replicate

Descriptive research is easy to replicate, making it a reliable way to gather and compare information from multiple sources.

6. Informs Future Research

The insights gained from descriptive research can inform future studies, as well as policy decisions and programs.

Disadvantages of Descriptive Research Design

1. Limited Scope

Descriptive research only provides a snapshot of the current situation and cannot establish cause-and-effect relationships.

2. Dependence on Existing Data

Descriptive research relies on existing data, which may not always be comprehensive or accurate.

3. Lack of Control

Researchers have no control over the variables in descriptive research, which can limit the conclusions that can be drawn.

4. Researcher Bias

The researcher’s own biases and preconceptions can influence the interpretation of the data.

5. Lack of Generalizability

Descriptive research findings may not be applicable to other populations or situations.

6. Lack of Depth

Descriptive research provides a surface-level understanding of a phenomenon, rather than a deep understanding.

7. Time-consuming

Descriptive research often requires a large amount of data collection and analysis, which can be time-consuming and resource-intensive.

7 Ways to Avoid Common Flaws While Designing Descriptive Research


1. Clearly define the research question

A clearly defined research question is the foundation of any research study, and it is important to ensure that the question is both specific and relevant to the topic being studied.

2. Choose the appropriate research design

Choosing the appropriate research design for a study is crucial to the success of the study. Moreover, researchers should choose a design that best fits the research question and the type of data needed to answer it.

3. Select a representative sample

Selecting a representative sample is important to ensure that the findings of the study are generalizable to the population being studied. Researchers should use a sampling method that provides a random and representative sample of the population.

4. Use valid and reliable data collection methods

Using valid and reliable data collection methods is important to ensure that the data collected is accurate and can be used to answer the research question. Researchers should choose methods that are appropriate for the study and that can be administered consistently and systematically.

5. Minimize bias

Bias can significantly impact the validity and reliability of research findings.  Furthermore, it is important to minimize bias in all aspects of the study, from the selection of participants to the analysis of data.

6. Ensure adequate sample size

An adequate sample size is important to ensure that the study has sufficient statistical power and that its results can be generalized to the population being studied.

7. Use appropriate data analysis techniques

The appropriate data analysis technique depends on the type of data collected and the research question being asked. Researchers should choose techniques that are appropriate for the data and the question being asked.

Have you worked on descriptive research designs? How was your experience creating a descriptive design? What challenges did you face? Do write to us or leave a comment below and share your insights on descriptive research designs!


Unit 3. Descriptive Statistics for Psychological Research

J Toby Mordkoff and Leyre Castro

Summary. This unit briefly reviews the distinction between descriptive and inferential statistics and then discusses the ways in which both numerical and categorical data are usually summarized for psychological research.  Different measures of center and spread, and when to use them, are explained.  The shape of the data is also discussed.

Prerequisite Units

Unit 1. Introduction to Statistics for Psychological Science

Unit 2. Managing Data

Introduction

Assume that you are interested in some attribute or characteristic of a very large number of people, such as the average hours of sleep per night for all undergraduates at all universities.  Clearly, you are not going to do this by measuring the hours of sleep for every student, as that would be difficult, if not impossible.  So, instead, you will probably take a relatively small sample of students (e.g., 100 people), ask each of them how many hours of sleep they usually get, and then use these data to estimate the average for all undergraduates.

The process outlined above can be thought of as having three phases or steps: (1) collect a sample, (2) summarize the data in the sample, and (3) use the summarized data to make the estimate of the entire population.  The issues related to collecting the sample, such as how one ensures that the sample is representative of the entire population, will not be discussed here.  Likewise, the way that one uses the summary of a sample to calculate an estimate of the population will not be explained here.  This unit will focus on the second step: the way in which psychologists summarize data.

The general label for procedures that summarize data is descriptive statistics.  This can be contrasted with procedures that make estimates of population values, which are known as inferential statistics.  Thus, descriptive and inferential statistics each give different insights into the nature of the data gathered.  Descriptive statistics describe the data so that the big picture can be seen.  How?  By organizing and summarizing properties of a data set.  Calculating descriptive statistics takes unordered observations and logically organizes them in some way.  This allows us to describe the data obtained, but it does not support conclusions beyond the sample.  This is important, because part of conducting (good) research is being able to communicate your findings to other people, and descriptive statistics will allow you to do this quickly, clearly, and precisely.

To prepare you for what follows, please note two things in advance.  First, there are several different ways that we can summarize a large set of data.  Most notably, we can use numbers or we can use graphical representations.  Furthermore, when the data are numerical, we will have options for several of the summary values that we need to calculate.  This may seem confusing at first; hopefully, it soon will make sense.  Second, and related to the first, the available options for summarizing data often depend on the type of data that we have collected.  For example, numerical data, such as hours of sleep per night, are summarized differently from categorical data, such as favorite flavors of ice-cream.

The key to preventing this from becoming confusing is to keep the function of descriptive statistics in mind: we are trying to summarize a large amount of data in a way that can be communicated quickly, clearly, and precisely.  In some cases, a few numbers will do the trick; in other cases, you will need to create a plot of the data.

This unit will only discuss the ways in which a single set of values are summarized.  When you collect more than one piece of information from every participant in the sample –e.g., you not only ask them how many hours of sleep they usually get, but also ask them for their favorite flavor of ice-cream– then you can do three things using descriptive statistics: summarize the first set of values (on their own), summarize the second set of values (on their own), and summarize the relationship between the two sets of values.  This unit only covers the first two of these three.  Different ways to summarize the relationship between two sets of values will be covered in Units 7 and 8.

Summarizing Numerical Data

The most-popular way to summarize a set of numerical data –e.g., hours of sleep per night– is in terms of two or three aspects.  One always includes values for the center of the data and the spread of the data; in some cases, the shape of the data is also described.  A measure of center is a single value that attempts to describe an entire set of data by identifying the central position within that set of data.  The full, formal label for this descriptive statistic is measure of central tendency , but most people simply say “center.”  Another label for this is the “average.”

A measure of spread is also a single number, but this one indicates how widely the data are distributed around their center.  Another way of saying this is to talk about the “variability” of the data.  If all of the individual pieces of data are located close to the center, then the value of spread will be low; if the data are widely distributed, then the value of spread will be high.

What makes this a little bit complicated is that there are multiple ways to mathematically define the center and spread of a set of data.  For example, both the mean and the median (discussed in detail below) are valid measures of central tendency.  Similarly, both the variance (or standard deviation) and the inter-quartile range (also discussed below) are valid measures of spread.  This might suggest that there are at least four combinations of center and spread (i.e., two versions of center crossed with two versions of spread), but that isn’t true.  The standard measures of center and spread actually come in pairs, such that your choice with regard to one forces you to use a particular option for the other.  If you define the center as the mean, for example, then you have to use variance (or standard deviation) for spread; if you define the center as the median, then you have to use the inter-quartile range for spread.  Because of this dependency, in what follows we shall discuss the standard measures of center and spread in pairs.  When this is finished, we shall mention some of the less popular alternatives and then, finally, turn to the issue of shape.

Measures of Center and Spread Based on Moments

The mean and variance of a set of numerical values are (technically) the first and second moments of the set of data.  Although it is not used very often in psychology, the term “moment” is quite popular in physics, where the first moment is the center of mass and the second moment is rotational inertia  (these are very useful concepts when describing how hard it is to throw or spin something).  The fact that the mean and variance of a set of numbers are the first and second moments isn’t all that important; the key is that they are based on the same approach to the data, which is why they are one of the standard pairs of measures for describing a set of numerical data.

The mean is the most popular and well known measure of central tendency.  It is what most people intend when they use the word “average.”  The mean can be calculated for any set of numerical data, discrete or continuous, regardless of units or details.  The mean is equal to the sum of all values divided by the number of values.  So, if we have n values in a data set with values x_1, x_2, …, x_n, the mean is calculated using the following formula:

\begin{equation*} \bar{X} = \frac{\sum X_i}{n} \end{equation*}

Before moving forward, note two things about using the mean as the measure of center.  First, the mean is rarely one of the actual values from the original set of data.  As an extreme example: when the data are discrete (e.g., whole numbers, like the number of siblings), the mean will almost never match any of the specific values, because the mean will almost never be a whole number as well.

Second, an important property of the mean is that it includes and depends on every value in your set of data.  If any value in the data set is changed, then the mean will change.  In other words, the mean is “sensitive” to all of the data.
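As a minimal sketch (using Python's standard library; the sleep-hours values are made up for illustration), both the calculation of the mean and its sensitivity to every value can be seen directly:

```python
from statistics import mean

# Hypothetical survey responses: hours of sleep per night
sleep_hours = [7, 6, 8, 5, 9]
print(mean(sleep_hours))  # 7, i.e., (7 + 6 + 8 + 5 + 9) / 5

# Changing any single value changes the mean:
sleep_hours[0] = 12
print(mean(sleep_hours))  # 8 -- the mean "feels" every data point
```

Note that `mean` here computes exactly the formula above: the sum of all values divided by the number of values.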

Variance and Standard Deviation

When the center is defined as the mean, the measure of spread to use is the variance (or the square-root of this value, which is the standard deviation). Variance is defined as the (approximate) average of the squared deviations from the mean; note that for a sample, the sum of squared deviations is divided by n − 1 rather than n, because this provides a better estimate of the population variance.  The formula for variance is:

\begin{equation*} \text{Variance of } X = \frac{\sum (X_i - \bar{X})^2}{n - 1} \end{equation*}

For example, take the response times 2, 4, and 7 seconds, which have a mean of 4.33.  We first sum the squared deviations from the mean:

(2 − 4.33)² + (4 − 4.33)² + (7 − 4.33)² = 12.6667

and then divide by (3 − 1) to obtain a variance of 6.33.

Note that, because each sub-step of the summation involves a value that has been squared, the value of variance cannot be a negative number.  Note, also, that when all of the individual pieces of data are the same, they will all be equal to the mean, so you will be adding up numbers that are all zero, so variance will also be zero.  These both make sense, because here we are calculating a measure of how spread out the data are, which will be zero when all of the data are the same and cannot be less than this.

As mentioned above, some people prefer to express this measure of spread as the square-root of the variance, which is the standard deviation.  The main reason for doing this is that the units of variance are the square of the units of the original data, whereas the units of standard deviation are the same as the units of the original data.  Thus, for example, if you have response times of 2, 4, and 7 seconds, which have a mean of 4.33 seconds, then the variance is 6.33 seconds² (which is difficult to conceptualize), whereas the standard deviation is 2.52 seconds (which is easy to think about).
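These numbers are easy to verify in code. A quick sketch using Python's standard library, whose `variance` and `stdev` functions divide by n − 1, matching the sample-variance formula above:

```python
from statistics import mean, stdev, variance

times = [2, 4, 7]  # response times in seconds, from the example above

print(round(mean(times), 2))      # 4.33
print(round(variance(times), 2))  # 6.33  (units: seconds squared)
print(round(stdev(times), 2))     # 2.52  (units: seconds)
```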

Conceptually, you can think of the standard deviation as the typical distance of any score from the mean.  In other words, the standard deviation represents the standard amount by which individual scores deviate from the mean.  The standard deviation uses the mean of the data as a baseline or reference point, and measures variability by considering the distance between each score and the mean.

Note that similar to the mean, both the variance and the standard deviation are sensitive to every value in the set of data; if any one piece of data is changed, then not only will the mean change, but the variance and standard deviation will also be changed.

Table 3.1. Number of study hours before an exam (X, Hours), and the grade obtained in that exam (Y, Grade) for 15 participants. The two rightmost columns show the deviation scores for each X and Y score. [Table 3.1 is not reproduced here.]

Once we have the deviation scores for each participant, we square each of the deviation scores, and sum them.

(−5.66)² + (−2.66)² + (2.34)² + (0.34)² + (−1.66)² + (1.34)² + (4.34)² + (6.34)² + (−3.66)² + (−4.66)² + (2.34)² + (3.34)² + (−0.66)² + (−1.66)² + (0.34)² = 163.334

We then divide that sum by one less than the number of scores, 15 – 1 in this case:

 \ 166.334 / 14 = 11.66 \

So, 11.66 is the variance for the number of hours in our sample of participants.

In order to obtain the standard deviation, we calculate the square root of the variance:

 \sqrt {11.66 } = 3.42\

We follow the same steps to calculate the standard deviation of our participants’ grades.  First, we square each of the deviation scores (rightmost column in Table 3.1) and sum them:

(-8.46)² + (-6.46)² + (2.54)² + (-1.46)² + (-2.46)² + (-0.46)² + (8.54)² + (9.54)² + (-3.46)² + …

… (-5.46)² + (6.54)² + (5.54)² + (-2.46)² + (-3.46)² + (1.54)² = 427.734

Next, we divide that sum by one less than the number of scores, 14:

427.734 / 14 = 30.55

So, 30.55 is the variance for the grade in our sample of participants.

Again, the standard deviation is the square root of the variance:

√30.55 = 5.53

Thus, you can summarize the data in our sample by saying that the mean study time is 13.66 hours, with a standard deviation of 3.42, whereas the mean grade is 86.46, with a standard deviation of 5.53.
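Because Table 3.1 itself is not reproduced here, the raw scores below are reconstructed by adding each deviation score to the reported means (13.66 hours and 86.46 points), so treat them as illustrative.  With them, the whole computation can be verified in a few lines of Python:

```python
import statistics

# Raw scores reconstructed from the reported means plus the deviation
# scores (mean + deviation for each participant); illustrative only.
hours  = [8, 11, 16, 14, 12, 15, 18, 20, 10, 9, 16, 17, 13, 12, 14]
grades = [78, 80, 89, 85, 84, 86, 95, 96, 83, 81, 93, 92, 84, 83, 88]

print(round(statistics.variance(hours), 2))   # ≈ 11.67
print(round(statistics.stdev(hours), 2))      # ≈ 3.42
print(round(statistics.variance(grades), 2))  # ≈ 30.55
print(round(statistics.stdev(grades), 2))     # ≈ 5.53
```

The small rounding differences from the hand calculation come from carrying the full-precision mean instead of 13.66.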

Measures of Center and Spread Based on Percentiles

The second pair of measures for center and spread are based on percentile ranks and percentile values, instead of moments.  In general, the percentile rank for a given value is the percent of the data that is smaller (i.e., lower in value).  As a simple example, if the data are 2, 4, and 7, then the percentile rank for 5 is 67%, because two of the three values are smaller than 5.  Percentile ranks are usually easy to calculate.  In contrast, a percentile value (which is kind of the “opposite” of a percentile rank) is much more complicated.  For example, the percentile value for 67% when the data are 2, 4, and 7 is something between 4 and 7, because any value between 4 and 7 would be larger than two-thirds of the data.  (FYI: the percentile value in this case is 5.02.)  Fortunately, we won’t need to worry about these details when calculating the standard measures of center and spread using the percentile-based method.
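The percentile-rank definition above is easy to express in code (a minimal sketch; the function name is ours):

```python
def percentile_rank(data, value):
    """Percent of the data that is smaller than the given value."""
    smaller = sum(1 for x in data if x < value)
    return 100 * smaller / len(data)

print(round(percentile_rank([2, 4, 7], 5)))  # 67
```

Going the other way (from a percent to a percentile value) requires interpolation, which is why software packages offer several slightly different methods for it.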

The median –which is how the percentile-based method defines center– is best thought of as the middle score when the data have been arranged in order of magnitude.  To see how this can be done by hand, assume that we start with the data below:

We first re-arrange these data from smallest to largest:

The median is the middle of this new set of scores; in this case, the value (in blue) is 56.  This is the middle value because there are 5 scores lower than it and 5 scores higher than it.  Finding the median is very easy when you have an odd number of scores.

What happens when you have an even number of scores?  What if you had only 10 scores, instead of 11?  In this case, you take the middle two scores, and calculate the mean of them.  So, if we start with the following data (which are the same as above, with the last one omitted):

We again re-arrange that data from smallest to largest:

And then calculate the mean of the 5th and 6th values (tied for the middle, in blue) to get a median of 55.50.

In general, the median is the value that splits the entire set of data into two equal halves.  Because of this, the other name for the median is the 50th percentile: 50% of the data are below this value and 50% of the data are above this value.  This makes the median a reasonable alternative definition of center.
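These two rules –take the middle value when the count is odd, average the two middle values when it is even– are exactly what the standard library's statistics.median implements.  A quick sketch, using short illustrative lists drawn from the sorted values mentioned in the example:

```python
import statistics

# Odd number of scores: the single middle value
print(statistics.median([14, 35, 56, 77, 92]))  # 56

# Even number of scores: the mean of the two middle values
print(statistics.median([14, 35, 56, 77]))      # 45.5
```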

Inter-Quartile Range

The inter-quartile range (typically named using its initials, IQR) is the measure of spread that is paired with the median as the measure of center.  As the name suggests, the IQR divides the data into four sub-sets, instead of just two: the bottom quarter, the next higher quarter, the next higher quarter, and the top quarter (the same as for the median, you must start by re-arranging the data from smallest to largest).  As described above, the median is the dividing line between the middle two quarters.  The IQR is the distance between the dividing line between the bottom two quarters and the dividing line between the top two quarters.

Technically, the IQR is the distance between the 25th percentile and the 75th percentile.  You calculate the value for which 25% of the data is below this point, then you calculate the value for which 25% of the data is above this point, and then you subtract the first from the second.  Because the 75th percentile cannot be lower than the 25th percentile (and is almost always much higher), the value for IQR cannot be a negative number.

Returning to our example set of 11 values, for which the median was 56, the way that you can calculate the IQR by hand is as follows.  First, focus only on those values that are to the left of (i.e., lower than) the middle value:

Then calculate the “median” of these values.  In this case, the answer is 45, because the third box is the middle of these five boxes.  Therefore, the 25th percentile is 45.

Next, focus on the values that are to the right of (i.e., higher than) the original median:

The middle of these values, which is 77, is the 75th percentile.  Therefore, the IQR for these data is 32, because 77 – 45 = 32.  Note how, when the original set of data has an odd number of values (which made it easy to find the median), the middle value in the data set was ignored when finding the 25th and 75th percentiles.  In the above example, the number of values to be examined in each subsequent step was also odd (i.e., 5 each), so we selected the middle value of each subset to get the 25th and 75th percentiles.
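The odd-count procedure can be sketched in a few lines of Python.  Two of the original 11 values are never listed in the text, so 50 and 80 are assumed here to complete the dataset; the assumed values do not affect the quartiles:

```python
import statistics

def iqr_odd(data):
    """IQR for a dataset with an odd number of values, following the
    procedure in the text: drop the median, then take the medians of
    the lower and upper halves as the 25th and 75th percentiles."""
    data = sorted(data)
    mid = len(data) // 2
    q1 = statistics.median(data[:mid])      # 25th percentile
    q3 = statistics.median(data[mid + 1:])  # 75th percentile
    return q3 - q1

# Hypothetical 11-value dataset consistent with the worked example
# (the 50 and 80 are assumed, not from the text)
data = [14, 35, 45, 50, 55, 56, 57, 65, 77, 80, 92]
print(iqr_odd(data))  # 77 - 45 = 32
```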

If the number of values to be examined in each subsequent step had been even (e.g., if we had started with 9 values, so that 4 values would be used to get the 25th percentile), then the same averaging rule as we use for median would be used: use the average of the two values that tie for being in the middle.  For example, if these are the data (which are the first nine values from the original example after being sorted):

The median (in blue) is 55, the 25th percentile (the average of the two values in green) is 40, and the 75th percentile (the average of the two values in red) is 61.  Therefore, the IQR for these data is 61 – 40 = 21.

A similar procedure is used when you start with an even number of values, but with a few extra complications (these complications are caused by the particular method of calculating percentiles that is typically used in psychology).  The first change to the procedure for calculating IQR is that now every value is included in one of the two sub-steps for getting the 25th and 75th percentile; none are omitted.  For example, if we use the same set of 10 values from above (i.e., the original 11 values with the highest omitted), for which the median was 55.50, then here is what we would use in the first sub-step:

In this case, the 25th percentile will be calculated from an odd number of values (5).  We start in the same way as before, with the middle of these values (in green), which is 45.  Then we adjust it by moving the score 25% of the distance towards the next lower value, which is 35.  This shift is 2.50 –i.e., (45 – 35) x .25 = 2.50– so the final value for the 25th percentile is 42.50.

The same thing is done for 75th percentile.  This time we would start with:

The starting value (in red) of 65 would then be moved 25% of the distance towards the next higher value, which is 77, producing a 75th percentile of 68 –i.e., 65 + ((77 – 65) x .25) = 68.  Note how we moved the value away from the median in both cases.  If we don’t do this –if we used the same simple method as when the original set of data had an odd number of values– then we would slightly under-estimate the value of the IQR.
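The shifted-percentile rule for an even-sized dataset can be sketched as follows.  This sketch only handles the case just described, where each half contains an odd number of values; the dataset is hypothetical but consistent with the worked example (50 and 80 are assumed because they are not listed in the text):

```python
import statistics

def shifted_quartiles(data):
    """25th and 75th percentiles for an even-sized dataset, following
    the text: take the middle of each half, then shift it 25% of the
    distance toward the next value away from the median."""
    data = sorted(data)
    half = len(data) // 2
    lower, upper = data[:half], data[half:]

    low_mid = statistics.median(lower)
    below = max(x for x in lower if x < low_mid)   # next lower value
    q1 = low_mid - 0.25 * (low_mid - below)

    high_mid = statistics.median(upper)
    above = min(x for x in upper if x > high_mid)  # next higher value
    q3 = high_mid + 0.25 * (above - high_mid)
    return q1, q3

# Ten hypothetical values consistent with the worked example
data = [14, 35, 45, 50, 55, 56, 57, 65, 77, 80]
q1, q3 = shifted_quartiles(data)
print(q1, q3, q3 - q1)  # 42.5 68.0 25.5
```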

Finally, if we start with an even number of pieces of data and also have an even number for each of the sub-steps (e.g., we started with 8 values), then we again have to apply the correction.  Whether you have to shift the 25th and 75th percentiles depends on the original number of pieces of data, not the number that are used for the subsequent sub-steps.  To demonstrate this, here are the first eight values from the original set of data:

The first step to calculating the 25th percentile is to average the two values (in green) that tied for being in the middle of the lower half of the data; the answer is 40.  Then, as above, move this value 25% of the distance away from the median –i.e., move it down by 2.50, because (45 – 35) x .25 = 2.50.  The final value is 37.50.

Then do the same for the upper half of the data:

Start with the average of the two values (in red) that tied for being in the middle and then shift this value 25% of their difference away from the center.  The mean of the two values is 56.50 and after shifting the 75th percentile is 56.75.  Thus, the IQR for these eight pieces of data is 56.75 – 37.50 = 19.25.

Note the following about the median and IQR: because these are both based on percentiles, they are not always sensitive to every value in the set of data.  Look again at the original set of 11 values used in the examples.  Now imagine that the first (lowest) value was 4, instead of 14.  Would either the median or the IQR change?  The answer is No, neither would change.  Now imagine that the last (highest) value was 420, instead of 92.  Would either the median or IQR change?  Again, the answer is No.

Some of the other values can also change without altering the median and/or IQR, but not all of them.  If you changed the 56 in the original set to being 50, instead, for example, then the median would drop from 56 to 55, but the IQR would remain 32.  In contrast, if you only changed the 45 to being a 50, then the IQR would drop from 32 to 27, but the median would remain 56.

The one thing that is highly consistent is how you can decrease the lowest value and/or increase the highest value without changing either the median or IQR (as long as you start with at least 5 pieces of data).  This is an important property of percentiles-based methods: they are relatively insensitive to the most extreme values.  This is quite different from moments-based methods; the mean and variance of a set of data are both sensitive to every value.

Other Measures of Center and Spread

Although a vast majority of psychologists use either the mean and variance (as a pair) or the median and IQR (as a pair) as their measures of center and spread, occasionally you might come across a few other options.

The mode is a (rarely-used) way of defining the center of a set of data.  The mode is simply the value that appears the most often in a set of data.  For example, if your data are 2, 3, 3, 4, 5, and 9, then the mode is 3 because there are two 3s in the data and no other value appears more than once.  When you think about other sets of example data, you will probably see why the mode is not very popular.  First, many sets of data do not have a meaningful mode.  For the set of 2, 4, and 7, all three different values appear once each, so no value is more frequent than any other value.  When the data are continuous and measured precisely (e.g., response time in milliseconds), then this problem will happen quite often.  Now consider the set of 2, 3, 3, 4, 5, 5, 7, and 9; these data have two modes: 3 and 5.  This also happens quite often, especially when the data are discrete, such as when they must all be whole numbers.
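Python's statistics module has both a mode function and a multimode function, which makes the two problems above easy to see (a short sketch using the sets from the text):

```python
import statistics

# A clear single mode: 3 appears twice, everything else once
print(statistics.mode([2, 3, 3, 4, 5, 9]))             # 3

# Two modes: 3 and 5 both appear twice
print(statistics.multimode([2, 3, 3, 4, 5, 5, 7, 9]))  # [3, 5]

# No meaningful mode: every value appears once, so all of them "tie"
print(statistics.multimode([2, 4, 7]))                 # [2, 4, 7]
```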

But the greatest problem with using the mode as the measure of center is that it is often at one of the extremes, instead of being anywhere near the middle.  Here is a favorite example (even if it is not from psychology): the amount of federal income tax paid.  The most-frequent value for this –i.e., the mode of federal income tax paid– is zero.  This also happens to be the same as the lowest value.  In contrast, in 2021, for example, the mean amount of federal income tax paid was a little bit over $10,000.

Another descriptive statistic that you might come across is the range of the data.  Sometimes this is given as the lowest and highest values –e.g., “the participant ages ranged from 18 to 24 years”– which provides some information about center and spread simultaneously.  Other times the range is more specifically intended as only a measure of spread, so the difference between the highest and lowest values is given –e.g., “the average age was 21 years with a range of 6 years.”  There is nothing inherently wrong with providing the range, but it is probably best used as a supplement to one of the pairs of measures for center and spread.  This is true because range (in either format) often fails to provide sufficient detail.  For example, the set of 18, 18, 18, 18, and 24 and the set of 18, 24, 24, 24, and 24 both range from 18 to 24 (or have a range of 6), even though the data sets are clearly quite different.

Choosing the Measures of Center and Spread

When it comes to deciding which measures to use for center and spread when describing a set of numerical data –which is almost always a choice between mean and variance (or standard deviation) or median and IQR– the first thing to keep in mind is that this is not a question of “which is better?” but of which is more appropriate for the situation.  That is, the mean and the median are not just alternative ways of calculating a value for the center of a set of data; they use different definitions of what center means.

So how should you make this decision?  One factor that you should consider focuses on a key difference between moments and percentiles that was mentioned above: how the mean and variance of a set of data both depend on every value, whereas the median and IQR are often unaffected by the specific values at the upper and lower extremes.  Therefore, if you believe that every value in the set of data is equally important and equally representative of whatever is being studied, then you should probably use the mean and variance for your descriptive statistics. In contrast, if you believe that some extreme values might be outliers (e.g., the participant wasn’t taking the study very seriously or was making random fast guesses), then you might want to use the median and IQR instead.

Another related factor to consider is the shape of the distribution of values in the set of data.  If the values are spread around the center in a roughly symmetrical manner, then the mean and the median will be very similar, but if there are more extreme values in one tail of the distribution (e.g., there are more extreme values above the middle than below), this will pull the mean away from the median, and the latter might better match what you think of as the center.

Finally, if you are calculating descriptive statistics as part of a process that will later involve making inferences about the population from which the sample was taken, you might want to consider the type of statistics that you will be using later.  Many inferential statistics (including t -tests, ANOVA, and the standard form of the correlation coefficient) are based on moments so, if you plan to use these later, it would probably be more appropriate to summarize the data in terms of mean and variance (or standard deviation).  Other statistics (including sign tests and alternative forms of the correlation coefficient) are based on percentiles, so if you plan to use these instead, then the median and IQR might be more appropriate for the descriptive statistics.

Hybrid Methods

Although relatively rare, there is one alternative to making a firm decision between moments (i.e., mean and variance) and percentiles (i.e., median and IQR) –namely, hybrid methods.  One example of this is as follows.  First, sort the data from smallest to largest (in the same manner as when using percentiles).  Then remove a certain number of values from the beginning and end of the list.  The most popular version of this is to remove the lowest 2.5% and the highest 2.5% of the data; for example, if you started with 200 pieces of data, remove the first 5 and the last 5, keeping the middle 190.  Then switch methods and calculate the mean and variance of the retained data.  This method is trying to have the best of both worlds: it is avoiding outliers by removing the extreme values, but it is remaining sensitive to all the data that are being retained.  When this method is used, the correct labels for the final two values are the “trimmed mean” and “trimmed variance.”
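The trimming procedure is simple enough to sketch directly (the function and the example data are ours; note how a single extreme value drags the ordinary mean but not the trimmed mean):

```python
import statistics

def trimmed_mean(data, proportion=0.025):
    """Drop the lowest and highest `proportion` of the sorted data,
    then take the ordinary mean of what remains."""
    data = sorted(data)
    k = int(len(data) * proportion)  # how many to drop from each end
    trimmed = data[k:len(data) - k] if k else data
    return statistics.mean(trimmed)

# 39 well-behaved values plus one extreme outlier
data = list(range(1, 40)) + [1000]
print(statistics.mean(data))   # 44.5 - pulled far upward by the outlier
print(trimmed_mean(data))      # 20.5 - lowest and highest values removed
```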

Measures of Shape for Numerical Data

As the name suggests, the shape of a set of data is best thought about in terms of how the data would look if you made some sort of figure or plot of the values.  The most popular way to make a plot of a single set of numerical values starts by putting all of the data into something that is called a frequency table .  In brief, a frequency table is a list of all possible values, along with how many times each value occurs in the set of data.  This is easy to create when there are not very many different values (e.g., number of siblings); it becomes more complicated when almost every value in the set of data is unique (e.g., response time in milliseconds).

The key to resolving the problem of having too many unique values is to “bin” the data.  To bin a set of data, you choose a set of equally-spaced cut-offs, which will determine the borders of adjacent bins.  For example, if you are working with response times which happen to range from about 300 to 600 milliseconds (with every specific value being unique), you might decide to use bins that are 50 milliseconds wide, such that all values from 301 to 350 go in the first bin, all values from 351 to 400 go in the second bin, etc.  Most spreadsheet-based software packages (e.g., Excel) have built-in procedures to do this for you.
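The binning rule described above (301 to 350 in the first bin, 351 to 400 in the second, and so on) can be sketched with a Counter; the helper function and the response times are ours, for illustration:

```python
from collections import Counter

def bin_label(x, width=50):
    """Map a value to a bin label such as '301-350' (values 301-350
    fall in one bin, 351-400 in the next, and so on)."""
    lo = ((x - 1) // width) * width + 1
    return f"{lo}-{lo + width - 1}"

# Hypothetical response times in milliseconds
times = [312, 348, 360, 399, 401, 455, 530]
table = Counter(bin_label(t) for t in times)
print(table)  # Counter({'301-350': 2, '351-400': 2, '401-450': 1, ...})
```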

As an illustration of this process, let’s go back to the set of 11 values we have used in previous examples:

Based on the total number of values and their range, we decide to use bins that are 20 units wide.  Here are the same data in a frequency table:

Once you have a list of values or bins and the number of pieces of data in each, you can make a frequency histogram of the data, as shown in Figure 3.1:

Figure 3.1. Histogram with 5 bars.

Based on this histogram, we can start to make descriptive statements about the shape of the data.  In general, these will concern two aspects, known as skewness and kurtosis , as we shall see next.

Skewness refers to a lack of symmetry.  If the left and right sides of the plot are mirror images of each other, then the distribution has no skew, because it is symmetrical; this is the case of the normal distribution (see Figure 3.2).  This clearly is not true for the example in Figure 3.1.  If the distribution has a longer tail on the left side, as is true here, then the data are said to have negative skew.  If the distribution has a longer “tail” on the right, then the distribution is said to have positive skew.  Note that you need to focus on the skinny part of each end of the plot.  The example in Figure 3.1 might appear to be heavier on the right, but skew is determined by the length of the skinny tails, which is clearly much longer on the left.  As a reference, Figure 3.2 shows you a normal distribution, perfectly symmetrical, so its skewness is zero; to the left and to the right, you can see two skewed distributions, positive and negative.  Most of the data points in the distribution with a positive skew have low values, and it has a long tail on its right side.  The opposite is true for the distribution with negative skew: most of its data points have high values, and it has a long tail on its left side.

Figure 3.2. Distributions with different skewness.

The other aspect of shape, kurtosis, is a bit more complicated.  In general, kurtosis refers to how sharply the data are peaked, and is established in reference to a baseline or standard shape, the normal distribution, which has a kurtosis of zero.  When we have a nearly flat distribution, for example when every value occurs equally often, the kurtosis is negative.  When the distribution is very pointy, the kurtosis is positive.

If the shape of your data looks like a bell curve, then it’s said to be mesokurtic (“meso” means middle or intermediate in Greek).  If the shape of your data is flatter than this, then it’s said to be platykurtic (“platy” means flat in Greek).  If your shape is more pointed than this, then your data are leptokurtic (“lepto” means thin, narrow, or pointed in Greek).  Examples of these shapes can be seen in Figure 3.3.

Figure 3.3. Distributions with different levels of kurtosis.

Although both skew and kurtosis can vary a lot, these two attributes of shape are not completely independent.  That is, it is impossible for a perfectly flat distribution to have any skew; it is also impossible for a highly-skewed distribution to have zero kurtosis.  A large proportion of the data that is collected by psychologists is approximately normal, but with a long right tail.  In this situation, a good verbal label for the overall shape could be positively-skewed normal, even if that seems a bit contradictory, because the true normal distribution is actually symmetrical (see Figures 3.2 and 3.3).  The goal is to summarize the shape in a way that is easy to understand while being as accurate as possible.  You can always show a picture of your distribution to your audience.  A simple summary of the shape of the histogram in Figure 3.1 could be: roughly normal, but with a lot of negative skew; this tells your audience that the data have a decent-sized peak in the middle, but the lower tail is a lot longer than the upper tail.

Numerical Values for Skew and Kurtosis

In some rare situations, you might want to be even more precise about the shape of a set of data.  Assuming that you used the mean and variance as your measures of center and spread, you can use some (complicated) formulae to calculate specific numerical values for skew and kurtosis.  These are the third and fourth moments of the distribution (which is why they can only be used with the mean and variance, because those are the first and second moments of the data).  The details of these measures are beyond this course but, to give you an idea, values that depart from zero tell you that the shape is different from the normal distribution.  A value of skew that is less than –1 or greater than +1 implies that the shape is notably skewed, whereas a value of kurtosis that is more than 1 unit away from zero implies that the data are not mesokurtic.
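For reference, one common (population-moment) version of these formulae can be sketched as follows.  Different textbooks and software packages apply different small-sample corrections, so treat this as illustrative rather than the course's official formula:

```python
import statistics

def skewness(data):
    """Third standardized moment: the average cubed z-score.
    Zero for perfectly symmetrical data."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)  # population standard deviation
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

def kurtosis(data):
    """Fourth standardized moment minus 3, so that the normal
    distribution scores zero ("excess kurtosis")."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** 4 for x in data) / len(data) - 3

print(skewness([1, 2, 3]))         # ~0: symmetrical
print(skewness([1, 1, 1, 1, 10]))  # positive: long right tail
print(kurtosis([1, 2, 3]))         # negative: flatter than normal
```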

Summarizing Categorical Data

By definition, you cannot summarize a set of categorical data (e.g., favorite colors) in terms of a numerical mean and/or a numerical spread.  It also does not make much sense to talk about shape, because this would depend on the order in which you placed the options on the X-axis of the plot.  Therefore, in this situation, we usually make a frequency table (with the options in any order that we wish).  You can also make a frequency histogram, but be careful not to read anything important into the apparent shape, because changing the order of the options would completely alter it.

An issue worth mentioning here is something that is similar to the process of binning.  Assume, for example, that you have taken a sample of 100 undergraduates, asking each for their favorite genre of music.  Assume that a majority of the respondents chose either pop (24), hip-hop (27), rock (25), or classical (16), but a few chose techno (3), trance (2), or country (3).  In this situation, you might want to combine all of the rare responses into one category with the label Other.  The reason for doing this is that it is difficult to come to any clear conclusions when something is rare.  As a general rule, if a category contains fewer than 5% of the observations, then it should probably be combined with one or more other options.  An example frequency table for such data is this:

Genre       Frequency
Hip-hop            27
Rock               25
Pop                24
Classical          16
Other               8

Finally, to be technically accurate, it should be mentioned that there are some ways to quantify whether each of the options is being selected the same percent of the time, including the Chi-square (pronounced “kai-squared”) test and relative entropy (which comes from physics), but these are rarely used.  In general, most researchers just make a table and/or a histogram to show the distribution of the categorical values.
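The combining rule described above can be sketched with a Counter, using the genre counts from the example (the 5% threshold is the one stated in the text; the helper logic is ours):

```python
from collections import Counter

# Favorite music genre for 100 respondents (counts from the example)
counts = Counter({"pop": 24, "hip-hop": 27, "rock": 25, "classical": 16,
                  "techno": 3, "trance": 2, "country": 3})
n = sum(counts.values())

# Fold any category holding fewer than 5% of observations into "Other"
combined = Counter()
for genre, count in counts.items():
    if count / n < 0.05:
        combined["Other"] += count
    else:
        combined[genre] = count

print(combined)
# Counter({'hip-hop': 27, 'rock': 25, 'pop': 24, 'classical': 16, 'Other': 8})
```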

Glossary

Sample: A set of individuals selected from a population, typically intended to represent the population in a research study.

Discrete variable: A variable that consists of separate, indivisible categories. No values can exist between two neighboring categories.

Population: The entire set of individuals of interest for a given research question.

Outlier: An individual value in a dataset that is substantially different (larger or smaller) than the other values in the dataset.

Tails: The end sections of a data distribution where the scores taper off.

Inferential statistics: Statistical analyses and techniques that are used to make inferences beyond what is observed in a given sample, and make decisions about what the data mean.

Data Analysis in the Psychological Sciences: A Practical, Applied, Multimedia Approach Copyright © 2023 by J Toby Mordkoff and Leyre Castro is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Review Article | Published: 23 April 2024

Embracing data science in catalysis research

Manu Suvarna (ORCID: 0000-0003-0927-0579) & Javier Pérez-Ramírez (ORCID: 0000-0002-5805-7355)

Nature Catalysis (2024)


Subjects: Biocatalysis, Cheminformatics, Computational science, Heterogeneous catalysis, Homogeneous catalysis

Accelerating catalyst discovery and development is of paramount importance in addressing the global energy, sustainability and healthcare demands. The past decade has witnessed significant momentum in harnessing data science concepts in catalysis research to aid the aforementioned cause. Here we comprehensively review how catalysis practitioners have leveraged data-driven strategies to solve complex challenges across heterogeneous, homogeneous and enzymatic catalysis. We delineate all studies into deductive or inductive modes, and statistically infer the prevalence of catalytic tasks, model reactions, data representations and choice of algorithms. We highlight frontiers in the field and knowledge transfer opportunities among the catalysis subdisciplines. Our critical assessment reveals a glaring gap in data science exploration in experimental catalysis, which we bridge by elaborating on four pillars of data science, namely descriptive, predictive, causal and prescriptive analytics. We advocate their adoption into routine experimental workflows and underscore the importance of data standardization to spur future research in digital catalysis.





Google Scholar  

RDKit; https://www.rdkit.org/

Chanussot, L. et al. Open catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 11 , 6059–6072 (2021). The most extensive database consisting of close to 1.3 million density DFT relaxations across a wide swath of materials, surfaces and adsorbates (nitrogen, carbon and oxygen chemistries) for application in heterogeneous catalysis .

Kearnes, S. M. et al. The open reaction database. J. Am. Chem. Soc. 143 , 18820–18826 (2021).

Yano, J. et al. The case for data science in experimental chemistry: examples and recommendations. Nat. Rev. Chem. 6 , 357–370 (2022).

Schlexer Lamoureux, P. et al. Machine learning for computational heterogeneous catalysis. ChemCatChem 11 , 3581–3601 (2019).

Medford, A. J., Kunz, M. R., Ewing, S. M., Borders, T. & Fushimi, R. Extracting knowledge from data through catalysis informatics. ACS Catal. 8 , 7403–7429 (2018).

Maldonado, A. G. & Rothenberg, G. Predictive modeling in homogeneous catalysis: a tutorial. Chem. Soc. Rev. 39 , 1891–1902 (2010).

Mazurenko, S., Prokop, Z. & Damborsky, J. Machine learning in enzyme engineering. ACS Catal. 10 , 1210–1223 (2020).

Suvarna, M. & Pérez-Ramírez, J. Dataset: Embracing Data Science in Catalysis Research (Zenodo, 2024); https://doi.org/10.5281/zenodo.10640876

Zahrt, A. F. et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 363 , eaau5631 (2019). The study models multiple conformations of more than 800 prospective catalysts for the coupling reaction of imines and thiols, and trained machine learning algorithms on a subset of experimental results, to achieve highly accurate predictions of enantioselectivities .

Nguyen, T. N. et al. High-throughput experimentation and catalyst informatics for oxidative coupling of methane. ACS Catal. 10 , 921–932 (2020).

Tran, K. & Ulissi, Z. W. Active learning across intermetallics to guide discovery of electrocatalysts for CO 2 reduction and H 2 evolution. Nat. Catal. 1 , 696–703 (2018). A fully automated screening method developed by integrating machine learning and optimization algorithms to guide DFT calculations, for in silico prediction of electrocatalyst performance for CO 2 reduction and H 2 evolution .

Wang, G. et al. Accelerated discovery of multi-elemental reverse water–gas shift catalysts using extrapolative machine learning approach. Nat. Commun. 14 , 5861 (2023).

Amar, Y., Schweidtmann, A. M., Deutsch, P., Cao, L. & Lapkin, A. Machine learning and molecular descriptors enable rational solvent selection in asymmetric catalysis. Chem. Sci. 10 , 6697–6706 (2019).

Rinehart, N. I. et al. A machine-learning tool to predict substrate-adaptive conditions for Pd-catalyzed C-N couplings. Science 381 , 965–972 (2023).

Schweidtmann, A. M. et al. Machine learning meets continuous flow chemistry: automated optimization towards the pareto front of multiple objectives. Chem. Eng. J. 352 , 277–282 (2018).

O’Connor, N. J., Jonayat, A. S. M., Janik, M. J. & Senftle, T. P. Interaction trends between single metal atoms and oxide supports identified with density functional theory and statistical learning. Nat. Catal. 1 , 531–539 (2018).

Foppa, L. et al. Materials genes of heterogeneous catalysis from clean experiments and artificial intelligence. MRS Bull. 46 , 1016–1026 (2021).

Zhao, S. et al. Enantiodivergent Pd-catalyzed C-C bond formation enabled through ligand parameterization. Science 362 , 670–674 (2018).

Timoshenko, J., Lu, D., Lin, Y. & Frenkel, A. I. Supervised machine-learning-based determination of three-dimensional structure of metallic nanoparticles. J. Phys. Chem. Lett. 8 , 5091–5098 (2017). Application of deep learning to solve metal catalyst from XANES, broadly applicable to the determination of nanoparticle structures in operando studies and generalizable to other nanoscale systems .

Zheng, C. et al. Automated generation and ensemble-learned matching of X-ray absorption spectra. NPJ Comput. Mater. 4 , 12 (2018).

Mitchell, S. et al. Automated image analysis for single-atom detection in catalytic materials by transmission electron microscopy. J. Am. Chem. Soc. 144 , 8018–8029 (2022).

Büchler, J. et al. Algorithm-aided engineering of aliphatic halogenase WelO5* for the asymmetric late-stage functionalization of soraphens. Nat. Commun. 13 , 371 (2022).

Wulf, C. et al. A unified research data infrastructure for catalysis research - challenges and concepts. ChemCatChem 13 , 3223–3236 (2021).

Mendes, P. S. F., Siradze, S., Pirro, L. & Thybaut, J. W. Open data in catalysis: from today’s big picture to the future of small data. ChemCatChem 13 , 836–850 (2021).

Marshall, C. P., Schumann, J. & Trunschke, A. Achieving digital catalysis: strategies for data acquisition, storage and use. Angew. Chem. Int. Ed. 62 , e202302971 (2023).

Zavyalova, U., Holena, M., Schlögl, R. & Baerns, M. Statistical analysis of past catalytic data on oxidative methane coupling for new insights into the composition of high-performance catalysts. ChemCatChem 3 , 1935–1947 (2011).

Odabasi, C., Gunay, M. E. & Yildrim, R. Knowledge extraction for water gas shift reaction over noble metal catalysts from publications in the literature between 2002 and 2012. Int. J. Hydrog. Energy 39 , 5733–5746 (2014).

Suvarna, M., Araújo, T. P. & Pérez-Ramírez, J. A generalized machine learning framework to predict the space-time yield of methanol from thermocatalytic CO 2 hydrogenation. Appl. Catal. B Environ. 315 , 121530 (2022).

Mamun, O., Winther, K. T., Boes, J. R. & Bligaard, T. High-throughput calculations of catalytic properties of bimetallic alloy surfaces. Sci. Data 6 , 76 (2019).

Jinnouchi, R. & Asahi, R. Predicting catalytic activity of nanoparticles by a DFT-aided machine-learning algorithm. J. Phys. Chem. Lett. 8 , 4279–4283 (2017).

Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28 , 235–242 (2000).

Mitchell, A. L. et al. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res. 47 , D351–D360 (2019).

UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47 , D506–D515 (2019).

Schomburg, I., Chang, A. & Schomburg, D. BRENDA, enzyme data and metabolic information. Nucleic Acids Res. 30 , 47–49 (2002).

Nagano, N. EzCatDB: the enzyme catalytic-mechanism database. Nucleic Acids Res. 33 , D407–D412 (2005).

Finnigan, W. et al. RetroBioCat as a computer-aided synthesis planning tool for biocatalytic reactions and cascades. Nat. Catal. 4 , 98–104 (2021).

Winther, K. T. et al. Catalysis-Hub.org, an open electronic structure database for surface reactions. Sci. Data 6 , 75 (2019).

Álvarez-Moreno, M. et al. Managing the computational chemistry big data problem: the ioChem-BD platform. J. Chem. Inf. Model. 55 , 95–103 (2015).

Gensch, T. et al. A comprehensive discovery platform for organophosphorus ligands for catalysis. J. Am. Chem. Soc. 144 , 1205–1217 (2022).

Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35 , 1798–1828 (2013).

Schmidt, J., Marques, M. R. G., Botti, S. & Marques, M. A. L. Recent advances and applications of machine learning in solid-state materials science. NPJ Comput. Mater. 5 , 83 (2019).

Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361 , 360–365 (2018).

Mitchell, J. B. O. Machine learning methods in chemoinformatics. WIREs Comput. Mol. Sci. 4 , 468–481 (2014).

Wigh, D. S., Goodman, J. M. & Lapkin, A. A. A review of molecular representation in the age of machine learning. WIREs Comput. Mol. Sci. 12 , e1603 (2022).

Krenn, M., Häse, F., Nigam, A., Friederich, P. & Aspuru-Guzik, A. Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. Mach. Learn. Sci. Technol. 1 , 045024 (2020).

Kononova, O. et al. Text-mined dataset of inorganic materials synthesis recipes. Sci. Data 6 , 203 (2019).

Olivetti, E. A. et al. Data-driven materials research enabled by natural language processing and information extraction. Appl. Phys. Rev. 7 , 041317 (2020).

Kim, E. et al. Materials synthesis insights from scientific literature via text extraction and machine learning. Chem. Mater. 29 , 9436–9444 (2017).

Jensen, Z. et al. A machine learning approach to zeolite synthesis enabled by automatic literature data extraction. ACS Cent. Sci. 5 , 892–899 (2019).

Luo, Y. et al. MOF synthesis prediction enabled by automatic data mining and machine learning. Angew. Chem. Int. Ed. 61 , e202200242 (2022).

Zheng, Z., Zhang, O., Borgs, C., Chayes, J. T. & Yaghi, O. M. ChatGPT chemistry assistant for text mining and the prediction of MOF synthesis. J. Am. Chem. Soc. 145 , 18048–18062 (2023).

Suvarna, M., Vaucher, A. C., Mitchell, S., Laino, T. & Pérez-Ramírez, J. Language models and protocol standardization guidelines for accelerating synthesis planning in heterogeneous catalysis. Nat. Commun. 14 , 7964 (2023).

Lai, N. S. et al. Artificial intelligence (AI) workflow for catalyst design and optimization. Ind. Eng. Chem. Res. 62 , 17835–17848 (2023).

Probst, D. et al. Biocatalysed synthesis planning using data-driven learning. Nat. Commun. 13 , 964 (2022).

Moon, J. et al. Active learning guides discovery of a champion four-metal perovskite oxide for oxygen evolution electrocatalysis. Nat. Mater. 23 , 108–115 (2024).

Zhong, M. et al. Accelerated discovery of CO 2 electrocatalysts using active machine learning. Nature 581 , 178–183 (2020). Discovery of Cu-Al electrocatalysts, though DFT aided machine learning, to efficiently reduce CO 2 to ethylene with a Faradaic efficiency of 80% .

Torres, J. A. G. et al. A multi-objective active learning platform and web app for reaction optimization. J. Am. Chem. Soc. 144 , 19999–20007 (2022).

Greenhalgh, J. C., Fahlberg, S. A., Pfleger, B. F. & Romero, P. A. Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production. Nat. Commun. 12 , 5825 (2021).

Tallorin, L. et al. Discovering de novo peptide substrates for enzymes using machine learning. Nat. Commun. 9 , 5253 (2018).

Schwaller, P. et al. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 5 , 1572–1583 (2019).

Anstine, D. M. & Isayev, O. Generative models as an emerging paradigm in the chemical sciences. J. Am. Chem. Soc. 145 , 8736–8750 (2023).

Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4 , 268–276 (2018). A method to convert discrete representations of molecules into multidimensional continuous representations for generating compounds in silico .

Repecka, D. et al. Expanding functional protein sequence spaces using generative adversarial networks. Nat. Mach. Intell. 3 , 324–333 (2021).

Hawkins-Hooker, A. et al. Generating functional protein variants with variational autoencoders. PLoS Comput. Biol. 17 , e1008736 (2021).

Johnson, S. R. et al. Computational scoring and experimental evaluation of enzymes generated by neural networks. Preprint at https://www.biorxiv.org/content/10.1101/2023.03.04.531015v1 (2023).

Schilter, O., Vaucher, A., Schwaller, P. & Laino, T. Designing catalysts with deep generative models and computational data. A case study for Suzuki cross coupling reactions. Digit. Discov. 2 , 728–735 (2023).

Kreutter, D., Schwaller, P. & Reymond, J.-L. Predicting enzymatic reactions with a molecular transformer. Chem. Sci. 12 , 8648–8659 (2021).

Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4 , eaap7885 (2018).

Zhou, Z., Li, X. & Zare, R. N. Optimizing chemical reactions with deep reinforcement learning. ACS Cent. Sci. 3 , 1337–1344 (2017). A fully automated deep reinforcement learning to optimize chemical reactions where the model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome .

Lan, T. & An, Q. Discovering catalytic reaction networks using deep reinforcement learning from first-principles. J. Am. Chem. Soc. 143 , 16804–16812 (2021).

Song, Z. et al. Adaptive design of alloys for CO 2 activation and methanation via reinforcement learning Monte Carlo tree search algorithm. J. Phys. Chem. Lett. 14 , 3594–3601 (2023).

Suvarna, M., Preikschas, P. & Pérez-Ramírez, J. Identifying descriptors for promoted rhodium-based catalysts for higher alcohol synthesis via machine learning. ACS Catal. 12 , 15373–15385 (2022).

Smith, A., Keane, A., Dumesic, J. A., Huber, G. W. & Zavala, V. M. A machine learning framework for the analysis and prediction of catalytic activity from experimental data. Appl. Catal. B Environ. 263 , 118257 (2020).

Vellayappan, K. et al. Impacts of catalyst and process parameters on Ni-catalyzed methane dry reforming via interpretable machine learning. Appl. Catal. B Environ. 330 , 122593 (2023).

Roh, J. et al. Interpretable machine learning framework for catalyst performance prediction and validation with dry reforming of methane. Appl. Catal. B Environ. 343 , 123454 (2024).

McCullough, K., Williams, T., Mingle, K., Jamshidi, P. & Lauterbach, J. High-throughput experimentation meets artificial intelligence: a new pathway to catalyst discovery. Phys. Chem. Chem. Phys. 22 , 11174–11196 (2020).

Suzuki, K. et al. Statistical analysis and discovery of heterogeneous catalysts based on machine learning from diverse published data. ChemCatChem 11 , 4537–4547 (2019).

Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559 , 547–555 (2018).

Oviedo, F., Ferres, J. L., Buonassisi, T. & Butler, K. T. Interpretable and explainable machine learning for materials science and chemistry. Acc. Mater. Res. 3 , 597–607 (2022).

Esterhuizen, J. A., Goldsmith, B. R. & Linic, S. Interpretable machine learning for knowledge generation in heterogeneous catalysis. Nat. Catal. 5 , 175–184 (2022).

Wu, K. & Doyle, A. G. Parameterization of phosphine ligands demonstrates enhancement of nickel catalysis via remote steric effects. Nat. Chem. 9 , 779–784 (2017).

Weng, B. et al. Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts. Nat. Commun. 11 , 3513 (2020).

Ouyang, R., Curtarolo, S., Ahmetcik, E., Scheffler, M. & Ghiringhelli, L. M. SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates. Phys. Rev. Mater. 2 , 083802 (2018).

Foppa, L. et al. Data-centric heterogeneous catalysis: identifying rules and materials genes of alkane selective oxidation. J. Am. Chem. Soc. 145 , 3427–3442 (2023).

Li, Z., Ma, X. & Xin, H. Feature engineering of machine-learning chemisorption models for catalyst design. Catal. Today 280 , 232–238 (2017).

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 , 56–67 (2020).

Timoshenko, J. et al. Linking the evolution of catalytic properties and structural changes in copper-zinc nanocatalysts using operando EXAFS and neural-networks. Chem. Sci. 11 , 3727–3736 (2020).

Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3 , 160018 (2016).

Scheffler, M. et al. FAIR data enabling new horizons for materials research. Nature 604 , 635–642 (2022).

Abolhasani, M. & Kumacheva, E. The rise of self-driving labs in chemical and materials sciences. Nat. Synth. 2 , 483–492 (2023). A review of self-driving labs through the integration of machine learning, lab automation and robotics to accelerate digital data curation and enable data-driven discoveries in chemical sciences .

MacLeod, B. P. et al. Self-driving laboratory for accelerated discovery of thin-film materials. Sci. Adv. 6 , eaaz8867 (2020).

Download references

Acknowledgements

This study was created as part of NCCR Catalysis (grant no. 180544), a National Centre of Competence in Research funded by the Swiss National Science Foundation. We thank C. Ko, M. E. Usteri, T. Zou and P. Preikschas for fruitful discussions on the manuscript and help with illustrations.

Author information

Authors and Affiliations

Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland

Manu Suvarna & Javier Pérez-Ramírez

Contributions

M.S. and J.P.-R. conceived the project. M.S. led the data collection and analysis efforts, and wrote the manuscript. J.P.-R. supervised the project, wrote the manuscript, and managed resources and funding. Both authors provided input to the manuscript and approved the final version.

Corresponding author

Correspondence to Javier Pérez-Ramírez .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Notes 1 and 2, Tables 1–3 and Fig. 1.

Source data

  • Source Data Fig. 1a: Raw data for creating the timeline plot of research trends in data-driven catalysis.
  • Source Data Fig. 1c: Raw data for creating the alluvial plots linking catalysis subdisciplines with different tasks and modes.
  • Source Data Fig. 2a: Raw data for the network plot mapping relations of deductive tasks based on catalysis type.
  • Source Data Fig. 2b: Raw data for the network plot mapping relations of deductive tasks based on driving.
  • Source Data Fig. 4: Raw data for establishing structure–property–performance relations through ML.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Suvarna, M., Pérez-Ramírez, J. Embracing data science in catalysis research. Nat Catal (2024). https://doi.org/10.1038/s41929-024-01150-3

Received : 08 December 2023

Accepted : 29 February 2024

Published : 23 April 2024




University of Houston Hydride Research Pushes Frontiers of Practical, Accessible Superconductivity

By Bryan Luhn — 713-743-0954

Science is taking a huge step forward in the quest for superconductors that will not require ultra-high pressure to function thanks to multinational research led by Xiaojia Chen at the University of Houston.

“It has long been superconductivity researchers’ goal to ease or even eliminate the critical controls currently required regarding temperature and pressure,” said Chen, the M.D. Anderson Professor of Physics at UH’s College of Natural Sciences and Mathematics and a principal investigator at the Texas Center for Superconductivity at UH.

Progress toward eliminating the special handling that superconductive materials now require (a superconductor being a material that carries electrical current with zero resistance and expels magnetic fields) suggests that radical boosts in efficiency for certain processes in research, healthcare, industry and other commercial enterprises could become reality before long.

But currently, the conditions needed for successful superconductivity exceed the resources of many potential users, even many research laboratories.

Chen explains that lowering the accessible pressure for superconductivity is one important goal of the current studies on hydrides. “But the experiments are still challenged in providing a set of convincing evidence,” he said.

“For example, rare-earth hydrides have been reported to exhibit superconductivity near room temperature. This is based on the observations of two essential characteristics – the zero-resistance state and the Meissner effect,” Chen said.

(The Meissner effect, discovered in 1933, is the expulsion of magnetic fields from a material as it becomes superconducting, giving physicists a way to confirm the transition.)

“However, these superconducting rare-earth materials performed on target only at extremely high pressures. To make progress, we have to reduce synthesis pressure as low as possible, ideally to atmosphere conditions,” Chen explained.

Chen’s team found their breakthrough in their choice of conductive media: hydride alloys, lab-made metallic compounds in which hydrogen is bound within a metal lattice. Specifically, they worked with yttrium-cerium hydrides (Y0.5Ce0.5H9) and lanthanum-cerium hydrides (La0.5Ce0.5H10).

The inclusion of cerium (Ce) was seen to make a key difference.

“These observations were suggested due to the enhanced chemical pre-compression effect through the introduction of the Ce element in these superhydrides,” Chen explained.

The team’s findings are detailed in two journal articles. The more recent, in Nature Communications, focuses on yttrium-cerium hydrides; the other, in Journal of Physics: Condensed Matter, concentrates on lanthanum-cerium hydrides.

The team found that these superconductors maintain relatively high transition temperatures. In other words, the lanthanum-cerium and yttrium-cerium hydrides achieve superconductivity under less extreme conditions (lower pressure, while retaining a relatively high transition temperature) than has been accomplished before.

“This moves us forward in our evolution toward a workable and relatively available superconductive media,” Chen said. “We subjected our findings to multiple measurements of the electrical transport, synchrotron x-ray diffraction, Raman scattering and theoretical calculations. The tests confirmed that our results remain consistent.”

“This finding points to a route toward high-temperature superconductivity that can be accessible in many current laboratory settings,” Chen explained. The hydride research moves the frontier far beyond the recognized standard set by copper oxides (also known as cuprates).

“We still have a way to go to reach truly ambient conditions. The goal remains to achieve superconductivity at room temperature and in pressure equivalent to our familiar ground-level atmosphere. So the research goes on,” Chen said.

The team’s findings are detailed in these journal articles:

  • “Synthesis and Superconductivity in Yttrium-Cerium Hydrides at High Pressures,” published in Nature Communications
  • “Synthesis of Superconducting Phase of La0.5Ce0.5H10 at High Pressures,” published in Journal of Physics: Condensed Matter

Research team

Joining Chen as co-authors on the project are team members Liu-Cheng Chen, Tao Luo, Zi-Yu Cao, Ge Huang and Di Peng from the Harbin Institute of Technology (Shenzhen) and Center for High Pressure Science and Technology Advanced Research (Shanghai), and collaborators Philip Dalladay-Simpson, Federico Aiace Gorelli, Li-Li Zhang, Guo-Hua Zhong and Hai-Qing Lin from other academic institutes in China.

Alternatives to Race-Based Kidney Function Calculations

Race has long been used as a biological variable in health research, under the mistaken belief that racial categories correlate with genetic traits that account for population-level biological differences. However, we now know that more genetic variation exists within race categories than between them, and that race correlates poorly with the spectrum of biological variability that exists among human beings.

Accordingly, NIDDK-supported research is leading to a change in the way kidney disease is diagnosed and monitored by removing race as a variable from the equations used to estimate glomerular filtration rate (GFR). Estimated GFR remains a primary tool to assess kidney function and to classify the severity of kidney disease. Estimated GFR also helps determine prognosis and treatment, such as when hemodialysis or a transplant may be needed and how to optimize the dosage of certain drugs.

Because measuring someone’s GFR directly is expensive, difficult, and burdensome on the person being tested, GFR is normally estimated by using an inexpensive blood test to determine the concentration of a compound called creatinine. Because creatinine is synthesized at a constant rate by one’s muscles and filtered out of the blood by the kidneys, its concentration in the blood is strongly linked to kidney function. For several reasons, however, creatinine is not a perfect biomarker of a person’s actual kidney function. For example, its synthesis rate is determined by how much muscle a person has, and its blood concentration is also affected to a degree by how much meat they eat. As a result, a person’s real GFR might be a bit higher or lower than their estimated GFR, but the estimates are generally close.

However, the original study data used to develop estimated GFR calculations came overwhelmingly from participants of European descent. Subsequent work showed that the relationship between creatinine level and real GFR, on average, was the same for people from most other groups. Researchers discovered, though, that for unknown reasons creatinine levels tend to be slightly higher in Black study participants than in participants from other populations, at any given directly measured GFR. As a result, for many years estimated GFR calculations have taken into consideration whether the person being tested is “Black” or “non-Black.”

This practice is problematic. Race was created for social and political reasons, and thus has no biological basis. Indeed, race categories do not align with the continuum of human genetic and biological variability. For example, many individuals who identify as Black do not, in fact, have a higher creatinine to GFR ratio than is found in other groups. Therefore, applying the “correction factor” for Black race sometimes leads to overestimation of GFR, potentially aggravating the significant health disparities that exist in kidney health outcomes. For example, a person who identifies as Black could be erroneously excluded from receiving a kidney transplant because the equation overestimates their GFR, making it appear that their kidneys are more functional than they are. Further, the physiological reason why some Black people have a higher creatinine to true GFR ratio remains unknown, so there is no way to test for it. Thus, it is unclear who, exactly, does have a higher creatinine to true GFR ratio and thus should receive a correction factor for determination of estimated GFR.

Recent NIDDK-supported research from the Chronic Renal Insufficiency Cohort Study and the Chronic Kidney Disease Epidemiology Collaboration has sought to address these issues by identifying new, better methods for assessing kidney function. For example, one group investigated whether genetic ancestry analysis might be useful for helping determine who the correction factor should apply to. While the use of ancestry did improve accuracy at the population level, it is both impractical and not always applicable at the individual level. Other approaches that considered body composition (e.g., how muscular a person is) or urinary creatinine excretion rates marginally improved accuracy. Another group tested an alternative formula for estimating GFR from creatinine that corrects somewhat for age and sex, but that does not use race as a modifier. On average, this approach slightly underestimated GFR for participants who identified as Black, and slightly overestimated GFR for people who considered themselves non-Black.
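To make the age-and-sex (but not race) correction concrete, here is a minimal sketch of a race-free creatinine-based eGFR calculation. It uses the coefficients of the published 2021 CKD-EPI creatinine refit; whether that exact refit is the equation these studies tested is an assumption, and the sketch is illustrative, not suitable for clinical use.

```python
import math  # not strictly needed; powers use the ** operator

def egfr_2021_creatinine(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """Race-free eGFR (mL/min per 1.73 m^2) from serum creatinine.

    Coefficients follow the published 2021 CKD-EPI creatinine equation,
    which modifies the estimate by age and sex, but not race.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine threshold (mg/dL)
    alpha = -0.241 if female else -0.302    # exponent applied below the threshold
    ratio = scr_mg_dl / kappa
    egfr = (142.0
            * min(ratio, 1.0) ** alpha      # active when creatinine is below threshold
            * max(ratio, 1.0) ** -1.200     # active when creatinine is above threshold
            * 0.9938 ** age_years)          # gradual decline with age
    if female:
        egfr *= 1.012                       # sex adjustment
    return egfr

# A 50-year-old man with serum creatinine exactly at the 0.9 mg/dL threshold:
print(round(egfr_2021_creatinine(0.9, 50, female=False)))  # 104
```

Note how the piecewise `min`/`max` terms let one smooth formula treat low and high creatinine differently, and how no input other than creatinine, age and sex is required, which is the point of the race-free revision discussed above.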

Encouragingly, both studies also found that estimating GFR based on blood levels of a compound called cystatin C, which does not vary by a person’s race, could help improve the accuracy of kidney function tests. One of the studies found that the most accurate, least biased results were obtained using equations that utilize both markers—creatinine and cystatin C. Thus, NIDDK-supported research has informed recent recommendations to use both serum creatinine and cystatin C to estimate GFR in adults, when cystatin C is available. Using the combined serum creatinine-cystatin C equation is particularly important when the estimated GFR value is close to a critical decision point, such as when determining drug dosing or kidney transplant eligibility.

At present, however, laboratory and reimbursement infrastructure are not yet adequate to support routine ordering of cystatin C tests in clinical settings for all people for whom GFR should be more accurately assessed. Therefore, two leading kidney health advocacy groups—the American Society of Nephrology and the National Kidney Foundation—have called for measures to improve the availability of cystatin C testing, as well as more research to find still better approaches for assessing kidney health. In the meantime, they have called for adoption of the improved creatinine-only based GFR estimation method that uses age and sex—but not race—as modifiers.
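The adopted creatinine-only estimating equation mentioned above uses age and sex, but not race, as modifiers in a simple multiplicative form. As a rough sketch, the snippet below follows the structure and published coefficients of the 2021 race-free CKD-EPI creatinine equation; it is shown for illustration only, not as a clinical tool.

```python
# Sketch of a race-free, creatinine-based eGFR calculation.
# Coefficients follow the published 2021 CKD-EPI creatinine equation
# (refit without race). Illustrative only; not for clinical use.

def egfr_2021(serum_creatinine_mg_dl: float, age_years: float, female: bool) -> float:
    """Estimate GFR (mL/min/1.73 m^2) from serum creatinine, age, and sex."""
    kappa = 0.7 if female else 0.9          # sex-specific creatinine threshold
    alpha = -0.241 if female else -0.302    # exponent below the threshold
    ratio = serum_creatinine_mg_dl / kappa
    return (142.0
            * min(ratio, 1.0) ** alpha      # low-creatinine branch
            * max(ratio, 1.0) ** -1.200     # high-creatinine branch
            * 0.9938 ** age_years           # gradual decline with age
            * (1.012 if female else 1.0))   # sex modifier

# Higher creatinine implies lower estimated kidney function, all else equal.
print(round(egfr_2021(0.9, 40, female=False)))  # roughly 111
```

The combined creatinine-cystatin C equation recommended when cystatin C is available has the same multiplicative structure, with additional terms for the cystatin C level.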

Thus, NIDDK-supported research is improving the equitable and accurate assessment of kidney function. NIDDK remains committed to research that builds on that improvement and to the overarching goal of reducing disparities in kidney disease.

April 29, 2024

Hydride research pushes frontiers of practical, accessible superconductivity

by Bryan Luhn, University of Houston

Science is taking a step forward in the quest for superconductors that will not require ultra-high pressure to function, thanks to multinational research led by Xiaojia Chen at the University of Houston.

"It has long been superconductivity researchers' goal to ease or even eliminate the critical controls currently required regarding temperature and pressure," said Chen, the M.D. Anderson Professor of Physics at UH's College of Natural Sciences and Mathematics and a principal investigator at the Texas Center for Superconductivity at UH.

Superconductive material conducts electricity with zero resistance and expels magnetic fields. The push to eliminate the special handling it currently requires suggests that radical efficiency gains for certain processes in research, health care, industry, and other commercial enterprises could become reality before long.

But currently, the conditions needed for successful superconductivity exceed the resources of many potential users, even many research laboratories.

Chen explains that lowering the accessible pressure for superconductivity is one important goal of the current studies on hydrides. "But the experiments are still challenged in providing a set of convincing evidence," he said.

"For example, rare-earth hydrides have been reported to exhibit superconductivity near room temperature. This is based on the observations of two essential characteristics—the zero-resistance state and the Meissner effect," Chen said.

(The Meissner effect, discovered in 1933, is the expulsion of a magnetic field from a material as it becomes superconducting, giving physicists a way to verify the transition.)

"However, these superconducting rare-earth materials performed on target only at extremely high pressures. To make progress, we have to reduce synthesis pressure as low as possible, ideally to atmosphere conditions," Chen explained.

Chen's team found their breakthrough with their choice of conductive media: alloys of hydride, lab-made metallic substances rich in hydrogen. Specifically, they worked with yttrium-cerium hydrides (Y0.5Ce0.5H9) and lanthanum-cerium hydrides (La0.5Ce0.5H10).

The inclusion of cerium (Ce) made a key difference.

"These observations were suggested due to the enhanced chemical pre-compression effect through the introduction of the Ce element in these superhydrides," Chen explained.

Two journal articles detail the team's findings. The more recent, in Nature Communications , focuses on yttrium-cerium hydrides; the other, in Journal of Physics: Condensed Matter , concentrates on lanthanum-cerium hydrides.

The team found that these superconductors maintain relatively high transition temperatures at lower pressures; in other words, the lanthanum-cerium and yttrium-cerium hydrides are capable of superconductivity under less extreme conditions than has been accomplished before.

"This moves us forward in our evolution toward a workable and relatively available superconductive media," Chen said. "We subjected our findings to multiple measurements of the electrical transport, synchrotron X-ray diffraction, Raman scattering, and theoretical calculations. The tests confirmed that our results remain consistent."

"This finding points to a route toward high-temperature superconductivity that can be accessible in many current laboratory settings," Chen explained. The hydride research moves the frontier far beyond the recognized standard set by copper oxides (also known as cuprates).

"We still have a way to go to reach truly ambient conditions. The goal remains to achieve superconductivity at room temperature and in pressure equivalent to our familiar ground-level atmosphere. So the research goes on," Chen said.

Ge Huang et al, Synthesis of superconducting phase of La0.5Ce0.5H10 at high pressures, Journal of Physics: Condensed Matter (2023). DOI: 10.1088/1361-648X/ad0915

Journal information: Nature Communications

Provided by University of Houston


ScienceDaily

New research shows 'profound' link between dietary choices and brain health

New research has highlighted the profound link between dietary choices and brain health.

Published in Nature Mental Health, the research showed that a healthy, balanced diet was linked to better brain health, cognitive function and mental wellbeing. The study, involving researchers at the University of Warwick, sheds light on how our food preferences influence not only physical health but also brain health.

The dietary choices of a large sample of 181,990 participants from the UK Biobank were analysed alongside a range of physical evaluations, including cognitive function, blood metabolic biomarkers, brain imaging, and genetics, unveiling new insights into the relationship between nutrition and overall wellbeing.

The food preferences of each participant were collected via an online questionnaire, which the team categorised into 10 groups (such as alcohol, fruits and meats). A type of AI called machine learning helped the researchers analyse the large dataset.

Participants with a balanced diet showed better mental health, superior cognitive function and even higher amounts of grey matter in the brain (linked to intelligence) compared with those with a less varied diet.

The study also highlighted the need for gradual dietary modifications, particularly for individuals accustomed to highly palatable but nutritionally deficient foods. By slowly reducing sugar and fat intake over time, individuals may find themselves naturally gravitating towards healthier food choices.

Genetic factors may also contribute to the association between diet and brain health, the scientists believe, showing how a combination of genetic predispositions and lifestyle choices shape wellbeing.

Lead Author Professor Jianfeng Feng, University of Warwick, emphasised the importance of establishing healthy food preferences early in life. He said: "Developing a healthy balanced diet from an early age is crucial for healthy growth. To foster the development of a healthy balanced diet, both families and schools should offer a diverse range of nutritious meals and cultivate an environment that supports their physical and mental health."

Addressing the broader implications of the research, Prof Feng emphasized the role of public policy in promoting accessible and affordable healthy eating options. "Since dietary choices can be influenced by socioeconomic status, it's crucial to ensure that this does not hinder individuals from adopting a healthy balanced dietary profile," he stated. "Implementing affordable nutritious food policies is essential for governments to empower the general public to make informed and healthier dietary choices, thereby promoting overall public health."

Co-author Wei Cheng, Fudan University, added: "Our findings underscore the associations between dietary patterns and brain health, urging for concerted efforts in promoting nutritional awareness and fostering healthier eating habits across diverse populations."

Dr Richard Pemberton, Certified Lifestyle Physician and GP, Hexagon Health, who was not involved in the study, commented: "This exciting research further demonstrates that a poor diet detrimentally impacts not only our physical health but also our mental and brain health. This study supports the need for urgent government action to optimise health in our children, protecting future generations. We also hope this provides further evidence to motivate us all to make better lifestyle choices, to improve our health and reduce the risk of developing chronic disease."

Story Source:

Materials provided by University of Warwick . Note: Content may be edited for style and length.

Journal Reference :

  • Ruohan Zhang, Bei Zhang, Chun Shen, Barbara J. Sahakian, Zeyu Li, Wei Zhang, Yujie Zhao, Yuzhu Li, Jianfeng Feng, Wei Cheng. Associations of dietary patterns with brain health from behavioral, neuroimaging, biochemical and genetic analyses . Nature Mental Health , 2024; DOI: 10.1038/s44220-024-00226-0
