Grad Coach

Quantitative Data Analysis 101

The lingo, methods and techniques, explained simply.

By: Derek Jansen (MBA)  and Kerryn Warren (PhD) | December 2020

Quantitative data analysis is one of those things that often strikes fear in students. It’s totally understandable – quantitative analysis is a complex topic, full of daunting lingo , like medians, modes, correlation and regression. Suddenly we’re all wishing we’d paid a little more attention in math class…

The good news is that while quantitative data analysis is a mammoth topic, gaining a working understanding of the basics isn’t that hard , even for those of us who avoid numbers and math . In this post, we’ll break quantitative analysis down into simple , bite-sized chunks so you can approach your research with confidence.

Quantitative data analysis methods and techniques 101

Overview: Quantitative Data Analysis 101

  • What (exactly) is quantitative data analysis?
  • When to use quantitative analysis
  • How quantitative analysis works

The two “branches” of quantitative analysis

  • Descriptive statistics 101
  • Inferential statistics 101
  • How to choose the right quantitative methods
  • Recap & summary

What is quantitative data analysis?

Despite being a mouthful, quantitative data analysis simply means analysing data that is numbers-based – or data that can be easily “converted” into numbers without losing any meaning.

For example, category-based variables like gender, ethnicity, or native language could all be “converted” into numbers without losing meaning – for example, English could equal 1, French 2, etc.

This contrasts against qualitative data analysis, where the focus is on words, phrases and expressions that can’t be reduced to numbers. If you’re interested in learning about qualitative analysis, check out our post and video here .

What is quantitative analysis used for?

Quantitative analysis is generally used for three purposes.

  • Firstly, it’s used to measure differences between groups . For example, the popularity of different clothing colours or brands.
  • Secondly, it’s used to assess relationships between variables . For example, the relationship between weather temperature and voter turnout.
  • And third, it’s used to test hypotheses in a scientifically rigorous way. For example, a hypothesis about the impact of a certain vaccine.

Again, this contrasts with qualitative analysis , which can be used to analyse people’s perceptions and feelings about an event or situation. In other words, things that can’t be reduced to numbers.

How does quantitative analysis work?

Well, since quantitative data analysis is all about analysing numbers , it’s no surprise that it involves statistics . Statistical analysis methods form the engine that powers quantitative analysis, and these methods can vary from pretty basic calculations (for example, averages and medians) to more sophisticated analyses (for example, correlations and regressions).

Sounds like gibberish? Don’t worry. We’ll explain all of that in this post. Importantly, you don’t need to be a statistician or math wiz to pull off a good quantitative analysis. We’ll break down all the technical mumbo jumbo in this post.

Need a helping hand?

analysis techniques in quantitative research

As I mentioned, quantitative analysis is powered by statistical analysis methods . There are two main “branches” of statistical methods that are used – descriptive statistics and inferential statistics . In your research, you might only use descriptive statistics, or you might use a mix of both , depending on what you’re trying to figure out. In other words, depending on your research questions, aims and objectives . I’ll explain how to choose your methods later.

So, what are descriptive and inferential statistics?

Well, before I can explain that, we need to take a quick detour to explain some lingo. To understand the difference between these two branches of statistics, you need to understand two important words. These words are population and sample .

First up, population . In statistics, the population is the entire group of people (or animals or organisations or whatever) that you’re interested in researching. For example, if you were interested in researching Tesla owners in the US, then the population would be all Tesla owners in the US.

However, it’s extremely unlikely that you’re going to be able to interview or survey every single Tesla owner in the US. Realistically, you’ll likely only get access to a few hundred, or maybe a few thousand owners using an online survey. This smaller group of accessible people whose data you actually collect is called your sample .

So, to recap – the population is the entire group of people you’re interested in, and the sample is the subset of the population that you can actually get access to. In other words, the population is the full chocolate cake , whereas the sample is a slice of that cake.

So, why is this sample-population thing important?

Well, descriptive statistics focus on describing the sample , while inferential statistics aim to make predictions about the population, based on the findings within the sample. In other words, we use one group of statistical methods – descriptive statistics – to investigate the slice of cake, and another group of methods – inferential statistics – to draw conclusions about the entire cake. There I go with the cake analogy again…

With that out the way, let’s take a closer look at each of these branches in more detail.

Descriptive statistics vs inferential statistics

Branch 1: Descriptive Statistics

Descriptive statistics serve a simple but critically important role in your research – to describe your data set – hence the name. In other words, they help you understand the details of your sample . Unlike inferential statistics (which we’ll get to soon), descriptive statistics don’t aim to make inferences or predictions about the entire population – they’re purely interested in the details of your specific sample .

When you’re writing up your analysis, descriptive statistics are the first set of stats you’ll cover, before moving on to inferential statistics. But, that said, depending on your research objectives and research questions , they may be the only type of statistics you use. We’ll explore that a little later.

So, what kind of statistics are usually covered in this section?

Some common statistical tests used in this branch include the following:

  • Mean – this is simply the mathematical average of a range of numbers.
  • Median – this is the midpoint in a range of numbers when the numbers are arranged in numerical order. If the data set makes up an odd number, then the median is the number right in the middle of the set. If the data set makes up an even number, then the median is the midpoint between the two middle numbers.
  • Mode – this is simply the most commonly occurring number in the data set.
  • In cases where most of the numbers are quite close to the average, the standard deviation will be relatively low.
  • Conversely, in cases where the numbers are scattered all over the place, the standard deviation will be relatively high.
  • Skewness . As the name suggests, skewness indicates how symmetrical a range of numbers is. In other words, do they tend to cluster into a smooth bell curve shape in the middle of the graph, or do they skew to the left or right?

Feeling a bit confused? Let’s look at a practical example using a small data set.

Descriptive statistics example data

On the left-hand side is the data set. This details the bodyweight of a sample of 10 people. On the right-hand side, we have the descriptive statistics. Let’s take a look at each of them.

First, we can see that the mean weight is 72.4 kilograms. In other words, the average weight across the sample is 72.4 kilograms. Straightforward.

Next, we can see that the median is very similar to the mean (the average). This suggests that this data set has a reasonably symmetrical distribution (in other words, a relatively smooth, centred distribution of weights, clustered towards the centre).

In terms of the mode , there is no mode in this data set. This is because each number is present only once and so there cannot be a “most common number”. If there were two people who were both 65 kilograms, for example, then the mode would be 65.

Next up is the standard deviation . 10.6 indicates that there’s quite a wide spread of numbers. We can see this quite easily by looking at the numbers themselves, which range from 55 to 90, which is quite a stretch from the mean of 72.4.

And lastly, the skewness of -0.2 tells us that the data is very slightly negatively skewed. This makes sense since the mean and the median are slightly different.

As you can see, these descriptive statistics give us some useful insight into the data set. Of course, this is a very small data set (only 10 records), so we can’t read into these statistics too much. Also, keep in mind that this is not a list of all possible descriptive statistics – just the most common ones.

But why do all of these numbers matter?

While these descriptive statistics are all fairly basic, they’re important for a few reasons:

  • Firstly, they help you get both a macro and micro-level view of your data. In other words, they help you understand both the big picture and the finer details.
  • Secondly, they help you spot potential errors in the data – for example, if an average is way higher than you’d expect, or responses to a question are highly varied, this can act as a warning sign that you need to double-check the data.
  • And lastly, these descriptive statistics help inform which inferential statistical techniques you can use, as those techniques depend on the skewness (in other words, the symmetry and normality) of the data.

Simply put, descriptive statistics are really important , even though the statistical techniques used are fairly basic. All too often at Grad Coach, we see students skimming over the descriptives in their eagerness to get to the more exciting inferential methods, and then landing up with some very flawed results.

Don’t be a sucker – give your descriptive statistics the love and attention they deserve!

Examples of descriptive statistics

Branch 2: Inferential Statistics

As I mentioned, while descriptive statistics are all about the details of your specific data set – your sample – inferential statistics aim to make inferences about the population . In other words, you’ll use inferential statistics to make predictions about what you’d expect to find in the full population.

What kind of predictions, you ask? Well, there are two common types of predictions that researchers try to make using inferential stats:

  • Firstly, predictions about differences between groups – for example, height differences between children grouped by their favourite meal or gender.
  • And secondly, relationships between variables – for example, the relationship between body weight and the number of hours a week a person does yoga.

In other words, inferential statistics (when done correctly), allow you to connect the dots and make predictions about what you expect to see in the real world population, based on what you observe in your sample data. For this reason, inferential statistics are used for hypothesis testing – in other words, to test hypotheses that predict changes or differences.

Inferential statistics are used to make predictions about what you’d expect to find in the full population, based on the sample.

Of course, when you’re working with inferential statistics, the composition of your sample is really important. In other words, if your sample doesn’t accurately represent the population you’re researching, then your findings won’t necessarily be very useful.

For example, if your population of interest is a mix of 50% male and 50% female , but your sample is 80% male , you can’t make inferences about the population based on your sample, since it’s not representative. This area of statistics is called sampling, but we won’t go down that rabbit hole here (it’s a deep one!) – we’ll save that for another post .

What statistics are usually used in this branch?

There are many, many different statistical analysis methods within the inferential branch and it’d be impossible for us to discuss them all here. So we’ll just take a look at some of the most common inferential statistical methods so that you have a solid starting point.

First up are T-Tests . T-tests compare the means (the averages) of two groups of data to assess whether they’re statistically significantly different. In other words, do they have significantly different means, standard deviations and skewness.

This type of testing is very useful for understanding just how similar or different two groups of data are. For example, you might want to compare the mean blood pressure between two groups of people – one that has taken a new medication and one that hasn’t – to assess whether they are significantly different.

Kicking things up a level, we have ANOVA, which stands for “analysis of variance”. This test is similar to a T-test in that it compares the means of various groups, but ANOVA allows you to analyse multiple groups , not just two groups So it’s basically a t-test on steroids…

Next, we have correlation analysis . This type of analysis assesses the relationship between two variables. In other words, if one variable increases, does the other variable also increase, decrease or stay the same. For example, if the average temperature goes up, do average ice creams sales increase too? We’d expect some sort of relationship between these two variables intuitively , but correlation analysis allows us to measure that relationship scientifically .

Lastly, we have regression analysis – this is quite similar to correlation in that it assesses the relationship between variables, but it goes a step further to understand cause and effect between variables, not just whether they move together. In other words, does the one variable actually cause the other one to move, or do they just happen to move together naturally thanks to another force? Just because two variables correlate doesn’t necessarily mean that one causes the other.

Stats overload…

I hear you. To make this all a little more tangible, let’s take a look at an example of a correlation in action.

Here’s a scatter plot demonstrating the correlation (relationship) between weight and height. Intuitively, we’d expect there to be some relationship between these two variables, which is what we see in this scatter plot. In other words, the results tend to cluster together in a diagonal line from bottom left to top right.

Sample correlation

As I mentioned, these are are just a handful of inferential techniques – there are many, many more. Importantly, each statistical method has its own assumptions and limitations.

For example, some methods only work with normally distributed (parametric) data, while other methods are designed specifically for non-parametric data. And that’s exactly why descriptive statistics are so important – they’re the first step to knowing which inferential techniques you can and can’t use.

Remember that every statistical method has its own assumptions and limitations,  so you need to be aware of these.

How to choose the right analysis method

To choose the right statistical methods, you need to think about two important factors :

  • The type of quantitative data you have (specifically, level of measurement and the shape of the data). And,
  • Your research questions and hypotheses

Let’s take a closer look at each of these.

Factor 1 – Data type

The first thing you need to consider is the type of data you’ve collected (or the type of data you will collect). By data types, I’m referring to the four levels of measurement – namely, nominal, ordinal, interval and ratio. If you’re not familiar with this lingo, check out the video below.

Why does this matter?

Well, because different statistical methods and techniques require different types of data. This is one of the “assumptions” I mentioned earlier – every method has its assumptions regarding the type of data.

For example, some techniques work with categorical data (for example, yes/no type questions, or gender or ethnicity), while others work with continuous numerical data (for example, age, weight or income) – and, of course, some work with multiple data types.

If you try to use a statistical method that doesn’t support the data type you have, your results will be largely meaningless . So, make sure that you have a clear understanding of what types of data you’ve collected (or will collect). Once you have this, you can then check which statistical methods would support your data types here .

If you haven’t collected your data yet, you can work in reverse and look at which statistical method would give you the most useful insights, and then design your data collection strategy to collect the correct data types.

Another important factor to consider is the shape of your data . Specifically, does it have a normal distribution (in other words, is it a bell-shaped curve, centred in the middle) or is it very skewed to the left or the right? Again, different statistical techniques work for different shapes of data – some are designed for symmetrical data while others are designed for skewed data.

This is another reminder of why descriptive statistics are so important – they tell you all about the shape of your data.

Factor 2: Your research questions

The next thing you need to consider is your specific research questions, as well as your hypotheses (if you have some). The nature of your research questions and research hypotheses will heavily influence which statistical methods and techniques you should use.

If you’re just interested in understanding the attributes of your sample (as opposed to the entire population), then descriptive statistics are probably all you need. For example, if you just want to assess the means (averages) and medians (centre points) of variables in a group of people.

On the other hand, if you aim to understand differences between groups or relationships between variables and to infer or predict outcomes in the population, then you’ll likely need both descriptive statistics and inferential statistics.

So, it’s really important to get very clear about your research aims and research questions, as well your hypotheses – before you start looking at which statistical techniques to use.

Never shoehorn a specific statistical technique into your research just because you like it or have some experience with it. Your choice of methods must align with all the factors we’ve covered here.

Time to recap…

You’re still with me? That’s impressive. We’ve covered a lot of ground here, so let’s recap on the key points:

  • Quantitative data analysis is all about  analysing number-based data  (which includes categorical and numerical data) using various statistical techniques.
  • The two main  branches  of statistics are  descriptive statistics  and  inferential statistics . Descriptives describe your sample, whereas inferentials make predictions about what you’ll find in the population.
  • Common  descriptive statistical methods include  mean  (average),  median , standard  deviation  and  skewness .
  • Common  inferential statistical methods include  t-tests ,  ANOVA ,  correlation  and  regression  analysis.
  • To choose the right statistical methods and techniques, you need to consider the  type of data you’re working with , as well as your  research questions  and hypotheses.

analysis techniques in quantitative research

Psst... there’s more!

This post was based on one of our popular Research Bootcamps . If you're working on a research project, you'll definitely want to check this out ...

You Might Also Like:

Narrative analysis explainer

74 Comments

Oddy Labs

Hi, I have read your article. Such a brilliant post you have created.

Derek Jansen

Thank you for the feedback. Good luck with your quantitative analysis.

Abdullahi Ramat

Thank you so much.

Obi Eric Onyedikachi

Thank you so much. I learnt much well. I love your summaries of the concepts. I had love you to explain how to input data using SPSS

Lumbuka Kaunda

Amazing and simple way of breaking down quantitative methods.

Charles Lwanga

This is beautiful….especially for non-statisticians. I have skimmed through but I wish to read again. and please include me in other articles of the same nature when you do post. I am interested. I am sure, I could easily learn from you and get off the fear that I have had in the past. Thank you sincerely.

Essau Sefolo

Send me every new information you might have.

fatime

i need every new information

Dr Peter

Thank you for the blog. It is quite informative. Dr Peter Nemaenzhe PhD

Mvogo Mvogo Ephrem

It is wonderful. l’ve understood some of the concepts in a more compréhensive manner

Maya

Your article is so good! However, I am still a bit lost. I am doing a secondary research on Gun control in the US and increase in crime rates and I am not sure which analysis method I should use?

Joy

Based on the given learning points, this is inferential analysis, thus, use ‘t-tests, ANOVA, correlation and regression analysis’

Peter

Well explained notes. Am an MPH student and currently working on my thesis proposal, this has really helped me understand some of the things I didn’t know.

Jejamaije Mujoro

I like your page..helpful

prashant pandey

wonderful i got my concept crystal clear. thankyou!!

Dailess Banda

This is really helpful , thank you

Lulu

Thank you so much this helped

wossen

Wonderfully explained

Niamatullah zaheer

thank u so much, it was so informative

mona

THANKYOU, this was very informative and very helpful

Thaddeus Ogwoka

This is great GRADACOACH I am not a statistician but I require more of this in my thesis

Include me in your posts.

Alem Teshome

This is so great and fully useful. I would like to thank you again and again.

Mrinal

Glad to read this article. I’ve read lot of articles but this article is clear on all concepts. Thanks for sharing.

Emiola Adesina

Thank you so much. This is a very good foundation and intro into quantitative data analysis. Appreciate!

Josyl Hey Aquilam

You have a very impressive, simple but concise explanation of data analysis for Quantitative Research here. This is a God-send link for me to appreciate research more. Thank you so much!

Lynnet Chikwaikwai

Avery good presentation followed by the write up. yes you simplified statistics to make sense even to a layman like me. Thank so much keep it up. The presenter did ell too. i would like more of this for Qualitative and exhaust more of the test example like the Anova.

Adewole Ikeoluwa

This is a very helpful article, couldn’t have been clearer. Thank you.

Samih Soud ALBusaidi

Awesome and phenomenal information.Well done

Nūr

The video with the accompanying article is super helpful to demystify this topic. Very well done. Thank you so much.

Lalah

thank you so much, your presentation helped me a lot

Anjali

I don’t know how should I express that ur article is saviour for me 🥺😍

Saiqa Aftab Tunio

It is well defined information and thanks for sharing. It helps me a lot in understanding the statistical data.

Funeka Mvandaba

I gain a lot and thanks for sharing brilliant ideas, so wish to be linked on your email update.

Rita Kathomi Gikonyo

Very helpful and clear .Thank you Gradcoach.

Hilaria Barsabal

Thank for sharing this article, well organized and information presented are very clear.

AMON TAYEBWA

VERY INTERESTING AND SUPPORTIVE TO NEW RESEARCHERS LIKE ME. AT LEAST SOME BASICS ABOUT QUANTITATIVE.

Tariq

An outstanding, well explained and helpful article. This will help me so much with my data analysis for my research project. Thank you!

chikumbutso

wow this has just simplified everything i was scared of how i am gonna analyse my data but thanks to you i will be able to do so

Idris Haruna

simple and constant direction to research. thanks

Mbunda Castro

This is helpful

AshikB

Great writing!! Comprehensive and very helpful.

himalaya ravi

Do you provide any assistance for other steps of research methodology like making research problem testing hypothesis report and thesis writing?

Sarah chiwamba

Thank you so much for such useful article!

Lopamudra

Amazing article. So nicely explained. Wow

Thisali Liyanage

Very insightfull. Thanks

Melissa

I am doing a quality improvement project to determine if the implementation of a protocol will change prescribing habits. Would this be a t-test?

Aliyah

The is a very helpful blog, however, I’m still not sure how to analyze my data collected. I’m doing a research on “Free Education at the University of Guyana”

Belayneh Kassahun

tnx. fruitful blog!

Suzanne

So I am writing exams and would like to know how do establish which method of data analysis to use from the below research questions: I am a bit lost as to how I determine the data analysis method from the research questions.

Do female employees report higher job satisfaction than male employees with similar job descriptions across the South African telecommunications sector? – I though that maybe Chi Square could be used here. – Is there a gender difference in talented employees’ actual turnover decisions across the South African telecommunications sector? T-tests or Correlation in this one. – Is there a gender difference in the cost of actual turnover decisions across the South African telecommunications sector? T-tests or Correlation in this one. – What practical recommendations can be made to the management of South African telecommunications companies on leveraging gender to mitigate employee turnover decisions?

Your assistance will be appreciated if I could get a response as early as possible tomorrow

Like

This was quite helpful. Thank you so much.

kidane Getachew

wow I got a lot from this article, thank you very much, keep it up

FAROUK AHMAD NKENGA

Thanks for yhe guidance. Can you send me this guidance on my email? To enable offline reading?

Nosi Ruth Xabendlini

Thank you very much, this service is very helpful.

George William Kiyingi

Every novice researcher needs to read this article as it puts things so clear and easy to follow. Its been very helpful.

Adebisi

Wonderful!!!! you explained everything in a way that anyone can learn. Thank you!!

Miss Annah

I really enjoyed reading though this. Very easy to follow. Thank you

Reza Kia

Many thanks for your useful lecture, I would be really appreciated if you could possibly share with me the PPT of presentation related to Data type?

Protasia Tairo

Thank you very much for sharing, I got much from this article

Fatuma Chobo

This is a very informative write-up. Kindly include me in your latest posts.

naphtal

Very interesting mostly for social scientists

Boy M. Bachtiar

Thank you so much, very helpfull

You’re welcome 🙂

Dr Mafaza Mansoor

woow, its great, its very informative and well understood because of your way of writing like teaching in front of me in simple languages.

Opio Len

I have been struggling to understand a lot of these concepts. Thank you for the informative piece which is written with outstanding clarity.

Eric

very informative article. Easy to understand

Leena Fukey

Beautiful read, much needed.

didin

Always greet intro and summary. I learn so much from GradCoach

Mmusyoka

Quite informative. Simple and clear summary.

Jewel Faver

I thoroughly enjoyed reading your informative and inspiring piece. Your profound insights into this topic truly provide a better understanding of its complexity. I agree with the points you raised, especially when you delved into the specifics of the article. In my opinion, that aspect is often overlooked and deserves further attention.

Shantae

Absolutely!!! Thank you

Thazika Chitimera

Thank you very much for this post. It made me to understand how to do my data analysis.

Submit a Comment Cancel reply

Your email address will not be published. Required fields are marked *

Save my name, email, and website in this browser for the next time I comment.

  • Print Friendly

Quantitative Data Analysis: A Comprehensive Guide

By: Ofem Eteng | Published: May 18, 2022

A healthcare giant successfully introduces the most effective drug dosage through rigorous statistical modeling, saving countless lives. A marketing team predicts consumer trends with uncanny accuracy, tailoring campaigns for maximum impact.

Table of Contents

These trends and dosages are not just any numbers but are a result of meticulous quantitative data analysis. Quantitative data analysis offers a robust framework for understanding complex phenomena, evaluating hypotheses, and predicting future outcomes.

In this blog, we’ll walk through the concept of quantitative data analysis, the steps required, its advantages, and the methods and techniques that are used in this analysis. Read on!

What is Quantitative Data Analysis?

Quantitative data analysis is a systematic process of examining, interpreting, and drawing meaningful conclusions from numerical data. It involves the application of statistical methods, mathematical models, and computational techniques to understand patterns, relationships, and trends within datasets.

Quantitative data analysis methods typically work with algorithms, mathematical analysis tools, and software to gain insights from the data, answering questions such as how many, how often, and how much. Data for quantitative data analysis is usually collected from close-ended surveys, questionnaires, polls, etc. The data can also be obtained from sales figures, email click-through rates, number of website visitors, and percentage revenue increase. 

Quantitative Data Analysis vs Qualitative Data Analysis

When we talk about data, we directly think about the pattern, the relationship, and the connection between the datasets – analyzing the data in short. Therefore when it comes to data analysis, there are broadly two types – Quantitative Data Analysis and Qualitative Data Analysis.

Quantitative data analysis revolves around numerical data and statistics, which are suitable for functions that can be counted or measured. In contrast, qualitative data analysis includes description and subjective information – for things that can be observed but not measured.

Let us differentiate between Quantitative Data Analysis and Quantitative Data Analysis for a better understanding.

Data Preparation Steps for Quantitative Data Analysis

Quantitative data has to be gathered and cleaned before proceeding to the stage of analyzing it. Below are the steps to prepare a data before quantitative research analysis:

  • Step 1: Data Collection

Before beginning the analysis process, you need data. Data can be collected through rigorous quantitative research, which includes methods such as interviews, focus groups, surveys, and questionnaires.

  • Step 2: Data Cleaning

Once the data is collected, begin the data cleaning process by scanning through the entire data for duplicates, errors, and omissions. Keep a close eye for outliers (data points that are significantly different from the majority of the dataset) because they can skew your analysis results if they are not removed.

This data-cleaning process ensures data accuracy, consistency and relevancy before analysis.

  • Step 3: Data Analysis and Interpretation

Now that you have collected and cleaned your data, it is now time to carry out the quantitative analysis. There are two methods of quantitative data analysis, which we will discuss in the next section.

However, if you have data from multiple sources, collecting and cleaning it can be a cumbersome task. This is where Hevo Data steps in. With Hevo, extracting, transforming, and loading data from source to destination becomes a seamless task, eliminating the need for manual coding. This not only saves valuable time but also enhances the overall efficiency of data analysis and visualization, empowering users to derive insights quickly and with precision

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Start for free now!

Now that you are familiar with what quantitative data analysis is and how to prepare your data for analysis, the focus will shift to the purpose of this article, which is to describe the methods and techniques of quantitative data analysis.

Methods and Techniques of Quantitative Data Analysis

Quantitative data analysis employs two techniques to extract meaningful insights from datasets, broadly. The first method is descriptive statistics, which summarizes and portrays essential features of a dataset, such as mean, median, and standard deviation.

Inferential statistics, the second method, extrapolates insights and predictions from a sample dataset to make broader inferences about an entire population, such as hypothesis testing and regression analysis.

An in-depth explanation of both the methods is provided below:

  • Descriptive Statistics
  • Inferential Statistics

1) Descriptive Statistics

Descriptive statistics as the name implies is used to describe a dataset. It helps understand the details of your data by summarizing it and finding patterns from the specific data sample. They provide absolute numbers obtained from a sample but do not necessarily explain the rationale behind the numbers and are mostly used for analyzing single variables. The methods used in descriptive statistics include: 

  • Mean:   This calculates the numerical average of a set of values.
  • Median: This is used to get the midpoint of a set of values when the numbers are arranged in numerical order.
  • Mode: This is used to find the most commonly occurring value in a dataset.
  • Percentage: This is used to express how a value or group of respondents within the data relates to a larger group of respondents.
  • Frequency: This indicates the number of times a value is found.
  • Range: This shows the highest and lowest values in a dataset.
  • Standard Deviation: This is used to indicate how dispersed a range of numbers is, meaning, it shows how close all the numbers are to the mean.
  • Skewness: It indicates how symmetrical a range of numbers is, showing if they cluster into a smooth bell curve shape in the middle of the graph or if they skew towards the left or right.

2) Inferential Statistics

In quantitative analysis, the expectation is to turn raw numbers into meaningful insight using numerical values, and descriptive statistics is all about explaining details of a specific dataset using numbers, but it does not explain the motives behind the numbers; hence, a need for further analysis using inferential statistics.

Inferential statistics aim to make predictions or highlight possible outcomes from the analyzed data obtained from descriptive statistics. They are used to generalize results and make predictions between groups, show relationships that exist between multiple variables, and are used for hypothesis testing that predicts changes or differences.

There are various statistical analysis methods used within inferential statistics; a few are discussed below.

  • Cross Tabulations: Cross tabulation or crosstab is used to show the relationship that exists between two variables and is often used to compare results by demographic groups. It uses a basic tabular form to draw inferences between different data sets and contains data that is mutually exclusive or has some connection with each other. Crosstabs help understand the nuances of a dataset and factors that may influence a data point.
  • Regression Analysis: Regression analysis estimates the relationship between a set of variables. It shows the correlation between a dependent variable (the variable or outcome you want to measure or predict) and any number of independent variables (factors that may impact the dependent variable). Therefore, the purpose of the regression analysis is to estimate how one or more variables might affect a dependent variable to identify trends and patterns to make predictions and forecast possible future trends. There are many types of regression analysis, and the model you choose will be determined by the type of data you have for the dependent variable. The types of regression analysis include linear regression, non-linear regression, binary logistic regression, etc.
  • Monte Carlo Simulation: Monte Carlo simulation, also known as the Monte Carlo method, is a computerized technique of generating models of possible outcomes and showing their probability distributions. It considers a range of possible outcomes and then tries to calculate how likely each outcome will occur. Data analysts use it to perform advanced risk analyses to help forecast future events and make decisions accordingly.
  • Analysis of Variance (ANOVA): This is used to test the extent to which two or more groups differ from each other. It compares the mean of various groups and allows the analysis of multiple groups.
  • Factor Analysis:   A large number of variables can be reduced into a smaller number of factors using the factor analysis technique. It works on the principle that multiple separate observable variables correlate with each other because they are all associated with an underlying construct. It helps in reducing large datasets into smaller, more manageable samples.
  • Cohort Analysis: Cohort analysis can be defined as a subset of behavioral analytics that operates from data taken from a given dataset. Rather than looking at all users as one unit, cohort analysis breaks down data into related groups for analysis, where these groups or cohorts usually have common characteristics or similarities within a defined period.
  • MaxDiff Analysis: This is a quantitative data analysis method that is used to gauge customers’ preferences for purchase and what parameters rank higher than the others in the process. 
  • Cluster Analysis: Cluster analysis is a technique used to identify structures within a dataset. Cluster analysis aims to be able to sort different data points into groups that are internally similar and externally different; that is, data points within a cluster will look like each other and different from data points in other clusters.
  • Time Series Analysis: This is a statistical analytic technique used to identify trends and cycles over time. It is simply the measurement of the same variables at different times, like weekly and monthly email sign-ups, to uncover trends, seasonality, and cyclic patterns. By doing this, the data analyst can forecast how variables of interest may fluctuate in the future. 
  • SWOT analysis: This is a quantitative data analysis method that assigns numerical values to indicate strengths, weaknesses, opportunities, and threats of an organization, product, or service to show a clearer picture of competition to foster better business strategies

How to Choose the Right Method for your Analysis?

Choosing between Descriptive Statistics or Inferential Statistics can be often confusing. You should consider the following factors before choosing the right method for your quantitative data analysis:

1. Type of Data

The first consideration in data analysis is understanding the type of data you have. Different statistical methods have specific requirements based on these data types, and using the wrong method can render results meaningless. The choice of statistical method should align with the nature and distribution of your data to ensure meaningful and accurate analysis.

2. Your Research Questions

When deciding on statistical methods, it’s crucial to align them with your specific research questions and hypotheses. The nature of your questions will influence whether descriptive statistics alone, which reveal sample attributes, are sufficient or if you need both descriptive and inferential statistics to understand group differences or relationships between variables and make population inferences.

Pros and Cons of Quantitative Data Analysis

1. Objectivity and Generalizability:

  • Quantitative data analysis offers objective, numerical measurements, minimizing bias and personal interpretation.
  • Results can often be generalized to larger populations, making them applicable to broader contexts.

Example: A study using quantitative data analysis to measure student test scores can objectively compare performance across different schools and demographics, leading to generalizable insights about educational strategies.

2. Precision and Efficiency:

  • Statistical methods provide precise numerical results, allowing for accurate comparisons and prediction.
  • Large datasets can be analyzed efficiently with the help of computer software, saving time and resources.

Example: A marketing team can use quantitative data analysis to precisely track click-through rates and conversion rates on different ad campaigns, quickly identifying the most effective strategies for maximizing customer engagement.

3. Identification of Patterns and Relationships:

  • Statistical techniques reveal hidden patterns and relationships between variables that might not be apparent through observation alone.
  • This can lead to new insights and understanding of complex phenomena.

Example: A medical researcher can use quantitative analysis to pinpoint correlations between lifestyle factors and disease risk, aiding in the development of prevention strategies.

1. Limited Scope:

  • Quantitative analysis focuses on quantifiable aspects of a phenomenon ,  potentially overlooking important qualitative nuances, such as emotions, motivations, or cultural contexts.

Example: A survey measuring customer satisfaction with numerical ratings might miss key insights about the underlying reasons for their satisfaction or dissatisfaction, which could be better captured through open-ended feedback.

2. Oversimplification:

  • Reducing complex phenomena to numerical data can lead to oversimplification and a loss of richness in understanding.

Example: Analyzing employee productivity solely through quantitative metrics like hours worked or tasks completed might not account for factors like creativity, collaboration, or problem-solving skills, which are crucial for overall performance.

3. Potential for Misinterpretation:

  • Statistical results can be misinterpreted if not analyzed carefully and with appropriate expertise.
  • The choice of statistical methods and assumptions can significantly influence results.

This blog discusses the steps, methods, and techniques of quantitative data analysis. It also gives insights into the methods of data collection, the type of data one should work with, and the pros and cons of such analysis.

Gain a better understanding of data analysis with these essential reads:

  • Data Analysis and Modeling: 4 Critical Differences
  • Exploratory Data Analysis Simplified 101
  • 25 Best Data Analysis Tools in 2024

Carrying out successful data analysis requires prepping the data and making it analysis-ready. That is where Hevo steps in.

Want to give Hevo a try? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at the amazing Hevo price , which will assist you in selecting the best plan for your requirements.

Share your experience of understanding Quantitative Data Analysis in the comment section below! We would love to hear your thoughts.

Ofem Eteng

Ofem is a freelance writer specializing in data-related topics, who has expertise in translating complex concepts. With a focus on data science, analytics, and emerging technologies.

No-code Data Pipeline for your Data Warehouse

  • Data Strategy

Get Started with Hevo

Related articles

analysis techniques in quantitative research

Preetipadma Khandavilli

Data Mining and Data Analysis: 4 Key Differences

analysis techniques in quantitative research

Nicholas Samuel

Data Quality Analysis Simplified: A Comprehensive Guide 101

analysis techniques in quantitative research

Sharon Rithika

Data Analysis in Tableau: Unleash the Power of COUNTIF

I want to read this e-book.

analysis techniques in quantitative research

Learn / Guides / Quantitative data analysis guide

Back to guides

The ultimate guide to quantitative data analysis

Numbers help us make sense of the world. We collect quantitative data on our speed and distance as we drive, the number of hours we spend on our cell phones, and how much we save at the grocery store.

Our businesses run on numbers, too. We spend hours poring over key performance indicators (KPIs) like lead-to-client conversions, net profit margins, and bounce and churn rates.

But all of this quantitative data can feel overwhelming and confusing. Lists and spreadsheets of numbers don’t tell you much on their own—you have to conduct quantitative data analysis to understand them and make informed decisions.

Last updated

Reading time.

analysis techniques in quantitative research

This guide explains what quantitative data analysis is and why it’s important, and gives you a four-step process to conduct a quantitative data analysis, so you know exactly what’s happening in your business and what your users need .

Collect quantitative customer data with Hotjar

Use Hotjar’s tools to gather the customer insights you need to make quantitative data analysis a breeze.

What is quantitative data analysis? 

Quantitative data analysis is the process of analyzing and interpreting numerical data. It helps you make sense of information by identifying patterns, trends, and relationships between variables through mathematical calculations and statistical tests. 

With quantitative data analysis, you turn spreadsheets of individual data points into meaningful insights to drive informed decisions. Columns of numbers from an experiment or survey transform into useful insights—like which marketing campaign asset your average customer prefers or which website factors are most closely connected to your bounce rate. 

Without analytics, data is just noise. Analyzing data helps you make decisions which are informed and free from bias.

What quantitative data analysis is not

But as powerful as quantitative data analysis is, it’s not without its limitations. It only gives you the what, not the why . For example, it can tell you how many website visitors or conversions you have on an average day, but it can’t tell you why users visited your site or made a purchase.

For the why behind user behavior, you need qualitative data analysis , a process for making sense of qualitative research like open-ended survey responses, interview clips, or behavioral observations. By analyzing non-numerical data, you gain useful contextual insights to shape your strategy, product, and messaging. 

Quantitative data analysis vs. qualitative data analysis 

Let’s take an even deeper dive into the differences between quantitative data analysis and qualitative data analysis to explore what they do and when you need them.

analysis techniques in quantitative research

The bottom line: quantitative data analysis and qualitative data analysis are complementary processes. They work hand-in-hand to tell you what’s happening in your business and why.  

💡 Pro tip: easily toggle between quantitative and qualitative data analysis with Hotjar Funnels . 

The Funnels tool helps you visualize quantitative metrics like drop-off and conversion rates in your sales or conversion funnel to understand when and where users leave your website. You can break down your data even further to compare conversion performance by user segment.

Spot a potential issue? A single click takes you to relevant session recordings , where you see user behaviors like mouse movements, scrolls, and clicks. With this qualitative data to provide context, you'll better understand what you need to optimize to streamline the user experience (UX) and increase conversions .

Hotjar Funnels lets you quickly explore the story behind the quantitative data

4 benefits of quantitative data analysis

There’s a reason product, web design, and marketing teams take time to analyze metrics: the process pays off big time. 

Four major benefits of quantitative data analysis include:

1. Make confident decisions 

With quantitative data analysis, you know you’ve got data-driven insights to back up your decisions . For example, if you launch a concept testing survey to gauge user reactions to a new logo design, and 92% of users rate it ‘very good’—you'll feel certain when you give the designer the green light. 

Since you’re relying less on intuition and more on facts, you reduce the risks of making the wrong decision. (You’ll also find it way easier to get buy-in from team members and stakeholders for your next proposed project. 🙌)

2. Reduce costs

By crunching the numbers, you can spot opportunities to reduce spend . For example, if an ad campaign has lower-than-average click-through rates , you might decide to cut your losses and invest your budget elsewhere. 

Or, by analyzing ecommerce metrics , like website traffic by source, you may find you’re getting very little return on investment from a certain social media channel—and scale back spending in that area.

3. Personalize the user experience

Quantitative data analysis helps you map the customer journey , so you get a better sense of customers’ demographics, what page elements they interact with on your site, and where they drop off or convert . 

These insights let you better personalize your website, product, or communication, so you can segment ads, emails, and website content for specific user personas or target groups.

4. Improve user satisfaction and delight

Quantitative data analysis lets you see where your website or product is doing well—and where it falls short for your users . For example, you might see stellar results from KPIs like time on page, but conversion rates for that page are low. 

These quantitative insights encourage you to dive deeper into qualitative data to see why that’s happening—looking for moments of confusion or frustration on session recordings, for example—so you can make adjustments and optimize your conversions by improving customer satisfaction and delight.

💡Pro tip: use Net Promoter Score® (NPS) surveys to capture quantifiable customer satisfaction data that’s easy for you to analyze and interpret. 

With an NPS tool like Hotjar, you can create an on-page survey to ask users how likely they are to recommend you to others on a scale from 0 to 10. (And for added context, you can ask follow-up questions about why customers selected the rating they did—rich qualitative data is always a bonus!)

analysis techniques in quantitative research

Hotjar graphs your quantitative NPS data to show changes over time

4 steps to effective quantitative data analysis 

Quantitative data analysis sounds way more intimidating than it actually is. Here’s how to make sense of your company’s numbers in just four steps:

1. Collect data

Before you can actually start the analysis process, you need data to analyze. This involves conducting quantitative research and collecting numerical data from various sources, including: 

Interviews or focus groups 

Website analytics

Observations, from tools like heatmaps or session recordings

Questionnaires, like surveys or on-page feedback widgets

Just ensure the questions you ask in your surveys are close-ended questions—providing respondents with select choices to choose from instead of open-ended questions that allow for free responses.

analysis techniques in quantitative research

Hotjar’s pricing plans survey template provides close-ended questions

 2. Clean data

Once you’ve collected your data, it’s time to clean it up. Look through your results to find errors, duplicates, and omissions. Keep an eye out for outliers, too. Outliers are data points that differ significantly from the rest of the set—and they can skew your results if you don’t remove them.

By taking the time to clean your data set, you ensure your data is accurate, consistent, and relevant before it’s time to analyze. 

3. Analyze and interpret data

At this point, your data’s all cleaned up and ready for the main event. This step involves crunching the numbers to find patterns and trends via mathematical and statistical methods. 

Two main branches of quantitative data analysis exist: 

Descriptive analysis : methods to summarize or describe attributes of your data set. For example, you may calculate key stats like distribution and frequency, or mean, median, and mode.

Inferential analysis : methods that let you draw conclusions from statistics—like analyzing the relationship between variables or making predictions. These methods include t-tests, cross-tabulation, and factor analysis. (For more detailed explanations and how-tos, head to our guide on quantitative data analysis methods.)

Then, interpret your data to determine the best course of action. What does the data suggest you do ? For example, if your analysis shows a strong correlation between email open rate and time sent, you may explore optimal send times for each user segment.

4. Visualize and share data

Once you’ve analyzed and interpreted your data, create easy-to-read, engaging data visualizations—like charts, graphs, and tables—to present your results to team members and stakeholders. Data visualizations highlight similarities and differences between data sets and show the relationships between variables.

Software can do this part for you. For example, the Hotjar Dashboard shows all of your key metrics in one place—and automatically creates bar graphs to show how your top pages’ performance compares. And with just one click, you can navigate to the Trends tool to analyze product metrics for different segments on a single chart. 

Hotjar Trends lets you compare metrics across segments

Discover rich user insights with quantitative data analysis

Conducting quantitative data analysis takes a little bit of time and know-how, but it’s much more manageable than you might think. 

By choosing the right methods and following clear steps, you gain insights into product performance and customer experience —and you’ll be well on your way to making better decisions and creating more customer satisfaction and loyalty.

FAQs about quantitative data analysis

What is quantitative data analysis.

Quantitative data analysis is the process of making sense of numerical data through mathematical calculations and statistical tests. It helps you identify patterns, relationships, and trends to make better decisions.

How is quantitative data analysis different from qualitative data analysis?

Quantitative and qualitative data analysis are both essential processes for making sense of quantitative and qualitative research .

Quantitative data analysis helps you summarize and interpret numerical results from close-ended questions to understand what is happening. Qualitative data analysis helps you summarize and interpret non-numerical results, like opinions or behavior, to understand why the numbers look like they do.

 If you want to make strong data-driven decisions, you need both.

What are some benefits of quantitative data analysis?

Quantitative data analysis turns numbers into rich insights. Some benefits of this process include: 

Making more confident decisions

Identifying ways to cut costs

Personalizing the user experience

Improving customer satisfaction

What methods can I use to analyze quantitative data?

Quantitative data analysis has two branches: descriptive statistics and inferential statistics. 

Descriptive statistics provide a snapshot of the data’s features by calculating measures like mean, median, and mode. 

Inferential statistics , as the name implies, involves making inferences about what the data means. Dozens of methods exist for this branch of quantitative data analysis, but three commonly used techniques are: 

Cross tabulation

Factor analysis

  • USC Libraries
  • Research Guides

Organizing Your Social Sciences Research Paper

  • Quantitative Methods
  • Purpose of Guide
  • Design Flaws to Avoid
  • Independent and Dependent Variables
  • Glossary of Research Terms
  • Reading Research Effectively
  • Narrowing a Topic Idea
  • Broadening a Topic Idea
  • Extending the Timeliness of a Topic Idea
  • Academic Writing Style
  • Applying Critical Thinking
  • Choosing a Title
  • Making an Outline
  • Paragraph Development
  • Research Process Video Series
  • Executive Summary
  • The C.A.R.S. Model
  • Background Information
  • The Research Problem/Question
  • Theoretical Framework
  • Citation Tracking
  • Content Alert Services
  • Evaluating Sources
  • Primary Sources
  • Secondary Sources
  • Tiertiary Sources
  • Scholarly vs. Popular Publications
  • Qualitative Methods
  • Insiderness
  • Using Non-Textual Elements
  • Limitations of the Study
  • Common Grammar Mistakes
  • Writing Concisely
  • Avoiding Plagiarism
  • Footnotes or Endnotes?
  • Further Readings
  • Generative AI and Writing
  • USC Libraries Tutorials and Other Guides
  • Bibliography

Quantitative methods emphasize objective measurements and the statistical, mathematical, or numerical analysis of data collected through polls, questionnaires, and surveys, or by manipulating pre-existing statistical data using computational techniques . Quantitative research focuses on gathering numerical data and generalizing it across groups of people or to explain a particular phenomenon.

Babbie, Earl R. The Practice of Social Research . 12th ed. Belmont, CA: Wadsworth Cengage, 2010; Muijs, Daniel. Doing Quantitative Research in Education with SPSS . 2nd edition. London: SAGE Publications, 2010.

Need Help Locating Statistics?

Resources for locating data and statistics can be found here:

Statistics & Data Research Guide

Characteristics of Quantitative Research

Your goal in conducting quantitative research study is to determine the relationship between one thing [an independent variable] and another [a dependent or outcome variable] within a population. Quantitative research designs are either descriptive [subjects usually measured once] or experimental [subjects measured before and after a treatment]. A descriptive study establishes only associations between variables; an experimental study establishes causality.

Quantitative research deals in numbers, logic, and an objective stance. Quantitative research focuses on numeric and unchanging data and detailed, convergent reasoning rather than divergent reasoning [i.e., the generation of a variety of ideas about a research problem in a spontaneous, free-flowing manner].

Its main characteristics are :

  • The data is usually gathered using structured research instruments.
  • The results are based on larger sample sizes that are representative of the population.
  • The research study can usually be replicated or repeated, given its high reliability.
  • Researcher has a clearly defined research question to which objective answers are sought.
  • All aspects of the study are carefully designed before data is collected.
  • Data are in the form of numbers and statistics, often arranged in tables, charts, figures, or other non-textual forms.
  • Project can be used to generalize concepts more widely, predict future results, or investigate causal relationships.
  • Researcher uses tools, such as questionnaires or computer software, to collect numerical data.

The overarching aim of a quantitative research study is to classify features, count them, and construct statistical models in an attempt to explain what is observed.

  Things to keep in mind when reporting the results of a study using quantitative methods :

  • Explain the data collected and their statistical treatment as well as all relevant results in relation to the research problem you are investigating. Interpretation of results is not appropriate in this section.
  • Report unanticipated events that occurred during your data collection. Explain how the actual analysis differs from the planned analysis. Explain your handling of missing data and why any missing data does not undermine the validity of your analysis.
  • Explain the techniques you used to "clean" your data set.
  • Choose a minimally sufficient statistical procedure ; provide a rationale for its use and a reference for it. Specify any computer programs used.
  • Describe the assumptions for each procedure and the steps you took to ensure that they were not violated.
  • When using inferential statistics , provide the descriptive statistics, confidence intervals, and sample sizes for each variable as well as the value of the test statistic, its direction, the degrees of freedom, and the significance level [report the actual p value].
  • Avoid inferring causality , particularly in nonrandomized designs or without further experimentation.
  • Use tables to provide exact values ; use figures to convey global effects. Keep figures small in size; include graphic representations of confidence intervals whenever possible.
  • Always tell the reader what to look for in tables and figures .

NOTE:   When using pre-existing statistical data gathered and made available by anyone other than yourself [e.g., government agency], you still must report on the methods that were used to gather the data and describe any missing data that exists and, if there is any, provide a clear explanation why the missing data does not undermine the validity of your final analysis.

Babbie, Earl R. The Practice of Social Research . 12th ed. Belmont, CA: Wadsworth Cengage, 2010; Brians, Craig Leonard et al. Empirical Political Analysis: Quantitative and Qualitative Research Methods . 8th ed. Boston, MA: Longman, 2011; McNabb, David E. Research Methods in Public Administration and Nonprofit Management: Quantitative and Qualitative Approaches . 2nd ed. Armonk, NY: M.E. Sharpe, 2008; Quantitative Research Methods. Writing@CSU. Colorado State University; Singh, Kultar. Quantitative Social Research Methods . Los Angeles, CA: Sage, 2007.

Basic Research Design for Quantitative Studies

Before designing a quantitative research study, you must decide whether it will be descriptive or experimental because this will dictate how you gather, analyze, and interpret the results. A descriptive study is governed by the following rules: subjects are generally measured once; the intention is to only establish associations between variables; and, the study may include a sample population of hundreds or thousands of subjects to ensure that a valid estimate of a generalized relationship between variables has been obtained. An experimental design includes subjects measured before and after a particular treatment, the sample population may be very small and purposefully chosen, and it is intended to establish causality between variables. Introduction The introduction to a quantitative study is usually written in the present tense and from the third person point of view. It covers the following information:

  • Identifies the research problem -- as with any academic study, you must state clearly and concisely the research problem being investigated.
  • Reviews the literature -- review scholarship on the topic, synthesizing key themes and, if necessary, noting studies that have used similar methods of inquiry and analysis. Note where key gaps exist and how your study helps to fill these gaps or clarifies existing knowledge.
  • Describes the theoretical framework -- provide an outline of the theory or hypothesis underpinning your study. If necessary, define unfamiliar or complex terms, concepts, or ideas and provide the appropriate background information to place the research problem in proper context [e.g., historical, cultural, economic, etc.].

Methodology The methods section of a quantitative study should describe how each objective of your study will be achieved. Be sure to provide enough detail to enable the reader can make an informed assessment of the methods being used to obtain results associated with the research problem. The methods section should be presented in the past tense.

  • Study population and sampling -- where did the data come from; how robust is it; note where gaps exist or what was excluded. Note the procedures used for their selection;
  • Data collection – describe the tools and methods used to collect information and identify the variables being measured; describe the methods used to obtain the data; and, note if the data was pre-existing [i.e., government data] or you gathered it yourself. If you gathered it yourself, describe what type of instrument you used and why. Note that no data set is perfect--describe any limitations in methods of gathering data.
  • Data analysis -- describe the procedures for processing and analyzing the data. If appropriate, describe the specific instruments of analysis used to study each research objective, including mathematical techniques and the type of computer software used to manipulate the data.

Results The finding of your study should be written objectively and in a succinct and precise format. In quantitative studies, it is common to use graphs, tables, charts, and other non-textual elements to help the reader understand the data. Make sure that non-textual elements do not stand in isolation from the text but are being used to supplement the overall description of the results and to help clarify key points being made. Further information about how to effectively present data using charts and graphs can be found here .

  • Statistical analysis -- how did you analyze the data? What were the key findings from the data? The findings should be present in a logical, sequential order. Describe but do not interpret these trends or negative results; save that for the discussion section. The results should be presented in the past tense.

Discussion Discussions should be analytic, logical, and comprehensive. The discussion should meld together your findings in relation to those identified in the literature review, and placed within the context of the theoretical framework underpinning the study. The discussion should be presented in the present tense.

  • Interpretation of results -- reiterate the research problem being investigated and compare and contrast the findings with the research questions underlying the study. Did they affirm predicted outcomes or did the data refute it?
  • Description of trends, comparison of groups, or relationships among variables -- describe any trends that emerged from your analysis and explain all unanticipated and statistical insignificant findings.
  • Discussion of implications – what is the meaning of your results? Highlight key findings based on the overall results and note findings that you believe are important. How have the results helped fill gaps in understanding the research problem?
  • Limitations -- describe any limitations or unavoidable bias in your study and, if necessary, note why these limitations did not inhibit effective interpretation of the results.

Conclusion End your study by to summarizing the topic and provide a final comment and assessment of the study.

  • Summary of findings – synthesize the answers to your research questions. Do not report any statistical data here; just provide a narrative summary of the key findings and describe what was learned that you did not know before conducting the study.
  • Recommendations – if appropriate to the aim of the assignment, tie key findings with policy recommendations or actions to be taken in practice.
  • Future research – note the need for future research linked to your study’s limitations or to any remaining gaps in the literature that were not addressed in your study.

Black, Thomas R. Doing Quantitative Research in the Social Sciences: An Integrated Approach to Research Design, Measurement and Statistics . London: Sage, 1999; Gay,L. R. and Peter Airasain. Educational Research: Competencies for Analysis and Applications . 7th edition. Upper Saddle River, NJ: Merril Prentice Hall, 2003; Hector, Anestine. An Overview of Quantitative Research in Composition and TESOL . Department of English, Indiana University of Pennsylvania; Hopkins, Will G. “Quantitative Research Design.” Sportscience 4, 1 (2000); "A Strategy for Writing Up Research Results. The Structure, Format, Content, and Style of a Journal-Style Scientific Paper." Department of Biology. Bates College; Nenty, H. Johnson. "Writing a Quantitative Research Thesis." International Journal of Educational Science 1 (2009): 19-32; Ouyang, Ronghua (John). Basic Inquiry of Quantitative Research . Kennesaw State University.

Strengths of Using Quantitative Methods

Quantitative researchers try to recognize and isolate specific variables contained within the study framework, seek correlation, relationships and causality, and attempt to control the environment in which the data is collected to avoid the risk of variables, other than the one being studied, accounting for the relationships identified.

Among the specific strengths of using quantitative methods to study social science research problems:

  • Allows for a broader study, involving a greater number of subjects, and enhancing the generalization of the results;
  • Allows for greater objectivity and accuracy of results. Generally, quantitative methods are designed to provide summaries of data that support generalizations about the phenomenon under study. In order to accomplish this, quantitative research usually involves few variables and many cases, and employs prescribed procedures to ensure validity and reliability;
  • Applying well established standards means that the research can be replicated, and then analyzed and compared with similar studies;
  • You can summarize vast sources of information and make comparisons across categories and over time; and,
  • Personal bias can be avoided by keeping a 'distance' from participating subjects and using accepted computational techniques .

Babbie, Earl R. The Practice of Social Research . 12th ed. Belmont, CA: Wadsworth Cengage, 2010; Brians, Craig Leonard et al. Empirical Political Analysis: Quantitative and Qualitative Research Methods . 8th ed. Boston, MA: Longman, 2011; McNabb, David E. Research Methods in Public Administration and Nonprofit Management: Quantitative and Qualitative Approaches . 2nd ed. Armonk, NY: M.E. Sharpe, 2008; Singh, Kultar. Quantitative Social Research Methods . Los Angeles, CA: Sage, 2007.

Limitations of Using Quantitative Methods

Quantitative methods presume to have an objective approach to studying research problems, where data is controlled and measured, to address the accumulation of facts, and to determine the causes of behavior. As a consequence, the results of quantitative research may be statistically significant but are often humanly insignificant.

Some specific limitations associated with using quantitative methods to study research problems in the social sciences include:

  • Quantitative data is more efficient and able to test hypotheses, but may miss contextual detail;
  • Uses a static and rigid approach and so employs an inflexible process of discovery;
  • The development of standard questions by researchers can lead to "structural bias" and false representation, where the data actually reflects the view of the researcher instead of the participating subject;
  • Results provide less detail on behavior, attitudes, and motivation;
  • Researcher may collect a much narrower and sometimes superficial dataset;
  • Results are limited as they provide numerical descriptions rather than detailed narrative and generally provide less elaborate accounts of human perception;
  • The research is often carried out in an unnatural, artificial environment so that a level of control can be applied to the exercise. This level of control might not normally be in place in the real world thus yielding "laboratory results" as opposed to "real world results"; and,
  • Preset answers will not necessarily reflect how people really feel about a subject and, in some cases, might just be the closest match to the preconceived hypothesis.

Research Tip

Finding Examples of How to Apply Different Types of Research Methods

SAGE publications is a major publisher of studies about how to design and conduct research in the social and behavioral sciences. Their SAGE Research Methods Online and Cases database includes contents from books, articles, encyclopedias, handbooks, and videos covering social science research design and methods including the complete Little Green Book Series of Quantitative Applications in the Social Sciences and the Little Blue Book Series of Qualitative Research techniques. The database also includes case studies outlining the research methods used in real research projects. This is an excellent source for finding definitions of key terms and descriptions of research design and practice, techniques of data gathering, analysis, and reporting, and information about theories of research [e.g., grounded theory]. The database covers both qualitative and quantitative research methods as well as mixed methods approaches to conducting research.

SAGE Research Methods Online and Cases

  • << Previous: Qualitative Methods
  • Next: Insiderness >>
  • Last Updated: Apr 18, 2024 12:20 PM
  • URL: https://libguides.usc.edu/writingguide

Logo for UEN Digital Press with Pressbooks

Part II: Data Analysis Methods in Quantitative Research

Data analysis methods in quantitative research.

We started this module with levels of measurement as a way to categorize our data. Data analysis is directed toward answering the original research question and achieving the study purpose (or aim). Now, we are going to delve into two main statistical analyses to describe our data and make inferences about our data:

Descriptive Statistics and Inferential Statistics.

Descriptive Statistics:

Before you panic, we will not be going into statistical analyses very deeply. We want to simply get a good overview of some of the types of general statistical analyses so that it makes some sense to us when we read results in published research articles.

Descriptive statistics   summarize or describe the characteristics of a data set. This is a method of simply organizing and describing our data. Why? Because data that are not organized in some fashion are super difficult to interpret.

Let’s say our sample is golden retrievers (population “canines”). Our descriptive statistics  tell us more about the same.

  • 37% of our sample is male, 43% female
  • The mean age is 4 years
  • Mode is 6 years
  • Median age is 5.5 years

Image of golden retriever in field

Let’s explore some of the types of descriptive statistics.

Frequency Distributions : A frequency distribution describes the number of observations for each possible value of a measured variable. The numbers are arranged from lowest to highest and features a count of how many times each value occurred.

For example, if 18 students have pet dogs, dog ownership has a frequency of 18.

We might see what other types of pets that students have. Maybe cats, fish, and hamsters. We find that 2 students have hamsters, 9 have fish, 1 has a cat.

You can see that it is very difficult to interpret the various pets into any meaningful interpretation, yes?

Now, let’s take those same pets and place them in a frequency distribution table.                          

As we can now see, this is much easier to interpret.

Let’s say that we want to know how many books our sample population of  students have read in the last year. We collect our data and find this:

We can then take that table and plot it out on a frequency distribution graph. This makes it much easier to see how the numbers are disbursed. Easier on the eyes, yes?

Chart, histogram Description automatically generated

Here’s another example of symmetrical, positive skew, and negative skew:

Understanding Descriptive Statistics | by Sarang Narkhede | Towards Data Science

Correlation : Relationships between two research variables are called correlations . Remember, correlation is not cause-and-effect. Correlations  simply measure the extent of relationship between two variables. To measure correlation in descriptive statistics, the statistical analysis called Pearson’s correlation coefficient I is often used.  You do not need to know how to calculate this for this course. But, do remember that analysis test because you will often see this in published research articles. There really are no set guidelines on what measurement constitutes a “strong” or “weak” correlation, as it really depends on the variables being measured.

However, possible values for correlation coefficients range from -1.00 through .00 to +1.00. A value of +1 means that the two variables are positively correlated, as one variable goes up, the other goes up. A value of r = 0 means that the two variables are not linearly related.

Often, the data will be presented on a scatter plot. Here, we can view the data and there appears to be a straight line (linear) trend between height and weight. The association (or correlation) is positive. That means, that there is a weight increase with height. The Pearson correlation coefficient in this case was r = 0.56.

analysis techniques in quantitative research

A type I error is made by rejecting a null hypothesis that is true. This means that there was no difference but the researcher concluded that the hypothesis was true.

A type II error is made by accepting that the null hypothesis is true when, in fact, it was false. Meaning there was actually a difference but the researcher did not think their hypothesis was supported.

Hypothesis Testing Procedures : In a general sense, the overall testing of a hypothesis has a systematic methodology. Remember, a hypothesis is an educated guess about the outcome. If we guess wrong, we might set up the tests incorrectly and might get results that are invalid. Sometimes, this is super difficult to get right. The main purpose of statistics is to test a hypothesis.

  • Selecting a statistical test. Lots of factors go into this, including levels of measurement of the variables.
  • Specifying the level of significance. Usually 0.05 is chosen.
  • Computing a test statistic. Lots of software programs to help with this.
  • Determining degrees of freedom ( df ). This refers to the number of observations free to vary about a parameter. Computing this is easy (but you don’t need to know how for this course).
  • Comparing the test statistic to a theoretical value. Theoretical values exist for all test statistics, which is compared to the study statistics to help establish significance.

Some of the common inferential statistics you will see include:

Comparison tests: Comparison tests look for differences among group means. They can be used to test the effect of a categorical variable on the mean value of some other characteristic.

T-tests are used when comparing the means of precisely two groups (e.g., the average heights of men and women). ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults).

  • t -tests (compares differences in two groups) – either paired t-test (example: What is the effect of two different test prep programs on the average exam scores for students from the same class?) or independent t-test (example: What is the difference in average exam scores for students from two different schools?)
  • analysis of variance (ANOVA, which compares differences in three or more groups) (example: What is the difference in average pain levels among post-surgical patients given three different painkillers?) or MANOVA (compares differences in three or more groups, and 2 or more outcomes) (example: What is the effect of flower species on petal length, petal width, and stem length?)

Correlation tests: Correlation tests check whether variables are related without hypothesizing a cause-and-effect relationship.

  • Pearson r (measures the strength and direction of the relationship between two variables) (example: How are latitude and temperature related?)

Nonparametric tests: Non-parametric tests don’t make as many assumptions about the data, and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren’t as strong as with parametric tests.

  • chi-squared ( X 2 ) test (measures differences in proportions). Chi-square tests are often used to test hypotheses. The chi-square statistic compares the size of any discrepancies between the expected results and the actual results, given the size of the sample and the number of variables in the relationship. For example, the results of tossing a fair coin meet these criteria. We can apply a chi-square test to determine which type of candy is most popular and make sure that our shelves are well stocked. Or maybe you’re a scientist studying the offspring of cats to determine the likelihood of certain genetic traits being passed to a litter of kittens.

Inferential Versus Descriptive Statistics Summary Table

Statistical Significance Versus Clinical Significance

Finally, when it comes to statistical significance  in hypothesis testing, the normal probability value in nursing is <0.05. A p=value (probability) is a statistical measurement used to validate a hypothesis against measured data in the study. Meaning, it measures the likelihood that the results were actually observed due to the intervention, or if the results were just due by chance. The p-value, in measuring the probability of obtaining the observed results, assumes the null hypothesis is true.

The lower the p-value, the greater the statistical significance of the observed difference.

In the example earlier about our diabetic patients receiving online diet education, let’s say we had p = 0.05. Would that be a statistically significant result?

If you answered yes, you are correct!

What if our result was p = 0.8?

Not significant. Good job!

That’s pretty straightforward, right? Below 0.05, significant. Over 0.05 not   significant.

Could we have significance clinically even if we do not have statistically significant results? Yes. Let’s explore this a bit.

Statistical hypothesis testing provides little information for interpretation purposes. It’s pretty mathematical and we can still get it wrong. Additionally, attaining statistical significance does not really state whether a finding is clinically meaningful. With a large enough sample, even a small very tiny relationship may be statistically significant. But, clinical significance  is the practical importance of research. Meaning, we need to ask what the palpable effects may be on the lives of patients or healthcare decisions.

Remember, hypothesis testing cannot prove. It also cannot tell us much other than “yeah, it’s probably likely that there would be some change with this intervention”. Hypothesis testing tells us the likelihood that the outcome was due to an intervention or influence and not just by chance. Also, as nurses and clinicians, we are not concerned with a group of people – we are concerned at the individual, holistic level. The goal of evidence-based practice is to use best evidence for decisions about specific individual needs.

analysis techniques in quantitative research

Additionally, begin your Discussion section. What are the implications to practice? Is there little evidence or a lot? Would you recommend additional studies? If so, what type of study would you recommend, and why?

analysis techniques in quantitative research

  • Were all the important results discussed?
  • Did the researchers discuss any study limitations and their possible effects on the credibility of the findings? In discussing limitations, were key threats to the study’s validity and possible biases reviewed? Did the interpretations take limitations into account?
  • What types of evidence were offered in support of the interpretation, and was that evidence persuasive? Were results interpreted in light of findings from other studies?
  • Did the researchers make any unjustifiable causal inferences? Were alternative explanations for the findings considered? Were the rationales for rejecting these alternatives convincing?
  • Did the interpretation consider the precision of the results and/or the magnitude of effects?
  • Did the researchers draw any unwarranted conclusions about the generalizability of the results?
  • Did the researchers discuss the study’s implications for clinical practice or future nursing research? Did they make specific recommendations?
  • If yes, are the stated implications appropriate, given the study’s limitations and the magnitude of the effects as well as evidence from other studies? Are there important implications that the report neglected to include?
  • Did the researchers mention or assess clinical significance? Did they make a distinction between statistical and clinical significance?
  • If clinical significance was examined, was it assessed in terms of group-level information (e.g., effect sizes) or individual-level results? How was clinical significance operationalized?

References & Attribution

“ Green check mark ” by rawpixel licensed CC0 .

“ Magnifying glass ” by rawpixel licensed CC0

“ Orange flame ” by rawpixel licensed CC0 .

Polit, D. & Beck, C. (2021).  Lippincott CoursePoint Enhanced for Polit’s Essentials of Nursing Research  (10th ed.). Wolters Kluwer Health 

Vaid, N. K. (2019) Statistical performance measures. Medium. https://neeraj-kumar-vaid.medium.com/statistical-performance-measures-12bad66694b7

Evidence-Based Practice & Research Methodologies Copyright © by Tracy Fawns is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Published: 11 April 2024

Quantitative text analysis

  • Kristoffer L. Nielbo   ORCID: orcid.org/0000-0002-5116-5070 1 ,
  • Folgert Karsdorp 2 ,
  • Melvin Wevers   ORCID: orcid.org/0000-0001-8177-4582 3 ,
  • Alie Lassche   ORCID: orcid.org/0000-0002-7607-0174 4 ,
  • Rebekah B. Baglini   ORCID: orcid.org/0000-0002-2836-5867 5 ,
  • Mike Kestemont 6 &
  • Nina Tahmasebi   ORCID: orcid.org/0000-0003-1688-1845 7  

Nature Reviews Methods Primers volume  4 , Article number:  25 ( 2024 ) Cite this article

2930 Accesses

53 Altmetric

Metrics details

  • Computational science
  • Interdisciplinary studies

Text analysis has undergone substantial evolution since its inception, moving from manual qualitative assessments to sophisticated quantitative and computational methods. Beginning in the late twentieth century, a surge in the utilization of computational techniques reshaped the landscape of text analysis, catalysed by advances in computational power and database technologies. Researchers in various fields, from history to medicine, are now using quantitative methodologies, particularly machine learning, to extract insights from massive textual data sets. This transformation can be described in three discernible methodological stages: feature-based models, representation learning models and generative models. Although sequential, these stages are complementary, each addressing analytical challenges in the text analysis. The progression from feature-based models that require manual feature engineering to contemporary generative models, such as GPT-4 and Llama2, signifies a change in the workflow, scale and computational infrastructure of the quantitative text analysis. This Primer presents a detailed introduction of some of these developments, offering insights into the methods, principles and applications pertinent to researchers embarking on the quantitative text analysis, especially within the field of machine learning.

You have full access to this article via your institution.

Similar content being viewed by others

analysis techniques in quantitative research

Augmenting interpretable models with large language models during training

Chandan Singh, Armin Askari, … Jianfeng Gao

analysis techniques in quantitative research

An open source machine learning framework for efficient and transparent systematic reviews

Rens van de Schoot, Jonathan de Bruin, … Daniel L. Oberski

analysis techniques in quantitative research

The shaky foundations of large language models and foundation models for electronic health records

Michael Wornow, Yizhe Xu, … Nigam H. Shah

Introduction

Qualitative analysis of textual data has a long research history. However, a fundamental shift occurred in the late twentieth century when researchers began investigating the potential of computational methods for text analysis and interpretation 1 . Today, researchers in diverse fields, such as history, medicine and chemistry, commonly use the quantification of large textual data sets to uncover patterns and trends, producing insights and knowledge that can aid in decision-making and offer novel ways of viewing historical events and current realities. Quantitative text analysis (QTA) encompasses a range of computational methods that convert textual data or natural language into structured formats before subjecting them to statistical, mathematical and numerical analysis. With the increasing availability of digital text from numerous sources, such as books, scientific articles, social media posts and online forums, these methods are becoming increasingly valuable, facilitated by advances in computational technology.

Given the widespread application of QTA across disciplines, it is essential to understand the evolution of the field. As a relatively consolidated field, QTA embodies numerous methods for extracting and structuring information in textual data. It gained momentum in the late 1990s as a subset of the broader domain of data mining, catalysed by advances in database technologies, software accessibility and computational capabilities 2 , 3 . However, it is essential to recognize that the evolution of QTA extends beyond computer science and statistics. It has heavily incorporated techniques and algorithms derived from  corpus linguistics 4 , computer linguistics 5 and information retrieval 6 . Today, QTA is largely driven by  machine learning , a crucial component of  data science , artificial intelligence (AI) and natural language processing (NLP).

Methods of QTA are often referred to as techniques that are innately linked with specific tasks (Table  1 ). For example, the sentiment analysis aims to determine the emotional tone of a text 7 , whereas entity and concept extraction seek to identify and categorize elements in a text, such as names, locations or key themes 8 , 9 . Text classification refers to the task of sorting texts into groups with predefined labels 10 — for example, sorting news articles into semantic categories such as politics, sports or entertainment. In contrast to machine-learning tasks that use supervised learning , text clustering, which uses  unsupervised learning , involves finding naturally occurring groups in unlabelled texts 11 . A significant subset of tasks primarily aim to simplify and structure natural language. For example, representation learning includes tasks that automatically convert texts into numerical representations, which can then be used for other tasks 12 . The lines separating these techniques can be blurred and often vary depending on the research context. For example, topic modelling, a type of statistical modelling used for concept extraction, serves simultaneously as a clustering and representation learning technique 13 , 14 , 15 .

QTA, similar to machine learning, learns from observation of existing data rather than by manipulating variables as in scientific experiments 16 . In QTA, experiments encompass the design and implementation of empirical tests to explore and evaluate the performance of models, algorithms and techniques in relation to specific tasks and applications. In practice, this involves a series of steps. First, text data are collected from real-world sources such as newspaper articles, patient records or social media posts. Then, a specific type of machine-learning model is selected and designed. The model could be a tree-based decision model, a clustering technique or more complex encoder–decoder models for tasks such as translation. Subsequently, the selected model is trained on the collected data, learning to make categorizations or predictions based on the data. The performance of the model is evaluated using predominantly intrinsic performance metrics (such as accuracy for a classification task) and, to a lesser degree, extrinsic metrics that measure how the output of the model impacts a broader task or system.

Three distinct methodological stages can be observed in the evolution of QTA: feature-based models, representation learning models and generative models (Fig.  1 ). Feature-based models use efficient machine-learning techniques, collectively referred to as shallow learning, which are ideal for tabular data but require manual feature engineering. They include models based on  bag-of-words models , decision trees and support vector machines and were some of the first methods applied in QTA. Representation learning models use deep learning techniques that automatically learn useful features from text. These models include architectures such as the highly influential  transformer architecture 17 and techniques such as masked language modelling, as used in language representation models such as Bidirectional Encoder Representations from Transformers (BERT) 18 . BERT makes use of the transformer architecture, as do most other large language models after the introduction of the architecture 17 . This shift towards automatic learning representations marked an important advance in natural language understanding. Generative models, trained using autoregressive techniques, represent the latest frontier. These models, such as generative pre-trained transformer GPT-3 (ref. 19 ), GPT-4 and Llama2 (ref. 20 ), can generate coherent and contextually appropriate responses and are powerful tools for natural language generation. Feature-based models preceded representation learning, which in turn preceded generative models.

figure 1

a , Feature-based models in which data undergo preprocessing to generate features for model training and prediction. b , Representation learning models that can be trained from scratch using raw data or leverage pre-trained models fine-tuned with specific data. c , Generative models in which a prompt guides the generative deep learning model, potentially augmented by external data, to produce a result.

Although these models are temporally ordered, they do not replace each other. Instead, each offers unique methodological features and is suitable for different tasks. The progress from small models with limited computing capacity to today’s large models with billions of parameters encapsulates the transformation in the scale and complexity of the QTA.

The evolution of these models reflects the advancement of machine-learning infrastructure, particularly in the emergence and development of tooling frameworks. These frameworks, exemplified by platforms such as scikit-learn 21 and Hugging Face 22 , have served as essential infrastructure for democratizing and simplifying the implementation of increasingly sophisticated models. They offer user-friendly interfaces that mask the complexities of the algorithms, thereby empowering researchers to harness advanced methodologies with minimal prerequisite knowledge and coding expertise. The advent of high-level generative models such as GPT-3 (ref. 19 ), GPT-4 and Llama2 (ref. 20 ) marks milestones in the progression. Renowned for their unprecedented language understanding and generation capabilities, these models have the potential to redefine access to the sophisticated text analysis by operating on natural language prompts, effectively bypassing the traditional need for coding. It is important to emphasize that these stages represent an abstraction that points to fundamental changes to the workflow and underlying infrastructure of QTA.

This Primer offers an accessible introduction to QTA methods, principles and applications within feature-based models, representation learning and generative models. The focus is on how to extract and structure textual data using machine learning to enable quantitative analysis. The Primer is particularly suitable for researchers new to the field with a pragmatic interest in these techniques. By focusing on machine-learning methodologies, a comprehensive overview of several key workflows currently in use is presented. The focus consciously excludes traditional count-based and rule-based methods, such as keyword and collocation analysis. This decision is guided by the current dominance of machine learning in QTA, in terms of both performance and scalability. However, it is worth noting that machine-learning methods can encompass traditional approaches where relevant, adding to their versatility and broad applicability. The experiments in QTA are presented, including problem formulation, data collection, model selection and evaluation techniques. The results and real-world applications of these methodologies are discussed, underscoring the importance of reproducibility and robust data management practices. The inherent limitations and potential optimizations within the field are addressed, charting the evolution from basic feature-based approaches to advanced generative models. The article concludes with a forward-looking discussion on the ethical implications, practical considerations and methodological advances shaping the future of QTA. Regarding tools and software, references to specific libraries and packages are omitted as they are relatively easy to identify given a specific task. Generally, the use of programming languages that are well suited for QTA is recommended, such as Python, R and Julia, but it is also acknowledged that graphical platforms for data analysis provide similar functionalities and may be better suited for certain disciplines.

Experimentation

In QTA, the term experiment assumes a distinct character. Rather than mirroring the controlled conditions commonly associated with randomized controlled trials, it denotes a structured procedure that aims to validate, refine and compare models and findings. QTA experiments provide a platform for testing ideas, establishing hypotheses and paving the way for advancement. At the heart of these experiments lies a model — a mathematical and computational embodiment of discernible patterns drawn from data. A model can be considered a learned function that captures the intricate relationship between textual features and their intended outcomes, allowing for informed decisions on unseen data. For example, in the sentiment analysis, a model learns the association between specific words or phrases and the emotions they convey, later using this knowledge to assess the sentiment of new texts.

The following section delineates the required steps for a QTA experiment. This step-by-step description encompasses everything from problem definition and data collection to the nuances of model selection, training and validation. It is important to distinguish between two approaches in QTA: training or fine-tuning a model, and applying a (pre-trained) model (Fig.  1 ). In the first approach, a model is trained or fine-tuned to solve a QTA task. In the second approach, a pre-trained model is used to solve a QTA task. Finally, it is important to recognize that experimentation, much like other scientific pursuits, is inherently iterative. This cyclic process ensures that the devised models are not just accurate but also versatile enough to be applicable in real-world scenarios.

Problem formulation

Problem formulation is a crucial first step in QTA, laying the foundation for subsequent analysis and experimentation. This process involves several key considerations, which, when clearly defined beforehand, contributes to the clarity and focus of the experiment. First, every QTA project begins with the identification of a research question. The subsequent step is to determine the scope of the analysis, which involves defining the boundaries of the study, such as the time period, the type of texts to be analysed or the geographical or demographic considerations.

An integral part of this process is to identify the nature of the analytical task. This involves deciding whether the study is a classification task, for example, in which data are categorized into predefined classes; a clustering task, in which data are grouped based on similarities without predefined categories; or another type of analysis. The choice of task has significant implications for both the design of the study and the selection of appropriate data and analytical techniques. For instance, a classification task such as sentiment analysis requires clearly defined categories and suitable labelled data, whereas a clustering task might be used in the exploratory data analysis to uncover underlying patterns in the data.

After selecting data to support the analysis, an important next step is deciding on the level of analysis. QTA can be conducted at various levels, such as the document-level, paragraph-level, sentence-level or even word-level. The choice largely depends on the research question, as well as the nature of the data set.

Classification

A common application of a classification task in QTA is the sentiment analysis. For instance, in analysing social media comments, a binary classification might be employed in which comments are labelled as positive or negative. This straightforward example showcases the formulation of a problem in which the objective is clear-cut classification based on predefined sentiment labels. In this case, the level of analysis might be at the sentence level, focusing on the sentiment expressed in each individual comment.

From this sentence-level information, it is possible to extrapolate to general degrees of sentiment. This is often done when companies want to survey their products or when political parties want to analyse their support, for example, to determine how many people are positive or negative towards the party 23 . Finally, from changing degrees of sentiment, one can extract the most salient aspects that form this sentiment: recurring positive or negative sentiments towards price or quality, or different political issues.

Modelling of themes

The modelling of themes involves the identification of prevalent topics, for example, in a collection of news articles. Unlike the emotion classification task, here the researcher is interested in uncovering underlying themes or topics, rather than classifying texts into predefined categories. This problem formulation requires an approach that can discern and categorize emergent topics from the textual data, possibly at the document level, to capture broader thematic elements. This can be done without using any predefined hypotheses 24 , or by steering topic models towards certain seed topics (such as a given scientific paper or book) 25 . Using such topic detection tools, it can be determined how prevalent topics are in different time periods or across genre to determine significance or impact of both topics and authors.

Modelling of temporal change

Consider a study aiming to track the evolution of literary themes over time. In this scenario, the problem formulation would involve not only the selection of texts and features but also a temporal dimension, in which changes in themes are analysed across different time periods. This type of analysis might involve examining patterns and trends in literary themes, requiring a longitudinal approach to text analysis, for example, in the case of scientific themes or reports about important events 26 or themes as proxy for meaning change 27 . Often, when longitudinal analysis is considered, additional challenges are involved, such as statistical properties relating to increasing or decreasing quantity or quality of data that can influence results, see, for example, refs. 28 , 29 , 30 , 31 .

In similar fashion, temporal analysis of changing data happens in a multitude of disciplines from linguistics, as in computational detection of words that experience change in meaning 32 , to conceptual change in history 33 , poetry 34 , medicine 35 , political science 36 , 37 and to the study of ethnical biases and racism 38 , 39 , 40 .

The GIGO principle, meaning ‘garbage in, garbage out’, is ever present in QTA because without high-quality data even the most sophisticated models can falter, rendering analyses inaccurate or misleading. To ensure robustness in, for example, social media data, its inherently informal and dynamic nature must be acknowledged, often characterized by non-standard grammar, slang and evolving language use. Robustness here refers to the ability of the data to provide reliable, consistent analysis, despite these irregularities. This requires implementing specialized preprocessing techniques that can handle such linguistic variability without losing contextual meaning. For example, rather than discarding non-standard expressions or internet-specific abbreviations, these elements should be carefully processed to preserve their significant role in conveying sentiment and meaning. Additionally, ensuring representativeness and diversity in the data set is crucial; collecting data across different demographics, topics and time frames can mitigate biases and provide a more comprehensive view of the discourse if this is needed. Finally, it is important to pay attention to errors, anomalies and irregularities in the data, such as optical character recognition errors and missing values, and in some cases take steps to remediate these in preprocessing. More generally, it is crucial to emphasize that the quality of a given data set depends on the research question. Grammatically well-formed sentences may be high-quality data for training a linguistic parser; social media could never be studied as people on social media rarely abide by the rules of morphology and syntax. This underscores the vital role of data not just as input but also as an essential component that dictates the success and validity of the analytical endeavour.

Data acquisition

Depending on the research objective, data sets can vary widely in their characteristics. For the emotion classifier, a data set could consist of many social media comments. If the task is to train or fine-tune a model, each comment should be annotated with its corresponding sentiment label (labels). If the researcher wants to apply a pre-trained model, then only a subset of the data must be annotated to test the generalizability of the model. Labels can be annotated manually or automatically, for instance, by user-generated ratings, such as product reviews or social media posts, for example. Training data should have sufficient coverage of the phenomenon under investigation to capture its linguistic characteristics. For the emotion classifier, a mix of comments are needed, ranging from brief quips to lengthy rants, offering diverse emotional perspectives. Adhering to the principle that there are no data like more data, the breadth and depth of such a data set significantly enhance the accuracy of the model. Traditionally, data collection was arduous, but today QTA researchers can collect data from the web and archives using dedicated software libraries or an  application programming interface . For analogue data, optical character recognition and handwritten text recognition offer efficient conversion to machine-readable formats 41 . Similarly, for auditory language data, automatic speech recognition has emerged as an invaluable tool 42 .

Data preprocessing

In feature-based QTA, manual data preprocessing is one of the most crucial and time-consuming stages. Studies suggest that researchers can spend up to 80% of their project time refining and managing their data 43 . A typical preprocessing workflow for feature-based techniques requires data cleaning and text normalization. Standard procedures include transforming all characters to lower case for uniformity, eliminating punctuation marks and removing high-frequency functional words such as ‘and’, ‘the’ or ‘is’. However, it is essential to recognize that these preprocessing strategies should be closely aligned with the specific research question at hand. For example, in the sentiment analysis, retaining emotive terms and expressions is crucial, whereas in syntactic parsing, the focus might be on the structural elements of language, requiring a different approach to what constitutes ‘noise’ in the data. More nuanced challenges arise in ensuring the integrity of a data set. For instance, issues with character encoding require attention to maintain language and platform interoperability, which means resorting to universally accepted encoding formats such as UTF-8. Other normalization steps, such as  stemming or lemmatization , involve reducing words to their root forms to reduce lexical variation. Although these are standard practices, their application might vary depending on the research objective. For example, in a study focusing on linguistic diversity, aggressive stemming may erase important stylistic or dialectal markers. Many open-source software libraries exist nowadays that can help automate such processes for various languages. The impact of these steps on research results underscores the necessity of a structured and well-documented approach to preprocessing, including detailed reporting of all preprocessing steps and software used, to ensure that analyses are both reliable and reproducible. The practice of documenting preprocessing is crucial, yet often overlooked, reinforcing its importance for the integrity of research.

With representation learning and generative techniques, QTA has moved towards end-to-end models that take raw text input such as social media comments and directly produces the final desired output such as emotion classification, handling all intermediate steps without manual intervention 44 . However, removal of non-textual artefacts such as HTML codes and unwanted textual elements such as pornographic material can still require substantial work to prepare data to train an end-to-end model.

Annotation and labelling

Training and validating a (pre-trained) model requires annotating the textual data set. These data sets come in two primary flavours: pre-existing collections with established labels and newly curated sets awaiting annotation. Although pre-existing data sets offer a head-start, owing to their readymade labels, they must be validated to ensure alignment with research objectives. By contrast, crafting a data set from scratch confers flexibility to tailor the data to precise research needs, but it also ushers in the intricate task of collecting and annotating data. Annotation is a meticulous endeavour that demands rigorous consistency and reliability. To ensure inter-annotator agreement (IAA) 45 , for example, annotations from multiple annotators are compared using metrics such as  Fleiss’ kappa ( κ ) to assess consistency. A high IAA score not only indicates annotation consistency but also lends confidence in the reliability of the data set. There is no universally accepted manner to interpret κ statistics, although κ  ≥ 0. 61 is generally considered to indicate ‘substantial agreement’ 46 .

Various tools and platforms support the annotation process. Specialized software for research teams provides controlled environments for annotation tasks. Crowdsourcing is another approach, in which tasks are distributed among a large group of people. This can be done through non-monetized campaigns, focusing on volunteer participation or gamification strategies to encourage user engagement in annotation tasks 47 . Monetized platforms, such as Amazon Mechanical Turk, represent a different facet of crowdsourcing in which microtasks are outsourced for financial compensation. It is important to emphasize that, although these platforms offer a convenient way to gather large-scale annotations, they raise ethical concerns regarding worker exploitation and fair compensation. Critical studies, such as those of Paolacci, Chandler and Ipeirotis 48 and Bergvall-Kåreborn and Howcroft 49 , highlight the need for awareness and responsible use of such platforms in research contexts.

Provenance and ethical considerations

Data provenance is of utmost importance in QTA. Whenever feasible, preference should be given to open and well-documented data sets that comply with the principles of FAIR (findable, accessible, interoperable and reusable) 50 . However, the endeavour to harness data, especially online, requires both legal and ethical considerations. For instance, the General Data Protection Regulation delineates the rights of European data subjects and sets stringent data collection and usage criteria. Unstructured data can complicate standard techniques for data depersonalization (for example, data masking, swapping and pseudonymization). Where these techniques fail, differential privacy may be a viable alternative to ensure that the probability of any specific output of the model does not depend on the information of any individual in the data set 51 .

Recognition of encoded biases is equally important. Data sets can inadvertently perpetuate cultural biases towards attributes such as gender and race, resulting in sampling bias. Such bias compromises research integrity and can lead to models that reinforce existing inequalities. Gender, for instance, can have subtle effects that are not easily detected in textual data 52 . A popular approach to rectifying biases is  data augmentation , which can be used to increase the diversity of a data set without collecting new data 53 . This is achieved by applying transformations to existing textual data, creating new and diverse examples. The main goal of data augmentation is to improve model generalization by exposing it to a broader range of data variations.

Model selection and design

Model selection and design set the boundaries for efficiency, accuracy and generalizability of any QTA experiment. Choosing the right model architecture depends on several considerations and will typically require experimentation to compare the performance of multiple models. Although the methodological trajectory of QTA provides a roadmap, specific requirements of the task, coupled with available data volume, often guide the final choice. Although some tasks require that the model be trained from scratch owing to, for instance, transparency and security requirements, it has become common to use pre-trained models that provide text representations originating from training on massive data sets. Pre-trained models can be fine-tuned for a specific task, for example, emotion classification. Training feature-based models may be optimal for smaller data sets, focusing on straightforward interpretability. By contrast, the complexities of expansive textual data often require representation learning or generative models. In QTA, achieving peak performance is a trade-off among model interpretability, computational efficiency and predictive power. As the sophistication of a model grows, hyperparameter tuning, regularization and loss function require meticulous consideration. These decisions ensure that a model is not only accurate but also customized for research-specific requirements.

Training and evaluation

During the training phase, models learn patterns from the data to predict or classify textual input. Evaluation is the assessment phase that determines how the trained model performs on unseen data. Evaluation serves multiple purposes, but first and foremost, it is used to assess how well the model performs on a specific task using metrics such as accuracy, precision and recall. For example, knowing how accurately the emotion classifier identifies emotions is crucial for any research application. Evaluation of this model also allows researchers to assess whether it is biased towards common emotions and whether it generalizes across different types of text sources. When an emotion classifier is trained on social media posts, a common practice, its effectiveness can be evaluated on different data types, such as patient journals or historical newspapers, to determine its performance across varied contexts. Evaluation enables us to compare multiple models to select the most relevant for the research problem. Additional evaluation involves hyperparameter tuning, resource allocation, benchmarking and model fairness audits.

Overfitting is often a challenge in model training, which can occur when a model is excessively tailored to the peculiarities of the training data and becomes so specialized that its generalizability is compromised. Such a model performs accurately on the specific data set but underperforms on unseen examples. Overfitting can be counteracted by dividing the data into three distinct subsets: the training set, the validation set and the test set. The training set is the primary data set from which the model learns patterns, adjusts its weights and fine-tunes itself based on the labelled examples provided. The validation set is used to monitor and assess the performance of the model during training. It acts as a checkpoint, guides hyperparameter tuning and ensures that the model is not veering off track. The test set is the final held-out set on which the performance of the model is evaluated. The test set is akin to a final examination, assessing how well the model generalizes to unseen data. If a pre-trained model is used, only the data sets used to fine-tune the model are necessary to evaluate the model.

The effectiveness of any trained model is gauged not just by how well it fits the training data but also by its performance on unseen samples. Evaluation metrics provide objective measures to assess performance on validation and test sets as well as unseen examples. The evaluation process is fundamental to QTA experiments, as demonstrated in the text classification research 10 . Several evaluation metrics are used to measure performance. The most prominent are accuracy (the proportion of all predictions that are correct), precision (the proportion of positive predictions that are actually correct) and recall (the proportion of actual positives that were correctly identified). The F1 score amalgamates precision and recall and emerges as a balanced metric, especially when class distributions are skewed. An effective evaluation typically uses various complementary metrics.

In QTA, a before-and-after dynamic often emerges, encapsulating the transformation from raw data to insightful conclusions 54 . This paradigm is especially important in QTA, in which the raw textual data can be used to distil concrete answers to research questions. In the preceding section, the preliminary before phase, the process of setting up an experiment in QTA, is explored with emphasis on the importance of model training and thorough evaluation to ensure robustness. For the after phase, the focus pivots to the critical step of applying the trained model to new, unseen data, aiming to answer the research questions that guide exploration.

Research questions in QTA are often sophisticated and complex, encompassing a range of inquiries either directly related to the text being analysed or to the external phenomena the text reflects. The link between the output of QTA models and the research question is often vague and under-specified. When dealing with a complex research question, for example, the processes that govern the changing attitudes towards different migrant groups, the outcome of any one QTA model is often insufficient. Even several models might not provide a complete answer to the research question. Consequently, challenges surface during the transition from before to after, from setting up and training to applying and validating. One primary obstacle is the validation difficulty posed by the uniqueness and unseen nature of the new data.

Validating QTA models on new, unseen data introduces a layer of complexity that highlights the need for robust validation strategies, to ensure stability, generalizability and replicability of results. Although the effectiveness of a model might have been calibrated in a controlled setup, its performance can oscillate when exposed to the multifaceted layers of new real-world data. Ensuring consistent model performance is crucial to deriving meaningful conclusions aligned with the research question. This dual approach of applying the model and subsequently evaluating its performance in fresh terrains is central to the after phase of QTA. In addition to validating the models, the results that stem from the models need to be validated with respect to the research question. The results need to be representative for the data as a whole; they need to be stable such that the answer does not change if different choices are made in the before phase; and they need to provide an answer to the research question at hand.

This section provides a road map for navigating the application of QTA models to new data and a chart of methodologies for evaluating the outcomes in line with the research question (questions). The goal is to help researchers cross the bridge between the theoretical foundations of QTA and its practical implementation, illuminating the steps that support the successful application and assessment of QTA models. The ensuing discussion covers validation strategies that cater to the challenges brought forth by new data, paving the way towards more insightful analysis.

Application to new data

After the training and evaluation phases have been completed, the next step is applying the trained model to new, unseen data (Fig.  2 ). The goal is to ensure that the application aligns with the research questions and aids in extracting meaningful insights. However, applying the model to new data is not without challenges.

figure 2

Although the illustration demonstrates a feature-based modelling approach, the fundamental principle remains consistent across different methodologies, be it feature-based, representation learning or generative. A critical consideration is ensuring the consistency in content and preprocessing between the training data and any new data subjected to inference.

Before application of the model, it is crucial to preprocess the new data similar to the training data. This involves routine tasks such as tokenization and lemmatization, but also demands vigilance for anomalies such as divergent text encoding formats or missing values. In such cases, additional preprocessing steps might be required and should be documented carefully to ensure reproducibility.

Another potential hurdle is the discrepancy in data distributions between the training data and new data, often referred to as domain shift. If not addressed, domain shifts may hinder the efficacy of the model. Even thematically, new data may unearth categories or motifs that were absent during training, thus challenging the interpretative effectiveness of the model. In such scenarios, transfer learning or domain adaptation techniques are invaluable tools for adjusting the model so that it aligns better with the characteristics of the new data. In transfer learning, a pre-trained model provides general language understanding and is fine-tuned with a small data set for a specific task (for example, fine-tuning a large language model such as GPT or BERT for emotion classification) 55 , 56 . Domain adaptation techniques similarly adjust a model from a source domain to a target domain; for example, an emotion classifier trained on customer reviews can be adapted to rate social media comments.

Given the iterative nature of QTA, applying a model is not necessarily an end point; it may simply be a precursor to additional refinement and analysis. Therefore, the adaptability of the validation strategies is paramount. As nuances in the new data are uncovered, validation strategies may need refinement or re-adaptation to ensure the predictions of the model remain accurate and insightful, ensuring that the answers to the research questions are precise and meaningful. Through careful application and handling of the new data, coupled with adaptable validation strategies, researchers can significantly enhance the value of their analysis in answering the research question.

Evaluation metrics

QTA models are often initially developed and validated on well-defined data sets, ensuring their reliability in controlled settings. This controlled environment allows researchers to set aside a held-out test set to gauge the performance of a model, simulating how it will fare on new data. The real world, however, is considerably more complex than any single data set can capture. The challenge is how to transition from a controlled setting to novel data sets.

One primary challenge is the mismatch between the test set and real-world texts. Even with the most comprehensive test sets, capturing the linguistic variation, topic nuance and contextual subtlety present in new data sets is not a trivial task, and researchers should not be overconfident regarding the universal applicability of a model 57 . The situation does not become less complicated when relying on pre-trained or off-the-shelf models. The original training data and its characteristics might not be transparent or known with such models. Without appropriate documentation, predicting the behaviour of a model on new data may become a speculative endeavour 58 .

The following sections summarize strategies for evaluating models on new data.

Model confidence scores

In QTA, models often generate confidence or probability scores alongside predictions, indicating the confidence of the model in its accuracy. However, high scores do not guarantee correctness and can be misleading. Calibrating the model refines these scores to align better with true label likelihoods 59 . This is especially crucial in high-stakes QTA applications such as legal or financial text analysis 60 . Calibration techniques adjust the original probability estimates, enhancing model reliability and the trustworthiness of predictions, thereby addressing potential discrepancies between the expressed confidence of the model and its actual performance.

Precision at k

Precision at k (P@ k ) is useful for tasks with rankable predictions, such as determining document relevance. P@ k measures the proportion of relevant items among the top- k ranked items, providing a tractable way to gauge the performance of a model on unseen data by focusing on a manageable subset, especially when manual evaluation of the entire data set is infeasible. Although primarily used in information retrieval and recommender system , its principles apply to QTA, in which assessing the effectiveness of a model in retrieving or categorizing relevant texts is crucial.

External feedback mechanisms

Soliciting feedback from domain experts is invaluable in evaluating models on unseen data. Domain experts can provide qualitative insights into the output of the model, identifying strengths and potential missteps. For example, in topic modelling, domain experts can assess the coherence and relevance of the generated topics. This iterative feedback helps refine the model, ensuring its robustness and relevance when applied to new, unseen data, thereby bridging the gap between model development and practical application.

Software and tools

When analysing and evaluating QTA models on unseen data, researchers often turn to specialized tools designed to increase model transparency and explain model predictions. Among these tools, LIME (Local Interpretable Model-agnostic Explanations) 61 and SHAP (SHapley Additive exPlanations) 62 have gained traction for their ability to provide insights into model behaviour per instance, which is crucial when transitioning to new data domains.

LIME focuses on the predictions of machine-learning models by creating locally faithful explanations. It operates by perturbing the input data and observing how the predictions change, making it a useful tool to understand model behaviour on unseen data. Using LIME, researchers can approximate complex models with simpler, interpretable models locally around the prediction point. By doing so, they can gain insight into how different input features contribute to the prediction of the model, which can be instrumental in understanding how a model might generalize to new, unseen data.

SHAP, by contrast, provides a unified measure of feature importance across different data types, including text. It uses game theoretic principles to attribute the output of machine-learning models to their input features. This method allows for a more precise understanding of how different words or phrases in text data influence the output of the model, thereby offering a clearer picture of the behaviour of the model on new data domains. The SHAP library provides examples of how to explain predictions from text analysis models applied to various NLP tasks including sentiment analysis, text generation and translation.

Both LIME and SHAP offer visual tools to help researchers interpret the predictions of the model, making it easier to identify potential issues when transitioning to unseen data domains. For instance, visualizations allow researchers to identify words or phrases that heavily influence the decisions of the model, which can be invaluable in understanding and adjusting the model for new text data.

Interpretation

Interpretability is paramount in QTA as it facilitates the translation of complex model outcomes into actionable insights relevant to the research questions. The nature and complexity of the research question can significantly mould the interpretation process by requiring various information signals to be extracted from the text, see, for example, ref.  63 . For example, in predicting election outcomes based on sentiments expressed in social media 64 , it is essential to account for both endorsements of parties as expressed in the text and a count of individuals (that is, statistical signals) to avoid the results being skewed because some individuals make a high number of posts. It is also important to note whether voters of some political parties are under-represented in the data.

The complexity amplifies when delving into understanding why people vote (or do not vote) for particular parties and what arguments sway their decisions. Such research questions demand a more comprehensive analysis, often necessitating the amalgamation of insights from multiple models, for example, argument mining, aspect-based sentiment analysis and topic models. There is a discernible gap between the numerical or categorical outputs of QTA models — such as classification values, proportions of different stances or vectors representing individual words — and the nuanced understanding required to fully address the research question. This understanding is achieved either using qualitative human analysis or applying additional QTA methods and extracts a diverse set of important arguments in support of different stances, or provides qualitative summaries of a large set of different comments. Because it is not only a matter of ‘what’ results are found using QTA, but the value that can be attributed to those results.

When interpreting the results of a computational model applied to textual data for a specific research question, it is important to consider the completeness of the answer (assess whether the output of the model sufficiently addresses the research question or whether there are aspects left unexplored), the necessity of additional models (determine whether the insights from more models are needed to fully answer the research question), the independence or co-dependence of results (in cases in which multiple models are used, ascertain whether their results are independent or co-dependent and adjust for any overlap in insights accordingly), clarify how the results are used to support an answer (such as the required occurrence of a phenomenon in the text to accept a concept, or how well a derived topic is understood and represented) and the effect of methodology (evaluate the impact of the chosen method or preprocessing on the results, ensuring the reproducibility and robustness of the findings against changes in preprocessing or methods).

Using these considerations alongside techniques such as LIME and SHAP enhances the evaluation of the application of the model. For instance, in a scenario in which a QTA model is used to analyse customer reviews, LIME and SHAP could provide nuanced insights on a peer-review basis and across all reviews, respectively. Such insights are pivotal in assessing the alignment of the model with the domain-relevant information necessary to address the research questions and in making any adjustments needed to enhance its relevance and performance. Moreover, these techniques and considerations catalyse a dialogue between model and domain experts, enabling a more nuanced evaluation that extends beyond mere quantitative metrics towards a qualitative understanding of the application of the model.

Applications

The applicability of QTA can be found in its ability to address research questions across various disciplines. Although these questions are varied and tasks exist that do not fit naturally into categories, they can be grouped into four primary tasks: extracting, categorizing, predicting and generating. Each task is important in advancing understanding of large textual data sets, either by examining phenomena specific to a text or by using texts as a proxy for phenomena outside the text.

Extracting information

In the context of QTA, information extraction goes beyond mere data retrieval; it also involves identifying and assessing patterns, structures and entities within extensive textual data sets. At its core are techniques such as frequency analysis, in which words or sets of words are counted and their occurrences plotted over periods to reveal trends or shifts in usage and syntactical analysis, which targets specific structures such as nouns, verbs and intricate patterns such as passive voice constructions. Named entity recognition pinpoints entities such as persons, organizations and locations using syntactic information and lexicons of entities.

These methodologies have proven useful in various academic domains. For example, humanities scholars have applied QTA to track the evolution of literary themes 65 . Word embedding has been used to shed light on broader sociocultural shifts such as the conceptual change of ‘racism’, or detecting moments of linguistic change in American foreign relations 40 , 66 . In a historical context, researchers have used diachronic word embeddings to scrutinize the role of abolitionist newspapers in influencing public opinion about the abolition of slavery, revealing pathways of lexical semantic influence, distinguishing leaders from followers and identifying others who stood out based on the semantic changes that swept through this period 67 . Topic modelling and topic linkage (the extent to which two topics tend to co-appear) have been applied to user comments and submissions from the ‘subreddit’ group r/TheRedPill to study how people interact with ideology 68 . In the medical domain 69 , QTA tools have been used to study narrative structures in personal birth stories. The authors utilized a topic model based on latent Dirichlet allocation (LDA) to not only represent the sequence of events in every story but also detect outlier stories using the probability of transitioning between topics.

Historically, the focus was predominantly on feature-based models that relied on manual feature engineering. Such methods were transparent but rigid, constraining the richness of the textual data. Put differently, given the labour-intensive selection of features and the need to keep them interpretable, the complexity of a text was reduced to a limited set of features. However, the advent of representation learning has catalysed a significant paradigm shift. It enables more nuanced extraction, considers contextual variations and allows for sophisticated trend analysis. Studies using these advanced techniques have been successful in, for example, analysing how gender stereotypes and attitudes towards ethnic minorities in the USA evolved during the twentieth and twenty-first centuries 38 and tracking the emergence of ideas in the domains of politics, law and business through contextual embeddings combined with statistical modelling 70 (Box  1 ).

Box 1 Using text mining to model prescient ideas

Vicinanza et al. 70 focused on the predictive power of linguistic markers within the domains of politics, law and business, positing that certain shifts in language can serve as early indicators of deeper cognitive changes. They identified two primary attributes of prescient ideas: their capacity to challenge existing contextual assumptions, and their ability to foreshadow the future evolution of a domain. To quantify this, they utilized Bidirectional Encoder Representations from Transformers, a type 2 language model, to calculate a metric termed contextual novelty to gauge the predictability of an utterance within the prevailing discourse.

Their study presents compelling evidence that prescient ideas are more likely to emerge from the periphery of a domain than from its core. This suggests that prescience is not solely an individual trait but also significantly influenced by contextual factors. Thus, the researchers extended the notion of prescience to include the environments in which innovative ideas are nurtured, adding another layer to our understanding of how novel concepts evolve and gain acceptance.

Categorizing content

It remains an indispensable task in QTA to categorize content, especially when dealing with large data sets. The challenge is not only logistical but also methodological, demanding sophisticated techniques to ensure precision and utility. Text classification algorithms, supervised or unsupervised, continue to have a central role in labelling and organizing content. They serve crucial functions beyond academic settings; for instance, digital libraries use these algorithms to manage and make accessible their expansive article collections. These classification systems also contribute significantly to the systematic review of the literature, enabling more focused and effective investigations of, for example, medical systematic reviews 71 . In addition, unsupervised techniques such as topic modelling have proven invaluable in uncovering latent subject matter within data sets 72 (Box  2 ). This utility extends to multiple scenarios, from reducing redundancies in large document sets to facilitating the analysis of open-ended survey responses 73 , 74 .

Earlier approaches to categorization relied heavily on feature-based models that used manually crafted features for organization. This traditional paradigm has been disrupted by advances in representation learning, deep neural networks and word embeddings, which has introduced a new age of dynamic unsupervised and semi-supervised techniques for content categorization. GPT models represent another leap forward in text classification tasks, outpacing existing benchmarks across various applications. From the sentiment analysis to text labelling and psychological construct detection, generative models have demonstrated a superior capability for context understanding, including the ability to parse complex linguistic cues such as sarcasm and mixed emotions 75 , 76 , 77 . Although the validity of these models is a matter of debate, they offer explanations for their reasoning, which adds a layer of interpretability.

Box 2 Exploring molecular data with topic modelling

Schneider et al. 72 introduced a novel application of topic modelling to the field of medicinal chemistry. The authors adopt a probabilistic topic modelling approach to organize large molecular data sets into chemical topics, enabling the investigation of relationships between these topics. They demonstrate the effectiveness of the quantitative text analysis method in identifying and retrieving chemical series from molecular sets. The authors are able to reproduce concepts assigned by humans in the identification and retrieval of chemical series from sets of molecules. Using topic modelling, the authors are able to show chemical topics intuitively with data visualization and efficiently extend the method to a large data set (ChEMBL22) containing 1.6 million molecules.

Predicting outcomes

QTA is not limited to understanding or classifying text but extends its reach into predictive analytics, which is an invaluable tool across many disciplines and industries. In the financial realm, sentiment analysis tools are applied to news articles and social media data to anticipate stock market fluctuations 78 . Similarly, political analysts use sentiment analysis techniques to make election forecasts, using diverse data sources ranging from Twitter (now X) feeds to party manifestos 79 . Authorship attribution offers another intriguing facet, in which predictive abilities of the QTA are harnessed to identify potential authors of anonymous or pseudonymous works 80 . A notable instance was the unmasking of J.K. Rowling as the author behind the pseudonym Robert Galbraith 81 . Health care has also tapped into predictive strengths of the QTA: machine-learning models that integrate natural language and binary features from patient records have been shown to have potential as early warning systems to prevent unnecessary mechanical restraint of psychiatric inpatients 82 (Box  3 ).

In the era of feature-based models, predictions often hinged on linear or tree-based structures using manually engineered features. Representation learning introduced embeddings and sequential models that improved prediction capabilities. These learned representations enrich predictive tasks, enhancing accuracy and reliability while decreasing interpretability.

Box 3 Predicting mechanical restraint: assessing the contribution of textual data

Danielsen et al. 82 set out to assess the potential of electronic health text data to predict incidents of mechanical restraint of psychiatric patients. Mechanical restraint is used during inpatient treatments to avert potential self-harm or harm to others. The research team used feature-based supervised machine learning to train a predictive model on clinical notes and health records from the Central Denmark Region, specifically focusing on the first hour of admission data. Of 5,050 patients and 8,869 admissions, 100 patients were subjected to mechanical restraint between 1 h and 3 days after admission. Impressively, a random forest algorithm could predict mechanical restraint with considerable precision, showing an area under the curve of 0.87. Nine of the ten most influential predictors stemmed directly from clinical notes, that is, unstructured textual data. The results show the potential of textual data for the creation of an early detection system that could pave the way for interventions that minimize the use of mechanical restraint. It is important to emphasize that the model was limited by a narrow scope of data from the Central Denmark Region, and by the fact that only initial mechanical restraint episodes were considered (in other words, recurrent incidents were not included in the study).

Generating content

Although the initial QTA methodologies were not centred on content generation, the rise of generative models has been transformative. Models such as GPT-4 and Llama2 (ref. 20 ) have brought forth previously unimagined capabilities, expanding the potential of QTA to create content, including coherent and contextually accurate paragraphs to complete articles. Writers and content creators are now using tools based on models such as GPT-4 to augment their writing processes by offering suggestions or even drafting entire sections of texts. In education, such models aid in developing customized content for students, ensuring adaptive learning 83 . The capacity to create synthetic data also heralds new possibilities. Consider the domain of historical research, in which generative models can simulate textual content, offering speculative yet data-driven accounts of alternate histories or events that might have been; for example, relying on generative models to create computational software agents that simulate human behaviour 84 . However, the risks associated with text-generating models are exemplified by a study in which GPT-3 was used for storytelling. The generated stories were found to exhibit many known gender stereotypes, even when prompts did not contain explicit gender cues or stereotype-related content 85 .

Reproducibility and data deposition

Given the rapidly evolving nature of the models, methods and practices in QTA, reproducibility is essential for validating the results and creating a foundation upon which other researchers can build. Sharing code and trained models in well-documented repositories are important to enable reproducible experiments. However, sharing and depositing raw data can be challenging, owing to the inherent limitations of unstructured data and regulations related to proprietary and sensitive data.

Code and model sharing

In QTA research, using open source code has become the norm and the need to share models and code to foster innovation and collaboration has been widely accepted. QTA is interdisciplinary by nature, and by making code and models public, the field has avoided unnecessary silos and enabled collaboration between otherwise disparate disciplines. A further benefit of open source software is the flexibility and transparency that comes from freely accessing and modifying software to meet specific research needs. Accessibility enables an iterative feedback loop, as researchers can validate, critique and build on the existing work. Software libraries, such as scikit-learn, that have been drivers for adopting machine learning in QTA are testimony to the importance of open source software 21 .

Sharing models is not without challenges. QTA is evolving rapidly, and models may use specific versions of software and hardware configurations that no longer work or that yield different results with other versions or configurations. This variability can complicate the accessibility and reproducibility of research results. The breakthroughs of generative AI in particular have introduced new proprietary challenges to model sharing as data owners and sources raise objections to the use of models that have been trained on their data. This challenge is complicated, but fundamentally it mirrors the disputes about intellectual property rights and proprietary code in software engineering. Although QTA as a field benefits from open source software, individual research institutions may have commercial interests or intellectual property rights related to their software.

On the software side, there is currently a preference for scripting languages, especially Python, that enable rapid development, provide access to a wide selection of software libraries and have a large user community. QTA is converging towards code and model sharing through open source platforms such as GitHub and GitLab with an appropriate open source software license such as the MIT license . Models often come with additional disclaimers or use-based restrictions to promote responsible use of AI, such as in the RAIL licenses . Pre-trained models are also regularly shared on dedicated machine-learning platforms such as Hugging Face 22 to enable efficient fine-tuning and deployment. It is important to emphasize that although these platforms support open science, these services are provided by companies with commercial interests. Open science platforms such as Zenodo and OSF can also be used to share code and models for the purpose of reproducibility.

Popular containerization software has been widely adopted in the machine-learning community and has spread to QTA. Containerization, that is, packaging all parts of a QTA application — including code and other dependencies — into a single standalone unit ensures that model and code run consistently across various computing environments. It offers a powerful solution to challenges such as reproducibility, specifically variability in software and hardware configurations.

Data management and storage

Advances in QTA in recent years are mainly because of the availability of vast amounts of text data and the rise of deep learning techniques. However, the dependency on large unstructured data sets, many of which are proprietary or sensitive, poses unique data management challenges. Pre-trained models irrespective of their use (for example, representation learning or generative) require extensive training on large data sets. When these data sets are proprietary or sensitive, they cannot be readily available, which limits the ability of researchers to reproduce results and develop competitive models. Furthermore, models trained on proprietary data sets often lack transparency regarding their collection and curation processes, which can hide potential biases in the data. Finally, there can be data privacy issues related to training or using models that are trained on sensitive data. Individuals whose data are included may not have given their explicit consent for their information to be used in research, which can pose ethical and legal challenges.

It is a widely adopted practice in QTA to share data and metadata with an appropriate license whenever possible. Data can be deposited in open science platforms such as Zenodo, but specialized machine-learning platforms are also used for this purpose. However, it should be noted that QTA data are rarely unique, unlike experimental data collected through random controlled trials. In many cases, access to appropriate metadata and documentation would enable the data to be reconstructed. In almost all cases, it is therefore strongly recommended that researchers share metadata and documentation for data, as well as code and models, using a standardized document or framework, a so-called datasheet. Although QTA is not committed to one set of principles for (meta)data management, European research institutions are increasingly adopting the FAIR principles 50 .

Documentation

Although good documentation is vital in all fields of software development and research, the reliance of QTA on code, models and large data sets makes documentation particularly crucial for reproducibility. Popular resources for structuring projects include project templating tools and documentation generators such as Cookiecutter and Sphinx . Models are often documented with model cards that provide a detailed overview of the development, capabilities and biases of the model to promote transparency and accountability 86 . Similarly, datasheets or data cards can be used to promote transparency for data used in QTA 87 . Finally, it is considered good practice to provide logs for models that document parameters, metrics and events for QTA experiments, especially during training and fine-tuning. Although not strictly required, logs are also important for documenting the iterative process of model refinement. There are several platforms that support the creation and visualization of training logs ( Weights & Biases and MLflow ).

Limitations and optimizations

The application of QTA requires scrutiny of its inherent limitations and potentials. This section discusses these aspects and elucidates the challenges and opportunities for further refinement.

Limitations in QTA

Defining research questions.

In QTA, the framing of research questions is often determined by the capabilities and limitations of the available text analysis tools, rather than by intellectual inquiry or scientific curiosity. This leads to task-driven limitations, in which inquiry is confined to areas where the tools are most effective. For example, relying solely on bag-of-words models might skew research towards easily quantifiable aspects, distorting the intellectual landscape. Operationalizing broad and nuanced research questions into specific tasks may strip them of their depth, forcing them to conform to the constraints of existing analytical models 88 .

Challenges in interpretation

The representation of language of underlying phenomena is often ambiguous or indirect, requiring careful interpretation. Misinterpretations can arise, leading to challenges related to historical, social and cultural context of a text, in which nuanced meanings that change across time, class and cultures are misunderstood 89 . Overlooking other modalities such as visual or auditory information can lead to a partial understanding of the subject matter and limit the full scope of insights. This can to some extent be remedied by the use of grounded models (such as GPT-4), but it remains a challenge for the community to solve long term.

Determining reliability and validation

The reliability and stability of the conclusions drawn from the QTA require rigorous validation, which is often neglected in practice. Multiple models, possibly on different types of data, should be compared to ensure that conclusions are not artefacts of a particular method or of a different use of the method. Furthermore, cultural phenomena should be evolved to avoid misguided insights. Building a robust framework that allows testing and comparison enhances the integrity and applicability of QTA in various contexts 90 .

Connecting analysis to cultural insights

Connecting text analysis to larger cultural claims necessitates foundational theoretical frameworks, including recognizing linguistic patterns, sociolinguistic variables and theories of cultural evolution that may explain changes. Translating textual patterns into meaningful cultural observations requires understanding how much (or how little) culture is expressed in text so that findings can be generalized beyond isolated observations. A theoretical foundation is vital to translate textual patterns into culturally relevant insights, making QTA a more effective tool for broader cultural analysis.

Balancing factors in machine learning

Balancing factors is critical in aligning machine-learning techniques with research objectives. This includes the trade-off between quality and control. Quality refers to rigorous, robust and valid findings, and control refers to the ability to manage specific variables for clear insights. It is also vital to ensure a balance between quantity and quality in data source to lead to more reliable conclusions. Balance is also needed between correctness and accuracy, in which the former ensures consistent application of rules, and the latter captures the true nature of the text.

From features-based to generative models

QTA has undergone a profound evolution, transitioning from feature-based approaches to representation learning and finally to generative models. This progression demonstrates growing complexity in our understanding of language, reflecting the maturity in the field of QTA. Each stage has its characteristics, strengths and limitations.

In the early stages, feature-based models were both promising and limiting. The simplicity of their design, relying on explicit feature engineering, allowed for the targeted analysis. However, this simplicity limited their ability to grasp complex, high-level patterns in language. For example, the use of bag-of-words models in the sentiment analysis showcased direct applicability, but also revealed limitations in understanding contextual nuances. The task-driven limitations of these models sometimes overshadowed genuine intellectual inquiry. Using a fixed (often modern) list of words with corresponding emotional valences may limit our ability to fully comprehend the complexity of emotional stances in, for example, historical literature. Despite these drawbacks, the ability to customize features provided researchers with a direct and specific understanding of language phenomena that could be informed by specialized domain knowledge 91 .

With the emergence of representation learning, a shift occurred within the field of QTA. These models offered the ability to capture higher-level abstractions, forging a richer understanding of language. Their scalability to handle large data sets and uncover complex relationships became a significant strength. However, this complexity introduced new challenges, such as a loss of specificity in analysis and difficulties in translating broad research questions into specific tasks. Techniques such as Word2Vec enabled the capture of semantic relationships but made it difficult to pinpoint specific linguistic features. Contextualized models, in turn, allow for more specificity, but are typically pre-trained on huge data sets (not available for scrutiny) and then applied to a research question without any discussion of how well the model fits the data at hand. In addition, these contextualized models inundate with information. Instead of providing one representation for a word (similar to Word2Vec does), they provide one representation for each occurrence of the word. Each of these representations is one order of magnitude larger than vectors typical for Word2Vec (768–1,600 dimensions compared with 50–200) and comes in several varieties, one for each of the layers of the mode, typically 12.

The introduction of generative models represents the latest stage of this evolution, providing even greater complexity and potential. Innovative in their design, generative models provide opportunities to address more complex and open-ended research questions. They fuel the generation of new ideas and offer avenues for novel approaches. However, these models are not without their challenges. Their high complexity can make interpretation and validation demanding, and if not properly managed, biases and ethical dilemmas will emerge. The use of generative models in creating synthetic text must be handled with care to avoid reinforcing stereotypes or generating misleading information. In addition, if the enormous amounts of synthetically generated text are used to further train the models, this will lead to a spiral of decaying quality as eventually a majority of the training data will have been generated by machines (the models often fail to distinguish synthetic text from genuine human-created text) 92 . However, it will also allow researchers to draw insights from a machine that is learning on data it has generated itself.

The evolution from feature-based to representation learning to generative models reflects increasing maturity in the field of QTA. As models become more complex, the need for careful consideration, ethical oversight and methodological innovation intensifies. The challenge now lies in ensuring that these methodologies align with intellectual and scientific goals, rather than being constrained by their inherent limitations. This growing complexity mirrors the increasing demands of this information-driven society, requiring interdisciplinary collaboration and responsible innovation. Generative models require a nuanced understanding of the complex interplay between language, culture, time and society, and a clear recognition of constraints of the QTA. Researchers must align their tools with intellectual goals and embrace active efforts to address the challenges through optimization strategies. The evolution in QTA emphasizes not only technological advances but also the necessity of aligning the ever-changing landscape of computational methodologies with research questions. By focusing on these areas and embracing the accompanying challenges, the field can build robust, reliable conclusions and move towards more nuanced applications of the text analysis. This progress marks a significant step towards an enriched exploration of textual data, widening the scope for understanding multifaceted relationships. The road ahead calls for a further integration of theory and practice. It is essential that evolution of QTA ensures that technological advancement serves both intellectual curiosity and ethical responsibility, resonating with the multifaceted dynamics of language, culture, time and society 93 .

Balancing size and quality

In QTA, the relationship between data quantity and data quality is often misconceived. Although large data sets serve as the basis for training expansive language models, they are not always required when seeking answers to nuanced research questions. The wide-ranging scope of large data sets can offer comprehensive insights into broad trends and general phenomena. However, this often comes at the cost of a detailed understanding of context-specific occurrences. An issue such as  frequency bias exemplifies this drawback. Using diverse sampling strategies, such as stratified sampling to ensure representation across different social groups and bootstrapping methods to correct for selection bias, can offer a more balanced, contextualized viewpoint. Also, relying on methods such as burst or change-point detection can help to pinpoint moments of interest in data sets with a temporal dimension. Triangulating these methods across multiple smaller data sets can enhance reliability and depth of the analysis.

The design of machine-learning models should account for both the frequency and the significance of individual data points. In other words, the models should be capable of learning not just from repetitive occurrences but also from singular, yet critical, events. This enables the machine to understand rare but important phenomena such as revolutions, seminal publications or watershed individual actions, which would typically be overlooked in a conventional data-driven approach. The capacity to learn from such anomalies can enhance the interpretative depth of the model, enabling them to offer more nuanced insights.

Although textual data have been the mainstay for computational analyses, it is not the only type of data that matters, especially when the research questions involve cultural and societal nuances. Diverse data types including images, audio recordings and even physical artefacts should be integrated into the research to provide a more rounded analysis. Additionally, sourcing data from varied geographical and sociocultural contexts can bring multiple perspectives into the frame, thus offering a multifaceted understanding that textual data from English sources alone cannot capture.

Ethical, practical and efficient models

The evolving landscape of machine learning, specifically with respect to model design and utility, reflects a growing emphasis on efficiency and interpretive value. One notable shift is towards smaller, more energy-efficient models. This transition is motivated by both environmental sustainability and economic pragmatism. With computational costs soaring and the environmental toll becoming untenable, the demand for smaller models that maintain or even exceed the quality of larger models is escalating 94 .

Addressing the data sources used to train models is equally critical, particularly when considering models that will serve research or policy purposes. The provenance and context of data dictate its interpretive value, requiring models to be designed with a hierarchical evaluation of data sources. Such an approach could improve the understanding of a model of the importance of each data type given a specific context, thereby improving the quality and reliability of its analysis. Additionally, it is important to acknowledge the potential ethical and legal challenges within this process, including the exploitation of workers during the data collection and model development.

Transparency remains another pressing issue as these models become integral to research processes. Future iterations should feature a declaration of content that enumerates not only the origin of the data but also its sociocultural and temporal context, preprocessing steps, any known biases, along with the analytical limitations of the model. This becomes especially important for generative models, which may produce misleading or even harmful content if the original data sources are not properly disclosed and understood. Important steps have already been taken with the construction of model cards and data sheets 95 .

Finally, an emergent concern is the risk of feedback loops compromising the quality of machine-learning models. If a model is trained on its own output, errors and biases risk being amplified over time. This necessitates constant vigilance as it poses a threat to the long-term reliability and integrity of AI models. The creation of a gold-standard version of the Internet, not polluted by AI-generated data, is also important 96 .

Refining the methodology and ethos

The rapid advances in QTA, particularly the rise of generative models, have opened up a discourse that transcends mere technological prowess. Although earlier feature-based models require domain expertise and extensive human input before they could be used, generative models can already generate convincing output based on relatively short prompts. This shift raises crucial questions about the interplay between machine capability and human expertise. The notion that advanced algorithms might eventually replace researchers is a common misplaced apprehension. These algorithms and models should be conceived as tools to enhance human scholarship by automating mundane tasks, spawning new research questions and even offering novel pathways for data analysis that might be too complex or time-consuming for human cognition.

This paradigm shift towards augmentative technologies introduces a nuanced problem-solving framework that accommodates the complexities intrinsic to studying human culture and behaviour. The approach of problem decomposition, a cornerstone in computer science, also proves invaluable here, converting overarching research queries into discrete, operationalizable components. These elements can then be addressed individually through specialized algorithms or models, whose results can subsequently be synthesized into a comprehensive answer. As we integrate increasingly advanced tuning methods into generative models — such as prompt engineering, retrieval augmented generation and parameter-efficient fine-tuning — it is important to remember that these models are tools, not replacements. They are most effective when employed as part of a broader research toolkit, in which their strengths can complement traditional scholarly methods.

Consequently, model selection becomes pivotal and should be intricately aligned with the nature of the research inquiry. Unsupervised learning algorithms such as clustering are well suited to exploratory research aimed at pattern identification. Conversely, confirmatory questions, which seek to validate theories or test hypotheses, are better addressed through supervised learning models such as regression.

The importance of a well-crafted interpretation stage cannot be overstated. This is where the separate analytical threads are woven into a comprehensive narrative that explains how the individual findings conjoin to form a cohesive answer to the original research query. However, the lack of standardization across methodologies is a persistent challenge. This absence hinders the reliable comparison of research outcomes across various studies. To remedy this, a shift towards establishing guidelines or best practices is advocated. These need not be rigid frameworks but could be adapted to fit specific research contexts, thereby ensuring methodological rigor alongside innovative freedom.

Reflecting on the capabilities and limitations of current generative models in QTA research is crucial. Beyond recognizing their utility, the blind spots — questions they cannot answer and challenges they have yet to overcome — need to be addressed 97 , 98 . There is a growing need to tailor these models to account for nuances such as frequency bias and to include various perspectives, possibly through more diverse data sets or a polyvocal approach.

In summary, a multipronged approach that synergizes transparent and informed data selection, ethical and critical perspectives on model building and selection, and an explicit and reproducible result interpretation offers a robust framework for tackling intricate research questions. By adopting such a nuanced strategy, we make strides not just in technological capability but also in the rigor, validity and credibility of QTA as a research tool.

Miner, G. Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications (Academic Press, 2012).

Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. From data mining to knowledge discovery in databases. AI Mag. 17 , 37 (1996).

Google Scholar  

Hand, D. J. Data mining: statistics and more? Am. Stat. 52 , 112–116 (1998).

Article   Google Scholar  

McEnery, T. & Wilson, A. Corpus Linguistics: An Introduction (Edinburgh University Press, 2001).

Manning, C. D. & Schütze, H. Foundations of Statistical Natural Language Processing 1st edn (The MIT Press, 1999).

Manning, C., Raghavan, P. & Schütze, H. Introduction to Information Retrieval 1st edn (Cambridge University Press, 2008).

Wankhade, M., Rao, A. C. S. & Kulkarni, C. A survey on sentiment analysis methods, applications, and challenges. Artif. Intell. Rev. 55 , 5731–5780 (2022).

Jehangir, B., Radhakrishnan, S. & Agarwal, R. A survey on named entity recognition — datasets, tools, and methodologies. Nat. Lang. Process. J. 3 , 100017 (2023).

Fu, S. et al. Clinical concept extraction: a methodology review. J. Biomed. Inform. 109 , 103526 (2020).

Sebastiani, F. Machine learning in automated text categorization. ACM Comput. Surv. 34 , 1–47 (2002).

Talley, E. M. et al. Database of NIH grants using machine-learned categories and graphical clustering. Nat. Meth. 8 , 443–444 (2011).

Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://arxiv.org/abs/2108.07258 (2022).

Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 3 , 993–1022 (2003).

Angelov, D. Top2Vec: distributed representations of topics. Preprint at https://arxiv.org/abs/2008.09470 (2020).

Barron, A. T. J., Huang, J., Spang, R. L. & DeDeo, S. Individuals, institutions, and innovation in the debates of the French Revolution. Proc. Natl Acad. Sci. USA 115 , 4607–4612 (2018).

Article   ADS   Google Scholar  

Mitchell, T. M. Machine Learning 1st edn (McGraw-Hill, 1997).

Vaswani, A. et al. Attention is all you need. in Advances in Neural Information Processing Systems (eds Guyon, I. et al.) Vol. 30 (Curran Associates, Inc., 2017).

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. https://arxiv.org/abs/1810.04805 (2018).

Brown, T. et al. Language models are few-shot learners. in Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. & Lin, H.) 1877–1901 (Curran Associates, Inc., 2020).

Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https://arxiv.org/abs/2307.09288 (2023).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12 , 2825–2830 (2011).

MathSciNet   Google Scholar  

Wolf, T. et al. Transformers: state-of-the-art natural language processing. in Proc. 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations 38–45 (Association for Computational Linguistics, Online, 2020).

Demartini, G., Siersdorfer, S., Chelaru, S. & Nejdl, W. Analyzing political trends in the blogosphere. in Proceedings of the International AAAI Conference on Web and Social Media vol. 5 466–469 (AAAI, 2011).

Goldstone, A. & Underwood, T. The quiet transformations of literary studies: what thirteen thousand scholars could tell us. New Lit. Hist. 45 , 359–384 (2014).

Tangherlini, T. R. & Leonard, P. Trawling in the sea of the great unread: sub-corpus topic modeling and humanities research. Poetics 41 , 725–749 (2013).

Mei, Q. & Zhai, C. Discovering evolutionary theme patterns from text: an exploration of temporal text mining. in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 198–207 (Association for Computing Machinery, 2005).

Frermann, L. & Lapata, M. A Bayesian model of diachronic meaning change. Trans. Assoc. Comput. Linguist. 4 , 31–45 (2016).

Koplenig, A. Analyzing Lexical Change in Diachronic Corpora . PhD thesis, Mannheim https://nbn-resolving.org/urn:nbn:de:bsz:mh39-48905 (2016).

Dubossarsky, H., Weinshall, D. & Grossman, E. Outta control: laws of semantic change and inherent biases in word representation models. in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing 1136–1145 (Association for Computational Linguistics, 2017).

Dubossarsky, H., Hengchen, S., Tahmasebi, N. & Schlechtweg, D. Time-out: temporal referencing for robust modeling of lexical semantic change. in Proc. 57th Annual Meeting of the Association for Computational Linguistics 457–470 (Association for Computational Linguistics, 2019).

Koplenig, A. Why the quantitative analysis of diachronic corpora that does not consider the temporal aspect of time-series can lead to wrong conclusions. Digit. Scholarsh. Humanit. 32 , 159–168 (2017).

Tahmasebi, N., Borin, L. & Jatowt, A. Survey of computational approaches to lexical semantic change detection. Zenodo https://doi.org/10.5281/zenodo.5040302 (2021).

Bizzoni, Y., Degaetano-Orttlieb, S., Fankhauser, P. & Teich, E. Linguistic variation and change in 250 years of English scientific writing: a data-driven approach. Front. Artif. Intell. 3 , 73 (2020).

Haider, T. & Eger, S. Semantic change and emerging tropes in a large corpus of New High German poetry. in Proc. 1st International Workshop on Computational Approaches to Historical Language Change 216–222 (Association for Computational Linguistics, 2019).

Vylomova, E., Murphy, S. & Haslam, N. Evaluation of semantic change of harm-related concepts in psychology. in Proc. 1st International Workshop on Computational Approaches to Historical Language Change 29–34 (Association for Computational Linguistics, 2019).

Marjanen, J., Pivovarova, L., Zosa, E. & Kurunmäki, J. Clustering ideological terms in historical newspaper data with diachronic word embeddings. in 5th International Workshop on Computational History, HistoInformatics 2019 (CEUR-WS, 2019).

Tripodi, R., Warglien, M., Levis Sullam, S. & Paci, D. Tracing antisemitic language through diachronic embedding projections: France 1789–1914. in Proc. 1st International Workshop on Computational Approaches to Historical Language Change 115–125 (Association for Computational Linguistics, 2019).

Garg, N., Schiebinger, L., Jurafsky, D. & Zou, J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl. Acad. Sci. USA 115 , E3635–E3644 (2018).

Wevers, M. Using word embeddings to examine gender bias in Dutch newspapers, 1950–1990. in Proc. 1st International Workshop on Computational Approaches to Historical Language Change 92–97 (Association for Computational Linguistics, 2019).

Sommerauer, P. & Fokkens, A. Conceptual change and distributional semantic models: an exploratory study on pitfalls and possibilities. in Proc. 1st International Workshop on Computational Approaches to Historical Language Change 223–233 (Association for Computational Linguistics, 2019). This article examines the effects of known pitfalls on digital humanities studies, using embedding models, and proposes guidelines for conducting such studies while acknowledging the need for further research to differentiate between artefacts and actual conceptual changes .

Doermann, D. & Tombre, K. (eds) Handbook of Document Image Processing and Recognition 2014th edn (Springer, 2014).

Yu, D. & Deng, L. Automatic Speech Recognition: A Deep Learning Approach 2015th edn (Springer, 2014).

Dasu, T. & Johnson, T. Exploratory Data Mining and Data Cleaning (John Wiley & Sons, Inc., 2003).

Prabhavalkar, R., Hori, T., Sainath, T. N., Schlüter, R. & Watanabe, S. End-to-end speech recognition: a survey https://arxiv.org/abs/2303.03329 (2023).

Pustejovsky, J. & Stubbs, A. Natural Language Annotation for Machine Learning: A Guide to Corpus-Building for Applications 1st edn (O’Reilly Media, 2012). A hands-on guide to data-intensive humanities research, including the quantitative text analysis, using the Python programming language .

Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33 , 159–174 (1977).

Gurav, V., Parkar, M. & Kharwar, P. Accessible and ethical data annotation with the application of gamification. in Data Science and Analytics (eds Batra, U., Roy, N. R. & Panda, B.) 68–78 (Springer Singapore, 2020).

Paolacci, G., Chandler, J. & Ipeirotis, P. G. Running experiments on Amazon Mechanical Turk. Judgm. Decis. Mak. 5 , 411–419 (2010).

Bergvall-Kåreborn, B. & Howcroft, D. Amazon mechanical turk and the commodification of labour. New Technol. Work Employ. 29 , 213–223 (2014).

Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3 , 160018 (2016).

Klymenko, O., Meisenbacher, S. & Matthes, F. Differential privacy in natural language processing the story so far. in Proc. Fourth Workshop on Privacy in Natural Language Processing 1–11 (Association for Computational Linguistics, 2022).

Lassen, I. M. S., Almasi, M., Enevoldsen, K. & Kristensen-McLachlan, R. D. Detecting intersectionality in NER models: a data-driven approach. in Proc. 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 116–127 (Association for Computational Linguistics, 2023).

DaCy: A Unified Framework for Danish NLP Vol. 2989, 206–216 (CEUR Workshop Proceedings, 2021).

Karsdorp, F., Kestemont, M. & Riddell, A. Humanities Data Analysis: Case Studies with Python (Princeton Univ. Press, 2021).

Ruder, S., Peters, M. E., Swayamdipta, S. & Wolf, T. Transfer learning in natural language processing. in Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics : Tutorials 15–18 (Association for Computational Linguistics, 2019). The paper presents an overview of modern transfer learning methods in natural language processing, highlighting their emergence, effectiveness in improving the state of the art across various tasks and potential to become a standard tool in natural language processing .

Malte, A. & Ratadiya, P. Evolution of transfer learning in natural language processing. Preprint at https://arxiv.org/abs/1910.07370 (2019).

Groh, M. Identifying the context shift between test benchmarks and production data. Preprint at https://arxiv.org/abs/2207.01059 (2022).

Wang, H., Li, J., Wu, H., Hovy, E. & Sun, Y. Pre-trained language models and their applications. Engineering 25 , 51–65 (2023). This article provides a comprehensive review of the recent progress and research on pre-trained language models in natural language processing , including their development, impact, challenges and future directions in the field.

Wilks, D. S. On the combination of forecast probabilities for consecutive precipitation periods. Weather Forecast. 5 , 640–650 (1990).

Loughran, T. & McDonald, B. Textual analysis in accounting and finance: a survey. J. Account. Res. 54 , 1187–1230 (2016).

Ribeiro, M. T., Singh, S. & Guestrin, C. ‘Why should I trust you?’: explaining the predictions of any classifier. Preprint at https://arxiv.org/abs/1602.04938 (2016).

Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. in Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) 4765–4774 (Curran Associates, Inc., 2017).

Tahmasebi, N. & Hengchen, S. The strengths and pitfalls of large-scale text mining for literary studies. Samlaren 140 , 198–227 (2019).

Jaidka, K., Ahmed, S., Skoric, M. & Hilbert, M. Predicting elections from social media: a three-country, three-method comparative study. Asian J. Commun. 29 , 252–273 (2019).

Underwood, T. Distant Horizons : Digital Evidence and Literary Change (Univ. Chicago Press, 2019).

Jo, E. S. & Algee-Hewitt, M. The long arc of history: neural network approaches to diachronic linguistic change. J. Jpn Assoc. Digit. Humanit. 3 , 1–32 (2018).

Soni, S., Klein, L. F. & Eisenstein, J. Abolitionist networks: modeling language change in nineteenth-century activist newspapers. J. Cultural Anal. 6 , 1–43 (2021).

Perry, C. & Dedeo, S. The cognitive science of extremist ideologies online. Preprint at https://arxiv.org/abs/2110.00626 (2021).

Antoniak, M., Mimno, D. & Levy, K. Narrative paths and negotiation of power in birth stories. Proc. ACM Hum. Comput. Interact. 3 , 1–27 (2019).

Vicinanza, P., Goldberg, A. & Srivastava, S. B. A deep-learning model of prescient ideas demonstrates that they emerge from the periphery. PNAS Nexus 2 , pgac275 (2023). Using deep learning on text data, the study identifies markers of prescient ideas, revealing that groundbreaking thoughts often emerge from the periphery of domains rather than their core.

Adeva, J. G., Atxa, J. P., Carrillo, M. U. & Zengotitabengoa, E. A. Automatic text classification to support systematic reviews in medicine. Exp. Syst. Appl. 41 , 1498–1508 (2014).

Schneider, N., Fechner, N., Landrum, G. A. & Stiefl, N. Chemical topic modeling: exploring molecular data sets using a common text-mining approach. J. Chem. Inf. Model. 57 , 1816–1831 (2017).

Kayi, E. S., Yadav, K. & Choi, H.-A. Topic modeling based classification of clinical reports. in 51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop 67–73 (Association for Computational Linguistics, 2013).

Roberts, M. E. et al. Structural topic models for open-ended survey responses. Am. J. Political Sci. 58 , 1064–1082 (2014).

Kheiri, K. & Karimi, H. SentimentGPT: exploiting GPT for advanced sentiment analysis and its departure from current machine learning. Preprint at https://arxiv.org/abs/2307.10234 (2023).

Pelaez, S., Verma, G., Ribeiro, B. & Shapira, P. Large-scale text analysis using generative language models: a case study in discovering public value expressions in AI patents. Preprint at https://arxiv.org/abs/2305.10383 (2023).

Rathje, S. et al. GPT is an effective tool for multilingual psychological text analysis. Preprint at https://psyarxiv.com/sekf5/ (2023).

Bollen, J., Mao, H. & Zeng, X. Twitter mood predicts the stock market. J. Comput. Sci. 2 , 1–8 (2011). Analysing large-scale Twitter feeds, the study finds that certain collective mood states can predict daily changes in the Dow Jones Industrial Average with 86.7% accuracy .

Tumasjan, A., Sprenger, T. O., Sandner, P. G. & Welpe, I. M. Election forecasts with twitter: how 140 characters reflect the political landscape. Soc. Sci. Comput. Rev. 29 , 402–418 (2011).

Koppel, M., Schler, J. & Argamon, S. Computational methods in authorship attribution. J. Am. Soc. Inf. Sci. Tech. 60 , 9–26 (2009).

Juola, P. The Rowling case: a proposed standard analytic protocol for authorship questions. Digit. Scholarsh. Humanit. 30 , i100–i113 (2015).

Danielsen, A. A., Fenger, M. H. J., Østergaard, S. D., Nielbo, K. L. & Mors, O. Predicting mechanical restraint of psychiatric inpatients by applying machine learning on electronic health data. Acta Psychiatr. Scand. 140 , 147–157 (2019). The study used machine learning from electronic health data to predict mechanical restraint incidents within 3 days of psychiatric patient admission, achieving an accuracy of 0.87 area under the curve, with most predictive factors coming from clinical text notes .

Rudolph, J., Tan, S. & Tan, S. ChatGPT: bullshit spewer or the end of traditional assessments in higher education? J. Appl. Learn. Teach. 6 , 342–363 (2023).

Park, J. S. et al. Generative agents: interactive Simulacra of human behavior. in Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ‘23) 1–22 (Association for Computing Machinery, 2023).

Lucy, L. & Bamman, D. Gender and representation bias in GPT-3 generated stories. in Proc. Third Workshop on Narrative Understanding 48–55 (Association for Computational Linguistics, Virtual, 2021). The paper shows how GPT-3-generated stories exhibit gender stereotypes, associating feminine characters with family and appearance, and showing them as less powerful than masculine characters, prompting concerns about social biases in language models for storytelling.

Mitchell, M. et al. Model cards for model reporting. in Proc. Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery, 2019). The paper introduces model cards for documentation of machine-learning models, detailing their performance characteristics across diverse conditions and contexts to promote transparency and responsible use .

Gebru, T. et al. Datasheets for datasets. Commun. ACM 64 , 86–92 (2021).

Bailer-Jones, D. M. When scientific models represent. Int. Stud. Philos. Sci. 17 , 59–74 (2010).

Guldi, J. The Dangerous Art of Text Mining: A Methodology for Digital History 1st edn (Cambridge Univ. Press, (2023).

Da, N. Z. The computational case against computational literary studies. Crit. Inquiry 45 , 601–639 (2019).

Mäntylä, M. V., Graziotin, D. & Kuutila, M. The evolution of sentiment analysis — a review of research topics, venues, and top cited papers. Comp. Sci. Rev. 27 , 16–32 (2018).

Alemohammad, S. et al. Self-consuming generative models go mad. Preprint at https://arxiv.org/abs/2307.01850 (2023).

Bockting, C. L., van Dis, E. A., van Rooij, R., Zuidema, W. & Bollen, J. Living guidelines for generative AI — why scientists must oversee its use. Nature 622 , 693–696 (2023).

Wu, C.-J. et al. Sustainable AI: environmental implications, challenges and opportunities. in Proceedings of Machine Learning and Systems 4 (MLSys 2022) vol. 4, 795–813 (2022).

Pushkarna, M., Zaldivar, A. & Kjartansson, O. Data cards: purposeful and transparent dataset documentation for responsible AI. in 2022 ACM Conference on Fairness, Accountability, and Transparency 1776–1826 (Association for Computing Machinery, 2022).

Shumailov, I. et al. The curse of recursion: training on generated data makes models forget. Preprint at https://arxiv.org/abs/2305.17493 (2023).

Mitchell, M. How do we know how smart AI systems are? Science https://doi.org/10.1126/science.adj5957 (2023).

Wu, Z. et al. Reasoning or reciting? Exploring the capabilities and limitations of language models through counterfactual tasks. Preprint at https://arxiv.org/abs/2307.02477 (2023).

Birjali, M., Kasri, M. & Beni-Hssane, A. A comprehensive survey on sentiment analysis: approaches, challenges and trends. Knowl. Based Syst. 226 , 107134 (2021).

Acheampong, F. A., Wenyu, C. & Nunoo Mensah, H. Text based emotion detection: advances, challenges, and opportunities. Eng. Rep. 2 , e12189 (2020).

Pauca, V. P., Shahnaz, F., Berry, M. W. & Plemmons, R. J. Text mining using non-negative matrix factorizations. in Proc. 2004 SIAM International Conference on Data Mining 452–456 (Society for Industrial and Applied Mathematics, 2004).

Sharma, A., Amrita, Chakraborty, S. & Kumar, S. Named entity recognition in natural language processing: a systematic review. in Proc. Second Doctoral Symposium on Computational Intelligence (eds Gupta, D., Khanna, A., Kansal, V., Fortino, G. & Hassanien, A. E.) 817–828 (Springer Singapore, 2022).

Nasar, Z., Jaffry, S. W. & Malik, M. K. Named entity recognition and relation extraction: state-of-the-art. ACM Comput. Surv. 54 , 1–39 (2021).

Sedighi, M. Application of word co-occurrence analysis method in mapping of the scientific fields (case study: the field of informetrics). Library Rev. 65 , 52–64 (2016).

El-Kassas, W. S., Salama, C. R., Rafea, A. A. & Mohamed, H. K. Automatic text summarization: a comprehensive survey. Exp. Syst. Appl. 165 , 113679 (2021).

Download references

Acknowledgements

K.L.N. was supported by grants from the Velux Foundation (grant title: FabulaNET) and the Carlsberg Foundation (grant number: CF23-1583). N.T. was supported by the research programme Change is Key! supported by Riksbankens Jubileumsfond (grant number: M21-0021).

Author information

Authors and affiliations.

Center for Humanities Computing, Aarhus University, Aarhus, Denmark

Kristoffer L. Nielbo

Meertens Institute, Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands

Folgert Karsdorp

Department of History, University of Amsterdam, Amsterdam, The Netherlands

Melvin Wevers

Institute of History, Leiden University, Leiden, The Netherlands

Alie Lassche

Department of Linguistics, Aarhus University, Aarhus, Denmark

Rebekah B. Baglini

Department of Literature, University of Antwerp, Antwerp, Belgium

Mike Kestemont

Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg, Gothenburg, Sweden

Nina Tahmasebi

You can also search for this author in PubMed   Google Scholar

Contributions

Introduction (K.L.N. and F.K.); Experimentation (K.L.N., F.K., M.K. and R.B.B.); Results (F.K., M.K., R.B.B. and N.T.); Applications (K.L.N., M.W. and A.L.); Reproducibility and data deposition (K.L.N. and A.L.); Limitations and optimizations (M.W. and N.T.); Outlook (M.W. and N.T.); overview of the Primer (K.L.N.).

Corresponding author

Correspondence to Kristoffer L. Nielbo .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Reviews Methods Primers thanks F. Jannidis, L. Nelson, T. Tangherlini and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A set of rules, protocols and tools for building software and applications, which programs can query to obtain data.

A model that represents text as a numerical vector based on word frequency or presence. Each text corresponds to a predefined vocabulary dictionary, with the vector.

Intersection of linguistics, computer science and artificial intelligence that is concerned with computational aspects of human language. It involves the development of algorithms and models that enable computers to understand, interpret and generate human language.

The branch of linguistics that studies language as expressed in corpora (samples of real-world text) and uses computational methods to analyse large collections of textual data.

A technique used to increase the size and diversity of language data sets to train machine-learning models.

The application of statistical, analytical and computational techniques to extract insights and knowledge from data.

( κ ). A statistical measure used to assess the reliability of agreement between multiple raters when assigning categorical ratings to a number of items.

A phenomenon in which elements that are over-represented in a data set receive disproportionate attention or influence in the analysis.

A field of study focused on the science of searching for information within documents and retrieving relevant documents from large databases.

A text normalization technique used in natural language processing in which words are reduced to their base or dictionary form.

In quantitative text analysis, machine learning refers to the application of algorithms and statistical models to enable computers to identify patterns, trends and relationships in textual data without being explicitly programmed. It involves training these models on large data sets to learn and infer from the structure and nuances of language.

A field of artificial intelligence using computational methods for analysing and generating natural language and speech.

A type of information filtering system that seeks to predict user preferences and recommend items (such as books, movies and products) that are likely to be of interest to the user.

A set of techniques in machine learning in which the system learns to automatically identify and extract useful features or representations from raw data.

A text normalization technique used in natural language processing, in which words are reduced to their base or root form.

A machine-learning approach in which models are trained on labelled data, such that each training text is paired with an output label. The model learns to predict the output from the input data, with the aim of generalizing the training set to unseen data.

A deep learning model that handles sequential data, such as text, using mechanisms called attention and self-attention, allowing it to weigh the importance of different parts of the input data. In the quantitative text analysis, transformers are used for tasks such as sentiment analysis, text classification and language translation, offering superior performance in understanding context and nuances in large data sets.

A type of machine learning in which models are trained on data without output labels. The goal is to discover underlying patterns, groupings or structures within the data, often through clustering or dimensionality reduction techniques.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article.

Nielbo, K.L., Karsdorp, F., Wevers, M. et al. Quantitative text analysis. Nat Rev Methods Primers 4 , 25 (2024). https://doi.org/10.1038/s43586-024-00302-w

Download citation

Accepted : 21 February 2024

Published : 11 April 2024

DOI : https://doi.org/10.1038/s43586-024-00302-w

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing: AI and Robotics newsletter — what matters in AI and robotics research, free to your inbox weekly.

analysis techniques in quantitative research

Quantitative Research

  • Reference work entry
  • First Online: 13 January 2019
  • Cite this reference work entry

Book cover

  • Leigh A. Wilson 2 , 3  

4165 Accesses

4 Citations

Quantitative research methods are concerned with the planning, design, and implementation of strategies to collect and analyze data. Descartes, the seventeenth-century philosopher, suggested that how the results are achieved is often more important than the results themselves, as the journey taken along the research path is a journey of discovery. High-quality quantitative research is characterized by the attention given to the methods and the reliability of the tools used to collect the data. The ability to critique research in a systematic way is an essential component of a health professional’s role in order to deliver high quality, evidence-based healthcare. This chapter is intended to provide a simple overview of the way new researchers and health practitioners can understand and employ quantitative methods. The chapter offers practical, realistic guidance in a learner-friendly way and uses a logical sequence to understand the process of hypothesis development, study design, data collection and handling, and finally data analysis and interpretation.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
  • Available as EPUB and PDF
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Babbie ER. The practice of social research. 14th ed. Belmont: Wadsworth Cengage; 2016.

Google Scholar  

Descartes. Cited in Halverston, W. (1976). In: A concise introduction to philosophy, 3rd ed. New York: Random House; 1637.

Doll R, Hill AB. The mortality of doctors in relation to their smoking habits. BMJ. 1954;328(7455):1529–33. https://doi.org/10.1136/bmj.328.7455.1529 .

Article   Google Scholar  

Liamputtong P. Research methods in health: foundations for evidence-based practice. 3rd ed. Melbourne: Oxford University Press; 2017.

McNabb DE. Research methods in public administration and nonprofit management: quantitative and qualitative approaches. 2nd ed. New York: Armonk; 2007.

Merriam-Webster. Dictionary. http://www.merriam-webster.com . Accessed 20th December 2017.

Olesen Larsen P, von Ins M. The rate of growth in scientific publication and the decline in coverage provided by Science Citation Index. Scientometrics. 2010;84(3):575–603.

Pannucci CJ, Wilkins EG. Identifying and avoiding bias in research. Plast Reconstr Surg. 2010;126(2):619–25. https://doi.org/10.1097/PRS.0b013e3181de24bc .

Petrie A, Sabin C. Medical statistics at a glance. 2nd ed. London: Blackwell Publishing; 2005.

Portney LG, Watkins MP. Foundations of clinical research: applications to practice. 3rd ed. New Jersey: Pearson Publishing; 2009.

Sheehan J. Aspects of research methodology. Nurse Educ Today. 1986;6:193–203.

Wilson LA, Black DA. Health, science research and research methods. Sydney: McGraw Hill; 2013.

Download references

Author information

Authors and affiliations.

School of Science and Health, Western Sydney University, Penrith, NSW, Australia

Leigh A. Wilson

Faculty of Health Science, Discipline of Behavioural and Social Sciences in Health, University of Sydney, Lidcombe, NSW, Australia

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Leigh A. Wilson .

Editor information

Editors and affiliations.

Pranee Liamputtong

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this entry

Cite this entry.

Wilson, L.A. (2019). Quantitative Research. In: Liamputtong, P. (eds) Handbook of Research Methods in Health Social Sciences. Springer, Singapore. https://doi.org/10.1007/978-981-10-5251-4_54

Download citation

DOI : https://doi.org/10.1007/978-981-10-5251-4_54

Published : 13 January 2019

Publisher Name : Springer, Singapore

Print ISBN : 978-981-10-5250-7

Online ISBN : 978-981-10-5251-4

eBook Packages : Social Sciences Reference Module Humanities and Social Sciences Reference Module Business, Economics and Social Sciences

Share this entry

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Publish with us

Policies and ethics

  • Find a journal
  • Track your research

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • Indian J Anaesth
  • v.60(9); 2016 Sep

Basic statistical tools in research and data analysis

Zulfiqar ali.

Department of Anaesthesiology, Division of Neuroanaesthesiology, Sheri Kashmir Institute of Medical Sciences, Soura, Srinagar, Jammu and Kashmir, India

S Bala Bhaskar

1 Department of Anaesthesiology and Critical Care, Vijayanagar Institute of Medical Sciences, Bellary, Karnataka, India

Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into a lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

INTRODUCTION

Statistics is a branch of science that deals with the collection, organisation, analysis of data and drawing of inferences from the samples to the whole population.[ 1 ] This requires a proper design of the study, an appropriate selection of the study sample and choice of a suitable statistical test. An adequate knowledge of statistics is necessary for proper designing of an epidemiological study or a clinical trial. Improper statistical methods may result in erroneous conclusions which may lead to unethical practice.[ 2 ]

Variable is a characteristic that varies from one individual member of population to another individual.[ 3 ] Variables such as height and weight are measured by some type of scale, convey quantitative information and are called as quantitative variables. Sex and eye colour give qualitative information and are called as qualitative variables[ 3 ] [ Figure 1 ].

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g001.jpg

Classification of variables

Quantitative variables

Quantitative or numerical data are subdivided into discrete and continuous measurements. Discrete numerical data are recorded as a whole number such as 0, 1, 2, 3,… (integer), whereas continuous data can assume any value. Observations that can be counted constitute the discrete data and observations that can be measured constitute the continuous data. Examples of discrete data are number of episodes of respiratory arrests or the number of re-intubations in an intensive care unit. Similarly, examples of continuous data are the serial serum glucose levels, partial pressure of oxygen in arterial blood and the oesophageal temperature.

A hierarchical scale of increasing precision can be used for observing and recording the data which is based on categorical, ordinal, interval and ratio scales [ Figure 1 ].

Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be arranged in any particular order. If only two categories exist (as in gender male and female), it is called as a dichotomous (or binary) data. The various causes of re-intubation in an intensive care unit due to upper airway obstruction, impaired clearance of secretions, hypoxemia, hypercapnia, pulmonary oedema and neurological impairment are examples of categorical variables.

Ordinal variables have a clear ordering between the variables. However, the ordered data may not have equal intervals. Examples are the American Society of Anesthesiologists status or Richmond agitation-sedation scale.

Interval variables are similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced. A good example of an interval scale is the Fahrenheit degree scale used to measure temperature. With the Fahrenheit scale, the difference between 70° and 75° is equal to the difference between 80° and 85°: The units of measurement are equal throughout the full range of the scale.

Ratio scales are similar to interval scales, in that equal differences between scale values have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property. For example, the system of centimetres is an example of a ratio scale. There is a true zero point and the value of 0 cm means a complete absence of length. The thyromental distance of 6 cm in an adult may be twice that of a child in whom it may be 3 cm.

STATISTICS: DESCRIPTIVE AND INFERENTIAL STATISTICS

Descriptive statistics[ 4 ] try to describe the relationship between variables in a sample or population. Descriptive statistics provide a summary of data in the form of mean, median and mode. Inferential statistics[ 4 ] use a random sample of data taken from a population to describe and make inferences about the whole population. It is valuable when it is not possible to examine each member of an entire population. The examples if descriptive and inferential statistics are illustrated in Table 1 .

Example of descriptive and inferential statistics

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g002.jpg

Descriptive statistics

The extent to which the observations cluster around a central location is described by the central tendency and the spread towards the extremes is described by the degree of dispersion.

Measures of central tendency

The measures of central tendency are mean, median and mode.[ 6 ] Mean (or the arithmetic average) is the sum of all the scores divided by the number of scores. Mean may be influenced profoundly by the extreme variables. For example, the average stay of organophosphorus poisoning patients in ICU may be influenced by a single patient who stays in ICU for around 5 months because of septicaemia. The extreme values are called outliers. The formula for the mean is

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g003.jpg

where x = each observation and n = number of observations. Median[ 6 ] is defined as the middle of a distribution in a ranked data (with half of the variables in the sample above and half below the median value) while mode is the most frequently occurring variable in a distribution. Range defines the spread, or variability, of a sample.[ 7 ] It is described by the minimum and maximum values of the variables. If we rank the data and after ranking, group the observations into percentiles, we can get better information of the pattern of spread of the variables. In percentiles, we rank the observations into 100 equal parts. We can then describe 25%, 50%, 75% or any other percentile amount. The median is the 50 th percentile. The interquartile range will be the observations in the middle 50% of the observations about the median (25 th -75 th percentile). Variance[ 7 ] is a measure of how spread out is the distribution. It gives an indication of how close an individual observation clusters about the mean value. The variance of a population is defined by the following formula:

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g004.jpg

where σ 2 is the population variance, X is the population mean, X i is the i th element from the population and N is the number of elements in the population. The variance of a sample is defined by slightly different formula:

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g005.jpg

where s 2 is the sample variance, x is the sample mean, x i is the i th element from the sample and n is the number of elements in the sample. The formula for the variance of a population has the value ‘ n ’ as the denominator. The expression ‘ n −1’ is known as the degrees of freedom and is one less than the number of parameters. Each observation is free to vary, except the last one which must be a defined value. The variance is measured in squared units. To make the interpretation of the data simple and to retain the basic unit of observation, the square root of variance is used. The square root of the variance is the standard deviation (SD).[ 8 ] The SD of a population is defined by the following formula:

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g006.jpg

where σ is the population SD, X is the population mean, X i is the i th element from the population and N is the number of elements in the population. The SD of a sample is defined by slightly different formula:

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g007.jpg

where s is the sample SD, x is the sample mean, x i is the i th element from the sample and n is the number of elements in the sample. An example for calculation of variation and SD is illustrated in Table 2 .

Example of mean, variance, standard deviation

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g008.jpg

Normal distribution or Gaussian distribution

Most of the biological variables usually cluster around a central value, with symmetrical positive and negative deviations about this point.[ 1 ] The standard normal distribution curve is a symmetrical bell-shaped. In a normal distribution curve, about 68% of the scores are within 1 SD of the mean. Around 95% of the scores are within 2 SDs of the mean and 99% within 3 SDs of the mean [ Figure 2 ].

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g009.jpg

Normal distribution curve

Skewed distribution

It is a distribution with an asymmetry of the variables about its mean. In a negatively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the right of Figure 1 . In a positively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the left of the figure leading to a longer right tail.

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g010.jpg

Curves showing negatively skewed and positively skewed distribution

Inferential statistics

In inferential statistics, data are analysed from a sample to make inferences in the larger collection of the population. The purpose is to answer or test the hypotheses. A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. Hypothesis tests are thus procedures for making rational decisions about the reality of observed effects.

Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty).

In inferential statistics, the term ‘null hypothesis’ ( H 0 ‘ H-naught ,’ ‘ H-null ’) denotes that there is no relationship (difference) between the population variables in question.[ 9 ]

Alternative hypothesis ( H 1 and H a ) denotes that a statement between the variables is expected to be true.[ 9 ]

The P value (or the calculated probability) is the probability of the event occurring by chance if the null hypothesis is true. The P value is a numerical between 0 and 1 and is interpreted by researchers in deciding whether to reject or retain the null hypothesis [ Table 3 ].

P values with interpretation

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g011.jpg

If P value is less than the arbitrarily chosen value (known as α or the significance level), the null hypothesis (H0) is rejected [ Table 4 ]. However, if null hypotheses (H0) is incorrectly rejected, this is known as a Type I error.[ 11 ] Further details regarding alpha error, beta error and sample size calculation and factors influencing them are dealt with in another section of this issue by Das S et al .[ 12 ]

Illustration for null hypothesis

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g012.jpg

PARAMETRIC AND NON-PARAMETRIC TESTS

Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.[ 13 ]

Two most basic prerequisites for parametric statistical analysis are:

  • The assumption of normality which specifies that the means of the sample group are normally distributed
  • The assumption of equal variance which specifies that the variances of the samples and of their corresponding population are equal.

However, if the distribution of the sample is skewed towards one side or the distribution is unknown due to the small sample size, non-parametric[ 14 ] statistical techniques are used. Non-parametric tests are used to analyse ordinal and categorical data.

Parametric tests

The parametric tests assume that the data are on a quantitative (numerical) scale, with a normal distribution of the underlying population. The samples have the same variance (homogeneity of variances). The samples are randomly drawn from the population, and the observations within a group are independent of each other. The commonly used parametric tests are the Student's t -test, analysis of variance (ANOVA) and repeated measures ANOVA.

Student's t -test

Student's t -test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three circumstances:

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g013.jpg

where X = sample mean, u = population mean and SE = standard error of mean

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g014.jpg

where X 1 − X 2 is the difference between the means of the two groups and SE denotes the standard error of the difference.

  • To test if the population means estimated by two dependent samples differ significantly (the paired t -test). A usual setting for paired t -test is when measurements are made on the same subjects before and after a treatment.

The formula for paired t -test is:

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g015.jpg

where d is the mean difference and SE denotes the standard error of this difference.

The group variances can be compared using the F -test. The F -test is the ratio of variances (var l/var 2). If F differs significantly from 1.0, then it is concluded that the group variances differ significantly.

Analysis of variance

The Student's t -test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups.

In ANOVA, we study two variances – (a) between-group variability and (b) within-group variability. The within-group variability (error variance) is the variation that cannot be accounted for in the study design. It is based on random differences present in our samples.

However, the between-group (or effect variance) is the result of our treatment. These two estimates of variances are compared using the F-test.

A simplified formula for the F statistic is:

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g016.jpg

where MS b is the mean squares between the groups and MS w is the mean squares within groups.

Repeated measures analysis of variance

As with ANOVA, repeated measures ANOVA analyses the equality of means of three or more groups. However, a repeated measure ANOVA is used when all variables of a sample are measured under different conditions or at different points in time.

As the variables are measured from a sample at different points of time, the measurement of the dependent variable is repeated. Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures: The data violate the ANOVA assumption of independence. Hence, in the measurement of repeated dependent variables, repeated measures ANOVA should be used.

Non-parametric tests

When the assumptions of normality are not met, and the sample means are not normally, distributed parametric tests can lead to erroneous results. Non-parametric tests (distribution-free test) are used in such situation as they do not require the normality assumption.[ 15 ] Non-parametric tests may fail to detect a significant difference when compared with a parametric test. That is, they usually have less power.

As is done for the parametric tests, the test statistic is compared with known values for the sampling distribution of that statistic and the null hypothesis is accepted or rejected. The types of non-parametric analysis techniques and the corresponding parametric analysis techniques are delineated in Table 5 .

Analogue of parametric and non-parametric tests

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g017.jpg

Median test for one sample: The sign test and Wilcoxon's signed rank test

The sign test and Wilcoxon's signed rank test are used for median tests of one sample. These tests examine whether one instance of sample data is greater or smaller than the median reference value.

This test examines the hypothesis about the median θ0 of a population. It tests the null hypothesis H0 = θ0. When the observed value (Xi) is greater than the reference value (θ0), it is marked as+. If the observed value is smaller than the reference value, it is marked as − sign. If the observed value is equal to the reference value (θ0), it is eliminated from the sample.

If the null hypothesis is true, there will be an equal number of + signs and − signs.

The sign test ignores the actual values of the data and only uses + or − signs. Therefore, it is useful when it is difficult to measure the values.

Wilcoxon's signed rank test

There is a major limitation of sign test as we lose the quantitative information of the given data and merely use the + or – signs. Wilcoxon's signed rank test not only examines the observed values in comparison with θ0 but also takes into consideration the relative sizes, adding more statistical power to the test. As in the sign test, if there is an observed value that is equal to the reference value θ0, this observed value is eliminated from the sample.

Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums.

Mann-Whitney test

It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.

Mann–Whitney test compares all data (xi) belonging to the X group and all data (yi) belonging to the Y group and calculates the probability of xi being greater than yi: P (xi > yi). The null hypothesis states that P (xi > yi) = P (xi < yi) =1/2 while the alternative hypothesis states that P (xi > yi) ≠1/2.

Kolmogorov-Smirnov test

The two-sample Kolmogorov-Smirnov (KS) test was designed as a generic method to test whether two random samples are drawn from the same distribution. The null hypothesis of the KS test is that both distributions are identical. The statistic of the KS test is a distance between the two empirical distributions, computed as the maximum absolute difference between their cumulative curves.

Kruskal-Wallis test

The Kruskal–Wallis test is a non-parametric test to analyse the variance.[ 14 ] It analyses if there is any difference in the median values of three or more independent samples. The data values are ranked in an increasing order, and the rank sums calculated followed by calculation of the test statistic.

Jonckheere test

In contrast to Kruskal–Wallis test, in Jonckheere test, there is an a priori ordering that gives it a more statistical power than the Kruskal–Wallis test.[ 14 ]

Friedman test

The Friedman test is a non-parametric test for testing the difference between several related samples. The Friedman test is an alternative for repeated measures ANOVAs which is used when the same parameter has been measured under different conditions on the same subjects.[ 13 ]

Tests to analyse the categorical data

Chi-square test, Fischer's exact test and McNemar's test are used to analyse the categorical or nominal variables. The Chi-square test compares the frequencies and tests whether the observed data differ significantly from that of the expected data if there were no differences between groups (i.e., the null hypothesis). It is calculated by the sum of the squared difference between observed ( O ) and the expected ( E ) data (or the deviation, d ) divided by the expected data by the following formula:

An external file that holds a picture, illustration, etc.
Object name is IJA-60-662-g018.jpg

A Yates correction factor is used when the sample size is small. Fischer's exact test is used to determine if there are non-random associations between two categorical variables. It does not assume random sampling, and instead of referring a calculated statistic to a sampling distribution, it calculates an exact probability. McNemar's test is used for paired nominal data. It is applied to 2 × 2 table with paired-dependent samples. It is used to determine whether the row and column frequencies are equal (that is, whether there is ‘marginal homogeneity’). The null hypothesis is that the paired proportions are equal. The Mantel-Haenszel Chi-square test is a multivariate test as it analyses multiple grouping variables. It stratifies according to the nominated confounding variables and identifies any that affects the primary outcome variable. If the outcome variable is dichotomous, then logistic regression is used.

SOFTWARES AVAILABLE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS

Numerous statistical software systems are available currently. The commonly used software systems are Statistical Package for the Social Sciences (SPSS – manufactured by IBM corporation), Statistical Analysis System ((SAS – developed by SAS Institute North Carolina, United States of America), R (designed by Ross Ihaka and Robert Gentleman from R core team), Minitab (developed by Minitab Inc), Stata (developed by StataCorp) and the MS Excel (developed by Microsoft).

There are a number of web resources which are related to statistical power analyses. A few are:

  • StatPages.net – provides links to a number of online power calculators
  • G-Power – provides a downloadable power analysis program that runs under DOS
  • Power analysis for ANOVA designs an interactive site that calculates power or sample size needed to attain a given power for one effect in a factorial ANOVA design
  • SPSS makes a program called SamplePower. It gives an output of a complete report on the computer screen which can be cut and paste into another document.

It is important that a researcher knows the concepts of the basic statistical methods used for conduct of a research study. This will help to conduct an appropriately well-designed study leading to valid and reliable results. Inappropriate use of statistical techniques may lead to faulty conclusions, inducing errors and undermining the significance of the article. Bad statistics may lead to bad research, and bad research may lead to unethical practice. Hence, an adequate knowledge of statistics and the appropriate use of statistical tests are important. An appropriate knowledge about the basic statistical methods will go a long way in improving the research designs and producing quality medical research which can be utilised for formulating the evidence-based guidelines.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.

Qualitative vs Quantitative Research Methods & Data Analysis

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Learn about our Editorial Process

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

On This Page:

What is the difference between quantitative and qualitative?

The main difference between quantitative and qualitative research is the type of data they collect and analyze.

Quantitative research collects numerical data and analyzes it using statistical methods. The aim is to produce objective, empirical data that can be measured and expressed in numerical terms. Quantitative research is often used to test hypotheses, identify patterns, and make predictions.

Qualitative research , on the other hand, collects non-numerical data such as words, images, and sounds. The focus is on exploring subjective experiences, opinions, and attitudes, often through observation and interviews.

Qualitative research aims to produce rich and detailed descriptions of the phenomenon being studied, and to uncover new insights and meanings.

Quantitative data is information about quantities, and therefore numbers, and qualitative data is descriptive, and regards phenomenon which can be observed but not measured, such as language.

What Is Qualitative Research?

Qualitative research is the process of collecting, analyzing, and interpreting non-numerical data, such as language. Qualitative research can be used to understand how an individual subjectively perceives and gives meaning to their social reality.

Qualitative data is non-numerical data, such as text, video, photographs, or audio recordings. This type of data can be collected using diary accounts or in-depth interviews and analyzed using grounded theory or thematic analysis.

Qualitative research is multimethod in focus, involving an interpretive, naturalistic approach to its subject matter. This means that qualitative researchers study things in their natural settings, attempting to make sense of, or interpret, phenomena in terms of the meanings people bring to them. Denzin and Lincoln (1994, p. 2)

Interest in qualitative data came about as the result of the dissatisfaction of some psychologists (e.g., Carl Rogers) with the scientific study of psychologists such as behaviorists (e.g., Skinner ).

Since psychologists study people, the traditional approach to science is not seen as an appropriate way of carrying out research since it fails to capture the totality of human experience and the essence of being human.  Exploring participants’ experiences is known as a phenomenological approach (re: Humanism ).

Qualitative research is primarily concerned with meaning, subjectivity, and lived experience. The goal is to understand the quality and texture of people’s experiences, how they make sense of them, and the implications for their lives.

Qualitative research aims to understand the social reality of individuals, groups, and cultures as nearly as possible as participants feel or live it. Thus, people and groups are studied in their natural setting.

Some examples of qualitative research questions are provided, such as what an experience feels like, how people talk about something, how they make sense of an experience, and how events unfold for people.

Research following a qualitative approach is exploratory and seeks to explain ‘how’ and ‘why’ a particular phenomenon, or behavior, operates as it does in a particular context. It can be used to generate hypotheses and theories from the data.

Qualitative Methods

There are different types of qualitative research methods, including diary accounts, in-depth interviews , documents, focus groups , case study research , and ethnography.

The results of qualitative methods provide a deep understanding of how people perceive their social realities and in consequence, how they act within the social world.

The researcher has several methods for collecting empirical materials, ranging from the interview to direct observation, to the analysis of artifacts, documents, and cultural records, to the use of visual materials or personal experience. Denzin and Lincoln (1994, p. 14)

Here are some examples of qualitative data:

Interview transcripts : Verbatim records of what participants said during an interview or focus group. They allow researchers to identify common themes and patterns, and draw conclusions based on the data. Interview transcripts can also be useful in providing direct quotes and examples to support research findings.

Observations : The researcher typically takes detailed notes on what they observe, including any contextual information, nonverbal cues, or other relevant details. The resulting observational data can be analyzed to gain insights into social phenomena, such as human behavior, social interactions, and cultural practices.

Unstructured interviews : generate qualitative data through the use of open questions.  This allows the respondent to talk in some depth, choosing their own words.  This helps the researcher develop a real sense of a person’s understanding of a situation.

Diaries or journals : Written accounts of personal experiences or reflections.

Notice that qualitative data could be much more than just words or text. Photographs, videos, sound recordings, and so on, can be considered qualitative data. Visual data can be used to understand behaviors, environments, and social interactions.

Qualitative Data Analysis

Qualitative research is endlessly creative and interpretive. The researcher does not just leave the field with mountains of empirical data and then easily write up his or her findings.

Qualitative interpretations are constructed, and various techniques can be used to make sense of the data, such as content analysis, grounded theory (Glaser & Strauss, 1967), thematic analysis (Braun & Clarke, 2006), or discourse analysis.

For example, thematic analysis is a qualitative approach that involves identifying implicit or explicit ideas within the data. Themes will often emerge once the data has been coded.

RESEARCH THEMATICANALYSISMETHOD

Key Features

  • Events can be understood adequately only if they are seen in context. Therefore, a qualitative researcher immerses her/himself in the field, in natural surroundings. The contexts of inquiry are not contrived; they are natural. Nothing is predefined or taken for granted.
  • Qualitative researchers want those who are studied to speak for themselves, to provide their perspectives in words and other actions. Therefore, qualitative research is an interactive process in which the persons studied teach the researcher about their lives.
  • The qualitative researcher is an integral part of the data; without the active participation of the researcher, no data exists.
  • The study’s design evolves during the research and can be adjusted or changed as it progresses. For the qualitative researcher, there is no single reality. It is subjective and exists only in reference to the observer.
  • The theory is data-driven and emerges as part of the research process, evolving from the data as they are collected.

Limitations of Qualitative Research

  • Because of the time and costs involved, qualitative designs do not generally draw samples from large-scale data sets.
  • The problem of adequate validity or reliability is a major criticism. Because of the subjective nature of qualitative data and its origin in single contexts, it is difficult to apply conventional standards of reliability and validity. For example, because of the central role played by the researcher in the generation of data, it is not possible to replicate qualitative studies.
  • Also, contexts, situations, events, conditions, and interactions cannot be replicated to any extent, nor can generalizations be made to a wider context than the one studied with confidence.
  • The time required for data collection, analysis, and interpretation is lengthy. Analysis of qualitative data is difficult, and expert knowledge of an area is necessary to interpret qualitative data. Great care must be taken when doing so, for example, looking for mental illness symptoms.

Advantages of Qualitative Research

  • Because of close researcher involvement, the researcher gains an insider’s view of the field. This allows the researcher to find issues that are often missed (such as subtleties and complexities) by the scientific, more positivistic inquiries.
  • Qualitative descriptions can be important in suggesting possible relationships, causes, effects, and dynamic processes.
  • Qualitative analysis allows for ambiguities/contradictions in the data, which reflect social reality (Denscombe, 2010).
  • Qualitative research uses a descriptive, narrative style; this research might be of particular benefit to the practitioner as she or he could turn to qualitative reports to examine forms of knowledge that might otherwise be unavailable, thereby gaining new insight.

What Is Quantitative Research?

Quantitative research involves the process of objectively collecting and analyzing numerical data to describe, predict, or control variables of interest.

The goals of quantitative research are to test causal relationships between variables , make predictions, and generalize results to wider populations.

Quantitative researchers aim to establish general laws of behavior and phenomenon across different settings/contexts. Research is used to test a theory and ultimately support or reject it.

Quantitative Methods

Experiments typically yield quantitative data, as they are concerned with measuring things.  However, other research methods, such as controlled observations and questionnaires , can produce both quantitative information.

For example, a rating scale or closed questions on a questionnaire would generate quantitative data as these produce either numerical data or data that can be put into categories (e.g., “yes,” “no” answers).

Experimental methods limit how research participants react to and express appropriate social behavior.

Findings are, therefore, likely to be context-bound and simply a reflection of the assumptions that the researcher brings to the investigation.

There are numerous examples of quantitative data in psychological research, including mental health. Here are a few examples:

Another example is the Experience in Close Relationships Scale (ECR), a self-report questionnaire widely used to assess adult attachment styles .

The ECR provides quantitative data that can be used to assess attachment styles and predict relationship outcomes.

Neuroimaging data : Neuroimaging techniques, such as MRI and fMRI, provide quantitative data on brain structure and function.

This data can be analyzed to identify brain regions involved in specific mental processes or disorders.

For example, the Beck Depression Inventory (BDI) is a clinician-administered questionnaire widely used to assess the severity of depressive symptoms in individuals.

The BDI consists of 21 questions, each scored on a scale of 0 to 3, with higher scores indicating more severe depressive symptoms. 

Quantitative Data Analysis

Statistics help us turn quantitative data into useful information to help with decision-making. We can use statistics to summarize our data, describing patterns, relationships, and connections. Statistics can be descriptive or inferential.

Descriptive statistics help us to summarize our data. In contrast, inferential statistics are used to identify statistically significant differences between groups of data (such as intervention and control groups in a randomized control study).

  • Quantitative researchers try to control extraneous variables by conducting their studies in the lab.
  • The research aims for objectivity (i.e., without bias) and is separated from the data.
  • The design of the study is determined before it begins.
  • For the quantitative researcher, the reality is objective, exists separately from the researcher, and can be seen by anyone.
  • Research is used to test a theory and ultimately support or reject it.

Limitations of Quantitative Research

  • Context: Quantitative experiments do not take place in natural settings. In addition, they do not allow participants to explain their choices or the meaning of the questions they may have for those participants (Carr, 1994).
  • Researcher expertise: Poor knowledge of the application of statistical analysis may negatively affect analysis and subsequent interpretation (Black, 1999).
  • Variability of data quantity: Large sample sizes are needed for more accurate analysis. Small-scale quantitative studies may be less reliable because of the low quantity of data (Denscombe, 2010). This also affects the ability to generalize study findings to wider populations.
  • Confirmation bias: The researcher might miss observing phenomena because of focus on theory or hypothesis testing rather than on the theory of hypothesis generation.

Advantages of Quantitative Research

  • Scientific objectivity: Quantitative data can be interpreted with statistical analysis, and since statistics are based on the principles of mathematics, the quantitative approach is viewed as scientifically objective and rational (Carr, 1994; Denscombe, 2010).
  • Useful for testing and validating already constructed theories.
  • Rapid analysis: Sophisticated software removes much of the need for prolonged data analysis, especially with large volumes of data involved (Antonius, 2003).
  • Replication: Quantitative data is based on measured values and can be checked by others because numerical data is less open to ambiguities of interpretation.
  • Hypotheses can also be tested because of statistical analysis (Antonius, 2003).

Antonius, R. (2003). Interpreting quantitative data with SPSS . Sage.

Black, T. R. (1999). Doing quantitative research in the social sciences: An integrated approach to research design, measurement and statistics . Sage.

Braun, V. & Clarke, V. (2006). Using thematic analysis in psychology . Qualitative Research in Psychology , 3, 77–101.

Carr, L. T. (1994). The strengths and weaknesses of quantitative and qualitative research : what method for nursing? Journal of advanced nursing, 20(4) , 716-721.

Denscombe, M. (2010). The Good Research Guide: for small-scale social research. McGraw Hill.

Denzin, N., & Lincoln. Y. (1994). Handbook of Qualitative Research. Thousand Oaks, CA, US: Sage Publications Inc.

Glaser, B. G., Strauss, A. L., & Strutzel, E. (1968). The discovery of grounded theory; strategies for qualitative research. Nursing research, 17(4) , 364.

Minichiello, V. (1990). In-Depth Interviewing: Researching People. Longman Cheshire.

Punch, K. (1998). Introduction to Social Research: Quantitative and Qualitative Approaches. London: Sage

Further Information

  • Designing qualitative research
  • Methods of data collection and analysis
  • Introduction to quantitative and qualitative research
  • Checklists for improving rigour in qualitative research: a case of the tail wagging the dog?
  • Qualitative research in health care: Analysing qualitative data
  • Qualitative data analysis: the framework approach
  • Using the framework method for the analysis of
  • Qualitative data in multi-disciplinary health research
  • Content Analysis
  • Grounded Theory
  • Thematic Analysis

Print Friendly, PDF & Email

How to do Market Research: a Step-by-Step Guide

14 min read

Looking for the best way to do market research? From framing your initial question to extracting valuable customer insights, we’ll walk you through the lean market research process step-by-step. You will learn effective techniques for collecting and analyzing data , with practical tips on applying your findings to benefit your SaaS. Get ready to empower your decisions with real-world market intelligence.

  • Market research is vital for making informed business decisions, enabling companies to understand the market, target audience, and competitors, reducing risks, and optimizing marketing communications and product strategies .
  • Effective market research requires clear and measurable objectives, guiding decision-making and ensuring relevance to the project’s needs, and should be accompanied by appropriate methods , including both primary and secondary research .
  • Applying insights from market research to product development and marketing strategies can significantly enhance business growth. This allows businesses to tailor their offerings and engage more effectively with their target market.

analysis techniques in quantitative research

Try Userpilot and Take Your Product Growth to the Next Level

  • 14 Day Trial
  • No Credit Card Required

analysis techniques in quantitative research

What is Market Research

analysis techniques in quantitative research

Essentially, market research is the process of understanding one’s target audience’s needs and wants to validate a new product, feature, or service idea. It involves probing and extracting answers based on empirical evidence instead of relying on hunches or speculative judgment.

Why should you do market research?

Understanding your consumers’ behavior and needs well through methodical market research is vital for informed decision-making when it comes to your product roadmap. These choices can make or break your SaaS company. Without thorough market research, you’re navigating blindly, basing crucial judgments on antiquated notions of customer habits, imprecise economic gauges, or untested assumptions rather than solid competitive analysis.

The outcome? Sharper marketing messages, savvy product development strategies, and an intimate grasp of both prospective buyers and existing customers’ preferences and needs.

Identifying Your Market Research Goals

Before you do anything – you need to determine specific and actionable goals of your market research project. Setting SMART (specific, measurable, achievable, relevant and time-bound) goals will help you stay on track, come up with better market research questions and achieve more reliable results faster.

smart goals userpilot

For effective market research outcomes, your goals must be:

  • Quantifiable .
  • Attainable.
  • Directly aligned with project requirements.

Having established unambiguous goals prior to delving into data analysis sets up a solid foundation ensuring pivotal questions, hypotheses, and indicators are systematically tackled during effective market research.

Market Research Methods

generative research methods

Now that you understand the role of well-defined research objectives, let’s examine the different types of market research and research techniques for realizing these goals. These methods are essentially your toolkit for extracting valuable insights and they fall into two broad categories: primary research and secondary research . Choosing between them depends on many factors such as your budget, time availability, and whether you’re looking for more exploratory research data or concrete answers.

Engaging in primary research is comparable to unearthing precious metals—it requires gathering new information straight from sources through several approaches including:

Userpilot surveys

  • Focus groups.

free trial

This approach gives you first-hand insight into your target audience.

Conversely, secondary research uses already established datasets of primary data – which can add depth and reinforcement to your firsthand findings. For a 360 view of your market trends, combine both techniques – exploratory primary research and secondary channels of inquiry.

Let’s look a bit deeper into them now.

What is Primary Market Research?

Market research uses primary market research as an essential tool. This involves collecting new data directly from your target audience using various methods, such as surveys , focus groups, and interviews.

analysis techniques in quantitative research

Each method has its benefits. For example, observational studies allow you to see how consumers interact with your product.

analysis techniques in quantitative research

There are many ways to conduct primary research.

Focus Groups : Hold discussions with small groups of 5 to 10 people from your target audience. These discussions can provide valuable feedback on products, perceptions of your company’s brand name, or opinions on competitors.

Interviews : Have one-on-one conversations to gather detailed information from individuals in your target audience.

analysis techniques in quantitative research

Surveys : These are a common tool in primary market research and can be used instead of focus groups to understand consumer attitudes. Surveys use structured questions and can reach a broad audience efficiently.

analysis techniques in quantitative research

Navigating Secondary Market Research

While market research using primary methods is like discovering precious metals, secondary market research technique is like using a treasure map. This approach uses data collected by others from various sources, providing a broad industry view. These sources include market analyses from agencies like Statista, historical data such as census records, and academic studies.

Secondary research provides the basic knowledge necessary for conducting primary market research goals but may lack detail on specific business questions and could also be accessible to competitors.

To make the most of secondary market research, it’s important to analyze summarized data to identify trends, rely on reputable sources for accurate data, and remain unbiased in data collection methods.

The effectiveness of secondary research depends significantly on how well the data is interpreted, ensuring that this information complements the insights from primary research.

The Role of Qualitative and Quantitative Data in Market Research

Qualitative data analysis

In market research, there are two main types of data: qualitative and quantitative. Qualitative data explores the reasons behind consumer actions, collecting non-numeric information to understand consumer behaviors and motivations. For more on gathering and analyzing qualitative data, see How to Analyse Qualitative Data . On the other hand, quantitative data uses numeric data to measure consumer preferences, behaviors, and market sizes. To learn more about handling this type of data, check out User Analytics .

A thorough market analysis usually combines both qualitative and quantitative data. This approach provides a full view of the market by merging detailed qualitative insights with concrete quantitative statistics. For more on combining these approaches, refer to Generative vs. Evaluative Research .

Gathering Qualitative Insights

analysis techniques in quantitative research

Qualitative research involves direct engagement with customers, like having detailed discussions. It includes observational studies that capture genuine consumer reactions. This type of research provides deep insights into consumer perceptions, brand comparisons, consumer behavior, and feedback on specific product features.

Studies on customer satisfaction and loyalty reveal effective strategies for keeping customers and what keeps them loyal, such as loyalty programs and quality customer service. The strength of qualitative research lies in its ability to dig deeper than just numbers, reaching insights that quantitative data might miss. By using qualitative data to customize experiences, businesses can increase customer satisfaction, interaction , and loyalty, leading to greater business growth.

Analyzing Quantitative Data

Quantitative research provides precision and the ability to measure findings using structured data collection methods like polls and surveys. Product analytics tools such as Userpilot , Amplitude , Heap , and Mixpanel are highly effective for collecting and organizing quantifiable data. This type of data is crucial for identifying trends and insights, which can help businesses track important performance indicators such as conversion rates or customer lifetime value , supporting their growth strategies.

Quantitative research data is divided into two types: discrete data, which includes countable numbers, and continuous data, which consists of numbers that can have fractions or decimals. These are vital for revealing important demographic information.

Segmenting Your Target Market

Userpilot segmentation

Market research plays a key role in segmenting your target audience into manageable segments.

These market segments are typically grouped by similar needs or attributes, and display similar responses in marketing research surveys and initiatives. The full market segmentation process is vital for comprehensively grasping and satisfying the requirements of your targeted consumer base.

Accumulating demographic information forms the basis for executing effective market segmentation strategies. Businesses prioritize obtaining user data such as:

  • Job functions.
  • Organizational scale.
  • Customer demographics profiles.
  • Lifestyle choices.
  • Values systems.
  • Product usage patterns.

This information can be collected in the initial sign-up flow (through a signup flow survey; see the Asana example below) or by conducting comprehensive market research surveys .

signup flow

At its core, successful market segmentation enables businesses to communicate effectively in their target customers’ dialects while catering explicitly to their distinct demands.

Userpilot allows you to easily segment your users not only by demographic information, company size, plan, or role – but also by their in-app engagement ( behavioral segmentation ):

behavioral segmentation

In summary, the techniques used to create detailed analyses, like conducting specialized surveys and carefully collecting relevant participant information, are crucial for identifying groups within a larger target population. These groups are defined by usage patterns and broad demographic and economic indicators, enabling companies to not only reach but also deeply connect with each niche market they aim to capture.

Creating buyer personas based on your market research

user personas userpilot

Creating buyer personas is a strategic process that helps businesses better understand and cater to their target customers. Here’s how you can systematically approach creating effective buyer personas:

  • Gather Initial Data : Start by collecting basic demographic information such as age, gender, location, and education level. This can come from existing customer databases, market research, or industry reports.

analysis techniques in quantitative research

  • Segment the Audience : Based on the collected data, segment your audience into distinct groups. Each segment should represent a type of customer with similar characteristics and behaviors. This segmentation helps in personalizing marketing and sales strategies effectively.
  • Build Detailed Personas : For each segment, create a detailed persona that includes not only demographic and behavioral traits but also psychographics like interests, values, and lifestyle. Each persona should tell the story of an ideal customer, making them relatable for your marketing team.
  • Refine Over Time : Buyer personas are not static. As you gather more data and the market evolves, revisit and refine your personas to keep them relevant and accurate.
  • Utilize Tools Like Userpilot : Tools such as Userpilot can enhance this process by providing analytics that reveal how users interact with your product. This can confirm hypotheses or uncover new insights about user preferences and behaviors, which can be integrated into existing personas to make them even more accurate.

By carefully crafting and continually updating buyer personas, businesses can achieve a deeper understanding of their customers. This enables them to tailor their offerings and communications effectively, thereby enhancing customer engagement and satisfaction.

Recruiting Participants for Primary Research

Choosing the right participants for primary research is a crucial step in market research. It’s important to find individuals who can provide relevant and meaningful consumer feedback, on your product or service, as this feedback is key to developing accurate user personas.

Userpilot can be instrumental in this process. It collects data on how users interact with and use your products, helping you identify who might be the best candidates for more detailed studies, such as interviews.

To efficiently recruit participants for interviews, Userpilot’s in-app features, such as in-app modals with embedded surveys can be extremely useful. You can use these tools to engage directly with users who meet your specific criteria, right within your app.

analysis techniques in quantitative research

This method not only simplifies the recruitment process but also ensures that you’re interacting with the most relevant users. By leveraging these features, you can gather deep insights that significantly enhance the development of your user personas, ensuring your research is both timely and informed.

Competitive Analysis for Strategic Advantage

Competitive analysis helps businesses understand their rivals’ strategies. It involves identifying which industries or markets to target and listing key competitors to gain a clear view of the competitive environment. This includes evaluating competitors’ market share, strengths, weaknesses, and potential entry barriers, often using tools like SWOT analysis.

By understanding competitors’ operations and past marketing efforts, businesses can craft new strategies, pinpoint opportunities, and learn from competitors’ mistakes. Employing market research, brand perception surveys, and market statistics, alongside analytical frameworks like Porter’s Five Forces Model, helps businesses uncover new opportunities and maintain a competitive advantage.

Ultimately, competitive analysis uses the understanding of competition to fuel business growth.

Conducting Effective Market Research Surveys

Primary market research often uses surveys as a cost-effective way to gather data. These surveys reach wide audiences and provide quick feedback. It’s crucial to carefully plan the creation and distribution of these surveys to ensure they are effective. Given the high amount of web traffic from mobile devices, it’s particularly important to make surveys mobile-friendly.

To get the most comprehensive data, include both quantitative (closed-ended) and qualitative (open-ended) questions in your survey . Offering incentives like financial compensation or vouchers can encourage participation, but it’s important to manage these carefully to avoid biasing the responses.

analysis techniques in quantitative research

Well-designed surveys are like direct interviews with your target audience and are key to obtaining valuable insights about their views and experiences.

Userpilot offers over 50 in-app survey templates along with a bespoke builder, which are important tools for collecting the right responses. These features allow you to tailor surveys precisely to your needs, ensuring you gather accurate and relevant data directly from your users. By leveraging these templates and customizing them with the bespoke builder, you can effectively engage your audience and enhance the quality of insights you receive. This setup is crucial for conducting efficient and effective market research.

analysis techniques in quantitative research

Analyzing and Interpreting Market Research Data

Once you have collected data through surveys, market research data analysis is the next critical step. It involves identifying patterns, establishing connections, and extracting insights that inform business decisions.

Userpilot’s analytics suite offers deep and easily accessible insights into your market research data:

analysis techniques in quantitative research

This process starts with preparing the data by cleaning and organizing it to ensure accuracy and ease of analysis. Depending on the study’s goals, various analytical methods can be used, from simple descriptive statistics to complex multivariate analyses, all chosen to provide meaningful insights.

The core of this analysis aims to uncover market trends and understand industry specifics, which can highlight key factors such as impactful customer experiences, profitable products or services, and effective marketing strategies. Communicating these findings effectively involves presenting them in clear reports and using visual aids while making practical recommendations and addressing any limitations in the research scope or methods. Ultimately, data analysis transforms raw data into compelling narratives that offer actionable business intelligence.

Applying Market Research to Product or Service Development

Market research is much more than just collecting data and uncovering insights; it’s a vital tool for driving business growth and guiding product development at every stage. Here’s how market research supports business throughout the product lifecycle:

  • Concept Creation : Helps identify market needs and opportunities to inform the initial product idea.
  • Building a Business Case : Provides evidence and data to justify investment in the new product.
  • Product Development : Offers insights into customer preferences and feedback for refining product features.
  • Market Introduction : Aids in strategizing the launch, targeting the right audience, and setting the right price.
  • Lifecycle Management : Continuously gathers data on customer usage and satisfaction to enhance the product over time.

Consider a B2B SaaS company that develops project management software. By engaging in targeted market research activities like surveys and doing focus group call groups among its business clients, the company can:

  • Understand Business Needs : Gain insights into the specific project management challenges and needs of different industries.
  • Refine Product Features : Discover which software features are most valued by businesses, such as integration capabilities, user-friendliness, or specific tools for collaboration.
  • Tailor Marketing Strategies : Identify the most effective communication channels and messaging that resonate with business clients, such as emphasizing efficiency gains or return on investment.

Market research guides businesses from the initial idea through to launch and beyond, acting as a strategic tool that ensures all actions are aligned with market demands and customer needs , ultimately aiming for successful business outcomes.

Utilizing Tools for Efficient Market Research

Using tools like Userpilot, SurveyMonkey, Google Forms, and Typeform, market researchers can reach a wide audience and get fast responses. These platforms help to design, distribute, and analyze surveys efficiently.

Userpilot stands out by allowing businesses to create targeted in-app experiences that engage users directly where they are most active—within the app itself. This direct engagement method improves the quality of the feedback collected as it relates to specific features or user experiences.

analysis techniques in quantitative research

Userpilot also offers features such as demographic filtering and behavioral-based segmentation, which speeds up the process of finding and recruiting the right participants for market research.

analysis techniques in quantitative research

These tools are essential for performing detailed and effective market research. They break down geographic and cultural barriers, offer access to diverse user groups, and enable businesses to conduct deep, actionable analyses across different market segments.

Translating Research Findings into Business Growth

Market research does more than just gather and analyze data; it aims to transform these insights into tangible business improvements. This process is crucial in guiding product development and helping increase a company’s market share by informing targeted strategies. For instance, a B2B SaaS company could use market research to:

  • Tailor marketing strategies specifically for key user personas.
  • Identify the most valued features for your users.
  • Develop pricing strategies that appeal to companies of different sizes.
  • Gain insight into the specific needs and expectations of their customers.

By implementing effective market research techniques, companies can customize their products or services to better serve their target audience’s needs, fundamental for stimulating company growth . Conducting personalized market research adds value, while collaborating with specialized firms may yield additional profound insights.

Market research is not just about collecting data; it’s about deeply understanding your customers, spotting opportunities, and making informed decisions that drive your business forward. It provides essential insights into the market and business environment, influencing how potential clients perceive your company.

By conducting competitor analysis and market research, organizations can:

  • Connect with their target audience.
  • Understand their competitive position.
  • Plan strategically for future initiatives.
  • Gain insights into customer perceptions of their brand, uncovering new perspectives or opportunities for improvement.

Since competitors also use market research to their advantage, engaging in these analytical processes is crucial for a comprehensive marketing strategy, aimed at business growth.

Start your own market research and journey today to pave the way to success.

Frequently Asked Questions

What is market research and why is it important.

Understanding their target market through collected information and insights, businesses can make informed decisions, diminish risks, and enhance marketing strategies with the aid of market research. This ensures that choices are based on reliable data, which is crucial for business success.

What is the difference between primary and secondary research?

To summarize, primary research entails the gathering of original data directly from the source, whereas secondary research utilizes previously compiled data sources to add perspective and reinforce conclusions derived from primary research.

How does market research guide product development?

By offering critical data on consumer habits and preferences, market research steers the enhancement of product features, thereby influencing decisions across all stages of a product’s life cycle and aiding in the evolution of product development.

What tools can be used for efficient market research?

Platforms such as Userpilot, SurveyMonkey, Google Forms, and Typeform can be leveraged alongside technologies that are driven by data to simplify the process of crafting, disseminating, and examining online surveys which play a crucial role in conducting market research effectively.

How can market research translate into business growth?

By informing product development, marketing strategies, and identifying opportunities for growth through enlightened decision-making, market research results can propel business expansion.

Leave a comment Cancel reply

Save my name, email, and website in this browser for the next time I comment.

Book a demo with on of our product specialists

Get The Insights!

The fastest way to learn about Product Growth,Management & Trends.

The coolest way to learn about Product Growth, Management & Trends. Delivered fresh to your inbox, weekly.

analysis techniques in quantitative research

The fastest way to learn about Product Growth, Management & Trends.

You might also be interested in ...

20 effective saas marketing tactics to drive product growth in 2024, a complete guide to conversion path marketing.

Aazar Ali Shad

IMAGES

  1. Quantitative Research

    analysis techniques in quantitative research

  2. Quantitative Research: What It Is, Practices & Methods

    analysis techniques in quantitative research

  3. Quantitative Research Methods PowerPoint Template

    analysis techniques in quantitative research

  4. 15 Research Methodology Examples (2023)

    analysis techniques in quantitative research

  5. Sampling Quantitative Techniques For Data analysis

    analysis techniques in quantitative research

  6. Quantitative Research Methods PowerPoint Template

    analysis techniques in quantitative research

VIDEO

  1. Quantitative research process

  2. Quantitative Analysis X Technical Analysis to Find The Best Stock (Satriawan Haryono, CEWA, CTAD)

  3. Quantitative Analysis, Topic 4

  4. Quantitative analysis introduction Chemistry secondary 3 تالتة ثانوى كيمسترى

  5. Steps of Quantitative Analysis

  6. How to Quantitative Analysis

COMMENTS

  1. Quantitative Data Analysis Methods & Techniques 101

    The two "branches" of quantitative analysis. As I mentioned, quantitative analysis is powered by statistical analysis methods.There are two main "branches" of statistical methods that are used - descriptive statistics and inferential statistics.In your research, you might only use descriptive statistics, or you might use a mix of both, depending on what you're trying to figure out.

  2. Quantitative Data Analysis: A Comprehensive Guide

    Quantitative data has to be gathered and cleaned before proceeding to the stage of analyzing it. Below are the steps to prepare a data before quantitative research analysis: Step 1: Data Collection. Before beginning the analysis process, you need data. Data can be collected through rigorous quantitative research, which includes methods such as ...

  3. Quantitative Research

    Here are some commonly used quantitative research analysis methods: Statistical Analysis. Statistical analysis is the most common quantitative research analysis method. It involves using statistical tools and techniques to analyze the numerical data collected during the research process. Statistical analysis can be used to identify patterns ...

  4. What Is Quantitative Research?

    Quantitative research methods. You can use quantitative research methods for descriptive, correlational or experimental research. In descriptive research, you simply seek an overall summary of your study variables.; In correlational research, you investigate relationships between your study variables.; In experimental research, you systematically examine whether there is a cause-and-effect ...

  5. A Comprehensive Guide to Quantitative Research Methods: Design, Data

    5. Efficiency: Quantitative methods often allow for efficient data collection and analysis. Surveys, experiments, and statistical software tools facilitate the processing of large amounts of data, making it feasible to study complex phenomena within a reasonable timeframe. Limitations of Quantitative Methods: 1.

  6. Data Analysis in Quantitative Research

    Quantitative data analysis serves as part of an essential process of evidence-making in health and social sciences. It is adopted for any types of research question and design whether it is descriptive, explanatory, or causal. However, compared with qualitative counterpart, quantitative data analysis has less flexibility.

  7. Quantitative Data Analysis Methods, Types + Techniques

    8. Weight customer feedback. So far, the quantitative data analysis methods on this list have leveraged numeric data only. However, there are ways to turn qualitative data into quantifiable feedback and to mix and match data sources. For example, you might need to analyze user feedback from multiple surveys.

  8. Quantitative Methods

    Definition. Quantitative method is the collection and analysis of numerical data to answer scientific research questions. Quantitative method is used to summarize, average, find patterns, make predictions, and test causal associations as well as generalizing results to wider populations.

  9. The Beginner's Guide to Statistical Analysis

    Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations. To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify ...

  10. Data Analysis Techniques for Quantitative Study

    Abstract. This chapter describes the types of data analysis techniques in quantitative research and sampling strategies suitable for quantitative studies, particularly probability sampling, to produce credible and trustworthy explanations of a phenomenon. Initially, it briefly describes the measurement levels of variables.

  11. Quantitative Data Analysis: A Complete Guide

    Here's how to make sense of your company's numbers in just four steps: 1. Collect data. Before you can actually start the analysis process, you need data to analyze. This involves conducting quantitative research and collecting numerical data from various sources, including: Interviews or focus groups.

  12. Quantitative Methods

    Quantitative methods emphasize objective measurements and the statistical, mathematical, or numerical analysis of data collected through polls, questionnaires, and surveys, or by manipulating pre-existing statistical data using computational techniques.Quantitative research focuses on gathering numerical data and generalizing it across groups of people or to explain a particular phenomenon.

  13. Part II: Data Analysis Methods in Quantitative Research

    Data Analysis Methods in Quantitative Research. We started this module with levels of measurement as a way to categorize our data. Data analysis is directed toward answering the original research question and achieving the study purpose (or aim). Now, we are going to delve into two main statistical analyses to describe our data and make ...

  14. A Really Simple Guide to Quantitative Data Analysis

    nominal. It is important to know w hat kind of data you are planning to collect or analyse as this w ill. affect your analysis method. A 12 step approach to quantitative data analysis. Step 1 ...

  15. Introduction to Research Statistical Analysis: An Overview of the

    Introduction. Statistical analysis is necessary for any research project seeking to make quantitative conclusions. The following is a primer for research-based statistical analysis. It is intended to be a high-level overview of appropriate statistical testing, while not diving too deep into any specific methodology.

  16. A Practical Guide to Writing Quantitative and Qualitative Research

    A research question is what a study aims to answer after data analysis and interpretation. The answer is written in length in the discussion section of the paper. ... In quantitative research, ... this suggests an integrative process between both research methods wherein a single mixed-methods research question can be developed.1 ...

  17. Quantitative Analysis

    Quantitative analysis is the use of mathematical and statistical techniques to assess the performance of a business. Before the advent of quantitative analysis, many company directors based their decisions on experience and gut. Business owners can now use quantitative methods to predict trends, determine the allocation of resources, and manage ...

  18. Quantitative text analysis

    Quantitative text analysis is a range of computational methods to analyse text data statistically and mathematically. In this Primer, Kristoffer Nielbo et al. introduce the methods, principles and ...

  19. (PDF) Quantitative Analysis: the guide for beginners

    quantitative (numbers) and qualitative (words or images) data. The combination of. quantitative and qualitative research methods is called mixed methods. For example, first, numerical data are ...

  20. Quantitative Research

    Quantitative research methods are concerned with the planning, design, and implementation of strategies to collect and analyze data. Descartes, the seventeenth-century philosopher, suggested that how the results are achieved is often more important than the results themselves, as the journey taken along the research path is a journey of discovery. . High-quality quantitative research is ...

  21. Qualitative vs. Quantitative Research

    When collecting and analyzing data, quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings. Both are important for gaining different kinds of knowledge. Quantitative research. Quantitative research is expressed in numbers and graphs. It is used to test or confirm theories and assumptions.

  22. Basic statistical tools in research and data analysis

    Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into a lifeless data. The results and inferences are precise only if ...

  23. Quantitative Data

    Quantitative data refers to numerical data that can be measured or counted. This type of data is often used in scientific research and is typically collected through methods such as surveys, experiments, and statistical analysis. Quantitative Data Types. There are two main types of quantitative data: discrete and continuous.

  24. Qualitative vs Quantitative Research Methods & Data Analysis

    The main difference between quantitative and qualitative research is the type of data they collect and analyze. Quantitative research collects numerical data and analyzes it using statistical methods. The aim is to produce objective, empirical data that can be measured and expressed in numerical terms.

  25. How to do Market Research: a Step-by-Step Guide

    Quantitative research data is divided into two types: discrete data, which includes countable numbers, and continuous data, which consists of numbers that can have fractions or decimals. ... and using visual aids while making practical recommendations and addressing any limitations in the research scope or methods. Ultimately, data analysis ...