• How it works

researchprospect post subheader

Thematic Analysis – A Guide with Examples

Published by Alvin Nicolas at August 16th, 2021 , Revised On August 29, 2023

Thematic analysis is one of the most important types of analysis used for qualitative data . When researchers have to analyse audio or video transcripts, they give preference to thematic analysis. A researcher needs to look keenly at the content to identify the context and the message conveyed by the speaker.

Moreover, with the help of this analysis, data can be simplified.  

Importance of Thematic Analysis

Thematic analysis has so many unique and dynamic features, some of which are given below:

Thematic analysis is used because:

  • It is flexible.
  • It is best for complex data sets.
  • It is applied to qualitative data sets.
  • It takes less complexity compared to other theories of analysis.

Intellectuals and researchers give preference to thematic analysis due to its effectiveness in the research.

How to Conduct a Thematic Analysis?

While doing any research , if your data and procedure are clear, it will be easier for your reader to understand how you concluded the results . This will add much clarity to your research.

Understand the Data

This is the first step of your thematic analysis. At this stage, you have to understand the data set. You need to read the entire data instead of reading the small portion. If you do not have the data in the textual form, you have to transcribe it.

Example: If you are visiting an adult dating website, you have to make a data corpus. You should read and re-read the data and consider several profiles. It will give you an idea of how adults represent themselves on dating sites. You may get the following results:

I am a tall, single(widowed), easy-going, honest, good listener with a good sense of humor. Being a handyperson, I keep busy working around the house, and I also like to follow my favourite hockey team on TV or spoil my two granddaughters when I get the chance!! Enjoy most music except Rap! I keep fit by jogging, walking, and bicycling (at least three times a week). I have travelled to many places and RVD the South-West U.S., but I would now like to find that special travel partner to do more travel to warm and interesting countries. I now feel it’s time to meet a nice, kind, honest woman who has some of the same interests as I do; to share the happy times, quiet times, and adventures together

I enjoy photography, lapidary & seeking collectibles in the form of classic movies & 33 1/3, 45 & 78 RPM recordings from the 1920s, ’30s & ’40s. I am retired & looking forward to travelling to Canada, the USA, the UK & Europe, China. I am unique since I do not judge a book by its cover. I accept people for who they are. I will not demand or request perfection from anyone until I am perfect, so I guess that means everyone is safe. My musical tastes range from Classical, big band era, early jazz, classic ’50s & 60’s rock & roll & country since its inception.

Development of Initial Coding:

At this stage, you have to do coding. It’s the essential step of your research . Here you have two options for coding. Either you can do the coding manually or take the help of any tool. A software named the NOVIC is considered the best tool for doing automatic coding.

For manual coding, you can follow the steps given below:

  • Please write down the data in a proper format so that it can be easier to proceed.
  • Use a highlighter to highlight all the essential points from data.
  • Make as many points as possible.
  • Take notes very carefully at this stage.
  • Apply themes as much possible.
  • Now check out the themes of the same pattern or concept.
  • Turn all the same themes into the single one.

Example: For better understanding, the previously explained example of Step 1 is continued here. You can observe the coded profiles below:

Profile No. Data Item Initial Codes
1 I am a tall, single(widowed), easy-going, honest, good listener with a good sense of humour. Being a handyperson, I keep busy working around the house; I also like to follow my favourite hockey team on TV or spoiling my
two granddaughters when I get the chance!! I enjoy most
music except for Rap! I keep fit by jogging, walking, and bicycling(at least three times a week). I have travelled to many places and RVD the South-West U.S., but I would now like to find that special travel partner to do more travel to warm and interesting countries. I now feel it’s time to meet a nice, kind, honest woman who has some of the same interests as I do; to share the happy times, quiet times and adventures together.
Physical description
Widowed
Positive qualities
Humour
Keep busy
Hobbies
Family
Music
Active
Travel
Plans
Partner qualities
Plans
Profile No. Data Item Initial Codes
2 I enjoy photography, lapidary & seeking collectables in the form of classic movies & 33 1/3, 45 & 78 RPM recordings from the 1920s, ’30s & ’40s. I am retired & looking forward to travelling to Canada, the USA, the UK & Europe, China. I am unique since I do not judge a book by its cover. I accept people for who they are. I will not demand or request perfection from anyone until I am perfect, so I guess that means everyone is safe. My musical tastes range from Classical, big band era, early jazz, classic ’50s & 60’s rock & roll & country since its inception. HobbiesFuture plans

Travel

Unique

Values

Humour

Music

Make Themes

At this stage, you have to make the themes. These themes should be categorised based on the codes. All the codes which have previously been generated should be turned into themes. Moreover, with the help of the codes, some themes and sub-themes can also be created. This process is usually done with the help of visuals so that a reader can take an in-depth look at first glance itself.

Extracted Data Review

Now you have to take an in-depth look at all the awarded themes again. You have to check whether all the given themes are organised properly or not. It would help if you were careful and focused because you have to note down the symmetry here. If you find that all the themes are not coherent, you can revise them. You can also reshape the data so that there will be symmetry between the themes and dataset here.

For better understanding, a mind-mapping example is given here:

Extracted Data

Reviewing all the Themes Again

You need to review the themes after coding them. At this stage, you are allowed to play with your themes in a more detailed manner. You have to convert the bigger themes into smaller themes here. If you want to combine some similar themes into a single theme, then you can do it. This step involves two steps for better fragmentation. 

You need to observe the coded data separately so that you can have a precise view. If you find that the themes which are given are following the dataset, it’s okay. Otherwise, you may have to rearrange the data again to coherence in the coded data.

Corpus Data

Here you have to take into consideration all the corpus data again. It would help if you found how themes are arranged here. It would help if you used the visuals to check out the relationship between them. Suppose all the things are not done accordingly, so you should check out the previous steps for a refined process. Otherwise, you can move to the next step. However, make sure that all the themes are satisfactory and you are not confused.

When all the two steps are completed, you need to make a more précised mind map. An example following the previous cases has been given below:

Corpus Data

Define all the Themes here

Now you have to define all the themes which you have given to your data set. You can recheck them carefully if you feel that some of them can fit into one concept, you can keep them, and eliminate the other irrelevant themes. Because it should be precise and clear, there should not be any ambiguity. Now you have to think about the main idea and check out that all the given themes are parallel to your main idea or not. This can change the concept for you.

The given names should be so that it can give any reader a clear idea about your findings. However, it should not oppose your thematic analysis; rather, everything should be organised accurately.

Steps of Writing a dissertation

Does your Research Methodology Have the Following?

  • Great Research/Sources
  • Perfect Language
  • Accurate Sources

If not, we can help. Our panel of experts makes sure to keep the 3 pillars of Research Methodology strong.

Does your Research Methodology Have the Following?

Also, read about discourse analysis , content analysis and survey conducting . we have provided comprehensive guides.

Make a Report

You need to make the final report of all the findings you have done at this stage. You should include the dataset, findings, and every aspect of your analysis in it.

While making the final report , do not forget to consider your audience. For instance, you are writing for the Newsletter, Journal, Public awareness, etc., your report should be according to your audience. It should be concise and have some logic; it should not be repetitive. You can use the references of other relevant sources as evidence to support your discussion.  

Frequently Asked Questions

What is meant by thematic analysis.

Thematic Analysis is a qualitative research method that involves identifying, analyzing, and interpreting recurring themes or patterns in data. It aims to uncover underlying meanings, ideas, and concepts within the dataset, providing insights into participants’ perspectives and experiences.

You May Also Like

In correlational research, a researcher measures the relationship between two or more variables or sets of scores without having control over the variables.

Struggling to figure out “whether I should choose primary research or secondary research in my dissertation?” Here are some tips to help you decide.

Quantitative research is associated with measurable numerical data. Qualitative research is where a researcher collects evidence to seek answers to a question.

USEFUL LINKS

LEARNING RESOURCES

researchprospect-reviews-trust-site

COMPANY DETAILS

Research-Prospect-Writing-Service

  • How It Works

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, generate accurate citations for free.

  • Knowledge Base

Methodology

  • How to Do Thematic Analysis | Step-by-Step Guide & Examples

How to Do Thematic Analysis | Step-by-Step Guide & Examples

Published on September 6, 2019 by Jack Caulfield . Revised on June 22, 2023.

Thematic analysis is a method of analyzing qualitative data . It is usually applied to a set of texts, such as an interview or transcripts . The researcher closely examines the data to identify common themes – topics, ideas and patterns of meaning that come up repeatedly.

There are various approaches to conducting thematic analysis, but the most common form follows a six-step process: familiarization, coding, generating themes, reviewing themes, defining and naming themes, and writing up. Following this process can also help you avoid confirmation bias when formulating your analysis.

This process was originally developed for psychology research by Virginia Braun and Victoria Clarke . However, thematic analysis is a flexible method that can be adapted to many different kinds of research.

Table of contents

When to use thematic analysis, different approaches to thematic analysis, step 1: familiarization, step 2: coding, step 3: generating themes, step 4: reviewing themes, step 5: defining and naming themes, step 6: writing up, other interesting articles.

Thematic analysis is a good approach to research where you’re trying to find out something about people’s views, opinions, knowledge, experiences or values from a set of qualitative data – for example, interview transcripts , social media profiles, or survey responses .

Some types of research questions you might use thematic analysis to answer:

  • How do patients perceive doctors in a hospital setting?
  • What are young women’s experiences on dating sites?
  • What are non-experts’ ideas and opinions about climate change?
  • How is gender constructed in high school history teaching?

To answer any of these questions, you would collect data from a group of relevant participants and then analyze it. Thematic analysis allows you a lot of flexibility in interpreting the data, and allows you to approach large data sets more easily by sorting them into broad themes.

However, it also involves the risk of missing nuances in the data. Thematic analysis is often quite subjective and relies on the researcher’s judgement, so you have to reflect carefully on your own choices and interpretations.

Pay close attention to the data to ensure that you’re not picking up on things that are not there – or obscuring things that are.

Here's why students love Scribbr's proofreading services

Discover proofreading & editing

Once you’ve decided to use thematic analysis, there are different approaches to consider.

There’s the distinction between inductive and deductive approaches:

  • An inductive approach involves allowing the data to determine your themes.
  • A deductive approach involves coming to the data with some preconceived themes you expect to find reflected there, based on theory or existing knowledge.

Ask yourself: Does my theoretical framework give me a strong idea of what kind of themes I expect to find in the data (deductive), or am I planning to develop my own framework based on what I find (inductive)?

There’s also the distinction between a semantic and a latent approach:

  • A semantic approach involves analyzing the explicit content of the data.
  • A latent approach involves reading into the subtext and assumptions underlying the data.

Ask yourself: Am I interested in people’s stated opinions (semantic) or in what their statements reveal about their assumptions and social context (latent)?

After you’ve decided thematic analysis is the right method for analyzing your data, and you’ve thought about the approach you’re going to take, you can follow the six steps developed by Braun and Clarke .

The first step is to get to know our data. It’s important to get a thorough overview of all the data we collected before we start analyzing individual items.

This might involve transcribing audio , reading through the text and taking initial notes, and generally looking through the data to get familiar with it.

Next up, we need to code the data. Coding means highlighting sections of our text – usually phrases or sentences – and coming up with shorthand labels or “codes” to describe their content.

Let’s take a short example text. Say we’re researching perceptions of climate change among conservative voters aged 50 and up, and we have collected data through a series of interviews. An extract from one interview looks like this:

Coding qualitative data
Interview extract Codes
Personally, I’m not sure. I think the climate is changing, sure, but I don’t know why or how. People say you should trust the experts, but who’s to say they don’t have their own reasons for pushing this narrative? I’m not saying they’re wrong, I’m just saying there’s reasons not to 100% trust them. The facts keep changing – it used to be called global warming.

In this extract, we’ve highlighted various phrases in different colors corresponding to different codes. Each code describes the idea or feeling expressed in that part of the text.

At this stage, we want to be thorough: we go through the transcript of every interview and highlight everything that jumps out as relevant or potentially interesting. As well as highlighting all the phrases and sentences that match these codes, we can keep adding new codes as we go through the text.

After we’ve been through the text, we collate together all the data into groups identified by code. These codes allow us to gain a a condensed overview of the main points and common meanings that recur throughout the data.

Next, we look over the codes we’ve created, identify patterns among them, and start coming up with themes.

Themes are generally broader than codes. Most of the time, you’ll combine several codes into a single theme. In our example, we might start combining codes into themes like this:

Turning codes into themes
Codes Theme
Uncertainty
Distrust of experts
Misinformation

At this stage, we might decide that some of our codes are too vague or not relevant enough (for example, because they don’t appear very often in the data), so they can be discarded.

Other codes might become themes in their own right. In our example, we decided that the code “uncertainty” made sense as a theme, with some other codes incorporated into it.

Again, what we decide will vary according to what we’re trying to find out. We want to create potential themes that tell us something helpful about the data for our purposes.

Now we have to make sure that our themes are useful and accurate representations of the data. Here, we return to the data set and compare our themes against it. Are we missing anything? Are these themes really present in the data? What can we change to make our themes work better?

If we encounter problems with our themes, we might split them up, combine them, discard them or create new ones: whatever makes them more useful and accurate.

For example, we might decide upon looking through the data that “changing terminology” fits better under the “uncertainty” theme than under “distrust of experts,” since the data labelled with this code involves confusion, not necessarily distrust.

Now that you have a final list of themes, it’s time to name and define each of them.

Defining themes involves formulating exactly what we mean by each theme and figuring out how it helps us understand the data.

Naming themes involves coming up with a succinct and easily understandable name for each theme.

For example, we might look at “distrust of experts” and determine exactly who we mean by “experts” in this theme. We might decide that a better name for the theme is “distrust of authority” or “conspiracy thinking”.

Finally, we’ll write up our analysis of the data. Like all academic texts, writing up a thematic analysis requires an introduction to establish our research question, aims and approach.

We should also include a methodology section, describing how we collected the data (e.g. through semi-structured interviews or open-ended survey questions ) and explaining how we conducted the thematic analysis itself.

The results or findings section usually addresses each theme in turn. We describe how often the themes come up and what they mean, including examples from the data as evidence. Finally, our conclusion explains the main takeaways and shows how the analysis has answered our research question.

In our example, we might argue that conspiracy thinking about climate change is widespread among older conservative voters, point out the uncertainty with which many voters view the issue, and discuss the role of misinformation in respondents’ perceptions.

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Measures of central tendency
  • Chi square tests
  • Confidence interval
  • Quartiles & Quantiles
  • Cluster sampling
  • Stratified sampling
  • Discourse analysis
  • Cohort study
  • Peer review
  • Ethnography

Research bias

  • Implicit bias
  • Cognitive bias
  • Conformity bias
  • Hawthorne effect
  • Availability heuristic
  • Attrition bias
  • Social desirability bias

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Caulfield, J. (2023, June 22). How to Do Thematic Analysis | Step-by-Step Guide & Examples. Scribbr. Retrieved August 28, 2024, from https://www.scribbr.com/methodology/thematic-analysis/

Is this article helpful?

Jack Caulfield

Jack Caulfield

Other students also liked, what is qualitative research | methods & examples, inductive vs. deductive research approach | steps & examples, critical discourse analysis | definition, guide & examples, get unlimited documents corrected.

✔ Free APA citation check included ✔ Unlimited document corrections ✔ Specialized in correcting academic texts

Thematic Analysis: A Step by Step Guide

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Learn about our Editorial Process

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

On This Page:

What is Thematic Analysis?

Thematic analysis is a qualitative research method used to identify, analyze, and interpret patterns of shared meaning (themes) within a given data set, which can be in the form of interviews , focus group discussions , surveys, or other textual data.

Thematic analysis is a useful method for research seeking to understand people’s views, opinions, knowledge, experiences, or values from qualitative data.

This method is widely used in various fields, including psychology, sociology, and health sciences.

Thematic analysis minimally organizes and describes a data set in rich detail. Often, though, it goes further than this and interprets aspects of the research topic.

Key aspects of Thematic Analysis include:

  • Flexibility : It can be adapted to suit the needs of various studies, providing a rich and detailed account of the data.
  • Coding : The process involves assigning labels or codes to specific segments of the data that capture a single idea or concept relevant to the research question.
  • Themes : Representing a broader level of analysis, encompassing multiple codes that share a common underlying meaning or pattern. They provide a more abstract and interpretive understanding of the data.
  • Iterative process : Thematic analysis is a recursive process that involves constantly moving back and forth between the coded extracts, the entire data set, and the thematic analysis being produced.
  • Interpretation : The researcher interprets the identified themes to make sense of the data and draw meaningful conclusions.

It’s important to note that the types of thematic analysis are not mutually exclusive, and researchers may adopt elements from different approaches depending on their research questions, goals, and epistemological stance.

The choice of approach should be guided by the research aims, the nature of the data, and the philosophical assumptions underpinning the study.

FeatureCoding Reliability TACodebook TAReflexive TA
Conceptualized as topic summaries of the data Typically conceptualized as topic summariesConceptualized as patterns of shared meaning that are underpinned by a central organizing concept
Involves using a coding frame or codebook, which may be predetermined or generated from the data, to find evidence for themes or allocate data to predefined topics. Ideally, two or more researchers apply the coding frame separately to the data to avoid contaminationTypically involves early theme development and the use of a codebook and structured approach to codingInvolves an active process in which codes are developed from the data through the analysis. The researcher’s subjectivity shapes the coding and theme development process
Emphasizes securing the reliability and accuracy of data coding, reflecting (post)positivist research values. Prioritizes minimizing subjectivity and maximizing objectivity in the coding processCombines elements of both coding reliability and reflexive TA, but qualitative values tend to predominate. For example, the “accuracy” or “reliability” of coding is not a primary concernEmphasizes the role of the researcher in knowledge construction and acknowledges that their subjectivity shapes the research process and outcomes
Often used in research where minimizing subjectivity and maximizing objectivity in the coding process are highly valuedCommonly employed in applied research, particularly when information needs are predetermined, deadlines are tight, and research teams are large and may include qualitative novices. Pragmatic concerns often drive its useWell-suited for exploring complex research issues. Often used in research where the researcher’s active role in knowledge construction is acknowledged and valued. Can be used to analyze a wide range of data, including interview transcripts, focus groups, and policy documents
Themes are often predetermined or generated early in the analysis process, either prior to data analysis or following some familiarization with the dataThemes are typically developed early in the analysis processThemes are developed later in the analytic process, emerging from the coded data
The researcher’s subjectivity is minimized, aiming for objectivity in codingThe researcher’s subjectivity is acknowledged, though structured coding methods are usedThe researcher’s subjectivity is viewed as a valuable resource in the analytic process and is considered to inevitably shape the research findings

1. Coding Reliability Thematic Analysis

Coding reliability TA emphasizes using coding techniques to achieve reliable and accurate data coding, which reflects (post)positivist research values.

This approach emphasizes the reliability and replicability of the coding process. It involves multiple coders independently coding the data using a predetermined codebook.

The goal is to achieve a high level of agreement among the coders, which is often measured using inter-rater reliability metrics.

This approach often involves a coding frame or codebook determined in advance or generated after familiarization with the data.

In this type of TA, two or more researchers apply a fixed coding frame to the data, ideally working separately.

Some researchers even suggest that at least some coders should be unaware of the research question or area of study to prevent bias in the coding process.

Statistical tests are used to assess the level of agreement between coders, or the reliability of coding. Any differences in coding between researchers are resolved through consensus.

This approach is more suitable for research questions that require a more structured and reliable coding process, such as in content analysis or when comparing themes across different data sets.

2. Codebook Thematic Analysis

Codebook TA, such as template, framework, and matrix analysis, combines elements of coding reliability and reflexive.

Codebook TA, while employing structured coding methods like those used in coding reliability TA, generally prioritizes qualitative research values, such as reflexivity.

In this approach, the researcher develops a codebook based on their initial engagement with the data. The codebook contains a list of codes, their definitions, and examples from the data.

The codebook is then used to systematically code the entire data set. This approach allows for a more detailed and nuanced analysis of the data, as the codebook can be refined and expanded throughout the coding process.

It is particularly useful when the research aims to provide a comprehensive description of the data set.

Codebook TA is often chosen for pragmatic reasons in applied research, particularly when there are predetermined information needs, strict deadlines, and large teams with varying levels of qualitative research experience

The use of a codebook in this context helps to map the developing analysis, which is thought to improve teamwork, efficiency, and the speed of output delivery.

3. Reflexive Thematic Analysis

This approach emphasizes the role of the researcher in the analysis process. It acknowledges that the researcher’s subjectivity, theoretical assumptions, and interpretative framework shape the identification and interpretation of themes.

In reflexive TA, analysis starts with coding after data familiarization. Unlike other TA approaches, there is no codebook or coding frame. Instead, researchers develop codes as they work through the data.

As their understanding grows, codes can change to reflect new insights—for example, they might be renamed, combined with other codes, split into multiple codes, or have their boundaries redrawn.

If multiple researchers are involved, differences in coding are explored to enhance understanding, not to reach a consensus. The finalized coding is always open to new insights and coding.

Reflexive thematic analysis involves a more organic and iterative process of coding and theme development. The researcher continuously reflects on their role in the research process and how their own experiences and perspectives might influence the analysis.

This approach is particularly useful for exploratory research questions and when the researcher aims to provide a rich and nuanced interpretation of the data.

Six Steps Of Thematic Analysis

The process is characterized by a recursive movement between the different phases, rather than a strict linear progression.

This means that researchers might revisit earlier phases as their understanding of the data evolves, constantly refining their analysis.

For instance, during the reviewing and developing themes phase, researchers may realize that their initial codes don’t effectively capture the nuances of the data and might need to return to the coding phase. 

This back-and-forth movement continues throughout the analysis, ensuring a thorough and evolving understanding of the data

thematic analysis

Step 1: Familiarization With the Data

Familialization is crucial, as it helps researchers figure out the type (and number) of themes that might emerge from the data.

Familiarization involves immersing yourself in the data by reading and rereading textual data items, such as interview transcripts or survey responses.

You should read through the entire data set at least once, and possibly multiple times, until you feel intimately familiar with its content.

  • Read and re-read the data (e.g., interview transcripts, survey responses, or other textual data) : The researcher reads through the entire data set (e.g., interview transcripts, survey responses, or field notes) multiple times to gain a comprehensive understanding of the data’s breadth and depth. This helps the researcher develop a holistic sense of the participants’ experiences, perspectives, and the overall narrative of the data.
  • Listen to the audio recordings of the interviews : This helps to pick up on tone, emphasis, and emotional responses that may not be evident in the written transcripts. For instance, they might note a participant’s hesitation or excitement when discussing a particular topic. This is an important step if you didn’t collect the data or transcribe it yourself.
  • Take notes on initial ideas and observations : Note-making at this stage should be observational and casual, not systematic and inclusive, as you aren’t coding yet. Think of the notes as memory aids and triggers for later coding and analysis. They are primarily for you, although they might be shared with research team members.
  • Immerse yourself in the data to gain a deep understanding of its content : It’s not about just absorbing surface meaning like you would with a novel, but about thinking about what the data  mean .

By the end of the familiarization step, the researcher should have a good grasp of the overall content of the data, the key issues and experiences discussed by the participants, and any initial patterns or themes that emerge.

This deep engagement with the data sets the stage for the subsequent steps of thematic analysis, where the researcher will systematically code and analyze the data to identify and interpret the central themes.

Step 2: Generating Initial Codes

Codes are concise labels or descriptions assigned to segments of the data that capture a specific feature or meaning relevant to the research question.

The process of qualitative coding helps the researcher organize and reduce the data into manageable chunks, making it easier to identify patterns and themes relevant to the research question.

Think of it this way:  If your analysis is a house, themes are the walls and roof, while codes are the individual bricks and tiles.

Coding is an iterative process, with researchers refining and revising their codes as their understanding of the data evolves.

The ultimate goal is to develop a coherent and meaningful coding scheme that captures the richness and complexity of the participants’ experiences and helps answer the research questions.

Coding can be done manually (paper transcription and pen or highlighter) or by means of software (e.g. by using NVivo, MAXQDA or ATLAS.ti).

qualitative coding

Decide On Your Coding Approach

  • Will you use predefined deductive codes (based on theory or prior research), or let codes emerge from the data (inductive coding)?
  • Will a piece of data have one code or multiple?
  • Will you code everything or selectively? Broader research questions may warrant coding more comprehensively.

If you decide not to code everything, it’s crucial to:

  • Have clear criteria for what you will and won’t code
  • Be transparent about your selection process in research reports
  • Remain open to revisiting uncoded data later in analysis

Do A First Round Of Coding

  • Go through the data and assign initial codes to chunks that stand out
  • Create a code name (a word or short phrase) that captures the essence of each chunk
  • Keep a codebook – a list of your codes with descriptions or definitions
  • Be open to adding, revising or combining codes as you go

After generating your first code, compare each new data extract to see if an existing code applies or a new one is needed.

Coding can be done at two levels of meaning:

  • Semantic:  Provides a concise summary of a portion of data, staying close to the content and the participant’s meaning. For example, “Fear/anxiety about people’s reactions to his sexuality.”
  • Latent:  Goes beyond the participant’s meaning to provide a conceptual interpretation of the data. For example, “Coming out imperative” interprets the meaning behind a participant’s statement.

Most codes will be a mix of descriptive and conceptual. Novice coders tend to generate more descriptive codes initially, developing more conceptual approaches with experience.

This step ends when:

  • All data is fully coded.
  • Data relevant to each code has been collated.

You have enough codes to capture the data’s diversity and patterns of meaning, with most codes appearing across multiple data items.

The number of codes you generate will depend on your topic, data set, and coding precision.

Step 3: Searching for Themes

Searching for themes begins after all data has been initially coded and collated, resulting in a comprehensive list of codes identified across the data set.

This step involves shifting from the specific, granular codes to a broader, more conceptual level of analysis.

Thematic analysis is not about “discovering” themes that already exist in the data, but rather actively constructing or generating themes through a careful and iterative process of examination and interpretation.

1 . Collating codes into potential themes :

The process of collating codes into potential themes involves grouping codes that share a unifying feature or represent a coherent and meaningful pattern in the data.

The researcher looks for patterns, similarities, and connections among the codes to develop overarching themes that capture the essence of the data.

By the end of this step, the researcher will have a collection of candidate themes and sub-themes, along with their associated data extracts.

However, these themes are still provisional and will be refined in the next step of reviewing the themes.

The searching for themes step helps the researcher move from a granular, code-level analysis to a more conceptual, theme-level understanding of the data.

This process is similar to sculpting, where the researcher shapes the “raw” data into a meaningful analysis.

This involves grouping codes that share a unifying feature or represent a coherent pattern in the data:
  • Review the list of initial codes and their associated data extracts
  • Look for codes that seem to share a common idea or concept
  • Group related codes together to form potential themes
  • Some codes may form main themes, while others may be sub-themes or may not fit into any theme

Thematic maps can help visualize the relationship between codes and themes. These visual aids provide a structured representation of the emerging patterns and connections within the data, aiding in understanding the significance of each theme and its contribution to the overall research question.

Example : Studying first-generation college students, the researcher might notice that the codes “financial challenges,” “working part-time,” and “scholarships” all relate to the broader theme of “Financial Obstacles and Support.”

Shared Meaning vs. Shared Topic in Thematic Analysis

Braun and Clarke distinguish between two different conceptualizations of  themes : topic summaries and shared meaning

  • Topic summary themes , which they consider to be underdeveloped, are organized around a shared topic but not a shared meaning, and often resemble “buckets” into which data is sorted.
  • Shared meaning themes  are patterns of shared meaning underpinned by a central organizing concept.
When grouping codes into themes, it’s crucial to ensure they share a central organizing concept or idea, reflecting a shared meaning rather than just belonging to the same topic.

Thematic analysis aims to uncover patterns of shared meaning within the data that offer insights into the research question

For example, codes centered around the concept of “Negotiating Sexual Identity” might not form one comprehensive theme, but rather two distinct themes: one related to “coming out and being out” and another exploring “different versions of being a gay man.”

Avoid : Themes as Topic Summaries (Shared Topic)

In this approach, themes simply summarize what participants mentioned about a particular topic, without necessarily revealing a unified meaning.

These themes are often underdeveloped and lack a central organizing concept.

It’s crucial to avoid creating themes that are merely summaries of data domains or directly reflect the interview questions. 

Example : A theme titled “Incidents of homophobia” that merely describes various participant responses about homophobia without delving into deeper interpretations would be a topic summary theme.

Tip : Using interview questions as theme titles without further interpretation or relying on generic social functions (“social conflict”) or structural elements (“economics”) as themes often indicates a lack of shared meaning and thorough theme development. Such themes might lack a clear connection to the specific dataset

Ensure : Themes as Shared Meaning

Instead, themes should represent a deeper level of interpretation, capturing the essence of the data and providing meaningful insights into the research question.

These themes go beyond summarizing a topic by identifying a central concept or idea that connects the codes.

They reflect a pattern of shared meaning across different data points, even if those points come from different topics.

Example : The theme “‘There’s always that level of uncertainty’: Compulsory heterosexuality at university” effectively captures the shared experience of fear and uncertainty among LGBT students, connecting various codes related to homophobia and its impact on their lives.

2. Gathering data relevant to each potential theme

Once a potential theme is identified, all coded data extracts associated with the codes grouped under that theme are collated. This ensures a comprehensive view of the data pertaining to each theme.

This involves reviewing the collated data extracts for each code and organizing them under the relevant themes.

For example, if you have a potential theme called “Student Strategies for Test Preparation,” you would gather all data extracts that have been coded with related codes, such as “Time Management for Test Preparation” or “Study Groups for Test Preparation”.

You can then begin reviewing the data extracts for each theme to see if they form a coherent pattern. 

This step helps to ensure that your themes accurately reflect the data and are not based on your own preconceptions.

It’s important to remember that coding is an organic and ongoing process.

You may need to re-read your entire data set to see if you have missed any data that is relevant to your themes, or if you need to create any new codes or themes.

The researcher should ensure that the data extracts within each theme are coherent and meaningful.

Example : The researcher would gather all the data extracts related to “Financial Obstacles and Support,” such as quotes about struggling to pay for tuition, working long hours, or receiving scholarships.

Here’s a more detailed explanation of how to gather data relevant to each potential theme:

  • Start by creating a visual representation of your potential themes, such as a thematic map or table
  • List each potential theme and its associated sub-themes (if any)
  • This will help you organize your data and see the relationships between themes
  • Go through your coded data extracts (e.g., highlighted quotes or segments from interview transcripts)
  • For each coded extract, consider which theme or sub-theme it best fits under
  • If a coded extract seems to fit under multiple themes, choose the theme that it most closely aligns with in terms of shared meaning
  • As you identify which theme each coded extract belongs to, copy and paste the extract under the relevant theme in your thematic map or table
  • Include enough context around each extract to ensure its meaning is clear
  • If using qualitative data analysis software, you can assign the coded extracts to the relevant themes within the software
  • As you gather data extracts under each theme, continuously review the extracts to ensure they form a coherent pattern
  • If some extracts do not fit well with the rest of the data in a theme, consider whether they might better fit under a different theme or if the theme needs to be refined

3. Considering relationships between codes, themes, and different levels of themes

Once you have gathered all the relevant data extracts under each theme, review the themes to ensure they are meaningful and distinct.

This step involves analyzing how different codes combine to form overarching themes and exploring the hierarchical relationship between themes and sub-themes.

Within a theme, there can be different levels of themes, often organized hierarchically as main themes and sub-themes.

  • Main themes  represent the most overarching or significant patterns found in the data. They provide a high-level understanding of the key issues or concepts present in the data. 
  • Sub-themes , as the name suggests, fall under main themes, offering a more nuanced and detailed understanding of a particular aspect of the main theme.

The process of developing these relationships is iterative and involves:

  • Creating a Thematic Map : The relationship between codes, sub-themes and main themes can be visualized using a thematic map, diagram, or table. Refine the thematic map as you continue to review and analyze the data.
  • Examine how the codes and themes relate to each other : Some themes may be more prominent or overarching (main themes), while others may be secondary or subsidiary (sub-themes).
  • Refining Themes : This map helps researchers review and refine themes, ensuring they are internally consistent (homogeneous) and distinct from other themes (heterogeneous).
  • Defining and Naming Themes : Finally, themes are given clear and concise names and definitions that accurately reflect the meaning they represent in the data.

Thematic map of qualitative data from focus groups W640

Consider how the themes tell a coherent story about the data and address the research question.

If some themes seem to overlap or are not well-supported by the data, consider combining or refining them.

If a theme is too broad or diverse, consider splitting it into separate themes or sub-theme.

Example : The researcher might identify “Academic Challenges” and “Social Adjustment” as other main themes, with sub-themes like “Imposter Syndrome” and “Balancing Work and School” under “Academic Challenges.” They would then consider how these themes relate to each other and contribute to the overall understanding of first-generation college students’ experiences.

Step 4: Reviewing Themes

The researcher reviews, modifies, and develops the preliminary themes identified in the previous step.

This phase involves a recursive process of checking the themes against the coded data extracts and the entire data set to ensure they accurately reflect the meanings evident in the data.

The purpose is to refine the themes, ensuring they are coherent, consistent, and distinctive.

According to Braun and Clarke, a well-developed theme “captures something important about the data in relation to the research question and represents some level of patterned response or meaning within the data set”.

A well-developed theme will:

  • Go beyond paraphrasing the data to analyze the meaning and significance of the patterns identified.
  • Provide a detailed analysis of what the theme is about.
  • Be supported with a good amount of relevant data extracts.
  • Be related to the research question.
Revisions at this stage might involve creating new themes, refining existing themes, or discarding themes that do not fit the data

Level One : Reviewing Themes Against Coded Data Extracts

  • Researchers begin by comparing their candidate themes against the coded data extracts associated with each theme.
  • This step helps to determine whether each theme is supported by the data and whether it accurately reflects the meaning found in the extracts. Determine if there is enough data to support each theme.
  • Look at the relationships between themes and sub-themes in the thematic map. Consider whether the themes work together to tell a coherent story about the data. If the thematic map does not effectively represent the data, consider making adjustments to the themes or their organization.
  • It’s important to ensure that each theme has a singular focus and is not trying to encompass too much. Themes should be distinct from one another, although they may build on or relate to each other.
  • Discarding codes : If certain codes within a theme are not well-supported or do not fit, they can be removed.
  • Relocating codes : Codes that fit better under a different theme can be moved.
  • Redrawing theme boundaries : The scope of a theme can be adjusted to better capture the relevant data.
  • Discarding themes : Entire themes can be abandoned if they do not work.

Level Two : Evaluating Themes Against the Entire Data Set

  • Once the themes appear coherent and well-supported by the coded extracts, researchers move on to evaluate them against the entire data set.
  • This involves a final review of all the data to ensure that the themes accurately capture the most important and relevant patterns across the entire dataset in relation to the research question.
  • During this level, researchers may need to recode some extracts for consistency, especially if the coding process evolved significantly, and earlier data items were not recoded according to these changes.

Step 5: Defining and Naming Themes

The themes are finalized when the researcher is satisfied with the theme names and definitions.

If the analysis is carried out by a single researcher, it is recommended to seek feedback from an external expert to confirm that the themes are well-developed, clear, distinct, and capture all the relevant data.

Defining themes  means determining the exact meaning of each theme and understanding how it contributes to understanding the data.

This process involves formulating exactly what we mean by each theme. The researcher should consider what a theme says, if there are subthemes, how they interact and relate to the main theme, and how the themes relate to each other.

Themes should not be overly broad or try to encompass too much, and should have a singular focus. They should be distinct from one another and not repetitive, although they may build on one another.

In this phase the researcher specifies the essence of each theme.

  • What does the theme tell us that is relevant for the research question?
  • How does it fit into the ‘overall story’ the researcher wants to tell about the data?
Naming themes  involves developing a clear and concise name that effectively conveys the essence of each theme to the reader. A good name for a theme is informative, concise, and catchy.
  • The researcher develops concise, punchy, and informative names for each theme that effectively communicate its essence to the reader.
  • Theme names should be catchy and evocative, giving the reader an immediate sense of what the theme is about.
  • Avoid using jargon or overly complex language in theme names.
  • The name should go beyond simply paraphrasing the content of the data extracts and instead interpret the meaning and significance of the patterns within the theme.
  • The goal is to make the themes accessible and easily understandable to the intended audience. If a theme contains sub-themes, the researcher should also develop clear and informative names for each sub-theme.
  • Theme names can include direct quotations from the data, which helps convey the theme’s meaning. However, researchers should avoid using data collection questions as theme names. Using data collection questions as themes often leads to analyses that present summaries of topics rather than fully realized themes.

For example, “‘There’s always that level of uncertainty’: Compulsory heterosexuality at university” is a strong theme name because it captures the theme’s meaning. In contrast, “incidents of homophobia” is a weak theme name because it only states the topic.

For instance, a theme labeled “distrust of experts” might be renamed “distrust of authority” or “conspiracy thinking” after careful consideration of the theme’s meaning and scope.

Step 6: Producing the Report

A thematic analysis report should provide a convincing and clear, yet complex story about the data that is situated within a scholarly field.

A balance should be struck between the narrative and the data presented, ensuring that the report convincingly explains the meaning of the data, not just summarizes it.

To achieve this, the report should include vivid, compelling data extracts illustrating the themes and incorporate extracts from different data sources to demonstrate the themes’ prevalence and strengthen the analysis by representing various perspectives within the data.

The report should be written in first-person active tense, unless otherwise stated in the reporting requirements.

The analysis can be presented in two ways :

  • Integrated Results and Discussion section:  This approach is suitable when the analysis has strong connections to existing research and when the analysis is more theoretical or interpretive.
  • Separate Discussion section:  This approach presents the data interpretation separately from the results.
Regardless of the presentation style, researchers should aim to “show” what the data reveals and “tell” the reader what it means in order to create a convincing analysis.
  • Presentation order of themes: Consider how to best structure the presentation of the themes in the report. This may involve presenting the themes in order of importance, chronologically, or in a way that tells a coherent story.
  • Subheadings: Use subheadings to clearly delineate each theme and its sub-themes, making the report easy to navigate and understand.

The analysis should go beyond a simple summary of participant’s words and instead interpret the meaning of the data.

Themes should connect logically and meaningfully and, if relevant, should build on previous themes to tell a coherent story about the data.

The report should include vivid, compelling data extracts that clearly illustrate the theme being discussed and should incorporate extracts from different data sources, rather than relying on a single source.

Although it is tempting to rely on one source when it eloquently expresses a particular aspect of the theme, using multiple sources strengthens the analysis by representing a wider range of perspectives within the data.

Researchers should strive to maintain a balance between the amount of narrative and the amount of data presented.

Potential Pitfalls to Avoid

  • Failing to analyze the data : Thematic analysis should involve more than simply presenting data extracts without an analytic narrative. The researcher must provide an interpretation and make sense of the data, telling the reader what it means and how it relates to the research questions.
  • Using data collection questions as themes : Themes should be identified across the entire dataset, not just based on the questions asked during data collection. Reporting data collection questions as themes indicates a lack of thorough analytic work to identify patterns and meanings in the data.
  • Conducting a weak or unconvincing analysis : Themes should be distinct, internally coherent, and consistent, capturing the majority of the data or providing a rich description of specific aspects. A weak analysis may have overlapping themes, fail to capture the data adequately, or lack sufficient examples to support the claims made.
  • Mismatch between data and analytic claims : The researcher’s interpretations and analytic points must be consistent with the data extracts presented. Claims that are not supported by the data, contradict the data, or fail to consider alternative readings or variations in the account are problematic.
  • Misalignment between theory, research questions, and analysis : The interpretations of the data should be consistent with the theoretical framework used. For example, an experiential framework would not typically make claims about the social construction of the topic. The form of thematic analysis used should also align with the research questions.
  • Neglecting to clarify assumptions, purpose, and process : A good thematic analysis should spell out its theoretical assumptions, clarify how it was undertaken, and for what purpose. Without this crucial information, the analysis is lacking context and transparency, making it difficult for readers to evaluate the research.

Reducing Bias

When researchers are both reflexive and transparent in their thematic analysis, it strengthens the trustworthiness and rigor of their findings.

The explicit acknowledgement of potential biases and the detailed documentation of the analytical process provide a stronger foundation for the interpretation of the data, making it more likely that the findings reflect the perspectives of the participants rather than the biases of the researcher.

Reflexivity

Reflexivity involves critically examining one’s own assumptions and biases, is crucial in qualitative research to ensure the trustworthiness of findings.

It requires acknowledging that researcher subjectivity is inherent in the research process and can influence how data is collected, analyzed, and interpreted.

Identifying and Challenging Assumptions:

Reflexivity encourages researchers to explicitly acknowledge their preconceived notions, theoretical leanings, and potential biases.

By actively reflecting on how these factors might influence their interpretation of the data, researchers can take steps to mitigate their impact.

This might involve seeking alternative explanations, considering contradictory evidence, or discussing their interpretations with others to gain different perspectives.

Transparency

Transparency refers to clearly documenting the research process, including coding decisions, theme development, and the rationale behind behind theme development.

This openness allows others to understand how the analysis was conducted and to assess the credibility of the findings

This transparency helps ensure the trustworthiness and rigor of the findings, allowing others to understand and potentially replicate the analysis.

Documenting Decision-Making:

Transparency requires researchers to provide a clear and detailed account of their analytical choices throughout the research process.

This includes documenting the rationale behind coding decisions, the process of theme development, and any changes made to the analytical approach during the study.

By making these decisions transparent, researchers allow others to scrutinize their work and assess the potential for bias.

Practical Strategies for Reflexivity and Transparency in Thematic Analysis:

  • Maintaining a reflexive journal:  Researchers can keep a journal throughout the research process to document their thoughts, assumptions, and potential biases. This journal serves as a record of the researcher’s evolving understanding of the data and can help identify potential blind spots in their analysis.
  • Engaging in team-based analysis:  Collaborative analysis, involving multiple researchers, can enhance reflexivity by providing different perspectives and interpretations of the data. Discussing coding decisions and theme development as a team allows researchers to challenge each other’s assumptions and ensure a more comprehensive analysis.
  • Clearly articulating the analytical process:  In reporting the findings of thematic analysis, researchers should provide a detailed account of their methods, including the rationale behind coding decisions, the process of theme development, and any challenges encountered during analysis. This transparency allows readers to understand the steps taken to ensure the rigor and trustworthiness of the analysis.
  • Flexibility:  Thematic analysis is a flexible method, making it adaptable to different research questions and theoretical frameworks. It can be employed with various epistemological approaches, including realist, constructionist, and contextualist perspectives. For example, researchers can focus on analyzing meaning across the entire data set or examine a particular aspect in depth.
  • Accessibility:  Thematic analysis is an accessible method, especially for novice qualitative researchers, as it doesn’t demand extensive theoretical or technical knowledge compared to methods like Discourse Analysis (DA) or Conversation Analysis (CA). It is considered a foundational qualitative analysis method.
  • Rich Description:  Thematic analysis facilitates a rich and detailed description of data9. It can provide a thorough understanding of the predominant themes in a data set, offering valuable insights, particularly in under-researched areas.
  • Theoretical Freedom:  Thematic analysis is not restricted to any pre-existing theoretical framework, allowing for diverse applications. This distinguishes it from methods like Grounded Theory or Interpretative Phenomenological Analysis (IPA), which are more closely tied to specific theoretical approaches

Disadvantages

  • Subjectivity and Interpretation:  The flexibility of thematic analysis, while an advantage, can also be a disadvantage. The method’s openness can lead to a wide range of interpretations of the same data set, making it difficult to determine which aspects to emphasize. This potential subjectivity might raise concerns about the analysis’s reliability and consistency.
  • Limited Interpretive Power:  Unlike methods like narrative analysis or biographical approaches, thematic analysis may not capture the nuances of individual experiences or contradictions within a single account. The focus on patterns across interviews could result in overlooking unique individual perspectives.
  • Oversimplification:  Thematic analysis might oversimplify complex phenomena by focusing on common themes, potentially missing subtle but important variations within the data. If not carefully executed, the analysis may present a homogenous view of the data that doesn’t reflect the full range of perspectives.
  • Lack of Established Theoretical Frameworks:  Thematic analysis does not inherently rely on pre-existing theoretical frameworks. While this allows for inductive exploration, it can also limit the interpretive power of the analysis if not anchored within a relevant theoretical context. The absence of a theoretical foundation might make it challenging to draw meaningful and generalizable conclusions.
  • Difficulty in Higher-Phase Analysis:  While thematic analysis is relatively easy to initiate, the flexibility in its application can make it difficult to establish specific guidelines for higher-phase analysis1. Researchers may find it challenging to navigate the later stages of analysis and develop a coherent and insightful interpretation of the identified themes.
  • Potential for Researcher Bias:  As with any qualitative research method, thematic analysis is susceptible to researcher bias. Researchers’ preconceived notions and assumptions can influence how they code and interpret data, potentially leading to skewed results.

Further Information

  • Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology . Qualitative Research in Psychology, 3 (2), 77–101.
  • Braun, V., & Clarke, V. (2013). Successful qualitative research: A practical guide for beginners. Sage.
  • Braun, V., & Clarke, V. (2019). Reflecting on reflexive thematic analysi s. Qualitative Research in Sport, Exercise and Health, 11 (4), 589–597.
  • Braun, V., & Clarke, V. (2021). One size fits all? What counts as quality practice in (reflexive) thematic analysis? Qualitative Research in Psychology, 18 (3), 328–352.
  • Braun, V., & Clarke, V. (2021). To saturate or not to saturate? Questioning data saturation as a useful concept for thematic analysis and sample-size rationales . Qualitative Research in Sport, Exercise and Health, 13 (2), 201–216.
  • Braun, V., & Clarke, V. (2022). Conceptual and design thinking for thematic analysis .  Qualitative psychology ,  9 (1), 3.
  • Braun, V., & Clarke, V. (2022b). Thematic analysis: A practical guide . Sage.
  • Braun, V., Clarke, V., & Hayfield, N. (2022). ‘A starting point for your journey, not a map’: Nikki Hayfield in conversation with Virginia Braun and Victoria Clarke about thematic analysis.  Qualitative research in psychology ,  19 (2), 424-445.
  • Finlay, L., & Gough, B. (Eds.). (2003). Reflexivity: A practical guide for researchers in health and social sciences. Blackwell Science.
  • Gibbs, G. R. (2013). Using software in qualitative analysis. In U. Flick (ed.) The Sage handbook of qualitative data analysis (pp. 277–294). London: Sage.
  • McLeod, S. (2024, May 17). Qualitative Data Coding . Simply Psychology. https://www.simplypsychology.org/qualitative-data-coding.html
  • Terry, G., & Hayfield, N. (2021). Essentials of thematic analysis . American Psychological Association.

Example TA Studies

  • Braun, V., Terry, G., Gavey, N., & Fenaughty, J. (2009). ‘ Risk’and sexual coercion among gay and bisexual men in Aotearoa/New Zealand–key informant accounts .  Culture, Health & Sexuality ,  11 (2), 111-124.
  • Clarke, V., & Kitzinger, C. (2004). Lesbian and gay parents on talk shows: resistance or collusion in heterosexism? .  Qualitative Research in Psychology ,  1 (3), 195-217.

Print Friendly, PDF & Email

example of thematic analysis in research

What (Exactly) Is Thematic Analysis?

Plain-Language Explanation & Definition (With Examples)

By: Jenna Crosley (PhD). Expert Reviewed By: Dr Eunice Rautenbach | April 2021

Thematic analysis is one of the most popular qualitative analysis techniques we see students opting for at Grad Coach – and for good reason. Despite its relative simplicity, thematic analysis can be a very powerful analysis technique when used correctly. In this post, we’ll unpack thematic analysis using plain language (and loads of examples) so that you can conquer your analysis with confidence.

Thematic Analysis 101

  • Basic terminology relating to thematic analysis
  • What is thematic analysis
  • When to use thematic analysis
  • The main approaches to thematic analysis
  • The three types of thematic analysis
  • How to “do” thematic analysis (the process)
  • Tips and suggestions

First, the lingo…

Before we begin, let’s first lay down some terminology. When undertaking thematic analysis, you’ll make use of codes . A code is a label assigned to a piece of text, and the aim of using a code is to identify and summarise important concepts within a set of data, such as an interview transcript.

For example, if you had the sentence, “My rabbit ate my shoes”, you could use the codes “rabbit” or “shoes” to highlight these two concepts. The process of assigning codes is called qualitative coding . If this is a new concept to you, be sure to check out our detailed post about qualitative coding .

Codes are vital as they lay a foundation for themes . But what exactly is a theme? Simply put, a theme is a pattern that can be identified within a data set. In other words, it’s a topic or concept that pops up repeatedly throughout your data. Grouping your codes into themes serves as a way of summarising sections of your data in a useful way that helps you answer your research question(s) and achieve your research aim(s).

Alright – with that out of the way, let’s jump into the wonderful world of thematic analysis…

Thematic analysis 101

What is thematic analysis?

Thematic analysis is the study of patterns to uncover meaning . In other words, it’s about analysing the patterns and themes within your data set to identify the underlying meaning. Importantly, this process is driven by your research aims and questions , so it’s not necessary to identify every possible theme in the data, but rather to focus on the key aspects that relate to your research questions .

Although the research questions are a driving force in thematic analysis (and pretty much all analysis methods), it’s important to remember that these questions are not necessarily fixed . As thematic analysis tends to be a bit of an exploratory process, research questions can evolve as you progress with your coding and theme identification.

Thematic analysis is about analysing the themes within your data set to identify meaning, based on your research questions.

When should you use thematic analysis?

There are many potential qualitative analysis methods that you can use to analyse a dataset. For example, content analysis , discourse analysis , and narrative analysis are popular choices. So why use thematic analysis?

Thematic analysis is highly beneficial when working with large bodies of data ,  as it allows you to divide and categorise large amounts of data in a way that makes it easier to digest. Thematic analysis is particularly useful when looking for subjective information , such as a participant’s experiences, views, and opinions. For this reason, thematic analysis is often conducted on data derived from interviews , conversations, open-ended survey responses , and social media posts.

Your research questions can also give you an idea of whether you should use thematic analysis or not. For example, if your research questions were to be along the lines of:

  • How do dog walkers perceive rules and regulations on dog-friendly beaches?
  • What are students’ experiences with the shift to online learning?
  • What opinions do health professionals hold about the Hippocratic code?
  • How is gender constructed in a high school classroom setting?

These examples are all research questions centering on the subjective experiences of participants and aim to assess experiences, views, and opinions. Therefore, thematic analysis presents a possible approach.

In short, thematic analysis is a good choice when you are wanting to categorise large bodies of data (although the data doesn’t necessarily have to be large), particularly when you are interested in subjective experiences .

Thematic analysis allows you to divide and categorise large amounts of data in a way that makes it far easier to digest.

What are the main approaches?

Broadly speaking, there are two overarching approaches to thematic analysis: inductive and deductive . The approach you take will depend on what is most suitable in light of your research aims and questions. Let’s have a look at the options.

The inductive approach

The inductive approach involves deriving meaning and creating themes from data without any preconceptions . In other words, you’d dive into your analysis without any idea of what codes and themes will emerge, and thus allow these to emerge from the data.

For example, if you’re investigating typical lunchtime conversational topics in a university faculty, you’d enter the research without any preconceived codes, themes or expected outcomes. Of course, you may have thoughts about what might be discussed (e.g., academic matters because it’s an academic setting), but the objective is to not let these preconceptions inform your analysis.

The inductive approach is best suited to research aims and questions that are exploratory in nature , and cases where there is little existing research on the topic of interest.

The deductive approach

In contrast to the inductive approach, a deductive approach involves jumping into your analysis with a pre-determined set of codes . Usually, this approach is informed by prior knowledge and/or existing theory or empirical research (which you’d cover in your literature review ).

For example, a researcher examining the impact of a specific psychological intervention on mental health outcomes may draw on an existing theoretical framework that includes concepts such as coping strategies, social support, and self-efficacy, using these as a basis for a set of pre-determined codes.

The deductive approach is best suited to research aims and questions that are confirmatory in nature , and cases where there is a lot of existing research on the topic of interest.

Regardless of whether you take the inductive or deductive approach, you’ll also need to decide what level of content your analysis will focus on – specifically, the semantic level or the latent level.

A semantic-level focus ignores the underlying meaning of data , and identifies themes based only on what is explicitly or overtly stated or written – in other words, things are taken at face value.

In contrast, a latent-level focus concentrates on the underlying meanings and looks at the reasons for semantic content. Furthermore, in contrast to the semantic approach, a latent approach involves an element of interpretation , where data is not just taken at face value, but meanings are also theorised.

“But how do I know when to use what approach?”, I hear you ask.

Well, this all depends on the type of data you’re analysing and what you’re trying to achieve with your analysis. For example, if you’re aiming to analyse explicit opinions expressed in interviews and you know what you’re looking for ahead of time (based on a collection of prior studies), you may choose to take a deductive approach with a semantic-level focus.

On the other hand, if you’re looking to explore the underlying meaning expressed by participants in a focus group, and you don’t have any preconceptions about what to expect, you’ll likely opt for an inductive approach with a latent-level focus.

Simply put, the nature and focus of your research, especially your research aims , objectives and questions will  inform the approach you take to thematic analysis.

The four main approaches to thematic analysis are inductive, deductive, semantic and latent. The choice of approach depends on the type of data and what you're trying to achieve

What are the types of thematic analysis?

Now that you’ve got an understanding of the overarching approaches to thematic analysis, it’s time to have a look at the different types of thematic analysis you can conduct. Broadly speaking, there are three “types” of thematic analysis:

  • Reflexive thematic analysis
  • Codebook thematic analysis
  • Coding reliability thematic analysis

Let’s have a look at each of these:

Reflexive thematic analysis takes an inductive approach, letting the codes and themes emerge from that data. This type of thematic analysis is very flexible, as it allows researchers to change, remove, and add codes as they work through the data. As the name suggests, reflexive thematic analysis emphasizes the active engagement of the researcher in critically reflecting on their assumptions, biases, and interpretations, and how these may shape the analysis.

Reflexive thematic analysis typically involves iterative and reflexive cycles of coding, interpreting, and reflecting on data, with the aim of producing nuanced and contextually sensitive insights into the research topic, while at the same time recognising and addressing the subjective nature of the research process.

Codebook thematic analysis , on the other hand, lays on the opposite end of the spectrum. Taking a deductive approach, this type of thematic analysis makes use of structured codebooks containing clearly defined, predetermined codes. These codes are typically drawn from a combination of existing theoretical theories, empirical studies and prior knowledge of the situation.

Codebook thematic analysis aims to produce reliable and consistent findings. Therefore, it’s often used in studies where a clear and predefined coding framework is desired to ensure rigour and consistency in data analysis.

Coding reliability thematic analysis necessitates the work of multiple coders, and the design is specifically intended for research teams. With this type of analysis, codebooks are typically fixed and are rarely altered.

The benefit of this form of analysis is that it brings an element of intercoder reliability where coders need to agree upon the codes used, which means that the outcome is more rigorous as the element of subjectivity is reduced. In other words, multiple coders discuss which codes should be used and which shouldn’t, and this consensus reduces the bias of having one individual coder decide upon themes.

Quick Recap: Thematic analysis approaches and types

To recap, the two main approaches to thematic analysis are inductive , and deductive . Then we have the three types of thematic analysis: reflexive, codebook and coding reliability . Which type of thematic analysis you opt for will need to be informed by factors such as:

  • The approach you are taking. For example, if you opt for an inductive approach, you’ll likely utilise reflexive thematic analysis.
  • Whether you’re working alone or in a group . It’s likely that, if you’re doing research as part of your postgraduate studies, you’ll be working alone. This means that you’ll need to choose between reflexive and codebook thematic analysis.

Now that we’ve covered the “what” in terms of thematic analysis approaches and types, it’s time to look at the “how” of thematic analysis.

Need a helping hand?

example of thematic analysis in research

How to “do” thematic analysis

At this point, you’re ready to get going with your analysis, so let’s dive right into the thematic analysis process. Keep in mind that what we’ll cover here is a generic process, and the relevant steps will vary depending on the approach and type of thematic analysis you opt for.

Step 1: Get familiar with the data

The first step in your thematic analysis involves getting a feel for your data and seeing what general themes pop up. If you’re working with audio data, this is where you’ll do the transcription , converting audio to text.

At this stage, you’ll want to come up with preliminary thoughts about what you’ll code , what codes you’ll use for them, and what codes will accurately describe your content. It’s a good idea to revisit your research topic , and your aims and objectives at this stage. For example, if you’re looking at what people feel about different types of dogs, you can code according to when different breeds are mentioned (e.g., border collie, Labrador, corgi) and when certain feelings/emotions are brought up.

As a general tip, it’s a good idea to keep a reflexivity journal . This is where you’ll write down how you coded your data, why you coded your data in that particular way, and what the outcomes of this data coding are. Using a reflexive journal from the start will benefit you greatly in the final stages of your analysis because you can reflect on the coding process and assess whether you have coded in a manner that is reliable and whether your codes and themes support your findings.

As you can imagine, a reflexivity journal helps to increase reliability as it allows you to analyse your data systematically and consistently. If you choose to make use of a reflexivity journal, this is the stage where you’ll want to take notes about your initial codes and list them in your journal so that you’ll have an idea of what exactly is being reflected in your data. At a later stage in the analysis, this data can be more thoroughly coded, or the identified codes can be divided into more specific ones.

Keep a research journal for thematic analysis

Step 2: Search for patterns or themes in the codes

Step 2! You’re going strong. In this step, you’ll want to look out for patterns or themes in your codes. Moving from codes to themes is not necessarily a smooth or linear process. As you become more and more familiar with the data, you may find that you need to assign different codes or themes according to new elements you find. For example, if you were analysing a text talking about wildlife, you may come across the codes, “pigeon”, “canary” and “budgerigar” which can fall under the theme of birds.

As you work through the data, you may start to identify subthemes , which are subdivisions of themes that focus specifically on an aspect within the theme that is significant or relevant to your research question. For example, if your theme is a university, your subthemes could be faculties or departments at that university.

In this stage of the analysis, your reflexivity journal entries need to reflect how codes were interpreted and combined to form themes.

Step 3: Review themes

By now you’ll have a good idea of your codes, themes, and potentially subthemes. Now it’s time to review all the themes you’ve identified . In this step, you’ll want to check that everything you’ve categorised as a theme actually fits the data, whether the themes do indeed exist in the data, whether there are any themes missing , and whether you can move on to the next step knowing that you’ve coded all your themes accurately and comprehensively . If you find that your themes have become too broad and there is far too much information under one theme, it may be useful to split this into more themes so that you’re able to be more specific with your analysis.

In your reflexivity journal, you’ll want to write about how you understood the themes and how they are supported by evidence, as well as how the themes fit in with your codes. At this point, you’ll also want to revisit your research questions and make sure that the data and themes you’ve identified are directly relevant to these questions .

If you find that your themes have become too broad and there is too much information under one theme, you can split them up into more themes, so that you can be more specific with your analysis.

Step 4: Finalise Themes

By this point, your analysis will really start to take shape. In the previous step, you reviewed and refined your themes, and now it’s time to label and finalise them . It’s important to note here that, just because you’ve moved onto the next step, it doesn’t mean that you can’t go back and revise or rework your themes. In contrast to the previous step, finalising your themes means spelling out what exactly the themes consist of, and describe them in detail . If you struggle with this, you may want to return to your data to make sure that your data and coding do represent the themes, and if you need to divide your themes into more themes (i.e., return to step 3).

When you name your themes, make sure that you select labels that accurately encapsulate the properties of the theme . For example, a theme name such as “enthusiasm in professionals” leaves the question of “who are the professionals?”, so you’d want to be more specific and label the theme as something along the lines of “enthusiasm in healthcare professionals”.

It is very important at this stage that you make sure that your themes align with your research aims and questions . When you’re finalising your themes, you’re also nearing the end of your analysis and need to keep in mind that your final report (discussed in the next step) will need to fit in with the aims and objectives of your research.

In your reflexivity journal, you’ll want to write down a few sentences describing your themes and how you decided on these. Here, you’ll also want to mention how the theme will contribute to the outcomes of your research, and also what it means in relation to your research questions and focus of your research.

By the end of this stage, you’ll be done with your themes – meaning it’s time to write up your findings and produce a report.

It is very important at the theme finalisation stage to make sure that your themes align with your research questions.

Step 5: Produce your report

You’re nearly done! Now that you’ve analysed your data, it’s time to report on your findings. A typical thematic analysis report consists of:

  • An introduction
  • A methodology section
  • Your results and findings
  • A conclusion

When writing your report, make sure that you provide enough information for a reader to be able to evaluate the rigour of your analysis. In other words, the reader needs to know the exact process you followed when analysing your data and why. The questions of “what”, “how”, “why”, “who”, and “when” may be useful in this section.

So, what did you investigate? How did you investigate it? Why did you choose this particular method? Who does your research focus on, and who are your participants? When did you conduct your research, when did you collect your data, and when was the data produced? Your reflexivity journal will come in handy here as within it you’ve already labelled, described, and supported your themes.

If you’re undertaking a thematic analysis as part of a dissertation or thesis, this discussion will be split across your methodology, results and discussion chapters . For more information about those chapters, check out our detailed post about dissertation structure .

It’s absolutely vital that, when writing up your results, you back up every single one of your findings with quotations . The reader needs to be able to see that what you’re reporting actually exists within the results. Also make sure that, when reporting your findings, you tie them back to your research questions . You don’t want your reader to be looking through your findings and asking, “So what?”, so make sure that every finding you represent is relevant to your research topic and questions.

Quick Recap: How to “do” thematic analysis

Getting familiar with your data: Here you’ll read through your data and get a general overview of what you’re working with. At this stage, you may identify a few general codes and themes that you’ll make use of in the next step.

Search for patterns or themes in your codes : Here you’ll dive into your data and pick out the themes and codes relevant to your research question(s).

Review themes : In this step, you’ll revisit your codes and themes to make sure that they are all truly representative of the data, and that you can use them in your final report.

Finalise themes : Here’s where you “solidify” your analysis and make it report-ready by describing and defining your themes.

Produce your report : This is the final step of your thematic analysis process, where you put everything you’ve found together and report on your findings.

Tips & Suggestions

In the video below, we share 6 time-saving tips and tricks to help you approach your thematic analysis as effectively and efficiently as possible.

Wrapping Up

In this article, we’ve covered the basics of thematic analysis – what it is, when to use it, the different approaches and types of thematic analysis, and how to perform a thematic analysis.

If you have any questions about thematic analysis, drop a comment below and we’ll do our best to assist. If you’d like 1-on-1 support with your thematic analysis, be sure to check out our research coaching services here .

example of thematic analysis in research

Psst... there’s more!

This post was based on one of our popular Research Bootcamps . If you're working on a research project, you'll definitely want to check this out ...

23 Comments

Ollie

I really appreciate the help

Oliv

Hello Sir, how many levels of coding can be done in thematic analysis? We generate codes from the transcripts, then subthemes from the codes and themes from subthemes, isn’t it? Should these themes be again grouped together? how many themes can be derived?can you please share an example of coding through thematic analysis in a tabular format?

Abdullahi Maude

I’ve found the article very educative and useful

TOMMY BIN SEMBEH

Excellent. Very helpful and easy to understand.

SK

This article so far has been most helpful in understanding how to write an analysis chapter. Thank you.

Ruwini

My research topic is the challenges face by the school principal on the process of procurement . Thematic analysis is it sutable fir data analysis ?

M. Anwar

It is a great help. Thanks.

Pari

Best advice. Worth reading. Thank you.

Yvonne Worrell

Where can I find an example of a template analysis table ?

aishch

Finally I got the best article . I wish they also have every psychology topics.

Rosa Ophelia Velarde

Hello, Sir/Maam

I am actually finding difficulty in doing qualitative analysis of my data and how to triangulate this with quantitative data. I encountered your web by accident in the process of searching for a much simplified way of explaining about thematic analysis such as coding, thematic analysis, write up. When your query if I need help popped up, I was hesitant to answer. Because I think this is for fee and I cannot afford. So May I just ask permission to copy for me to read and guide me to study so I can apply it myself for my gathered qualitative data for my graduate study.

Thank you very much! this is very helpful to me in my Graduate research qualitative data analysis.

SAMSON ROTTICH

Thank you very much. I find your guidance here helpful. Kindly let help me understand how to write findings and discussions.

arshad ahmad

i am having troubles with the concept of framework analysis which i did not find here and i have been an assignment on framework analysis

tayron gee

I was discouraged and felt insecure because after more than a year of writing my thesis, my work seemed lost its direction after being checked. But, I am truly grateful because through the comments, corrections, and guidance of the wisdom of my director, I can already see the bright light because of thematic analysis. I am working with Biblical Texts. And thematic analysis will be my method. Thank you.

OLADIPO TOSIN KABIR

lovely and helpful. thanks

Imdad Hussain

very informative information.

Ricky Fordan

thank you very much!, this is very helpful in my report, God bless……..

Akosua Andrews

Thank you for the insight. I am really relieved as you have provided a super guide for my thesis.

Christelle M.

Thanks a lot, really enlightening

fariya shahzadi

excellent! very helpful thank a lot for your great efforts

Daniel Pelu

I am currently conducting a research on the Economic challenges to migrant integration. Using interviews to understand the challenges by interviewing professionals working with migrants. Wouks appreciate help with how to do this using the thematic approach. Thanks

KM Majola

The article cleared so many issues that I was not certain of. Very informative. Thank you.

Submit a Comment Cancel reply

Your email address will not be published. Required fields are marked *

Save my name, email, and website in this browser for the next time I comment.

  • Print Friendly

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, automatically generate references for free.

  • Knowledge Base
  • Methodology
  • How to Do Thematic Analysis | Guide & Examples

How to Do Thematic Analysis | Guide & Examples

Published on 5 May 2022 by Jack Caulfield . Revised on 7 June 2024.

Thematic analysis is a method of analysing qualitative data . It is usually applied to a set of texts, such as an interview or transcripts . The researcher closely examines the data to identify common themes, topics, ideas and patterns of meaning that come up repeatedly.

There are various approaches to conducting thematic analysis, but the most common form follows a six-step process:

  • Familiarisation
  • Generating themes
  • Reviewing themes
  • Defining and naming themes

This process was originally developed for psychology research by Virginia Braun and Victoria Clarke . However, thematic analysis is a flexible method that can be adapted to many different kinds of research.

Table of contents

When to use thematic analysis, different approaches to thematic analysis, step 1: familiarisation, step 2: coding, step 3: generating themes, step 4: reviewing themes, step 5: defining and naming themes, step 6: writing up.

Thematic analysis is a good approach to research where you’re trying to find out something about people’s views, opinions, knowledge, experiences, or values from a set of qualitative data – for example, interview transcripts , social media profiles, or survey responses .

Some types of research questions you might use thematic analysis to answer:

  • How do patients perceive doctors in a hospital setting?
  • What are young women’s experiences on dating sites?
  • What are non-experts’ ideas and opinions about climate change?
  • How is gender constructed in secondary school history teaching?

To answer any of these questions, you would collect data from a group of relevant participants and then analyse it. Thematic analysis allows you a lot of flexibility in interpreting the data, and allows you to approach large datasets more easily by sorting them into broad themes.

However, it also involves the risk of missing nuances in the data. Thematic analysis is often quite subjective and relies on the researcher’s judgement, so you have to reflect carefully on your own choices and interpretations.

Pay close attention to the data to ensure that you’re not picking up on things that are not there – or obscuring things that are.

Prevent plagiarism, run a free check.

Once you’ve decided to use thematic analysis, there are different approaches to consider.

There’s the distinction between inductive and deductive approaches:

  • An inductive approach involves allowing the data to determine your themes.
  • A deductive approach involves coming to the data with some preconceived themes you expect to find reflected there, based on theory or existing knowledge.

There’s also the distinction between a semantic and a latent approach:

  • A semantic approach involves analysing the explicit content of the data.
  • A latent approach involves reading into the subtext and assumptions underlying the data.

After you’ve decided thematic analysis is the right method for analysing your data, and you’ve thought about the approach you’re going to take, you can follow the six steps developed by Braun and Clarke .

The first step is to get to know our data. It’s important to get a thorough overview of all the data we collected before we start analysing individual items.

This might involve transcribing audio , reading through the text and taking initial notes, and generally looking through the data to get familiar with it.

Next up, we need to code the data. Coding means highlighting sections of our text – usually phrases or sentences – and coming up with shorthand labels or ‘codes’ to describe their content.

Let’s take a short example text. Say we’re researching perceptions of climate change among conservative voters aged 50 and up, and we have collected data through a series of interviews. An extract from one interview looks like this:

Coding qualitative data
Interview extract Codes
Personally, I’m not sure. I think the climate is changing, sure, but I don’t know why or how. People say you should trust the experts, but who’s to say they don’t have their own reasons for pushing this narrative? I’m not saying they’re wrong, I’m just saying there’s reasons not to 100% trust them. The facts keep changing – it used to be called global warming.

In this extract, we’ve highlighted various phrases in different colours corresponding to different codes. Each code describes the idea or feeling expressed in that part of the text.

At this stage, we want to be thorough: we go through the transcript of every interview and highlight everything that jumps out as relevant or potentially interesting. As well as highlighting all the phrases and sentences that match these codes, we can keep adding new codes as we go through the text.

After we’ve been through the text, we collate together all the data into groups identified by code. These codes allow us to gain a condensed overview of the main points and common meanings that recur throughout the data.

Next, we look over the codes we’ve created, identify patterns among them, and start coming up with themes.

Themes are generally broader than codes. Most of the time, you’ll combine several codes into a single theme. In our example, we might start combining codes into themes like this:

Turning codes into themes
Codes Theme
Uncertainty
Distrust of experts
Misinformation

At this stage, we might decide that some of our codes are too vague or not relevant enough (for example, because they don’t appear very often in the data), so they can be discarded.

Other codes might become themes in their own right. In our example, we decided that the code ‘uncertainty’ made sense as a theme, with some other codes incorporated into it.

Again, what we decide will vary according to what we’re trying to find out. We want to create potential themes that tell us something helpful about the data for our purposes.

Now we have to make sure that our themes are useful and accurate representations of the data. Here, we return to the dataset and compare our themes against it. Are we missing anything? Are these themes really present in the data? What can we change to make our themes work better?

If we encounter problems with our themes, we might split them up, combine them, discard them, or create new ones: whatever makes them more useful and accurate.

For example, we might decide upon looking through the data that ‘changing terminology’ fits better under the ‘uncertainty’ theme than under ‘distrust of experts’, since the data labelled with this code involves confusion, not necessarily distrust.

Now that you have a final list of themes, it’s time to name and define each of them.

Defining themes involves formulating exactly what we mean by each theme and figuring out how it helps us understand the data.

Naming themes involves coming up with a succinct and easily understandable name for each theme.

For example, we might look at ‘distrust of experts’ and determine exactly who we mean by ‘experts’ in this theme. We might decide that a better name for the theme is ‘distrust of authority’ or ‘conspiracy thinking’.

Finally, we’ll write up our analysis of the data. Like all academic texts, writing up a thematic analysis requires an introduction to establish our research question, aims, and approach.

We should also include a methodology section, describing how we collected the data (e.g., through semi-structured interviews or open-ended survey questions ) and explaining how we conducted the thematic analysis itself.

The results or findings section usually addresses each theme in turn. We describe how often the themes come up and what they mean, including examples from the data as evidence. Finally, our conclusion explains the main takeaways and shows how the analysis has answered our research question.

In our example, we might argue that conspiracy thinking about climate change is widespread among older conservative voters, point out the uncertainty with which many voters view the issue, and discuss the role of misinformation in respondents’ perceptions.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the ‘Cite this Scribbr article’ button to automatically add the citation to our free Reference Generator.

Caulfield, J. (2024, June 07). How to Do Thematic Analysis | Guide & Examples. Scribbr. Retrieved 26 August 2024, from https://www.scribbr.co.uk/research-methods/thematic-analysis-explained/

Is this article helpful?

Jack Caulfield

Jack Caulfield

Other students also liked, qualitative vs quantitative research | examples & methods, inductive reasoning | types, examples, explanation, what is deductive reasoning | explanation & examples.

Reference management. Clean and simple.

How to do a thematic analysis

example of thematic analysis in research

What is a thematic analysis?

When is thematic analysis used, braun and clarke’s reflexive thematic analysis, the six steps of thematic analysis, 1. familiarizing, 2. generating initial codes, 3. generating themes, 4. reviewing themes, 5. defining and naming themes, 6. creating the report, the advantages and disadvantages of thematic analysis, disadvantages, frequently asked questions about thematic analysis, related articles.

Thematic analysis is a broad term that describes an approach to analyzing qualitative data . This approach can encompass diverse methods and is usually applied to a collection of texts, such as survey responses and transcriptions of interviews or focus group discussions. Learn more about different research methods.

A researcher performing a thematic analysis will study a set of data to pinpoint repeating patterns, or themes, in the topics and ideas that are expressed in the texts.

In analyzing qualitative data, thematic analysis focuses on concepts, opinions, and experiences, as opposed to pure statistics. This requires an approach to data that is complex and exploratory and can be anchored by different philosophical and conceptual foundations.

A six-step system was developed to help establish clarity and rigor around this process, and it is this system that is most commonly used when conducting a thematic analysis. The six steps are:

  • Familiarization
  • Generating codes
  • Generating themes
  • Reviewing themes
  • Defining and naming themes
  • Creating the report

It is important to note that even though the six steps are listed in sequence, thematic analysis is not necessarily a linear process that advances forward in a one-way, predictable fashion from step one through step six. Rather, it involves a more fluid shifting back and forth between the phases, adjusting to accommodate new insights when they arise.

And arriving at insight is a key goal of this approach. A good thematic analysis doesn’t just seek to present or summarize data. It interprets and makes a statement about it; it extracts meaning from the data.

Since thematic analysis is used to study qualitative data, it works best in cases where you’re looking to gather information about people’s views, values, opinions, experiences, and knowledge.

Some examples of research questions that thematic analysis can be used to answer are:

  • What are senior citizens’ experiences of long-term care homes?
  • How do women view social media sites as a tool for professional networking?
  • How do non-religious people perceive the role of the church in a society?
  • What are financial analysts’ ideas and opinions about cryptocurrency?

To begin answering these questions, you would need to gather data from participants who can provide relevant responses. Once you have the data, you would then analyze and interpret it.

Because you’re dealing with personal views and opinions, there is a lot of room for flexibility in terms of how you interpret the data. In this way, thematic analysis is systematic but not purely scientific.

A landmark 2006 paper by Victoria Braun and Victoria Clarke (“ Using thematic analysis in psychology ”) established parameters around thematic analysis—what it is and how to go about it in a systematic way—which had until then been widely used but poorly defined.

Since then, their work has been updated, with the name being revised, notably, to “reflexive thematic analysis.”

One common misconception that Braun and Clarke have taken pains to clarify about their work is that they do not believe that themes “emerge” from the data. To think otherwise is problematic since this suggests that meaning is somehow inherent to the data and that a researcher is merely an objective medium who identifies that meaning.

Conversely, Braun and Clarke view analysis as an interactive process in which the researcher is an active participant in constructing meaning, rather than simply identifying it.

The six stages they presented in their paper are still the benchmark for conducting a thematic analysis. They are presented below.

This step is where you take a broad, high-level view of your data, looking at it as a whole and taking note of your first impressions.

This typically involves reading through written survey responses and other texts, transcribing audio, and recording any patterns that you notice. It’s important to read through and revisit the data in its entirety several times during this stage so that you develop a thorough grasp of all your data.

After familiarizing yourself with your data, the next step is coding notable features of the data in a methodical way. This often means highlighting portions of the text and applying labels, aka codes, to them that describe the nature of their content.

In our example scenario, we’re researching the experiences of women over the age of 50 on professional networking social media sites. Interviews were conducted to gather data, with the following excerpt from one interview.

Interview snippetCodes

It’s hard to get a handle on it. It’s so different from how things used to be done, when networking was about handshakes and business cards.

Confusion

Comparison with old networking methods

It makes me feel like a dinosaur.

Sense of being left behind

Plus, I've been burned a few times. I'll spend time making what I think are professional connections with male peers, only for the conversation to unexpectedly turn romantic on me. It seems like a lot of men use these sites as a way to meet women, not to develop their careers. It's stressful, to be honest.

Discomfort and unease

Unexpected experience with other users

In the example interview snippet, portions have been highlighted and coded. The codes describe the idea or perception described in the text.

It pays to be exhaustive and thorough at this stage. Good practice involves scrutinizing the data several times, since new information and insight may become apparent upon further review that didn’t jump out at first glance. Multiple rounds of analysis also allow for the generation of more new codes.

Once the text is thoroughly reviewed, it’s time to collate the data into groups according to their code.

Now that we’ve created our codes, we can examine them, identify patterns within them, and begin generating themes.

Keep in mind that themes are more encompassing than codes. In general, you’ll be bundling multiple codes into a single theme.

To draw on the example we used above about women and networking through social media, codes could be combined into themes in the following way:

CodesTheme

Confusion, Discomfort and unease, Unexpected experience with other users

Negative experience

Comparison with old networking methods, Sense of being left behind

Perceived lack of skills

You’ll also be curating your codes and may elect to discard some on the basis that they are too broad or not directly relevant. You may also choose to redefine some of your codes as themes and integrate other codes into them. It all depends on the purpose and goal of your research.

This is the stage where we check that the themes we’ve generated accurately and relevantly represent the data they are based on. Once again, it’s beneficial to take a thorough, back-and-forth approach that includes review, assessment, comparison, and inquiry. The following questions can support the review:

  • Has anything been overlooked?
  • Are the themes definitively supported by the data?
  • Is there any room for improvement?

With your final list of themes in hand, the next step is to name and define them.

In defining them, we want to nail down the meaning of each theme and, importantly, how it allows us to make sense of the data.

Once you have your themes defined, you’ll need to apply a concise and straightforward name to each one.

In our example, our “perceived lack of skills” may be adjusted to reflect that the texts expressed uncertainty about skills rather than the definitive absence of them. In this case, a more apt name for the theme might be “questions about competence.”

To finish the process, we put our findings down in writing. As with all scholarly writing, a thematic analysis should open with an introduction section that explains the research question and approach.

This is followed by a statement about the methodology that includes how data was collected and how the thematic analysis was performed.

Each theme is addressed in detail in the results section, with attention paid to the frequency and presence of the themes in the data, as well as what they mean, and with examples from the data included as supporting evidence.

The conclusion section describes how the analysis answers the research question and summarizes the key points.

In our example, the conclusion may assert that it is common for women over the age of 50 to have negative experiences on professional networking sites, and that these are often tied to interactions with other users and a sense that using these sites requires specialized skills.

Thematic analysis is useful for analyzing large data sets, and it allows a lot of flexibility in terms of designing theoretical and research frameworks. Moreover, it supports the generation and interpretation of themes that are backed by data.

There are times when thematic analysis is not the best approach to take because it can be highly subjective, and, in seeking to identify broad patterns, it can overlook nuance in the data.

What’s more, researchers must be judicious about reflecting on how their own position and perspective bears on their interpretations of the data and if they are imposing meaning that is not there or failing to pick up on meaning that is.

Thematic analysis offers a flexible and recursive way to approach qualitative data that has the potential to yield valuable insights about people’s opinions, views, and lived experience. It must be applied, however, in a conscientious fashion so as not to allow subjectivity to taint or obscure the results.

The purpose of thematic analysis is to find repeating patterns, or themes, in qualitative data. Thematic analysis can encompass diverse methods and is usually applied to a collection of texts, such as survey responses and transcriptions of interviews or focus group discussions. In analyzing qualitative data, thematic analysis focuses on concepts, opinions, and experiences, as opposed to pure statistics.

A big advantage of thematic analysis is that it allows a lot of flexibility in terms of designing theoretical and research frameworks. It also supports the generation and interpretation of themes that are backed by data.

A disadvantage of thematic analysis is that it can be highly subjective and can overlook nuance in the data. Also, researchers must be aware of how their own position and perspective influences their interpretations of the data and if they are imposing meaning that is not there or failing to pick up on meaning that is.

How many themes make sense in your thematic analysis of course depends on your topic and the material you are working with. In general, it makes sense to have no more than 6-10 broader themes, instead of having many really detailed ones. You can then identify further nuances and differences under each theme when you are diving deeper into the topic.

Since thematic analysis is used to study qualitative data, it works best in cases where you’re looking to gather information about people’s views, values, opinions, experiences, and knowledge. Therefore, it makes sense to use thematic analysis for interviews.

After familiarizing yourself with your data, the first step of a thematic analysis is coding notable features of the data in a methodical way. This often means highlighting portions of the text and applying labels, aka codes, to them that describe the nature of their content.

example of thematic analysis in research

How to Do Thematic Analysis_ 6 Steps & Examples

How to Do Thematic Analysis: 6 Steps & Examples

Unlock qualitative insights with our step-by-step guide on thematic analysis. Identify patterns, and generate meaningful insights in six simple steps.

Thematic analysis is a game-changer for qualitative researchers. It's the key to unlocking the hidden patterns and meanings buried deep within your data.

In this step-by-step guide, you'll discover how to master thematic analysis and transform your raw data into powerful insights. From familiarizing yourself with the data to generating codes and themes, you'll learn the essential techniques to conduct a rigorous and systematic analysis.

Whether you're a seasoned researcher or just starting out, this guide will demystify the process and provide you with a clear roadmap to success. So get ready to dive into the world of thematic analysis!

Table of contents

What is thematic analysis

6 Steps for doing thematic analysis

Thematic Analysis in Action: A Real-World Example

Method Pros and Cons

Applications in Qualitative Research

What is thematic analysis.

Thematic analysis is a qualitative research method that focuses on identifying, analyzing, and reporting patterns or themes within a dataset. Thematic analysis involves reading through a data set, identifying patterns in meaning, and deriving themes, providing a systematic and flexible way to interpret various aspects of the research topic.

The primary purpose of thematic analysis is to uncover and make sense of the collective or shared meanings and experiences within a dataset. By identifying common threads that extend across the data, researchers can gain a deeper understanding of the phenomenon under study and draw meaningful conclusions.

Key Characteristics

One of the key characteristics of thematic analysis is its flexibility. The approach is adaptable to a wide range of research questions and data types. Researchers can use thematic analysis inductively, allowing themes to emerge from the data itself, or, deductively, using existing theories or frameworks to guide the analysis process.

Another important aspect of thematic analysis is its focus on identifying and describing both implicit and explicit ideas within the data. Themes are not always directly observable but can be uncovered through a careful and systematic analysis of the dataset. This process involves looking beyond the surface-level content and examining the underlying meanings, assumptions, and ideas that shape participants' responses.

Inductive vs. Deductive Approaches

When conducting thematic analysis, researchers can choose between inductive (data-driven) or deductive (theory-driven) analysis approach. Inductive data analysis involves allowing themes to emerge from the data without any preconceived notions or theoretical frameworks guiding the analysis. This approach is particularly useful when exploring a new or under-researched topic, as it allows for the discovery of unexpected insights and patterns.

On the other hand, the deductive approach involves using existing theories or frameworks to guide the analysis process. In this case, researchers start with a set of pre-determined themes or categories and look for evidence within the data that supports or refutes these ideas. This approach is useful when testing or extending existing theories or when comparing findings across different studies or populations.

thematic analysis steps

Thematic Analysis Simplified: A 6 Step-by-Step Process for Qualitative Data Analysis

This step-by-step guide breaks down the process into six manageable stages.

By following these steps, you can effectively analyze and interpret qualitative data to gain valuable insights .

Step 1: Familiarize Yourself with the Data

The first step in thematic analysis is to immerse yourself in the data. Read and re-read the transcripts, field notes, or other qualitative data sources to gain a deep understanding of the content. As you read, take notes on initial ideas and observations that come to mind. This process helps you become familiar with the depth and breadth of the data.

Pay attention to patterns, recurring ideas, and potential themes that emerge during this initial review. It's important to approach the data with an open mind, allowing the content to guide your understanding rather than imposing preconceived notions or expectations.

Tips for Familiarizing Yourself with the Data

Set aside dedicated time to read through the data without distractions.

Use colors or and notes to mark interesting or significant passages.

Create a summary or overview of each data source to help you remember key points.

Thematic analysis code frames

Step 2: Generate Initial Codes

Once you've familiarized yourself with the data, the next step is to generate initial codes. Coding involves systematically labeling and organizing the data into meaningful groups. Go through the entire dataset and assign codes to interesting features or segments that are relevant to your research question.

Codes can be descriptive, interpretive, or pattern-based. Descriptive codes summarize the content, interpretive codes reflect the researcher's understanding, and pattern codes identify emerging themes or explanations. As you code, collate the data relevant to each code.

Tips for Generating Initial Codes

Use a qualitative data analysis software or a spreadsheet to organize your codes.

Be open to creating new codes as you progress through the data.

Regularly review and refine your codes to ensure consistency and relevance.

Thematic analysis steps

Step 3: Search for Themes

After coding the data, the next step is to search for themes. Themes are broader patterns or categories that capture significant aspects of the data in relation to the research question. Review your codes and consider how they can be grouped or combined to form overarching themes.

Collate all the data relevant to each potential theme. This may involve creating thematic maps or diagrams to visualize the relationships between codes and themes. Consider the different levels of themes, such as main themes and sub-themes , and how they connect to one another.

Tips for Searching for Themes

Look for recurring ideas, concepts, or patterns across the coded data.

Consider the relationships and connections between different codes.

Use visual aids like mind maps or sticky notes to organize and explore potential themes.

Step 4: Review Themes

Once you've identified potential themes, it's crucial to review and refine them. Check if the themes work in relation to the coded extracts and the entire dataset. This involves a two-level review process.

First, read through the collated extracts for each theme to ensure they form a coherent pattern. If some extracts don't fit, consider reworking the theme, creating a new theme, or discarding the extracts. Second, re-read the entire dataset to assess whether the themes accurately represent the data and capture the most important and relevant aspects.

Tips for Reviewing Themes

Ensure each theme is distinct and coherent.

Look for any data that contradicts or challenges your themes.

Create a thematic map to visually represent the relationships between themes.

how to do thematic analysis

Step 5: Define and Name Themes

After refining your themes, the next step is to define and name them. Conduct ongoing analysis to identify the essence and scope of each theme. Develop a clear and concise name for each theme that captures its central concept and significance.

Write a detailed analysis for each theme, explaining its meaning, relevance, and how it relates to the research question. Consider the story that each theme tells and how it contributes to the overall understanding of the data.

Tips for Defining and Naming Themes

Choose names that are concise, informative, and engaging.

Ensure the theme names and definitions are easily understandable to others.

Use quotes or examples from the data to illustrate and support each theme.

Step 6: Write Up

The final step in thematic analysis is to write up your findings in a clear and structured report. Your report should include an introduction that outlines the research question and methodology, followed by a detailed presentation of your themes and their significance.

Use examples and quotes from the data to support and illustrate each theme. Discuss how the themes relate to one another and to the overall research question. Consider the implications of your findings and how they contribute to existing knowledge or practice.

Tips for Writing Up

Use a clear and logical structure to guide the reader through your analysis.

Provide sufficient evidence and examples to support your themes.

Discuss the limitations of your study and suggest areas for future research.

how to do thematic analysis

Let's consider a real-world example to illustrate thematic analysis in action. Suppose an online retailer was looking to conduct semi-structured interviews with 20 customers who recently purchased products in their new footwear line. The researcher will likely want to understand the customers' experiences with the product, including its performance, design, and overall impact on their quality of life.

Step 1: Familiarizing Yourself with the Data

The first step in thematic analysis is to become familiar with the data. In this case, the researcher would transcribe the audio recordings of the interviews and read through the transcripts multiple times to get a sense of the overall content.

Immersing Yourself in the Data

During this familiarization process, the researcher should take notes on initial impressions, ideas, and potential patterns. This step is crucial for gaining a deep understanding of the data and laying the foundation for the subsequent analysis.

Step 2: Generating Initial Codes

Once familiar with the data, the researcher begins the coding process . Coding involves identifying and labeling segments of the text that are relevant to the research question.

In this example, the researcher might create codes such as "side effects," "quality of life," "treatment effectiveness," and "patient satisfaction." These codes help organize the data and make it easier to identify patterns and themes.

Using Coding Software

To streamline the coding process, researchers can use qualitative data analysis software like Kapiche . The platform allows uers to highlight and label segments of text , organize codes into categories, and visualize the relationships between the data.

Step 3: Searching for Themes

After coding the data, the researcher looks for broader patterns of meaning, known as themes. Themes capture something important about the data in relation to the research question and represent a level of patterned response or meaning within the dataset.

In this example, the researcher might identify themes such as "patients experienced significant improvement in symptoms," "side effects were manageable and tolerable," and "treatment enhanced overall quality of life."

Step 4: Reviewing and Refining Themes

The researcher then reviews and refines the themes to ensure they accurately represent the data. This process involves checking that the themes work in relation to the coded extracts and the entire dataset.

Ensuring Theme Coherence

The researcher should also consider whether the themes are internally coherent, consistent, and distinctive. If necessary, themes may be combined, split, or discarded to better capture the essence of the data.

Step 5: Defining and Naming Themes

The researcher defines and names the themes, capturing the essence of what each theme is about. Clear and concise theme names help convey the key findings of the analysis to readers.

In this example, the researcher might define and name the themes as "Treatment Effectiveness," "Manageable Side Effects," and "Improved Quality of Life."

By following these steps, the researcher can use thematic analysis to make sense of the patient interview data and gain valuable insights into their experiences with the new treatment. This real-world example demonstrates the power of thematic analysis in identifying patterns of meaning and providing a rich, detailed account of qualitative data.

Step 6: Report write-up

Finally, the researcher can package the findings in a clear report to communicate to other key stakeholders. The report would ideally include a summary themes, methodology, as well as detailed examples that bring the overarching trends to life.

thematic analysis pros and cons

Thematic Analysis: Weighing the Pros and Cons

Having explored the steps in doing thematic analysis, it's important to consider the advantages and disadvantages of the research method.

Thematic analysis has gained popularity due to its flexibility and accessibility, but it also has some limitations that researchers should be aware of.

Advantages of Thematic Analysis

Thematic analysis offers several benefits, making it a popular choice for qualitative analysis. One of its main advantages is its flexibility in application across a range of theoretical approaches. This means that researchers can use thematic analysis in various fields, from psychology and sociology to healthcare and education.

Another advantage is that thematic analysis is accessible to researchers with little or no experience in qualitative research methods. The process is relatively straightforward and does not require advanced technical skills or specialized software. This makes it an attractive option for novice researchers or those working with limited resources.

Thematic analysis also produces results that are generally accessible to an educated general public. The themes generated from the data are often easy to understand and can be presented in a clear and concise manner. This is particularly useful when communicating research findings to stakeholders or policymakers who may not have a background in the specific field of study.

Disadvantages of Thematic Analysis

Despite its advantages, thematic analysis also has some limitations that researchers should consider. One of the main disadvantages is the lack of substantial rigour on thematic analysis methodology compared to other qualitative approaches. This can make it challenging for researchers to find guidance or examples of best practices when conducting thematic analysis.

The flexibility of thematic analysis can also be a double-edged sword. While it allows for adaptability across different research contexts, it can also lead to inconsistency and lack of coherence in developing themes. Researchers may struggle to maintain a consistent approach throughout the analysis process, resulting in themes that are not well-defined or integrated.

Another limitation of thematic analysis is its limited interpretive power if not used within an existing theoretical framework. Without a guiding theory or conceptual framework, the analysis may remain descriptive rather than interpretive, failing to provide the deeper insights you're after.

Ensuring Rigorous Thematic Analysis

To overcome the limitations of thematic analysis process and ensure rigorous results, researchers should:

Familiarize themselves with the existing literature on thematic analysis and seek guidance from experienced researchers in the field.

Develop a clear and consistent approach to coding and theme development, documenting each step of the process to ensure transparency and reproducibility.

Consider using thematic analysis in conjunction with other qualitative methods or within an existing theoretical framework to enhance its interpretive power.

Be flexible throughout the research process, acknowledging biases and assumptions and how these may influence the analysis.

By weighing the pros and cons of thematic analysis and taking steps to ensure rigour, researchers can harness the benefits of this method while minimizing its limitations, producing valuable insights from qualitative data.

thematic analysis method

Thematic analysis is widely used in various fields, including psychology, social sciences, and health research. This approach is particularly suitable for anyone doing qualitative content analysis of interviews, focus groups, and open-ended survey responses.

In psychology, thematic analysis has been used to explore a range of topics, such as experiences of mental health issues, identity formation, and interpersonal relationships. A key paper by Braun and Clarke (2006) demonstrated how thematic analysis can be used in psychology studies, providing guidelines on how to approach generating themes and leveraging a systematic coding process.

Combining Thematic Analysis with Other Methods

Thematic analysis can be used as a standalone method or in combination with other qualitative or quantitative approaches. When used in conjunction with other methods, thematic analysis can provide a more comprehensive understanding of the research topic and can enhance the credibility of the findings.

For example, researchers can use thematic analysis to analyze raw interview data, and then use the identified themes to inform the development of a quantitative survey to probe deeper. This approach allows for effective exploration of a topic, providing a more complete picture of the research themes.

Thematic Analysis: Your Key to Unlocking Qualitative Insights

Thematic analysis is a powerful tool for making sense of research data. By familiarizing yourself with data, generating initial codes, searching for themes, reviewing and refining them, and finally writing up your findings, you can uncover rich insights that might otherwise remain hidden.

Ready to put thematic analysis into practice? Start by gathering your qualitative data, whether it's interview transcripts, open-ended survey responses, or focus group discussions.

Then, leverage a tool like Kapiche as you follow the step-by-step process outlined in this guide. From pre-coding to post-coding, this guide should help arrive at the themes that best capture the essence of your data.

Want to see how Kapiche can support your thematic research goals? Watch a demo here today to get a tour of the platform.

You might also like

example of thematic analysis in research

example of thematic analysis in research

The Ultimate Guide to Qualitative Research - Part 2: Handling Qualitative Data

example of thematic analysis in research

  • Handling qualitative data
  • Transcripts
  • Field notes
  • Survey data and responses
  • Visual and audio data
  • Data organization
  • Data coding
  • Coding frame
  • Auto and smart coding
  • Organizing codes
  • Qualitative data analysis

Content analysis

  • Introduction

What is meant by thematic analysis?

The thematic analysis process, thematic analysis in other research methods, using atlas.ti for qualitative analysis, considerations for thematic analysis.

  • Thematic analysis vs. content analysis
  • Narrative research
  • Phenomenological research

Discourse analysis

Grounded theory.

  • Deductive reasoning
  • Inductive reasoning
  • Inductive vs. deductive reasoning
  • Qualitative data interpretation
  • Qualitative data analysis software

Thematic analysis

One of the most straightforward forms of qualitative data analysis involves the identification of themes and patterns that appear in otherwise unstructured qualitative data . Thematic analysis is an integral component of qualitative research because it provides an entry point into analyzing qualitative data.

Let's look at thematic analysis, its role in qualitative research methods , and how ATLAS.ti can help you form themes from raw data to generate a theoretical framework .

example of thematic analysis in research

The main objective of research is to order data into meaningful patterns and generate new knowledge arising from theories about that data. Quantitative data is analyzed to measure a phenomenon's quantifiable aspects (e.g., an element's melting point, the effective income tax rate in the suburbs). The advantage of quantitative research is that data is often already structured, or at least easily structured, to quickly draw insights from numerical values.

On the other hand, some phenomena cannot be easily quantified, or they require conceptual development before they can be quantified. For example, what do people mean when they think of a movie or TV show as "good"? In the everyday world, people in a casual discussion may judge the quality of entertainment as a matter of personal preference, something that cannot be defined, let alone universally understood.

example of thematic analysis in research

As a result, researchers analyze qualitative data for identifying themes or phenomena that occur often or in telling patterns. In the case of TV shows, a collection of reviews of TV shows may frequently mention the acting, the script writing, and the production values, among other things. If these aspects are mentioned the most often, researchers can think of these as the themes determining the quality of a given TV show.

A useful metaphor for thematic analysis

Even if this is an easy concept to grasp, realizing this concept in qualitative research is a significant challenge. The biggest consideration for thematic analysis is that qualitative data is often unstructured and requires some organization to make it relevant to researchers and their audience.

Imagine that you have a bag of marbles. Each marble has one of a set of different colors. If you were to sort the marbles by color, you could determine how many colors are in the bag and which colors are the most common.

example of thematic analysis in research

The thematic analysis process is similar to sorting different-colored marbles. Instead of sorting colors, you are sorting themes in a data set to determine which themes appear the most often or to identify patterns among these themes.

After your initial analysis, you can take this one step further and separate "dark" colors from "light" colors or "warm" colors from "cool" colors. Blue and green are distinctly different colors, but you can group them under the "cool" category of colors to form a more overarching theme.

example of thematic analysis in research

Turn raw data into broader insights with ATLAS.ti

Our powerful data analysis software is available with a free trial.

A simple example of thematic analysis

Imagine a simple research question : how do teachers determine if a student's essay is good? Suppose you have a set of transcripts of interviews with teachers discussing writing classes and students' essays. In this case, the objective of thematic analysis is to determine the main factors teachers use to determine the quality of a piece of writing.

As you read the transcripts, you might find that teachers share some common answers. Of course, you might have an intuition that correct grammar and spelling are important, which will likely be confirmed by the teachers in their interviews. However, other considerations might surface in the data.

The next question in this casual thematic analysis is, what considerations appear most often? A few teachers may occasionally mention the size and typeface of the text as deciding factors, but more often they might say that the flow and organization of students' writing are more important. Analyzing the occurrences and patterns among themes across your transcripts can help you develop an answer to your research question.

The subjectivity of themes

One challenge is that themes in qualitative analysis, as with determining the themes of good writing, are not as visible to the naked eye as colors on a marble. The color "red" is relatively easy to see, but the fields in which thematic analysis is often applied do not deal with concepts that can necessarily be seen "objectively." It is up to the researcher to derive themes from the data from an inductive approach. Researchers can also utilize deductive approaches if they want to analyze their data according to themes that have been previously identified in other research.

example of thematic analysis in research

Think about the picture up above. To the naked eye, these children are holding hands. But themes that can be interpreted from this picture may include "friendship," "happiness," or even "family." The thematic analysis of pictures like this one often depends on a researcher's theoretical commitments, knowledge base, and cultural perspective.

This also means that you are responsible for explaining how you arrived at the themes arising from your data set. While colors are intuitively easy to distinguish, you are often required to explain more subjective codes and themes like "resilience" or "entitlement" so that you and your research audience have a common understanding of your data analysis .

This explanation should account for who you are as a researcher and how you see the data (since, after all, a word like "resilience" can mean different things to different people). A fully reflexive thematic analysis documents and presents where the researcher is relative to their data and to their research audience.

example of thematic analysis in research

Applications for thematic analysis

Many disciplines within qualitative research employ thematic analysis to make sense of social phenomena. For instances, these fields might be:

  • psychotherapy research
  • qualitative psychology
  • cultural anthropology

In a nutshell, any research discipline that relies on the understanding of social phenomena or insights that may not easily be quantifiable will attract researchers engaged in thematic analysis. Moreover, any exploratory research design lends itself easily to the identification of previously unknown themes that can later be used in a qualitative, quantitative, or confirmatory research project.

Common forms of data collection

Thematic analysis can involve any number of qualitative research methods to collect data, including:

  • focus groups
  • observations
  • literature reviews

Any unstructured data set, particularly any data set that captures social phenomena, can benefit from thematic analysis. The main consideration in ensuring rigor in data collection for thematic analysis is ensuring that your data is representative of the population or phenomenon you are trying to capture.

Virginia Braun and Victoria Clarke are the key researchers involved in making thematic analysis a commonly utilized approach in qualitative research . A quick search for their scholarship will tell you the basic steps involved in thematic analysis:

  • Become familiar with the data
  • Generate codes from the data
  • Generate themes based on the codes
  • Review the potential themes
  • Define the themes for the final reporting

In a nutshell, thematic analysis requires the researcher to look at their data, summarize their data with codes , and develop those codes to the extent that they can contribute a broader understanding of the context from which the data is collected.

While these are the key points in a robust and rigorous thematic analysis , there are understated parts of the qualitative research process that can often be taken for granted but must never be overlooked to ensure that researchers can analyze their data quickly and with as few challenges as possible.

The process in greater detail

Thematic analysis relies on research questions that are exploratory in nature, thus requiring an inductive approach to examining the data. While you might rely on an existing theoretical framework to decide your research questions and collect all the data for your project, thematic analysis primarily looks at your data inductively for what it says and what it says most often.

After data collection, you need to organize the data in some way to make the data analysis process easier (or, at minimum, possible). A data set in qualitative research is often akin to a crowd of people where individuals move in any direction without any sense of organization. This is a challenge if your research question involves understanding the crowd's age, gender, ethnicity, or style of clothing.

example of thematic analysis in research

The role of qualitative researchers at this stage is to sort out the crowd. In this example, perhaps this means having the crowd split into different groups according to those demographic identifiers to see which groups are the largest. Reorganizing the crowd from what was previously a group of wandering individuals can offer a better sense of who is in the room.

Qualitative data is often similarly unstructured and in need of reorganization. When dealing with thematic analysis, you need to reorganize the information so that the themes become more apparent to you and your research audience. In most cases, this means reducing the entire data set, as large as it might be, into a more concise form that allows for a more feasible analysis .

example of thematic analysis in research

Codes and themes are forms of data reduction that address this need. In a thematic analysis involving qualitative data analysis software , researchers code their data by applying short but descriptive phrases to larger data segments to summarize them for later analysis. Later stages of thematic analysis reorganize these codes into larger categories and then themes, where ultimately the themes support contribution to meaningful insights and existing theory.

As you progress in the coding process, you should start to notice that distinct codes may be related to each other. In a sense, codes provide researchers with visual data that they can examine to generate useful themes. ATLAS.ti, for example, lets you examine your codes in the margin to give you a sense of which codes and themes frequently appear in your data. As you code your data, you can apply colors to your codes. This is a flexible method that allows you to create preliminary categories that you can examine visually for their abundance and patterns.

example of thematic analysis in research

Later on, your codes can be organized into more formal categories or nested in hierarchies to contribute to a more robust thematic analysis.

Especially in qualitative research , discrete analytical approaches overlap with each other, meaning that a sufficiently thorough analysis of your data can eventually yield themes useful to your research. Let's examine a few of the more prominent approaches in qualitative research and their relation to thematic analysis.

Using grounded theory involves developing analysis iteratively through an inductive approach . While there is a great deal of overlap with thematic analysis approaches, grounded theory relies on incorporating more data to support the analysis in previous iterations of the research.

Nonetheless, the analytic process is largely the same for both approaches as they rely on seeking out phenomena that occur in abundance or distinct patterns. As you analyze qualitative data in either orientation, your main consideration is to observe which patterns emerge that can help contribute to a more universal understanding of the population or phenomenon under observation.

Narrative analysis

Understanding narratives is often less about taking large samples of data and more about unpacking the meaning that is produced in the data that is collected. In narrative research analysis , the data set is merely the narrative to be examined for its meaning, intent, and effect on its audience.

Searching for abundant or patterned themes is still a common objective when examining narratives. However, specific questions guide a narrative analysis , such as what the narrator is trying to say, how they say it, and how their audience receives the narrator's message.

Analyzing discourse is similar to analyzing narratives in that there is an examination of the subtext informing the use of words in communication. Research questions under both of these approaches focus specifically on language and communication, while thematic analysis can apply to all forms of data.

The scope of analysis is also different among approaches. Thematic analysis seeks to identify patterns in abundance. In contrast, discourse analysis can look at individual instances in discursive practices to more fully understand why people use language in a particular way.

However, the data resulting from an analysis of discursive practices can also be examined thematically. Discursive patterns within culturally-defined groups and cultural practices can be determined with a thematic analysis when utterances or interactional turns and patterns among them can be identified.

Among all the approaches in this section, content analysis is arguably the most quantitative. Strictly speaking, the words or phrases that appear most often in a body of textual data can tell something useful about the data as a whole. For example, imagine how we feel when a public speaker says "um" or "uh" an excessive number of times compared to another speaker who doesn't use these utterances at all. In another case, what can we say about the confidence of a person who frequently writes, "I don't know, but..."?

Content analysis seeks to determine the frequencies of aspects of language to understand a body of data. Unlike discourse analysis, however, content analysis looks strictly at what is said or written, with analysis primarily stemming from a statistical understanding of the data.

Oftentimes, content analysis is deductive in that it might apply previous theory to new data, unlike thematic analysis, which is primarily inductive in nature. That said, the findings from a content analysis can be used to determine themes, particularly if your research question can be addressed by directly looking at the textual data.

For thematic analysis, software is especially useful for identifying themes within large data sets. After all, thematically analyzing data by hand can be time-consuming, and a researcher might miss nuanced data without software to help them look at all the data thoroughly.

Coding qualitative data

For qualitative researchers, the coding process is one of the key tools for structuring qualitative data to facilitate any data analysis . In ATLAS.ti, data is broken down into quotations or segments of data that can be reduced to a set of codes that can be analyzed later.

example of thematic analysis in research

The codes and quotations appear in the margin next to a document in ATLAS.ti. This visualization is useful in showing how much of your data is coded and what concise meaning can be inferred from the data. In terms of thematic analysis, however, the codes can be assigned different colors based on what the researcher perceives as categories emerging from their project, as seen in the example above.

As you code the data iteratively, reviewing themes as they emerge, you can organize discrete codes within larger categories. ATLAS.ti provides spaces in your project called code groups and code categories where sets of codes in tandem represent broader, more theoretically developed themes. This approach to data organization , rather than merging codes together as broader units, allows for a more particular analysis of individual codes as your research questions evolve and develop over the course of your project.

ATLAS.ti tools for thematic analysis

As discussed above, analyzing qualitative data for themes can often be a matter of determining which codes and which categories of codes appear across the data and patterns among them. Indeed, any analysis software can assist you with this coding process for thematic analysis. The tools in ATLAS.ti, however, can help to make the process easier and more insightful. Let's look at a few of the many important features that are invaluable to conducting thematic analysis.

Code Manager

The Code Manager is ATLAS.ti's central space where researchers can organize and analyze their codes independent of the raw data . Researchers can perform numerous tasks in the Code Manager depending on their research questions and objectives, including looking just at the data that is associated with a particular code, organizing codes into hierarchies through code categories and nested sub-codes, and determining the frequencies and level of theoretical development for each code.

example of thematic analysis in research

Co-Occurrence Analysis

Combinations of codes that overlap with each other can also illuminate themes in your data, perhaps more ably than discrete codes. This is different from understanding codes as groups, as an analysis for codes that frequently occur together in the data can give a sense of the relationships between different aspects of a phenomenon.

example of thematic analysis in research

The Co-Occurrence Analysis tool helps researchers determine co-occurrence between different codes by placing them in a table, a bar chart, a Sankey diagram, or a force-directed graph. These visualizations can illustrate the strength of relationships between codes to you and your research audience. The relationships themselves can also be useful in generating themes useful for your analysis.

Word Frequencies

Qualitative content analysis depends on the frequencies of words, phrases, and other important aspects found in textual data. These frequencies can also help you in generating themes, particularly if your research questions are focused on the textual data itself.

The Word Frequencies tool in ATLAS.ti can facilitate a content analysis leading to a thematic analysis by giving you statistical data about what words appear most often in your project. Suppose these words can contribute to the development of themes. In that case, you can click on these words to find relevant quotations that you can code for thematic analysis. In addition, you can use ATLAS.ti’s Text Search tool to search for data segments that contain your word(s) of interest and automatically code them .

example of thematic analysis in research

You can also use themes to refine the scope of the Word Frequencies tool. By default, Word Frequencies looks at documents, but the tool also allows researchers to filter the data by selecting the codes relevant to their query. That way, you can look at the most relevant data quotations that match your desired codes for a richer thematic analysis.

Patterns and themes may also emerge from combinations of codes, in which case the Query Tool can help you construct smart codes. Smart codes are more versatile than nested sub-codes or code groups as they allow you to set multiple criteria based on true/false conditions as well as proximity. For example, while a code group simply aggregates distinct codes together to show you quotations with any of the included codes, you can define a set of rules to filter the data and find the most relevant quotations for your thematic analysis.

example of thematic analysis in research

A systematic and rigorous approach to thematic analysis involves showing your research audience how you arrived at your codes and themes. In qualitative research , visualizations offer clarity about the data in your project, which is a critical skill when explaining the broader meaning derived from otherwise unstructured data .

A TreeMap of codes is a representation of the application of codes relative to each other. In other words, codes that have been applied the most often in your data occupy the largest portions of the TreeMap, while less frequently used codes appear smaller in your visualization. This can give you a sense of the prevalence of certain codes over other codes. Moreover, when you assign colors to codes along the lines of themes and categories, you can quickly get a visual understanding of the themes that appear most often in your project.

example of thematic analysis in research

As a result, the TreeMap for codes can help provide a visual, thematic map that you can export as an image for use in explaining key themes in your research reports .

In qualitative research , thematic analysis is a useful means for generating a theoretical framework for qualitative concepts and phenomena. As always, though, theoretical development is best supported by thorough research. A theory that emerges from thematic analysis can be affirmed by additional inquiries, whether through a qualitative, quantitative , or mixed methods study .

Further research is always recommended for qualitative research, such as those that employ a thematic analysis, for the very reason that themes in qualitative concepts are socially constructed by the researcher. In turn, future research building on thematic analysis depends on a research design that is transparent and clearly defined so that other researchers can understand how the themes were generated in the first place. This requires a detailed accounting of the data and the analysis through comprehensive detail and visualizations in the final report.

To that end, ATLAS.ti's various tools are specifically designed to allow researchers to share and report their data to their research audiences through data reports and visualizations. Especially where qualitative research and thematic analysis are involved, researchers can benefit from transparently showing their analysis through data excerpts, visualizations , and descriptions of their methodology.

Analyze and visualize all your data in ATLAS.ti

Identify patterns and potential themes from your data with ATLAS.ti. Download a free trial today.

How to do thematic analysis

Last updated

8 February 2023

Reviewed by

Miroslav Damyanov

Short on time? Get an AI generated summary of this article instead

Uncovering themes in data requires a systematic approach. Thematic analysis organizes data so you can easily recognize the context.

  • What is thematic analysis?

Thematic analysis is   a method for analyzing qualitative data that involves reading through a data set and looking for patterns to derive themes . The researcher's subjective experience plays a central role in finding meaning within the data.

Streamline your thematic analysis

Find patterns and themes across all your qualitative data when you analyze it in Dovetail

  • What are the main approaches to thematic analysis?

Inductive thematic analysis approach

Inductive thematic analysis entails   deriving meaning and identifying themes from data with no preconceptions.  You analyze the data without any expected outcomes.

Deductive thematic analysis approach

In the deductive approach, you analyze data with a set of expected themes. Prior knowledge, research, or existing theory informs this approach.

Semantic thematic analysis approach

With the semantic approach, you ignore the underlying meaning of data. You take identifying themes at face value based on what is written or explicitly stated.

Latent thematic analysis approach

Unlike the semantic approach, the latent approach focuses on underlying meanings in data and looks at the reasons for semantic content. It involves an element of interpretation where you theorize meanings and don’t just take data at face value.

  • When should thematic analysis be used?

Thematic analysis is beneficial when you’re working with large bodies of data. It allows you to divide and categorize huge quantities of data in a way that makes it far easier to digest.  

The following scenarios warrant the use of thematic analysis:

You’re new to qualitative analysis

You need to identify patterns in data

You want to involve participants in the process

Thematic analysis is particularly useful when you’re looking for subjective information such as experiences and opinions in surveys , interviews, conversations, or social media posts. 

  • What are the advantages and disadvantages of thematic analysis?

Thematic analysis is a highly flexible approach to qualitative data analysis that you can modify to meet the needs of many studies. It enables you to generate new insights and concepts from data. 

Beginner researchers who are just learning how to analyze data will find thematic analysis very accessible. It’s easy for most people to grasp and can be relatively quick to learn.

The flexibility of thematic analysis can also be a disadvantage. It can feel intimidating to decide what’s important to emphasize, as there are many ways to interpret meaning from a data set.

  • What is the step-by-step process for thematic analysis?

The basic thematic analysis process requires recognizing codes and themes within a data set. A code is a label assigned to a piece of data that you use to identify and summarize important concepts within a data set. A theme is a pattern that you identify within the data. Relevant steps may vary based on the approach and type of thematic analysis, but these are the general steps you’d take:

1. Familiarize yourself with the data(pre-coding work)

Before you can successfully work with data, you need to understand it. Get a feel for the data to see what general themes pop up. Transcribe audio files and observe any meanings and patterns across the data set. Read through the transcript, and jot down notes about potential codes to create. 

2. Create the initial codes (open code work)

Create a set of initial codes to represent the patterns and meanings in the data. Make a codebook to keep track of the codes. Read through the data again to identify interesting excerpts and apply the appropriate codes. You should use the same code to represent excerpts with the same meaning. 

3. Collate codes with supporting data (clustering of initial code)

Now it's time to group all excerpts associated with a particular code. If you’re doing this manually, cut out codes and put them together. Thematic analysis software will automatically collate them.

4. Group codes into themes (clustering of selective codes)

Once you’ve finalized the codes, you can sort them into potential themes. Themes reflect trends and patterns in data. You can combine some codes to create sub-themes.

5. Review, revise, and finalize the themes (final revision)

Now you’ve decided upon the initial themes, you can review and adjust them as needed. Each theme should be distinct, with enough data to support it. You can merge similar themes and remove those lacking sufficient supportive data. Begin formulating themes into a narrative. 

6. Write the report

The final step of telling the story of a set of data is writing the report. You should fully consider the themes to communicate the validity of your analysis.

A typical thematic analysis report contains the following:

An introduction

A methodology section

Results and findings

A conclusion

Your narrative must be coherent, and it should include vivid quotes that can back up points. It should also include an interpretive analysis and argument for your claims. In addition, consider reporting your findings in a flowchart or tree diagram, which can be independent of or part of your report.  

In conclusion, a thematic analysis is a method of analyzing qualitative data. By following the six steps, you will identify common themes from a large set of texts. This method can help you find rich and useful insights about people’s experiences, behaviors, and nuanced opinions.

  • How to analyze qualitative data

Qualitative data analysis is the process of organizing, analyzing, and interpreting non-numerical and subjective data . The goal is to capture themes and patterns, answer questions, and identify the best actions to take based on that data. 

Researchers can use qualitative data to understand people’s thoughts, feelings, and attitudes. For example, qualitative researchers can help business owners draw reliable conclusions about customers’ opinions and discover areas that need improvement. 

In addition to thematic analysis, you can analyze qualitative data using the following:

Content analysis

Content analysis examines and counts the presence of certain words, subjects, and contexts in documents and communication artifacts, such as: 

Text in various formats

This method transforms qualitative input into quantitative data. You can do it manually or with electronic tools that recognize patterns to make connections between concepts.  

Free AI content analysis generator

Make sense of your research by automatically summarizing key takeaways through our free content analysis tool.

example of thematic analysis in research

Narrative analysis

Narrative analysis interprets research participants' stories from testimonials, case studies, interviews, and other text or visual data. It provides valuable insights into the complexity of people's feelings, beliefs, and behaviors.

Discourse analysis

In discourse analysis , you analyze the underlying meaning of qualitative data in a particular context, including: 

Historical 

This approach allows us to study how people use language in text, audio, and video to unravel social issues, power dynamics, or inequalities. 

For example, you can look at how people communicate with their coworkers versus their bosses. Discourse analysis goes beyond the literal meaning of words to examine social reality.

Grounded theory analysis

In grounded theory analysis, you develop theories by examining real-world data. The process involves creating hypotheses and theories by systematically collecting and evaluating this data. While this approach is helpful for studying lesser-known phenomena, it might be overwhelming for a novice researcher. 

  • Challenges with analyzing qualitative data

While qualitative data can answer questions that quantitative data can't, it still comes with challenges.

If done manually, qualitative data analysis is very time-consuming.

It can be hard to choose a method. 

Avoiding bias is difficult.

Human error affects accuracy and consistency.

To overcome these challenges, you should fine-tune your methods by using the appropriate tools in collaboration with teammates.

example of thematic analysis in research

Learn more about thematic analysis software

What is thematic analysis in qualitative research.

Thematic analysis is a method of analyzing qualitative data. It is applied to texts, such as interviews or transcripts. The researcher closely examines the data to identify common patterns and themes.

Can thematic analysis be done manually?

You can do thematic analysis manually, but it is very time-consuming without the help of software.

What are the two types of thematic analysis?

The two main types of thematic analysis include codebook thematic analysis and reflexive thematic analysis.

Codebook thematic analysis uses predetermined codes and structured codebooks to analyze from a deductive perspective. You draw codes from a review of the data or an initial analysis to produce the codebooks.

Reflexive thematic analysis is more flexible and does not use a codebook. Researchers can change, remove, and add codes as they work through the data. 

What makes a good thematic analysis?

The goal of thematic analysis is more than simply summarizing data; it's about identifying important themes. Good thematic analysis interprets, makes sense of data, and explains it. It produces trustworthy and insightful findings that are easy to understand and apply. 

What are examples of themes in thematic analysis?

Grouping codes into themes summarize sections of data in a useful way to answer research questions and achieve objectives. A theme identifies an area of data and tells the reader something about it. A good theme can sit alone without requiring descriptive text beneath it.

For example, if you were analyzing data on wildlife, codes might be owls, hawks, and falcons. These codes might fall beneath the theme of birds of prey. If your data were about the latest trends for teenage girls, codes such as mini skirts, leggings, and distressed jeans would fall under fashion.  

Thematic analysis is straightforward and intuitive enough that most people have no trouble applying it.

Should you be using a customer insights hub?

Do you want to discover previous research faster?

Do you share your research findings with others?

Do you analyze research data?

Start for free today, add your research, and get to key insights faster

Editor’s picks

Last updated: 18 April 2023

Last updated: 27 February 2023

Last updated: 22 August 2024

Last updated: 5 February 2023

Last updated: 16 August 2024

Last updated: 9 March 2023

Last updated: 30 April 2024

Last updated: 12 December 2023

Last updated: 11 March 2024

Last updated: 4 July 2024

Last updated: 6 March 2024

Last updated: 5 March 2024

Last updated: 13 May 2024

Latest articles

Related topics, .css-je19u9{-webkit-align-items:flex-end;-webkit-box-align:flex-end;-ms-flex-align:flex-end;align-items:flex-end;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-flex-direction:row;-ms-flex-direction:row;flex-direction:row;-webkit-box-flex-wrap:wrap;-webkit-flex-wrap:wrap;-ms-flex-wrap:wrap;flex-wrap:wrap;-webkit-box-pack:center;-ms-flex-pack:center;-webkit-justify-content:center;justify-content:center;row-gap:0;text-align:center;max-width:671px;}@media (max-width: 1079px){.css-je19u9{max-width:400px;}.css-je19u9>span{white-space:pre;}}@media (max-width: 799px){.css-je19u9{max-width:400px;}.css-je19u9>span{white-space:pre;}} decide what to .css-1kiodld{max-height:56px;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;}@media (max-width: 1079px){.css-1kiodld{display:none;}} build next, decide what to build next, log in or sign up.

Get started for free

  • - Google Chrome

Intended for healthcare professionals

  • My email alerts
  • BMA member login
  • Username * Password * Forgot your log in details? Need to activate BMA Member Log In Log in via OpenAthens Log in via your institution

Home

Search form

  • Advanced search
  • Search responses
  • Search blogs
  • Practical thematic...

Practical thematic analysis: a guide for multidisciplinary health services research teams engaging in qualitative analysis

  • Related content
  • Peer review
  • on behalf of the Coproduction Laboratory
  • 1 Dartmouth Health, Lebanon, NH, USA
  • 2 Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, NH, USA
  • 3 Center for Primary Care and Public Health (Unisanté), Lausanne, Switzerland
  • 4 Jönköping Academy for Improvement of Health and Welfare, School of Health and Welfare, Jönköping University, Jönköping, Sweden
  • 5 Highland Park, NJ, USA
  • 6 Division of Public Health Sciences, Department of Surgery, Washington University School of Medicine, St Louis, MO, USA
  • Correspondence to: C H Saunders catherine.hylas.saunders{at}dartmouth.edu
  • Accepted 26 April 2023

Qualitative research methods explore and provide deep contextual understanding of real world issues, including people’s beliefs, perspectives, and experiences. Whether through analysis of interviews, focus groups, structured observation, or multimedia data, qualitative methods offer unique insights in applied health services research that other approaches cannot deliver. However, many clinicians and researchers hesitate to use these methods, or might not use them effectively, which can leave relevant areas of inquiry inadequately explored. Thematic analysis is one of the most common and flexible methods to examine qualitative data collected in health services research. This article offers practical thematic analysis as a step-by-step approach to qualitative analysis for health services researchers, with a focus on accessibility for patients, care partners, clinicians, and others new to thematic analysis. Along with detailed instructions covering three steps of reading, coding, and theming, the article includes additional novel and practical guidance on how to draft effective codes, conduct a thematic analysis session, and develop meaningful themes. This approach aims to improve consistency and rigor in thematic analysis, while also making this method more accessible for multidisciplinary research teams.

Through qualitative methods, researchers can provide deep contextual understanding of real world issues, and generate new knowledge to inform hypotheses, theories, research, and clinical care. Approaches to data collection are varied, including interviews, focus groups, structured observation, and analysis of multimedia data, with qualitative research questions aimed at understanding the how and why of human experience. 1 2 Qualitative methods produce unique insights in applied health services research that other approaches cannot deliver. In particular, researchers acknowledge that thematic analysis is a flexible and powerful method of systematically generating robust qualitative research findings by identifying, analysing, and reporting patterns (themes) within data. 3 4 5 6 Although qualitative methods are increasingly valued for answering clinical research questions, many researchers are unsure how to apply them or consider them too time consuming to be useful in responding to practical challenges 7 or pressing situations such as public health emergencies. 8 Consequently, researchers might hesitate to use them, or use them improperly. 9 10 11

Although much has been written about how to perform thematic analysis, practical guidance for non-specialists is sparse. 3 5 6 12 13 In the multidisciplinary field of health services research, qualitative data analysis can confound experienced researchers and novices alike, which can stoke concerns about rigor, particularly for those more familiar with quantitative approaches. 14 Since qualitative methods are an area of specialisation, support from experts is beneficial. However, because non-specialist perspectives can enhance data interpretation and enrich findings, there is a case for making thematic analysis easier, more rapid, and more efficient, 8 particularly for patients, care partners, clinicians, and other stakeholders. A practical guide to thematic analysis might encourage those on the ground to use these methods in their work, unearthing insights that would otherwise remain undiscovered.

Given the need for more accessible qualitative analysis approaches, we present a simple, rigorous, and efficient three step guide for practical thematic analysis. We include new guidance on the mechanics of thematic analysis, including developing codes, constructing meaningful themes, and hosting a thematic analysis session. We also discuss common pitfalls in thematic analysis and how to avoid them.

Summary points

Qualitative methods are increasingly valued in applied health services research, but multidisciplinary research teams often lack accessible step-by-step guidance and might struggle to use these approaches

A newly developed approach, practical thematic analysis, uses three simple steps: reading, coding, and theming

Based on Braun and Clarke’s reflexive thematic analysis, our streamlined yet rigorous approach is designed for multidisciplinary health services research teams, including patients, care partners, and clinicians

This article also provides companion materials including a slide presentation for teaching practical thematic analysis to research teams, a sample thematic analysis session agenda, a theme coproduction template for use during the session, and guidance on using standardised reporting criteria for qualitative research

In their seminal work, Braun and Clarke developed a six phase approach to reflexive thematic analysis. 4 12 We built on their method to develop practical thematic analysis ( box 1 , fig 1 ), which is a simplified and instructive approach that retains the substantive elements of their six phases. Braun and Clarke’s phase 1 (familiarising yourself with the dataset) is represented in our first step of reading. Phase 2 (coding) remains as our second step of coding. Phases 3 (generating initial themes), 4 (developing and reviewing themes), and 5 (refining, defining, and naming themes) are represented in our third step of theming. Phase 6 (writing up) also occurs during this third step of theming, but after a thematic analysis session. 4 12

Key features and applications of practical thematic analysis

Step 1: reading.

All manuscript authors read the data

All manuscript authors write summary memos

Step 2: Coding

Coders perform both data management and early data analysis

Codes are complete thoughts or sentences, not categories

Step 3: Theming

Researchers host a thematic analysis session and share different perspectives

Themes are complete thoughts or sentences, not categories

Applications

For use by practicing clinicians, patients and care partners, students, interdisciplinary teams, and those new to qualitative research

When important insights from healthcare professionals are inaccessible because they do not have qualitative methods training

When time and resources are limited

Fig 1

Steps in practical thematic analysis

  • Download figure
  • Open in new tab
  • Download powerpoint

We present linear steps, but as qualitative research is usually iterative, so too is thematic analysis. 15 Qualitative researchers circle back to earlier work to check whether their interpretations still make sense in the light of additional insights, adapting as necessary. While we focus here on the practical application of thematic analysis in health services research, we recognise our approach exists in the context of the broader literature on thematic analysis and the theoretical underpinnings of qualitative methods as a whole. For a more detailed discussion of these theoretical points, as well as other methods widely used in health services research, we recommend reviewing the sources outlined in supplemental material 1. A strong and nuanced understanding of the context and underlying principles of thematic analysis will allow for higher quality research. 16

Practical thematic analysis is a highly flexible approach that can draw out valuable findings and generate new hypotheses, including in cases with a lack of previous research to build on. The approach can also be used with a variety of data, such as transcripts from interviews or focus groups, patient encounter transcripts, professional publications, observational field notes, and online activity logs. Importantly, successful practical thematic analysis is predicated on having high quality data collected with rigorous methods. We do not describe qualitative research design or data collection here. 11 17

In supplemental material 1, we summarise the foundational methods, concepts, and terminology in qualitative research. Along with our guide below, we include a companion slide presentation for teaching practical thematic analysis to research teams in supplemental material 2. We provide a theme coproduction template for teams to use during thematic analysis sessions in supplemental material 3. Our method aligns with the major qualitative reporting frameworks, including the Consolidated Criteria for Reporting Qualitative Research (COREQ). 18 We indicate the corresponding step in practical thematic analysis for each COREQ item in supplemental material 4.

Familiarisation and memoing

We encourage all manuscript authors to review the full dataset (eg, interview transcripts) to familiarise themselves with it. This task is most critical for those who will later be engaged in the coding and theming steps. Although time consuming, it is the best way to involve team members in the intellectual work of data interpretation, so that they can contribute to the analysis and contextualise the results. If this task is not feasible given time limitations or large quantities of data, the data can be divided across team members. In this case, each piece of data should be read by at least two individuals who ideally represent different professional roles or perspectives.

We recommend that researchers reflect on the data and independently write memos, defined as brief notes on thoughts and questions that arise during reading, and a summary of their impressions of the dataset. 2 19 Memoing is an opportunity to gain insights from varying perspectives, particularly from patients, care partners, clinicians, and others. It also gives researchers the opportunity to begin to scope which elements of and concepts in the dataset are relevant to the research question.

Data saturation

The concept of data saturation ( box 2 ) is a foundation of qualitative research. It is defined as the point in analysis at which new data tend to be redundant of data already collected. 21 Qualitative researchers are expected to report their approach to data saturation. 18 Because thematic analysis is iterative, the team should discuss saturation throughout the entire process, beginning with data collection and continuing through all steps of the analysis. 22 During step 1 (reading), team members might discuss data saturation in the context of summary memos. Conversations about saturation continue during step 2 (coding), with confirmation that saturation has been achieved during step 3 (theming). As a rule of thumb, researchers can often achieve saturation in 9-17 interviews or 4-8 focus groups, but this will vary depending on the specific characteristics of the study. 23

Data saturation in context

Braun and Clarke discourage the use of data saturation to determine sample size (eg, number of interviews), because it assumes that there is an objective truth to be captured in the data (sometimes known as a positivist perspective). 20 Qualitative researchers often try to avoid positivist approaches, arguing that there is no one true way of seeing the world, and will instead aim to gather multiple perspectives. 5 Although this theoretical debate with qualitative methods is important, we recognise that a priori estimates of saturation are often needed, particularly for investigators newer to qualitative research who might want a more pragmatic and applied approach. In addition, saturation based, sample size estimation can be particularly helpful in grant proposals. However, researchers should still follow a priori sample size estimation with a discussion to confirm saturation has been achieved.

Definition of coding

We describe codes as labels for concepts in the data that are directly relevant to the study objective. Historically, the purpose of coding was to distil the large amount of data collected into conceptually similar buckets so that researchers could review it in aggregate and identify key themes. 5 24 We advocate for a more analytical approach than is typical with thematic analysis. With our method, coding is both the foundation for and the beginning of thematic analysis—that is, early data analysis, management, and reduction occur simultaneously rather than as different steps. This approach moves the team more efficiently towards being able to describe themes.

Building the coding team

Coders are the research team members who directly assign codes to the data, reading all material and systematically labelling relevant data with appropriate codes. Ideally, at least two researchers would code every discrete data document, such as one interview transcript. 25 If this task is not possible, individual coders can each code a subset of the data that is carefully selected for key characteristics (sometimes known as purposive selection). 26 When using this approach, we recommend that at least 10% of data be coded by two or more coders to ensure consistency in codebook application. We also recommend coding teams of no more than four to five people, for practical reasons concerning maintaining consistency.

Clinicians, patients, and care partners bring unique perspectives to coding and enrich the analytical process. 27 Therefore, we recommend choosing coders with a mix of relevant experiences so that they can challenge and contextualise each other’s interpretations based on their own perspectives and opinions ( box 3 ). We recommend including both coders who collected the data and those who are naive to it, if possible, given their different perspectives. We also recommend all coders review the summary memos from the reading step so that key concepts identified by those not involved in coding can be integrated into the analytical process. In practice, this review means coding the memos themselves and discussing them during the code development process. This approach ensures that the team considers a diversity of perspectives.

Coding teams in context

The recommendation to use multiple coders is a departure from Braun and Clarke. 28 29 When the views, experiences, and training of each coder (sometimes known as positionality) 30 are carefully considered, having multiple coders can enhance interpretation and enrich findings. When these perspectives are combined in a team setting, researchers can create shared meaning from the data. Along with the practical consideration of distributing the workload, 31 inclusion of these multiple perspectives increases the overall quality of the analysis by mitigating the impact of any one coder’s perspective. 30

Coding tools

Qualitative analysis software facilitates coding and managing large datasets but does not perform the analytical work. The researchers must perform the analysis themselves. Most programs support queries and collaborative coding by multiple users. 32 Important factors to consider when choosing software can include accessibility, cost, interoperability, the look and feel of code reports, and the ease of colour coding and merging codes. Coders can also use low tech solutions, including highlighters, word processors, or spreadsheets.

Drafting effective codes

To draft effective codes, we recommend that the coders review each document line by line. 33 As they progress, they can assign codes to segments of data representing passages of interest. 34 Coders can also assign multiple codes to the same passage. Consensus among coders on what constitutes a minimum or maximum amount of text for assigning a code is helpful. As a general rule, meaningful segments of text for coding are shorter than one paragraph, but longer than a few words. Coders should keep the study objective in mind when determining which data are relevant ( box 4 ).

Code types in context

Similar to Braun and Clarke’s approach, practical thematic analysis does not specify whether codes are based on what is evident from the data (sometimes known as semantic) or whether they are based on what can be inferred at a deeper level from the data (sometimes known as latent). 4 12 35 It also does not specify whether they are derived from the data (sometimes known as inductive) or determined ahead of time (sometimes known as deductive). 11 35 Instead, it should be noted that health services researchers conducting qualitative studies often adopt all these approaches to coding (sometimes known as hybrid analysis). 3

In practical thematic analysis, codes should be more descriptive than general categorical labels that simply group data with shared characteristics. At a minimum, codes should form a complete (or full) thought. An easy way to conceptualise full thought codes is as complete sentences with subjects and verbs ( table 1 ), although full sentence coding is not always necessary. With full thought codes, researchers think about the data more deeply and capture this insight in the codes. This coding facilitates the entire analytical process and is especially valuable when moving from codes to broader themes. Experienced qualitative researchers often intuitively use full thought or sentence codes, but this practice has not been explicitly articulated as a path to higher quality coding elsewhere in the literature. 6

Example transcript with codes used in practical thematic analysis 36

  • View inline

Depending on the nature of the data, codes might either fall into flat categories or be arranged hierarchically. Flat categories are most common when the data deal with topics on the same conceptual level. In other words, one topic is not a subset of another topic. By contrast, hierarchical codes are more appropriate for concepts that naturally fall above or below each other. Hierarchical coding can also be a useful form of data management and might be necessary when working with a large or complex dataset. 5 Codes grouped into these categories can also make it easier to naturally transition into generating themes from the initial codes. 5 These decisions between flat versus hierarchical coding are part of the work of the coding team. In both cases, coders should ensure that their code structures are guided by their research questions.

Developing the codebook

A codebook is a shared document that lists code labels and comprehensive descriptions for each code, as well as examples observed within the data. Good code descriptions are precise and specific so that coders can consistently assign the same codes to relevant data or articulate why another coder would do so. Codebook development is iterative and involves input from the entire coding team. However, as those closest to the data, coders must resist undue influence, real or perceived, from other team members with conflicting opinions—it is important to mitigate the risk that more senior researchers, like principal investigators, exert undue influence on the coders’ perspectives.

In practical thematic analysis, coders begin codebook development by independently coding a small portion of the data, such as two to three transcripts or other units of analysis. Coders then individually produce their initial codebooks. This task will require them to reflect on, organise, and clarify codes. The coders then meet to reconcile the draft codebooks, which can often be difficult, as some coders tend to lump several concepts together while others will split them into more specific codes. Discussing disagreements and negotiating consensus are necessary parts of early data analysis. Once the codebook is relatively stable, we recommend soliciting input on the codes from all manuscript authors. Yet, coders must ultimately be empowered to finalise the details so that they are comfortable working with the codebook across a large quantity of data.

Assigning codes to the data

After developing the codebook, coders will use it to assign codes to the remaining data. While the codebook’s overall structure should remain constant, coders might continue to add codes corresponding to any new concepts observed in the data. If new codes are added, coders should review the data they have already coded and determine whether the new codes apply. Qualitative data analysis software can be useful for editing or merging codes.

We recommend that coders periodically compare their code occurrences ( box 5 ), with more frequent check-ins if substantial disagreements occur. In the event of large discrepancies in the codes assigned, coders should revise the codebook to ensure that code descriptions are sufficiently clear and comprehensive to support coding alignment going forward. Because coding is an iterative process, the team can adjust the codebook as needed. 5 28 29

Quantitative coding in context

Researchers should generally avoid reporting code counts in thematic analysis. However, counts can be a useful proxy in maintaining alignment between coders on key concepts. 26 In practice, therefore, researchers should make sure that all coders working on the same piece of data assign the same codes with a similar pattern and that their memoing and overall assessment of the data are aligned. 37 However, the frequency of a code alone is not an indicator of its importance. It is more important that coders agree on the most salient points in the data; reviewing and discussing summary memos can be helpful here. 5

Researchers might disagree on whether or not to calculate and report inter-rater reliability. We note that quantitative tests for agreement, such as kappa statistics or intraclass correlation coefficients, can be distracting and might not provide meaningful results in qualitative analyses. Similarly, Braun and Clarke argue that expecting perfect alignment on coding is inconsistent with the goal of co-constructing meaning. 28 29 Overall consensus on codes’ salience and contributions to themes is the most important factor.

Definition of themes

Themes are meta-constructs that rise above codes and unite the dataset ( box 6 , fig 2 ). They should be clearly evident, repeated throughout the dataset, and relevant to the research questions. 38 While codes are often explicit descriptions of the content in the dataset, themes are usually more conceptual and knit the codes together. 39 Some researchers hypothesise that theme development is loosely described in the literature because qualitative researchers simply intuit themes during the analytical process. 39 In practical thematic analysis, we offer a concrete process that should make developing meaningful themes straightforward.

Themes in context

According to Braun and Clarke, a theme “captures something important about the data in relation to the research question and represents some level of patterned response or meaning within the data set.” 4 Similarly, Braun and Clarke advise against themes as domain summaries. While different approaches can draw out themes from codes, the process begins by identifying patterns. 28 35 Like Braun and Clarke and others, we recommend that researchers consider the salience of certain themes, their prevalence in the dataset, and their keyness (ie, how relevant the themes are to the overarching research questions). 4 12 34

Fig 2

Use of themes in practical thematic analysis

Constructing meaningful themes

After coding all the data, each coder should independently reflect on the team’s summary memos (step 1), the codebook (step 2), and the coded data itself to develop draft themes (step 3). It can be illuminating for coders to review all excerpts associated with each code, so that they derive themes directly from the data. Researchers should remain focused on the research question during this step, so that themes have a clear relation with the overall project aim. Use of qualitative analysis software will make it easy to view each segment of data tagged with each code. Themes might neatly correspond to groups of codes. Or—more likely—they will unite codes and data in unexpected ways. A whiteboard or presentation slides might be helpful to organise, craft, and revise themes. We also provide a template for coproducing themes (supplemental material 3). As with codebook justification, team members will ideally produce individual drafts of the themes that they have identified in the data. They can then discuss these with the group and reach alignment or consensus on the final themes.

The team should ensure that all themes are salient, meaning that they are: supported by the data, relevant to the study objectives, and important. Similar to codes, themes are framed as complete thoughts or sentences, not categories. While codes and themes might appear to be similar to each other, the key distinction is that the themes represent a broader concept. Table 2 shows examples of codes and their corresponding themes from a previously published project that used practical thematic analysis. 36 Identifying three to four key themes that comprise a broader overarching theme is a useful approach. Themes can also have subthemes, if appropriate. 40 41 42 43 44

Example codes with themes in practical thematic analysis 36

Thematic analysis session

After each coder has independently produced draft themes, a carefully selected subset of the manuscript team meets for a thematic analysis session ( table 3 ). The purpose of this session is to discuss and reach alignment or consensus on the final themes. We recommend a session of three to five hours, either in-person or virtually.

Example agenda of thematic analysis session

The composition of the thematic analysis session team is important, as each person’s perspectives will shape the results. This group is usually a small subset of the broader research team, with three to seven individuals. We recommend that primary and senior authors work together to include people with diverse experiences related to the research topic. They should aim for a range of personalities and professional identities, particularly those of clinicians, trainees, patients, and care partners. At a minimum, all coders and primary and senior authors should participate in the thematic analysis session.

The session begins with each coder presenting their draft themes with supporting quotes from the data. 5 Through respectful and collaborative deliberation, the group will develop a shared set of final themes.

One team member facilitates the session. A firm, confident, and consistent facilitation style with good listening skills is critical. For practical reasons, this person is not usually one of the primary coders. Hierarchies in teams cannot be entirely flattened, but acknowledging them and appointing an external facilitator can reduce their impact. The facilitator can ensure that all voices are heard. For example, they might ask for perspectives from patient partners or more junior researchers, and follow up on comments from senior researchers to say, “We have heard your perspective and it is important; we want to make sure all perspectives in the room are equally considered.” Or, “I hear [senior person] is offering [x] idea, I’d like to hear other perspectives in the room.” The role of the facilitator is critical in the thematic analysis session. The facilitator might also privately discuss with more senior researchers, such as principal investigators and senior authors, the importance of being aware of their influence over others and respecting and eliciting the perspectives of more junior researchers, such as patients, care partners, and students.

To our knowledge, this discrete thematic analysis session is a novel contribution of practical thematic analysis. It helps efficiently incorporate diverse perspectives using the session agenda and theme coproduction template (supplemental material 3) and makes the process of constructing themes transparent to the entire research team.

Writing the report

We recommend beginning the results narrative with a summary of all relevant themes emerging from the analysis, followed by a subheading for each theme. Each subsection begins with a brief description of the theme and is illustrated with relevant quotes, which are contextualised and explained. The write-up should not simply be a list, but should contain meaningful analysis and insight from the researchers, including descriptions of how different stakeholders might have experienced a particular situation differently or unexpectedly.

In addition to weaving quotes into the results narrative, quotes can be presented in a table. This strategy is a particularly helpful when submitting to clinical journals with tight word count limitations. Quote tables might also be effective in illustrating areas of agreement and disagreement across stakeholder groups, with columns representing different groups and rows representing each theme or subtheme. Quotes should include an anonymous label for each participant and any relevant characteristics, such as role or gender. The aim is to produce rich descriptions. 5 We recommend against repeating quotations across multiple themes in the report, so as to avoid confusion. The template for coproducing themes (supplemental material 3) allows documentation of quotes supporting each theme, which might also be useful during report writing.

Visual illustrations such as a thematic map or figure of the findings can help communicate themes efficiently. 4 36 42 44 If a figure is not possible, a simple list can suffice. 36 Both must clearly present the main themes with subthemes. Thematic figures can facilitate confirmation that the researchers’ interpretations reflect the study populations’ perspectives (sometimes known as member checking), because authors can invite discussions about the figure and descriptions of findings and supporting quotes. 46 This process can enhance the validity of the results. 46

In supplemental material 4, we provide additional guidance on reporting thematic analysis consistent with COREQ. 18 Commonly used in health services research, COREQ outlines a standardised list of items to be included in qualitative research reports ( box 7 ).

Reporting in context

We note that use of COREQ or any other reporting guidelines does not in itself produce high quality work and should not be used as a substitute for general methodological rigor. Rather, researchers must consider rigor throughout the entire research process. As the issue of how to conceptualise and achieve rigorous qualitative research continues to be debated, 47 48 we encourage researchers to explicitly discuss how they have looked at methodological rigor in their reports. Specifically, we point researchers to Braun and Clarke’s 2021 tool for evaluating thematic analysis manuscripts for publication (“Twenty questions to guide assessment of TA [thematic analysis] research quality”). 16

Avoiding common pitfalls

Awareness of common mistakes can help researchers avoid improper use of qualitative methods. Improper use can, for example, prevent researchers from developing meaningful themes and can risk drawing inappropriate conclusions from the data. Braun and Clarke also warn of poor quality in qualitative research, noting that “coherence and integrity of published research does not always hold.” 16

Weak themes

An important distinction between high and low quality themes is that high quality themes are descriptive and complete thoughts. As such, they often contain subjects and verbs, and can be expressed as full sentences ( table 2 ). Themes that are simply descriptive categories or topics could fail to impart meaningful knowledge beyond categorisation. 16 49 50

Researchers will often move from coding directly to writing up themes, without performing the work of theming or hosting a thematic analysis session. Skipping concerted theming often results in themes that look more like categories than unifying threads across the data.

Unfocused analysis

Because data collection for qualitative research is often semi-structured (eg, interviews, focus groups), not all data will be directly relevant to the research question at hand. To avoid unfocused analysis and a correspondingly unfocused manuscript, we recommend that all team members keep the research objective in front of them at every stage, from reading to coding to theming. During the thematic analysis session, we recommend that the research question be written on a whiteboard so that all team members can refer back to it, and so that the facilitator can ensure that conversations about themes occur in the context of this question. Consistently focusing on the research question can help to ensure that the final report directly answers it, as opposed to the many other interesting insights that might emerge during the qualitative research process. Such insights can be picked up in a secondary analysis if desired.

Inappropriate quantification

Presenting findings quantitatively (eg, “We found 18 instances of participants mentioning safety concerns about the vaccines”) is generally undesirable in practical thematic analysis reporting. 51 Descriptive terms are more appropriate (eg, “participants had substantial concerns about the vaccines,” or “several participants were concerned about this”). This descriptive presentation is critical because qualitative data might not be consistently elicited across participants, meaning that some individuals might share certain information while others do not, simply based on how conversations evolve. Additionally, qualitative research does not aim to draw inferences outside its specific sample. Emphasising numbers in thematic analysis can lead to readers incorrectly generalising the findings. Although peer reviewers unfamiliar with thematic analysis often request this type of quantification, practitioners of practical thematic analysis can confidently defend their decision to avoid it. If quantification is methodologically important, we recommend simultaneously conducting a survey or incorporating standardised interview techniques into the interview guide. 11

Neglecting group dynamics

Researchers should concertedly consider group dynamics in the research team. Particular attention should be paid to power relations and the personality of team members, which can include aspects such as who most often speaks, who defines concepts, and who resolves disagreements that might arise within the group. 52

The perspectives of patient and care partners are particularly important to cultivate. Ideally, patient partners are meaningfully embedded in studies from start to finish, not just for practical thematic analysis. 53 Meaningful engagement can build trust, which makes it easier for patient partners to ask questions, request clarification, and share their perspectives. Professional team members should actively encourage patient partners by emphasising that their expertise is critically important and valued. Noting when a patient partner might be best positioned to offer their perspective can be particularly powerful.

Insufficient time allocation

Researchers must allocate enough time to complete thematic analysis. Working with qualitative data takes time, especially because it is often not a linear process. As the strength of thematic analysis lies in its ability to make use of the rich details and complexities of the data, we recommend careful planning for the time required to read and code each document.

Estimating the necessary time can be challenging. For step 1 (reading), researchers can roughly calculate the time required based on the time needed to read and reflect on one piece of data. For step 2 (coding), the total amount of time needed can be extrapolated from the time needed to code one document during codebook development. We also recommend three to five hours for the thematic analysis session itself, although coders will need to independently develop their draft themes beforehand. Although the time required for practical thematic analysis is variable, teams should be able to estimate their own required effort with these guidelines.

Practical thematic analysis builds on the foundational work of Braun and Clarke. 4 16 We have reframed their six phase process into three condensed steps of reading, coding, and theming. While we have maintained important elements of Braun and Clarke’s reflexive thematic analysis, we believe that practical thematic analysis is conceptually simpler and easier to teach to less experienced researchers and non-researcher stakeholders. For teams with different levels of familiarity with qualitative methods, this approach presents a clear roadmap to the reading, coding, and theming of qualitative data. Our practical thematic analysis approach promotes efficient learning by doing—experiential learning. 12 29 Practical thematic analysis avoids the risk of relying on complex descriptions of methods and theory and places more emphasis on obtaining meaningful insights from those close to real world clinical environments. Although practical thematic analysis can be used to perform intensive theory based analyses, it lends itself more readily to accelerated, pragmatic approaches.

Strengths and limitations

Our approach is designed to smooth the qualitative analysis process and yield high quality themes. Yet, researchers should note that poorly performed analyses will still produce low quality results. Practical thematic analysis is a qualitative analytical approach; it does not look at study design, data collection, or other important elements of qualitative research. It also might not be the right choice for every qualitative research project. We recommend it for applied health services research questions, where diverse perspectives and simplicity might be valuable.

We also urge researchers to improve internal validity through triangulation methods, such as member checking (supplemental material 1). 46 Member checking could include soliciting input on high level themes, theme definitions, and quotations from participants. This approach might increase rigor.

Implications

We hope that by providing clear and simple instructions for practical thematic analysis, a broader range of researchers will be more inclined to use these methods. Increased transparency and familiarity with qualitative approaches can enhance researchers’ ability to both interpret qualitative studies and offer up new findings themselves. In addition, it can have usefulness in training and reporting. A major strength of this approach is to facilitate meaningful inclusion of patient and care partner perspectives, because their lived experiences can be particularly valuable in data interpretation and the resulting findings. 11 30 As clinicians are especially pressed for time, they might also appreciate a practical set of instructions that can be immediately used to leverage their insights and access to patients and clinical settings, and increase the impact of qualitative research through timely results. 8

Practical thematic analysis is a simplified approach to performing thematic analysis in health services research, a field where the experiences of patients, care partners, and clinicians are of inherent interest. We hope that it will be accessible to those individuals new to qualitative methods, including patients, care partners, clinicians, and other health services researchers. We intend to empower multidisciplinary research teams to explore unanswered questions and make new, important, and rigorous contributions to our understanding of important clinical and health systems research.

Acknowledgments

All members of the Coproduction Laboratory provided input that shaped this manuscript during laboratory meetings. We acknowledge advice from Elizabeth Carpenter-Song, an expert in qualitative methods.

Coproduction Laboratory group contributors: Stephanie C Acquilano ( http://orcid.org/0000-0002-1215-5531 ), Julie Doherty ( http://orcid.org/0000-0002-5279-6536 ), Rachel C Forcino ( http://orcid.org/0000-0001-9938-4830 ), Tina Foster ( http://orcid.org/0000-0001-6239-4031 ), Megan Holthoff, Christopher R Jacobs ( http://orcid.org/0000-0001-5324-8657 ), Lisa C Johnson ( http://orcid.org/0000-0001-7448-4931 ), Elaine T Kiriakopoulos, Kathryn Kirkland ( http://orcid.org/0000-0002-9851-926X ), Meredith A MacMartin ( http://orcid.org/0000-0002-6614-6091 ), Emily A Morgan, Eugene Nelson, Elizabeth O’Donnell, Brant Oliver ( http://orcid.org/0000-0002-7399-622X ), Danielle Schubbe ( http://orcid.org/0000-0002-9858-1805 ), Gabrielle Stevens ( http://orcid.org/0000-0001-9001-178X ), Rachael P Thomeer ( http://orcid.org/0000-0002-5974-3840 ).

Contributors: Practical thematic analysis, an approach designed for multidisciplinary health services teams new to qualitative research, was based on CHS’s experiences teaching thematic analysis to clinical teams and students. We have drawn heavily from qualitative methods literature. CHS is the guarantor of the article. CHS, AS, CvP, AMK, JRK, and JAP contributed to drafting the manuscript. AS, JG, CMM, JAP, and RWY provided feedback on their experiences using practical thematic analysis. CvP, LCL, SLB, AVC, GE, and JKL advised on qualitative methods in health services research, given extensive experience. All authors meaningfully edited the manuscript content, including AVC and RKS. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: This manuscript did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Competing interests: All authors have completed the ICMJE uniform disclosure form at https://www.icmje.org/disclosure-of-interest/ and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

Provenance and peer review: Not commissioned; externally peer reviewed.

  • Ziebland S ,
  • ↵ A Hybrid Approach to Thematic Analysis in Qualitative Research: Using a Practical Example. 2018. https://methods.sagepub.com/case/hybrid-approach-thematic-analysis-qualitative-research-a-practical-example .
  • Maguire M ,
  • Vindrola-Padros C ,
  • Vindrola-Padros B
  • ↵ Vindrola-Padros C. Rapid Ethnographies: A Practical Guide . Cambridge University Press 2021. https://play.google.com/store/books/details?id=n80HEAAAQBAJ
  • Schroter S ,
  • Merino JG ,
  • Barbeau A ,
  • ↵ Padgett DK. Qualitative and Mixed Methods in Public Health . SAGE Publications 2011. https://play.google.com/store/books/details?id=LcYgAQAAQBAJ
  • Scharp KM ,
  • Korstjens I
  • Barnett-Page E ,
  • ↵ Guest G, Namey EE, Mitchell ML. Collecting Qualitative Data: A Field Manual for Applied Research . SAGE 2013. https://play.google.com/store/books/details?id=-3rmWYKtloC
  • Sainsbury P ,
  • Emerson RM ,
  • Saunders B ,
  • Kingstone T ,
  • Hennink MM ,
  • Kaiser BN ,
  • Hennink M ,
  • O’Connor C ,
  • ↵ Yen RW, Schubbe D, Walling L, et al. Patient engagement in the What Matters Most trial: experiences and future implications for research. Poster presented at International Shared Decision Making conference, Quebec City, Canada. July 2019.
  • ↵ Got questions about Thematic Analysis? We have prepared some answers to common ones. https://www.thematicanalysis.net/faqs/ (accessed 9 Nov 2022).
  • ↵ Braun V, Clarke V. Thematic Analysis. SAGE Publications. 2022. https://uk.sagepub.com/en-gb/eur/thematic-analysis/book248481 .
  • Kalpokas N ,
  • Radivojevic I
  • Campbell KA ,
  • Durepos P ,
  • ↵ Understanding Thematic Analysis. https://www.thematicanalysis.net/understanding-ta/ .
  • Saunders CH ,
  • Stevens G ,
  • CONFIDENT Study Long-Term Care Partners
  • MacQueen K ,
  • Vaismoradi M ,
  • Turunen H ,
  • Schott SL ,
  • Berkowitz J ,
  • Carpenter-Song EA ,
  • Goldwag JL ,
  • Durand MA ,
  • Goldwag J ,
  • Saunders C ,
  • Mishra MK ,
  • Rodriguez HP ,
  • Shortell SM ,
  • Verdinelli S ,
  • Scagnoli NI
  • Campbell C ,
  • Sparkes AC ,
  • McGannon KR
  • Sandelowski M ,
  • Connelly LM ,
  • O’Malley AJ ,

example of thematic analysis in research

Logo for Open Educational Resources Collective

Want to create or adapt books like this? Learn more about how Pressbooks supports open publishing practices.

Chapter 22: Thematic Analysis

Darshini Ayton

Learning outcomes

Upon completion of this chapter, you should be able to:

  • Describe the different approaches to thematic analysis.
  • Understand how to conduct the three types of thematic analysis.
  • Identify the strengths and limitations of each type of thematic analysis.

What is thematic analysis?

Thematic analysis is a common method used in the analysis of qualitative data to identify, analyse and interpret meaning through a systematic process of generating codes (see Chapter 20) that leads to the development of themes. 1 Thematic analysis requires the active engagement of the researcher with the data, in a process of sorting, categorising and interpretation. 1 Thematic analysis is exploratory analysis whereby codes are not predetermined and are data-derived, usually from primary sources of data (e,g, interviews and focus groups). This is in contrast to themes generated through directed or summative content analysis, which is considered confirmatory hypothesis-driven analysis, with predetermined codes typically generated from a hypothesis (see Chapter 21). 2 There are many forms of thematic analysis. Hence, it is important to treat thematic analysis as one of many methods of analysis, and to justify the approach on the basis of the research question and pragmatic considerations such as resources, time and audience. The three main forms of thematic analysis used in health and social care research, discussed in this chapter, are:

Applied thematic analysis

  • Framework analysis
  • Reflexive thematic analysis.

This involves multiple, inductive analytic techniques designed to identify and examine themes from textual data in a way that is transparent and credible, drawing from a broad range of theoretical and methodological perspectives. It focuses on presenting the stories of participants as accurately and comprehensively as possible. Applied thematic analysis mixes a bit of everything: grounded theory, positivism, interpretivism and phenomenology. 2

Applied thematic analysis borrows what we feel are the more useful techniques from each theoretical and methodological camp and adapts them to an applied research context. 2(p16)

Applied thematic analysis involves five elements:

  • Text s egmentation  involves identifying a meaningful segment of text and the boundaries of the segment. Text segmentation is a useful process as a transcript from a 30-minute interview can be many pages long. Hence, segmenting the text provides a manageable section of the data for interrogation of meaning. For example, text segmentation may be a participant’s response to an interview question, a keyword or concept in context, or a complete discourse between participants. The segment of text is more than a short phrase and can be both small and large sections of text. Text segments can also overlap, and a smaller segment may be embedded within a larger segment. 3
  • Creation of the codebook is a critical element of applied thematic analysis. The codebook is created when the segments of text are systematically coded into categories, types and relationships, and the codes are defined by the observed meaning in the text. The codes and their definitions are descriptive in the beginning, and then evolve into explanatory codes as the researcher examines the commonalities, differences and relationships between the codes. The codebook is an iterative document that the researcher builds and refines as they become more immersed and familiar with the data. 3 Table 22.1 outlines the key components of a codebook. 3

Table 22.1. Codebook components and an example

Code Definition When to use When not to use Example
Attitudes or perceptions: falls Attitudes about falls from health professionals When a health professional describes their thoughts about falls.
Look for ‘I think’ and ‘I believe’ statements.
When providing definitions about falls 'I think they [falls] are an unsolved problem.’
  • Structural coding can be useful if a structured interview guide or focus group guide has been used by the researcher and the researcher stays close to the wording of the question and its prompts. The structured question is the structural code in the codebook, and the text segment should include the participant’s response and any dialogue following the question. Of course, this form of coding can be used even if the researcher does not follow a structured guide, which is often the reality of qualitative data collection. The relevant text segments are coded for the specific structure, as appropriate. 3
  • Content coding is informed by the research question(s) and the questions informing the analysis. The segmented text is grouped in different ways to explore relationships, hierarchies, descriptions and explanations of events, similarities, differences and consequences. The content of the text segment should be read and re-read to identify patterns and meaning, with the generated codes added to the codebook.
  • Themes vary in scope, yet at the core they are phrases or statements that explain the meaning of the text. Researchers need to be aware that themes are considered a higher conceptual level than codes, and therefore should not be comprised of single words or labels. Typically, multiple codes will lead to a theme. Revisiting the research and analysis questions will assist the researcher to identify themes. Through the coding process, the researcher actively searches the data for themes. Examples of how themes may be identified include the repetition of concepts within and across transcripts, the use of metaphors and analogies, key phrases and common phrases used in an unfamiliar way. 3

Framework a nalysis

This method originated in the 1980s in social policy research. Framework analysis is suited to research seeking to answer specific questions about a problem or issue, within a limited time frame and with homogenous data (in topics, concepts and participants); multiple researchers are usually involved in the coding process. 4-6 The process of framework analysis is methodical and suits large data sets, hence is attractive to quantitative researchers and health services researchers. Framework analysis is useful for multidisciplinary teams in which not all members are familiar with qualitative analysis. Framework analysis does not seek to generate theory and is not aligned with any particular epistemological, philosophical or theoretical approach. 5 The output of framework analysis is a matrix with rows (cases), columns (codes) and cells of summarised data that enables researchers to analyse the data case by case and code by code. The case is usually an individual interview, or it can be a defined group or organisation. 5

The process for conducting framework analysis is as follows 5 :

1. Transcription – usually verbatim transcription of the interview.

2. Familiarisation with the interview – reading the transcript and listening to the audio recording (particularly if the researcher doing the analysis did not conduct the interview) can assist in the interpretation of the data. Notes on analytical observations, thoughts and impressions are made in the margins of the transcript during this stage.

3. Coding – completed in a line-by-line method by at least two researchers from different disciplines (or with a patient or public involvement representative), where possible. Coding can be both deductive – (using a theory or specific topics relevant to the project – or inductive, whereby open coding is applied to elements such as behaviours, incidents, values, attitudes, beliefs, emotions and participant reactions. All data is coded.

4. Developing a working analytical framework – codes are collated and organised into categories, to create a structure for summarising or reducing the data.

5. Applying the analytical framework – indexing the remaining transcripts by using the categories and codes of the analytical framework.

6. Charting data into the framework matrix – summarising the data by category and from each transcript into the framework matrix, which is a spreadsheet with numbered cells in which summarised data are entered by codes (columns) and cases (rows). Charting needs to balance the reduction of data to a manageable few lines and retention of the meaning and ‘feel’ of the participant. References to illustrative quotes should be included.

7. Interpreting the data – using the framework matrix and notes taken throughout the analysis process to interpret meaning, in collaboration with team members, including lay and clinical members.

Reflexive thematic analysis

This is the thematic analysis approach developed by Braun and Clarke in 2006 and explained in the highly cited article ‘ Using thematic analysis in psychology ’ . 7 Reflexive thematic analysis recognises the subjectiveness of the analysis process, and that codes and themes are actively generated by the researcher. Hence, themes and codes are influenced by the researcher’s values, skills and experiences. 8 Reflexive thematic analysis ‘exists at the intersection of the researcher, the dataset and the various contexts of interpretation’. 9(line 5-6) In this method, the coding process is less structured and more organic than in applied thematic analysis. Braun and Clarke have been critical of the use of the term ‘emerging themes’, which many researchers use to indicate that the theme was data-driven, as opposed to a deductive approach:

This language suggests that meaning is self evident and somehow ‘within’ the data waiting to be revealed, and that the researcher is a neutral conduit for the revelation of said meaning. In contrast, we conceptualise analysis as a situated and interactive process, reflecting both the data, the positionality of the researcher, and the context of the research itself… it is disingenuous to evoke a process whereby themes simply emerge, instead of being active co-productions on the part of the researcher, the data/participants and context. 10 (p15)

Since 2006, Braun and Clarke have published extensively on reflexive thematic analysis, including a methodological paper comparing reflexive thematic analysis with other approaches to qualitative analysis, 8 and have provided resources on their website to support researchers and students. 9 There are many ways to conduct reflexive thematic analysis, but the six main steps in the method are outlined following. 9 Note that this is not a linear, prescriptive or rule-based process, but rather an approach to guide researchers in systematically and robustly exploring their data.

1.  Familiarisation with data – involves reading and re-reading transcripts so that the researcher is immersed in the data. The researcher makes notes on their initial observations, interpretations and insights for both the individual transcripts and across all the transcripts or data sources.

2.  Coding – the process of applying succinct labels (codes) to the data in a way that captures the meaning and characteristics of the data relevant to the research question. The entire data set is coded in numerous rounds; however, unlike line-by-line coding in grounded theory (Chapter 27), or data segmentation in applied thematic analysis, not all sections of data need to be coded. 8 After a few rounds of coding, the codes are collated and relevant data is extracted.

3.  Generating initial themes – using the collated codes and extracted data, the researcher identifies patterns of meaning (initial or potential themes). The researcher then revisits codes and the data to extract relevant data for the initial themes, to examine the viability of the theme.

4 .  Developing and reviewing themes – checking the initial themes against codes and the entire data set to assess whether it captures the ‘story’ of the data and addresses the research question. During this step, the themes are often reworked by combining, splitting or discarding. For reflexive thematic analysis, a theme is defined as a ‘pattern of shared meaning underpinned by a central concept or idea’. 8 (p 39 )

5.  Refining, defining and naming themes – developing the scope and boundaries of the theme, creating the story of the theme and applying an informative name for the theme.

6.  Writing up – is a key part of the analysis and involves writing the narrative of the themes, embedding the data and providing the contextual basis for the themes in the literature.

Themes versus c odes

As described above, themes are informed by codes, and themes are defined at a conceptually higher level than codes. Themes are broader categorisations that tend to describe or explain the topic or concept. Themes need to extend beyond the code and are typically statements that can stand alone to describe and/or explain the data. Fereday and Muir-Cochrane explain this development from code to theme in Table 22.2. 11

Table 22.2. Corroborating and legitimating coded themes to identify second-order themes

First-order theme Clustered themes Second-order themes
The relationship between the source and recipient is important for feedback credibility, including frequency of contact, respect and trust

The source of the feedback must demonstrate an understanding of the situational context surrounding the feedback message. Feedback should be gathered from a variety of sources.

Verbal feedback is preferred to formal assessment, due to timing, and the opportunity to discuss issues.
Familiarity with a person increases the credibility of the feedback message.

Feedback requires a situational-context.

Verbal feedback is preferred over written feedback.

Trust and respect between the source and recipient of feedback enhances the feedback message.

Familiarity within relationships is potentially detrimental to the feedback process.
Familiarity
When relationships enhance the relevance of feedback

*Note: This table is from an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

When I [the author] first started publishing qualitative research, many of my themes were at the code level. I then got advice that when the themes are the subheadings of the results section of my paper, they should tell the story of the research. The difference in my theme naming can be seen when comparing a paper from my PhD thesis, 12 which explores the challenges of church-based health promotion, with a more recent paper that I published on antimicrobial stewardship 13 (refer to the theme tables in the publications).

Table 22.3. Examples of thematic analysis

Title

CC
Licence

CC BY 4.0

CC BY 4.0

Public Domain Mark 1.0

First
author and year

McKenna-Plumley, 2021

Dickinson, 2020

Bunzli, 2019

Aim/research
question

What are people’s experiences of loneliness while practising physical distancing due to a global pandemic?

‘To explore how medical students in their first clerkship year perceive the relevance of biomedical science knowledge to clinical medicine with the goal of providing insights relevant to curricular reform efforts that impact how the biomedical sciences are taught’

‘To investigate the patient-related cognitive factors (beliefs/attitudes toward knee osteoarthritis and its treatment) and health system-related factors (access, referral pathways) known to influence treatment decisions.’

‘Exploring why patients may feel that nonsurgical interventions are of little value in the treatment of knee osteoarthritis.’

Data
collection

Semi-structured interviews by phone or videoconferencing software.

Interview topics covered social isolation, social connection, loneliness and coping.

(supplementary file 2)

55 student essays in response to the prompt: ‘How is biomedical science knowledge relevant to clinical medicine?’ A reflective writing assignment based on the principles of Kolb experiential learning model

Face-to-face or phone interviews with 27 patients who were on a waiting list for total knee arthroplasty.

Thematic
analysis approach

Reflexive thematic analysis

Applied thematic analysis

Framework analysis

Results

Table of themes and illustrative quotes:

1. Loss of in-person interaction causing loneliness

2. Constrained freedom

3. Challenging emotions

4. Coping with loneliness

1. Knowledge-to-practice medicine

2. Lifelong learning

3. Physician-patient relationship      

4. Learning perception of self

Identity beliefs – knee osteoarthritis is ‘bone on bone’

Casual belief – ‘osteoarthritis is due to excessive loading through the knee’

Consequence beliefs – fear of falling and damaging the joint

Timeline beliefs – osteoarthritis as a downward trajectory, the urgency to do something and arriving at the end of the road.

Advantages and challenges of thematic analysis

Thematic analysis is flexible and can be used to analyse small and large data sets with homogenous and heterogenous samples. Thematic analysis can be applied to any type of data source, from interviews and focus groups to diary entries and online discussion forums. 1 Applied thematic analysis and framework analysis are accessible approaches for non-qualitative researchers or beginner researchers. However, the flexibility and accessibility of thematic analysis can lead to limitations and challenges when thematic analysis is misapplied or done poorly. Thematic analysis can be more descriptive than interpretive if not properly anchored in a theoretical framework. 1 For framework analysis, the spreadsheet matrix output can lead to quantitative researchers inappropriately quantifying the qualitative data. Therefore, training and support from a qualitative researcher with the appropriate expertise can help to ensure that the interpretation of the data is meaningful. 5

Thematic analysis is a family of analysis techniques that are flexible and inductive and involve the generation of codes and themes. There are three main types of thematic analysis: applied thematic analysis, framework analysis and reflexive thematic analysis. These approaches span from structured coding to organic and unstructured coding for theme development. The choice of approach should be guided by the research question, the research design and the available resources and skills of the researcher and team.

  • Clarke V, Braun V. Thematic analysis. J Posit Psychol . 2017;12(3):297-298. doi:10.1080/17439760.2016.1262613
  • Guest G, MacQueen KM, Namey EE. Introduction to applied thematic analysis. In: Guest G, MacQueen, K.M., Namey, E.E., ed. Applied thematic analysis . SAGE Publications, Inc.; 2014. Accessed September 18, 2023. https://methods.sagepub.com/book/applied-thematic-analysis
  • Guest G, MacQueen, K.M., Namey, E.E.,. Themes and Codes. In: Guest G, MacQueen, K.M., Namey, E.E., ed. Applied thematic analysis . SAGE Publications, Inc.; 2014. Accessed September 18, 2023. https://methods.sagepub.com/book/applied-thematic-analysis
  • Srivastava A, Thomson SB. Framework analysis: A qualitative methodology for applied policy research. Journal of Administration and Governance . 2009;72(3). Accessed September 14, 2023. https://ssrn.com/abstract=2760705
  • Gale NK, Heath G, Cameron E, Rashid S, Redwood S. Using the framework method for the analysis of qualitative data in multi-disciplinary health research. BMC Med Res Methodol . 2013;13:117. doi:10.1186/1471-2288-13-117
  • Smith J, Firth J. Qualitative data analysis: the framework approach. Nurse Res . 2011;18(2):52-62. doi:10.7748/nr2011.01.18.2.52.c8284
  • Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol . 2006;3(2):77-101. doi:10.1191/1478088706qp063oa
  • Braun V, Clarke V. Can I use TA? Should I use TA? Should I not use TA? Comparing reflexive thematic analysis and other pattern-based qualitative analytic approaches. Couns Psychother Res . 2021;21(1):37-47. doi:10.1002/capr.12360
  • Braun V, Clarke V. Thematic analysis. University of Auckland. Accessed September 18, 2023. https://www.thematicanalysis.net/
  • Braun V, Clarke V. Answers to frequently asked questions about thematic analysis. University of Auckland. Accessed September 18, 2023. https://www.thematicanalysis.net/faqs/
  • Fereday J, Muir-Cochrane E. Demonstrating Rigour Using Thematic Analysis: A Hybrid Approach of Inductive and Deductive Coding and Theme Development. International Journal of Qualitative Methods . 2006;5(1):80-92. doi: 10.1177/160940690600500107
  • Ayton D, Manderson L, Smith BJ. Barriers and challenges affecting the contemporary church’s engagement in health promotion. Health Promot J Austr . 2017;28(1):52-58. doi:10.1071/HE15037
  • Ayton D, Watson E, Betts JM, et al. Implementation of an antimicrobial stewardship program in the Australian private hospital system: qualitative study of attitudes to antimicrobial resistance and antimicrobial stewardship. BMC Health Serv Res . 2022;22(1):1554. doi:10.1186/s12913-022-08938-8
  • McKenna-Plumley PE, Graham-Wisener L, Berry E, Groarke JM. Connection, constraint, and coping: A qualitative study of experiences of loneliness during the COVID-19 lockdown in the UK. PLoS One . 2021;16(10):e0258344. doi:10.1371/journal.pone.0258344
  • Dickinson BL, Gibson K, VanDerKolk K, et al. “It is this very knowledge that makes us doctors”: an applied thematic analysis of how medical students perceive the relevance of biomedical science knowledge to clinical medicine. BMC Med Educ . 2020;20(1):356. doi:10.1186/s12909-020-02251-w
  • Bunzli S, O’Brien P, Ayton D, et al. Misconceptions and the acceptance of evidence-based nonsurgical interventions for knee osteoarthritis. A Qualitative Study. Clin Orthop Relat Res . 2019;477(9):1975-1983. doi:10.1097/CORR.0000000000000784

Qualitative Research – a practical guide for health and social care researchers and practitioners Copyright © 2023 by Darshini Ayton is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.

Share This Book

example of thematic analysis in research

  • Voxco Online
  • Voxco Panel Management
  • Voxco Panel Portal
  • Voxco Audience
  • Voxco Mobile Offline
  • Voxco Dialer Cloud
  • Voxco Dialer On-premise
  • Voxco TCPA Connect
  • Voxco Analytics
  • Voxco Text & Sentiment Analysis

example of thematic analysis in research

  • 40+ question types
  • Drag-and-drop interface
  • Skip logic and branching
  • Multi-lingual survey
  • Text piping
  • Question library
  • CSS customization
  • White-label surveys
  • Customizable ‘Thank You’ page
  • Customizable survey theme
  • Reminder send-outs
  • Survey rewards
  • Social media
  • Website surveys
  • Correlation analysis
  • Cross-tabulation analysis
  • Trend analysis
  • Real-time dashboard
  • Customizable report
  • Email address validation
  • Recaptcha validation
  • SSL security

Take a peek at our powerful survey features to design surveys that scale discoveries.

Download feature sheet.

  • Hospitality
  • Academic Research
  • Customer Experience
  • Employee Experience
  • Product Experience
  • Market Research
  • Social Research
  • Data Analysis

Explore Voxco 

Need to map Voxco’s features & offerings? We can help!

Watch a Demo 

Download Brochures 

Get a Quote

  • NPS Calculator
  • CES Calculator
  • A/B Testing Calculator
  • Margin of Error Calculator
  • Sample Size Calculator
  • CX Strategy & Management Hub
  • Market Research Hub
  • Patient Experience Hub
  • Employee Experience Hub
  • NPS Knowledge Hub
  • Market Research Guide
  • Customer Experience Guide
  • Survey Research Guides
  • Survey Template Library
  • Webinars and Events
  • Feature Sheets
  • Try a sample survey
  • Professional Services

example of thematic analysis in research

Get exclusive insights into research trends and best practices from top experts! Access Voxco’s ‘State of Research Report 2024 edition’ .

We’ve been avid users of the Voxco platform now for over 20 years. It gives us the flexibility to routinely enhance our survey toolkit and provides our clients with a more robust dataset and story to tell their clients.

VP Innovation & Strategic Partnerships, The Logit Group

  • Client Stories
  • Voxco Reviews
  • Why Voxco Research?
  • Careers at Voxco
  • Vulnerabilities and Ethical Hacking

Explore Regional Offices

  • Survey Software The world’s leading omnichannel survey software
  • Online Survey Tools Create sophisticated surveys with ease.
  • Mobile Offline Conduct efficient field surveys.
  • Text Analysis
  • Close The Loop
  • Automated Translations
  • NPS Dashboard
  • CATI Manage high volume phone surveys efficiently
  • Cloud/On-premise Dialer TCPA compliant Cloud on-premise dialer
  • IVR Survey Software Boost productivity with automated call workflows.
  • Analytics Analyze survey data with visual dashboards
  • Panel Manager Nurture a loyal community of respondents.
  • Survey Portal Best-in-class user friendly survey portal.
  • Voxco Audience Conduct targeted sample research in hours.
  • Predictive Analytics
  • Customer 360
  • Customer Loyalty
  • Fraud & Risk Management
  • AI/ML Enablement Services
  • Credit Underwriting

example of thematic analysis in research

Find the best survey software for you! (Along with a checklist to compare platforms)

Get Buyer’s Guide

  • 100+ question types
  • SMS surveys
  • Financial Services
  • Banking & Financial Services
  • Retail Solution
  • Risk Management
  • Customer Lifecycle Solutions
  • Net Promoter Score
  • Customer Behaviour Analytics
  • Customer Segmentation
  • Data Unification

Explore Voxco 

Watch a Demo 

Download Brochures 

  • CX Strategy & Management Hub
  • The Voxco Guide to Customer Experience
  • Professional services
  • Blogs & White papers
  • Case Studies

Find the best customer experience platform

Uncover customer pain points, analyze feedback and run successful CX programs with the best CX platform for your team.

Get the Guide Now

example of thematic analysis in research

VP Innovation & Strategic Partnerships, The Logit Group

  • Why Voxco Intelligence?
  • Our clients
  • Client stories
  • Featuresheets

Role Of Thematic Analysis In Qualitative Research

SHARE THE ARTICLE ON

Thematic Analysis: Definition, Methods & Examples employee experience

Whether you are a market researcher gathering customer insights or a social scientist exploring human behavior, listening to respondents’ unfiltered and unrestricted voices can provide more valuable insights. And, so thematic research stands as a powerful data analysis method that empowers researchers to gather textual feedback and harness the potential of data. 

In this blog, we’ll dive into thematic analysis and how it aids the world of research. 

What is thematic analysis?

As the name suggests, Thematic Analysis means analyzing the patterns of themes in data. It is a method of qualitative data analysis. It means this method can be used to analyze non-numerical data such as textual feedback you gather using surveys. 

The process involves systematically organizing and interpreting open-ended feedback to uncover meaningful insights and identify underlying patterns within the data. Tools such as text analysis software help you automatically perform analysis and identify themes and patterns. 

The analysis approach enables you to contextualize experiences, understand participants’ perspectives, and explore the hidden meaning behind their behavior. 

Example of a real-world application of thematic analysis.  

Say a tech company conducts market research to understand customer satisfaction. The company collects textual feedback from all its active customers. 

The company uses text analysis tools to perform thematic analysis to make sense of the data. After careful analysis, they identify several recurring themes such as “customer support,” “billing,” “ease of use,” and “product performance.” 

Through this analysis method, the tech company can now quantify the recurrence of each key theme and draw insights into areas that need improvement and where they excelled. 

Read how Voxco helped HRI to conduct complex research studies & speed up insight generation.

How does thematic analysis help in research questions.

Thematic Analysis: Definition, Methods & Examples employee experience

The patterns can be analyzed by repetitive data reading, data coding, and theme creation. The importance of thematic analysis is in exploring human experiences, attitudes, and behavior. 

  • It allows you to discover patterns and trends, thus revealing recurring themes that may otherwise go unnoticed. 
  • You can generate a rich and nuanced understanding of respondents’ perspectives. 
  • It helps you contextualize data by offering a comprehensive interpretation. 

Let’s take an example to understand how thematic analysis helps in research questions . 

How has social media changed over the years?

The above research will need you to gather data from sources, blogs, news, interviews posted online. Interview a few new generation users of the platforms and the old users to gather intel about how they use the social platforms and what their experience is. 

Also read: Quantitative Research Question

New call-to-action

Who does thematic analysis?

Some of the established players have started implementing Thematic Analysis to improve their Manual Rules processes but tend to produce a list of terms that are difficult to review. This approach works well for text analytics platforms that are focused on improving the customer experience. However, it avoids generating generic solutions that are usually not designed to solve the problem. 

Only a small portion of feedback is linked to the top 10 themes. Uncategorized feedback means that you can’t slice the data to get deeper insights. Thematic research is a method that can easily analyze text-based feedback from multiple sources, such as email, social media, and real estate brokers.

Empowering decision-makers with text analysis.

Explore how easy it is to conduct sophisticated statistical analysis and create one-click summaries, custom live dashboards, and in-depth reports with Voxco Analytics.

What are some best practices for survey data visualization?

As we are towards the end of the article, let’s look at some of the best practices to ensure consistent, efficient, and comprehensive data visualization. 

1. Ensuring interpretability and clarity: 

Use visualization elements that are clear, concise, and easy to interpret. Use appropriate labels, titles, and context to communicate complex information with completeness clearly. 

2. Consider accessibility of the report: 

Make survey the survey data visualization tools accessible to all responsible users. The platform must adhere to data security standards and guidelines to ensure any authorized user is unable to access it. 

3. Ethical data handling: 

Leverage a survey analytics software that adheres to data privacy regulations and ethical standards when handling survey data. Obtain informed consent from respondents and implement robust data security to protect data integrity and uphold ethical principles. 

One-stop-shop to gather, measure, uncover, and act on insightful data.

→ Learn how your customers feel & why

→ Identify recurring trends & patterns

→ Resolve customer issues in real-time

What are the advantages of using thematic analysis?

Thematic analysis in qualitative research is an unsupervised approach that enables you to create categories and perform statistical tests without having to set up any rules or procedures in advance.

  • In-depth insights: 

The method allows you to gain deeper insights into participants’ motivations, emotions, and perspectives. 

  • User-friendly: 

The analysis is highly accessible as most modern survey software provides you with the capability of text analysis. 

  • Holistic understanding: 

Exploring multiple recurring themes in context with the research objective creates a comprehensive understanding of the topic. 

  • Applicability: 

The data analysis method is useful for various fields like market research, customer experience research, social research, healthcare research, etc. 

What are the disadvantages of using thematic analysis?

Thematic analysis is an unsupervised approach that enables you to create categories and perform statistical tests without having to set up any rules or procedures in advance.

  • Overlooks minor themes: 

Often times, you may overlook minor themes or less recurring patterns, leading to oversight of crucial insights. 

  • Limited context: 

Thematic analysis is typically phrase-based. Sometimes, it can’t capture the meaning of a phrase correctly. For instance, in a complex narrative it can’t capture the customer’s intent to stop using the service.

What are the different types of thematic analysis?

With an adequate understanding of the method, let us dive into the various types of thematic analysis in qualitative research. 

1. Inductive:

Focuses on developing a theory. This approach is used when there is not much information available on a topic and you have to build a theory straight from scratch. You can always validate this approach but it is hard to prove that observation made from this approach is correct. The inductive approach consists of three stages:

  • Observation- a road has a busy traffic
  • Look for a pattern- the road has busy traffic from 9 am to 6 pm.
  • Develop a theory- a road has busy traffic during working hours.

2. Deductive:

Focuses on testing an existing theory. It totally depends on the Inductive approach as you start from working on an already existing theory. You go on formulating the theory and derive a conclusion out of it. The genuineness of deductive theory depends on how much true inductive theory is. The deductive theory has four stages:

  • Starting with a theory- the road has busy traffic during working hours. 
  • Formulate a hypothesis- generally, all roads are busy during working hours.
  • Collecting data to study hypothesis- observing all the roads during the working hours every day. 
  • Analyze the result (does the collected data reject or validate the hypothesis)- since all the roads are busy during working hours -> support a hypothesis. 

3. Semantic:

Focuses on the details of the data. We research the data on the grounds that it has some secondary meaning and purpose to it. This will help to construct insights and information regarding how the data was being used. 

Focuses beyond the semantics of the data and works more on the underlying meanings, concepts, and assumptions that we made earlier with the semantic approach.

In order to choose the best-fit approach for your study, go through its requirements and which approach or combination of approaches will best align with your data. 

Also read: How to make the most of your data analysis in research?

Enhance your research efficiency with Voxco. Gather, measure, uncover, and act on insightful data Click Below To Get A Personalized Quote.

How to conduct thematic analysis.

There are six steps involved in conducting thematic analysis:

1: Familiarization

3: Generating themes

4: Reviewing themes

5: Defining themes

Once you have gathered adequate data and chosen your suitable approach, it is time to follow the following steps to build your thematic analysis for your problem statement. 

Step 1: Familiarization 

It is important to be familiar with the data before we begin to dig deep into the individual topics. This can include re-reading the whole data, having an overview of its context, and taking out personal notes if necessary. This is will help you to know your data. 

Step 2: Coding 

This includes highlighting or labelling certain words or group of words or even phrases in the data that all together indicates something. This something will come in handy when you are trying to grab the essence of the data. Let’s take an example to understand this:

Example: How has social media changed over the years?

Let’s say we are interviewing an old social media user here and her opinion on the problem statement. She says, 

“ I think the social media platforms are not for us oldies anymore. The trends are rapidly changing and there is always something new on the wall every day. It becomes difficult for people like me to keep up with those. Hence we often feel disconnected. ” 

Now we can derive codes for the highlighted phrases like; Fast change | Uninterested | Discomfort

Step 3: Generating Themes

Now that we have our codes, we can derive themes from them. Themes can have several codes indicating the same expressions. As for our above example, we can have a theme called “not satisfied” for all the codes we derived from the interview. This will give an idea about how many codes are being used again and again and which ones of them serve no purpose so we can just discard them. 

Step 4: Reviewing themes

Here we compare the themes with our original data and look for any missing points or irrelevant results. We can modify our themes depending on how they satisfy and justify the data after tracing them back to it. 

Step 5: Defining themes

Further ahead, we can name the themes depending on what they indicate and what we get to understand from it about the data. 

Step 6: Writing

For the last step, we will the results that we have come to and the conclusion that our thematic analysis has helped us to understand. As per our example, we can conclude that social media has changed so much that the older generations find it hard to interact with and result in their dissatisfaction on the matter. 

So that is how to regulate your perfect Thematic Analysis for the next time you decide to research a problem statement. 

Voxco helps the top 50 MR firms & 500+ global brands gather omnichannel feedback, measure sentiment, uncover insights, and act on them.

See how voxco can enhance your research efficiency..

In qualitative research, thematic analysis is a powerful and flexible data analysis tool with an ability to produce comprehensive findings, reveal underlying meanings, and detect patterns that aid in drawing data-based conclusions as well as decisions by researchers. To leverage the full potential of this approach, integrate it with quantitative and other qualitative data analysis methods. Mixed-method blends the benefits of both analysis approach and empowers you to generate broader and rich insight. 

Market Research toolkit to start your market research surveys and studies.

What is the thematic analysis method?

The thematic analysis refers to the process of identifying, interpreting, and reporting these themes in textual data. It is a method for identifying, analyzing and reporting patterns within data.

What are the 5 steps of thematic analysis?

The common approach to thematic analysis involves five steps: 

  • Familiarization
  • Generating themes
  • Reviewing themes
  • Defining and naming themes

What is the meaning of making thematic analysis?

Meaning-making thematic analysis involves interpreting the data to derive deeper insights and understand underlying meanings.

What is an example of doing thematic analysis?

A thematic analysis example could involve studying participants’ experiences on dating sites. Researchers used a thematic analysis to study participants’ experiences on dating sites, analyzing qualitative data to identify trending issues and recurrent patterns, providing insights into young females’ mindsets and social interactions.

Explore Voxco Survey Software

+ Omnichannel Survey Software 

+ Online Survey Software 

+ CATI Survey Software 

+ IVR Survey Software 

+ Market Research Tool

+ Customer Experience Tool 

+ Product Experience Software 

+ Enterprise Survey Software 

Thematic Analysis: Definition, Methods & Examples employee experience

How Netflix’ Employee & Customer Experience has helped them grow double their expectations during the crisis

How Netflix’ Employee & Customer Experience has helped them grow double their expectations during the crisis Read Netflix’s secret to customer experience Get our in-depth

Digital Customer Experience cvr

What is Digital CX?

What is Digital CX? Digital Customer Experience Ensuring an excellent digital customer experience can be tricky but an effective guide can help. Download Now SHARE

Ad Recall Survey Questions cvr

Self-Administered Paper Questionnaires

Self-Administered Paper Questionnaires SHARE THE ARTICLE ON Table of Contents What are Self-Administered Paper Questionnaires? A self-administered paper questionnaires is a tool used for the

pexels photo 5428833 L

Types of Survey with Examples

17 Types Of Surveys With Examples SHARE THE ARTICLE ON Table of Contents There are two ways you can conduct research to gather the desired

What is a Panel Survey1

What is a Panel Survey

What is a Panel Survey? Transform your insight generation process Use our in-depth online survey guide to create an actionable feedback collection survey process. Download

How to conduct Social Research1

How to conduct Social Research?

How to conduct Social Research? SHARE THE ARTICLE ON Table of Contents How to conduct Social Research? Social research can be defined as the study

We use cookies in our website to give you the best browsing experience and to tailor advertising. By continuing to use our website, you give us consent to the use of cookies. Read More

Name Domain Purpose Expiry Type
hubspotutk www.voxco.com HubSpot functional cookie. 1 year HTTP
lhc_dir_locale amplifyreach.com --- 52 years ---
lhc_dirclass amplifyreach.com --- 52 years ---
Name Domain Purpose Expiry Type
_fbp www.voxco.com Facebook Pixel advertising first-party cookie 3 months HTTP
__hstc www.voxco.com Hubspot marketing platform cookie. 1 year HTTP
__hssrc www.voxco.com Hubspot marketing platform cookie. 52 years HTTP
__hssc www.voxco.com Hubspot marketing platform cookie. Session HTTP
Name Domain Purpose Expiry Type
_gid www.voxco.com Google Universal Analytics short-time unique user tracking identifier. 1 days HTTP
MUID bing.com Microsoft User Identifier tracking cookie used by Bing Ads. 1 year HTTP
MR bat.bing.com Microsoft User Identifier tracking cookie used by Bing Ads. 7 days HTTP
IDE doubleclick.net Google advertising cookie used for user tracking and ad targeting purposes. 2 years HTTP
_vwo_uuid_v2 www.voxco.com Generic Visual Website Optimizer (VWO) user tracking cookie. 1 year HTTP
_vis_opt_s www.voxco.com Generic Visual Website Optimizer (VWO) user tracking cookie that detects if the user is new or returning to a particular campaign. 3 months HTTP
_vis_opt_test_cookie www.voxco.com A session (temporary) cookie used by Generic Visual Website Optimizer (VWO) to detect if the cookies are enabled on the browser of the user or not. 52 years HTTP
_ga www.voxco.com Google Universal Analytics long-time unique user tracking identifier. 2 years HTTP
_uetsid www.voxco.com Microsoft Bing Ads Universal Event Tracking (UET) tracking cookie. 1 days HTTP
vuid vimeo.com Vimeo tracking cookie 2 years HTTP
Name Domain Purpose Expiry Type
__cf_bm hubspot.com Generic CloudFlare functional cookie. Session HTTP
Name Domain Purpose Expiry Type
_gcl_au www.voxco.com --- 3 months ---
_gat_gtag_UA_3262734_1 www.voxco.com --- Session ---
_clck www.voxco.com --- 1 year ---
_ga_HNFQQ528PZ www.voxco.com --- 2 years ---
_clsk www.voxco.com --- 1 days ---
visitor_id18452 pardot.com --- 10 years ---
visitor_id18452-hash pardot.com --- 10 years ---
lpv18452 pi.pardot.com --- Session ---
lhc_per www.voxco.com --- 6 months ---
_uetvid www.voxco.com --- 1 year ---

example of thematic analysis in research

  • Get Started!

example of thematic analysis in research

A Comprehensive Guide to Thematic Analysis in Qualitative Research

don't change careers alone ad

What is Qualitative Data?

What do all the methods above have in common? They result in loads of qualitative data. If you're not new here, you've heard us mention qualitative data many times already. Qualitative data is non-numeric data that is collected in the form of words, images, or sound bites. Qual data is often used to understand people's experiences, perspectives, and motivations, and is often collected and sorted by UX Researchers to better understand the company's users. Qualitative data is subjective and often in response to open-ended questions, and is typically analyzed through methods such as thematic analysis, content analysis, and discourse analysis. In this resource we'll be focusing specifically on how to conduct an effective thematic analysis from scratch! Qualitative data is the sister of quantitative data, which is data that is collected in the form of numbers and can be analyzed using statistical methods. Qualitative and quantitative data are often used together in mixed methods research, which combines both types of data to gain a more comprehensive understanding of a research question.

UX Research Methods

There are many different types of UX research methods that can be used to gather insights about user behavior and attitudes. Some common UX research methods include:

  • Interviews: One-on-one conversations with users to gather detailed information about their experiences, needs, and preferences.
  • Surveys: Online or paper-based questionnaires that can be used to gather large amounts of data from a broad group of users.
  • Focus groups: Group discussions with a moderated discussion to explore user attitudes and behaviors.
  • User testing: Observing users as they interact with a product or service to identify problems and gather feedback.
  • Ethnographic research: Observing and interacting with users in their natural environments to gain a deep understanding of their behaviors and motivations.
  • Card sorting: A technique used to understand how users categorize and organize information.
  • Tree testing: A method used to evaluate the effectiveness of a website's navigation structure.
  • Heuristic evaluation: A method used to identify usability issues by having experts review a product and identify potential problems.
  • Expert review: Gathering feedback from industry experts on a product or service to identify potential issues and areas for improvement.

Introduction to Thematic Analysis of Qualitative Data

Thematic analysis is a popular way of analyzing qualitative data, like transcripts or interview responses, by identifying and analyzing recurring themes (hence the name!). This method often follows a six-step process, which includes getting familiar with the data, sorting and coding the data, generating your various themes, reviewing and editing these themes, defining and naming the themes, and writing up the results to present. This process can help researchers avoid confirmation bias in their analysis. Thematic analysis was developed for psychology research, but it can be used in many different types of research and is especially prevalent in the UX research profession.

When to Use Thematic Analysis

Thematic analysis is a useful method for analyzing qualitative data when you are interested in understanding the underlying themes and patterns in the data. Some situations in which thematic analysis might be appropriate include:

  • When you have a large amount of qualitative data, such as transcripts from interviews or focus groups.
  • When you want to understand people's experiences, perspectives, or motivations in depth.
  • When you want to identify patterns or themes that emerge from the data.
  • When you want to explore complex and open-ended research questions.
  • When you are interested in understanding how people make sense of their experiences and the world around them.

Some UX research specific questions that could be a good fit for thematic analysis are:

  • How do users think about their experiences with a particular product, service or company?
  • What are the common challenges that a user might encounter when using a product or service, and how do they overcome them?
  • How do users make sense of the navigation of a website or app?
  • What are the key drivers of user satisfaction or dissatisfaction with a product or service?
  • How do users' experiences with a product or service compare with their expectations?

It is important to keep in mind that thematic analysis is just one of many methods for analyzing qualitative data, and it may not be the most appropriate method for every research question or situation. A key part of a UX researcher's role is being aware of the most appropriate research method to use based on the problem the company is trying to solve and the constraints of the company's research practice.

Types of Thematic Analysis

There are two primary types of thematic analysis, called inductive and deductive approaches. An inductive approach involves going into the study blind, and allowing the results of the data-capture to guide and shape the analysis and theming. Think of it like induction heating-- the data heats your results! (OK, we get it, that was a bad joke. But you won't forget now!) An example of an inductive approach would be parachuting onto a client without knowing much about their website, and discovering the checkout was difficult to use by the amount of people who brought it up. An easy theme! On the flip-side, a deductive approach involves attacking the data with some preconceived notions you expect to find in the qualitative data, based on a theory. For example, if you think your company's website navigation is hard to use because the text is too small, you may find yourself looking for themes like "small text" or "difficult navigation." We don't have a joke for this one, but we tried. To get even more nitty-gritty, there are two additional types of thematic analysis called semantic and latent thematic analysis. These are more advanced, but we'll throw them here for good measure. Semantic thematic analysis involves identifying themes in the data by analyzing the exact wording of the comments made used by participants. Latent thematic analysis involves identifying themes in the data by analyzing the underlying meanings and actions that were taken, but perhaps not necessarily stated by study participants. Both of these methods can be used in user research, though latent analysis is more popular because users often say different things than what they actually do.

Steps in Conducting a Thematic Analysis

Let's jump in! As mentioned before, there are 6 steps to completing a thematic analysis.

Step One: get familiar with your data!

This might seem obvious, but sometimes it's hard to know when to start. This might take the form of listening to the audio interviews or unmoderated studies, or reading the notes taken during a moderated interview. It's important to know the overall ideas of what you're dealing with to effectively theme your study. While you're doing this, pay attention to some big picture themes you can use in step two when you code your data. Break out key ideas from each participant. This might take the form of summarized answers for each question response, or a written review of actions taken for each task given. Just make sure to standardize it across participants.

Step Two: sort & code the data.

Now that you have your standardized notes across your participants, it's time to sort and code the collected qualitative data! Think of the themes from before when you were taking your notes. Think of these codes like metaphorical buckets, and start sorting! Every comment that fits a theme in a box, put it there. Back to our navigation example: some codes could be "small text" or "hard to use." We could put a participant action of "squinting" into the bucket for "small text," or a comment from another mentioning they had trouble finding "tents" in "hard to use."

Step Three: break the codes into themes!

Try to think of each theme as a makeup of three or more codes. For the navigation example, we could put both "small text," and "hard to use" into a theme of "Difficult Navigation."

Step Four: review and name your themes.

Now is the time to clean up the data. Are all your themes relevant to the problem you're trying to solve? Are all the themes coherent and straightforward? Are you comfortable defending your theme choices to teammates? These are all great questions to ask yourself in this stage.

Step Five: Present!!

To have a cohesive presentation of your thematic analysis, you'll need to include an introduction that explains the user problem you were trying to identify and the method you took to study it. Use the terminology from beginning of this resource to identify your research method. Usually for something like this, it will be a user survey or interview. ‍ You also need to include how you analyzed your participant data (inductive, deductive, latent or semantic) to identify your codes and themes. In the meaty section of your presentation, describe each theme and give quotations and user actions from the data to support your points.

Step Six: Insights and Recommendations

Your conclusion should not stop at your presentation of your findings. The best user researchers are valuable for both their insights and recommendations. Since UX researchers spend so much time with participants, they have indispensable knowledge about the best way to do things that make life easy for the company's users. Don't keep this information to yourself! On the final 1-3 slides of your presentation, state the "Next Steps & Recommendations" that you'd like your team and leadership to follow up on. These recommendations could include things like additional qualitative or quantitative studies, UX changes to make or test, or a copy change to make the experience clearer for readers. Your ultimate job is to create the best user experience, and you made it this far-- you got this!

And there you have it! That's everything you need to complete a thematic analysis of qualitative data to identify potential solutions or key concepts for a particular user problem. But don't stop there! We recommend using these principles in the wild to conduct research of your own. Identify a question or potential problem you'd like to analyze on one of your favorite sites. Use a service like Sprig to come up with non-bias questions to ask friends and family to try and gather your own qualitative data. Next, complete and document yourself completing the 6-step analysis process. What do you discover? Be prepared to share on interviews-- hiring managers love to see initiative! Good luck.

View the UX Research Job Guide Here

Our Sources: 

Caulfield, J. (2022, November 25). How to Do Thematic Analysis | Step-by-Step Guide & Examples . Scribbr. https://www.scribbr.com/methodology/thematic-analysis/

example of thematic analysis in research

BRIDGED AT A GLANCE

explore careers

Find information on career paths for high-paying roles that align with your strengths and goals. Try our easy quiz to help you get started.

target skill gaps

View the skills you need to learn and develop with our state-of-the-art gap identifier. This is your next stop once you've found a role!

review certifications

Learn about affordable and reputable certifications that won't break your bank. No expensive bootcamps or schooling required.

identify dream roles

We've vetted jobs at top companies that need talent! Easily match with companies that work with your job preferences.

your ultimate career platform

It’s almost impossible to get jobs without experience, and experience is impossible to get without a job. We're working to change that.

Thematic Analysis: Making Values Emerge from Texts

  • Open Access
  • First Online: 15 February 2022

Cite this chapter

You have full access to this open access chapter

example of thematic analysis in research

  • Arild Wæraas 2  

19k Accesses

14 Citations

This chapter explains how thematic analysis can be used to make values emerge from texts. Taking reflexive thematic analysis as its starting point, it begins by giving a general overview of the processes of coding and generating themes from codes. The chapter then presents three ways of generating themes from coded values: Grouping synonyms, grouping based on value type, and grouping based on semantic meaning. It also distinguishes between and gives examples of thematic coding of values at the explicit, implicit, and latent level. Overall, the chapter presents a five-step approach to thematic analysis of values: (1) assigning codes, (2) generating themes, and if possible (3) organizing themes, (4) identifying aggregate dimensions, and (5) making visual representations of codes and themes.

You have full access to this open access chapter,  Download chapter PDF

Similar content being viewed by others

example of thematic analysis in research

Inductive Content Analysis

example of thematic analysis in research

Thematic Analysis

example of thematic analysis in research

  • Thematic analysis
  • Visual representations
  • Data reduction

Introduction

When you have transcribedyourqualitativeinterviews, completed your field notes, and you have collected and sorted supporting documents, you most likely have a very large amount of data. How do you proceed when you want to understand the values that are conveyed in the texts, what they mean, and how they relate to each other?

If these are your questions, then thematic analysis could provide the answers. Thematic analysis is a flexible and systematic way of making sense of qualitative data. It can be applied to any kind of written document such as interview transcripts, annual reports, strategy documents and marketing materials, blogs, observation field notes, employment advertisements, letters to shareholders, press releases, and even YouTube videos and photographs. More importantly, thematic analysis can serve to analyse any way of expressing values, explicitly as well as implicitly.

Thematic analysis is not a research design or methodology in its own right, as it only deals with the analysis of existing data. It does not exist in one single version, and many aspects of it can be found in other methods for analysis such as qualitativecontent analysis (Schreier, 2012 ), grounded theory (Glaser & Strauss, 1999 ), narrative analysis (Esin et al., 2014 ) (see Chap. 11 by Espedal and Synnes in this volume), and text condensation analysis (Malterud, 2012 ). These methods employ different concepts to describe similar aspects and stages of qualitative analysis without necessarily referring to their approach as thematic, the result of which can be confusing (Braun & Clarke, 2020 ). In this chapter I do not attempt to bring clarity to this variety, nor do I propose a new way of analysing qualitative data. Rather I discuss the merits of applying some principles of thematic analysis to a specific empirical field; the research on values in organisational settings, and I offer examples of how this can be done. In doing so I draw mainly on a reflexive approach to thematic analysis (Braun & Clarke, 2006 , 2012 , 2020 ), in contrast to reliability- or codebook-based versions (Boyatzis, 1998 ; Guest et al., 2011 ; Hayes, 1997 ). My outline of thematic analysis of values is also inspired by Gioia et al. ( 2012 ), emphasizing inductive-based analysis grounded in data rather than deductive, theory-based analysis.

The chapter is structured as follows: First I describe the general aspects of thematic analysis and present the main concepts of thematic analysis such as codes and themes. I then review some common principles for performing thematic analysis of texts. Finally, I show how thematic analysis of values can be performed. I will not address the use of computer software programmes, although readers should note that these can be very useful for handling the technical aspects of thematic analysis (see e.g. Paulus and Lester ( 2016 ) or Saillard ( 2011 )).

Thematic Analysis: A Brief Overview

Thematic analysis is a method for systematically describing and interpreting the meaning of qualitative data by assigning codes to the data and reducing the codes into themes, followed by an analysis and presentation of these themes. Thematic analysis thus combines a structured approach with the researcher’s subjective interpretation. This combination is a key characteristic and strength of thematic analysis, as it draws on the merits of systematically documenting all the steps in the process of analysing data at the same time as it allows the researcher considerable creativity in attaching meaning to the data. The researcher determines the themes, how many, and what they should be called. As such, thematic analysis does not presuppose the existence of one single “truth” in the data, waiting to be discovered once and for all, nor does it assume that coding is necessarily “accurate” or “objective” (Braun & Clarke, 2020 ). Rather, it requires a sort of deep immersion by the researcher into the data that eventually leads to themes being generated from the data rather than discovered in the data.

The structured aspects of thematic analysis revolve around the concepts of codes and coding. The process involves initial coding of the data, followed by a second round of coding whereby codes are grouped into themes and often organized in relation to each other. In the following I briefly explain these steps. A third round of coding can be added to identify aggregate dimensions, followed by visualrepresentations of the codes andthemes. I will illustrate these last steps towards the end of the chapter.

Assigning Codes to Data

Codes are the building blocks of thematic analysis. In the first round of coding, you use them to label text segments (coding units) that seem relevant to your research question. Briefly stated, a code is a label assigned to a coding unit, intended to capture the meaning of that unit.

The coding unit may vary from a single word to several paragraphs. The meaning conveyed by the unit determines the coding unit. As a rule of thumb, the coded text segment should always be sufficiently large to retain its meaning when taken out of context.

Where do the codes come from? You can determine (at least some of) the codes before you begin the analysis, in which case you develop them in a theory-driven or deductive way. You can also develop them during the analysis, in which case your approach is inductive and data-driven, similar to open coding used in grounded theory (Strauss & Corbin, 1998 ). Alternatively, you can use a combination of deductiveand inductive approaches. In any case, predetermined codes are rarely sufficient alone in order to capture the breadth of the data. This chapter focuses on the inductive, data-driven approach only, although it should be recognized that thematic analysis cannot be entirely inductive since your pre-existing knowledge and theoretical concepts will always influence what you see in the data.

Should you rephrase the words in the text when developing codes or use the same words as those in the text? A distinction can be made between in vivo codes and descriptive codes (Saldaña, 2015 ; Strauss & Corbin, 1998 ). In vivo codes are taken directly from the text, meaning that the code assigned to a coding unit is exactly the same as the coding unit. Thus, if the word “seeking integrity” appears in the text and is important with respect to the research question, the in vivo code for that specific coding unit is also “seeking integrity”. In vivo codes are informant-centric, and useful if it is important for you to ensure an as close relationship as possible between informant/textual expressions and codes.

Descriptive codes, by contrast, are researcher-centric codes that you create yourself to describe the meaning of a coding unit by developing another, shorter way to express what you think is conveyed by that unit. For example, “seeking integrity” could be the descriptive code if you determine that this is the meaning of a sentence or a paragraph, even if the words “seeking” and “integrity” are not used in the text. Descriptive codes are useful when in vivo codes do not sufficiently represent the nuances and the meaning of the text, and/or when the coding unit is large.

A final distinction can be made between semantic and latent codes (Braun & Clarke, 2006 ). Semantic codes are descriptive codes or in vivo codes; they describe the explicit or manifest meanings of the data. By contrast, latent codes are descriptive codes that you develop to identify what you think goes on beyond the data by identifying the underlying ideas, assumptions, or ideologies that have produced the patterns in the data. Both semantic and latent codes can involve making inferences about something that is not directly observable. The difference is that whereas semantic codes seek to show patterns in semantic content and establish the meaning of what is expressed, latent codes seek to determine what produced those meanings.

From Codes to Themes

When codingthe data, you will eventually notice that some codes convey similar meanings. If so, something important about the data in relation to the research question has been observed. In a second round of coding, you can then decide to group these codes together into themes. Themes are higher level theoretical constructs than codes because they encapsulate the meanings conveyed by many codes. They are “patterns of shared meaning cohering around a central concept” (Braun & Clarke, 2020 , p. 4).

Your judgement as a researcher is critical in order to determine not only which themes are important for your research question but also when a set of codes forms a theme and how many themes should be generated. There is no rule for how many themes you should end up with, although at some point you will probably notice that adding an extra theme to the ones you already have no longer provides useful information. You may actually be more likely to merge some of the themes you have identified, especially if you have a large number of them.

There is also no rule for how many occurrences of a code or similar codes are needed in order to create a theme. Whereas one theme may be prevalent in every interview transcript or text and backed by thirty codes with similar meanings, other themes may be present in much fewer transcripts and texts and supported by only a handful of codes. The themes that are less prevalent may still be very important if they capture something new, essential, or revealing about the phenomenon of interest.

Organizing Themes

Once you have generated themes from codes, your analysis may stop here, in which case the next step is to report your themes as your findings. However, you could also undertake an additional analytical step by examining how the themes are connected. To figure out the connections, ask yourself the following questions (cf. Saldaña, 2015 , p. 247): Do the themes make more sense if they are arranged chronologically? Which theme seems to logically precede the other themes? Does one theme influence another? Is there a hierarchical relationship between them? Can some themes be understood as subthemes and others as aggregate themes?

When developing answers to these questions, you may be able to see a connection between the themes that becomes an important part of your findings. If this is the case, your analysis may end up proposing a grounded theory model (Gioia et al., 2012 ). However, regardless of whether it does so or not, keep in mind that your themes are your findings. When presenting your findings, it is important that you structure your presentation around the themes and back up your claims with relevant quotes that address the research question.

Thematic Analysis of Values

The coding process in thematic analysis of values varies depending on some features of the values to be studied and the goal of your research. Two important questions to address are:

Are the values explicit or implicit in the text? This is to say, do the informants and the documents you have collected make direct references to values, or do you need to “read between the lines” to observe them?

Is the goal of your research primarily to report the values as they are articulated explicitly or implicitly in the text, or do you want to go “deeper” in order to understand how the values relate to latent beliefs and assumptions?

Coding Explicitly Expressed Values

Let us first consider the simplest case, which is when the data consists of texts that make explicit references to values, and your primary aim is to describe these values. In this case, the values are easily identifiable, and you will only have to deal with the question of what counts as a value rather than interpreting the text in order to establish them. Perhaps you asked your informants to talk about the values that are important to them, or perhaps you are studying official core values statements retrieved from strategy documents or web pages. In both of these cases, core values will be explicitly mentioned in the texts or transcripts. An example is provided in Table 9.1  below. It shows an excerpt from an analysis of the values found in a large university’s core values statement (NTNU, 2018 ).

The table highlights the values in the text in the left column. In the right column, each value is now an in vivo code. In other words, the coding unit is one word (or sometimes several words, but rarely many), and the code is the same as the coding unit. If you are using a software for qualitativedata analysis, the table looks quite similar to what you would see on your screen. On the left you identify and highlight the values; on the right you assign codes. The codes used in this example are in vivo codes only. The procedure for descriptive codes is basically the same, except that the coding units are likely to be larger because more than one word is needed to represent a value.

If your data material consists of explicit text segments such as this one, you should be able to produce a long list of data-driven codes that correspond exactly or at least very closely to the values in the text and then look for themes emerging from that list that could provide a better understanding of the values and your research questions.

Coding Implicit Values

In some cases, yourdata material is likely to speak about values in a more subtle way. This could be because the abstract nature of values makes it difficult to elicit information about values from informants, even when they are asked direct questions. Also, many written documents and other sources are not created specifically for the purpose of describing values. This does not mean that these texts do not contain values. What it means is that you will need to look for the values that are hidden in the language of the text and make a judgement about which values are implicitly invoked. Coding at the implicit level requires interpretation, meaning that you will have to infer from your observations something that is not directly said. For this type of coding, it will be necessary to rely more on descriptive codes rather than in vivo codes.

Consider the example in Table 9.2  where the researcher wants to find out which values are expressed in different leadership philosophies. A French factory CEO describes his dreams for the ideal workplace in the following way (Minnaar, 2017

In this case, the CEO was not asked to reflect on the values on which his leadership is built, nor on what the values should be. He was simply asked to describe his leadership philosophy, and he actually does not explicitly mention a single value. However, the texts still express many important values. Generally, you should look for phrases such as “It’s important that”, “I like”, “I love”, “I need”, “I think”, “I feel”, and “I want” (Saldaña, 2015 , p. 113), or, as in the case described above, “I was dreaming of”.

Did you agree with the coding in the table above? Note that there could be multiple ways of delimiting the relevant coding units and assigning codes in this case. Two different researchers may not arrive at the same codes. For example, take the first sentence; “I was dreaming of a company where the worker would become the operator”. Alternative codes to “empowerment” could be “emancipation”, “liberation”, “enablement”, and other synonyms. Also note that if the research question was different, for example, if it involved examining the various components of leadership philosophies rather than identifying values, then the code could be “vision” or “worker-centric”, depending on the preferences of the researcher.

So far, we have seen an example of a text that was very explicit about its values, and another that was not. Usually, texts are not either explicit or implicit in this respect—they are a combination. Your coding should reflect this reality. Alternating between in vivo codes, descriptive codes, explicitly derived codes, and implicitly derived codes is perfectly possible in thematic analysis of texts.

Coding at the Latent Level

With some researchquestions your primary interest may be to understand what lies behind the values you see in the texts. In these cases, you are less interested in identifying which values the texts are talking about, explicitly and/or implicitly, and more interested in understanding the attitudes, ideas, beliefs, or assumptions that seem to underpin the values you observe in the text. Hence, latent thematic coding of values is based on the assumption that our beliefs shape the values we talk about and how we talk about them (similar to discourse analysis described in chapter 10 by Kivle and Espedal). As such, latent thematic coding could be especially relevant for highlighting and explaining differences between groups of informants. Questions you may ask yourself are: What do the values that you observe “really” mean in the context in which they are expressed? With what kind of characteristics, assumptions, or ideals do the texts associate the values? To which world views do the values “belong”? Does the text highlight some values as more important or essential than others?

As an example, consider in Table 9.3  again the example of the CEO who expressed his leadership

principles:

Although this informant makes indirect references to values, these values are not the main focus. Rather, we create codes for the assumptions and beliefs that we think produce these values. Doing so requires a thorough analysis of the claims in order to get an idea of what lies behind them. Notice that the codes consist of multiple words because they need to capture a more complex logic compared to descriptive codes that seek to reflect explicit or implicit values. This makes coding at the latent level more complex than coding at the explicit level.

Latent coding is complex also for a different reason: When analysing underlying assumptions and beliefs, your private beliefs could be challenged. For example, consider the statement: “I definitely feel like I need to hire more people with a different cultural and ethnic background”. Which latent belief or assumption lies behind this view? Without examining the rest of the text, at least two different interpretations are possible depending on your own views. One is “diversity is good for the workplace”, another is “political correctness is a necessary evil”. These beliefs are contradictory, yet both could arguably have produced the statement above. So, be careful: Before deciding on the latent belief, make sure you can justify your coding based on how the informants talk about their values, practices, and beliefs in the context in which they find themselves.

Generating Themes from Codes

Having developedcodes, your task is now to identify themes. The process of doing so can occur in different ways. Three alternatives are as follows:

Grouping synonyms : You are likely to discover that many of the values you have coded are synonyms with similar meanings. For example, according to the Merriam-Webster dictionary ( 2020 ), sincerity, openness, frankness, candour, honesty, impartiality, and trustworthinessare synonyms. If these codes are part of your list, they can form a theme. Choose a name for the theme that matches its contents (e.g. “sincerity”). Similarly, other synonyms such as empathy, sympathy, clemency, altruism, benevolence, kindness, and compassion can also be grouped into a theme and given a name, if they exist in your data. This is a straightforward way of generating themes from codes, although it is not well suited for latent codes. You also risk the possibility that some of the codes on your list do not have synonyms and consequently do not have a “home”. As a result, you may want to consider the other two alternatives:

Grouping codes of the same type : Many codes are likely to share features even if they are not synonyms. For example, when scholars classify values as belonging to the same type, they look for something that the values have in common. An example is Kernaghan’s ( 2000 ) typology of public service values. It groups values such as integrity and fairness into ethical values, impartiality and rule of law into democratic values, and effectiveness and service into professional values. The logic of this process of generating themes is the following: You consider whether a group of codes have similar meanings in the sense that they refer to similar aspects of organisational activities, practices, identities, or states. If they do, then you group them into a theme, and find a name for this theme. This approach is also relatively straightforward. However, again, this approach is not well suited for coding at the latent level, and it does not fully consider the semantic content of the codes.

Grouping codes based on semantic content: Finally, and perhaps most importantly, you can group codes based on their semantic content. In this case, the approach involves figuring out what the codes are saying about something or someone, and then condensing that information into themes, regardless of whether the codes that constitute the themes are synonyms and/or of the same type or not. This is usually not a straightforward process. Each theme will have to be phrased as a short sentence, and this can be done in a number of ways. You may be experimenting with some themes initially, discarding some, and splitting others into separate themes. You may also be moving codes back and forth from one theme to another multiple times before you make up your mind about which codes belong where and how to name the themes. Moreover, you may discover new themes as you are working with your data. In the end, you will have to make a decision about which codes go where, how many themes are necessary to represent the data, and how the themes should be named.

If possible, you should consider whether the themes can be further reduced into aggregate dimensions. This would be an additional round of coding and the last step of the coding process in which you connect all the themes around a few core dimensions. The aggregate dimensions could clarify certain shared aspects of the values or highlight common underlying assumptions, and they could form the basis for grounded theory development (Gioia et al., 2012 ).

Visual Representations

It is always useful to draw visual representations of your themes and their relationship with the coded values. By doing so, you keep track of all the codes and make sure they are grouped somewhere, and you can better demonstrate how you generated the themes. There are many ways of visually displaying codes and themes. Figures 9.1 and 9.2 show two examples of themes generated from the same set of initial codes. Note that the figures are not complete representations of the data set. In your own thematic analysis, the number of codes and themes is likely to be higher (for a more complete example, see Vaccaro and Palazzo [ 2015 ]).

A visual representation of codes and themes. Codes such as teamwork, innovation, quality, diversity, and collaboration are grouped into the theme of internal cultures and workplace values. The other set of codes such as honesty, comparison, respect, integrity, and family are grouped into the theme of relational values.

Codes grouped into themes based on type of code

A visual representation of codes and themes based on semantic content. Codes such as teamwork, innovation, quality, diversity, and collaboration are grouped into the theme of internal cultures and workplace values. The other set of codes such as honesty, comparison, respect, integrity, and family are grouped into the theme of relational values. Both themes are based on employees change supporting attitudes during organizational attitudes.

Codes grouped into themes based on semantic content of codes

When comparing the two figures you will notice that although the initial codes are the same, the themes are different. These differences not only reflect different ways of generating themes (the first figure is based on type of code, the second on semantic content), but also different research questions or purposes. In the first case, the purpose may be to understand what characterises the values of a particular organisation or group as they are expressed by employees and top managers. In the second case, the figure could reflect the desire to understand the implications for successful organisational change of the values that employees attach to their own organisation or group. In this case it is possible to develop an aggregate dimension that highlights the overall pattern in the themes.

The themes can also be displayed quantitatively as frequencies. You could, for example, create charts that rank the different themes on the basis of how many codes they contain. This could be useful for summarizing your findings. Note, however, that frequency charts should not be used as the only basis for presenting codes and themes, as this would be more similar to quantitative content analysis (and in some cases, qualitativecontent analysis [Schreier, 2012 ]).

Finally, if your themes are developed on the basis of semantic content or latent codes, your analysis may benefit from showing visually how the themes are connected (see e.g. Braun & Clarke, 2006 ; Gioia et al., 2012 ). Figures 9.3a – c outline three possible models. The first model shows a cyclical relationship between the themes, the second shows one central theme and three subthemes, and the last shows four themes in chronological order. You may find that your themes fit one of the models, but if not, you could develop your own variation of one of them, or you could develop an entirely different one.

First is the cyclic model of the themes, the second diagram in which theme1 is in the center and attach to it has three subthemes, and the third diagram in which the themes are organized in order of theme1, theme2, theme3, theme4.

(a) Themes organized in a cyclical model. (b) Themes organized as one central theme with three subthemes. (c) Themes organized in chronological order

Values often manifest themselves in texts, and thematic analysis is one way of making them emerge from those texts. This chapter has suggested a few ways of doing so. As a stepwise approach, thematic analysis of values can be summarized in the following way: (1) Assign codes, (2) generate themes, and if possible, (3) organize themes, (4) create aggregate dimensions from themes, and (5) make visual representations. The first two steps should be seen as essential to thematic analysis of values, the remaining ones can be added for further analysis and refinement.

The steps you take should be appropriate for your data and your research question, and you should never try to force fit your data to codes or themes or to a complex visual representation. If your research question only involves describing explicitly expressed values, steps 3 through 5 are probably redundant. If your goal is to understand how latent assumptions and ideas produce different value orientations in different types of settings, you may need all five steps. In any case, apply the principles outlined here with flexibility and creativity, and take your time to understand what kind of analysis your research questions require.

Values come to expression in different ways, and thematic analysis is one of many ways of understanding how. Its benefits lie in the reduction of information into a manageable and comprehensible body of data, which, in the end, is an important part of understanding abstract aspects of social life such as values.

Boyatzis, R. E. (1998). Transforming qualitative information: Thematic analysis and code development . Sage.

Google Scholar  

Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3 (2), 77–101.

Article   Google Scholar  

Braun, V., & Clarke, V. (2012). Thematic analysis. In H. Cooper (Ed.), APA handbook of research methods in psychology. Research designs (Vol. 2). APA Press.

Braun, V., & Clarke, V. (2020). One size fits all? What counts as quality practice in (reflexive) thematic analysis? Qualitative Research in Psychology, 18 (3), 328–352.

Esin, C., Fathi, M., & Squire, C. (2014). Narrative analysis: The constructionist approach. In U. Flick (Ed.), The SAGE handbook of qualitative data analysis (pp. 203–216). Sage.

Chapter   Google Scholar  

Gioia, D. A., Corley, K. G., & Hamilton, A. L. (2012). Seeking qualitative rigor in inductive research: Notes on the Gioia methodology. Organizational Research Methods, 16 (1), 15–31.

Glaser, B. G., & Strauss, A. L. (1999). The discovery of grounded theory: Strategies for qualitative research . Aldine de Gruyter.

Guest, G., MacQueen, K. M., & Namey, E. E. (2011). Applied thematic analysis . Sage.

Hayes, N. (1997). Theory-led thematic analysis: Social identification in small companies. In N. Hayes (Ed.), Doing qualitative analysis in psychology (pp. 93–114). Psychology Press.

Kernaghan, K. (2000, March). The post-bureaucratic organization and public service values. International Review of Administrative Sciences, 66 (1), 91–104. <Go to ISI>://000086531400009.

Malterud, K. (2012). Systematic text condensation: A strategy for qualitative analysis. Scandinavian Journal of Public Health, 40 (8), 795–805.

Merriam-Webster Dictionaries. (2020). Thesaurus. https://www.merriam-webster.com/

Minnaar, J. (2017). FAVI: How Zobrist broke down FAVI’s command-and-control structures. https://corporate-rebels.com/zobrist/

NTNU. (2018). Kunnskap for en bedre verden. NTNU strategi 2018–2025 . NTNU. https://www.ntnu.no/ntnus-strategi/overordnet-mal#verdier

Paulus, T. M., & Lester, J. N. (2016). ATLAS. Ti for conversation and discourse analysis studies. International Journal of Social Research Methodology, 19 (4), 405–428.

Saillard, E. K. (2011). Systematic versus interpretive analysis with two CAQDAS packages: NVivo and MAXQDA. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, 12 (1), 1–21.

Saldaña, J. (2015). The coding manual for qualitative researchers . Sage.

Schreier, M. (2012). Qualitative content analysis in practice . Sage.

Strauss, A., & Corbin, J. (1998). Basics of qualitative research: Techniques and procedures for developing grounded theory (2nd ed.). Sage.

Vaccaro, A., & Palazzo, G. (2015). Values against violence: Institutional change in societies dominated by organized crime. Academy of Management Journal, 58 (4), 1075–1101.

Download references

Author information

Authors and affiliations.

VID Specialized University, Oslo, Norway

Arild Wæraas

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Arild Wæraas .

Editor information

Editors and affiliations.

Gry Espedal , Beate Jelstad Løvaas , Stephen Sirris  & Arild Wæraas , ,  & 

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

© 2022 The Author(s)

About this chapter

Wæraas, A. (2022). Thematic Analysis: Making Values Emerge from Texts. In: Espedal, G., Jelstad Løvaas, B., Sirris, S., Wæraas, A. (eds) Researching Values. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-90769-3_9

Download citation

DOI : https://doi.org/10.1007/978-3-030-90769-3_9

Published : 15 February 2022

Publisher Name : Palgrave Macmillan, Cham

Print ISBN : 978-3-030-90768-6

Online ISBN : 978-3-030-90769-3

eBook Packages : Business and Management Business and Management (R0)

Share this chapter

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Publish with us

Policies and ethics

  • Find a journal
  • Track your research

Thematic Analysis

Student Examples of Good Practice

Sometimes it’s good to know what ‘doing a good job’ looks like… To help those wanting to understand what describing the reflexive TA process well might look like, we offer some good examples here, from student projects. This may be particularly helpful for students doing research projects, and for people very well-trained in positivism.

As well as the example(s) we provide here, you can find a much more detailed discussion in our book Thematic Analysis: A Practical Guide (SAGE, 2022).

Suzy Anderson (Professional Doctorate)

The following sections are by Suzy Anderson, from her UWE Counselling Psychology Professional Doctorate thesis – The Problem with Picking: Permittance, Escape and Shame in Problematic Skin Picking.

An example of a description of the thematic analysis process:

Process of Coding and Developing Themes

Coding and analysis were guided by Braun and Clarke’s (2006, 2013) guidelines for using thematic analysis. Each stage of the coding and theme development process described below was clearly documented ensuring that the evolution of themes was clear and traceable. This helped to ensure research rigour and means that process and dependability may be demonstrable.

I familiarised myself with the data by reading the transcripts several times while making rough notes. As data collection took place over a protracted period of time, coding of transcribed interviews began before the full dataset was available. Transcripts were read line-by-line and initial codes were written in a column alongside the transcripts. These codes were refined and added to as interviews were revisited over time. Throughout this process I was careful to note and re-read areas of relatively sparse coding to ensure they were not neglected. My supervisor also independently coded three of the interviews for purposes of reflexivity, providing an interesting alternative standpoint. I cross-referenced our two perspectives to notice and reflect on our differences of perspective.

Once initial coding was complete, I looked for larger patterns across the dataset and grouped the codes into themes (Braun & Clarke, 2006). I found it helpful to think of the theme titles as spoken in the first person, and imagine participants saying them, to check whether they reflected the dataset and participants’ meanings. I tried not to have my coding and themes steered by ideas, categories and definitions from previous research, to allow a more inductive, data-driven approach, while recognising my role as researcher in co-creation of themes (Braun & Clarke, 2013). However, there were times when the language of previous research appeared a good fit, such as in the discussion of ‘automatic’ and ‘focussed’ picking. Given that the experience of SP is an under-researched area, particularly from a qualitative perspective, and that the aim is for this study to contribute to therapeutic developments, themes were developed with the entire dataset in mind (Braun & Clarke, 2006), such that they would more likely be relevant to someone presenting in therapy for help with SP. There was clear heterogeneity in the interviews, and in cases where I have taken a narrower perspective on an experience (such as when describing an experience only true for some of the participants), I have tried to give a loose indication of prevalence and alternative views.

I created a large ‘directory’ of themes and smaller sub-themes, with the relevant participant quotations filed under each theme or sub-theme heading. This helped me to adjust theme titles, boundaries and position, meant that I could check that themes were faithful to the data at a glance, and was of practical help when writing the analysis.

The process of coding and developing themes was intended to have both descriptive and interpretive elements (using Braun & Clarke’s definitions, 2013). The descriptive element was intended to represent what participants said, while the interpretative element drew on my subjectivity to consider less directly evident patterns, such as those that might be influenced by social context or forces such as shame. This interpretation was of particular value to the current study as participants often struggled to find words for their experience and several reported or implied that they did not understanding the mechanisms of their picking. An interpretative stance meant that I could develop ideas about what they were able to describe and consider the relationships between these experiences, making sense of them alongside previous literature (Braun & Clarke, 2006). Writing was considered an integral part of the analysis (Braun & Clarke, 2013) and it helped me to adjust the boundaries of themes, notice more latent patterns and considered how themes and their content were related.

Given the known heterogeneity of picking I was keen to make sure my analysis did not become skewed towards one type of SP experience to the detriment of another. I actively looked for participant experiences that diverged from those of the developing themes (with similar intentions to a ‘deviant case analysis’; Lincoln & Guba, 1985) so that the final analysis would represent themes in context and with balance. When adding quotations to the prose of my analysis I re-read them in their original context to ensure that my representation of their words appeared to be a credible reflection of what was said.

An example of researcher reflexivity in relation to analysis process

Subjectivity as a Resource

I considered my subjectivity to be a resource when conducting interviews and analysing data (Gough & Madill, 2012). It guided my judgement when interviewing, helping me to respond to participants’ explicit, implicit and more verbally concealed distress. I allowed aspects of my own experience to resonate with those of participants meaning that I could listen to their stories with empathy and a genuine curiosity. During analysis, themes were actively created and categorised, demanding my use of self (DeSantis & Ugarriza, 2000). I sought to interpret the data rather than simply describe it, which necessarily requires acknowledgement of both researcher and participant subjectivity. I strongly feel that we can only make sense of another’s story by relating it to our own phenomenology (Smith & Shinebourne, 2012), and that we re-construct their stories on frameworks formed by our own subjective experience. As such it is useful to be aware of my personal experiences and assumptions.

Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3 (2), 77-101.

Braun, V., & Clarke, V. (2013). Successful qualitative research: A practical guide for beginners. Sage.

DeSantis, L., & Ugarriza, D. N. (2000). The concept of theme as used in qualitative nursing research. Western Journal of Nursing Research, 22 (3), 351-372.

Gough, B., & Madill, A. (2012). Subjectivity in psychological research: From problem to prospect. Psychological Methods, 17 (3), 374-384.

Lincoln, Y. S., & Guba, E. G. (1985). Establishing trustworthiness. Naturalistic Inquiry, 289 (331), 289-327.

Smith, J. A., & Shinebourne, P. (2012). Interpretative phenomenological analysis. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.),  APA handbook of research methods in psychology, Vol. 2. Research designs: Quantitative, qualitative, neuropsychological, and biological (p. 73–82). American Psychological Association.

Gina Broom (Research Master's)

The following extract is by Gina Broom, from her University of Auckland Master’s thesis (2020): “Oh my god, this might actually be cheating”: Experiencing attractions or feelings for others in committed relationships .

A detailed description of reflexive TA analytic approach and process

I analysed data through a process of reflexive thematic analysis (reflexive TA), as outlined by Braun, Clarke, Hayfield, and Terry (2019), who describe reflexive TA as a method by which a researcher will “explore and develop an understanding of patterned meaning across the dataset” with the aim of producing “a coherent and compelling interpretation of the data, grounded in the data” (p. 848). I utilized Braun and colleagues’ reflexive approach to TA, as opposed to alternative models of TA, due to my alignment with critical qualitative research. I did not select a c oding reliability TA approach, for example, due to its foundation of (post)positivist assumptions and processes (such as predetermined hypotheses, the aim of discovering ‘accurate’ themes or “domain summaries”, and efforts to ‘remove’ researcher bias while evidencing reliability/replicability), which were not suitable for the critical realist epistemology underpinning this thesis. In contrast, Reflexive TA is a ‘Big Q’ qualitative approach, constructing patterns of meaning as an ‘output’ from the data (rather than as predetermined domain summaries) while valuing “researcher subjectivity as not just valid but a resource” (Braun et al., 2019, p. 848). As the critical realist and feminist approaches of this thesis theorize knowledge as contextual, subjective, and partial, with reflexivity valued as a crucial process, a reflexive TA was the most appropriate method for this analysis.

Braun and colleagues’ (2019) reflexive TA process involves six-phases, including familiarization with the data, generating codes, constructing themes, revising and defining themes, and producing the report of the analysis. I outline my process for each of these below:

Phase 1, familiarization: Much of my initial engagement with the data was done through my transcription of the interviews, as the process provided extended time with each interview, both listening to the audio of the participant, and in the writing of the transcript. Some qualitative researchers describe transcription as an essential process for a researcher to perform themselves, as “transcribing discourse, like photographing reality, is an interpretive practice” (Riessman, 1993, p. 13), and as a result, “analysis begins during transcription” (Bird, 2005, p. 230). Braun and Clarke (2012) suggest certain questions to consider during the process of familiarization: “How does this participant make sense of their experiences? What assumptions do they make in interpreting their experience? What kind of world is revealed through their accounts?” (p. 61). During transcription, I took notes of potential points of interest for the analysis, using these types of questions as a guide. In exploring attractions or feelings for others in committed relationships, these questions (and my notes) often related to the meaning participants applied to their feelings and relationships, particularly in terms of morality and social acceptability, while the ‘world’ of their accounts was conveyed through their discourse of the contemporary relational context.

Phase 2, generating initial codes : Following transcription, I systematically coded each interview, searching for instances of talk that produced snippets of meaning relevant to the topic of attractions or feelings for others. I coded interviews using the ‘comment’ feature in the Microsoft Word document of each transcript, highlighting the relevant text excerpt for each code comment. I used this approach, rather than working ‘on paper’, so that I would later be able to easily export my coded excerpts for use in my theme construction. The coding of thematic analysis can be either an inductive ‘bottom up’ approach, or a deductive or theoretical ‘top down’ approach, or a combination of the two, depending on the extent to which the analysis is driven by the content of the data, and the extent to which theoretical perspectives drive the analysis (Braun & Clarke, 2006, 2013). Coding can also be semantic , where codes capture “explicit meaning, close to participant language”, or latent , where codes “focus on a deeper, more implicit or conceptual level of meaning” (Braun et al., 2019, p. 853). I used an inductive approach due to the need for exploratory research on experiences attractions or feelings for others, as it is a relatively new topic without an existing theoretical foundation. The focus of my coding therefore developed throughout the process of engaging with the data, focusing on segments of participants’ meaning-making in relation to general, personal, or partner-centred experiences of: attractions or feelings for others in the contemporary relational context, implied moral and/or social acceptability (or unacceptability), related affective experiences and responses, and enacted or recommended management of attractions or feelings for others. At the beginning of the process, I mostly noted semantic codes such as ‘feels guilty about attractions or feelings for others’, particularly as my coding was exploratory and inductive, rather than guided by a knowledge of ‘deeper’ contextual meaning. As I progressed, however, I began to notice and code for more latent meanings, such as ‘love = effortless emotional exclusivity’ or ‘monogamy compulsory/unspoken relationship default’. When all interviews had been systematically and thoroughly coded (and when highly similar codes had been condensed into single codes), I had a final list of roughly 200 codes to take into the next phase of analysis.

Phase 3, constructing themes : When developing my initial candidate themes, I utilized the approach described by Braun and colleagues (2019) as “using codes as building blocks”, sorting my codes into topic areas or “clusters of meaning” (p. 855) with bullet-point lists in Microsoft Word. From this grouping of codes, I produced and refined a set of candidate themes through visual mapping and continuous engagement with the data. These candidate themes were grouped into two overarching themes: the first encompassed 2 themes and 6 sub-themes evidencing pervasive ‘traditional’ conceptions of committed relationships (as monogamous by default with an assumption of emotionally exclusivity), and the way attractions or feelings for others were positioned as an unexpected threat within this context; the second encompassed four themes and eight sub-themes exploring modern contradictions (which problematized the quality of the relationship or the ‘maturity’ of those within it, rather than the attractions or feelings), and the way attractions or feelings for others were positioned as ‘only natural’ or even positive agents of change. This process of candidate theme development was still explorative and inductive, as I worked closely with the coded data and had only brief engagement with potentially relevant theoretical literature at this stage. Further engagement with contextually relevant literature, and a deductive integration of it into the analysis, was developed in the next phases.

Phases 4 and 5, revising and defining themes : My process of revising and defining themes started by using a macro (that was developed for this project) to export all of my initial codes and their associated excerpts into a single master sheet in Microsoft Excel, with columns indicating the source interview for each excerpt, as well as relevant participant demographic information (e.g. age, gender, relationship as monogamous or non-monogamous). This master sheet contained 6006 coded excerpts. In two new columns (one for themes and one for sub-themes), I ‘tagged’ excerpts relevant to my candidate analysis by writing the themes and/or sub-themes that they fit into. I was then able to export these excerpts, using the macro designed for this project, sorting the relevant data for each theme and sub-theme into separate tabs. I then reviewed all the excerpts for each individual theme and sub-theme, which allowed me to revise and define my candidate themes into my first full thematic analysis for the writing phase.

The thematic analysis at this stage included 13 themes and seven sub-themes, and these differed from the original candidate themes in a number of ways. In reviewing the collated data, I noted that some sub-themes were nuanced and prominent enough to be promoted to themes; the sub-theme ‘stay or go? (partner or other)’, for example, became the theme ‘you have to choose’. Similarly, I found other themes or sub-themes to be ‘thin’, and either removed them, or integrated them into other parts of the analysis; the sub-theme roughly titled ‘families at stake (marriage, children)’, for example, became a smaller part of the ‘safety in exclusivity’ theme. I also noted that the first overarching theme in the candidate analysis was ‘messy’, and in an effort to improve focus and clarity, I split this first overarching theme into three new ones, each with its own “central organizing concept” (Braun et al., 2019, p. 48): the first evidenced the contemporary relational context as one of default monogamy with an idealization of exclusivity; the second evidenced infidelity as an unforgivable offence, while associating attractions or feelings for others with this threat of infidelity; the third evidenced discourses in which someone must be to blame (either the person with the feelings or their partner). The second half of the candidate analysis became a fourth and final overarching theme, which encompassed a revised list of themes evidencing favourable talk of attractions or feelings for others.

Phase 6, writing the report : In writing my first draft of my analysis, I developed an even deeper sense of which themes and sub-themes were ‘falling into place’, and which did not fit so well with the overall analysis. At this point I was also engaging in a deeper exploration of relevant literature, and writing my chapter on the context of sexuality and relationships, which provided a foundation of theoretical knowledge that I could deductively integrate into my analysis. Through a process of supervisor feedback on my initial draft, engagement with literature, and revision of the data, I developed the analysis into the final thematic structure. My initial research question of ‘how do people make sense of attractions or feelings for others in committed relationships?’ also developed into three final research questions, each of which is explored across the three overarching themes of the final analysis:

Upon revision, both of the first two overarching themes from the second (revised) thematic map (‘the safety of default monogamy’ and ‘the danger of infidelity’) involved themes and sub-themes which situated attractions or feelings for others within the dominant contemporary relational context. I combined relevant parts of these into one overarching theme in the final analysis, which explored the research question: What is the contemporary relational context, and how are attractions or feelings for others made sense of within that context? Two themes and five sub-themes together evidenced attractions or feelings for others as a threat (by association with infidelity) within the mononormative sociocultural context.

The third overarching theme from the second (revised) thematic map (‘there’s gotta be someone to blame’) did not require much revision to fit with the final analysis. I refined information that was too similar or redundant in the original analysis, such as the sub-themes ‘partner is flawed’ and ‘deficit in partner’ which were combined into one sub-theme. I also added a third theme, ‘the relationship was wrong’, from a later part of the original analysis, as this also fit with the central organizing concept of wrongness and accountability. Together, these three themes and two sub-themes formed the second overarching theme of the final analysis, exploring the question: What accountabilities are at stake with attractions or feelings for others in committed relationships? This chapter also explores the affective consequences of these attributed accountabilities, as described by participants and interpreted by myself as researcher.

I revised and developed the final overarching theme most, in contrast to the analysis previously done, as my process of writing, feedback, and revision demonstrated that this section was the least coherent, and the central organizing concept required development. There were various themes and sub-themes across the initial analysis that explored imperatives or choices that were either made or recommended by participants. These parts of the original analysis were combined to produce the third overarching theme of the final analysis, including four (contradictory) themes and four sub-themes exploring the research question: How do people navigate, or recommend navigating, attractions or feelings for others?.

Combined, these three final overarching themes tell a story of (dominant or ‘normative’) initial sense making of attractions or feelings for others, subsequent attributions of accountability, and various (often contradictory and moralized) ways these feelings are navigated. Braun and Clarke (2006) describe thematic analysis as an active production of knowledge by the researcher, as themes aren’t ‘discovered’ or a pre-existing form of knowledge that will ‘emerge’, but rather patterns that a researcher identifies through their perspective of the data. My thematic analysis was influenced by my own social context, experiences, and theoretical positioning. In the context of critical research, ethical considerations are often complex, and researcher reflexivity is a crucial part of the process (Bott, 2010; L. Finlay, 2002; Lafrance & Wigginton, 2019; Mauthner & Doucet, 2003; Price, 1996; Teo, 2019; Weatherall et al., 2002). As the theoretical foundation of this thematic analysis was a combination of critical realism and critical feminist psychology, I engaged in an ongoing consideration of ethics and reflexivity throughout my data collection and analysis, which I discuss in the following section.

Bird, C. M. (2005). How I stopped dreading and learned to love transcription. Qualitative Inquiry , 11 (2), 226–248.

Bott, E. (2010). Favourites and others: Reflexivity and the shaping of subjectivities and data in qualitative research. Qualitative Research , 10 (2), 159–173.

Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology , 3 (2), 77–101.

Braun, V., & Clarke, V. (2012). Thematic analysis. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA Handbook of Research Methods in Psychology (Vol. 2: Research Designs: Quantitative, qualitative, neuropsychological, and biological, pp. 57-71). APA books.

Braun, V., & Clarke, V. (2013). Successful qualitative research: A practical guide for beginners . Sage.

Braun, V., Clarke, V., Hayfield, N., & Terry, G. (2019). Thematic analysis. In P. Liamputtong (Ed.), Handbook of Research Methods in Health Social Sciences (pp. 843-860). Springer.

Finlay, L. (2002). “Outing” the researcher: The provenance, process, and practice of reflexivity. Qualitative Health Research , 12 (4), 531–545.

Lafrance, M. N., & Wigginton, B. (2019). Doing critical feminist research: A Feminism & Psychology reader. Feminism & Psychology , 29 (4), 534–552.

Mauthner, N. S., & Doucet, A. (2003). Reflexive accounts and accounts of reflexivity in qualitative data analysis. Sociology , 37 (3), 413–431.

Price, J. (1996). Snakes in the swamp: Ethical issues in qualitative research. In R. Josselson (Ed.), Ethics and Process in the Narrative Study of Lives (pp. 207–215). Sage.

Riessman, C. K. (1993). Narrative analysis . Sage.

Teo, T. (2019). Beyond reflexivity in theoretical psychology: From philosophy to the psychological humanities. In T. Teo (Ed.), Re-envisioning Theoretical Psychology (pp. 273–288). Palgrave Macmillan.

Weatherall, A., Gavey, N., & Potts, A. (2002). So whose words are they anyway? Feminism & Psychology , 12 (4), 531–539.

Lucie Wheeler (Professional Doctorate)

The following sections are by Lucie Wheeler, from her UWE Counselling Psychology Professional Doctorate thesis – “It’s such a hard and lonely journey”: Women’s experiences of perinatal loss and the subsequent pregnancy .

Data from the qualitative surveys and interviews were analysed using reflexive thematic analysis within a contextualist approach, as this allows the flexibility of combining multiple sources of data (Braun & Clarke, 2006; 2020). Both forms of data provided accounts of perinatal experiences, and therefore were considered as one whole data set throughout analysis, rather than analysed separately. The inclusion of data from different perspectives, by not limiting the type of perinatal loss experienced, and offering multiple ways to engage with the research, allowed a rich understanding of the experiences being studied (Polkinghorne, 2005). However, despite the data providing a rich and complex picture of the participants’ experiences, I acknowledge that any understanding that has developed though this analysis can only ever be partial, and therefore does not aim to completely capture the phenomenon under scrutiny (Tracy, 2010). An inductive approach was taken to analysis, working with the data from the bottom-up (Braun & Clarke, 2013), exploring the perspectives of the participants, whilst also examining the contexts from which the data were produced. Through the analysis I sought to identify patterns across the data in order to tell a story about the journey through loss and the next pregnancy. The six phases of Braun and Clarke’s (2006; 2020) reflexive thematic analysis were used through an iterative process, in the following ways:

Phase 1 – Data familiarisation and writing familiarisation notes:

By conducting every aspect of the data collection myself, from developing the interview schedule and survey questions, to carrying out the face-to-face interviews, and then transcribing them, I was immersed in the data from the outset. Particularly for the interviews, the experience allowed me to engage with participants, build rapport, explore their stories with them, and then listen to each interview multiple times through the transcription process. I therefore felt familiar with the interview data before actively engaging with analysis. I found the process of transcribing the interviews a particularly useful way to engage with the data, as it slowed the interview process down, with a need to take in every word, and therefore led me to notice things that hadn’t been apparent when carrying out the interviews. The surveys, as well as the interview transcripts, were read through several times. I used a reflective journal throughout this process to makes notes about anything that came to mind during data collection and transcription. This included personal reflections, what the data had reminded me of, led me to think about, as well as what I noticed about the participant and the way in which they framed their experiences.

Phase 2 – Systematic data coding:

Coding of the data was done initially for the interviews, and then for the survey responses. I began by going line by line through each transcript, paying equal attention to each part of the data, and applying codes to anything identified as meaningful. The majority of coding was semantic, sticking closely to the participants’ understanding of their own experiences, however, as the process developed, and each transcript was re-visited, some latent coding was applied, that sought to look below the surface level meaning of what participants had said. Again, throughout this process, a reflective journal was used in order to make notes about my own experience of the data, to capture anything I felt may be drawing on my own experience, and to reflect on what I was being drawn to in the data.

Due to the quantity of data (over 70,000 words in the transcripts, and over 23,000 words of survey responses), this was a slow process, and required repeatedly stepping away from the data and coming back to it in a different frame of mind, reviewing data items in a different order, and discussions with peers and supervisors in the process. I noticed that my coding tended to be longer phrases, rather than one-to-two words, as it felt important to maintain some element of context for the codes, particularly as the stories being told had a sense of chronology to them, that seemed related to the way in which experiences were understood. The codes were then collated into a Word document. Writing up the codes in this way separately to the data, it was important to ensure that the codes captured meaning in a way that could be understood in isolation. Therefore, the wording of some of the codes was developed further at this stage. During the coding process I began to notice a number of patterns in the data, so alongside coding, I also developed some rough diagrams of ideas that could later be used in the development of thematic maps.

Phase 3: Generating initial themes from coded and collated data:

The process of generating themes from the data was initially a process of collating the codes from both the interviews and the surveys, and organising them in a way that reflected some of the commonality in what participants had expressed. Despite each of the participants having a unique story to tell, with details specific to their personal context, there was also commonality found in these experiences. Through reflecting on the codes themselves, going back to the data, and using notes and diagrams that had been made throughout the process in my reflective journal, I began to further develop ideas about the patterns that I had developed from the data. Related codes were collated, and developed into potential theme and sub theme ideas. I used thematic maps to develop my thinking, and changed these as my understanding of the data developed. I was conscious that in the development of codes and theme ideas, I wanted to ensure that my analysis was firmly grounded in the data, and therefore, repeatedly returned to the raw data during this process. The use of my reflective notes was also vital at this stage, to ensure that I did not become too fixated on limited ways of seeing the data, but was able to remain open and willing to let initial ideas go.

Phase 4: Developing and reviewing themes:

Theme development was an iterative process of going back and fore between the codes, and the way that patterns had been identified, and the data, collating quotes to illustrate ideas. A number of thematic maps were created that aimed to illustrate the way in which participants made sense of their experiences across the data set, including identifying areas of contradiction and overlap. The use of thematic maps was particularly useful as a visual tool of the way in which different ideas and patterns were connected and related.

Phase 5: Refining, defining and naming themes:

Through the process of developing thematic maps, areas of overlap became evident, which led to further refinement of ideas. There were many possible ways in which the data could be described, and therefore defining and articulating ideas to colleagues and supervisors brought helpful clarity about what could be defined as a theme, where related ideas fitted together into sub themes, and also where separation of ideas was necessary. The theme names were developed once there were clear differences between ideas, and with the use of participants’ quotes where appropriate, in order to keep close links between the themes and the data itself.

Phase 6: Writing the report:

Writing up each theme required further clarity as I sought to articulate ideas, and illustrate these through multiple participant quotes. The process of writing a theme report required further refinement of ideas, and rather than just a final part of the process, still required the iterative process of revisiting earlier phases to ensure that the ideas being presented closely represented the data whilst meeting the research aims. At this stage links were also made to existing literature in order to expand upon patterns identified in the data. Referring to relevant existing literature also helped me to further question my interpretation of the data, and to expand upon my understanding of the participants’ experiences.

Braun, V., & Clarke, V. (2013). Successful qualitative research: A practical guide for beginners . London: SAGE.

Braun, V., & Clarke, V. (2020). One size fits all? What counts as quality practice in (reflexive) thematic analysis? Qualitative Research in Psychology , 1-25. [online first]

Polkinghorne, D. E. (2005). Language and meaning: Data collection in qualitative research. Journal of Counseling Psychology, 52 (2), 137-145.

Tracy, S. J. (2010). Qualitative quality: Eight “big tent” criteria for excellent qualitative research. Qualitative Inquiry, 16 (10), 837.

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • Int J Qual Stud Health Well-being
  • v.6(3); 2011

Children's understandings’ of obesity, a thematic analysis

Childhood obesity is a major concern in today's society. Research suggests the inclusion of the views and understandings of a target group facilitates strategies that have better efficacy. The objective of this study was to explore the concepts and themes that make up children's understandings of the causes and consequences of obesity. Participants were selected from Reception (4–5 years old) and Year 6 (10–11 years old), and attended a school in an area of Sunderland, in North East England. Participants were separated according to age and gender, resulting in four focus groups, run across two sessions. A thematic analysis (Braun & Clarke, 2006) identified overarching themes evident across all groups, suggesting the key concepts that contribute to children's understandings of obesity are “Knowledge through Education,” “Role Models,” “Fat is Bad,” and “Mixed Messages.” The implications of these findings and considerations of the methodology are discussed in full.

The Health Survey for England 2009 illustrated that 65.9% of men and 56.9% of women have a body mass index (BMI) higher than 25 kg/m 2 , classing them as overweight, obese (>30 kg/m 2 ), or morbidly obese (>40 kg/m 2 ). Obesity is linked to many chronic illnesses, including type II diabetes, heart disease, and some cancers—specifically bowel and others within the digestive system (Renehan, Tyson, Egger, Heller, & Zwahlen, 2008 ). As a result, the direct cost to the National Health Service (NHS) of treating obesity was estimated to be between £991 and £1,124 million, for the 2001/2002 financial year (McCormick & Stone, 2007 ).

Childhood obesity is of particular concern because obese children are far more likely than children of a normal weight to become obese adults (Alexander & Sherman, 1991 ). The Health Survey for England 2009 showed that between 1995 and 2008, the percentage of overweight and obese girls rose from 25.5 to 29.2% and from 24.5 to 31.4% for boys. This is despite the fact that during the same period reported total energy intake in the United Kingdom (UK) fell by around 20% (Statistics on Obesity, Physical Activity and Diet England, 2006 ). These contradictory figures highlight the complexity of factors contributing to obesity, pointing to issues such as levels of physical activity, which have significantly fallen over the past two decades (Prentice & Jebb, 1995 ).

Many other factors influence incidences of obesity. The negative impact of childhood obesity causes the greatest concern and needs to be further understood. Obese children are more likely to become obese adults and experience increased health problems. Knowler, Pettitt, and Saad ( 1991 ), highlighted the links between childhood obesity and a poor immune system, risk of raised blood pressure, and cardiovascular problems. Studies have also identified that overweight and obese children are more likely to suffer psychological problems associated with low self-esteem, bullying, and social exclusion (Breat, Mervielde, & Vandereycken, 1997 ).

On an international scale, obesity can be seen as a problem of the developed world, a result of economic wealth, high food availability, and low levels of manual labour leading to lower levels of physical activity. This is in conjunction with high levels of car ownership and wide ranging public transport systems adding to the problem. In short, at the heart of obesity lies a homeostatic biological system that works constantly to maintain energy balance to keep the body at a constant weight. This system has not yet adapted to the world in which we currently live because the pace of technological progress has surpassed evolution resulting in a more sedentary lifestyle (Department of Innovation Universities and Skills, 2007). One surprising feature of the geographical distribution of obesity is its increased prevalence in economically and socially deprived areas in the western world, including the focus of this current piece of research, the United Kingdom. This phenomenon is very much a recent development, because historically deprived areas tended to see higher levels of under-nutrition. Brunt, Lester, Davies, and Williams ( 2008 ) illustrate how this situation has now reversed. They found between 1995 and 2005 the gap between obesity levels in the most deprived areas compared to the least (the latter typically having the higher levels) was steadily closing, and that by 2005 obesity levels in the most deprived areas had overtaken those in the least deprived areas, a phenomena that persists today.

The Childhood Measurement Programme (Department of Health and Department for Children, Schools and Families, 2008 ) demonstrated Sunderland in the north-east of England has some of the highest levels of overweight and obese children in the United Kingdom. This same publication also points out the strong positive correlation between areas considered as deprived and levels of obesity in children in Reception (4–5 year olds) and Year 6 (10–11 year olds). Areas of Sunderland are considered to be economically and socially deprived meaning the children who live there can be considered high risk. The statistics relating to Sunderland, where this study took place, demonstrate that 27.8% of Reception-aged children are either overweight or obese and for Year 6 pupils this rises to 38.4%.

The Foresight Report (Department of Innovation Universities and Skills, 2007), tackling obesity, points out that current policies are failing because they do not provide the depth and range of interventions needed. This might lead to positive interventions being ineffective if they are undermined by other areas in society such as social factors and the power of media advertising. The government launched its Healthy Schools Initiative in 2005; however, there has been no substantial reduction in obesity levels since 2005 (Department of Health and Department for Children, Schools and Families, 2008). With this in mind it would seem timely to approach the problem from a different perspective. Effective policies to tackle obesity need to consider all parties involved. However, current policies have been formed using a top down approach i.e., from government, health and education professionals, and even celebrity chefs! Even though these groups are likely to have a broad understanding of the problem from its roots to the long-term consequences, there has been a notable failure to take into consideration the understandings of the individuals at highest risk of obesity, the children themselves. There is growing evidence that interventions incorporating the views of the target population have a greater level of success (Hesketh, Water, Green, Salmon, & Williams, 2005). In the United Kingdom there has been a strong movement to ensure the inclusion of children in decision making particularly in relation to issues that directly affect them such as education, social care, and health (Department of Health, 2002 ; Department of Health and Department for Education and Skills, 2004 ). The collection and dissemination of the understandings of children relating to obesity could provide an insight into why so many strategies are failing. This in turn could lead to the development of policies that can be delivered to provide more successful outcomes.

There is a clear shortage of research examining children's understandings’ of obesity, the studies that have attempted to explore this domain have focused on exploring parent and care giver perceptions (Young-Hyman, Herman, Scott, & Schlundt, 1999), and the understandings of health professionals (Chamberlin, Sherman, Jain, Powers, & Whitaker, 2002 ). More recently studies have considered the understandings of care givers, health professionals, and teachers alongside those of the children themselves (Borra, Kelly, Shirreffs, Neville, & Geiger, 2003; Hesketh et al., 2005). Studies that have examined children's understanding have been focused on body image, overweight versus underweight (Hill & Silver, 1995 ), and peer perceptions of overweight and eating behaviour (Bell & Morgan, 2000 ; Oliver & Thelen, 1996 ), but not on the understandings’ of the children themselves with regards to the causes and consequences of obesity.

Focus groups have proved to be a particularly useful method for collecting data from children, they are most effective with groups of three children and in situations where the children know and like each other. Groups must be carefully selected to ensure the children are comfortable with each other. Talking together in small groups is familiar territory for children because it simulates class work. This method allows the researcher to structure the discussion around themes or topics rather than direct questions. This in turn enables the children to take control of the discussion (Mauthner, 1997 ) with the researcher present to keep things on track. Conducting group discussions in single sex groups can also prove to be more successful because boys are often louder and more willing to talk and this can mean they direct the topic of conversation. It has also been noted the use of some sort of structured activity such as drawing, reading, or sorting cards, can help focus discussion in particular with young children. When discussing diet with children, nutritionists and dieticians regularly use replica food items to help visualise the topic under discussion and photos depicting scenes of physical activity have proved effective in qualitative studies (Hesketh et al., 2005 ).

In summary the objective of this research is to investigate the understandings of a high risk group of children (high risk because of their socio-economic status so determined by their locality), of some of the causes and consequences of obesity, and its links to diet and physical activity. The concepts and themes generated by this research should be used to provide an insight that may inform local policies and interventions that need to be developed to provide a broader and deeper range of options to address this multi-faceted issue.

In order to address the gaps in current literature it was decided this research should focus on identifying themes within the participants understanding. This would provide the researcher with scope for further investigation of the subject in question. It was therefore decided that the most appropriate method of analysis would be a thematic analysis. However, there have been criticisms of this approach in the past due to the lack of clear guidelines for researchers employing such methods. This has subsequently contributed to some researchers omitting “how” they actually analysed their results (Attride-Stirling, 2001 ). It was of upmost importance to the authors in this current study to employ a clear, replicable, and transparent methodology.

Braun and Clarke ( 2006 ) outline a series of phases through which researchers must pass in order to produce a thematic analysis. This procedure allows a clear demarcation of thematic analysis, providing researchers with a well-defined explanation of what it is and how it is carried out whilst maintaining the “flexibility” tied to its epistemological position. The authors in this paper take a position that acknowledges our desire to incorporate the individual experiences of the participants and the meanings they attach to them. However, we also wish to consider the impact of the wider social context on these meanings. Braun and Clarke describe such a position as “contextualist,” sitting firmly between essentialism or realism and constuctionism. Not all theorists describe these two poles of epistemological outlook in the same way; Madill et al. ( 2000 ) refers to them as “naive realist” and “radical relativist.” Methodologies that go hand in hand with this mid-ground position are typically phenomenological in nature, but the flexibility of thematic analysis means that it can also be underpinned by an “in-between” epistemological position. Willig ( 2008 , p. 13) summarises this by explaining a position that argues “while experience is always the product of interpretation and, therefore, constructed (and flexible) … it is nevertheless ‘real’ to the person who is having the experience.” We wish to consider the reality of obesity to the participants, through an exploration of their experiences and the meanings they attach to them, whilst incorporating the broader role society plays in contributing to and shaping the participants meaning making and subsequent understandings.

Participants

Twelve participants were selected through liaising with the school and class teachers, this was particularly important considering the sensitive nature of the research topic and the fact that the participants taking part in this study were children—a vulnerable group. Measures were taken to prevent any of the participants feeling stigmatised. Therefore, under the guidance of the class teachers, the participants approached to take part in the study were carefully selected to ensure no children who may have been made to feel uncomfortable by the discussion were included, and to make sure that the children selected to be in the same focus groups were comfortable with each other. Six (three boys and three girls) were selected from two school years; Reception, aged between 4 and 5 years and Year 6 aged between 10 and 11. The motivation for selecting these age groups was that government statistics relating to childhood obesity are published for these two age brackets. These age groups are viewed as critical points in measuring children's BMI and in monitoring their changing health status. Through looking at these age groups, it may help us to gain an insight into what understandings children arrive at school with (primarily shaped by their experiences set within a home environment) and those that they have later on in their school life when further social influence (school and peers) may play a role in shaping their understandings. Efforts were made to make the sample representative of ethnicities attending the school so a proportionate number of children of Bangladeshi and Afro-Caribbean heritage took part. Participants were not recruited on account of their BMI or weight status. The parents of the children were provided with a study information letter and, in addition, received a phone call from the school's community liaison officer to ensure that parents fully understood the nature of the study because the researcher was aware that for some parents English was not their first language. The phone calls were made in their mother tongue thus allowing the parents to sign the parental assent form with all their queries being answered. Participants were also asked for their verbal consent on the day prior to the study taking place.

The study had received ethical approval from Northumbria University's School of Psychology and Sports Science Ethics Board prior to commencing. The researcher had also been approved by means of an enhanced criminal records background check clearing her to work with children; this approval was required by both the school and the university.

The focus groups all took place in the same quiet room at the school and were conducted by the principal investigator (referred to herein as the researcher). On arrival, the researcher introduced herself and provided name badges for the participants. The researcher briefly explained to the participants that she was there to talk to them about food and exercise. The researcher also explained to the participants that she wanted them to assume that she knew nothing, they were not being tested, and she was only interested in hearing what they had to say—not whether they were right or wrong. Verbal instructions were provided to the participants and they provided verbal assent prior to the recording commencing. A series of questions were developed by the research team, these were designed to keep the focus group sessions on track whilst exploring issues relevant to the research question. The sessions started initially with a discussion centred on the replica food items laid out on the table. Participants were asked to use the replica food and pick out healthy foods and make what they thought would be a healthy lunch. They were asked to explain why it was healthy and what made it healthy. Participants were then asked about foods they liked and why they liked them. In addition, they were asked about the sorts of things they normally ate at home and in school and things they liked to eat. Once conversation had dwindled concerning the replica food the researcher introduced the laminated picture cards, and the discussion moved to physical activity with the researcher encouraging the participants to explore the relationship between diet and exercise. Questions focussed on what activities they thought were healthy (as the images depicted activities that were both physical and sedentary; that is, one image of somebody running another of somebody playing computer games). The participants were asked about what sorts of activities they liked doing and what made those activities good for them. They were asked what activities they regularly engaged with, the sorts of sports their parents and siblings took part in, and the activities they did as families. The themes of discussion were encouraged around the two elements pertinent to any strategy looking to reduce obesity: healthy eating and physical activity. Furthermore, questions also probed at what the participants thought the benefits were of following a healthy lifestyle and what the consequences were of not following one. They were also asked what advice they would give somebody who wanted to be healthier and how important it was to them to be healthy. The focus group guide was intended to provide a structure but not rigidly dictate the line of questioning. The researcher included prompts and encouraged participants to expand on their initial responses and followed up on notions that the participants raised themselves. The sessions on the first day lasted between 20 and 30 min, ending when the participants input was insufficient to continue. At the end of each session the researcher read out the participant debrief and provided each participant with a parental debrief information sheet to take home.

In order to strengthen the analysis process and gather the most appropriate data, the researchers reviewed the recording made on the first day and reflected on the procedures employed in the focus groups. Similar approaches of reviewing data to informing further data collection are used in methods such as grounded theory and it was felt that doing so would strengthen the current study. The decision was made not to use the props (replica food and cards) used on the first day in the second round of focus groups, as at times they had proved to be a distraction to the participants. As an alternative, Reception children were given colouring pens and paper to focus their attention. Year 6 focus groups were run again allowing for free discussion, following on from issues and understandings they had raised in the initial session. The second round of focus groups, other than the changes already detailed above, followed the same sequence as they had on day one and lasted around 30 min. The recordings were transcribed combining the recordings from both days creating four transcripts, one for each group.

Data analysis

The data collected from all the focus groups was transcribed by the principal investigator, during this process the initial thoughts and ideas were noted down as this is considered an essential stage in analysis (Riessman, 1993 ). The transcribed data was then read and re-read several times and, in addition, the recordings were listened to several times to ensure the accuracy of the transcription. This process of “repeated reading” (Braun & Clarke, 2006 ) and the use of the recordings to listen to the data, results in data immersion and refers to the researcher's closeness with the data. Following on from this initial stage and building on the notes and ideas generated through transcription and data immersion is the coding phase. These codes identified features of the data that the researcher considered pertinent to the research question. Furthermore, as is intrinsic to the method, the whole data set was given equal attention so that full consideration could be given to repeated patterns within the data. The third stage involved searching for themes; these explained larger sections of the data by combining different codes that may have been very similar or may have been considered the same aspect within the data. All initial codes relevant to the research question were incorporated into a theme. Braun and Clarke (2006) also suggest the development of thematic maps to aid the generation of themes. These helped the researchers to visualise and consider the links and relationships between themes. At this point any themes that did not have enough data to support them or were too diverse were discarded. This refinement of the themes took place on two levels, primarily with the coded data ensuring they formed a coherent pattern, secondly once a coherent pattern was formed the themes were considered in relation to the data set as a whole. This ensured the themes accurately reflected what was evident in the data set as a whole (Braun & Clarke, 2006 ). Further coding also took place at this stage to ensure no codes had been missed in the earlier stages. Once a clear idea of the various themes and how they fitted together emerged, analysis moved to phase five. This involves defining and naming the themes, each theme needs to be clearly defined and accompanied by a detailed analysis. Considerations were made not only of the story told within individual themes but how these related to the overall story that was evident within the data. In addition, it was highly important to develop short but punchy names that conveyed an immediate indication of the essence of the theme. The final stage or the report production involved choosing examples of transcript to illustrate elements of the themes. These extracts clearly identified issues within the theme and presented a lucid example of the point being made.

The thematic analysis process that was applied to the transcripts elicited key concepts that were evident in the data. These themes are viewed as essential in determining the understandings of all the participants. These categories have been labelled as “Knowledge through Education,” “Role Models,” “Fat is Bad,” and “Mixed Messages.” There are of course aspects of the participants’ understandings that overlap across these categories. This, however, should be viewed as a good interpretation of understandings and attitudes in general, which are never made up of isolated concepts but are all relative to each other.

Knowledge through education

This theme is defined by the ability of all the participants to understand the roles of diet and physical activity. This is, in part, likely to be defined by different levels of education that the two age groups represented have, but nothing conclusive can be drawn given the relatively small sample size. The impact of their education on their knowledge will be demonstrated through evidence from the transcript.

All participants in the reception age group expressed the ability to name and identify different food items from the replica food. When they were asked to prepare a healthy lunch from the food items, they were able to point out food that would typically be classified as healthy.

I: No none of it is real! So what have you put in your healthy lunches girls? You tell me what you have got. *: Apple, I've got pasta, egg, cracker, grapes, bun and cheese. Girls reception Open in a separate window

However, despite displaying that they “know” what healthy means there is evidence of confusion, and it would seem the concept of something being “good” for them is interpreted to be things they like to eat. This suggests that they don't yet fully understand the concept of “healthy” food.

I: And why's rice healthy? *: Because it's nice. I: What healthy food do you eat? *: Chips Boys reception Open in a separate window

Their definition of healthy is centred on food they believe will make them grow for which fruit is highlighted as being particularly important. However, they also attribute this property to the food that makes up their personal diets. This understanding might result from being told to eat so they grow up to be big and strong. It is important to consider younger children's understandings are likely to be primarily shaped by their home environment, where the emphasis is often on how much children are eating as opposed to what they are eating.

I: Why is a banana important? *: Because it makes you strong so you can grow you have to have fruit so you can grow. I: Can you tell me then girls, we have found all these things that are good, as an example can you tell me, sausage, why is sausage good? *: Because it makes you feel strong. Girls reception Open in a separate window

This understanding of the reception-aged girls represented in this study of eating so they can grow up to be strong is also evident with the boys in the same age group. However, the reception boys also place great importance on the necessity of exercise to develop strength, this demonstrates another aspect in their knowledge.

I: What about this one here, swimming, who likes swimming? *: Me *: Me *: Me I: And why is swimming good for you? *: Cos it makes you strong. Boys reception Open in a separate window

It is fair to say Year 6 groups relished the opportunity to express their knowledge. They were able to identify and name different food groups and discuss different types of physical activity; what's more they understand the link between the two in relation to obesity. It seems other influences have impacted on the children's understandings’ such as school and extracurricular groups.

*: This is a banana. I: Ok why's a banana healthy? *: Because it's got seeds inside, because it's a fruit. Girls year 6 Open in a separate window

The ability to identify a particular fruit by one of its universal characteristics shows a deeper level of understanding and suggests that a higher degree of learning. In fact it is explicitly stated that this nutritional knowledge has been gained at school.

I: So do you know the different groups of food like carbohydrates, I heard you say protein and dairy before? *: Done it in science. Girls year 6 Open in a separate window

Moreover, it isn't just a nutritional knowledge they have developed through education. They appear well versed in the concept of a balanced diet and also understand the importance of a balanced lifestyle in relation to physical activity. They are able to articulate the notion of a balanced, healthy lifestyle through a consideration of the consequences of over eating and not exercising.

I: So what happens to you if all you do is you do watch TV and play the computer, eat the food that you told me was the bad food, what would happen to you? *: You would have a miserable life. *: Get fat, teeth will fall out. Girls year 6 Open in a separate window

In the case of the Year 6 boys who took part in this study, it is apparent that although a great deal of their knowledge has come through education at school, other avenues have helped them develop different aspects of their understandings. In this case it seems to be through taking part in activities, typically sport outside of school, or and more uniquely to this group through the influence of their fathers.

*: I would say my dad likes fish so I eat fish loads. *: My dad likes chicken, so he gives me chicken cos after school I do sport, like boxing, he gives me a sandwich with loads of different toppings in cos meats a muscle maker and vegetables is like an energy maker, so if you eat those you will get fitter and healthier. Boys year 6 Open in a separate window

It is evident where the ability exists, or is encouraged, to apply knowledge they have in a context relevant to their own lives, the knowledge becomes embedded in their understandings; it is applicable to them and, therefore, moves from being written on the board in school to being important to their own existence. This is exhibited by those participants, in particular the boys who participated, who have an involvement in sport. Having a motivation to understand nutrition and exercise leads to a desire to apply it because they comprehend the potential benefits. This aspect within the initial theme of knowledge through education leads directly on to the next theme of role models. The key difference between these two themes is the first relates to information that is directly and intentionally meant to inform the children about healthy lifestyles in an institutional setting, while the second theme is typified by understandings that are formed through interactions with other people.

Role models

The application of knowledge gained through education is often facilitated by role models such as family members who reiterate this information through example. Role models play an important role in the concepts described by all the groups, for example, the older boys reported that their fathers helped encourage healthy behaviours, above and beyond the nutritional knowledge in the previous theme.

*: Like sometimes on an afternoon my dad goes to the gym, then there is these tracks outside, and I practice every day on my 100 meter sprint and I can do it in 12 seconds, and when I started doing it I was 21 second, so I keep practicing. Boys year 6 Open in a separate window

This demonstrates some of the participants’ understandings have developed by examples set for them by significant individuals in their lives. This is evident in the younger children's understandings in a less explicit manner; the example below demonstrates good health behaviours can be established through everyday behaviour exhibited by role models.

I: What about this one, walking to school? … Why is it good for you? *: Because me and my mam walk to school and its good. Girls reception Open in a separate window

There is some evidence that examples set to the girls who took part in this study, at home and by other role models, can encourage behaviours or ideals that are not beneficial to the girls health. Girls appear to look up to older female family members who aspire to be skinny.

*: I like to be skinny, my nana does as well, and she wants to be skinny because she's fat now but I still love her. Girls reception Open in a separate window

They also appear to have developed unrealistic ideas about weight loss and the consequences in terms of treatment. Viewing hospital treatment as a solution to obesity, demonstrates a lack of understanding about the role of lifestyle behaviours in the condition. This may also suggest that these participants don't appreciate the importance of lifestyle behaviours in the onset of obesity.

*: Guess what, I seen this film right the boy was fat right, his legs was right down to the bottom, he had a fat tummy, I was hiding cos I hated him, he was horrible, he will have to go to hospital, he was fat. Girls reception I: So what would you tell somebody if you pretend that I was really, really fat, what would you tell me to do. *: Go to the doctors … hospital, operation. Girls reception Open in a separate window

There was some evidence that the older girls in this study had a more balanced outlook on what sort of body shape was healthiest, because they were aware of the negative health consequences associated with being underweight. It is interesting, however, that they are aware that maintaining a healthy lifestyle may be a challenge and this may result in a barrier to adopting healthier practices.

I: What about the other end of the scale, you know if you've got overweight being fat on this side what about being underweight at this end? *: It's bad cos you're all bony and you can't do anything cos you're not strong enough, you're weak. *: So you need to be in the middle. I: Is it easy to stay in the middle? *: No, because sometimes you can't be bothered to eat well and exercise. Girls year 6 Open in a separate window

Within the theme of role models, there was some evidence of a difference between the genders in terms of available role models. The participating boys often cited football heroes as people whom they looked up to and aspired to be like. This highlights the role of the celebrity in providing a role model for today's children; the evidence from the participants in this study may suggest that typically boys look to footballers and other sporting heroes. It can be argued that such individuals do not always provide a strong moral code; they are seen as following a healthy lifestyle in terms of diet and exercise. It would seem that the female participants in this study often looked up to celebrities who weren't so explicitly seen to be following healthy lifestyles, or a sense of caution was attached to following healthier behaviours.

*: Yeah like Wayne Rooney. I: And why is he fit? *: Cos he's good at footballing. I: Do you think that they have to eat special food? *: Yes I: And what special food do they have to eat? *: Bananas and apples. Boys reception *: Actually you can put weight on running cos muscle weighs more than fat so you can put weight on—like Katie Price she put on 10 pounds cos she started running. Girls year 6 Open in a separate window

Another interesting aspect of the notion of role models’ is that the girls were more concerned with how they appeared in a physical sense; it was particularly striking that the Year 6 boys identified unhealthy behaviour in their female peers attributing this to a desire to be like models.

*: Yes, she wants to be a model so she starves herself, her mam gives her a big packed lunch and she puts most of it in the bin, she's like that skinny then she walks out of the dinner hall. Boys year 6 Open in a separate window

There were many aspects of the transcript that highlighted participants were aware that being underweight was as worrying as being overweight. However, across the board they were far more critical of individuals who were overweight and discussed wide ranging consequences for these individuals, this leads on to the next theme evident in the analysis.

There was a united consensus that being fat was something to avoid, that it was a bad thing, and had typically negative consequences. Elements of this theme have been demonstrated throughout the discussion of the previous two themes; however, this illustrates how their understanding impacts on their attitudes toward obesity.

*: Like all the fat goes through your blood and stuff. *: Like sugar, like all the sugar goes through your blood if you eat too much of it would clog up your arteries and you might die. Boys year 6 I: Like how? What would happen to you? Is something going to happen straight away or is it something that's going to happen to. *: You would get rotten teeth and you would not be as strong as you would be if you ate healthy and stuff. *: You could die. Girls year 6 *: Because fat would be horrible. *: Because it's bad for you, because it looks bad. *: Because people call you big fat. Girls reception Open in a separate window

In addition to the health issues and those relating to physical attractiveness were the issues of bullying and social exclusion, which seemed to play a big role in the children's understandings of what it would be like to be overweight. The stigma attached to being overweight is evident as participants often started giggling when talking about people being overweight.

I: Is it important to eat things that are good for you? *: Laughter I: What do you think happens to you if you eat lots of these biscuits? *: Fat I: And what good would stop you from getting fat, or would help you not be fat? *: Giggling Boys reception Open in a separate window

Inability to have a successful career and even death were understood to be the results of obesity. Participants felt people who were overweight were in some way bad or an embarrassment. There was even a sense of fear toward people who they considered overweight, indicating that they would avoid being seen with somebody who was obese.

I: So … so what do you think about being fat, like if you see somebody in the street who looks like they are not very healthy do you think? *: They can't do much, like most of the things you want to do in life, like swimming, jogging. *: Jobs when you grow older. Girls year 6 *: Like if my parents were proper massive and I went to the town with them I would just say they took me to the town and I don't know them. Boys year 6 Open in a separate window

It is clear that the participants’ understanding is that obesity is a very negative issue. However, there is also evidence that they understand the complexity of the condition and are also aware being underweight maybe as much of a problem. The older children in this study seemed to understand that it is a complex issue and fully grasped the concept of moderation. They often refer to the fact that you can have a small amount of things that maybe classified as unhealthy, as long as you don't eat them all the time or balance them out with exercise.

I: And what sort of things for eating well? *: Like fruit and vegetables. *: Some Sugar. *: If you eat vegetables and fruit and you might get back to underweight. *: And you want to be in the middle. *: You need a bit of fat on you. Girls year 6 Open in a separate window

This category of Fat is Bad highlights an issue that clouds all the children's understandings of issues surrounding obesity and that is of conflicting messages. This notion of mixed messages forms the final theme evident in the data.

Mixed messages

The evidence presented here would suggest the information intended to educate and inform children is often met with equal amounts of contradictory or confusing messages and behaviours. The result of this is easily demonstrated by comparing what the children know they should be doing with what they actually talk about doing. For the majority of the participants their knowledge did not always match with their described behaviour, their food preferences often overriding their knowledge. This was perhaps not so surprising; knowledge does not by any means dictate behaviour.

I: Do you have breakfast most mornings? Do you normally have some breakfast, what do you normally have for breakfast? *: Miss I have chocolate cookies. I: What did you have for your tea last night? *: I just had for my supper. I: What did you have last night for your supper? *: Err sandwiches, cake and I: What about what did you have last night for your tea? *: Pizza Girls reception I: You eat two, two pieces of fruit? *: Yes, cos my mam chops it into two halves. Boys reception Open in a separate window

Conflict existed in a number of forms in the understandings expressed by the participants. It is worth reiterating that the younger girls who participated believed treatment for obesity was to go to the hospital and have an operation—something they have picked up from a TV documentary—this conflicts with diet and exercise education they receive at school. Other participants gave more specific and direct examples of receiving contradictory information. This ranged from conflicts in direct health messages to conflicting information and action between school and home. They felt that at times it was difficult to know which information was the right information, not only was it conflicting but it was forever changing.

*: And people say if you make fruit smoothies its healthy for you but it said in the news something about being obese again it said that if you drink a smoothie one a day you'll put on 13 pounds, that's nearly a stone in a year. Boys year 6 I: What about at home? You know if you're taught all this stuff at school what happens when you go home? Do Mum and Dad teach you the same things or is it different? *: Different I: And why is it different? *: I eat more sweets. Girls year 6 Open in a separate window

In addition to this, older children also pointed out they felt that healthy lifestyle information wasn't always delivered in the correct manner, there was a belief that stigmatising people who were overweight was negative. There was an awareness that there is a psychological aspect to overeating, and in some individuals it is this that needs to be addressed. Moreover, there was a feeling again demonstrated solely by the older participants that being overweight/obese could be difficult to rectify and maintaining a healthy weight could be a challenge.

*: So you need to be in the middle. I: Is it easy to stay in the middle? *: No, because sometimes you can't be bothered to eat well and exercise. Girls year 6 I: Do you think it's quite easy to lose weight? *: Yes *: Well for some people. *: If you put your mind to it, it is. I: No go on cos everyone's got different ideas. *: You can't just lose weight quickly. *: Cos my dad when he was young he was obese so he told me, but he's sort of addicted really. *: Addicted to what. *: Addicted he cannot stop but he's trying. *: He cannot stop what. *: Eating when he was young, he like learnt now he's saying to me about being fit cos he tells me about what happened when he was young so I try it. Boys year 6 Open in a separate window

This understanding of the complex nature of the obesity problem, coupled with the confusion and conflict in both the information and behaviours the participants are exposed to, can help explain some of the barriers to individuals adopting a healthier lifestyle.

Comprehensive understanding

The results detailed above highlight some important findings as to how children understand obesity in terms of some of its causes and consequences. It was particularly clear that knowledge, often imparted in a school setting, is getting through to the children who participated in this study. However, it appears equally evident that this knowledge in many cases does not transfer to behaviour. Further examination of the results allows us to explore the potential reasons behind the knowledge-behaviour gap.

Role models by their nature provide examples for both the children's beliefs and their behaviour. There are a wide variety of potential role models for children from parents, teachers, peers, and celebrities. What seems particularly important, in terms of being a positive role model with regards to healthy lifestyles, is that children have an opportunity to view the process of being healthy. In this study, this was typified by the examples of the Year 6 boys who participated in sport with their fathers. It appears this close and active relationship allows the knowledge that has been started at school to grow. Allowing children the opportunity to apply their knowledge and see the steps taken by a role model to get or stay fit help translate this knowledge into behaviour. What is interesting, however, is that it seems passive behaviours by role models can have the same impact. It was the case with these participants that the effect of passive knowledge transfer seemed to be more negative, but that is by no means to say that passive behaviours by role models will not also encourage positive lifestyle behaviours in other cases. The most obvious example of this within this data set was the seemingly implicit messages that the girls received about being skinny. There was not an overtly explicit attempt on the behalf of the role models described here to encourage a “skinny” ideal; however, messages seemed to reach the participants that would indicate this is the case. The key difference between these active and passive role models appears to come from whether the role models place focus on the process; taking part in sport (in the example of the older boys) or outcome being skinny (in the example of the girls). Focus on the action of being physically active or enjoying a healthy diet in the case of these participants produces a healthier outlook on maintaining a healthy body weight. When that focus is on the outcome—the weight loss or the weight gain—there seems to be less concern for actually “being healthy” in terms of body weight and lifestyle. This notion about process and outcome is intrinsically linked to the theme of Fat is Bad.

It is interesting to note that whilst the children expressed an understanding of fat as a component of diet and were able to identify high fat foods and their link to obesity, the focus was on fat as an outcome and not so much about it as input. It is a well-documented fact that fat is a requirement of a balanced diet. The participants were able to recite in great detail the consequences of becoming fat but were not so forthright about the processes involved in becoming fat. It can be suggested that by focussing on the process of becoming fat and understanding the need for fat in moderation and being physically active it may help to discourage fat becoming the output. This may also help to draw away the focus from physical appearance that is so closely tied to the stigma attached to being overweight and place it on living a healthy lifestyle and being healthy.

The key finding of this study is that it is evident that children receive contradictory messages when it comes to following a healthy diet and taking part in exercise. The research presented here highlights children's understandings of some of the causes of obesity and the consequences of becoming overweight. However, it is equally evident that this information has reached them on a knowledge level but has not or cannot be fully translated into behaviour. It appears that central to this problem are the multiple discourses that exist around diet and exercise. Whilst government campaigns may impart facts and figures and provide advice on changes that can be made, there are a whole host of other sources to contend with. There is an undoubted role played by the media both in terms of active advertising campaigns for junk food or sedentary games and the passive portrayal of unattainable body shapes and sizes in magazines and by celebrity culture. However, more than this, health messages are competing against a variety of cultural values, social, and personal norms that may well go against messages that encourage certain behaviours. What is more is that ultimately individuals have the power and autonomy to make their own choices about diet and exercise. Stakeholders need to ensure that people are in a position to make an informed decision and not one where their judgement is clouded by an array of contradicting messages. There is also a responsibility to ensure that individuals are able to act on advice given and to provide advice that is relevant and tailored to individual circumstances. It is easy to understand why parents on a low income may struggle to incorporate “5 a day” into their families diets when they perhaps don't have access to a car and the nearest shop selling fresh fruit and vegetables is several miles away. Ensuring people know that frozen fruits and vegetables are just as good and, in some cases better, is a far more useful and usable message.

Comparisons with past research

The objective of this study was to explore children's understandings of obesity in terms of diet and physical activity; the children included were considered high risk because of their socio-economic status. To meet this objective, focus group data was analysed using thematic analysis. This analysis produced key themes pertaining to the understandings of the participants. There is not a wealth of prior research in this domain and it was for this reason thematic analysis was chosen to analyse the data. The method proved to be particularly useful in generating these exploratory data that are discussed here in relation to previous findings.

The theme of knowledge has previously been identified by Hesketh et al. ( 2005 ) in terms of information and awareness that is pertinent to children's perceptions of healthy eating, activity, and preventing obesity. Increasing knowledge relating to diet and physical activity cannot prevent obesity but it can encourage children to make informed choices.

This study, as have others (Hesketh et al., 2005 ; Borra et al., 2003 ; Musaiger, Mater, Alekri, & Mahdi, 1991 ), identified misunderstandings in children's knowledge as barriers to healthful behaviour. It might be useful to address this issue, particularly with younger children who are developing their knowledge. Previous literature has identified young children often consume their recommended daily intake of fruit but fall well short when it comes to vegetables (Dennison, Rockwell, & Baker, 1998 ). Government campaigns encourage people to eat five portions of fruit and vegetables a day ( www.5aday.nhs.co.uk ); however, nutritionists would encourage three portions of vegetables and two of fruit—fruit having high sugar content. There was no evidence in the transcripts that any of the children were aware of or understood this distinction. This needs further investigation; however, education should encourage an understanding of fruit and vegetables as separate entities to help increase the consumption of vegetables (Gibson, Wardle, & Watts, 1998 ).

The evidence in this study suggests children grasp the causes of obesity, overeating, and low levels of physical activity; however, there was a general lack of understanding of the underlying physiological processes. There was a limited understanding of the concept of energy balance or that there might also be medical reasons for the obesity. Bell and Morgan ( 2000 ) demonstrated providing medical explanations for obesity can have a positive effect on children's attitudes to obese individuals. Overweight individuals were generally stigmatised by the participants in this study, so providing better medical information could help to alleviate these negative attitudes. It is fair to say those children who did have more in-depth knowledge of obesity were more sympathetic in their considerations of overweight individuals acknowledging the difficulty in making lifestyle changes.

The influence of parents concerning diet and exercise behaviours is well documented (Prout, 1996 ). Hesketh et al. (2005), Borra et al. (2003), and Young-Hyman et al. ( 2000 ) consider parental influence to be a determining factor in children's attitudes and understandings of obesity. It is clear this influence can be as detrimental as it can be beneficial. Previous research (Borra et al., 2003 ) argues interventions need to be developed that consider the role of the parent. Children cannot be expected to apply the information they receive at school to themselves if it is not reiterated at home. Nutritional education and physical education have not formed a core or extensive part of school curriculums in the United Kingdom in previous years, and there is now a generation of young parents who do not have the skills to attractively present appropriate foods (Tuttle & Truswell, 2002 ) or who regularly take part in sport themselves. The impact of this on their children's behaviour is that they don't always have examples of healthy behaviour to model their own on.

Of particular importance was the finding that children feel that they often receive mixed and contradicting messages. This is of great relevance when considering the development of policies and strategies that can be more effective. More over this backs up the findings of Dorey and McCool ( 2009 ) who conclude that nutritional messages evident in health promotion and advertising were often perceived by child audiences to be ambiguous. The authors warn that these contradictory messages could potentially serve to weaken the trustworthiness viewers have in health promotion initiatives. This really points to a key area in which health professionals can target efforts to tackle obesity. Clarity and consistency in healthy messages and recommendations are central to helping people take on board and act on the information they receive. Contradiction allows room for people to question the advice given and when effort is required to make a change in behaviour that change is less likely to be made if there is reason to doubt the accuracy of information. Furthermore, coherent messages need to consider person specific factors that may inhibit behaviour change; when individuals are encouraged to behave in a certain way but the constraints of day-to-day life lead to another, the results are confusion and hostility to the initial message (Owens & Driffill, 2008 ).

Procedural issues

The main methodological issue arising was participants from Reception struggled to engage fully in conversation, and the sessions followed a structure more a kin to an interview (i.e., question and answer). It was difficult to encourage responses that were longer than a few words; often one word responses were given. There is the potential to gain some very useful information from children in this age group; however, it can be a long and time-consuming process to elicit enough information to make the analysis process worthwhile. The length of the sessions also must be kept relatively short because attention spans are not long lasting; this was a finding similar to that of Miller ( 2000 ). The replica food items selected to help provide structure to the focus groups were useful and did provide a catalyst for discussion; however, for very young children (i.e., those in Reception) they resemble toys too closely, this then leads to them becoming more of a distraction, hindering the discussion. The use of the picture cards and pens and paper as suggested by Backett and Alexander ( 1991 ) provided a more a suitable means of structuring focus groups for young children.

There were at times issues with certain members of the groups making themselves heard more than others, thus the researcher had to encourage those happier to sit back and let others take the lead (Kirk, 2007 ). However, through a little encouragement all participants appeared comfortable talking with each other and participated equally, a result of the careful selection process. It also appeared to be beneficial speaking to boys and girls separately, with the boys often more excitable in their discussion style in comparison to the girls. It also facilitated the identification of some important issues, for example, the Year 6 boys identified eating behaviours present in the Year 6 girls that the girls themselves did not discuss.

Implications for the future

The Foresight Report (Department of Innovation Universities and Skills, 2007 ), in tackling obesity, points out that current policies are failing because they do not provide the depth and range of interventions needed. This present study has determined that central to children's understandings of the causes and consequences of obesity are the concepts of knowledge, the opportunity to apply this knowledge to their own lives, and the existence of role models to set an example. There exist certain myths and misconceptions that need to be addressed and children need to believe they can trust the health messages they receive because they are aware some messages are misleading or forever changing.

The key to this issue seems to be children learn by example, they can have all the knowledge in the world provided to them through an institution such as a school but this information needs to be supported by life at home. This provides evidence that campaigns need to target parents to tackle childhood obesity; this is an issue that policy makers are already aware of ( National Institute for Health and Clinical Excellence, 2006 ). However, this means health messages delivered to the general public need to be clearer and avoid ambiguity. There needs to be careful considerations of the context in which health messages are received, taking into account the understandings of the target population (Hesketh et al., 2005).

There were some issues raised in the focus group that were beyond the scope of this particular study. There was a representation of different ethnic minorities in the groups, and slight differences in the understandings of these different groups were identified. Further research should investigate the understandings of different minority groups to see if ethnicity influences or results in divergent concepts. Future study also needs to look at strategies that enable children to apply healthy lifestyle information to their own lives.

Children spend, on average, a quarter of their waking lives in schools; therefore, schools can be seen as an effective environment and source to help encourage healthy lifestyles. However, that leaves three quarters of a child's time in which they are out of the control of the school environment. Strategies must be developed to unite the teaching at school with practices in the home. This supports the conclusions of Hughes, Sherman, and Whitaker ( 2010 ) who write that strategies need to be framed in a manner that makes low income mothers feel more supported in addressing issues their children may have with their weight. Ensuring that approaches to encourage healthy lives take on a holistic format will also help to provide consistent and realistic role models. There needs to be a concerted effort from within society to develop role models who have a healthy relationship with food and exercise. These seem to already exist for young boys in the form of sporting heroes but seem in short supply for young girls who already consider that being healthy is the ideal but then look to surgery as a form of weight loss. Lieberman, Gauvin, Bukowski, and White ( 2001 ) highlight the importance of role models and peer influence in the onset of disordered eating in young girls and this needs to be seriously taken into account when sending out messages that being overweight is bad, girls need to be aware that being underweight also has severe health consequences.

In conclusion, the time children spend eating and taking part in physical activity out of school is likely to be the biggest challenge to preventing the continuing obesity problems in the United Kingdom, and this is where current strategies appear to be failing. Children understand obesity and its contributing factors in terms set out to them by those people they consider role models. It is only by helping these role models to provide consistent and reliable information by setting suitable active examples and by being aware of the impact of their passive actions that we can begin to address the problem of obesity.

Acknowledgements

The authors would like to thank Sunderland Children's Centres and Back on the Map for their support in facilitating this research.

Conflict of interest and funding

The author have not received any funding or benefits from industry or elsewhere to conduct this study

  • Alexander M. A., Sherman J. B. Factors associated with obesity in school children. Journal of School Nursing. 1991; 7 :6–10. [ PubMed ] [ Google Scholar ]
  • Attride-Stirling J. Thematic networks: An analytical tool for qualitative research. Qualitative Research. 2001; 1 :385–405. [ Google Scholar ]
  • Backett K., Alexander H. Talking to young children about health: Methods and findings. Health Education Journal. 1991; 50 (1):34–38. [ Google Scholar ]
  • Bell S. K., Morgan S. M. Children's attitudes and behavioural intentions towards a peer presented as obese: Does a medical explanation for the obesity make a difference. Journal of Paediatric Psychology. 2000; 25 (3):137–145. [ PubMed ] [ Google Scholar ]
  • Borra S. T., Kelly L., Shirreffs M. B., Neville K., Geiger C. J. Developing health messages: Qualitative studies with children, parents, and teachers help identify communications opportunities for healthful lifestyles and the prevention of obesity. Journal of the American Dietetic Association. 2003; 103 (6):721–728. [ PubMed ] [ Google Scholar ]
  • Braun V., Clarke V. Using thematic analysis in psychology. Qualitative Research in Psychology. 2006; 3 (2):77–101. [ Google Scholar ]
  • Breat C., Mervielde I., Vandereycken W. Psychological aspects of childhood obesity: A controlled study in a clinical and non clinical sample. Journal of Paediatric Psychiatry. 1997; 22 (1):59–71. [ PubMed ] [ Google Scholar ]
  • Brunt H., Lester N., Davies G., Williams R. Childhood overweight and obesity: Is the gap closing the wrong way? Journal of Public Health. 2008; 30 (2):145–152. [ PubMed ] [ Google Scholar ]
  • Chamberlin L. A., Sherman S. N., Jain A., Powers S. W., Whitaker R. C. The challenge of preventing and treating obesity in low-income, preschool children: Perceptions of WIC Health Care Professionals. Archives of Pediatrics & Adolescent Medicine. 2002; 156 :662–668. [ PubMed ] [ Google Scholar ]
  • Dennison B. A., Rockwell H. L., Baker S. L. Fruit and vegetable intake in young children. Journal of the American College of Nutrition. 1998; 17 (4):371–378. [ PubMed ] [ Google Scholar ]
  • Department of Health. Listening, hearing and responding: Department of Health Action Plan—Core principles for the involvement of young people. London: Author; 2002. [ Google Scholar ]
  • Department of Health and Department for Children, Schools and Families. National Childhood Measurement Programme: results from the 2006/07 school year. The Information Centre. 2008. Retrived April 10, 2008, from http://www.ic.nhs.uk/pubs/ncmp0607 .
  • Department of Health and Department for Education and Skills. The national service framework for children, young people and maternity services (executive summary) London: Author; 2004. [ Google Scholar ]
  • Department of Innovation Universities and Skills. Foresight-tackling obesities: Future choices and project report. London: Government Office for Science; 2007. [ Google Scholar ]
  • Dorey E., McCool J. The role of the media in influencing children's nutritional perceptions. Qualitative Health Research. 2009; 19 (5):645–654. [ PubMed ] [ Google Scholar ]
  • Gibson E. L., Wardle J., Watts C. J. Fruit and vegetable consumption, nutritional knowledge and beliefs in mothers and children. Appetite. 1998; 31 :205–228. [ PubMed ] [ Google Scholar ]
  • Hesketh K., Water E., Green J., Salmon L., Williams J. Healthy eating, activity and obesity prevention: A qualitative study of parent and child perceptions in Australia. Health Promotion International. 2005; 20 (1):19–26. [ PubMed ] [ Google Scholar ]
  • Hill A. J., Silver E. K. Fat, friendless and unhealthy: 9-year old children's perception of bodyshape stereotypes. International Journal of Obesity and Related Metabolic Disorders. 1995; 19 :423–430. [ PubMed ] [ Google Scholar ]
  • Hughes C. C., Sherman S., Whitaker R. How low-income mothers with overweight preschool children make sense of obesity. Qualitative Health Research. 2010; 20 :465–478. [ PubMed ] [ Google Scholar ]
  • Kirk S. Methodological and ethical issues in conducting qualitative research with children and young people: A literature review. International Journal of Nursing Studies. 2007; 44 :1250–1260. [ PubMed ] [ Google Scholar ]
  • Knowler W. C., Pettitt D. J., Saad M. F. Obesity in Pime Indians: Its magnitude and relationship with diabetes. American Journal of Clinical Nutrition. 1991; 53 :15435–15515. [ PubMed ] [ Google Scholar ]
  • Lieberman M., Gauvin L., Bukowski W., White D. Interpersonal influence and disordered eating behaviours in adolescent girls. The role of peer modelling, social reinforcement, and body-related teasing. Eating Behaviour. 2001; 2 :215–236. [ PubMed ] [ Google Scholar ]
  • Madill A., Jordan A., Shiley C. Objectivity and reliability in qualitative analysis: realist, contextualist and radical constructionist epistemologies. British Journal of Psychology. 2000; 91 :1–20. [ PubMed ] [ Google Scholar ]
  • Mauthner M. Methodological aspects of collecting data from children: Lessons from three research projects. Children and Society. 1997; 11 :16–28. [ Google Scholar ]
  • McCormick B., Stone I. Economic costs of obesity and the case for government intervention. Obesity Reviews. 2007; 8 (1):161–164. [ PubMed ] [ Google Scholar ]
  • Miller S. Researching children: Issues arising from a phenomenological study with children who have diabetes mellitus. Journal of Advanced Nursing. 2000; 31 (5):1228–1234. [ PubMed ] [ Google Scholar ]
  • Musaiger A. O., Mater A. M., Alekri S. A., Mahdi A. E. Knowledge and attitudes of Bahraini adolescents towards obesity. Journal of Consumer Studies and Home Economics. 1991; 15 :321–325. [ Google Scholar ]
  • National Institute for Health and Clinical Excellence. Obesity guidance on the prevention, identification, assessment and management of overweight and obesity in adults and children. 2006. Clinical Guidelines. [ PubMed ] [ Google Scholar ]
  • Oliver K. K., Thelen M. H. Children's perceptions of peer influence on eating concerns. Behavior Therapy. 1996; 27 :25–39. [ Google Scholar ]
  • Owens S., Driffill L. How to change attitudes and behaviour in the context of energy. Energy Policy. 2008; 36 :4412–4418. [ Google Scholar ]
  • Prentice A. M., Jebb S. A. Obesity in Britain: Gluttony or sloth? British Medical Journal. 1995; 11 :437–439. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Prout A. Families, cultural bias and health promotion. London: Health Education Authority; 1996. [ Google Scholar ]
  • Renehan A. G., Tyson M., Egger M., Heller R. F., Zwahlen M. Body-mass index and incidence of cancer: A systematic review and meta-analysis of prospective observational studies. Lancet. 2008; 371 :569–578. [ PubMed ] [ Google Scholar ]
  • Riessman C. K. Narrative analysis. London: Sage; 1993. [ Google Scholar ]
  • Statistics on Obesity, Physical Activity and Diet England. The NHS Information Centre. 2006. Retrieved October 2010 from http://www.ic.nhs.uk/statistics-and-data-collections/health-and-lifestyles/obesity/statistics-on-obesity-physical-activity-and-diet-england-2006 .
  • The Health Survey for England. Body mass index (BMI) by gender, updated tables including 2008 data. The NHS Information Centre. 2009. Retrieved March 2010 from http://www.ic.nhs.uk/statistics-and-data-collections/health-and-lifestyles-r elated-surveys/health-survey-for-england .
  • Tuttle C., Truswell S. Childhood and adolescence. In: Mann J., Truswell S., editors. Essentials of human nutrition. Oxford: Oxford University Press; 2002. [ Google Scholar ]
  • Willig C. Introducing qualitative research in psychology. 2nd ed. England: OUP; 2008. [ Google Scholar ]
  • Young-Hyman D., Herman L. J., Scott D. L., Schlundt D. G. Care giver perception of children's obesity-related health risk: A study of African American families. Obesity Research. 2000; 8 :241–248. [ PubMed ] [ Google Scholar ]

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings
  • My Bibliography
  • Collections
  • Citation manager

Save citation to file

Email citation, add to collections.

  • Create a new collection
  • Add to an existing collection

Add to My Bibliography

Your saved search, create a file for external citation management software, your rss feed.

  • Search in PubMed
  • Search in NLM Catalog
  • Add to Search

Scientific models for qualitative research: a textual thematic analysis coding system - part 2

Affiliations.

  • 1 Forensic Mental Health Research Unit, Middelfart, Faculty of Health Science, Department of Regional Health Research, University of Southern Denmark, Denmark.
  • 2 University of Newcastle School of Nursing and Midwifery, Callaghan, NSW, Australia.
  • PMID: 37440301
  • DOI: 10.7748/nr.2023.e1893

Background: Models are central to the acquisition and organisation of scientific knowledge. They can be viewed as tools for interpretive description as well as cognitive representations of an empirical phenomenon. However, discussions about how to develop models in qualitative research - particularly in the literature on thematic analysis - are sparse.

Aim: To discuss an approach to scientific qualitative modelling that uses the new technique described in the first part of this article ( Gildberg and Wilson 2023 ): the Empirical Test for Thematic Analysis (ETTA).

Discussion: The authors discuss scientific models and their inherent limitations and strengths, so that others may assess models and their potential.

Conclusion: A limitation of ETTA is the risk that excessive rigour and systematisation could reduce creativity in the construction of models. However, on balance there is a scientific need for qualitative researchers to improve their capability to refine and describe the techniques they use to construct models, adequately explain the reliable generation of models, and improve transparency regarding the epistemological and methodological basis for the construction of models.

Implications for practice: By using ETTA on qualitative data obtained from clinical practice it becomes possible to illuminate the interconnections among themes within the data. This approach not only assists in illustrating these connections, it also enables clinicians and researchers to gain a comprehensive understanding of specific clinical phenomena through the use of models. The process of developing and using these models enables the simulation and strategic intervention development based on data that addresses the specific problem being investigated.

Keywords: data analysis; methodology; qualitative research; research.

© 2023 RCN Publishing Company Ltd. All rights reserved. Not to be copied, transmitted or recorded in any way, in whole or part, without prior permission of the publishers.

PubMed Disclaimer

Conflict of interest statement

None declared

Erratum for

  • Scientific models for qualitative research: a textual thematic analysis coding system - Part 1. Alkier Gildberg F, Wilson R. Alkier Gildberg F, et al. Nurse Res. 2023 Sep 7;31(3):36-42. doi: 10.7748/nr.2023.e1893. Epub 2023 May 31. Nurse Res. 2023. PMID: 37254707

Similar articles

  • Avoiding and identifying errors in health technology assessment models: qualitative study and methodological review. Chilcott J, Tappenden P, Rawdin A, Johnson M, Kaltenthaler E, Paisley S, Papaioannou D, Shippam A. Chilcott J, et al. Health Technol Assess. 2010 May;14(25):iii-iv, ix-xii, 1-107. doi: 10.3310/hta14250. Health Technol Assess. 2010. PMID: 20501062 Review.
  • A research roadmap for complementary and alternative medicine - what we need to know by 2020. Fischer F, Lewith G, Witt CM, Linde K, von Ammon K, Cardini F, Falkenberg T, Fønnebø V, Johannessen H, Reiter B, Uehleke B, Weidenhammer W, Brinkhaus B. Fischer F, et al. Forsch Komplementmed. 2014;21(2):e1-16. doi: 10.1159/000360744. Epub 2014 Mar 24. Forsch Komplementmed. 2014. PMID: 24851850
  • Attempting rigour and replicability in thematic analysis of qualitative research data; a case study of codebook development. Roberts K, Dowell A, Nie JB. Roberts K, et al. BMC Med Res Methodol. 2019 Mar 28;19(1):66. doi: 10.1186/s12874-019-0707-y. BMC Med Res Methodol. 2019. PMID: 30922220 Free PMC article.
  • Factors that impact on the use of mechanical ventilation weaning protocols in critically ill adults and children: a qualitative evidence-synthesis. Jordan J, Rose L, Dainty KN, Noyes J, Blackwood B. Jordan J, et al. Cochrane Database Syst Rev. 2016 Oct 4;10(10):CD011812. doi: 10.1002/14651858.CD011812.pub2. Cochrane Database Syst Rev. 2016. PMID: 27699783 Free PMC article. Review.

Publication types

  • Search in MeSH
  • Citation Manager

NCBI Literature Resources

MeSH PMC Bookshelf Disclaimer

The PubMed wordmark and PubMed logo are registered trademarks of the U.S. Department of Health and Human Services (HHS). Unauthorized use of these marks is strictly prohibited.

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Open access
  • Published: 28 August 2024

Performance and biases of Large Language Models in public opinion simulation

  • Yao Qu 1 &
  • Jue Wang   ORCID: orcid.org/0000-0002-3401-713X 1  

Humanities and Social Sciences Communications volume  11 , Article number:  1095 ( 2024 ) Cite this article

Metrics details

  • Politics and international relations
  • Science, technology and society

The rise of Large Language Models (LLMs) like ChatGPT marks a pivotal advancement in artificial intelligence, reshaping the landscape of data analysis and processing. By simulating public opinion, ChatGPT shows promise in facilitating public policy development. However, challenges persist regarding its worldwide applicability and bias across demographics and themes. Our research employs socio-demographic data from the World Values Survey to evaluate ChatGPT’s performance in diverse contexts. Findings indicate significant performance disparities, especially when comparing countries. Models perform better in Western, English-speaking, and developed nations, notably the United States, in comparison to others. Disparities also manifest across demographic groups, showing biases related to gender, ethnicity, age, education, and social class. The study further uncovers thematic biases in political and environmental simulations. These results highlight the need to enhance LLMs’ representativeness and address biases, ensuring their equitable and effective integration into public opinion research alongside conventional methodologies.

Similar content being viewed by others

example of thematic analysis in research

Bias of AI-generated content: an examination of news produced by large language models

example of thematic analysis in research

Evolving linguistic divergence on polarizing social media

example of thematic analysis in research

‘What they’re not telling you about ChatGPT’: exploring the discourse of AI in UK news media headlines

Introduction.

Public opinion is crucial in shaping policy decisions, particularly in democratic societies, where it reflects the electorate’s preferences, concerns, and priorities (Burstein, 2003 ). This feedback loop enables policymakers to remain attuned to their constituents’ needs, fostering accountability and responsive governance (Hutchings, 2005 ). While traditional public opinion collection methods like surveys and interviews provide valuable insights, they are plagued by issues like low response rates, potential biases, and challenges in achieving representativeness. For instance, lengthy surveys in particular risk diminishing respondent engagement due to their extensive nature (Dillion et al., 2023 ). Fortunately, the recent advances in artificial intelligence (AI), especially large language models (LLMs) like ChatGPT, offer a novel approach to complementing traditional methods in public opinion collection as they are capable of swiftly responding to a multitude of questions (Lee et al., 2023 ). This efficiency, combined with the ability to process and analyze extensive text data, empowers LLMs to uncover insights into public sentiment often overlooked by conventional methods (Ray, 2023 ).

The role of generative LLMs in social science is increasingly recognized for its multifaceted applications. As noted by Korinek ( 2023 ), these models are instrumental in various tasks within psychological science, including editing academic papers and facilitating literature reviews. In the educational domain, Cowen and Tabarrok ( 2023 ) demonstrate how LLMs can simulate expert responses or create specific personas to deepen understanding of complex subjects like economics.

Recent research underscores the potential of LLMs in public opinion analysis. For instance, Argyle et al. ( 2023 ) demonstrated ChatGPT’s ability to accurately reflect responses across various human subgroups, particularly in the context of presidential election behaviors. A notable correlation was observed between human responses and those generated by LLMs, referred to as ‘silicon samples’. Similarly, Lee et al. ( 2023 ) found that LLMs can predict public opinions on global warming. However, Lee et al. ( 2023 ) emphasized the need for LLMs to incorporate a broader range of variables, including psychological factors, for a more precise simulation of opinions on complex issues like global warming. Additionally, Aher et al. ( 2023 ) and Horton ( 2023 ) explored LLMs’ capacity to emulate specific personas, showing ChatGPT’s proficiency in replicating human subject experiments with detailed demographic analysis. Complementary to this, studies by Brand et al. ( 2023 ) and Park et al. ( 2023 ) highlighted ChatGPT’s skill in simulating consumer behavior and human actions in various scenarios. These studies collectively highlight the sophisticated simulation capabilities of LLMs like ChatGPT, marking their significant role and expanding influence in public opinion research.

While the use of LLMs like ChatGPT in social science shows promise, three significant challenges necessitate further investigation. C1) The global applicability and reliability of LLMs . The prevalent use of U.S. surveys in existing studies (Argyle et al., 2023 ; Lee et al., 2023 ) reflects the English-centric training data of ChatGPT. It leaves us with uncertainty regarding the model’s effectiveness in navigating and accurately reflecting public opinion across diverse cultural, linguistic, and economic contexts. This gap in understanding poses a critical challenge in assessing the applicability and reliability of LLMs, such as ChatGPT, in public opinion analysis on a global scale. C2) Demographic biases within LLMs . Biases related to gender, race, education, age, and income, inherent in LLMs due to training on internet-based content, may not sufficiently represent diverse perspectives. For instance, Martin ( 2023 ) suggested a tendency in ChatGPT’s responses to favor liberal and privileged viewpoints. Therefore, identifying and addressing specific areas of unfair representation, particularly in terms of socio-economic diversity, merits further research to ensure equitable AI development. C3) Complexity and Choice Variability in LLM Simulations . A notable research gap exists in assessing LLMs, like ChatGPT, for their capability to replicate complex decision-making across various topics. This gap encompasses a limited insight into the models’ adaptability to distinct decision dynamics, such as environmental versus political issues, and the influence of increased choice complexity on simulation accuracy. Closing this gap is essential for evaluating the boundaries and efficacy of LLMs in diverse and complex societal contexts.

The study aims to tackle these challenges through a three-fold approach. Firstly, we explore the impact of cultural, linguistic, and economic development differences in AI simulation accuracy (for C1). This objective directly addresses the gap related to the predominance of English and U.S.-centric data in AI models. The study assesses how these biases influence public opinion representation in diverse contexts and their subsequent effects on policy decisions across different countries. Building upon this foundation, the second aim is to analyze the implications of demographic biases within AI simulations (for C2). This aim focuses on understanding how demographic biases in AI affect the inclusivity and representativeness of public policies, ensuring that diverse demographic perspectives are accurately reflected. Finally, we assess AI simulation accuracy in diverse issues and explore ideological and choice complexity biases in policy implications (for C3). This involves a focused examination of three aspects: the variation in AI simulation accuracy between topics like environmental and political issues, the influence of ideological biases on policy-related simulations, and the effect of choice complexity on simulation fidelity. These explorations are essential for guaranteeing that AI-driven policies are founded on a realistic, unbiased, and comprehensive grasp of complex societal issues.

The contributions of this paper are summarized as follows: The theoretical significance of this research lies in its potential to enrich public opinion theories by examining the parallels and discrepancies between human biases in opinion formation and AI biases in opinion simulation. This provides insights into AI’s role and potential impact on public policy. On an empirical level, the study aims to empirically analyze biases related to culture, language, economy, demographics, and themes in AI-simulated public opinions. It seeks to highlight the complexities and challenges AI tools encounter in accurately representing diverse viewpoints.

Recognizing the challenges of adding value ethically to AI, especially in capturing the diversity and complexity of global public opinions, this study’s outcomes inform the creation of more sophisticated AI applications in public policy. It underscores the need to develop policies informed by a balanced and inclusive representation of public opinions, essential for efficient governance in areas like environmental protection, economic development, and political processes.

Materials and methods

Tool: chatgpt.

Advancements in AI and natural language processing (NLP) have led to the development of LLMs, which are reshaping the landscape of content creation and text generation (Mathew, 2023 ). ChatGPT, a prominent example of these models developed by OpenAI, stands at the forefront of this transformation. Built on the Generative Pre-trained Transformer (GPT) architecture, ChatGPT excels at mirroring human-like language capabilities (Chan, 2023 ). It leverages vast datasets to generate contextually appropriate responses, showing the power of LLMs in understanding and generating nuanced text (Ray, 2023 ). Inspired by the method of Argyle et al. ( 2023 ), we utilize ChatGPT to generate ‘Silicon Sample Data’ to assess the correspondence between simulated responses and real survey results across different research settings.

Survey data source

The World Values Survey (WVS), initiated in 1981, surveys socio-cultural, political, and moral values globally, covering nearly 100 countries and representing about 90% of the global population (Inglehart et al., 2014 ). The WVS’s standard questionnaire ensures data consistency across diverse linguistic, economic, and cultural regions, making it valuable for comparative analyses like ours. This uniformity is crucial in our study to attribute any variation in AI-simulated responses to the AI’s interpretation instead of differences in question phrasing. Besides, the WVS questionnaire covers a broad spectrum of topics, including economic, political, religious, and social values, making it useful for various research fields. It enables comparisons between responses on potentially biased topics like environmental issues and political questions, assessing AI simulation biases across different themes. Moreover, with interviews with nearly 400,000 respondents, the WVS is one of the largest studies of its kind (Inglehart et al., 2014 ). It provides detailed demographic data for each respondent, which is important for examining demographic representation biases in AI simulations and how well AI models mirror public opinion across diverse subgroups. In this study, we use data from WVS Wave Six (2010–2014). The time of the survey varied by country; it was conducted in Japan in 2010, the United States in 2011, Sweden in 2011, Singapore in 2012, South Africa in 2013, and Brazil in 2014.

Simulation input parameters

Target variables.

The first target variable, V81, assesses prioritization between the economy and the environment. It asks respondents to choose among statements: 1. Emphasis on protecting the environment, 2. Emphasis on economic growth, 3. No answer to environmental versus economic priorities. This variable is primarily used in the first two studies focusing on country comparisons and demographic biases. The survey questions for this and the below variables are available in Table S3 in the Supplementary Materials.

The second target variable is political election voting behavior, measured by question V228: “If there were a national election tomorrow, for which party on this list would you vote?” Respondents can choose from major political parties in their country, along with options like uncertainty or not voting. For example, in the United States, options include 1. Democrat, 2. Republican, 3. Other party, and 4. No answer/Don’t know/I would not vote. This variable is introduced in the third study, where both environmental and political questions are utilized for thematic comparison.

Demographic Variables. Key demographic variables include ethnicity (V254), sex (V240), age (V242), education level (V248), and social class (V238). Ethnicity options are country-specific and reflect major ethnic groups for respective countries. Sex is coded as 1 for males and 2 for females. Age is a continuous variable. Education levels range from no formal education to a university degree. Social class is self-identified, with options like upper class, middle class, or lower class.

Covariates. For the environmental issue, we choose covariates that are frequently included in environmental surveys and have precedent in prior research (Lee et al., 2023 ), including:

Membership in Environmental Organizations (V30): This assesses active (2), inactive (1), or non-membership (0) in various organizations, including environmental ones.

Environmental Consciousness (V78): This measures respondents’ identification with the statement “Looking after the environment is important to this person; to care for nature and save life resources.” Responses range from 1 (very much like me) to 6 (not at all like me).

Financial Support for Ecological Organizations (V82): This variable inquiries about donations to ecological organizations in the past two years, coded as 1 (yes) and 2 (no).

Participation in Environmental Demonstrations (V83): This variable assesses involvement in environmental demonstrations in the past two years, with responses coded as 1 (yes) and 2 (no).

Confidence in Environmental Organizations (V122): This measures confidence levels in environmental organizations, ranging from a great deal of confidence (1) to none at all (4).

As mentioned in the limitation, there are few covariates associated with the political question. We only identified one covariate, which is Political Ideology (V95): In political matters, people talk of ‘the left’ and ‘the right.’, if 1 means extremely left and 10 means extremely right, where would you place your views?

Simulation process

Model and api setting.

Our study employs the GPT-3.5 Turbo model, due to GPT-3.5’s superior efficiency in processing large data volumes and faster response capabilities, essential for our extensive simulation research. Moreover, despite the commonly held view that human morality is a challenging aspect for language models to grasp, Russell ( 2019 ) and Dillion et al. ( 2023 ) discovered a notable alignment between GPT-3.5’s responses and human moral judgments. This congruence of GPT-3.5 can help enhance the accuracy and relevance of our simulations in replicating complex human ethical considerations. Note that we acknowledge that our findings are specific to the version of the language model used and do not necessarily reflect the capabilities or biases of all LLMs.

The impact of temperature settings on language model outputs varies depending on the task. As noted by Boelaert et al. ( 2024 ), in scenarios where responses are limited to predetermined options, such as in our experiments, temperature variations have minimal effect on outcomes. This contrasts with full-answer generation tasks, where temperature can influence next-token probabilities. Despite the limited impact in our case, we follow the recommendations of Guilherme and Vincenzi ( 2023 ) and Davis et al. ( 2024 ), who suggest that lower temperatures produce more consistent outputs. Consequently, we set the OpenAI API’s temperature to 0.2 for our survey simulation.

Prompt design

We adopt an interview-styled format for generating AI responses that simulate human participants. The process begins with converting raw survey data, including demographic information and other covariates, into a format understandable by the AI model. We assign specific codes to each demographic attribute and then translate these codes into descriptive sentences. For example, ‘V240-1’ translates to “You are male.” These sentences form a comprehensive demographic profile for each respondent, starting with “Please assume that you are…” Regarding the target question, our approach differs across studies. Initially, we focus exclusively on the environmental protection versus economic growth question for country comparison and demographic biases. For the third study, both the environmental question and the political election voting decision question are employed for thematic comparison.

We then integrate the demographic profile and the target question into a single prompt, guiding the AI to respond as a person with specific demographic characteristics. For example, “Assuming you are a 30-year-old female with a university degree and middleclass status, when asked whether you support protecting the environment or enhancing economic growth, what is your choice: (1) emphasis on protecting the environment, (2) emphasis on economic growth, or (3) neither/other?”

To improve the authenticity of our simulations, we used prompts in the native languages of non-English speaking countries—Sweden, Brazil, and Japan—drawing directly from the questionnaires in local languages available in the WVS database and enabled the ChatGPT to respond in the language of the query. This method preserves the original context and meaning, enhancing the accuracy of our cross-linguistic analysis of ChatGPT performance. For other countries in our study, where English is the primary language and the questionnaires were administered in English, we continued to use English prompts. Moreover, for each sample, we conducted 100 simulations considering the variability inherent in the model’s responses.

To validate our simulation, we instruct the AI to provide a reasoning chain before its final answer, ensuring responses mimic human-like thought processes. Additionally, we direct the AI to forego politically correct answers, favoring responses based on an assumed personal setting. The AI-simulated response, typically a chosen numerical option, is then extracted and recorded. Figure 1 shows the process from raw survey data conversion to AI-generated responses, as well as the reasoning chain of ChatGPT before giving an answer.

figure 1

The top section outlines the procedure for generating ChatGPT-simulated responses using socio-demographic prompts derived from the World Values Survey Dataset for six countries. The bottom section displays the step-by-step rationale of ChatGPT to give an answer based on the demographic information.

Comparative design

The literature on bias in AI systems reveals varied detection methods. Delobelle et al. ( 2021 ) questioned the generality of using fixed templates and specific seeds, while Caliskan et al. ( 2017 ) emphasized the role of training data in introducing biases into AI. Akyürek et al. ( 2022 ) noted bias metrics’ inconsistency, potentially leading to contradictory findings. Liu et al. ( 2022 ) discussed the operational difficulties in developing bias classifiers and the often-limited access to a model’s word embeddings that are essential for thorough bias assessment.

In the context of AI systems, particularly language models like ChatGPT, algorithmic fidelity would imply the model’s ability to reflect the diversity of human opinions, cultural nuances, and socio-cultural dynamics in its responses or outputs (Argyle et al., 2023 ; Lee et al., 2023 ). For instance, if a language model is used to simulate public opinion, high algorithmic fidelity would mean that the opinions generated by the model closely align with the actual distribution of opinions across different populations. The concept is crucial in evaluating the effectiveness and reliability of AI systems in applications where reflecting human-like understanding and behaviors is important.

In line with the theoretical framework of algorithmic fidelity, we posit that an unbiased AI should accurately reflect the wide range of opinions represented in the WVS, showing the diversity and proportionality inherent in a global, multicultural sample. Consequently, our operational definition of bias is centered around the extent of deviation in the AI’s depiction of public opinion from the empirically observed distribution of responses within the WVS. To assess this, we utilize agreement not as a direct bias metric, but as a tool to assess the degree of alignment between ChatGPT’s responses and the actual WVS outcomes.

Thus, the detection of biases stems from a comparative analysis that scrutinizes agreement scores across various countries, demographic segments, and thematic areas. Through examination of the variations in agreement scores among these groups, we identify which simulations most accurately mirror the surveyed populations and which may display signs of bias. Higher agreement levels in certain groups, as opposed to others, suggest a lower propensity for bias in the model’s representations of those particular groups’ opinions.

Cultural, linguistic, and economic bias evaluation

Cultural, linguistic, and economic biases in AI models like ChatGPT, primarily stem from their internet-based training data, which is heavily skewed towards specific cultures, languages, and economic perspectives (Ray, 2023 ). The strategic selection of Japan, Singapore, the U.S., South Africa, Sweden, and Brazil for this study, as detailed in Table 1 , aims to encompass a broad spectrum of cultural, economic, and linguistic contexts. This facilitates a thorough analysis of ChatGPT’s performance and biases across varied global settings.

Demographic bias assessment

The study investigates the presence of gender, racial, age, educational, and income biases in AI models such as ChatGPT, likely originating from biases in the training data (Ray, 2023 ). We assess these biases through simulated interactions with ChatGPT among varied demographic groups within the United States, specifically analysing responses to environmental issues.

Complexity and choice variability

We continue to address the potential ideological bias in AI models like ChatGPT (Ray, 2023 ). This entails examining three key aspects: the difference in AI simulation accuracy across topics such as environmental and political issues, the presence of ideological biases for different topics, and how choice complexity affects simulation fidelity.

Data analysis

To measure the correspondence between the simulated responses and the real survey results, our analysis primarily employs Cohen’s Kappa, a robust measure adjusting for chance agreement, thus providing a more accurate assessment of ChatGPT’s responses compared to actual survey results. A Kappa value of 1 indicates perfect agreement, while a value of 0 indicates no agreement beyond what is expected by chance. Negative values indicate less agreement than expected by chance.

In support of Cohen’s Kappa, we also utilize Cramer’s V, which measures the strength of association between two nominal variables independent of table size, offering values from 0 (no association) to 1 (perfect association). This method complements Kappa by assessing the overall correspondence between variables.

Finally, we assess the Proportion Agreement, a fundamental measure that determines the percentage of instances where two evaluators provide identical classifications. While this method yields a straightforward calculation of agreement, it lacks the capacity to account for coincidental concurrence. Consequently, a high rate of agreement does not invariably mean a substantive association, as it might merely reflect chance alignment. This limitation renders the Proportion Agreement a supplementary tool rather than a focal point of our analysis, particularly in comparison to Cohen’s Kappa and Cramer’s V.

Together, these statistical methods provide a thorough analytical framework. Our focus, however, is on Cohen’s Kappa for its robust adjustment for chance, a vital factor in analyzing AI response patterns. We conducted 100 simulations per respondent to calculate agreement and used the mean of these calculations as the overall agreement level for each prompt. This method reduced the variability in the model’s responses, yielding a more reliable consensus estimate.

This research has provided insights into the capabilities and limitations of LLMs like ChatGPT for simulating public opinion across various cultural, economic, linguistic, demographic, and thematic contexts. Our findings highlight that while LLMs show promise in replicating public opinions, particularly in contexts like the United States where the model’s training data is more robust, there are notable limitations in its global applicability and reliability. Moreover, our analysis within the United States uncovered unfair representation of specific demographic groups. This disparity suggests that current LLMs, including ChatGPT, may inherently possess biases influenced by the demographic representation in their training data. The underrepresentation or misrepresentation of certain groups, especially marginalized communities, raises concerns about the equitable use of LLMs in public opinion research. Last, the study reveals that ChatGPT favors liberal choices more in political than environmental simulations, that its simulation accuracy is higher for political behaviors than complex environmental decisions, and that increased choice complexity reduces the model’s simulation accuracy. These findings highlight the importance of addressing inherent biases and the incorporation of more diversified training materials in AI models for reliable application across various topics and countries.

Comparative study across countries

Figure 2 presents the distribution of Cohen’s kappa values across each country, derived from 100 simulation iterations. The mean value of these results is calculated and reported. Figure 3 illustrates the differences in ChatGPT’s ability to simulate survey responses across countries based on Cohen’s Kappa score. A higher score shows a higher level of agreement in simulation. The results on the other two measures – Cramer’s V and Proportion Agreement – are available in Table S1 in the Supplementary Materials.

figure 2

This figure illustrates the variability and central tendency in Cohen’s Kappa statistics through 100 simulations for six different countries: USA, Sweden, Singapore, Brazil, Japan, and South Africa. The density plots demonstrate the distribution of the kappa values, while the dashed vertical lines indicate the mean kappa value for each country, providing a reference for the central location of the data within each simulation set.

figure 3

The horizontal axis quantifies Cohen’s Kappa values, and the vertical axis segregates countries into different categories based on culture, economic development, and primary language.

The United States displays a moderate Cohen’s Kappa score of 0.239, indicating a reasonably good simulation of survey responses. On the other hand, Japan and South Africa’s low Cohen’s Kappa values of 0.024 and 0.006, respectively, highlight significant limitations in the model’s accuracy within these contexts. The inconsistency suggests that the simulation’s current assumptions—such as the uniform influence of cultural, economic, and social factors across different countries—may be flawed, indicating these elements are not sufficiently integrated or weighted in the model.

To better understand the correlation, we have transformed key aspects—culture, economy, and language—into binary variables using a 0 and 1 scheme, as detailed in Table 2 . Cultural background is coded as “Western” (1) or “Not Western” (0), economic status as “Developed” (1) or “Developing” (0), and dominant language as “English” (1) or “Not English” (0). Each of these six countries uniquely represents a code formed by combining the three categories.

We use Pearson correlation coefficients to identify linear relationships between various factors and ChatGPT’s simulated results. These coefficients, which vary between −1 and 1, elucidate both the strength and type of these relationships. Coefficients near 1 or −1 denote strong positive or negative correlations, respectively, whereas a coefficient around 0 indicates a lack of significant correlation.

Figure 4 demonstrates the correlations between different simulation result metrics and the binary categories of cultural background, dominant language, and economic status. In the heatmap, dark blue indicates a strong positive correlation, whereas lighter blue suggests a weaker positive correlation. The heatmap analysis underscores the substantial influence of culture on ChatGPT’s simulation accuracy, with a high Cohen’s Kappa correlation of 0.971, indicating a strong predictive relationship. Complementary to this, the correlation with Cramer’s V and Proportion Agreement is also notable, recorded at 0.942 and 0.789, respectively, reinforcing culture’s pivotal role. In contrast, economic factors reveal a moderate correlation through a Cohen’s Kappa value of 0.557, suggesting its influence is considerable but not as pronounced. Furthermore, language demonstrates its impact with a Cohen’s Kappa correlation of 0.101, confirming its relevance, albeit to a lesser extent than cultural and economic factors. These correlations highlight the significance of integrating diverse socio-cultural and economic considerations for enhancing the fidelity of ChatGPT simulations in reflecting public opinion.

figure 4

In the heatmap, dark blue indicates a strong positive correlation, whereas lighter blue suggests a weaker positive correlation.

Demographic representation in the United States

Since the previous result shows that ChatGPT’s effectiveness in simulating survey responses is most prominent in the United States, we further explore the demographic subpopulation representation within this country using the environmental issue survey question. Here, we only highlight Cohen’s Kappa results using Fig. 5 since our analysis across Cohen’s Kappa, Cramer’s V, and Proportion Agreement demonstrated a consistent pattern. The corresponding results from the other two measures are available in Table S2 in the Supplementary Materials.

figure 5

The bar chart quantifies the demographic representation of ChatGPT’s simulation accuracy using Cohen’s Kappa. Each bar reflects the level of agreement between ChatGPT’s responses and human judgment across different demographic sectors: sex, ethnicity, age, social class, and education.

Figure 5 reveals distinct patterns in the alignment between ChatGPT’s simulated and actual responses across U.S. demographics regarding the priority between economy and environment. Males show slightly higher agreement and association than females. Among ethnic groups, white and other ethnicities exhibit more robust correspondence. Older age groups demonstrate notably stronger alignment, indicating age-related variability. In terms of social class, the upper and middle classes align more closely with the simulations. Additionally, the group with university education displays a higher fidelity in response alignment, suggesting a correlation between higher education and response predictability.

These trends are in line with Dillion et al. ( 2023 ), who observed that GPT models tend to mirror the viewpoints of individuals with higher incomes and education. Furthermore, Ray’s ( 2023 ) review corroborates our findings regarding gender and ethnic biases. However, our study diverges when it comes to age representation; we find that older age groups align more with ChatGPT’s Turbo-3.5. In contrast, Santurkar et al. ( 2023 ) noted that the 65+ demographic is poorly represented by current language models. This discrepancy might be attributed to the specific models used, as each may possess unique biases (Dillion et al., 2023 ), influencing the representation of different age groups.

Moreover, while the previous studies (Dillion et al., 2023 ; Ray, 2023 ) primarily examined political issues, our research extends into environmental issues. It not only reaffirms the existence of these demographic biases but also suggests their pervasiveness across different spheres. This implies a more widespread issue of unfair representation of different demographic subpopulations in AI models, warranting careful consideration and action.

Comparative analysis of topic-related results

Accuracy of political vs. environmental issue simulations.

We compare ChatGPT’s simulation accuracy on two different issues within the United States: environmental protection versus economic development, and political voting decisions. Research by Lee et al. ( 2023 ) successfully forecasts political outcomes using demographic data solely, suggesting a simpler decision-making process compared to environmental issues, which appear less predictable from demographics alone. We assess this by comparing political decisions and environmental decisions, both with and without additional covariates. Our comparative analysis between political and environmental decisions involved distinct sets of covariates: five for environmental decisions, including Membership in Environmental Organizations, Environmental Consciousness, Financial Support for Ecological Organizations, Participation in Environmental Demonstrations, and Confidence in Environmental Organizations; and one for political decisions, Political Ideology. To enable a direct comparison amidst the diverse response options of the original survey for V228 and V81, our study focused exclusively on respondents who voted for either “Democrats” or “Republicans” in the political voting question (Lee et al., 2023 ). This binary categorization was also applied to the ChatGPT simulations. Similarly, for environmental issues, we confined our analysis to participants who voted with a preference for either economic or environmental priorities, ensuring a uniform framework of binary options to isolate the model’s performance from response complexity.

Table 3 indicates that political simulations demonstrate higher accuracy compared to environmental simulations, both with covariates and without covariates. This inherent predictability of political behavior is supported by Lee et al. ( 2023 ), who found demographics alone to be a strong predictor. In the comparison involving covariates, political simulations are modeled using just a single covariate, whereas environmental simulations incorporate multiple covariates but still fail to achieve comparable accuracy. This discrepancy suggests additional complexities in modeling environmental decision-making. Therefore, our study reinforces the notion that simulating environmental decision-making is inherently more challenging than predicting political behavior.

Ideological bias in ChatGPT simulations across different topics

We investigated ChatGPT’s potential bias towards liberal ideologies in simulations related to environmental issues and election voting. We defined prioritizing environmental protection as a liberal stance in environmental dialogs and voting for the Democratic Party as the liberal option in political discussions. To assess the model’s ideological tendencies, we analyzed the frequency of liberal choices in the simulations for both topics, contrasting them with the actual survey results. Recognizing the broader range of options in political questions, we normalized the responses to mitigate any potential bias amplification due to the larger set of political choices.

Table 4 presents a −6.10% difference in the liberal proportion for environmental issues, signaling that in simulations, fewer simulated respondents were inclined to select the liberal option compared to actual survey outcomes, denoting a conservative deviation. Conversely, the political issue simulations exhibit a 16.33% increase in liberal selection, indicating a greater number of simulated respondents favoring the liberal choice relative to the survey data, revealing a liberal inclination. Our findings are consistent with the research conducted by Martin ( 2023 ) and Dillion et al. ( 2023 ), which suggested that ChatGPT tends to exhibit a bias towards liberal viewpoints in political matters. Moreover, our study goes beyond this by revealing that ChatGPT’s ideological predisposition varies depending on the specific simulation topic being discussed.

Impact of choice variety on simulation fidelity

Focusing solely on the political issue, we compared ChatGPT’s responses between scenarios with two and four options. Table 5 shows a clear trend: as the number of choices increases, the simulation’s alignment with expected outcomes decreases. This suggests that ChatGPT’s ability to match the target distribution diminishes with more complex choice sets. This finding is consistent with Lee et al. ( 2023 )’s research, highlighting that greater choice complexity challenges the accuracy of AI in decision-making simulations. This underscores the critical role of choice quantity in influencing AI model performance in simulations.

Our research assesses the efficacy of ChatGPT in public opinion analysis, considering geographical, demographic, and topic-specific aspects. These dimensions collectively shed light on the strengths and limitations of LLMs in accurately capturing diverse public opinions. While demonstrating accuracy in reflecting views within the United States, simulations reveal biases and constraints, especially in representing socially disadvantaged subgroups, non-Western and developing countries, and maintaining ideological neutrality across topics. This highlights the need for a balanced and cautious approach in integrating LLMs with traditional research methods, ensuring comprehensive and representative insights into diverse public opinions.

Global applicability and reliability of LLMs

The study reveals notable disparities in ChatGPT’s simulation accuracy among different countries, highlighting a higher alignment with the United States compared to others. This finding is in line with Dillion et al.’s ( 2023 ) research, which suggested that language models like GPT are more adept at providing general estimates about Western English speakers. This is attributed to the predominance of Western English expressions in the training data of such models. Further analysis indicates that cultural background is the primary factor influencing these variations, followed by dominant economic status and language.

While language use is the most intuitive factor since language models like ChatGPT are trained primarily on textual data, its influence on simulation accuracy extends beyond mere linguistic comprehension. Language, embedded with cultural and contextual nuances, serves as a conduit for conveying broader socio-cultural and economic realities. Countries with higher economic status often have more extensive digital footprints, as their citizens are more likely to have internet access and contribute content. This results in a larger and more diverse set of data from these regions, enhancing the model’s ability to accurately simulate scenarios and understand content specific to these areas. Similarly, cultural norms, values, and context significantly influence language usage and communication styles. Since cultural expressions and contexts vary widely across the globe, a dataset predominantly composed of content from Western cultures can lead to a bias towards these cultures.

In conclusion, the effectiveness of language models like ChatGPT in capturing global perspectives hinges on a triad of factors: cultural depth, economic development, and language use. These elements collectively shape the training data’s diversity and representativeness, thereby impacting the model’s proficiency in accurately mirroring and addressing global experiences. The evident geographical disparities in model performance underscore concerns about the universal applicability of LLMs in diverse analytical contexts. This is particularly pronounced in scenarios involving perspectives from non-western, economically less developed, or non-English speaking regions, where representation in training data is noticeably lacking. To enhance the global applicability and reliability of ChatGPT in public opinion analysis, it is necessary to diversify the training data and incorporate more varied cultural, socio-economic, and linguistic perspectives.

Demographic biases in AI simulations

The observed demographic disparities in ChatGPT’s simulations, particularly within the United States, highlight a significant skew towards representing males, individuals with higher education, and those from upper social classes. This uneven representation reflects a broader issue of demographic bias in AI, mirroring the biases present in human societies. Our findings align with recent studies that underscore the challenges in using LLMs to simulate diverse human survey responses. Liu et al. ( 2022 ) and Liang et al. ( 2021 ), Alon-Barkat and Busuioc ( 2023 ), consistently show that GPT models tend to overrepresent perspectives aligned with liberal, higher-income, and well-educated demographics. Bisbee et al. ( 2024 ) found that LLM-generated outputs often lack diversity and exhibit more bias than actual survey data, particularly underrepresenting minority opinions. Boelaert et al. ( 2024 ) introduce the concept of ‘machine bias’ to illustrate how LLMs fail to capture human population diversity, stemming from both training data and the models’ technical configurations.

This phenomenon of AI models reflecting human biases can be attributed to the nature of their training data, which predominantly comes from sources where these demographic groups are more active and visible (Chan, 2023 ). Since AI models learn from existing data, they inadvertently perpetuate and amplify the biases present in that data.

The presence of biases in AI becomes increasingly apparent when examining the research topics we study. Our investigation into environmental issues, typically regarded as neutral and less divisive, still uncovers biases in AI simulations. This is noteworthy, especially when compared to the common biases observed in politically charged discussions. It underscores that AI biases are not confined to highly contentious or polarized areas like politics. Instead, they also permeate more universally relevant topics, further emphasizing the widespread and deep-rooted nature of these biases.

Such a pattern raises concerns about the AI model reinforcing societal biases by amplifying the voices of already dominant groups, potentially sidelining less-represented communities. The tendency of ChatGPT to reflect existing societal structures and biases in its outputs underlines critical issues in the inclusivity and equity of AI tools in public opinion research. This calls for a careful examination of AI integration in public opinion research, ensuring diverse and balanced representation in AI-generated data.

Thematic Bias in AI

Our study also reveals distinct disparities in ChatGPT’s accuracy regarding political versus environmental issue simulations. The findings indicate that political behavior predictions, even when based solely on demographic data, are more accurate compared to environmental issue predictions. This aligns with the research of Lee et al. ( 2023 ), suggesting that political decision-making may be more straightforward and predictable based on demographics. In contrast, environmental decision-making appears to involve more complex and diverse factors beyond demographic indicators. Our study, however, highlights the limitations of our dataset, particularly in the context of political simulations. The gap in accuracy compared to previous studies utilizing a broader range of covariates, such as that of Argyle et al. ( 2023 ), underscores the importance of comprehensive data for enhancing predictive accuracy.

Besides, our findings reveal ideological biases in ChatGPT’s simulations, with a conservative bias in environmental scenarios and a liberal inclination in political simulations, aligning with Motoki et al.’s research ( 2024 ) on a left-leaning bias favoring the Democrats in the U.S. The disparity in bias across different thematic areas raises critical questions about the influences shaping ChatGPT’s response patterns. It suggests that the model’s training data might be imbued with ideological leanings, impacting its outputs in topic-specific contexts. This is crucial for researchers and practitioners using AI for public opinion analysis, emphasizing the need to consider potential biases in AI-generated simulations, especially in politically charged topics.

The study also shows that the complexity of choice options in simulations impacts ChatGPT’s accuracy. With an increase in the number of response options, the model’s alignment with expected outcomes decreases. This observation is consistent with previous research (Lee et al., 2023 ), emphasizing that AI models face challenges in decision-making simulations with greater choice complexity. This insight is crucial for designing and interpreting AI-based simulations, suggesting a need for careful consideration of choice quantity and structure to ensure fidelity in AI-generated predictions.

More perspectives

Our analysis acknowledges multiple factors influencing LLMs’ ability to simulate diverse perspectives accurately. These include limited training data diversity, which may bias the model towards overrepresented cultures; architectural constraints that hinder nuanced cultural understanding; and the critical role of prompt design in guiding output. Furthermore, inherent biases within the data can skew the model’s representations. In our study, we aimed to minimize external variations by consistently using the same ChatGPT model and standardized prompts across different countries. This methodical approach allowed us to conduct a comparative analysis with reduced confounding factors, focusing on the influence of internal variables, particularly the training data. Our conclusions shed light on the intrinsic factors that affect the performance of LLMs. For future work, exploring the impact of further diversifying training data and refining model architecture could provide deeper insights into enhancing LLMs’ global perspective representation.

LLMs have the potential to tailor their outputs to reflect the nuances of specific countries through the incorporation of country names in prompts. This capability stems from semantic embeddings, which encode words and phrases, including country names, into dense vectors capturing contextual meanings. When a prompt includes a country, the model’s response aligns more closely with the attitudes and perspectives associated with that country. However, we observe that the effectiveness of this country-specific alignment varies, largely depending on the model’s exposure to relevant data. To explore this possibility, we conducted an additional experiment using the political election question. We used data from the United States (Wave 6) but modified the prompts to indicate that respondents were from Japan. The resulting low values across Cohen’s Kappa, Cramer’s V, and Proportion Agreement suggest that the LLM’s responses are significantly influenced by the specified country context (Appendix Table S4 ), supporting our observation that the model can reflect variations in country contexts, but the extent of this reflection depends on the model’s training data and the specific country in question.

Additionally, to assess the temporal consistency of LLM outputs, we compared the simulated responses using data from the United States (Wave 7) in 2017 to those from Wave 6. Our findings revealed consistent simulation accuracy across these time periods, suggesting some degree of long-term viability in LLM-generated responses. This observation aligns with research by Argyle et al. ( 2023 ), who also found a high degree of correspondence between reported two-party presidential vote choice proportions from GPT-3 and ANES respondents. The detailed results of these experiments are included in Appendix Table S4 .

Implications for policy and governance

The exploration of ChatGPT’s potential as a supplementary tool for traditional research methods in public policy requires consideration of the risks and limitations illustrated in our study. The presence of cultural, economic, linguistic, and demographic biases in LLM simulations, such as those of ChatGPT, poses a significant challenge to equitable policy development. If policies are shaped by biased AI simulations, they risk overlooking the needs and perspectives of diverse population segments, particularly in non-English-speaking and culturally diverse regions. This can lead to policies that inadvertently exacerbate existing inequalities.

More importantly, the use of LLMs to simulate public opinion raises critical ethical concerns, particularly in terms of privacy and potential misuse. As LLMs are trained on vast amounts of data, including personal information shared online, there are concerns about the privacy threats. To ensure the privacy rights of individuals, LLMs must obtain and utilize data in an ethical and responsible manner. Furthermore, the potential misuse of LLM-generated public opinion simulations is a significant concern. If these simulations are presented as genuine public opinions without proper disclosure of their AI-generated nature, they could be used to manipulate public discourse and decision-making, leading to the spread of misinformation, the amplification of biased perspectives, and the undermining of democratic processes.

To mitigate these risks, it is crucial to prioritize inclusivity and equity in AI development. Diversifying the training datasets to encompass a wide range of languages, cultures, and demographic backgrounds is essential to ensure that AI tools like ChatGPT can accurately and fairly represent global public opinions. This approach calls for a collaborative effort between developers and researchers to identify and address inherent biases in AI models. Such concerted efforts are vital to establish AI tools as reliable and trustworthy aids in public policy formulation. Additionally, the study underscores an ethical and social responsibility for AI developers and users in public management. Utilizing AI in governance requires a critical understanding of its limitations and potential biases. Policymakers and researchers must be cautious in interpreting AI-generated data, ensuring that it complements rather than replaces traditional methods of public opinion collection. Besides, it is essential to establish clear guidelines and regulations for the use of LLMs in public opinion research, ensuring transparency, accountability, and the protection of public interests. This responsible approach can enable the effective harnessing of AI’s potential, leading to the formulation of policies that are equitable, effective, and truly reflective of the diverse spectrum of public opinions.

In summary, while ChatGPT offers promising avenues for enhancing public policy research, its integration requires a balanced, ethical, and inclusive approach to fully realize its benefits while mitigating risks.

Limitations

Our study acknowledges three primary limitations. The first limitation pertains to the temporal and contextual relevance of our research. This is particularly significant given the dynamic nature of public opinion and the continuous development of AI technology. Previous research (Argyle et al., 2023 ) has investigated the temporal capabilities of language models like GPT-3, assessing their ability to maintain accuracy when analyzing data beyond their training scope. For example, Argyle et al. ( 2023 ) examined the algorithmic fidelity of GPT-3 with data from 2020, which is beyond its training cutoff in 2019. Such analyses are important as they evaluate the model’s performance over time, providing insights into its long-term viability.

Our study, however, does not include this temporal analysis due to our data limitations. The World Values Survey’s five-year interval means we lack access to U.S. data post-2021, which coincides with the training cut-off for ChatGPT’s Turbo-3.5. Consequently, we cannot evaluate how ChatGPT’s simulation accuracy evolves with fresh inputs from periods beyond its training scope. Note that variations in the dataset’s timeframe and model capability iterations may lead to differing experimental outcomes. This limitation restricts our understanding of the model’s adaptability to new developments and shifts in public opinion that have occurred since the last dataset. However, such variations do not detract from our core insights, because our analysis is focused on comparing the relative efficiency of LLMs in simulating country-specific perspectives. The resolution of this limitation is dependent on the availability of updated survey data, which would allow for a more comprehensive temporal analysis and enhance the robustness of our findings.

The second limitation of our study is the focused analysis on a single AI model, ChatGPT’s Turbo-3.5, rather than a comparative evaluation across different models. While acknowledging that each AI model has its own set of inherent biases (Dillion et al., 2023 ), we concentrated on Turbo-3.5 to conduct an in-depth examination of its reasoning processes. We aimed for an in-depth exploration of this model’s capability to maintain consistency in its outputs, rather than a broad but less detailed comparison across multiple models. Given the scope and depth of this analysis, comparing multiple models was outside our research scope. However, the comparative study of various AI models, including those with capabilities surpassing Turbo-3.5, represents a significant opportunity for future research. Such comparative analyses could enable the identification of model-specific biases and idiosyncrasies, contributing to the knowledge of factors influencing LLM performance in simulating public opinion across diverse contexts.

The third limitation pertains to the limited covariate analysis. While we incorporated several covariates in our research, particularly in the environmental contexts, a more comprehensive examination of the impact of additional covariates on LLM performance would further strengthen our findings. As highlighted by Lee et al. ( 2023 ), integrating a broader array of covariates, including psychological and social factors, could notably refine the fidelity of AI simulations. This is especially relevant in complex areas, where decision-making is influenced by a wide range of factors beyond demographic indicators. Unfortunately, due to limited covariate availability in our dataset, we were unable to incorporate a broader range of covariates in the analysis across different topics. To ensure comparability across the six countries in our study, we selected only those questions and their associated covariates that were consistently available for all six countries. This constraint particularly affected the political domain, where the relevant covariates were limited. Nevertheless, given the primary focus of our study is on the relative performance of LLMs in simulating public opinion, it does not detract from our main contribution of identifying performance disparities across countries and demographic groups. Future research exploring a broader array of covariates to enhance the predictive accuracy of LLMs could further improve both the theoretical foundations and practical adoption of simulation techniques in public opinion research.

Directions for future research

As discussed above, the limitations of our study could be addressed by future research investigating the temporal capabilities of LLMs, conducting comparative analyses across multiple LLMs, and identifying and testing various influential covariates. Additionally, further exploration is needed in other areas to enhance the effectiveness and reliability of LLMs in this domain.

One critical aspect is expanding the global scope of LLM-based public opinion simulation. The current study is limited to comparison across six countries. Incorporating more countries into future studies could provide deeper insights into optimizing LLMs for public opinion analysis in different national contexts. This expansion would allow for a more comprehensive understanding of how LLMs can be effectively tailored to diverse global perspectives and settings, enhancing their applicability and reliability in international contexts. By including a wider range of countries with varying cultural, economic, and linguistic backgrounds, researchers can uncover the nuances in LLM performance across different regions and develop strategies to mitigate potential biases and limitations.

Moreover, future research could explore thematic biases in LLM simulations more extensively. While our study briefly addresses these biases, a more in-depth analysis of how different types of questions, such as factual, opinion-based, and hypothetical questions, affect LLM performance would be beneficial. For instance, researchers could investigate the potential of using LLMs to generate hypothetical scenarios or counterfactuals, enabling a deeper analysis of how public opinion might shift under different circumstances. By comparing the simulation accuracy across various question types and examining how the inherent characteristics of each type influence the model’s ability to generate accurate and contextually relevant responses, researchers can better understand LLM performance across different thematic domains. This knowledge would help identify potential areas for improvement in the model’s training and architecture, leading to more robust and reliable public opinion simulations.

Using ChatGPT to generate silicon samples, this study underscores the potential of LLMs in enriching public opinion research but also highlights the urgent need to address their limitations. Our findings highlight that while LLMs show promise in replicating public opinions, particularly in contexts like the United States where the model’s training data is more robust, there are notable limitations in its global applicability and reliability. Moreover, our analysis within the United States uncovered unfair representation of specific demographic groups. This disparity suggests that current LLMs, including ChatGPT, may inherently possess biases influenced by the demographic representation in their training data. The underrepresentation or misrepresentation of certain groups, especially marginalized communities, raises concerns about the equitable use of LLMs in public opinion research. Lastly, the study reveals that ChatGPT favors liberal choices more in political than environmental simulations, that its simulation accuracy is higher for political behaviors than complex environmental decisions, and that increased choice complexity reduces the model’s simulation accuracy. These findings highlight the importance of addressing inherent biases and the incorporation of more diversified training materials in AI models for reliable application across various topics and countries.

In conclusion, this study underscores the potential of LLMs in enriching public opinion research but also highlights the urgent need to address their limitations. For LLMs to be effectively and equitably utilized in public management and policy formulation, it is imperative to enhance their cultural and linguistic diversity, mitigate inherent biases, and ensure the ethical and responsible use of the training data and opinion simulation. Future research should focus on improving the representativeness of training datasets, enriching the covariate and thematic analysis, and developing methodologies to assess and reduce biases in LLM simulations. The goal is to ensure that the insights derived from such AI tools are inclusive, equitable, and truly reflective of the diverse tapestry of global public opinions.

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Aher GV, Arriaga RI, Kalai AT (2023) Using large language models to simulate multiple humans and replicate human subject studies. Proceedings of the 40th International Conference on Machine Learning, 337–371. https://proceedings.mlr.press/v202/aher23a.html

Akyürek AF, Paik S, Kocyigit MY, Akbiyik S, Runyun ŞL, Wijaya D (2022) On Measuring Social Biases in Prompt-Based Multi-Task Learning (arXiv:2205.11605). arXiv. https://doi.org/10.48550/arXiv.2205.11605

Alon-Barkat S, Busuioc M (2023) Human–AI interactions in public sector decision making: “automation bias” and “selective adherence” to algorithmic advice. J Public Adm Res Theory 33(1):153–169. https://doi.org/10.1093/jopart/muac007

Article   Google Scholar  

Argyle LP, Busby EC, Fulda N, Gubler JR, Rytting C, Wingate D (2023) Out of one, many: using language models to simulate human samples. Political Anal 31(3):337–351. https://doi.org/10.1017/pan.2023.2

Bisbee J, Clinton JD, Dorff C, Kenkel B, Larson JM (2024) Synthetic replacements for human survey data? The perils of large language models. Polit Anal 1–16. https://doi.org/10.1017/pan.2024.5

Boelaert J, Coavoux S, Ollion E, Petev ID, Präg P (2024) Machine Bias. Generative Large Language Models Have a View of Their Own. OSF. https://doi.org/10.31235/osf.io/r2pnb

Brand J, Israeli A, Ngwe D (2023) Using GPT for Market Research (SSRN Scholarly Paper 4395751). https://doi.org/10.2139/ssrn.4395751

Burstein P (2003) The impact of public opinion on public policy: a review and an agenda. Political Res Q 56(1):29–40. https://doi.org/10.1177/106591290305600103

Caliskan A, Bryson JJ, Narayanan A (2017) Semantics derived automatically from language corpora contain human-like biases. Science 356(6334):183–186. https://doi.org/10.1126/science.aal4230

Article   ADS   CAS   PubMed   Google Scholar  

Chan A (2023) GPT-3 and InstructGPT: technological dystopianism, utopianism, and “Contextual” perspectives in AI ethics and industry. AI Ethics 3(1):53–64. https://doi.org/10.1007/s43681-022-00148-6

Cowen T, Tabarrok AT (2023) How to Learn and Teach Economics with Large Language Models, Including GPT (SSRN Scholarly Paper 4391863). https://doi.org/10.2139/ssrn.4391863

Davis J, Bulck LV, Durieux BN, Lindvall C (2024) The temperature feature of ChatGPT: modifying creativity for clinical research. JMIR Hum Factors 11(1)):e53559. https://doi.org/10.2196/53559

Delobelle P, Temple P, Perrouin G, Frénay B, Heymans P, Berendt B (2021) Ethical adversaries: towards mitigating unfairness with adversarial machine learning. ACM SIGKDD Explor Newsl 23(1):32–41. https://doi.org/10.1145/3468507.3468513

Dillion D, Tandon N, Gu Y, Gray K (2023) Can AI language models replace human participants? Trends Cogn Sci 27(7):597–600. https://doi.org/10.1016/j.tics.2023.04.008

Article   PubMed   Google Scholar  

Guilherme V, Vincenzi A (2023) An initial investigation of ChatGPT unit test generation capability. Proceedings of the 8th Brazilian Symposium on Systematic and Automated Software Testing, 15–24. https://doi.org/10.1145/3624032.3624035

Horton JJ (2023) Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? (Working Paper 31122). Natl Bureau Econ Res. https://doi.org/10.3386/w31122

Hutchings VL (2005) Public Opinion and Democratic Accountability: How Citizens Learn about Politics. Princeton University Press

Inglehart R, Haerpfer C, Moreno A, Welzel C, Kizilova K, Diez-Medrano J, et al. (eds) (2014) World Values Survey: Round Six - Country-Pooled Datafile Version: www.worldvaluessurvey.org/WVSDocumentationWV6.jsp . JD Systems Institute, Madrid

Korinek A (2023) Language Models and Cognitive Automation for Economic Research (Working Paper 30957). National Bureau of Economic Research. https://doi.org/10.3386/w30957

Lee S, Peng TQ, Goldberg MH, Rosenthal SA, Kotcher JE, Maibach EW, Leiserowitz A (2023) Can Large Language Models Capture Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias (arXiv:2311.00217). arXiv. https://doi.org/10.48550/arXiv.2311.00217

Liang PP, Wu C, Morency L-P, Salakhutdinov R (2021) Towards understanding and mitigating social biases in language models. Proceedings of the 38th International Conference on Machine Learning, 6565–6576. https://proceedings.mlr.press/v139/liang21a.html

Liu H, Tang D, Yang J, Zhao X, Liu H, Tang J, Cheng Y (2022) Rating distribution calibration for selection bias mitigation in recommendations. Proceedings of the ACM Web Conference, 2048–2057. https://doi.org/10.1145/3485447.3512078

Liu R, Jia C, Wei J, Xu G, Vosoughi S (2022) Quantifying and alleviating political bias in language models. Artif Intell 304:103654. https://doi.org/10.1016/j.artint.2021.103654

Martin JL (2023) The ethico-political universe of ChatGPT. J Soc Comput 4(1):1–11. https://doi.org/10.23919/JSC.2023.0003

Mathew A (2023) Is Artificial Intelligence a World Changer? A Case Study of OpenAI’s Chat GPT (pp. 35–42). B P International. https://doi.org/10.9734/bpi/rpst/v5/18240D

Motoki F, Pinho Neto V, Rodrigues V (2024) More human than human: measuring ChatGPT political bias. Public Choice 198(1):3–23. https://doi.org/10.1007/s11127-023-01097-2

Park JS, O’Brien J, Cai CJ, Morris MR, Liang P, Bernstein MS (2023) Generative agents: interactive simulacra of human behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1–22. https://doi.org/10.1145/3586183.3606763

Ray PP (2023) ChatGPT: a comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet Things Cyber-Phys Syst 3:121–154. https://doi.org/10.1016/j.iotcps.2023.04.003

Russell S (2019) Human compatible: AI and the problem of control. Penguin, Uk

Google Scholar  

Santurkar S, Durmus E, Ladhak F, Lee C, Liang P, Hashimoto T (2023) Whose opinions do language models reflect? Proceedings of the 40th International Conference on Machine Learning, 29971–30004. https://proceedings.mlr.press/v202/santurkar23a.html

Download references

Acknowledgements

The study is supported by funding from Nanyang Center for Public Administration, NTU.

Author information

Authors and affiliations.

School of Social Sciences, Nanyang Technological University, Singapore, Singapore

Yao Qu & Jue Wang

You can also search for this author in PubMed   Google Scholar

Contributions

Yao Qu: methodology, formal analysis, data curation, writing - original draft, writing - review & editing. Jue Wang: conceptualization, methodology, writing - review & editing, supervision, funding acquisition. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Jue Wang .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval

Ethical approval was not required as the study did not involve human participants.

Informed consent

Informed consent was not required as the study did not involve human participants.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary materials, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

Reprints and permissions

About this article

Cite this article.

Qu, Y., Wang, J. Performance and biases of Large Language Models in public opinion simulation. Humanit Soc Sci Commun 11 , 1095 (2024). https://doi.org/10.1057/s41599-024-03609-x

Download citation

Received : 16 December 2023

Accepted : 15 August 2024

Published : 28 August 2024

DOI : https://doi.org/10.1057/s41599-024-03609-x

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

example of thematic analysis in research

IMAGES

  1. Thematic Analysis: Step-by-Step Guide

    example of thematic analysis in research

  2. How to Do Thematic Analysis

    example of thematic analysis in research

  3. Thematic Analysis of Qualitative Data: Identifying Patterns that solve

    example of thematic analysis in research

  4. Thematic Analysis

    example of thematic analysis in research

  5. How to Analyze Qualitative Data from UX Research: Thematic Analysis

    example of thematic analysis in research

  6. How to Do Thematic Analysis

    example of thematic analysis in research

VIDEO

  1. Thematic Analysis in Qualitative research studies very simple explanation with example

  2. Qualitative Analysis

  3. Training

  4. Eng 518 lecture 26

  5. 3 reasons why you cannot find your themes / Thematic analysis in qualitative research

  6. Qualitative Data Analysis Procedures in Linguistics

COMMENTS

  1. Thematic Analysis

    Thematic Analysis is a qualitative research method that involves identifying, analyzing, and interpreting recurring themes or patterns in data. It aims to uncover underlying meanings, ideas, and concepts within the dataset, providing insights into participants' perspectives and experiences.

  2. How to Do Thematic Analysis

    Learn how to use thematic analysis to analyze qualitative data from texts, such as interviews or transcripts. Follow the six-step process developed by Braun and Clarke, and see examples of coding and theme generation.

  3. Thematic Analysis: A Step by Step Guide

    Thematic analysis is a qualitative research method used to identify, analyze, and interpret patterns of shared meaning (themes) within a given data set, which can be in the form of interviews, focus group discussions, surveys, or other textual data.

  4. What Is Thematic Analysis? Explainer + Examples

    Learn what thematic analysis is, when to use it, and how to do it with plain language and examples. Explore the different approaches (inductive and deductive) and types (semantic and latent) of thematic analysis and get tips and suggestions for your research.

  5. A Step-by-Step Process of Thematic Analysis to Develop a Conceptual

    Abstract Thematic analysis is a highly popular technique among qualitative researchers for analyzing qualitative data, which usually comprises thick descriptive data. However, the application and use of thematic analysis has also involved complications due to confusion regarding the final outcome's presentation as a conceptual model. This paper develops a systematic thematic analysis process ...

  6. How to Do Thematic Analysis

    When to use thematic analysis Thematic analysis is a good approach to research where you're trying to find out something about people's views, opinions, knowledge, experiences, or values from a set of qualitative data - for example, interview transcripts, social media profiles, or survey responses.

  7. Thematic Analysis Examples

    Thematic Analysis Examples. Thematic analysis in qualitative research is a widely utilized qualitative research method that provides a systematic approach to identifying, analyzing, and reporting potential themes and patterns within data. Whereas quantitative data often relies on statistical analysis to make judgments about insights, thematic ...

  8. How to do a thematic analysis [6 steps]

    Learn the six steps of thematic analysis, a qualitative research method for analyzing texts and identifying patterns and themes. See examples of thematic analysis in psychology and other fields.

  9. How to Do Thematic Analysis: 6 Steps & Examples

    Thematic analysis is a qualitative research method that focuses on identifying, analyzing, and reporting patterns or themes within a dataset. Thematic analysis involves reading through a data set, identifying patterns in meaning, and deriving themes, providing a systematic and flexible way to interpret various aspects of the research topic.

  10. Thematic analysis: A practical guide

    Learn how to conduct thematic analysis with this practical guide and an exemplar study.

  11. How to Conduct Thematic Analysis?

    Thematic analysis One of the most straightforward forms of qualitative data analysis involves the identification of themes and patterns that appear in otherwise unstructured qualitative data. Thematic analysis is an integral component of qualitative research because it provides an entry point into analyzing qualitative data.

  12. Thematic Analysis: A Step-by-Step Guide

    Thematic analysis is a qualitative data analysis method that involves reading through a set of data and identifying patterns across that data to derive themes.

  13. Practical thematic analysis: a guide for multidisciplinary health

    This article offers practical thematic analysis as a step-by-step approach to qualitative analysis for health services researchers, with a focus on accessibility for patients, care partners, clinicians, and others new to thematic analysis.

  14. PDF Essentials of Thematic Analysis

    We then asked them to describe in detail the steps of the method, including the research team, sampling, biases and expectations, data collection, data analysis, and variations on the method. We also asked authors to provide tips for the research process and for writing a manuscript emerging from a study that used the method.

  15. Thematic Analysis: Striving to Meet the Trustworthiness Criteria

    Although there are numerous examples of how to conduct qualitative research, few sophisticated tools are available to researchers for conducting a rigorous and relevant thematic analysis. The purpose of this article is to guide researchers using thematic analysis as a research method.

  16. General-purpose thematic analysis: a useful qualitative method for

    Thematic analysis is a good starting point for those new to qualitative research and is relevant to many questions in the perioperative context. It can be used to understand the experiences of healthcare professionals and patients and their families. gives examples of questions amenable to thematic analysis in anaesthesia research.

  17. Chapter 22: Thematic Analysis

    Learn about three forms of thematic analysis: applied, framework and reflexive. See how to conduct each method, its strengths and limitations, and examples of codebooks and matrices.

  18. Role Of Thematic Analysis In Qualitative Research

    Thematic analysis in qualitative research is an unsupervised approach that enables you to create categories and perform statistical tests without having to set up any rules or procedures in advance. The method allows you to gain deeper insights into participants' motivations, emotions, and perspectives.

  19. A Comprehensive Guide to Thematic Analysis in Qualitative Research

    Learn how to conduct a thematic analysis of qualitative data, such as interviews or focus groups, by identifying and analyzing recurring themes. Follow the six-step process, see examples of different types of thematic analysis, and find out when to use this method in UX research.

  20. Thematic Analysis: Making Values Emerge from Texts

    This chapter explains how thematic analysis can be used to make values emerge from texts. Taking reflexive thematic analysis as its starting point, it begins by giving a general overview of the processes of coding and generating themes from codes. The chapter then...

  21. Student Examples of Good Practice

    Student Examples of Good Practice Sometimes it's good to know what 'doing a good job' looks like… To help those wanting to understand what describing the reflexive TA process well might look like, we offer some good examples here, from student projects. This may be particularly helpful for students doing research projects, and for people very well-trained in positivism.

  22. Children's understandings' of obesity, a thematic analysis

    Abstract. Childhood obesity is a major concern in today's society. Research suggests the inclusion of the views and understandings of a target group facilitates strategies that have better efficacy. The objective of this study was to explore the concepts and themes that make up children's understandings of the causes and consequences of obesity.

  23. (PDF) Exploring Thematic Analysis in Qualitative Research

    Thematic analysis has evolved as a prominent qualitative research method, rooted in the rich history of social sciences and, in particular, psychology.

  24. How to Create A Codebook for Thematic Analysis: A Practical Guide

    Increasing rigor and reducing bias in qualitative research: A document analysis of parliamentary debates using applied thematic analysis. Qualitative Social Work, 18(6), 965-980. Virginia Braun & Victoria Clarke (2006) Using thematic analysis in psychology, Qualitative Research in Psychology, 3:2, 77-101, DOI: 10.1191/1478088706qp063oa

  25. Scientific models for qualitative research: a textual thematic analysis

    However, discussions about how to develop models in qualitative research - particularly in the literature on thematic analysis - are sparse. Aim: To discuss an approach to scientific qualitative modelling that uses the new technique described in the first part of this article ( Gildberg and Wilson 2023 ): the Empirical Test for Thematic ...

  26. Understanding and Identifying 'Themes' in Qualitative Case Study Research

    Themes are identified with any form of qualitative research method, be it phenomenology, narrative analysis, grounded theory, thematic analysis or any other form. However, the purpose and process of identifying themes may differ based not only on the methodology but also the research questions ( Braun & Clarke, 2006 ).

  27. Challenges and facilitators in the experience of caregiving for an

    Purpose: To obtain a better understanding of the factors which complicate or facilitate the adjustment of caregivers after traumatic brain injury (TBI) in older adults. Research Method: At 4, 8, and 12 months post-TBI (mild to severe), 65 caregivers answered two open-ended questions regarding facilitators and challenges linked to the injury of their loved one. A thematic analysis was performed ...

  28. Performance and biases of Large Language Models in public opinion

    Future research should focus on improving the representativeness of training datasets, enriching the covariate and thematic analysis, and developing methodologies to assess and reduce biases in ...