How to Write Your Methods


Ensure understanding, reproducibility, and replicability

What should you include in your methods section, and how much detail is appropriate?

Why Methods Matter

The methods section was once the most likely part of a paper to be unfairly abbreviated, overly summarized, or even relegated to hard-to-find sections of a publisher’s website. While some journals may responsibly include more detailed elements of methods in supplementary sections, the movement for increased reproducibility and rigor in science has reinstated the importance of the methods section. Methods are now viewed as a key element in establishing the credibility of the research being reported, alongside the open availability of data and results.

A clear methods section impacts editorial evaluation and readers’ understanding, and is also the backbone of transparency and replicability.

For example, the Reproducibility Project: Cancer Biology set out in 2013 to replicate experiments from 50 high-profile cancer papers, but revised its target to 18 papers once it became clear how much methodological detail was missing from the original papers.


What to include in your methods section

What you include in your methods section depends on what field you are in and what experiments you are performing. However, the general principle in place at the majority of journals is summarized well by the guidelines at PLOS ONE: “The Materials and Methods section should provide enough detail to allow suitably skilled investigators to fully replicate your study.” The emphases here are deliberate: the methods should enable readers to understand your paper and replicate your study. However, there is no need to go into the level of detail that a lay-person would require—the focus is on the reader who is also trained in your field, with the suitable skills and knowledge to attempt a replication.

A constant principle of rigorous science

A methods section that enables other researchers to understand and replicate your results is a constant principle of rigorous, transparent, and Open Science. Aim to be thorough, even if a particular journal doesn’t require the same level of detail. Reproducibility is everyone’s responsibility, and you cannot create problems by exceeding a minimum standard of information. If a journal still has word limits—either for the overall article or for specific sections—and requires some methodological details to live in a supplemental section, that is OK as long as the extra details are searchable and findable.

Imagine replicating your own work, years in the future

As part of PLOS’ presentation on Reproducibility and Open Publishing (part of UCSF’s Reproducibility Series), we recommend planning the level of detail in your methods section by imagining you are writing for your future self, replicating your own work. Consider that you might then be at a different institution, with different account logins, applications, resources, and access levels; this helps you imagine the level of specificity that you yourself would require to redo the exact experiment. Consider:

  • Which details would you need to be reminded of?
  • Which cell line, or antibody, or software, or reagent did you use, and does it have a Research Resource ID (RRID) that you can cite?
  • Which version of a questionnaire did you use in your survey?
  • Exactly which visual stimulus did you show participants, and is it publicly available?
  • Which participants did you decide to exclude?
  • Which processes did you adjust during the work?

Tip: Be sure to capture any changes to your protocols

You yourself would want to know about any adjustments if you ever replicate the work, so you can surmise that anyone else would want to as well. Even if a necessary adjustment you made was not ideal, transparency is the key to ensuring this is not regarded as an issue in the future. It is far better to transparently convey any non-optimal methods, or methodological constraints, than to conceal them, which could result in reproducibility or ethical issues downstream.

Visual aids for methods help when reading the whole paper

Consider whether a visual representation of your methods could be appropriate or aid understanding of your process. A visual reference readers can easily return to, like a flow-diagram, decision-tree, or checklist, can help readers to better understand the complete article, not just the methods section.

Ethical Considerations

In addition to describing what you did, it is just as important to assure readers that you also followed all relevant ethical guidelines when conducting your research. While ethical standards and reporting guidelines are often presented in a separate section of a paper, ensure that your methods and protocols actually follow these guidelines. Read more about ethics.

Existing standards, checklists, guidelines, partners

While the level of detail contained in a methods section should be guided by the universal principles of rigorous science outlined above, various disciplines, fields, and projects have worked hard to design and develop consistent standards, guidelines, and tools to help with reporting all types of experiments. Below, you’ll find some of the key initiatives. Ensure you read the submission guidelines for the specific journal you are submitting to, in order to discover any further journal- or field-specific policies to follow, or initiatives/tools to utilize.

Tip: Keep your paper moving forward by providing the proper paperwork up front

Be sure to check the journal guidelines and provide the necessary documents with your manuscript submission. Missing documentation can greatly slow the first round of peer review, or cause delays when you submit your revision.

Randomized Controlled Trials – CONSORT

The Consolidated Standards of Reporting Trials (CONSORT) project covers various initiatives intended to prevent the problems of inadequate reporting of randomized controlled trials. The primary initiative is an evidence-based minimum set of recommendations for reporting randomized trials known as the CONSORT Statement.

Systematic Reviews and Meta-Analyses – PRISMA

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) is an evidence-based minimum set of items focusing on the reporting of reviews evaluating randomized trials and other types of research.

Research using Animals – ARRIVE

The Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines encourage maximizing the information reported in research using animals, thereby minimizing unnecessary studies. (Original study and proposal, and updated guidelines, in PLOS Biology.)

Laboratory Protocols

Protocols.io has developed a platform specifically for the sharing and updating of laboratory protocols, which are assigned their own DOI and can be linked from the methods sections of papers to enhance reproducibility. Contextualize your protocol and improve discovery with an accompanying Lab Protocol article in PLOS ONE.

Consistent reporting of Materials, Design, and Analysis – the MDAR checklist

A cross-publisher group of editors and experts has developed, tested, and rolled out a checklist to help establish and harmonize reporting standards in the Life Sciences. The checklist, which authors can use to compile their methods and editors/reviewers can use to check methods, establishes a minimum set of requirements for transparent reporting and is adaptable to any discipline within the Life Sciences, covering a breadth of potentially relevant methodological items and considerations. If you are in the Life Sciences and writing up your methods section, try working through the MDAR checklist to see whether it helps you include all relevant details in your methods, and whether it reminds you of anything you might otherwise have missed.

Summary: Writing tips

The main challenge you may find when writing your methods is keeping it readable AND covering all the details needed for reproducibility and replicability. While this is difficult, do not compromise on rigorous standards for credibility!

Do

  • Keep in mind future replicability, alongside understanding and readability.
  • Follow checklists, and field- and journal-specific guidelines.
  • Consider a commitment to rigorous and transparent science a personal responsibility, and not just adhering to journal guidelines.
  • Establish whether there are persistent identifiers for any research resources you use that can be specifically cited in your methods section.
  • Deposit your laboratory protocols in Protocols.io, establishing a permanent link to them. You can update your protocols later if you improve on them, as can future scientists who follow your protocols.
  • Consider visual aids like flow-diagrams and lists to help with reading other sections of the paper.
  • Be specific about all decisions made during the experiments that someone reproducing your work would need to know.


Don’t

  • Summarize or abbreviate methods without giving full details in a discoverable supplemental section.
  • Presume you will always be able to remember how you performed the experiments, or have access to private or institutional notebooks and resources.
  • Attempt to hide constraints or non-optimal decisions you had to make; transparency is the key to ensuring the credibility of your research.

What is Research Methodology? Definition, Types, and Examples


Research methodology 1,2 is a structured and scientific approach used to collect, analyze, and interpret quantitative or qualitative data to answer research questions or test hypotheses. A research methodology is like a plan for carrying out research and helps keep researchers on track by limiting the scope of the research. Several aspects must be considered before selecting an appropriate research methodology, such as research limitations and ethical concerns that may affect your research.

The research methodology section in a scientific paper describes the different methodological choices made, such as the data collection and analysis methods, and why these choices were selected. The reasons should explain why the methods chosen are the most appropriate to answer the research question. A good research methodology also helps ensure the reliability and validity of the research findings. There are three types of research methodology—quantitative, qualitative, and mixed-method, which can be chosen based on the research objectives.

What is research methodology?

A research methodology describes the techniques and procedures used to identify and analyze information regarding a specific research topic. It is a process by which researchers design their study so that they can achieve their objectives using the selected research instruments. It includes all the important aspects of research, including research design, data collection methods, data analysis methods, and the overall framework within which the research is conducted. While these points can help you understand what research methodology is, you also need to know why it is important to pick the right one.

Why is research methodology important?

Having a good research methodology in place has the following advantages: 3

  • It helps other researchers who may want to replicate your research; clear explanations of your methods will be of benefit to them.
  • It enables you to easily answer any questions about your research if they arise at a later stage.
  • A research methodology provides a framework and guidelines for researchers to clearly define research questions, hypotheses, and objectives.
  • It helps researchers identify the most appropriate research design, sampling technique, and data collection and analysis methods.
  • A sound research methodology helps researchers ensure that their findings are valid and reliable and free from biases and errors.
  • It also helps ensure that ethical guidelines are followed while conducting research.
  • A good research methodology helps researchers in planning their research efficiently, by ensuring optimum usage of their time and resources.


Types of research methodology

There are three types of research methodology based on the type of research and the data required. 1

  • Quantitative research methodology focuses on measuring and testing numerical data. This approach is good for reaching a large number of people in a short amount of time. This type of research helps in testing the causal relationships between variables, making predictions, and generalizing results to wider populations.
  • Qualitative research methodology examines the opinions, behaviors, and experiences of people. It collects and analyzes words and textual data. This research methodology requires fewer participants but is still more time-consuming because the time spent per participant is quite large. This method is used in exploratory research where the research problem being investigated is not clearly defined.
  • Mixed-method research methodology uses the characteristics of both quantitative and qualitative research methodologies in the same study. This method allows researchers to validate their findings, verify if the results observed using both methods are complementary, and explain any unexpected results obtained from one method by using the other method.

What are the types of sampling designs in research methodology?

Sampling 4 is an important part of a research methodology and involves selecting a representative sample of the population to conduct the study, making statistical inferences about them, and estimating the characteristics of the whole population based on these inferences. There are two types of sampling designs in research methodology—probability and nonprobability.

  • Probability sampling

In this type of sampling design, a sample is chosen from a larger population using some form of random selection; that is, every member of the population has an equal chance of being selected. The different types of probability sampling are:

  • Systematic —sample members are chosen at regular intervals. You select a random starting point and then pick every kth member, where the interval k is the population size divided by the desired sample size. Because the selection rule is fixed in advance, this is one of the least time-consuming methods (see the sketch after this list).
  • Stratified —researchers divide the population into smaller groups that don’t overlap but together represent the entire population. A sample is then drawn from each group separately.
  • Cluster —the population is divided into clusters based on parameters like age, sex, location, etc., and clusters are then randomly selected for study.

  • Nonprobability sampling

In this type of sampling design, participants are selected based on non-random criteria, so not every member of the population has a chance of being included. The different types of nonprobability sampling are:

  • Convenience —selects participants who are most easily accessible to researchers due to geographical proximity, availability at a particular time, etc.
  • Purposive —participants are selected at the researcher’s discretion. Researchers consider the purpose of the study and their understanding of the target audience.
  • Snowball —already selected participants use their social networks to refer the researcher to other potential participants.
  • Quota —while designing the study, the researchers decide how many people with which characteristics to include as participants. The characteristics help in choosing people most likely to provide insights into the subject.
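
To make the mechanics concrete, here is a minimal Python sketch of systematic and stratified selection. The population, strata, and sample sizes below are invented for illustration; they are not drawn from any of the cited sources.

```python
# Minimal sketch of two probability sampling designs.
# The population and strata below are invented for illustration.
import random

population = list(range(1, 101))  # member IDs 1..100

def systematic_sample(pop, n):
    """Systematic sampling: random start, then every k-th member,
    where k = population size // desired sample size."""
    k = len(pop) // n
    start = random.randrange(k)
    return pop[start::k][:n]

def stratified_sample(strata, n_per_stratum):
    """Stratified sampling: draw randomly within each non-overlapping
    group so that every stratum is represented in the sample."""
    return {name: random.sample(members, n_per_stratum)
            for name, members in strata.items()}

print(systematic_sample(population, 10))

strata = {"undergraduate": list(range(1, 61)),
          "graduate": list(range(61, 101))}
print(stratified_sample(strata, 5))
```

In a real study, the sampling frame would come from your population records, and stratum sample sizes would usually be allocated proportionally to stratum size rather than equally, as in this sketch.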

What are data collection methods?

During research, data are collected using various methods depending on the research methodology being followed and the research methods being undertaken. Both qualitative and quantitative research have different data collection methods, as listed below.

Qualitative research 5

  • One-on-one interviews: Help the interviewer understand a respondent’s subjective opinions and experiences pertaining to a specific topic or event.
  • Document study/literature review/record keeping: Researchers review already existing written materials such as archives, annual reports, research articles, guidelines, policy documents, etc.
  • Focus groups: Constructive discussions that usually include a small sample of about 6–10 people and a moderator, to understand the participants’ opinions on a given topic.
  • Qualitative observation: Researchers collect data using their five senses (sight, smell, touch, taste, and hearing).

Quantitative research 6

  • Sampling: The most common type is probability sampling.
  • Interviews: Commonly conducted by telephone or in person.
  • Observations: Structured observations are most commonly used in quantitative research. In this method, researchers make observations about specific behaviors of individuals in a structured setting.
  • Document review: Reviewing existing research or documents to collect evidence for supporting the research.
  • Surveys and questionnaires: Surveys can be administered both online and offline depending on the requirement and sample size.


What are data analysis methods?

The data collected using the various methods for qualitative and quantitative research need to be analyzed to generate meaningful conclusions. These data analysis methods 7 also differ between quantitative and qualitative research.

Quantitative research involves a deductive method for data analysis where hypotheses are developed at the beginning of the research and precise measurement is required. The methods include statistical analysis applications to analyze numerical data and are grouped into two categories—descriptive and inferential.

Descriptive analysis is used to summarize the basic features of different types of data so that patterns in the data become apparent. The different types of descriptive analysis methods, illustrated in the short sketch after this list, are:

  • Measures of frequency (count, percent, frequency)
  • Measures of central tendency (mean, median, mode)
  • Measures of dispersion or variation (range, variance, standard deviation)
  • Measure of position (percentile ranks, quartile ranks)
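
As a concrete illustration, all four groups of measures can be computed with Python’s standard library. The scores below are invented sample data, not values from any cited study.

```python
# Descriptive statistics for a small invented dataset,
# using only Python's standard library.
import statistics
from collections import Counter

scores = [72, 85, 85, 90, 64, 78, 85, 91, 70, 88]

# Measures of frequency
print("counts:", Counter(scores).most_common(3))

# Measures of central tendency
print("mean:  ", statistics.mean(scores))
print("median:", statistics.median(scores))
print("mode:  ", statistics.mode(scores))

# Measures of dispersion or variation
print("range: ", max(scores) - min(scores))
print("var:   ", statistics.variance(scores))  # sample variance
print("stdev: ", statistics.stdev(scores))     # sample standard deviation

# Measures of position: quartile cut points (percentiles generalize this)
q1, q2, q3 = statistics.quantiles(scores, n=4)
print("quartiles:", q1, q2, q3)
```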

Inferential analysis is used to make predictions about a larger population based on the analysis of data collected from a sample of that population. It is used to study the relationships between different variables. Some commonly used inferential data analysis methods, a few of which are sketched in code after this list, are:

  • Correlation: To understand the relationship between two or more variables.
  • Cross-tabulation: To analyze the relationship between multiple categorical variables.
  • Regression analysis: To study the impact of independent variables on a dependent variable.
  • Frequency tables: To understand how often data values occur.
  • Analysis of variance: To test whether the means of two or more groups differ significantly in an experiment.
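
Here is a brief sketch of three of these methods using SciPy. The data are invented for illustration, and a real analysis would also check the assumptions behind each test.

```python
# Correlation, simple linear regression, and one-way ANOVA
# on invented data, using SciPy.
from scipy import stats

hours_studied = [2, 4, 5, 7, 8, 10, 11, 13]
exam_score = [55, 60, 61, 70, 74, 80, 83, 90]

# Correlation: strength and direction of a linear relationship
r, p_corr = stats.pearsonr(hours_studied, exam_score)
print(f"Pearson r = {r:.2f} (p = {p_corr:.4f})")

# Regression: impact of an independent variable on the dependent one
fit = stats.linregress(hours_studied, exam_score)
print(f"score = {fit.slope:.2f} * hours + {fit.intercept:.2f}")

# Analysis of variance: do the means of three invented groups differ?
group_a = [12, 15, 14, 16]
group_b = [18, 21, 19, 22]
group_c = [25, 24, 27, 26]
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)
print(f"ANOVA F = {f_stat:.2f} (p = {p_anova:.4f})")
```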

Qualitative research involves an inductive method for data analysis where hypotheses are developed after data collection. The methods include:

  • Content analysis: For analyzing documented information from text and images by determining the presence of certain words or concepts in texts (a minimal sketch follows this list).
  • Narrative analysis: For analyzing content obtained from sources such as interviews, field observations, and surveys. The stories and opinions shared by people are used to answer research questions.
  • Discourse analysis: For analyzing interactions with people considering the social context, that is, the lifestyle and environment, under which the interaction occurs.
  • Grounded theory: Involves developing a theory from iterative data collection and analysis to explain why a phenomenon occurred.
  • Thematic analysis: To identify important themes or patterns in data and use these to address an issue.
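
Even qualitative methods can have simple computational aids. Below is a minimal content-analysis sketch that counts concept occurrences in invented interview snippets; a real content analysis would use a validated coding scheme rather than ad hoc keyword stems.

```python
# Minimal content-analysis sketch: counting the presence of
# predefined concepts in invented interview transcripts.
from collections import Counter

transcripts = [
    "I felt stressed before the exam but the breathing exercise helped.",
    "The app was confusing at first, and that made me more stressed.",
    "Once I understood the layout, the stress went away.",
]

concepts = ["stress", "confus", "help"]  # stems match word variants

counts = Counter()
for text in transcripts:
    lowered = text.lower()
    for concept in concepts:
        counts[concept] += lowered.count(concept)

print(counts)  # Counter({'stress': 3, 'confus': 1, 'help': 1})
```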

How to choose a research methodology?

Here are some important factors to consider when choosing a research methodology: 8

  • Research objectives, aims, and questions —these would help structure the research design.
  • Existing literature —review it to identify any gaps in knowledge.
  • Statistical requirements —check whether data-driven or statistical results are needed; if so, quantitative research is best. If the research questions can be answered based on people’s opinions and perceptions, qualitative research is most suitable.
  • Sample size —sample size can often determine the feasibility of a research methodology. For a large sample, less effort- and time-intensive methods are appropriate.
  • Constraints —constraints of time, geography, and resources can help define the appropriate methodology.


How to write a research methodology

A research methodology should include the following components: 3,9

  • Research design —should be selected based on the research question and the data required. Common research designs include experimental, quasi-experimental, correlational, descriptive, and exploratory.
  • Research method —this can be quantitative, qualitative, or mixed-method.
  • Reason for selecting a specific methodology —explain why this methodology is the most suitable to answer your research problem.
  • Research instruments —explain the research instruments you plan to use, mainly referring to the data collection methods such as interviews, surveys, etc. Here as well, a reason should be mentioned for selecting the particular instrument.
  • Sampling —this involves selecting a representative subset of the population being studied.
  • Data collection —involves gathering data using several data collection methods, such as surveys, interviews, etc.
  • Data analysis —describe the data analysis methods you will use once you’ve collected the data.
  • Research limitations —mention any limitations you foresee while conducting your research.
  • Validity and reliability —validity helps identify the accuracy and truthfulness of the findings; reliability refers to the consistency and stability of the results over time and across different conditions.
  • Ethical considerations —research should be conducted ethically. The considerations include obtaining consent from participants, maintaining confidentiality, and addressing conflicts of interest.

Streamline Your Research Paper Writing Process with Paperpal

The methods section is a critical part of a research paper; other researchers use it to understand your findings and to replicate your work when pursuing their own research. However, it is usually also the most difficult section to write. This is where Paperpal can help you overcome writer’s block and create the first draft in minutes with Paperpal Copilot, its secure generative AI feature suite.

With Paperpal you can get research advice, write and refine your work, rephrase and verify the writing, and ensure submission readiness, all in one place. Here’s how you can use Paperpal to develop the first draft of your methods section.  

  • Generate an outline: Input some details about your research to instantly generate an outline for your methods section.
  • Develop the section: Use the outline and suggested sentence templates to expand your ideas and develop the first draft.
  • Paraphrase and trim: Get clear, concise academic text with paraphrasing that conveys your work effectively and word reduction to fix redundancies.
  • Choose the right words: Enhance text by choosing contextual synonyms based on how the words have been used in previously published work.
  • Check and verify text: Make sure the generated text showcases your methods correctly, has all the right citations, and is original and authentic.

You can repeat this process to develop each section of your research manuscript, including the title, abstract, and keywords. Ready to write your research papers faster, better, and without the stress? Sign up for Paperpal and start writing today!

Frequently Asked Questions

Q1. What are the key components of research methodology?

A1. A good research methodology has the following key components:

  • Research design
  • Data collection procedures
  • Data analysis methods
  • Ethical considerations

Q2. Why is ethical consideration important in research methodology?

A2. Ethical consideration is important in research methodology to assure readers of the reliability and validity of the study. Researchers must clearly mention the ethical norms and standards followed during the conduct of the research and also mention whether the research has been cleared by an institutional review board. The following 10 points are the important principles related to ethical considerations: 10

  • Participants should not be subjected to harm.
  • Respect for the dignity of participants should be prioritized.
  • Full consent should be obtained from participants before the study.
  • Participants’ privacy should be ensured.
  • Confidentiality of the research data should be ensured.
  • Anonymity of individuals and organizations participating in the research should be maintained.
  • The aims and objectives of the research should not be exaggerated.
  • Affiliations, sources of funding, and any possible conflicts of interest should be declared.
  • Communication in relation to the research should be honest and transparent.
  • Misleading information and biased representation of primary data findings should be avoided.

Q3. What is the difference between methodology and method?

A3. Research methodology is different from a research method, although the two terms are often confused. Research methods are the tools used to gather data, while the research methodology provides a framework for how research is planned, conducted, and analyzed. The latter guides researchers in making decisions about the most appropriate methods for their research. Research methods refer to the specific techniques, procedures, and tools used by researchers to collect, analyze, and interpret data: for instance, surveys, questionnaires, and interviews.

Research methodology is, thus, an integral part of a research study. It helps ensure that you stay on track to meet your research objectives and answer your research questions using the most appropriate data collection and analysis tools based on your research design.


References

  1. Research methodologies. Pfeiffer Library website. Accessed August 15, 2023. https://library.tiffin.edu/researchmethodologies/whatareresearchmethodologies
  2. Types of research methodology. Eduvoice website. Accessed August 16, 2023. https://eduvoice.in/types-research-methodology/
  3. The basics of research methodology: A key to quality research. Voxco. Accessed August 16, 2023. https://www.voxco.com/blog/what-is-research-methodology/
  4. Sampling methods: Types with examples. QuestionPro website. Accessed August 16, 2023. https://www.questionpro.com/blog/types-of-sampling-for-social-research/
  5. What is qualitative research? Methods, types, approaches, examples. Researcher.Life blog. Accessed August 15, 2023. https://researcher.life/blog/article/what-is-qualitative-research-methods-types-examples/
  6. What is quantitative research? Definition, methods, types, and examples. Researcher.Life blog. Accessed August 15, 2023. https://researcher.life/blog/article/what-is-quantitative-research-types-and-examples/
  7. Data analysis in research: Types & methods. QuestionPro website. Accessed August 16, 2023. https://www.questionpro.com/blog/data-analysis-in-research/#Data_analysis_in_qualitative_research
  8. Factors to consider while choosing the right research methodology. PhD Monster website. Accessed August 17, 2023. https://www.phdmonster.com/factors-to-consider-while-choosing-the-right-research-methodology/
  9. What is research methodology? Research and writing guides. Accessed August 14, 2023. https://paperpile.com/g/what-is-research-methodology/
  10. Ethical considerations. Business research methodology website. Accessed August 17, 2023. https://research-methodology.net/research-methodology/ethical-considerations/


Advanced Research Methods

Writing the Research Paper


Before Writing the Paper

The sections below cover: methods; thesis and hypothesis; clarity, precision, and academic expression; formatting your paper; typical problems; a few suggestions; and avoiding plagiarism.

  • Try to find a subject that really interests you.
  • While you explore the topic, narrow or broaden your target and focus on something that gives the most promising results.
  • Don't choose a huge subject if you have to write a 3 page long paper, and broaden your topic sufficiently if you have to submit at least 25 pages.
  • Consult your class instructor (and your classmates) about the topic.
  • Find primary and secondary sources in the library.
  • Read and critically analyse them.
  • Take notes.
  • Compile surveys, collect data, gather materials for quantitative analysis (if these are good methods to investigate the topic more deeply).
  • Come up with new ideas about the topic. Try to formulate your ideas in a few sentences.
  • Review your notes and other materials and enrich the outline.
  • Try to estimate how long the individual parts will be.
  • Do others understand what you want to say?
  • Do they accept it as new knowledge or relevant and important for a paper?
  • Do they agree that your thoughts will result in a successful paper?
Methods

  • Qualitative: gives answers to questions (how, why, when, who, what, etc.) by investigating an issue
  • Quantitative: requires data and the analysis of data as well
Thesis and Hypothesis

A thesis statement is:

  • the essence, the point of the research paper in one or two sentences.
  • a statement that can be proved or disproved.
Clarity and Precision

  • Be specific.
  • Avoid ambiguity.
  • Use predominantly the active voice, not the passive.
  • Deal with one issue in one paragraph.
  • Be accurate.
  • Double-check your data, references, citations, and statements.

Academic Expression

  • Don't use familiar style or colloquial/slang expressions.
  • Write in full sentences.
  • Check the meaning of the words if you don't know exactly what they mean.
  • Avoid metaphors.
Make an Outline

Your outline should capture:

  • The rough content of every paragraph.
  • The order of the various topics in your paper.

When writing:

  • On the basis of the outline, start writing a part by planning the content, and then write it down.
  • Put a visible mark (which you will later delete) where you need to quote a source, and write in the citation when you finish writing that part or a bigger section.
When revising, ask yourself:

  • Does the text make sense?
  • Could you explain what you wanted?
  • Did you write good sentences?
  • Is there something missing?

Then:

  • Check the spelling.
  • Complete the citations and bring them into a standard format.

Format Your Paper

Use the guidelines that your instructor requires (MLA, Chicago, APA, Turabian, etc.):

  • Adjust margins, spacing, paragraph indentation, place of page numbers, etc.
  • Standardize the bibliography or footnotes according to the guidelines.



Typical Problems

(Based on English Composition 2 from Illinois Valley Community College):

  • Weak organization
  • Poor support and development of ideas
  • Weak use of secondary sources
  • Excessive errors
  • Stylistic weakness

A Few Suggestions

When collecting materials, selecting a research topic, and writing the paper:

  • Be systematic and organized (e.g., keep your bibliography neat and organized; write your notes in a neat way, so that you can find them later on).
  • Use your critical thinking ability when you read.
  • Write down your thoughts (so that you can reconstruct them later).
  • Stop when you have a really good idea and think about whether you could enlarge it to a whole research paper. If yes, take much longer notes.
  • When you write down a quotation or summarize somebody else's thoughts in your notes or in the paper, cite the source (i.e. write down the author, title, publication place, year, page number).
  • If you quote or summarize a thought from the internet, cite the internet source.
  • Write an outline that is detailed enough to remind you about the content.
  • Read your paper aloud to yourself or, preferably, have somebody else read it.
  • When you finish writing, check the spelling.
  • Use the citation form (MLA, Chicago, or other) that your instructor requires, and use it consistently everywhere.

Avoid Plagiarism

Plagiarism: presenting somebody else’s words or ideas without citing the source.

  • Cite your source every time you quote a part of somebody’s work.
  • Cite your source every time you summarize a thought from somebody’s work.
  • Cite your source every time you use a source (quote or summarize) from the Internet.

Consult the Citing Sources research guide for further details.



How to write a research paper


With proper planning, knowledge, and framework, completing a research paper can be a fulfilling and exciting experience. 

Though it might initially sound slightly intimidating, this guide will help you embrace the challenge. 

By documenting your findings, you can inspire others and make a difference in your field. Here's how you can make your research paper unique and comprehensive.

What is a research paper?

Research papers allow you to demonstrate your knowledge and understanding of a particular topic. These papers are usually lengthier and more detailed than typical essays, requiring deeper insight into the chosen topic.

To write a research paper, you must first choose a topic that interests you and is relevant to the field of study. Once you’ve selected your topic, gathering as many relevant resources as possible, including books, scholarly articles, credible websites, and other academic materials, is essential. You must then read and analyze these sources, summarizing their key points and identifying gaps in the current research.

You can formulate your ideas and opinions once you thoroughly understand the existing research. To get there might involve conducting original research, gathering data, or analyzing existing data sets. It could also involve presenting an original argument or interpretation of the existing research.

Writing a successful research paper involves presenting your findings clearly and engagingly, which might involve using charts, graphs, or other visual aids to present your data and using concise language to explain your findings. You must also ensure your paper adheres to relevant academic formatting guidelines, including proper citations and references.

Overall, writing a research paper requires a significant amount of time, effort, and attention to detail. However, it is also an enriching experience that allows you to delve deeply into a subject that interests you and contribute to the existing body of knowledge in your chosen field.

How long should a research paper be?

Research papers are deep dives into a topic. Therefore, they tend to be longer pieces of work than essays or opinion pieces. 

However, a suitable length depends on the complexity of the topic and your level of expertise. For instance, are you a first-year college student or an experienced professional? 

Also, remember that the best research papers provide valuable information for the benefit of others. Therefore, the quality of information matters most, not necessarily the length. Being concise is valuable.

Following these best practice steps will help keep your process simple and productive:

1. Gain a deep understanding of any expectations

Before diving into your intended topic or beginning the research phase, take some time to orient yourself. If a specific topic has been assigned to you, it’s essential to deeply understand the question and organize your planning and approach in response. Pay attention to the key requirements and ensure you align your writing accordingly.

This preparation step entails:

  • Deeply understanding the task or assignment
  • Being clear about the expected format and length
  • Familiarizing yourself with the citation and referencing requirements
  • Understanding any defined limits for your research contribution
  • Where applicable, speaking to your professor or research supervisor for further clarification

2. Choose your research topic

Select a research topic that aligns with both your interests and available resources. Ideally, focus on a field where you possess significant experience and analytical skills. In crafting your research paper, it's crucial to go beyond summarizing existing data and contribute fresh insights to the chosen area.

Consider narrowing your focus to a specific aspect of the topic. For example, if exploring the link between technology and mental health, delve into how social media use during the pandemic impacts the well-being of college students. Conducting interviews and surveys with students could provide firsthand data and unique perspectives, adding substantial value to the existing knowledge.

When finalizing your topic, adhere to legal and ethical norms in the relevant area (this ensures the integrity of your research, protects participants' rights, upholds intellectual property standards, and ensures transparency and accountability). Following these principles not only maintains the credibility of your work but also builds trust within your academic or professional community.

For instance, in writing about medical research, consider legal and ethical norms, including patient confidentiality laws and informed consent requirements. Similarly, if analyzing user data on social media platforms, be mindful of data privacy regulations, ensuring compliance with laws governing personal information collection and use. Aligning with legal and ethical standards not only avoids potential issues but also underscores the responsible conduct of your research.

3. Gather preliminary research

Once you’ve landed on your topic, it’s time to explore it further. You’ll want to discover more about available resources and existing research relevant to your assignment at this stage. 

This exploratory phase is vital as you may discover issues with your original idea or realize you have insufficient resources to explore the topic effectively. This key bit of groundwork allows you to redirect your research topic in a different, more feasible, or more relevant direction if necessary. 

Spending ample time at this stage ensures you gather everything you need, learn as much as you can about the topic, and discover gaps where the topic has yet to be sufficiently covered, offering an opportunity to research it further. 

4. Define your research question

To produce a well-structured and focused paper, it is imperative to formulate a clear and precise research question that will guide your work. Your research question must be informed by the existing literature and tailored to the scope and objectives of your project. By refining your focus, you can produce a thoughtful and engaging paper that effectively communicates your ideas to your readers.

5. Write a thesis statement

A thesis statement is a one-to-two-sentence summary of your research paper's main argument or direction. It serves as an overall guide to summarize the overall intent of the research paper for you and anyone wanting to know more about the research.

A strong thesis statement is:

  • Concise and clear: Explain your case in simple sentences (avoid covering multiple ideas). It might help to think of this section as an elevator pitch.
  • Specific: Ensure that there is no ambiguity in your statement and that your summary covers the points argued in the paper.
  • Debatable: A thesis statement puts forward a specific argument––it is not merely a statement but a debatable point that can be analyzed and discussed.

Here are three thesis statement examples from different disciplines:

Psychology thesis example: "We're studying adults aged 25-40 to see if taking short breaks for mindfulness can help with stress. Our goal is to find practical ways to manage anxiety better."

Environmental science thesis example: "This research paper looks into how having more city parks might make the air cleaner and keep people healthier. I want to find out if more green space means breathing in fewer carcinogens in big cities."

UX research thesis example: "This study focuses on improving mobile banking for older adults using ethnographic research, eye-tracking analysis, and interactive prototyping. We investigate the usefulness of eye-tracking analysis with older individuals, aiming to spark debate and offer fresh perspectives on UX design and digital inclusivity for the aging population."

6. Conduct in-depth research

A research paper doesn’t just include research that you’ve uncovered from other papers and studies but your fresh insights, too. You will seek to become an expert on your topic––understanding the nuances in the current leading theories. You will analyze existing research and add your thinking and discoveries.

It’s crucial to conduct well-designed research that is rigorous, robust, and based on reliable sources. A research paper that lacks evidence or is biased won’t benefit the academic community or the general public. Therefore, examining the topic thoroughly and furthering its understanding through high-quality research is essential. That usually means conducting new research. Depending on the area under investigation, you may conduct surveys, interviews, diary studies, or observational research to uncover new insights or bolster current claims.

7. Determine supporting evidence

Not every piece of research you’ve discovered will be relevant to your research paper. Select the most meaningful evidence to include alongside your discoveries, and also include evidence that doesn’t support your claims, to avoid exclusion bias and ensure a fair research paper.

8. Write a research paper outline

Before diving in and writing the whole paper, start with an outline. It will help you to see if more research is needed, and it will provide a framework by which to write a more compelling paper. Your supervisor may even request an outline to approve before beginning to write the first draft of the full paper. An outline will include your topic, thesis statement, key headings, short summaries of the research, and your arguments.

9. Write your first draft

Once you feel confident about your outline and sources, it’s time to write your first draft. While penning a long piece of content can be intimidating, if you’ve laid the groundwork, you will have a structure to help you move steadily through each section. To keep up motivation and inspiration, it’s often best to keep the pace quick. Stopping for long periods can interrupt your flow and make jumping back in harder than writing when things are fresh in your mind.

10. Cite your sources correctly

It's always a good practice to give credit where it's due, and the same goes for citing any works that have influenced your paper. Building your arguments on credible references adds value and authenticity to your research. In the formatting guidelines section, you’ll find an overview of different citation styles (MLA, CMOS, or APA), which will help you meet any publishing or academic requirements and strengthen your paper's credibility. It is essential to follow the guidelines provided by your school or the publication you are submitting to ensure the accuracy and relevance of your citations.

11. Ensure your work is original

It is crucial to ensure the originality of your paper, as plagiarism can lead to serious consequences. To avoid plagiarism, you should use proper paraphrasing and quoting techniques. Paraphrasing is rewriting a text in your own words while maintaining the original meaning. Quoting involves directly citing the source. Giving credit to the original author or source is essential whenever you borrow their ideas or words. You can also use plagiarism detection tools such as Scribbr or Grammarly to check the originality of your paper. These tools compare your draft writing to a vast database of online sources. If you find any accidental plagiarism, you should correct it immediately by rephrasing or citing the source.

12. Revise, edit, and proofread

One of the essential qualities of excellent writers is their ability to understand the importance of editing and proofreading. Even though it's tempting to call it a day once you've finished your writing, editing your work can significantly improve its quality. It's natural to overlook the weaker areas when you've just finished writing a paper. Therefore, it's best to take a break of a day or two, or even up to a week, to refresh your mind. This way, you can return to your work with a new perspective. After some breathing room, you can spot any inconsistencies, spelling and grammar errors, typos, or missing citations and correct them. 

The best research paper format

The format of your research paper should align with the requirements set forth by your college, school, or target publication. 

There is no one “best” format, per se. Depending on the stated requirements, you may need to include the following elements:

Title page: The title page of a research paper typically includes the title, author's name, and institutional affiliation and may include additional information such as a course name or instructor's name. 

Table of contents: Include a table of contents to make it easy for readers to find specific sections of your paper.

Abstract: The abstract is a summary of the purpose of the paper.

Methods: In this section, describe the research methods used. This may include collecting data, conducting interviews, or doing field research.

Results: Summarize the conclusions you drew from your research in this section.

Discussion: In this section, discuss the implications of your research. Be sure to mention any significant limitations to your approach and suggest areas for further research.

Tables, charts, and illustrations: Use tables, charts, and illustrations to help convey your research findings and make them easier to understand.

Works cited or reference page: Include a works cited or reference page to give credit to the sources that you used to conduct your research.

Bibliography: Provide a list of all the sources you consulted while conducting your research.

Dedication and acknowledgments: Optionally, you may include a dedication and acknowledgments section to thank individuals who helped you with your research.

General style and formatting guidelines

Formatting your research paper means you can submit it to your college, journal, or other publications in compliance with their criteria.

Research papers tend to follow the American Psychological Association (APA), Modern Language Association (MLA), or Chicago Manual of Style (CMOS) guidelines.

Here’s how each style guide is typically used:

Chicago Manual of Style (CMOS):

CMOS is a versatile style guide used for various types of writing. It's known for its flexibility and use in the humanities. CMOS provides guidelines for citations, formatting, and overall writing style. It allows for both footnotes and in-text citations, giving writers options based on their preferences or publication requirements.

American Psychological Association (APA):

APA is common in the social sciences. It’s hailed for its clarity and emphasis on precision. It has specific rules for citing sources, creating references, and formatting papers. APA style uses in-text citations with an accompanying reference list. It's designed to convey information efficiently and is widely used in academic and scientific writing.

Modern Language Association (MLA):

MLA is widely used in the humanities, especially literature and language studies. It emphasizes the author-page format for in-text citations and provides guidelines for creating a "Works Cited" page. MLA is known for its focus on the author's name and the literary works cited. It’s frequently used in disciplines that prioritize literary analysis and critical thinking.

To confirm you're using the latest style guide, check the official website or publisher's site for updates, consult academic resources, and verify the guide's publication date. Online platforms and educational resources may also provide summaries and alerts about any revisions or additions to the style guide.

Citing sources

When working on your research paper, it’s important to cite the sources you used properly, and your citation style will guide you through this process. Generally, citations take three forms: in-text citations, footnotes, and reference lists.

First, provide a brief citation in the body of your essay. This is also known as a parenthetical or in-text citation.

Second, include a full citation in the reference list at the end of your paper.

In-text citations include the author's surname and the date of the citation. 

Footnotes appear at the bottom of each page of your research paper. They may also be summarized within a reference list at the end of the paper. 

A reference list includes all of the research used within the paper at the end of the document. It should include the author, date, paper title, and publisher listed in the order that aligns with your citation style.
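
To see how these pieces fit together, here is a small, hypothetical Python sketch that assembles an APA-style in-text citation and reference entry from structured metadata. The function names and the sample reference are invented for illustration, and the format is simplified (real APA style also italicizes the journal name and volume, and includes the issue number where applicable).

```python
# Hypothetical sketch: assembling simplified APA-style citations
# from structured metadata. The sample data below are invented.

def apa_in_text(surname: str, year: int) -> str:
    """Parenthetical in-text citation: (Surname, Year)."""
    return f"({surname}, {year})"

def apa_reference(author: str, year: int, title: str,
                  journal: str, volume: int, pages: str) -> str:
    """Reference-list entry (simplified; no italics or issue number)."""
    return f"{author} ({year}). {title}. {journal}, {volume}, {pages}."

print(apa_in_text("Okafor", 2021))
# -> (Okafor, 2021)

print(apa_reference("Okafor, A. B.", 2021,
                    "Sampling strategies in mixed-method research",
                    "Journal of Research Practice", 17, "101-118"))
# -> Okafor, A. B. (2021). Sampling strategies in mixed-method research.
#    Journal of Research Practice, 17, 101-118.
```

In practice, reference managers automate exactly this kind of formatting for whichever citation style you select.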

10 research paper writing tips:

Following some best practices is essential to writing a research paper that contributes to your field of study and creates a positive impact.

These tactics will help you structure your argument effectively and ensure your work benefits others:

Clear and precise language: Ensure your language is unambiguous. Use academic language appropriately, but keep it simple. Also, provide clear takeaways for your audience.

Effective idea separation: Organize the vast amount of information and sources in your paper with paragraphs and titles. Create easily digestible sections for your readers to navigate through.

Compelling intro: Craft an engaging introduction that captures your reader's interest. Hook your audience and motivate them to continue reading.

Thorough revision and editing: Take the time to review and edit your paper comprehensively. Use tools like Grammarly to detect and correct small, overlooked errors.

Thesis precision: Develop a clear and concise thesis statement that guides your paper. Ensure that your thesis aligns with your research's overall purpose and contribution.

Logical flow of ideas: Maintain a logical progression throughout the paper. Use transitions effectively to connect different sections and maintain coherence.

Critical evaluation of sources: Evaluate and critically assess the relevance and reliability of your sources. Ensure that your research is based on credible and up-to-date information.

Thematic consistency: Maintain a consistent theme throughout the paper. Ensure that all sections contribute cohesively to the overall argument.

Relevant supporting evidence: Provide concise and relevant evidence to support your arguments. Avoid unnecessary details that may distract from the main points.

Embrace counterarguments: Acknowledge and address opposing views to strengthen your position. Show that you have considered alternative arguments in your field.

7 research tips 

If you want your paper to not only be well-written but also contribute to the progress of human knowledge, consider these tips to take your paper to the next level:

Selecting the appropriate topic: The topic you select should align with your area of expertise, comply with the requirements of your project, and have sufficient resources for a comprehensive investigation.

Use academic databases: Academic databases such as PubMed, Google Scholar, and JSTOR offer a wealth of research papers that can help you discover everything you need to know about your chosen topic.

Critically evaluate sources: It is important not to accept research findings at face value. Instead, it is crucial to critically analyze the information to avoid jumping to conclusions or overlooking important details. A well-written research paper requires a critical analysis with thorough reasoning to support claims.

Diversify your sources: Expand your research horizons by exploring a variety of sources beyond the standard databases. Utilize books, conference proceedings, and interviews to gather diverse perspectives and enrich your understanding of the topic.

Take detailed notes: Detailed note-taking is crucial during research and can help you form the outline and body of your paper.

Stay up on trends: Keep abreast of the latest developments in your field by regularly checking for recent publications. Subscribe to newsletters, follow relevant journals, and attend conferences to stay informed about emerging trends and advancements. 

Engage in peer review: Seek feedback from peers or mentors to ensure the rigor and validity of your research. Peer review helps identify potential weaknesses in your methodology and strengthens the overall credibility of your findings.

The real-world impact of research papers

Writing a research paper is more than an academic or business exercise. The experience provides an opportunity to explore a subject in-depth, broaden one's understanding, and arrive at meaningful conclusions. With careful planning, dedication, and hard work, writing a research paper can be a fulfilling and enriching experience contributing to advancing knowledge.

How do I publish my research paper? 

Many academics wish to publish their research papers. While challenging, publication is achievable if your paper presents novel findings and is well written. To publish your research paper, find a target publication, thoroughly read its guidelines, format your paper accordingly, and submit it per the journal's instructions. You may need to include a cover letter, too. After submission, your paper may be peer-reviewed by experts who assess its validity, quality, originality, and methodology. Following review, the publication will inform you whether your paper has been accepted or rejected. 

What is a good opening sentence for a research paper? 

Beginning your research paper with a compelling introduction can ensure readers are interested in going further. A relevant quote, a compelling statistic, or a bold argument can open the paper and hook your reader. Remember, though, that the most important aspect of a research paper is the quality of the information, not necessarily your storytelling ability, so ensure anything you write aligns with your goals.

Research paper vs. a research proposal—what’s the difference?

While some may confuse research papers and proposals, they are different documents. 

A research proposal comes before a research paper. It is a detailed document that outlines an intended area of exploration. It includes the research topic, methodology, timeline, sources, and potential conclusions. Research proposals are often required when seeking approval to conduct research. 

A research paper reports the findings of completed research, following a structured format to present those findings and construct an argument or conclusion.


Sacred Heart University Library

Organizing Academic Research Papers: 6. The Methodology


The methods section of a research paper provides the information by which a study’s validity is judged. The method section answers two main questions: 1) How was the data collected or generated? 2) How was it analyzed? The writing should be direct and precise and written in the past tense.

Importance of a Good Methodology Section

You must explain how you obtained and analyzed your results for the following reasons:

  • Readers need to know how the data was obtained because the method you choose affects the results and, by extension, how you interpreted those results.
  • Methodology is crucial for any branch of scholarship because an unreliable method produces unreliable results and undermines the value of interpretations drawn from the findings.
  • In most cases, there are a variety of different methods you can choose to investigate a research problem. Your methodology section of your paper should make clear the reasons why you chose a particular method or procedure .
  • The reader wants to know that the data was collected or generated in a way that is consistent with accepted practice in the field of study. For example, if you are using a questionnaire, readers need to know that it offered your respondents a reasonable range of answers to choose from.
  • The research method must be appropriate to the objectives of the study . For example, be sure you have a large enough sample size to be able to generalize and make recommendations based upon the findings.
  • The methodology should discuss the problems that were anticipated and the steps you took to prevent them from occurring . For any problems that did arise, you must describe the ways in which their impact was minimized or why these problems do not affect the findings in any way that impacts your interpretation of the data.
  • Often in social science research, it is useful for other researchers to adapt or replicate your methodology. Therefore, it is important to always provide sufficient information to allow others to use or replicate the study. This information is particularly important when a new method has been developed or an innovative use of an existing method has been utilized.

Bem, Daryl J. Writing the Empirical Journal Article . Psychology Writing Center. University of Washington; Lunenburg, Frederick C. Writing a Successful Thesis or Dissertation: Tips and Strategies for Students in the Social and Behavioral Sciences . Thousand Oaks, CA: Corwin Press, 2008.

Structure and Writing Style

I. Groups of Research Methods

There are two main groups of research methods in the social sciences:

  • The empirical-analytical group approaches the study of social sciences in a manner similar to the way researchers study the natural sciences. This type of research focuses on objective knowledge, research questions that can be answered yes or no, and operational definitions of variables to be measured. The empirical-analytical group employs deductive reasoning that uses existing theory as a foundation for hypotheses that need to be tested. This approach is focused on explanation.
  • The interpretative group is focused on understanding phenomena in a comprehensive, holistic way. This research method allows you to recognize your connection to the subject under study. Because the interpretative group focuses more on subjective knowledge, it requires careful interpretation of variables.

II. Content

An effectively written methodology section should:

  • Introduce the overall methodological approach for investigating your research problem . Is your study qualitative or quantitative or a combination of both (mixed method)? Are you going to take a special approach, such as action research, or a more neutral stance?
  • Indicate how the approach fits the overall research design . Your methods should have a clear connection with your research problem. In other words, make sure that your methods will actually address the problem. One of the most common deficiencies found in research papers is that the proposed methodology is unsuited to achieving the stated objective of your paper.
  • Describe the specific methods of data collection you are going to use , such as, surveys, interviews, questionnaires, observation, archival research. If you are analyzing existing data, such as a data set or archival documents, describe how it was originally created or gathered and by whom.
  • Explain how you intend to analyze your results . Will you use statistical analysis? Will you use specific theoretical perspectives to help you analyze a text or explain observed behaviors?
  • Provide background and rationale for methodologies that are unfamiliar for your readers . Very often in the social sciences, research problems and the methods for investigating them require more explanation/rationale than widely accepted rules governing the natural and physical sciences. Be clear and concise in your explanation.
  • Provide a rationale for subject selection and sampling procedure . For instance, if you propose to conduct interviews, how do you intend to select the sample population? If you are analyzing texts, which texts have you chosen, and why? If you are using statistics, why is this set of statistics being used? If other data sources exist, explain why the data you chose is most appropriate.
  • Address potential limitations . Are there any practical limitations that could affect your data collection? How will you attempt to control for potential confounding variables and errors? If your methodology may lead to problems you can anticipate, state this openly and show why pursuing this methodology outweighs the risk of these problems cropping up.

NOTE: Once you have written all of the elements of the methods section, subsequent revisions should focus on how to present those elements as clearly and as logically as possible. The description of how you prepared to study the research problem, how you gathered the data, and the protocol for analyzing the data should be organized chronologically. For clarity, when a large amount of detail must be presented, information should be presented in sub-sections according to topic.

III.  Problems to Avoid

Irrelevant Detail
The methodology section of your paper should be thorough but to the point. Don’t provide any background information that doesn’t directly help the reader to understand why a particular method was chosen, how the data was gathered or obtained, and how it was analyzed.

Unnecessary Explanation of Basic Procedures
Remember that you are not writing a how-to guide about a particular method. You should make the assumption that readers possess a basic understanding of how to investigate the research problem on their own and, therefore, you do not have to go into great detail about specific methodological procedures. The focus should be on how you applied a method, not on the mechanics of doing a method. NOTE: An exception to this rule is if you select an unconventional approach to doing the method; if this is the case, be sure to explain why this approach was chosen and how it enhances the overall research process.

Problem Blindness
It is almost a given that you will encounter problems when collecting or generating your data. Do not ignore these problems or pretend they did not occur. Often, documenting how you overcame obstacles can form an interesting part of the methodology. It demonstrates to the reader that you can provide a cogent rationale for the decisions you made to minimize the impact of any problems that arose.

Literature Review
Just as the literature review section of your paper provides an overview of sources you have examined while researching a particular topic, the methodology section should cite any sources that informed your choice and application of a particular method [i.e., the choice of a survey should include any citations to the works you used to help construct the survey].

It’s More than Sources of Information!
A description of a research study's method should not be confused with a description of the sources of information. Such a list of sources is useful in itself, especially if it is accompanied by an explanation about the selection and use of the sources. The description of the project's methodology complements a list of sources in that it sets forth the organization and interpretation of information emanating from those sources.

Azevedo, L.F. et al. How to Write a Scientific Paper: Writing the Methods Section. Revista Portuguesa de Pneumologia 17 (2011): 232-238; Butin, Dan W. The Education Dissertation A Guide for Practitioner Scholars . Thousand Oaks, CA: Corwin, 2010; Carter, Susan. Structuring Your Research Thesis . New York: Palgrave Macmillan, 2012; Lunenburg, Frederick C. Writing a Successful Thesis or Dissertation: Tips and Strategies for Students in the Social and Behavioral Sciences . Thousand Oaks, CA: Corwin Press, 2008. Methods Section . The Writer’s Handbook. Writing Center. University of Wisconsin, Madison; Writing the Experimental Report: Methods, Results, and Discussion . The Writing Lab and The OWL. Purdue University; Methods and Materials . The Structure, Format, Content, and Style of a Journal-Style Scientific Paper. Department of Biology. Bates College.

Writing Tip

Statistical Designs and Tests? Do Not Fear Them!

Don't avoid using a quantitative approach to analyzing your research problem just because you fear the idea of applying statistical designs and tests. A qualitative approach, such as conducting interviews or content analysis of archival texts, can yield exciting new insights about a research problem, but it should not be undertaken simply because you have a disdain for running a simple regression. A well designed quantitative research study can often be accomplished in very clear and direct ways, whereas a similar study of a qualitative nature usually requires considerable time to analyze large volumes of data and carries the burden of creating new paths for analysis where previously no path associated with your research problem existed.

Another Writing Tip

Knowing the Relationship Between Theories and Methods

There can be multiple meanings associated with the term "theories" and the term "methods" in social sciences research. A helpful way to delineate between them is to understand "theories" as representing different ways of characterizing the social world when you research it and "methods" as representing different ways of generating and analyzing data about that social world. Framed in this way, all empirical social sciences research involves theories and methods, whether they are stated explicitly or not. However, while theories and methods are often related, it is important that, as a researcher, you deliberately separate them in order to avoid your theories playing a disproportionate role in shaping what outcomes your chosen methods produce.

Engage in an ongoing dialectic between theories and methods so that the outcomes of your methods can be used to interrogate and develop new theories, or new ways of framing the research problem conceptually. This is how scholarship grows and branches out into new intellectual territory.

Reynolds, R. Larry. Ways of Knowing. Alternative Microeconomics. Part 1, Chapter 3. Boise State University; The Theory-Method Relationship . S-Cool Revision. United Kingdom.


15 Types of Research Methods


Research methods refer to the strategies, tools, and techniques used to gather and analyze data in a structured way in order to answer a research question or investigate a hypothesis (Hammond & Wellington, 2020).

Generally, we place research methods into two categories: quantitative and qualitative. Each has its own strengths and weaknesses, which we can summarize as:

  • Quantitative research can achieve generalizability through scrupulous statistical analysis applied to large sample sizes.
  • Qualitative research achieves deep, detailed, and nuanced accounts of specific case studies, which are not generalizable.

Some researchers, with the aim of making the most of both quantitative and qualitative research, employ mixed methods, whereby they will apply both types of research methods in the one study, such as by conducting a statistical survey alongside in-depth interviews to add context to the quantitative findings.

Below, I’ll outline 15 common research methods, and include pros, cons, and examples of each .

Types of Research Methods

Research methods can be broadly categorized into two types: quantitative and qualitative.

  • Quantitative methods involve systematic empirical investigation of observable phenomena via statistical, mathematical, or computational techniques (Schweigert, 2021). The strengths of this approach include its ability to produce reliable results that can be generalized to a larger population, although it can lack depth and detail.
  • Qualitative methods encompass techniques that are designed to provide a deep understanding of a complex issue, often in a specific context, through collection of non-numerical data (Tracy, 2019). This approach often provides rich, detailed insights but can be time-consuming and its findings may not be generalizable.

These can be further broken down into a range of specific research methods and designs:

Combining the two approaches above, mixed methods research integrates elements of both qualitative and quantitative research methods, providing a comprehensive understanding of the research problem. We can further break these down into:

  • Sequential Explanatory Design (QUAN→QUAL): This methodology involves conducting quantitative analysis first, then supplementing it with a qualitative study.
  • Sequential Exploratory Design (QUAL→QUAN): This methodology goes in the other direction, starting with qualitative analysis and ending with quantitative analysis.

Let’s explore some methods and designs from both quantitative and qualitative traditions, starting with qualitative research methods.

Qualitative Research Methods

Qualitative research methods allow for the exploration of phenomena in their natural settings, providing detailed, descriptive responses and insights into individuals’ experiences and perceptions (Howitt, 2019).

These methods are useful when a detailed understanding of a phenomenon is sought.

1. Ethnographic Research

Ethnographic research emerged out of anthropological research, where anthropologists would enter into a setting for a sustained period of time, getting to know a cultural group and recording detailed observations.

Ethnographers would sometimes even act as participants in the group or culture, which many scholars argue is a weakness because it is a step away from achieving objectivity (Stokes & Wall, 2017).

In fact, at its most extreme version, ethnographers even conduct research on themselves, in a fascinating methodology called autoethnography .

The purpose is to understand the culture, social structure, and the behaviors of the group under study. It is often useful when researchers seek to understand shared cultural meanings and practices in their natural settings.

However, it can be time-consuming and may reflect researcher biases due to the immersion approach.

Example of Ethnography

Liquidated: An Ethnography of Wall Street  by Karen Ho involves an anthropologist who embeds herself with Wall Street firms to study the culture of Wall Street bankers and how this culture affects the broader economy and world.

2. Phenomenological Research

Phenomenological research is a qualitative method focused on the study of individual experiences from the participant’s perspective (Tracy, 2019).

It focuses specifically on people’s experiences in relation to a specific social phenomenon ( see here for examples of social phenomena ).

This method is valuable when the goal is to understand how individuals perceive, experience, and make meaning of particular phenomena. However, because it is subjective and dependent on participants’ self-reports, findings may not be generalizable, and are highly reliant on self-reported ‘thoughts and feelings’.

Example of Phenomenological Research

A phenomenological approach to experiences with technology  by Sebnem Cilesiz represents a good starting-point for formulating a phenomenological study. With its focus on the ‘essence of experience’, this piece presents methodological, reliability, validity, and data analysis techniques that phenomenologists use to explain how people experience technology in their everyday lives.

3. Historical Research

Historical research is a qualitative method involving the examination of past events to draw conclusions about the present or make predictions about the future (Stokes & Wall, 2017).

As you might expect, it’s common in the research branches of history departments in universities.

This approach is useful in studies that seek to understand the past to interpret present events or trends. However, it relies heavily on the availability and reliability of source materials, which may be limited.

Common data sources include cultural artifacts from both material and non-material culture , which are then examined, compared, contrasted, and contextualized to test hypotheses and generate theories.

Example of Historical Research

A historical research example might be a study examining the evolution of gender roles over the last century. This research might involve the analysis of historical newspapers, advertisements, letters, and company documents, as well as sociocultural contexts.

4. Content Analysis

Content analysis is a research method that involves systematic and objective coding and interpreting of text or media to identify patterns, themes, ideologies, or biases (Schweigert, 2021).

A content analysis is useful in analyzing communication patterns, helping to reveal how texts such as newspapers, films, political speeches, and other types of ‘content’ contain narratives and biases.

However, interpretations can be very subjective, which often requires scholars to engage in practices such as cross-comparing their coding with peers or external researchers.

Content analysis can be further broken down in to other specific methodologies such as semiotic analysis, multimodal analysis , and discourse analysis .

Example of Content Analysis

How is Islam Portrayed in Western Media?  by Poorebrahim and Zarei (2013) employs a type of content analysis called critical discourse analysis (common in poststructuralist and critical theory research). This study by Poorebrahim and Zarei combs through a corpus of western media texts to explore the language forms that are used in relation to Islam and Muslims, finding that they are overly stereotyped, which may represent anti-Islam bias or failure to understand the Islamic world.
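
To make the coding step concrete, here is a minimal Python sketch of the mechanical side of content analysis: checking a small corpus against a pre-defined coding scheme and tallying which codes appear. The codes, keywords, and sentences are all hypothetical, and real studies rely on validated coding schemes and human judgment rather than simple keyword matching.

```python
from collections import Counter

# Hypothetical coding scheme: each code maps to indicator keywords.
CODES = {
    "threat_framing": ["threat", "danger", "fear"],
    "conflict_framing": ["war", "clash", "violence"],
    "neutral_reporting": ["said", "reported", "according"],
}

# Hypothetical corpus of media sentences.
corpus = [
    "Officials said the talks continued, according to reports.",
    "Commentators framed the movement as a threat and a danger.",
    "The article described a clash and warned of further violence.",
]

counts = Counter()
for text in corpus:
    lowered = text.lower()
    for code, keywords in CODES.items():
        if any(word in lowered for word in keywords):
            counts[code] += 1  # count each document at most once per code

for code, n in counts.most_common():
    print(f"{code}: {n} of {len(corpus)} documents")
```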

5. Grounded Theory Research

Grounded theory involves developing a theory  during and after  data collection rather than beforehand.

This is in contrast to most academic research studies, which start with a hypothesis or theory and then test it through a study, often framed as a null hypothesis (the theory is not supported) and an alternative hypothesis (the theory is supported).

Grounded theory is useful because it keeps an open mind to what the data might reveal. However, it can be time-consuming and requires rigorous data analysis (Tracy, 2019).

Grounded Theory Example

Developing a Leadership Identity   by Komives et al (2005) employs a grounded theory approach to develop a thesis based on the data rather than testing a hypothesis. The researchers studied the leadership identity of 13 college students taking on leadership roles. Based on their interviews, the researchers theorized that the students’ leadership identities shifted from a hierarchical view of leadership to one that embraced leadership as a collaborative concept.

6. Action Research

Action research is an approach which aims to solve real-world problems and bring about change within a setting. The study is designed to solve a specific problem – or in other words, to take action (Patten, 2017).

This approach can involve mixed methods, but is generally qualitative because it usually involves the study of a specific case study wherein the researcher works, e.g. a teacher studying their own classroom practice to seek ways they can improve.

Action research is very common in fields like education and nursing where practitioners identify areas for improvement then implement a study in order to find paths forward.

Action Research Example

Using Digital Sandbox Gaming to Improve Creativity Within Boys’ Writing   by Ellison and Drew was a research study one of my research students completed in his own classroom under my supervision. He implemented a digital game-based approach to literacy teaching with boys and interviewed his students to see if the use of games as stimuli for storytelling helped draw them into the learning experience.

7. Natural Observational Research

Observational research can also be quantitative (see: experimental research), but in naturalistic settings for the social sciences, researchers tend to employ qualitative data collection methods like interviews and field notes to observe people in their day-to-day environments.

This approach involves the observation and detailed recording of behaviors in their natural settings (Howitt, 2019). It can provide rich, in-depth information, but the researcher’s presence might influence behavior.

While observational research has some overlaps with ethnography (especially in regard to data collection techniques), it tends not to be as sustained as ethnography, e.g. a researcher might do 5 observations, every second Monday, as opposed to being embedded in an environment.

Observational Research Example

A researcher might use qualitative observational research to study the behaviors and interactions of children at a playground. The researcher would document the behaviors observed, such as the types of games played, levels of cooperation, and instances of conflict.

8. Case Study Research

Case study research is a qualitative method that involves a deep and thorough investigation of a single individual, group, or event in order to explore facets of that phenomenon that cannot be captured using other methods (Stokes & Wall, 2017).

Case study research is especially valuable in providing contextualized insights into specific issues, facilitating the application of abstract theories to real-world situations (Patten, 2017).

However, findings from a case study may not be generalizable due to the specific context and the limited number of cases studied (Walliman, 2021).

See More: Case Study Advantages and Disadvantages

Example of a Case Study

Scholars conduct a detailed exploration of the implementation of a new teaching method within a classroom setting. The study focuses on how the teacher and students adapt to the new method, the challenges encountered, and the outcomes on student performance and engagement. While the study provides specific and detailed insights of the teaching method in that classroom, it cannot be generalized to other classrooms, as statistical significance has not been established through this qualitative approach.

Quantitative Research Methods

Quantitative research methods involve the systematic empirical investigation of observable phenomena via statistical, mathematical, or computational techniques (Pajo, 2022). The focus is on gathering numerical data and generalizing it across groups of people or to explain a particular phenomenon.

9. Experimental Research

Experimental research is a quantitative method where researchers manipulate one variable to determine its effect on another (Walliman, 2021).

This is common, for example, in high-school science labs, where students are asked to introduce a variable into a setting in order to examine its effect.

This type of research is useful in situations where researchers want to determine causal relationships between variables. However, experimental conditions may not reflect real-world conditions.

Example of Experimental Research

A researcher may conduct an experiment to determine the effects of a new educational approach on student learning outcomes. Students would be randomly assigned to either the control group (traditional teaching method) or the experimental group (new educational approach).
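
To illustrate the random assignment step, here is a minimal Python sketch that shuffles a hypothetical list of participants and splits them into control and experimental groups. All names are placeholders; real trials use dedicated randomization procedures, often stratified or blocked.

```python
import random

def randomly_assign(participants, seed=42):
    """Shuffle participants and split them evenly into two groups."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = participants[:]
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {
        "control": shuffled[:midpoint],       # traditional teaching method
        "experimental": shuffled[midpoint:],  # new educational approach
    }

students = [f"student_{i:02d}" for i in range(1, 21)]
groups = randomly_assign(students)
print(groups["control"])
print(groups["experimental"])
```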

10. Surveys and Questionnaires

Surveys and questionnaires are quantitative methods that involve asking research participants structured and predefined questions to collect data about their attitudes, beliefs, behaviors, or characteristics (Patten, 2017).

Surveys are beneficial for collecting data from large samples, but they depend heavily on the honesty and accuracy of respondents.

They tend to be seen as more authoritative than their qualitative counterparts, semi-structured interviews, because the data is quantifiable (e.g. a questionnaire where information is presented on a scale from 1 to 10 can allow researchers to determine and compare statistical means and variances across sub-populations in the study).

Example of a Survey Study

A company might use a survey to gather data about employee job satisfaction across its offices worldwide. Employees would be asked to rate various aspects of their job satisfaction on a Likert scale. While this method provides a broad overview, it may lack the depth of understanding possible with other methods (Stokes & Wall, 2017).
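
As a rough illustration of why quantifiable survey data is easy to compare across sub-populations, here is a minimal Python sketch that summarises hypothetical Likert-style ratings (1 to 10) per office. All offices and ratings are invented for the example.

```python
from statistics import mean, stdev

# Hypothetical job-satisfaction ratings (1 = very dissatisfied, 10 = very satisfied)
responses = {
    "Berlin": [7, 8, 6, 9, 7, 8],
    "Tokyo": [5, 6, 7, 5, 6, 6],
    "Toronto": [8, 9, 7, 8, 9, 8],
}

# Summary statistics per office allow direct comparison of sub-populations.
for office, ratings in responses.items():
    print(f"{office}: mean={mean(ratings):.2f}, sd={stdev(ratings):.2f}, n={len(ratings)}")
```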

11. Longitudinal Studies

Longitudinal studies involve repeated observations of the same variables over extended periods (Howitt, 2019). These studies are valuable for tracking development and change but can be costly and time-consuming.

With multiple data points collected over extended periods, it’s possible to examine continuous changes within things like population dynamics or consumer behavior. This makes a detailed analysis of change possible.

[Figure: a longitudinal study collects data from the same sample over time, so researchers can examine how variables change over time.]

Perhaps the most relatable example of a longitudinal study is a national census, which is taken on the same day every few years, to gather comparative demographic data that can show how a nation is changing over time.

While longitudinal studies are commonly quantitative, there are also instances of qualitative ones as well, such as the famous 7 Up study from the UK, which studies 14 individuals every 7 years to explore their development over their lives.

Example of a Longitudinal Study

A national census, taken every few years, uses surveys to develop longitudinal data, which is then compared and analyzed to present accurate trends over time. Trends a census can reveal include changes in religiosity, values and attitudes on social issues, and much more.

12. Cross-Sectional Studies

Cross-sectional studies are a quantitative research method that involves analyzing data from a population at a specific point in time (Patten, 2017). They provide a snapshot of a situation but cannot determine causality.

This design is used to measure and compare the prevalence of certain characteristics or outcomes in different groups within the sampled population.

[Figure: a cross-sectional study collects data at a single point in time, allowing comparison of groups within the sample.]

The major advantage of cross-sectional design is its ability to measure a wide range of variables simultaneously without needing to follow up with participants over time.

However, cross-sectional studies do have limitations. This design can only show whether associations or correlations exist between different variables; it cannot establish cause-and-effect relationships, temporal sequence, or changes and trends over time.

Example of a Cross-Sectional Study

Our longitudinal study example of a national census also happens to contain cross-sectional design. One census is cross-sectional, displaying only data from one point in time. But when a census is taken once every few years, it becomes longitudinal, and so long as the data collection technique remains unchanged, identification of changes will be achievable, adding another time dimension on top of a basic cross-sectional study.

13. Correlational Research

Correlational research is a quantitative method that seeks to determine if and to what degree a relationship exists between two or more quantifiable variables (Schweigert, 2021).

This approach provides a fast and easy way to form initial hypotheses based on either positive or  negative correlation trends  that can be observed within a dataset.

While correlational research can reveal relationships between variables, it cannot establish causality.

Methods used for data analysis may include statistical correlations such as Pearson’s or Spearman’s.

Example of Correlational Research

A team of researchers is interested in studying the relationship between the amount of time students spend studying and their academic performance. They gather data from a high school, measuring the number of hours each student studies per week and their grade point averages (GPAs) at the end of the semester. Upon analyzing the data, they find a positive correlation, suggesting that students who spend more time studying tend to have higher GPAs.
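
A minimal Python sketch of the analysis described above might look like the following, computing Pearson's r by hand for hypothetical study-hours and GPA data (in practice you would typically reach for a statistics package such as scipy).

```python
from math import sqrt

# Hypothetical data: weekly study hours and end-of-semester GPA
hours = [2, 4, 5, 7, 8, 10, 12, 14]
gpas  = [2.1, 2.5, 2.4, 3.0, 3.2, 3.4, 3.6, 3.8]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(hours, gpas)
print(f"r = {r:.3f}")  # a value near +1 suggests a strong positive correlation
```

A value of r near +1 indicates a strong positive correlation, but, as noted above, it says nothing about causation.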

14. Quasi-Experimental Design Research

Quasi-experimental design research is a quantitative research method that is similar to experimental design but lacks the element of random assignment to treatment or control.

Instead, quasi-experimental designs typically rely on certain other methods to control for extraneous variables.

The term ‘quasi-experimental’ implies that the experiment resembles a true experiment, but it is not exactly the same because it doesn’t meet all the criteria for a ‘true’ experiment, specifically in terms of control and random assignment.

Quasi-experimental design is useful when researchers want to study a causal hypothesis or relationship, but practical or ethical considerations prevent them from manipulating variables and randomly assigning participants to conditions.

Example of Quasi-Experimental Design

A researcher wants to study the impact of a new math tutoring program on student performance. However, ethical and practical constraints prevent random assignment to the “tutoring” and “no tutoring” groups. Instead, the researcher compares students who chose to receive tutoring (experimental group) to similar students who did not choose to receive tutoring (control group), controlling for other variables like grade level and previous math performance.
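
One simple way to "control for other variables" without random assignment is to stratify the comparison, as in this minimal Python sketch comparing tutored and non-tutored students within each grade level. All records are hypothetical, and real quasi-experimental studies use more sophisticated techniques such as propensity-score matching or regression adjustment.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (grade_level, received_tutoring, test_score)
records = [
    (9, True, 78), (9, True, 82), (9, False, 74), (9, False, 71),
    (10, True, 85), (10, True, 88), (10, False, 80), (10, False, 83),
]

# Group scores by (grade, tutoring status) so comparisons stay within strata.
strata = defaultdict(list)
for grade, tutored, score in records:
    strata[(grade, tutored)].append(score)

for grade in sorted({g for g, _ in strata}):
    tutored_mean = mean(strata[(grade, True)])
    control_mean = mean(strata[(grade, False)])
    print(f"Grade {grade}: tutored={tutored_mean:.1f}, "
          f"control={control_mean:.1f}, diff={tutored_mean - control_mean:+.1f}")
```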

Related: Examples and Types of Random Assignment in Research

15. Meta-Analysis Research

Meta-analysis statistically combines the results of multiple studies on a specific topic to yield a more precise estimate of the effect size. It’s the gold standard of secondary research .

Meta-analysis is particularly useful when there are numerous studies on a topic, and there is a need to integrate the findings to draw more reliable conclusions.

Some meta-analyses can identify flaws or gaps in a corpus of research, which can make them highly influential in academic research despite involving no primary data collection.

However, they tend only to be feasible when there is a sizable corpus of high-quality and reliable studies into a phenomenon.

Example of a Meta-Analysis

The power of feedback revisited (Wisniewski, Zierer & Hattie, 2020) is a meta-analysis that examines 435 empirical studies on the effects of feedback on student learning. The authors use a random-effects model to ascertain whether there is a clear effect size across the literature, finding that feedback tends to impact cognitive and motor skill outcomes but has less of an effect on motivational and behavioral outcomes.
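
The core arithmetic of a meta-analysis is an inverse-variance weighted average of study effect sizes. The minimal Python sketch below shows the fixed-effect version for brevity; a random-effects model like the one used by Wisniewski et al. additionally estimates between-study variance. All effect sizes and variances here are hypothetical.

```python
from math import sqrt

# Hypothetical per-study results: (effect size d, variance of d)
studies = [
    (0.42, 0.010),
    (0.30, 0.020),
    (0.55, 0.015),
    (0.25, 0.008),
]

# Fixed-effect inverse-variance weighting: more precise studies count for more.
weights = [1 / var for _, var in studies]
pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
se = sqrt(1 / sum(weights))  # standard error of the pooled effect

print(f"pooled effect = {pooled:.3f} "
      f"(95% CI {pooled - 1.96*se:.3f} to {pooled + 1.96*se:.3f})")
```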

Choosing a research method requires a lot of consideration regarding what you want to achieve, your research paradigm, and the methodology that is most valuable for what you are studying. There are multiple types of research methods, many of which I haven’t been able to present here. Generally, it’s recommended that you work with an experienced researcher or research supervisor to identify a suitable research method for your study at hand.

Hammond, M., & Wellington, J. (2020). Research methods: The key concepts . New York: Routledge.

Howitt, D. (2019). Introduction to qualitative research methods in psychology . London: Pearson UK.

Pajo, B. (2022). Introduction to research methods: A hands-on approach . New York: Sage Publications.

Patten, M. L. (2017). Understanding research methods: An overview of the essentials . New York: Sage.

Schweigert, W. A. (2021). Research methods in psychology: A handbook . Los Angeles: Waveland Press.

Stokes, P., & Wall, T. (2017). Research methods . New York: Bloomsbury Publishing.

Tracy, S. J. (2019). Qualitative research methods: Collecting evidence, crafting analysis, communicating impact . London: John Wiley & Sons.

Walliman, N. (2021). Research methods: The basics. London: Routledge.



How to use and assess qualitative research methods

Loraine Busetto

1 Department of Neurology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120 Heidelberg, Germany

Wolfgang Wick

2 Clinical Cooperation Unit Neuro-Oncology, German Cancer Research Center, Heidelberg, Germany

Christoph Gumbinger


This paper aims to provide an overview of the use and assessment of qualitative research methods in the health sciences. Qualitative research can be defined as the study of the nature of phenomena and is especially appropriate for answering questions of why something is (not) observed, assessing complex multi-component interventions, and focussing on intervention improvement. The most common methods of data collection are document study, (non-) participant observations, semi-structured interviews and focus groups. For data analysis, field-notes and audio-recordings are transcribed into protocols and transcripts, and coded using qualitative data management software. Criteria such as checklists, reflexivity, sampling strategies, piloting, co-coding, member-checking and stakeholder involvement can be used to enhance and assess the quality of the research conducted. Using qualitative in addition to quantitative designs will equip us with better tools to address a greater range of research problems, and to fill in blind spots in current neurological research and practice.

The aim of this paper is to provide an overview of qualitative research methods, including hands-on information on how they can be used, reported and assessed. This article is intended for beginning qualitative researchers in the health sciences as well as experienced quantitative researchers who wish to broaden their understanding of qualitative research.

What is qualitative research?

Qualitative research is defined as “the study of the nature of phenomena”, including “their quality, different manifestations, the context in which they appear or the perspectives from which they can be perceived” , but excluding “their range, frequency and place in an objectively determined chain of cause and effect” [ 1 ]. This formal definition can be complemented with a more pragmatic rule of thumb: qualitative research generally includes data in form of words rather than numbers [ 2 ].

Why conduct qualitative research?

Because some research questions cannot be answered using (only) quantitative methods. For example, one Australian study addressed the issue of why patients from Aboriginal communities often present late or not at all to specialist services offered by tertiary care hospitals. Using qualitative interviews with patients and staff, it found one of the most significant access barriers to be transportation problems, including some towns and communities simply not having a bus service to the hospital [ 3 ]. A quantitative study could have measured the number of patients over time or even looked at possible explanatory factors – but only those previously known or suspected to be of relevance. To discover reasons for observed patterns, especially the invisible or surprising ones, qualitative designs are needed.

While qualitative research is common in other fields, it is still relatively underrepresented in health services research. The latter field is more traditionally rooted in the evidence-based-medicine paradigm, as seen in "research that involves testing the effectiveness of various strategies to achieve changes in clinical practice, preferably applying randomised controlled trial study designs (...)" [ 4 ]. This focus on quantitative research and specifically randomised controlled trials (RCT) is visible in the idea of a hierarchy of research evidence which assumes that some research designs are objectively better than others, and that choosing a "lesser" design is only acceptable when the better ones are not practically or ethically feasible [ 5 , 6 ]. Others, however, argue that an objective hierarchy does not exist, and that, instead, the research design and methods should be chosen to fit the specific research question at hand – "questions before methods" [ 2 , 7 – 9 ]. This means that even when an RCT is possible, some research problems require a different design that is better suited to addressing them. Arguing in JAMA, Berwick uses the example of rapid response teams in hospitals, which he describes as "a complex, multicomponent intervention – essentially a process of social change" susceptible to a range of different context factors including leadership or organisation history. According to him, "[in] such complex terrain, the RCT is an impoverished way to learn. Critics who use it as a truth standard in this context are incorrect" [ 8 ]. Instead of limiting oneself to RCTs, Berwick recommends embracing a wider range of methods, including qualitative ones, which for "these specific applications, (...) are not compromises in learning how to improve; they are superior" [ 8 ].

Research problems that can be approached particularly well using qualitative methods include assessing complex multi-component interventions or systems (of change), addressing questions beyond “what works”, towards “what works for whom when, how and why”, and focussing on intervention improvement rather than accreditation [ 7 , 9 – 12 ]. Using qualitative methods can also help shed light on the “softer” side of medical treatment. For example, while quantitative trials can measure the costs and benefits of neuro-oncological treatment in terms of survival rates or adverse effects, qualitative research can help provide a better understanding of patient or caregiver stress, visibility of illness or out-of-pocket expenses.

How to conduct qualitative research?

Given that qualitative research is characterised by flexibility, openness and responsivity to context, the steps of data collection and analysis are not as separate and consecutive as they tend to be in quantitative research [ 13 , 14 ]. As Fossey puts it: “sampling, data collection, analysis and interpretation are related to each other in a cyclical (iterative) manner, rather than following one after another in a stepwise approach” [ 15 ]. The researcher can make educated decisions with regard to the choice of method, how they are implemented, and to which and how many units they are applied [ 13 ]. As shown in Fig. 1, this can involve several back-and-forth steps between data collection and analysis where new insights and experiences can lead to adaption and expansion of the original plan. Some insights may also necessitate a revision of the research question and/or the research design as a whole. The process ends when saturation is achieved, i.e. when no relevant new information can be found (see also below: sampling and saturation). For reasons of transparency, it is essential for all decisions as well as the underlying reasoning to be well-documented.

[Figure 1: Iterative research process]

While it is not always explicitly addressed, qualitative methods reflect a different underlying research paradigm than quantitative research (e.g. constructivism or interpretivism as opposed to positivism). The choice of methods can be based on the respective underlying substantive theory or theoretical framework used by the researcher [ 2 ].

Data collection

The methods of qualitative data collection most commonly used in health research are document study, observations, semi-structured interviews and focus groups [ 1 , 14 , 16 , 17 ].

Document study

Document study (also called document analysis) refers to the review by the researcher of written materials [ 14 ]. These can include personal and non-personal documents such as archives, annual reports, guidelines, policy documents, diaries or letters.

Observations

Observations are particularly useful to gain insights into a certain setting and actual behaviour – as opposed to reported behaviour or opinions [ 13 ]. Qualitative observations can be either participant or non-participant in nature. In participant observations, the observer is part of the observed setting, for example a nurse working in an intensive care unit [ 18 ]. In non-participant observations, the observer is “on the outside looking in”, i.e. present in but not part of the situation, trying not to influence the setting by their presence. Observations can be planned (e.g. for 3 h during the day or night shift) or ad hoc (e.g. as soon as a stroke patient arrives at the emergency room). During the observation, the observer takes notes on everything or certain pre-determined parts of what is happening around them, for example focusing on physician-patient interactions or communication between different professional groups. Written notes can be taken during or after the observations, depending on feasibility (which is usually lower during participant observations) and acceptability (e.g. when the observer is perceived to be judging the observed). Afterwards, these field notes are transcribed into observation protocols. If more than one observer was involved, field notes are taken independently, but notes can be consolidated into one protocol after discussions. Advantages of conducting observations include minimising the distance between the researcher and the researched, the potential discovery of topics that the researcher did not realise were relevant and gaining deeper insights into the real-world dimensions of the research problem at hand [ 18 ].

Semi-structured interviews

Hijmans & Kuyper describe qualitative interviews as “an exchange with an informal character, a conversation with a goal” [ 19 ]. Interviews are used to gain insights into a person’s subjective experiences, opinions and motivations – as opposed to facts or behaviours [ 13 ]. Interviews can be distinguished by the degree to which they are structured (i.e. a questionnaire), open (e.g. free conversation or autobiographical interviews) or semi-structured [ 2 , 13 ]. Semi-structured interviews are characterized by open-ended questions and the use of an interview guide (or topic guide/list) in which the broad areas of interest, sometimes including sub-questions, are defined [ 19 ]. The pre-defined topics in the interview guide can be derived from the literature, previous research or a preliminary method of data collection, e.g. document study or observations. The topic list is usually adapted and improved at the start of the data collection process as the interviewer learns more about the field [ 20 ]. Across interviews the focus on the different (blocks of) questions may differ and some questions may be skipped altogether (e.g. if the interviewee is not able or willing to answer the questions or for concerns about the total length of the interview) [ 20 ]. Qualitative interviews are usually not conducted in written format as this impedes the interactive component of the method [ 20 ]. In comparison to written surveys, qualitative interviews have the advantage of being interactive and allowing for unexpected topics to emerge and to be taken up by the researcher. This can also help overcome a provider or researcher-centred bias often found in written surveys, which by nature, can only measure what is already known or expected to be of relevance to the researcher. Interviews can be audio- or video-taped; but sometimes it is only feasible or acceptable for the interviewer to take written notes [ 14 , 16 , 20 ].

Focus groups

Focus groups are group interviews to explore participants’ expertise and experiences, including explorations of how and why people behave in certain ways [ 1 ]. Focus groups usually consist of 6–8 people and are led by an experienced moderator following a topic guide or “script” [ 21 ]. They can involve an observer who takes note of the non-verbal aspects of the situation, possibly using an observation guide [ 21 ]. Depending on researchers’ and participants’ preferences, the discussions can be audio- or video-taped and transcribed afterwards [ 21 ]. Focus groups are useful for bringing together homogeneous (to a lesser extent heterogeneous) groups of participants with relevant expertise and experience on a given topic on which they can share detailed information [ 21 ]. Focus groups are a relatively easy, fast and inexpensive method to gain access to information on interactions in a given group, i.e. “the sharing and comparing” among participants [ 21 ]. Disadvantages include less control over the process and a lesser extent to which each individual may participate. Moreover, focus group moderators need experience, as do those tasked with the analysis of the resulting data. Focus groups can be less appropriate for discussing sensitive topics that participants might be reluctant to disclose in a group setting [ 13 ]. Moreover, attention must be paid to the emergence of “groupthink” as well as possible power dynamics within the group, e.g. when patients are awed or intimidated by health professionals.

Choosing the “right” method

As explained above, the school of thought underlying qualitative research assumes no objective hierarchy of evidence and methods. This means that each choice of single or combined methods has to be based on the research question that needs to be answered and a critical assessment with regard to whether or to what extent the chosen method can accomplish this – i.e. the “fit” between question and method [ 14 ]. It is necessary for these decisions to be documented when they are being made, and to be critically discussed when reporting methods and results.

Let us assume that our research aim is to examine the (clinical) processes around acute endovascular treatment (EVT), from the patient’s arrival at the emergency room to recanalization, with the aim to identify possible causes for delay and/or other causes for sub-optimal treatment outcome. As a first step, we could conduct a document study of the relevant standard operating procedures (SOPs) for this phase of care – are they up-to-date and in line with current guidelines? Do they contain any mistakes, irregularities or uncertainties that could cause delays or other problems? Regardless of the answers to these questions, the results have to be interpreted based on what they are: a written outline of what care processes in this hospital should look like. If we want to know what they actually look like in practice, we can conduct observations of the processes described in the SOPs. These results can (and should) be analysed in themselves, but also in comparison to the results of the document analysis, especially as regards relevant discrepancies. Do the SOPs outline specific tests for which no equipment can be observed or tasks to be performed by specialized nurses who are not present during the observation? It might also be possible that the written SOP is outdated, but the actual care provided is in line with current best practice. In order to find out why these discrepancies exist, it can be useful to conduct interviews. Are the physicians simply not aware of the SOPs (because their existence is limited to the hospital’s intranet) or do they actively disagree with them or does the infrastructure make it impossible to provide the care as described? Another rationale for adding interviews is that some situations (or all of their possible variations for different patient groups or the day, night or weekend shift) cannot practically or ethically be observed. In this case, it is possible to ask those involved to report on their actions – being aware that this is not the same as the actual observation. A senior physician’s or hospital manager’s description of certain situations might differ from a nurse’s or junior physician’s one, maybe because they intentionally misrepresent facts or maybe because different aspects of the process are visible or important to them. In some cases, it can also be relevant to consider to whom the interviewee is disclosing this information – someone they trust, someone they are otherwise not connected to, or someone they suspect or are aware of being in a potentially “dangerous” power relationship to them. Lastly, a focus group could be conducted with representatives of the relevant professional groups to explore how and why exactly they provide care around EVT. The discussion might reveal discrepancies (between SOPs and actual care or between different physicians) and motivations to the researchers as well as to the focus group members that they might not have been aware of themselves. For the focus group to deliver relevant information, attention has to be paid to its composition and conduct, for example, to make sure that all participants feel safe to disclose sensitive or potentially problematic information or that the discussion is not dominated by (senior) physicians only. The resulting combination of data collection methods is shown in Fig.  2 .

Fig. 2 Possible combination of data collection methods

Attributions for icons: “Book” by Serhii Smirnov, “Interview” by Adrien Coquet, FR, “Magnifying Glass” by anggun, ID, “Business communication” by Vectors Market; all from the Noun Project

The combination of multiple data sources as described for this example can be referred to as “triangulation”, in which multiple measurements are carried out from different angles to achieve a more comprehensive understanding of the phenomenon under study [ 22 , 23 ].

Data analysis

To analyse the data collected through observations, interviews and focus groups, these need to be turned into protocols and transcripts (see Fig. 3). Interviews and focus groups can be transcribed verbatim, with or without annotations for behaviour (e.g. laughing, crying, pausing) and with or without phonetic transcription of dialects and filler words, depending on what is expected or known to be relevant for the analysis. In the next step, the protocols and transcripts are coded, that is, marked (or tagged, labelled) with one or more short descriptors of the content of a sentence or paragraph [ 2 , 15 , 23 ]. Jansen describes coding as “connecting the raw data with ‘theoretical’ terms” [ 20 ]. In a more practical sense, coding makes raw data sortable. This makes it possible to extract and examine all segments describing, say, a tele-neurology consultation from multiple data sources (e.g. SOPs, emergency room observations, staff and patient interviews). In a process of synthesis and abstraction, the codes are then grouped, summarised and/or categorised [ 15 , 20 ]. The end product of the coding or analysis process is a descriptive theory of the behavioural pattern under investigation [ 20 ]. The coding process is performed using qualitative data management software, the most common being NVivo, MAXQDA and ATLAS.ti. It should be noted that these are data management tools which support, rather than replace, the analysis performed by the researcher(s) [ 14 ].
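To make the idea of coding concrete, the following minimal sketch treats coded segments as tagged records; the sources, codes and text snippets are purely hypothetical illustrations (not data from any actual study), and real projects would use dedicated software such as the packages named above.

```python
# Minimal sketch: qualitative coding as tagging text segments.
# All sources, codes and snippets below are hypothetical.
segments = [
    {"source": "SOP", "text": "Tele-neurology consult within 10 minutes of arrival.",
     "codes": ["tele-neurology", "time-target"]},
    {"source": "observation", "text": "Consult started roughly 25 minutes after arrival.",
     "codes": ["tele-neurology", "delay"]},
    {"source": "interview", "text": "The video cart is often in use on another ward.",
     "codes": ["tele-neurology", "infrastructure"]},
]

# Coding makes raw data sortable: extract every segment tagged with a given
# code, across all data sources, for comparison and synthesis.
for seg in segments:
    if "tele-neurology" in seg["codes"]:
        print(f'{seg["source"]}: {seg["text"]}')
```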

Fig. 3 From data collection to data analysis

Attributions for icons: see Fig. 2, also “Speech to text” by Trevor Dsouza, “Field Notes” by Mike O’Brien, US, “Voice Record” by ProSymbols, US, “Inspection” by Made, AU, and “Cloud” by Graphic Tigers; all from the Noun Project

How to report qualitative research?

Protocols of qualitative research can be published separately and in advance of the study results. However, the aim is not the same as in RCT protocols, i.e. to pre-define and set in stone the research questions and primary or secondary endpoints. Rather, it is a way to describe the research methods in detail, which might not be possible in the results paper given journals’ word limits. Qualitative research papers are usually longer than their quantitative counterparts to allow for deep understanding and so-called “thick description”. In the methods section, the focus is on transparency of the methods used, including why, how and by whom they were implemented in the specific study setting, so as to enable a discussion of whether and how this may have influenced data collection, analysis and interpretation. The results section usually starts with a paragraph outlining the main findings, followed by more detailed descriptions of, for example, the commonalities, discrepancies or exceptions per category [ 20 ]. Here it is important to support main findings by relevant quotations, which may add information, context, emphasis or real-life examples [ 20 , 23 ]. It is subject to debate in the field whether it is relevant to state the exact number or percentage of respondents supporting a certain statement (e.g. “Five interviewees expressed negative feelings towards XYZ”) [ 21 ].

How to combine qualitative with quantitative research?

Qualitative methods can be combined with other methods in multi- or mixed methods designs, which “[employ] two or more different methods […] within the same study or research program rather than confining the research to one single method” [ 24 ]. Reasons for combining methods can be diverse, including triangulation for corroboration of findings, complementarity for illustration and clarification of results, expansion to extend the breadth and range of the study, explanation of (unexpected) results generated with one method with the help of another, or offsetting the weakness of one method with the strength of another [ 1 , 17 , 24 – 26 ]. The resulting designs can be classified according to when, why and how the different quantitative and/or qualitative data strands are combined. The three most common types of mixed methods designs are the convergent parallel design, the explanatory sequential design and the exploratory sequential design. The designs with examples are shown in Fig. 4.

Fig. 4 Three common mixed methods designs

In the convergent parallel design, a qualitative study is conducted in parallel to and independently of a quantitative study, and the results of both studies are compared and combined at the stage of interpretation. Using the above example of EVT provision, this could entail setting up a quantitative EVT registry to measure process times and patient outcomes in parallel to conducting the qualitative research outlined above, and then comparing the results. Amongst other things, this would make it possible to assess whether interview respondents’ subjective impressions of patients receiving good care match modified Rankin Scale scores at follow-up, or whether observed delays in care provision are exceptions or the rule when compared to door-to-needle times as documented in the registry.

In the explanatory sequential design, a quantitative study is carried out first, followed by a qualitative study that helps to explain the quantitative results. This would be an appropriate design if the registry alone had revealed relevant delays in door-to-needle times and the qualitative study were used to understand where and why these occurred, and how they could be reduced.

In the exploratory sequential design, the qualitative study is carried out first and its results inform the design of the subsequent quantitative study [ 26 ]. If the qualitative study around EVT provision had shown a high level of dissatisfaction among the staff members involved, a quantitative questionnaire investigating staff satisfaction could be set up in the next step, informed by the qualitative findings on the topics about which dissatisfaction had been expressed. Amongst other things, a questionnaire would make it possible to widen the reach of the research to more respondents from different (types of) hospitals, regions, countries or settings, and to conduct sub-group analyses for different professional groups.

How to assess qualitative research?

A variety of assessment criteria and lists have been developed for qualitative research, ranging in their focus and comprehensiveness [ 14 , 17 , 27 ]. However, none of these has been elevated to the “gold standard” in the field. In the following, we therefore focus on a set of commonly used assessment criteria that, from a practical standpoint, a researcher can look for when assessing a qualitative research report or paper.

Assessors should check the authors’ use of and adherence to the relevant reporting checklists (e.g. Standards for Reporting Qualitative Research (SRQR)) to make sure all items that are relevant for this type of research are addressed [ 23 , 28 ]. Discussions of quantitative measures in addition to or instead of these qualitative measures can be a sign of lower quality of the research (paper). Providing and adhering to a checklist for qualitative research contributes to an important quality criterion for qualitative research, namely transparency [ 15 , 17 , 23 ].

Reflexivity

While methodological transparency and complete reporting are relevant for all types of research, some additional criteria must be taken into account for qualitative research. This includes what is called reflexivity, i.e. sensitivity to the relationship between the researcher and the researched, including how contact was established and maintained, or the background and experience of the researcher(s) involved in data collection and analysis. Depending on the research question and the population to be researched, this can be limited to professional experience, but it may also include gender, age or ethnicity [ 17 , 27 ]. These details are relevant because in qualitative research, as opposed to quantitative research, the researcher as a person cannot be isolated from the research process [ 23 ]. It may influence the conversation when an interviewed patient speaks to an interviewer who is a physician, or when an interviewee is asked to discuss a gynaecological procedure with a male interviewer, and therefore the reader must be made aware of these details [ 19 ].

Sampling and saturation

The aim of qualitative sampling is for all variants of the objects of observation that are deemed relevant for the study to be present in the sample, “to see the issue and its meanings from as many angles as possible” [ 1 , 16 , 19 , 20 , 27 ], and to ensure “information-richness” [ 15 ]. An iterative sampling approach is advised, in which data collection (e.g. five interviews) is followed by data analysis, followed by more data collection to find variants that are lacking in the current sample. This process continues until no new (relevant) information can be found and further sampling becomes redundant – which is called saturation [ 1 , 15 ]. In other words: qualitative data collection finds its end point not a priori, but when the research team determines that saturation has been reached [ 29 , 30 ].

This is also the reason why most qualitative studies use deliberate instead of random sampling strategies. This is generally referred to as “purposive sampling”, in which researchers pre-define which types of participants or cases they need to include so as to cover all variations that are expected to be of relevance, based on the literature, previous experience or theory (i.e. theoretical sampling) [ 14 , 20 ]. Other types of purposive sampling include (but are not limited to) maximum variation sampling, critical case sampling or extreme or deviant case sampling [ 2 ]. In the above EVT example, a purposive sample could include all relevant professional groups and/or all relevant stakeholders (patients, relatives) and/or all relevant times of observation (day, night and weekend shift).

Assessors of qualitative research should check whether the considerations underlying the sampling strategy were sound and whether or how researchers tried to adapt and improve their strategies in stepwise or cyclical approaches between data collection and analysis to achieve saturation [ 14 ].

Good qualitative research is iterative in nature, i.e. it goes back and forth between data collection and analysis, revising and improving the approach where necessary. One example of this is the pilot interview, in which different aspects of the interview (especially the interview guide, but also, for example, the site of the interview or whether the interview can be audio-recorded) are tested with a small number of respondents, evaluated and revised [ 19 ]. In doing so, the interviewer learns which wording or types of questions work best, or what the best length of an interview is for patients who have trouble concentrating for an extended time. Of course, the same reasoning applies to observations or focus groups, which can also be piloted.

Ideally, coding should be performed by at least two researchers, especially at the beginning of the coding process when a common approach must be defined, including the establishment of a useful coding list (or tree), and when a common meaning of individual codes must be established [ 23 ]. An initial sub-set or all transcripts can be coded independently by the coders and then compared and consolidated after regular discussions in the research team. This is to make sure that codes are applied consistently to the research data.

Member checking

Member checking, also called respondent validation , refers to the practice of checking back with study respondents to see if the research is in line with their views [ 14 , 27 ]. This can happen after data collection or analysis or when first results are available [ 23 ]. For example, interviewees can be provided with (summaries of) their transcripts and asked whether they believe this to be a complete representation of their views or whether they would like to clarify or elaborate on their responses [ 17 ]. Respondents’ feedback on these issues then becomes part of the data collection and analysis [ 27 ].

Stakeholder involvement

In those niches where qualitative approaches have been able to evolve and grow, a new trend has seen the inclusion of patients and their representatives not only as study participants (i.e. “members”, see above) but as consultants to and active participants in the broader research process [ 31 – 33 ]. The underlying assumption is that patients and other stakeholders hold unique perspectives and experiences that add value beyond their own single story, making the research more relevant and beneficial to researchers, study participants and (future) patients alike [ 34 , 35 ]. Using the example of patients on or nearing dialysis, a recent scoping review found that 80% of clinical research did not address the top 10 research priorities identified by patients and caregivers [ 32 , 36 ]. In this sense, the involvement of the relevant stakeholders, especially patients and relatives, is increasingly being seen as a quality indicator in and of itself.

How not to assess qualitative research

The above overview does not include certain items that are routine in assessments of quantitative research. What follows is a non-exhaustive, non-representative, experience-based list of the quantitative criteria often applied to the assessment of qualitative research, as well as an explanation of the limited usefulness of these endeavours.

Protocol adherence

Given the openness and flexibility of qualitative research, it should not be assessed by how well it adheres to pre-determined and fixed strategies – in other words: its rigidity. Instead, the assessor should look for signs of adaptation and refinement based on lessons learned from earlier steps in the research process.

Sample size

For the reasons explained above, qualitative research does not require specific sample sizes, nor does it require that the sample size be determined a priori [ 1 , 14 , 27 , 37 – 39 ]. Sample size can only be a useful quality indicator when related to the research purpose, the chosen methodology and the composition of the sample, i.e. who was included and why.

Randomisation

While some authors argue that randomisation can be used in qualitative research, this is not commonly the case, as neither its feasibility nor its necessity or usefulness has been convincingly established for qualitative research [ 13 , 27 ]. Relevant disadvantages include the negative impact of an overly large sample size as well as the possibility (or probability) of selecting “quiet, uncooperative or inarticulate individuals” [ 17 ]. Qualitative studies do not use control groups, either.

Interrater reliability, variability and other “objectivity checks”

The concept of “interrater reliability” is sometimes used in qualitative research to assess to what extent the coding overlaps between co-coders. However, it is not clear what this measure tells us about the quality of the analysis [ 23 ]. Such scores can therefore be included in qualitative research reports, preferably with some additional information on what the score means for the analysis, but they are not a requirement. Relatedly, it is not necessary for the quality or “objectivity” of qualitative research to separate the people who recruit the study participants from those who collect and analyse the data. Experience even suggests that it may be better to have the same person or team perform all of these tasks [ 20 ]. First, when researchers introduce themselves during recruitment, this can enhance trust when the interview takes place days or weeks later with the same researcher. Second, when the audio recording is transcribed for analysis, the researcher who conducted the interviews will usually remember the interviewee and the specific interview situation during data analysis. This can provide additional context for the interpretation of the data, e.g. on whether something might have been meant as a joke [ 18 ].
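Where such a score is reported, Cohen’s kappa is a common choice. A minimal sketch with toy, hypothetical codes, assuming scikit-learn is available:

```python
# Minimal sketch of an interrater agreement check between two coders,
# using Cohen's kappa on a shared set of coded segments (toy labels).
from sklearn.metrics import cohen_kappa_score

coder_a = ["delay", "infrastructure", "delay", "communication", "delay", "infrastructure"]
coder_b = ["delay", "infrastructure", "communication", "communication", "delay", "delay"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")  # chance-corrected agreement; 1.0 = perfect
```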

Not being quantitative research

Being qualitative research instead of quantitative research should not be used as an assessment criterion if it is applied irrespective of the research problem at hand. Similarly, qualitative research should not be required to be combined with quantitative research per se – unless mixed methods research is judged as inherently better than single-method research. In this case, the same criterion should be applied to quantitative studies without a qualitative component.

The main take-away points of this paper are summarised in Table 1. We aimed to show that, if conducted well, qualitative research can answer specific research questions that cannot be adequately answered using (only) quantitative designs. Seeing qualitative and quantitative methods as equal will help us become more aware and critical of the “fit” between the research problem and our chosen methods: I can conduct an RCT to determine the reasons for transportation delays of acute stroke patients – but should I? It also provides us with a greater range of tools to tackle a greater range of research problems more appropriately and successfully, filling in the blind spots on one half of the methodological spectrum to better address the whole complexity of neurological research and practice.

Table 1 Take-away points

Authors’ contributions

LB drafted the manuscript; WW and CG revised the manuscript; all authors approved the final version.

Funding

No external funding.

Competing interests

The authors declare no competing interests.


Organizing Your Social Sciences Research Paper

Quantitative Methods

Quantitative methods emphasize objective measurements and the statistical, mathematical, or numerical analysis of data collected through polls, questionnaires, and surveys, or by manipulating pre-existing statistical data using computational techniques. Quantitative research focuses on gathering numerical data and generalizing it across groups of people or using it to explain a particular phenomenon.

Babbie, Earl R. The Practice of Social Research . 12th ed. Belmont, CA: Wadsworth Cengage, 2010; Muijs, Daniel. Doing Quantitative Research in Education with SPSS . 2nd edition. London: SAGE Publications, 2010.


Characteristics of Quantitative Research

Your goal in conducting a quantitative research study is to determine the relationship between one thing [an independent variable] and another [a dependent or outcome variable] within a population. Quantitative research designs are either descriptive [subjects usually measured once] or experimental [subjects measured before and after a treatment]. A descriptive study establishes only associations between variables; an experimental study establishes causality.

Quantitative research deals in numbers, logic, and an objective stance. Quantitative research focuses on numeric and unchanging data and detailed, convergent reasoning rather than divergent reasoning [i.e., the generation of a variety of ideas about a research problem in a spontaneous, free-flowing manner].

Its main characteristics are:

  • The data is usually gathered using structured research instruments.
  • The results are based on larger sample sizes that are representative of the population.
  • The research study can usually be replicated or repeated, given its high reliability.
  • Researcher has a clearly defined research question to which objective answers are sought.
  • All aspects of the study are carefully designed before data is collected.
  • Data are in the form of numbers and statistics, often arranged in tables, charts, figures, or other non-textual forms.
  • Project can be used to generalize concepts more widely, predict future results, or investigate causal relationships.
  • Researcher uses tools, such as questionnaires or computer software, to collect numerical data.

The overarching aim of a quantitative research study is to classify features, count them, and construct statistical models in an attempt to explain what is observed.

Things to keep in mind when reporting the results of a study using quantitative methods:

  • Explain the data collected and their statistical treatment as well as all relevant results in relation to the research problem you are investigating. Interpretation of results is not appropriate in this section.
  • Report unanticipated events that occurred during your data collection. Explain how the actual analysis differs from the planned analysis. Explain your handling of missing data and why any missing data does not undermine the validity of your analysis.
  • Explain the techniques you used to "clean" your data set.
  • Choose a minimally sufficient statistical procedure; provide a rationale for its use and a reference for it. Specify any computer programs used.
  • Describe the assumptions for each procedure and the steps you took to ensure that they were not violated.
  • When using inferential statistics, provide the descriptive statistics, confidence intervals, and sample sizes for each variable as well as the value of the test statistic, its direction, the degrees of freedom, and the significance level [report the actual p value]. A worked sketch follows this list.
  • Avoid inferring causality, particularly in nonrandomized designs or without further experimentation.
  • Use tables to provide exact values; use figures to convey global effects. Keep figures small in size; include graphic representations of confidence intervals whenever possible.
  • Always tell the reader what to look for in tables and figures.
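As one illustration of the reporting items above, the following sketch runs an independent-samples t-test on invented toy data and prints the descriptive statistics, degrees of freedom, exact p value, and a 95% confidence interval for the mean difference; the numbers are for illustration only.

```python
# Sketch: report an independent-samples t-test with descriptive statistics,
# df, exact p value and a 95% CI for the mean difference (toy data).
import numpy as np
from scipy import stats

group_a = np.array([23.1, 25.4, 22.8, 26.0, 24.3, 25.1])
group_b = np.array([20.2, 21.7, 19.9, 22.4, 21.0, 20.6])

t_stat, p_val = stats.ttest_ind(group_a, group_b)
n_a, n_b = len(group_a), len(group_b)
df = n_a + n_b - 2

# Pooled standard error of the mean difference, then the 95% CI.
sp2 = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / df
se = np.sqrt(sp2 * (1 / n_a + 1 / n_b))
diff = group_a.mean() - group_b.mean()
crit = stats.t.ppf(0.975, df)

print(f"M1 = {group_a.mean():.2f} (SD = {group_a.std(ddof=1):.2f}); "
      f"M2 = {group_b.mean():.2f} (SD = {group_b.std(ddof=1):.2f})")
print(f"t({df}) = {t_stat:.2f}, p = {p_val:.4f}, "
      f"95% CI of difference [{diff - crit * se:.2f}, {diff + crit * se:.2f}]")
```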

NOTE: When using pre-existing statistical data gathered and made available by anyone other than yourself [e.g., a government agency], you must still report on the methods that were used to gather the data, describe any missing data, and, if any exist, provide a clear explanation of why the missing data do not undermine the validity of your final analysis.

Babbie, Earl R. The Practice of Social Research . 12th ed. Belmont, CA: Wadsworth Cengage, 2010; Brians, Craig Leonard et al. Empirical Political Analysis: Quantitative and Qualitative Research Methods . 8th ed. Boston, MA: Longman, 2011; McNabb, David E. Research Methods in Public Administration and Nonprofit Management: Quantitative and Qualitative Approaches . 2nd ed. Armonk, NY: M.E. Sharpe, 2008; Quantitative Research Methods. Writing@CSU. Colorado State University; Singh, Kultar. Quantitative Social Research Methods . Los Angeles, CA: Sage, 2007.

Basic Research Design for Quantitative Studies

Before designing a quantitative research study, you must decide whether it will be descriptive or experimental because this will dictate how you gather, analyze, and interpret the results. A descriptive study is governed by the following rules: subjects are generally measured once; the intention is to only establish associations between variables; and, the study may include a sample population of hundreds or thousands of subjects to ensure that a valid estimate of a generalized relationship between variables has been obtained. An experimental design includes subjects measured before and after a particular treatment, the sample population may be very small and purposefully chosen, and it is intended to establish causality between variables.

Introduction

The introduction to a quantitative study is usually written in the present tense and from the third person point of view. It covers the following information:

  • Identifies the research problem -- as with any academic study, you must state clearly and concisely the research problem being investigated.
  • Reviews the literature -- review scholarship on the topic, synthesizing key themes and, if necessary, noting studies that have used similar methods of inquiry and analysis. Note where key gaps exist and how your study helps to fill these gaps or clarifies existing knowledge.
  • Describes the theoretical framework -- provide an outline of the theory or hypothesis underpinning your study. If necessary, define unfamiliar or complex terms, concepts, or ideas and provide the appropriate background information to place the research problem in proper context [e.g., historical, cultural, economic, etc.].

Methodology

The methods section of a quantitative study should describe how each objective of your study will be achieved. Be sure to provide enough detail to enable the reader to make an informed assessment of the methods being used to obtain results associated with the research problem. The methods section should be presented in the past tense.

  • Study population and sampling -- where did the data come from; how robust is it; note where gaps exist or what was excluded. Note the procedures used for their selection;
  • Data collection – describe the tools and methods used to collect information and identify the variables being measured; describe the methods used to obtain the data; and, note if the data was pre-existing [i.e., government data] or you gathered it yourself. If you gathered it yourself, describe what type of instrument you used and why. Note that no data set is perfect--describe any limitations in methods of gathering data.
  • Data analysis -- describe the procedures for processing and analyzing the data. If appropriate, describe the specific instruments of analysis used to study each research objective, including mathematical techniques and the type of computer software used to manipulate the data.

Results

The findings of your study should be written objectively and in a succinct and precise format. In quantitative studies, it is common to use graphs, tables, charts, and other non-textual elements to help the reader understand the data. Make sure that non-textual elements do not stand in isolation from the text but are used to supplement the overall description of the results and to help clarify key points being made.

  • Statistical analysis -- how did you analyze the data? What were the key findings from the data? The findings should be presented in a logical, sequential order. Describe but do not interpret these trends or negative results; save that for the discussion section. The results should be presented in the past tense.

Discussion

Discussions should be analytic, logical, and comprehensive. The discussion should meld your findings together with those identified in the literature review, placing them within the context of the theoretical framework underpinning the study. The discussion should be presented in the present tense.

  • Interpretation of results -- reiterate the research problem being investigated and compare and contrast the findings with the research questions underlying the study. Did they affirm predicted outcomes or did the data refute them?
  • Description of trends, comparison of groups, or relationships among variables -- describe any trends that emerged from your analysis and explain all unanticipated and statistically insignificant findings.
  • Discussion of implications – what is the meaning of your results? Highlight key findings based on the overall results and note findings that you believe are important. How have the results helped fill gaps in understanding the research problem?
  • Limitations -- describe any limitations or unavoidable bias in your study and, if necessary, note why these limitations did not inhibit effective interpretation of the results.

Conclusion

End your study by summarizing the topic and providing a final comment and assessment of the study.

  • Summary of findings – synthesize the answers to your research questions. Do not report any statistical data here; just provide a narrative summary of the key findings and describe what was learned that you did not know before conducting the study.
  • Recommendations – if appropriate to the aim of the assignment, tie key findings with policy recommendations or actions to be taken in practice.
  • Future research – note the need for future research linked to your study’s limitations or to any remaining gaps in the literature that were not addressed in your study.

Black, Thomas R. Doing Quantitative Research in the Social Sciences: An Integrated Approach to Research Design, Measurement and Statistics. London: Sage, 1999; Gay, L. R. and Peter Airasian. Educational Research: Competencies for Analysis and Applications. 7th edition. Upper Saddle River, NJ: Merrill Prentice Hall, 2003; Hector, Anestine. An Overview of Quantitative Research in Composition and TESOL. Department of English, Indiana University of Pennsylvania; Hopkins, Will G. “Quantitative Research Design.” Sportscience 4, 1 (2000); "A Strategy for Writing Up Research Results. The Structure, Format, Content, and Style of a Journal-Style Scientific Paper." Department of Biology. Bates College; Nenty, H. Johnson. "Writing a Quantitative Research Thesis." International Journal of Educational Science 1 (2009): 19-32; Ouyang, Ronghua (John). Basic Inquiry of Quantitative Research. Kennesaw State University.

Strengths of Using Quantitative Methods

Quantitative researchers try to recognize and isolate specific variables contained within the study framework, seek correlation, relationships and causality, and attempt to control the environment in which the data is collected to avoid the risk of variables, other than the one being studied, accounting for the relationships identified.

Among the specific strengths of using quantitative methods to study social science research problems:

  • Allows for a broader study, involving a greater number of subjects, and enhancing the generalization of the results;
  • Allows for greater objectivity and accuracy of results. Generally, quantitative methods are designed to provide summaries of data that support generalizations about the phenomenon under study. In order to accomplish this, quantitative research usually involves few variables and many cases, and employs prescribed procedures to ensure validity and reliability;
  • Applying well established standards means that the research can be replicated, and then analyzed and compared with similar studies;
  • You can summarize vast sources of information and make comparisons across categories and over time; and,
  • Personal bias can be avoided by keeping a 'distance' from participating subjects and using accepted computational techniques .

Babbie, Earl R. The Practice of Social Research . 12th ed. Belmont, CA: Wadsworth Cengage, 2010; Brians, Craig Leonard et al. Empirical Political Analysis: Quantitative and Qualitative Research Methods . 8th ed. Boston, MA: Longman, 2011; McNabb, David E. Research Methods in Public Administration and Nonprofit Management: Quantitative and Qualitative Approaches . 2nd ed. Armonk, NY: M.E. Sharpe, 2008; Singh, Kultar. Quantitative Social Research Methods . Los Angeles, CA: Sage, 2007.

Limitations of Using Quantitative Methods

Quantitative methods presume an objective approach to studying research problems, in which data are controlled and measured, facts are accumulated, and the causes of behavior are determined. As a consequence, the results of quantitative research may be statistically significant but are often humanly insignificant.

Some specific limitations associated with using quantitative methods to study research problems in the social sciences include:

  • Quantitative data is more efficient and able to test hypotheses, but may miss contextual detail;
  • Uses a static and rigid approach and so employs an inflexible process of discovery;
  • The development of standard questions by researchers can lead to "structural bias" and false representation, where the data actually reflects the view of the researcher instead of the participating subject;
  • Results provide less detail on behavior, attitudes, and motivation;
  • Researcher may collect a much narrower and sometimes superficial dataset;
  • Results are limited as they provide numerical descriptions rather than detailed narrative and generally provide less elaborate accounts of human perception;
  • The research is often carried out in an unnatural, artificial environment so that a level of control can be applied to the exercise. This level of control might not normally be in place in the real world thus yielding "laboratory results" as opposed to "real world results"; and,
  • Preset answers will not necessarily reflect how people really feel about a subject and, in some cases, might just be the closest match to the preconceived hypothesis.

Research Tip

Finding Examples of How to Apply Different Types of Research Methods

SAGE publications is a major publisher of studies about how to design and conduct research in the social and behavioral sciences. Their SAGE Research Methods Online and Cases database includes contents from books, articles, encyclopedias, handbooks, and videos covering social science research design and methods including the complete Little Green Book Series of Quantitative Applications in the Social Sciences and the Little Blue Book Series of Qualitative Research techniques. The database also includes case studies outlining the research methods used in real research projects. This is an excellent source for finding definitions of key terms and descriptions of research design and practice, techniques of data gathering, analysis, and reporting, and information about theories of research [e.g., grounded theory]. The database covers both qualitative and quantitative research methods as well as mixed methods approaches to conducting research.


Open access | Published: 01 December 2021

Feature selection revisited in the single-cell era

Pengyi Yang (ORCID: orcid.org/0000-0003-1098-3138), Hao Huang and Chunlei Liu

Genome Biology volume 22, Article number: 321 (2021)


Recent advances in single-cell biotechnologies have resulted in high-dimensional datasets with increased complexity, making feature selection an essential technique for single-cell data analysis. Here, we revisit feature selection techniques and summarise recent developments. We review their application to a range of single-cell data types generated from traditional cytometry and imaging technologies and the latest array of single-cell omics technologies. We highlight some of the challenges and future directions and finally consider their scalability and make general recommendations on each type of feature selection method. We hope this review stimulates future research and application of feature selection in the single-cell era.

Introduction

High-throughput biotechnologies are at the centre of modern molecular biology, typically measuring a vast number of biomolecules in cells and tissues. While high-throughput biotechnologies achieve significantly higher coverage of molecules than traditional biochemical assays, variation in sample quality, reagents and workflow introduces profound technical variation in the data. The high dimensionality, redundancy and noise commonly found in these large-scale molecular datasets create significant challenges in their analysis and can lead to a reduction in model generalisability and reliability. Feature selection, a class of computational techniques for data analytics and machine learning, is at the forefront of dealing with these challenges and has been an essential driving force in a wide range of bioinformatics applications [ 1 ].

Until recently, the global molecular signatures generated from most high-throughput biotechnologies have been the average profiles of mixed populations of cells from tissues, organs or patients, and feature selection techniques have been predominately applied to such ‘bulk’ data. However, the recent development of technologies that enable the profiling of various molecules (e.g. DNA, RNA, protein) in individual cells at the omics scale has revolutionised our ability to study various molecular programs and cellular processes at the single-cell resolution [ 2 ]. The accumulation of large-scale and high-dimensional single-cell data has renewed interest in developing, and created a need for applying, feature selection techniques for such data, given their increased scale and complexity compared to their bulk counterparts.

To foster research in feature selection in the new era of single-cell sciences, we set out to revisit the feature selection literature, summarise its advancement over the last decade, including recent developments in deep learning, and review its current applications to various single-cell data types. We then discuss some key challenges and opportunities that we hope will inspire future research and development in this fast-growing interdisciplinary field. Finally, we consider the scalability and applicability of each type of feature selection method and make general recommendations on their usage.

Basics of feature selection techniques

Feature selection refers to a class of computational methods where the aim is to select a subset of useful features from the original feature set in a dataset. When dealing with high-dimensional data, feature selection is an effective strategy to reduce feature dimension and redundancy and can alleviate issues such as model overfitting in downstream analysis. Different from dimension reduction methods (e.g. principal component analysis), where features in a dataset are combined and/or transformed to derive a lower feature dimension, feature selection methods do not alter the original features in the dataset but only identify and select features that satisfy certain pre-defined criteria or optimise certain computational procedures [ 3 ]. The application of feature selection in bioinformatics is widespread [ 1 ]. Some of the most popular research directions include selecting genes that can discriminate complex diseases such as cancers from microarray data [ 4 , 5 ], selecting protein markers that can be used for disease diagnosis and prognostic prediction from mass spectrometry-based proteomics data [ 6 ], identifying single nucleotide polymorphisms (SNPs) and their interactions that are associated with specific phenotypes or diseases in genome-wide association studies (GWAS) [ 7 ], selecting epigenetic features that mark cancer subtypes [ 8 ] and selecting DNA structural properties for predicting genomic regulatory elements [ 9 ]. Traditionally, feature selection techniques fall into one of three categories: filters, wrappers and embedded methods (Fig. 1 ). In this section, we revisit the key properties and defining characteristics of these three categories of feature selection methods. Please refer to [ 10 ] for a comprehensive survey of feature selection methods.

Fig. 1 Schematic illustrations of typical filter (a), wrapper (b) and embedded (c) methods in feature selection

Filter methods typically rank the features based on certain criteria that may facilitate other subsequent analyses (e.g. discriminating samples) and select those that pass a threshold judged by the filtering criteria (Fig. 1 A). In bioinformatics applications, commonly used criteria are univariate methods such as t statistics, on which most ‘differential expression’ (DE) methods for biological data analysis are built [ 11 ], and multivariate methods that take into account relationships among features [ 12 ]. The main advantages of filter methods lie in their simplicity, their generally low computational requirements and their ease of application in practice [ 13 ]. However, filter methods typically select features independently of the induction algorithms (e.g. classification algorithms) that are applied for downstream analyses, and therefore the selected features may not be optimal with respect to the induction algorithms in the subsequent applications.
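For illustration, a minimal filter sketch on simulated data, assuming scikit-learn is available: features are ranked by a univariate ANOVA F statistic and the top k are kept, independently of any downstream classifier.

```python
# Minimal filter sketch: rank features by ANOVA F statistic, keep the top k.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))      # 100 samples x 1000 features
y = rng.integers(0, 2, size=100)      # binary class labels
X[y == 1, :10] += 1.5                 # make the first 10 features informative

selector = SelectKBest(score_func=f_classif, k=10).fit(X, y)
selected = selector.get_support(indices=True)   # indices of retained features
```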

In comparison, wrappers utilise the performance of the induction algorithms to guide the feature selection process and therefore may lead to features that are more conducive to the induction algorithm used for optimisation in downstream analyses [ 14 ] (Fig. 1 B). A key aspect of wrapper methods is the design of the feature optimisation algorithms that maximise the performance of the induction algorithms. Since feature dimensions are typically very high in bioinformatics applications, exhaustive search is often impractical. To this end, various greedy algorithms, such as forward and backward selection [ 15 ], and nature-inspired algorithms, such as the genetic algorithm (GA) [ 16 ] and particle swarm optimisation (PSO) [ 17 ], have been employed to speed up the optimisation and feature selection processes. Nevertheless, since the induction algorithms are included to iteratively evaluate feature subsets, wrappers are typically computationally intensive compared to filter methods.
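A minimal wrapper sketch on simulated data: greedy forward selection scores each candidate feature subset by the cross-validated accuracy of the induction algorithm (here a logistic regression), rather than by a filter statistic.

```python
# Minimal wrapper sketch: greedy forward selection guided by the
# cross-validated performance of the induction algorithm.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))
y = (X[:, 0] + X[:, 5] - X[:, 9] > 0).astype(int)

sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3, direction="forward", cv=5,
).fit(X, y)
selected = np.flatnonzero(sfs.get_support())
```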

While filters and wrappers separate feature selection from downstream analysis, embedded methods typically perform feature selection as part of the induction algorithm itself [ 18 ] (Fig. 1 C). Akin to wrappers, embedded methods optimise selected features with respect to an induction model and therefore may lead to features more suited to the induction algorithm in subsequent tasks such as sample classification. Since embedded methods perform feature selection and induction simultaneously, they are also generally more computationally efficient than wrapper methods, albeit less so than filter methods [ 19 ]. Nevertheless, as feature selection is part of the induction algorithm in embedded methods, they are often specific to the algorithmic design and less generic compared to filters and wrappers. Popular choices of embedded methods in bioinformatics applications include tree-based methods [ 20 , 21 ] and shrinkage-based methods such as LASSO [ 22 ].
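A minimal embedded sketch on simulated data: LASSO fits the model and performs selection in a single step, shrinking uninformative coefficients to exactly zero.

```python
# Minimal embedded sketch: LASSO selects features while fitting the model.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 200))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)   # features with non-zero weights
```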

Advance of feature selection in the past decade

Besides the astonishing increase in the number of feature selection techniques in the last decade, we have also seen a few notable trends in their development. Here, we summarise three aspects that have shown proliferating research in various fields and applications, including bioinformatics.

First, a variety of approaches have been proposed for ensemble feature selection, including those for filters [ 23 , 24 ], wrappers [ 25 ] and embedded methods such as tree-based ensembles [ 26 ]. Ensemble learning is a well-established approach where, instead of building a single model, multiple ‘base’ models are combined to perform tasks [ 27 ]. Supervised ensemble classification models are popular among bioinformatics applications [ 28 ] and have recently seen increasing integration with deep learning models [ 29 ]. Similar to their counterparts in supervised learning, ensemble feature selection methods typically rely on perturbation of either the dataset or the hyperparameters of the feature selection algorithms to create ‘base selectors’ from which the ensemble can be derived [ 30 ]. Examples include using different subsets of samples for creating multiple filters or using different learning parameters in an induction algorithm of a wrapper method. Key attributes of ensemble feature selection methods are that they generally achieve better generalisability in sample classification [ 31 ] and higher reproducibility in feature selection [ 32 , 33 ]. Although these improvements in performance typically come at a cost in computational efficiency, ensemble feature selection methods are increasingly popular given the growth in computational capacity over the last decade and the parallelisation available in some implementations [ 34 , 35 , 36 ].
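A minimal sketch of data-perturbation-based ensemble feature selection on simulated data: a base filter is run on bootstrap resamples, and the features selected most consistently across resamples are retained.

```python
# Minimal ensemble sketch: bootstrap the data, run a base filter on each
# resample, and keep features selected in a large fraction of resamples.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 500))
y = rng.integers(0, 2, size=100)
X[y == 1, :5] += 1.0                  # 5 truly informative features

counts = np.zeros(X.shape[1])
for _ in range(50):
    idx = rng.choice(len(y), size=len(y), replace=True)   # bootstrap resample
    sel = SelectKBest(f_classif, k=20).fit(X[idx], y[idx])
    counts += sel.get_support()

stable = np.flatnonzero(counts >= 40)  # selected in >= 80% of resamples
```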

Second, various hybrid methods have been proposed to combine filters, wrappers and embedded methods [ 37 ]. While these methods closely resemble ensemble approaches, they do not rely on data or model perturbations but instead use heterogeneous feature selection algorithms for creating a consensus [ 38 ]. Typically, these include combining different filter algorithms or different types of feature selection algorithms (e.g. a stepwise combination of filter and wrapper). Generally, hybrid methods are motivated by the aim of exploiting the strengths of individual methods while alleviating or avoiding their weaknesses [ 39 ]. For example, in bioinformatics applications, several methods combine filters with wrappers, whereby filters are first applied to reduce the number of features from a high dimension to a moderate number so that wrappers can then be employed more efficiently to generate the final set of features [ 40 , 41 ]. As another example, genes selected by various feature selection methods have been used to train a set of support vector machines (SVMs) to achieve better classification accuracy on microarray data [ 42 ]. While many hybrid feature selection algorithms are intuitive and numerous studies have reported favourable results compared to their individual components, a fundamental issue of these methods is their ad hoc nature, complicating the formal analysis of their underlying properties, such as theoretical algorithmic complexity and scalability.
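A minimal sketch of a stepwise filter-wrapper hybrid on simulated data: a cheap univariate filter first shrinks the feature space so that a more expensive wrapper-style refinement stage (here recursive feature elimination) becomes tractable.

```python
# Minimal hybrid sketch: filter stage reduces dimensionality, then a
# wrapper-style RFE stage refines the final feature subset.
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 2000))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

keep = SelectKBest(f_classif, k=50).fit(X, y).get_support(indices=True)  # filter stage
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X[:, keep], y)
selected = keep[rfe.get_support()]     # indices in the original feature space
```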

Third, a recent evolution in feature selection has been its development and implementation using deep learning models. These include models based on perturbation [ 43 , 44 ], such as randomly excluding features to test their impact on the neural network output, and on gradient propagation, where the gradient from the trained neural network is backpropagated to determine the importance of the input features [ 45 , 46 ]. These deep learning feature selection models share the common concept of ‘saliency’, which was initially designed for interpreting black-box deep neural networks by highlighting input features that are relevant for the prediction of the model [ 47 ]. Some examples in bioinformatics applications include a deep feature selection model that uses a neural network with a weighted layer to select key input features for the identification and understanding of regulatory events [ 48 ], and a generative adversarial network approach for identifying genes associated with major depressive disorder using gradient-based methods [ 49 ]. While feature selection methods based on deep learning generally require significantly more computational resources (e.g. memory) and may be slower than traditional methods (especially filter methods), their capability to identify complex relationships (e.g. non-linearity, interactions) among features has attracted tremendous attention in recent years.
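A generic sketch of the gradient-based saliency idea in PyTorch, using a toy network and simulated data; this illustrates the concept only and is not an implementation of any of the cited methods.

```python
# Minimal saliency sketch: backpropagate the predicted-class score to the
# inputs; large average gradient magnitudes mark influential features.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(200, 50)
y = (X[:, 3] - 2 * X[:, 17] > 0).long()   # two informative features

model = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):                      # brief full-batch training
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

X_req = X.clone().requires_grad_(True)
scores = model(X_req)
scores.gather(1, scores.argmax(1, keepdim=True)).sum().backward()
saliency = X_req.grad.abs().mean(0)       # per-feature importance
top_features = torch.argsort(saliency, descending=True)[:5]
```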

Feature selection in the single-cell era

Until recently, the global molecular signatures generated from most biotechnologies were the average profiles of mixed populations of cells, masking the heterogeneity of cell and tissue types, a foundational characteristic of multicellular organisms [ 50 ]. Breakthroughs in global profiling techniques at the single-cell resolution, such as single-cell RNA-sequencing (scRNA-seq), single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) [ 51 ] and cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) [ 52 ], have reshaped many of our long-held views on multicellular biological systems. These advances in single-cell technologies create unprecedented opportunities for studying complex biological systems at resolutions that were previously unattainable and have led to renewed interest in feature selection for analysing such data. Below we review some of the latest developments and applications of feature selection across various domains in the single-cell field. Table 1 summarises the methods and their applications, with additional details included in Additional file 1 : Table S1.

Feature selection in single-cell transcriptomics

By far, the most widely applied single-cell omics technologies are single-cell transcriptomics [ 53 ], made popular by an array of scRNA-seq protocols [ 54 ]. Given the huge amount of available scRNA-seq data and the large number of genes profiled in these datasets, a characteristic shared with their bulk counterparts, most recent feature selection applications in single-cell transcriptomics have concentrated on gene selection from scRNA-seq data for various upstream pre-processing and downstream analyses.

Among these, some of the most popular methods are univariate filters designed for identifying differentially distributed genes, including t statistic or ANOVA-based DE methods [ 55 , 56 ] and other statistical approaches such as differential variability (DV) [ 57 ] and differential proportion (DP) [ 58 ]. While differential distribution-based methods can often identify genes that are highly discriminative for downstream analysis, they require labels such as cell types to be pre-defined, limiting their applicability when such information is not available. A less restrictive and widely used alternative approach is to filter for highly variable genes (HVGs), which is implemented in various methods including the popular Seurat package [ 59 ]. Other methods that do not require label information include SCMarker, which relies on testing the number of modalities of each gene through its expression profile [ 60 ]; M3Drop, which models the relationship between mean expression and dropout rate [ 61 ]; and OGFSC, a variant of HVGs based on modelling the coefficient of variation of genes across cells [ 62 ]. Many scRNA-seq clustering algorithms also implement HVGs and their variants for gene filtering to improve the clustering of cells [ 63 ]. Besides the above univariate filters, recent research has also explored multivariate approaches. Examples include COMET, which relies on a modified hypergeometric test for filtering gene pairs [ 64 ], and a multinomial method for gene filtering using the deviance statistic [ 65 ].
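As an illustration, HVG filtering takes only a few calls in Scanpy; the sketch below assumes Scanpy's current API and uses 'pbmc.h5ad' as a placeholder file name for any AnnData object.

```python
# Sketch of highly variable gene (HVG) filtering with Scanpy.
import scanpy as sc

adata = sc.read_h5ad("pbmc.h5ad")              # placeholder input file
sc.pp.normalize_total(adata, target_sum=1e4)   # library-size normalisation
sc.pp.log1p(adata)                             # log-transform
sc.pp.highly_variable_genes(adata, n_top_genes=2000, flavor="seurat")
adata = adata[:, adata.var.highly_variable]    # keep the 2000 HVGs
```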

While filters are the most common options for pre-processing and feature selection from single-cell transcriptomics data, the application of wrapper methods is gaining attention, with a range of approaches that build and extend on classic methods with the primary goal of facilitating downstream analyses such as cell type classification. Some examples include the application of classic methods such as greedy-based optimisation of entropy [ 66 ], nature-inspired optimisation such as GA [ 67 , 68 ], and their hybrids with filters [ 69 , 70 , 71 ] or embedded methods [ 72 ]. More advanced methods include active learning-based feature selection using an SVM as a wrapper [ 73 ] and optimisation based on data projection [ 74 ]. The impact of optimal feature selection using wrapper methods on improving cell type classification is well demonstrated through these studies.

Owing to their simplicity of application, embedded methods have grown quickly in popularity in the last few years, especially in studies that treat feature selection as a key goal of their analyses. These include the discovery of minimum marker gene combinations using tree-based models [ 75 ], discriminative learning of DE genes using logistic regression models [ 76 ], regulatory gene signature identification using LASSO [ 77 ] and marker gene selection based on compressed sensing optimisation [ 78 ].

Lastly, several studies have compared the effect of various feature selection methods on the clustering of cell types [ 63 ] and investigated factors that affect feature selection in cell lineage analysis [ 79 ]. Together, these studies demonstrate the utility and flexibility of feature selection techniques in a wide range of tasks in single-cell transcriptomic data analyses.

Feature selection in single-cell epigenomics

Besides single-cell transcriptomic profiling, another fast-maturing single-cell omics technology is single-cell epigenomic profiling using scATAC-seq [ 51 ]. In particular, scATAC-seq measures genome-wide chromatin accessibility and can therefore provide clues about the activity of epigenomic regulatory elements and their transcription factor binding motifs in single cells. Such data offer additional information that is not accessible to scRNA-seq technologies and hence can complement and significantly enrich scRNA-seq data for characterising cell identity and gene regulatory networks (GRNs) in single cells [ 80 ]. Although most applications of feature selection have been to single-cell transcriptomes, recent studies have broadened the view to single-cell epigenomics, primarily through application to scATAC-seq data analysis. These analyses enable us to expand gene expression analysis to also include regulatory elements such as enhancers and silencers in understanding molecular and cellular processes.

Feature selection methods can be applied directly to scATAC-seq data to identify differentially accessible chromatin regions, or scATAC-seq data can first be summarised to the gene level using tools such as those reviewed in [ 81 ], with feature selection then performed to select ‘differentially accessible genes’ (DAGs) from the summarised data. For instance, Scasat, a tool for classifying cells using scATAC-seq data, implements both information gain and Fisher’s exact test for filtering and selecting differentially accessible chromatin regions [ 82 ]. Similarly, scATAC-pro, a pipeline for scATAC-seq analysis at the chromatin level, employs the Wilcoxon test as the default for filtering differentially accessible chromatin regions, while also implementing embedded methods such as logistic regression and negative binomial regression-based models as alternative options [ 83 ]. Another example is SnapATAC [ 84 ], which performs differential accessible chromatin analysis using the DE method implemented in edgeR [ 85 ]. In contrast, Kawaguchi et al. [ 86 ] summarised scATAC-seq data to the gene level using SCANPY [ 87 ] and performed embedded feature selection using either logistic LASSO or random forests to identify DAGs [ 86 ]. Muto et al. [ 88 ] performed filter-based differential analysis at both the chromatin and gene levels based on Cicero-estimated gene activity scores [ 89 ]. Finally, DUBStepR [ 71 ], a hybrid approach that combines a correlation-based filter and a regression-based wrapper for gene selection from scRNA-seq data, can also be applied to scATAC-seq data. Collectively, these methods and tools demonstrate the utility and impact of feature selection on scATAC-seq data for cell-type identification, motif analysis, and the detection of interactions between regulatory elements and genes, among other applications.
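A minimal sketch of a Wilcoxon (Mann-Whitney) rank-sum filter for differentially accessible regions on simulated data, in the spirit of the defaults mentioned above rather than the internals of any specific tool; a Benjamini-Hochberg step controls the false discovery rate across regions.

```python
# Minimal sketch: rank-sum filter for differentially accessible regions.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Toy accessibility matrix: 100 cells x 500 regions, two cell groups.
acc = rng.poisson(1.0, size=(100, 500)).astype(float)
groups = np.array([0] * 50 + [1] * 50)
acc[groups == 1, :20] += rng.poisson(2.0, size=(50, 20))  # 20 differential regions

pvals = np.array([
    mannwhitneyu(acc[groups == 0, j], acc[groups == 1, j],
                 alternative="two-sided").pvalue
    for j in range(acc.shape[1])
])

# Benjamini-Hochberg adjustment, then keep regions below the 5% FDR cut-off.
m = len(pvals)
order = np.argsort(pvals)
bh = pvals[order] * m / (np.arange(m) + 1)
adj = np.minimum.accumulate(bh[::-1])[::-1]   # enforce monotonicity
selected = order[adj < 0.05]
```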

Feature selection for single-cell surface proteins

Owing to recent advances in flow cytometry and related technologies such as mass cytometry [ 90 , 91 ], and in single-cell multimodal sequencing technologies such as CITE-seq [ 52 ], cell surface proteins have become increasingly accessible at single-cell resolution.

A key application of feature selection methods to flow and mass cytometry data has been finding optimal protein markers for cell gating [ 92 ]. A representative example is GateFinder, which implements a random forest-based feature selection procedure for optimising stepwise gating strategies on each given dataset [ 93 ]. Besides automated gating, several studies have also explored the use of feature selection for improving model performance in sample classification. For example, Hassan et al. [ 94 ] demonstrated the utility of shrinkage-based embedded models for classifying cancer samples. Another application of feature selection techniques was recently demonstrated by Tanhaemami et al. [ 95 ] for discovering signatures from label-free single cells. In particular, the authors employed a GA for feature selection and verified its utility in predicting lipid contents in algal cells under different conditions. Together, these studies illustrate the applicability of feature selection methods to a wide range of challenges in flow and mass cytometry data analysis.
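
The basic ingredient behind random forest-based tools such as GateFinder can be sketched as follows; this is a generic illustration on synthetic cytometry-like data, not the tool's actual stepwise gating procedure.

```python
# Minimal sketch of random forest-based marker ranking; synthetic data
# stand in for a cells-by-proteins cytometry matrix.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 30))          # 1000 cells, 30 protein markers
y = (X[:, 3] + X[:, 7] > 0).astype(int)  # population defined by markers 3 and 7

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
print("top markers by importance:", ranking[:5])
```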

Recent advances in single-cell multimodal sequencing technologies such as CITE-seq and related techniques such as RNA expression and protein sequencing (REAP-seq) [ 96 ] have enabled the profiling of both surface proteins and gene expression at the single-cell level. While still in its infancy, the application of feature selection techniques to such data has already begun. One example is the application of a random forest-based approach for selecting marker proteins that distinguish closely related cell types profiled using CITE-seq from PBMCs isolated from the blood of healthy human donors [ 97 ]. Another example is the use of a greedy forward feature selection wrapper that maximises the performance of a logistic regression model to identify surface protein markers for each cell type in a given CITE-seq dataset [ 98 ].
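
A greedy forward wrapper of the kind described in [ 98 ] can be sketched in a few lines: starting from an empty set, repeatedly add the protein that most improves a logistic regression classifier's cross-validated accuracy, and stop when no candidate helps. The synthetic data and stopping rule here are illustrative assumptions, not the cited study's exact procedure.

```python
# Minimal sketch of a greedy forward-selection wrapper around a
# logistic regression classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 20))                     # cells x surface proteins
y = (X[:, 0] - X[:, 5] + rng.normal(scale=0.5, size=400) > 0).astype(int)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
for _ in range(5):                                 # select up to 5 proteins
    scores = {
        j: cross_val_score(LogisticRegression(),
                           X[:, selected + [j]], y, cv=5).mean()
        for j in remaining
    }
    j_best = max(scores, key=scores.get)
    if scores[j_best] <= best_score:               # stop when no improvement
        break
    best_score = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)
print("selected proteins:", selected, "CV accuracy:", round(best_score, 3))
```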

Feature selection in single-cell imaging data

Other widely accessible data at single-cell resolution are imaging-related data types, such as those generated by image cytometry [ 99 ] and various single-cell imaging techniques [ 100 ]. Although the application of feature selection methods in this domain is very diverse, the following examples provide a snapshot of the different types of feature selection techniques used for single-cell imaging data analysis.

To classify cell states using imaging flow cytometry data, Pischel et al. [ 101 ] employed a set of filters, including mutual information maximisation, maximum relevance minimum redundancy and the Fisher score, for feature selection and demonstrated their utility for apoptosis detection. To predict cell cycle phases, Hennig et al. [ 102 ] implemented two embedded feature selection techniques, gradient boosting and random forest, for selecting the most predictive features from image cytometry data; these implementations are included in CellProfiler, an open-source software package for imaging flow cytometry data analysis. To improve the interpretability of single-cell imaging data, Peralta and Saeys [ 103 ] proposed a clustering-based method that selects representative features from each cluster, thus significantly reducing data dimensionality. To classify cell phenotypes, Doan et al. [ 104 ] implemented supervised and weakly supervised deep learning models in a framework called Deepometry for feature selection from imaging cytometry data. To classify cells according to their response to insulin stimulation, Norris et al. [ 105 ] used a random forest approach for ranking the informativeness of various temporal features extracted from time-course live-cell imaging data. Finally, to select spatially variable genes from imaging data generated by multiplexed single-molecule fluorescence in situ hybridization (smFISH), Svensson et al. [ 106 ] introduced a model based on Gaussian process regression that decomposes expression and spatial information for gene selection.
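
To illustrate the filter style used in the imaging studies above, the sketch below scores synthetic image-derived features by mutual information with a class label and keeps the top-ranked ones; it is a generic example, not a reproduction of the cited pipelines.

```python
# Minimal sketch of a univariate filter on image-derived features:
# score each feature by mutual information with the class label and
# keep the top k.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 50))                  # cells x image features
y = (X[:, 10] ** 2 + X[:, 20] > 1).astype(int)  # nonlinear dependence

mi = mutual_info_classif(X, y, random_state=0)
top_k = np.argsort(mi)[::-1][:5]
print("top features by mutual information:", top_k)
```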

Upcoming domains and future opportunities

The works reviewed above cover some of the most popular single-cell data types. Nevertheless, technological advances in the single-cell field are extending our capabilities at breakneck speed, enabling many other data modalities [ 107 ] as well as the spatial locations [ 108 ] of individual cells to be captured in high throughput. For instance, recent developments in single-cell DNA sequencing provide the opportunity to analyse SNPs and copy-number variations (CNVs) in individual cells from cancer and normal tissues [ 109 , 110 ], and single-cell proteomics now appears to be on the horizon [ 111 , 112 ], holding great promise to further transform the single-cell field. Given the high feature dimensionality of such data (e.g. numbers of SNPs, proteins and spatial locations), we anticipate that feature selection techniques will be readily adopted for these single-cell data types as they become more widely available.

Another fast-growing trend in the single-cell field is the move towards multimodality. CITE-seq and REAP-seq are examples where both gene expression and surface proteins are measured in each individual cell. Many more recent techniques now enable other combinations of modalities to be profiled at the single-cell level (Fig. 2 ). Some examples include ASAP-seq for profiling gene expression, chromatin accessibility and protein levels [ 113 ]; scMT-seq for profiling gene expression and DNA methylation [ 114 ] and its extension, scNMT-seq, for gene expression, chromatin accessibility and DNA methylation [ 115 ]; SHARE-seq and SNARE-seq for gene expression and chromatin accessibility [ 116 , 117 ]; scTrio-seq for CNVs, DNA methylation and gene expression [ 118 ]; and G&T-seq for genomic DNA and gene expression [ 119 ]. Given the complexity of the data structure in these single-cell multimodal data, feature selection methods that can facilitate integrative analysis of multiple data modalities are greatly needed. While some preliminary works have emerged recently [ 120 ], research on integrative feature selection is still in its infancy and requires significant innovation in design and implementation.

Figure 2. A schematic summary of some recent multimodal single-cell omics technologies.

Regarding the design of feature selection techniques in the single-cell field, most current studies directly use one of the three main types of methods (i.e. filters, wrappers and embedded methods). While a small number employ hybrid approaches (e.g. [ 71 , 72 ]), most are relatively straightforward combinations (such as the stepwise application of a filter followed by a wrapper) of the kind previously used for bulk data analyses. The application of ensemble and deep learning-based feature selection methods is even sparser in the field. One ensemble feature selection method is EDGE, which uses a set of weak learners to vote for important genes from scRNA-seq data [ 121 ], and the current literature on deep learning-based feature selection in single cells comprises a study identifying regulatory modules from scRNA-seq data through autoencoder deconvolution [ 122 ] and another identifying disease-associated genes from scRNA-seq data using gradient-based methods [ 49 ]. Owing to their non-linear nature, deep learning-based feature selection methods are well suited to learning complex non-linear relationships among features. Given the widespread non-linear relationships in biological systems, such as gene-gene and protein-protein interactions and interactions between genomic regulatory elements and their target genes, and hence in the data derived from them, we anticipate more research on developing and adopting deep learning-based feature selection techniques in the single-cell field in the near future.
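
The ensemble idea behind methods such as EDGE can be sketched generically: re-run a simple selector on bootstrap resamples of the cells and let the runs vote, so only features selected consistently survive. The base selector (an F-test filter) and the thresholds below are illustrative choices, not EDGE's actual weak learners.

```python
# Minimal sketch of ensemble feature selection: re-run a simple
# selector on bootstrap resamples of the cells and let the runs vote.
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 100))
y = rng.integers(0, 2, size=500)
X[y == 1, :8] += 1.0                      # 8 informative genes

votes = np.zeros(X.shape[1])
for _ in range(50):                       # 50 ensemble members
    idx = rng.integers(0, X.shape[0], size=X.shape[0])  # bootstrap cells
    F, _ = f_classif(X[idx], y[idx])
    votes[np.argsort(F)[::-1][:10]] += 1  # each member votes for its top 10

print("genes selected by >60% of members:", np.flatnonzero(votes > 30))
```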

Applicability considerations

The works we have reviewed above showcase diverse feature selection strategies and promising future directions in single-cell data analytics. In practice, scalability and robustness are critical in choosing feature selection techniques and are largely dependent on the algorithm structure and implementation. Here, we discuss several key aspects specific to the utility and applicability of feature selection methods with the goal of guiding the choice of methods from each feature selection category for readers who are interested in their application.

Scalability towards the feature dimension

A key aspect of the applicability of a feature selection method rests upon its scalability to large datasets. Univariate filter algorithms are probably the most efficient in terms of scalability towards the feature dimension since, in general, the computation time of these algorithms increases linearly with the number of features. We therefore recommend univariate filters as the first choice when working with datasets with very high feature dimensions. In comparison, wrapper algorithms generally do not scale well with the number of features due to their frequent reliance on combinatorial optimisation and therefore remain applicable mainly to datasets with a relatively small number of features. While other factors such as available computational resources and specific algorithm implementations also affect the choice of methods, wrapper algorithms are generally applied to datasets with up to a few hundred features. Embedded methods offer a good trade-off, and both tree- and shrinkage-based methods scale well computationally with the number of features [ 19 ]. Nevertheless, like wrapper methods, embedded methods rely on an induction algorithm for feature selection and are therefore sensitive to model overfitting when dealing with data with a small sample size. We recommend choosing embedded methods for datasets with up to a few thousand features when the sample size (e.g. number of cells) is moderate or large. Similarly, hybrid algorithms that combine filters with wrappers or filters with embedded methods also make a useful compromise and can be applied to datasets with relatively high to very high feature dimensions, depending on the reduced feature dimension following the filtering step.
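
The linear scaling of univariate filters is easy to see in code: scoring is a single vectorised pass over the feature axis, so doubling the number of features roughly doubles the cost. A minimal timing sketch with a simple variance score (illustrative data sizes):

```python
# Minimal sketch of why univariate filters scale to very high feature
# dimensions: one vectorised pass computes a score for every feature.
import time
import numpy as np

rng = np.random.default_rng(6)
for n_features in (1_000, 10_000, 100_000):
    X = rng.normal(size=(200, n_features))   # 200 cells
    t0 = time.perf_counter()
    score = X.var(axis=0)                    # one score per feature
    top = np.argsort(score)[::-1][:1000]     # keep the top-scoring features
    dt = time.perf_counter() - t0
    print(f"{n_features:>7} features scored in {dt:.4f}s")
```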

Scalability towards the sample size

With the advance of biotechnologies, the number of cells profiled in an experiment is growing exponentially. Hence, apart from the feature dimensionality, the scalability of a feature selection algorithm towards the sample size, typically the number of cells, is also a central determinant of its applicability to large-scale single-cell datasets. Although classic feature selection algorithms such as filters scale linearly with the feature dimension, this does not necessarily mean they also scale linearly with an increasing number of cells [ 55 ]. Here, the choice depends more on the specific implementation of the feature selection algorithm. Methods that rely purely on estimating variability (e.g. HVGs), without using cell-type labels or fitting models, generally scale better, because model-based methods take extra steps to learn various data characteristics (e.g. zero-inflation). Another aspect to note is memory usage. Most filter methods require the entire dataset to be loaded into computer memory before feature selection can be performed, which can be an issue when the size of the dataset exceeds the available memory. Interestingly, deep learning-based feature selection methods may be better suited to analysing datasets with a very large number of cells: the neural network can be trained on small batches of input data sequentially, alleviating the need to load the entire dataset into memory.
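
The same batch-wise trick is not limited to neural networks: simple filter statistics can also be accumulated over chunks of cells so that the full matrix never sits in memory. A minimal sketch using a streaming (Chan-style) mean/variance update, with simulated chunks standing in for reads from disk:

```python
# Minimal sketch of batch-wise computation: per-gene mean and variance
# are accumulated over chunks of cells, so a variance-based filter never
# needs the full matrix in memory.
import numpy as np

rng = np.random.default_rng(7)
n_genes = 1000
count = 0
mean = np.zeros(n_genes)
m2 = np.zeros(n_genes)                  # running sum of squared deviations

def batches():                          # stands in for reading from disk
    for _ in range(20):
        yield rng.poisson(1.0, size=(5000, n_genes)).astype(float)

for chunk in batches():                 # Chan et al. parallel update
    n_b = chunk.shape[0]
    mean_b = chunk.mean(axis=0)
    m2_b = ((chunk - mean_b) ** 2).sum(axis=0)
    delta = mean_b - mean
    total = count + n_b
    mean += delta * n_b / total
    m2 += m2_b + delta ** 2 * count * n_b / total
    count = total

variance = m2 / (count - 1)
print("top variable genes:", np.argsort(variance)[::-1][:10])
```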

Robustness and interpretability

Besides algorithm scalability, robustness and interpretability are also important criteria for assessing and selecting feature selection methods. This is especially crucial when the downstream application is to identify reproducible biomarkers, where the selection of robust and stable features is essential, or to characterise gene regulatory networks, where model interpretability is highly desirable. A key property of ensemble feature selection methods is their robustness to noise and slight variations in the data, which leads to better reproducibility of the selected features [ 32 , 33 ]. We thus recommend exploring ensemble feature selection methods when the task is to identify reproducible biomarkers, such as marker genes for cells of a given type. In terms of interpretability, complex models, while often offering better performance in downstream analyses such as cell classification, may not be the most appropriate choices given the difficulty of interpreting them. Here, simpler models such as tree-based methods can provide clarity, for example, about how selected features are used to classify a cell, and hence can facilitate the characterisation of gene regulatory networks underlying cell identity. Notably, however, significant progress has been made in improving interpretability, especially for deep learning models [ 123 ]. Given the increasing importance of downstream analyses that involve biomarker discovery and pathway/network characterisation in single-cell research, we anticipate increasing efforts to be devoted to improving the robustness and interpretability of advanced methods such as deep learning models in feature selection applications.
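
Robustness can also be quantified directly: run the selector on repeated subsamples and measure the overlap of the selected sets, for example with the Jaccard index. A minimal sketch of such a stability check (the base selector and thresholds are illustrative):

```python
# Minimal sketch of assessing selection stability across subsamples
# using the Jaccard index between selected gene sets.
from itertools import combinations
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(8)
X = rng.normal(size=(400, 200))
y = rng.integers(0, 2, size=400)
X[y == 1, :10] += 1.0

sets = []
for _ in range(10):
    idx = rng.choice(X.shape[0], size=300, replace=False)  # 75% subsample
    F, _ = f_classif(X[idx], y[idx])
    sets.append(set(np.argsort(F)[::-1][:15]))             # top 15 genes

jaccard = [len(a & b) / len(a | b) for a, b in combinations(sets, 2)]
print("mean selection stability (Jaccard):", round(float(np.mean(jaccard)), 3))
```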

Other considerations

Finally, the choice of feature selection methods also depends on other factors such as programming language, computing platform, parallelisation and whether the methods are well documented and easy to use. While most recent methods are implemented in popular programming languages such as R and Python, which are well supported on various computing platforms including Windows, macOS and Linux/Unix and its variants, their difficulty of application varies, requiring levels of expertise ranging from interacting with a simple graphical user interface to more complex execution that involves programming (e.g. loading packages in the R programming environment). Methods that optimise for computation speed may use C/C++ and may also offer parallelisation. However, such methods are often platform-specific and may require more expertise with a specific operating system and programming language from their users. Lastly, the quality of a method’s documentation can have a significant impact on its ease of use. Methods with comprehensive documentation and testable examples can become more popular as a result. To this end, methods implemented under a standardised framework such as Bioconductor [ 124 ] generally provide well-documented usage and examples, known as ‘vignettes’, to support users, and this can be a practical consideration when choosing among methods.

Conclusions

The explosion of single-cell data in recent years has led to a resurgence in the development and application of feature selection techniques for analysing such data. In this review, we revisited and summarised feature selection methods and their key developments in the last decade. We then reviewed the recent literature on their applications in the single-cell field, summarising achievements so far and identifying missing aspects. Based on these, we propose several research directions and discuss practical considerations that we hope will spark future research on feature selection and its application in the single-cell era.

Availability of data and materials

Not applicable.

Saeys Y, Inza I, Larranaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007;23(19):2507–17. https://doi.org/10.1093/bioinformatics/btm344 .

Efremova M, Teichmann SA. Computational methods for single-cell omics across modalities. Nature Methods. 2020;17(1):14–7. https://doi.org/10.1038/s41592-019-0692-4 .

Guyon I, Elisseeff A. An introduction to variable and feature selection. Journal of Machine Learning Research. 2003;3:1157–82.

Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, et al. A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2012;9(4):1106–19. https://doi.org/10.1109/TCBB.2012.33 .

Bolón-Canedo V, Sánchez-Marono N, Alonso-Betanzos A, Benítez JM, Herrera F. A review of microarray datasets and applied feature selection methods. Information Sciences. 2014;282:111–35. https://doi.org/10.1016/j.ins.2014.05.042 .

Levner I. Feature selection and nearest centroid classification for protein mass spectrometry. BMC Bioinformatics. 2005;6(1):1–14. https://doi.org/10.1186/1471-2105-6-68 .

Yang P, Ho JW, Zomaya AY, Zhou BB. A genetic ensemble approach for gene-gene interaction identification. BMC Bioinformatics. 2010;11(1):1–15. https://doi.org/10.1186/1471-2105-11-524 .

Model F, Adorjan P, Olek A, Piepenbrock C. Feature selection for DNA methylation based cancer classification. Bioinformatics. 2001;17(Suppl 1):S157–64. https://doi.org/10.1093/bioinformatics/17.suppl_1.S157 .

Gan Y, Guan J, Zhou S. A comparison study on feature selection of DNA structural properties for promoter prediction. BMC Bioinformatics. 2012;13(1):1–12. https://doi.org/10.1186/1471-2105-13-4 .

Chandrashekar G, Sahin F. A survey on feature selection methods. Computers & Electrical Engineering. 2014;40(1):16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024 .

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015;43(7):e47–7. https://doi.org/10.1093/nar/gkv007 .

Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology. 2005;3(02):185–205. https://doi.org/10.1142/S0219720005001004 .

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M. Benchmark for filter methods for feature selection in high-dimensional classification data. Computational Statistics & Data Analysis. 2020;143:106839. https://doi.org/10.1016/j.csda.2019.106839 .

Kohavi R, John GH. Wrappers for feature subset selection. Artificial Intelligence. 1997;97(1-2):273–324. https://doi.org/10.1016/S0004-3702(97)00043-X .

Aha, D. W. & Bankert, R. L. A comparative evaluation of sequential feature selection algorithms. In Learning From Data, 199–206 (Springer, 1996).

Li L, Weinberg CR, Darden TA, Pedersen LG. Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics. 2001;17(12):1131–42. https://doi.org/10.1093/bioinformatics/17.12.1131 .

Yang P, Xu L, Zhou BB, Zhang Z, Zomaya AY. A particle swarm based hybrid system for imbalanced medical data sampling. BMC Genomics. 2009;10(Suppl 3):S34. https://doi.org/10.1186/1471-2164-10-S3-S34 .

Lal, T. N., Chapelle, O., Weston, J. & Elisseeff, A. Embedded methods. In Feature Extraction, 137–165 (Springer, 2006).

Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A. A review of feature selection methods on synthetic data. Knowledge and Information Systems. 2013;34(3):483–519. https://doi.org/10.1007/s10115-012-0487-8 .

Deng, H. & Runger, G. Feature selection via regularized trees. In The 2012 International Joint Conference on Neural Networks (IJCNN), 1–8 (IEEE, 2012).

Breiman L. Random forests. Machine Learning. 2001;45(1):5–32. https://doi.org/10.1023/A:1010933404324 .

Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological). 1996;58:267–88.

Saeys, Y., Abeel, T. & Van de Peer, Y. Robust feature selection using ensemble feature selection techniques. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 313–325 (Springer, 2008).

Abeel T, Helleputte T, Van de Peer Y, Dupont P, Saeys Y. Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics. 2010;26(3):392–8. https://doi.org/10.1093/bioinformatics/btp630 .

Yang, P., Liu, W., Zhou, B. B., Chawla, S. & Zomaya, A. Y. Ensemble-based wrapper methods for feature selection and class imbalance learning. In Pacific-Asia conference on knowledge discovery and data mining, 544–555 (Springer, 2013).

Tuv E, Borisov A, Runger G, Torkkola K. Feature selection with ensembles, artificial variables, and redundancy elimination. The Journal of Machine Learning Research. 2009;10:1341–66.

Dietterich, T. G. Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems, 1–15 (Springer, 2000).

Yang P, Yang YH, Zhou BB, Zomaya AY. A review of ensemble methods in bioinformatics. Current Bioinformatics. 2010;5(4):296–308. https://doi.org/10.2174/157489310794072508 .

Cao Y, Geddes TA, Yang JYH, Yang P. Ensemble deep learning in bioinformatics. Nature Machine Intelligence. 2020;2:500–8.

Bolón-Canedo V, Alonso-Betanzos A. Ensembles for feature selection: a review and future trends. Information Fusion. 2019;52:1–12. https://doi.org/10.1016/j.inffus.2018.11.008 .

Brahim AB, Limam M. Ensemble feature selection for high dimensional data: a new method and a comparative study. Advances in Data Analysis and Classification. 2018;12(4):937–52. https://doi.org/10.1007/s11634-017-0285-y .

Yang, P., Zhou, B. B., Yang, J. Y.-H. & Zomaya, A. Y. Stability of feature selection algorithms and ensemble feature selection methods in bioinformatics. Biological Knowledge Discovery Handbook, 333–352 (2013).

Pes B. Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains. Neural Computing and Applications. 2020;32(10):5951–73. https://doi.org/10.1007/s00521-019-04082-3 .

Hijazi, N. M., Faris, H. & Aljarah, I. A parallel metaheuristic approach for ensemble feature selection based on multi-core architectures. Expert Systems with Applications 115290 (2021).

Tsai C-F, Sung Y-T. Ensemble feature selection in high dimension, low sample size datasets: Parallel and serial combination approaches. Knowledge-Based Systems. 2020;203:106097. https://doi.org/10.1016/j.knosys.2020.106097 .

Soufan O, Kleftogiannis D, Kalnis P, Bajic VB. DWFS: a wrapper feature selection tool based on a parallel genetic algorithm. PLoS ONE. 2015;10(2):e0117988. https://doi.org/10.1371/journal.pone.0117988 .

Chen C-W, Tsai Y-H, Chang F-R, Lin W-C. Ensemble feature selection in medical datasets: combining filter, wrapper, and embedded feature selection results. Expert Systems. 2020;37:e12553.

Seijo-Pardo B, Porto-Díaz I, Bolón-Canedo V, Alonso-Betanzos A. Ensemble feature selection: homogeneous and heterogeneous approaches. Knowledge-Based Systems. 2017;118:124–39. https://doi.org/10.1016/j.knosys.2016.11.017 .

Jović A, Brkić K, Bogunović N. A review of feature selection methods with applications. In 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 1200–1205 (IEEE, 2015).

Yang P, Zhou BB, Zhang Z, Zomaya AY. A multi-filter enhanced genetic ensemble system for gene selection and sample classification of microarray data. BMC Bioinformatics. 2010;11(S1):1–12. https://doi.org/10.1186/1471-2105-11-S1-S5 .

Chuang L-Y, Yang C-H, Wu K-C, Yang C-H. A hybrid feature selection method for DNA microarray data. Computers in Biology and Medicine. 2011;41(4):228–37. https://doi.org/10.1016/j.compbiomed.2011.02.004 .

Nanni L, Brahnam S, Lumini A. Combining multiple approaches for gene microarray classification. Bioinformatics. 2012;28(8):1151–7. https://doi.org/10.1093/bioinformatics/bts108 .

Ribeiro, M. T., Singh, S. & Guestrin, C. “Why should I trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data mining, 1135–1144 (2016).

Bach S, Binder A, Montavon G, Klauschen F, Müller KR, Samek W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS One. 2015;10(7):e0130140. https://doi.org/10.1371/journal.pone.0130140 .

Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. In In Workshop at International Conference on Learning Representations (Citeseer, 2014).

Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In International Conference on Machine Learning, 3145–3153 (PMLR, 2017).

Cancela B, Bolón-Canedo V, Alonso-Betanzos A, Gama J. A scalable saliency-based feature selection method with instance-level information. Knowledge-Based Systems. 2020;192:105326. https://doi.org/10.1016/j.knosys.2019.105326 .

Li Y, Chen C-Y, Wasserman WW. Deep feature selection: theory and application to identify enhancers and promoters. Journal of Computational Biology. 2016;23(5):322–36. https://doi.org/10.1089/cmb.2015.0189 .

Bahrami M, Maitra M, Nagy C, Turecki G, Rabiee HR, Li Y. Deep feature extraction of single-cell transcriptomes by generative adversarial network. Bioinformatics. 2021;37(10):1345–51. https://doi.org/10.1093/bioinformatics/btaa976 .

Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nature Biotechnology. 2015;33(2):155–60. https://doi.org/10.1038/nbt.3102 .

Cusanovich DA, Daza R, Adey A, Pliner HA, Christiansen L, Gunderson KL, et al. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science. 2015;348(6237):910–4. https://doi.org/10.1126/science.aab1601 .

Stoeckius M, Hafemeister C, Stephenson W, Houck-Loomis B, Chattopadhyay PK, Swerdlow H, et al. Simultaneous epitope and transcriptome measurement in single cells. Nature Methods. 2017;14(9):865–8. https://doi.org/10.1038/nmeth.4380 .

Aldridge S, Teichmann SA. Single cell transcriptomics comes of age. Nature Communications. 2020;11:1–4.

Mereu E, Lafzi A, Moutinho C, Ziegenhain C, McCarthy DJ, Álvarez-Varela A, et al. Benchmarking single-cell RNA-sequencing protocols for cell atlas projects. Nature Biotechnology. 2020;38(6):747–55. https://doi.org/10.1038/s41587-020-0469-4 .

Soneson C, Robinson MD. Bias, robustness and scalability in single-cell differential expression analysis. Nature Methods. 2018;15(4):255–61. https://doi.org/10.1038/nmeth.4612 .

Vans E, Patil A, Sharma A. FEATS: feature selection-based clustering of single-cell RNA-seq data. Briefings in Bioinformatics. bbaa306.

Lin Y, et al. scClassify: sample size estimation and multiscale classification of cells using single and multiple reference. Molecular Systems Biology. 2020;16:e9389.

Korthauer KD, Chu LF, Newton MA, Li Y, Thomson J, Stewart R, et al. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biology. 2016;17(1):1–15. https://doi.org/10.1186/s13059-016-1077-y .

Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM III, et al. Comprehensive integration of single-cell data. Cell. 2019;177(7):1888–902. https://doi.org/10.1016/j.cell.2019.05.031 .

Wang F, Liang S, Kumar T, Navin N, Chen K. SCMarker: ab initio marker selection for single cell transcriptome profiling. PLoS Computational Biology. 2019;15(10):e1007445. https://doi.org/10.1371/journal.pcbi.1007445 .

Andrews TS, Hemberg M. M3Drop: dropout-based feature selection for scRNASeq. Bioinformatics. 2019;35(16):2865–7. https://doi.org/10.1093/bioinformatics/bty1044 .

Hao J, Cao W, Huang J, Zou X, Han Z-G. Optimal gene filtering for single-cell data (OGFSC)—a gene filtering algorithm for single-cell RNA-seq data. Bioinformatics. 2019;35(15):2602–9. https://doi.org/10.1093/bioinformatics/bty1016 .

Su K, Yu T, Wu H. Accurate feature selection improves single-cell RNA-seq cell clustering. Briefings in Bioinformatics. 2021;22(5). https://doi.org/10.1093/bib/bbab034 .

Delaney C, Schnell A, Cammarata LV, Yao-Smith A, Regev A, Kuchroo VK, et al. Combinatorial prediction of marker panels from single-cell transcriptomic data. Molecular systems biology. 2019;15(10):e9005. https://doi.org/10.15252/msb.20199005 .

Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Feature selection and dimension reduction for single-cell RNA-seq based on a multinomial model. Genome Biology. 2019;20(1):1–16. https://doi.org/10.1186/s13059-019-1861-6 .

Lall S, Ghosh A, Ray S, Bandyopadhyay S. sc-REnF: an entropy guided robust feature selection for clustering of single-cell RNA-seq data. bioRxiv (2020).

Aliee H, Theis FJ. Autogenes: automatic gene selection using multi-objective optimization for RNA-seq deconvolution. Cell Systems. 2021;12(7):706–715.e4. https://doi.org/10.1016/j.cels.2021.05.006 .

Gupta S, Verma AK, Ahmad S. Feature selection for topological proximity prediction of single-cell transcriptomic profiles in drosophila embryo using genetic algorithm. Genes. 2021;12(1):28. https://doi.org/10.3390/genes12010028 .

Zhang, J. & Feng, J. Gene selection for single-cell RNA-seq data based on information gain and genetic algorithm. In 2018 14th International Conference on Computational Intelligence and Security (CIS), 57–61 (IEEE, 2018).

Zhang, J., Feng, J. & Yang, X. Gene selection for scRNA-seq data based on information gain and fruit fly optimization algorithm. In 2019 15th International Conference on Computational Intelligence and Security (CIS), 187–191 (IEEE, 2019).

Ranjan B, Sun W, Park J, Mishra K, Schmidt F, Xie R, et al. DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data. Nature Communications. 2021;12(1):5849. https://doi.org/10.1038/s41467-021-26085-2 .

Yuan F, Pan XY, Zeng T, Zhang YH, Chen L, Gan Z, et al. Identifying cell-type specific genes and expression rules based on single-cell transcriptomic atlas data. Frontiers in Bioengineering and Biotechnology. 2020;8:350. https://doi.org/10.3389/fbioe.2020.00350 .

Chen, X., Chen, S. & Thomson, M. Active feature selection discovers minimal gene-sets for classifying cell-types and disease states in single-cell mRNA-seq data. arXiv preprint arXiv:2106.08317 (2021).

Dumitrascu B, Villar S, Mixon DG, Engelhardt BE. Optimal marker gene selection for cell type discrimination in single cell analyses. Nature Communications. 2021;12(1):1–8. https://doi.org/10.1038/s41467-021-21453-4 .

Aevermann, B. D. et al. A machine learning method for the discovery of minimum marker gene combinations for cell-type identification from single-cell RNA sequencing. Genome Research, gr–275569 (2021).

Ntranos V, Yi L, Melsted P, Pachter L. A discriminative learning approach to differential expression analysis for single-cell RNA-seq. Nature Methods. 2019;16(2):163–6. https://doi.org/10.1038/s41592-018-0303-9 .

Huynh, N. P., Kelly, N. H., Katz, D. B., Pham, M. & Guilak, F. Single cell RNA sequencing reveals heterogeneity of human MSC chondrogenesis: Lasso regularized logistic regression to identify gene and regulatory signatures. bioRxiv 854406 (2019).

Vargo AH, Gilbert AC. A rank-based marker selection method for high throughput scRNA-seq data. BMC Bioinformatics. 2020;21(1):1–51. https://doi.org/10.1186/s12859-020-03641-z .

Chen B, Herring CA, Lau KS. pyNVR: investigating factors affecting feature selection from scRNA-seq data for lineage reconstruction. Bioinformatics. 2019;35(13):2335–7. https://doi.org/10.1093/bioinformatics/bty950 .

Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature. 2015;523(7561):486–90. https://doi.org/10.1038/nature14590 .

Chen H, Lareau C, Andreani T, Vinyard ME, Garcia SP, Clement K, et al. Assessment of computational methods for the analysis of single-cell ATAC-seq data. Genome Biology. 2019;20(1):1–25. https://doi.org/10.1186/s13059-019-1854-5 .

Baker SM, Rogerson C, Hayes A, Sharrocks AD, Rattray M. Classifying cells with Scasat, a single-cell ATAC-seq analysis tool. Nucleic Acids Research. 2019;47(2):e10. https://doi.org/10.1093/nar/gky950 .

Yu W, Uzun Y, Zhu Q, Chen C, Tan K. scATAC-pro: a comprehensive workbench for single-cell chromatin accessibility sequencing data. Genome Biology. 2020;21(1):1–17. https://doi.org/10.1186/s13059-020-02008-0 .

Fang R, Preissl S, Li Y, Hou X, Lucero J, Wang X, et al. Comprehensive analysis of single-cell ATAC-seq data with SnapATAC. Nature Communications. 2021;12(1):1–15. https://doi.org/10.1038/s41467-021-21583-9 .

Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40. https://doi.org/10.1093/bioinformatics/btp616 .

Kawaguchi RK, et al. Exploiting marker genes for robust classification and characterization of single-cell chromatin accessibility. BioRxiv. 2021.

Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biology. 2018;19(1):1–5. https://doi.org/10.1186/s13059-017-1382-0 .

Muto Y, Wilson PC, Ledru N, Wu H, Dimke H, Waikar SS, et al. Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney. Nature Communications. 2021;12(1):1–17. https://doi.org/10.1038/s41467-021-22368-w .

Pliner HA, Packer JS, McFaline-Figueroa JL, Cusanovich DA, Daza RM, Aghamirzaie D, et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Molecular Cell. 2018;71(5):858–71. https://doi.org/10.1016/j.molcel.2018.06.044 .

Brummelman J, Haftmann C, Núñez NG, Alvisi G, Mazza EMC, Becher B, et al. Development, application and computational analysis of high-dimensional fluorescent antibody panels for single-cell flow cytometry. Nature Protocols. 2019;14(7):1946–69. https://doi.org/10.1038/s41596-019-0166-2 .

Spitzer MH, Nolan GP. Mass cytometry: single cells, many features. Cell. 2016;165(4):780–91. https://doi.org/10.1016/j.cell.2016.04.019 .

Saeys Y, Van Gassen S, Lambrecht BN. Computational flow cytometry: helping to make sense of high-dimensional immunology data. Nature Reviews Immunology. 2016;16(7):449–62. https://doi.org/10.1038/nri.2016.56 .

Aghaeepour N, Simonds EF, Knapp DJHF, Bruggner RV, Sachs K, Culos A, et al. GateFinder: projection-based gating strategy optimization for flow and mass cytometry. Bioinformatics. 2018;34(23):4131–3. https://doi.org/10.1093/bioinformatics/bty430 .

Hassan, S. S., Ruusuvuori, P., Latonen, L. & Huttunen, H. Flow cytometry-based classification in cancer research: a view on feature selection. Cancer Informatics 14, CIN–S30795 (2015).

Tanhaemami M, Alizadeh E, Sanders CK, Marrone BL, Munsky B. Using flow cytometry and multistage machine learning to discover label-free signatures of algal lipid accumulation. Physical Biology. 2019;16(5):055001. https://doi.org/10.1088/1478-3975/ab2c60 .

Peterson VM, Zhang KX, Kumar N, Wong J, Li L, Wilson DC, et al. Multiplexed quantification of proteins and transcripts in single cells. Nature Biotechnology. 2017;35(10):936–9. https://doi.org/10.1038/nbt.3973 .

Kim HJ, Lin Y, Geddes TA, Yang JYH, Yang P. CiteFuse enables multi-modal analysis of CITE-Seq data. Bioinformatics. 2020;36(14):4137–43. https://doi.org/10.1093/bioinformatics/btaa282 .

Hao Y, Hao S, Andersen-Nissen E, Mauck WM III, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573–3587.e29. https://doi.org/10.1016/j.cell.2021.04.048 .

Weissleder R, Lee H. Automated molecular-image cytometry and analysis in modern oncology. Nature Reviews Materials. 2020;5(6):409–22. https://doi.org/10.1038/s41578-020-0180-6 .

Stender AS, Marchuk K, Liu C, Sander S, Meyer MW, Smith EA, et al. Single cell optical imaging and spectroscopy. Chemical Reviews. 2013;113(4):2469–527. https://doi.org/10.1021/cr300336e .

Pischel D, Buchbinder JH, Sundmacher K, Lavrik IN, Flassig RJ. A guide to automated apoptosis detection: how to make sense of imaging flow cytometry data. PloS One. 2018;13(5):e0197208. https://doi.org/10.1371/journal.pone.0197208 .

Hennig H, Rees P, Blasi T, Kamentsky L, Hung J, Dao D, et al. An open-source solution for advanced imaging flow cytometry data analysis using machine learning. Methods. 2017;112:201–10. https://doi.org/10.1016/j.ymeth.2016.08.018 .

Peralta D, Saeys Y. Robust unsupervised dimensionality reduction based on feature clustering for single-cell imaging data. Applied Soft Computing. 2020;93:106421. https://doi.org/10.1016/j.asoc.2020.106421 .

Doan, M. et al. Deepometry, a framework for applying supervised and weakly supervised deep learning to imaging cytometry. Nature Protocols 1–24 (2021).

Norris, D. et al. Signaling heterogeneity is defined by pathway architecture and intercellular variability in protein expression. iScience 24, 102118 (2021).

Svensson V, Teichmann SA, Stegle O. SpatialDE: identification of spatially variable genes. Nature Methods. 2018;15(5):343–6. https://doi.org/10.1038/nmeth.4636 .

Macaulay IC, Ponting CP, Voet T. Single-cell multiomics: multiple measurements from single cells. Trends in Genetics. 2017;33(2):155–68. https://doi.org/10.1016/j.tig.2016.12.003 .

Burgess DJ. Spatial transcriptomics coming of age. Nature Reviews Genetics. 2019;20(6):317–7. https://doi.org/10.1038/s41576-019-0129-z .

Velazquez-Villarreal EI, Maheshwari S, Sorenson J, Fiddes IT, Kumar V, Yin Y, et al. Single-cell sequencing of genomic DNA resolves sub-clonal heterogeneity in a melanoma cell line. Communications Biology. 2020;3(1):1–8. https://doi.org/10.1038/s42003-020-1044-8 .

Luquette LJ, Bohrson CL, Sherman MA, Park PJ. Identification of somatic mutations in single cell DNA-seq using a spatial model of allelic imbalance. Nature Communications. 2019;10(1):1–14. https://doi.org/10.1038/s41467-019-11857-8 .

Marx V. A dream of single-cell proteomics. Nature Methods. 2019;16(9):809–12. https://doi.org/10.1038/s41592-019-0540-6 .

Kelly RT. Single-cell proteomics: progress and prospects. Molecular & Cellular Proteomics. 2020;19(11):1739–48. https://doi.org/10.1074/mcp.R120.002234 .

Mimitou, E. P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nature Biotechnology 1–13 (2021).

Hu Y, Huang K, An Q, du G, Hu G, Xue J, et al. Simultaneous profiling of transcriptome and DNA methylome from a single cell. Genome Biology. 2016;17(1):1–11. https://doi.org/10.1186/s13059-016-0950-z .

Clark SJ, Argelaguet R, Kapourani CA, Stubbs TM, Lee HJ, Alda-Catalinas C, et al. scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nature Communications. 2018;9(1):1–9. https://doi.org/10.1038/s41467-018-03149-4 .

Ma S, Zhang B, LaFave LM, Earl AS, Chiang Z, Hu Y, et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell. 2020;183(4):1103–16. https://doi.org/10.1016/j.cell.2020.09.056 .

Chen S, Lake BB, Zhang K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nature Biotechnology. 2019;37(12):1452–7. https://doi.org/10.1038/s41587-019-0290-0 .

Hou Y, Guo H, Cao C, Li X, Hu B, Zhu P, et al. Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas. Cell Research. 2016;26(3):304–19. https://doi.org/10.1038/cr.2016.23 .

Macaulay IC, Haerty W, Kumar P, Li YI, Hu TX, Teng MJ, et al. G&t-seq: parallel sequencing of single-cell genomes and transcriptomes. Nature Methods. 2015;12(6):519–22. https://doi.org/10.1038/nmeth.3370 .

Liang S, Mohanty V, Dou J, Miao Q, Huang Y, Müftüoğlu M, et al. Single-cell manifold-preserving feature selection for detecting rare cell populations. Nature Computational Science. 2021;1(5):374–84. https://doi.org/10.1038/s43588-021-00070-7 .

Sun X, Liu Y, An L. Ensemble dimensionality reduction and feature gene extraction for single-cell RNA-seq data. Nature Communications. 2020;11(1):1–9. https://doi.org/10.1038/s41467-020-19465-7 .

Kinalis S, Nielsen FC, Winther O, Bagger FO. Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data. BMC Bioinformatics. 2019;20(1):1–9. https://doi.org/10.1186/s12859-019-2952-9 .

Samek, W. et al. Explainable artificial intelligence: understanding, visualizing and interpreting deep learning models. arXiv:1708.08296 (2017).

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biology. 2004;5(10):R80. https://doi.org/10.1186/gb-2004-5-10-r80 .

Acknowledgements

The authors thank the feedback from the members of the Sydney Precision Bioinformatics Alliance.

Peer review information

Barbara Cheifet was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Review history

The review history is available as Additional file 2.

Funding

P.Y. is supported by a National Health and Medical Research Council Investigator Grant (1173469).

Author information

Authors and affiliations

School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia

Pengyi Yang & Hao Huang

Computational Systems Biology Group, Children’s Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia

Pengyi Yang, Hao Huang & Chunlei Liu

Charles Perkins Centre, University of Sydney, Sydney, NSW, 2006, Australia

Pengyi Yang

Contributions

P.Y. conceptualised this work. The authors reviewed the literature and drafted, wrote, and edited the manuscript. The authors read and approved the final manuscript.

Authors’ information

Twitter handles: @PengyiYang82 (Pengyi Yang); @haohuang1999 (Hao Huang); @ChunleiLiu0 (Chunlei Liu).

Corresponding author

Correspondence to Pengyi Yang .

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Table S1.

A list of studies that applied feature selection techniques to the single-cell field.

Additional file 2:

Review history.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Yang, P., Huang, H. & Liu, C. Feature selection revisited in the single-cell era. Genome Biol 22, 321 (2021). https://doi.org/10.1186/s13059-021-02544-3

Received: 26 July 2021

Accepted: 15 November 2021

Published: 01 December 2021

DOI: https://doi.org/10.1186/s13059-021-02544-3


Recent advances in mathematical methods for finance

  • Open access
  • Published: 04 April 2024

  • Giorgia Callegaro,
  • Claudio Fontana,
  • Martino Grasselli,
  • Wolfgang J. Runggaldier &
  • Tiziano Vargiolu

In recent years, Mathematical Finance has witnessed the emergence of new research directions spurred by developments of financial markets, technological advances, and societal challenges. On the one hand, financial markets have seen the introduction of new financial products, regulatory frameworks, and trading infrastructures. On the other hand, artificial intelligence and machine learning techniques are introducing revolutionary changes in numerical methods in finance, overcoming computational challenges considered insurmountable until recently. In addition, new types of risks, such as climate-related and cyber-risks, have gained prominence, significantly impacting financial institutions and society at large.

This special issue on Recent Advances in Mathematical Methods for Finance provides a comprehensive overview of some of the latest developments in Mathematical Finance. We decided to launch this special issue on the occasion of the 10th General AMaMeF Conference, organised by the Guest Editors at the University of Padova and held in a virtual format on June 22–25, 2021. AMaMeF is the acronym for Advanced Mathematical Methods for Finance, and was born as a programme network of the European Science Foundation from 2005 to 2010, under the Sixth Framework Program for research and technological development of the European Union. AMaMeF now represents a European network of research promoting the exchange and diffusion of knowledge in the field of Mathematical Finance, spanning more than 20 countries. The biannual general conference stands as the flagship event of the AMaMeF network. The 10th General AMaMeF Conference spanned a broad range of topics in mathematical finance, including algorithmic trading and financial technologies, asset pricing under market frictions, collateralization and XVA, credit risk and interest rate modeling, energy and commodity markets, equilibrium and principal-agents models, climate risk, green and sustainable finance, machine learning and computational methods in finance, market microstructure, mean-field games and McKean–Vlasov equations, model uncertainty, model risk and robust finance, risk measures, stochastic control and portfolio optimization, stochastic volatility modeling, systemic risk and financial networks. These topics were specifically targeted by the call for papers for the special issue, which was open to the entire scientific community and not restricted to papers presented at the conference.

The special issue contains 44 papers, which underwent a rigorous peer review process under the supervision of the Guest Editors. In keeping with the title of the special issue, the selection of submitted papers emphasised the originality and interest of the mathematical methods employed, alongside the relevance of their financial applications. The selected papers encompass theoretical contributions as well as more applied research, offering a comprehensive view of promising research directions in mathematical finance.

We are thankful to Prof. Endre Boros, Editor-in-Chief of Annals of Operations Research , for giving us the opportunity to edit this special issue and to the Springer staff for their assistance throughout the production process. We are grateful to the referees for their valuable feedback and constructive criticisms, which aided in the selection of the submissions and enhanced the quality of accepted papers. Finally, our most sincere gratitude goes to the authors of the submitted papers, for contributing their work to this special issue. We hope that this collection of papers will stimulate further research on several emerging topics in Mathematical Finance.

Open access funding provided by Università degli Studi di Padova within the CRUI-CARE Agreement.

Author information

Authors and affiliations

Department of Mathematics “Tullio Levi-Civita”, University of Padova, Padua, Italy

Giorgia Callegaro, Claudio Fontana, Martino Grasselli, Wolfgang J. Runggaldier & Tiziano Vargiolu

Corresponding author

Correspondence to Claudio Fontana .

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Callegaro, G., Fontana, C., Grasselli, M. et al. Recent advances in mathematical methods for finance. Ann Oper Res (2024). https://doi.org/10.1007/s10479-024-05959-w

Published: 04 April 2024

DOI: https://doi.org/10.1007/s10479-024-05959-w


How to Write an APA Methods Section | With Examples

Published on February 5, 2021 by Pritha Bhandari. Revised on June 22, 2023.

The methods section of an APA style paper is where you report in detail how you performed your study. Research papers in the social and natural sciences often follow APA style. This article focuses on reporting quantitative research methods.

In your APA methods section, you should report enough information to understand and replicate your study, including detailed information on the sample, measures, and procedures used.

Table of contents

  • Structuring an APA methods section
  • Participants
  • Measures
  • Procedures
  • Example of an APA methods section
  • Other interesting articles
  • Frequently asked questions about writing an APA methods section

Structuring an APA methods section

The main heading of “Methods” should be centered, boldfaced, and capitalized. Subheadings within this section are left-aligned, boldfaced, and in title case. You can also add lower level headings within these subsections, as long as they follow APA heading styles.

To structure your methods section, you can use the subheadings of “Participants,” “Materials,” and “Procedures.” These headings are not mandatory—aim to organize your methods section using subheadings that make sense for your specific study.

Note that not all of these topics will necessarily be relevant for your study. For example, if you didn’t need to consider outlier removal or ways of assigning participants to different conditions, you don’t have to report these steps.

The APA also provides specific reporting guidelines for different types of research design. These tell you exactly what you need to report for longitudinal designs, replication studies, experimental designs, and so on. If your study uses a combination design, consult APA guidelines for mixed methods studies.

Detailed descriptions of procedures that don’t fit into your main text can be placed in supplemental materials (for example, the exact instructions and tasks given to participants, the full analytical strategy including software code, or additional figures and tables).

Participants

Begin the methods section by reporting sample characteristics, sampling procedures, and the sample size.

Participant or subject characteristics

When discussing people who participate in research, descriptive terms like “participants,” “subjects” and “respondents” can be used. For non-human animal research, “subjects” is more appropriate.

Specify all relevant demographic characteristics of your participants. This may include their age, sex, ethnic or racial group, gender identity, education level, and socioeconomic status. Depending on your study topic, other characteristics like educational or immigration status or language preference may also be relevant.

Be sure to report these characteristics as precisely as possible. This helps the reader understand how far your results may be generalized to other people.

The APA guidelines emphasize writing about participants using bias-free language, so it’s necessary to use inclusive and appropriate terms.

Sampling procedures

Outline how the participants were selected and all inclusion and exclusion criteria applied. Appropriately identify the sampling procedure used. For example, you should only label a sample as random if you had access to every member of the relevant population.

Of all the people invited to participate in your study, note the percentage that actually did (if you have this data). Additionally, report whether participants were self-selected, either by themselves or by their institutions (e.g., schools may submit student data for research purposes).

Identify any compensation (e.g., course credits or money) that was provided to participants, and mention any institutional review board approvals and ethical standards followed.

Sample size and power

Detail the sample size (per condition) and statistical power that you hoped to achieve, as well as any analyses you performed to determine these numbers.

It’s important to show that your study had enough statistical power to find effects if there were any to be found.

Additionally, state whether your final sample differed from the intended sample. Your interpretations of the study outcomes should be based only on your final sample rather than your intended sample.
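
If you performed an a priori power analysis, report its inputs (expected effect size, alpha, target power) alongside the resulting sample size. A minimal sketch of such a calculation for a two-group design, assuming the statsmodels package is available:

```python
# Minimal sketch of an a priori power analysis for a two-group design,
# the kind of calculation reported under "Sample size and power".
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # expected Cohen's d
                                   alpha=0.05, power=0.80)
print(f"required sample size per condition: {n_per_group:.0f}")  # ~64
```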

Measures

Write up the tools and techniques that you used to measure relevant variables. Be as thorough as possible for a complete picture of your techniques.

Primary and secondary measures

Define the primary and secondary outcome measures that will help you answer your primary and secondary research questions.

Specify all instruments used in gathering these measurements and the construct that they measure. These instruments may include hardware, software, or tests, scales, and inventories.

  • To cite hardware, indicate the model number and manufacturer.
  • To cite common software (e.g., Qualtrics), state the full name along with the version number or the website URL.
  • To cite tests, scales or inventories, reference its manual or the article it was published in. It’s also helpful to state the number of items and provide one or two example items.

Make sure to report the settings (e.g., screen resolution) of any specialized apparatus used.

For each instrument used, report measures of the following:

  • Reliability: how consistently the method measures something, in terms of internal consistency or test-retest reliability (a computation sketch follows after this list).
  • Validity: how well the method measures what it is intended to measure, in terms of construct validity or criterion validity.

Giving an example item or two for tests, questionnaires , and interviews is also helpful.
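
Internal consistency, for example, is usually reported as Cronbach’s alpha. There is no single standard library function for it, but it is easy to compute from a participants-by-items matrix. Here is a minimal sketch in Python; the function and the example responses are purely illustrative:

```python
import numpy as np

def cronbach_alpha(items):
    # items: 2-D array, rows = participants, columns = scale items
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical responses of four participants to a 4-item Likert scale (1-5)
responses = [[4, 5, 4, 4], [2, 3, 3, 2], [5, 5, 4, 5], [3, 3, 2, 3]]
print(round(cronbach_alpha(responses), 2))  # 0.95 for these made-up data
```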

Describe any covariates—these are any additional variables that may explain or predict the outcomes.

Quality of measurements

Review all methods you used to assure the quality of your measurements.

These may include:

  • training researchers to collect data reliably,
  • using multiple people to assess (e.g., observe or code) the data,
  • translation and back-translation of research materials,
  • using pilot studies to test your materials on unrelated samples.

For data that’s subjectively coded (for example, classifying open-ended responses), report interrater reliability scores. This tells the reader how similarly each response was rated by multiple raters.
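
For two raters assigning categorical codes, Cohen’s kappa is a common interrater reliability statistic. A minimal sketch using scikit-learn, with hypothetical ratings:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes assigned by two raters to the same six responses
rater_a = ["pos", "neg", "pos", "neutral", "pos", "neg"]
rater_b = ["pos", "neg", "neutral", "neutral", "pos", "neg"]

print(f"Cohen's kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")
```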

Report all of the procedures applied for administering the study, processing the data, and carrying out the planned data analyses.

Data collection methods and research design

Data collection methods refer to the general mode of the instruments: surveys, interviews, observations, focus groups, neuroimaging, cognitive tests, and so on. Summarize exactly how you collected the necessary data.

Describe all procedures you applied in administering surveys, tests, physical recordings, or imaging devices, with enough detail so that someone else can replicate your techniques. If your procedures are very complicated and require long descriptions (e.g., in neuroimaging studies), place these details in supplementary materials.

To report research design, note your overall framework for data collection and analysis. State whether you used an experimental, quasi-experimental, descriptive (observational), correlational, and/or longitudinal design. Also note whether a between-subjects or a within-subjects design was used.

For multi-group studies, report the following design and procedural details as well:

  • how participants were assigned to different conditions (e.g., randomization),
  • instructions given to the participants in each group,
  • interventions for each group,
  • the setting and length of each session.

Describe whether any masking was used to hide the condition assignment (e.g., placebo or medication condition) from participants or research administrators. Using masking in a multi-group study ensures internal validity by reducing research bias . Explain how this masking was applied and whether its effectiveness was assessed.

Example: Participants were randomly assigned to a control or experimental condition. The survey was administered using Qualtrics (https://www.qualtrics.com). To begin, all participants were given the AAI and a demographics questionnaire to complete, followed by an unrelated filler task. In the control condition, participants completed a short general knowledge test immediately after the filler task. In the experimental condition, participants were asked to visualize themselves taking the test for 3 minutes before they actually did. For more details on the exact instructions and tasks given, see supplementary materials.

Data diagnostics

Outline all steps taken to scrutinize or process the data after collection.

This includes the following:

  • Procedures for identifying and removing outliers
  • Data transformations to normalize distributions
  • Compensation strategies for overcoming missing values

To ensure high validity, you should provide enough detail for your reader to understand how and why you processed or transformed your raw data in these specific ways.
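
As an illustration, the sketch below shows one plausible processing pipeline in Python with pandas. The variable names, cutoffs, and imputation strategy are hypothetical; report whichever rules you actually applied:

```python
import numpy as np
import pandas as pd

# Hypothetical reaction-time data (ms) with an outlier and a missing value
df = pd.DataFrame({"rt_ms": [420.0, 515.0, 480.0, 3900.0, np.nan, 505.0]})

# 1. Outliers: drop trials outside a pre-registered window (e.g., 200-2000 ms)
df = df[df["rt_ms"].between(200, 2000) | df["rt_ms"].isna()]

# 2. Missing values: mean imputation (one of several possible strategies)
df["rt_ms"] = df["rt_ms"].fillna(df["rt_ms"].mean())

# 3. Normalizing the distribution: log-transform to reduce right skew
df["log_rt"] = np.log(df["rt_ms"])
print(df)
```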

Analytic strategies

The methods section is also where you describe your statistical analysis procedures, but not their outcomes. Their outcomes are reported in the results section.

These procedures should be stated for all primary, secondary, and exploratory hypotheses. While primary and secondary hypotheses are based on a theoretical framework or past studies, exploratory hypotheses are guided by the data you’ve just collected.


This annotated example reports methods for a descriptive correlational survey on the relationship between religiosity and trust in science in the US.

The sample included 879 adults aged between 18 and 28. More than half of the participants were women (56%), and all participants had completed at least 12 years of education. Ethics approval was obtained from the university board before recruitment began. Participants were recruited online through Amazon Mechanical Turk (MTurk; www.mturk.com). We selected for a geographically diverse sample within the Midwest of the US through an initial screening survey. Participants were paid USD $5 upon completion of the study.

A sample size of at least 783 was deemed necessary for detecting a correlation coefficient of ±.1, with a power level of 80% and a significance level of .05, using a sample size calculator (www.sample-size.net/correlation-sample-size/).
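
The 783 quoted in this example can be reproduced with the standard Fisher z approximation for correlation power analysis. A minimal sketch in Python (the website’s calculator may use a slightly different method):

```python
import math
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    # Approximate n needed to detect a correlation of r (two-sided test),
    # using the Fisher z-transformation
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    c = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z of r
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)

print(n_for_correlation(0.1))  # 783, matching the example above
```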

The primary outcome measures were the levels of religiosity and trust in science. Religiosity refers to involvement and belief in religious traditions, while trust in science represents confidence in scientists and scientific research outcomes. The secondary outcome measures were gender and parental education levels of participants and whether these characteristics predicted religiosity levels.

Religiosity

Religiosity was measured using the Centrality of Religiosity Scale (Huber, 2003). This Likert-type scale consists of 15 questions across five subscales: ideology, experience, intellect, public practice, and private practice. An example item is “How often do you experience situations in which you have the feeling that God or something divine intervenes in your life?” Participants indicated frequency of occurrence on a response scale ranging from 1 (very often) to 5 (never). The internal consistency of the instrument is .83 (Huber & Huber, 2012).

Trust in Science

Trust in science was assessed using the General Trust in Science index (McCright, Dentzman, Charters, & Dietz, 2013). The index comprises four Likert-scale items rated from 1 (completely distrust) to 5 (completely trust). An example question asks “How much do you distrust or trust scientists to create knowledge that is unbiased and accurate?” Internal consistency was .8.

Potential participants were invited to participate in the survey online using Qualtrics (www.qualtrics.com). The survey consisted of multiple choice questions regarding demographic characteristics, the Centrality of Religiosity scale, an unrelated filler anagram task, and finally the General Trust in Science index. The filler task was included to avoid priming or demand characteristics, and an attention check was embedded within the religiosity scale. For full instructions and details of tasks, see supplementary materials.

For this correlational study, we assessed our primary hypothesis of a relationship between religiosity and trust in science using Pearson’s product-moment correlation coefficient. The statistical significance of the correlation coefficient was assessed using a t test. To test our secondary hypothesis that parental education levels and gender predict religiosity, multiple linear regression analysis was used.
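
For illustration, this analytic strategy maps onto a few lines of Python with scipy and statsmodels. The data below are simulated stand-ins, not the study’s actual data:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
import statsmodels.formula.api as smf

# Simulated stand-in data; variable names are illustrative only
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "religiosity": rng.normal(3, 1, 200),
    "trust_science": rng.normal(3.5, 0.8, 200),
    "parent_edu": rng.integers(8, 21, 200),
    "gender": rng.choice(["f", "m"], 200),
})

# Primary hypothesis: Pearson correlation (the returned p value is based
# on the t distribution, i.e., the t test mentioned above)
r, p = pearsonr(df["religiosity"], df["trust_science"])
print(f"r = {r:.2f}, p = {p:.3f}")

# Secondary hypothesis: multiple linear regression
model = smf.ols("religiosity ~ parent_edu + C(gender)", data=df).fit()
print(model.summary())
```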


In your APA methods section , you should report detailed information on the participants, materials, and procedures used.

  • Describe all relevant participant or subject characteristics, the sampling procedures used and the sample size and power .
  • Define all primary and secondary measures and discuss the quality of measurements.
  • Specify the data collection methods, the research design and data analysis strategy, including any steps taken to transform the data and statistical analyses.

You should report methods using the past tense , even if you haven’t completed your study at the time of writing. That’s because the methods section is intended to describe completed actions or research.

In a scientific paper, the methodology always comes after the introduction and before the results , discussion and conclusion . The same basic structure also applies to a thesis, dissertation , or research proposal .

Depending on the length and type of document, you might also include a literature review or theoretical framework before the methodology.


Open access | Published: 26 March 2024

Predicting and improving complex beer flavor through machine learning

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Christophe Vanderaa, Florian A. Theßeling, Łukasz Kreft, Alexander Botzki, Philippe Malcorps, Luk Daenen, Tom Wenseleers & Kevin J. Verstrepen

Nature Communications, volume 15, article number 2368 (2024)

Subjects:

  • Chemical engineering
  • Gas chromatography
  • Machine learning
  • Metabolomics
  • Taste receptors

The perception and appreciation of food flavor depends on many interacting chemical compounds and external factors, and therefore proves challenging to understand and predict. Here, we combine extensive chemical and sensory analyses of 250 different beers to train machine learning models that allow predicting flavor and consumer appreciation. For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel and map data from over 180,000 consumer reviews to train 10 different machine learning models. The best-performing algorithm, Gradient Boosting, yields models that significantly outperform predictions based on conventional statistics and accurately predict complex food features and consumer appreciation from chemical profiles. Model dissection allows identifying specific and unexpected compounds as drivers of beer flavor and appreciation. Adding these compounds results in variants of commercial alcoholic and non-alcoholic beers with improved consumer appreciation. Together, our study reveals how big data and machine learning uncover complex links between food chemistry, flavor and consumer perception, and lays the foundation to develop novel, tailored foods with superior flavors.


Introduction

Predicting and understanding food perception and appreciation is one of the major challenges in food science. Accurate modeling of food flavor and appreciation could yield important opportunities for both producers and consumers, including quality control, product fingerprinting, counterfeit detection, spoilage detection, and the development of new products and product combinations (food pairing) 1 , 2 , 3 , 4 , 5 , 6 . Accurate models for flavor and consumer appreciation would contribute greatly to our scientific understanding of how humans perceive and appreciate flavor. Moreover, accurate predictive models would also facilitate and standardize existing food assessment methods and could supplement or replace assessments by trained and consumer tasting panels, which are variable, expensive and time-consuming 7 , 8 , 9 . Lastly, apart from providing objective, quantitative, accurate and contextual information that can help producers, models can also guide consumers in understanding their personal preferences 10 .

Despite the myriad of applications, predicting food flavor and appreciation from its chemical properties remains a largely elusive goal in sensory science, especially for complex food and beverages 11 , 12 . A key obstacle is the immense number of flavor-active chemicals underlying food flavor. Flavor compounds can vary widely in chemical structure and concentration, making them technically challenging and labor-intensive to quantify, even in the face of innovations in metabolomics, such as non-targeted metabolic fingerprinting 13 , 14 . Moreover, sensory analysis is perhaps even more complicated. Flavor perception is highly complex, resulting from hundreds of different molecules interacting at the physiochemical and sensorial level. Sensory perception is often non-linear, characterized by complex and concentration-dependent synergistic and antagonistic effects 15 , 16 , 17 , 18 , 19 , 20 , 21 that are further convoluted by the genetics, environment, culture and psychology of consumers 22 , 23 , 24 . Perceived flavor is therefore difficult to measure, with problems of sensitivity, accuracy, and reproducibility that can only be resolved by gathering sufficiently large datasets 25 . Trained tasting panels are considered the prime source of quality sensory data, but require meticulous training, are low throughput and high cost. Public databases containing consumer reviews of food products could provide a valuable alternative, especially for studying appreciation scores, which do not require formal training 25 . Public databases offer the advantage of amassing large amounts of data, increasing the statistical power to identify potential drivers of appreciation. However, public datasets suffer from biases, including a bias in the volunteers that contribute to the database, as well as confounding factors such as price, cult status and psychological conformity towards previous ratings of the product.

Classical multivariate statistics and machine learning methods have been used to predict flavor of specific compounds by, for example, linking structural properties of a compound to its potential biological activities or linking concentrations of specific compounds to sensory profiles 1 , 26 . Importantly, most previous studies focused on predicting organoleptic properties of single compounds (often based on their chemical structure) 27 , 28 , 29 , 30 , 31 , 32 , 33 , thus ignoring the fact that these compounds are present in a complex matrix in food or beverages and excluding complex interactions between compounds. Moreover, the classical statistics commonly used in sensory science 34 , 35 , 36 , 37 , 38 , 39 require a large sample size and sufficient variance amongst predictors to create accurate models. They are not fit for studying an extensive set of hundreds of interacting flavor compounds, since they are sensitive to outliers, have a high tendency to overfit and are less suited for non-linear and discontinuous relationships 40 .

In this study, we combine extensive chemical analyses and sensory data of a set of different commercial beers with machine learning approaches to develop models that predict taste, smell, mouthfeel and appreciation from compound concentrations. Beer is particularly suited to model the relationship between chemistry, flavor and appreciation. First, beer is a complex product, consisting of thousands of flavor compounds that partake in complex sensory interactions 41 , 42 , 43 . This chemical diversity arises from the raw materials (malt, yeast, hops, water and spices) and biochemical conversions during the brewing process (kilning, mashing, boiling, fermentation, maturation and aging) 44 , 45 . Second, the advent of the internet saw beer consumers embrace online review platforms, such as RateBeer (ZX Ventures, Anheuser-Busch InBev SA/NV) and BeerAdvocate (Next Glass, inc.). In this way, the beer community provides massive data sets of beer flavor and appreciation scores, creating extraordinarily large sensory databases to complement the analyses of our professional sensory panel. Specifically, we characterize over 200 chemical properties of 250 commercial beers, spread across 22 beer styles, and link these to the descriptive sensory profiling data of a 16-person in-house trained tasting panel and data acquired from over 180,000 public consumer reviews. These unique and extensive datasets enable us to train a suite of machine learning models to predict flavor and appreciation from a beer’s chemical profile. Dissection of the best-performing models allows us to pinpoint specific compounds as potential drivers of beer flavor and appreciation. Follow-up experiments confirm the importance of these compounds and ultimately allow us to significantly improve the flavor and appreciation of selected commercial beers. Together, our study represents a significant step towards understanding complex flavors and reinforces the value of machine learning to develop and refine complex foods. In this way, it represents a stepping stone for further computer-aided food engineering applications 46 .

To generate a comprehensive dataset on beer flavor, we selected 250 commercial Belgian beers across 22 different beer styles (Supplementary Fig.  S1 ). Beers with ≤ 4.2% alcohol by volume (ABV) were classified as non-alcoholic and low-alcoholic. Blonds and Tripels constitute a significant portion of the dataset (12.4% and 11.2%, respectively) reflecting their presence on the Belgian beer market and the heterogeneity of beers within these styles. By contrast, lager beers are less diverse and dominated by a handful of brands. Rare styles such as Brut or Faro make up only a small fraction of the dataset (2% and 1%, respectively) because fewer of these beers are produced and because they are dominated by distinct characteristics in terms of flavor and chemical composition.

Extensive analysis identifies relationships between chemical compounds in beer

For each beer, we measured 226 different chemical properties, including common brewing parameters such as alcohol content, iso-alpha acids, pH, sugar concentration 47 , and over 200 flavor compounds (Methods, Supplementary Table  S1 ). A large portion (37.2%) are terpenoids arising from hopping, responsible for herbal and fruity flavors 16 , 48 . A second major category are yeast metabolites, such as esters and alcohols, that result in fruity and solvent notes 48 , 49 , 50 . Other measured compounds are primarily derived from malt, or other microbes such as non- Saccharomyces yeasts and bacteria (‘wild flora’). Compounds that arise from spices or staling are labeled under ‘Others’. Five attributes (caloric value, total acids and total ester, hop aroma and sulfur compounds) are calculated from multiple individually measured compounds.

As a first step in identifying relationships between chemical properties, we determined correlations between the concentrations of the compounds (Fig.  1 , upper panel, Supplementary Data  1 and 2 , and Supplementary Fig.  S2 . For the sake of clarity, only a subset of the measured compounds is shown in Fig.  1 ). Compounds of the same origin typically show a positive correlation, while absence of correlation hints at parameters varying independently. For example, the hop aroma compounds citronellol, and alpha-terpineol show moderate correlations with each other (Spearman’s rho=0.39 and 0.57), but not with the bittering hop component iso-alpha acids (Spearman’s rho=0.16 and −0.07). This illustrates how brewers can independently modify hop aroma and bitterness by selecting hop varieties and dosage time. If hops are added early in the boiling phase, chemical conversions increase bitterness while aromas evaporate, conversely, late addition of hops preserves aroma but limits bitterness 51 . Similarly, hop-derived iso-alpha acids show a strong anti-correlation with lactic acid and acetic acid, likely reflecting growth inhibition of lactic acid and acetic acid bacteria, or the consequent use of fewer hops in sour beer styles, such as West Flanders ales and Fruit beers, that rely on these bacteria for their distinct flavors 52 . Finally, yeast-derived esters (ethyl acetate, ethyl decanoate, ethyl hexanoate, ethyl octanoate) and alcohols (ethanol, isoamyl alcohol, isobutanol, and glycerol), correlate with Spearman coefficients above 0.5, suggesting that these secondary metabolites are correlated with the yeast genetic background and/or fermentation parameters and may be difficult to influence individually, although the choice of yeast strain may offer some control 53 .
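
For readers who want to reproduce this style of analysis, pairwise Spearman correlations of a beers-by-compounds table are a one-liner in pandas. The values below are invented placeholders, not the study’s measurements:

```python
import pandas as pd

# Invented placeholder concentrations: rows = beers, columns = compounds
chem = pd.DataFrame({
    "citronellol":     [12.0, 3.1, 8.4, 0.5, 15.2],
    "alpha_terpineol": [9.5, 2.2, 7.1, 1.0, 12.8],
    "iso_alpha_acids": [20.0, 35.0, 15.0, 40.0, 10.0],
})

# Pairwise Spearman rank correlations, as visualized in Fig. 1
print(chem.corr(method="spearman").round(2))
```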

Figure 1. Spearman rank correlations are shown. Descriptors are grouped according to their origin (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)), and sensory aspect (aroma, taste, palate, and overall appreciation). Please note that for the chemical compounds, for the sake of clarity, only a subset of the total number of measured compounds is shown, with an emphasis on the key compounds for each source. For more details, see the main text and Methods section. Chemical data can be found in Supplementary Data 1, correlations between all chemical compounds are depicted in Supplementary Fig. S2 and correlation values can be found in Supplementary Data 2. See Supplementary Data 4 for sensory panel assessments and Supplementary Data 5 for correlation values between all sensory descriptors.

Interestingly, different beer styles show distinct patterns for some flavor compounds (Supplementary Fig.  S3 ). These observations agree with expectations for key beer styles, and serve as a control for our measurements. For instance, Stouts generally show high values for color (darker), while hoppy beers contain elevated levels of iso-alpha acids, compounds associated with bitter hop taste. Acetic and lactic acid are not prevalent in most beers, with notable exceptions such as Kriek, Lambic, Faro, West Flanders ales and Flanders Old Brown, which use acid-producing bacteria ( Lactobacillus and Pediococcus ) or unconventional yeast ( Brettanomyces ) 54 , 55 . Glycerol, ethanol and esters show similar distributions across all beer styles, reflecting their common origin as products of yeast metabolism during fermentation 45 , 53 . Finally, low/no-alcohol beers contain low concentrations of glycerol and esters. This is in line with the production process for most of the low/no-alcohol beers in our dataset, which are produced through limiting fermentation or by stripping away alcohol via evaporation or dialysis, with both methods having the unintended side-effect of reducing the amount of flavor compounds in the final beer 56 , 57 .

Besides expected associations, our data also reveals less trivial associations between beer styles and specific parameters. For example, geraniol and citronellol, two monoterpenoids responsible for citrus, floral and rose flavors and characteristic of Citra hops, are found in relatively high amounts in Christmas, Saison, and Brett/co-fermented beers, where they may originate from terpenoid-rich spices such as coriander seeds instead of hops 58 .

Tasting panel assessments reveal sensorial relationships in beer

To assess the sensory profile of each beer, a trained tasting panel evaluated each of the 250 beers for 50 sensory attributes, including different hop, malt and yeast flavors, off-flavors and spices. Panelists used a tasting sheet (Supplementary Data  3 ) to score the different attributes. Panel consistency was evaluated by repeating 12 samples across different sessions and performing ANOVA. In 95% of cases no significant difference was found across sessions ( p  > 0.05), indicating good panel consistency (Supplementary Table  S2 ).

Aroma and taste perception reported by the trained panel are often linked (Fig.  1 , bottom left panel and Supplementary Data  4 and 5 ), with high correlations between hops aroma and taste (Spearman’s rho=0.83). Bitter taste was found to correlate with hop aroma and taste in general (Spearman’s rho=0.80 and 0.69), and particularly with “grassy” noble hops (Spearman’s rho=0.75). Barnyard flavor, most often associated with sour beers, is identified together with stale hops (Spearman’s rho=0.97) that are used in these beers. Lactic and acetic acid, which often co-occur, are correlated (Spearman’s rho=0.66). Interestingly, sweetness and bitterness are anti-correlated (Spearman’s rho = −0.48), confirming the hypothesis that they mask each other 59 , 60 . Beer body is highly correlated with alcohol (Spearman’s rho = 0.79), and overall appreciation is found to correlate with multiple aspects that describe beer mouthfeel (alcohol, carbonation; Spearman’s rho= 0.32, 0.39), as well as with hop and ester aroma intensity (Spearman’s rho=0.39 and 0.35).

Similar to the chemical analyses, sensorial analyses confirmed typical features of specific beer styles (Supplementary Fig.  S4 ). For example, sour beers (Faro, Flanders Old Brown, Fruit beer, Kriek, Lambic, West Flanders ale) were rated acidic, with flavors of both acetic and lactic acid. Hoppy beers were found to be bitter and showed hop-associated aromas like citrus and tropical fruit. Malt taste is most detected among scotch, stout/porters, and strong ales, while low/no-alcohol beers, which often have a reputation for being ‘worty’ (reminiscent of unfermented, sweet malt extract) appear in the middle. Unsurprisingly, hop aromas are most strongly detected among hoppy beers. Like its chemical counterpart (Supplementary Fig.  S3 ), acidity shows a right-skewed distribution, with the most acidic beers being Krieks, Lambics, and West Flanders ales.

Tasting panel assessments of specific flavors correlate with chemical composition

We find that the concentrations of several chemical compounds strongly correlate with specific aroma or taste, as evaluated by the tasting panel (Fig.  2 , Supplementary Fig.  S5 , Supplementary Data  6 ). In some cases, these correlations confirm expectations and serve as a useful control for data quality. For example, iso-alpha acids, the bittering compounds in hops, strongly correlate with bitterness (Spearman’s rho=0.68), while ethanol and glycerol correlate with tasters’ perceptions of alcohol and body, the mouthfeel sensation of fullness (Spearman’s rho=0.82/0.62 and 0.72/0.57 respectively) and darker color from roasted malts is a good indication of malt perception (Spearman’s rho=0.54).

Figure 2. Heatmap colors indicate Spearman’s rho. Axes are organized according to sensory categories (aroma, taste, mouthfeel, overall), chemical categories and chemical sources in beer (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)). See Supplementary Data 6 for all correlation values.

Interestingly, for some relationships between chemical compounds and perceived flavor, correlations are weaker than expected. For example, the rose-smelling phenethyl acetate only weakly correlates with floral aroma. This hints at more complex relationships and interactions between compounds and suggests a need for a more complex model than simple correlations. Lastly, we uncovered unexpected correlations. For instance, the esters ethyl decanoate and ethyl octanoate appear to correlate slightly with hop perception and bitterness, possibly due to their fruity flavor. Iron is anti-correlated with hop aromas and bitterness, most likely because it is also anti-correlated with iso-alpha acids. This could be a sign of metal chelation of hop acids 61 , given that our analyses measure unbound hop acids and total iron content, or could result from the higher iron content in dark and Fruit beers, which typically have less hoppy and bitter flavors 62 .

Public consumer reviews complement expert panel data

To complement and expand the sensory data of our trained tasting panel, we collected 180,000 reviews of our 250 beers from the online consumer review platform RateBeer. This provided numerical scores for beer appearance, aroma, taste, palate, overall quality as well as the average overall score.

Public datasets are known to suffer from biases, such as price, cult status and psychological conformity towards previous ratings of a product. For example, prices correlate with appreciation scores for these online consumer reviews (rho=0.49, Supplementary Fig. S6), but not for our trained tasting panel (rho=0.19). This suggests that prices affect consumer appreciation, which has been reported in wine 63 , while blind tastings are unaffected. Moreover, we observe that some beer styles, like lagers and non-alcoholic beers, generally receive lower scores, reflecting that online reviewers are mostly beer aficionados with a preference for specialty beers over lager beers. In general, we find a modest correlation between our trained panel’s overall appreciation score and the online consumer appreciation scores (Fig. 3, rho=0.29). Apart from the aforementioned biases in the online datasets, serving temperature, sample freshness and surroundings, which are all tightly controlled during the tasting panel sessions, can vary tremendously across online consumers and can further contribute to differences (in appreciation, among other aspects) between the two categories of tasters. Importantly, in contrast to the overall appreciation scores, for many sensory aspects the results from the professional panel correlated well with results obtained from RateBeer reviews. Correlations were highest for features that are relatively easy to recognize even for untrained tasters, like bitterness, sweetness, alcohol and malt aroma (Fig. 3 and below).

Figure 3. RateBeer text mining results can be found in Supplementary Data 7. Rho values shown are Spearman correlation values, with asterisks indicating significant correlations ( p  < 0.05, two-sided). All p values were smaller than 0.001, except for Esters aroma (0.0553), Esters taste (0.3275), Esters aroma—banana (0.0019), Coriander (0.0508) and Diacetyl (0.0134).

Besides collecting consumer appreciation from these online reviews, we developed automated text analysis tools to gather additional data from review texts (Supplementary Data  7 ). Processing review texts on the RateBeer database yielded comparable results to the scores given by the trained panel for many common sensory aspects, including acidity, bitterness, sweetness, alcohol, malt, and hop tastes (Fig.  3 ). This is in line with what would be expected, since these attributes require less training for accurate assessment and are less influenced by environmental factors such as temperature, serving glass and odors in the environment. Consumer reviews also correlate well with our trained panel for 4-vinyl guaiacol, a compound associated with a very characteristic aroma. By contrast, correlations for more specific aromas like ester, coriander or diacetyl are underrepresented in the online reviews, underscoring the importance of using a trained tasting panel and standardized tasting sheets with explicit factors to be scored for evaluating specific aspects of a beer. Taken together, our results suggest that public reviews are trustworthy for some, but not all, flavor features and can complement or substitute taste panel data for these sensory aspects.
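
The exact text-analysis pipeline is described in the paper’s Methods rather than here, but the general idea of mapping review texts to sensory attributes can be sketched with simple keyword matching. A toy illustration, with invented reviews and an invented keyword mapping:

```python
# Toy sketch: count how many reviews mention each sensory attribute
reviews = [
    "Nice bitterness and a sweet malty finish.",
    "Too sour for me, but great citrus hop aroma.",
]
keywords = {"bitter": "bitterness", "sweet": "sweetness", "sour": "acidity"}

counts = {attr: sum(kw in review.lower() for review in reviews)
          for kw, attr in keywords.items()}
print(counts)  # {'bitterness': 1, 'sweetness': 1, 'acidity': 1}
```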

Models can predict beer sensory profiles from chemical data

The rich datasets of chemical analyses, tasting panel assessments and public reviews gathered in the first part of this study provided us with a unique opportunity to develop predictive models that link chemical data to sensorial features. Given the complexity of beer flavor, basic statistical tools such as correlations or linear regression may not always be the most suitable for making accurate predictions. Instead, we applied different machine learning models that can model both simple linear and complex interactive relationships. Specifically, we constructed a set of regression models to predict (a) trained panel scores for beer flavor and quality and (b) public reviews’ appreciation scores from beer chemical profiles. We trained and tested 10 different models (Methods): 3 linear regression-based models (simple linear regression with first-order interactions (LR), lasso regression with first-order interactions (Lasso), partial least squares regressor (PLSR)), 5 decision tree models (AdaBoost regressor (ABR), extra trees (ET), gradient boosting regressor (GBR), random forest (RF) and XGBoost regressor (XGBR)), 1 support vector regression (SVR), and 1 artificial neural network (ANN) model.

To compare the performance of our machine learning models, the dataset was randomly split into a training and test set, stratified by beer style. After a model was trained on data in the training set, its performance was evaluated by its ability to predict the test dataset, based on the coefficient of determination obtained from multi-output models (see Methods). Additionally, individual-attribute models were ranked per descriptor and the average rank was calculated, as proposed by Korneva et al. 64 . Importantly, both ways of evaluating the models’ performance agreed in general. Performance of the different models varied (Table 1). It should be noted that all models perform better at predicting RateBeer results than results from our trained tasting panel. One reason could be that sensory data is inherently variable, and this variability is averaged out with the large number of public reviews from RateBeer. Additionally, all tree-based models perform better at predicting taste than aroma. Linear models (LR) performed particularly poorly, with negative R 2 values, due to severe overfitting (training set R 2  = 1). Overfitting is a common issue in linear models with many parameters and limited samples, especially with interaction terms further amplifying the number of parameters. L1 regularization (Lasso) successfully overcomes this overfitting, out-competing multiple tree-based models on the RateBeer dataset. Similarly, the dimensionality reduction of PLSR avoids overfitting and improves performance, to some extent. Still, tree-based models (ABR, ET, GBR, RF and XGBR) show the best performance, out-competing the linear models (LR, Lasso, PLSR) commonly used in sensory science 65 .
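
As a rough illustration of this setup (not the authors’ actual pipeline or hyperparameters), a gradient boosting regressor with a style-stratified split might look like this in scikit-learn, on synthetic stand-in data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 250 beers x 226 chemical parameters, one target score
rng = np.random.default_rng(42)
X = rng.normal(size=(250, 226))
y = 0.5 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=250)
styles = rng.choice(["blond", "tripel", "lager", "stout", "sour"], size=250)

# Train/test split stratified by beer style, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=styles, random_state=42)

gbr = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)
print("held-out R^2:", round(r2_score(y_test, gbr.predict(X_test)), 2))
```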

GBR models showed the best overall performance in predicting sensory responses from chemical information, with R 2 values up to 0.75 depending on the predicted sensory feature (Supplementary Table  S4 ). The GBR models predict consumer appreciation (RateBeer) better than our trained panel’s appreciation (R 2 value of 0.67 compared to R 2 value of 0.09) (Supplementary Table  S3 and Supplementary Table  S4 ). ANN models showed intermediate performance, likely because neural networks typically perform best with larger datasets 66 . The SVR shows intermediate performance, mostly due to the weak predictions of specific attributes that lower the overall performance (Supplementary Table  S4 ).

Model dissection identifies specific, unexpected compounds as drivers of consumer appreciation

Next, we leveraged our models to infer important contributors to sensory perception and consumer appreciation. Consumer preference is a crucial sensory aspect, because a product that shows low consumer appreciation scores often does not succeed commercially 25 . Additionally, the requirement for a large number of representative evaluators makes consumer trials one of the more costly and time-consuming aspects of product development. Hence, a model for predicting chemical drivers of overall appreciation would be a welcome addition to the available toolbox for food development and optimization.

Since GBR models on our RateBeer dataset showed the best overall performance, we focused on these models. Specifically, we used two approaches to identify important contributors. First, rankings of the most important predictors for each sensorial trait in the GBR models were obtained based on impurity-based feature importance (mean decrease in impurity). High-ranked parameters were hypothesized to be either the true causal chemical properties underlying the trait, to correlate with the actual causal properties, or to take part in sensory interactions affecting the trait 67 (Fig.  4A ). In a second approach, we used SHAP 68 to determine which parameters contributed most to the model for making predictions of consumer appreciation (Fig.  4B ). SHAP calculates parameter contributions to model predictions on a per-sample basis, which can be aggregated into an importance score.
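
Both dissection approaches are available off the shelf. Continuing the earlier sketch (and not the authors’ exact code), impurity-based importances come directly from the fitted scikit-learn model, and SHAP values from the shap package:

```python
import numpy as np
import shap  # pip install shap

# Impurity-based feature importance (MDI), as in Fig. 4A
mdi_top15 = np.argsort(gbr.feature_importances_)[::-1][:15]

# SHAP values per sample, aggregated into per-feature importances (Fig. 4B)
explainer = shap.TreeExplainer(gbr)
shap_values = explainer.shap_values(X_test)
shap_top15 = np.argsort(np.abs(shap_values).mean(axis=0))[::-1][:15]

# shap.summary_plot(shap_values, X_test)  # beeswarm plot as in Fig. 4B
print(mdi_top15, shap_top15)
```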

Figure 4. A The impurity-based feature importance (mean decrease in impurity, MDI) calculated from the Gradient Boosting Regression (GBR) model predicting RateBeer appreciation scores. The top 15 highest ranked chemical properties are shown. B SHAP summary plot for the top 15 parameters contributing to our GBR model. Each point on the graph represents a sample from our dataset. The color represents the concentration of that parameter, with bluer colors representing low values and redder colors representing higher values. Greater absolute values on the horizontal axis indicate a higher impact of the parameter on the prediction of the model. C Spearman correlations between the 15 most important chemical properties and consumer overall appreciation. Numbers indicate the Spearman Rho correlation coefficient, and the rank of this correlation compared to all other correlations. The top 15 important compounds were determined using SHAP (panel B).

Both approaches identified ethyl acetate as the most predictive parameter for beer appreciation (Fig.  4 ). Ethyl acetate is the most abundant ester in beer with a typical ‘fruity’, ‘solvent’ and ‘alcoholic’ flavor, but is often considered less important than other esters like isoamyl acetate. The second most important parameter identified by SHAP is ethanol, the most abundant beer compound after water. Apart from directly contributing to beer flavor and mouthfeel, ethanol drastically influences the physical properties of beer, dictating how easily volatile compounds escape the beer matrix to contribute to beer aroma 69 . Importantly, it should also be noted that the importance of ethanol for appreciation is likely inflated by the very low appreciation scores of non-alcoholic beers (Supplementary Fig.  S4 ). Despite not often being considered a driver of beer appreciation, protein level also ranks highly in both approaches, possibly due to its effect on mouthfeel and body 70 . Lactic acid, which contributes to the tart taste of sour beers, is the fourth most important parameter identified by SHAP, possibly due to the generally high appreciation of sour beers in our dataset.

Interestingly, some of the most important predictive parameters for our model are not well-established as beer flavors or are even commonly regarded as being negative for beer quality. For example, our models identify methanethiol and ethyl phenyl acetate, an ester commonly linked to beer staling 71 , as key factors contributing to beer appreciation. Although there is no doubt that high concentrations of these compounds are considered unpleasant, the positive effects of modest concentrations are not yet known 72 , 73 .

To compare our approach to conventional statistics, we evaluated how well the 15 most important SHAP-derived parameters correlate with consumer appreciation (Fig.  4C ). Interestingly, only 6 of the properties derived by SHAP rank amongst the top 15 most correlated parameters. For some chemical compounds, the correlations are so low that they would have likely been considered unimportant. For example, lactic acid, the fourth most important parameter, shows a bimodal distribution for appreciation, with sour beers forming a separate cluster, that is missed entirely by the Spearman correlation. Additionally, the correlation plots reveal outliers, emphasizing the need for robust analysis tools. Together, this highlights the need for alternative models, like the Gradient Boosting model, that better grasp the complexity of (beer) flavor.

Finally, to observe the relationships between these chemical properties and their predicted targets, partial dependence plots were constructed for the six most important predictors of consumer appreciation 74 , 75 , 76 (Supplementary Fig.  S7 ). One-way partial dependence plots show how a change in concentration affects the predicted appreciation. These plots reveal an important limitation of our models: appreciation predictions remain constant at ever-increasing concentrations. This implies that once a threshold concentration is reached, further increasing the concentration does not affect appreciation. This is false, as it is well-documented that certain compounds become unpleasant at high concentrations, including ethyl acetate (‘nail polish’) 77 and methanethiol (‘sulfury’ and ‘rotten cabbage’) 78 . The inability of our models to grasp that flavor compounds have optimal levels, above which they become negative, is a consequence of working with commercial beer brands where (off-)flavors are rarely too high to negatively impact the product. The two-way partial dependence plots show how changing the concentration of two compounds influences predicted appreciation, visualizing their interactions (Supplementary Fig.  S7 ). In our case, the top 5 parameters are dominated by additive or synergistic interactions, with high concentrations for both compounds resulting in the highest predicted appreciation.
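
Partial dependence plots of this kind can be produced with scikit-learn’s inspection module. Continuing the earlier sketch, with feature indices standing in for compounds such as ethyl acetate and ethanol:

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# One-way plots for features 0 and 1, plus their two-way interaction plot
PartialDependenceDisplay.from_estimator(gbr, X_train, [0, 1, (0, 1)])
plt.show()
```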

To assess the robustness of our best-performing models and model predictions, we performed 100 iterations of the GBR, RF and ET models. In general, all iterations of the models yielded similar performance (Supplementary Fig.  S8 ). Moreover, the main predictors (including the top predictors ethanol and ethyl acetate) remained virtually the same, especially for GBR and RF. For the iterations of the ET model, we did observe more variation in the top predictors, which is likely a consequence of the model’s inherent random architecture in combination with co-correlations between certain predictors. However, even in this case, several of the top predictors (ethanol and ethyl acetate) remain unchanged, although their rank in importance changes (Supplementary Fig.  S8 ).

Next, we investigated if a combination of RateBeer and trained panel data into one consolidated dataset would lead to stronger models, under the hypothesis that such a model would suffer less from bias in the datasets. A GBR model was trained to predict appreciation on the combined dataset. This model underperformed compared to the RateBeer model, both in the native case and when including a dataset identifier (R 2  = 0.67, 0.26 and 0.42 respectively). For the latter, the dataset identifier is the most important feature (Supplementary Fig.  S9 ), while most of the feature importance remains unchanged, with ethyl acetate and ethanol ranking highest, like in the original model trained only on RateBeer data. It seems that the large variation in the panel dataset introduces noise, weakening the models’ performances and reliability. In addition, it seems reasonable to assume that both datasets are fundamentally different, with the panel dataset obtained by blind tastings by a trained professional panel.

Lastly, we evaluated whether beer style identifiers would further enhance the model’s performance. A GBR model was trained with parameters that explicitly encoded the styles of the samples. This did not improve model performance (R2 = 0.66 with style information vs. R2 = 0.67 without). The most important chemical features are consistent with the model trained without style information (e.g., ethanol and ethyl acetate), and with the exception of the most preferred (strong ale) and least preferred (low/no-alcohol) styles, none of the styles were among the most important features (Supplementary Fig. S9, Supplementary Table S5 and S6). This is likely due to a combination of style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original models, as well as the low number of samples belonging to some styles, making it difficult for the model to learn style-specific patterns. Moreover, beer styles are not rigorously defined, with some styles overlapping in features and some beers being misattributed to a specific style, all of which leads to more noise in models that use style parameters.

Model validation

To test if our predictive models give insight into beer appreciation, we set up experiments aimed at improving existing commercial beers. We specifically selected overall appreciation as the trait to be examined because of its complexity and commercial relevance. Beer flavor comprises a complex bouquet rather than single aromas and tastes 53 . Hence, adding a single compound to the extent that a difference is noticeable may lead to an unbalanced, artificial flavor. Therefore, we evaluated the effect of combinations of compounds. Because Blond beers represent the most extensive style in our dataset, we selected a beer from this style as the starting material for these experiments (Beer 64 in Supplementary Data  1 ).

In the first set of experiments, we adjusted the concentrations of compounds that made up the most important predictors of overall appreciation (ethyl acetate, ethanol, lactic acid, ethyl phenyl acetate) together with correlated compounds (ethyl hexanoate, isoamyl acetate, glycerol), bringing them up to 95th-percentile ethanol-normalized concentrations (Methods) within the Blond group (‘Spiked’ concentration in Fig. 5A). Compared to controls, the spiked beers were found to have significantly improved overall appreciation among trained panelists, with panelists noting increased intensity of ester flavors, sweetness, alcohol, and body fullness (Fig. 5B). To disentangle the contribution of ethanol to these results, a second experiment was performed without the addition of ethanol. This resulted in a similar outcome, including increased perception of alcohol and overall appreciation.
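
One plausible reading of the spiking-target computation (the paper’s exact ethanol normalization is given in its Methods and is not reproduced here) is sketched below on invented Blond-style numbers:

```python
import pandas as pd

# Invented Blond-style data: ABV (%) and one compound's concentration (mg/L)
blond = pd.DataFrame({
    "abv":           [6.5, 7.0, 6.0, 8.0, 6.8],
    "ethyl_acetate": [18.0, 25.0, 15.0, 30.0, 22.0],
})

# Normalize by ethanol content, take the style's 95th percentile, then
# rescale to the base beer's ABV (assumed 6.5% here) to get a spiking target
normalized = blond["ethyl_acetate"] / blond["abv"]
target = normalized.quantile(0.95) * 6.5
print(f"spiking target ~ {target:.1f} mg/L")
```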

Figure 5. Adding the top chemical compounds, identified as best predictors of appreciation by our model, into poorly appreciated beers results in increased appreciation from our trained panel. Results of sensory tests between base beers and those spiked with compounds identified as the best predictors by the model. A Blond and Non/Low-alcohol (0.0% ABV) base beers were brought up to 95th-percentile ethanol-normalized concentrations within each style. B For each sensory attribute, tasters indicated the more intense sample and selected the sample they preferred. The numbers above the bars correspond to the p values that indicate significant changes in perceived flavor (two-sided binomial test: alpha 0.05, n = 20 or 13).

In a last experiment, we tested whether using the model’s predictions can boost the appreciation of a non-alcoholic beer (beer 223 in Supplementary Data  1 ). Again, the addition of a mixture of predicted compounds (omitting ethanol, in this case) resulted in a significant increase in appreciation, body, ester flavor and sweetness.

Predicting flavor and consumer appreciation from chemical composition is one of the ultimate goals of sensory science. A reliable, systematic and unbiased way to link chemical profiles to flavor and food appreciation would be a significant asset to the food and beverage industry. Such tools would substantially aid in quality control and recipe development, offer an efficient and cost-effective alternative to pilot studies and consumer trials, and would ultimately allow food manufacturers to produce superior, tailor-made products that better meet the demands of specific consumer groups.

A limited number of studies have previously tried, with varying degrees of success, to predict beer flavor and beer popularity based on (a limited set of) chemical compounds and flavors 79 , 80 . Current sensitive, high-throughput technologies allow measuring an unprecedented number of chemical compounds and properties in a large set of samples, yielding a dataset that can train models that help close the gaps between chemistry and flavor, even for a complex natural product like beer. To our knowledge, no previous research gathered data at this scale (250 samples, 226 chemical parameters, 50 sensory attributes and 5 consumer scores) to disentangle and validate the chemical aspects driving beer preference using various machine-learning techniques. We find that modern machine learning models outperform conventional statistical tools, such as correlations and linear models, and can successfully predict flavor appreciation from chemical composition. This could be attributed to the natural incorporation of interactions and non-linear or discontinuous effects in machine learning models, which are not easily grasped by the linear model architecture. While linear models and partial least squares regression represent the most widespread statistical approaches in sensory science, in part because they allow interpretation 65 , 81 , 82 , modern machine learning methods allow for building better predictive models while preserving the possibility to dissect and exploit the underlying patterns. Of the 10 different models we trained, tree-based models, such as our best performing GBR, showed the best overall performance in predicting sensory responses from chemical information, outcompeting artificial neural networks. This agrees with previous reports for models trained on tabular data 83 . Our results are in line with the findings of Colantonio et al. who also identified the gradient boosting architecture as performing best at predicting appreciation and flavor (of tomatoes and blueberries, in their specific study) 26 . Importantly, besides our larger experimental scale, we were able to directly confirm our models’ predictions in vivo.

Our study confirms that flavor compound concentration does not always correlate with perception, suggesting complex interactions that are often missed by more conventional statistics and simple models. Specifically, we find that tree-based algorithms may perform best in developing models that link complex food chemistry with aroma. Furthermore, we show that massive datasets of untrained consumer reviews provide a valuable source of data, that can complement or even replace trained tasting panels, especially for appreciation and basic flavors, such as sweetness and bitterness. This holds despite biases that are known to occur in such datasets, such as price or conformity bias. Moreover, GBR models predict taste better than aroma. This is likely because taste (e.g. bitterness) often directly relates to the corresponding chemical measurements (e.g., iso-alpha acids), whereas such a link is less clear for aromas, which often result from the interplay between multiple volatile compounds. We also find that our models are best at predicting acidity and alcohol, likely because there is a direct relation between the measured chemical compounds (acids and ethanol) and the corresponding perceived sensorial attribute (acidity and alcohol), and because even untrained consumers are generally able to recognize these flavors and aromas.

The predictions of our final models, trained on review data, hold even for blind tastings with small groups of trained tasters, as demonstrated by our ability to validate specific compounds as drivers of beer flavor and appreciation. Since adding a single compound to the extent of a noticeable difference may result in an unbalanced flavor profile, we specifically tested our identified key drivers as a combination of compounds. While this approach does not allow us to validate if a particular single compound would affect flavor and/or appreciation, our experiments do show that this combination of compounds increases consumer appreciation.

It is important to stress that, while it represents an important step forward, our approach still has several major limitations. A key weakness of the GBR model architecture is that amongst co-correlating variables, the largest main effect is consistently preferred for model building. As a result, co-correlating variables often have artificially low importance scores, both for impurity and SHAP-based methods, like we observed in the comparison to the more randomized Extra Trees models. This implies that chemicals identified as key drivers of a specific sensory feature by GBR might not be the true causative compounds, but rather co-correlate with the actual causative chemical. For example, the high importance of ethyl acetate could be (partially) attributed to the total ester content, ethanol or ethyl hexanoate (rho=0.77, rho=0.72 and rho=0.68), while ethyl phenylacetate could hide the importance of prenyl isobutyrate and ethyl benzoate (rho=0.77 and rho=0.76). Expanding our GBR model to include beer style as a parameter did not yield additional power or insight. This is likely due to style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original model, as well as the smaller sample size per style, limiting the power to uncover style-specific patterns. This can be partly attributed to the curse of dimensionality, where the high number of parameters results in the models mainly incorporating single parameter effects, rather than complex interactions such as style-dependent effects 67 . A larger number of samples may overcome some of these limitations and offer more insight into style-specific effects. On the other hand, beer style is not a rigid scientific classification, and beers within one style often differ a lot, which further complicates the analysis of style as a model factor.

Our study is limited to beers from Belgian breweries. Although these beers cover a large portion of the beer styles available globally, some beer styles and consumer patterns may be missing, while other features might be overrepresented. For example, many Belgian ales exhibit yeast-driven flavor profiles, which is reflected in the chemical drivers of appreciation discovered by this study. In future work, expanding the scope to include diverse markets and beer styles could lead to the identification of even more drivers of appreciation and better models for special niche products that were not present in our beer set.

In addition to the inherent limitations of GBR models, there are also limitations associated with studying food aroma. Although our chemical analyses measured most of the known aroma compounds, the total number of flavor compounds in complex foods like beer is still larger than the subset we were able to measure in this study. For example, hop-derived thiols, which influence flavor at very low concentrations, are notoriously difficult to measure in a high-throughput experiment. Moreover, consumer perception remains subjective and prone to biases that are difficult to avoid. It is also important to stress that the models are still immature and that more extensive datasets will be crucial for developing more complete models in the future. Beyond more samples and parameters, our dataset does not include any demographic information about the tasters; including such data could lead to better models that capture external factors like age and culture. Another limitation is that our set of beers consists of high-quality end-products and lacks beers that are unfit for sale, which limits the current models' ability to accurately predict products that are appreciated very poorly. Finally, while the models could be readily applied in quality control, their use in sensory science and product development is restrained by their inability to discern causal relationships. Given that the models cannot distinguish compounds that genuinely drive consumer perception from those that merely correlate, validation experiments are essential to identify true causative compounds.

Despite the inherent limitations, dissection of our models enabled us to pinpoint specific molecules as potential drivers of beer aroma and consumer appreciation, including compounds that were unexpected and would not have been identified using standard approaches. Important drivers of beer appreciation uncovered by our models include protein levels, ethyl acetate, ethyl phenylacetate and lactic acid. Currently, many brewers already use lactic acid to acidify their brewing water and ensure optimal pH for enzymatic activity during the mashing process. Our results suggest that adding lactic acid can also improve beer appreciation, although its individual effect remains to be tested. Interestingly, ethanol appears to be unnecessary to improve beer appreciation, both for blond beer and alcohol-free beer. Given the growing consumer interest in alcohol-free beer, with a predicted annual market growth of >7% 84, it is relevant for brewers to know which compounds can further increase consumer appreciation of these beers. Hence, our model may readily provide avenues to further improve the flavor and consumer appreciation of both alcoholic and non-alcoholic beers, which is generally considered one of the key challenges for future beer production.

Whereas we see a direct implementation of our results in the development of superior alcohol-free beverages and other food products, our study can also serve as a stepping stone for the development of novel alcohol-containing beverages. We want to echo the growing body of scientific evidence for the negative effects of alcohol consumption, both at the individual level, through the mutagenic, teratogenic and carcinogenic effects of ethanol 85, 86, and at the societal level, through the burden caused by alcohol abuse and addiction. We encourage the use of our results for the production of healthier, tastier products, including novel and improved beverages with lower alcohol contents. Furthermore, we strongly discourage the use of these technologies to improve the appreciation or addictive properties of harmful substances.

The present work demonstrates that despite some important remaining hurdles, combining the latest developments in chemical analyses, sensory analysis and modern machine learning methods offers exciting avenues for food chemistry and engineering. Soon, these tools may provide solutions in quality control and recipe development, as well as new approaches to sensory science and flavor research.

Methods

Beer selection

A total of 250 commercial Belgian beers were selected to cover the broad diversity of beer styles and the corresponding diversity in chemical composition and aroma (see Supplementary Fig. S1).

Chemical dataset

Sample preparation.

Beers within their expiration date were purchased from commercial retailers. Samples were prepared in biological duplicates at room temperature, unless explicitly stated otherwise. Bottle pressure was measured with a manual pressure device (Steinfurth Mess-Systeme GmbH) and used to calculate the CO₂ concentration. The beer was poured through two filter papers (Macherey-Nagel, 500713032 MN 713 ¼) to remove carbon dioxide and prevent spontaneous foaming. Samples were then prepared for measurement by targeted Headspace-Gas Chromatography-Flame Ionization Detector/Flame Photometric Detector (HS-GC-FID/FPD), Headspace-Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry (HS-SPME-GC-MS), colorimetric analysis, enzymatic analysis, and Near-Infrared (NIR) analysis, as described in the sections below. The mean values of biological duplicates are reported for each compound.

HS-GC-FID/FPD

HS-GC-FID/FPD (Shimadzu GC 2010 Plus) was used to measure higher alcohols, acetaldehyde, esters, 4-vinyl guaiacol, and sulfur compounds. Each measurement comprised 5 ml of sample pipetted into a 20 ml glass vial containing 1.75 g NaCl (VWR, 27810.295). 100 µl of a 2-heptanol (Sigma-Aldrich, H3003) internal standard solution in ethanol (Fisher Chemical, E/0650DF/C17) was added, for a final concentration of 2.44 mg/L. Samples were flushed with nitrogen for 10 s, sealed with a silicone septum, stored at −80 °C and analyzed in batches of 20.

The GC was equipped with a DB-WAXetr column (length, 30 m; internal diameter, 0.32 mm; layer thickness, 0.50 µm; Agilent Technologies, Santa Clara, CA, USA) connected to the FID and an HP-5 column (length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.25 µm; Agilent Technologies, Santa Clara, CA, USA) connected to the FPD. N₂ was used as the carrier gas. Samples were incubated for 20 min at 70 °C in the headspace autosampler (flow rate, 35 cm/s; injection volume, 1000 µL; injection mode, split; Combi PAL autosampler, CTC Analytics, Switzerland). The injector, FID and FPD temperatures were kept at 250 °C. The GC oven temperature was first held at 50 °C for 5 min, raised to 80 °C at 5 °C/min, then to 200 °C at 4 °C/min and held for 3 min, and finally to 230 °C at 4 °C/min and held for 1 min. Results were analyzed with the GCSolution software version 2.4 (Shimadzu, Kyoto, Japan). The GC was calibrated with a 5% EtOH solution (VWR International) containing the volatiles under study (Supplementary Table S7).

HS-SPME-GC-MS

HS-SPME-GC-MS (Shimadzu GCMS-QP-2010 Ultra) was used to measure additional volatile compounds, mainly terpenoids and esters. Samples were analyzed by HS-SPME using a triphase DVB/Carboxen/PDMS 50/30 μm SPME fiber (Supelco Co., Bellefonte, PA, USA), followed by gas chromatography (Thermo Fisher Scientific Trace 1300 series, USA) coupled to a mass spectrometer (Thermo Fisher Scientific ISQ series MS) equipped with a TriPlus RSH autosampler. 5 ml of degassed beer sample was placed in a 20 ml vial containing 1.75 g NaCl (VWR, 27810.295). 5 µl of an internal standard mix was added, containing 2-heptanol (1 g/L) (Sigma-Aldrich, H3003), 4-fluorobenzaldehyde (1 g/L) (Sigma-Aldrich, 128376), 2,3-hexanedione (1 g/L) (Sigma-Aldrich, 144169) and guaiacol (1 g/L) (Sigma-Aldrich, W253200) in ethanol (Fisher Chemical, E/0650DF/C17). Each sample was incubated at 60 °C in the autosampler oven with constant agitation. After 5 min of equilibration, the SPME fiber was exposed to the sample headspace for 30 min. The compounds trapped on the fiber were thermally desorbed in the injection port of the chromatograph by heating the fiber for 15 min at 270 °C.

The GC-MS was equipped with a low-polarity RXi-5Sil MS column (length, 20 m; internal diameter, 0.18 mm; layer thickness, 0.18 µm; Restek, Bellefonte, PA, USA). Injection was performed in splitless mode at 320 °C, with a split flow of 9 ml/min, a purge flow of 5 ml/min and an open valve time of 3 min. To obtain a pulsed injection, a programmed gas flow was used whereby the helium flow was set at 2.7 mL/min for 0.1 min, followed by a decrease of 20 ml/min down to the normal 0.9 mL/min. The oven temperature was first held at 30 °C for 3 min, raised to 80 °C at 7 °C/min, then to 125 °C at 2 °C/min, and finally to 270 °C at 8 °C/min.

Mass acquisition range was 33 to 550 amu at a scan rate of 5 scans/s. Electron impact ionization energy was 70 eV. The interface and ion source were kept at 275 °C and 250 °C, respectively. A mix of linear n-alkanes (from C7 to C40; Supelco Co.) was injected into the GC-MS under identical conditions to serve as external retention index markers. Identification and quantification of the compounds were performed using an in-house developed R script, as described in Goelen et al. and Reher et al. 87, 88 (for package information, see Supplementary Table S8). Briefly, chromatograms were analyzed using AMDIS (v2.71) 89 to separate overlapping peaks and obtain pure compound spectra. The NIST MS Search software (v2.0g), in combination with the NIST2017, FFNSC3 and Adams4 libraries, was used to manually identify the empirical spectra, taking into account the expected retention time. After background subtraction and correction for retention time shifts between samples run on different days (based on the alkane ladders), compound elution profiles were extracted and integrated using a list of 284 target compounds of interest, which were either recovered in our identified AMDIS list of spectra or known to occur in beer. Compound elution profiles were estimated for every peak in every chromatogram over a time-restricted window using weighted non-negative least squares analysis, after which peak areas were integrated 87, 88. Batch effect correction was performed by normalizing against the most stable internal standard compound, 4-fluorobenzaldehyde. Of the 284 target compounds analyzed, 167 were visually judged to have reliable elution profiles and were used for the final analysis.
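
To make the deconvolution step concrete, the sketch below applies weighted non-negative least squares per scan to toy data. The authors' actual implementation is the in-house R script cited above 87, 88; this Python version, its function names and its toy inputs are purely illustrative.

```python
# Minimal sketch of spectral deconvolution by weighted non-negative least squares.
# `library_spectra` (n_mz x n_compounds) holds reference spectra of the target
# compounds; `window` (n_mz x n_scans) holds the observed signal in a
# retention-time-restricted window. All names and data here are illustrative.
import numpy as np
from scipy.optimize import nnls

def deconvolve_window(library_spectra, window, weights=None):
    """Return per-compound elution profiles (n_compounds x n_scans)."""
    n_mz, n_scans = window.shape
    w = np.ones(n_mz) if weights is None else weights
    A = library_spectra * w[:, None]              # weight each m/z channel
    profiles = np.empty((library_spectra.shape[1], n_scans))
    for t in range(n_scans):
        profiles[:, t], _ = nnls(A, window[:, t] * w)
    return profiles

# Toy example: 200 m/z channels, 3 library compounds, 40 scans.
rng = np.random.default_rng(0)
S = np.abs(rng.normal(size=(200, 3)))             # reference spectra
P_true = np.abs(rng.normal(size=(3, 40)))         # true elution profiles
profiles = deconvolve_window(S, S @ P_true)
areas = np.trapz(profiles, axis=1)                # integrated peak areas
```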

Discrete photometric and enzymatic analysis

Discrete photometric and enzymatic analysis (Thermo Scientific™ Gallery™ Plus Beermaster Discrete Analyzer) was used to measure acetic acid, ammonia, beta-glucan, iso-alpha acids, color, sugars, glycerol, iron, pH, protein, and sulfite. A sample volume of 2 ml was used for the analyses. Information regarding the reagents and standard solutions used for analyses and calibrations is included in Supplementary Table S7 and Supplementary Table S9.

NIR analyses

NIR analysis (Anton Paar Alcolyzer Beer ME System) was used to measure ethanol. Measurements comprised 50 ml of sample, and a 10% EtOH solution was used for calibration.

Correlation calculations

Pairwise Spearman Rank correlations were calculated between all chemical properties.
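
As a minimal sketch, assuming a pandas DataFrame with one row per beer and one column per chemical property (the file name is a placeholder):

```python
# Pairwise Spearman rank correlations between all chemical properties.
import pandas as pd

chem = pd.read_csv("chemical_properties.csv", index_col=0)  # beers x compounds (placeholder file)
rho = chem.corr(method="spearman")                          # symmetric matrix of Spearman's rho
```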

Sensory dataset

Trained panel.

Our trained tasting panel consisted of volunteers who gave prior verbal informed consent. All compounds used for the validation experiment were of food-grade quality. The tasting sessions were approved by the Social and Societal Ethics Committee of the KU Leuven (G-2022-5677-R2(MAR)). All online reviewers agreed to the Terms and Conditions of the RateBeer website.

Sensory analysis was performed according to the American Society of Brewing Chemists (ASBC) Sensory Analysis Methods 90. Thirty volunteers were screened through a series of triangle tests, and the sixteen most sensitive and consistent tasters were retained as taste panel members. The resulting panel was diverse in age [22–42 years, mean: 29], sex [56% male] and nationality [7 different countries]. The panel developed a consensus vocabulary to describe beer aroma, taste and mouthfeel, and panelists were trained to identify and score 50 different attributes on a 7-point intensity scale. The scoring sheet is included as Supplementary Data 3. Sensory assessments took place between 10 a.m. and noon. The beers were served in black-colored glasses. Per session, between 5 and 12 beers of the same style were tasted at 12 °C to 16 °C. Two reference beers were added to each set and indicated as ‘Reference 1 & 2’, allowing panel members to calibrate their ratings. Not all panelists were present at every tasting. Scores were scaled by standard deviation and mean-centered per taster. Values are represented as z-scores and clustered by Euclidean distance. Pairwise Spearman correlations were calculated between taste and aroma sensory attributes. Panel consistency was evaluated by repeating samples in different sessions and performing ANOVA to identify differences, using the ‘stats’ package (v4.2.2) in R (for package information, see Supplementary Table S8).
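
The per-taster standardization and the ANOVA consistency check can be sketched as follows. The original analysis used the ‘stats’ package in R; this Python version and its column and label names are hypothetical.

```python
# Per-taster z-scoring and a panel-consistency check (illustrative sketch).
# `scores` is a long-format table with columns: taster, session, beer, attribute, score.
import pandas as pd
from scipy import stats

scores = pd.read_csv("panel_scores.csv")  # placeholder file name

# Scale by standard deviation and mean-center per taster (z-scores).
scores["z"] = scores.groupby("taster")["score"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=1)
)

# Consistency: do scores for a repeated sample differ between sessions?
rep = scores.query("beer == 'Reference 1' and attribute == 'bitterness'")
groups = [g["z"].to_numpy() for _, g in rep.groupby("session")]
f_stat, p_value = stats.f_oneway(*groups)  # one-way ANOVA across sessions
```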

Online reviews from a public database

The ‘scrapy’ package in Python (v3.6) (for package information, see Supplementary Table S8) was used to collect 232,288 online reviews (mean = 922, min = 6, max = 5343 per beer) from RateBeer, an online beer review database. Each review entry comprised five numerical scores (appearance, aroma, taste, palate and overall quality) and an optional review text. The total number of reviews per reviewer was collected separately. Numerical scores were scaled and centered per rater, and mean scores were calculated per beer.

For the review texts, the language was estimated using the packages ‘langdetect’ and ‘langid’ in Python; reviews classified as English by both packages were kept. Reviewers with fewer than 100 entries overall were discarded, leaving 181,025 reviews from >6000 reviewers from >40 countries. Text processing was done using the ‘nltk’ package in Python. Texts were corrected for slang and misspellings; proper nouns and rare words that are relevant to the beer context were specified and kept as-is (‘Chimay’, ‘Lambic’, etc.). A dictionary of semantically similar sensorial terms, for example ‘floral’ and ‘flower’, was created, and such terms were collapsed into one. Words were stemmed and lemmatized to avoid identifying words such as ‘acid’ and ‘acidity’ as separate terms. Numbers and punctuation were removed.
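
A rough sketch of the language filtering and token normalization, with tiny stand-in dictionaries (the real slang corrections, keep-lists and synonym mappings described above are much larger):

```python
# Keep English reviews (both detectors must agree) and normalize tokens.
# Dictionaries below are tiny illustrative stand-ins.
import langdetect          # pip install langdetect
import langid              # pip install langid
from nltk.stem import PorterStemmer, WordNetLemmatizer  # requires nltk.download('wordnet')

def is_english(text):
    try:
        return langdetect.detect(text) == "en" and langid.classify(text)[0] == "en"
    except Exception:      # undetectable or empty text
        return False

KEEP_AS_IS = {"chimay", "lambic"}        # beer-specific proper nouns and rare words
SYNONYMS = {"flower": "floral"}          # collapse semantically similar sensorial terms

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()

def normalize(token):
    t = token.lower()
    if t in KEEP_AS_IS:
        return t
    t = SYNONYMS.get(t, t)
    return stemmer.stem(lemmatizer.lemmatize(t))  # merge inflected variants
```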

Sentences from up to 50 randomly chosen reviews per beer were manually categorized according to the aspect of beer they describe (appearance, aroma, taste, palate, overall quality—not to be confused with the 5 numerical scores described above) or flagged as irrelevant if they contained no useful information. If a beer contained fewer than 50 reviews, all reviews were manually classified. This labeled data set was used to train a model that classified the rest of the sentences for all beers 91 . Sentences describing taste and aroma were extracted, and term frequency–inverse document frequency (TFIDF) was implemented to calculate enrichment scores for sensorial words per beer.
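
The TF-IDF scoring step, sketched with scikit-learn on a toy corpus standing in for the extracted taste and aroma sentences:

```python
# TF-IDF enrichment scores for sensorial words per beer (toy corpus).
from sklearn.feature_extraction.text import TfidfVectorizer

taste_aroma_sentences = {                 # beer -> concatenated taste/aroma sentences
    "Beer A": "tropical fruity hoppy tropical citrus",
    "Beer B": "roasty coffee bitter dark chocolate",
}
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(taste_aroma_sentences.values())  # beers x terms

col = vectorizer.vocabulary_["tropical"]                # illustrative sensorial term
scores = tfidf[:, col].toarray().ravel()                # one enrichment score per beer
print(dict(zip(taste_aroma_sentences, scores)))
```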

The sex of the tasting subject was not considered when building our sensory database. Instead, results from different panelists were averaged, both for our trained panel (56% male, 44% female) and the RateBeer reviews (70% male, 30% female for RateBeer as a whole).

Beer price collection and processing

Beer prices were collected from the following stores: Colruyt, Delhaize, Total Wine, BeerHawk, The Belgian Beer Shop, The Belgian Shop, and Beer of Belgium. Where applicable, prices were converted to Euros and normalized per liter. Spearman correlations were calculated between these prices and mean overall appreciation scores from RateBeer and the taste panel, respectively.

Pairwise Spearman Rank correlations were calculated between all sensory properties.

Machine learning models

Predictive modeling of sensory profiles from chemical data.

Regression models were constructed to predict (a) trained panel scores for beer flavors and quality from beer chemical profiles and (b) public reviews’ appreciation scores from beer chemical profiles. Z-scores were used to represent sensory attributes in both data sets. Chemical properties with log-normal distributions (Shapiro-Wilk test, p < 0.05) were log-transformed. Missing chemical measurements (0.1% of all data) were replaced with mean values per attribute. Observations from 250 beers were randomly separated into a training set (70%, 175 beers) and a test set (30%, 75 beers), stratified per beer style. Chemical measurements (p = 231) were normalized based on the training set average and standard deviation. In total, ten models were trained: three linear regression-based models, namely linear regression with first-order interaction terms (LR), lasso regression with first-order interaction terms (Lasso) and partial least squares regression (PLSR); five decision tree models, namely the Adaboost regressor (ABR), Extra Trees (ET), the Gradient Boosting regressor (GBR), Random Forest (RF) and the XGBoost regressor (XGBR); one support vector machine model (SVR); and one artificial neural network model (ANN). The models were implemented using the ‘scikit-learn’ package (v1.2.2) and the ‘xgboost’ package (v1.7.3) in Python (v3.9.16). Models were trained, and hyperparameters optimized, using five-fold cross-validated grid search with the coefficient of determination (R²) as the evaluation metric. The ANN (scikit-learn’s MLPRegressor) was optimized using Bayesian Tree-Structured Parzen Estimator optimization with the ‘Optuna’ Python package (v3.2.0). Individual models were trained per attribute, and a multi-output model was trained on all attributes simultaneously.
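
As a rough illustration of this pipeline, the sketch below reproduces the split, normalization and cross-validated grid search for a single GBR model. File names, column names and the hyperparameter grid are illustrative assumptions, not the authors' actual configuration, and the log-transformation and imputation steps are omitted for brevity.

```python
# Sketch: stratified split, train-set normalization, and 5-fold CV grid search.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

X = pd.read_csv("chemistry.csv")                 # 231 chemical measurements per beer
y = pd.read_csv("targets.csv")["overall_z"]      # z-scored sensory attribute
styles = pd.read_csv("styles.csv")["style"]      # beer style labels

# 70/30 split, stratified per beer style.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=styles, random_state=0
)

# Normalize using training-set statistics only.
mu, sd = X_tr.mean(), X_tr.std(ddof=1)
X_tr, X_te = (X_tr - mu) / sd, (X_te - mu) / sd

# Five-fold cross-validated grid search, scored by R^2.
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [100, 500], "max_depth": [3, 5],
                "learning_rate": [0.01, 0.1]},
    cv=5, scoring="r2",
)
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_te, y_te))
```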

Model dissection

GBR was found to outperform the other methods, yielding models with the highest average R² values in both the trained panel and public review data sets. Impurity-based rankings of the most important predictors for each predicted sensorial trait were obtained using the ‘scikit-learn’ package. To visualize the relationships between these chemical properties and their predicted targets, partial dependence plots (PDP) were constructed for the six most important predictors of consumer appreciation 74, 75.
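
Continuing from the training sketch above (the fitted `grid` and normalized `X_tr` are carried over and remain illustrative), the impurity ranking and PDPs reduce to a few lines:

```python
# Impurity-based feature ranking and partial dependence plots (sketch).
import numpy as np
from sklearn.inspection import PartialDependenceDisplay

gbr = grid.best_estimator_                        # fitted GBR from the previous sketch
order = np.argsort(gbr.feature_importances_)[::-1]
top6 = X_tr.columns[order[:6]].tolist()           # six most important predictors

PartialDependenceDisplay.from_estimator(gbr, X_tr, features=top6)
```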

The ‘SHAP’ package in Python (v0.41.0) was used to provide an alternative ranking of predictor importance and to visualize the predictors’ effects as a function of their concentration 68.
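
A minimal SHAP sketch for the same fitted model (again assuming the objects from the sketches above):

```python
# SHAP-based ranking and effect visualization for a fitted tree ensemble (sketch).
import shap

explainer = shap.TreeExplainer(gbr)        # gbr: fitted GradientBoostingRegressor
shap_values = explainer.shap_values(X_te)  # one value per sample and feature

shap.summary_plot(shap_values, X_te)       # global ranking + effect vs. concentration
```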

Validation of causal chemical properties

To validate the effects of the most important model features on predicted sensory attributes, beers were spiked with the chemical compounds identified by the models and descriptive sensory analyses were carried out according to the American Society of Brewing Chemists (ASBC) protocol 90 .

Compound spiking was done 30 min before tasting. Compounds were spiked into fresh beer bottles, which were immediately resealed and inverted three times. Fresh bottles of beer were opened for the same duration, resealed, and inverted three times to serve as controls. Pairs of spiked samples and controls were served simultaneously, chilled and in dark glasses as outlined in the Trained panel section above. Tasters were instructed to select the glass with the higher flavor intensity for each attribute (directional difference test 92) and to select the glass they preferred.

The final concentration after spiking was equal to the within-style average, after normalizing by ethanol concentration. This was done to ensure balanced flavor profiles in the final spiked beer. The same methods were applied to improve a non-alcoholic beer. The following compounds were used: ethyl acetate (Merck KGaA, W241415), ethyl hexanoate (Merck KGaA, W243906), isoamyl acetate (Merck KGaA, W205508), phenethyl acetate (Merck KGaA, W285706), ethanol (96%, Colruyt), glycerol (Merck KGaA, W252506) and lactic acid (Merck KGaA, 261106).

Significant differences in preference or perceived intensity were determined by performing a two-sided binomial test on each attribute.
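
Under the null hypothesis of no perceivable difference, each taster picks the spiked glass with probability 0.5, so the test is a standard two-sided binomial test; the counts here are invented for illustration:

```python
# Two-sided binomial test for a directional difference test (illustrative counts).
from scipy.stats import binomtest

n_tasters = 16          # panel size (illustrative)
n_chose_spiked = 13     # tasters who rated the spiked glass as more intense
result = binomtest(n_chose_spiked, n_tasters, p=0.5, alternative="two-sided")
print(result.pvalue)
```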

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this work are available in the Supplementary Data files and have been deposited to Zenodo under accession code 10653704 93. The RateBeer scores are under restricted access; they are not publicly available, as they are the property of RateBeer (ZX Ventures, USA). Access can be obtained from the authors upon reasonable request and with permission of RateBeer (ZX Ventures, USA). Source data are provided with this paper.

Code availability

The code for training the machine learning models, analyzing the models, and generating the figures has been deposited to Zenodo under accession code 10653704 93 .

Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355 , 391–394 (2017).

Plutowska, B. & Wardencki, W. Application of gas chromatography–olfactometry (GC–O) in analysis and quality assessment of alcoholic beverages – A review. Food Chem. 107 , 449–463 (2008).

Legin, A., Rudnitskaya, A., Seleznev, B. & Vlasov, Y. Electronic tongue for quality assessment of ethanol, vodka and eau-de-vie. Anal. Chim. Acta 534 , 129–135 (2005).

Loutfi, A., Coradeschi, S., Mani, G. K., Shankar, P. & Rayappan, J. B. B. Electronic noses for food quality: A review. J. Food Eng. 144 , 103–111 (2015).

Ahn, Y.-Y., Ahnert, S. E., Bagrow, J. P. & Barabási, A.-L. Flavor network and the principles of food pairing. Sci. Rep. 1 , 196 (2011).

Bartoshuk, L. M. & Klee, H. J. Better fruits and vegetables through sensory analysis. Curr. Biol. 23 , R374–R378 (2013).

Piggott, J. R. Design questions in sensory and consumer science. Food Qual. Prefer. 6, 217–220 (1995).

Kermit, M. & Lengard, V. Assessing the performance of a sensory panel-panellist monitoring and tracking. J. Chemom. 19 , 154–161 (2005).

Cook, D. J., Hollowood, T. A., Linforth, R. S. T. & Taylor, A. J. Correlating instrumental measurements of texture and flavour release with human perception. Int. J. Food Sci. Technol. 40 , 631–641 (2005).

Chinchanachokchai, S., Thontirawong, P. & Chinchanachokchai, P. A tale of two recommender systems: The moderating role of consumer expertise on artificial intelligence based product recommendations. J. Retail. Consum. Serv. 61 , 1–12 (2021).

Ross, C. F. Sensory science at the human-machine interface. Trends Food Sci. Technol. 20 , 63–72 (2009).

Chambers, E. IV & Koppel, K. Associations of volatile compounds with sensory aroma and flavor: The complex nature of flavor. Molecules 18 , 4887–4905 (2013).

Pinu, F. R. Metabolomics—The new frontier in food safety and quality research. Food Res. Int. 72 , 80–81 (2015).

Danezis, G. P., Tsagkaris, A. S., Brusic, V. & Georgiou, C. A. Food authentication: state of the art and prospects. Curr. Opin. Food Sci. 10 , 22–31 (2016).

Shepherd, G. M. Smell images and the flavour system in the human brain. Nature 444 , 316–321 (2006).

Meilgaard, M. C. Prediction of flavor differences between beers from their chemical composition. J. Agric. Food Chem. 30 , 1009–1017 (1982).

Xu, L. et al. Widespread receptor-driven modulation in peripheral olfactory coding. Science 368 , eaaz5390 (2020).

Kupferschmidt, K. Following the flavor. Science 340 , 808–809 (2013).

Billesbølle, C. B. et al. Structural basis of odorant recognition by a human odorant receptor. Nature 615 , 742–749 (2023).

Smith, B. Perspective: Complexities of flavour. Nature 486 , S6–S6 (2012).

Pfister, P. et al. Odorant receptor inhibition is fundamental to odor encoding. Curr. Biol. 30 , 2574–2587 (2020).

Moskowitz, H. W., Kumaraiah, V., Sharma, K. N., Jacobs, H. L. & Sharma, S. D. Cross-cultural differences in simple taste preferences. Science 190 , 1217–1218 (1975).

Eriksson, N. et al. A genetic variant near olfactory receptor genes influences cilantro preference. Flavour 1 , 22 (2012).

Ferdenzi, C. et al. Variability of affective responses to odors: Culture, gender, and olfactory knowledge. Chem. Senses 38 , 175–186 (2013).

Lawless, H. T. & Heymann, H. Sensory evaluation of food: Principles and practices. (Springer, New York, NY). https://doi.org/10.1007/978-1-4419-6488-5 (2010).

Colantonio, V. et al. Metabolomic selection for enhanced fruit flavor. Proc. Natl. Acad. Sci. 119 , e2115865119 (2022).

Fritz, F., Preissner, R. & Banerjee, P. VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res 49 , W679–W684 (2021).

Tuwani, R., Wadhwa, S. & Bagler, G. BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules. Sci. Rep. 9 , 1–13 (2019).

Dagan-Wiener, A. et al. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure. Sci. Rep. 7 , 1–13 (2017).

Pallante, L. et al. Toward a general and interpretable umami taste predictor using a multi-objective machine learning approach. Sci. Rep. 12 , 1–11 (2022).

Malavolta, M. et al. A survey on computational taste predictors. Eur. Food Res. Technol. 248 , 2215–2235 (2022).

Lee, B. K. et al. A principal odor map unifies diverse tasks in olfactory perception. Science 381 , 999–1006 (2023).

Mayhew, E. J. et al. Transport features predict if a molecule is odorous. Proc. Natl. Acad. Sci. 119 , e2116576119 (2022).

Niu, Y. et al. Sensory evaluation of the synergism among ester odorants in light aroma-type liquor by odor threshold, aroma intensity and flash GC electronic nose. Food Res. Int. 113 , 102–114 (2018).

Yu, P., Low, M. Y. & Zhou, W. Design of experiments and regression modelling in food flavour and sensory analysis: A review. Trends Food Sci. Technol. 71 , 202–215 (2018).

Oladokun, O. et al. The impact of hop bitter acid and polyphenol profiles on the perceived bitterness of beer. Food Chem. 205 , 212–220 (2016).

Linforth, R., Cabannes, M., Hewson, L., Yang, N. & Taylor, A. Effect of fat content on flavor delivery during consumption: An in vivo model. J. Agric. Food Chem. 58 , 6905–6911 (2010).

Guo, S., Na Jom, K. & Ge, Y. Influence of roasting condition on flavor profile of sunflower seeds: A flavoromics approach. Sci. Rep. 9 , 11295 (2019).

Ren, Q. et al. The changes of microbial community and flavor compound in the fermentation process of Chinese rice wine using Fagopyrum tataricum grain as feedstock. Sci. Rep. 9 , 3365 (2019).

Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning. (Springer, New York, NY). https://doi.org/10.1007/978-0-387-21606-5 (2001).

Dietz, C., Cook, D., Huismann, M., Wilson, C. & Ford, R. The multisensory perception of hop essential oil: a review. J. Inst. Brew. 126 , 320–342 (2020).

Roncoroni, M. & Verstrepen, K. J. Belgian Beer: Tested and Tasted. (Lannoo, 2018).

Meilgaard, M. C. Flavor chemistry of beer. Part II: Flavor and threshold of 239 aroma volatiles. Master Brew. Assoc. Am. Tech. Q 12 (1975).

Bokulich, N. A. & Bamforth, C. W. The microbiology of malting and brewing. Microbiol. Mol. Biol. Rev. MMBR 77 , 157–172 (2013).

Dzialo, M. C., Park, R., Steensels, J., Lievens, B. & Verstrepen, K. J. Physiology, ecology and industrial applications of aroma formation in yeast. FEMS Microbiol. Rev. 41 , S95–S128 (2017).

Datta, A. et al. Computer-aided food engineering. Nat. Food 3 , 894–904 (2022).

American Society of Brewing Chemists. Beer Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A.).

Olaniran, A. O., Hiralal, L., Mokoena, M. P. & Pillay, B. Flavour-active volatile compounds in beer: production, regulation and control. J. Inst. Brew. 123 , 13–23 (2017).

Verstrepen, K. J. et al. Flavor-active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Meilgaard, M. C. Flavour chemistry of beer. part I: flavour interaction between principal volatiles. Master Brew. Assoc. Am. Tech. Q 12 , 107–117 (1975).

Briggs, D. E., Boulton, C. A., Brookes, P. A. & Stevens, R. Brewing: Science and Practice 227–254. (Woodhead Publishing, 2004). https://doi.org/10.1533/9781855739062.227 .

Bossaert, S., Crauwels, S., De Rouck, G. & Lievens, B. The power of sour - A review: Old traditions, new opportunities. BrewingScience 72 , 78–88 (2019).

Snauwaert, I. et al. Microbial diversity and metabolite composition of Belgian red-brown acidic ales. Int. J. Food Microbiol. 221 , 1–11 (2016).

Spitaels, F. et al. The microbial diversity of traditional spontaneously fermented lambic beer. PLoS ONE 9 , e95384 (2014).

Blanco, C. A., Andrés-Iglesias, C. & Montero, O. Low-alcohol Beers: Flavor Compounds, Defects, and Improvement Strategies. Crit. Rev. Food Sci. Nutr. 56 , 1379–1388 (2016).

Jackowski, M. & Trusek, A. Non-alcoholic beer production – an overview. Pol. J. Chem. Technol. 20, 32–38 (2018).

Takoi, K. et al. The contribution of geraniol metabolism to the citrus flavour of beer: Synergy of geraniol and β-citronellol under coexistence with excess linalool. J. Inst. Brew. 116 , 251–260 (2010).

Kroeze, J. H. & Bartoshuk, L. M. Bitterness suppression as revealed by split-tongue taste stimulation in humans. Physiol. Behav. 35 , 779–783 (1985).

Mennella, J. A. et al. “A spoonful of sugar helps the medicine go down”: Bitter masking by sucrose among children and adults. Chem. Senses 40, 17–25 (2015).

Wietstock, P., Kunz, T., Perreira, F. & Methner, F.-J. Metal chelation behavior of hop acids in buffered model systems. BrewingScience 69 , 56–63 (2016).

Sancho, D., Blanco, C. A., Caballero, I. & Pascual, A. Free iron in pale, dark and alcohol-free commercial lager beers. J. Sci. Food Agric. 91 , 1142–1147 (2011).

Rodrigues, H. & Parr, W. V. Contribution of cross-cultural studies to understanding wine appreciation: A review. Food Res. Int. 115 , 251–258 (2019).

Korneva, E. & Blockeel, H. Towards better evaluation of multi-target regression models. in ECML PKDD 2020 Workshops (eds. Koprinska, I. et al.) 353–362 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-65965-3_23 .

Ares, G. Mathematical and Statistical Methods in Food Science and Technology. (Wiley, 2013).

Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? Preprint at http://arxiv.org/abs/2207.08815 (2022).

Gries, S. T. Statistics for Linguistics with R: A Practical Introduction. in Statistics for Linguistics with R (De Gruyter Mouton, 2021). https://doi.org/10.1515/9783110718256 .

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 , 56–67 (2020).

Ickes, C. M. & Cadwallader, K. R. Effects of ethanol on flavor perception in alcoholic beverages. Chemosens. Percept. 10 , 119–134 (2017).

Kato, M. et al. Influence of high molecular weight polypeptides on the mouthfeel of commercial beer. J. Inst. Brew. 127 , 27–40 (2021).

Wauters, R. et al. Novel Saccharomyces cerevisiae variants slow down the accumulation of staling aldehydes and improve beer shelf-life. Food Chem. 398 , 1–11 (2023).

Li, H., Jia, S. & Zhang, W. Rapid determination of low-level sulfur compounds in beer by headspace gas chromatography with a pulsed flame photometric detector. J. Am. Soc. Brew. Chem. 66 , 188–191 (2008).

Dercksen, A., Laurens, J., Torline, P., Axcell, B. C. & Rohwer, E. Quantitative analysis of volatile sulfur compounds in beer using a membrane extraction interface. J. Am. Soc. Brew. Chem. 54 , 228–233 (1996).

Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. (2020).

Zhao, Q. & Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. Publ. Am. Stat. Assoc. 39 , 272–281 (2019).

Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. (Springer, 2019).

Labrado, D. et al. Identification by NMR of key compounds present in beer distillates and residual phases after dealcoholization by vacuum distillation. J. Sci. Food Agric. 100 , 3971–3978 (2020).

Lusk, L. T., Kay, S. B., Porubcan, A. & Ryder, D. S. Key olfactory cues for beer oxidation. J. Am. Soc. Brew. Chem. 70 , 257–261 (2012).

Gonzalez Viejo, C., Torrico, D. D., Dunshea, F. R. & Fuentes, S. Development of artificial neural network models to assess beer acceptability based on sensory properties using a robotic pourer: A comparative model approach to achieve an artificial intelligence system. Beverages 5 , 33 (2019).

Gonzalez Viejo, C., Fuentes, S., Torrico, D. D., Godbole, A. & Dunshea, F. R. Chemical characterization of aromas in beer and their effect on consumers liking. Food Chem. 293 , 479–485 (2019).

Gilbert, J. L. et al. Identifying breeding priorities for blueberry flavor using biochemical, sensory, and genotype by environment analyses. PLOS ONE 10 , 1–21 (2015).

Goulet, C. et al. Role of an esterase in flavor volatile variation within the tomato clade. Proc. Natl. Acad. Sci. 109 , 19009–19014 (2012).

Borisov, V. et al. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 1–21 https://doi.org/10.1109/TNNLS.2022.3229161 (2022).

Statista. Statista Consumer Market Outlook: Beer - Worldwide.

Seitz, H. K. & Stickel, F. Molecular mechanisms of alcohol-mediated carcinogenesis. Nat. Rev. Cancer 7, 599–612 (2007).

Voordeckers, K. et al. Ethanol exposure increases mutation rate through error-prone polymerases. Nat. Commun. 11 , 3664 (2020).

Goelen, T. et al. Bacterial phylogeny predicts volatile organic compound composition and olfactory response of an aphid parasitoid. Oikos 129 , 1415–1428 (2020).

Reher, T. et al. Evaluation of hop (Humulus lupulus) as a repellent for the management of Drosophila suzukii. Crop Prot. 124 , 104839 (2019).

Stein, S. E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass Spectrom. 10 , 770–781 (1999).

American Society of Brewing Chemists. Sensory Analysis Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A., 1992).

McAuley, J., Leskovec, J. & Jurafsky, D. Learning Attitudes and Attributes from Multi-Aspect Reviews. Preprint at https://doi.org/10.48550/arXiv.1210.3926 (2012).

Meilgaard, M. C., Civille, G. V. & Carr, B. T. Sensory Evaluation Techniques. (CRC Press, Boca Raton). https://doi.org/10.1201/b16452 (2014).

Schreurs, M. et al. Data from: Predicting and improving complex beer flavor through machine learning. Zenodo https://doi.org/10.5281/zenodo.10653704 (2024).

Acknowledgements

We thank all lab members for their discussions and thank all tasting panel members for their contributions. Special thanks go out to Dr. Karin Voordeckers for her tremendous help in proofreading and improving the manuscript. M.S. was supported by a Baillet-Latour fellowship, L.C. acknowledges financial support from KU Leuven (C16/17/006), F.A.T. was supported by a PhD fellowship from FWO (1S08821N). Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16/17/006).

Author information

These authors contributed equally: Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni.

Authors and Affiliations

VIB—KU Leuven Center for Microbiology, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Florian A. Theßeling & Kevin J. Verstrepen

CMPG Laboratory of Genetics and Genomics, KU Leuven, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Leuven Institute for Beer Research (LIBR), Gaston Geenslaan 1, B-3001, Leuven, Belgium

Laboratory of Socioecology and Social Evolution, KU Leuven, Naamsestraat 59, B-3000, Leuven, Belgium

Lloyd Cool, Christophe Vanderaa & Tom Wenseleers

VIB Bioinformatics Core, VIB, Rijvisschestraat 120, B-9052, Ghent, Belgium

Łukasz Kreft & Alexander Botzki

AB InBev SA/NV, Brouwerijplein 1, B-3000, Leuven, Belgium

Philippe Malcorps & Luk Daenen

Contributions

S.P., M.S. and K.J.V. conceived the experiments. S.P., M.S. and K.J.V. designed the experiments. S.P., M.S., M.R., B.H. and F.A.T. performed the experiments. S.P., M.S., L.C., C.V., L.K., A.B., P.M., L.D., T.W. and K.J.V. contributed analysis ideas. S.P., M.S., L.C., C.V., T.W. and K.J.V. analyzed the data. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Kevin J. Verstrepen .

Ethics declarations

Competing interests.

K.J.V. is affiliated with bar.on. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Florian Bauer, Andrew John Macintosh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

  • Supplementary Information
  • Peer Review File
  • Description of Additional Supplementary Files
  • Supplementary Data 1–7
  • Reporting Summary
  • Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Schreurs, M., Piampongsant, S., Roncoroni, M. et al. Predicting and improving complex beer flavor through machine learning. Nat Commun 15 , 2368 (2024). https://doi.org/10.1038/s41467-024-46346-0

Received : 30 October 2023

Accepted : 21 February 2024

Published : 26 March 2024

DOI : https://doi.org/10.1038/s41467-024-46346-0

Research Paper – Structure, Examples and Writing Guide

Research Paper

Definition:

Research Paper is a written document that presents the author’s original research, analysis, and interpretation of a specific topic or issue.

It is typically based on empirical evidence and may involve qualitative or quantitative research methods, or a combination of both. The purpose of a research paper is to contribute new knowledge or insights to a particular field of study and to demonstrate the author’s understanding of the existing literature and theories related to the topic.

Structure of Research Paper

The structure of a research paper typically follows a standard format, consisting of several sections that convey specific information about the research study. The following is a detailed explanation of the structure of a research paper:

Title Page

The title page contains the title of the paper, the name(s) of the author(s), and the affiliation(s) of the author(s). It also includes the date of submission and, possibly, the name of the journal or conference where the paper is to be published.

Abstract

The abstract is a brief summary of the research paper, typically ranging from 100 to 250 words. It should include the research question, the methods used, the key findings, and the implications of the results. The abstract should be written in a concise and clear manner to allow readers to quickly grasp the essence of the research.

Introduction

The introduction section of a research paper provides background information about the research problem, the research question, and the research objectives. It also outlines the significance of the research, the research gap that it aims to fill, and the approach taken to address the research question. Finally, the introduction section ends with a clear statement of the research hypothesis or research question.

Literature Review

The literature review section of a research paper provides an overview of the existing literature on the topic of study. It includes a critical analysis and synthesis of the literature, highlighting the key concepts, themes, and debates. The literature review should also demonstrate the research gap and how the current study seeks to address it.

Methods

The methods section of a research paper describes the research design, the sample selection, the data collection and analysis procedures, and the statistical methods used to analyze the data. This section should provide sufficient detail for other researchers to replicate the study.

Results

The results section presents the findings of the research, using tables, graphs, and figures to illustrate the data. The findings should be presented in a clear and concise manner, with reference to the research question and hypothesis.

Discussion

The discussion section of a research paper interprets the findings and discusses their implications for the research question, the literature review, and the field of study. It should also address the limitations of the study and suggest future research directions.

Conclusion

The conclusion section summarizes the main findings of the study, restates the research question and hypothesis, and provides a final reflection on the significance of the research.

References

The references section provides a list of all the sources cited in the paper, following a specific citation style such as APA, MLA, or Chicago.

How to Write Research Paper

You can write Research Paper by the following guide:

  • Choose a Topic: The first step is to select a topic that interests you and is relevant to your field of study. Brainstorm ideas and narrow down to a research question that is specific and researchable.
  • Conduct a Literature Review: The literature review helps you identify the gap in the existing research and provides a basis for your research question. It also helps you to develop a theoretical framework and research hypothesis.
  • Develop a Thesis Statement: The thesis statement is the main argument of your research paper. It should be clear, concise and specific to your research question.
  • Plan your Research: Develop a research plan that outlines the methods, data sources, and data analysis procedures. This will help you to collect and analyze data effectively.
  • Collect and Analyze Data: Collect data using various methods such as surveys, interviews, observations, or experiments. Analyze data using statistical tools or other qualitative methods.
  • Organize your Paper: Organize your paper into sections such as Introduction, Literature Review, Methods, Results, Discussion, and Conclusion. Ensure that each section is coherent and follows a logical flow.
  • Write your Paper: Start by writing the introduction, followed by the literature review, methods, results, discussion, and conclusion. Ensure that your writing is clear, concise, and follows the required formatting and citation styles.
  • Edit and Proofread your Paper: Review your paper for grammar and spelling errors, and ensure that it is well-structured and easy to read. Ask someone else to review your paper to get feedback and suggestions for improvement.
  • Cite your Sources: Ensure that you properly cite all sources used in your research paper. This is essential for giving credit to the original authors and avoiding plagiarism.

Research Paper Example

Note : The below example research paper is for illustrative purposes only and is not an actual research paper. Actual research papers may have different structures, contents, and formats depending on the field of study, research question, data collection and analysis methods, and other factors. Students should always consult with their professors or supervisors for specific guidelines and expectations for their research papers.

Research Paper Example sample for Students:

Title: The Impact of Social Media on Mental Health among Young Adults

Abstract: This study aims to investigate the impact of social media use on the mental health of young adults. A literature review was conducted to examine the existing research on the topic. A survey was then administered to 200 university students to collect data on their social media use, mental health status, and perceived impact of social media on their mental health. The results showed that social media use is positively associated with depression, anxiety, and stress. The study also found that social comparison, cyberbullying, and FOMO (Fear of Missing Out) are significant predictors of mental health problems among young adults.

Introduction: Social media has become an integral part of modern life, particularly among young adults. While social media has many benefits, including increased communication and social connectivity, it has also been associated with negative outcomes, such as addiction, cyberbullying, and mental health problems. This study aims to investigate the impact of social media use on the mental health of young adults.

Literature Review: The literature review highlights the existing research on the impact of social media use on mental health. The review shows that social media use is associated with depression, anxiety, stress, and other mental health problems. The review also identifies the factors that contribute to the negative impact of social media, including social comparison, cyberbullying, and FOMO.

Methods: A survey was administered to 200 university students to collect data on their social media use, mental health status, and perceived impact of social media on their mental health. The survey included questions on social media use, mental health status (measured using the DASS-21), and perceived impact of social media on their mental health. Data were analyzed using descriptive statistics and regression analysis.

Results: The results showed that social media use is positively associated with depression, anxiety, and stress. The study also found that social comparison, cyberbullying, and FOMO are significant predictors of mental health problems among young adults.

Discussion: The study’s findings suggest that social media use has a negative impact on the mental health of young adults. The study highlights the need for interventions that address the factors contributing to the negative impact of social media, such as social comparison, cyberbullying, and FOMO.

Conclusion: In conclusion, social media use has a significant impact on the mental health of young adults. The study’s findings underscore the need for interventions that promote healthy social media use and address the negative outcomes associated with social media use. Future research can explore the effectiveness of interventions aimed at reducing the negative impact of social media on mental health. Additionally, longitudinal studies can investigate the long-term effects of social media use on mental health.

Limitations: The study has some limitations, including the use of self-report measures and a cross-sectional design. The use of self-report measures may result in biased responses, and a cross-sectional design limits the ability to establish causality.

Implications: The study’s findings have implications for mental health professionals, educators, and policymakers. Mental health professionals can use the findings to develop interventions that address the negative impact of social media use on mental health. Educators can incorporate social media literacy into their curriculum to promote healthy social media use among young adults. Policymakers can use the findings to develop policies that protect young adults from the negative outcomes associated with social media use.

References:

  • Twenge, J. M., & Campbell, W. K. (2019). Associations between screen time and lower psychological well-being among children and adolescents: Evidence from a population-based study. Preventive medicine reports, 15, 100918.
  • Primack, B. A., Shensa, A., Escobar-Viera, C. G., Barrett, E. L., Sidani, J. E., Colditz, J. B., … & James, A. E. (2017). Use of multiple social media platforms and symptoms of depression and anxiety: A nationally-representative study among US young adults. Computers in Human Behavior, 69, 1-9.
  • Van der Meer, T. G., & Verhoeven, J. W. (2017). Social media and its impact on academic performance of students. Journal of Information Technology Education: Research, 16, 383-398.

Appendix: The survey used in this study is provided below.

Social Media and Mental Health Survey

  • How often do you use social media per day?
      ◦ Less than 30 minutes
      ◦ 30 minutes to 1 hour
      ◦ 1 to 2 hours
      ◦ 2 to 4 hours
      ◦ More than 4 hours
  • Which social media platforms do you use?
      ◦ Others (Please specify)
  • How often do you experience the following on social media?
      ◦ Social comparison (comparing yourself to others)
      ◦ Cyberbullying
      ◦ Fear of Missing Out (FOMO)
  • Have you ever experienced any of the following mental health problems in the past month?
  • Do you think social media use has a positive or negative impact on your mental health?
      ◦ Very positive
      ◦ Somewhat positive
      ◦ Somewhat negative
      ◦ Very negative
  • In your opinion, which factors contribute to the negative impact of social media on mental health?
      ◦ Social comparison
  • In your opinion, what interventions could be effective in reducing the negative impact of social media on mental health?
      ◦ Education on healthy social media use
      ◦ Counseling for mental health problems caused by social media
      ◦ Social media detox programs
      ◦ Regulation of social media use

Thank you for your participation!

Applications of Research Paper

Research papers have several applications in various fields, including:

  • Advancing knowledge: Research papers contribute to the advancement of knowledge by generating new insights, theories, and findings that can inform future research and practice. They help to answer important questions, clarify existing knowledge, and identify areas that require further investigation.
  • Informing policy: Research papers can inform policy decisions by providing evidence-based recommendations for policymakers. They can help to identify gaps in current policies, evaluate the effectiveness of interventions, and inform the development of new policies and regulations.
  • Improving practice: Research papers can improve practice by providing evidence-based guidance for professionals in various fields, including medicine, education, business, and psychology. They can inform the development of best practices, guidelines, and standards of care that can improve outcomes for individuals and organizations.
  • Educating students: Research papers are often used as teaching tools in universities and colleges to educate students about research methods, data analysis, and academic writing. They help students to develop critical thinking skills, research skills, and communication skills that are essential for success in many careers.
  • Fostering collaboration: Research papers can foster collaboration among researchers, practitioners, and policymakers by providing a platform for sharing knowledge and ideas. They can facilitate interdisciplinary collaborations and partnerships that can lead to innovative solutions to complex problems.

When to Write Research Paper

Research papers are typically written when a person has completed a research project or when they have conducted a study and have obtained data or findings that they want to share with the academic or professional community. Research papers are usually written in academic settings, such as universities, but they can also be written in professional settings, such as research organizations, government agencies, or private companies.

Here are some common situations where a person might need to write a research paper:

  • For academic purposes: Students in universities and colleges are often required to write research papers as part of their coursework, particularly in the social sciences, natural sciences, and humanities. Writing research papers helps students to develop research skills, critical thinking skills, and academic writing skills.
  • For publication: Researchers often write research papers to publish their findings in academic journals or to present their work at academic conferences. Publishing research papers is an important way to disseminate research findings to the academic community and to establish oneself as an expert in a particular field.
  • To inform policy or practice: Researchers may write research papers to inform policy decisions or to improve practice in various fields. Research findings can be used to inform the development of policies, guidelines, and best practices that can improve outcomes for individuals and organizations.
  • To share new insights or ideas: Researchers may write research papers to share new insights or ideas with the academic or professional community. They may present new theories, propose new research methods, or challenge existing paradigms in their field.

Purpose of Research Paper

The purpose of a research paper is to present the results of a study or investigation in a clear, concise, and structured manner. Research papers are written to communicate new knowledge, ideas, or findings to a specific audience, such as researchers, scholars, practitioners, or policymakers. The primary purposes of a research paper are:

  • To contribute to the body of knowledge: Research papers aim to add new knowledge or insights to a particular field or discipline. They do this by reporting the results of empirical studies, reviewing and synthesizing existing literature, proposing new theories, or providing new perspectives on a topic.
  • To inform or persuade: Research papers are written to inform or persuade the reader about a particular issue, topic, or phenomenon. They present evidence and arguments to support their claims and seek to persuade the reader of the validity of their findings or recommendations.
  • To advance the field: Research papers seek to advance the field or discipline by identifying gaps in knowledge, proposing new research questions or approaches, or challenging existing assumptions or paradigms. They aim to contribute to ongoing debates and discussions within a field and to stimulate further research and inquiry.
  • To demonstrate research skills: Research papers demonstrate the author’s research skills, including their ability to design and conduct a study, collect and analyze data, and interpret and communicate findings. They also demonstrate the author’s ability to critically evaluate existing literature, synthesize information from multiple sources, and write in a clear and structured manner.

Characteristics of Research Paper

Research papers have several characteristics that distinguish them from other forms of academic or professional writing. Here are some common characteristics of research papers:

  • Evidence-based: Research papers are based on empirical evidence, which is collected through rigorous research methods such as experiments, surveys, observations, or interviews. They rely on objective data and facts to support their claims and conclusions.
  • Structured and organized: Research papers have a clear and logical structure, with sections such as introduction, literature review, methods, results, discussion, and conclusion. They are organized in a way that helps the reader to follow the argument and understand the findings.
  • Formal and objective: Research papers are written in a formal and objective tone, with an emphasis on clarity, precision, and accuracy. They avoid subjective language or personal opinions and instead rely on objective data and analysis to support their arguments.
  • Citations and references: Research papers include citations and references to acknowledge the sources of information and ideas used in the paper. They follow a specific citation style, such as APA, MLA, or Chicago, to ensure consistency and accuracy (see the brief comparison after this list).
  • Peer-reviewed: Research papers are often peer-reviewed, which means they are evaluated by other experts in the field before they are published. Peer-review ensures that the research is of high quality, meets ethical standards, and contributes to the advancement of knowledge in the field.
  • Objective and unbiased: Beyond maintaining a formal tone, research papers strive to present findings impartially. They acknowledge limitations and avoid personal biases or preconceptions, letting the data and analysis, rather than the author's expectations, drive the conclusions.
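
As a quick illustration of how citation styles differ, here is one and the same journal article formatted three ways. The author, title, and journal are invented placeholders for the example, not sources cited in this article:

  • APA (7th ed.): Doe, J. (2023). Survey fatigue in longitudinal studies. Journal of Research Practice, 12(3), 45–60.
  • MLA (9th ed.): Doe, Jane. "Survey Fatigue in Longitudinal Studies." Journal of Research Practice, vol. 12, no. 3, 2023, pp. 45–60.
  • Chicago (author–date): Doe, Jane. 2023. "Survey Fatigue in Longitudinal Studies." Journal of Research Practice 12 (3): 45–60.

Whichever style a journal requires, the key is to apply it consistently to every entry.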

Advantages of Research Paper

Research papers have many advantages, both for the individual researcher and for the broader academic and professional community. Here are some advantages of research papers:

  • Contribution to knowledge: Research papers contribute to the body of knowledge in a particular field or discipline. They add new information, insights, and perspectives to existing literature and help advance the understanding of a particular phenomenon or issue.
  • Opportunity for intellectual growth: Research papers provide an opportunity for intellectual growth. Writing one requires critical thinking, problem-solving, and creativity, which help develop the researcher’s skills and knowledge.
  • Career advancement: Research papers can help advance the researcher’s career by demonstrating their expertise and contributions to the field. They can also lead to new research opportunities, collaborations, and funding.
  • Academic recognition: Research papers can lead to academic recognition in the form of awards, grants, or invitations to speak at conferences or events. They can also contribute to the researcher’s reputation and standing in the field.
  • Impact on policy and practice: Research papers can have a significant impact on policy and practice. They can inform policy decisions, guide practice, and lead to changes in laws, regulations, or procedures.
  • Advancement of society: Research papers can contribute to the advancement of society by addressing important issues, identifying solutions to problems, and promoting social justice and equality.

Limitations of Research Paper

Research papers also have some limitations that should be considered when interpreting their findings or implications. Here are some common limitations of research papers:

  • Limited generalizability: Research findings may not be generalizable to other populations, settings, or contexts. Studies often use specific samples or conditions that may not reflect the broader population or real-world situations.
  • Potential for bias: Research papers may be biased due to factors such as sample selection, measurement errors, or researcher biases. It is important to evaluate the quality of the research design and methods used to ensure that the findings are valid and reliable.
  • Ethical concerns: Research papers may raise ethical concerns, such as the use of vulnerable populations or invasive procedures. Researchers must adhere to ethical guidelines and obtain informed consent from participants to ensure that the research is conducted in a responsible and respectful manner.
  • Limitations of methodology: Research papers may be limited by the methodology used to collect and analyze data. For example, certain research methods may not capture the complexity or nuance of a particular phenomenon, or may not be appropriate for certain research questions.
  • Publication bias: Research papers may be subject to publication bias, where positive or significant findings are more likely to be published than negative or non-significant findings. This can skew the overall picture of a particular area of research (see the illustrative sketch after this list).
  • Time and resource constraints: Research papers may be limited by time and resource constraints, which can affect the quality and scope of the research. Researchers may not have access to certain data or resources, or may be unable to conduct long-term studies due to practical limitations.
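
To make the publication-bias point concrete, here is a minimal Python sketch. It is illustrative only: the true effect, noise level, and "significance" cutoff are invented for the example, and the cutoff is a crude stand-in for a p < 0.05 filter. It simulates many small studies of the same effect and compares the average of all results with the average of only the "publishable" ones:

    import random
    import statistics

    # Invented numbers for illustration: a modest true effect, measured
    # by many noisy, small-sample studies.
    random.seed(42)
    true_effect = 0.2
    estimates = [random.gauss(true_effect, 0.3) for _ in range(1000)]

    # Crude stand-in for "only significant results get published":
    # keep only estimates above an arbitrary cutoff.
    published = [e for e in estimates if e > 0.35]

    print(f"Mean of all {len(estimates)} studies: {statistics.mean(estimates):.2f}")
    print(f"Mean of {len(published)} 'published' studies: {statistics.mean(published):.2f}")

The second mean lands well above the true effect of 0.2: averaging only the filtered, "publishable" results overstates the effect, which is exactly the distortion publication bias introduces into a literature. Real publication decisions work through p-values and editorial judgment rather than a fixed cutoff, but the mechanism is the same.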

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

Computer Science > Human-Computer Interaction

Title: Advancing Explainable Autonomous Vehicle Systems: A Comprehensive Review and Research Roadmap

Abstract: Given the uncertainty surrounding how existing explainability methods for autonomous vehicles (AVs) meet the diverse needs of stakeholders, a thorough investigation is imperative to determine the contexts requiring explanations and suitable interaction strategies. A comprehensive review becomes crucial to assess the alignment of current approaches with the varied interests and expectations within the AV ecosystem. This study presents a review to discuss the complexities associated with explanation generation and presentation to facilitate the development of more effective and inclusive explainable AV systems. Our investigation led to categorising existing literature into three primary topics: explanatory tasks, explanatory information, and explanatory information communication. Drawing upon our insights, we have proposed a comprehensive roadmap for future research centred on (i) knowing the interlocutor, (ii) generating timely explanations, (iii) communicating human-friendly explanations, and (iv) continuous learning. Our roadmap is underpinned by principles of responsible research and innovation, emphasising the significance of diverse explanation requirements. To effectively tackle the challenges associated with implementing explainable AV systems, we have delineated various research directions, including the development of privacy-preserving data integration, ethical frameworks, real-time analytics, human-centric interaction design, and enhanced cross-disciplinary collaborations. By exploring these research directions, the study aims to guide the development and deployment of explainable AVs, informed by a holistic understanding of user needs, technological advancements, regulatory compliance, and ethical considerations, thereby ensuring safer and more trustworthy autonomous driving experiences.

ASA Connect


Health Survey Research Methods Conference call for papers

Hi all. Reposting this to ASA in case there is interest:

The 12th Health Survey Research Methods Conference (HSRMC) will continue the series that began 50 years ago, in 1975, to discuss innovative survey research methods that improve the quality of health survey data. The next conference will be held in Williamsburg, VA from March 4-7, 2025.

The HSRMC steering committee is seeking abstracts for papers to be presented at the 2025 conference, including: general overview papers that summarize and integrate current knowledge, papers that identify and address future research challenges, innovative theoretical essays, and other papers that describe new empirical research that advances the field of survey methods and their application to health-related issues.

Read more about the 2025 call, submit your abstracts and learn more about the history of the HSRMC at the link below.

Call for Papers | HSRM Conference

Hoping to see many AAPOR friends at HSRMC 2025!

HSRMC Steering Committee member

COMMENTS

  1. Research Methods

    Research methods are specific procedures for collecting and analyzing data. Developing your research methods is an integral part of your research design. When planning your methods, there are two key decisions you will make. First, decide how you will collect data. Your methods depend on what type of data you need to answer your research question:

  2. Research Methodology

    The research methodology is an important section of any research paper or thesis, as it describes the methods and procedures that will be used to conduct the research. It should include details about the research design, data collection methods, data analysis techniques, and any ethical considerations.

  3. Research Techniques

    Examples of quantitative research techniques are surveys, experiments, and statistical analysis. Qualitative research: This is a research method that focuses on collecting and analyzing non-numerical data, such as text, images, and videos, to gain insights into the subjective experiences and perspectives of the participants.

  4. How to Write Your Methods

    Your Methods Section contextualizes the results of your study, giving editors, reviewers and readers alike the information they need to understand and interpret your work. Your methods are key to establishing the credibility of your study, along with your data and the results themselves. A complete methods section should provide enough detail ...

  5. What is Research Methodology? Definition, Types, and Examples

    Definition, Types, and Examples. Research methodology is a structured and scientific approach used to collect, analyze, and interpret quantitative or qualitative data to answer research questions or test hypotheses. A research methodology is like a plan for carrying out research and helps keep researchers on track by limiting the scope of ...

  6. Organizing Your Social Sciences Research Paper

    The methods section describes actions taken to investigate a research problem and the rationale for the application of specific procedures or techniques used to identify, select, process, and analyze information applied to understanding the problem, thereby, allowing the reader to critically evaluate a study's overall validity and reliability.

  7. Writing the Research Paper

    Writing the Research Paper. Write a detailed outline: roughly the content of every paragraph, and the order of the various topics in your paper. On the basis of the outline, start writing each part by planning its content, and then write it down. Put a visible mark (which you will later delete) where you need to quote a source, and write in the ...

  8. A Practical Guide to Writing Quantitative and Qualitative Research

    INTRODUCTION. Scientific research is usually initiated by posing evidence-based research questions which are then explicitly restated as hypotheses. The hypotheses provide directions to guide the study, solutions, explanations, and expected results. Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the ...

  9. How to Write a Research Paper

    By refining your focus, you can produce a thoughtful and engaging paper that effectively communicates your ideas to your readers. 5. Write a thesis statement. A thesis statement is a one-to-two-sentence summary of your research paper's main argument or direction.

  10. Research Methods

    Quantitative research methods are used to collect and analyze numerical data. This type of research is useful when the objective is to test a hypothesis, determine cause-and-effect relationships, and measure the prevalence of certain phenomena. Quantitative research methods include surveys, experiments, and secondary data analysis.

  11. Organizing Academic Research Papers: 6. The Methodology

    One of the most common deficiencies found in research papers is that the proposed methodology is unsuited to achieving the stated objective of your paper. Describe the specific methods of data collection you are going to use, such as, surveys, interviews, questionnaires, observation, archival research. If you are analyzing existing data, such ...

  12. How to Write the Methods Section of a Research Paper

    The methods section is a fundamental section of any paper since it typically discusses the 'what', 'how', 'which', and 'why' of the study, which is necessary to arrive at the final conclusions. In a research article, the introduction, which serves to set the foundation for comprehending the background and results is usually ...

  13. A tutorial on methodological studies: the what, when, how and why

    Even though methodological studies can be conducted on qualitative or mixed methods research, this paper focuses on and draws examples exclusively from quantitative research. The objectives of this paper are to provide some insights on how to conduct methodological studies so that there is greater consistency between the research questions ...

  14. 15 Types of Research Methods (2024)

    These methods are useful when a detailed understanding of a phenomenon is sought. 1. Ethnographic Research. Ethnographic research emerged out of anthropological research, where anthropologists would enter into a setting for a sustained period of time, getting to know a cultural group and taking detailed observations.

  15. How to use and assess qualitative research methods

    Abstract. This paper aims to provide an overview of the use and assessment of qualitative research methods in the health sciences. Qualitative research can be defined as the study of the nature of phenomena and is especially appropriate for answering questions of why something is (not) observed, assessing complex multi-component interventions ...

  16. Quantitative Methods

    Quantitative methods emphasize objective measurements and the statistical, mathematical, or numerical analysis of data collected through polls, questionnaires, and surveys, or by manipulating pre-existing statistical data using computational techniques. Quantitative research focuses on gathering numerical data and generalizing it across groups of people or to explain a particular phenomenon.

  17. Papers on research methods: The hidden gems of the research literature

    A research methods paper that presents data analysis software is the contribution of Stiglic, Watson, and Cilar. The researchers present R, a package in the public domain, and provide the code in R for a confirmatory factor analysis. Our last special issue paper is not a research methods paper. Nor is it any of our traditional paper types.

  18. Methods in Ecology and Evolution

    1 INTRODUCTION. A major thrust of ecological research focuses on characterizing spatial and temporal variation in biologically relevant characteristics and understanding their causes and consequences (Scheiner & Willig, 2011). The need to do so is particularly critical in the Anthropocene, as the rate of temporal change in the environment is accelerating, with increasingly well documented ...

  19. Feature selection revisited in the single-cell era

    Recent advances in single-cell biotechnologies have resulted in high-dimensional datasets with increased complexity, making feature selection an essential technique for single-cell data analysis. Here, we revisit feature selection techniques and summarise recent developments. We review their application to a range of single-cell data types generated from traditional cytometry and imaging ...

  20. Recent advances in mathematical methods for finance

    The special issue contains 44 papers, which underwent a rigorous peer review process under the supervision of the Guest Editors. Coherently with the title of the special issue, in the selection of the submitted papers emphasis was placed on the originality and interest of the mathematical methods employed, alongside the relevance of their financial applications.

  21. How to Write an APA Methods Section

    Research papers in the social and natural sciences often follow APA style. This article focuses on reporting quantitative research methods. In your APA methods section, you should report enough information to understand and replicate your study, including detailed information on the sample, measures, and procedures used.

  22. Predicting and improving complex beer flavor through machine ...

    Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16 ...

  23. [2403.20329] ReALM: Reference Resolution As Language Modeling

    ReALM: Reference Resolution As Language Modeling. Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the ...

  24. Research Paper

    Definition: Research Paper is a written document that presents the author's original research, analysis, and interpretation of a specific topic or issue. It is typically based on empirical evidence, and may involve qualitative or quantitative research methods, or a combination of both. The purpose of a research paper is to contribute new ...

  25. The ratting of North America: A 350-year retrospective on

    Although generating multi-decadal and multi-city perspectives using methods presently available to the urban ecology research community could be prohibitive in terms of both time and cost (12, 25), the broad scope of archaeology, encompassing the full geography and timeframe of human-rat relationships, has potential to open new temporal windows ...

  26. Seattle's AI2 Incubator launches online forum for spotlighting research

    Seattle's AI2 Incubator launched a new online forum called Harmonious for discussing research papers and advances related to artificial intelligence, saying the aim was to cut through the glut ...

  27. [2404.00019] Advancing Explainable Autonomous Vehicle Systems: A

    View a PDF of the paper titled Advancing Explainable Autonomous Vehicle Systems: A Comprehensive Review and Research Roadmap, by Sule Tekkesinoglu and 1 other authors. Abstract: Given the uncertainty surrounding how existing explainability methods for autonomous vehicles (AVs) meet the diverse needs of stakeholders ...

  28. Health Survey Research Methods Conference call for papers

    Reposting this to ASA in case there is interest: The 12th Health Survey Research Methods Conference (HSRMC) will continue the series that began 50 years ago, in 1975, to discuss innovative survey research methods that improve the quality of health survey data. The next conference will be held in Williamsburg, VA from March 4-7, 2025.