University of Texas Libraries

Literature Reviews

Steps in the literature review process.

  • What is a literature review?
  • Define your research question
  • Determine inclusion and exclusion criteria
  • Choose databases and search
  • Review Results
  • Synthesize Results
  • Analyze Results
  • Librarian Support
  • You may need to do some exploratory searching of the literature to get a sense of scope and to determine whether you need to narrow or broaden your focus
  • Identify databases that provide the most relevant sources, and identify relevant terms (controlled vocabularies) to add to your search strategy
  • Finalize your research question
  • Think about relevant dates, geographies (and languages), methods, and conflicting points of view
  • Conduct searches in the published literature via the identified databases
  • Check to see if this topic has been covered in other disciplines' databases
  • Examine on-point articles for keywords, authors, and previous research (via their reference lists), and use cited-reference searching
  • Save your search results in a citation management tool (such as Zotero, Mendeley or EndNote)
  • De-duplicate your search results
  • Make sure that you've found the seminal pieces -- they have been cited many times, and their work is considered foundational 
  • Check with your professor or a librarian to make sure your search has been comprehensive
  • Evaluate the strengths and weaknesses of individual sources and evaluate for bias, methodologies, and thoroughness
  • Group your results into an organizational structure that will support why your research needs to be done, or that provides the answer to your research question
  • Develop your conclusions
  • Are there gaps in the literature?
  • Where has significant research taken place, and who has done it?
  • Is there consensus or debate on this topic?
  • Which methodological approaches work best?
  • For example: Background, Current Practices, Critics and Proponents, Where/How this study will fit in 
  • Organize your citations and focus on your research question and pertinent studies
  • Compile your bibliography

Note: The first four steps are the best points at which to contact a librarian. Your librarian can help you determine the best databases to use for your topic, assess scope, and formulate a search strategy.
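The de-duplication step above can be sketched in code. This is a minimal illustration, not part of the guide: it assumes search results have been exported as simple records with a DOI and a title (the field names and sample data here are invented), and it collapses duplicates by DOI first, then by normalized title.

```python
def normalize_title(title):
    """Lowercase and drop non-alphanumeric characters so near-identical titles match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def deduplicate(records):
    """Collapse duplicate search results, matching on DOI first, then normalized title."""
    seen = set()
    unique = []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        title_key = normalize_title(rec.get("title", ""))
        if (doi and doi in seen) or (title_key and title_key in seen):
            continue  # already have this record from another database
        seen.update(k for k in (doi, title_key) if k)
        unique.append(rec)
    return unique

results = [
    {"doi": "10.1/abc", "title": "Parental Leave and Fertility"},
    {"doi": "10.1/ABC", "title": "Parental leave and fertility"},  # same DOI, different case
    {"doi": "",         "title": "Parental Leave and Fertility."}, # same title, no DOI
    {"doi": "10.2/xyz", "title": "A Different Study"},
]
print(len(deduplicate(results)))  # prints 2
```

In practice, citation managers such as Zotero and EndNote offer built-in duplicate detection; a sketch like this is mainly useful for understanding what "de-duplicate" means when the same article surfaces from several databases.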

Video Tutorials about Literature Reviews

This 4.5-minute video from Academic Education Materials has a Creative Commons license and a British narrator.


  • Last Updated: Oct 26, 2022 2:49 PM
  • URL: https://guides.lib.utexas.edu/literaturereviews

Creative Commons License

  • UConn Library

Literature Review: The What, Why and How-to Guide — Introduction


What are Literature Reviews?

So, what is a literature review? "A literature review is an account of what has been published on a topic by accredited scholars and researchers. In writing the literature review, your purpose is to convey to your reader what knowledge and ideas have been established on a topic, and what their strengths and weaknesses are. As a piece of writing, the literature review must be defined by a guiding concept (e.g., your research objective, the problem or issue you are discussing, or your argumentative thesis). It is not just a descriptive list of the material available, or a set of summaries." Taylor, D. The literature review: A few tips on conducting it. University of Toronto Health Sciences Writing Centre.

Goals of Literature Reviews

What are the goals of creating a literature review? A literature review could be written to accomplish different aims:

  • To develop a theory or evaluate an existing theory
  • To summarize the historical or existing state of a research topic
  • To identify a problem in a field of research

Baumeister, R. F., & Leary, M. R. (1997). Writing narrative literature reviews. Review of General Psychology, 1(3), 311-320.

What kinds of work require a Literature Review?

  • A research paper assigned in a course
  • A thesis or dissertation
  • A grant proposal
  • An article intended for publication in a journal

All these instances require you to collect what has been written about your research topic so that you can demonstrate how your own research sheds new light on the topic.

Types of Literature Reviews

What kinds of literature reviews are written?

Narrative review: The purpose of this type of review is to describe the current state of the research on a specific topic or research question and to offer a critical analysis of the literature reviewed. Studies are grouped by research/theoretical categories, and themes and trends, strengths and weaknesses, and gaps are identified. The review ends with a conclusion section that summarizes the findings regarding the state of the research on the specific topic, identifies the gaps and, if applicable, explains how the author's research will address the gaps identified in the review and expand the knowledge on the topic reviewed.

  • Example : Predictors and Outcomes of U.S. Quality Maternity Leave: A Review and Conceptual Framework:  10.1177/08948453211037398  

Systematic review : "The authors of a systematic review use a specific procedure to search the research literature, select the studies to include in their review, and critically evaluate the studies they find." (p. 139). Nelson, L. K. (2013). Research in Communication Sciences and Disorders . Plural Publishing.

  • Example : The effect of leave policies on increasing fertility: a systematic review:  10.1057/s41599-022-01270-w

Meta-analysis : "Meta-analysis is a method of reviewing research findings in a quantitative fashion by transforming the data from individual studies into what is called an effect size and then pooling and analyzing this information. The basic goal in meta-analysis is to explain why different outcomes have occurred in different studies." (p. 197). Roberts, M. C., & Ilardi, S. S. (2003). Handbook of Research Methods in Clinical Psychology . Blackwell Publishing.

  • Example : Employment Instability and Fertility in Europe: A Meta-Analysis:  10.1215/00703370-9164737

Meta-synthesis : "Qualitative meta-synthesis is a type of qualitative study that uses as data the findings from other qualitative studies linked by the same or related topic." (p.312). Zimmer, L. (2006). Qualitative meta-synthesis: A question of dialoguing with texts .  Journal of Advanced Nursing , 53 (3), 311-318.

  • Example : Women’s perspectives on career successes and barriers: A qualitative meta-synthesis:  10.1177/05390184221113735

Literature Reviews in the Health Sciences

  • UConn Health subject guide on systematic reviews: an explanation of the different review types used in health sciences literature, as well as tools to help you find the right review type
  • Last Updated: Sep 21, 2022 2:16 PM
  • URL: https://guides.lib.uconn.edu/literaturereview

Creative Commons


  • CAREER FEATURE
  • 04 December 2020
  • Correction 09 December 2020

How to write a superb literature review

Andy Tay is a freelance writer based in Singapore.


Literature reviews are important resources for scientists. They provide historical context for a field while offering opinions on its future trajectory. Creating them can provide inspiration for one’s own research, as well as some practice in writing. But few scientists are trained in how to write a review — or in what constitutes an excellent one. Even picking the appropriate software to use can be an involved decision (see ‘Tools and techniques’). So Nature asked editors and working scientists with well-cited reviews for their tips.


doi: https://doi.org/10.1038/d41586-020-03422-x

Interviews have been edited for length and clarity.

Updates & Corrections

Correction 09 December 2020 : An earlier version of the tables in this article included some incorrect details about the programs Zotero, Endnote and Manubot. These have now been corrected.



Harvey Cushing/John Hay Whitney Medical Library


YSN Doctoral Programs: Steps in Conducting a Literature Review


What is a literature review?

A literature review is an integrated analysis -- not just a summary -- of scholarly writings and other relevant evidence related directly to your research question. That is, it represents a synthesis of the evidence that provides background information on your topic and shows an association between the evidence and your research question.

A literature review may be a stand-alone work or the introduction to a larger research paper, depending on the assignment. Rely heavily on the guidelines your instructor has given you.

Why is it important?

A literature review is important because it:

  • Explains the background of research on a topic.
  • Demonstrates why a topic is significant to a subject area.
  • Discovers relationships between research studies/ideas.
  • Identifies major themes, concepts, and researchers on a topic.
  • Identifies critical gaps and points of disagreement.
  • Discusses further research questions that logically come out of the previous studies.

APA7 Style resources


APA Style Blog - for those harder to find answers

1. Choose a topic. Define your research question.

Your literature review should be guided by your central research question.  The literature represents background and research developments related to a specific research question, interpreted and analyzed by you in a synthesized way.

  • Make sure your research question is not too broad or too narrow.  Is it manageable?
  • Begin writing down terms that are related to your question. These will be useful for searches later.
  • If you have the opportunity, discuss your topic with your professor and your classmates.

2. Decide on the scope of your review

How many studies do you need to look at? How comprehensive should it be? How many years should it cover? 

  • This may depend on your assignment.  How many sources does the assignment require?

3. Select the databases you will use to conduct your searches.

Make a list of the databases you will search. 

Where to find databases:

  • use the tabs on this guide
  • Find other databases in the Nursing Information Resources web page
  • More on the Medical Library web page
  • ... and more on the Yale University Library web page

4. Conduct your searches to find the evidence. Keep track of your searches.

  • Use the key words in your question, as well as synonyms for those words, as terms in your search. Use the database tutorials for help.
  • Save the searches in the databases. This saves time when you want to redo or modify the searches. It is also helpful to use them as a guide if the searches are not finding any useful results.
  • Review the abstracts of research studies carefully. This will save you time.
  • Use the bibliographies and references of research studies you find to locate others.
  • Check with your professor, or a subject expert in the field, if you are missing any key works in the field.
  • Ask your librarian for help at any time.
  • Use a citation manager, such as EndNote, as the repository for your citations. See the EndNote tutorials for help.
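The first bullet above (keywords plus synonyms as search terms) is often implemented as a boolean search string: synonyms for one concept are OR'd together, and the concept groups are AND'ed. Here is a small sketch of that idea; the concept groups and terms are invented for illustration, and real databases differ in their exact syntax.

```python
def build_search(concept_groups):
    """Combine synonym groups into a boolean search string.

    Terms within a group are OR'd (synonyms for one concept);
    groups are AND'ed (each concept must appear).
    Multi-word terms are quoted as phrases.
    """
    def quote(term):
        return f'"{term}"' if " " in term else term
    groups = ["(" + " OR ".join(quote(t) for t in terms) + ")"
              for terms in concept_groups]
    return " AND ".join(groups)

query = build_search([
    ["maternity leave", "parental leave", "family leave"],  # concept 1 and synonyms
    ["employment", "career", "labor force"],                # concept 2 and synonyms
])
print(query)
# ("maternity leave" OR "parental leave" OR "family leave") AND (employment OR career OR "labor force")
```

Most databases also support controlled vocabulary (e.g., MeSH terms) and field tags, so a string like this is a starting point to adapt, not something to paste verbatim everywhere.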

Review the literature

Some questions to help you analyze the research:

  • What was the research question of the study you are reviewing? What were the authors trying to discover?
  • Was the research funded by a source that could influence the findings?
  • What were the research methodologies? Analyze its literature review, the samples and variables used, the results, and the conclusions.
  • Does the research seem to be complete? Could it have been conducted more soundly? What further questions does it raise?
  • If there are conflicting studies, why do you think that is?
  • How are the authors viewed in the field? Has this study been cited? If so, how has it been analyzed?

Tips: 

  • Review the abstracts carefully.  
  • Keep careful notes so that you may track your thought processes during the research process.
  • Create a matrix of the studies for easy analysis, and synthesis, across all of the studies.
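The matrix mentioned in the last tip can be as simple as a table with one row per study and one column per attribute you want to compare across studies. A minimal sketch follows; the studies, column names, and values are invented for illustration, and a spreadsheet works just as well.

```python
import csv
import io

# One row per study, one column per attribute to compare across studies.
columns = ["study", "year", "method", "sample", "key_finding"]
matrix = [
    {"study": "Smith", "year": "2019", "method": "survey",
     "sample": "n=240", "key_finding": "positive association"},
    {"study": "Lee", "year": "2021", "method": "interviews",
     "sample": "n=18", "key_finding": "mixed results"},
]

# Write the matrix as CSV so it can be opened in any spreadsheet tool.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns)
writer.writeheader()
writer.writerows(matrix)
print(buf.getvalue())
```

Reading down a column (e.g., "method" or "key_finding") is what makes synthesis across studies easy: patterns, conflicts, and gaps become visible at a glance.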
  • Last Updated: Jan 4, 2024 10:52 AM
  • URL: https://guides.library.yale.edu/YSNDoctoral
  • USC Libraries
  • Research Guides

Organizing Your Social Sciences Research Paper

  • 5. The Literature Review

A literature review surveys prior research published in books, scholarly articles, and any other sources relevant to a particular issue, area of research, or theory, and by so doing, provides a description, summary, and critical evaluation of these works in relation to the research problem being investigated. Literature reviews are designed to provide an overview of sources you have used in researching a particular topic and to demonstrate to your readers how your research fits within existing scholarship about the topic.

Fink, Arlene. Conducting Research Literature Reviews: From the Internet to Paper. Fourth edition. Thousand Oaks, CA: SAGE, 2014.

Importance of a Good Literature Review

A literature review may consist of simply a summary of key sources, but in the social sciences, a literature review usually has an organizational pattern and combines both summary and synthesis, often within specific conceptual categories . A summary is a recap of the important information of the source, but a synthesis is a re-organization, or a reshuffling, of that information in a way that informs how you are planning to investigate a research problem. The analytical features of a literature review might:

  • Give a new interpretation of old material or combine new with old interpretations,
  • Trace the intellectual progression of the field, including major debates,
  • Depending on the situation, evaluate the sources and advise the reader on the most pertinent or relevant research, or
  • Usually in the conclusion of a literature review, identify where gaps exist in how a problem has been researched to date.

Given this, the purpose of a literature review is to:

  • Place each work in the context of its contribution to understanding the research problem being studied.
  • Describe the relationship of each work to the others under consideration.
  • Identify new ways to interpret prior research.
  • Reveal any gaps that exist in the literature.
  • Resolve conflicts amongst seemingly contradictory previous studies.
  • Identify areas of prior scholarship to prevent duplication of effort.
  • Point the way in fulfilling a need for additional research.
  • Locate your own research within the context of existing literature [very important].

Fink, Arlene. Conducting Research Literature Reviews: From the Internet to Paper. 2nd ed. Thousand Oaks, CA: Sage, 2005; Hart, Chris. Doing a Literature Review: Releasing the Social Science Research Imagination . Thousand Oaks, CA: Sage Publications, 1998; Jesson, Jill. Doing Your Literature Review: Traditional and Systematic Techniques . Los Angeles, CA: SAGE, 2011; Knopf, Jeffrey W. "Doing a Literature Review." PS: Political Science and Politics 39 (January 2006): 127-132; Ridley, Diana. The Literature Review: A Step-by-Step Guide for Students . 2nd ed. Los Angeles, CA: SAGE, 2012.

Types of Literature Reviews

It is important to think of knowledge in a given field as consisting of three layers. First, there are the primary studies that researchers conduct and publish. Second are the reviews of those studies that summarize and offer new interpretations built from and often extending beyond the primary studies. Third, there are the perceptions, conclusions, opinion, and interpretations that are shared informally among scholars that become part of the body of epistemological traditions within the field.

In composing a literature review, it is important to note that it is often this third layer of knowledge that is cited as "true" even though it often has only a loose relationship to the primary studies and secondary literature reviews. Given this, while literature reviews are designed to provide an overview and synthesis of pertinent sources you have explored, there are a number of approaches you could adopt depending upon the type of analysis underpinning your study.

Argumentative Review This form examines literature selectively in order to support or refute an argument, deeply embedded assumption, or philosophical problem already established in the literature. The purpose is to develop a body of literature that establishes a contrarian viewpoint. Given the value-laden nature of some social science research [e.g., educational reform; immigration control], argumentative approaches to analyzing the literature can be a legitimate and important form of discourse. However, note that they can also introduce problems of bias when they are used to make summary claims of the sort found in systematic reviews [see below].

Integrative Review Considered a form of research that reviews, critiques, and synthesizes representative literature on a topic in an integrated way such that new frameworks and perspectives on the topic are generated. The body of literature includes all studies that address related or identical hypotheses or research problems. A well-done integrative review meets the same standards as primary research in regard to clarity, rigor, and replication. This is the most common form of review in the social sciences.

Historical Review Few things rest in isolation from historical precedent. Historical literature reviews focus on examining research throughout a period of time, often starting with the first time an issue, concept, theory, or phenomenon emerged in the literature, then tracing its evolution within the scholarship of a discipline. The purpose is to place research in a historical context to show familiarity with state-of-the-art developments and to identify the likely directions for future research.

Methodological Review A review does not always focus on what someone said [findings], but on how they came to say it [method of analysis]. Reviewing methods of analysis provides a framework of understanding at different levels [i.e., those of theory, substantive fields, research approaches, and data collection and analysis techniques], showing how researchers draw upon a wide variety of knowledge, ranging from the conceptual level to practical documents used in fieldwork, in areas such as ontological and epistemological considerations, quantitative and qualitative integration, sampling, interviewing, data collection, and data analysis. This approach helps highlight ethical issues which you should be aware of and consider as you go through your own study.

Systematic Review This form consists of an overview of existing evidence pertinent to a clearly formulated research question, which uses pre-specified and standardized methods to identify and critically appraise relevant research, and to collect, report, and analyze data from the studies that are included in the review. The goal is to deliberately document, critically evaluate, and summarize scientifically all of the research about a clearly defined research problem . Typically it focuses on a very specific empirical question, often posed in a cause-and-effect form, such as "To what extent does A contribute to B?" This type of literature review is primarily applied to examining prior research studies in clinical medicine and allied health fields, but it is increasingly being used in the social sciences.

Theoretical Review The purpose of this form is to examine the corpus of theory that has accumulated in regard to an issue, concept, theory, or phenomenon. The theoretical literature review helps to establish what theories already exist, the relationships between them, and to what degree the existing theories have been investigated, and to develop new hypotheses to be tested. Often this form is used to help establish a lack of appropriate theories or reveal that current theories are inadequate for explaining new or emerging research problems. The unit of analysis can focus on a theoretical concept or a whole theory or framework.

NOTE: Most often the literature review will incorporate some combination of types. For example, a review that examines literature supporting or refuting an argument, assumption, or philosophical problem related to the research problem will also need to include writing supported by sources that establish the history of these arguments in the literature.

Baumeister, Roy F. and Mark R. Leary. "Writing Narrative Literature Reviews." Review of General Psychology 1 (September 1997): 311-320; Fink, Arlene. Conducting Research Literature Reviews: From the Internet to Paper. 2nd ed. Thousand Oaks, CA: Sage, 2005; Hart, Chris. Doing a Literature Review: Releasing the Social Science Research Imagination. Thousand Oaks, CA: Sage Publications, 1998; Kennedy, Mary M. "Defining a Literature." Educational Researcher 36 (April 2007): 139-147; Petticrew, Mark and Helen Roberts. Systematic Reviews in the Social Sciences: A Practical Guide. Malden, MA: Blackwell Publishers, 2006; Torraco, Richard. "Writing Integrative Literature Reviews: Guidelines and Examples." Human Resource Development Review 4 (September 2005): 356-367; Rocco, Tonette S. and Maria S. Plakhotnik. "Literature Reviews, Conceptual Frameworks, and Theoretical Frameworks: Terms, Functions, and Distinctions." Human Resource Development Review 8 (March 2008): 120-130; Sutton, Anthea. Systematic Approaches to a Successful Literature Review. Los Angeles, CA: Sage Publications, 2016.

Structure and Writing Style

I.  Thinking About Your Literature Review

The structure of a literature review should include the following in support of understanding the research problem :

  • An overview of the subject, issue, or theory under consideration, along with the objectives of the literature review,
  • Division of works under review into themes or categories [e.g. works that support a particular position, those against, and those offering alternative approaches entirely],
  • An explanation of how each work is similar to and how it varies from the others,
  • Conclusions as to which pieces are best considered in their argument, are most convincing of their opinions, and make the greatest contribution to the understanding and development of their area of research.

The critical evaluation of each work should consider :

  • Provenance -- what are the author's credentials? Are the author's arguments supported by evidence [e.g. primary historical material, case studies, narratives, statistics, recent scientific findings]?
  • Methodology -- were the techniques used to identify, gather, and analyze the data appropriate to addressing the research problem? Was the sample size appropriate? Were the results effectively interpreted and reported?
  • Objectivity -- is the author's perspective even-handed or prejudicial? Is contrary data considered or is certain pertinent information ignored to prove the author's point?
  • Persuasiveness -- which of the author's theses are most convincing or least convincing?
  • Validity -- are the author's arguments and conclusions convincing? Does the work ultimately contribute in any significant way to an understanding of the subject?

II.  Development of the Literature Review

Four Basic Stages of Writing

1. Problem formulation -- which topic or field is being examined and what are its component issues?
2. Literature search -- finding materials relevant to the subject being explored.
3. Data evaluation -- determining which literature makes a significant contribution to the understanding of the topic.
4. Analysis and interpretation -- discussing the findings and conclusions of pertinent literature.

Consider the following issues before writing the literature review:

Clarify. If your assignment is not specific about what form your literature review should take, seek clarification from your professor by asking these questions:

1. Roughly how many sources would be appropriate to include?
2. What types of sources should I review (books, journal articles, websites; scholarly versus popular sources)?
3. Should I summarize, synthesize, or critique sources by discussing a common theme or issue?
4. Should I evaluate the sources in any way beyond evaluating how they relate to understanding the research problem?
5. Should I provide subheadings and other background information, such as definitions and/or a history?

Find Models. Use the exercise of reviewing the literature to examine how authors in your discipline or area of interest have composed their literature review sections. Read them to get a sense of the types of themes you might want to look for in your own research or to identify ways to organize your final review. The bibliography or reference section of sources you've already read, such as required readings in the course syllabus, are also excellent entry points into your own research.

Narrow the Topic. The narrower your topic, the easier it will be to limit the number of sources you need to read in order to obtain a good survey of relevant resources. Your professor will probably not expect you to read everything that's available about the topic, but you'll make the act of reviewing easier if you first limit the scope of the research problem. A good strategy is to begin by searching the USC Libraries Catalog for recent books about the topic and reviewing the tables of contents for chapters that focus on specific issues. You can also review the indexes of books to find references to specific issues that can serve as the focus of your research. For example, a book surveying the history of the Israeli-Palestinian conflict may include a chapter on the role Egypt has played in mediating the conflict, or you can look in the index for the pages where Egypt is mentioned in the text.

Consider Whether Your Sources are Current. Some disciplines require that you use information that is as current as possible. This is particularly true in medicine and the sciences, where research becomes obsolete very quickly as new discoveries are made. However, when writing a review in the social sciences, a survey of the history of the literature may be required. In other words, a complete understanding of the research problem requires you to deliberately examine how knowledge and perspectives have changed over time. Sort through other current bibliographies or literature reviews in the field to get a sense of what your discipline expects. You can also use this method to explore what is considered by scholars to be a "hot topic" and what is not.

III.  Ways to Organize Your Literature Review

Chronology of Events
If your review follows the chronological method, you could write about the materials according to when they were published. This approach should only be followed if a clear path of research building on previous research can be identified and these trends follow a clear chronological order of development. An example would be a literature review tracing continuing research on the emergence of German economic power after the fall of the Soviet Union.

By Publication
Order your sources by publication chronology only if the order demonstrates a more important trend. For instance, you could order a review of literature on environmental studies of brownfields this way if the progression revealed, for example, a change in the soil collection practices of the researchers who wrote and/or conducted the studies.

Thematic ["conceptual categories"]
A thematic literature review is the most common approach to summarizing prior research in the social and behavioral sciences. Thematic reviews are organized around a topic or issue rather than the progression of time, although the progression of time may still be incorporated into a thematic review. For example, a review of the Internet's impact on American presidential politics could focus on the development of online political satire. While the study focuses on one topic, the Internet's impact on American presidential politics, it could still be organized chronologically, reflecting technological developments in media. The difference between a "chronological" and a "thematic" approach lies in what is emphasized most: here, themes related to the role of the Internet in presidential politics. Note that more authentic thematic reviews tend to break away from chronological order; a review organized in this manner would shift between time periods within each section according to the point being made.

Methodological
A methodological approach focuses on the methods utilized by the researcher.
For the Internet in American presidential politics project, one methodological approach would be to look at cultural differences between the portrayal of American presidents on American, British, and French websites. Or the review might focus on the fundraising impact of the Internet on a particular political party. A methodological scope will influence either the types of documents in the review or the way in which these documents are discussed.

Other Sections of Your Literature Review
Once you've decided on the organizational method for your literature review, the sections you need to include in the paper should be easy to identify because they arise from your organizational strategy. In other words, a chronological review would have subsections for each vital time period, while a thematic review would have subtopics based upon factors that relate to the theme or issue. Sometimes, however, you may need to add sections that are necessary for your study but do not fit within the organizational strategy of the body. What other sections you include in the body is up to you, but include only what is necessary for the reader to locate your study within the larger scholarship about the research problem.

Here are examples of other sections, usually in the form of a single paragraph, you may need to include depending on the type of review you write:

  • Current Situation : Information necessary to understand the current topic or focus of the literature review.
  • Sources Used : Describes the methods and resources [e.g., databases] you used to identify the literature you reviewed.
  • History : The chronological progression of the field, the research literature, or an idea that is necessary to understand the literature review, if the body of the literature review is not already a chronology.
  • Selection Methods : Criteria you used to select (and perhaps exclude) sources in your literature review. For instance, you might explain that your review includes only peer-reviewed [i.e., scholarly] sources.
  • Standards : Description of the way in which you present your information.
  • Questions for Further Research : What questions about the field has the review sparked? How will you further your research as a result of the review?

IV.  Writing Your Literature Review

Once you've settled on how to organize your literature review, you're ready to write each section. When writing your review, keep in mind these issues.

Use Evidence
A literature review section is, in this sense, just like any other academic research paper. Your interpretation of the available sources must be backed up with evidence [citations] that demonstrates that what you are saying is valid.

Be Selective
Select only the most important points in each source to highlight in the review. The type of information you choose to mention should relate directly to the research problem, whether it is thematic, methodological, or chronological. Related items that provide additional information, but that are not key to understanding the research problem, can be included in a list of further readings.

Use Quotes Sparingly
Some short quotes are appropriate if you want to emphasize a point, or if what an author stated cannot be easily paraphrased. Sometimes you may need to quote terminology that was coined by the author, is not common knowledge, or is taken directly from the study. Do not use extensive quotes as a substitute for using your own words in reviewing the literature.

Summarize and Synthesize
Remember to summarize and synthesize your sources within each thematic paragraph as well as throughout the review. Recapitulate important features of a research study, but then synthesize it by rephrasing the study's significance and relating it to your own work and the work of others.

Keep Your Own Voice
While the literature review presents others' ideas, your voice [the writer's] should remain front and center. For example, weave references to other sources into what you are writing, but maintain your own voice by starting and ending each paragraph with your own ideas and wording.

Use Caution When Paraphrasing
When paraphrasing another author's work, be sure to represent the author's information or opinions accurately and in your own words. Even when paraphrasing an author's work, you still must provide a citation to that work.

V.  Common Mistakes to Avoid

These are the most common mistakes made in reviewing social science research literature.

  • Sources in your literature review do not clearly relate to the research problem;
  • You do not take sufficient time to define and identify the most relevant sources to use in the literature review related to the research problem;
  • You rely exclusively on secondary analytical sources rather than including relevant primary research studies or data;
  • You uncritically accept another researcher's findings and interpretations as valid, rather than critically examining all aspects of the research design and analysis;
  • You do not describe the search procedures that were used in identifying the literature to review;
  • You report isolated statistical results rather than synthesizing them with chi-squared or meta-analytic methods; and,
  • You only include research that validates your assumptions and do not consider contrary findings and alternative interpretations found in the literature.

Cook, Kathleen E. and Elise Murowchick. “Do Literature Review Skills Transfer from One Course to Another?” Psychology Learning and Teaching 13 (March 2014): 3-11; Fink, Arlene. Conducting Research Literature Reviews: From the Internet to Paper. 2nd ed. Thousand Oaks, CA: Sage, 2005; Hart, Chris. Doing a Literature Review: Releasing the Social Science Research Imagination. Thousand Oaks, CA: Sage Publications, 1998; Jesson, Jill. Doing Your Literature Review: Traditional and Systematic Techniques. London: SAGE, 2011; Literature Review Handout. Online Writing Center. Liberty University; Literature Reviews. The Writing Center. University of North Carolina; Onwuegbuzie, Anthony J. and Rebecca Frels. Seven Steps to a Comprehensive Literature Review: A Multimodal and Cultural Approach. Los Angeles, CA: SAGE, 2016; Ridley, Diana. The Literature Review: A Step-by-Step Guide for Students. 2nd ed. Los Angeles, CA: SAGE, 2012; Randolph, Justus J. “A Guide to Writing the Dissertation Literature Review.” Practical Assessment, Research, and Evaluation, vol. 14, June 2009; Sutton, Anthea. Systematic Approaches to a Successful Literature Review. Los Angeles, CA: Sage Publications, 2016; Taylor, Dena. The Literature Review: A Few Tips On Conducting It. University College Writing Centre. University of Toronto; Writing a Literature Review. Academic Skills Centre. University of Canberra.

Writing Tip

Break Out of Your Disciplinary Box!

Thinking interdisciplinarily about a research problem can be a rewarding exercise in applying new ideas, theories, or concepts to an old problem. For example, what might cultural anthropologists say about the continuing conflict in the Middle East? In what ways might geographers view the need for better distribution of social service agencies in large cities differently than social workers might? You don’t want to substitute a thorough review of core research literature in your discipline for studies conducted in other fields of study. However, particularly in the social sciences, thinking about research problems from multiple vectors is a key strategy for finding new solutions to a problem or gaining a new perspective. Consult with a librarian about identifying research databases in other disciplines; almost every field of study has at least one comprehensive database devoted to indexing its research literature.

Frodeman, Robert. The Oxford Handbook of Interdisciplinarity. New York: Oxford University Press, 2010.

Another Writing Tip

Don't Just Review for Content!

While conducting a review of the literature, maximize the time you devote to writing this part of your paper by thinking broadly about what you should be looking for and evaluating. Review not just what scholars are saying, but how they are saying it. Some questions to ask:

  • How are they organizing their ideas?
  • What methods have they used to study the problem?
  • What theories have been used to explain, predict, or understand their research problem?
  • What sources have they cited to support their conclusions?
  • How have they used non-textual elements [e.g., charts, graphs, figures, etc.] to illustrate key points?

When you begin to write your literature review section, you'll be glad you dug deeper into how the research was designed and constructed because it establishes a means for developing more substantial analysis and interpretation of the research problem.

Hart, Chris. Doing a Literature Review: Releasing the Social Science Research Imagination. Thousand Oaks, CA: Sage Publications, 1998.

Yet Another Writing Tip

When Do I Know I Can Stop Looking and Move On?

Here are several strategies you can utilize to assess whether you've thoroughly reviewed the literature:

  • Look for repeating patterns in the research findings. If the same thing is being said, just by different people, then this likely demonstrates that the research problem has hit a conceptual dead end. At this point consider: Does your study extend current research? Does it forge a new path? Or does it merely add more of the same thing being said?
  • Look at the sources the authors cite in their work. If you begin to see the same researchers cited again and again, then this is often an indication that no new ideas have been generated to address the research problem.
  • Search Google Scholar to identify who has subsequently cited leading scholars already identified in your literature review. This is called citation tracking, and there are a number of sources that can help you identify who has cited whom, particularly scholars from outside of your discipline. Here again, if the same authors are being cited again and again, this may indicate no new literature has been written on the topic.
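The "same authors again and again" heuristic in the list above can be made concrete with a quick tally. The sketch below is plain Python; the author lists are hypothetical placeholders, not data from any real database. It counts how often each author appears across the reference lists you have collected, a rough saturation signal when no new names keep surfacing.

```python
# Sketch: a rough "saturation" check on collected references.
# The author lists below are hypothetical placeholders, not real data.
from collections import Counter

# Authors cited by each paper you have read so far.
references_per_paper = [
    ["Smith", "Jones", "Garcia"],
    ["Smith", "Jones"],
    ["Smith", "Lee", "Jones"],
    ["Jones", "Smith"],
]

# Tally every author mention across all reference lists.
counts = Counter(a for refs in references_per_paper for a in refs)

# If a handful of names dominates and recent papers add no new ones,
# the core literature on the topic has probably been covered.
for author, n in counts.most_common():
    print(f"{author}: cited in {n} of {len(references_per_paper)} papers")
```

In practice you would extract the author names from your citation manager's export rather than typing them by hand, but the idea is the same: a flattening frequency distribution suggests you can stop searching.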

Onwuegbuzie, Anthony J. and Rebecca Frels. Seven Steps to a Comprehensive Literature Review: A Multimodal and Cultural Approach. Los Angeles, CA: SAGE, 2016; Sutton, Anthea. Systematic Approaches to a Successful Literature Review. Los Angeles, CA: Sage Publications, 2016.

  • Last Updated: May 30, 2024 9:38 AM
  • URL: https://libguides.usc.edu/writingguide

Libraries | Research Guides

Literature Reviews: What is a Literature Review? Learning More About How to Do a Literature Review

  • Planning the Review
  • The Research Question
  • Choosing Where to Search
  • Organizing the Review
  • Writing the Review

A literature review is a survey and synthesis of existing research on a topic or research question. It is meant to analyze the scholarly literature, make connections across writings, and identify strengths, weaknesses, trends, and missing conversations. A literature review should address different aspects of a topic as it relates to your research question, and it goes beyond a description or summary of the literature you have read.

  • Sage Research Methods Core Collection: SAGE Research Methods supports research at all levels by providing material to guide users through every step of the research process. SAGE Research Methods is the ultimate methods library with more than 1000 books, reference works, journal articles, and instructional videos by world-leading academics from across the social sciences, including the largest collection of qualitative methods books available online from any scholarly publisher. – Publisher


  • Last Updated: May 2, 2024 10:39 AM
  • URL: https://libguides.northwestern.edu/literaturereviews



What is a Literature Review? How to Write It (with Examples)


A literature review is a critical analysis and synthesis of existing research on a particular topic. It provides an overview of the current state of knowledge, identifies gaps, and highlights key findings in the literature. 1 The purpose of a literature review is to situate your own research within the context of existing scholarship, demonstrating your understanding of the topic and showing how your work contributes to the ongoing conversation in the field. Learning how to write a literature review is a critical skill for successful research: your ability to summarize and synthesize prior research on a topic demonstrates your grasp of the subject and assists in the learning process.

Table of Contents

  • What is the purpose of a literature review? 
  • a. Habitat Loss and Species Extinction: 
  • b. Range Shifts and Phenological Changes: 
  • c. Ocean Acidification and Coral Reefs: 
  • d. Adaptive Strategies and Conservation Efforts: 

How to write a good literature review 

  • Choose a Topic and Define the Research Question: 
  • Decide on the Scope of Your Review: 
  • Select Databases for Searches: 
  • Conduct Searches and Keep Track: 
  • Review the Literature: 
  • Organize and Write Your Literature Review: 
  • How to write a literature review faster with Paperpal? 
  • Frequently asked questions 

What is a literature review?

A well-conducted literature review demonstrates the researcher’s familiarity with the existing literature, establishes the context for their own research, and contributes to scholarly conversations on the topic. One of the purposes of a literature review is also to help researchers avoid duplicating previous work and ensure that their research is informed by and builds upon the existing body of knowledge.


What is the purpose of a literature review?

A literature review serves several important purposes within academic and research contexts. Here are some key objectives and functions of a literature review: 2  

1. Contextualizing the Research Problem: The literature review provides a background and context for the research problem under investigation. It helps to situate the study within the existing body of knowledge. 

2. Identifying Gaps in Knowledge: By identifying gaps, contradictions, or areas requiring further research, the researcher can shape the research question and justify the significance of the study. This is crucial for ensuring that the new research contributes something novel to the field. 


3. Understanding Theoretical and Conceptual Frameworks: Literature reviews help researchers gain an understanding of the theoretical and conceptual frameworks used in previous studies. This aids in the development of a theoretical framework for the current research. 

4. Providing Methodological Insights: Another purpose of literature reviews is that they allow researchers to learn about the methodologies employed in previous studies. This can help in choosing appropriate research methods for the current study and avoiding pitfalls that others may have encountered. 

5. Establishing Credibility: A well-conducted literature review demonstrates the researcher’s familiarity with existing scholarship, establishing their credibility and expertise in the field. It also helps in building a solid foundation for the new research. 

6. Informing Hypotheses or Research Questions: The literature review guides the formulation of hypotheses or research questions by highlighting relevant findings and areas of uncertainty in existing literature. 

Literature review example

Let’s delve deeper with an example. Say your literature review is about the impact of climate change on biodiversity. You might format your literature review into sections such as the effects of climate change on habitat loss and species extinction, phenological changes, and marine biodiversity. Each section would then summarize and analyze relevant studies in those areas, highlighting key findings and identifying gaps in the research. The review would conclude by emphasizing the need for further research on specific aspects of the relationship between climate change and biodiversity. The following literature review template provides a glimpse into the recommended literature review structure and content, demonstrating how research findings are organized around specific themes within a broader topic. 

Literature Review on Climate Change Impacts on Biodiversity:

Climate change is a global phenomenon with far-reaching consequences, including significant impacts on biodiversity. This literature review synthesizes key findings from various studies: 

a. Habitat Loss and Species Extinction:

Climate change-induced alterations in temperature and precipitation patterns contribute to habitat loss, affecting numerous species (Thomas et al., 2004). The review discusses how these changes increase the risk of extinction, particularly for species with specific habitat requirements. 

b. Range Shifts and Phenological Changes:

Observations of range shifts and changes in the timing of biological events (phenology) are documented in response to changing climatic conditions (Parmesan & Yohe, 2003). These shifts affect ecosystems and may lead to mismatches between species and their resources. 

c. Ocean Acidification and Coral Reefs:

The review explores the impact of climate change on marine biodiversity, emphasizing ocean acidification’s threat to coral reefs (Hoegh-Guldberg et al., 2007). Changes in pH levels negatively affect coral calcification, disrupting the delicate balance of marine ecosystems. 

d. Adaptive Strategies and Conservation Efforts:

Recognizing the urgency of the situation, the literature review discusses various adaptive strategies adopted by species and conservation efforts aimed at mitigating the impacts of climate change on biodiversity (Hannah et al., 2007). It emphasizes the importance of interdisciplinary approaches for effective conservation planning. 



How to write a good literature review

Writing a literature review involves summarizing and synthesizing existing research on a particular topic. A good literature review format should include the following elements. 

Introduction: The introduction sets the stage for your literature review, providing context and introducing the main focus of your review. 

  • Opening Statement: Begin with a general statement about the broader topic and its significance in the field. 
  • Scope and Purpose: Clearly define the scope of your literature review. Explain the specific research question or objective you aim to address. 
  • Organizational Framework: Briefly outline the structure of your literature review, indicating how you will categorize and discuss the existing research. 
  • Significance of the Study: Highlight why your literature review is important and how it contributes to the understanding of the chosen topic. 
  • Thesis Statement: Conclude the introduction with a concise thesis statement that outlines the main argument or perspective you will develop in the body of the literature review. 

Body: The body of the literature review is where you provide a comprehensive analysis of existing literature, grouping studies based on themes, methodologies, or other relevant criteria. 

  • Organize by Theme or Concept: Group studies that share common themes, concepts, or methodologies. Discuss each theme or concept in detail, summarizing key findings and identifying gaps or areas of disagreement. 
  • Critical Analysis: Evaluate the strengths and weaknesses of each study. Discuss the methodologies used, the quality of evidence, and the overall contribution of each work to the understanding of the topic. 
  • Synthesis of Findings: Synthesize the information from different studies to highlight trends, patterns, or areas of consensus in the literature. 
  • Identification of Gaps: Discuss any gaps or limitations in the existing research and explain how your review contributes to filling these gaps. 
  • Transition between Sections: Provide smooth transitions between different themes or concepts to maintain the flow of your literature review. 


Conclusion: The conclusion of your literature review should summarize the main findings, highlight the contributions of the review, and suggest avenues for future research. 

  • Summary of Key Findings: Recap the main findings from the literature and restate how they contribute to your research question or objective. 
  • Contributions to the Field: Discuss the overall contribution of your literature review to the existing knowledge in the field. 
  • Implications and Applications: Explore the practical implications of the findings and suggest how they might impact future research or practice. 
  • Recommendations for Future Research: Identify areas that require further investigation and propose potential directions for future research in the field. 
  • Final Thoughts: Conclude with a final reflection on the importance of your literature review and its relevance to the broader academic community. 


Conducting a literature review

Conducting a literature review is an essential step in research that involves reviewing and analyzing existing literature on a specific topic. It’s important to know how to do a literature review effectively, so here are the steps to follow: 1  

Choose a Topic and Define the Research Question:

  • Select a topic that is relevant to your field of study. 
  • Clearly define your research question or objective, and determine what specific aspect of the topic you want to explore. 

Decide on the Scope of Your Review:

  • Determine the timeframe for your literature review. Are you focusing on recent developments, or do you want a historical overview? 
  • Consider the geographical scope. Is your review global, or are you focusing on a specific region? 
  • Define the inclusion and exclusion criteria. What types of sources will you include? Are there specific types of studies or publications you will exclude? 

Select Databases for Searches:

  • Identify relevant databases for your field. Examples include PubMed, IEEE Xplore, Scopus, Web of Science, and Google Scholar. 
  • Consider searching in library catalogs, institutional repositories, and specialized databases related to your topic. 

Conduct Searches and Keep Track:

  • Develop a systematic search strategy using keywords, Boolean operators (AND, OR, NOT), and other search techniques. 
  • Record and document your search strategy for transparency and replicability. 
  • Keep track of the articles, including publication details, abstracts, and links. Use citation management tools like EndNote, Zotero, or Mendeley to organize your references. 
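The tracking and de-duplication steps above can be sketched in a few lines of code. The example below is plain Python with hypothetical record fields (`title`, `doi`); real database exports (RIS, BibTeX) carry many more fields, and citation managers like Zotero perform this matching for you. Records are keyed on DOI when one is present, falling back to a normalized title otherwise.

```python
# Sketch: de-duplicating search results pulled from two databases.
# The records below are hypothetical placeholders, not real exports.
import re

def normalize_title(title):
    """Lowercase and strip punctuation/whitespace so near-identical
    titles from different databases compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def dedupe(records):
    """Keep the first record seen for each DOI (or, lacking a DOI,
    for each normalized title)."""
    seen, unique = set(), []
    for rec in records:
        key = rec.get("doi") or normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

scopus = [
    {"title": "Climate Change and Biodiversity", "doi": "10.1000/xyz1"},
    {"title": "Range Shifts in Alpine Species", "doi": "10.1000/xyz2"},
]
web_of_science = [
    {"title": "Climate change and biodiversity.", "doi": "10.1000/xyz1"},
    {"title": "Ocean Acidification and Coral Reefs", "doi": None},
]

merged = dedupe(scopus + web_of_science)
print(len(merged))  # prints 3: the shared DOI is kept only once
```

Matching on DOI first and title second mirrors what citation managers do; title-only matching is fuzzier, which is why the normalization step matters.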

Review the Literature:

  • Evaluate the relevance and quality of each source. Consider the methodology, sample size, and results of studies. 
  • Organize the literature by themes or key concepts. Identify patterns, trends, and gaps in the existing research. 
  • Summarize key findings and arguments from each source. Compare and contrast different perspectives. 
  • Identify areas where there is a consensus in the literature and where there are conflicting opinions. 
  • Provide critical analysis and synthesis of the literature. What are the strengths and weaknesses of existing research? 

Organize and Write Your Literature Review:

  • Your literature review outline should be based on themes, chronological order, or methodological approaches. 
  • Write a clear and coherent narrative that synthesizes the information gathered. 
  • Use proper citations for each source and ensure consistency in your citation style (APA, MLA, Chicago, etc.). 
  • Conclude your literature review by summarizing key findings, identifying gaps, and suggesting areas for future research. 

Whether you’re exploring a new research field or finding new angles to develop an existing topic, sifting through hundreds of papers can take more time than you have to spare. But what if you could find science-backed insights with verified citations in seconds? That’s the power of Paperpal’s new Research feature!  

How to write a literature review faster with Paperpal?

Paperpal, an AI writing assistant, integrates powerful academic search capabilities within its writing platform. With the Research feature, you get 100% factual insights, with citations backed by 250M+ verified research articles, directly within your writing interface with the option to save relevant references in your Citation Library. By eliminating the need to switch tabs to find answers to all your research questions, Paperpal saves time and helps you stay focused on your writing.   

Here’s how to use the Research feature:  

  • Ask a question: Get started with a new document on paperpal.com. Click on the “Research” feature and type your question in plain English. Paperpal will scour over 250 million research articles, including conference papers and preprints, to provide you with accurate insights and citations. 
  • Review and Save: Paperpal summarizes the information, while citing sources and listing relevant reads. You can quickly scan the results to identify relevant references and save these directly to your built-in citations library for later access. 
  • Cite with Confidence: Paperpal makes it easy to incorporate relevant citations and references into your writing, ensuring your arguments are well-supported by credible sources. This translates to a polished, well-researched literature review. 

The literature review sample and detailed advice on writing and conducting a review will help you produce a well-structured report. But remember that a good literature review is an ongoing process, and it may be necessary to revisit and update it as your research progresses. By combining effortless research with an easy citation process, Paperpal Research streamlines the literature review process and empowers you to write faster and with more confidence. Try Paperpal Research now and see for yourself.  

Frequently asked questions

A literature review is a critical and comprehensive analysis of existing literature (published and unpublished works) on a specific topic or research question and provides a synthesis of the current state of knowledge in a particular field. A well-conducted literature review is crucial for researchers to build upon existing knowledge, avoid duplication of efforts, and contribute to the advancement of their field. It also helps researchers situate their work within a broader context and facilitates the development of a sound theoretical and conceptual framework for their studies.

A literature review is a crucial component of research writing, providing a solid background for a research paper’s investigation. The aim is to keep professionals up to date by providing an understanding of ongoing developments within a specific field, including the research methods and experimental techniques used in that field, and to present that knowledge in the form of a written report. The depth and breadth of the literature review also emphasize the credibility of the scholar in his or her field.  

Before writing a literature review, it’s essential to undertake several preparatory steps to ensure that your review is well-researched, organized, and focused. This includes choosing a topic of general interest to you and doing exploratory research on that topic, writing an annotated bibliography, and noting major points, especially those that relate to the position you have taken on the topic. 

Literature reviews and academic research papers are essential components of scholarly work but serve different purposes within the academic realm. 3 A literature review aims to provide a foundation for understanding the current state of research on a particular topic, identify gaps or controversies, and lay the groundwork for future research. Therefore, it draws heavily from existing academic sources, including books, journal articles, and other scholarly publications. In contrast, an academic research paper aims to present new knowledge, contribute to the academic discourse, and advance the understanding of a specific research question. Therefore, it involves a mix of existing literature (in the introduction and literature review sections) and original data or findings obtained through research methods. 

Literature reviews are essential components of academic and research papers, and various strategies can be employed to conduct them effectively. If you want to know how to write a literature review for a research paper, here are four common approaches that are often used by researchers.

  • Chronological Review: This strategy involves organizing the literature based on the chronological order of publication. It helps to trace the development of a topic over time, showing how ideas, theories, and research have evolved. 
  • Thematic Review: Thematic reviews focus on identifying and analyzing themes or topics that cut across different studies. Instead of organizing the literature chronologically, it is grouped by key themes or concepts, allowing for a comprehensive exploration of various aspects of the topic. 
  • Methodological Review: This strategy involves organizing the literature based on the research methods employed in different studies. It helps to highlight the strengths and weaknesses of various methodologies and allows the reader to evaluate the reliability and validity of the research findings. 
  • Theoretical Review: A theoretical review examines the literature based on the theoretical frameworks used in different studies. This approach helps to identify the key theories that have been applied to the topic and assess their contributions to the understanding of the subject. 

It’s important to note that these strategies are not mutually exclusive, and a literature review may combine elements of more than one approach. The choice of strategy depends on the research question, the nature of the literature available, and the goals of the review. Additionally, other strategies, such as integrative reviews or systematic reviews, may be employed depending on the specific requirements of the research.

The literature review format can vary depending on the specific publication guidelines, but some common elements and structures are often followed. Here is a general guideline for the format of a literature review:

Introduction:
  • Provide an overview of the topic.
  • Define the scope and purpose of the literature review.
  • State the research question or objective.

Body:
  • Organize the literature by themes, concepts, or chronology.
  • Critically analyze and evaluate each source.
  • Discuss the strengths and weaknesses of the studies.
  • Highlight any methodological limitations or biases.
  • Identify patterns, connections, or contradictions in the existing research.

Conclusion:
  • Summarize the key points discussed in the literature review.
  • Highlight the research gap.
  • Address the research question or objective stated in the introduction.
  • Highlight the contributions of the review and suggest directions for future research.

Both annotated bibliographies and literature reviews involve the examination of scholarly sources. While annotated bibliographies focus on individual sources with brief annotations, literature reviews provide a more in-depth, integrated, and comprehensive analysis of the existing literature on a specific topic.




Indian J Sex Transm Dis AIDS, 35(2), Jul-Dec 2014

Reviewing literature for research: Doing it the right way

Shital Amin Poojary

Department of Dermatology, K J Somaiya Medical College, Mumbai, Maharashtra, India

Jimish Deepak Bagadia

In an era of information overload, it is important to know how to obtain the required information and also to ensure that it is reliable information. Hence, it is essential to understand how to perform a systematic literature search. This article focuses on reliable literature sources and how to make optimum use of these in dermatology and venereology.

INTRODUCTION

A thorough review of literature is not only essential for selecting research topics; it also enables the right applicability of a research project. Most importantly, a good literature search is the cornerstone of the practice of evidence-based medicine. Today, everything is available at the click of a mouse or at the tip of the fingertips (or the stylus). Google is often the go-to search website, the supposed answer to all questions in the universe. However, the deluge of information available comes with its own set of problems: how much of it is actually reliable? How many of the results the search string threw up are actually relevant? Did we actually find what we were looking for? Lack of a systematic approach can make a literature review a time-consuming and at times frustrating process. Hence, whether it is for research projects, theses/dissertations, case studies/reports, or simply a wish to obtain information, knowing where to look, and more importantly, how to look, is of prime importance today.

Literature search

Fink has defined research literature review as a "systematic, explicit and reproducible method for identifying, evaluating, and synthesizing the existing body of completed and recorded work produced by researchers, scholars and practitioners."[1]

Review of research literature can be summarized as a seven-step process: (i) selecting research questions/purpose of the literature review; (ii) selecting your sources; (iii) choosing search terms; (iv) running your search; (v) applying practical screening criteria; (vi) applying methodological screening criteria/quality appraisal; (vii) synthesizing the results.[1]

This article will primarily concentrate on refining techniques of literature search.

Sources for literature search are enumerated in Table 1 .

Sources for literature search


PubMed is currently the most widely used of these, as it contains over 23 million citations for biomedical literature and has been made available free by the National Center for Biotechnology Information (NCBI) at the U.S. National Library of Medicine. However, the availability of free full-text articles depends on the source. Options such as advanced search, medical subject headings (MeSH) terms, free full text, PubMed tutorials, and the single citation matcher make the database extremely user-friendly [Figure 1]. It can also be accessed on the go on mobile devices using "PubMed Mobile." One can also create an NCBI account to save searches and to use certain PubMed tools.


PubMed home page showing location of different tools which can be used for an efficient literature search

Tips for efficient use of PubMed search:[2,3,4]

Use of field and Boolean operators

When one searches using keywords, all articles containing the words show up, many of which may not be related to the topic. Hence, the use of operators while searching makes the search more specific and less cumbersome. Operators are of two types: field operators and Boolean operators, the latter enabling us to combine more than one concept, thereby making the search highly accurate. A few key operators that can be used in PubMed are shown in Tables 2 and 3 and illustrated in Figures 2 and 3.

Field operators used in PubMed search


Boolean operators used in PubMed search


PubMed search results page showing articles on donovanosis using the field operator [TIAB]; it shows all articles which have the keyword “donovanosis” in either title or abstract of the article


PubMed search using Boolean operators ‘AND’, ‘NOT’; To search for articles on treatment of lepra reaction other than steroids, after clicking the option ‘Advanced search’ on the home page, one can build the search using ‘AND’ option for treatment and ‘NOT’ option for steroids to omit articles on steroid treatment in lepra reaction
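Queries like the ones in the figures above can also be assembled programmatically before pasting them into PubMed's search box. Below is a minimal Python sketch; the helper functions `tag` and `combine` are illustrative and not part of any PubMed client library:

```python
def tag(term: str, field: str = "TIAB") -> str:
    """Attach a PubMed field operator, e.g. donovanosis -> donovanosis[TIAB].
    Multi-word terms are quoted so PubMed treats them as a phrase."""
    term = f'"{term}"' if " " in term else term
    return f"{term}[{field}]"

def combine(*clauses: str, op: str = "AND") -> str:
    """Join clauses with a Boolean operator: AND, OR, or NOT."""
    return f" {op} ".join(clauses)

# Treatment of lepra reaction, omitting articles on steroid treatment:
query = combine(
    combine(tag("lepra reaction"), tag("treatment")),
    tag("steroids"),
    op="NOT",
)
print(query)  # "lepra reaction"[TIAB] AND treatment[TIAB] NOT steroids[TIAB]
```

Building the string once and reusing it keeps searches reproducible, which matters when a review must document its exact search strategy.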

Use of medical subject headings terms

These are very specific and standardized terms used by indexers to describe every article in PubMed and are added to the record of every article. A search using MeSH will show all articles about the topic (or keywords), but will not show articles only containing these keywords (these articles may be about an entirely different topic, but still may contain your keywords in another context in any part of the article). This will make your search more specific. Within the topic, specific subheadings can be added to the search builder to refine your search [ Figure 4 ]. For example, MeSH terms for treatment are therapy and therapeutics.


PubMed search using medical subject headings (MeSH) terms for management of gonorrhea. Click on MeSH database ( Figure 1 ) →In the MeSH search box type gonorrhea and click search. Under the MeSH term gonorrhea, there will be a list of subheadings; therapy, prevention and control, click the relevant check boxes and add to search builder →Click on search →All articles on therapy, prevention and control of gonorrhea will be displayed. Below the subheadings, there are two options: (1) Restrict to medical subject headings (MeSH) major topic and (2) do not include MeSH terms found below this term in the MeSH hierarchy. These can be used to further refine the search results so that only articles which are majorly about treatment of gonorrhea will be displayed

Two additional options can be used to further refine MeSH searches. These are located below the subheadings for a MeSH term: (1) Restrict to MeSH major topic; checking this box will retrieve articles which are majorly about the search term and are therefore, more focused and (2) Do not include MeSH terms found below this term in the MeSH hierarchy. This option will again give you more focused articles as it excludes the lower specific terms [ Figure 4 ].

A similar feature is available in the Cochrane Library (also called MeSH), EMBASE (known as EMTREE), and PsycINFO (Thesaurus of Psychological Index Terms).
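The MeSH options described above follow a small, regular syntax: a term, an optional subheading, a search tag ([mh] for MeSH Terms, [majr] for MeSH Major Topic), and an optional :noexp suffix that excludes narrower terms in the MeSH hierarchy. A short Python sketch of a clause builder (the helper name `mesh` is illustrative; the tags are PubMed's documented search tags):

```python
def mesh(term, subheading=None, major=False, explode=True):
    """Build a MeSH search clause such as gonorrhea/therapy[mh].
    major=True uses [majr] (MeSH Major Topic); explode=False appends
    :noexp so terms below this one in the hierarchy are not included."""
    base = f"{term}/{subheading}" if subheading else term
    field = "majr" if major else "mh"
    if not explode:
        field += ":noexp"
    return f"{base}[{field}]"

print(mesh("gonorrhea", "therapy"))                             # gonorrhea/therapy[mh]
print(mesh("gonorrhea", "therapy", major=True, explode=False))  # gonorrhea/therapy[majr:noexp]
```

The second call corresponds to checking both refinement boxes in the MeSH database interface: restrict to major topic and do not explode the hierarchy.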

Saving your searches

Any search that one has performed can be saved by using the ‘Send to’ option and can be saved as a simple word file [ Figure 5 ]. Alternatively, the ‘Save Search’ button (just below the search box) can be used. However, it is essential to set up an NCBI account and log in to NCBI for this. One can even choose to have E-mail updates of new articles in the topic of interest.


Saving PubMed searches. A simple option is to click on the dropdown box next to ‘Send to’ option and then choose among the options. It can be saved as a text or word file by choosing ‘File’ option. Another option is the “Save search” option below the search box but this will require logging into your National Center for Biotechnology Information account. This however allows you to set up alerts for E-mail updates for new articles

Single citation matcher

This is another important tool that helps to find the genuine original source of a particular research work (when few details are known about the title/author/publication date/place/journal) and cite the reference in the most correct manner [ Figure 6 ].


Single citation matcher: Click on “Single citation matcher” on PubMed Home page. Type available details of the required reference in the boxes to get the required citation

Full text articles

In any search clicking on the link “free full text” (if present) gives you free access to the article. In some instances, though the published article may not be available free, the author manuscript may be available free of charge. Furthermore, PubMed Central articles are available free of charge.

Managing filters

Filters can be used to refine a search according to type of article required or subjects of research. One can specify the type of article required such as clinical trial, reviews, free full text; these options are available on a typical search results page. Further specialized filters are available under “manage filters:” e.g., articles confined to certain age groups (properties option), “Links” to other databases, article specific to particular journals, etc. However, one needs to have an NCBI account and log in to access this option [ Figure 7 ].


Managing filters. Simple filters are available on the ‘search results’ page. One can choose type of article, e.g., clinical trial, reviews etc. Further options are available in the “Manage filters” option, but this requires logging into National Center for Biotechnology Information account

The Cochrane library

Although reviews are available in PubMed, for systematic reviews and meta-analysis, Cochrane library is a much better resource. The Cochrane library is a collection of full length systematic reviews, which can be accessed for free in India, thanks to Indian Council of Medical Research renewing the license up to 2016, benefitting users all over India. It is immensely helpful in finding detailed high quality research work done in a particular field/topic [ Figure 8 ].


Cochrane library is a useful resource for reliable, systematic reviews. One can choose the type of reviews required, including trials

An important tool that must be used while searching for research work is screening, which improves the accuracy of search results. It is of two types: (1) practical, to identify a broad range of potentially useful studies, e.g., by date of publication (last 5 years only, for the most recent updates), participants or subjects (humans above 18 years), or publication language (English only); and (2) methodological, to identify the best available studies, e.g., by excluding studies without a control group or restricting to randomized controlled trials.
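The two screening stages can be sketched as successive filters over the retrieved records. The field names, thresholds, and study designs below are illustrative assumptions, not prescribed criteria:

```python
from dataclasses import dataclass

@dataclass
class Record:
    title: str
    year: int
    language: str
    design: str   # e.g. "RCT", "cohort", "case report"

def practical_screen(records, min_year=2019, language="English"):
    """Practical criteria: recency and publication language."""
    return [r for r in records if r.year >= min_year and r.language == language]

def methodological_screen(records, allowed=frozenset({"RCT", "cohort"})):
    """Methodological criteria: keep only the stronger study designs."""
    return [r for r in records if r.design in allowed]

hits = [
    Record("A", 2022, "English", "RCT"),
    Record("B", 2015, "English", "RCT"),          # fails practical screen (too old)
    Record("C", 2023, "German", "cohort"),        # fails practical screen (language)
    Record("D", 2021, "English", "case report"),  # fails methodological screen
]
screened = methodological_screen(practical_screen(hits))
print([r.title for r in screened])  # ['A']
```

Applying the practical screen first mirrors the usual workflow: broad, cheap criteria reduce the set before the more careful methodological appraisal.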

Selecting the right quality of literature is the key to a successful research literature review. Quality can be estimated by what is known as "the evidence pyramid." The levels of evidence of references obtained from the aforementioned search tools are depicted in Figure 9. Systematic reviews obtained from the Cochrane library constitute level 1 evidence.


Evidence pyramid: Depicting the level of evidence of references obtained from the aforementioned search tools

Thus, a systematic literature review can help not only in setting up the basis of good research with optimal use of the available information, but also in the practice of evidence-based medicine.

Source of Support: Nil.

Conflict of Interest: None declared.

Banner

The Literature Review: 3. Methods for Searching the Literature

  • 1. Introduction
  • 2. Why Do a Literature Review?
  • 3. Methods for Searching the Literature
  • 4. Analysing the Literature
  • 5. Organizing the Literature Review
  • 6. Writing the Review

1. Tasks Involved in a Literature Review

There are two major tasks involved in a literature review:

  • Identifying and selecting literature
  • Writing about the literature

2. Skills Required for Conducting a Literature Search

  • Information seeking skills
  • Ability to use manual and electronic methods to identify useful resources
  • Ability to conduct extensive bibliographic searches
  • Critical appraisal skills
  • Ability to describe, critique, and relate each source to the topic
  • Ability to identify areas of controversy in the literature
  • Organizational skills
  • Ability to organize the literature collected around your topic
  • Ability to present the review logically

3. Searching Techniques

Scan the literature for various types of content, including:

  • theoretical foundations and definitions
  • discussion and debate
  • current issues

Skim potential works to select materials for inclusion

  • decide whether to include or exclude a work from the review

4. Sorting the Literature

For each article identified for possible inclusion in the literature review, you need to:

1. read the abstract

  • decide whether to read the entire article

2. read the introduction

  • explains why the study is important
  • provides a review and evaluation of relevant literature

3. read Methods section critically

  • focus on participants and methodology

4. evaluate results

  • are the conclusions logical?
  • is there evidence of bias?

5. Notetaking

  • Take notes as you read through each paper that you will include in the review
  • Purpose of study - research aims or hypotheses
  • Research design and methodology
  • Data analysis
  • Summary of findings

Part of the task in taking notes is to begin the process of sifting and arranging ideas

6. Questions to Keep in Mind

  • What are the key sources of information on this topic?
  • What are the major issues and debates on this topic?
  • What are the key theories, concepts, and ideas on this topic?
  • What are the main questions and problems that have been addressed so far?
  • What are the strengths and weaknesses of the various arguments on the topic?
  • Who are the significant research personalities in this area?


Different types of literature review techniques followed in a research

The purpose of a literature review is to summarise crucial previous research done on a topic. It is an integral part of the research. From the literature review, the researcher finds out what is lacking in an existing area of research. There are several types of literature review techniques that can be followed in a thesis. They depend on the type of research and the purpose of the study. This article explains the characteristics and fundamental differences in the literature review techniques.

Narrative or traditional literature review technique

This technique of reviewing the literature is the most popular among thesis scholars and an important part of any study (Xiao and Watson, 2019). It helps to establish the conceptual and theoretical framework of thesis research. The aim is to identify studies that address a problem of interest. Unlike a systematic review, it does not require a formulated research problem or a predefined search strategy; a topic of interest is enough. The steps in this type of literature review include defining the audience and the topic and searching for literature. It involves being critical and finding a logical structure, and it describes and appraises the articles.

However, the method for selecting articles may not be described. The questions are broader, and the evaluation is variable. Narrative or traditional literature review technique can be of three further types:

  • A general literature review surveys the most important current knowledge on a topic.
  • A historical literature review is based on the examination of research throughout a period. Generally, it starts with the first time a theory or topic has emerged.
  • The methodological literature review examines how research designs and methods are chosen and applied across studies.

Systematic literature review technique

A systematic literature review technique helps to identify and appraise investigations in order to answer a formulated question. It aims at determining the appropriate response to the research problem. This sounds similar to the narrative literature review technique, but there are a few fundamental differences:

  • The research questions are specific.
  • The study selection is based on criteria.
  • The evaluation style is critical and rigorous.
  • The inferences are usually evidence-based.
  • There is a specific format for presenting the review and it usually involves tables.

A systematic review is used to inform decision-making; therefore, it cannot be a part of all types of theses. The systematic literature review technique is mainly used in science, psychology, medicine, and the social sciences. Certain principles should be kept in mind while following the systematic review technique: clarity, transparency, equality, focus, and accessibility in the representation of results, so that the chance of bias is minimised.

Critical review technique

The critical review technique is more than a summary of the current studies. It is a detailed discussion of a topic or a study involving various views (Eaton, 2018). The critical review technique is the exercise of careful thinking considering strengths and weaknesses. A critical review requires seeking information and reviewing literature effectively. The aim is to conduct a critical assessment of the focus area of research. The essential characteristic is that it makes a reasoned judgment. A well-made critical review reveals the suitability and relevance of a problem. In addition to this, it weighs the study’s significance.

This type of review technique is used when the study needs to be explored critically. It makes an accurate and precise conclusion about the problem. The critical review is presented to make an unbiased, critical analysis. It involves strong and convincing opinions and arguments. The reasoning is vital in this type of review technique.

Theoretical framework review technique

The purpose of a theoretical framework in a thesis is to support the study with relevant theories. When a researcher explains the theory underpinning the idea, it makes the study stronger and more reliable. There may be many theories supporting a single idea. It is important to choose the one that best explains the main concepts and depicts the relations between them. A robust theoretical framework in the research helps explain, interpret, and generalize findings (Frederiksen et al., 2018). It can increase the success rate of the research. Theories and concepts also signify the understanding of the researcher. In a way, the framework also acts as the blueprint of the study.

A theoretical framework is particularly essential in subjects like social sciences, economics and consumer psychology. In these subjects, new developments are built on existing theories that constantly evolve but do not become obsolete. Such theses require the identification of at least two theories. Diagrammatic presentation of theories is recommended as they help the reader to visualise better.

Differences between the literature review techniques

No matter which type of literature review is chosen for a study, it must be written with a proper structure and format. Emotional phrasing and unjustified claims should be avoided. Moreover, the researcher should not use irrelevant content and non-scholarly sources. The focus must be on choosing the right type of review and finding a logical structure.

  • Cooper, C., Booth, A., Varley-Campbell, J., Britten, N. and Garside, R., 2018. Defining the process to literature searching in systematic reviews: A literature review of guidance and supporting studies. BMC Medical Research Methodology, 18(1), pp.1-14.
  • Dodgson, J.E., 2021. Critical analysis: The often-missing step in conducting literature review research. Journal of Human Lactation, 37(1), pp.27-32.
  • Eaton, S.E., 2018. Educational research literature reviews: Understanding the hierarchy of sources.
  • Frederiksen, L., Phelps, S.F. and Kimmons, R., 2018. What is a literature review? Rapid Academic Writing.
  • Hart, C., 2018. Doing a literature review: Releasing the research imagination.
  • Snyder, H., 2019. Literature review as a research methodology: An overview and guidelines. Journal of Business Research, 104, pp.333-339.
  • Xiao, Y. and Watson, M., 2019. Guidance on conducting a systematic literature review. Journal of Planning Education and Research, 39(1), pp.93-112.



Forecasting e-commerce consumer returns: a systematic literature review

  • Open access
  • Published: 21 May 2024


  • David Karl   ORCID: orcid.org/0000-0002-0326-5982 1  


The substantial growth of e-commerce during the last years has led to a surge in consumer returns. Recently, research interest in consumer returns has grown steadily. The availability of vast customer data and advancements in machine learning opened up new avenues for returns forecasting. However, existing reviews predominantly took a broader perspective, focussing on reverse logistics and closed-loop supply chain management aspects. This paper addresses this gap by reviewing the state of research on returns forecasting in the realms of e-commerce. Methodologically, a systematic literature review was conducted, analyzing 25 relevant publications regarding methodology, required or employed data, significant predictors, and forecasting techniques, classifying them into several publication streams according to the papers’ main scope. Besides extending a taxonomy for machine learning in e-commerce, this review outlines avenues for future research. This comprehensive literature review contributes to several disciplines, from information systems to operations management and marketing research, and is the first to explore returns forecasting issues specifically from the e-commerce perspective.


1 Introduction

E-commerce has witnessed substantial growth rates in recent years and continues growing by double-digit margins (National Retail Federation/Appriss Retail 2023). However, lenient consumer return policies have resulted in $212 billion worth of merchandise being returned to online retailers in the U.S. in 2022, accounting for 16.5% of online sales (National Retail Federation/Appriss Retail 2023). While high rates of consumer returns mainly concern specific sectors and product categories, online fashion retailing is particularly affected (Diggins et al. 2016). Recent studies report average shipment-related return rates for fashion retailers in the 40-50% range (Difrancesco et al. 2018; Karl and Asdecker 2021). In addition to missed sales and reduced profits (Zhao et al. 2020), consumer returns pose operational challenges (Stock and Mulki 2009), including unavoidable processing costs (Asdecker 2015) and uncertainties regarding logistics capacities, inventory management, procurement decisions, and marketing activities. Hence, effectively managing consumer returns is an essential part of the e-commerce business model (Urbanke et al. 2015).

Similar to the research conducted by Abdulla et al. ( 2019 ), this work focuses on consumer returns in online retailing (e-commerce), excluding the larger body of closed-loop supply chain (CLSC) management, which encompasses product returns related to end-of-life and end-of-use scenarios involving raw material recycling or remanufacturing. In contrast to CLSC returns, retail consumer returns are typically sent or given back unused or undamaged shortly after purchase, without any quality-related defects. These returns should be reimbursed to the consumer and are intended to be resold “as new” (de Brito et al. 2005 ; Melacini et al. 2018 ; Shang et al. 2020 ).

Regarding forecasting aspects, demand forecasting is a crucial activity for successful retail management (Ge et al. 2019 ). In contrast to demand and sales, returns constitute the “supply” side of the return process (Frei et al. 2022 ). Consequently, forecasting becomes a complex task and a significant challenge in managing returns due to the inherently uncertain nature of customer decisions regarding product retention (Frei et al. 2022 ). Moreover, return forecasts are interconnected with sales forecasts and promotional activities (Govindan and Bouzon 2018 ; Tibben-Lembke and Rogers 2002 ). Hence, forecasting objectives may vary, encompassing return quantities, timing (Hachimi et al. 2018 ), and even individual return probabilities. Minimizing return forecast errors is critical to reduce and minimize reactive planning (Hess and Mayhew 1997 ). Accurate forecasts rely on (1) comprehensive data collection, e.g., regarding consumer behavior, and (2) information and communications technology (ICT) for data processing, such as big data analytics. Despite extensive research in supply chain management (SCM), Barbosa et al. ( 2018 ) noted a lack of relevant publications exploring the "returns management" process of SCM in conjunction with big data analytics. Specifically, “the topic of forecasting consumer returns has received little attention in the academic literature” (Shang et al. 2020 ). Nonetheless, precise return forecasts positively impact reverse logistics activities’ economic, environmental, and social performance, primarily concerning quantity, quality, and timing predictions (Agrawal and Singh 2020 ). Hence, forecasting returns holds significant relevance across various supply chain stages.

1.1 Previous meta-research

Hess and Mayhew (1997) emphasized the need for extensive data analysis concerning reverse flows, which forms the basis for returns forecasting. Subsequently, research on consumer returns and reverse logistics has proliferated. Thus, before collecting data and reviewing the topic of consumer returns forecasting, we first examined existing reviews and meta-studies relevant to the subject matter. To accomplish this, we referred to Web of Science, Business Source Ultimate via EBSCOhost, JSTOR, and the AIS Electronic Library as primary sources of knowledge (search term: "literature review" AND "return*" AND "forecast*"). As a secondary source, we appended the results of Google Scholar, for which a different search term was used (intitle:"literature review" ("product return" OR "consumer return" OR "retail return" OR "e-commerce return") forecast) because truncation is unavailable and to reduce the vast amount of literature with a financial focus that the search term "return" would otherwise retrieve. Table 1 presents the most pertinent literature reviews related to the scope of this paper.

Agrawal et al. ( 2015 ) identified research gaps within the realm of reverse logistics, finding “forecasting product returns” as a crucial future research path. However, among 21 papers focusing on “forecasting models for product returns”, the emphasis was predominantly on CLSC, reuse, remanufacturing, and recycling, which do not align with the aim of this review. Agrawal et al. also noted a lack of comprehensive analysis of underlying factors in returns forecasting, such as demographics or consumer behavior.

Similarly, Hachimi et al. ( 2018 ) addressed forecasting challenges within the broader context of reverse logistics. They classified their literature using various forecasting approaches: time series and machine learning, operations research methods, and simulation programs. The research gaps they identified included a limited number of influencing factors taken into account, the absence of established performance indicators, and methodological issues related to dynamic lot-sizing with returns. Although this review focused on reverse logistics, the call for research into predictors of future returns is equally applicable to consumer returns in e-commerce.

The review of Abdulla et al. ( 2019 ) centers on consumer returns within the retail context, particularly in relation to return policies. While they discuss consumer behavior as well as the planning and execution of returns, they do not present any sources explicitly focused on forecasting issues.

Micol Policarpo et al. ( 2021 ) reviewed the literature on the use of machine learning (ML) in e-commerce, encompassing common goals of e-commerce studies (e.g., purchase prediction, repurchase prediction, and product return prediction) and the ML techniques suitable for supporting these goals. Their primary contribution is a novel taxonomy of machine learning in e-commerce, covering most of the identified goals. However, the taxonomy they developed disregards the aspect of return predictions.

The most exhaustive literature review to date regarding product returns, conducted by Ambilkar et al. ( 2021 ), analyzed 518 papers and adopted a holistic reverse logistics approach encompassing all supply chain stages. The authors categorized the papers into six categories, including “forecasting product returns”, for which they found and concisely described 13 papers. Due to the broader research scope, none of the analyzed papers focused on consumer returns within the retail context.

The review by Duong et al. ( 2022 ) employed a hybrid approach combining machine learning and bibliometric analysis. Regarding forecasts of product returns, they identified three relevant papers (Clottey and Benton 2014 ; Cui et al. 2020 ; Shang et al. 2020 ) within the “operations management” category. They explicitly call for further research on predicting customer returns behavior in the pre-purchase stage, highlighting the importance of a better understanding of online product reviews and customers’ online interactions.

1.2 Research gaps and research questions

Why is a systematic literature review necessary for investigating consumer returns and forecasting? On the one hand, there are empirical and conceptual papers that touch upon this topic, including brief literature reviews that align with the subject’s focus (e.g., Hofmann et al. 2020 ). However, narrative reviews lack transparency and replicability (Tranfield et al. 2003 ) and often induce selection bias (Srivastava and Srivastava 2006 ) as they tend to approach a field from a specific perspective. In contrast, systematic reviews strive to present a holistic, differentiated, and more detailed picture, incorporating the complete available literature (Uman 2011 ). On the other hand, existing systematic reviews provide structured yet relatively superficial overviews of literature on end-of-use and end-of-life forecasting (Shang et al. 2020 ), but they do not specifically address consumer returns. Furthermore, we contend that a review dedicated to general reverse logistics forecasting would not adequately capture the distinctive context and requirements inherent in the consumer-retailer relationship within the realm of e-commerce (Abdulla et al. 2019 ).

Consequently, based on existing reviews and papers, we have identified research gaps worth examining in more detail: (1) Returns forecasting techniques and relevant predictors for the respective underlying purposes, especially in the context of e-commerce (RQ1 and RQ2); (2) the integration of return forecasts into an existing but incomplete taxonomy of machine learning in e-commerce (Micol Policarpo et al. 2021 ; RQ3); and (3) future research directions pertaining to e-commerce returns forecasting (RQ4). Therefore, this review aims to shed more light on consumer returns forecasting in the retail context. The following research questions outline the primary objectives:

RQ1: What key research problems (e.g., forecasting purposes, technological approaches) have been addressed in the literature on forecasting consumer returns over time?

RQ2: What are the …

Publication outlets and research disciplines,

Research types and methodologies,

Product categories and industries,

Data sources and characteristics,

Relevant forecasting predictors,

Techniques and algorithms

… used to address these key problems?

RQ3: How can returns forecasting be integrated into a taxonomy of machine learning in e-commerce?

RQ4: What are promising or emerging future research directions regarding forecasting consumer returns?

The paper is organized as follows: Sect.  2 describes selected fundamental concepts and the delimitation of the research field on consumer returns forecasting. Section  3 contains the methodology for the review, drawing on the PRISMA guideline (Page et al. 2021 ) while integrating the approaches of Denyer and Tranfield ( 2009 ) and Webster and Watson ( 2002 ). Section  4 presents the review’s main results, answering RQ1 (Sect.  4.1 ), RQ2 (Sects.  4.2 – 4.5 ), and RQ3 (Sect.  4.6 ). A research framework developed in Sect.  5 structures the discussion regarding future research directions (RQ4). Section  6 subsumes the overall contribution of this review.

2 Consumer returns and forecasting

2.1 Consumer returns and return reasons

Reverse product flows, commonly referred to as product returns, can be classified into three categories: manufacturing returns, distribution returns, and consumer returns (Shaharudin et al. 2015 ; Tibben-Lembke and Rogers 2002 ). Among these, consumer returns are further differentiated between returns in brick-and-mortar retail or mail-order/e-commerce returns (Tibben-Lembke and Rogers 2002 ) and are also known as commercial returns (de Brito et al. 2005 ) or retail (product) returns (Bernon et al. 2016 ). With sky-rocketing e-commerce sales, online consumer returns have emerged as the dominant segment, making them a highly relevant field of research (Abdulla et al. 2019 ; Frei et al. 2020 ). Additionally, the digitization of retail provides numerous opportunities for data collection, as digital customer accounts facilitate more efficient analytical monitoring of customer behavior (Akter and Wamba 2016 ). Simultaneously, as competitive pressures intensify in e-commerce due to increased price transparency and substitution possibilities, retailers aiming to stimulate impulse purchases face heightened return rates (Cook and Yurchisin 2017 ; Karl et al. 2022 ).

The spatial decoupling of supply and demand introduces a higher level of uncertainty for e-commerce customers regarding various product attributes compared to brick-and-mortar retailing (Hong and Pavlou 2014 ). As consumers are unable to physically assess the products they order, returns are an essential part of the e-commerce business model. Besides fit uncertainty, other reasons for returns exist. Stöcker et al. ( 2021 ) classify the drivers triggering consumer returns into consumer behavior related reasons (e.g., impulsive purchases, showrooming), fulfillment/service related reasons (e.g., wrong/delayed delivery), and information gap related reasons (product fit, insufficient visualization). By mitigating customers’ return reasons, retailers try to reduce the return likelihood (“return avoidance”) (Rogers et al. 2002 ). Another, but less promising, way of reducing returns is to prevent customers who intend to return from actually doing so (e.g., by incurring additional effort or by rejecting returns) (Rogers et al. 2002 ).

Adapted from Abdulla et al. ( 2019 ) and Vakulenko et al. ( 2019 ), a simplified parallel process of a return transaction from the consumer’s and retailer’s perspective is visualized in Fig.  1 . Retailers can use forecasting in all transaction phases (Hess and Mayhew 1997 ). Targeting customer interventions pre-purchase (real-time forecasting) could be implemented by using dynamically generated (Dalecke and Karlsen 2020 ) digital nudging elements (Kaiser 2018 ; Thaler and Sunstein 2009 ; Zahn et al. 2022 ) in case of a predicted high return propensity. In the post-purchase phase, forecasting could stimulate different interventions (e.g., customer support) or can be helpful for logistics and inventory planning activities (Hess and Mayhew 1997 ). In the phase after the return decision, data analysis, including segmentation on different levels, e.g., for customers, products, or brands (Shang et al. 2020 ), can support managerial decision-making regarding assortment or (individualized) return policies for future orders (Abdulla et al. 2019 ). In other words, forecasting (or modeling) of returns in later phases of the process can substantiate interventions in earlier phases of the process (e.g., a temporary return policy change, or the suspension of product promotions due to particular forecasts). However, such data-driven interventions themselves also represent an influencing factor to be taken into account in future forecasts; thus, different forecasting purposes can be linked, at least when it comes to the data required. All these interdependencies hint at the circularity of the returns process, with an adequate management of returns representing an opportunity for generating customer satisfaction and retention (Ahsan and Rahman 2016 ; Röllecke et al. 2018 ).

figure 1

Purchase and return process concerning forecasting issues (adapted from Abdulla et al. 2019 ; Vakulenko et al. 2019 )

Although this paper primarily focuses on the online retailers’ process, it is worth noting that the issue at hand is equally applicable to brick-and-mortar retail (Santoro et al. 2019 ), which can benefit from the application of advanced data analysis techniques for forecasting purposes (Hess and Mayhew 1997 ).

2.2 Forecasting purposes and corresponding techniques

Accurate forecasting holds significant importance in the realm of e-commerce. Precise demand forecasts (“predictions”) play a pivotal role in inventory planning, pricing, and promotions and ultimately impact the commercial success of retailers (Ren et al. 2020 ). Forecasting consumer returns affects similar business aspects and resorts to comparable existing technical procedures. The data science and statistics literature offers diverse methods and algorithms for forecasting consumer returns. The choice of approach depends on the specific objective, with the outcome variable being scaled accordingly. For instance, when forecasting whether a single product will be returned, the dependent variable is either binary or expressed as a propensity value ranging from 0 to 1. On the other hand, forecasting the quantity or timing of returns entails continuous outcome variables. As a result, various techniques, from time-series forecasting to machine learning approaches, can be applied, which will be briefly outlined in the subsequent sections.

2.2.1 Return classifications and propensities

A naïve method for forecasting return propensities or return decisions is to use lagged (historical) return information (return rates), either for a given product, a given customer, or any other reference, to calculate a historical return probability (Hess and Mayhew 1997 ). Return rate forecasts are a reference-specific variant of forecasting return propensities.
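For illustration, the naïve lagged-rate approach can be sketched in a few lines of Python (all identifiers and figures below are hypothetical):

```python
from collections import defaultdict

def historical_return_rates(transactions):
    """Naive propensity forecast: the lagged return rate per reference.

    Each transaction is a (reference, was_returned) pair, where the
    reference may be a product, a customer, or any other grouping.
    """
    sold = defaultdict(int)
    returned = defaultdict(int)
    for reference, was_returned in transactions:
        sold[reference] += 1
        returned[reference] += int(was_returned)
    return {ref: returned[ref] / sold[ref] for ref in sold}

history = [("shoe", True), ("shoe", False), ("shoe", True), ("shirt", False)]
rates = historical_return_rates(history)
# rates["shoe"] is 2/3 (two of three sold units returned), rates["shirt"] is 0.0
```

In practice, such raw rates would typically be smoothed or blended with a global average when a reference has only few observations.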

Simple causal models based on statistical regression methods utilize one or more independent exogenous variables. The logistic regression (logit model) is employed when the dependent variable is binary or contains more than two nominal outcomes (multinomial logistic regression). For each observation, the binary logistic regression assesses the probability that the dependent variable takes the value “1” (Hastie et al. 2017 ). Consequently, this approach finds application for return decisions and return propensities. Comparatively, linear discriminant analysis (Fisher 1936 ) bears a resemblance to logistic regression by generating a linear combination of independent variables to best classify available data. This classification process involves determining a score for each observation, which is then compared to a critical discriminant score threshold to distinguish between return and keep.
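As a minimal sketch of how a binary logit model yields return propensities (toy data and plain gradient descent rather than a production solver; all feature names and values are invented):

```python
import numpy as np

def fit_logit(X, y, lr=0.1, steps=2000):
    """Binary logistic regression fitted by gradient descent on the log-loss."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))  # current propensity estimates
        w -= lr * Xb.T @ (p - y) / len(y)  # mean gradient of the log-loss
    return w

def predict_propensity(w, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

# One hypothetical feature (e.g., a product's past return rate); label 1 = returned.
X = np.array([[0.1], [0.4], [0.5], [0.9], [1.2], [1.5]])
y = np.array([0, 0, 0, 1, 1, 1])
w = fit_logit(X, y)
props = predict_propensity(w, X)  # propensities increase with the feature value
```

Each propensity lies between 0 and 1 and can be thresholded (e.g., at 0.5) to obtain a binary return/keep classification.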

More sophisticated machine learning (ML) techniques such as neural networks, decision tree-based methods, ensemble learning, and boosting methods are highly suitable for this forecasting purpose. For a general exposition of ML techniques in the domain of e-commerce, we refer to Micol Policarpo et al. ( 2021 ). Additionally, for a comparative study of several state-of-the-art ML classification techniques, see Fernández-Delgado et al. ( 2014 ). Artificial Neural Networks (NN) consist of interconnected nodes (“neurons”) organized in layers, exchanging signals to ascertain a function that accurately assigns input data to corresponding outputs. Typically, supervised learning techniques such as backpropagation compare the network outputs with known actual values (Hastie et al. 2017 ). Notably, neural networks are the most popular machine learning algorithm in recent e-commerce research (Micol Policarpo et al. 2021 ), and deep learning extensions like Long Short-Term Memory (Bandara et al. 2019 ) are gaining attention. Decision Trees (DT) manifest as hierarchical structures of branches representing conjunctions of specific characteristics and leaf nodes denoting class labels. This approach endeavors to construct an optimal decision tree for classifying available observations. Many decision tree algorithms have been introduced to serve this purpose (e.g., Breiman et al. 1984 ; Pandya and Pandya 2015 ). Ensemble learning methods adopt a voting mechanism involving multiple algorithms to enhance predictive performance (Polikar 2006 ). Analogously, boosting and bagging techniques are incorporated in algorithms like AdaBoost or the tree-based Random Forest (RF) to augment the input data, aiming at more generalizable forecasting models less prone to overfitting issues (Hastie et al. 2017 ). Support Vector Machines (SVM) stand as another example of a supervised ML algorithm, having demonstrated efficacy in tackling classification problems within e-commerce (Micol Policarpo et al. 2021 ).
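The voting principle behind such ensembles can be illustrated independently of any concrete base learner; the three classifiers and their outputs below are invented:

```python
from collections import Counter

def ensemble_vote(predictions_per_model):
    """Combine class predictions from several models by majority vote.

    predictions_per_model: one list per model, each holding a
    'return'/'keep' label for every observation.
    """
    n_obs = len(predictions_per_model[0])
    combined = []
    for i in range(n_obs):
        votes = Counter(model[i] for model in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])  # most frequent label wins
    return combined

# Three hypothetical classifiers disagreeing on four orders:
m1 = ["return", "keep", "return", "keep"]
m2 = ["return", "return", "keep", "keep"]
m3 = ["return", "keep", "return", "return"]
print(ensemble_vote([m1, m2, m3]))  # ['return', 'keep', 'return', 'keep']
```

Bagging-style ensembles such as Random Forests additionally train each base model on a different bootstrap sample of the data, which is what reduces the variance of the combined forecast.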

2.2.2 Return timing and volume forecasts

For product returns, timing is crucial in forecasting end-of-life, end-of-use, or remanufacturing returns that can occur years after the initial purchase (Petropoulos et al. 2022 ). In contrast, for consumer returns, the possible time window in which products are regularly returned in new condition with the aim of a refund is much shorter (usually less than 100 days and mostly less than 30 days), and priorities lie more on forecasting return volumes. Forecasting return volumes can be multi-faceted, ranging from forecasting the total return volume a retailer has to process within its logistics department through forecasting product-specific return numbers up to forecasting costly return shares, e.g., return fraud volume. Because returns depend on fluctuating sales, time-series forecasting of return volumes performs well only with constant sales volumes or under risk-pooling (Petropoulos et al. 2022 ). Thus, for a naïve return volume forecast, sales forecasts for a given timeframe are multiplied by the lagged return rate (historical data of products/consumers or any other reference). Possible algorithms range from time series forecasting to causal predictions comprising ML approaches (Hachimi et al. 2018 ).
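A minimal sketch of this naïve volume forecast, with all figures invented:

```python
def naive_return_volume(sales_forecast, past_sales, past_returns):
    """Naive forecast: expected sales times the lagged return rate."""
    lagged_rate = past_returns / past_sales
    return sales_forecast * lagged_rate

# Hypothetical product: 5,000 units sold last period, 1,250 of them returned;
# 6,000 units are forecast to be sold next period.
forecast = naive_return_volume(sales_forecast=6000, past_sales=5000, past_returns=1250)
# 6000 * 0.25 = 1500.0 expected returns
```

The quality of this forecast is bounded by the quality of the underlying sales forecast and by the stability of the return rate over time.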

Time-series techniques, e.g., single exponential smoothing (SES) or Holt-Winters approaches (HW), are based on the assumption that the future development of an outcome variable (e.g., return volume) depends on its past values, with time acting as the only predictor. Many of these models can be generalized as autoregressive integrated moving average (ARIMA) models, for which numerous extensions are available. These models can approximate more complex temporal relationships. Similarly, time-series regression models use univariate linear regression with time as a single exogenous variable.
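Single exponential smoothing, for instance, reduces to a one-line recursion in which the smoothing factor alpha weights recent observations against the running level (the weekly volumes below are invented):

```python
def ses_forecast(series, alpha=0.3):
    """Single exponential smoothing: the final level serves as the
    next-period forecast; a higher alpha weights recent values more."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

weekly_return_volumes = [120, 135, 118, 140, 150]
print(round(ses_forecast(weekly_return_volumes, alpha=0.5), 1))  # 140.7
```

As noted above, such a model is only reliable when the underlying return volumes do not fluctuate heavily, since time is its sole predictor.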

The mentioned multivariate regression models are essential statistical tools and can predict metric variables such as return volume or time. The logic is to fit a linear function of a given set of input variables (“features”) to the outcome variable with the criterion of minimizing the residual sum of squares (Hastie et al. 2017 ). Many variants of regression models are derived from this logic (e.g., generalized linear models), and various extensions are built upon this base (e.g., LASSO for variable selection, Tibshirani 1996 ).
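The underlying least-squares fit can be written compactly with NumPy; the feature names and numbers here are hypothetical:

```python
import numpy as np

# Fit return_volume ~ b0 + b1*units_sold + b2*discount_rate by ordinary
# least squares, i.e., by minimizing the residual sum of squares.
X = np.array([[100, 0.00], [200, 0.10], [300, 0.00], [400, 0.20]])
y = np.array([25.0, 55.0, 74.0, 120.0])
Xb = np.hstack([np.ones((len(X), 1)), X])      # add intercept column
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)  # [b0, b1, b2]
fitted = Xb @ coef                             # in-sample return volumes
```

Regularized variants such as LASSO add a penalty term to this objective so that uninformative features receive zero weight.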

Emerging from more complex statistical methods and using the possibilities of continuously increasing computing power, IT-based machine learning approaches were developed. Some of these approaches were already presented in Sect. 2.2.1 and are suitable for predicting metric variables in addition to classification tasks, e.g., neural networks, decision tree algorithms, and especially ensemble techniques like random forests.

3 Methodology

Methodologically, the research process of this review follows the PRISMA guideline (Page et al. 2021 ) where applicable and is structured in five steps (Denyer and Tranfield 2009 ; Webster and Watson 2002 ): (1) question formulation; (2) locating studies; (3) study selection and evaluation; (4) (concept-centric) analysis and synthesis; and (5) reporting and using the results for defining an agenda for future research.

The first step refers to the research questions already formulated in the introduction. The second step involves selecting the databases and defining the search terms. In that respect, five scientific databases were selected, aiming at journal as well as conference publications: AIS Electronic Library (AISeL), Business Source Ultimate (BS) via EBSCOhost, JSTOR (JS), Science Direct (SD), and Web of Science (WoS). To ensure inclusivity and to account for potential variations in spelling or phrasing, the final search strings incorporate truncations where applicable. The search query utilized in this review comprises two key components. Firstly, it pertains to consumer returns, encompassing products returned by consumers, primarily in the context of e-commerce, to the retailer. While it is recommended to use reasonably general search terms, the term “return” alone would yield results for various stages of reverse logistics and a vast amount of financial literature. Therefore, we conducted a more specific search using the phrase “consumer return*” and the related terms “e-commerce return*”, “product return*”, “return* product”, “customer return*”, and “retail return*”. Secondly, this paper specifically focuses on forecasting (“forecast*”), which can be alternately referred to as “predict*” or “prognos*”. The combination of these terms was searched for in the Title, Abstract and Keywords fields.

The search includes results up to the middle of 2022 and resulted in 725 initial search hits (see Fig.  2 ). As this review aims to identify papers dealing with consumer returns and forecasting, the inclusion criteria for eligibility were:

The title or keywords referred to consumer returns or forecasting (in a broader sense, including data preparation). A connection to the respective subject area and applicability to the retail domain should at least be plausible.

Manuscript in English: we assumed that no important study in this field would be written and published in a language other than English.

The paper has undergone a single- or double-blind peer-review process, either as a journal publication or as a publication in peer-reviewed conference proceedings.

figure 2

Research process flow diagram

In the third step, duplicates were removed, resulting in a set of 650 unique records. Subsequently, the papers underwent screening based on title, keywords, and language to determine whether they warranted further examination. This preliminary screening phase reduced the number of papers to 85. These papers’ abstracts and full texts were thoroughly reviewed to assess their relevance. This step encompasses all papers pertaining to returns forecasting for retailers or direct-selling manufacturers while excluding those focused on closed-loop supply chain management or remanufacturing, recycling, and end-of-life returns. Ultimately, a final sample of 20 publications was identified, serving as a foundation for identifying additional relevant papers (vom Brocke et al. 2009 ; Webster and Watson 2002 ) through a forward search using Google Scholar and snowballing via backward search. This process yielded an additional five papers, resulting in a total of 25 papers included for review (Table  2 ).

The fourth step comprises the analysis and synthesis of the relevant papers. Data, including bibliographic statistics, were collected in accordance with the research questions. A two-way concept-centric analysis, as described by Webster and Watson ( 2002 ), was conducted, encompassing confirmatory aspects based on the fundamentals outlined in Sect.  2 of this paper, as well as exploratory elements aimed at enriching existing categories and concepts. The objective was to comprehensively describe the relevant concepts, approaches, and dimensions discussed in the literature.

Moving on to the fifth and final step (Denyer and Tranfield 2009 ), the results are presented. Initially, the main scope of the papers included in the analysis is presented. Next, bibliographic data pertaining to the included papers are provided to offer a concise overview of the research area and its recent developments, followed by a content analysis and synthesis of the relevant literature to delve into the current state of research and highlight key findings. Finally, Sect.  5 outlines a research agenda for the domain (vom Brocke et al. 2009 ).

4 Results of the systematic review

After outlining the main scope of the relevant publications (4.1), a short bibliographic characterization (4.2) is given. Next, this section presents the results of the systematic review, focussing on the methodology and datasets used (4.3), predictors used for returns forecasting (4.4), and forecasting techniques employed (4.5). The integration of consumer returns forecasting into an existing taxonomy for e-commerce and machine learning (Micol Policarpo et al. 2021 ) summarizes and concludes the presentation of the results.

4.1 Overview and main scope of the relevant publications

Table 3 provides an overview of the forecasting purpose of the papers, the data source for the forecasting, the algorithms employed, and the predictors used in the forecasting models. The contributions of the respective papers regarding forecasting issues are summarized in the Appendix.

For identifying research streams, the publications are analyzed regarding the intention and main scope, as described in the abstract, the respective research questions, and the remainder of the papers. Most papers were assigned to an unequivocal research scope, while some contributed to two key topics (Fig.  3 ).

figure 3

Classification of main scopes (n = 25; not mutually exclusive)

At first, we identified a stream of literature regarding the comparison of different forecasting models and algorithms (Asdecker and Karl 2018 ; Cui et al. 2020 ; Drechsler and Lasch 2015 ; Heilig et al. 2016 ; Hess and Mayhew 1997 ; Hofmann et al. 2020 ; Imran and Amin 2020 ). These papers use existing approaches, adapt them for individual forecasting purposes, apply models to one or more datasets, and compare and evaluate the resulting forecasting performance. One paper claims that the difference in forecasting accuracy between easily interpretable algorithms and more sophisticated ML algorithms is relatively small (Asdecker and Karl 2018 ). This statement is partially confirmed (Cui et al. 2020 ), as the ML algorithms show advantages over simpler models in the training data set but have lower prediction quality due to overfitting issues in the test data. Nevertheless, fine-tuned ML approaches (e.g., deep learning with TabNet) outperform simpler models and gain accuracy when class imbalances are corrected during the data preparation phase (Imran and Amin 2020 ). When confronted with large class imbalances (e.g., low return rates), boosting algorithms like Gradient Boosting work well without oversampling (Hofmann et al. 2020 ). Fundamentally, ensemble models incorporating different techniques show the maximum possible accuracy (Asdecker and Karl 2018 ; Heilig et al. 2016 ). Forecasting return timing is more error-prone than forecasting return decisions, and split-hazard models outperform simple OLS approaches (Hess and Mayhew 1997 ). Time series prediction only works reliably when return rates do not fluctuate heavily (Drechsler and Lasch 2015 ).

The second stream we identified focuses on feature generation or selection and dataset preparation (Ahmed et al. 2016 ; Ding et al. 2016 ; Hofmann et al. 2020 ; Rezaei et al. 2021 ; Samorani et al. 2016 ; Urbanke et al. 2015 , 2017 ). Besides this central topic, some papers also compare different forecasting algorithms (Ahmed et al. 2016 ; Hofmann et al. 2020 ; Rezaei et al. 2021 ; Urbanke et al. 2015 , 2017 ). For example, random oversampling of data with large class imbalances can improve the performance of different forecasting algorithms, while models based only on sales/return history perform worse than models with more features (Hofmann et al. 2020 ). Two similar approaches are based on product, basket, and clickstream data, using different algorithms for feature extraction (Urbanke et al. 2015 , 2017 ). The first developed a Mahalanobis Feature Extraction algorithm, proving superior to other algorithms like principal component analysis or non-negative matrix factorization (Urbanke et al. 2015 ). The second develops a NeuralNet algorithm to extract interpretable features from a high-dimensional dataset, showing superior performance and giving reasonable interpretability of the most important factors (Urbanke et al. 2017 ). For the automated integration of different data sources into single flat tables and the generation of discriminating features, a rolling-path algorithm is developed, improving performance when data is imbalanced (Ahmed et al. 2016 ). Similarly, the software “Dataconda” can automatically generate and integrate relational attributes from different sources into a flat table, which is often the required prerequisite for forecasting algorithms (Samorani et al. 2016 ). A different selection approach clusters the features into groups and applies selection algorithms to the groups, aiming to select a smaller set of attributes (Rezaei et al. 2021 ). Somewhat of an outlier, one paper predicts a seller’s overall daily return volume depending on its current “reputation” as measured by tweets (Ding et al. 2016 ), which requires integrating sentiment analysis into the forecast.
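The random oversampling idea mentioned in this stream can be sketched generically; this is an illustration with invented data, not the implementation used in the cited papers:

```python
import random

def random_oversample(rows, label_index=-1, seed=0):
    """Naive random oversampling: duplicate minority-class rows until all
    classes are equally frequent (a common fix for low return rates)."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_index], []).append(row)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# Hypothetical dataset: three kept orders (label 0), one returned order (label 1).
data = [("a", 0), ("b", 0), ("c", 0), ("d", 1)]
balanced = random_oversample(data)
# balanced now holds three rows of each class
```

Oversampling must be applied to the training split only; duplicating rows before the train/test split leaks information into the evaluation.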

A rather heterogeneous research stream concerns the development of algorithms, heuristics, and models that go beyond a straightforward adaption of existing approaches (Fu et al. 2016 ; Joshi et al. 2018 ; Li et al. 2018 ; Potdar and Rogers 2012 ; Rajasekaran and Priyadarshini 2021 ; Shang et al. 2020 ; Sweidan et al. 2020 ; Zhu et al. 2018 ). Potdar and Rogers ( 2012 ) developed a methodology for forecasting product returns based on reason codes and consumer behavior data. Fu et al. ( 2016 ) developed a conditional probability-based statistical model for predicting return propensities while revealing return reasons and outperforming some baseline benchmark models. Li et al. ( 2018 ) describe their “HyperGo” approach as a ‘framework’ and develop an algorithm for forecasting return intention after basket composition. Zhu et al. ( 2018 ) describe a “LoGraph” random walk algorithm for predicting returned customer/product combinations within their framework. Although Joshi et al. ( 2018 ) label their approach as a “framework”, they describe a specific two-stage algorithm for forecasting return decisions based on network science and ML. Rajasekaran and Priyadarshini ( 2021 ) developed a hybrid metaheuristic-based regression approach to predict return propensities.

Seven papers deal with concepts, meta-models, or substantial frameworks for returns forecasting (Fu et al. 2016 ; Fuchs and Lutz 2021 ; Heilig et al. 2016 ; Hofmann et al. 2020 ; Li et al. 2018 ; Shang et al. 2020 ; Zhu et al. 2018 ). A generic framework for a scalable cloud-based platform, which enables a vertical and horizontal adjustment of resources, could enable the practical real-time use of computationally intensive ML algorithms for forecasting returns in an e-commerce platform (Heilig et al. 2016 ). Two papers (Fuchs and Lutz 2021 ; Hofmann et al. 2020 ) are based on design science research (DSR, Hevner et al. 2004 ) for developing artifacts like meta models and frameworks. The first also refers to CRISP-DM, the “Cross Industry Standard Process for Data Mining” (Wirth and Hipp 2000 ), and develops a shopping-basket-based general forecasting approach suitable across different industries without domain knowledge and attributes needed (Hofmann et al. 2020 ). In a similar approach, based on the basket composition and user interactions, a generic model for real-time return prediction and intervention is developed (Fuchs and Lutz 2021 ) and prepared for integration into an ERP system. Fu et al. ( 2016 ) present a generalized return propensity latent model framework by decomposing returns into different inconsistencies (unmet product expectations, shipping issues, and both factors combined) and enriching the derived propensities with product features and customer profiles. Li et al. ( 2018 ) developed a “HyperGo” framework for forecasting the return intention in real-time after basket composition, including a hypergraph representation of historical purchase and return information. Similarly, Zhu et al. ( 2018 ) developed a “HyGraph” representation of historical customer behavior and customer/product similarity, combined with a “LoGraph” random-walk-based algorithm for predicting customer/product combinations that will be returned. Shang et al. ( 2020 ) discuss two opposing forecasting concepts, demonstrating that their predict-aggregate framework is superior to common and more naïve aggregate-predict approaches.

The last stream covers the detection and forecasting of return fraud and abuse (Drechsler and Lasch 2015 ; John et al. 2020 ; Ketzenberg et al. 2020 ; Li et al. 2019 ). On the employees’ side, one paper tries to automatically predict fraudulent return behavior of agents (employees), e.g., regarding unjustified refunds, with a penalized logit model, enabling a lift in detection (John et al. 2020 ). On the customers’ side, misused returns, a cost-incurring problem, are the forecasting target of different time series prediction models (Drechsler and Lasch 2015 ). Instead of focussing on fraudulent transactions, a trust-aware random walk model identifies consumer anomalies, enabling retailers to apply targeted measures to specific customer groups (selfish, honest, fraud, and irrelevant customers) (Li et al. 2019 ). Similarly, returning customers can be categorized into abusive, legitimate, and nonreturners (Ketzenberg et al. 2020 ). Based on the characterization of abusive return behavior, a neural network classifier recaptures almost 50% of lost profits due to return abuse (Ketzenberg et al. 2020 ).

One paper (Sweidan et al. 2020 ) could not be assigned to the other scopes. It applies a single algorithm (RF) to a given dataset, and it contributes the idea that only forecasted return decisions with high confidence should be used for targeted interventions due to their disproportionately high reliability.

4.2 Bibliographic literature analysis

Forecasting consumer returns has gained more research attention since 2016 (Fig.  4 ). The majority of the sample are conference publications, whose rise precedes that of journal publications by a couple of years. Compared to publications on returns forecasting in the broader context of reverse logistics, which emerged in 2006 (Agrawal et al. 2015), research on consumer returns moved into the spotlight about ten years later. This development is linked to the massive increase in e-commerce sales before and during the pandemic (Alfonso et al. 2021).

figure 4

Publication trend by publication outlet

Out of 9 journal publications in the final sample, only two are published in the same journal (Journal of Operations Management). Out of 16 conference papers, 6 are published at conferences of the Association for Information Systems. In total, 16 of the 25 papers found are published in Information Systems (IS) and related outlets. Others can be assigned to the Management Science / Operations Research discipline (3), Strategy & Management in a broader sense (4), Marketing (1), and Research Methods (1) (Fig.  5 ).

figure 5

Distribution of publication disciplines

Regarding the researchers’ geographical distribution, one paper was jointly published by authors from the US and China; 10 of 25 papers were authored in North America, followed by Germany (7), India (3), China (1), and one paper each from Bangladesh, Singapore, and Sweden.

The most cited paper (200 external citations Footnote 2 ) by Hess and Mayhew (1997) can be regarded as the root of this research field (Table  4 ). However, only 10 out of 24 papers reference this work. Although Urbanke et al. (2015) received only 15 citations in total, it is the second most cited paper within the sample (8 citations) and could be considered the origin of a research strand on returns forecasting in the IS domain. Concerning the remaining papers, no distinct strands of literature are recognizable based on citation analysis.

4.3 Methodology and data characterization

Regarding methodology, most of the papers start with a short narrative literature review regarding their respective focus. Not a single paper was based on interviews, surveys, questionnaires, or field experiments. 3 out of 25 papers formulated and tested conventional hypotheses. All of the publications use quantitative data for analysis and forecasting in a “case study” style, including numerical experiments based on real or simulated data.

Table 5 lists further details about the data used in the publications. 4 out of 25 papers rely on simulated data, and 23 out of 25 use actual data obtained from a retailer; two papers use both data types. 5 papers use more than one dataset (Ahmed et al. 2016; Cui et al. 2020; Rezaei et al. 2021; Samorani et al. 2016; Shang et al. 2020). The most frequently studied industry is fashion/apparel (10 papers), followed by consumer electronics (5 datasets). Two publications are based on data from a Taobao cosmetics retailer, two datasets originate from general retailers with wide assortments, two incorporate building material and hardware store articles, and three publications do not name the products in detail. The previous studies make it evident that consumer returns forecasting is most relevant for e-commerce, as 19 of the 25 publications refer to e-tailers. Nevertheless, 7 publications refer to brick-and-mortar retailing, and direct selling/marketing is represented in 2 datasets.

4.4 Predictors for consumer returns

There is an individual stream of research into factors that influence or help avoid consumer returns (e.g., Asdecker et al. 2017 ; De et al. 2013 ; Walsh and Möhring 2017 ), which is not part of this review. Nevertheless, the forecasting literature gives insights into return drivers, as the input variables (features, predictors, exogenous variables) for forecasting models represent some of these factors. Table 6 presents the most used predictors and tries to map these to the return driver categorization from Sect.  2.2 (Stöcker et al. 2021 ).

Although only some of the publications interpret the predictors, a few insights can be extracted. For total return volume, sales volume is the most critical predictor (Cui et al. 2020; Shang et al. 2020). Historical return volume trends can capture behavioral aspects (e.g., impulse purchases) in a given timeframe (Cui et al. 2020; Shang et al. 2020). The product type significantly impacts return volume (Cui et al. 2020), which is confirmed by widely varying return rates across industries/sectors. Adding transaction-, customer-, or product-level predictors led to a surprisingly small gain in forecasting accuracy (4% reduction in RMSE, Shang et al. 2020). The latter input variables may be more important when forecasting return decisions and propensities.

Regarding product attributes , product or order price is one of the most common predictors, while some papers also include price discounts. In most models, price is hypothesized to increase returns (e.g., Asdecker and Karl 2018 ; Hess and Mayhew 1997 ). Promotional (discounted) orders also seem to result in more returns (Imran and Amin 2020 ), which could be explained by the stimulation of impulse purchases. Footnote 3 Brand perception influences return decisions (positive brands, lower returns) (Samorani et al. 2016 ). The order and return history of products are also relevant for predicting future orders and returns (Hofmann et al. 2020 ). Fit importance as a product attribute does not significantly change return propensities (Hess and Mayhew 1997 ).

Concerning customer attributes , gender appears to be important, as female customers return significantly more items than male customers (Asdecker and Karl 2018; Fu et al. 2016). Younger customers show a slightly lower propensity to return (Asdecker and Karl 2018), while age played a more prominent role in predicting return fraud among employees than among customers (John et al. 2020 observed more fraud among younger employees). Customers with low credit scores returned more (Fu et al. 2016). The return history of a customer is possibly the most important predictor of future return behavior (Samorani et al. 2016). Some papers argue that consumer attributes, including purchase and return history (e.g., number and value of orders), are more relevant predictors than product or transaction profiles, as they reflect more or less stable consumer preferences (Li et al. 2019).

Basket interactions are significant predictors in returns prediction (Urbanke et al. 2017). For example, the larger the basket, the higher the return propensity (Asdecker and Karl 2018). Selection orders (the same product in different sizes or colors) increase the return propensity (Li et al. 2018). Logistics attributes like delivery times show only minor effects (Asdecker and Karl 2018). Regarding the payment method, prepaid products are sent back less frequently than those with post-delivery payment options (Imran and Amin 2020), confirming other research results (Asdecker et al. 2017).

One literature stream focuses on the automated generation of features , as different and large-scale data sources need to be integrated and prepared for forecasting algorithms. Possible interrelationships are thus difficult to find manually, and ML approaches might outperform human analysts (Rezaei et al. 2021). While some approaches generate a large number of features that are hard to interpret (Ahmed et al. 2016), the approach of Urbanke et al. (2017) aims to maintain the interpretability of automatically generated input variables. Automatic feature generation may surface unexpected but meaningful interrelations, e.g., the price of the last returned orders (Samorani et al. 2016). Nevertheless, automatic feature generation can be computation-intensive; thus, integrating feature selection in parallel could be advantageous for large datasets (Rezaei et al. 2021).
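A minimal sketch of what automated feature generation with integrated selection can look like, inspired by but not reproducing Rezaei et al. (2021): pairwise interaction features are generated and then filtered by their correlation with the return label. All feature names and data are invented for the example.

```python
# Toy automated feature generation (pairwise interactions) followed by
# correlation-based selection; names and data are illustrative only.
from itertools import combinations
from statistics import mean, pstdev

def generate_interactions(rows, names):
    """Add a product feature for every pair of base features."""
    new_names = list(names)
    out = [list(r) for r in rows]
    for i, j in combinations(range(len(names)), 2):
        new_names.append(f"{names[i]}*{names[j]}")
        for r_out, r in zip(out, rows):
            r_out.append(r[i] * r[j])
    return out, new_names

def select_by_correlation(rows, names, target, k=2):
    """Rank features by |Pearson correlation| with the target, keep top k."""
    def corr(col):
        mx, my = mean(col), mean(target)
        sx, sy = pstdev(col), pstdev(target)
        if sx == 0 or sy == 0:
            return 0.0
        return mean((x - mx) * (y - my) for x, y in zip(col, target)) / (sx * sy)
    scores = [(abs(corr([r[i] for r in rows])), names[i]) for i in range(len(names))]
    return [n for _, n in sorted(scores, reverse=True)[:k]]

rows = [[1, 0], [1, 1], [0, 0], [0, 1]]  # [discounted, apparel]
names = ["discounted", "apparel"]
returned = [0, 1, 0, 0]                  # only the discounted apparel order returned
X, all_names = generate_interactions(rows, names)
print(select_by_correlation(X, all_names, returned))
```

In this toy case, neither base feature alone explains the return, but the automatically generated interaction `discounted*apparel` correlates perfectly with the label and is selected first.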

A remarkable research path based on artificial intelligence is the integration of qualitative information as predictors, such as product reviews that go beyond numerical feedback (Rajasekaran and Priyadarshini 2021) or tweets. These data can be processed and made accessible for forecasting with ML-based sentiment analysis techniques (Ding et al. 2016).
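As a deliberately simple stand-in for such ML-based sentiment analysis, the sketch below scores review text against small, invented word lists; an approach like that of Ding et al. (2016) would use trained models instead.

```python
# Lexicon-based sentiment scoring as a toy stand-in for ML-based
# sentiment analysis; the word lists are invented for this example.

POSITIVE = {"great", "perfect", "comfortable", "recommend"}
NEGATIVE = {"small", "cheap", "broken", "disappointing", "returned"}

def sentiment_score(review):
    """Score in [-1, 1]; negative reviews may signal higher return risk."""
    words = review.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

print(sentiment_score("Great fit and very comfortable"))  # 1.0
print(sentiment_score("Runs small and feels cheap"))      # -1.0
```

The resulting score could then enter a forecasting model as one additional numeric predictor alongside transaction and customer attributes.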

4.5 Forecasting techniques and algorithms

To describe the techniques and algorithms employed, we sorted the papers by forecasting purpose as described in Sect.  2 , then assigned them to different algorithms, either from time series forecasting, statistical techniques, or ML algorithms. Table 7 lists all papers for which an assignment was possible, and the respective techniques used. If a comparison was possible, the best-performing algorithm is marked in this table.

The approaches listed in Table  7 do not overlap, but some papers use more than one variant of an approach, i.e., more than one algorithm from a category. For example, TabNet is a deep learning variant of neural networks (NN), and different variants of gradient boosting (CatBoost/LightGBM, not differentiated in the table) are compared in one paper (Imran and Amin 2020).

The algorithm used most frequently (Fig.  6 ) is the Random Forest algorithm (RF, 10 papers), followed by Support Vector Machines (SVM, 8 papers), Neural Networks (NN, 6 papers), logistic regression (Logit, 6 papers), GradientBoosting (5 papers), Ordinary Least Squares regression (OLS, 4 papers), Adaptive Boosting (AdaBoost), Linear Discriminant Analysis (LDA), and CART (Classification and Regression Trees, 3 papers each).

figure 6

Most frequently used algorithms (used in at least three papers)

The papers focusing on return volume use time series forecasts like (AutoRegressive) Moving Averages (MA), Single Exponential Smoothing (SES), and Holt-Winters Smoothing (HWS) more frequently than ML algorithms. Nevertheless, when considering a predict-aggregate approach as proposed by Shang et al. (2020), ML techniques could be helpful for forecasting return decisions first and aggregating the propensity results into a volume prediction in a second step.
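For illustration, two of the time series techniques named above can be implemented in a few lines; the weekly volume series is invented, and the smoothing parameter is an arbitrary choice rather than a value from any reviewed paper.

```python
# Toy implementations of single exponential smoothing (SES) and a
# simple moving average (MA) for a weekly return volume series.

def single_exponential_smoothing(series, alpha=0.3):
    """One-step-ahead SES forecast: level is updated towards each
    new observation with weight alpha."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def moving_average(series, window=3):
    """Naive forecast: mean of the last `window` observations."""
    return sum(series[-window:]) / window

weekly_returns = [100, 110, 105, 120, 115]  # returned parcels per week
print(round(single_exponential_smoothing(weekly_returns), 1))  # 110.5
print(moving_average(weekly_returns))
```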

In forecasting binary return decisions, Random Forests (RF) (Ahmed et al. 2016 ; Heilig et al. 2016 ; Ketzenberg et al. 2020 ), Neural Networks (NN) (Imran and Amin 2020 ; Ketzenberg et al. 2020 ), as well as Adaptive Boosting (AdaBoost) (Urbanke et al. 2015 , 2017 ) showed high prediction performance. The performance of different algorithms varies depending on the data set, the implementation, and the parameterization used. For this reason, it is hardly possible to make a generally valid statement regarding performance levels. Combining several algorithms in ensembles (Asdecker and Karl 2018 ; Heilig et al. 2016 ) seems advantageous, at least for retrospective analytical purposes, when the required computing resources are less relevant.
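One common way to realize such an ensemble is majority voting, sketched below with three rule-based stand-ins for trained classifiers such as RF, SVM, or NN; the rules and thresholds are invented, not taken from the cited papers.

```python
# Majority-vote ensemble sketch for binary return decisions.

def vote(predictions):
    """Majority vote over binary return (1) / keep (0) predictions."""
    return 1 if sum(predictions) > len(predictions) / 2 else 0

def model_a(basket):   # invented rule: big baskets are returned more often
    return 1 if basket["basket_size"] > 3 else 0

def model_b(basket):   # invented rule: selection orders cause returns
    return 1 if basket["same_item_sizes"] else 0

def model_c(basket):   # invented rule: heavy past returners keep returning
    return 1 if basket["past_return_rate"] > 0.5 else 0

def ensemble_predict(basket, models=(model_a, model_b, model_c)):
    return vote([m(basket) for m in models])

basket = {"basket_size": 5, "same_item_sizes": True, "past_return_rate": 0.2}
print(ensemble_predict(basket))  # 1: two of three models predict a return
```

The voting step is cheap; the cost of such ensembles lies in training and running several base models, which is why they suit retrospective analysis better than real-time use.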

When evaluating different forecasting algorithms for return decisions, imbalanced classes (especially evident for the low return shares in non-fashion datasets) seem to be handled differently by different algorithms. Class imbalances might distort comparison results in some publications. Random oversampling as a data preparation measure can mitigate this problem (Hofmann et al. 2020).
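The resampling step can be sketched generically (this is standard random oversampling, not code from Hofmann et al. 2020): minority-class rows are duplicated at random until the classes are balanced.

```python
# Generic random oversampling for an imbalanced return dataset.
import random

def random_oversample(rows, labels, seed=42):
    """Duplicate random minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target = max(len(v) for v in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        resampled = members + [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(resampled)
        out_labels.extend([y] * target)
    return out_rows, out_labels

# 1 return among 5 orders: heavily imbalanced, as typical outside fashion
rows = [[1], [2], [3], [4], [5]]
labels = [0, 0, 0, 0, 1]
bal_rows, bal_labels = random_oversample(rows, labels)
print(bal_labels.count(0), bal_labels.count(1))  # 4 4
```

Importantly, oversampling should be applied to the training split only, so that evaluation still reflects the true class distribution.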

High-performance algorithms are needed for real-time predictions, e.g., graph and random-walk-based (Li et al. 2018 ; Zhu et al. 2018 ). According to Li et al. ( 2018 ), the proposed algorithm “HyperGo” performs best for most performance metrics.
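To illustrate why graph-based scoring can be cheap at prediction time, the sketch below runs a generic random walk with restart on a small customer–product return graph; it is a strongly simplified stand-in for, not an implementation of, algorithms like HyperGo or HyGraph.

```python
# Generic random walk with restart over a customer-product return graph;
# the graph and interpretation are invented for this illustration.

def random_walk_scores(edges, start, restart=0.15, steps=50):
    """Power-iterate restart-walk probabilities from `start` over an
    undirected graph given as a list of (node, node) edges."""
    nodes = {n for e in edges for n in e}
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    p = {n: (1.0 if n == start else 0.0) for n in nodes}
    for _ in range(steps):
        nxt = {n: (restart if n == start else 0.0) for n in nodes}
        for n, mass in p.items():
            share = (1 - restart) * mass / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        p = nxt
    return p

# Customer c1 previously returned p1; p2 is linked to p1 via customer c2,
# while p3 sits in an unconnected component.
edges = [("c1", "p1"), ("c2", "p1"), ("c2", "p2"), ("c3", "p3")]
scores = random_walk_scores(edges, "c1")
print(scores["p2"] > scores["p3"])  # True: p2 is reachable from c1, p3 is not
```

Products with high walk scores relative to a customer would be flagged as return-prone; per query, the cost grows only with the local graph size, which is what makes such approaches attractive for real-time use.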

4.6 E-Commerce and machine learning taxonomy extension

In their literature review on the use of ML techniques in e-commerce, Micol Policarpo et al. (2021) propose a taxonomy to visualize specific ML algorithms in the context of e-commerce platforms. This novel kind of taxonomy is based on directed acyclic graphs, i.e., all input nodes need to be fulfilled to reach the target. The first level of the taxonomy represents different target goals for the use of ML in e-commerce. While returns forecasting (“product return prediction”) is identified as an essential goal among others (purchase prediction, repurchase prediction, customer relationship management, discovering relationships between data, fraud detection, and recommendation systems), it was excluded from the taxonomy they developed, possibly because their review comprised only two relevant papers on this topic (Micol Policarpo et al. 2021). The review at hand proposes an extension of Micol Policarpo et al.’s taxonomy, renaming the goal to “consumer returns forecasting”. This extension reflects and synthesizes the consumer returns forecasting studies reviewed.

The middle level of the taxonomy represents properties and features that support this superordinate goal. On this level, our extension does not include return fraud detection, which we propose to integrate into the existing category of “fraud detection”, separated into transaction analysis and consumer analysis (Micol Policarpo et al. 2021). Circles represent the data necessary to execute the analysis, referring to categories introduced in Micol Policarpo et al. (2021), with an additional “return history” category. The bottom level presents the most frequently described algorithms, with some streamlining toward the tools and approaches that appear most common or most appropriate.

The schematic in Fig.  7 is to be read as follows: In the context of E-Commerce  +  Artificial Intelligence (Layer 1), Consumer Return Forecasting (Layer 2) is an essential goal among six other goals. Layer 3 presents different purposes of analysis, which form the basis for return forecasting. Realtime Basket Analysis draws on clickstream data and basket composition (browsing activities) to target interventions. Basket analysis benefits from customer and product information (dotted line). Graph-based approaches (Li et al. 2018; Zhu et al. 2018) are promising for real-time analysis due to their lower computing requirements, although cloud-based implementation of more complex algorithms or ensemble models might be feasible (Fuchs and Lutz 2021; Heilig et al. 2016; Hofmann et al. 2020). Customer Analysis and Product Analysis (e.g., Potdar and Rogers 2012) require adequate Data Preparation in the sense of input variable generation, extraction, and selection (Urbanke et al. 2015, 2017). For these purposes, data regarding return history (e.g., Hofmann et al. 2020; Ketzenberg et al. 2020), purchase history (e.g., Cui et al. 2020; Fu et al. 2016), customer personal information (e.g., Heilig et al. 2016; Ketzenberg et al. 2020), clickstream data, and browsing activities are required as input (shown by cross-hatched circles). For each purpose, one or more possible algorithms are shown.

figure 7

Proposed consumer returns forecasting extension to the E-commerce and Machine Learning techniques taxonomy of Micol Policarpo et al. ( 2021 , p. 13)

Compared to predicting purchase intention, return predictions seem to require more levels of data. Nevertheless, even simple rule-based interventions promise benefits; e.g., selection orders that inevitably lead to a return shipment can be easily recognized (Hofmann et al. 2020; Sweidan et al. 2020). When considering more complex interrelations, different ML techniques are helpful for data preparation and input variable (feature) extraction and generation. NeuralNet is one example of automatic selection of relevant features (Urbanke et al. 2017). These approaches not only enhance forecasting accuracy (Rezaei et al. 2021) but can also render the many possible variables interpretable with respect to their content.

5 Discussion

The analysis of the papers above revealed that research in this discipline is heterogeneous and partly fragmented, and clear-cut research strands are still hard to identify. Further publications are thus needed to render this research field more comprehensive. Below, research opportunities are derived and embedded in a conceptual research framework built on the results of the existing literature, also integrating the extension of the E-Commerce and Machine Learning taxonomy (Fig.  7 ). A conceptual framework improves the understanding of a complex topic by naming and explaining the key concepts and relationships important to a specific field (Jabareen 2009; Miles et al. 2020). Thus, this framework aims to organize the problems and solutions discussed in the consumer returns forecasting literature and to embed and classify potential future research topics in the existing knowledge base (Ravitch and Riggan 2017). The subsections following the framework outline potential research avenues (P1–P6) that have been touched on in the past but still leave considerable opportunities for further insights. These proposals should not be seen as comprehensive, given the numerous other research opportunities in this field, but rather as a prioritization based on the current literature.

The framework derived (Fig.  8 ) underlines the interdisciplinary nature of this research field, integrating different perspectives (information systems research, marketing and operations, and strategy and management). From a managerial point of view, the literature included in this review is biased towards the information systems perspective. Thus, in contrast to the framework developed by Cirqueira et al. (2020) for purchase prediction, we do not take a process perspective but instead emphasize the interdependencies and interactions between research topics and highlight the managerial need to take a strategic perspective similar to the framework developed by Winklhofer et al. (1996). Consequently, a meta-layer on forecasting frameworks and practices includes the mainly technical development frameworks in this review but also accentuates the need for further research regarding actual organizational forecasting practices (e.g., P2, P5, P6). Around this meta-layer, related research strands are linked in order to embed the topic of returns forecasting in the research landscape. In general, for example, forecasting purchases and returns could be linked (P6), also affecting inventory decisions.

figure 8

Conceptual Consumer Return Forecasting Framework

The center of the framework consists of three dimensions: purposes and tasks, predictors, and techniques. Depending on the strategic purpose, tasks are derived that determine (1) the data (predictors) needed and (2) the usable techniques to execute the forecasting. Different forecasting techniques require an individual set of predictors, whereas the availability of specific data allows and determines the use of more or less sophisticated algorithms.

In the literature, some forecasting purposes were more pronounced (return decisions or propensities), while others have received less attention (return timing, P1). Regarding the data necessary for accurate forecasting, the return predictors discussed were often hardly comparable, as they originated from different data sources and industries, related to different dimensions, or were aggregated differently. Systematically linking forecasting predictors and research on return drivers and reasons could contribute significant insights (P4) that, from a marketing perspective, may support the development of effective preventive instruments. Furthermore, the literature mainly refers to the fashion or consumer electronics industry, leaving room to validate the findings in the context of other industries (P3).

When (automatically) selecting or creating predictors, the boundaries between predictors and prediction techniques blur, as machine learning algorithms prepare the input data before executing a forecasting model. Regarding forecasting techniques, time series forecasting was seldom used in recent publications. Machine learning algorithms were the most popular subject of investigation, with random forests, support vector machines, and neural networks as the most popular implementations. Classical statistical models like logit models for return decisions or OLS regression received less research attention. Literature on end-of-life return forecasting could complement the research on techniques and their accuracy. Most publications used technical indicators to assess the accuracy of forecasting models, which reflects the information systems perspective. From a managerial position, evaluating the (monetary) performance outcomes (e.g., Ketzenberg et al. 2020) of forecasting systems should be more relevant.

5.1 Research proposal P1: return timing for consumer returns

Toktay et al. ( 2004 ) encouraged the integrated forecasting of the return rate and the return time lag. In line with this, Shang et al. ( 2020 ) criticize the missing focus on the timing of return forecasts. The reviewed literature confirms that forecasting return propensities and decisions are more prominent than timing and volume forecasts. While the knowledge of when a return is expected is vital in managing end-of-life returns that occur over the years, for retail consumer returns, return periods are mostly 14–30 days. Thus, the variability of return timing seems limited compared to end-of-life returns in this context, which makes this forecasting purpose less critical. Nevertheless, some retailers offer up to 100 days of free returns (e.g., Zalando). Consequently, more studies about the importance of return timing forecasts in the e-commerce context from a business and planning perspective and their interdependence with return processing or warehousing issues could shed light on this topic and complement the current literature (Toktay et al. 2004 ; Shang et al. 2020 ).

5.2 Research proposal P2: real-time forecasting systems

Another research gap became apparent regarding the real-time use of forecasting systems and the associated activities and interventions, building on the initial research and the frameworks already published (e.g., Heilig et al. 2016 ; Urbanke et al. 2015 ). The generic framework developed by Fuchs and Lutz ( 2021 ) could serve as a launching pad for this stream of research.

The paper from Ketzenberg et al. ( 2020 ) could act as a stimulus and inspiration for a similar approach, not only focusing on return abuse as already examined but on return forecasting in general, the possible associated interventions for various consumer groups, and the resulting consequences for the retailer’s profit. Even the methodology of customer classification could be helpful for many retailers in targeting interventions.

Before real-time return forecasting is implemented, associated preventive return management instruments need to be designed and evaluated. Many of these measures are discussed (e.g., Urbanke et al. 2015 ; Walsh et al.  2014 ), but an overview of which preventive measures (for some examples, see Walsh and Möhring 2017 ) are effective in general (1) and how forecasting accuracy interdepends with their usefulness (2) is still missing, to substantially link the topics of forecasting and interventions. No answers could be found to the call by Urbanke et al. ( 2015 ) for field experiments to investigate such a link.

Thanks to cloud and parallelization technologies and the associated scalability of computing power (Bekkerman et al. 2011 ), algorithm runtimes are becoming less relevant. However, especially for real-time use, it should be evaluated which algorithms and underlying datasets exhibit an appropriate relationship between the targeted forecasting accuracy, the expected benefit, and the required computing power.

Recommendations concerning the algorithms and techniques can be derived (Urbanke et al. 2015 ), and a generic implementation framework was developed (Fuchs and Lutz 2021 ). However, from a business perspective, no contributions could be found regarding the actual implementation of real-time forecasting systems, the interventions involved, and their impact on consumer behavior or profit (also see proposal P5). In addition, the implementations of such systems need to be analyzed concerning the cost-effectiveness of the required investments.

5.3 Research proposal P3: cross-industry and multiple dataset studies

Many publications rely on a single data set from a specific industry or retailer. Only a few compare several retailers (e.g., Cui et al. 2020 ). Studies including and comparing different countries are missing, which is especially interesting since legal regulations for returns vary. For example, in contrast to the U.S., citizens within the EU are granted a 14-day right of withdrawal for distance selling purchases. Footnote 4 Although in most developed countries, liberal and broadly comparable returns policies are standard in practice due to competitive pressure, the generalizability of the results is frequently limited. One remedy for this problem is to use multiple data sets from different retailers (e.g., electronics vs. jewelry, Shang et al. 2020 ). Admittedly, it is challenging to simultaneously collaborate with several retailers and to combine different data sets, due to reasons of preserving corporate privacy and synchronizing various data sources. Nevertheless, research needs to draw conclusions from single data points, as well as logically replicate or falsify those results by integrating more data points to find patterns of similarities and differences, either within or cross-study (Hamermesh 2007 ). Therefore, we suggest that future studies acquire industry-related datasets from several retailers at once or replicate existing studies, which aligns with the aim and scope of Management Review Quarterly (Block and Kuckertz 2018 ). Cross-industry or cross-country manuscripts, which go beyond the mere assertion of an industry-agnostic approach (Hofmann et al. 2020 ) and jointly investigate data from several sectors, would promise an additional gain in knowledge and could be less challenging from a privacy perspective.

5.4 Research proposal P4: extended study of relevant predictors in forecasting applications

Although not the main focus of this review, predictors of consumer returns are especially interesting for marketing and e-commerce research, for example, regarding preventive measures for avoiding returns. In the past, many consumer return papers highlighted single aspects or a limited selection of return drivers or preventive measures but rarely attempted to model return behavior as comprehensively as possible. However, the latter is the very objective of returns forecasting, which is why the findings on influencing factors in articles with a forecasting focus tend to be more holistic, although not sufficiently complete (Hachimi et al. 2018). Some return reasons named in the literature (e.g., Stöcker et al. 2021) have not yet been included in forecasting approaches, and vice versa, only a part of the influencing factors investigated could be mapped to a return reason categorization. The reason categories assigned (Sect.  4.4 , Table  6 ) still contain some uncertainty. For example, a customer’s product return history may reflect the general returning behavior of that customer to some extent, while it cannot be ruled out that repeated logistical problems caused the returns. Product attributes may reflect information gaps that consumers can only assess after physically inspecting the product, whereas the product price (a frequently cited and influential product attribute) is only related to information gaps when considering the price-performance ratio (Stöcker et al. 2021). Technical information about the web browser or device used by the customer is difficult to categorize, as it may reflect behavioral (impulse-driven mobile shopping) as well as informational (a small display with little visible information) aspects. The payment method chosen by a customer, for example, could not be linked to any of the reason categories.

This reasoning should serve as a basis for linking forecasting predictors and return reasons more closely in the future. For example, the respective relative weighting of return drivers is more likely to be obtained considering as many factors involved as possible, minimizing the unexplained variation. From the reviewed literature, we extracted 18 different return predictor categories. For instance, seven papers (Cui et al. 2020 ; Fu et al. 2016 ; Ketzenberg et al. 2020 ; Li et al. 2018 , 2019 ; Urbanke et al. 2015 , 2017 ) integrated more than five predictor categories. But even though some papers integrate more than 5,000 features for automated feature selection (Ketzenberg et al. 2020 ), there are still combinations of input variable categories that have not been investigated and, more importantly, interpreted yet. Therefore, we call for more comprehensive research on return predictors and their interpretation, including associated preventive return measures, in the context of return forecasting.

5.5 Research proposal P5: descriptive case studies and business implementation surveys

This review identified a lack of publications regarding the actual benefit and the diffusion of consumer returns forecasting systems in different scopes and industries, building on the papers presenting return forecasting frameworks. In 2013, less than half of German retailers analyzed the likelihood of returns (Pur et al. 2013 ). Most of those who did were using naïve approaches that might be outperformed by the models presented in this review. Still, we do not know the status quo regarding the degree of adoption and implementation of forecasting systems for consumer returns in e-commerce firms (e.g., see Mentzer and Kahn 1995 for sales forecasting systems), country-specific and internationally.

Furthermore, the impact of return forecasting practices on company performance should be examined not only through modeling but also on retrospective data (e.g., see Zotteri and Kalchschmidt 2007 for a similar study on demand forecasting practices in manufacturing). A possible hypothesis to examine is that accuracy measures like RMSE or precision/recall, and subsequently even the choice of the most accurate machine learning algorithm (e.g., see Asdecker and Karl 2018), are less relevant from a business perspective: (1) no algorithm clearly outperforms all others, and (2) the correlation between technical indicators and business value is unstable (Leitch and Tanner 1991). Methodologically, implementations of consumer returns forecasting in e-commerce should thus be surveyed and analyzed with multivariate statistical methods to examine critical factors and circumstances of return forecasting systems, similar to publications on reverse logistics performance (Agrawal and Singh 2020).

5.6 Research proposal P6: holistic forward and backward forecasting framework for e-tailers

Some publications present frameworks for forecasting returns (Fuchs and Lutz 2021). Nevertheless, forecasting in retail, and especially in e-commerce, has commonly focused more on demand (Micol Policarpo et al. 2021) than on returns. Current approaches for demand forecasting try to predict individual purchase intentions based on clickstream data, online session attributes, and customer history (e.g., Esmeli et al. 2021). Our systematic approach could not identify any paper that connects and integrates both directions in e-commerce forecasting, neither conceptually (frameworks) nor quantitatively or in case-study form. Nevertheless, first implementations of return predictions in inventory management have been presented (e.g., Goedhart et al. 2023). Subsequently, similar to Goltsos et al. (2019), we call for research addressing both demand and return uncertainties through a holistic forecasting framework in the context of e-commerce.

6 Conclusion

To date, no systematic literature review has undertaken an in-depth exploration of forecasting consumer returns in the e-commerce context. Previous reviews have primarily focused on product returns forecasting within the broader context of reverse logistics or closed-loop supply chain management (Agrawal et al. 2015; Ambilkar et al. 2021; Hachimi et al. 2018). Regrettably, the interdisciplinary nature of this subject has often been overlooked, neglecting in particular the results of information systems research.

The review first aims to provide an overview of the existing literature (Kraus et al. 2022 ) on forecasting consumer returns. The findings confirm that this once novel topic has significantly evolved in recent years. Consequently, this review is timely in examining current gaps and establishing a robust foundation for future research, which forms a second goal of systematic reviews (Kraus et al. 2022 ). The current body of work encompasses various aspects from different domains, including marketing, operations management/research, and information systems research, highlighting the interdisciplinary nature of e-commerce analytics and research. As a result, future studies can find suitable publication outlets in domain-specific as well as methodologically oriented journals and conferences.

Scientifically, the algorithms and predictors investigated in previous research serve as a foundational reference for subsequent publications and informed decisions regarding research design, ensuring that specific predictors and techniques are not overlooked. Researchers can utilize this review and the research framework developed as a structuring guide, e.g., regarding relevant publications on already examined algorithms or predictors.

Managerially, the extended taxonomy for machine learning in e-commerce (Micol Policarpo et al. 2021 ) can serve as a guideline for implementing forecasting systems for consumer returns. This review classifies possible prediction purposes, allowing businesses to apply them based on their respective challenges. Exploring the most frequently used predictors reveals the data that must be collected for the respective purposes. This review also offers valuable insights into data (pre-)processing and highlights popular algorithms. Furthermore, frameworks are outlined that support the design and implementation phase of such forecasting systems, supporting analytical purposes or enabling direct interventions during the online shopping process flow. As an exemplary and promising application, return policies could be personalized (Abbey et al. 2018 ) by identifying opportunistic or fraudulent basket compositions or high-returning customers, thereby reducing unwanted returns (Lantz and Hjort 2013 ).

Finally, a limitation of this review is the exclusion of forecasting algorithms for end-of-use returns, which could potentially be applicable to forecasting shorter-term retail consumer returns. Because the closed-loop supply chain and reverse logistics literature has been systematically excluded here, future reviews could synthesize previous reviews on reverse logistics forecasting with the more detailed findings presented in this paper.

The use of Google Scholar for systematic scientific information search is controversially discussed (e.g., Halevi et al. 2017 ) due to its lack of quality control and indexing guidelines, as well as its limited advanced search options. As an additional database for an initial search, however, the wide coverage of this search system can enrich the results.

External citations according to Google Scholar, which is preferable to controlled databases for citation tracking (Halevi et al. 2017 ).

Other literature also describes a counteracting effect of a reduced price due to lowered quality expectations or a higher perceived value of the “deal” itself (e.g., Sahoo et al. 2018 ).

It should be noted that the relevance of the forecasting topic depends on the maturity of the e-commerce sector. In most developing countries, B2C e-commerce is comparatively young and consumer returns are not yet a common phenomenon, which is why research on return forecasts is relatively insignificant for these countries.

References

Abbey JD, Ketzenberg ME, Metters R (2018) A more profitable approach to product returns. MIT Sloan Manag Rev 60(1):71–74

Abdulla H, Ketzenberg ME, Abbey JD (2019) Taking stock of consumer returns: a review and classification of the literature. J Oper Manag 65(6):560–605. https://doi.org/10.1002/joom.1047

Agrawal S, Singh RK (2020) Forecasting product returns and reverse logistics performance: structural equation modelling. MEQ 31(5):1223–1237. https://doi.org/10.1108/MEQ-05-2019-0109

Agrawal S, Singh RK, Murtaza Q (2015) A literature review and perspectives in reverse logistics. Resour Conserv Recycl 97:76–92. https://doi.org/10.1016/j.resconrec.2015.02.009

Ahmed F, Samorani M, Bellinger C, Zaiane OR (2016) Advantage of integration in big data: feature generation in multi-relational databases for imbalanced learning. In: Proceedings of the 4th IEEE international conference on big data, pp 532–539. https://doi.org/10.1109/BigData.2016.7840644

Ahsan K, Rahman S (2016) An investigation into critical service determinants of customer to business (C2B) type product returns in retail firms. Int Jnl Phys Dist Log Manage 46(6/7):606–633. https://doi.org/10.1108/IJPDLM-09-2015-0235

Akter S, Wamba SF (2016) Big data analytics in e-commerce: a systematic review and agenda for future research. Electron Markets 26(2):173–194. https://doi.org/10.1007/s12525-016-0219-0

Alfonso V, Boar C, Frost J, Gambacorta L, Liu J (2021) E-commerce in the pandemic and beyond. BIS Bulletin 36

Ambilkar P, Dohale V, Gunasekaran A, Bilolikar V (2021) Product returns management: a comprehensive review and future research agenda. Int J Prod Res. https://doi.org/10.1080/00207543.2021.1933645

Asdecker B (2015) Returning mail-order goods: analyzing the relationship between the rate of returns and the associated costs. Logist Res 8(1):1–12. https://doi.org/10.1007/s12159-015-0124-5

Asdecker B, Karl D (2018) Big data analytics in returns management–are complex techniques necessary to forecast consumer returns properly? In: Proceedings of the 2nd international conference on advanced research methods and analytics, Valencia, pp 39–46. https://doi.org/10.4995/CARMA2018.2018.8303

Asdecker B, Karl D, Sucky E (2017) Examining drivers of consumer returns in e-tailing with real shop data. In: Proceedings of the 50th Hawaii international conference on system sciences (HICSS). https://doi.org/10.24251/HICSS.2017.507

Bandara K, Shi P, Bergmeir C, Hewamalage H, Tran Q, Seaman B (2019) Sales Demand forecast in e-commerce using a long short-term memory neural network methodology. In: Gedeon T, Wong KW, Lee M (eds) Neural information processing: proceedings of the 26th international conference on neural information processing, 1st edn., vol 11955, pp 462–474. https://doi.org/10.1007/978-3-030-36718-3_39

Barbosa MW, La Vicente AdC, Ladeira MB, de Oliveira MPV (2018) Managing supply chain resources with big data analytics: a systematic review. Int J Log Res Appl 21(3):177–200. https://doi.org/10.1080/13675567.2017.1369501

Bekkerman R, Bilenko M, Langford J (2011) Scaling up machine learning. In: Proceedings of the 17th ACM SIGKDD international conference tutorials, p 1. https://doi.org/10.1145/2107736.2107740

Bernon M, Cullen J, Gorst J (2016) Online retail returns management. Int J Phys Distrib Logist Manag 46(6/7):584–605. https://doi.org/10.1108/IJPDLM-01-2015-0010

Block J, Kuckertz A (2018) Seven principles of effective replication studies: strengthening the evidence base of management research. Manag Rev Q 68(4):355–359. https://doi.org/10.1007/s11301-018-0149-3

Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, CA

Cirqueira D, Hofer M, Nedbal D, Helfert M, Bezbradica M (2020) Customer purchase behavior prediction in e-commerce: a conceptual framework and research Agenda. In: Ceci M, Loglisci C, Manco G, Masciari E, Raś Z (eds) New frontiers in mining complex patterns, vol 11948. Springer, Cham, pp 119–136. https://doi.org/10.1007/978-3-030-48861-1_8

Clottey T, Benton WC (2014) Determining core acquisition quantities when products have long return lags. IIE Trans 46(9):880–893. https://doi.org/10.1080/0740817X.2014.882531

Cook SC, Yurchisin J (2017) Fast fashion environments: consumer’s heaven or retailer’s nightmare? Int J Retail Distrib Manag 45(2):143–157. https://doi.org/10.1108/IJRDM-03-2016-0027

Cui H, Rajagopalan S, Ward AR (2020) Predicting product return volume using machine learning methods. Eur J Oper Res 281(3):612–627. https://doi.org/10.1016/j.ejor.2019.05.046

Dalecke S, Karlsen R (2020) Designing dynamic and personalized nudges. In: Chbeir R, Manolopoulos Y, Akerkar R, Mizera-Pietraszko J (eds) Proceedings of the 10th international conference on web intelligence, mining and semantics. ACM, New York, pp 139–148. https://doi.org/10.1145/3405962.3405975

De P, Hu Y, Rahman MS (2013) Product-oriented web technologies and product returns: an exploratory study. Inf Syst Res 24(4):998–1010. https://doi.org/10.1287/isre.2013.0487

de Brito MP, Dekker R, Flapper SDP (2005) Reverse logistics: a review of case studies. In: Klose A, Fleischmann B (eds) Distribution logistics, vol 544. Springer, Berlin, Heidelberg, pp 243–281

Denyer D, Tranfield D (2009) Producing a systematic review. In: Buchanan DA, Bryman A (eds) The Sage handbook of organizational research methods. Sage, Thousand Oaks, CA, pp 671–689

Difrancesco RM, Huchzermeier A, Schröder D (2018) Optimizing the return window for online fashion retailers with closed-loop refurbishment. Omega 78:205–221. https://doi.org/10.1016/j.omega.2017.07.001

Diggins MA, Chen C, Chen J (2016) A review: customer returns in fashion retailing. In: Choi T-M (ed) Analytical modeling research in fashion business. Springer, Singapore, pp 31–48. https://doi.org/10.1007/978-981-10-1014-9_3

Ding Y, Xu H, Tan BCY (2016) Predicting product return rate with “tweets”. In: Proceedings of the 20th Pacific asia conference on information systems

Drechsler S, Lasch R (2015) Forecasting misused e-commerce consumer returns. In: Logistics management: proceedings of the 9th conference “Logistikmanagement”. Cham, pp 203–215.

Duong QH, Zhou L, Meng M, van Nguyen T, Ieromonachou P, Nguyen DT (2022) Understanding product returns: a systematic literature review using machine learning and bibliometric analysis. Int J Prod Econ 243:108340. https://doi.org/10.1016/j.ijpe.2021.108340

Esmeli R, Bader-El-Den M, Abdullahi H (2021) Towards early purchase intention prediction in online session based retailing systems. Electron Markets 31(3):697–715. https://doi.org/10.1007/s12525-020-00448-x

Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181

Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x

Frei R, Jack L, Brown S (2020) Product returns: a growing problem for business, society and environment. IJOPM 40(10):1613–1621. https://doi.org/10.1108/IJOPM-02-2020-0083

Frei R, Jack L, Krzyzaniak S-A (2022) Mapping product returns processes in multichannel retailing: challenges and opportunities. Sustainability 14(3):1382. https://doi.org/10.3390/su14031382

Fu Y, Liu G, Papadimitriou S, Xiong H, Li X, Chen G (2016) Fused latent models for assessing product return propensity in online commerce. Decis Support Syst 91:77–88. https://doi.org/10.1016/j.dss.2016.08.002

Fuchs K, Lutz O (2021) A stitch in time saves nine–a meta-model for real-time prediction of product returns in ERP systems. In: Proceedings of the 29th european conference on information systems

Ge D, Pan Y, Shen Z-J, Di Wu, Yuan R, Zhang C (2019) Retail supply chain management: a review of theories and practices. J Data Manag 1:45–64. https://doi.org/10.1007/s42488-019-00004-z

Goedhart J, Haijema R, Akkerman R (2023) Modelling the influence of returns for an omni-channel retailer. Eur J Oper Res 306(3):1248–1263. https://doi.org/10.1016/j.ejor.2022.08.021

Goltsos TE, Ponte B, Wang SX, Liu Y, Naim MM, Syntetos AA (2019) The boomerang returns? Accounting for the impact of uncertainties on the dynamics of remanufacturing systems. Int J Prod Res 57(23):7361–7394. https://doi.org/10.1080/00207543.2018.1510191

Govindan K, Bouzon M (2018) From a literature review to a multi-perspective framework for reverse logistics barriers and drivers. J Clean Prod 187:318–337. https://doi.org/10.1016/j.jclepro.2018.03.040

Hachimi HEL, Oubrich M, Souissi O (2018) The optimization of reverse logistics activities: a literature review and future directions. In: Proceedings of the 5th IEEE international conference on technology management, operations and decisions, Piscataway, NJ, pp 18–24. https://doi.org/10.1109/ITMC.2018.8691285

Halevi G, Moed H, Bar-Ilan J (2017) Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation—review of the Literature. J Informet 11(3):823–834. https://doi.org/10.1016/j.joi.2017.06.005

Hamermesh DS (2007) Viewpoint: Replication in economics. Can J of Econ 40(3):715–733. https://doi.org/10.1111/j.1365-2966.2007.00428.x

Hastie T, Tibshirani R, Friedman JH (2017) The elements of statistical learning: data mining, inference, and prediction. Springer, New York, NY

Heilig L, Hofer J, Lessmann S, Voß S (2016) Data-driven product returns prediction: a cloud-based ensemble selection approach. In: Proceedings of the 24th european conference on information systems

Hess JD, Mayhew GE (1997) Modeling merchandise returns in direct marketing. J Direct Market 11(2):20–35. https://doi.org/10.1002/(SICI)1522-7138(199721)11:2<20:AID-DIR4>3.0.CO;2-#

Hevner A, March S, Park J, Ram S (2004) Design science in information systems research. MIS Q 28(1):75. https://doi.org/10.2307/25148625

Hofmann A, Gwinner F, Fuchs K, Winkelmann A (2020) An industry-agnostic approach for the prediction of return shipments. In: Proceedings of the 26th Americas conference on information systems, pp 1–10

Hong Y, Pavlou PA (2014) Product fit uncertainty in online markets: nature, effects, and antecedents. Inf Syst Res 25(2):328–344. https://doi.org/10.1287/isre.2014.0520

Imran AA, Amin MN (2020) Predicting the return of orders in the e-tail industry accompanying with model interpretation. Procedia Comput Sci 176:1170–1179. https://doi.org/10.1016/j.procs.2020.09.113

Jabareen Y (2009) Building a conceptual framework: philosophy, definitions, and procedure. Int J Qual Methods 8(4):49–62. https://doi.org/10.1177/160940690900800406

John S, Shah BJ, Kartha P (2020) Refund fraud analytics for an online retail purchases. J Bus Anal 3(1):56–66. https://doi.org/10.1080/2573234X.2020.1776164

Joshi T, Mukherjee A, Ippadi G (2018) One size does not fit all: predicting product returns in e-commerce platforms. In: Proceedings of the 10th IEEE/ACM international conference on advances in social networks analysis and mining, pp 926–927. https://doi.org/10.1109/ASONAM.2018.8508486

Kaiser D (2018) Individualized choices and digital nudging: multiple studies in digital retail channels. Karlsruher Institut für Technologie (KIT). https://doi.org/10.5445/IR/1000088341

Karl D, Asdecker B (2021) How does the Covid-19 pandemic affect consumer returns: an exploratory study. In: Proceedings of the 50th european marketing academy conference, vol 50

Karl D, Asdecker B, Feddersen-Arden C (2022) The impact of displaying quantity scarcity and relative discounts on sales and consumer returns in flash sale e-commerce. In: Proceedings of the 55th Hawaii international conference on system sciences. https://doi.org/10.24251/HICSS.2022.556

Ketzenberg ME, Abbey JD, Heim GR, Kumar S (2020) Assessing customer return behaviors through data analytics. J Oper Manag 66(6):622–645. https://doi.org/10.1002/joom.1086

Kraus S, Breier M, Lim WM, Dabić M, Kumar S, Kanbach D, Mukherjee D, Corvello V, Piñeiro-Chousa J, Liguori E, Palacios-Marqués D, Schiavone F, Ferraris A, Fernandes C, Ferreira JJ (2022) Literature reviews as independent studies: guidelines for academic practice. Rev Manag Sci 16(8):2577–2595. https://doi.org/10.1007/s11846-022-00588-8

Lantz B, Hjort K (2013) Real e-customer behavioural responses to free delivery and free returns. Electron Commer Res 13(2):183–198. https://doi.org/10.1007/s10660-013-9125-0

Leitch G, Tanner JE (1991) Economic forecast evaluation: profits versus the conventional error measures. Am Econ Rev 81(3):580–590

Li X, Zhuang Y, Fu Y, He X (2019) A trust-aware random walk model for return propensity estimation and consumer anomaly scoring in online shopping. Sci China Inf Sci 62(5). https://doi.org/10.1007/s11432-018-9511-1

Li J, He J, Zhu Y (2018) E-tail product return prediction via hypergraph-based local graph cut. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, New York, NY, pp 519–527. https://doi.org/10.1145/3219819.3219829

Melacini M, Perotti S, Rasini M, Tappia E (2018) E-fulfilment and distribution in omni-channel retailing: a systematic literature review. Int Jnl Phys Dist Log Manage 48(4):391–414. https://doi.org/10.1108/IJPDLM-02-2017-0101

Mentzer JT, Kahn KB (1995) Forecasting technique familiarity, satisfaction, usage, and application. J Forecast 14(5):465–476. https://doi.org/10.1002/for.3980140506

Micol Policarpo L, da Silveira DE, da Rosa RR, Antunes Stoffel R, da Costa CA, Victória Barbosa JL, Scorsatto R, Arcot T (2021) Machine learning through the lens of e-commerce initiatives: an up-to-date systematic literature review. Comput Sci Rev 41:100414. https://doi.org/10.1016/j.cosrev.2021.100414

Miles MB, Huberman AM, Saldaña J (2020) Qualitative data analysis: A methods sourcebook. Sage, Los Angeles

National Retail Federation/Appriss Retail (2023) Consumer returns in the retail industry 2022. https://nrf.com/research/2022-consumer-returns-retail-industry . Accessed 23 May 2023

Ni J, Neslin SA, Sun B (2012) Database submission the ISMS durable goods data sets. Mark Sci 31(6):1008–1013. https://doi.org/10.1287/mksc.1120.0726

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hróbjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, McGuinness LA, Stewart LA, Thomas J, Tricco AC, Welch VA, Whiting P, Moher D (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Syst Rev 10:89. https://doi.org/10.1186/s13643-021-01626-4

Pandya R, Pandya J (2015) C5.0 algorithm to improved decision tree with feature selection and reduced error pruning. IJCA 117(16):18–21. https://doi.org/10.5120/20639-3318

Petropoulos F, Apiletti D, Assimakopoulos V, Babai MZ, Barrow DK, Ben Taieb S, Bergmeir C, Bessa RJ, Bijak J, Boylan JE, Browell J, Carnevale C, Castle JL, Cirillo P, Clements MP, Cordeiro C, Cyrino Oliveira FL, de Baets S, Dokumentov A, Ellison J, Fiszeder P, Franses PH, Frazier DT, Gilliland M, Gönül MS, Goodwin P, Grossi L, Grushka-Cockayne Y, Guidolin M, Guidolin M, Gunter U, Guo X, Guseo R, Harvey N, Hendry DF, Hollyman R, Januschowski T, Jeon J, Jose VRR, Kang Y, Koehler AB, Kolassa S, Kourentzes N, Leva S, Li F, Litsiou K, Makridakis S, Martin GM, Martinez AB, Meeran S, Modis T, Nikolopoulos K, Önkal D, Paccagnini A, Panagiotelis A, Panapakidis I, Pavía JM, Pedio M, Pedregal DJ, Pinson P, Ramos P, Rapach DE, Reade JJ, Rostami-Tabar B, Rubaszek M, Sermpinis G, Shang HL, Spiliotis E, Syntetos AA, Talagala PD, Talagala TS, Tashman L, Thomakos D, Thorarinsdottir T, Todini E, Trapero Arenas JR, Wang X, Winkler RL, Yusupova A, Ziel F (2022) Forecasting: theory and practice. Int J Forecast 38(3):705–871. https://doi.org/10.1016/j.ijforecast.2021.11.001

Polikar R (2006) Ensemble based systems in decision making. IEEE Circuits Syst Mag 6(3):21–45. https://doi.org/10.1109/mcas.2006.1688199

Potdar A, Rogers J (2012) Reason-code based model to forecast product returns. Foresight 14(2):105–120. https://doi.org/10.1108/14636681211222393

Pur S, Stahl E, Wittmann M, Wittmann G, Weinfurtner S (2013) Retourenmanagement im Online-Handel–das Beste daraus machen: Daten, Fakten und Status quo. Ibi Research, Regensburg

Rajasekaran V, Priyadarshini R (2021) An e-commerce prototype for predicting the product return phenomenon using optimization and regression techniques. In: Singh M, Tyagi V, Gupta PK, Flusser J, Ören T, Sonawane VR (eds) Advances in computing and data sciences: proceedings of the 5th international conference on advances in computing and data sciences, 1st edn, vol 1441, pp 230–240. https://doi.org/10.1007/978-3-030-88244-0_22

Ravitch SM, Riggan M (2017) Reason and rigor: how conceptual frameworks guide research. Sage, Los Angeles, London, New Delhi, Singapore, Washington DC

Ren S, Chan H-L, Siqin T (2020) Demand forecasting in retail operations for fashionable products: methods, practices, and real case study. Ann Oper Res 291(1–2):761–777. https://doi.org/10.1007/s10479-019-03148-8

Rezaei M, Cribben I, Samorani M (2021) A clustering-based feature selection method for automatically generated relational attributes. Ann Oper Res 303(1–2):233–263. https://doi.org/10.1007/s10479-018-2830-2

Rogers DS, Lambert DM, Croxton KL, García-Dastugue SJ (2002) The returns management process. Int J Log Manag 13(2):1–18. https://doi.org/10.1108/09574090210806397

Röllecke FJ, Huchzermeier A, Schröder D (2018) Returning customers: the hidden strategic opportunity of returns management. Calif Manage Rev 60(2):176–203. https://doi.org/10.1177/0008125617741125

Sahoo N, Dellarocas C, Srinivasan S (2018) The impact of online product reviews on product returns. Inf Syst Res 29(3):723–738. https://doi.org/10.1287/isre.2017.0736

Samorani M, Ahmed F, Zaiane OR (2016) Automatic generation of relational attributes: an application to product returns. In: Proceedings of the 4th IEEE international conference on big data, pp 1454–1463

Santoro G, Fiano F, Bertoldi B, Ciampi F (2019) Big data for business management in the retail industry. MD 57(8):1980–1992. https://doi.org/10.1108/MD-07-2018-0829

Shaharudin MR, Zailani S, Tan KC (2015) Barriers to product returns and recovery management in a developing country: investigation using multiple methods. J Clean Prod 96:220–232. https://doi.org/10.1016/j.jclepro.2013.12.071

Shang G, McKie EC, Ferguson ME, Galbreth MR (2020) Using transactions data to improve consumer returns forecasting. J Oper Manag 66(3):326–348. https://doi.org/10.1002/joom.1071

Srivastava SK, Srivastava RK (2006) Managing product returns for reverse logistics. Int Jnl Phys Dist Log Manage 36(7):524–546. https://doi.org/10.1108/09600030610684962

Stock JR, Mulki JP (2009) Product returns processing: an examination of practices of manufacturers, wholesalers/distributors, and retailers. J Bus Logist 30(1):33–62. https://doi.org/10.1002/j.2158-1592.2009.tb00098.x

Stöcker B, Baier D, Brand BM (2021) New insights in online fashion retail returns from a customers’ perspective and their dynamics. J Bus Econ 91(8):1149–1187. https://doi.org/10.1007/s11573-021-01032-1

Sweidan D, Johansson U, Gidenstam A (2020) Predicting returns in men’s fashion. In: Proceedings of the 14th international fuzzy logic and intelligent technologies in nuclear science conference, pp 1506–1513. https://doi.org/10.1142/9789811223334_0180

Thaler RH, Sunstein CR (2009) Nudge: Improving decisions about health, wealth and happiness. Penguin

Tibben-Lembke RS, Rogers DS (2002) Differences between forward and reverse logistics in a retail environment. Supp Chain Mnagmnt 7(5):271–282. https://doi.org/10.1108/13598540210447719

Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc: Ser B (Methodol) 58(1):267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x

Toktay LB, van der Laan EA, de Brito MP (2004) Managing product returns: the role of forecasting. In: Dekker R, Fleischmann M, Inderfurth K, van Wassenhove LN (eds) Reverse logistics. Springer, Berlin, Heidelberg, pp 45–64. https://doi.org/10.1007/978-3-540-24803-3_3

Toktay LB, Wein LM, Zenios SA (2000) Inventory management of remanufacturable products. Manage Sci 46(11):1412–142. https://doi.org/10.1287/mnsc.46.11.1412.12082

Tranfield D, Denyer D, Smart P (2003) Towards a methodology for developing evidence-informed management knowledge by means of systematic review. Br J Manag 14(3):207–222. https://doi.org/10.1111/1467-8551.00375

Uman LS (2011) Systematic reviews and meta-analyses. J Can Acad Child Adolesc Psychiatry 20(1):57–59

Urbanke P, Kranz J, Kolbe L (2015) Predicting product returns in e-commerce: the contribution of mahalanobis feature extraction. In: Proceedings of the 14th international conference on computer and information science

Urbanke P, Uhlig A, Kranz J (2017) A customized and interpretable deep neural network for high-dimensional business data–evidence from an e-commerce application. In: Proceedings of the 38th international conference on information systems

Vakulenko Y, Shams P, Hellström D, Hjort K (2019) Service innovation in e-commerce last mile delivery: mapping the e-customer journey. J Bus Res 101:461–468. https://doi.org/10.1016/j.jbusres.2019.01.016

vom Brocke J, Simons A, Niehaves B, Reimer K, Plattfaut R, Cleven A (2009) Reconstructing the giant: on the importance of rigour in documenting the literature search process. In: Proceedings of the 17th european conference on information systems

von Zahn M, Bauer K, Mihale-Wilson C, Jagow J, Speicher M, Hinz O (2022) The smart green nudge: reducing product returns through enriched digital footprints and causal machine learning. SSRN J. https://doi.org/10.2139/ssrn.4262656

Walsh G, Möhring M (2017) Effectiveness of product return-prevention instruments: empirical evidence. Electron Mark 27(4):341–350. https://doi.org/10.1007/s12525-017-0259-0

Walsh G, Möhring M, Koot C, Schaarschmidt M (2014) Preventive product returns management systems–a review and model. In: Proceedings of the 22nd european conference on information systems

Webster J, Watson RT (2002) Analyzing the past to prepare for the future: writing a literature review. MIS Q 26(2):xiii–xxiii

Winklhofer H, Diamantopoulos A, Witt SF (1996) Forecasting practice: a review of the empirical literature and an agenda for future research. Int J Forecast 12(2):193–221. https://doi.org/10.1016/0169-2070(95)00647-8

Wirth R, Hipp J (2000) CRISP-DM: towards a standard process model for data mining. In: Proceedings of the 4th international conference on the practical applications of knowledge discovery and data mining, vol 1, pp 29–40

Zhao X, Hu S, Meng X (2020) Who should pay for return freight in the online retailing? Retailers or consumers. Electron Commer Res 20(2):427–452. https://doi.org/10.1007/s10660-019-09360-9

Zhu Y, Li J, He J, Quanz BL, Deshpande A (2018) A local algorithm for product return prediction in e-commerce. In: Proceedings of the 27th international joint conference on artificial intelligence, pp 3718–3724. https://doi.org/10.24963/ijcai.2018/517

Zotteri G, Kalchschmidt M (2007) Forecasting practices: empirical evidence and a framework for research. Int J Prod Econ 108(1–2):84–99. https://doi.org/10.1016/j.ijpe.2006.12.004

Open Access funding enabled and organized by Projekt DEAL. The authors have not disclosed any funding.

Author information

Authors and Affiliations

Chair of Operations Management and Logistics, University of Bamberg, Feldkirchenstr. 21, 96052, Bamberg, Germany

Corresponding author

Correspondence to David Karl .

Ethics declarations

Conflict of interest.

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript. The authors have no relevant financial or non-financial interests to disclose. The data that support the findings of this study are available from the corresponding author upon request.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Author-centric content summary (with focus on forecasting issues)

1.1 Journal publications

Hess and Mayhew ( 1997 ) describe a forecasting approach using the example of a direct marketer for apparel with a lenient consumer return policy (free returns at any time). The analysis can plausibly be transferred to a general retailer, although return time windows differ somewhat. A regression approach and a hazard model are compared. The regression approach itself is split into an OLS estimation of return timing (with poor fit) and a logit model of return propensities, which is in turn used as the split function of the Box–Cox hazard approach for estimating the probability of a return over time. Accuracy was measured by fit statistics regarding the absolute deviation from the actual cumulative return proportion, with the split-hazard model outperforming the regression model. Besides price, the importance of fit of the respective product is used as a predictor.
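
The structure of such a split-hazard model can be sketched as follows: a logit model yields the purchase-level return probability, and a timing distribution spreads that probability over the return window, so the cumulative return share is their product. The coefficients and the exponential timing distribution below are invented placeholders; Hess and Mayhew use a Box–Cox hazard for the timing component:

```python
import math

def return_probability(price, fit_importance, b0=-1.0, b_price=0.01, b_fit=0.5):
    # Logit split function: probability that the purchase is returned at all.
    z = b0 + b_price * price + b_fit * fit_importance
    return 1.0 / (1.0 + math.exp(-z))

def cum_return_share(t_days, p_return, rate=0.1):
    # Split-hazard: P(returned by day t) = p_return * F(t); an exponential CDF
    # stands in here for the Box-Cox hazard of the original model.
    return p_return * (1.0 - math.exp(-rate * t_days))

p = return_probability(price=80.0, fit_importance=1.0)
print(round(p, 3))                        # purchase-level return propensity
print(round(cum_return_share(30, p), 3))  # share returned within 30 days
```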

Potdar and Rogers ( 2012 ) propose a method that combines reason codes with consumer behavior data to forecast return volume in the consumer electronics industry, targeting the retailer stage as well as the preceding supply chain stages. The subject of their study is an offline retailer, but the findings can be generalized to e-tailers due to a similar return policy (14 days of free returns with no questions asked). In a multi-step approach, the authors use basic statistical methods (moving averages, correlations, and linear regression) but incorporate detailed domain and product knowledge, such as product features or price in relation to past return numbers, to rank competing products by quality and to predict the return volume of a given product in each period.
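
The statistical core of such an approach can be sketched as a moving average over past return rates applied to a sales forecast (all numbers invented; the original method additionally ranks products using reason codes and product features):

```python
def moving_average(series, window=3):
    # Mean of the most recent `window` observations.
    return sum(series[-window:]) / min(window, len(series))

past_return_rates = [0.12, 0.15, 0.11, 0.14]  # monthly return rates, invented
expected_sales = 5000                         # unit sales forecast for next month

forecast_rate = moving_average(past_return_rates)
forecast_volume = forecast_rate * expected_sales
print(round(forecast_rate, 4), round(forecast_volume, 1))  # 0.1333 666.7
```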

Fu et al. ( 2016 ) derive a framework for forecasting product- and consumer-specific return propensities, i.e., the return propensity of individual purchases. Their study addresses online shopping and is evaluated on data from an online cosmetics retailer selling via Taobao.com. The predictors are categorized into inconsistencies in the buying phase and in the shipping phase of a transaction. A latent factor model capturing differences between expectations and performance is introduced for return propensities. This model is extended with product information (e.g., warranty) and customer information (e.g., gender, credit score). The model is based on conditional probabilities, and its parameters are derived with an iterative expectation-maximization approach. MAE and RMSE, precision/recall, and AUC metrics assess the forecast accuracy. As benchmarks, two matrix factorization models and two memory-based models (historical consumer or product return rates) are compared, and the proposed model outperforms all of them. Furthermore, the model allows various return reasons to be identified, e.g., return abuse and fraud.
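
The memory-based benchmark models mentioned above are straightforward to reproduce: the propensity of a new purchase is estimated from the historical return rates of the customer and of the product. A minimal sketch (the blending weight is an invented assumption, not part of the original benchmarks):

```python
def memory_based_propensity(customer_history, product_history, weight=0.5):
    # Histories are lists of past outcomes (1 = returned, 0 = kept).
    # If one history is empty, fall back to the other side's rate.
    c_rate = sum(customer_history) / len(customer_history) if customer_history else None
    p_rate = sum(product_history) / len(product_history) if product_history else None
    if c_rate is None:
        return p_rate if p_rate is not None else 0.0
    if p_rate is None:
        return c_rate
    return weight * c_rate + (1.0 - weight) * p_rate

# Customer returned 2 of 4 purchases; the product was returned 1 of 3 times.
print(round(memory_based_propensity([1, 0, 0, 1], [0, 0, 1]), 3))  # 0.417
```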

Building on the work of Fu et al. ( 2016 ), Li et al. ( 2019 ) investigate the underlying reasons for consumer returns, again using the example and data of an online cosmetics retailer on Taobao.com. They examine customers’ return propensity for product types, aiming to detect abnormal returns that suggest abuse. In contrast to purchase decisions, they find customer profile data to be a more important predictor of return decisions than product information or transaction details. Based on this rationale, the authors detect “selfish” or “fraud” consumers. To estimate the return propensity for a given consumer and product, they model return behavior as depending on the return decisions of similar consumers (a “trust network”) and the amount of trust placed in these other consumers. MAE and precision/recall measures are used to assess the predictions of different random walk models. The employed trust-based random walk model outperforms the other models on most indicators and forms the basis for anomaly detection that clusters consumers into groups (honest/selfish/fraud) so that the return issues of each group can be addressed individually.
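
The propagation step of such a trust-based random walk can be sketched as follows: observed return rates are smoothed over a consumer similarity ("trust") matrix, in the style of a personalized PageRank. The trust values, restart weight, and rates below are invented:

```python
def random_walk_propensity(trust, base_rates, restart=0.3, iters=50):
    # trust[i][j]: row-normalized trust consumer i places in consumer j.
    # Each step blends a consumer's own observed return rate (restart term)
    # with the trust-weighted propensities of similar consumers.
    n = len(base_rates)
    p = list(base_rates)
    for _ in range(iters):
        p = [restart * base_rates[i]
             + (1.0 - restart) * sum(trust[i][j] * p[j] for j in range(n))
             for i in range(n)]
    return p

trust = [[0.0, 0.5, 0.5],
         [0.5, 0.0, 0.5],
         [0.5, 0.5, 0.0]]
base_rates = [0.9, 0.1, 0.1]  # observed return rates, invented
p = random_walk_propensity(trust, base_rates)
print([round(x, 3) for x in p])  # consumer 0 keeps the highest propensity
```

Because the iteration is a contraction (the non-restart weight is below one), it converges to a unique fixed point regardless of the starting vector.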

Although the paper by Cui et al. (2020) approaches product return forecasting from the perspective of the manufacturer, their case generalizes to classic e-tailers, as the manufacturer is responsible for return handling in their scenario, a task often performed by the retailer. They use a comprehensive data set from an automotive accessories manufacturer to forecast return volume per sales channel and product. The observed return rates of below 1% are uncommonly low, so the results must be interpreted with caution. First, a hierarchical OLS regression step by step incorporates up to 40 predictors regarding sales, time, product type, sales channel, and product details, including return history. The full model shows significantly increased performance, measured by a decrease of more than 50% in MSE, which was used as the primary performance measure. Interestingly, relatively small differences in model quality (R²) led to overproportional changes in MSE. Using a machine learning approach for predictor selection (LASSO), a further MSE reduction of about 10% was achieved. Data mining approaches (random forest, gradient boosting) could not outperform the LASSO approach. Forecasting performance depended strongly on the variation in the data. The two best predictors of return volume were past sales volume and lagged return statistics. The authors express surprise at the importance of lagged return information, overlooking that this predictor captures consumers’ reactions to detailed product information, which itself was not a significant predictor.

Ketzenberg et al. (2020) segment customers with the aim of detecting the small number of abusive returners, as these are unprofitable for the retailer and generate significant losses over time; high-returning customers in general are usually more profitable. The data for this study comes from a department store retailer with various product groups in its assortment. Predictors are transactional data and customer attributes. For classification, algorithms such as logit, support vector machines (SVM), random forests (RF), and neural networks (NN) are used in combination with shrinkage methods such as LASSO, ridge regression, and elastic net. Random forests and especially neural networks outperform the other algorithms, assessed by sensitivity, precision, and AUC. In conclusion, a low false-positive rate should reassure retailers about deploying abuse detection systems.
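The sensitivity and precision criteria used in such abuse-detection studies can be computed directly from a confusion matrix. A minimal sketch in Python, with illustrative labels that are not from the paper (1 = abusive returner, 0 = regular customer):

```python
# Compute the confusion-matrix counts for a binary classifier.
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def sensitivity(y_true, y_pred):
    # Recall for the abusive class: share of abusers actually caught.
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)

def precision(y_true, y_pred):
    # Share of flagged customers who are truly abusive.
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)

y_true = [1, 0, 0, 1, 0, 0, 1, 0]   # illustrative ground truth
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]   # illustrative model output
print(sensitivity(y_true, y_pred))  # 2 of 3 abusers detected
print(precision(y_true, y_pred))    # 2 of 3 flags correct
```

A low false-positive rate in this setting means few regular customers are wrongly flagged, which is exactly why precision matters for targeted interventions.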

Shang et al. (2020) develop a predict-aggregate (P-A) model, adaptable for both retailers and manufacturers, for forecasting return volume over a continuous timeframe, in contrast to the commonly used aggregate-predict (A-P) models. Instead of aggregating the data first (i.e., sales volume and return volume), they first predict product-specific return probabilities and then aggregate over purchases by summing the individual probabilities. As predictors, they use only timestamps and lagged return information. They tune and assess their models on two datasets, from an offline electronics retailer and an online jewelry retailer. ARIMA and lagged-return models known from end-of-life forecasting (de Brito et al. 2005) serve as benchmarks, with RMSE as the assessment criterion. The authors show that even a basic version of their approach outperforms the benchmark models in almost all observed cases by up to 19%, despite using only lagged returns and timestamps as input. Different extensions, e.g., including more predictor variables, can easily be integrated and are shown to further improve forecasting performance.
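The difference between the two model families can be illustrated with a toy calculation (hypothetical numbers only; the paper's actual models are considerably more elaborate): an A-P model aggregates sales first and applies one overall return rate, while a P-A model sums per-purchase return probabilities.

```python
# Hypothetical purchases in one period: (units, predicted return probability).
purchases = [(1, 0.60), (1, 0.05), (2, 0.30), (1, 0.10)]

# Predict-aggregate (P-A): sum the individual expected returns.
pa_forecast = sum(units * prob for units, prob in purchases)

# Aggregate-predict (A-P): total sales times one historical return rate.
total_units = sum(units for units, _ in purchases)
historical_rate = 0.25   # assumed aggregate rate, for illustration
ap_forecast = total_units * historical_rate

print(pa_forecast)  # ~1.35 expected returned units
print(ap_forecast)  # 1.25 expected returned units
```

The P-A estimate reflects the composition of the current purchase mix, which is why it can react to periods dominated by high-return items where a single aggregate rate cannot.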

John et al. (2020) try to predict the rare event of return fraud committed by customer service representatives who exploit their exact knowledge of the e-commerce company’s return policy framework to buy and return items fraudulently. Accordingly, predictors range from transaction details to customer service agent attributes. A penalized likelihood logit model was chosen by the authors and evaluated by precision and recall, focusing on maximizing recall and thereby minimizing false negatives. The most important predictors were communication type and reason for interaction.

The paper by Rezaei et al. (2021) introduces a new algorithm to automatically select attributes from high-dimensional databases for forecasting purposes. As a demonstration sample, they use simulated data as well as the publicly available ISMS Durable Goods dataset (Ni et al. 2012) for consumer electronics. The results are assessed by AUC, precision, recall, and F1-score across different configurations. For the simulated data, LASSO as a shrinkage method generally works best, outperforming RF and BaggedTrees. For the real-world data, based on a forecast with a logit model, they show that the proposed selection algorithm performs similarly to or better than LASSO, SVM, and RF, while the complexity of the chosen variables is lower.

1.2 Conference publications

Urbanke et al. (2015) describe a decision support system to better target return-reducing interventions at e-commerce purchases with a high return likelihood. They compare different approaches for extracting input variables for return propensity forecasting. Using a large dataset from a fashion e-tailer, they reduce the input variables regarding consumer profile, product profile, and basket information from over 5,000 binary variables to 10 numeric variables via different algorithms (e.g., principal component analysis, non-negative matrix factorization). The results are then used to predict return propensities with a wide variety of state-of-the-art algorithms (AdaBoost, CART, ERT, GB, LDA, LR, RF, SVM), thus assessing both feature extraction and prediction performance. The proposed Mahalanobis feature extraction algorithm used as input for AdaBoost outperforms all other combinations presented, while, interestingly, a logit model with all original inputs already delivers relatively precise forecasts.
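The dimensionality reduction step can be sketched with standard PCA via SVD, shrinking many binary input columns to a few numeric components (illustrative only; the paper's proposed Mahalanobis feature extraction is a different, supervised technique, and the sizes here are toy values, not the study's 5,000 variables):

```python
import numpy as np

# Toy stand-in for the e-tailer data: 200 orders, 50 binary features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50)).astype(float)

Xc = X - X.mean(axis=0)                  # center each column
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 10
Z = Xc @ Vt[:k].T                        # project onto top-10 components

print(Z.shape)  # (200, 10): ten numeric features per order
```

The reduced matrix `Z` would then feed a downstream classifier such as AdaBoost or logit, the comparison the paper runs across its extraction methods.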

Building on parts of this study, Urbanke et al. (2017) present a return decision forecasting approach with two targets: (1) high predictive accuracy and (2) interpretability of the model. Based on real-world data from a fashion and sports e-tailer, they first hand-craft 18 input variables, then use NN to extract further features, and compare this approach to other feature extraction algorithms across different forecasting algorithms. For assessment, they measure correlations between out-of-sample predictions and class labels as well as AUC. The best-performing classifier was AdaBoost, while NN-based feature extraction contributed both interpretability and superior predictive performance.

Ahmed et al. (2016) focus on the automatic aggregation and integration of different data sources to generate input variables (features). They use return forecasting merely as an exemplary classification problem for their data preparation approach, applying various ML algorithms (e.g., RF, NN, DT-based algorithms) to detect returned purchases of an electronics retailer. Measured by AUC, the results of their GARP approach are superior to not using aggregations, although it generates an extensive number of features without any pruning. In general, SVM and RF work best in combination with the proposed GARP approach. The data is based on the publicly available ISMS durable goods data sets (Ni et al. 2012).

A similar group of authors published another paper (Samorani et al. 2016), again using the aforementioned ISMS dataset as an example for data preparation and automatic attribute generation. Besides forecasting performance, in this paper they aim to generate knowledge about important return predictors; e.g., a higher price is associated with more returns, but only as long as price levels remain below a $1,500 threshold. AUC is used to assess different levels of data integration, confirming that overfitting can occur when too many attributes are used.

Heilig et al. (2016) describe a forecasting support system (FSS) to predict return decisions in a real environment. First, they compare different forecasting approaches on data from a fashion e-tailer, assessed by AUC and accuracy metrics. The ensemble selection approach outperforms all other classifiers, with RF being the closest competitor; computational times, however, grow exponentially with the amount of data used. Based on these results, they then describe a cloud framework for deploying such ensemble models for live use in a real shop environment.

Ding et al. (2016) present an approach to predict the daily return rate of an e-commerce company based on sentiment analysis of tweets about the company in the categories of news, experience, products, and service. They employ sophisticated text mining techniques, whereas the forecasting approach itself, an econometric vector autoregression, is fairly standard. The emotion of posts regarding news, products, and service impacts the return rate negatively, while the emotion of posts about the purchasing experience impacts it positively, showing that classifying social network posts enhances prediction accuracy.

Drechsler and Lasch (2015) aim at forecasting the volume of fraudulent returns in e-commerce over several periods. They present different approaches that multiply sales volume by a relative return rate. The first, following Potdar and Rogers (2012), estimates the rate of misused returns directly from time-lag-specific return rates. The second, following Toktay et al. (2000), estimates the overall return rate and multiplies it by the time-specific share of fraudulent returns. The return rates were forecasted by moving averages and exponential smoothing techniques. The assessment criteria for the performance comparison on simulated data were MAE, MAPE, and TIC, showing the first approach to be superior, but neither method is sufficiently robust. The authors therefore include further time-specific information (such as promotions or special events, which could foster fraudulent returns) in a model using a Holt-Winters approach, which shows superior performance. All of the models depend heavily on low fluctuation in return rates, a shortcoming of these rather naive forecasting techniques.
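The smoothing-plus-multiplication logic of the second approach can be sketched as follows (hypothetical figures throughout; the sales volume, rate history, and fraud share are assumed values, not from the paper, and the paper's Holt-Winters extension additionally models trend and seasonality):

```python
# Simple exponential smoothing: the forecast is the last smoothed level.
def exp_smooth_forecast(series, alpha):
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

return_rates = [0.48, 0.50, 0.52, 0.49, 0.51]     # past overall return rates
rate_forecast = exp_smooth_forecast(return_rates, alpha=0.3)

expected_sales = 10_000   # assumed units sold next period
fraud_share = 0.02        # assumed share of returns that are fraudulent
fraud_volume = expected_sales * rate_forecast * fraud_share
print(round(fraud_volume, 1))  # ~99.8 fraudulent returns expected
```

Because the smoothed level tracks recent observations closely, the forecast degrades quickly once return rates fluctuate, which is exactly the robustness issue the authors report.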

Asdecker and Karl (2018) compare the performance of different algorithms for forecasting binary return decisions: logit, linear discriminant analysis, neural networks, and a decision-tree-based algorithm (C5.0). Their analysis is based on data from a fashion e-tailer, including price, consumer information, and shipment information (number of articles in a shipment, delivery time). For the assessment of the different algorithms, they use the total absolute error (TAE) and relative error. An ensemble learning approach performs best, on par with the C5.0 algorithm. However, the differences in performance are relatively small, and only about 68% of return decisions are forecasted correctly.

Li et al. (2018) propose a hypergraph representation of historical purchase and return information, combined with a random-walk-based local graph cut algorithm, to forecast return decisions at the order (basket) level as well as at the product level and thereby detect underlying return causes. They use data from two omnichannel fashion e-tailers from the US and Europe to assess the performance of their approach, using precision, recall, F0.5, and AUC metrics, arguing that precision is the most important indicator for targeted interventions. Three similarity-based approaches (e.g., a k-nearest neighbor model) serve as references. The proposed approach performs best regarding AUC, precision, and F0.5.

Zhu et al. (2018) develop a weighted hybrid graph algorithm representing historical customer behavior and customer/product similarity, combined with a random-walk-based algorithm, to predict which customer/product combinations will be returned. They report an experiment based on data from a European fashion e-tailer suffering from return rates as high as 50%. For assessment, they use precision, recall, and F0.5 metrics. Their approach is superior to two reference competitors (a similarity-based and a bipartite graph algorithm). As predictors, they use product similarities and historical return information, noting that their approach can be enriched with detailed customer attributes.

Joshi et al. (2018) model return decisions based on data from an Indian e-commerce company, dealing especially with apparel returns due to fit issues. In a two-step approach, they first model return probabilities using concepts from network science based on a customer’s historical purchase and return decisions, and second use an SVM implementation with the return probability as a single input to classify the return decision. Assessed by F1, precision, and recall scores, their approach is superior to a reference random-walk baseline model.

Imran and Amin (2020) compare different forecasting algorithms (XGBoost, CatBoost, LightGBM, TabNet) for return classification based on data from a general e-commerce retailer from Bangladesh. As input variables, only order attributes, including payment method and order medium, are used. For evaluation, they use metrics such as true positive rate (TPR), true negative rate, false-positive rate, false-negative rate, AUC, F2-score, precision, and accuracy. Ultimately, they rely on TPR, AUC, and F2-score, arguing that misclassifying orders with a high return probability is the primary error to avoid. According to these metrics, TabNet, a deep learning algorithm, outperforms the other models. The most important predictors were payment method, order location, and promotional orders.

As returns are most prominent in fashion e-commerce, most forecasting papers take this industry as an example, since forecasting models are more precise when returns are more frequent. Hofmann et al. (2020) develop a more generalized order-based return decision forecasting approach, appropriate for different industries and suitable even for low return rates. For their analysis, they use a dataset from a German technical wholesaler with a return rate as low as 5%. The input variables are just the basket composition and return information; precision and recall metrics are used for assessment. RF did not outperform a statistical baseline approach, even with random oversampling as data preparation to address the class imbalance, whereas the DART algorithm does benefit from imbalance correction by random oversampling. In general, gradient boosting performs best on imbalanced groups, even without oversampling, but forecasting quality remains lower than with the more specialized forecasting approaches described for fashion. Furthermore, results were more accurate at the basket level than at the single-item level.
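Random oversampling as used here for imbalance correction simply duplicates minority-class samples until the classes are balanced. A minimal sketch with illustrative data (not the wholesaler's dataset; the paper couples this step with the DART algorithm):

```python
import random

random.seed(42)
# Illustrative orders at a ~5% return rate: (label text, class).
orders = [("kept", 0)] * 95 + [("returned", 1)] * 5

minority = [o for o in orders if o[1] == 1]
majority = [o for o in orders if o[1] == 0]

# Draw minority samples with replacement until classes are balanced.
oversampled = majority + [random.choice(minority) for _ in range(len(majority))]

counts = {0: 0, 1: 0}
for _, label in oversampled:
    counts[label] += 1
print(counts)  # {0: 95, 1: 95}
```

Note that duplicated samples add no new information, which is one reason gradient boosting without oversampling can still win, as the study observes.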

Fuchs and Lutz (2021) use design science research (DSR) principles to design a meta-model for the real-time prediction of returns. The goal is to influence consumer decisions by triggering a feedback system based on the basket composition and its return probability. For forecasting, which is not the primary focus of their paper, they build upon a gradient boosting model taken from existing research (Hofmann et al. 2020) and describe a possible implementation in an ERP system with regard to asynchronous communication requirements and possible architectures.

The paper by Sweidan et al. (2020) evaluates the forecasting performance of a random forest model for a shipment-based return decision, using real-world data from a fashion e-tailer. As inputs, they use customer information (e.g., the lagged return rate) and order information. They find that predictions with high confidence are very precise (i.e., have a low false-positive rate); interventions can thus be targeted at such orders while the items are still in the consumer’s basket, without the risk of a misdirected intervention. For assessment, accuracy, AUC, precision, recall, and specificity are used. Regarding the predictors, they note that selection orders (the same product in different sizes) are the best predictor of order-based returns.

Rajasekaran and Priyadarshini (2021) develop a metaheuristic for forecasting product-based return probabilities. In a first step, they determine return probabilities based on product feedback, time, and product attributes regarding manufacturer return statistics. In a second step, they compare different algorithms (OLS, RF, gradient boosting) by MAE, MSE, and RMSE metrics. Interestingly, linear regression performs best on all metrics, but no explanation is given and the best-performing algorithm is misinterpreted.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Karl, D. Forecasting e-commerce consumer returns: a systematic literature review. Manag Rev Q (2024). https://doi.org/10.1007/s11301-024-00436-x


Received : 24 August 2023

Accepted : 12 April 2024

Published : 21 May 2024



Keywords

  • Consumer returns
  • Product returns
  • Forecasting
  • Literature review


Open access

Published: 27 May 2024

Optimizing double-layered convolutional neural networks for efficient lung cancer classification through hyperparameter optimization and advanced image pre-processing techniques

  • M. Mohamed Musthafa 1 ,
  • I. Manimozhi 2 ,
  • T. R. Mahesh 3 &
  • Suresh Guluwadi 4  

BMC Medical Informatics and Decision Making, volume 24, article number 142 (2024)


Lung cancer remains a leading cause of cancer-related mortality globally, with prognosis significantly dependent on early-stage detection. Traditional diagnostic methods, though effective, often face challenges regarding accuracy, early detection, and scalability, being invasive, time-consuming, and prone to ambiguous interpretations. This study proposes an advanced machine learning model designed to enhance lung cancer stage classification using CT scan images, aiming to overcome these limitations by offering a faster, non-invasive, and reliable diagnostic tool. Utilizing the IQ-OTHNCCD lung cancer dataset, comprising CT scans from various stages of lung cancer and healthy individuals, we performed extensive preprocessing including resizing, normalization, and Gaussian blurring. A Convolutional Neural Network (CNN) was then trained on this preprocessed data, and class imbalance was addressed using Synthetic Minority Over-sampling Technique (SMOTE). The model’s performance was evaluated through metrics such as accuracy, precision, recall, F1-score, and ROC curve analysis. The results demonstrated a classification accuracy of 99.64%, with precision, recall, and F1-score values exceeding 98% across all categories. SMOTE significantly enhanced the model’s ability to classify underrepresented classes, contributing to the robustness of the diagnostic tool. These findings underscore the potential of machine learning in transforming lung cancer diagnostics, providing high accuracy in stage classification, which could facilitate early detection and tailored treatment strategies, ultimately improving patient outcomes.


Introduction

Lung cancer stands as a formidable global health challenge, consistently ranking as one of the leading causes of cancer-related mortality worldwide. It is characterized by the uncontrolled growth of abnormal cells in one or both lungs, typically in the cells lining the air passages. Unlike normal cells, these cancerous cells do not develop into healthy lung tissue; instead, they divide rapidly and form tumors that disrupt the lung’s primary function: oxygen exchange.

The global impact of lung cancer is staggering, with millions of new cases diagnosed annually. Its high mortality rate is primarily due to late-stage detection, where the cancer has progressed to an advanced stage or metastasized to other body parts, significantly diminishing the effectiveness of treatment modalities. Thus, early and accurate diagnosis of lung cancer is paramount in improving patient prognoses, extending survival rates, and enhancing the quality of life for affected individuals.

The primary cause of lung cancer is cigarette smoking, which exposes the lungs to carcinogenic substances that can damage the cells’ DNA and lead to cancer. Other risk factors for lung cancer include exposure to secondhand smoke, radon gas, asbestos, air pollution, and a family history of lung cancer.

Symptoms of lung cancer can vary but may include persistent coughing, chest pain, shortness of breath, hoarseness, coughing up blood, unexplained weight loss, and fatigue. However, lung cancer may not cause symptoms in its initial stages, which is why early detection through screening is crucial for improving outcomes.

Diagnosis of lung cancer typically involves imaging tests such as chest X-rays, CT scans, and PET scans to visualize the lungs and detect any abnormalities. A biopsy, where a small sample of lung tissue is taken and examined under a microscope, is usually needed to confirm the diagnosis.

Treatment options for lung cancer depend on several factors, including the type and stage of the cancer, as well as the patient’s overall health and preferences. Treatment may include surgery to remove the tumor, chemotherapy, radiation therapy, targeted therapy, immunotherapy, or a combination of these approaches.

Lung cancer is a critical condition that necessitates immediate medical care. Detecting it early, along with improvements in treatment methods, has enhanced the prognosis for numerous patients. However, the most effective strategy to avoid lung cancer is to stop smoking and minimize contact with additional risk elements. Figure  1 displays some example images of lung cancer tests.

Figure 1. Sample images of lung cancer

Current diagnostic techniques for lung cancer involve various approaches, such as biopsies, CT scans, chest X-rays, PET scans, and MRI, among others [ 1 ]. While these methods are invaluable in the diagnostic process, they come with certain limitations. For instance, biopsies, while definitive, are invasive and carry risks of complications. Less invasive imaging methods such as X-rays or CT scans might produce false positives or negatives, potentially causing unwarranted stress or delays in treatment.

Moreover, the interpretation of these diagnostic tests heavily relies on the expertise of the clinician, introducing a degree of subjectivity and potential for human error. There’s also the challenge of early-stage lung cancer, which often presents very subtle changes not always detectable with conventional imaging techniques [ 2 ].

This context highlights the critical need for advanced diagnostic tools capable of overcoming these challenges. This study aims to address these issues by developing a machine learning model using Convolutional Neural Networks (CNNs) to enhance the precision and effectiveness of lung cancer stage classification from CT scans. By automating and refining the diagnostic process, the proposed model seeks to mitigate the limitations of traditional methods, offering a faster, non-invasive, and more reliable diagnostic alternative.

The impact of this study is significant: the model’s high accuracy in classifying lung cancer stages promises to revolutionize clinical diagnostics, facilitating early detection and enabling tailored treatment strategies. This advancement has the potential to improve patient outcomes by allowing for timely intervention and more effective management of lung cancer, ultimately contributing to reduced mortality rates and enhanced patient care.

The objective of this research paper is to:

Develop a machine learning model utilizing Convolutional Neural Networks (CNNs) for lung cancer stage classification based on CT scans.

Bridge existing diagnostic deficiencies by providing clinicians with a tool for expedited and precise decision-making in lung cancer management.

Contribute to improved patient outcomes through enhanced diagnostic accuracy and early detection capabilities.

The paper is organized as follows: Initially, the Literature Review explores existing research on lung cancer diagnostics, highlighting advancements and limitations, and sets the foundation for the proposed methodology. Subsequently, the Materials and methods section describes the dataset, preprocessing steps, model architecture, training process, and evaluation metrics in detail. The Results section then presents the study’s findings, including model performance metrics and comparative analysis with existing methods. This is followed by the Discussion, which interprets the results, discusses implications for clinical practice, addresses limitations, and suggests future research directions. Finally, the Conclusion summarizes the main findings and their relevance within the broader scope of lung cancer diagnostics, supported by a comprehensive list of References to provide credit and enable readers to explore the research background further.

Through this structured approach, the paper aims to contribute meaningful insights to the field of medical imaging and machine learning, offering a novel tool for the early and accurate diagnosis of lung cancer.

Literature review

The literature surrounding lung cancer diagnostics encompasses various methodologies, ranging from traditional imaging techniques to more advanced approaches such as machine learning. This review aims to explore existing research in this area, highlighting both the advancements made and the limitations faced, ultimately setting the foundation for the proposed machine learning-based methodology.

Diagnosis of lung cancer using CT scans

The utilization of Computed Tomography (CT) scans in lung cancer diagnosis has been a cornerstone in the medical field, offering high-resolution images that are pivotal for detecting and monitoring various stages of lung tumors [ 3 ]. Over the years, numerous studies have underscored the importance of CT scans in identifying nodules that could potentially be malignant, with a particular focus on low-dose CT scans, which have become a standard in screening programs, especially for high-risk populations. Such studies underscore the superior sensitivity of CT scans in identifying early-stage lung cancer, a significant advancement over other imaging methods like chest X-rays, which may overlook smaller, subtler lesions.

Despite the advancements, the interpretation of CT scans remains a significant challenge. Radiologists need to discern between benign and malignant nodules, an endeavor complicated by the presence of various artifacts and benign conditions like scars or inflammatory diseases, which can mimic the appearance of cancerous nodules [ 4 , 5 ].

Machine learning approaches in lung cancer detection and classification

The integration of machine learning, particularly deep learning techniques, into the analysis of CT images has established a groundbreaking paradigm in the identification and classification of lung cancer. Convolutional Neural Networks (CNNs) are spearheading this transformation by providing a framework for automated extraction and categorization of features directly from the images. This advancement marks a substantial stride in augmenting the accuracy and effectiveness of lung cancer diagnostics, thus facilitating more precise and timely interventions.

Binary classification models

Early studies primarily focused on binary classification, distinguishing between malignant and non-malignant nodules. CNNs, through their layered architecture, have demonstrated the ability to learn complex patterns in imaging data, surpassing traditional computer vision techniques in accuracy and reliability [ 6 , 7 ].

Multi-class classification models

Recent advancements have moved towards more nuanced multi-class classification models that categorize nodules into various cancer stages or types. This granularity is crucial for treatment planning and prognosis, offering a more detailed understanding of the disease’s progression [ 8 ].

Transfer learning

Given the challenges of assembling large annotated medical imaging datasets, transfer learning has become a popular approach. Models pre-trained on vast, non-medical image datasets are fine-tuned on smaller medical imaging datasets, leveraging learned features to improve performance in the medical domain [ 9 ].

Data augmentation

To address the issue of restricted training data, strategies such as rotation, scaling, and flipping are commonly employed for data augmentation, effectively expanding the training dataset artificially. These methods bolster the model’s resilience and its ability to generalize from a limited number of examples [ 10 ].
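Two of the augmentation strategies named above, rotation (at right angles) and flipping, can be sketched with basic array operations on a toy "scan" (illustrative only; production pipelines use image libraries with interpolation for arbitrary-angle rotation and scaling):

```python
import numpy as np

# A toy 4x4 grayscale image standing in for a CT slice.
img = np.arange(16, dtype=float).reshape(4, 4)

augmented = [
    img,                    # original
    np.rot90(img),          # 90-degree rotation
    np.rot90(img, k=2),     # 180-degree rotation
    np.fliplr(img),         # horizontal flip
    np.flipud(img),         # vertical flip
]

# Each variant keeps the original shape and pixel values, only rearranged.
print(len(augmented), augmented[1].shape)  # 5 (4, 4)
```

Because the label (e.g., "malignant") is invariant under these transforms, each variant is a valid additional training example, artificially enlarging the dataset.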

Segmentation models

Deep learning models extend their utility beyond mere classification; they are also employed in segmentation tasks, delineating the precise boundaries of nodules, which is vital for assessing tumor size and growth over time. U-Net, a type of CNN, is particularly noted for its effectiveness in medical image segmentation [ 11 ].

Table 1 lists a few of the studies that have been conducted in this field.

Gaps in current research

Despite significant advancements in lung cancer diagnostics, several critical gaps remain in the current research landscape. Many existing models are trained on datasets lacking diversity in demographics, scanner types, and image acquisition parameters, which can limit their generalizability across different populations and clinical settings. This limitation underscores the need for more comprehensive and diverse datasets to enhance the robustness of diagnostic models. Additionally, the “black box” nature of deep learning models poses a challenge for clinical adoption, as there is a growing demand for models that not only predict accurately but also provide insights into the reasoning behind their predictions. This issue of interpretability is crucial for gaining the trust of clinicians and integrating these models into clinical workflows effectively. Furthermore, the transition from research to clinical practice is slow, with models requiring not just technological solutions but also addressing regulatory, ethical, and practical considerations to facilitate their integration into routine medical care. Another critical gap is the need for models capable of longitudinal analysis, which can analyze changes in lung nodules over time, providing a dynamic assessment that aligns more closely with clinical needs.

Addressing these gaps, this study introduces a comprehensive CNN model trained on a diverse and extensive dataset, encompassing various stages of lung cancer. The model is designed for multi-class classification, offering detailed insights critical for personalized treatment strategies. Emphasis is placed on the interpretability of the model, aiming to provide clinicians with understandable and actionable information. By demonstrating the model’s effectiveness in a clinical setting, this research contributes to the ongoing effort to integrate advanced machine learning techniques into the realm of lung cancer diagnosis and treatment.

Materials and methods

This section delineates the comprehensive methodology employed to construct and validate a convolutional neural network (CNN) model for the classification of lung cancer stages using the IQ-OTHNCCD lung cancer dataset. The approach encompasses dataset acquisition, application of preprocessing methodologies, formulation of the model architecture, delineation of training procedures, and determination of evaluation metrics to ensure a comprehensive and reliable analysis. The workflow of the proposed model is visually depicted in Fig.  2 .

figure 2

Workflow of the proposed model

Dataset description and preprocessing

The IQ-OTHNCCD lung cancer dataset, integral to this study, is painstakingly curated to facilitate the creation and validation of machine learning models aimed at identifying and classifying lung cancer stages. This dataset encompasses a vast collection of CT scan images essential for advancing diagnostic capabilities in the field of lung cancer.

The dataset comprises CT scan images covering a diverse and comprehensive range of cases across various stages of lung cancer, including benign, malignant, and normal cases. This diversity is essential for training robust models capable of generalizing well across the spectrum of lung cancer manifestations, enabling effective diagnostic applications. Table 2  gives a brief description of the dataset.

To complement Table  2 , Fig.  3 provides visual insights into the same aspects of the data.

figure 3

Dataset description

Annotating and labeling each image meticulously, medical professionals from the Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases have ensured the dataset’s reliability. Annotations categorize images into one of three classes: benign, malignant, or normal. Such granular labeling establishes a solid ground truth essential for training and assessing the model, enhancing the dataset’s utility in research and clinical applications.

Characterized by high quality and consistency, the CT scans adhere to standardized imaging protocols, guaranteeing reliability and accuracy. However, variations in image dimensions necessitate preprocessing to standardize inputs for neural networks. These steps ensure that the model processes uniform data, enhancing its performance and generalizability across diverse datasets. The ratio of images in the dataset is checked using Eq.  1 .

Preprocessing steps are pivotal in preparing data for effective model training, including:

Resizing: Resizing images to a uniform dimension ensures consistency in input size for CNNs, optimizing model performance.

Normalization: Normalizing pixel values to a scale of 0 to 1 expedites model convergence during training, facilitating efficient learning. It is achieved using Eq.  2 .

Augmentation: Utilizing data augmentation methods like rotation, flipping, and scaling improves the model’s robustness and helps prevent overfitting by effectively enlarging the dataset size.

Splitting: Partitioning the dataset into training, validation, and test sets is crucial for facilitating effective model training and evaluation, thereby ensuring the model’s ability to generalize and perform accurately on unseen data.
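The normalization and splitting steps above can be sketched as follows. This is a minimal illustration using synthetic arrays in place of the actual CT scans; the array sizes, class counts, and random seeds are hypothetical, not taken from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for loaded CT slices: 100 grayscale 256x256 images
# spread over three classes (0 = benign, 1 = malignant, 2 = normal).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 256, 256)).astype(np.float32)
labels = np.repeat([0, 1, 2], [20, 50, 30])

# Normalization: scale 8-bit pixel values into [0, 1].
images /= 255.0

# Stratified 80/20 split so each subset keeps the class proportions.
X_train, X_val, y_train, y_val = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=42)
```

Stratification matters here: with an imbalanced dataset, a plain random split could leave the validation set with almost no minority-class cases.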

In this process, CNN is trained using the preprocessed dataset to adeptly extract features from CT scan images and accurately classify the stages of lung cancer. The dataset’s diversity and quality are pivotal in enabling the model to learn nuanced features and patterns associated with various lung cancer stages, underscoring its significance in advancing diagnostic accuracy and efficiency.

The IQ-OTHNCCD lung cancer dataset serves as the cornerstone for developing machine learning models that enhance early detection and classification of lung cancer. Through meticulous curation and rigorous preprocessing, this dataset showcases the transformative potential of AI in healthcare, underscoring its role in improving diagnostic accuracy and efficiency.

  • Image preprocessing

The preprocessing of images stands as a pivotal stage in the pipeline of developing a machine learning model, especially when handling medical imaging data like the IQ-OTHNCCD lung cancer dataset. This procedure comprises several crucial steps, each tailored to convert the raw CT scan images into a format conducive to effective analysis by a convolutional neural network (CNN).

Initially, image resizing is conducted. Given the inherent variability in the dimensions of CT scans, it is imperative to standardize the size of all images to ensure consistent input to the CNN. Resizing is performed while preserving the aspect ratio to avoid distortion, typically scaling down to a fixed size (e.g., 256 × 256 pixels). This uniformity is vital for the neural network to process and glean insights from the data effectively, as it necessitates a consistent input size [ 21 ].

Examples of the pre-processed images are provided in Fig.  4 .

figure 4

Pre-processed images

Following resizing, normalization of pixel values is performed. CT scans, by nature, contain a wide range of pixel intensities, which can adversely affect the training process of a CNN due to the varying scales of image brightness and contrast. Normalization is a crucial preprocessing step in image analysis that adjusts the pixel values to fall within a specific range, commonly 0 to 1 or -1 to 1. This adjustment is typically achieved by dividing the pixel values by the maximum possible value, which is 255 for 8-bit images. This step ensures that the model trains faster and more effectively, as small, standardized values facilitate quicker convergence during optimization.

Gaussian blur is then applied as an additional preprocessing step. This technique, which employs a Gaussian kernel to smooth the image, is instrumental in reducing image noise and mitigating the effects of minor variations and artifacts in the scans. By doing so, the model’s focus is directed toward the salient features relevant to lung cancer classification, rather than being distracted by irrelevant noise or details. Gaussian blur operates by convolving the image with a Gaussian function, effectively averaging the pixel values within a specified radius. This process smoothens the image, reducing high-frequency components and noise, which can otherwise lead to overfitting or distraction during the training of the CNN.

In the context of lung cancer CT scans, Gaussian blur helps to highlight the important structural elements of the lungs and nodules while suppressing irrelevant details that could complicate the model’s learning process. By smoothing the images, Gaussian blur enhances the model’s ability to generalize by focusing on the more significant, lower-frequency features of the image, such as the shape and size of nodules, rather than being confounded by small variations or noise. This is particularly beneficial in medical imaging, where the presence of noise and artifacts can obscure critical diagnostic features.

The application of Gaussian blur can also aid in generalizing the model, preventing overfitting to the high-frequency noise present in the training set. The blur is computed using Eq.  3 , and the SMOTE ratio using Eq.  4 .

These preprocessing steps collectively enhance the quality and consistency of the input data, enabling the CNN to focus on learning meaningful, discriminative features from the CT images [ 22 ]. By ensuring that the images are appropriately resized, normalized, and filtered, the model is better equipped to identify the subtle nuances associated with different stages of lung cancer, thereby improving its diagnostic accuracy and reliability. Through meticulous image preprocessing, the foundation is laid for developing a robust machine learning model capable of contributing significantly to the field of medical imaging and diagnostics.
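The Gaussian blur described above can be illustrated with a small, self-contained numpy implementation of a separable Gaussian filter. This is a sketch of the general technique, not the paper’s exact code; the σ value and image size below are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur: convolve each row, then each column.

    Because the 2-D Gaussian factorizes, two 1-D passes are equivalent to
    (and much cheaper than) one 2-D convolution.
    """
    k = gaussian_kernel(sigma, radius=int(3 * sigma))
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    return blurred

# Example: smooth a noisy 64x64 "scan" before feeding it to the network.
rng = np.random.default_rng(0)
noisy = rng.normal(loc=0.5, scale=0.2, size=(64, 64))
smooth = gaussian_blur(noisy, sigma=1.5)
```

The smoothed image retains the low-frequency structure (nodule shape and size) while attenuating pixel-level noise, which is exactly the property the text relies on.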

Deep learning model

The model architecture utilized in this study is a convolutional neural network (CNN), renowned for its effectiveness in image analysis tasks, notably in medical contexts such as lung cancer diagnosis from CT scans. In brief, the model works as follows. The input layer takes in grayscale images resized to a standard 256 × 256 pixels; this consistency helps the CNN learn efficiently. The first convolutional layer then looks for basic patterns such as edges and textures using small 3 × 3 filters. After that, max pooling reduces the image’s size, focusing on the most important features; this step helps the model generalize better and ignore noise. The process is repeated with a second convolutional layer to capture more complex patterns. The flattening layer turns the extracted features into a format the dense layers can consume, and a fully connected layer reasons over these features to support the final classification. The output layer then gives probabilities for each class (benign, malignant, or normal). Throughout training, the Adam optimizer adjusts learning rates and manages gradients effectively, and SMOTE is applied to balance the dataset, ensuring the model learns from all classes equally. By carefully designing the CNN architecture and incorporating these steps, we aimed to create a model that accurately classifies lung cancer stages from CT scans.

Input layer : The input layer accepts images resized to 256 × 256 pixels, maintaining a single channel (grayscale), resulting in an input shape of (256, 256, 1).

First convolutional layer : This layer consists of 64 filters of size 3 × 3, using a ReLU (Rectified Linear Unit) activation function. The choice of 64 filters is aimed at capturing a broad array of features from the input image, while the 3 × 3 filter size is standard for capturing spatial relationships in the image data. The equation involved are given in Eqs.  5 and 6 .

First max pooling layer : Following the convolutional layer, the model incorporates a max pooling layer with a 2 × 2 pool size. This layer serves to decrease the spatial dimensions of the feature maps, which not only helps in reducing the computational load but also enhances the model’s generalization capabilities. By focusing on the most prominent features, max pooling ensures that the model does not overfit to the noise in the training data. It is done using Eq.  7 .

Second convolutional layer : Another set of 64 filters is applied, like the first convolutional layer, to further refine the feature extraction. This layer also uses a 3 × 3 kernel and is followed by a ReLU activation. It is achieved using Eqs.  8 and 9 .

Second max pooling layer : This layer additionally decreases the size of the feature maps, aiding in the prevention of overfitting and lessening the computational burden.

Flattening : The feature maps are flattened into a single vector to prepare for the fully connected layers, facilitating the transition from convolutional layers to dense layers.

Fully connected layer : A dense layer with 16 neurons is used, providing a high-level reasoning based on the extracted features. This layer utilizes a linear activation function to allow for a range of linear responses. The equations helping in this are given in Eqs.  10 and 11 .

Output layer : The final layer of the model contains three neurons, each representing one of the classes: benign, malignant, and normal. It uses a SoftMax activation function, which is selected because it provides a probability distribution across these three classes, making it well suited to multi-class classification. The equations involved are given in Eqs.  12 and 13 .

Optimizer : The Adam optimizer is used due to its effectiveness in managing sparse gradients and its ability to adapt learning rates, which enhance the convergence speed during training. The equation involved in this is given in Eq.  14 .
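As a worked example of the layer arithmetic, the spatial sizes can be traced through the architecture described above. This assumes unpadded (“valid”) convolutions and non-overlapping pooling, which are common framework defaults; the text does not state the padding scheme, so the exact figures are illustrative.

```python
import numpy as np

def conv_out(size, kernel=3):
    """Spatial size after an unpadded ('valid') convolution."""
    return size - (kernel - 1)

def pool_out(size, pool=2):
    """Spatial size after non-overlapping 2x2 max pooling (floor division)."""
    return size // pool

s = 256            # grayscale input, 256 x 256
s = conv_out(s)    # first 3x3 conv
s = pool_out(s)    # first 2x2 pool
s = conv_out(s)    # second 3x3 conv
s = pool_out(s)    # second 2x2 pool
flat = s * s * 64  # 64 feature maps flattened before the dense layers

def softmax(z):
    """Probability distribution over the three classes at the output layer."""
    e = np.exp(z - z.max())
    return e / e.sum()

p = softmax(np.array([2.0, 0.5, -1.0]))  # hypothetical logits
```

Under these assumptions the feature maps shrink 256 → 254 → 127 → 125 → 62, so the flattened vector has 62 × 62 × 64 = 246,016 elements; with “same” padding the figures would differ.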

CNN is chosen for its proven efficacy in image classification tasks, particularly its ability to learn hierarchical patterns in data. In medical imaging, CNNs have demonstrated success in identifying subtle patterns that are indicative of various pathologies, making them ideal for this application. The sequential model with convolutional layers followed by pooling layers allows for the extraction and down sampling of features, which is critical for capturing relevant information from medical images.

The Synthetic Minority Over-Sampling Technique (SMOTE) represents an innovative strategy devised to address the issue of class imbalance within the dataset. Class imbalance poses a substantial risk of biasing the model’s performance, particularly in medical datasets where one class may be underrepresented. SMOTE functions by creating synthetic samples within the feature space of the minority class, drawing inspiration from the feature space of its nearest neighbors. This process aids in rectifying class imbalances and ensuring more equitable representation during model training.

Filter mapping of a sample image is shown in Fig.  5 to provide further insight into the interpretability of the model.

figure 5

In this research:

Application of SMOTE : SMOTE is applied only to the training data to prevent information leakage and to promote robust generalization on unseen data. It balances the dataset by augmenting the minority classes, ensuring that the model does not become biased toward the majority class.

Impact on model performance : By addressing the class imbalance, SMOTE helps in improving the model’s sensitivity towards the minority class, which is crucial in medical diagnostics, as overlooking a positive case can have serious implications.

Considerations : While SMOTE can significantly improve model performance in cases of class imbalance, it’s essential to monitor for overfitting, as the synthetic samples may cause the model to overgeneralize from the minority class.
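The interpolation idea behind SMOTE can be sketched in a few lines of numpy. This is a minimal illustration of the technique, not the implementation used in the study, which would typically rely on a library such as imbalanced-learn; the k value and data shapes are hypothetical.

```python
import numpy as np

def smote_samples(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch.

    Each synthetic point is a random interpolation between a minority-class
    sample and one of its k nearest minority-class neighbours, so new points
    always lie on segments inside the minority feature space.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        x = X_min[i]
        d = np.linalg.norm(X_min - x, axis=1)     # distances to all minority points
        neighbours = np.argsort(d)[1:k + 1]       # skip the point itself
        nb = X_min[rng.choice(neighbours)]
        lam = rng.random()                        # interpolation factor in [0, 1)
        out.append(x + lam * (nb - x))
    return np.array(out)
```

As the text notes, this must be applied to the training split only; oversampling before the split would leak synthetic copies of training points into the validation set.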

The algorithm for the proposed model is presented in Algorithm 1.

figure a

Algorithm 1: Proposed algorithm for the methodology

As per the algorithm in the initial convolutional layers of the model, two sets of convolutional layers followed by max-pooling layers play a pivotal role in feature detection. Utilizing a standard 3 × 3 kernel size allows the model to discern small, localized features within CT scan images. By stacking these convolutional layers before applying max pooling, the model effectively captures intricate patterns such as edges, textures, and shapes, crucial for distinguishing between benign, malignant, and normal lung tissue. The ReLU activation function is employed in these convolutional layers due to its effectiveness in introducing non-linearity, enabling the model to learn complex patterns efficiently. Additionally, max pooling is utilized to downsample the feature maps, reducing computational load and enhancing robustness to image variations, thereby improving translational invariance. Following feature extraction, the model flattens the output and transitions to dense layers, condensing learned information into abstract representations. The final layer consists of three neurons, representing the three classes under consideration, employing the SoftMax activation function to transform logits into probabilities, thereby providing insights into the model’s confidence regarding each class. Throughout the compilation and training phases, the Adam optimizer and sparse categorical crossentropy loss function, as depicted by Eq.  15 , are chosen due to their adaptive learning rate features and appropriateness for classification objectives. Validation on an independent dataset is crucial for detecting overfitting and refining hyperparameters.

In the training phase, SMOTE is strategically applied to create a balanced dataset representative of all classes, crucial for generalizing well across various lung tissue conditions, especially in medical datasets where class imbalance may exist.

Training and validation

Throughout the training and validation phases of the deep learning model, meticulous steps are taken to ensure that the model not only learns effectively from the training data but also demonstrates robust generalization capabilities when presented with new, unseen data. This phase plays a pivotal role in evaluating the model’s proficiency in accurately classifying lung cancer stages from CT scans.

The training process initiates with the segmentation of the dataset into distinct training and validation subsets. This segmentation is performed in a stratified manner to guarantee that each subset encompasses a balanced representation of the various classes. Such stratification is essential for maintaining consistency and mitigating biases, particularly in light of the class imbalance addressed by SMOTE during training. Approximately 80% of the data is allocated for training purposes, while the remaining 20% is reserved for validation.

Subsequent to the data segmentation, the training commences with the utilization of a batch size of 8. The selection of a smaller batch size is deliberate, aiming to facilitate more precise and nuanced updates to the model’s weights during each iteration, thereby potentially enhancing generalization. Nonetheless, it is imperative to strike a balance between this granularity and computational efficiency, as smaller batch sizes may prolong the training duration.

The number of epochs is predetermined to be 12, indicating the total number of complete passes that the learning algorithm will undertake across the entire training dataset. This choice represents a delicate balance between underfitting and overfitting; insufficient epochs may hinder the model’s learning process, whereas excessive epochs may result in the model memorizing the training data, consequently impairing its ability to generalize effectively. The progression of training and validation loss and accuracy across epochs is visualized in Fig.  6 .

figure 6

Training and validation loss and accuracy
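The effect of the chosen batch size and epoch count on the granularity of training can be made concrete with a little arithmetic. The batch size of 8 and the 12 epochs are from the text; the training-set size below is a hypothetical round number, not the actual count.

```python
import math

n_train = 880     # hypothetical number of training images
batch_size = 8    # as specified in the text
epochs = 12       # as specified in the text

# Weight updates per full pass over the training data.
steps_per_epoch = math.ceil(n_train / batch_size)

# Total weight updates over the whole training run.
total_updates = steps_per_epoch * epochs

# A larger batch would mean far fewer, coarser updates per epoch.
steps_large_batch = math.ceil(n_train / 64)
```

With these numbers, batch size 8 yields 110 updates per epoch (1,320 in total) versus only 14 per epoch at batch size 64, which is the granularity trade-off the text describes.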

During training, the model’s performance is continuously evaluated using a comprehensive set of performance metrics assessed against the validation set. These metrics encompass accuracy, precision, recall, and F1-score, all of which are instrumental in comprehending the model’s strengths and weaknesses in classifying each lung cancer stage. Accuracy furnishes a broad overview of the model’s overall performance, while precision and recall delve deeper into its class-specific performance, a critical consideration in medical diagnostics where false negatives and false positives carry significant consequences. The F1-score serves to harmonize precision and recall, furnishing a unified metric to gauge the model’s equilibrium between these two facets.

Moreover, the validation process incorporates a confusion matrix and ROC curves to furnish a more granular analysis of the model’s performance across diverse thresholds and classes. The confusion matrix delineates the model’s true positives, false positives, false negatives, and true negatives, offering a snapshot of its classification capabilities. Meanwhile, ROC curves and the corresponding AUC (Area Under the Curve) provide insights into the model’s capacity to discriminate between classes at varying threshold settings, a crucial consideration for refining the model’s decision boundary.

To maximize the performance of our convolutional neural network (CNN) model for lung cancer classification, we fine-tuned several critical hyperparameters that shape the learning process and, ultimately, the model’s accuracy: the learning rate, batch size, number of filters in each convolutional layer, filter size, and dropout rate. We first explored a spectrum of learning rates to find a value that converges quickly toward the minimum of the loss function without overshooting. We then examined various batch sizes to balance training time against the stability of the gradient descent process. Next, we tested different combinations of filter counts and filter sizes in the convolutional layers to identify the configuration best suited to extracting salient features from the CT scan images. To combat overfitting and improve robustness, we also optimized the dropout rate, determining the proportion of neurons to deactivate during training. Our methodology used a grid search strategy, systematically traversing predefined sets of values for each hyperparameter and evaluating the model’s performance with cross-validation. This exhaustive search identified the hyperparameter combination that both raised the model’s classification accuracy and strengthened its generalization. The efficacy of the selected hyperparameters was then validated on a distinct validation set, underscoring the robustness and reliability of the chosen parameters.
Through this systematic and rigorous approach to hyperparameter tuning, we substantially improved the performance and stability of our lung cancer classification model, augmenting its potential for real-world clinical applications.
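The grid search over hyperparameters can be sketched as follows. The candidate values and the scoring function are hypothetical placeholders, since the text does not list the exact grids; a real run would train and cross-validate the CNN for each configuration.

```python
from itertools import product

# Hypothetical search space mirroring the hyperparameters named in the text.
grid = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size":    [8, 16, 32],
    "n_filters":     [32, 64],
    "dropout":       [0.25, 0.5],
}

def evaluate(cfg):
    """Stand-in for cross-validated training; a real run would train the CNN
    and return its mean validation accuracy for this configuration."""
    return -abs(cfg["learning_rate"] - 1e-3) - 0.01 * cfg["dropout"]

keys = list(grid)
best_cfg, best_score, n_combos = None, float("-inf"), 0
for values in product(*grid.values()):   # every combination in the grid
    n_combos += 1
    cfg = dict(zip(keys, values))
    score = evaluate(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score
```

The exhaustiveness is also the cost: this toy grid already has 3 × 3 × 2 × 2 = 36 configurations, each of which would require a full cross-validated training run.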

The training and validation phases operate iteratively, with refinements made to the model’s architecture, hyperparameters, or training methodology based on the validation outcomes. This iterative refinement persists until the model achieves a satisfactory equilibrium of accuracy, generalizability, and robustness, thereby ensuring its efficacy and reliability in clinical settings for lung cancer stage classification.

Statistical methods

In the analysis of the IQ-OTH/NCCD lung cancer dataset, various statistical and machine learning techniques were employed to ensure a comprehensive evaluation of the data. The primary focus was on classification metrics to assess the performance of the predictive models.

Confusion matrix : The confusion matrix serves as a pivotal component in our analysis, furnishing a visual representation of the model’s performance. It succinctly presents the counts of true positives, true negatives, false positives, and false negatives, thereby offering a lucid comprehension of the model’s classification accuracy and any instances of misclassification.

Accuracy : The accuracy metric was calculated by dividing the number of correctly predicted observations by the total number of observations, providing a straightforward measure for assessing the model’s overall performance. However, relying solely on accuracy can be deceptive, particularly in datasets with imbalanced class distributions. Therefore, it is imperative to incorporate additional metrics for a more comprehensive evaluation. It is achieved by Eq.  16 .

Precision (positive predictive value) : Precision was utilized to assess the accuracy of positive predictions, quantified as the ratio of true positives to the sum of true positives and false positives. This metric bears significant relevance in scenarios where the repercussions of false positives are considerable. It is achieved by Eq.  17 .

Recall (sensitivity or true positive rate) : Recall assesses the model’s ability to detect positive instances, calculated as the ratio of true positives to the sum of true positives and false negatives. This metric holds particular importance in medical diagnostics, where failing to identify a positive case can lead to severe consequences. It is achieved by Eq.  18 .

F1-score : The F1-score, which is the harmonic mean of precision and recall, was used to provide a balance between the two metrics, particularly valuable in situations of class imbalance. It is a more robust measure than accuracy in scenarios where false negatives and false positives have different implications. It is achieved by Eq.  19 .

Cohen’s kappa : The Cohen’s Kappa statistic was applied to assess the agreement between observed and predicted classifications, accounting for chance agreement. This statistic offers a nuanced understanding of the model’s performance, which is particularly valuable in scenarios involving imbalanced datasets. It is achieved by Eq.  20 .

Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) : MSE (Mean Squared Error) and RMSE (Root Mean Squared Error) were calculated to evaluate the average squared difference and the square root of the average squared differences, respectively, between predicted and actual classification categories. These metrics are instrumental in understanding the variance of prediction errors. MSE and RMSE are achieved using Eqs.  21 and 22 , respectively.

Mean Absolute Error (MAE) : MAE (Mean Absolute Error) measures the average magnitude of errors in a set of predictions, regardless of their direction. It is a linear score, meaning that all individual differences are equally weighted in the average. It is achieved using Eq.  23 .

Receiver Operating Characteristic (ROC) Curve and Area Under the Curve (AUC) : The ROC curve graphically illustrates the diagnostic ability of the model by plotting the true positive rate against the false positive rate at various threshold settings. The AUC (Area Under the Curve) provides a single scalar value summarizing the overall performance of the model across all possible classification thresholds. It is achieved using Eq.  24 .

F2-score : The F2-score was calculated to weigh recall higher than precision, useful in scenarios where missing positive predictions is more detrimental than making false positives. It is achieved using Eq.  25 .

These statistical methods and metrics provided a multifaceted evaluation of the model’s performance, ensuring a robust analysis of the predictive capabilities and reliability in classifying the cases within the IQ-OTH/NCCD lung cancer dataset.
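The metrics defined above (Eqs.  16 – 25 ) correspond to standard library calls, sketched here on a small hypothetical set of labels; the arrays are illustrative stand-ins, not results from the study.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, fbeta_score, cohen_kappa_score,
                             confusion_matrix, mean_squared_error,
                             mean_absolute_error)

# Hypothetical predictions over the three classes:
# 0 = benign, 1 = malignant, 2 = normal.
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 1, 2, 2, 2, 2, 1])

acc   = accuracy_score(y_true, y_pred)
prec  = precision_score(y_true, y_pred, average="macro")
rec   = recall_score(y_true, y_pred, average="macro")
f1    = f1_score(y_true, y_pred, average="macro")
f2    = fbeta_score(y_true, y_pred, beta=2, average="macro")  # recall-weighted
kappa = cohen_kappa_score(y_true, y_pred)

# MSE / RMSE / MAE here treat the class labels as numbers, as in the text.
mse   = mean_squared_error(y_true, y_pred)
rmse  = np.sqrt(mse)
mae   = mean_absolute_error(y_true, y_pred)

cm    = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted
```

Macro averaging weights each class equally regardless of size, which is the appropriate choice when the minority class matters as much as the majority, as it does here.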

The evaluation of the IQ-OTH/NCCD lung cancer dataset through our predictive model yielded detailed insights across various statistical metrics, showcasing the model’s efficacy in classifying lung cancer stages. Here we delve into a comprehensive analysis of each metric:

Confusion matrix : The confusion matrix offered a detailed perspective on the model’s classification performance, unveiling a notable count of true positives and true negatives, reflecting precise predictions. Notably, there were minimal occurrences of false positives and false negatives, underscoring the model’s accuracy in discerning between benign, malignant, and normal cases. The same is visualized in Fig.  7 .

figure 7

Confusion matrix

Accuracy : The overall model accuracy was noted at 99.64%, highlighting the model’s robust capacity to accurately identify and classify instances within the dataset. This exceptional accuracy rate underscores the model’s reliability in clinical diagnostic settings, establishing a solid basis for subsequent validation and potential clinical implementation. Fig.  8 visualizes the correctly classified instances.

figure 8

Correctly classified instances

Precision : The precision metric provided valuable insights into the model’s predictive reliability. It attained a precision of 96.77% for benign cases, signifying a high probability that a case predicted as benign is indeed benign. Moreover, for malignant and normal cases, the precision reached 100%, demonstrating the model’s outstanding ability to predict these categories accurately without any false positives.

Recall : The recall scores were equally remarkable, achieving 100% for both benign and malignant cases, and 99.04% for normal cases. These findings underscore the model’s sensitivity and its capability to accurately detect all true positive cases, thereby mitigating the risk of false negatives as a pivotal consideration in medical diagnostics.

F1-score : The F1-scores, which strike a balance between precision and recall, were 98.36% for benign, 100% for malignant, and 99.52% for normal cases. These scores signify the model’s balanced performance, guaranteeing both the accuracy of positive predictions and the reduction of false negatives. To enhance the visualization of the classification report, Table  3 provides a statistical representation.

Based on Table  3 , a heatmap visualizing the same details is provided in Fig.  9 for better insight.

figure 9

Classification report

Cohen’s kappa : With a Cohen’s Kappa score of 0.9938, the model exhibited near-perfect agreement with the actual classifications, far surpassing the performance expected by chance alone. This underscores an elevated level of consistency in the model’s predictions, thus reinforcing its reliability.

Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) : The model reported an MSE of 0.0145 and an RMSE of 0.1206, indicating minimal variance and bias in the prediction errors. These low values suggest that the model’s predictions are consistently close to the actual values, enhancing trust in its predictive power.

Mean Absolute Error (MAE) : With an MAE of 0.0073, the model exhibited minimal average error magnitude in its predictions, signifying high predictive accuracy. This metric further reinforces the model’s suitability for clinical settings where precision is crucial. To visualize the error metrics, a bar chart is given in Fig.  10 .

figure 10

Error metrics bar chart

Receiver Operating Characteristic (ROC) Curve and Area Under the Curve (AUC) : The ROC curves and corresponding AUC values were exceptional, achieving AUCs of 1.00 for malignant, benign, and normal cases. These results indicate the model’s outstanding discrimination ability between different classes across various threshold settings. The ROC-AUC curves are provided in Fig.  11 .

figure 11

F2-score : The F2-score of 0.9964, which places more emphasis on recall, indicates the model’s strong ability to identify positive cases. This is particularly important in the medical field, where failing to detect a condition could have profound consequences. A visual representation of the performance scores is given in Fig.  12 .

figure 12

Performance scores

The detailed results across these metrics provide a comprehensive picture of the model’s performance, highlighting its precision, reliability, and robustness in classifying lung cancer stages from the IQ-OTH/NCCD dataset. The findings demonstrate the model’s potential as a diagnostic tool, supporting its further investigation and potential integration into clinical practice.

The analysis of the IQ-OTH/NCCD lung cancer dataset with our model reveals a groundbreaking level of performance in medical image classification. With an accuracy of 99.64% and exceptional precision and recall metrics across the three categories (benign, malignant, and normal), the model emerges as a highly reliable diagnostic aid. The significance of these results extends beyond the high metric scores; it lies in the model’s capability to accurately distinguish between benign and malignant cases, a critical aspect for patient management and treatment planning.

The high F1-score underscores the model’s balanced consideration of precision and recall, thereby minimizing the risk of misdiagnosis. Additionally, the emphasis on recall in the F2-score holds particular significance in the medical domain, where overlooking a positive case (a false negative) can have more severe consequences than erroneously identifying a case as positive (a false positive). A comparison between the baseline models and the proposed model is given in Table 4.

In the realm of lung cancer detection, many existing models focus predominantly on binary classification, often neglecting the nuanced differentiation between benign and malignant cases [ 37 ]. Our model’s tri-classification capability sets a new benchmark, offering a more detailed diagnostic tool compared to the binary classifiers. When juxtaposed with existing methods, our model’s performance underscores its advanced detection capabilities, potentially offering a more nuanced and informative diagnostic perspective than currently available tools.

For clinical practice, the integration of such a high-performing model could revolutionize lung cancer diagnostics [ 22 , 38 ]. It can augment radiologists’ capabilities, reducing diagnostic time and increasing throughput. The ability to accurately classify lung nodules as benign, malignant, or normal could significantly reduce unnecessary interventions, minimizing patient exposure to invasive procedures and associated risks. Additionally, it can streamline the patient pathway, ensuring rapid treatment initiation for malignant cases and appropriate follow-up for benign conditions [ 39 , 40 ].

While the results are promising, the study’s limitations warrant consideration. The model’s training on a dataset from a specific demographic and geographic area raises questions about its applicability to broader populations. Additionally, the model’s performance in a controlled study environment might not fully translate to the diverse and unpredictable nature of clinical settings. The black-box nature of deep learning models also poses a challenge in clinical contexts, where understanding the rationale behind a diagnosis is as crucial as the diagnosis itself [ 41 ]. To make this clearer, some misclassified instances are shown in Fig. 13.

figure 13

Misclassified instances

When evaluating our CNN model’s performance on the lung cancer dataset, we observed several classification errors, which can arise for a number of reasons. First, some features in the CT scans appear similar across benign and malignant nodules, making them difficult for the model to distinguish. Noise and artifacts in the scans can also obscure important details. Although we balanced the classes, rare cases remain challenging for the model to recognize, and early-stage cancer can closely resemble normal tissue. Variations in how scans are acquired can further affect the model’s interpretation, leading to errors. Finally, if the model overfits the training data, it may generalize poorly to new, unseen images. To address these issues, we plan to apply stronger data-preparation techniques, such as more effective noise removal, and to make the model more robust to different imaging conditions. We also aim to combine multiple models and train on more diverse data to improve accuracy. By addressing these challenges, we hope to improve the model’s classification of lung cancer stages.
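As one concrete example of the noise-removal step mentioned above, a median filter is a standard way to suppress impulse noise before classification. The sketch below is illustrative only and is not the paper's actual pre-processing pipeline:

```python
# Hedged sketch: a 3x3 median filter, a common pre-processing step for
# suppressing impulse noise in CT slices (illustrative; not the paper's
# actual denoising pipeline). The image is a plain 2D list of pixel values.

def median_filter(image, k=3):
    h, w, half = len(image), len(image[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Gather the neighborhood, clipped at the image border.
            window = [image[ii][jj]
                      for ii in range(max(0, i - half), min(h, i + half + 1))
                      for jj in range(max(0, j - half), min(w, j + half + 1))]
            window.sort()
            out[i][j] = window[len(window) // 2]
    return out

# A bright speckle (255) in otherwise uniform tissue is removed.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
print(median_filter(noisy)[1][1])
```

Unlike mean filtering, the median preserves edges while discarding isolated outliers, which is why it is a frequent first choice for speckle-like artifacts.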

While the IQ-OTHNCCD lung cancer dataset has been instrumental in developing and validating our model, it is important to recognize its limitations, particularly concerning demographic and geographic diversity. The dataset predominantly represents a specific population, which may not capture the full spectrum of variations seen in global populations. This limitation poses challenges for the model’s generalizability, as differences in demographics, such as age, ethnicity, and underlying health conditions, can influence the presentation of lung cancer in CT scans.

To address these limitations, future research should focus on expanding the dataset to include a more diverse range of CT scan images from various demographic groups and geographic regions. This expansion can be facilitated through collaborations with international medical institutions and accessing publicly available medical imaging repositories. Additionally, incorporating advanced data augmentation techniques that simulate variations in demographic characteristics, such as age and gender, can further enhance the dataset’s diversity. By broadening the dataset, we aim to improve the model’s robustness and ensure its applicability across different populations, ultimately enhancing the utility and reliability of our diagnostic tool in diverse clinical settings. This approach will contribute to developing a more inclusive and universally applicable model for lung cancer diagnosis.
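As a simple illustration of how augmentation can widen a dataset's effective diversity, the sketch below applies a horizontal flip and a small intensity jitter (loosely mimicking scanner-to-scanner variation). These particular transforms are assumptions for illustration, not the augmentations used in the study:

```python
# Hedged sketch: simple geometric/intensity augmentations of the kind
# that can widen a training set's effective diversity (illustrative only;
# any augmentation simulating demographic variation would be far more
# domain-specific). Images are 2D lists of pixel intensities in [0, 255].
import random

def horizontal_flip(image):
    # Mirror each row left-to-right.
    return [row[::-1] for row in image]

def intensity_jitter(image, max_shift=5, rng=None):
    """Shift every pixel by a small random amount, clamped to [0, 255]."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    return [[min(255, max(0, px + rng.randint(-max_shift, max_shift)))
             for px in row] for row in image]

img = [[0, 50, 100],
       [100, 50, 0]]
print(horizontal_flip(img))
```

Each transform yields a plausible new training sample without collecting new scans; in practice such transforms are composed randomly at training time.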

Sensitivity analysis of precision, recall, and F1-score

In our endeavor to comprehensively assess the performance of our Convolutional Neural Network (CNN) model for lung cancer diagnosis, we conducted a sensitivity analysis focusing on precision, recall, and the F1-score. Precision sensitivity involved systematically adjusting the threshold values used for classification to observe its impact on false positive rates and the model’s conservatism in identifying positive cases. As precision increased, indicating a more stringent classification approach, false positives decreased, but the risk of false negatives rose, necessitating a delicate balance in medical diagnostics. Conversely, recall sensitivity entailed modifying the model’s sensitivity to detect positive cases, thereby influencing its ability to minimize false negatives. Heightened recall improved the identification of true positives, crucial for early diagnosis and treatment, albeit with potential increases in false positives, mandating cautious management. Additionally, analyzing the F1-score, a harmonic mean of precision and recall, elucidated its role in balancing false positives and false negatives. Optimizing for a high F1-score underscored a balanced approach, ensuring robust performance across both precision and recall metrics. Overall, the sensitivity analysis underscored the significance of striking a delicate balance between precision, recall, and the F1-score to optimize the model’s performance in clinical settings. By navigating and managing these trade-offs effectively, we can bolster the reliability and efficacy of our model in diagnosing lung cancer, thereby contributing to improved patient outcomes.
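The threshold trade-off described above can be made concrete with a small sweep over decision thresholds. The scores below are toy values, not the study's data; they only demonstrate the precision-recall tension:

```python
# Sketch of the threshold sweep described above (toy scores, not the
# study's data): raising the decision threshold trades recall for precision.

def precision_recall(y_true, y_score, threshold):
    pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(t and p for t, p in zip(y_true, pred))
    fp = sum((not t) and p for t, p in zip(y_true, pred))
    fn = sum(t and (not p) for t, p in zip(y_true, pred))
    # Convention: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true  = [1, 1, 1, 0, 0, 0]
y_score = [0.95, 0.80, 0.55, 0.45, 0.30, 0.10]

for thr in (0.4, 0.6, 0.9):
    p, r = precision_recall(y_true, y_score, thr)
    print(f"threshold={thr:.1f}  precision={p:.2f}  recall={r:.2f}")
```

At the low threshold every positive is caught at the cost of a false positive; raising the threshold removes the false positive but begins to miss true cases, which is exactly the balance the sensitivity analysis probes.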

Regulatory considerations for clinical application

Implementing machine learning models in clinical settings involves navigating a complex landscape of regulatory requirements to ensure patient safety, data security, and efficacy. One of the primary regulatory hurdles is obtaining approval from medical device regulatory bodies such as the U.S. Food and Drug Administration (FDA), the European Medicines Agency (EMA), or other relevant national authorities. These regulatory agencies require extensive validation studies to demonstrate the model’s accuracy, reliability, and safety in diagnosing lung cancer. This involves rigorous testing on diverse datasets to ensure the model’s generalizability and performance across different patient populations and clinical scenarios.

Additionally, regulatory guidelines mandate that machine learning models used in healthcare must provide a level of interpretability and transparency. Clinicians need to understand the decision-making process of the model to trust and effectively integrate it into clinical workflows. This requirement for explainability poses a challenge for deep learning models, which are often considered “black boxes.” Therefore, developing methods to elucidate the model’s reasoning, such as feature importance analysis or visual explanations, is crucial for meeting regulatory standards.

Data privacy and security are also significant regulatory concerns, particularly with the implementation of regulations like the General Data Protection Regulation (GDPR) in Europe and the Health Insurance Portability and Accountability Act (HIPAA) in the United States. Ensuring that patient data is anonymized, securely stored, and used ethically is essential for compliance. This includes implementing robust data encryption, access controls, and audit trails to protect sensitive health information from unauthorized access and breaches.

Moreover, post-market surveillance is a critical component of regulatory compliance, requiring continuous monitoring of the model’s performance in real-world clinical settings. This involves tracking the model’s diagnostic accuracy, identifying potential biases, and updating the model as needed to maintain its efficacy and safety over time. Establishing a framework for ongoing evaluation and improvement is essential to meet regulatory requirements and ensure the model’s long-term success in clinical applications.

Addressing these regulatory hurdles necessitates close collaboration between developers, healthcare providers, and regulatory bodies to ensure that machine learning models are safe, effective, and aligned with clinical needs. By adhering to these regulatory frameworks, we can facilitate the successful integration of advanced diagnostic tools into healthcare, ultimately enhancing patient outcomes and advancing the field of medical diagnostics.

Future research directions should focus on external validation of the model across various populations and healthcare settings to ascertain its universality and robustness. Integrating multimodal data, encompassing patient history, genetic information, and other diagnostic results, could enhance the model’s diagnostic precision. Addressing the interpretability of deep learning models could foster greater trust and integration into clinical decision-making processes. Additionally, prospective studies assessing the model’s impact on clinical outcomes, patient satisfaction, and healthcare efficiency would provide invaluable insights into its practical benefits and potential areas for improvement.

This study presented a comprehensive analysis of the IQ-OTH/NCCD lung cancer dataset using a sophisticated machine learning model, which demonstrated exceptional performance in classifying lung cancer stages. Key findings include a near-perfect accuracy rate of 99.64%, alongside impressive precision and recall metrics across benign, malignant, and normal case classifications. The model’s balanced F1-score and the emphasis on recall in the F2-score further highlight its diagnostic precision and sensitivity. These results signify a substantial advancement in the model’s ability to differentiate between nuanced lung cancer stages, providing a critical tool for early and accurate diagnosis.

The implications of these discoveries on the field of lung cancer diagnostics are profound. The model’s precision in classifying lung cancer stages holds the promise of substantially enhancing diagnostic protocols, thereby refining the accuracy and efficiency of lung cancer detection. This advancement has the potential to facilitate earlier treatment interventions, potentially enhancing patient outcomes and survival rates. Moreover, the model’s capability to differentiate between benign and malignant nodules could mitigate the need for unnecessary invasive procedures, consequently reducing patient risk and healthcare expenditures.

Future research should focus on external validation of the model to ensure its effectiveness across diverse populations and clinical settings. The exploration of model interpretability is crucial for clinical adoption, where understanding the basis for diagnostic decisions is essential. Additionally, integrating the model with other diagnostic data and clinical workflows could enhance its utility and impact.

Prospective studies are needed to evaluate the model’s real-world clinical impact, particularly its ability to improve patient outcomes, streamline diagnostic pathways, and reduce healthcare costs. The potential for the model to be adapted or extended to other types of cancers or medical imaging modalities also represents an exciting avenue for future research.

This study highlights the potential of advanced machine learning models to transform lung cancer diagnostics, providing a more precise, effective, and nuanced approach to detecting and classifying lung cancer. The ongoing advancement and incorporation of such models into clinical settings hold the promise of catalyzing substantial progress in patient care and outcomes within the field of oncology.

Availability of data and materials

Data used for the findings are publicly available at https://www.kaggle.com/datasets/hamdallak/the-iqothnccd-lung-cancer-dataset .

Nooreldeen R. Current and future development in lung cancer diagnosis. Int J Mol Sci. 2021;22:8661.


Rea G, et al. Beyond visual interpretation: quantitative analysis and artificial intelligence in interstitial lung disease diagnosis expanding horizons in radiology. Diagnostics. 2023;13:2333.


Rajasekar V, et al. Lung cancer disease prediction with CT scan and histopathological images feature analysis using deep learning techniques. Results Eng. 2023;18:101111.


Lanjewar MG, Panchbhai KG, Charanarur P. Lung cancer detection from CT scans using modified DenseNet with feature selection methods and ML classifiers. Expert Syst Appl. 2023;224:119961.


Raza R, et al. Lung-EffNet: lung cancer classification using EfficientNet from CT-scan images. Eng Appl Artif Intell. 2023;126:106902.

Chaunzwa TL, et al. Deep learning classification of lung cancer histology using CT images. Sci Rep. 2021;11(1):1–12.

Chaturvedi P, Jhamb A, Vanani M, Nemade V. Prediction and Classification of Lung Cancer Using Machine Learning Techniques. IOP Conference Series: Materials Science and Engineering. 2021;1099:012059. https://doi.org/10.1088/1757-899X/1099/1/012059 .

Hong M, et al. Multi-class classification of lung diseases using CNN models. Appl Sci. 2021;11:9289.

Phankokkruad M. Ensemble transfer learning for lung cancer detection. 2021 4th international conference on data science and information technology. 2021.


Ren Z, Zhang Y, Wang S. LCDAE: data augmented ensemble framework for lung cancer classification. Technol Cancer Res Treat. 2022;21:15330338221124372.

Protonotarios NE, et al. A few-shot U-Net deep learning model for lung cancer lesion segmentation via PET/CT imaging. Biomed Phys Eng Express. 2022;8(2):025019.

Heuvelmans MA, van Ooijen PM, Ather S, Silva CF, Han D, Heussel CP, Oudkerk M. Lung cancer prediction by Deep Learning to identify benign lung nodules. Lung Cancer. 2021;154:1–4.


Le NQK, Kha QH, Nguyen VH, Chen YC, Cheng SJ, Chen CY. Machine learning-based radiomics signatures for EGFR and KRAS mutations prediction in non-small-cell lung cancer. Int J Mol Sci. 2021;22(17):9254.

Xie Y, Meng WY, Li RZ, Wang YW, Qian X, Chan C, Leung ELH. Early lung cancer diagnostic biomarker discovery by machine learning methods. Transl Oncol. 2021;14(1):907.

Li Z, et al. Deep Learning Methods for Lung Cancer Segmentation in Whole-Slide Histopathology Images—The ACDC@LungHP Challenge 2019. IEEE J Biomed Health Inform. 2021;25(2):429–40.

Narvekar S, Shirodkar M, Raut T, Vainganka P, Chaman Kumar KM, Aswale S. A Survey on Detection of Lung Cancer Using Different Image Processing Techniques. London, United Kingdom: 2022 3rd International Conference on Intelligent Engineering and Management (ICIEM); 2022. p. 13–8. https://doi.org/10.1109/ICIEM54221.2022.9853190 .


Aharonu M, Kumar RL. Convolutional Neural Network based Framework for Automatic Lung Cancer Detection from Lung CT Images. Bangalore, India: 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON); 2022. p. 1–7. https://doi.org/10.1109/SMARTGENCON56628.2022.10084235 .

Kavitha BC, Naveen KB. Image Acquisition and Pre-processing for Detection of Lung Cancer using Neural Network. Mandya, India: 2022 Fourth International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT); 2022. p. 1–4.

Causey JL, et al. Spatial pyramid pooling with 3D convolution improves lung cancer detection. IEEE/ACM Trans Comput Biol Bioinform. 2022;19(2):1165–72. https://doi.org/10.1109/TCBB.2020.3027744.

Ahmed I, Chehri A, Jeon G, Piccialli F. Automated Pulmonary Nodule Classification and Detection Using Deep Learning Architecture. IEEE/ACM Trans Comput Biol Bioinform. 2023;20(4):2445–56. https://doi.org/10.1109/TCBB.2022.3192139 .

Thakur A, Gupta M, Sinha DK, Mishra KK, Venkatesan VK, Guluwadi S. Transformative breast Cancer diagnosis using CNNs with optimized ReduceLROnPlateau and Early stopping Enhancements. Int J Comput Intell Syst. 2024;17(1):14.

Albalawi E, Thakur A, Ramakrishna MT, Khan B, Sankaranarayanan S, Almarri SB, Aldhyani T. Oral squamous cell carcinoma detection using EfficientNet on histopathological images. Front Med. 2024;10:1349336.

Shah AA, Malik HAM, Muhammad A, Alourani A, Butt ZA. Deep learning ensemble 2D CNN approach towards the detection of lung cancer. Sci Rep. 2023;13(1):2987.

Alzubaidi MA, Otoom M, Jaradat H. Comprehensive and comparative global and local feature extraction framework for lung cancer detection using CT scan images. IEEE Access. 2021;9:158140–54. https://doi.org/10.1109/ACCESS.2021.3129597.

Mathios D, Johansen JS, Cristiano S, Medina JE, Phallen J, Larsen KR, Velculescu VE. Detection and characterization of lung cancer using cell-free DNA fragmentomes. Nat Commun. 2021;12(1):5060.

Mehmood S, et al. Malignancy detection in lung and colon histopathology images using transfer learning with class selective image processing. IEEE Access. 2022;10:25657–68. https://doi.org/10.1109/ACCESS.2022.3150924.

Dritsas E, Trigka M. Lung cancer risk prediction with machine learning models. Big Data Cogn Comput. 2022;6(4):139.

Masud M, Sikder N, Nahid AA, Bairagi AK, AlZain MA. A machine learning approach to diagnosing lung and colon cancer using a deep learning-based classification framework. Sensors. 2021;21(3):748.

Naseer S, Akram T, Masood M, Rashid, Jaffar A. Lung cancer classification using modified U-Net based lobe segmentation and nodule detection. IEEE Access. 2023;11:60279–91. https://doi.org/10.1109/ACCESS.2023.3285821.

Bharathy S, Pavithra R. Lung Cancer Detection using Machine Learning. In 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC). 2022. p. 539–43 IEEE.

Kasinathan G, Jayakumar S. Cloud based lung tumor detection and stage classification using deep learning techniques. BioMed Res Int. 2022;2022:4185835.

Das S, et al. Automated prediction of Lung Cancer using Deep Learning algorithms. Applied Artificial Intelligence. CRC; 2023. pp. 93–120.


Tasnim N, et al. A Deep Learning Based Image Processing Technique for Early Lung Cancer Prediction. 2024 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS). 2024. IEEE.

Safta W. Advancing pulmonary nodule diagnosis by integrating Engineered and Deep features extracted from CT scans. Algorithms. 2024;17(4):161.

Khaliq K, et al. LCCNet: a deep learning based Method for the identification of lungs Cancer using CT scans. VFAST Trans Softw Eng. 2023;11(2):80–93.

Nigudgi S. Lung cancer CT image classification using hybrid-SVM transfer learning approach. Soft Comput. 2023;27(14):9845–59.

Diwakar M, Singh P, Shankar A. Multi-modal medical image fusion framework using co-occurrence filter and local extrema in NSST domain. Biomed Signal Process Control. 2021;68:102788. https://doi.org/10.1016/j.bspc.2021.102788 .

Das M, Gupta D, Bakde A. An end-to-end content-aware generative adversarial network-based method for multimodal medical image fusion. Data Analytics Intell Sys. 2024;7(1):7–10. https://doi.org/10.1088/978-0-7503-5417-2ch7 .

Jie Y, Xu Y, Li X, Tan H. TSJNet: a multi-modality target and semantic awareness joint-driven image fusion network. arXiv preprint arXiv:2402.01212. 2024.

Dhaundiyal R, Tripathi A, Joshi K, Diwakar M, Singh P. Clustering based multi-modality medical image fusion. In: Journal of Physics: Conference Series. 2020 (Vol. 1478, No. 1, p. 012024). IOP Publishing.

Diwakar M, Singh P, Shankar A, Nayak RS, Nayak J, Vimal S, Sisodia D. Directive clustering contrast-based multi-modality medical image fusion for smart healthcare system. Netw Model Anal Health Inf Bioinf. 2022;11(1):15.


Acknowledgements

Not applicable.

This research received no external funding.

Author information

Authors and affiliations.

Al-Ameen Engineering College (Autonomous), Erode, Tamil Nadu, India

M. Mohamed Musthafa

Department of Computer science and Engineering, East Point College of Engineering & Technology, Bangalore, India

I. Manimozhi

Department of Computer Science and Engineering, JAIN (Deemed-to-be University), Bengaluru, 562112, India

T. R. Mahesh

Adama Science and Technology University, Adama, 302120, Ethiopia

Suresh Guluwadi


Contributions

M.M.M. took care of the literature review and methodology. M.T.R. performed the formal analysis, data collection, and investigation. I.M. did the initial drafting and statistical analysis. S.G. supervised the overall project. All authors have read and approved the final article.

Corresponding author

Correspondence to Suresh Guluwadi .

Ethics declarations

Ethics approval and consent to participate.

Not applicable. 

Consent for publication

Not applicable, as the work was carried out on a publicly available dataset.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Musthafa, M.M., Manimozhi, I., Mahesh, T.R. et al. Optimizing double-layered convolutional neural networks for efficient lung cancer classification through hyperparameter optimization and advanced image pre-processing techniques. BMC Med Inform Decis Mak 24, 142 (2024). https://doi.org/10.1186/s12911-024-02553-9


Received: 16 April 2024

Accepted: 22 May 2024

Published: 27 May 2024

DOI: https://doi.org/10.1186/s12911-024-02553-9


  • Lung cancer
  • Machine learning
  • Classification
  • Diagnostic accuracy

BMC Medical Informatics and Decision Making

ISSN: 1472-6947


  • Open access
  • Published: 24 May 2024

Vertebral hemangiomas: a review on diagnosis and management

  • Kyle Kato 1 ,
  • Nahom Teferi 2 ,
  • Meron Challa 1 ,
  • Kathryn Eschbacher 3 &
  • Satoshi Yamaguchi 2  

Journal of Orthopaedic Surgery and Research, volume 19, Article number: 310 (2024)


Vertebral hemangiomas (VHs) are the most common benign tumors of the spinal column and are often encountered incidentally during routine spinal imaging.

A retrospective review of the inpatient and outpatient hospital records at our institution was performed for the diagnosis of VHs from January 2005 to September 2023. Search filters included “vertebral hemangioma,” “back pain,” “weakness,” “radiculopathy,” and “focal neurological deficits.” Radiographic evaluation of these patients included plain X-rays, CT, and MRI. Following confirmation of a diagnosis of VH, these images were used to generate the figures used in this manuscript. Moreover, an extensive literature search was conducted using PubMed for the literature review portion of the manuscript.

VHs are benign vascular proliferations that cause remodeling of bony trabeculae in the vertebral body of the spinal column. Horizontal trabeculae deteriorate, leading to thickening of the vertical trabeculae, which produces a striated appearance on sagittal magnetic resonance imaging (MRI) and computed tomography (CT), the “corduroy sign,” and a punctate appearance on axial imaging, the “polka-dot sign.” These findings are seen in typical vertebral hemangiomas owing to the lesion’s low vascular-to-fat ratio. In contrast, atypical vertebral hemangiomas may or may not demonstrate the “corduroy” or “polka-dot” signs because of their lower fat content and higher vascular component. Atypical vertebral hemangiomas often mimic other neoplastic pathologies, making diagnosis challenging. Although most VHs are asymptomatic, aggressive vertebral hemangiomas can present with neurologic sequelae such as myelopathy and radiculopathy due to nerve root and/or spinal cord compression. Asymptomatic vertebral hemangiomas do not require therapy, and there are many treatment options for vertebral hemangiomas causing pain, radiculopathy, and/or myelopathy. Surgery (corpectomy, laminectomy), percutaneous techniques (vertebroplasty, sclerotherapy, embolization), and radiotherapy can be used in combination or in isolation as appropriate. Specific treatment options depend on the lesion’s size/location and the extent of neural element compression. There is no consensus on the optimal treatment plan for symptomatic vertebral hemangioma patients, although management algorithms have been proposed.

While typical vertebral hemangioma diagnosis is relatively straightforward, the differential diagnosis is broad for atypical and aggressive lesions. There is an ongoing debate as to the best approach for managing symptomatic cases, however, surgical resection is often considered first line treatment for patients with neurologic deficit.

Introduction

Vertebral hemangiomas (VHs) are benign vascular lesions formed from vascular proliferation in bone marrow spaces that are limited by bony trabeculae [ 1 ]. VHs are quite common and are often incidental findings on spinal computed tomography (CT) and magnetic resonance imaging (MRI) of patients presenting with back or neck pain [ 2 , 3 ]. Previous, large autopsy series such as Schmorl (1926) and Junghanns (1932) found a VH prevalence of 11% in adult specimens [ 1 , 4 ]. However, the prevalence is believed to be higher as modern imaging techniques allow for better detection of small VHs that may not be easily diagnosed on autopsy specimens [ 5 ]. They can occur at any age but are most often seen in individuals in their 5th decade of life with a slight female preponderance [ 2 , 6 , 7 ]. Most VHs are found in the thoracic or lumbar spinal column and often involve the vertebral body, though they can extend to the pedicle, lamina, or spinous process, and may span multiple spinal segments [ 5 ].

The vast majority of VHs are asymptomatic, quiescent lesions [ 3 ]. Prior studies have reported that less than 5% of VHs are symptomatic [ 8 , 9 ], although a 2023 study by Teferi et al. found that 35% of their 75 VH patients presented with symptoms including localized pain, numbness, and/or paresthesia [ 1 ]. In that series, 85% of symptomatic cases had VHs localized to the thoracic spine [ 1 ].

Among symptomatic VHs, up to 20–45% of cases may exhibit aggressive features including damage to surrounding bone and soft tissue or demonstrate rapid growth that extends beyond the vertebral body and invades the paravertebral and/or epidural space [ 1 , 5 , 10 , 11 ]. When “aggressive”, VHs may compress the spinal cord and nerve roots causing severe symptoms [ 1 , 5 ]. 45% of symptomatic VH patients present with neurologic deficits secondary to compressive lesions, bony expansion, disrupted blood flow, or vertebral body collapse while the remaining 55% present solely with back pain [ 8 , 12 , 13 , 14 , 15 ].

VHs are primarily diagnosed with radiographs, CT, and MRI, although other studies such as angiography, nuclear medicine studies, and positron emission tomography-computed tomography (PET-CT) have been utilized to a lesser extent [ 1 , 15 , 16 , 17 , 18 , 19 ]. Radiologically, these lesions can be grouped into Typical, Atypical, and Aggressive subtypes (see radiological features). Histologically, VHs are composed of varying proportions of adipocytes, blood vessels, and interstitial edema, which leads to thickening of the vertical trabeculae in the affected vertebra [ 5 ]. This histopathology produces the characteristic “polka-dot” sign on axial CT/MRI and “corduroy” sign on coronal and sagittal CT/MRI [ 5 , 20 ].

In terms of management, conservative treatment with observation and pain control are the mainstay of treatment for asymptomatic VH patients and those with mild-to-moderate pain respectively [ 21 ]. Surgical decompression is indicated for patients with neurologic deficits including compressive myelopathy or radiculopathy [ 22 ]. Other symptomatic patients have a wide variety of treatment options available including sclerotherapy, embolization, radiotherapy, and/or vertebroplasty [ 1 , 5 , 23 ]. The best approach in managing an individual patient with a symptomatic VH has not been elucidated and there have been different management algorithms suggested based on varying institutional experiences [ 1 , 5 , 24 , 25 ].

This article will review what is currently known regarding VHs. Diagnostic techniques and challenges will be highlighted as well as current treatment recommendations from the literature.

A retrospective review of the inpatient and outpatient hospital records at our institution was performed for the diagnosis of VHs from January 2005 to September 2023. Search filters included “vertebral hemangioma,” “back pain,” “weakness,” “radiculopathy,” and “focal neurological deficits.” Radiographic evaluation of these patients included plain X-rays, CT, and MRI. Following confirmation of a diagnosis of VH, these images were used to generate the figures used in this manuscript. Moreover, an extensive literature search was conducted using PubMed for the literature review portion of the manuscript.

In total, 68 articles were selected from our PubMed search.

Histopathological features

VHs are benign tumors composed of blood vessels of varying size, adipocytes, smooth muscle, fibrous tissue, hemosiderin, interstitial edema, and remodeled bone [ 5 , 7 , 26 , 27 ]. Macroscopically, they appear as soft, well-demarcated, dark red masses with intralesional sclerotic bony trabeculae and scattered blood-filled cavities, lending them a honeycomb appearance [ 5 , 6 , 7 ].

Microscopically, there are four subtypes of hemangiomas based on vascular composition: capillary, cavernous, arteriovenous (AV), and venous hemangiomas [ 28 ] (Fig.  1 ). Capillary hemangiomas are composed of small, capillary-sized blood vessels, while cavernous hemangiomas present with collections of larger, dilated blood vessels [ 1 ]. AV hemangiomas are composed of interconnected arterial and venous networks, while venous hemangiomas comprise an abnormal collection of veins [ 1 ]. VHs are predominantly of the capillary and cavernous subtypes, with thin-walled blood vessels surrounded by edematous stroma and bony trabeculae that permeate the bone marrow space [ 1 , 7 , 27 ]. In a sample of 64 surgically treated VH cases, Pastushyn et al. reported that 50% were capillary subtype, 28% were cavernous, and 22% were mixed [ 29 ]. Occasionally, secondary reactive phenomena such as fibrous and/or adipose involution of bone marrow and remodeling of bone trabeculae may be seen [ 7 , 26 ]. Symptomatic VHs can be caused by all hemangioma subtypes, and there are no distinguishing features between subtypes on imaging [ 1 ]. However, cavernous and capillary subtypes are associated with favorable postsurgical outcomes [ 29 ].

Fig. 1

Capillary hemangioma ( A and B ): A H&E 200× magnification showing proliferation of small-caliber vessels within a fibrous stroma with surrounding bone; B CD34 immunohistochemical stain, 200× magnification, highlighting small-caliber vascular spaces. Cavernous hemangioma ( C and D ): C H&E 100× magnification showing proliferation of thin-walled, dilated, blood-filled vascular channels; D H&E 200× magnification showing thin-walled, dilated vascular channels within a loose stroma with adjacent mature bone. Venous hemangioma ( E and F ): E H&E 100× magnification showing abnormal proliferation of thick-walled vessels with dilated lumens; F H&E 100× magnification revealing tightly packed, thick-walled vessels with adjacent fragments of mature bone

Radiographic features

The histopathology of VHs gives rise to imaging features used to classify VHs as typical, atypical, or aggressive [ 13 ]. Typical and atypical MRI findings are correlated with the intralesional ratio of fat to vascular components [ 20 ]. Lesions with a high fat content are more likely to demonstrate features of typical VHs while those with a high vascular content (atypical VHs) tend to present without these findings [ 5 , 30 , 31 ]. Aggressive VHs have features including destruction of the cortex, invasion of the epidural and paravertebral spaces, and lesions extending beyond the vertebral body [ 13 , 15 , 20 ].

Laredo et al. demonstrated that VHs with a higher fatty content are generally quiescent lesions, while those with a higher vascular content are more likely to display “active” behavior and potentially evolve into compressive lesions [ 20 ]. Therefore, asymptomatic VHs can display either typical or atypical imaging findings, while symptomatic lesions are more likely to present with atypical or aggressive findings [ 1 ]. Although radiographically typical VHs are relatively easy to diagnose, atypical and aggressive VHs are much more challenging to recognize because they do not present with classic imaging findings and often mimic other pathologies such as multiple myeloma, metastatic bone lesions, and inflammatory conditions [ 5 , 30 , 31 ]. Compressive VHs often have coinciding radiologic and clinical classifications due to the correlation between aggressive behavior and compressive symptoms [ 5 ].

While MRI, CT, and radiographs are the primary imaging modalities used in the workup of VHs, other studies have also been used. Angiography is occasionally performed to identify feeding/draining vessels and to evaluate the blood supply to the spinal cord [ 5 ]. Multiphase technetium 99-methyl diphosphonate (99Tc-MDP) bone scintigraphy may show increased tracer uptake in all phases (perfusion, blood pool, and delayed) due to technetium 99-labeled red blood cell accumulation in the tumors, which occurs in all hemangiomas [ 16 ]. PET-CT has been used to classify VHs as “hot” or “cold” lesions based on the degree of 18F-FDG and 68Ga-DOTATATE uptake [ 17 , 18 , 19 ]. Although angiography is useful primarily for clarifying the vascular network of aggressive VHs, nuclear medicine studies offer a much more limited contribution to diagnosis compared with CT and MRI [ 5 ].

Typical VHs

The collection of thin-walled, blood-filled spaces that comprise VHs causes resorption of horizontal trabeculae and reinforcement of vertical trabeculae, leading to a pattern of thickened vertical trabeculae interspersed with lower-density bone in the nonexpanding vertebral body [ 15 , 31 , 32 ]. This composition is responsible for the “corduroy cloth” appearance of typical VHs on radiographs [ 31 ].

On unenhanced axial CT images, typical VHs are characterized by a “polka dot” appearance, termed polka-dot sign. This is caused by small, punctate areas of high attenuation from hyperdense trabeculae surrounded by hypodense stroma [ 20 , 33 ] (Fig.  2 ). Like radiographs, sagittal and coronal CT images display the “corduroy” sign caused by thickened trabeculae in a field of hypodense bone (Fig.  2 ). There is no extraosseous extension of the hemangioma in typical VHs [ 5 ].

Fig. 2

Sagittal ( A ) and axial ( B ) CT scans of a typical VH in an asymptomatic 50-year-old male demonstrating the “corduroy” and “polka-dot” signs, respectively. Sagittal ( C ) and axial ( D ) T1-weighted MRIs of typical VHs are predominantly hyperintense with areas of hypointensity due to thickening of vertical trabeculae. Sagittal ( E ) and axial ( F ) T2-weighted MRIs of typical VHs also show hyperintense lesions with areas of hypointensity that may demonstrate the “corduroy” and “polka-dot” signs as seen on CT images of typical VHs

Typical VHs tend to appear as hyperintense lesions on T1- and T2-weighted MRI sequences due to predominantly fatty overgrowth with penetrating blood vessels [ 31 ] (Fig.  2 ). Punctate areas of slight hypointensity within the lesion on axial T1-weighted MRI, due to thickened vertical trabeculae, resemble the “polka-dot” sign [ 5 ] (Fig.  2 ). These trabeculae appear as linear striations on sagittal/coronal T1- and T2-weighted MRI [ 5 ] (Fig.  2 ). Fluid-sensitive sequences (i.e., short-tau inversion recovery or fat-saturated T2-weighted MRI) appear slightly hyperintense due to the vascular components of the lesion, and T1-weighted MRI with contrast demonstrates heterogeneous enhancement of the lesion [ 3 ] (Fig.  3 ).

Fig. 3

Contrast-enhanced T1 MRIs of a T8 VH in an asymptomatic fourteen-year-old female ( A ) and L3 and L5 VHs in a thirty-one-year-old female with back pain ( B ), illustrating the heterogeneous presentation of hemangiomas on post-contrast MRI

Atypical VHs

In contrast to typical VHs, atypical VHs tend to have a higher vascular component-to-fat ratio and may not demonstrate the classical imaging findings such as the “corduroy” and “polka-dot” signs [ 5 ]. This composition gives the lesion an iso- to hypointense appearance on T1-weighted MRI as well as a very high intensity appearance on T2-weighted and fluid-sensitive MRI [ 20 , 31 ] (Fig.  4 ). Atypical VHs often mimic primary bony malignancies or metastases and are more likely to demonstrate aggressive features, often making them difficult to diagnose [ 12 , 13 , 14 , 15 ].

Fig. 4

Asymptomatic fifty-six-year-old male with a T9 atypical vertebral hemangioma that appears iso- to hypointense on axial T1 MRI ( A ) and hyperintense on axial T2 MRI ( B ). Atypical vertebral hemangiomas of the L3 and L5 vertebral bodies in a thirty-one-year-old female who presented with back pain. Sagittal T1 ( C ) and T2 ( D ) images demonstrate hypo- and hyperintense lesions, respectively

Aggressive VHs

Aggressive VHs routinely have atypical features on any imaging modality [ 1 , 5 ]. They may appear radiographically normal or show nonspecific findings such as osteoporosis, pedicle erosion, cortex expansion, vertebral collapse, or irregular vertical trabeculae associated with lytic areas of varying size [ 13 , 15 ] (Fig.  5 ).

Fig. 5

Fifty-five-year-old female with an aggressive vertebral hemangioma of the L4 vertebral body with extension into the spinal canal. A Sagittal T1 MRI shows hypointensity of the entire vertebral body, although vertebral height is maintained. B Sagittal T2 MRI redemonstrates the lesion, which appears hyperintense due to the vascularity of the hemangioma. Axial T1 ( C ) and T2 ( D ) MRI show bilateral involvement of the pedicles and extension of the lesion into the anterior epidural space

CT findings are often nonspecific, including features such as extraosseous soft tissue expansion, cortical ballooning, or cortical lysis [ 34 , 35 ]. As with atypical VHs, the “corduroy” and “polka-dot” signs may not be readily visualized in aggressive or destructive lesions due to the higher vascular-to-fat ratio common in these hemangiomas [ 5 ]. However, it is important to remain mindful of these signs because, when present, they can point to the correct diagnosis. Other CT features that may assist in the diagnosis of inconspicuous VHs include extension of the lesion into the neural arch, involvement of the entire vertebral body, or an irregular honeycomb pattern caused by serpentine vascular channels and fatty proliferation within the network of reorganizing bony trabeculae [ 20 ]. Vertebral fractures are rare due to the reinforcement of vertical trabeculae [ 1 ].

The composition of aggressive VHs, with a hypervascular stroma and less fat, results in a hypointense lesion on T1-weighted MRI [ 20 , 31 ] (Fig.  5 ). Again, this may conceal the “corduroy” and “polka-dot” signs, which remain among the most useful imaging findings in the diagnosis of VHs, particularly in cases where other findings are nonspecific [ 5 ]. These nonspecific findings may include hyperintensity on T2-weighted MRI due to the vascular components of the lesion (Fig.  5 ), which is also seen in most neoplastic and inflammatory lesions [ 31 ]. Areas of hyperintensity on fluid-sensitive MRI and the presence of lipid-dense content within the lesion may be seen as well [ 31 , 36 ]. Other features suggestive of an aggressive VH include maintained vertebral body height, a sharp margin with normal marrow, an intact cortex adjacent to a paraspinal mass, or enlarged paraspinal vessels; however, these findings are also nonspecific and relatively uncommon [ 5 , 13 ]. Although highly unusual, there have been cases of aggressive VHs with extensive intraosseous fatty stroma and simultaneous extraosseous extension of the lesion, permitting a straightforward diagnosis [ 36 ].
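For orientation, the predominant T1/T2 signal patterns described in this section can be summarized in a small lookup table. The sketch below is our own simplification, not a validated classifier; the subtype labels and pattern strings are assumptions for illustration, and real lesions vary, with atypical and aggressive patterns overlapping.

```python
# Rough lookup of the predominant T1/T2 MRI signal patterns described
# in this section. A simplification for orientation only; real lesions
# vary, and atypical and aggressive patterns overlap.
T1T2_PATTERNS = {
    "typical": ("hyperintense", "hyperintense"),          # fat-rich lesion
    "atypical": ("iso- to hypointense", "hyperintense"),  # vascular-rich lesion
    "aggressive": ("hypointense", "hyperintense"),        # hypervascular, little fat
}

def candidate_subtypes(t1_signal: str, t2_signal: str) -> list:
    """Return the VH subtypes whose described signal pattern matches."""
    return [subtype for subtype, (t1, t2) in T1T2_PATTERNS.items()
            if t1 == t1_signal and t2 == t2_signal]
```

As the table makes explicit, T2 hyperintensity is shared by all three subtypes; only the T1 signal, driven by fat content, separates them.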

Even though some aggressive VHs can be diagnosed on CT and MRI, challenging cases may warrant more advanced imaging techniques for accurate diagnosis. Higher fluid content relative to cellular soft tissue gives hemangiomas a bright appearance on diffusion-weighted imaging (DWI) with elevated apparent diffusion coefficient (ADC) values, distinguishing them from metastases [ 37 ]. Volume transfer constant (Ktrans) and plasma volume, which reflect capillary permeability and vessel density, respectively, are quantitative measures derived from dynamic contrast-enhanced MRI (DCE MRI) perfusion imaging that can also be used to differentiate VHs from metastases [ 38 ]. Ktrans and plasma volume are both low in VHs and elevated in metastatic lesions [ 38 ]. Furthermore, aggressive VHs may show a signal drop when comparing non-contrast T1-weighted MRI with and without fat suppression, as well as microscopic lipid content on chemical shift imaging [ 39 ]. Finally, characteristic findings of aggressive VHs on angiography include vertebral body arteriole dilation, multiple capillary-phase blood pools, and complete vertebral body opacification [ 15 ].

Laredo et al. [ 15 ] proposed a six-point scoring system to assist in the diagnosis of aggressive VHs based on the more common features observed on radiographs and CT. One point is given for each of the following findings: a soft tissue mass, thoracic location between T3 and T9, involvement of the entire vertebral body, an irregular honeycomb appearance, cortical expansion, and extension into the neural arch [ 15 ]. The authors suggest that an aggressive VH should be suspected when a patient presents with nerve root pain in association with three or more of these features [ 15 ]. However, additional studies are needed to determine the utility of this scoring system, as its predictive power has not been established [ 5 ].
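The six-point rule lends itself to a direct implementation. The sketch below is illustrative only; the feature names are our own shorthand for the radiograph/CT findings listed above, and, as noted, the rule has no established predictive power.

```python
# Illustrative sketch of the six-point scoring system of Laredo et al.
# for suspected aggressive VHs. Feature names are our own shorthand for
# the radiograph/CT findings listed in the text.
LAREDO_FEATURES = [
    "soft_tissue_mass",
    "thoracic_location_T3_T9",
    "entire_vertebral_body_involved",
    "irregular_honeycomb_appearance",
    "cortical_expansion",
    "neural_arch_extension",
]

def laredo_score(findings):
    """One point for each of the six features present on radiographs/CT."""
    return sum(1 for feature in LAREDO_FEATURES if feature in findings)

def suspect_aggressive_vh(findings, nerve_root_pain):
    """Laredo et al. suggest suspecting an aggressive VH when nerve root
    pain coincides with three or more of the scored features."""
    return nerve_root_pain and laredo_score(findings) >= 3
```

For example, a patient with nerve root pain whose CT shows a soft tissue mass, cortical expansion, and neural arch extension would score 3 and meet the suspicion threshold.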

Some VHs are difficult to diagnose because they can have nonspecific findings on radiographs, CT, and MRI, making characteristic findings such as the “corduroy” and “polka-dot” signs, when present, important diagnostic features. VHs may also coexist with other vertebral lesions, further complicating the diagnosis. In these cases, angiography can differentiate a VH from a nonvascular lesion [ 40 ]. Ultimately, a biopsy may be required for accurate diagnosis, especially when there is potential for a malignant lesion such as angiosarcoma or epithelioid hemangioendothelioma.

Clinical features

VHs are most often noted incidentally on spinal imaging, typically in patients in their fifth to sixth decade of life. Studies have shown that vertebral hemangiomas exhibit a slight female preponderance, with a male-to-female ratio of 1:1.5 [ 6 ]. Clinically, most VHs are asymptomatic, quiescent lesions that rarely demonstrate active behavior and become symptomatic [ 41 ]. VHs occur most frequently in the thoracic spine [ 42 ], followed by the lumbar and cervical spine; sacral involvement is very rare [ 43 ].

When symptomatic, VHs can present with localized back pain or with neurologic symptoms attributable to spinal cord compression, nerve root compression, or both, leading to myelopathy and/or radiculopathy [ 1 ]. At least four mechanisms of spinal cord and nerve root compression have been suggested: (1) hypertrophy or ballooning of the posterior cortex of the vertebral body caused by the angioma, (2) extension of the angioma through the cortex into the epidural space, (3) compression fracture of the involved vertebra, and (4) epidural hematoma [ 44 ]. Aggressive VHs that cause symptomatic spinal cord compression tend to occur in the thoracic spine [ 42 ].

Boriani et al. classified VHs into four groups based on the presence of symptoms and radiographic findings [ 45 ]. These are: Type I, latent: mild bony destruction with no symptoms; Type II, active: bony destruction with pain; Type III, aggressive: asymptomatic lesions with epidural and/or soft tissue extension; and Type IV, aggressive: neurologic deficit with epidural and/or soft tissue extension.
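The classification can be written as a simple decision function. The sketch below encodes only the four published descriptions; the boolean flag names are ours, and lesions falling between the descriptions (e.g., pain plus extension without a deficit) are collapsed into the nearest type.

```python
# Illustrative encoding of the Boriani classification described above.
# Flag names are our own simplification, not part of the published scheme.
def boriani_type(pain, neuro_deficit, extension):
    """Return the Boriani type (I-IV) for a VH.

    extension: epidural and/or soft tissue extension beyond the vertebra.
    """
    if extension:
        # Types III and IV are the aggressive lesions with extension;
        # a neurologic deficit distinguishes Type IV from Type III.
        return "IV" if neuro_deficit else "III"
    # Without extension, pain distinguishes active (II) from latent (I).
    return "II" if pain else "I"
```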

Management options

Most VHs are asymptomatic and do not require treatment [ 1 , 21 ]. Treatment is indicated in cases with back pain or neurological symptoms, including myelopathy and/or radiculopathy, often caused by neural compression or vertebral fracture [ 1 ]. Previously, surgery was the primary treatment option offered to these patients, but it carried an increased risk of complications, particularly intraoperative bleeding [ 1 ]. Newer modalities such as vertebroplasty have since gained traction as adjuncts or alternatives to surgery [ 1 ]. Today, several management options are available for the treatment of symptomatic VHs, including conservative medical therapy, surgery, percutaneous techniques, radiotherapy, or a combination of these modalities [ 1 , 46 ].

There is no consensus on the best treatment strategy; however, Teferi et al. recently proposed a treatment algorithm for VHs based on their institutional experience and a literature review (Fig.  6 ) [ 1 ]. They recommend conservative management for typical, asymptomatic VHs; CT-guided biopsy and metastatic workup with PET-CT for radiographically atypical VHs; surgical intervention with or without adjuvant therapy in cases with epidural spinal cord compression or vertebral compression fracture; and radiotherapy for recurrent, asymptomatic VHs following surgery.

Fig. 6

Algorithm for diagnosis and management of VHs proposed by Teferi et al. [ 1 ]
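The prose summary of this algorithm can be sketched as a branch-by-branch decision function. This is our simplification of the summary above, not the published figure itself; the flag names and the fallback branch are assumptions.

```python
# Hedged sketch of the management branches summarized in the text for
# the algorithm of Teferi et al. Flag names are our own; cases not
# covered by the prose summary fall through to a generic recommendation.
def suggest_management(radiographically_typical,
                       symptomatic,
                       escc_or_compression_fracture,
                       recurrent_after_surgery):
    if recurrent_after_surgery and not symptomatic:
        return "radiotherapy"
    if escc_or_compression_fracture:
        return "surgery +/- adjuvant therapy"
    if not radiographically_typical:
        return "CT-guided biopsy + PET-CT metastatic workup"
    if not symptomatic:
        return "conservative management"
    return "individualized treatment (not covered by this summary)"
```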

Surgical treatment of VHs is recommended in cases of rapid or progressive neurologic symptoms, including compressive myelopathy or radiculopathy [ 47 ]. Bailey et al. documented the first case of surgical management of a VH after successfully resolving a patient’s paraplegia secondary to an aggressive VH [ 48 ]. Prior to the 1960s, the average neurological recovery rate was 73% (range, 43–85%) with a mortality rate of 11.7% [ 49 ]. This is consistent with a series published by Ghormley et al. in 1941, in which five symptomatic VH patients were treated with decompressive laminectomy and postoperative radiotherapy. Although three patients achieved partial or complete resolution of neurologic deficits, the remaining two patients died secondary to significant blood loss [ 50 ]. Very few cases of symptomatic VHs were documented prior to the 1960s, with one literature review reporting only 64 instances of VHs with neurologic dysfunction [ 49 ]. More recent studies demonstrate improved surgical outcomes, with neurological recovery reaching 100% and mortality as low as 0% [ 42 ].

The goal of surgery is to decompress the neural elements and stabilize the spine [ 1 ]. Potential options include corpectomy, involving resection of the portion of the vertebral body containing the hemangioma followed by anterior column reconstruction, and/or laminectomy, which offers indirect decompression [ 1 ]. The selected approach depends on the size of the hemangioma and the extent of vertebral body and/or neural arch involvement, given potential weaknesses in the anterior column and the location of the epidural intrusion into the spinal canal [ 1 ]. For example, corpectomy and reconstruction could be performed in cases with ventral spinal cord compression, while cases with dorsal compression could be treated with laminectomy [ 1 ].

Corpectomy carries an increased risk of substantial intraoperative blood loss, up to 5 L in some cases, due to the hypervascular nature of VHs [ 1 , 51 ]. Acosta et al. reported an average blood loss of 2.1 L in their series of 10 aggressive VHs treated with corpectomy [ 51 ]. Conversely, laminectomy has a lower surgical burden and a reduced risk of significant intraoperative blood loss [ 1 ]. Blood loss during laminectomy can be reduced by nearly 50% when vertebroplasty is performed first [ 8 ]. Preoperative embolization of VHs should also be considered to minimize intraoperative blood loss and reduce mortality [ 1 , 22 ].

Goldstein et al. demonstrated that en bloc resection may not be necessary, as intralesional resection produced equivalent long-term survival and prevention of recurrence in their series of 65 patients [ 47 ]. However, there have not been any large-scale studies comparing outcomes and recurrence rates of indirect decompression versus corpectomy [ 1 ].

The treatment algorithm proposed by Teferi et al. suggests dividing symptomatic VH patients with radiculopathy or neurological deficit into cohorts of epidural spinal cord compression (ESCC) versus vertebral body compression fracture to determine appropriate surgical intervention (Fig.  6 ) [ 1 ]. Patients with ESCC are encouraged to undergo preoperative embolization followed by laminectomy with or without fusion depending on spinal stability, or preoperative embolization followed by corpectomy and fusion if ESCC is accompanied by extensive anterior column compromise [ 1 ]. Conversely, the recommended treatment for symptomatic VHs secondary to vertebral body compression fracture is posterior laminectomy with decompression and fusion [ 1 ].

Whether through corpectomy or laminectomy, surgical management of VHs has a low recurrence rate [ 1 ]. In their 2020 meta-analysis, Piper et al. reported complete remission in 84% of surgically treated VHs [ 52 ]. They also reported a 3.5% rate of severe complications, including pathological fracture, significant intraoperative blood loss, wound infection, and cerebrospinal fluid leak [ 1 , 52 ].

Percutaneous techniques

Percutaneous techniques include vertebroplasty, sclerotherapy, and embolization, which have been rising in popularity as treatments for VHs either in isolation or in combination with surgery [ 1 ].

Vertebroplasty is a minimally invasive procedure that improves the structural integrity of a vertebra by injecting an acrylic compound, such as polymethyl methacrylate (PMMA), into the lesion [ 1 ]. It was first utilized in the treatment of VHs by Galibert et al. in 1987 [ 53 ]. PMMA causes thrombosis and irreversible sclerosis of the hemangiomatous venous pool, shrinking the lesion and consolidating trabecular microfractures [ 1 ]. It allows rapid recovery of mobility, enhances anterior column support, and provides vertebral stabilization, but it does not induce new bone formation owing to poor biological activity and absorbability [ 54 , 55 ]. Vertebroplasty is particularly effective in alleviating back pain in VH patients with intravertebral fractures by providing an immediate analgesic effect, and it has previously been recommended as stand-alone first-line therapy for VHs with moderate to severe back pain without neurologic compromise [ 1 , 54 ]. It can also be used as a preoperative adjunct to surgery to reduce intraoperative blood loss [ 8 ]. The most common complication of vertebroplasty is extravasation of the injected compound outside the vertebral body, with rates of 20–35% [ 55 , 56 ]. However, some researchers suggest that small amounts of extravasation should be considered a stopping point rather than a complication, as the vast majority of cases are asymptomatic [ 55 , 56 ]. In a series of 673 vertebroplasty cases, Layton et al. reported extravasation in 25% of patients, with only 1% developing clinical symptoms of new-onset radiculopathy (5 patients) or symptomatic pulmonary embolism (1 patient) [ 56 ]. Their second most common complication, rib fracture related to lying prone on the fluoroscopy table during the procedure, occurred in 1% of cases (7 patients) [ 56 ].

Alternatively, sclerotherapy involves direct intralesional injection of ethanol under percutaneous CT guidance, which causes thrombosis and destruction of the endothelium, resulting in devascularization, shrinkage of the lesion, and, consequently, decompression of the neural elements [ 46 ]. It was first described as a treatment for VHs in 1994 by Heiss et al. and remains a less common treatment for VHs [ 57 ]. CT angiography is a prerequisite to target the most hypervascular subsection of the lesion and to confirm candidacy for the procedure by excluding leakage of contrast media, which occurred in 25% of patients in a series of 18 cases [ 58 ]. There are reports of intraoperative sclerotherapy as an adjunct to surgery, but the sample sizes are similarly limited [ 59 , 60 ]. Complications of direct ethanol injection include neurologic deterioration (including Brown-Séquard syndrome), pathologic fractures, and VH recurrence [ 46 , 61 ].

The last option for percutaneous intervention is trans-arterial embolization of feeding vessels using particulate agents [ 1 ]. It has been used as a preoperative adjunct therapy with surgery to reduce blood loss as well as a primary treatment for VHs alone or in conjunction with vertebroplasty [ 41 , 62 , 63 , 64 ]. In a series of 26 patients, Premat et al. demonstrated embolization combined with vertebroplasty was safe and effective in treating pain associated with aggressive VHs but was less effective in resolving motor deficits [ 65 ]. The primary role for embolization in the treatment of compressive VHs is preoperative adjunct therapy to reduce the risk of procedural bleeding [ 62 ].

Radiotherapy

Radiotherapy (XRT) is a noninvasive approach that can obliterate hemangiomas and relieve pain through vascular necrosis and/or anti-inflammatory effects [ 1 ]. It is a suitable option for VH patients with back pain and no neurologic deficits, or as postoperative adjunct therapy after suboptimal surgical decompression. Patients with neural element compromise often require prompt decompression to prevent irreversible injury, which is more appropriately managed with surgery than with the delayed response offered by XRT [ 1 , 21 , 66 ]. Neurological deficits may, in fact, be aggravated by XRT, as demonstrated in 20% of patients with aggressive VHs in a series of 29 cases by Jiang et al. [ 8 ]. Multiple studies have reported a 60–80% success rate in eliminating symptoms of VHs using XRT, rising to over 90% when partial symptom relief is included [ 8 , 67 , 68 ]. This includes neurological deficits in some cases, although the response of these symptoms to XRT remains variable [ 52 ]. A radiation dose of at least 34 Gy was recommended by Heyd et al. after their multicenter study identified significantly greater symptom relief and recurrence control compared with lower doses [ 67 ].

XRT is gaining popularity as a postoperative adjunct therapy intended to reduce local recurrence, especially after subtotal resection [ 8 , 52 , 67 ]. Partial resections without adjunct XRT carry a 50% recurrence rate [ 8 , 11 ]. The extent to which XRT can reduce recurrence has not been fully elucidated and remains a subject for future study [ 52 ]. However, these potential benefits must be weighed against the known adverse effects, including nausea, fatigue, anorexia, ileus, radionecrosis, and, specifically in spinal XRT, radiation myelitis [ 1 , 8 , 52 ].

VHs are often asymptomatic, incidental findings on routine spinal imaging that do not require treatment or follow-up imaging unless they become symptomatic. Most can be diagnosed by characteristic CT and MRI findings, while atypical lesions may be difficult to differentiate from alternative diagnoses. Some authors suggest using emerging imaging techniques such as DWI or DCE MRI to differentiate atypical lesions from malignancies, a promising approach that requires further research. Other authors suggest that observation with regular follow-up may be the best course of management for asymptomatic, atypical lesions, while still others recommend biopsy for definitive diagnosis. Regardless, there is consensus that symptomatic lesions should be treated. Most authors recommend surgical decompression for patients with neurological deficits, but there is ongoing debate as to the optimal treatment for back pain alone. The several available treatment options should be considered case by case given the properties of each lesion. Management algorithms have been suggested, but additional research is required to identify the optimal treatment for the many different classifications of VHs.

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Abbreviations

VH: Vertebral hemangioma
CT: Computed tomography
MRI: Magnetic resonance imaging
AV: Arteriovenous
PET-CT: Positron emission-computed tomography
99Tc-MDP: Technetium 99-methyl diphosphonate
DWI: Diffusion weighted imaging
ADC: Apparent diffusion coefficient
Ktrans: Volume transfer constant
DCE MRI: Dynamic contrast enhanced magnetic resonance imaging
ESCC: Epidural spinal cord compression
PMMA: Polymethyl methacrylate

Teferi N, Chowdhury AJ, Mehdi Z, Challa M, Eschbacher K, Bathla G, Hitchon P. Surgical management of symptomatic vertebral hemangiomas: a single institution experience and literature review. Spine J. 2023;23(9):1243–54.

Article   PubMed   Google Scholar  

Rodallec MH, Feydy A, Larousserie F, Anract P, Campagna R, Babinet A, Zins M, Drapé JL. Diagnostic imaging of solitary tumors of the spine: what to do and say. Radiographics. 2008;28(4):1019–41.

Baudrez V, Galant C, Vande Berg BC. Benign vertebral hemangioma: MR-histological correlation. Skelet Radiol. 2001;30:442–6.

Article   CAS   Google Scholar  

Huvos AG. Hemangioma, lymphangioma, angiomatosis/lymphangiomatosis, glomus tumor. Bone tumors: diagnosis, treatment, and prognosis. 2nd ed. Philadelphia: Saunders. 1991;553–78.

Gaudino S, Martucci M, Colantonio R, Lozupone E, Visconti E, Leone A, Colosimo C. A systematic approach to vertebral hemangioma. Skelet Radiol. 2015;44:25–36.

Article   Google Scholar  

Campanacci M. Hemangioma. In: Campanacci M, editors. Bone and soft tissue tumors: clinical features, imaging, pathology and treatment. Padova: Piccin Nuova Libraria & Wien: Springer; 1999. p. 599–618.

Hameed M, Wold LE. Hemangioma. In: Fletcher CDM, Bridge JA, Hogendoorn P, Mertens F, editors. WHO classification of tumors of soft tissue and bone. 4th ed. Lyon: IARC Press; 2013. p. 332.

Google Scholar  

Jiang L, Liu XG, Yuan HS, Yang SM, Li J, Wei F, Liu C, Dang L, Liu ZJ. Diagnosis and treatment of vertebral hemangiomas with neurologic deficit: a report of 29 cases and literature review. Spine J. 2014;14(6):944–54.

Unni KK, Inwards CY. Dahlin's bone tumors: general aspects and data on 10,165 cases. Lippincott Williams & Wilkins; 2010.

Corniola MV, Schonauer C, Bernava G, Machi P, Yilmaz H, Lemée JM, Tessitore E. Thoracic aggressive vertebral hemangiomas: multidisciplinary management in a hybrid room. Eur Spine J. 2020;29:3179–86.

Fox MW, Onofrio BM. The natural history and management of symptomatic and asymptomatic vertebral hemangiomas. J Neurosurg. 1993;78(1):36–45.

Article   CAS   PubMed   Google Scholar  

Murphey MD, Fairbairn KJ, Parman LM, Baxter KG, Parsa MB, Smith WS. From the archives of the AFIP. Musculoskeletal angiomatous lesions: radiologic-pathologic correlation. Radiographics. 1995;15(4):893–917.

Cross JJ, Antoun NM, Laing RJ, Xuereb J. Imaging of compressive vertebral haemangiomas. Eur Radiol. 2000;10:997–1002.

Alexander J, Meir A, Vrodos N, Yau YH. Vertebral hemangioma: an important differential in the evaluation of locally aggressive spinal lesions. Spine. 2010;35(18):E917–20.

Laredo JD, Reizine D, Bard M, Merland JJ. Vertebral hemangiomas: radiologic evaluation. Radiology. 1986;161(1):183–9.

Elgazzar AH. Musculoskeletal system. In: Synopsis of pathophysiology in nuclear medicine. New York: Springer; 2014. p. 90–2.

Choi YY, Kim JY, Yang SO. PET/CT in benign and malignant musculoskeletal tumors and tumor-like conditions. In: Editors. Seminars in musculoskeletal radiology. Thieme Medical Publishers; 2014. pp. 133–48

Brogsitter C, Hofmockel T, Kotzerke J. 68Ga DOTATATE uptake in vertebral hemangioma. Clin Nucl Med. 2014;39(5):462–3.

Basu S, Nair N. “Cold” vertebrae on F-18 FDG PET: causes and characteristics. Clin Nucl Med. 2006;31(8):445–50.

Laredo JD, Assouline E, Gelbert F, Wybier M, Merland JJ, Tubiana JM. Vertebral hemangiomas: fat content as a sign of aggressiveness. Radiology. 1990;177(2):467–72.

Dang L, Liu C, Yang SM, Jiang L, Liu ZJ, Liu XG, Yuan HS, Wei F, Yu M. Aggressive vertebral hemangioma of the thoracic spine without typical radiological appearance. Eur Spine J. 2012;21:1994–9.

Article   PubMed   PubMed Central   Google Scholar  

Kato S, Kawahara N, Murakami H, Demura S, Yoshioka K, Okayama T, Fujita T, Tomita K. Surgical management of aggressive vertebral hemangiomas causing spinal cord compression: long-term clinical follow-up of five cases. J Orthop Sci. 2010;15:350–6.

Klekamp J, Samii M. Epidermal tumors. In: Surgery of spinal tumors. Springer; 2007. p. 321–522.

Blecher R, Smorgick Y, Anekstein Y, Peer A, Mirovsky Y. Management of symptomatic vertebral hemangioma: follow-up of 6 patients. Clin Spine Surg. 2011;24(3):196–201.

Subramaniam MH, Moirangthem V, Venkatesan M. Management of aggressive vertebral haemangioma and assessment of differentiating pointers between aggressive vertebral haemangioma and metastases—a systematic review. Global Spine J. 2023;13(4):1120–33.

Hart JL, Edgar MA, Gardner JM. Vascular tumors of bone. In: Seminars in diagnostic pathology. WB Saunders; 2014. p. 30–8.

Dorfman HD, Czerniak B. Vascular lesions. In: Dorfman HD, Czerniak B, editors. Bone tumors. St. Louis: Mosby; 1998. p. 729–814.

Rudnick J, Stern M. Symptomatic thoracic vertebral hemangioma: a case report and literature review. Arch Phys Med Rehabil. 2004;85(9):1544–7.

Pastushyn AI, Slin’ko EI, Mirzoyeva GM. Vertebral hemangiomas: diagnosis, management, natural history and clinicopathological correlates in 86 patients. Surg Neurol. 1998;50(6):535–47.

Blankstein A, Spiegelmann R, Shacked I, Schinder E, Chechick A. Hemangioma of the thoracic spine involving multiple adjacent levels: case report. Spinal Cord. 1988;26(3):186–91.

Hanrahan CJ, Christensen CR, Crim JR. Current concepts in the evaluation of multiple myeloma with MR imaging and FDG PET/CT. Radiographics. 2010;30(1):127–42.

Ross JS, Masaryk TJ, Modic MT, Carter JR, Mapstone T, Dengel FH. Vertebral hemangiomas: MR imaging. Radiology. 1987;165(1):165–9.

Persaud T. The polka-dot sign. Radiology. 2008;246(3):980–1.

Nguyen JP, Djindjian M, Gaston A, Gherardi R, Benhaiem N, Caron JP, Poirier J. Vertebral hemangiomas presenting with neurologic symptoms. Surg Neurol. 1987;27(4):391–7.

Gaston A, Nguyen JP, Djindjian M, Le Bras F, Gherardi R, Benhaiem N, Marsault C. Vertebral haemangioma: CT and arteriographic features in three cases. J Neuroradiol. 1985;12(1):21–33.


Friedman DP. Symptomatic vertebral hemangiomas: MR findings. AJR Am J Roentgenol. 1996;167(2):359–64.

Winfield JM, Poillucci G, Blackledge MD, Collins DJ, Shah V, Tunariu N, Kaiser MF, Messiou C. Apparent diffusion coefficient of vertebral haemangiomas allows differentiation from malignant focal deposits in whole-body diffusion-weighted MRI. Eur Radiol. 2018;28:1687–91.

Morales KA, Arevalo-Perez J, Peck KK, Holodny AI, Lis E, Karimi S. Differentiating atypical hemangiomas and metastatic vertebral lesions: the role of T1-weighted dynamic contrast-enhanced MRI. Am J Neuroradiol. 2018;39(5):968–73.


Shi YJ, Li XT, Zhang XY, Liu YL, Tang L, Sun YS. Differential diagnosis of hemangiomas from spinal osteolytic metastases using 3.0 T MRI: comparison of T1-weighted imaging, chemical-shift imaging, diffusion-weighted and contrast-enhanced imaging. Oncotarget. 2017;8(41):71095–104.

McEvoy SH, Farrell M, Brett F, Looby S. Haemangioma, an uncommon cause of an extradural or intradural extramedullary mass: case series with radiological pathological correlation. Insights Imaging. 2016;7(1):87–98.

Teferi N, Abukhiran I, Noeller J, Helland LC, Bathla G, Ryan EC, Nourski KV, Hitchon PW. Vertebral hemangiomas: diagnosis and management. A single center experience. Clin Neurol Neurosurg. 2020;190:105745.

Acosta FL Jr, Sanai N, Chi JH, Dowd CF, Chin C, Tihan T, Chou D, Weinstein PR, Ames CP. Comprehensive management of symptomatic and aggressive vertebral hemangiomas. Neurosurg Clin N Am. 2008;19(1):17–29.

Wang B, Zhang L, Yang S, Han S, Jiang L, Wei F, Yuan H, Liu X, Liu Z. Atypical radiographic features of aggressive vertebral hemangiomas. JBJS. 2019;101(11):979–86.

Jayakumar PN, Vasudev MK, Srikanth SG. Symptomatic vertebral haemangioma: endovascular treatment of 12 patients. Spinal Cord. 1997;35(9):624–8.

Boriani S, Weinstein JN, Biagini R. Primary bone tumors of the spine: terminology and surgical staging. Spine. 1997;22(9):1036–44.

Doppman JL, Oldfield EH, Heiss JD. Symptomatic vertebral hemangiomas: treatment by means of direct intralesional injection of ethanol. Radiology. 2000;214(2):341–8.

Goldstein CL, Varga PP, Gokaslan ZL, Boriani S, Luzzati A, Rhines L, Fisher CG, Chou D, Williams RP, Dekutoski MB, Quraishi NA. Spinal hemangiomas: results of surgical management for local recurrence and mortality in a multicenter study. Spine. 2015;40(9):656–64.

Bailey P, Bucy PC. Cavernous hemangioma of the vertebrae. J Am Med Assoc. 1929;92(21):1748–51.

Krueger EG, Sobel GL, Weinstein C. Vertebral hemangioma with compression of spinal cord. J Neurosurg. 1961;18(3):331–8.

Ghormley RK, Adson AW. Hemangioma of vertebrae. JBJS. 1941;23(4):887–95.

Acosta FL Jr, Sanai N, Cloyd J, Deviren V, Chou D, Ames CP. Treatment of Enneking stage 3 aggressive vertebral hemangiomas with intralesional spondylectomy: report of 10 cases and review of the literature. Clin Spine Surg. 2011;24(4):268–75.

Piper K, Zou L, Li D, Underberg D, Towner J, Chowdhry AK, Li YM. Surgical management and adjuvant therapy for patients with neurological deficits from vertebral hemangiomas: a meta-analysis. Spine. 2020;45(2):E99–110.

Galibert P, Deramond H, Rosat P, Le Gars D. Preliminary note on the treatment of vertebral angioma by percutaneous acrylic vertebroplasty. Neurochirurgie. 1987;33(2):166–8.

Guarnieri G, Ambrosanio G, Vassallo P, Pezzullo MG, Galasso R, Lavanga A, Izzo R, Muto M. Vertebroplasty as treatment of aggressive and symptomatic vertebral hemangiomas: up to 4 years of follow-up. Neuroradiology. 2009;51:471–6.

Kim BS, Hum B, Park JC, Choi IS. Retrospective review of procedural parameters and outcomes of percutaneous vertebroplasty in 673 patients. Interv Neuroradiol. 2014;20(5):564–75.

Layton KF, Thielen KR, Koch CA, Luetmer PH, Lane JI, Wald JT, Kallmes DF. Vertebroplasty, first 1000 levels of a single center: evaluation of the outcomes and complications. Am J Neuroradiol. 2007;28(4):683–9.


Heiss JD, Doppman JL, Oldfield EH. Relief of spinal cord compression from vertebral hemangioma by intralesional injection of absolute ethanol. N Engl J Med. 1994;331(8):508–11.

Bas T, Aparisi F, Bas JL. Efficacy and safety of ethanol injections in 18 cases of vertebral hemangioma: a mean follow-up of 2 years. Spine. 2001;26(14):1577–81.

Murugan L, Samson RS, Chandy MJ. Management of symptomatic vertebral hemangiomas: review of 13 patients. Neurol India. 2002;50(3):300.

Singh P, Mishra NK, Dash HH, Thyalling RK, Sharma BS, Sarkar C, Chandra PS. Treatment of vertebral hemangiomas with absolute alcohol (ethanol) embolization, cord decompression, and single level instrumentation: a pilot study. Neurosurgery. 2011;68(1):78–84.

Niemeyer T, McClellan J, Webb J, Jaspan T, Ramli N. Brown-Sequard syndrome after management of vertebral hemangioma with intralesional alcohol: a case report. Spine. 1999;24(17):1845.

Singh PK, Chandra PS, Vaghani G, Savarkar DP, Garg K, Kumar R, Kale SS, Sharma BS. Management of pediatric single-level vertebral hemangiomas presenting with myelopathy by three-pronged approach (ethanol embolization, laminectomy, and instrumentation): a single-institute experience. Childs Nerv Syst. 2016;32:307–14.

Yao KC, Malek AM. Transpedicular N-butyl cyanoacrylate-mediated percutaneous embolization of symptomatic vertebral hemangiomas. J Neurosurg Spine. 2013;18(5):450–5.

Kawahara N, Tomita K, Murakami H, Demura S, Yoshioka K, Kato S. Total en bloc spondylectomy of the lower lumbar spine: a surgical techniques of combined posterior-anterior approach. Spine. 2011;36(1):74–82.

Premat K, Clarençon F, Cormier É, Mahtout J, Bonaccorsi R, Degos V, Chiras J. Long-term outcome of percutaneous alcohol embolization combined with percutaneous vertebroplasty in aggressive vertebral hemangiomas with epidural extension. Eur Radiol. 2017;27:2860–7.

Yang ZY, Zhang LJ, Chen ZX, Hu HY. Hemangioma of the vertebral column: a report on twenty-three patients with special reference to functional recovery after radiation therapy. Acta Radiol Oncol. 1985;24(2):129–32.

Heyd R, Seegenschmiedt MH, Rades D, Winkler C, Eich HT, Bruns F, Gosheger G, Willich N, Micke O, German Cooperative Group on Radiotherapy for Benign Diseases. Radiotherapy for symptomatic vertebral hemangiomas: results of a multicenter study and literature review. Int J Radiat Oncol Biol Phys. 2010;77(1):217–25.

Asthana AK, Tandon SC, Pant GC, Srivastava A, Pradhan S. Radiation therapy for symptomatic vertebral haemangioma. Clin Oncol. 1990;2(3):159–62.


Acknowledgements

Not applicable.

Funding

The University of Iowa Hospitals and Clinics provided funding for this research.

Author information

Authors and Affiliations

University of Iowa Carver College of Medicine, Iowa City, IA, USA

Kyle Kato & Meron Challa

Department of Neurosurgery, University of Iowa Carver College of Medicine, Iowa City, IA, USA

Nahom Teferi & Satoshi Yamaguchi

Department of Pathology, University of Iowa Carver College of Medicine, Iowa City, IA, USA

Kathryn Eschbacher


Contributions

KK performed a portion of the literature review and was a major contributor in writing the manuscript/generating figures. NT performed most of the literature review and was a major contributor in reviewing the manuscript. MC contributed to writing the manuscript. KE provided histological images for figures. SY was a major contributor in reviewing the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kyle Kato.

Ethics declarations

Ethics approval and consent to participate, and consent for publication

Consent was obtained to publish Fig. 6 from Teferi et al. [1].

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


Cite this article

Kato, K., Teferi, N., Challa, M. et al. Vertebral hemangiomas: a review on diagnosis and management. J Orthop Surg Res 19, 310 (2024). https://doi.org/10.1186/s13018-024-04799-5


Received: 02 April 2024

Accepted: 18 May 2024

Published: 24 May 2024

DOI: https://doi.org/10.1186/s13018-024-04799-5


Keywords

  • Laminectomy
  • Sclerotherapy
  • Vertebroplasty

Journal of Orthopaedic Surgery and Research

ISSN: 1749-799X

    Vertebral hemangiomas (VHs) are the most common benign tumors of the spinal column and are often encountered incidentally during routine spinal imaging. A retrospective review of the inpatient and outpatient hospital records at our institution was performed for the diagnosis of VHs from January 2005 to September 2023. Search filters included "vertebral hemangioma," "back pain ...