Welcome To Open Case Studies

Connecting you with real-world public health data.

The Open Case Studies project showcases the possibilities of what can be achieved when working with real-world data.

Housed in a freely accessible GitHub repository, the project’s self-contained and experiential guides demonstrate the data analysis process and the use of various data science methods, tools, and software in the context of messy, real-world data.

These case studies will empower current and future data scientists to leverage real-world data to solve leading public health challenges.

Who Are Open Case Studies For?

Your experiential guide to the power of data analysis.

The Open Case Studies project provides insights about gathering and working with data for students, instructors, and those with experience in data science or statistical methods at nonprofit organizations and public sector agencies.

Each case study in the project focuses on an important public health topic and introduces methods to provide users with the skills and knowledge for greater legibility, reproducibility, rigor, and flexibility in their own data analyses.

Case Study Bank Overview

Real data on ten public health challenges in the U.S.

The following in-depth case studies use real data and focus on five areas of public health that are particularly pressing in the United States.

Vaping Behaviors in American Youth

This case study explores the trends of tobacco product usage among American youths surveyed in the National Youth Tobacco Survey (NYTS) from 2015-2019. It demonstrates how to use survey data and code books and provides an introduction to writing functions to wrangle similar but slightly different data repetitively. The case study introduces packages for using survey weighting and survey design to perform an analysis to compare vaping product usage among different groups, and covers how to use a logistic regression to compare groups for a variable that is binary (such as true or false — in this case it was using vaping products or not). This case study also covers how to make visualizations of multiple groups over time with confidence interval error bars.

Opioids in the United States

This case study examines the number of opioid pills (specifically oxycodone and hydrocodone, as they are the top two misused opioids) shipped to pharmacies and practitioners at the county level across the United States from 2006 to 2014, using data from the Drug Enforcement Administration (DEA). This case study demonstrates how to get data from a source called an application programming interface (API). It explores why and how to normalize data, as well as why and how to potentially stratify or redefine groups. It also shows how to compare two independent groups when the data is not normally distributed using a test called the Wilcoxon rank sum test (also called the Mann-Whitney U test) and how to add confidence intervals to plots (using a method called bootstrapping).
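
As a rough illustration of the bootstrapping idea mentioned above, the following is a minimal sketch in Python (the case studies themselves use R). It computes a percentile bootstrap confidence interval for a mean. The function name `bootstrap_ci` and the pills-per-capita values are hypothetical and are not taken from the DEA data.

```python
import random
import statistics

def bootstrap_ci(values, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original sample
        resample = [rng.choice(values) for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    alpha = 1.0 - confidence
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return means[lower_index], means[upper_index]

# Hypothetical pills-per-capita values for a handful of counties
pills_per_capita = [12.1, 35.4, 8.7, 50.2, 22.9, 41.0, 15.3, 28.8]
low, high = bootstrap_ci(pills_per_capita)
print(f"mean = {statistics.fmean(pills_per_capita):.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```

The two returned values can be drawn as the lower and upper ends of an error bar around the sample mean.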

Disparities in Youth Disconnection

This case study focuses on rates of disconnection among youth (people aged 16 to 24 who are neither working nor in school) across different racial, ethnic and gender subgroups to identify subgroups that may be particularly vulnerable. It demonstrates that deeper inspection of subgroups yields some differences that are not otherwise discernible, how to import data from a PDF using screenshots of sections of the PDF, and how to use the Mann-Kendall trend test to test for the presence of a consistent direction in the relationship of disconnection rates with time. This case study also shows how to make a visualization that stylistically matches that of an existing report, how to add images to plots, and how to create effective bar plots for multiple comparisons across several groups.

Mental Health of American Youth

This case study investigates how the rate of self-reported symptoms of major depressive episodes (MDE) has changed over time among American youth (age 12-17) from 2004-2018. It describes the impact of self-reporting bias in surveys, how to get data directly from a website, as well as how to compare changes in the frequency of a variable between two groups using a chi-squared test to determine if two variables are independent (in this case if the sex of the students influenced the frequency of reported MDE symptoms in 2004 and 2018). This case study also demonstrates how to create direct labels on visualizations with many groups across time, as well as how to create an animated gif.

Exploring CO2 Emissions Across Time

This case study investigates how CO2 emissions have changed since the 1700s and how the level of emissions has compared for different countries around the world. It explores how yearly average temperature and the number of natural disasters in the United States have changed over time and provides an introduction to examining whether two sets of data are correlated with one another. This case study also goes into great detail about how to make heatmaps and other plots to visualize multiple groups over time. This includes adding labels directly to lines on plots with multiple lines.

Predicting Annual Air Pollution

This case study uses machine learning methods to predict annual air pollution levels spatially within the United States based on data about population density, urbanization, road density, as well as satellite pollution data and chemical modeling data among other predictors. Machine learning methods are used to predict air pollution levels when traditional monitoring systems are not available in a particular area or when there is not enough spatial granularity with current monitoring systems. The case study also demonstrates how to visualize data using maps.

Exploring Global Patterns of Obesity Across Rural and Urban Regions

This case study compares average Body Mass Index measurements for males and females from rural and urban regions from over 200 countries around the world, with a particular emphasis on the United States. It provides a thorough introduction to wrangling data from a PDF, how to compare two paired groups using the t test and the nonparametric Wilcoxon signed-rank test using R programming, and how to make visualizations of group comparisons that emphasize a particular subset of the data.

Exploring Global Patterns of Dietary Behaviors Associated with Health Risk

This case study investigates the consumption of dietary factors associated with health risk among males and females from over 200 countries around the world, with a particular emphasis on the United States. It demonstrates how to wrangle data from a PDF; how to combine data from two different sources; how to compare two paired groups and multiple paired groups using t-tests, ANOVA, and linear regression; and how to create visualizations of several groups and how to combine plots together with very different scales.

Influence of Multicollinearity on Measured Impact of Right-To-Carry Gun Laws

This case study focuses on two well-known studies that evaluated the influence of right-to-carry gun laws on violent crime rates. It demonstrates a phenomenon called multicollinearity, where explanatory variables that can predict one another can lead to aberrant and unstable findings; how to make visualizations with labels, such as arrows or equations; and how to combine multiple plots together.

School Shootings in the United States

This case study illustrates ways to communicate trends in a dataset about the number and characteristics of school shooting events for students in grades K-12 in the United States since 1970. It demonstrates how to create a dashboard, which is a website that shows patterns in a dataset in a concise manner; how to import data from a Google Sheets document; how to create interactive tables and maps; and how to properly calculate percentages for data when there are missing values.

Which Case Study Is Right For Me?

Connecting with the public health data you need.

The Open Case Studies project approaches data in many different ways. The guide below will help connect you with a case study:

Data science projects often start with a question. Here, you may look for case studies that explore a question that is similar to one you are interested in investigating with your data.

How does something change over time?

Investigating how a variable has changed over time can help identify consistent trends.

How do survey responses compare for different groups over time?

Survey data requires special care and attention to the survey design.

How do groups compare?

Public health researchers are often interested to know if one group is more vulnerable than another or if two or more groups are actually different from one another.

How do groups compare over time?

Comparing several groups over time can provide insight into whether the change over time differs between groups.

How do paired groups compare?

Paired groups are those that are not independent in some way. Perhaps you want to know how data from the same person over time compares with that of another person over time, or perhaps you are interested in how something changed in a city before and after an intervention, or perhaps you want to compare groups using data that has structure where there is coupling or matching of data values across samples.

Are certain groups or possibly subgroups more vulnerable?

Understand how to compare subpopulations at a deeper level.

How does something compare across regions?

Often it is useful to investigate if data differs by region, as many environmental, cultural, and political differences can influence public health outcomes.

How can I predict outcomes for new data?

Learn how the data might look next year or for locations that you don’t have data about.

Does this influence my data?

Analyze how a variable influences another variable.

Are these two variables related to one another?

Understand how two variables are related and how strongly they are related to one another.

How can I display this data for others to find and interpret and use easily?

Make it easy for others to find your data, see the major trends in your data, or search for specific values in your data.

Data can come from many different sources, from the more obvious, like an Excel file, to the less obvious, like an image or a website. These case studies demonstrate how to use data from a variety of possible sources.

Using data from a PDF or just parts of a PDF can be challenging. You could type the data into a new Excel file, but this can introduce mistakes and is difficult to reproduce.

Data are often in CSV files and it is typically easy to import data and work with data in this form. However, sometimes it can be difficult if, for example, the first few lines are structured differently or if you have unusual missing value indicators.
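
The two difficulties described above (leading lines that are structured differently, and unusual missing value indicators) can be sketched as follows. This is a minimal Python illustration rather than the R code used in the case studies. The file contents, the function name `read_csv_with_preamble`, and the markers `-999` and `N/A` are all hypothetical.

```python
import csv

# Hypothetical file contents: two preamble lines, then a header and data rows
raw_text = """Source: example agency
Downloaded: 2020-01-01
state,year,rate
MD,2018,4.2
CT,2018,-999
NY,2018,N/A
"""

def read_csv_with_preamble(text, skip_lines, missing_markers):
    """Skip leading non-tabular lines and convert missing markers to None."""
    lines = text.splitlines()[skip_lines:]
    rows = []
    for row in csv.DictReader(lines):
        cleaned = {key: (None if value in missing_markers else value)
                   for key, value in row.items()}
        rows.append(cleaned)
    return rows

rows = read_csv_with_preamble(raw_text, skip_lines=2,
                              missing_markers={"-999", "N/A", ""})
for row in rows:
    print(row)
```

Converting the markers to a single missing value up front keeps a placeholder such as `-999` from being treated as a real measurement later in the analysis.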

If you find data on a website that doesn’t allow you to download it in a convenient way, you can import the data directly into the R programming language.

This is one of the most common data forms, and it is typically easy to import data and work with data in this form. However, sometimes it can be challenging, especially if you have many files.

You can extract text from image files. This can be useful if, for example, you want to only use certain parts of a PDF.

It is possible to find the data that you need to use from an application programming interface (API).

Google Sheet

You can download data from a Google Sheet, copy and paste it into Excel, or import the data directly into the R programming language.

Survey data/Code books

Working with survey data requires special care and attention, and you can do this directly with the R programming language.

Multiple files

If you find that you need to import data from multiple files, there is a more efficient way to do so without importing each one by one.
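
One way to avoid importing files one by one is to loop over every file that matches a pattern. The sketch below uses Python's standard library rather than R. The function name `read_all_csvs`, the file names, and the values are hypothetical, and the files are assumed to share the same columns.

```python
import csv
import glob
import os
import tempfile

def read_all_csvs(folder):
    """Read every .csv file in `folder` into one list of row dictionaries."""
    combined = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = os.path.basename(path)  # remember origin
                combined.append(row)
    return combined

# Demonstration with two small hypothetical files in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    for year, rate in [("2017", "3.9"), ("2018", "4.2")]:
        file_path = os.path.join(folder, f"rates_{year}.csv")
        with open(file_path, "w", newline="") as handle:
            handle.write(f"year,rate\n{year},{rate}\n")
    all_rows = read_all_csvs(folder)

print(all_rows)
```

Recording the source file name on each row makes it possible to trace a value back to the file it came from after the data are combined.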

Data wrangling is the process of organizing your data in a more useful format. These case studies explore how to clean, rearrange, reshape, modify, filter, combine, or join your data.

Extracting data from a PDF

Extracting and organizing data from a PDF will make it easier to use.

Geocoding data

The process of assigning relevant latitude and longitude coordinates to data values is called geocoding. This can be helpful (although not always necessary) to create a map of your data.

Recoding data

If you have data values that are confusing and could be changed to something better, or if you want to convert your data to true or false, you might want to consider recoding these values.
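
Recoding can be sketched as a lookup from raw codes to clearer values. The example below is a minimal Python illustration. The codes `"1"` and `"2"` and their meanings are hypothetical, since a real survey's code book defines what each code means.

```python
# Hypothetical survey codes; a real code book defines the actual meanings
CODE_BOOK = {"1": True, "2": False}

def recode(values, mapping):
    """Replace each raw code with its recoded value; unknown codes become None."""
    return [mapping.get(value) for value in values]

raw_answers = ["1", "2", "2", "9", "1"]
used_product = recode(raw_answers, CODE_BOOK)
print(used_product)
```

Codes that are absent from the mapping (here `"9"`) become `None` rather than being silently counted as true or false.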

Methods of joining data

Sometimes, you obtain data from multiple sources that need to be combined together.

Filtering data

Perhaps you need to filter your data for only specific values of given variables. For example, you might want to filter census employment data to only the values for females who are also Black and live in Connecticut.
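
The filter described above can be sketched as keeping only the rows that match every condition at once. This is a minimal Python illustration. The records, the column names, and the function name `filter_rows` are hypothetical and do not reflect actual census data.

```python
def filter_rows(rows, **conditions):
    """Keep only rows whose fields equal every given condition."""
    return [
        row for row in rows
        if all(row.get(field) == wanted for field, wanted in conditions.items())
    ]

# Hypothetical employment records; real census data would have many more columns
records = [
    {"state": "Connecticut", "sex": "Female", "race": "Black", "employed": 1200},
    {"state": "Connecticut", "sex": "Male", "race": "Black", "employed": 1100},
    {"state": "Maryland", "sex": "Female", "race": "Black", "employed": 2300},
    {"state": "Connecticut", "sex": "Female", "race": "White", "employed": 5400},
]

subset = filter_rows(records, state="Connecticut", sex="Female", race="Black")
print(subset)
```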

Modifying data (normalizing, transforming, scaling etc.)

Sometimes it is difficult to know when or how to normalize data.

Working with text

You can work with, remove, replace, or change words, phrases, letters, numbers, or punctuation marks in your data.

Reshaping data

Sometimes it is useful to shape your data so that you have many columns (for example, when performing certain analyses). At other times (for example, when creating plots), it is more useful to collapse multiple columns into fewer columns with more rows.
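
Collapsing many columns into fewer columns with more rows is often called reshaping from "wide" to "long". The sketch below shows the idea in plain Python. The function name `wide_to_long`, the country labels, and the emissions values are hypothetical.

```python
def wide_to_long(rows, id_column, value_columns, names_to, values_to):
    """Turn one row per id with many value columns into one row per (id, column)."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            long_rows.append({
                id_column: row[id_column],
                names_to: column,         # the former column name
                values_to: row[column],   # the value that column held
            })
    return long_rows

# Wide form: one column per year (hypothetical values)
wide = [
    {"country": "A", "1990": 2.1, "2000": 2.5},
    {"country": "B", "1990": 5.0, "2000": 4.8},
]

long_form = wide_to_long(wide, "country", ["1990", "2000"],
                         names_to="year", values_to="emissions")
for row in long_form:
    print(row)
```

In the long form, each row holds a single observation, which is the layout most plotting tools expect when drawing one line per group.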

Repetitive process

Sometimes you need to wrangle multiple datasets from different sources in a similar manner.

A picture is worth a thousand words, particularly when it comes to interpreting data. These case studies demonstrate how to make effective visualizations in various contexts. The first ten represent basic visualizations while 11-22 are more advanced.

A table that is easy to interpret

Adding colors or simple graphics can make tables easier to interpret.

Scatter plot

Scatter plots can be a strong option for evaluating the relationship between variables, and especially for evaluating changes in a variable over time.

Line plots are often useful for evaluating changes over time.

Bar plots are a good choice if you want to compare data to a threshold.

Box plots are particularly useful for comparing groups with many data values. They provide information about the spread of the data.

Pie chart/waffle plot

Pie charts or waffle plots can be a strong option when comparing relative percentages.

It can be difficult to visualize multiple groups simultaneously. In these situations, heat maps can be a great option.

Correlation plots

If you have many variables and need to know if they are correlated to one another, there are methods to efficiently check this.

Visualize missing data

It can be helpful to quickly identify how much of your data is missing (has NA values).

Create a map of your data

Often the best way to interpret regional differences in data is to make a map.

Advanced Visualizations

Matching a style

If you are working with collaborators, you can make your visualizations match the style of their figures.

Faceted plots allow you to quickly create multiple plots at once

It can be difficult to visualize multiple groups at the same time, so faceted plots are a great option in this situation.

Adding labels directly to plots with many different groups

If you compare many groups over time, for example, it can be difficult to see which line corresponds to which group. Adding labels directly to these lines can be very helpful and negates the need for an overcomplicated legend.

Emphasize a particular group

Sometimes you will have several different groups and you want to highlight a specific group.

Adding annotations to plots

Adding labels, such as thresholds, arrows, or equations, can make it easier for people to interpret your plot.

Add error bars to your plot

Adding error bars can help convey information about the confidence of the estimates in your plots.

Combine multiple plots together

Sometimes it is useful to put a variety of plots together and add text to explain what the plot shows.

Create an interactive plot when you have too many groups to label

If you compare a very large number of groups, it can be difficult to tell what is happening. Often it can help to make the plot interactive so that the user can hover over points or lines to see what they indicate.

Create an interactive map of your data

Sometimes it is easiest to see regional differences by interacting with and exploring an interactive map.

Create an interactive table of your data

Sometimes you might want to be able to search through your data or allow others to easily do so.

Add images to your figures

Adding an image to a plot, such as a logo, can be helpful.

Create an interactive dashboard/website for your data

Dashboards can quickly convey major trends in a dataset, and they can also allow users to interact with the data and choose which aspects of the data they wish to explore.

To better understand data, it is helpful to use statistical tests. These case studies demonstrate a variety of statistical tests and concepts.

Are two groups different?

Correlation

Are two variables related to one another?

Are multiple groups different?

Linear regression

Would you like to compare groups?

Chi-squared test of independence

Do the frequencies of two groups suggest that they are independent?
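
To make the question above concrete, here is a minimal Python sketch of the chi-squared test of independence for a table of counts. The counts are hypothetical. The p-value helper only covers the 2x2 case (one degree of freedom), and no continuity correction is applied, which is an assumption of this sketch (R's `chisq.test` applies one by default for 2x2 tables).

```python
import math

def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

def p_value_one_dof(statistic):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

# Hypothetical counts: rows are two groups, columns are symptoms reported yes/no
counts = [[10, 20], [20, 10]]
stat, dof = chi_squared_independence(counts)
p = p_value_one_dof(stat)
print(f"chi-squared = {stat:.3f}, dof = {dof}, p = {p:.4f}")
```

A small p-value suggests that the frequencies differ between the groups by more than would be expected if the two variables were independent.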

Mann-Kendall Trend test

Is there a consistent change over time?
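
The Mann-Kendall test answers this question by counting, over every pair of time points, how often the later value is larger or smaller than the earlier one. The following is a minimal Python sketch with hypothetical yearly rates. It assumes there are no tied values and uses the normal approximation, which is rough for very short series.

```python
import math

def mann_kendall(series):
    """Return (S, z, two_sided_p) for the Mann-Kendall trend test, assuming no ties."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # +1 for an increase, -1 for a decrease
    variance = n * (n - 1) * (2 * n + 5) / 18  # valid only when there are no ties
    if s > 0:
        z = (s - 1) / math.sqrt(variance)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return s, z, p_value

# Hypothetical yearly rates with a steady upward direction
rates = [11.2, 11.9, 12.4, 13.0, 13.8]
s, z, p = mann_kendall(rates)
print(f"S = {s}, z = {z:.3f}, p = {p:.4f}")
```

A positive S indicates a mostly increasing series and a negative S a mostly decreasing one.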

Machine learning

Would you like to predict data?

Calculate percentages with missing data?

Would you like to calculate percentages, but you are missing some data?
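
The key choice when data are missing is the denominator: all records, or only the records with a known value. A minimal Python sketch with hypothetical values, where `None` marks a missing entry:

```python
def percent_true(values):
    """Summarize the share of True values, reporting both possible denominators."""
    known = [value for value in values if value is not None]
    n_true = sum(known)  # True counts as 1, False as 0
    return {
        "n_missing": len(values) - len(known),
        "percent_of_known": 100 * n_true / len(known) if known else None,
        "percent_of_all": 100 * n_true / len(values) if values else None,
    }

# Hypothetical records: True/False where known, None where missing
responses = [True, False, None, True, None]
summary = percent_true(responses)
print(summary)
```

Reporting the number of missing values alongside the percentage lets readers see which denominator was used.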

About The Project

Learn about the team behind the Open Case Studies project.

As part of the larger Open Case Studies project (OCS) at opencasestudies.org, these case studies were developed for and funded by the Bloomberg American Health Initiative. The OCS project is made up of a team of researchers at the Johns Hopkins Bloomberg School of Public Health (JHSPH).

Let us know how the Open Case Studies project has enhanced your educational curriculum or ability to tackle tough data-rich research projects.

JHSPH Faculty Contributors

Jessica Fanzo, PhD

Brendan Saloner, PhD

Megan Latshaw, PhD, MHS

Renee M. Johnson, PhD, MPH

Daniel Webster, ScD, MPH

Elizabeth Stuart, PhD

Bloomberg American Health Initiative

Joshua M. Sharfstein, MD – Director, Bloomberg American Health Initiative

Michelle Spencer, MS – Associate Director, Bloomberg American Health Initiative

Paulani Mui, MPH – Special Projects Officer, Bloomberg American Health Initiative

Other Contributors

Aboozar Hadavand, PhD, MA, MS, Minerva University

Roger Peng, PhD, MS, Johns Hopkins Bloomberg School of Public Health

Kirsten Koehler, PhD, MS, Johns Hopkins Bloomberg School of Public Health

Alex McCourt, PhD, JD, MPH, Johns Hopkins Bloomberg School of Public Health

Ashkan Afshin, MD, ScD, MPH, MSc, University of Washington and Institute for Health Metrics and Evaluation (IHME)

Erin Mullany, BA, Institute for Health Metrics and Evaluation (IHME)

External Review Panel

Leslie Myint, PhD, Macalester College

Shannon E. Ellis, PhD, University of California – San Diego

Christina Knudson, PhD, University of St. Thomas

Michael Love, PhD, University of North Carolina

Nicholas Horton, ScD, Amherst College

Mine Çetinkaya-Rundel, PhD, University of Edinburgh, Duke University, RStudio

Let Us Know How You're Using Open Case Studies

As the Open Case Studies project expands, we learn from you. Tell us what data you'd like to see, how you're using the data, or anything we can do to improve the project.

Data Analytics in Healthcare: 7 Real-World Examples and Use Cases

  • Data Science, Healthcare
  • 31 Aug, 2020

A roster of seven analytics use cases

Analytics application cases in healthcare

Predicting palliative care patient risk: Penn Medicine

Optimization of clinical space usage: Texas Children's Hospital

  • An online scheduling tool was leveraged to allow self-scheduling through the web.
  • The hospital also established a template for allocating scheduling time in four-hour blocks. Appointments of different duration were allocated to different time blocks. All the unfilled appointments were distributed in a 72-hour time zone to close the gap.
  • Weekend appointments and extended hospital hours were added.
  • Annual revenue increased by $8.3 million, with 53 thousand appointments
  • 30 thousand online schedules
  • 39 percent patient satisfaction rate growth

Applying machine learning to predict operation duration and disease risk probability: Lucile Packard Children’s Hospital Stanford

  • Identify patients at clinical decline risk
  • Prevent central line-associated bloodstream infections
  • Predict surgical operation duration

Operation room delay reduction: The University of Chicago Medical Center

Daily emergency room visits prediction: Envision Physician Services

Monitoring patient state deterioration: Ysbyty Gwynedd

Leveraging data to create COVID-19 mortality model: agilon health

  • create a COVID-19 model for approximately 125,000 individuals who were assigned risk scores.
  • increase one partner location’s telehealth appointments from none in the first week to 2,200 in weeks 12 and 13, aligning with social distancing and overall pandemic policies.

What are the other opportunities of data analytics in healthcare?

National Academy of Medicine

Sharing Health Data: The Why, the Will, and the Way Forward

A Special Publication from the National Academy of Medicine

Sharing health data and information across stakeholder groups is the bedrock of a learning health system. As data and information are increasingly combined across various sources, their generative value to transform health, health care, and health equity increases significantly. Health data has proven its centrality in guiding action to change the course of individual and population health, if properly stewarded and used.

In the context of the COVID-19 pandemic, both data and a lack of data illuminated profound shortcomings that affected health care and health equity. Yet, a silver lining of the pandemic was a surge in collaboration among data holders in public health, health care, and technology firms, suggesting that an evolution in health data sharing is visible and tangible.

This Special Publication features some of these novel data sharing collaborations. It was developed to provide the practical context and implementation guidance that are critical to advancing the lessons identified in its parent NAM Special Publication, Health Data Sharing: Building a Foundation of Stakeholder Trust. The focus of this publication is to identify and describe exemplar groups, to dispel the myth that sharing health data more broadly is impossible, and to illuminate the innovative approaches that are being taken to make progress in the current environment. It also serves as a resource for those waiting in the wings, showing how barriers were addressed and harvesting lessons and insights from those on the front lines.

In the meantime, knowledge is already available to foster better health care and health outcomes. The examples described in this volume suggest how intentional attention to health data sharing can enable unparalleled advances, securing a healthier and more equitable future for all.

  • Survey paper
  • Open access
  • Published: 19 June 2019

Big data in healthcare: management, analysis and future prospects

Sabyasachi Dash, Sushil Kumar Shakyawar, Mohit Sharma & Sandeep Kaushik

Journal of Big Data volume 6, Article number: 54 (2019)

‘Big data’ refers to massive amounts of information that can work wonders. It has been a topic of special interest for the past two decades because of the great potential hidden in it. Various public and private sector industries generate, store, and analyze big data with the aim of improving the services they provide. In the healthcare industry, sources of big data include hospital records, medical records of patients, results of medical examinations, and devices that are part of the internet of things. Biomedical research also generates a significant portion of big data relevant to public healthcare. This data requires proper management and analysis in order to derive meaningful information. Otherwise, seeking a solution by analyzing big data quickly becomes comparable to finding a needle in a haystack. There are various challenges associated with each step of handling big data, which can only be overcome by using high-end computing solutions for big data analysis. That is why, to provide relevant solutions for improving public health, healthcare providers need to be fully equipped with appropriate infrastructure to systematically generate and analyze big data. Efficient management, analysis, and interpretation of big data can change the game by opening new avenues for modern healthcare. That is exactly why various industries, including the healthcare industry, are taking vigorous steps to convert this potential into better services and financial advantages. With a strong integration of biomedical and healthcare data, modern healthcare organizations could revolutionize medical therapies and personalized medicine.

Introduction

Information has been the key to better organization and new developments. The more information we have, the more optimally we can organize ourselves to deliver the best outcomes. That is why data collection is an important part of every organization. We can also use this data to predict current trends in certain parameters and future events. As we have become more aware of this, we have started producing and collecting more data about almost everything by introducing technological developments in this direction. Today, we are flooded with data from every aspect of our lives, such as social activities, science, work, and health. In a way, we can compare the present situation to a data deluge. Technological advances have helped us generate more and more data, to the point where it has become unmanageable with currently available technologies. This has led to the creation of the term ‘big data’ to describe data that is large and unmanageable. In order to meet our present and future social needs, we need to develop new strategies to organize this data and derive meaningful information. One such special social need is healthcare. Like every other industry, healthcare organizations are producing data at a tremendous rate, which presents many advantages and challenges at the same time. In this review, we discuss the basics of big data, including its management, analysis and future prospects, especially in the healthcare sector.

The data overload

Every day, people working with various organizations around the world generate a massive amount of data. The term “digital universe” quantitatively defines such massive amounts of data created, replicated, and consumed in a single year. International Data Corporation (IDC) estimated the approximate size of the digital universe in 2005 to be 130 exabytes (EB). The digital universe in 2017 expanded to about 16,000 EB or 16 zettabytes (ZB). IDC predicted that the digital universe would expand to 40,000 EB by the year 2020. To imagine this size, we would have to assign about 5200 gigabytes (GB) of data to every individual. This exemplifies the phenomenal speed at which the digital universe is expanding. Internet giants like Google and Facebook have been collecting and storing massive amounts of data. For instance, depending on our preferences, Google may store a variety of information including user location, advertisement preferences, list of applications used, internet browsing history, contacts, bookmarks, emails, and other necessary information associated with the user. Similarly, Facebook stores and analyzes about 30 petabytes (PB) of user-generated data. Such large amounts of data constitute ‘big data’. Over the past decade, big data has been successfully used by the IT industry to generate critical information that can generate significant revenue.
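
The per-person figure quoted above can be checked with a short calculation. The sketch below is in Python and assumes decimal units (1 EB = 10^18 bytes, 1 GB = 10^9 bytes). The implied population is derived here from the two quoted numbers and is not a figure stated in the text.

```python
# Decimal (SI) unit sizes in bytes; this choice of units is an assumption
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21

digital_universe_2017 = 16_000 * EXABYTE
digital_universe_2020 = 40_000 * EXABYTE  # IDC prediction quoted in the text
per_person = 5_200 * GIGABYTE             # per-individual share quoted in the text

# 16,000 EB and 16 ZB are the same quantity
print(digital_universe_2017 == 16 * ZETTABYTE)

# Number of individuals implied by dividing the total by the per-person share
implied_population = digital_universe_2020 / per_person
print(f"{implied_population:.2e} individuals")
```

Under these assumptions the division gives roughly 7.7 billion individuals, which is in line with the size of the world population around 2020.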

These observations have become so conspicuous that they have eventually led to the birth of a new field of science termed ‘Data Science’. Data science deals with various aspects of data, including management and analysis, to extract deeper insights for improving the functionality or services of a system (for example, a healthcare or transport system). Additionally, with the availability of some of the most creative and meaningful ways to visualize big data post-analysis, it has become easier to understand the functioning of any complex system. As a large section of society is becoming aware of, and involved in generating, big data, it has become necessary to define what big data is. Therefore, in this review, we attempt to provide details on the impact of big data on the transformation of the global healthcare sector and on our daily lives.

Defining big data

As the name suggests, ‘big data’ represents large amounts of data that are unmanageable using traditional software or internet-based platforms. It surpasses the traditionally used amount of storage, processing and analytical power. Even though a number of definitions for big data exist, the most popular and well-accepted definition was given by Douglas Laney. Laney observed that (big) data was growing in three different dimensions, namely volume, velocity and variety (known as the 3 Vs) [1]. The ‘big’ part of big data is indicative of its large volume. In addition to volume, the big data description also includes velocity and variety. Velocity indicates the speed or rate at which data is collected and made accessible for further analysis, while variety refers to the different types of organized and unorganized data that any firm or system can collect, such as transaction-level data, video, audio, text or log files. These three Vs have become the standard definition of big data. Although other people have added several other Vs to this definition [2], the most accepted fourth V remains ‘veracity’.

The term “big data” has become extremely popular across the globe in recent years. Almost every sector of research, whether it relates to industry or academia, is generating and analyzing big data for various purposes. The most challenging task regarding this huge heap of data, which can be organized or unorganized, is its management. Given that big data is unmanageable using traditional software, we need technically advanced applications and software that can utilize fast and cost-efficient high-end computational power for such tasks. Implementation of artificial intelligence (AI) algorithms and novel fusion algorithms would be necessary to make sense of this large amount of data. Indeed, it would be a great feat to achieve automated decision-making by implementing machine learning (ML) methods like neural networks and other AI techniques. However, in the absence of appropriate software and hardware support, big data can be quite hazy. We need to develop better techniques to handle this ‘endless sea’ of data and smart web applications for efficient analysis to gain workable insights. With proper storage and analytical tools in hand, the information and insights derived from big data can make critical social infrastructure components and services (like healthcare, safety or transportation) more aware, interactive and efficient [3]. In addition, visualization of big data in a user-friendly manner will be a critical factor for societal development.

Healthcare as a big-data repository

Healthcare is a multi-dimensional system established with the sole aim of preventing, diagnosing, and treating health-related issues or impairments in human beings. The major components of a healthcare system are the health professionals (physicians or nurses), health facilities (clinics and hospitals for delivering medicines and other diagnosis or treatment technologies), and a financing institution supporting the former two. Health professionals belong to various health sectors such as dentistry, medicine, midwifery, nursing, psychology, physiotherapy, and many others. Healthcare is required at several levels depending on the urgency of the situation. Professionals provide it as the first point of consultation (primary care), as acute care requiring skilled professionals (secondary care), as advanced medical investigation and treatment (tertiary care), and as highly uncommon diagnostic or surgical procedures (quaternary care). At all these levels, health professionals are responsible for different kinds of information, such as the patient’s medical history (data related to diagnoses and prescriptions), medical and clinical data (such as data from imaging and laboratory examinations), and other private or personal medical data. Previously, the common practice was to store such medical records as either handwritten notes or typed reports [4]. Even the results of a medical examination were stored in a paper file system. In fact, this practice is very old, with the oldest case reports existing on a papyrus text from Egypt that dates back to 1600 BC [5]. In Stanley Reiser’s words, the clinical case records “freeze the episode of illness as a story in which patient, family and the doctor are a part of the plot” [6].

With the advent of computer systems and their potential, the digitization of all clinical exams and medical records in healthcare systems has become a standard and widely adopted practice. In 2003, the Institute of Medicine, a division of the National Academies of Sciences, Engineering, and Medicine, chose the term “electronic health records” to represent records maintained for improving the health care sector for the benefit of patients and clinicians. Electronic health records (EHRs), as defined by Murphy, Hanken and Waters, are computerized medical records for patients: “any information relating to the past, present or future physical/mental health or condition of an individual which resides in electronic system(s) used to capture, transmit, receive, store, retrieve, link and manipulate multimedia data for the primary purpose of providing healthcare and health-related services” [7].

Electronic health records

It is important to note that the National Institutes of Health (NIH) recently announced the “All of Us” initiative ( https://allofus.nih.gov/ ), which aims to collect data from one million or more patients over the next few years, including EHRs, medical imaging, and socio-behavioral and environmental data. EHRs have introduced many advantages for handling modern healthcare-related data. Below, we describe some of them. The first advantage of EHRs is that healthcare professionals have improved access to the entire medical history of a patient. This information includes medical diagnoses, prescriptions, data related to known allergies, demographics, clinical narratives, and the results of various laboratory tests. Medical conditions can thus be recognized and treated more quickly, because previous test results are available with less delay. Over time, we have observed a significant decrease in redundant and additional examinations, lost orders, and ambiguities caused by illegible handwriting, along with improved care coordination between multiple healthcare providers. Overcoming such logistical errors has led to a reduction in the number of drug allergies by reducing errors in medication dose and frequency. Healthcare professionals have also found that access through web-based and electronic platforms significantly improves their medical practice, through automatic reminders and prompts regarding vaccinations, abnormal laboratory results, cancer screening, and other periodic checkups. Facilitating communication among multiple healthcare providers and patients allows greater continuity of care and timely interventions. EHRs can also be linked to electronic authorization and immediate insurance approvals because less paperwork is involved.
EHRs enable faster data retrieval and facilitate the reporting of key healthcare quality indicators to organizations, and they improve public health surveillance through immediate reporting of disease outbreaks. EHRs also provide relevant data on the quality of care for the beneficiaries of employee health insurance programs and can help control the increasing costs of health insurance benefits. Finally, EHRs can reduce or entirely eliminate delays and confusion in billing and claims management. Together, EHRs and the internet provide access to millions of pieces of health-related medical information critical for patient life.

Digitization of healthcare and big data

Similar to an EHR, an electronic medical record (EMR) stores the standard medical and clinical data gathered from patients. EHRs, EMRs, personal health records (PHRs), medical practice management software (MPM), and many other healthcare data components collectively have the potential to improve the quality, service efficiency, and costs of healthcare while reducing medical errors. Big data in healthcare includes healthcare payer-provider data (such as EMRs, pharmacy prescriptions, and insurance records) along with genomics-driven experiments (such as genotyping and gene expression data) and other data acquired from the smart web of the internet of things (IoT) (Fig. 1). The adoption of EHRs was slow at the beginning of the 21st century; however, it has grown substantially since 2009 [7, 8]. The management and usage of such healthcare data have become increasingly dependent on information technology. The development and usage of wellness monitoring devices and related software that can generate alerts and share a patient’s health-related data with the respective healthcare providers has gained momentum, especially in establishing real-time biomedical and health monitoring systems. These devices generate a huge amount of data that can be analyzed to provide real-time clinical or medical care [9]. The use of big data from healthcare shows promise for improving health outcomes and controlling costs.

Fig. 1. Workflow of big data analytics. Data warehouses store massive amounts of data generated from various sources. This data is processed using analytic pipelines to obtain smarter and more affordable healthcare options.

Big data in biomedical research

A biological system, such as a human cell, exhibits a complex interplay of molecular and physical events. To understand the interdependencies of the various components and events of such a complex system, a biomedical or biological experiment usually gathers data on a smaller and/or simpler component. Consequently, multiple simplified experiments are required to generate a wide map of a given biological phenomenon of interest. This indicates that the more data we have, the better we understand the biological processes. With this idea, modern techniques have evolved at a great pace. For instance, one can imagine the amount of data generated since the integration of efficient technologies like next-generation sequencing (NGS) and genome-wide association studies (GWAS) to decode human genetics. NGS-based data provides information at depths that were previously inaccessible and takes the experimental scenario to a completely new dimension. It has increased the resolution at which we observe or record biological events associated with specific diseases in real time. The idea that large amounts of data can reveal information that often remains unidentified or hidden in smaller experiments has ushered in the ‘-omics’ era. The ‘omics’ discipline has witnessed significant progress: instead of studying a single ‘gene’, scientists can now study the whole ‘genome’ of an organism in ‘genomics’ studies within a given amount of time. Similarly, instead of studying the expression or ‘transcription’ of a single gene, we can now study the expression of all the genes, or the entire ‘transcriptome’, of an organism in ‘transcriptomics’ studies. Each of these individual experiments generates a large amount of data with more depth of information than ever before. Yet this depth and resolution might be insufficient to provide all the details required to explain a particular mechanism or event.
Therefore, one usually ends up analyzing a large amount of data obtained from multiple experiments to gain novel insights. This is supported by a continuous rise in the number of publications on big data in healthcare (Fig. 2). Analysis of such big data from medical and healthcare systems can be of immense help in providing novel strategies for healthcare. The latest technological developments in data generation, collection and analysis have raised expectations of a revolution in personalized medicine in the near future.

Fig. 2. Publications associated with big data in healthcare. The numbers of publications in PubMed are plotted by year.

Big data from omics studies

NGS has greatly simplified sequencing and decreased the cost of generating whole-genome sequence data. The cost of complete genome sequencing has fallen from millions to a couple of thousand dollars [10]. NGS technology has resulted in an increased volume of biomedical data from genomic and transcriptomic studies. According to one estimate, the number of human genomes sequenced by 2025 could be between 100 million and 2 billion [11]. Combining genomic and transcriptomic data with proteomic and metabolomic data can greatly enhance our knowledge of the individual profile of a patient, an approach often described as “individual, personalized or precision health care”. Systematic and integrative analysis of omics data in conjunction with healthcare analytics can help design better treatment strategies towards precision and personalized medicine (Fig. 3). Genomics-driven experiments, e.g., genotyping, gene expression, and NGS-based studies, are the major source of big data in biomedical healthcare, along with EMRs, pharmacy prescription information, and insurance records. Healthcare requires a strong integration of such biomedical data from various sources to provide better treatments and patient care. These prospects are so exciting that, even though genomic data from patients involves many variables that must be accounted for, commercial organizations are already using human genome data to help providers make personalized medical decisions. This might turn out to be a game-changer in future medicine and health.

Fig. 3. A framework for integrating omics data and health care analytics to promote personalized treatment.

Internet of Things (IoT)

The healthcare industry has not been as quick to adapt to the big data movement as other industries. Therefore, big data usage in the healthcare sector is still in its infancy. For example, healthcare and biomedical big data have not yet converged to enhance healthcare data with molecular pathology. Such convergence can help unravel various mechanisms of action or other aspects of predictive biology. Therefore, to assess an individual’s health status, biomolecular and clinical datasets need to be combined. One such source of clinical data in healthcare is the ‘internet of things’ (IoT).

In fact, IoT is another big player that has been implemented in a number of industries, including healthcare. Until recently, objects of common use such as cars, watches, refrigerators and health-monitoring devices did not usually produce or handle data and lacked internet connectivity. However, furnishing such objects with computer chips and sensors that enable data collection and transmission over the internet has opened new avenues. Device technologies such as Radio Frequency IDentification (RFID) tags and readers and Near Field Communication (NFC) devices, which can not only gather information but also interact physically, are increasingly being used as information and communication systems [3]. This enables objects with RFID or NFC to communicate and function as a web of smart things. The analysis of data collected from these chips or sensors may reveal critical information that could help improve lifestyle, establish measures for energy conservation, and improve transportation and healthcare. In fact, IoT has become a rising movement in the field of healthcare. IoT devices create a continuous stream of data while monitoring the health of people (or patients), which makes these devices a major contributor to big data in healthcare. Such resources can interconnect various devices to provide a reliable, effective and smart healthcare service to the elderly and to patients with a chronic illness [12].

Advantages of IoT in healthcare

Using the web of IoT devices, a doctor can measure and monitor various parameters from patients in their respective locations, for example at home or in the office. Through early intervention and treatment, a patient might therefore not need hospitalization or even a visit to the doctor, resulting in a significant reduction in healthcare expenses. Some examples of IoT devices used in healthcare include fitness or health-tracking wearable devices, biosensors, clinical devices for monitoring vital signs, and other types of devices or clinical instruments. Such IoT devices generate a large amount of health-related data. If we can integrate this data with other existing healthcare data such as EMRs or PHRs, we can predict a patient’s health status and its progression from a subclinical to a pathological state [9]. In fact, big data generated from IoT has been quite advantageous in several areas by enabling better investigation and predictions. On a larger scale, the data from such devices can help in personnel health monitoring, modelling the spread of a disease and finding ways to contain a particular disease outbreak.

The analysis of data from IoT would require updated operating software because of its specific nature, along with advanced hardware and software applications. We would need to manage the data inflow from IoT instruments in real time and analyze it by the minute. Stakeholders in the healthcare system are trying to trim costs and improve the quality of care by applying advanced analytics to both internally and externally generated data.
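
As a minimal sketch of what analyzing a real-time inflow can look like, the Python snippet below keeps a sliding window over a stream of readings and flags moments when the rolling mean crosses a limit. The function name `rolling_alerts`, the window size, the threshold and the heart-rate values are all illustrative assumptions, not part of any particular IoT platform or clinical guideline.

```python
from collections import deque
from statistics import mean


def rolling_alerts(readings, window=5, threshold=100.0):
    """Yield (index, rolling_mean) whenever the mean of the last
    `window` readings exceeds `threshold`."""
    recent = deque(maxlen=window)  # old readings drop out automatically
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:  # wait until the window is full
            avg = mean(recent)
            if avg > threshold:
                yield i, avg


# Hypothetical heart-rate stream (beats per minute) from a wearable sensor.
heart_rate = [72, 75, 74, 90, 110, 118, 121, 119, 95, 80]
alerts = list(rolling_alerts(heart_rate, window=3, threshold=100.0))
```

Because the window holds only the most recent readings, memory use stays constant however long the stream runs, which is the property that matters when data arrives continuously.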

Mobile computing and mobile health (mHealth)

In today’s digital world, almost everyone seems keen to track their fitness and health statistics using the built-in pedometers of portable and wearable devices such as smartphones, smartwatches, fitness dashboards or tablets. With an increasingly mobile society in almost all aspects of life, the healthcare infrastructure needs remodeling to accommodate mobile devices [13]. The practice of medicine and public health using mobile devices, known as mHealth or mobile health, pervades different degrees of health care, especially for chronic diseases such as diabetes and cancer [14]. Healthcare organizations are increasingly using mobile health and wellness services to implement novel and innovative ways of providing care and coordinating health as well as wellness. Mobile platforms can improve healthcare by accelerating interactive communication between patients and healthcare providers. In fact, Apple and Google have developed dedicated platforms, Apple’s ResearchKit and Google Fit, for developing research applications for fitness and health statistics [15]. These applications support seamless interaction with various consumer devices and embedded sensors for data integration. They give doctors direct access to a user’s overall health data, so that both users and their doctors know the real-time status of the user’s body. These apps and smart devices also help by improving wellness planning and encouraging healthy lifestyles. Users or patients can become advocates for their own health.

Nature of the big data in healthcare

EHRs can enable advanced analytics and help clinical decision-making by providing enormous amounts of data. However, a large proportion of this data is currently unstructured. Unstructured data is information that does not adhere to a pre-defined model or organizational framework. One reason for this may simply be that it can be recorded in a myriad of formats. Another reason for choosing an unstructured format is that structured input options (drop-down menus, radio buttons, and check boxes) often fall short when capturing data of a complex nature. For example, non-standard data regarding a patient’s clinical suspicions, socioeconomic data, patient preferences, key lifestyle factors, and other related information cannot be recorded in any way other than an unstructured format. It is difficult to group such varied, yet critical, sources of information into an intuitive or unified data format for further analysis using algorithms to understand and improve patient care. Nonetheless, the healthcare industry needs to utilize the full potential of these rich streams of information to enhance the patient experience. In the healthcare sector, this could materialize as better management, better care and low-cost treatments. We are still far from realizing the benefits of big data in a meaningful way and harnessing the insights that come from it. To achieve these goals, we need to manage and analyze big data in a systematic manner.

Management and analysis of big data

Big data is a huge amount of varied data generated at a rapid rate. The data gathered from various sources is mostly required for optimizing consumer services rather than consumer consumption. This is also true for big data from biomedical research and healthcare. The major challenge with big data is how to handle this large volume of information. To make it available to the scientific community, the data needs to be stored in a file format that is easily accessible and readable for efficient analysis. In the context of healthcare data, another major challenge is the implementation of high-end computing tools, protocols and hardware in the clinical setting. Experts from diverse backgrounds, including biology, information technology, statistics, and mathematics, are required to work together to achieve this goal. The data collected using sensors can be made available on a storage cloud with pre-installed software tools developed by analytic tool developers. These tools would have data mining and ML functions developed by AI experts to convert the information stored as data into knowledge. Upon implementation, this would enhance the efficiency of acquiring, storing, analyzing, and visualizing big data from healthcare. The main task is to annotate, integrate, and present this complex data in an appropriate manner for a better understanding. In the absence of such relevant information, the (healthcare) data remains quite opaque and may not lead biomedical researchers any further. Finally, visualization tools developed by computer graphics designers can efficiently display this newly gained knowledge.

Heterogeneity of data is another challenge in big data analysis. The huge size and highly heterogeneous nature of big data in healthcare render it relatively uninformative when conventional technologies are used. The most common platforms for operating the software frameworks that assist big data analysis are high-power computing clusters accessed via grid computing infrastructures. Cloud computing is one such system; it has virtualized storage technologies and provides reliable services. It offers high reliability, scalability and autonomy along with ubiquitous access, dynamic resource discovery and composability. Such platforms can act as a receiver of data from ubiquitous sensors, as a computer to analyze and interpret the data, and as a provider of easy-to-understand web-based visualization for the user. In IoT, big data processing and analytics can be performed closer to the data source using the services of mobile edge computing cloudlets and fog computing. Advanced algorithms are required to implement ML and AI approaches for big data analysis on computing clusters. A programming language suitable for working with big data (e.g. Python, R or other languages) could be used to write such algorithms or software. Therefore, a good knowledge of both biology and IT is required to handle big data from biomedical research. Such a combination of skills is usually found in bioinformaticians. The most common platforms for working with big data include Hadoop and Apache Spark. We briefly introduce these platforms below.

Hadoop

Loading large amounts of (big) data into the memory of even the most powerful computing clusters is not an efficient way to work with big data. Therefore, the most logical approach for analyzing huge volumes of complex big data is to distribute it and process it in parallel on multiple nodes. However, the data is usually so large that thousands of computing machines are required to distribute it and finish processing in a reasonable amount of time. When working with hundreds or thousands of nodes, one has to handle issues such as how to parallelize the computation, distribute the data, and handle failures. One of the most popular open-source distributed applications for this purpose is Hadoop [16]. Hadoop implements the MapReduce algorithm for processing and generating large datasets. MapReduce uses map and reduce primitives: the map operation turns each logical ‘record’ in the input into a set of intermediate key/value pairs, and the reduce operation combines all the values that share the same key [17]. It efficiently parallelizes the computation, handles failures, and schedules inter-machine communication across large-scale clusters of machines. The Hadoop Distributed File System (HDFS) is the file system component that provides scalable, efficient, and replica-based storage of data at the various nodes that form part of a cluster [16]. Hadoop has other tools that enhance the storage and processing components; therefore, many large companies such as Yahoo, Facebook, and others have rapidly adopted it. Hadoop has enabled researchers to use data sets that were otherwise impossible to handle. Many large projects, such as the determination of a correlation between air quality data and asthma admissions, drug development using genomic and proteomic data, and other such aspects of healthcare, are implementing Hadoop. Therefore, with the implementation of the Hadoop system, healthcare analytics will not be held back.
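
The map and reduce primitives can be illustrated with a small, single-process Python sketch. It is not Hadoop's API and involves no distribution or fault handling; it only shows the data flow of mapping records to key/value pairs, grouping by key, and reducing each group. The record layout and the function names (`map_phase`, `shuffle`, `reduce_phase`) are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit intermediate (key, value) pairs for one logical record."""
    yield record["diagnosis"], 1


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share the same key."""
    return key, sum(values)


def map_reduce(records):
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical admission records; on a cluster these would be split across nodes.
admissions = [
    {"patient": "p1", "diagnosis": "asthma"},
    {"patient": "p2", "diagnosis": "diabetes"},
    {"patient": "p3", "diagnosis": "asthma"},
    {"patient": "p4", "diagnosis": "copd"},
    {"patient": "p5", "diagnosis": "asthma"},
]
counts = map_reduce(admissions)
```

Since each call to `map_phase` depends on a single record and each call to `reduce_phase` on a single key, both phases can in principle run on different machines at once, which is what a framework such as Hadoop exploits.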

Apache Spark

Apache Spark is another open-source alternative to Hadoop. It is a unified engine for distributed data processing that includes higher-level libraries supporting SQL queries (Spark SQL), streaming data (Spark Streaming), machine learning (MLlib) and graph processing (GraphX) [18]. These libraries help increase developer productivity because the programming interface requires less coding effort, and they can be seamlessly combined to create more complex types of computation. By implementing Resilient Distributed Datasets (RDDs), Spark supports in-memory processing of data, which can make it about 100× faster than Hadoop in multi-pass analytics (on smaller datasets) [19, 20]. This holds especially when the data size is smaller than the available memory [21]. It also indicates that processing really big data with Apache Spark would require a large amount of memory. Since memory costs more than hard drive storage, MapReduce is expected to be more cost-effective than Apache Spark for large datasets. Similarly, Apache Storm was developed to provide a real-time framework for data stream processing. This platform supports most programming languages. Additionally, it offers good horizontal scalability and built-in fault tolerance for big data analysis.
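
To see why keeping data in memory helps multi-pass analytics in particular, consider the toy Python sketch below. It is not Spark's API: `ToyDataset`, its `cache` method and the lab values are hypothetical stand-ins that only count how often the underlying source has to be read when an analysis makes several passes over the same data.

```python
class ToyDataset:
    """Toy stand-in for a dataset that is expensive to load (not Spark's API)."""

    def __init__(self, loader):
        self._loader = loader
        self._cached = None
        self.loads = 0  # how many times the source was read

    def _read(self):
        self.loads += 1
        return list(self._loader())

    def cache(self):
        """Read the source once and keep the rows in memory."""
        self._cached = self._read()
        return self

    def rows(self):
        if self._cached is not None:
            return self._cached
        return self._read()  # without caching, every pass re-reads the source


def load_lab_values():
    # Placeholder for a slow read from disk or a distributed file system.
    return [4.1, 5.6, 7.2, 6.3, 5.0]


def multi_pass_stats(ds):
    n = len(ds.rows())                                 # pass 1
    avg = sum(ds.rows()) / n                           # pass 2
    var = sum((x - avg) ** 2 for x in ds.rows()) / n   # pass 3
    return avg, var


uncached = ToyDataset(load_lab_values)
stats_uncached = multi_pass_stats(uncached)   # source read on every pass

cached = ToyDataset(load_lab_values).cache()
stats_cached = multi_pass_stats(cached)       # source read once, then reused
```

The results are identical, but the uncached run reads the source three times and the cached run once. The saving grows with the number of passes, and it only applies while the data fits in the memory that holds the cache.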

Machine learning for information extraction, data analysis and predictions

In healthcare, patient data contains recorded signals, for instance electrocardiograms (ECGs), images, and videos. Healthcare providers have barely managed to convert such healthcare data into EHRs. Efforts are underway to digitize patient histories from pre-EHR-era notes and to supplement the standardization process by turning static images into machine-readable text. For example, optical character recognition (OCR) software can recognize handwriting as well as computer fonts and thereby advance digitization. Such unstructured and structured healthcare datasets hold an untapped wealth of information that can be harnessed using advanced AI programs to draw critical, actionable insights in the context of patient care. In fact, AI has emerged as the method of choice for big data applications in medicine. It has quickly found its niche in the decision-making process for the diagnosis of diseases. Healthcare professionals analyze such data for targeted abnormalities using appropriate ML approaches. ML can filter out structured information from such raw data.

Extracting information from EHR datasets

Emerging ML- or AI-based strategies are helping to refine the healthcare industry’s information processing capabilities. For example, natural language processing (NLP) is a rapidly developing area of machine learning that can identify key syntactic structures in free text, help in speech recognition and extract the meaning behind a narrative. NLP tools can help generate new documents, such as a clinical visit summary, or dictate clinical notes. The unique content and complexity of clinical documentation can be challenging for many NLP developers. Nonetheless, we should be able to extract relevant information from healthcare data using approaches such as NLP.
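
The goal of turning a free-text narrative into structured fields can be sketched with a deliberately simple rule-based extractor. This is a stand-in for illustration only: it uses regular expressions rather than a trained NLP model, and the patterns, field names and sample note are all assumptions; real clinical text is far more varied than these rules can handle.

```python
import re

# Hypothetical patterns for two fields often buried in free-text notes.
BP_PATTERN = re.compile(
    r"\bBP\s*(?:of\s*)?(\d{2,3})\s*/\s*(\d{2,3})\b", re.IGNORECASE
)
SMOKING_PATTERN = re.compile(
    r"\b(non-?smoker|never smoked|former smoker|current smoker)\b",
    re.IGNORECASE,
)


def extract_fields(note):
    """Return a dict of structured fields found in a free-text note."""
    fields = {}
    bp = BP_PATTERN.search(note)
    if bp:
        fields["systolic"] = int(bp.group(1))
        fields["diastolic"] = int(bp.group(2))
    smoking = SMOKING_PATTERN.search(note)
    if smoking:
        fields["smoking_status"] = smoking.group(1).lower()
    return fields


note = "Pt reports occasional chest pain. BP 142/91 at rest. Former smoker, quit 2015."
record = extract_fields(note)
```

Hand-written rules like these break as soon as the wording changes, which is one reason statistical NLP methods are attractive for clinical documentation.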

AI has also been used to provide predictive capabilities for healthcare big data. For example, ML algorithms can turn the diagnostic reading of medical images into automated decision-making. Although healthcare professionals are unlikely to be replaced by machines in the near future, AI can definitely assist physicians in making better clinical decisions or even replace human judgment in certain functional areas of healthcare.

Image analytics

Some of the most widely used imaging techniques in healthcare include computed tomography (CT), magnetic resonance imaging (MRI), X-ray, molecular imaging, ultrasound, photo-acoustic imaging, functional MRI (fMRI), positron emission tomography (PET), electroencephalography (EEG), and mammograms. These techniques capture high-definition medical images (patient data) of large sizes. Healthcare professionals such as radiologists and doctors do an excellent job of analyzing medical data in the form of these files for targeted abnormalities. However, it is also important to acknowledge the lack of specialized professionals for many diseases. To compensate for this dearth of professionals, efficient systems such as the Picture Archiving and Communication System (PACS) have been developed for storing and conveniently accessing medical images and reports [22]. PACSs are popular for delivering images to local workstations, which is accomplished through protocols such as Digital Imaging and Communications in Medicine (DICOM). However, data exchange with a PACS relies on structured data to retrieve medical images. By nature, this misses the unstructured information contained in some biomedical images. Moreover, it is possible to miss additional information about a patient’s health status that is present in these images or similar data. A professional focused on diagnosing an unrelated condition might not observe it, especially when the condition is still emerging. To help in such situations, image analytics is making an impact on healthcare by actively extracting disease biomarkers from biomedical images. This approach uses ML and pattern recognition techniques to draw insights from massive volumes of clinical image data to transform the diagnosis, treatment and monitoring of patients. It focuses on enhancing the diagnostic capability of medical imaging for clinical decision-making.

A number of software tools have been developed, based on functionalities such as generic, registration, segmentation, visualization, reconstruction, simulation and diffusion, to perform medical image analysis and dig out hidden information. For example, the Visualization Toolkit is freely available software that allows powerful processing and analysis of 3D images from medical tests [23], while SPM can process and analyze five different types of brain images (e.g. MRI, fMRI, PET, CT scan and EEG) [24]. Other software such as GIMIAS, Elastix, and MITK supports all types of images. Various other widely used tools and their features in this domain are listed in Table 1. Such bioinformatics-based big data analysis may extract greater insights and value from imaging data to boost and support precision medicine projects, clinical decision support tools, and other modes of healthcare. For example, we can also use it to monitor new targeted treatments for cancer.
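
Of the functionalities named above, segmentation is the easiest to illustrate in a few lines. The Python sketch below applies a fixed intensity threshold to a tiny grid of made-up pixel values and measures the size of the highlighted region. It is only a conceptual illustration under those assumptions; the tools mentioned in the text use far more sophisticated methods, and none of their APIs is shown here.

```python
def threshold_segment(image, threshold):
    """Return a binary mask: 1 where intensity >= threshold, else 0."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


def region_area(mask):
    """Count the pixels marked as belonging to the segmented region."""
    return sum(sum(row) for row in mask)


# Hypothetical 4x4 grayscale patch with a bright 2x2 structure in the middle.
scan = [
    [10, 12, 11, 9],
    [11, 80, 85, 10],
    [12, 82, 90, 11],
    [9, 10, 12, 10],
]
mask = threshold_segment(scan, threshold=50)
area = region_area(mask)
```

A measurement such as `area`, tracked across successive scans, is a simple example of the kind of quantitative feature that image analytics derives from raw pixels.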

Big data from omics

The big data from “omics” studies is a new kind of challenge for bioinformaticians. Robust algorithms are required to analyze such complex data from biological systems. The ultimate goal is to convert this huge amount of data into an informative knowledge base. The application of bioinformatics approaches to transform biomedical and genomics data into predictive and preventive health is known as translational bioinformatics. It is at the forefront of data-driven healthcare. Various kinds of quantitative data in healthcare, for example from laboratory measurements, medication data and genomic profiles, can be combined and used to identify new meta-data that can help precision therapies [25]. This is why new technologies are required to help analyze this digital wealth. In fact, highly ambitious multimillion-dollar projects such as the “Big Data Research and Development Initiative” have been launched with the aim of enhancing the quality of big data tools and techniques for better organization, efficient access and smart analysis of big data. Many advantages are anticipated from the processing of ‘omics’ data from the large-scale Human Genome Project and other population sequencing projects. In population sequencing projects such as 1000 Genomes, researchers will have access to an enormous amount of raw data. Similarly, the Human Genome Project-based Encyclopedia of DNA Elements (ENCODE) project aimed to determine all functional elements in the human genome using bioinformatics approaches. Here, we list some of the widely used bioinformatics-based tools for big data analytics on omics data.

SparkSeq is an efficient and cloud-ready platform based on the Apache Spark framework and Hadoop library that is used for interactive analysis of genomic data with nucleotide precision.

SAMQA identifies errors and ensures the quality of large-scale genomic data. This tool was originally built for the National Institutes of Health Cancer Genome Atlas project to identify and report errors including sequence alignment/map [SAM] format error and empty reads.

ART can simulate profiles of read errors and read lengths for data obtained using high throughput sequencing platforms including SOLiD and Illumina platforms.

DistMap is another toolkit used for distributed short-read mapping based on Hadoop cluster that aims to cover a wider range of sequencing applications. For instance, one of its applications namely the BWA mapper can perform 500 million read pairs in about 6 h, approximately 13 times faster than a conventional single-node mapper.

SeqWare is a query engine based on Apache HBase database system that enables access for large-scale whole-genome datasets by integrating genome browsers and tools.

CloudBurst is a parallel computing model utilized in genome mapping experiments to improve the scalability of reading large sequencing data.

Hydra uses the Hadoop-distributed computing framework for processing large peptide and spectra databases for proteomics datasets. This specific tool is capable of performing 27 billion peptide scorings in less than 60 min on a Hadoop cluster.

BlueSNP is an R package based on Hadoop platform used for genome-wide association studies (GWAS) analysis, primarily aiming on the statistical readouts to obtain significant associations between genotype–phenotype datasets. The efficiency of this tool is estimated to analyze 1000 phenotypes on 10 6 SNPs in 10 4 individuals in a duration of half-an-hour.

Myrna the cloud-based pipeline, provides information on the expression level differences of genes, including read alignments, data normalization, and statistical modeling.
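To put the performance figures quoted in this list side by side, the short sketch below converts them into average rates. The inputs are the numbers stated above; the derived rates are plain division, and the BlueSNP line assumes one association test per phenotype-SNP pair, which is our reading rather than something the tool's authors state here.

```python
# Back-of-the-envelope throughput implied by the figures quoted above.
# All inputs come from the text; the derived rates are simple division.

def per_second(count: float, hours: float) -> float:
    """Average number of items processed per second over the given duration."""
    return count / (hours * 3600)

# DistMap (BWA mapper): 500 million read pairs in about 6 h.
distmap_rate = per_second(500e6, 6)
# "Approximately 13 times faster" implies roughly this single-node runtime.
single_node_hours = 6 * 13

# Hydra: 27 billion peptide scorings in less than 60 min, so this is a lower bound.
hydra_rate = per_second(27e9, 1)

# BlueSNP: 1000 phenotypes on 10^6 SNPs in half an hour.
# Assumption: one association test per phenotype-SNP pair.
bluesnp_tests = 1000 * 10**6
bluesnp_rate = per_second(bluesnp_tests, 0.5)

print(f"DistMap : {distmap_rate:,.0f} read pairs/s (single node: ~{single_node_hours} h)")
print(f"Hydra   : at least {hydra_rate:,.0f} scorings/s")
print(f"BlueSNP : {bluesnp_rate:,.0f} tests/s")
```

These are averages over whole runs on the clusters used in the original benchmarks, so they are only indicative of what a different cluster would achieve.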

The past few years have witnessed a tremendous increase in disease-specific datasets from omics platforms. For example, the ArrayExpress Archive of Functional Genomics data repository contains information from approximately 30,000 experiments and more than one million functional assays. The growing amount of data demands better and more efficient bioinformatics-driven packages to analyze and interpret the information obtained. This has also led to the birth of specific tools to analyze such massive amounts of data. Below, we mention some of the most popular commercial platforms for big data analytics.

Commercial platforms for healthcare data analytics

To tackle big data challenges and perform smoother analytics, various companies have implemented AI to analyze published results, textual data, and image data to obtain meaningful outcomes. IBM Corporation is one of the biggest and most experienced players in this sector providing healthcare analytics services commercially. IBM’s Watson Health is an AI platform for sharing and analyzing health data among hospitals, providers and researchers. Similarly, Flatiron Health provides technology-oriented services in healthcare analytics, with a special focus on cancer research. Other big companies such as Oracle Corporation and Google Inc. are also focusing on developing cloud-based storage and distributed computing platforms. Interestingly, in the past few years, several companies and start-ups have also emerged to provide healthcare analytics and solutions. Some of the vendors in the healthcare sector are listed in Table 2. Below we discuss a few of these commercial solutions.

Ayasdi is one such big vendor. It focuses on ML-based methodologies and primarily provides a machine intelligence platform along with an application framework with tried and tested enterprise scalability. It provides various applications for healthcare analytics, for example, to understand and manage clinical variation and to transform clinical care costs. It is also capable of analyzing and managing how hospitals are organized, conversations between doctors, risk-oriented treatment decisions by doctors, and the care they deliver to patients. It also provides an application for the assessment and management of population health, a proactive strategy that goes beyond traditional risk analysis methodologies. It uses ML intelligence for predicting future risk trajectories, identifying risk drivers, and providing solutions for best outcomes. A strategic illustration of the company’s methodology for analytics is provided in Fig. 4.

Fig. 4. Illustration of the application of the “Intelligent Application Suite” provided by Ayasdi for various analyses such as clinical variation, population health, and risk management in the healthcare sector

Linguamatics

Linguamatics offers an NLP-based approach that relies on an interactive text mining algorithm (I2E). I2E can extract and analyze a wide array of information. Results are obtained with this technique tenfold faster than with other tools, and it does not require expert knowledge for data interpretation. This approach can provide information on genetic relationships and facts from unstructured data. Classical ML requires well-curated data as input to generate clean and filtered results. However, NLP, when integrated into EHRs or clinical records, facilitates the extraction of clean and structured information that often remains hidden in unstructured input data (Fig. 5).

Fig. 5. Schematic representation of the working principle of the NLP-based AI system used in massive data retention and analysis in Linguamatics

IBM Watson is one of the unique ideas of the tech giant IBM, and targets big data analytics in almost every professional sector. This platform uses ML- and AI-based algorithms extensively to extract the maximum information from minimal input. IBM Watson integrates a wide array of healthcare domains to provide meaningful and structured data (Fig. 6). In an attempt to uncover novel drug targets, specifically in cancer disease models, IBM Watson and Pfizer have formed a collaboration to accelerate the discovery of novel immuno-oncology combinations. Combining Watson’s deep learning modules with AI technologies allows researchers to interpret complex genomic data sets. IBM Watson has been used to predict specific types of cancer based on gene expression profiles obtained from various large data sets, which point to multiple druggable targets. IBM Watson is also used in drug discovery programs, where it integrates curated literature and forms network maps to provide a detailed overview of the molecular landscape in a specific disease model.

Fig. 6. IBM Watson in healthcare data analytics. Schematic representation of the various functional modules in IBM Watson’s big-data healthcare package. For instance, the drug discovery domain involves a network of highly coordinated data acquisition and analysis, spanning database curation through the building of meaningful pathways, towards elucidating novel druggable targets

To analyze diversified medical data, the healthcare domain describes analytics in four categories: descriptive, diagnostic, predictive, and prescriptive. Descriptive analytics describes and comments on the current medical situation, whereas diagnostic analytics explains the reasons and factors behind the occurrence of certain events, for example, choosing a treatment option for a patient based on clustering and decision trees. Predictive analytics focuses on predicting future outcomes by determining trends and probabilities. These methods are mainly built on machine learning techniques and are helpful for understanding the complications that a patient can develop. Prescriptive analytics proposes an action towards optimal decision making, for example, the decision to avoid a given treatment for a patient based on observed side effects and predicted complications. Integrating big data into healthcare analytics can be a major factor in improving the performance of current medical systems; however, sophisticated strategies need to be developed. An architecture of best practices for the different kinds of analytics in the healthcare domain is required for integrating big data technologies to improve outcomes. However, there are many challenges associated with the implementation of such strategies.
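The four categories can be told apart by the question each one answers. The sketch below walks a tiny, entirely hypothetical set of discharge records through all four; the data, the `predict_risk` and `recommend` helpers and the 0.5 threshold are illustrative assumptions, and a group frequency stands in for the machine learning models a real system would use.

```python
from collections import defaultdict

# Hypothetical toy records: (age group, readmitted within 30 days?)
visits = [
    ("65+", True), ("65+", True), ("65+", False), ("65+", True),
    ("<65", False), ("<65", False), ("<65", True), ("<65", False),
]

# Descriptive: what is happening? Overall readmission rate.
overall_rate = sum(readmitted for _, readmitted in visits) / len(visits)

# Diagnostic: why? Break the rate down by a candidate factor.
counts = defaultdict(lambda: [0, 0])  # group -> [readmissions, visits]
for group, readmitted in visits:
    counts[group][0] += readmitted
    counts[group][1] += 1
rate_by_group = {g: r / n for g, (r, n) in counts.items()}

# Predictive: what is likely? Use the group rate as a naive risk estimate,
# falling back to the overall rate for groups that were never observed.
def predict_risk(group: str) -> float:
    return rate_by_group.get(group, overall_rate)

# Prescriptive: what should be done? Turn the prediction into an action.
def recommend(group: str, threshold: float = 0.5) -> str:
    if predict_risk(group) > threshold:
        return "schedule follow-up"
    return "standard discharge"

print(overall_rate, rate_by_group, recommend("65+"))
```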

Challenges associated with healthcare big data

Methods for big data management and analysis are being continuously developed, especially for real-time data streaming, capture, aggregation, analytics (using ML and predictive methods), and visualization solutions that can help integrate EMRs better into healthcare. For example, the adoption of federally tested and certified EHR programs in the healthcare sector in the U.S.A. is nearly complete [7]. However, the availability of hundreds of government-certified EHR products, each with different clinical terminologies, technical specifications, and functional capabilities, has led to difficulties in the interoperability and sharing of data. Nonetheless, we can safely say that the healthcare industry has entered a ‘post-EMR’ deployment phase. Now, the main objective is to gain actionable insights from the vast amounts of data collected as EMRs. Here, we discuss some of these challenges in brief.

Storing large volumes of data is one of the primary challenges, but many organizations are comfortable with data storage on their own premises. This has several advantages, such as control over security, access, and up-time. However, an on-site server network can be expensive to scale and difficult to maintain. With decreasing costs and increasing reliability, cloud-based storage appears to be a better option, and one that most healthcare organizations have opted for. Organizations must choose cloud partners that understand the importance of healthcare-specific compliance and security issues. Additionally, cloud storage offers lower up-front costs, nimble disaster recovery, and easier expansion. Organizations can also take a hybrid approach to their data storage programs, which may be the most flexible and workable approach for providers with varying data access and storage needs.

After acquisition, the data needs to be cleansed or scrubbed to ensure its accuracy, correctness, consistency, relevancy, and purity. This cleaning process can be manual or automated using logic rules to ensure high levels of accuracy and integrity. More sophisticated and precise tools use machine learning techniques to reduce time and expense and to stop foul data from derailing big data projects.
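As an illustration of what automated cleaning with logic rules can look like, here is a minimal sketch. The record layout, the field names and the 20 to 300 bpm plausibility range are assumptions made for the example, not taken from any particular EHR product.

```python
from datetime import date

# Hypothetical raw records; field names are illustrative only.
raw_records = [
    {"patient_id": "P001", "birth_date": "1980-02-30", "heart_rate": "72"},
    {"patient_id": "P002", "birth_date": "1975-06-15", "heart_rate": "720"},
    {"patient_id": " p003 ", "birth_date": "1990-11-01", "heart_rate": "65"},
    {"patient_id": "", "birth_date": "2001-01-01", "heart_rate": "80"},
]

def clean(record: dict) -> tuple[dict, list[str]]:
    """Return a normalized copy of the record and a list of rule violations."""
    problems = []
    cleaned = dict(record)

    # Rule 1: identifiers are required and normalized to upper case.
    cleaned["patient_id"] = record["patient_id"].strip().upper()
    if not cleaned["patient_id"]:
        problems.append("missing patient_id")

    # Rule 2: birth dates must be real calendar dates.
    try:
        date.fromisoformat(record["birth_date"])
    except ValueError:
        problems.append("invalid birth_date")

    # Rule 3: heart rate must fall in a plausible range (assumed 20-300 bpm).
    rate = int(record["heart_rate"])
    if not 20 <= rate <= 300:
        problems.append("implausible heart_rate")
    cleaned["heart_rate"] = rate

    return cleaned, problems

results = [clean(r) for r in raw_records]
accepted = [rec for rec, problems in results if not problems]
rejected = [(rec, problems) for rec, problems in results if problems]
```

Keeping the rejected records together with the reason for rejection, rather than silently dropping them, lets a person review the cases that the rules cannot settle.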

Unified format

Patients produce a huge volume of data that is not easy to capture in the traditional EHR format, as it is complex and not easily manageable. Big data is especially difficult to handle when it reaches healthcare providers without proper organization. A need arose to codify all clinically relevant information for the purposes of claims, billing, and clinical analytics. Therefore, medical coding systems like the Current Procedural Terminology (CPT) and International Classification of Diseases (ICD) code sets were developed to represent the core clinical concepts. However, these code sets have their own limitations.

Some studies have observed that the reporting of patient data into EMRs or EHRs is not yet entirely accurate [26, 27, 28, 29], probably because of poor EHR utility, complex workflows, and a poor understanding of why capturing big data well is so important. All these factors can contribute to quality issues for big data throughout its lifecycle. EHRs are intended to improve the quality and communication of data in clinical workflows, though reports indicate discrepancies in these contexts. Documentation quality might improve with the use of self-report questionnaires from patients about their symptoms.

Image pre-processing

Studies have observed various physical factors that can lead to altered data quality and misinterpretations of existing medical records [30]. Medical images often suffer from technical barriers involving multiple types of noise and artifacts. Improper handling of medical images can also distort them; for instance, it might lead to a delineation of anatomical structures such as veins that does not correspond to the real case. Reducing noise, clearing artifacts, adjusting the contrast of acquired images, and adjusting image quality after mishandling are some of the measures that can be implemented to address this.

There have been so many security breaches, hackings, phishing attacks, and ransomware episodes that data security is a priority for healthcare organizations. After an array of vulnerabilities was noticed, a list of technical safeguards was developed for protected health information (PHI). These rules, termed the HIPAA Security Rules, help guide organizations with storage, transmission, authentication protocols, and controls over access, integrity, and auditing. Common security measures like using up-to-date anti-virus software, firewalls, encryption of sensitive data, and multi-factor authentication can save a lot of trouble.
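Two of the safeguards named above, integrity controls and authentication, can be sketched with the Python standard library alone. The example below is a sketch under stated assumptions, not a compliance recipe: the key is generated in place (in practice it would come from a managed secret store), the record format is made up, and encryption of the data itself is left out because it needs a vetted third-party library.

```python
import hashlib
import hmac
import secrets

# Assumption: in practice the key would come from a managed secret store.
integrity_key = secrets.token_bytes(32)

def sign(record: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag that changes if the record is altered."""
    return hmac.new(key, record, hashlib.sha256).hexdigest()

def verify(record: bytes, tag: str, key: bytes) -> bool:
    """Check a record against its tag using a constant-time comparison."""
    return hmac.compare_digest(sign(record, key), tag)

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, salted hash so stored credentials resist guessing."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)

record = b"patient=P001;allergy=penicillin"
tag = sign(record, integrity_key)

print(verify(record, tag, integrity_key))                              # unchanged record
print(verify(b"patient=P001;allergy=none", tag, integrity_key))        # tampered record
```

Storing the tag alongside each record gives an auditor a cheap way to detect silent modification, provided the key itself is kept away from the data.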

A successful data governance plan requires complete, accurate, and up-to-date metadata for all stored data. The metadata would consist of information such as the time of creation, the purpose of the data and the person responsible for it, and its previous usage (by whom, why, how, and when) by researchers and data analysts. This would allow analysts to replicate previous queries and would help later scientific studies and accurate benchmarking. It increases the usefulness of data and prevents the creation of “data dumpsters” of little or no use.
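One possible shape for such a metadata record is sketched below. The class and field names (`DatasetMetadata`, `UsageEvent`, `record_use`, `queries_by`) are hypothetical and simply mirror the items listed in the paragraph: time of creation, purpose, responsible person, and a usage log of who, why, how and when.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class UsageEvent:
    who: str
    why: str
    how: str        # for example, the query that was run
    when: datetime

@dataclass
class DatasetMetadata:
    name: str
    created_at: datetime
    purpose: str
    responsible_person: str
    usage_log: list[UsageEvent] = field(default_factory=list)

    def record_use(self, who: str, why: str, how: str) -> None:
        """Append a usage event so later analysts can replicate the query."""
        self.usage_log.append(UsageEvent(who, why, how, datetime.now(timezone.utc)))

    def queries_by(self, who: str) -> list[str]:
        """Return every recorded query a given person ran, in order."""
        return [event.how for event in self.usage_log if event.who == who]

meta = DatasetMetadata(
    name="lab_results_2018",
    created_at=datetime(2018, 1, 5, tzinfo=timezone.utc),
    purpose="routine laboratory measurements",
    responsible_person="data steward",
)
meta.record_use("analyst_a", "benchmarking", "SELECT AVG(value) FROM lab_results")
print(meta.queries_by("analyst_a"))
```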

Metadata would make it easier for organizations to query their data and get answers. However, in the absence of proper interoperability between datasets, query tools may not be able to access an entire repository of data. Also, the different components of a dataset should be well interconnected or linked and easily accessible; otherwise a complete portrait of an individual patient’s health may not be generated. Medical coding systems like ICD-10, SNOMED-CT, or LOINC must be implemented to reduce free-form concepts into a shared ontology. If the accuracy, completeness, and standardization of the data are not in question, then Structured Query Language (SQL) can be used to query large datasets and relational databases.
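The following sketch shows why coding comes before querying. Free-text diagnoses are first reduced to ICD-10 codes and then counted with a plain SQL aggregate. The lookup dictionary is a stand-in for a real terminology service and the patient notes are invented; only the two codes themselves (E11.9 for type 2 diabetes mellitus without complications, I10 for essential hypertension) are real ICD-10 entries.

```python
import sqlite3

# Assumption: a tiny lookup table standing in for a real terminology service.
FREE_TEXT_TO_ICD10 = {
    "type 2 diabetes": "E11.9",
    "t2dm": "E11.9",
    "high blood pressure": "I10",
    "hypertension": "I10",
}

# Hypothetical free-text entries as different clinicians might write them.
notes = [
    ("P001", "T2DM"),
    ("P002", "high blood pressure"),
    ("P003", "Type 2 diabetes"),
    ("P004", "Hypertension"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diagnosis (patient_id TEXT, icd10 TEXT)")
conn.executemany(
    "INSERT INTO diagnosis VALUES (?, ?)",
    [(pid, FREE_TEXT_TO_ICD10[text.lower()]) for pid, text in notes],
)

# Once the free text is coded, a plain SQL aggregate answers the question.
rows = conn.execute(
    "SELECT icd10, COUNT(*) FROM diagnosis GROUP BY icd10 ORDER BY icd10"
).fetchall()
print(rows)
```

Grouping on the raw text instead would have produced four distinct "diagnoses" for what are really two conditions.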

Visualization

A clean and engaging visualization of data, using charts, heat maps, and histograms to illustrate contrasting figures and correct labeling to reduce potential confusion, can make it much easier for us to absorb information and use it appropriately. Other examples include bar charts, pie charts, and scatterplots, each with its own way of conveying the data.

Data sharing

Patients may or may not receive their care at multiple locations. In the former case, sharing data with other healthcare organizations is essential. If the data is not interoperable, its movement between disparate organizations can be severely curtailed by technical and organizational barriers. This may leave clinicians without key information for making decisions about follow-ups and treatment strategies for patients. Solutions like Fast Healthcare Interoperability Resources (FHIR) and public APIs, CommonWell (a not-for-profit trade association) and Carequality (a consensus-built, common interoperability framework) are making data interoperability and sharing easy and secure. The biggest roadblock to data sharing is the treatment of data as a commodity that can provide a competitive advantage. As a result, both providers and vendors sometimes intentionally block the flow of information between different EHR systems [31].
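To make the idea of a shared exchange format concrete, the sketch below maps records from two hypothetical source systems onto a small subset of the fields of a FHIR Patient resource (`resourceType`, `id`, `name`, `birthDate`). The source layouts, the converter functions and the DD/MM/YYYY date convention of the second system are assumptions made for illustration; a real integration would validate against the full FHIR specification.

```python
import json

# Two hypothetical source systems that store the same kind of data differently.
record_from_system_a = {"mrn": "12345", "last": "Doe", "first": "Jane", "dob": "1980-04-12"}
record_from_system_b = {"id": "A-987", "full_name": "John Smith", "born": "12/04/1975"}

def system_a_to_fhir(rec: dict) -> dict:
    """Map system A's flat record onto a minimal FHIR-style Patient."""
    return {
        "resourceType": "Patient",
        "id": rec["mrn"],
        "name": [{"family": rec["last"], "given": [rec["first"]]}],
        "birthDate": rec["dob"],
    }

def system_b_to_fhir(rec: dict) -> dict:
    """Map system B's record, splitting the name and reordering the date."""
    given, family = rec["full_name"].split(" ", 1)
    day, month, year = rec["born"].split("/")  # assumed DD/MM/YYYY
    return {
        "resourceType": "Patient",
        "id": rec["id"],
        "name": [{"family": family, "given": [given]}],
        "birthDate": f"{year}-{month}-{day}",
    }

def family_names(resources: list[dict]) -> list[str]:
    """One consumer function works for every source once the shape is shared."""
    return [r["name"][0]["family"] for r in resources if r["resourceType"] == "Patient"]

patients = [system_a_to_fhir(record_from_system_a), system_b_to_fhir(record_from_system_b)]
payload = json.dumps(patients)  # what would travel between organizations
print(family_names(json.loads(payload)))
```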

The healthcare providers will need to overcome every challenge on this list and more to develop a big data exchange ecosystem that provides trustworthy, timely, and meaningful information by connecting all members of the care continuum. Time, commitment, funding, and communication would be required before these challenges are overcome.

Big data analytics for cutting costs

To develop a healthcare system that can exchange big data and provide us with trustworthy, timely, and meaningful information, we need to overcome every challenge mentioned above. Overcoming these challenges would require investment in terms of time, funding, and commitment. However, as with other technological advances, the success of these ambitious steps would ease the present burdens on healthcare, especially in terms of costs. It is believed that the implementation of big data analytics by healthcare organizations might lead to savings of over 25% in annual costs in the coming years. Better diagnosis and disease prediction through big data analytics can reduce costs by decreasing the hospital readmission rate. Healthcare firms do not yet understand the variables responsible for readmissions well enough. Determining these relationships would make it easier for healthcare organizations to improve their protocols for dealing with patients and to prevent readmissions. Big data analytics can also help in optimizing staffing, forecasting operating room demands, streamlining patient care, and improving the pharmaceutical supply chain. All of these factors will ultimately reduce healthcare costs for organizations.

Quantum mechanics and big data analysis

Big data sets can be staggering in size. Therefore, their analysis remains daunting even with the most powerful modern computers. For most analyses, the bottleneck lies in the computer’s ability to access its memory and not in the processor [32, 33]. The capacity, bandwidth or latency requirements of the memory hierarchy outweigh the computational requirements so much that supercomputers are increasingly used for big data analysis [34, 35]. An additional solution is the application of quantum approaches to big data analysis.

Quantum computing and its advantages

Common digital computing uses binary digits to encode data, whereas quantum computation uses quantum bits, or qubits [36]. A qubit is the quantum version of the classical binary bit: it can represent a zero, a one, or any linear combination (called a superposition) of those two states [37]. A qubit is therefore not limited to the two definite states of a classical bit, and a register of n qubits can be in a superposition of 2^n basis states at once. For certain problems, this allows quantum computers to work far faster than regular computers. For example, a conventional analysis of a dataset with n points would require 2^n processing units, whereas it would require just n quantum bits using a quantum computer. Quantum computers use quantum mechanical phenomena like superposition and quantum entanglement to perform computations [38, 39].
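The gap between n qubits and 2^n classical quantities can be seen in a small classical simulation. The sketch below stores the state of an n-qubit register as a list of 2^n complex amplitudes and applies a Hadamard gate to each qubit, which turns the all-zeros state into an equal superposition of every basis state. It is a plain state-vector simulation written for illustration, with function names of our own choosing; it is not how a quantum computer is programmed and it gains no quantum speed-up.

```python
import math

def zero_state(n: int) -> list[complex]:
    """State vector of n qubits, all in |0>: 2**n amplitudes, one set to 1."""
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    return state

def hadamard(state: list[complex], target: int) -> list[complex]:
    """Apply a Hadamard gate to one qubit of the register."""
    h = 1 / math.sqrt(2)
    new = state[:]
    bit = 1 << target
    for i in range(len(state)):
        if not i & bit:  # visit each pair (target bit 0, target bit 1) once
            a, b = state[i], state[i | bit]
            new[i] = h * (a + b)
            new[i | bit] = h * (a - b)
    return new

n = 3
state = zero_state(n)
for q in range(n):
    state = hadamard(state, q)

# Measurement probabilities are the squared magnitudes of the amplitudes.
probabilities = [abs(a) ** 2 for a in state]
print(len(state), [round(p, 3) for p in probabilities])
```

Each extra qubit doubles the length of `state`, which is why classical simulation of even a few dozen qubits becomes a memory problem.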

Quantum algorithms can speed up big data analysis, in some cases exponentially [40]. Some complex problems, believed to be unsolvable using conventional computing, can be solved by quantum approaches. For example, current encryption techniques such as RSA, other public-key (PK) schemes and the Data Encryption Standard (DES), which are thought to be impassable now, could become irrelevant in the future because quantum computers are expected to get through them quickly [41]. Quantum approaches can dramatically reduce the information required for big data analysis. For example, quantum theory can maximize the distinguishability of a multilayer network using a minimum number of layers [42]. In addition, quantum approaches require a relatively small dataset to obtain a maximally sensitive data analysis compared to conventional (machine-learning) techniques. Therefore, quantum approaches can drastically reduce the amount of computational power required to analyze big data. Even though quantum computing is still in its infancy and presents many open challenges, it is being implemented for healthcare data.
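The reason efficient factoring threatens RSA can be shown with a toy key. In the sketch below, anyone who can factor the public modulus can rebuild the private exponent and decrypt. The factoring here is classical trial division on a deliberately tiny modulus, used only as a stand-in; it is not a quantum algorithm, and real keys use primes hundreds of digits long for which trial division is hopeless.

```python
def factor(n: int) -> tuple[int, int]:
    """Find a non-trivial factor pair of n by trial division (classical, slow)."""
    for p in range(2, int(n ** 0.5) + 1):
        if n % p == 0:
            return p, n // p
    raise ValueError("n is prime")

# Toy key pair built from two small primes.
p, q = 61, 53
n = p * q                           # public modulus
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

message = 65
ciphertext = pow(message, e, n)

# An attacker who sees only (n, e) recovers the private key by factoring n.
p_found, q_found = factor(n)
d_recovered = pow(e, -1, (p_found - 1) * (q_found - 1))
recovered = pow(ciphertext, d_recovered, n)

print(recovered == message)
```

The security of the scheme rests entirely on `factor` being infeasible at real key sizes, which is the assumption a sufficiently large quantum computer would undermine.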

Applications in big data analysis

Quantum computing is gaining momentum and seems to be a potential solution for big data analysis. For example, the identification of rare events, such as the production of Higgs bosons at the Large Hadron Collider (LHC), can now be performed using quantum approaches [43]. At the LHC, huge amounts of collision data (1 PB/s) are generated that need to be filtered and analyzed. One such approach, quantum annealing for ML (QAML), which combines ML and quantum computing using a programmable quantum annealer, helps reduce human intervention and increase the accuracy of assessing particle-collision data. In another example, a quantum support vector machine was implemented for both the training and classification stages to classify new data [44]. Such quantum approaches could find applications in many areas of science [43]. Indeed, a recurrent quantum neural network (RQNN) was implemented to increase signal separability in electroencephalogram (EEG) signals [45]. Similarly, quantum annealing was applied to intensity-modulated radiotherapy (IMRT) beamlet intensity optimization [46]. There are further healthcare applications of quantum approaches, e.g. quantum sensors and quantum microscopes [47].

Conclusions and future prospects

Nowadays, various biomedical and healthcare tools such as genomics, mobile biometric sensors, and smartphone apps generate a large amount of data. Therefore, it is essential for us to know about and assess what can be achieved using this data. For example, the analysis of such data can provide further insights in terms of procedural, technical, medical and other types of improvements in healthcare. After a review of these healthcare procedures, it appears that the realization of the full potential of patient-specific medicine, or personalized medicine, is under way. The collective big data analysis of EHRs, EMRs and other medical data is continuously helping to build a better prognostic framework. The companies providing services for healthcare analytics and clinical transformation are indeed contributing towards better and more effective outcomes. Common goals of these companies include reducing the cost of analytics, developing effective Clinical Decision Support (CDS) systems, providing platforms for better treatment strategies, and identifying and preventing fraud associated with big data. However, almost all of them face challenges on federal issues such as how private data is handled, shared and kept safe. The combined pool of data from healthcare organizations and biomedical researchers has resulted in a better outlook on, and better determination and treatment of, various diseases. This has also helped in building a better and healthier personalized healthcare framework. The modern healthcare fraternity has realized the potential of big data and has therefore implemented big data analytics in healthcare and clinical practices. Computers ranging from supercomputers to quantum computers are helping to extract meaningful information from big data in dramatically reduced time periods. With high hopes of extracting new and actionable knowledge that can improve the present status of healthcare services, researchers are plunging into biomedical big data despite the infrastructure challenges. Clinical trials, the combined analysis of pharmacy and insurance claims, and the discovery of biomarkers are part of a novel and creative way to analyze healthcare big data.

Big data analytics bridges the gap between structured and unstructured data sources. The shift to an integrated data environment is a well-known hurdle to overcome. Interestingly, the principle of big data relies heavily on the idea that the more information there is, the more insights one can gain from it and the better one can predict future events. Various reliable consulting firms and healthcare companies rightly project that the big data healthcare market is poised to grow at an exponential rate. Even in a short span, we have witnessed a spectrum of analytics in current use that have had significant impacts on the decision making and performance of the healthcare industry. The exponential growth of medical data from various domains has forced computational experts to design innovative strategies to analyze and interpret such enormous amounts of data within a given timeframe. The integration of computational systems for signal processing by both researchers and practicing medical professionals has grown. Thus, developing a detailed model of the human body by combining physiological data and “-omics” techniques can be the next big target. This unique idea can enhance our knowledge of disease conditions and possibly help in the development of novel diagnostic tools. The continuous rise in available genomic data, including inherent hidden errors from experimental and analytical practices, needs further attention. However, there are opportunities at each step of this extensive process to introduce systemic improvements in healthcare research.

The high volume of medical data collected across heterogeneous platforms has challenged data scientists to integrate and implement it carefully. It is therefore suggested that a further revolution in healthcare is needed to bring together bioinformatics, health informatics and analytics to promote personalized and more effective treatments. Furthermore, new strategies and technologies should be developed to understand the nature (structured, semi-structured, unstructured), complexity (dimensions and attributes) and volume of the data to derive meaningful information. The greatest asset of big data lies in its limitless possibilities. The birth and integration of big data within the past few years has brought substantial advancements in the healthcare sector, ranging from medical data management to drug discovery programs for complex human diseases including cancer and neurodegenerative disorders. To quote a simple example supporting this idea, since the late 2000s the healthcare market has witnessed advancements in the EHR system in the context of data collection, management and usability. We believe that big data will add to and bolster the existing pipeline of healthcare advances instead of replacing skilled manpower, subject knowledge experts and intellectuals, a notion argued by many. One can clearly see the transition of the healthcare market from a wider volume base to a personalized or individual-specific domain. Therefore, it is essential for technologists and professionals to understand this evolving situation. In the coming years, it can be projected that big data analytics will march towards a predictive system. This would mean the prediction of future outcomes in an individual’s health state based on current or existing data (such as EHR-based and omics-based data). Similarly, it can also be presumed that structured information obtained from a certain geography might lead to the generation of population health information. Taken together, big data will facilitate healthcare by introducing prediction of epidemics (in relation to population health), providing early warnings of disease conditions, and helping in the discovery of novel biomarkers and intelligent therapeutic intervention strategies for an improved quality of life.

Availability of data and materials

Not applicable.

Laney D. 3D data management: controlling data volume, velocity, and variety, Application delivery strategies. Stamford: META Group Inc; 2001.

Google Scholar  

Mauro AD, Greco M, Grimaldi M. A formal definition of big data based on its essential features. Libr Rev. 2016;65(3):122–35.

Article   Google Scholar  

Gubbi J, et al. Internet of Things (IoT): a vision, architectural elements, and future directions. Future Gener Comput Syst. 2013;29(7):1645–60.

Doyle-Lindrud S. The evolution of the electronic health record. Clin J Oncol Nurs. 2015;19(2):153–4.

Gillum RF. From papyrus to the electronic tablet: a brief history of the clinical medical record with lessons for the digital Age. Am J Med. 2013;126(10):853–7.

Reiser SJ. The clinical record in medicine part 1: learning from cases*. Ann Intern Med. 1991;114(10):902–7.

Reisman M. EHRs: the challenge of making electronic data usable and interoperable. Pharm Ther. 2017;42(9):572–5.

Murphy G, Hanken MA, Waters K. Electronic health records: changing the vision. Philadelphia: Saunders W B Co; 1999. p. 627.

Shameer K, et al. Translational bioinformatics in the era of real-time biomedical, health care and wellness data streams. Brief Bioinform. 2017;18(1):105–24.

Service, R.F. The race for the $1000 genome. Science. 2006;311(5767):1544–6.

Stephens ZD, et al. Big data: astronomical or genomical? PLoS Biol. 2015;13(7):e1002195.

Yin Y, et al. The internet of things in healthcare: an overview. J Ind Inf Integr. 2016;1:3–13.

Moore SK. Unhooking medicine [wireless networking]. IEEE Spectr 2001; 38(1): 107–8, 110.

MathSciNet   Google Scholar  

Nasi G, Cucciniello M, Guerrazzi C. The role of mobile technologies in health care processes: the case of cancer supportive care. J Med Internet Res. 2015;17(2):e26.

Apple, ResearchKit/ResearchKit: ResearchKit 1.5.3. 2017.

Shvachko K, et al. The hadoop distributed file system. In: Proceedings of the 2010 IEEE 26th symposium on mass storage systems and technologies (MSST). New York: IEEE Computer Society; 2010. p. 1–10.

Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM. 2008;51(1):107–13.

Zaharia M, et al. Apache Spark: a unified engine for big data processing. Commun ACM. 2016;59(11):56–65.

Gopalani S, Arora R. Comparing Apache Spark and Map Reduce with performance analysis using K-means; 2015.

Ahmed H, et al. Performance comparison of spark clusters configured conventionally and a cloud servicE. Procedia Comput Sci. 2016;82:99–106.

Saouabi M, Ezzati A. A comparative between hadoop mapreduce and apache Spark on HDFS. In: Proceedings of the 1st international conference on internet of things and machine learning. Liverpool: ACM; 2017. p. 1–4.

Strickland NH. PACS (picture archiving and communication systems): filmless radiology. Arch Dis Child. 2000;83(1):82–6.

Article   MathSciNet   Google Scholar  

Schroeder W, Martin K, Lorensen B. The visualization toolkit. 4th ed. Clifton Park: Kitware; 2006.


Acknowledgements

Author information.

Sabyasachi Dash and Sushil Kumar Shakyawar contributed equally to this work

Authors and Affiliations

Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, 10065, NY, USA

Sabyasachi Dash

Center of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal

Sushil Kumar Shakyawar

SilicoLife Lda, Rua do Canastreiro 15, 4715-387, Braga, Portugal

Postgraduate School for Molecular Medicine, Warszawskiego Uniwersytetu Medycznego, Warsaw, Poland

Mohit Sharma

Małopolska Centre for Biotechnology, Jagiellonian University, Kraków, Poland

3B’s Research Group, Headquarters of the European Institute of Excellence on Tissue Engineering and Regenerative Medicine, AvePark - Parque de Ciência e Tecnologia, Zona Industrial da Gandra, Barco, 4805-017, Guimarães, Portugal

Sandeep Kaushik


Contributions

MS wrote the manuscript. SD and SKS added significant discussion that greatly improved the quality of the manuscript. SK designed the content sequence, guided SD, SS and MS in writing and revising the manuscript, and checked the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sandeep Kaushik.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article.

Dash, S., Shakyawar, S.K., Sharma, M. et al. Big data in healthcare: management, analysis and future prospects. J Big Data 6, 54 (2019). https://doi.org/10.1186/s40537-019-0217-0


Received: 17 January 2019

Accepted: 06 June 2019

Published: 19 June 2019

DOI: https://doi.org/10.1186/s40537-019-0217-0


  • Biomedical research
  • Big data analytics
  • Internet of things
  • Personalized medicine
  • Quantum computing



Open Access

Good practices for clinical data warehouse implementation: A case study in France


Affiliations Mission Data, Haute Autorité de Santé, Saint-Denis, France, Inria, Soda team, Palaiseau, France


Affiliation Mission Data, Haute Autorité de Santé, Saint-Denis, France

Affiliations Univ. Lille, CHU Lille, ULR 2694—METRICS: Évaluation des Technologies de santé et des Pratiques médicales, Lille, France, Fédération régionale de recherche en psychiatrie et santé mentale (F2RSM Psy), Hauts-de-France, Saint-André-Lez-Lille, France

Affiliation Sorbonne Université, Inserm, Université Sorbonne Paris-Nord, Laboratoire d’informatique médicale et d’ingénierie des connaissances en e-Santé, LIMICS, France

  • Matthieu Doutreligne, 
  • Adeline Degremont, 
  • Pierre-Alain Jachiet, 
  • Antoine Lamer, 
  • Xavier Tannier

PLOS

Published: July 6, 2023

  • https://doi.org/10.1371/journal.pdig.0000298

29 Sep 2023: Doutreligne M, Degremont A, Jachiet PA, Lamer A, Tannier X (2023) Correction: Good practices for clinical data warehouse implementation: A case study in France. PLOS Digital Health 2(9): e0000369. https://doi.org/10.1371/journal.pdig.0000369 View correction


Real-world data (RWD) holds great promise for improving the quality of care. However, specific infrastructures and methodologies are required to derive robust knowledge and bring innovations to the patient. Drawing upon a national case study of the governance of the 32 French regional and university hospitals, we highlight key aspects of modern clinical data warehouses (CDWs): governance, transparency, types of data, data reuse, technical tools, documentation, and data quality control processes. Semi-structured interviews and a review of reported studies on French CDWs were conducted from March to November 2022. Out of 32 regional and university hospitals in France, 14 have a CDW in production, 5 are experimenting, 5 have a prospective CDW project, and 8 did not have any CDW project at the time of writing. The implementation of CDWs in France dates from 2011 and accelerated in late 2020. From this case study, we draw some general guidelines for CDWs. The current orientation of CDWs towards research requires efforts in governance stabilization, standardization of data schemas, and development of data quality and data documentation. Particular attention must be paid to the sustainability of the warehouse teams and to multilevel governance. The transparency of the studies and of the data transformation tools must improve to allow successful multicentric data reuse as well as innovations in routine care.

Author summary

Reusing routine care data is not free. Attention must be paid to the entire life cycle of the data to create robust knowledge and develop innovation. Building upon the first overview of CDWs in France, we document key aspects of the collection and organization of routine care data into homogeneous databases: governance, transparency, types of data, main objectives of data reuse, technical tools, documentation, and data quality control processes. The landscape of CDWs in France dates from 2011 and accelerated in late 2020, showing a progressive but still incomplete homogenization. National and European projects are emerging, supporting local initiatives in standardization, methodological work, and tooling. From this sample of CDWs, we draw general recommendations aimed at consolidating the potential of routine care data to improve healthcare. Particular attention must be paid to the sustainability of the warehouse teams and to multilevel governance. The transparency of the data transformation tools and of the studies must improve to allow successful multicentric data reuse as well as innovations for the patient.

Citation: Doutreligne M, Degremont A, Jachiet P-A, Lamer A, Tannier X (2023) Good practices for clinical data warehouse implementation: A case study in France. PLOS Digit Health 2(7): e0000298. https://doi.org/10.1371/journal.pdig.0000298

Editor: Dukyong Yoon, Yonsei University College of Medicine, REPUBLIC OF KOREA

Copyright: © 2023 Doutreligne et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The salaries of MD, AD, and PAJ were funded by the French Haute Autorité de Santé (HAS). XT received funding to participate in the interviews and in the writing of the article. AL received no funding for this study. The funders validated the original idea of the study and its conclusions. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: The first author was an (unpaid) visitor in Leo Anthony Celi’s lab during the first semester of 2023.

Introduction

Real-world data.

Health information systems (HIS) are increasingly collecting routine care data [ 1 – 7 ]. This source of real-world data (RWD) [ 8 ] holds great promise for improving the quality of care. On the one hand, the use of these data translates into direct benefits (primary uses) for the patient by serving as the cornerstone of the developing field of personalized medicine [ 9 , 10 ]. On the other hand, they bring indirect benefits (secondary uses) by accelerating and improving knowledge production: on pathologies [ 11 ], on the conditions of use of health products and technologies [ 12 , 13 ], and on measures of their safety [ 14 ], efficacy, or usefulness in everyday practice [ 15 ]. They can also be used to assess the organizational impact of health products and technologies [ 16 , 17 ].

In recent years, health agencies in many countries have conducted extensive work to better support the generation and use of real-life data [ 8 , 17 – 19 ]. Study programs have been launched by regulatory agencies: the DARWIN EU program by the European Medicines Agency and the Real World Evidence Program by the Food and Drug Administration [ 20 ].

Clinical data warehouse

In practice, the possibility of mobilizing these routinely collected data depends very much on their degree of concentration, in a gradient that goes from centralization in a single, homogenous HIS to fragmentation in a multitude of HIS with heterogeneous formats. The structure of the HIS reflects the governance structure. Thus, the ease of working with these data depends heavily on the organization of the healthcare actors. The 2 main sources of RWD are insurance claims—more centralized—and clinical data—more fragmented.

Claims data are often collected by national agencies into centralized repositories. In South Korea, the government agency responsible for healthcare system performance and quality (HIRA) is connected to the HIS of all healthcare stakeholders. HIRA data consist of national insurance claims [ 21 ]. England has a centralized healthcare system under the National Health Service (NHS). Although the NHS does not hold detailed clinical data itself, this centralization allowed it to merge claims data with detailed data from 2 large primary care databases, corresponding to the 2 major software publishers [ 22 ]. These data are currently accessed through OpenSAFELY, a first platform focused on Coronavirus Disease 2019 (COVID-19) research [ 23 ]. In the United States, even if scattered between different insurance providers, claims are pooled into large databases such as Medicare, Medicaid, or IBM MarketScan. Lastly, in Germany, the distinct federal claims have been centralized only very recently [ 24 ].

Clinical data, on the other hand, tend to be distributed among many entities that have made different choices, without common management or interoperability. However, large institutional data-sharing networks are beginning to emerge. South Korea very recently launched an initiative to build a nationwide data network focused on intensive care. The United States is building Chorus4ai, an analysis platform pooling data from 14 university hospitals [ 25 ]. To unlock the potential of clinical data, the German Medical Informatics Initiative [ 26 ] created 4 consortia in 2018. They aim to develop technical and organizational solutions to improve the consistency of clinical data.

Israel stands out as one of the rare countries that have pooled both claims and clinical data at a large scale: half of the population depends on a single healthcare provider and insurer [ 27 ].

An infrastructure is needed to pool data from 1 or more medical information systems, whatever the organizational framework, into homogeneous formats for management, research, or care reuse [ 28 , 29 ]. For a CDW, Fig 1 illustrates the 4 phases of data flow from the various sources that make up the HIS:

  • Collection and copying of original sources.
  • Transformation:
      • Integration of sources into a unique database.
      • Deduplication of identifiers.
      • Standardization: A unique data model, independent of the software models, harmonizes the different sources in a common schema, possibly with common nomenclatures.
      • Pseudonymization: Removal of directly identifying elements.
  • Provision of subpopulation data sets and transformed datamarts for primary and secondary reuse.
  • Usages thanks to dedicated applications and tools accessing the datamarts and data sets.
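The phases listed above can be sketched as a small chain of functions. This is only an illustrative sketch, not the tooling of any CDW described in this study: the record layout, the function names, the common schema, and the keyed-hash pseudonymization are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical source records: two applications describing the same patient.
RAW_SOURCES = {
    "admin": [{"patient_id": "A-001", "name": "Jane Doe", "sex": "F"}],
    "lab": [{"patient_id": "a-001 ", "name": "Jane Doe", "test": "HbA1c", "value": "6.1"}],
}

# Assumption: in practice this key would be managed securely by the CDW team.
SECRET_KEY = b"replace-with-a-managed-secret"


def collect(sources):
    """Collection: copy the original sources without modifying them."""
    return {name: [dict(row) for row in rows] for name, rows in sources.items()}


def integrate(copied):
    """Integration: gather all sources into one table, keeping provenance."""
    return [{**row, "source": name} for name, rows in copied.items() for row in rows]


def deduplicate_identifiers(rows):
    """Deduplication: normalize identifiers so one patient has one key."""
    return [{**row, "patient_id": row["patient_id"].strip().upper()} for row in rows]


def standardize(rows):
    """Standardization: project every source onto one common schema."""
    return [
        {
            "patient_id": row["patient_id"],
            "name": row["name"],
            "source": row["source"],
            "concept": row.get("test", "demographics"),
            "value": row.get("value", row.get("sex")),
        }
        for row in rows
    ]


def pseudonymize(rows):
    """Pseudonymization: drop direct identifiers, keep a keyed-hash pseudonym."""
    out = []
    for row in rows:
        digest = hmac.new(SECRET_KEY, row["patient_id"].encode(), hashlib.sha256)
        cleaned = {k: v for k, v in row.items() if k not in ("name", "patient_id")}
        out.append({"pseudo_id": digest.hexdigest()[:16], **cleaned})
    return out


def provision(rows, concept):
    """Provision: extract a datamart restricted to one concept."""
    return [row for row in rows if row["concept"] == concept]


def run_pipeline(sources):
    rows = integrate(collect(sources))
    rows = deduplicate_identifiers(rows)
    rows = standardize(rows)
    return pseudonymize(rows)


warehouse = run_pipeline(RAW_SOURCES)
hba1c_datamart = provision(warehouse, "HbA1c")
```

In this toy example the two source rows end up with the same pseudonym because their identifiers are normalized before hashing, which is why deduplication is placed before pseudonymization.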

In France, the national insurer collects all hospital activity and community care claims into a single reimbursement database [ 13 ]. However, clinical data have historically been scattered across care sites in numerous HISs. Several hospitals have worked for about 10 years to create CDWs from electronic medical records [ 30 – 39 ]. This work has accelerated recently, as CDWs begin to be structured at the regional and national levels. Regional cooperation networks, such as the Ouest Data Hub [ 40 ], are being set up. In July 2022, the Ministry of Health opened a 50-million-euro call for projects to set up and strengthen, by 2025, a network of hospital CDWs coordinated with the national platform, the Health Data Hub.


CDW: Four steps of data flow from the Hospital Information System: (1) collection, (2) transformations, (3) provisioning, and (4) usages. CDW, clinical data warehouse.

https://doi.org/10.1371/journal.pdig.0000298.g001

Based on an overview of university hospital CDWs in France, this study makes general recommendations for properly leveraging the potential of CDWs to improve healthcare. It focuses on: governance, transparency, types of data, data reuse, technical tools, documentation, and data quality control processes.

Material and methods

Interviews were conducted from March to November 2022 with 32 French regional and university hospitals, both with existing and prospective CDWs.

Ethics statement

This work was authorized by the board of the French High Authority of Health (HAS). Every interviewed participant was invited by email to participate and informed of the possible forms of publication: a French official report and an international publication. Furthermore, at each interview, every participant was asked for their agreement before the interview was recorded. Only 1 participant declined to have the video recorded.

Semi-structured interviews were conducted on the following themes: the initiation and construction of the CDWs, the current status of the project and the studies carried out, opportunities and obstacles, and quality criteria for observational research. S1 Table lists all interviewed people with their team title. The complete form, with the precise questions, is available in S2 Table.

The interview form was sent to participants in advance and then used as a support to conduct the interviews. The interviews lasted 90 min and were recorded for reference.

Quantitative methods

Three tables in S1 Text detail the structured answers. The first 2 tables deal with the characteristics of the actors and those of the data warehouses. We completed them based on the notes taken during the interviews, the recordings, and additional information requested from the participants. The third table focuses on ongoing studies in the CDWs. We collected the list of these studies from the dedicated reporting portals, which we found for 8 out of 14 operational CDWs. We developed a classification of studies, based on the typology of retrospective studies described by the OHDSI research network [ 41 ]. We enriched this typology by comparing it with the collected studies, resulting in the following 6 categories:

  • Outcome frequency : Incidence or prevalence estimation for a medically well-defined target population.
  • Population characterization : Characterization of a specific set of covariates. Feasibility and prescreening studies belong to this category [ 42 ].
  • Risk factors : Identification of covariates most associated with a well-defined clinical target (disease course, care event). These studies examine associations without quantifying the causal effect of the factors on the outcome of interest.
  • Treatment effect : Evaluation of the effect of a well-defined intervention on a specific outcome target. These studies intend to show a causal link between these 2 variables [ 43 ].
  • Development of diagnostic and prognostic algorithms : Improve or automate a diagnostic or prognostic process, based on clinical data from a given patient. This can take the form of a risk, a preventive score, or the implementation of a diagnostic assistance system. These studies are part of the individualized medicine approach, with the goal of inferring relevant information at the level of individual patient’s files.
  • Medical informatics : Methodological or tool oriented. These studies aim to improve the understanding and capacity for action of researchers and clinicians. They include the evaluation of a decision support tool, the extraction of information from unstructured data, or automatic phenotyping methods.

Studies were classified according to this nomenclature based on their title and description.
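As an illustration of how such a nomenclature can be recorded and summarized, the sketch below encodes the 6 categories as an enumeration and computes the share of each category among labelled studies. The study titles and labels are invented for the example; it only tallies labels that a reviewer has already assigned from titles and descriptions, and does not attempt any automatic classification.

```python
from collections import Counter
from enum import Enum


class StudyCategory(Enum):
    OUTCOME_FREQUENCY = "Outcome frequency"
    POPULATION_CHARACTERIZATION = "Population characterization"
    RISK_FACTORS = "Risk factors"
    TREATMENT_EFFECT = "Treatment effect"
    DIAGNOSTIC_PROGNOSTIC_ALGORITHMS = "Development of diagnostic and prognostic algorithms"
    MEDICAL_INFORMATICS = "Medical informatics"


# Hypothetical, manually labelled studies (titles invented for illustration).
labelled_studies = [
    ("Description of patients admitted for condition X", StudyCategory.POPULATION_CHARACTERIZATION),
    ("Feasibility count for a planned cohort", StudyCategory.POPULATION_CHARACTERIZATION),
    ("Factors associated with readmission", StudyCategory.RISK_FACTORS),
    ("Effect of drug A versus drug B on outcome Y", StudyCategory.TREATMENT_EFFECT),
    ("Extracting diagnoses from free-text reports", StudyCategory.MEDICAL_INFORMATICS),
]


def category_shares(studies):
    """Return the percentage of studies in each category (0.0 if none)."""
    counts = Counter(category for _, category in studies)
    total = sum(counts.values())
    return {category: 100 * counts.get(category, 0) / total for category in StudyCategory}


shares = category_shares(labelled_studies)
for category, share in shares.items():
    print(f"{category.value}: {share:.0f}%")
```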

Fig 2 summarizes the state of progress of CDWs in France. Out of 32 regional and university hospitals in France, 14 have a CDW in production, 5 are experimenting, 5 have a prospective CDW project, and 8 did not have any CDW project at the time of writing. The results are described for all projects that are at least in the prospective stage, minus the 3 that we were unable to interview after multiple reminders (Orléans, Metz, and Caen), resulting in a denominator of 21 university hospitals.
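The denominator of 21 used in the "x/21" figures throughout the results follows directly from these counts. A short check, using only the numbers reported in the paragraph above (the variable names are ours):

```python
# Counts reported in the text for the 32 regional and university hospitals.
status_counts = {
    "in production": 14,
    "experimenting": 5,
    "prospective project": 5,
    "no project": 8,
}
not_interviewed = 3  # Orléans, Metz, and Caen

total_hospitals = sum(status_counts.values())
with_project = total_hospitals - status_counts["no project"]
denominator = with_project - not_interviewed

print(total_hospitals, with_project, denominator)
```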


Base map and data from OpenStreetMap and OpenStreetMap Foundation. Link to the base layer of the map: https://github.com/mapnik/mapnik . CDW, clinical data warehouse.

https://doi.org/10.1371/journal.pdig.0000298.g002

Fig 3 shows the history of the implementation of CDWs. A distinction must be made between the first works (in blue) and the regulatory authorization (in green) from the French Commission on Information Technology and Liberties (CNIL), which the first works systematically precede.


CDW, clinical data warehouse.

https://doi.org/10.1371/journal.pdig.0000298.g003

The CDWs have so far been initiated by 1 or 2 people from the hospital world with an academic background in bioinformatics, medical informatics, or statistics. The sustainability of the CDW depends on building a cooperative environment between different actors: Medical Information Department (MID), Information Systems Department (IT), Clinical Research Department (CRD), clinical users, and the support of the management or the Institutional Medical Committee. It also depends on the creation of a team, or entity, dedicated to the maintenance and implementation of the CDW. More recent initiatives, such as those of the HCL (Hospitals of the city of Lyon) or the Grand-Est region, stand out for having high-level institutional support from the outset.

The CDW has a federating potential for the different business departments of the hospital with the active participation of the CRD, the IT Department, and the MID. Although there is always an operational CDW team, the human resources allocated to it vary greatly: from half a full-time equivalent to 80 people for the AP-HP, with a median of 6.0 people. The team systematically includes a coordinating physician. It is multidisciplinary with skills in public health, medical informatics, informatics (web service, database, network, infrastructure), data engineering, and statistics.

Historically, the first CDWs were based on in-house solution development. More recently, private actors have been offering their services for the implementation of CDWs (15/21). These services range from technical expertise for building data flows and cleaning data to the delivery of a platform integrating the different stages of data processing.

Management of studies

Before starting, projects are systematically analyzed by a scientific and ethical committee. A local submission and follow-up platform is often mentioned (12/21), but its functional scope is not well defined. It ranges from simple authorization of the project to the automatic provision of data into a Trusted Research Environment (TRE) [ 44 ]. The processes for starting a new project on the CDW are always communicated internally but rarely documented publicly (8/21).

Transparency

Ongoing studies in CDWs are unevenly referenced publicly on hospital websites. Some institutions have comprehensive study portals, while others list only a dozen studies on their public site while mentioning several hundred ongoing projects during interviews. In total, we found 8 of these portals out of 14 CDWs in production. Uses other than ongoing scientific studies are very rarely documented. The publication of the list of ongoing studies is very heterogeneous and fragmented between several sources: clinicaltrials.gov, the mandatory project portal of the Health Data Hub [ 45 ], or the website of the hospital data warehouse.

Strong dependence on the HIS.

CDW data reflect the HIS used on a daily basis by hospital staff. Stakeholders point out that the quality of CDW data and the amount of work required for rapid and efficient reuse are highly dependent on the source HIS. The possibility of accessing data from an HIS in a structured and standardized format greatly simplifies its integration into the CDW and then its reuse.

Categories of data.

Although the software landscape is varied across the country, the main functionalities of HIS are the same. We can therefore conduct an analysis of the content of the CDWs, according to the main categories of common data present in the HIS.

The common base for all CDWs consists of data from the Patient Administrative Management software (patient identification, hospital movements) and the billing codes. Then, data flows are progressively developed from the various software applications that make up the HIS. The goal is to build a homogeneous data schema, linking the sources together, controlled by the CDW team. The prioritization of sources is done through thematic projects, which feed the CDW construction process. These projects improve the understanding of the sources involved, by confronting the CDW team with the quality issues present in the data.

Table 1 presents the proportion of each data category integrated in French CDWs. Structured biology and texts are almost always integrated (20/21 and 20/21). The texts contain a large amount of information, but as unstructured data they are more difficult to use than structured tables. Other integrated sources are the hospital drug circuit (prescriptions and administration, 16/21), Intensive Care Unit data (ICU, 2/21), and nurse forms (4/21). Imaging is rarely integrated (4/21), notably for reasons of volume. Genomic data are well identified, but never integrated, even though they are sometimes considered important and included in the CDW work program.


https://doi.org/10.1371/journal.pdig.0000298.t001

Data reuse.

Today, scientific research is the main use put forward to justify building CDWs.

The studies are mainly observational (non-interventional). Fig 4 presents the distribution of the 6 categories defined in Quantitative methods for 231 studies collected on the study portals of 9 hospitals. The studies focus first on population characterization (25%), followed by the development of decision support processes (24%), the study of risk factors (18%), and the treatment effect evaluations (16%).


https://doi.org/10.1371/journal.pdig.0000298.g004

The CDWs are used extensively for internal projects such as student theses (at least in 9/21) and serve as an infrastructure for single-service research, their great interest being the de-siloing of different information systems. For most of the institutions interviewed, there is still a lack of resources and of mature methods and tools for conducting inter-institutional research (such as in the Grand-Ouest region of France) or research via European calls for projects (EHDEN). These 2 research networks are made possible by supra-local governance and a common data schema, eHop [ 46 ] and OMOP [ 47 ], respectively. The Paris hospitals, thanks to their regional coverage and the choice of OMOP, are also well advanced in multicentric research. At the same time, the Grand-Est region is building a network of CDWs based on the model of the Grand-Ouest region, also using eHop.

CDWs are used for monitoring and management (16/21).

CDWs have sometimes been initiated to improve and optimize billing coding (4/21). The clinical texts gathered in the same database are queried using keywords to facilitate the structuring of information. The data are then aggregated into indicators, some of which are reported at the national level. The construction of indicators from clinical data can also be used for the administrative management of the institution. Finally, closer to the clinic, some actors state that the CDW could also be used to provide regular and appropriate feedback to healthcare professionals on their practices. This feedback would help to increase the involvement and interest of healthcare professionals in CDW projects. The CDW is sometimes of interest for health monitoring (e.g., during COVID-19) or pharmacovigilance (13/21).
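To make the keyword-then-indicator workflow concrete, here is a minimal sketch under stated assumptions: the notes, the keyword list, and the indicator definition (share of stays with at least one matching note) are invented for illustration and do not describe any interviewed hospital's actual queries.

```python
import re

# Hypothetical pseudonymized clinical notes; text and keywords are invented.
notes = [
    {"stay_id": 1, "text": "Patient admitted with pressure ulcer, stage 2."},
    {"stay_id": 2, "text": "No ulcer observed. Discharged home."},
    {"stay_id": 3, "text": "History of PRESSURE ULCER on the left heel."},
]

KEYWORDS = ["pressure ulcer", "bedsore"]


def flag_stays(notes, keywords):
    """Return the stay ids whose notes mention at least one keyword."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return {note["stay_id"] for note in notes if pattern.search(note["text"])}


def indicator(notes, keywords):
    """Aggregate the flags into one indicator: share of stays flagged."""
    all_stays = {note["stay_id"] for note in notes}
    return len(flag_stays(notes, keywords)) / len(all_stays)


flagged = flag_stays(notes, KEYWORDS)
rate = indicator(notes, KEYWORDS)
print(sorted(flagged), round(rate, 2))
```

Plain keyword matching of this kind ignores negation and context (a note saying "no pressure ulcer" would still be flagged), so the resulting flags are best treated as candidates for review rather than as coded facts.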

Strong interest for CDW in the context of care (13/21).

Some CDWs develop specific applications that provide new functionalities compared with care software. Search engines can be used to query all the hospital’s data gathered in the CDW, without data compartmentalization between different software applications. Dedicated interfaces can then offer a unified view of the history of a patient’s data across specialties, which is particularly valuable in internal medicine. These cross-disciplinary search tools also enable healthcare professionals to conduct rapid searches in all the texts, for example, to find similar patients [ 32 ]. Uses for prevention, automation of repetitive tasks, and care coordination are also highlighted. Concrete examples are the automatic sorting of hospital prescriptions by order of complexity or the setting up of specialized channels for primary or secondary prevention.

Technical architecture

The technical architecture of modern CDWs has several layers:

  • Data processing: connection and export of source data, diverse transformation (cleaning, aggregation, filtering, standardization).
  • Data storage: database engines, file storage (on file servers or object storage), indexing engines to optimize certain queries.
  • Data exposure: raw data, APIs, dashboards, development and analysis environments, specific web applications.

Supplementary cross-functional components ensure the efficient and secure operation of the platform: identity and authorization management, activity logging, automated administration of servers and applications.

The analysis environment (JupyterHub or RStudio datalabs) is a key component of the platform, as it allows data to be processed within the CDW infrastructure. Only a few CDWs had an operational datalab at the time of our study (6/21), and almost all of them have decided to provide it to researchers. Currently, clinical research teams still often work on data extractions in less secure environments.

Data quality, standard formats

Quality tools.

Systematic data quality monitoring processes are being built in some CDWs. Often (8/21), scripts are run at regular intervals to detect technical anomalies in data flows. A few data quality investigation tools, in the form of dashboards, are beginning to be developed internally (3/21). Theoretical reflections are underway on the possibility of automating data consistency checks, for example, demographic or temporal checks. Some facilities randomly pull records from the EHR to compare them with the information in the CDW.
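A minimal sketch of what automated temporal and demographic consistency checks might look like is given below. The field names, the example stays, and the 120-year age threshold are assumptions chosen for illustration; they are not rules reported by the interviewed teams.

```python
from datetime import date

# Hypothetical hospital stays; values are invented to trigger each rule.
stays = [
    {"stay_id": 1, "birth_date": date(1950, 4, 2),
     "admission": date(2022, 3, 1), "discharge": date(2022, 3, 5)},
    {"stay_id": 2, "birth_date": date(2023, 1, 1),
     "admission": date(2022, 6, 1), "discharge": date(2022, 6, 2)},
    {"stay_id": 3, "birth_date": date(1980, 7, 9),
     "admission": date(2022, 9, 10), "discharge": date(2022, 9, 8)},
]

MAX_AGE_YEARS = 120  # assumption: plausibility threshold for age at admission


def check_stay(stay):
    """Return the list of consistency issues found for one stay."""
    issues = []
    # Temporal check: a stay cannot end before it starts.
    if stay["discharge"] < stay["admission"]:
        issues.append("discharge before admission")
    # Demographic checks: admission must follow birth, at a plausible age.
    if stay["admission"] < stay["birth_date"]:
        issues.append("admission before birth")
    age_years = (stay["admission"] - stay["birth_date"]).days / 365.25
    if age_years > MAX_AGE_YEARS:
        issues.append("implausible age")
    return issues


def quality_report(stays):
    """Map each problematic stay id to its issues; clean stays are omitted."""
    report = {}
    for stay in stays:
        issues = check_stay(stay)
        if issues:
            report[stay["stay_id"]] = issues
    return report


report = quality_report(stays)
print(report)
```

A script like this could be run at regular intervals on new data, with the report feeding a dashboard or a review queue.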

Standard format.

No single standard data model stands out as being used by all CDWs. All teams are aware of the existence of the OMOP (research standard) [ 47 ] and HL7 FHIR (communication standard) models [ 48 ]. Several CDWs consider the OMOP model to be a central part of the warehouse, particularly for research purposes (9/21). This tendency has been encouraged by the European call for projects EHDEN, launched by the OHDSI research consortium, the originator of this data model. In the Grand-Ouest region of France, the CDWs use the eHop warehouse software, which relies on a common data model also named eHop. This model will be extended by the future warehouse network of the Grand Est region, which has also chosen this solution. Including this grouping and the other establishments that have chosen eHop, this model covers 12 of the 32 university hospitals. This allows eHop adopters to launch ambitious interregional projects. However, eHop does not define a standard nomenclature to be used in its model and is not aligned with emerging international standards.

Documentation.

Half of the CDWs have put in place documentation accessible within the organization on data flows, the meaning and proper use of qualified data (10/21 mentioned). This documentation is used by the team that develops and maintains the warehouse. It is also used by users to understand the transformations performed on the data. However, it is never publicly available. No schema of the data once it has been transformed and prepared for analysis is published.

Principal findings

We give the first overview of the CDWs in university hospitals of France, with 32 hospitals reviewed. The implementation of CDWs dates from 2011 and accelerated in late 2020. Today, 24 of the university hospitals have an ongoing CDW project. From this case study, some general considerations can be drawn that should be valuable to all healthcare systems implementing CDWs on a national scale.

As the CDW becomes an essential component of data management in the hospital, the creation of an autonomous internal team dedicated to data architecture, process automation, and data documentation should be encouraged [ 44 ]. This multidisciplinary team should develop an excellent knowledge of the data collection process and potential reuses in order to qualify the different flows coming from the source IS, standardize them towards a homogenous schema and harmonize the semantics. It should have a sound knowledge of public health, as well as the technical and statistical skills to develop high-quality software that facilitates data reuse.

The resources specific to the warehouse are scarce and often taken from other budgets or from project-based credits. While this is natural for an initial prototyping phase, it does not seem suited to the long-term, cross-cutting nature of the tool. As a research infrastructure of growing importance, it must have the financial and organizational means to plan for the long term.

The governance of the CDW has multiple layers: local within the university hospital, interregional, and national/international. The first level ensures the quality of data integration as well as the pertinence of data reuse by clinicians themselves. The interregional level is well adapted to resource pooling and collaboration. Finally, the national and international levels assure coordination, encourage consensus on binding choices such as metadata or interoperability, and provide financial, technical, and regulatory support.

Health technology assessment agencies advocate for public registration of comparative observational study protocols before conducting the analysis [ 8 , 17 , 49 ]. They often refer to clinicaltrials.gov as a potential but not ideal registration portal for observational studies. The research community advocates for public registration of all observational studies [ 50 , 51 ]. More recently, it has emphasized the need for easier data access and the publication of study code [ 29 , 52 , 53 ]. We embrace these recommendations and we point to the unfortunate duplication of these study reporting systems in France. One source could be favored at the national level and the second one automatically fed from the reference source, by agreeing on common metadata.

From a patient’s perspective, there is currently no way to know if their personal data is included for a specific project. Better patient information about the reuse of their data is needed to build trust over the long term. A strict minimum is the establishment and update of the declarative portals of ongoing studies at each institution.

Data and data usage

When using CDW, the analyst has not defined the data collection process and is generally unaware of the context in which the information is logged. This new dimension of medical research requires a much greater development of data science skills to change the focus from the implementation of the statistical design to the data engineering process. Data reuse requires more effort to prepare the data and document the transformations performed.

The more heterogeneous a HIS is, the lower the quality of the CDW built on top of it. There is a need for increased interoperability, to help EHR vendors interface the different hospital software systems, thus facilitating CDW development. One step in this direction would be the open-source publication of HIS data schemas and vocabularies. At the analysis level, international recommendations insist on the need for common data formats [ 52 , 54 ]. However, hospital CDWs still lack adoption of the research standards needed to conduct robust studies across multiple sites. Building open-source tools on top of these standards, such as those of OHDSI [ 41 ], could foster their adoption. Finally, in many clinical domains, sufficient sample size is hard to obtain without international data-sharing collaborations. Thus, more incentives are needed to maintain and update the terminology mappings between local nomenclatures and international standards.
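As an illustration of what maintaining such terminology mappings involves, the following minimal Python sketch measures how much of a set of local codes is covered by a mapping to a standard vocabulary and lists the codes that remain unmapped. The local codes, the `STD:` target identifiers, and the function name `mapping_coverage` are all hypothetical placeholders, not taken from any real nomenclature or from the study.

```python
from collections import Counter

# Hypothetical mapping from local laboratory codes to a standard vocabulary.
# Both the local codes and the "STD:" targets are illustrative placeholders.
LOCAL_TO_STANDARD = {
    "GLU_SANG": "STD:0001",
    "HB_A1C": "STD:0002",
}


def mapping_coverage(local_codes, mapping):
    """Return (share of records whose code is mapped, Counter of unmapped codes)."""
    codes = list(local_codes)
    if not codes:
        return 0.0, Counter()
    unmapped = Counter(code for code in codes if code not in mapping)
    mapped_count = len(codes) - sum(unmapped.values())
    return mapped_count / len(codes), unmapped


# Four hypothetical records, one of which uses a code missing from the mapping.
records = ["GLU_SANG", "HB_A1C", "CREAT_V2", "GLU_SANG"]
coverage, unmapped = mapping_coverage(records, LOCAL_TO_STANDARD)
print(f"coverage={coverage:.0%}, unmapped={dict(unmapped)}")
```

Tracking the unmapped codes, rather than only the coverage rate, gives the team a concrete work list each time the local nomenclature changes.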

Many ongoing studies concern the development of decision support processes whose goal is to save time for healthcare professionals. These are often research projects, not yet integrated into routine care. The analysis of study portals and the interviews revealed that data reuse oriented towards primary care is still rare and rarely supported by appropriate funding. The translation from research to clinical practice takes time and needs to be supported over the long run to yield substantial results.

Tools, methods, and data formats of CDW lack harmonization due to the strong technical innovation and the presence of many actors. As suggested by the recent report on the use of data for research in the UK [ 44 ], it would be wise to focus on a small number of model technical platforms.

These platforms should favor open-source solutions to assure transparency by default, foster collaboration and consensus, and avoid technological lock-in of the hospitals.

Data quality and documentation

Quality is not sufficiently considered as a relevant scientific topic itself. However, it is the backbone of all research done within a CDW. In order to improve the quality of the data with respect to research uses, it is necessary to conduct continuous studies dedicated to this topic [ 52 , 54 – 56 ]. These studies should contribute to a reflection on methodologies and standard tools for data quality, such as those developed by the OHDSI research network [ 41 ].

Finally, there is a need for open-source publication of research code to ensure quality retrospective research [ 55 , 57 ]. Recent research in data analysis has shown that innumerable biases can lurk in training data sets [ 58 , 59 ]. Open publication of data schemas is considered an indispensable prerequisite for all data science and artificial intelligence uses [ 58 ]. Inspired by data set cards [ 58 ] and data set publication guides, it would be interesting to define a standard CDW card documenting the main data flows.

Limitations

The interviews were conducted in a semi-structured manner within a limited time frame. As a result, some topics were covered more quickly and only those explicitly mentioned by the participants could be recorded. The uneven existence of study portals introduces a bias in the recording of the types of studies conducted on CDW. Those with a transparency portal already have more maturity in use cases.

For clarity, our results are focused on the perimeter of university hospitals. We have not covered the exhaustive healthcare landscape in France. CDW initiatives also exist in primary care, in smaller hospital groups and in private companies.

Conclusions

The French CDW ecosystem is beginning to take shape, benefiting from an acceleration thanks to national funding, the multiplication of industrial players specializing in health data and the beginning of a supra-national reflection on the European Health Data Space [ 60 ]. However, some points require special attention to ensure that the potential of the CDW translates into patient benefits.

The priority is the creation and perpetuation of multidisciplinary warehouse teams capable of operating the CDW and supporting the various projects. A combination of public health, data engineering, data stewardship, statistics, and IT competences is a prerequisite for the success of the CDW. The team should be the privileged point of contact for data exploitation issues and should collaborate closely with the existing hospital departments.

The constitution of a multilevel collaboration network is another priority. The local level is essential to structure the data and understand its possible uses. Interregional, national, and international coordination would make it possible to create thematic working groups in order to stimulate a dynamic of cooperation and mutualization.

A common data model should be encouraged, with precise metadata that make it possible to map the integrated data, in order to qualify the uses to be developed today from the CDWs. More broadly, open-source documentation of data flows and of the transformations performed for quality enhancement requires more incentives, to unleash the potential for innovation for all health data reusers.

Finally, the question of expanding the scope of the data beyond the purely hospital domain must be asked. Many risk factors and patient follow-up data are missing from the CDWs, but are crucial for understanding pathologies. Combining city data and hospital data would provide a complete view of patient care.

Supporting information

S1 Table. List of interviewed stakeholders with their teams.

https://doi.org/10.1371/journal.pdig.0000298.s001

S2 Table. Interview form.

https://doi.org/10.1371/journal.pdig.0000298.s002

S1 Text. Study data tables.

https://doi.org/10.1371/journal.pdig.0000298.s003

Acknowledgments

We want to thank all participants and experts interviewed for this study. We also want to thank the other people who proofread the manuscript for external review: Judith Fernandez (HAS), Pierre Liot (HAS), Bastien Guerry (Etalab), Aude-Marie Lalanne Berdouticq (Institut Santé numérique en Société), Albane Miron de L’Espinay (ministère de la Santé et de la Prévention), and Caroline Aguado (ministère de la Santé et de la Prévention). We also thank Gaël Varoquaux for his support and advice.



  • Perspect Health Inf Manag
  • v.16(Fall); Fall 2019

Evidence-based Operations Management in Health Information Management: A Case Study

School of Biomedical Informatics at the University of Texas Health Science Center in Houston, TX.

Texas Health Resources in Arlington, TX.

This is a case study of the evidence-based management practices of a centralized health information management (HIM) department in a large integrated healthcare delivery system. The case study used interviews and focus groups, as well as de-identified dashboards, to explore the impact of reporting on the organization. The dashboards and key performance indicators (KPIs) were initially developed in 2012 and have continued to evolve. The themes that resulted include the following: (1) evidence-based management is integral to the culture of the organization; (2) communicating regularly via dashboards and KPIs is key to transmitting the value of HIM to the entire organization; and (3) staff not only report the required measures for the dashboard but also take pride in it and often develop methods for tracking their individual performance. Most evidence supporting HIM operations management is related to coding and clinical documentation improvement, but even in those areas, national benchmarks are missing. It is important for the HIM profession to develop national and regional benchmarks to assist professionals in managing operations effectively and communicating their value to the healthcare industry.

Introduction

As a profession, health information management (HIM) has been actively engaged in data management since at least 1998. 1 The last two decades have seen continued growth and use of data in healthcare. However, even today, the use of data to manage health information operations deserves investigation. This case study was undertaken to explore how the centralized HIM department of a large integrated healthcare delivery system uses data to manage its operations.

This integrated healthcare delivery system is a faith-based, nonprofit system that cares for more patients in North Texas than any other provider. The system's primary service area consists of 16 counties, home to more than 7 million people. The healthcare delivery system was formed in 1997 with the assets of two large existing hospital systems. Later that year, another hospital in the area joined the system. Currently, the system has 27 hospital locations, including 18 acute care hospital locations, five short-stay hospitals, three rehabilitation hospitals, and one transitional care hospital, all owned, operated, joint-ventured, or affiliated with the system. It has more than 4,000 licensed beds, employs more than 25,000 people, and counts more than 6,200 physicians with active staff privileges at its hospitals.

To our knowledge, this is the first case study exploring the use of dashboards and key performance indicators (KPIs), or evidence-based management, in an HIM department. The authors believed this study was needed to demonstrate how data can be used effectively and to suggest additional areas for HIM operations data analytics.

This case study is important because it begins to build the foundation for evidence-based HIM operations management. Hopefully, this case study will also be used by HIM educational programs.

Literature Review

The HIM profession has approached data management from many different perspectives, yet publications related to evidence-based HIM operations management are difficult to find. Both PubMed and the American Health Information Management Association (AHIMA) HIM Library were searched, revealing some research, most often related to patient record coding and clinical documentation improvement. 2 , 3 Unfortunately, this focus does not encompass the entire range of HIM operations, leaving out scanning and release of information, at a minimum. Other articles are focused on more broad-based analytics, applicable to the delivery of healthcare rather than the management of HIM operations. 4 , 5 While valuable, this information does not assist HIM professionals in managing their day-to-day operations.

Since approximately 2014, AHIMA, as the HIM professional association, has focused on information governance, performing studies and creating many resources for healthcare organizations to implement information governance. In 2017, Houser and colleagues discussed the need for information governance related to support for analytics. 6 This article reviewed several models that can be used when managing the information needed for management, but it did not address the actual use of the data and information for operations management. Likewise, the practice brief for data analytics reporting provides guidance regarding the reporting lifecycle, reporting methods, and tools for reporting; however, no actual example is provided. 7

The lack of a comprehensive review of evidence-based operations management in a case study or other form of research reveals a gap in the HIM literature. The following case study is expected to provide only a starting point for evidence-based HIM operations management.

This case study is a joint project between a graduate program in health informatics and the large integrated healthcare delivery system. It was approved by the university's Committee for the Protection of Human Subjects, approval number HSC-SBMI-18-0567.

The methods chosen for conducting the analysis of the health delivery system's evidence-based HIM operations were interviews and focus groups over a two-day period. All interviewees and focus group participants signed forms indicating their informed consent to participate in the case study. The questions included a description of the KPIs used, how they are selected and calculated, and how data are collected for each KPI. They were also asked about the evolution of the KPI reporting and what they liked best or least about using and reporting KPIs. The focus group and interview questions were approved by the institutional review board and can be found in Appendix A.

The interviews were held with the vice president of health information management services (HIMS) and clinical documentation improvement (CDI), the direct supervisor, and the direct reports to this position. Focus groups were held with the coding, clinical documentation improvement, data integrity, release of information, and operations and regulatory compliance units. Transcripts were made of all sessions, and grounded theory was used as the analysis method. A total of 50 persons took part in the focus groups, with 6 persons interviewed individually.

Results and Discussion

This case study explores the evidence-based management of a centralized HIM department. For clarity, this study focuses on the reporting from each department to the VP of HIMS and CDI, as well as to senior-level executives at the system level and executives at the entity level across the healthcare organization.

Themes that emerged from the interviews and focus groups are as follows:

  • The focus on evidence-based management is pervasive across and throughout the organization. Most, if not all, organizational units have dashboards to help them manage their areas using KPIs.
  • Communicating regularly via dashboards and KPIs not only enables more effective management but also ensures that senior management understands the impact of HIM operations on the overall health of the organization.
  • Setting and achieving goals gives the HIM and CDI personnel a sense of pride in doing their job well. The staff report measures beyond those required for the dashboard. More than one person reported having created their own dashboard to track their individual performance.

Organizational Structure

The HIM organization is centralized and complex, as might be expected when managing HIM operations for 19 hospitals and related organizations. The organizational structure is found in Figure 1.

Figure 1. Health Information Management Services and Clinical Documentation Improvement Organizational Structure

The initial organizational structure was established in 2012 after a two-day rapid design session that involved all HIM directors and managers along with representatives from human resources and information technology. Key objectives of the rapid design session were to design and build the enterprise HIM model, identify best practices, create performance specifications, and develop methods of communication. The organization designed a unified system approach to streamline operations and achieve excellence, with the long-term expectation of benchmarking operations. Consistent quality and timeliness of data reporting across the enterprise, with a focus on developing and using leading-edge tools and enablers to consistently support a leveraged enterprise HIM model, was a key initiative. From the beginning, the organization developed standardized KPIs to be included in a reporting matrix for each functional area. Performance baselines were established to enable postimplementation comparisons. Over time, the organizational structure evolved as responsibilities were added under the VP's leadership.

Use of Dashboards and KPIs

The dashboards and KPIs for HIM operations have been in development for six years and continue to evolve. Staff who worked in other healthcare organizations or those who worked at the healthcare delivery system before the centralization of HIM services experienced a period of adjustment related to the extensive reporting and sharing of data. They reported initially feeling that the dashboard reporting would be used as a “gotcha.” However, they discovered that reporting the data allowed them to identify opportunities for improvement, as well as providing evidence that made it possible to celebrate achievements. Not all goals are achieved; this is consistent with the practice of setting “stretch” goals.

Data are gathered in a variety of ways from each staff member. Examples include turnaround times for different types of requests for patient records; physician documentation compliance by documentation type; data integrity, as demonstrated by duplicate accounts; management of record scanning; coding productivity and denials; and financial analysis of HIM operations. The summary HIM dashboard is seen in Figure 2, while Figure 3 shows the CDI dashboard. To preserve confidentiality for the organization, the dashboards show synthetic data. The structure of the dashboards is accurate.

Figure 2. Health Information Management Services Summary Dashboard

Figure 3. Clinical Documentation Improvement Summary Dashboard

In addition to an overall view of the measures for the integrated healthcare delivery system, each hospital or separate organizational unit receives a dashboard detailing their performance for all of the measures. For example, the integrated health delivery system may be compliant with the standard of 95 percent completion of history and physical update within 24 hours for a given measurement period, while one or more of the hospitals or organizational units may not be in compliance with the standard.
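The gap between system-level and unit-level compliance described above can be sketched in a few lines of Python. This is only an illustration: the hospital names, the counts, and the function names are hypothetical, and only the 95 percent threshold comes from the text.

```python
THRESHOLD = 0.95  # standard cited in the text: 95 percent completion within 24 hours

# Hypothetical counts per hospital: (updates completed on time, updates due).
units = {
    "Hospital A": (980, 1000),
    "Hospital B": (470, 500),
    "Hospital C": (196, 200),
}


def completion_rate(completed, due):
    """Share of due updates completed on time (0.0 when nothing was due)."""
    return completed / due if due else 0.0


def compliance_report(units, threshold=THRESHOLD):
    """Return the pooled system rate and the sorted names of units below the threshold."""
    total_completed = sum(completed for completed, _ in units.values())
    total_due = sum(due for _, due in units.values())
    system_rate = completion_rate(total_completed, total_due)
    below = sorted(
        name
        for name, (completed, due) in units.items()
        if completion_rate(completed, due) < threshold
    )
    return system_rate, below


system_rate, below = compliance_report(units)
print(f"system rate: {system_rate:.1%}, units below standard: {below}")
```

With these made-up counts the pooled rate clears the standard while one hospital does not, which is why a per-unit dashboard is needed alongside the system summary.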

Using a dashboard over a long period does not guarantee problem-free management. During the interviews and focus groups, participants noted issues that had recently been encountered in the reporting for release of information. Over several months, the staff reported that they believed the dashboard numbers for a specific type of release request were incorrect; they contended that the numbers on the dashboard were not consistent with what they witnessed in their day-to-day operations. These reports prompted further investigation into the dashboard data that were automatically extracted from the electronic health record (EHR). It was eventually determined that a recent upgrade to the EHR had altered the reporting related to the release requests. This anecdote demonstrates that the frontline staff pay attention to the dashboard. Further, they feel empowered to report inconsistencies and discrepancies they discover in the data reported on the dashboard.

As reported by the healthcare organization employees, benchmarks are an essential component of a useful dashboard. However, as seen in Figure 2, 41 of the 55 measures, or approximately 74.5 percent of the HIM operational measures used by this data-driven organization, have no comparable industrywide benchmarks. Although the organizational performance data is synthetic, the industry and organizational benchmarks are accurate. This lack of comparable benchmarks was the topic of a recent Journal of AHIMA article focused on coding accuracy. 8 It is reasonable to suggest that this lack of comparable industrywide benchmarks applies to a majority of HIM operations.
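The share quoted above can be reproduced with a short Python sketch. The list of measures is a stand-in built only from the two counts given in the text (55 measures, 41 without an industrywide benchmark); the field name `industry_benchmark`, the measure labels, and the placeholder benchmark value are assumptions for illustration.

```python
def share_without_benchmark(measures):
    """Return (count, share) of measures that have no industrywide benchmark."""
    missing = sum(1 for m in measures if m["industry_benchmark"] is None)
    return missing, missing / len(measures)


# Stand-in data: 41 measures without a benchmark, 14 with a placeholder value.
measures = [{"name": f"measure_{i}", "industry_benchmark": None} for i in range(41)]
measures += [{"name": f"measure_{i}", "industry_benchmark": 1.0} for i in range(41, 55)]

missing, share = share_without_benchmark(measures)
print(f"{missing} of {len(measures)} measures ({share:.1%}) lack a benchmark")
```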

Careful attention to Figure 2 reveals multiple measures where the organization's standard is much stricter than the industry standard or there is an organizational standard without an industry standard. For example, in the release of information category, the organizational standard for continued care request turnaround time is 7 days, whereas the Texas requirement is 15 days. The organizational standard for stat request turnaround time is 30 minutes; there is no industry standard. Similarly, the organization's standard for medical record delinquency rate is 25 percent, as opposed to the Joint Commission standard of 50 percent. Both management and staff report that the use of the dashboard and KPIs has resulted in an overall lowering of the organizational targets over time, demonstrating performance improvement.
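The comparison between organizational and external standards can be expressed as a small classification, sketched below in Python. The three measures and their values are the ones quoted in the paragraph above; the `Measure` class, the `classify` function, and the category labels are hypothetical, and the sketch assumes that a lower number is the stricter target for each of these measures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measure:
    name: str
    unit: str
    org_standard: float
    external_standard: Optional[float]  # None when no external standard exists


# Values quoted in the text; for all three, a lower number is assumed stricter.
MEASURES = [
    Measure("Continued care request turnaround", "days", 7, 15),
    Measure("Stat request turnaround", "minutes", 30, None),
    Measure("Medical record delinquency rate", "percent", 25, 50),
]


def classify(measure):
    """Label how the organizational standard relates to the external one."""
    if measure.external_standard is None:
        return "no external standard"
    if measure.org_standard < measure.external_standard:
        return "stricter than external"
    if measure.org_standard == measure.external_standard:
        return "equal to external"
    return "looser than external"


for m in MEASURES:
    print(f"{m.name}: {classify(m)}")
```

A real dashboard would also need a per-measure flag for direction, since for some KPIs (accuracy rates, for example) a higher value is the stricter target.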

Use of the dashboard over time can also assist the HIM department with compliance and external audits. In this organization, the transcription section was required to undergo an external audit of the data reported in the dashboard. (This is a standard practice at many large organizations. External auditors examine different practices and processes in the organization to ensure accuracy and compliance with regulations.) All of the transcription data were reviewed for accuracy. The data collection sources were examined, as were the numbers reported. Because of the documentation supporting the processes and the data collected, the audit resulted in no recommendations for improvement.

Additional Considerations

The current dashboard used by this healthcare delivery system has evolved over several years to meet the business needs of the organization and to align with strategic goals of the enterprise. The dashboard not only contains data from a system perspective; it also contains data at the hospital level, allowing transparency and benchmarking. It was important to the organization to create a culture of transparency that demonstrates the value of a centralized HIM model. Leaders across the enterprise use the dashboard data to identify patterns or trends and to perform internal comparisons with hospitals of similar size. This level of transparency has created a true partnership for improvement on specific measures between HIM and other departments within the organization. The dashboard has allowed improvement in HIM operations and in quality outcomes through collaborative efforts across the enterprise. More than one person reported that stakeholders look for the monthly dashboard and appreciate the level of transparency.

HIM professionals wishing to initiate a dashboard should choose a starting point. This starting point could be a single measure or a single organizational unit. For example, coding might be reasonable because HIM departments commonly track their coding productivity. Once this reporting is standardized and everyone is comfortable with the reporting, additional units such as release of information or documentation compliance can be added until all operations under HIM supervision have KPIs included on the dashboard.

Limitations

As with all case studies, one of the limitations of this study is the examination of a single organization. Other HIM departments in other healthcare delivery organizations are likely to have different needs for their reports and/or dashboards, so these results cannot be generalized. Additionally, the subjective nature of the case study method may influence the results, a case study can be difficult to replicate, and case studies are time consuming.

This case study is the first thorough examination of evidence-based HIM operations management. As such, it exposes both challenges and benefits of using data to manage operations. Initial challenges include securing employee cooperation for a new management process, efficiently collecting the data, and producing the dashboard in a timely fashion. Benefits include substantiation of HIM operations effectiveness, HIM professionals' pride in their jobs, and validation of HIM reporting under internal or external review.

This study especially noted a lack of industrywide benchmarks that would be useful for HIM operations management. This deficiency should be concerning for the HIM profession as data become ever more ubiquitous and important in all aspects of healthcare delivery. AHIMA is the logical organization to lead the effort to collect HIM operations management data that its members can use for analytics and evidence to support operations. AHIMA could become a source of benchmarks and a resource for the healthcare industry.

Focus Group and Interview Questions Outline

  • Can everyone please introduce yourself, including your position at XXX, the unit and how long you have worked in your position?
  • Tell me about the KPIs you either use for your job or your unit uses to report performance. How many KPIs do you use? How were they chosen? Can you give me details about their calculation? How do you collect data for these KPIs?
  • Are these the same KPIs you have always used or has there been an evolution? If there has been an evolution can you walk me through that process? Where did you begin with the KPIs? How have they been changed or modified over time?
  • Can you tell me about any reports or other documents that you would believe helpful to the case study? All documents will be cleared by the XXX co-investigator for appropriateness.
  • How has using/reporting the KPIs changed how you do your job or how you view your job or the requirements of your job?
  • What do you like best about using and reporting KPIs? What would you change about it if you could?
  • Is there anything else you would like to share with me about your job, your unit, and performance monitoring at XXX?
  • Can I answer any questions for you?

Contributor Information

SH Fenton, School of Biomedical Informatics at the University of Texas Health Science Center in Houston, TX.

DH Smith, Texas Health Resources in Arlington, TX.

Healthcare Data Management: Three Case Studies

Clinician analyzes patient healthcare data

The HIMSS Davies Awards program promotes HIMSS’s vision and mission by recognizing and sharing case studies, model practices and lessons learned on how to improve health and wellness through the power of information and technology. 2019 Davies Award winner Yale New Haven Hospital was recognized for enhancing healthcare data management in a variety of scenarios. Explore their three case studies below to learn more.

Case Studies

1. Redesigning the Neonatal Intensive Care Unit

Yale New Haven Hospital needed to enable direct communication between care providers within their redesigned neonatal intensive care unit. This was a transition from a large, multi-isolette model, where multiple care providers could easily ask a nearby colleague for assistance, hear if a neonate alarm went off a few feet away, and notice when a colleague was struggling with an infant, to private, isolated patient care rooms, where no care provider may be present at times and providers are isolated from their colleagues.


2. Addressing the Opioid Crisis

In light of the national opioid crisis, Yale New Haven Health wanted to reduce the risk of addiction by minimizing the euphoric variable associated with a bolus of opioid administered intravenously, which can lead to dependency. They also wanted to reduce the number of opioid pills prescribed post-discharge to discourage habitual use.

3. Implementing a Capacity Coordination Center

Yale New Haven Hospital needed to facilitate continued success of their safe patient flow initiatives to provide additional value to patients. They put a clinical redesign effort in place to achieve specific process improvement goals related to patient discharge, admissions from the emergency department, reducing wait and transport times, and more. Their Capacity Coordination Center was implemented to expedite these efforts, remove communication barriers and improve the flow of healthcare data management.

Health Case Studies

(29 reviews)

Glynda Rees, British Columbia Institute of Technology

Rob Kruger, British Columbia Institute of Technology

Janet Morrison, British Columbia Institute of Technology

Copyright Year: 2017

Publisher: BCcampus

Language: English

Conditions of Use: Attribution-ShareAlike

Reviewed by Jessica Sellars, Medical assistant office instructor, Blue Mountain Community College on 10/11/23

Comprehensiveness rating: 5

This is a compilation of very well organized patient case studies. The author has broken it up by the disease the patient was experiencing and even by the healthcare roles involved in the patient's care. There is a well thought out direction and plan. There is also an appendix to refer to if you need to find something specific quickly. I have been looking for something like this to give my students a base for their project. This is the most comprehensive version I have found on the subject.

Content Accuracy rating: 5

This is a compilation of medical case studies. It is very accurate and can be used to learn from both great care and mistakes.

Relevance/Longevity rating: 5

This material is very relevant in this context. It also has plenty of individual case studies to utilize in many ways in all sorts of medical courses. This is a very useful textbook and it will continue to be useful for a very long time, as you can still learn from each study even if medicine changes throughout the years.

Clarity rating: 5

The author put a lot of thought into the ease of accessibility and reading level of the target audience. There is even a "how to use this resource" section which could be extremely useful to students.

Consistency rating: 5

The text follows a very consistent format throughout the book.

Modularity rating: 5

Each case study is broken up individually and grouped with similar case studies. This makes it extremely easy to utilize.

Organization/Structure/Flow rating: 5

The book is very organized and the appendix is thorough. It flows seamlessly through each case study.

Interface rating: 5

I had no issues navigating this book. It was clearly labeled and very easy to move around in.

Grammatical Errors rating: 5

I did not catch any grammar errors as I was going through the book.

Cultural Relevance rating: 5

This is a challenging question for any medical textbook. It is very culturally relevant to those in medical or medical office degrees.

I have been looking for something like this for years. I am so happy to have finally found it.

Reviewed by Cindy Sun, Assistant Professor, Marshall University on 1/7/23

Interestingly, this is not a case of ‘you get what you pay for’. Instead, not only are the case studies organized in a fashion for ease of use through a detailed table of contents, the authors have included more support for both faculty and students. For faculty, the introduction section titled ‘How to use this resource’ and individual notes to educators before each case study contain application tips. An appendix overview lists key elements as issues / concepts, scenario context, and healthcare roles for each case study. For students, learning objectives are presented at the beginning of each case study to provide a framework of expectations.

The content is presented accurately and realistically.

The case studies read similar to ‘A Day In the Life of…’ with detailed intraprofessional communications similar to what would be overheard in patient care areas. The authors present not only the view of the patient care nurse, but also weave interprofessional vantage points through each case study by including patient interaction with individual professionals such as radiology, physician, etc.

In addition to objective assessment findings, the authors integrate standard orders for each diagnosis including medications, treatments, and tests allowing the student to incorporate pathophysiology components to their assessments.

Each case study is arranged in the same framework for consistency and ease of use.

This compilation of eight healthcare case studies focuses on new onset and exacerbation of prevalent diagnoses, such as heart failure, deep vein thrombosis, cancer, and chronic obstructive pulmonary disease advancing to pneumonia.

Each case study has a photo of the ‘patient’. Simple as this may seem, it gives an immediate mental image for the student to focus.

Interface rating: 4

As noted by previous reviewers, most of the links do not connect to active web pages. This may be due to the multiple options for accessing this resource (pdf download, pdf electronic, web view, etc.).

Grammatical Errors rating: 4

A minor weakness that faculty will probably need to address prior to use is the difference in term usage between Commonwealth countries and the United States, such as lung sound descriptors of ‘quiet’ in place of ‘diminished’ and ‘puffers’ in place of ‘inhalers’.

The authors have provided a multicultural, multigenerational approach in selection of patient characteristics representing a snapshot of today’s patient population. Additionally, one case study focusing on heart failure is about a middle-aged adult, contrasting to the average aged patient the students would normally see during clinical rotations. This option provides opportunities for students to expand their knowledge on risk factors extending beyond age.

This resource is applicable to nursing students learning to care for patients with the specific disease processes presented in each case study or for the leadership students focusing on intraprofessional communication. Educators can assign as a supplement to clinical experiences or as an in-class application of knowledge.

Reviewed by Stephanie Sideras, Assistant Professor, University of Portland on 8/15/22

The eight case studies included in this text addressed high frequency health alterations that all nurses need to be able to manage competently. While diabetes was not highlighted directly, it was included as a potential comorbidity. The five overarching learning objectives pulled from the Institute of Medicine core competencies will clearly resonate with any faculty familiar with Quality and Safety Education for Nurses curriculum.

The presentation of symptoms, treatments and management of the health alterations was accurate. Dialogue between members of the interprofessional team was realistic. At times the formatting of lab results was confusing, as they reflected reference ranges specific to the Canadian healthcare system, but these occurrences were minimal and could be easily adapted.

The focus for learning from these case studies was communication: patient-centered communication and interprofessional team communication. Specific details, such as drug dosing, were minimized, which increases longevity and allows for easy individualization of the case data.

While some vocabulary was specific to the Canadian healthcare system, overall the narrative was extremely engaging and easy to follow. Subjective case data from patient or provider were formatted in italics and identified as 'thoughts'. Objective and behavioral case data were smoothly integrated into the narrative.

The consistency of formatting across the eight cases was remarkable. Specific learning objectives are identified for each case and these remain consistent across the range of cases, varying only in the focus of the goals for each health alteration. Each case begins with a presentation of essential patient background and then progresses across the trajectory of illness as the patient moves from location to location encountering different healthcare professionals. Many of the characters (the triage nurse in the Emergency Department, the phlebotomist) are consistent across the case situations. These consistencies facilitate both application of a variety of teaching methods and student engagement with the situated learning approach.

Case data is presented by location and begins with the patient's first encounter with the healthcare system. This allows for an examination of how specific trajectories of illness are manifested and how care management needs to be prioritized at different stages. This approach supports discussions of care transitions and the complexity of the associated interprofessional communication.

The text is well organized. The case that has two levels of complexity is clearly identified.

The internal links between the table of contents and case specific locations work consistently. In the EPUB and the Digital PDF the external hyperlinks are inconsistently valid.

The grammatical errors were minimal and did not detract from readability.

Cultural diversity is present across the cases in factors including race, ethnicity, socioeconomic status, family dynamics and sexual orientation.

The level of detail included in these cases supports a teaching approach to address all three spectrums of learning (knowledge, skills and attitudes) necessary for the development of competent practice. I also appreciate the inclusion of specific assessment instruments that would facilitate a discussion of evidence-based practice. I will enjoy using these cases to promote clinical reasoning discussions of the data that are noticed and interpreted, the resulting priorities that are set, and the reflections that result from learner choices.

Reviewed by Chris Roman, Associate Professor, Butler University on 5/19/22

Comprehensiveness rating: 4

It would be extremely difficult for a book of clinical cases to comprehensively cover all of medicine, and this text does not try. Rather, it provides cases related to common medical problems and introduces them in a way that allows for various learning strategies to be employed to leverage the cases for deeper student learning and application.

The narrative form of the cases is less subject to issues of accuracy than a more content-based book would be. That said, the cases are realistic and reasonable, avoiding being too mundane or too extreme.

These cases are narrative and do not include many specific mentions of drugs, dosages, or other aspects of clinical care that may grow/evolve as guidelines change. For this reason, the cases should be “evergreen” and can be modified to suit different types of learners.

Clarity rating: 4

The text is written in very accessible language and avoids heavy use of technical language. Depending on the level of learner, this might even be too simplistic and omit some details that would be needed for physicians, pharmacists, and others to make nuanced care decisions.

The format is very consistent with clear labeling at transition points.

The authors point out in the introductory materials that this text is designed to be used in a modular fashion. Further, they have built in opportunities to customize each case, such as giving dates of birth as “19xx” to allow for adjustments based on instructional objectives, etc.

The organization is very easy to follow.

I did not identify any issues in navigating the text.

The text contains no grammatical errors, though the language is a little stiff/unrealistic in some cases.

Cases involve patients and members of the care team that are of varying ages, genders, and racial/ethnic backgrounds.

Reviewed by Trina Larery, Assistant Professor, Pittsburg State University on 4/5/22

The book covers common scenarios, providing allied health students insight into common health issues. The information in the book is thorough and easily modified if needed to include other scenarios not listed. The material was easy to understand and apply to the classroom. The E-reader format included hyperlinks that bring the students to subsequent clinical studies.

Content Accuracy rating: 4

The treatments were explained and rationales were given, which can be very helpful to facilitate effective learning for a nursing student or novice nurse. The case studies were accurate in explanation. The DVT case study incorrectly identifies the location of the clot in the popliteal artery instead of in the vein.

The content is relevant to a variety of different types of health care providers and, due to the general nature of the cases, will remain relevant over time. The hyperlinks should be updated annually, and the content reviewed to ensure that the current standard of practice is still being met.

Clear, simple and easy to read.

Consistent with healthcare terminology and framework throughout all eight case studies.

The text is modular. Cases can be used individually within a unit on the given disease process or relevant sections of a case could be used to illustrate a specific point providing great flexibility. The appendix is helpful in locating content specific to a certain diagnosis or a certain type of health care provider.

The book is well organized and presented in a clear, logical fashion. The appendix allows the student to move about the case study without difficulty.

The interface is easy and simple to navigate. Some links to external sources might need to be updated regularly since those links are subject to change based on current guidelines. A few hyperlinks had "page not found".

Few grammatical errors were noted in text.

The case studies include people of different ethnicities, socioeconomic status, ages, and genders to make this a very useful book.

I enjoyed reading the text. It was interesting and relevant to today's nursing student. There are roughly 25 broken online links or "pages not found"; care needs to be taken to update them at least annually to ensure that links are valid and use the most up-to-date information.

Reviewed by Benjamin Silverberg, Associate Professor/Clinician, West Virginia University on 3/24/22

Comprehensiveness rating: 3

The appendix reviews the "key roles" and medical venues found in all 8 cases, but is fairly spartan on medical content. The table of contents at the beginning only lists the cases and locations of care. It can be a little tricky to figure out what is going on where, especially since each case is largely conversation-based. Since this presents 8 cases (really 7 with one being expanded upon), there are many medical topics (and venues) that are not included. It's impossible to include every kind of situation, but I'd love to see inclusion of sexual health, renal pathology, substance abuse, etc.

Though there are differences in how care can be delivered based on personal style, changing guidelines, available supplies, etc, the medical accuracy seems to be high. I did not detect bias or industry influence.

Relevance/Longevity rating: 4

Medications are generally listed as generics, with at least current dosing recommendations. The text gives a picture of what care looks like currently, but will be a little challenging to update based on new guidelines (i.e., it can be hard to find the exact page on which a medication is dosed/prescribed). Even if the text were to be a little out of date, an instructor can use that to point out what has changed (and why).

Clear text, usually with definitions of medical slang or higher-tier vocabulary. Minimal jargon and there are instances where the "characters" are sorting out the meaning as well, making it accessible for new learners, too.

Overall, the style is consistent between cases - largely broken up into scenes and driven by conversation rather than descriptions of what is happening.

There are 8 (well, again, 7) cases which can be reviewed in any order. Case #2 builds upon #1, which is intentional and a good idea, though personally I would have preferred one case to have different possible outcomes or even a recurrence of illness. Each scene within a case is reasonably short.

Organization/Structure/Flow rating: 4

These cases are modular and don't really build on concepts throughout. As previously stated, case #2 builds upon #1, but beyond that, there is no progression. (To be sure, the authors suggest using case #1 for newer learners and #2 for more advanced ones.) The text would benefit from thematic grouping, a longer introduction and debriefing for each case (there are learning objectives but no real context in medical education nor questions to reflect on what was just read), and progressively-increasing difficulty in medical complexity, ethics, etc.

I used the PDF version and had no interface issues. There are minimal photographs and charts. Some words are marked in blue but those did not seem to be hyperlinked anywhere.

No noticeable errors in grammar, spelling, or formatting were noted.

I appreciate that some diversity of age and ethnicity were offered, but this could be improved. There were Canadian Indian and First Nations patients, for example, as well as other characters with implied diversity, but there didn't seem to be any mention of gender diverse or non-heterosexual people, or disabilities. The cases tried to paint family scenes (the first patient's dog was fairly prominently mentioned) to humanize them. Including more cases would allow for more opportunities to include sex/gender minorities, (hidden) disabilities, etc.

The text (originally from 2017) could use an update. It could be used in conjunction with other Open Texts, as a complement to other coursework, or purely by itself. The focus is meant to be on improving communication, but there are only 3 short pages at the beginning of the text considering those issues (which are really just learning objectives). In addition to adding more cases and further diversity, I personally would love to see more discussion before and after the case to guide readers (and/or instructors). I also wonder if some of the ambiguity could be improved by suggesting possible health outcomes; this kind of counterfactual comparison isn't possible in real life and could be really interesting in a text. Addition of comprehension/discussion questions would also be worthwhile.

Reviewed by Danielle Peterson, Assistant Professor, University of Saint Francis on 12/31/21

This text provides readers with 8 case studies which include both chronic and acute healthcare issues. Although not comprehensive in regard to types of healthcare conditions, it provides a thorough look at the communication between healthcare workers in acute hospital settings. The cases are primarily set in the inpatient hospital setting, so the bulk of the clinical information is basic emergency care and inpatient protocol: vitals, breathing, medication management, etc. The text provides a table of contents at opening of the text and a handy appendix at the conclusion of the text that outlines each case’s issue(s), scenario, and healthcare roles. No index or glossary present.

Although easy to update, it should be noted that the cases are taking place in a Canadian healthcare system. Terms may be unfamiliar to some students including “province,” “operating theatre,” “physio/physiotherapy,” and “porter.” Units of measurement used include Celsius and meters. Also, the issue of managed care, health insurance coverage, and length of stay is missing for American students. These are primary issues that dictate much of the healthcare system in the US and a primary job function of social workers, nurse case managers, and medical professionals in general. However, instructors that wish to add this to the case studies could do so easily.

The focus of this text is on healthcare communication which makes it less likely to become obsolete. Much of the clinical information is stable healthcare practice that has been standard of care for quite some time. Nevertheless, given the nature of text, updates would be easy to make. Hyperlinks should be updated to the most relevant and trustworthy sources and checked frequently for effectiveness.

The spacing that was used to note change of speaker made for ease of reading. Although unembellished and plain, I expect students to find this format easy to digest and interesting, especially since the script is appropriately balanced with ‘human’ qualities like the current TV shows and songs, the use of humor, and nonverbal cues.

A welcome characteristic of this text is its consistency. Each case is presented in a similar fashion and the roles of the healthcare team are ‘played’ by the same character in each of the scenarios. This allows students to see how healthcare providers prioritize cases and juggle the needs of multiple patients at once. Across scenarios, there was inconsistency in when clinical terms were hyperlinked.

The text is easily divisible into smaller reading sections. However, since the nature of the text is script-narrative format, if significant reorganization occurs, one will need to make sure that the communication of the script still makes sense.

The text is straightforward and presented in a consistent fashion: learning objectives, case history, a script of what happened before the patient enters the healthcare setting, and a script of what happens once the patient arrives at the healthcare setting. The authors use the term, “ideal interactions,” and I would agree that these cases are in large part, ‘best case scenarios.’ Due to this, the case studies are well organized, clear, logical, and predictable. However, depending on the level of student, instructors may want to introduce complications that are typical in the hospital setting.

The interface is pleasing and straightforward. With the exception of the case summary and learning objectives, the cases are in narrative, script format. Each case study supplies a photo of the ‘patient’ and one of the case studies includes a link to a 3-minute video that introduces the reader to the patient/case. One of the highlights of this text is the use of hyperlinks to various clinical practices (ABG, vital signs, transfer of patient). Unfortunately, a majority of the links are broken. However, since this is an open text, instructors can update the links to their preference.

Although not free from grammatical errors, those that were noticed were minimal and did not detract from reading.

Cultural Relevance rating: 4

Cultural diversity is visible throughout the patients used in the case studies and includes factors such as age, race, socioeconomic status, family dynamics, and sexual orientation. A moderate level of diversity is noted in the healthcare team with some stereotypes: social workers being female, doctors primarily male.

As a social work instructor, I was grateful to find a text that incorporates this important healthcare role. I would have liked to have seen more content related to advance directives, mediating decision making between the patient and care team, emotional and practical support related to initial diagnosis and discharge planning, and provision of support to colleagues, all typical roles of a medical social worker. I also found it interesting that even though social work was included in multiple scenarios, the role was only introduced on the learning objectives page for the oncology case.

Reviewed by Crystal Wynn, Associate Professor, Virginia State University on 7/21/21

The text covers a variety of chronic diseases within the cases; however, not all of the common disease states were included within the text. More chronic diseases need to be included such as diabetes, cancer, and renal failure. Not all allied health care team members are represented within the case study. Key terms appear throughout the case study textbook and readers are able to click on a hyperlink which directs them to the definition and an explanation of the key term.

Content is accurate, error-free and unbiased.

The content is up-to-date, but not in a way that will quickly make the text obsolete. The text is written and arranged in such a way that necessary updates will be relatively easy and straightforward to implement.

The text is written in lucid, accessible prose, and provides adequate context for any jargon/technical terminology used.

The text is internally consistent in terms of terminology and framework.

The text is easily and readily divisible into smaller reading sections that can be assigned at different points within the course. Each case can be divided into a chronic disease state unit, which will allow the reader to focus on one section at a time.

Organization/Structure/Flow rating: 3

The topics in the text are presented in a logical manner. Each case uses an excessive amount of language to describe the case. The cases in this text read more like a novel than a clinical textbook. The learning objectives listed within each case should be in the form of questions or activities that could be provided as resources for instructors and teachers.

Interface rating: 3

There are several hyperlinks embedded within the textbook that are not functional.

The text contains no grammatical errors.

Cultural Relevance rating: 3

The text is not culturally insensitive or offensive in any way. More examples of cultural inclusiveness are needed throughout the textbook. The cases should be indicative of individuals from a variety of races and ethnicities.

Reviewed by Rebecca Hillary, Biology Instructor, Portland Community College on 6/15/21

This textbook consists of a collection of clinical case studies that can be applicable to a wide range of learning environments from supplementing an undergraduate Anatomy and Physiology Course, to including as part of a Medical or other health care program. I read the textbook in E-reader format and this includes hyperlinks that bring the students to subsequent clinical study if the book is being used in a clinical classroom. This book is significantly more comprehensive in its approach from other case studies I have read because it provides a bird’s eye view of the many clinicians, technicians, and hospital staff working with one patient. The book also provides real time measurements for patients that change as they travel throughout the hospital until time of discharge.

Each case gave an accurate sense of the chaos that would be present in an emergency situation and showed how the conditions affect the practitioners as well as the patients. The reader gets an accurate big picture--a feel for each practitioner’s point of view as well as the point of view of the patient and the patient’s family as the clock ticks down and the patients are subjected to a number of procedures. The clinical information contained in this textbook is all in hyperlinks containing references to clinical skills open text sources or medical websites. I did find one broken link on an external medical resource.

The diseases presented are relevant and will remain so. Some of the links are directly related to the Canadian Medical system so they may not be applicable to those living in other regions. Clinical links may change over time but the text itself will remain relevant.

Each case study clearly presents clinical data as it is recorded in real time.

Each case study provides the point of view of several practitioners and the patient over several days. While each of the case studies covers different pathology they all follow this same format, several points of view and data points, over a number of days.

The case studies are divided by days and this was easy to navigate as a reader. It would be easy to assign one case study per body system in an Anatomy and Physiology course, or to divide them up into small segments for small in class teaching moments.

The topics are presented in an organized way showing clinical data over time and each case presents a large number of view points. For example, in the first case study, the patient is experiencing difficulty breathing. We follow her through several days from her entrance to the emergency room. We meet her X Ray Technicians, Doctor, Nurses, Medical Assistant, Porter, Physiotherapist, Respiratory therapist, and the Lab Technicians running her tests during her stay. Each practitioner paints the overall clinical picture to the reader.

I found the text easy to navigate. There were not any figures included in the text, only clinical data organized in charts. The figures were all accessible via hyperlink. Some figures within the textbook illustrating patient scans could have been helpful but I did not have trouble navigating the links to visualize the scans.

I did not see any grammatical errors in the text.

The patients in the text are a variety of ages and have a variety of family arrangements but there is not much diversity among the patients. Our seven patients in the eight case studies are mostly white and all cis gendered.

Some of the case studies, for example the heart failure study, show clinical data before and after drug treatments so the students can get a feel for mechanism in physiological action. I also liked that the case studies included diet and lifestyle advice for the patients rather than solely emphasizing these pharmacological interventions. Overall, I enjoyed reading through these case studies and I plan to utilize them in my Anatomy and Physiology courses.

Reviewed by Richard Tarpey, Assistant Professor, Middle Tennessee State University on 5/11/21

As a case study book, there is no index or glossary. However, medical and technical terms provide a useful link to definitions and explanations that will prove useful to students unfamiliar with the terms. The information provided is appropriate for entry-level health care students. The book includes important health problems, but I would like to see coverage of at least one more chronic/lifestyle issue such as diabetes. The book covers adult issues only.

Content is accurate and without bias.

The content of the book is relevant and up-to-date. It addresses conditions that are prevalent in today's population among adults. There are no pediatric cases, but this does not significantly detract from the usefulness of the text. The format of the book lends to easy updating of data or information.

The book is written with clarity and is easy to read. The writing style is accessible and technical terminology is explained with links to more information.

Consistency is present. Lack of consistency is typically a problem with case study texts, but this book is consistent with presentation, format, and terminology throughout each of the eight cases.

The book has high modularity. Each of the case studies can be used independently from the others providing flexibility. Additionally, each case study can be partitioned for specific learning objectives based on the learning objectives of the course or module.

The book is well organized, presenting students conceptually with differing patient flow patterns through a hospital. The patient information provided at the beginning of each case is a wonderful mechanism for providing personal context for the students as they consider the issues. Many case studies focus on the problem and the organization without students getting a patient's perspective. The patient perspective is well represented in these cases.

The navigation through the cases is good. There are some terminology and procedure hyperlinks within the cases that do not work when accessed. This is troubling if you intend to use the text for entry-level health care students since many of these links are critical for a full understanding of the case.

There are some non-US variants of spelling and a few grammatical errors, but these do not detract from the content of the messages of each case.

The book is inclusive of differing backgrounds and perspectives. No insensitive or offensive references were found.

I like this text for its application flexibility. The book is useful for non-clinical healthcare management students to introduce various healthcare-related concepts and terminology. The content is also helpful for the identification of healthcare administration managerial issues for students to consider. The book has many applications.

Reviewed by Paula Baldwin, Associate Professor/Communication Studies, Western Oregon University on 5/10/21

The different case studies fall on a range, from crisis care to chronic illness care.

The contents seem to be written as they occurred, to represent the most complete picture of each medical event's occurrence.

These case studies are from the Canadian medical system, but that does not interfere with their applicability.

It is written for a medical audience, so the terminology is mostly formal and technical.

Some cases are shorter than others and some go in more depth, but it is not problematic.

The eight separate case studies are the perfect size for a class in the quarter system. You could combine this with other texts, videos or learning modalities, or use it alone.

As this is a case studies book, there is not a need for a logical progression in presentation of topics.

No problems in terms of interface.

I have not seen any grammatical errors.

I did not see anything that was culturally insensitive.

I used this in a Health Communication class and it has been extraordinarily successful. My students are analyzing the messaging for the good, the bad, and the questionable. The case studies are widely varied and give the class insights into hospital experiences, both front and back stage, that they would not normally be able to examine. I believe that because it is based on real-life medical incidents, my students are finding the material highly engaging.

Reviewed by Marlena Isaac, Instructor, Aiken Technical College on 4/23/21

This text is great for walking through patient care with entry-level healthcare students. The students are able to take in the information, digest it, and then provide suggestions for how they would facilitate patient healing. Then, when they are faced with a situation in clinical, they are not surprised and know how to move through it effectively.

The case studies provided accurate information that relates to the named disease.

It is relevant to health care studies and the development of critical thinking.

Cases are straightforward with great clinical information.

Clinical information is provided concisely.

Appropriate for clinical case study.

Presented to facilitate information gathering.

Takes a while to navigate in the browser.

Cultural Relevance rating: 1

Text lacks adequate representation of minorities.

Reviewed by Kim Garcia, Lecturer III, University of Texas Rio Grande Valley on 11/16/20

The book has 8 case studies, so obviously does not cover the whole of medicine, but the cases provided are descriptive and well developed. Cases are presented at different levels of difficulty, making the cases appropriate for students at different levels of clinical knowledge. The human element of both patient and health care provider is well captured. The cases are presented with a focus on interprofessional interaction and collaboration, more so than teaching medical content.

Content is accurate and unbiased. No errors noted. Most diagnostic and treatment information is general so it will remain relevant over time. The content of these cases is more appropriate for teaching interprofessional collaboration and less so for teaching the medical care for each diagnosis.

The content is relevant to a variety of different types of health care providers (nurses, radiologic technicians, medical laboratory personnel, etc.) and, due to the general nature of the cases, will remain relevant over time.

Easy to read. Clear headings are provided for sections of each case study and these section headings clearly tell when time has passed or setting has changed. Enough description is provided to help set the scene for each part of the case. Much of the text is written in the form of dialogue involving patient, family and health care providers, making it easy to adapt for role play. Medical jargon is limited and links for medical terms are provided to other resources that expound on medical terms used.

The text is consistent in structure of each case. Learning objectives are provided. Cases generally start with the patient at home and move with the patient through admission, testing and treatment, using a variety of healthcare services and encountering a variety of personnel.

The text is modular. Cases could be used individually within a unit on the given disease process or relevant sections of a case could be used to illustrate a specific point. The appendix is helpful in locating content specific to a certain diagnosis or a certain type of health care provider.

Each case follows a patient in a logical, chronologic fashion. A clear table of contents and appendix are provided which allows the user to quickly locate desired content. It would be helpful if the items in the table of contents and appendix were linked to the corresponding section of the text.

The hyperlinks to content outside this book work; however, using the back arrow on your browser returns you to the front page of the book instead of to the point at which you left the text. I would prefer it if the hyperlinks opened in a new window or tab so closing that window or tab would leave you back where you left the text.

No grammatical errors were noted.

The text is culturally inclusive and appropriate. Characters, both patients and caregivers, are of a variety of races, ethnicities, ages and backgrounds.

I enjoyed reading the cases and reviewing this text. I can think of several ways in which I will use this content.

Reviewed by Raihan Khan, Instructor/Assistant Professor, James Madison University on 11/3/20

The book covers several important health issues; however, it is still missing some chronic health issues that students should learn about before they join the workforce, such as diabetes-related health issues suffered by patients.

The health information contained in the textbook is mostly accurate.

I think the book is written with a focus on the current culture and the health issues faced by the patients. To keep the book relevant in the future, the contexts, especially the culture, lifestyle, and health care modalities, would need to be updated regularly.

The language is pretty simple, clear, and easy to read.

There is no complaint about consistency. One of the main issues of writing a book, consistency, was well managed by the authors.

The book is easy to explore because of its simple setup. Students can browse to the specific section that they want to read without much hassle in finding the correct information.

The organization is simple but effective. The authors organized the book based on what can happen in a patient's life and what possible scenarios students should learn about the disease. From that perspective, the book does a good job.

The interface is easy and simple to navigate. Some links to external sources might need to be updated regularly since those links are subject to change that is beyond the authors' control. It's frustrating for the reader when the external link shows no information.

The book is free of any major language and grammatical errors.

The book might do a little better in cultural competency. For example, the last name Singh is mainly used by Sikh people, yet in the text Harj and Priya Singh are Muslim. The authors could consult colleagues who are more familiar with those cultures and revise some cultural aspects of the cases mentioned in the book.

The book is a nice addition to the open textbook world. Hope to see more health issues covered by the book.

Reviewed by Ryan Sheryl, Assistant Professor, California State University, Dominguez Hills on 7/16/20

This text contains 8 medical case studies that reflect best practices at the time of publication. The text identifies 5 overarching learning objectives: interprofessional collaboration, client centered care, evidence-based practice, quality improvement, and informatics. While the case studies do not cover all medical conditions or bodily systems, the book is thorough in conveying details of various patients and medical team members in a hospital environment. Rather than an index or glossary at the end of the text, it contains links to outside websites for more information on medical tests and terms referenced in the cases.

The content provided is reflective of best practices in patient care, interdisciplinary collaboration, and communication at the time of publication. It is specifically accurate for the context of hospitals in Canada. The links provided throughout the text have the potential to supplement the cases with up-to-date descriptions and definitions; however, many of them are broken (see notes in Interface section).

The content of the case studies reflects the increasingly complex landscape of healthcare, including a variety of conditions, ages, and personal situations of the clients and care providers. The text will require frequent updating due to the rapidly changing landscape of society and best practices in client care. For example, a future version may include inclusive practices with transgender clients, or address ways medical racism implicitly impacts client care (see notes in Cultural Relevance section).

The text is written clearly and presents thorough, realistic details about working and being treated in an acute hospital context.

The text is very straightforward. It is consistent in its structure and flow. It uses consistent terminology and follows a structured framework throughout.

Being a series of 8 separate case studies, this text is easily and readily divisible into smaller sections. The text was designed to be taken apart and used piece by piece in order to serve various learning contexts. The parts of each case study can also be used independently of each other to facilitate problem solving.

The topics in the case studies are presented clearly. The structure of each of the case studies proceeds in a similar fashion. All of the cases are set within the same hospital so the hospital personnel and service providers reappear across the cases, giving a textured portrayal of the experiences of the various service providers. The cases can be used individually, or one service provider can be studied across the various studies.

The text is very straightforward, without complex charts or images that could become distorted. Many of the embedded links are broken and require updating. The links that do work are a very useful way to define and expand upon medical terms used in the case studies.

Grammatical errors are minimal and do not distract from the flow of the text. In one instance the last name Singh is spelled Sing, and one patient named Fred in the text is referred to as Frank in the appendix.

The cases all show examples of health care personnel providing compassionate, client-centered care, and there is no overt discrimination portrayed. Two of the clients are in same-sex marriages and these are shown positively. It is notable, however, that the two cases presenting people of color contain more negative characteristics than the other six cases portraying Caucasian people. The people of color are the only two examples of clients who smoke regularly. In addition, the Indian client drinks and is overweight, while the First Nations client is the only one in the text to have a terminal diagnosis. The Indian client is identified as being Punjabi and attending a mosque, although there are only 2% Muslims in the Punjab province of India. Also, the last name Singh generally indicates a person who is a Hindu or Sikh, not Muslim.

Reviewed by Monica LeJeune, RN Instructor, LSUE on 4/24/20

Has comprehensive unfolding case studies that guide the reader to recognize and manage the scenario presented. Assists in critical thinking process.

Accurately presents health scenarios with real life assessment techniques and patient outcomes.

Relevant to nursing practice.

Clearly written and easily understood.

Consistent with healthcare terminology and framework.

Has a good reading flow.

Topics presented in logical fashion.

Easy to read.

No grammatical errors noted.

Text is not culturally insensitive or offensive.

Good book to have to teach nursing students.

Reviewed by April Jarrell, Associate Professor, J. Sargeant Reynolds Community College on 1/7/20

The text is a great case study tool that is appropriate for nursing school instructors to use in aiding students to learn the nursing process.

The content is accurate and evidence based. There is no bias noted.

The content in the text is relevant and up to date for nursing students. It will be easy to update content as needed because the framework allows for additions to the content.

The text is clear and easy to understand.

Framework and terminology are consistent throughout the text; each case study is continuous and takes the student on a journey with the patient. Great for learning!

The case studies can be easily divided into smaller sections to allow for discussions and weekly studies.

The text and content progress in a logical, clear fashion allowing for progression of learning.

No interface issues noted with this text.

No grammatical errors noted in the text.

No racial or cultural insensitivity was noted in the text.

I would recommend this text be used in nursing schools. The use of case studies is helpful for students to learn and practice the nursing process.

Reviewed by Lisa Underwood, Practical Nursing Instructor, NTCC on 12/3/19

The text provides eight comprehensive case studies that showcase the different viewpoints of the many roles involved in patient care. It encompasses the most commonly seen diagnoses across healthcare today. Each case study comes with its own set of learning objectives that can be tweaked to fit several allied health courses. Although the case studies are designed around the Canadian Healthcare System, they are quite easily adaptable to fit most any modern, developed healthcare system.

Content Accuracy rating: 3

Overall, the text is quite accurate. There is one significant error that needs to be addressed. It is located in the DVT case study. In the study, a popliteal artery clot is mislabeled as a DVT. DVTs are located in veins, not in arteries. That said, the case study on the whole is quite good. This case study could be used as a learning tool in the classroom for discussion purposes or as a way to test student understanding of DVTs. One example might be, "Can they spot the error?"

At this time, all of the case studies within the text are current. Healthcare is an ever evolving field that rests on the best evidence based practice. Keeping that in mind, educators can easily adapt the studies as the newest evidence emerges and changes practice in healthcare.

All of the case studies are well written and easy to understand. The text includes several hyperlinks and it also highlights certain medical terminology to prompt readers as a way to enhance their learning experience.

Across the text, the language, style, and format of the case studies are completely consistent.

The text is divided into eight separate case studies. Each case study may be used independently of the others. All case studies are further broken down as the focus patient passes through each aspect of their healthcare system. The text's modularity makes it possible to use a case study as individual work, group projects, class discussions, homework or in a simulation lab.

The case studies and the diagnoses that they cover are presented in such a way that educators and allied health students can easily follow and comprehend.

The book in itself is free of any image distortion and it prints nicely. The text is offered in a variety of digital formats. As noted in the above reviews, some of the hyperlinks have navigational issues. When the reader attempts to access them, a "page not found" message is received.

There were minimal grammatical errors, some of which may be traced back to the differences in our spelling.

The text is culturally relevant in that it includes patients from many different backgrounds and ethnicities. This allows educators and students to explore cultural relevance and sensitivity needs across all areas in healthcare. I do not believe that the text was in any way insensitive or offensive to the reader.

By using the case studies, it may be possible to have an open dialogue about the differences noted in healthcare systems. Students will have the ability to compare and contrast the Canadian healthcare system with their own. I also firmly believe that by using these case studies, students can improve their critical thinking skills. These case studies help them to "put it all together".

Reviewed by Melanie McGrath, Associate Professor, TRAILS on 11/29/19

The text covered some of the most common conditions seen by healthcare providers in a hospital setting, which forms a solid general base for the discussions based on each case.

I saw no areas of inaccuracy.

As in all healthcare texts, treatments and/or tests will change frequently. However, everything is currently up to date, so it should be a good reference for several years.

Each case is written so that any level of healthcare student would understand. Hyperlinks in the text are also very helpful.

All of the cases are written in a similar fashion.

Although not structured as a typical text, each case is easily assigned as a stand-alone.

Each case is organized clearly in an appropriate manner.

I did not see any issues.

I did not see any grammatical errors.

The text seemed appropriately inclusive. There are no pediatric cases and no cases of intellectually-impaired patients, but those types of cases introduce more advanced problem-solving, which perhaps exceeds the scope of the text. They may be a good addition to the text.

I found this text to be an excellent resource for healthcare students in a variety of fields. It would be best utilized in interprofessional courses to help guide discussion.

Reviewed by Lynne Umbarger, Clinical Assistant Professor, Occupational Therapy, Emory and Henry College on 11/26/19

While the book does not cover every scenario, the ones in the book are quite common and troublesome for inexperienced allied health students. The information in the book is thorough enough, and I have found the cases easy to modify for educational purposes. The material was easily understood by the students but challenging enough for classroom discussion. There are no mentions in the book about occupational therapy, but it is easy enough to add a couple words and make inclusion simple.

Very nice lab values are provided in the case study, making it more realistic for students.

These case studies focus on commonly encountered diagnoses for allied health and nursing students. They are comprehensive, realistic, and easily understood. The only difference is that the hospital in one case allows the patient's dog to visit in the room (highly unusual in US hospitals).

The material is easily understood by allied health students. The cases have links to additional learning materials for concepts that may be less familiar or should be explored further in a particular health field.

The language used in the book is consistent between cases. The framework is the same with each case which makes it easier to locate areas that would be of interest to a particular allied health profession.

The case studies are comprehensive but well-organized. They are short enough to be useful for class discussion or a full-blown assignment. The students seem to understand the material and have not expressed that any concepts or details were missing.

Each case is set up like the other cases. There are learning objectives at the beginning of each case to facilitate using the case, and it is easy enough to pull out material to develop useful activities and assignments.

There is a quick chart in the Appendix to allow the reader to determine the professions involved in each case as well as the pertinent settings and diagnoses for each case study. The contents are easy to access even while reading the book.

As a person who attends carefully to grammar, I found no errors in all of the material I read in this book.

There are a greater number of people of different ethnicities, socioeconomic status, ages, and genders to make this a very useful book. With each case, I could easily picture the person in the case. This book appears to be Canadian and more inclusive than most American books.

I was able to use this book the first time I accessed it to develop a classroom activity for first-year occupational therapy students and a more comprehensive activity for second-year students. I really appreciate the links to a multitude of terminology and medical lab values/issues for each case. I will keep using this book.

Reviewed by Cindy Krentz, Assistant Professor, Metropolitan State University of Denver on 6/15/19

The book covers eight case studies of common inpatient or emergency department scenarios. I appreciated that they had written out the learning objectives. I liked that the patient was described before the case was started, giving some understanding of the patient's background. I think it could benefit from having a glossary. I liked how the authors included the vital signs in an easily readable bar. I would have liked to see the labs also highlighted like this. I also felt that it would have been good if it were written as a 'what would you do next?' type of case study.

The book is very accurate in its language, in the tests that would be prudent to run, and in its portrayal of a day in the life of the hospital in all cases. One inaccuracy is that the authors called a popliteal artery clot a DVT. The rest of the DVT case study was great, but the one mistake should be corrected.

The book is up to date for now, but as tests become obsolete and new equipment is routinely used, the book (like any other health textbook) will need to be updated. It would be easy to change, however. All that would have to happen is that the authors go in and change out the test to whatever newer, evidence-based test is being utilized.

The text is written clearly and is easy to understand from a student's perspective. There is not too much technical jargon, and what is used is pretty universal, for example DVT for Deep Vein Thrombosis.

The book is consistent in language and how it is broken down into case studies. The same format is used for highlighting vital signs throughout the different case studies. It's great that the reader does not have to read the book in a linear fashion. Each case study can be read without needing to read the others.

The text is broken down into eight case studies, and each case study is broken down into days. It is consistent and shows how the patient can pass through the different hospital departments (from the ER to the unit, to surgery, to home) in a realistic manner. The instructor could use one or more of the case studies as (s)he sees fit.

The topics are eight different case studies, and they are presented very clearly and organized well. Each one is broken down into how the patient goes through the system. The text is easy to follow and logical.

The interface has some problems with the highlighted blue links. Some of them did not work and I got a 'page not found' message. That can be frustrating for the reader. I'm wondering if a glossary could be utilized (instead of the links) to explain what some of these links are supposed to explain.

I found two or three typos; I don't think they were grammatical errors. In one case I think the Canadian spelling and the United States spelling of the word are just different.

This is a very culturally competent book. In today's world, however, one more type of background that would merit delving into is the transgender, GLBTQI person. I was glad that there were no stereotypes.

I enjoyed reading the text. It was interesting and relevant to today's nursing student. Since we are becoming more interprofessional, I liked that we saw what the phlebotomist and other ancillary personnel (mostly different technicians) did. I think that it could become even more interdisciplinary, so colleges and universities could have more interprofessional education (courses or simulations), with the addition of the nurse using social work, nutrition, or other professional health care majors.

Reviewed by Catherine J. Grott, Interim Director, Health Administration Program, TRAILS on 5/5/19

The book is comprehensive but is specifically written for healthcare workers practicing in Canada. The title of the book should reflect this.

The book is accurate; however, it has numerous broken online links.

Relevance/Longevity rating: 3

The content is very relevant, but some links are outdated. For example, WHO Guidelines for Safe Surgery 2009 (p. 186) should be updated.

The book is written in clear and concise language. The side stories about the healthcare workers make the text interesting.

The book is consistent in terms of terminology and framework. Some terms that are emphasized in one case study are not emphasized (with online links) in the other case studies. All of the case studies should have the same words linked to online definitions.

Modularity rating: 3

The book can easily be parsed out if necessary. However, the way the case studies have been written, it's evident that different authors contributed singularly to each case study.

The organization and flow are good.

Interface rating: 1

There are numerous broken online links and "pages not found."

The grammar and punctuation are correct. There are two errors detected: p. 120 a space between the word "heart" and the comma; also a period is needed after Dr (p. 113).

I'm not quite sure that the social worker (p. 119) should comment that the patient and partner are "very normal people."

There are roughly 25 broken online links or "pages not found." The BC & Canadian Guidelines (p. 198) could also include a link to US guidelines to make the text more universal. The basilar crackles (p. 166) is very good. The text could be used to compare US and Canadian healthcare. The text could be enhanced to teach "soft skills" and interdepartmental communication skills in healthcare.

Reviewed by Lindsey Henry, Practical Nursing Instructor, Fletcher on 5/1/19

I really appreciated how in the introduction, five learning objectives were identified for students. These objectives are paramount in nursing care and they are each spelled out for the learner. Each case study also has its own learning objectives, which were effectively met in the readings.

As a seasoned nurse, I believe that the content regarding pathophysiology and treatments used in the case studies was accurate. I really appreciated how many of the treatments were also explained and rationales were given, which can be very helpful to facilitate effective learning for a nursing student or novice nurse.

The case studies are up to date and correlate with the current time period. They are easily understood.

I really loved how several important medical terms, including specific treatments, were highlighted to alert the reader. Many interventions performed were also explained further, which is great to enhance learning for the nursing student or novice nurse. Also, with each scenario, a background and history of the patient is depicted, as well as the perspectives of the patient, the patient's family member, and the primary nurse. This really helps to give the reader a full picture of the day in the life of a nurse or a patient, and also better facilitates the learning process of the reader.

These case studies are consistent. They begin with report, the patient background or updates on subsequent days, and follow the patients all the way through discharge. Once again, I really appreciate how this book describes most if not all aspects of patient care on a day-to-day basis.

Each case study is separated into days. While they can be divided to be assigned at different points within the course, they also build on each other. They show trends in vital signs, what happens when a patient deteriorates, and what happens when they get better and go home. Showing the entire process from ER admit to discharge is really helpful to enhance the student's learning experience.

The topics are all presented very similarly and very clearly. The way that the scenarios are explained could even be understood by a non-nursing student as well. The case studies are very clear and very thorough.

The book is very easy to navigate, prints well on paper, and is not distorted or confusing.

I did not see any grammatical errors.

Each case study involves a different type of patient. These differences include race, gender, sexual orientation and medical backgrounds. I do not feel the text was offensive to the reader.

I teach practical nursing students and after reading this book, I am looking forward to implementing it in my classroom. Great read for nursing students!

Reviewed by Leah Jolly, Instructor, Clinical Coordinator, Oregon Institute of Technology on 4/10/19

Good variety of cases and pathologies covered.

Content Accuracy rating: 2

Some examples and scenarios are not completely accurate. For example, in the DVT case, the sonographer found thrombus in the "popliteal artery", which according to the book indicated the presence of DVT. However, in DVT, thrombus is located in the vein, not the artery. The patient would also have very different symptoms if it were located in the artery. Perhaps some of these inaccuracies are just typos, but in real-life situations this simple mistake can make a world of difference in the patient's course of treatment and outcomes.

Good examples of interprofessional collaboration. If only it worked this way on an everyday basis!

Clear and easy to read for those with knowledge of medical terminology.

Good consistency overall.

Broken up well.

Topics are clear and logical.

Would be nice to simply click through to the next page, rather than going through the table of contents each time.

Minor typos/grammatical errors.

No offensive or insensitive materials observed.

Reviewed by Alex Sargsyan, Doctor of Nursing Practice/Assistant Professor, East Tennessee State University on 10/8/18

Because of its case study format, the book does not have an index or glossary. However, it has a summary for each health case study outlining the key elements discussed.

Overall, the book accurately depicts the clinical environment. There are numerous references to external sites. While most of them work, some do not. For example, the Homan’s test link returns a "404 error."

Book is relevant in its current version and can be used in undergraduate and graduate classes. That said, the longevity of the book may be limited because of the character of the clinical education. Clinical guidelines change constantly and it may require a major update of the content.

Cases are written very clearly and have realistic description of an inpatient setting.

The book is easy to read and consistent in the language in all eight cases.

The cases are very well written. Each case is subdivided into logical segments. The segments reflect the different settings where the patient is being seen. There is a flow and transition between the settings.

Book has eight distinct cases. This is a great format for a book that presents distinct clinical issues. This will allow the students to have immersive experiences and gain better understanding of the healthcare environment.

Book is offered in many different formats. Besides the issues with the links mentioned above, overall navigation of the book content is very smooth.

Book is very well written and has no grammatical errors.

Book is culturally relevant. Patients in the case studies come from different cultures and represent diverse ethnicities.

Reviewed by Justin Berry, Physical Therapist Assistant Program Director, Northland Community and Technical College, East Grand Forks, MN on 8/2/18

This text provides eight patient case studies from a variety of diagnoses, which can be utilized by healthcare students from multiple disciplines. The cases are comprehensive and can be helpful for students to determine professional roles, interprofessional roles, when to initiate communication with other healthcare practitioners due to a change in patient status, and treatment ideas. Some additional patient information, such as lab values, would have been beneficial to include.

Case study information is accurate and unbiased.

Content is up to date. The case studies are written in a way so that they will not be obsolete soon, even with changes in healthcare.

The case studies are well written, and can be utilized for a variety of classroom assignments, discussions, and projects. Some additional lab value information for each patient would have been a nice addition.

The case studies are consistently organized to make it easy for the reader to determine the framework.

The text is broken up into eight different case studies for various patient diagnoses. This design makes it highly modular, and would be easy to assign at different points of a course.

The topics are presented consistently and in a logical manner. Each case study follows a patient chronologically, making it easy to determine changes in patient status and treatment options.

The text is free of interface issues, with no distortion of images or charts.

The text is not culturally insensitive or offensive in any way. Patients are represented from a variety of races, ethnicities, and backgrounds.

This book would be a good addition for many different health programs.

Reviewed by Ann Bell-Pfeifer, Instructor/Program Director, Minnesota State Community and Technical College on 5/21/18

The book gives a comprehensive overview of many types of cases for patient conditions. Emergency Room patients may arrive with COPD, heart failure, sepsis, pneumonia, or as motor vehicle accident victims. It is directed towards nurses, medical laboratory technologists, medical radiology technologists, and respiratory therapists and their roles in caring for patients. Most of the overview is accurate. One suggestion is to provide an embedded radiologist interpretation of the exams performed that lead to the patient's diagnosis.

Overall the book is accurate. Would like to see updates related to the addition of direct radiography technology which is commonly used in the hospital setting.

Many aspects of medicine will remain constant. The case studies seem fairly accurate and may be relevant for up to 3 years. Since technology changes so quickly in medicine, the CT and x-ray components may need minor updates within a few years.

The book clarity is excellent.

The case stories are consistent with each scenario. It is easy to follow the structure and learn from the content.

The book is quite modular. It is easy to break it up into cases and utilize them individually and sequentially.

The cases are listed by disease process and follow a logical flow through each condition. They are easy to follow as they have the same format from the beginning to the end of each case.

The interface seems seamless. Hyperlinks are inserted which provide descriptions and references to medical procedures and in depth definitions.

The book is free of most grammatical errors. There is a place where a few words do not fit the sentence structure and could be a typo.

The book included all types of relationships and ethnic backgrounds. One type which could be added is a transgender patient.

I think the book was quite useful for a variety of health care professionals. The authors did an excellent job of integrating patient cases which could be applied to the health care setting. The stories seemed real and relevant. This book could be used to teach health care professionals about integrated care within the emergency department.

Reviewed by Shelley Wolfe, Assistant Professor, Winona State University on 5/21/18

This text is comprised of comprehensive, detailed case studies that provide the reader with multiple character views throughout a patient’s encounter with the health care system. The Table of Contents accurately reflected the content. It should be noted that the authors include a statement that conveys that this text is not like traditional textbooks and is not meant to be read in a linear fashion. This allows the educator more flexibility to use the text as a supplement to enhance learning opportunities.

The content of the text appears accurate and unbiased. The “five overarching learning objectives” provide a clear aim of the text and the educator is able to glean how these objectives are captured into each of the case studies. While written for the Canadian healthcare system, this text is easily adaptable to the American healthcare system.

Overall, the content is up-to-date and the case studies provide a variety of uses that promote longevity of the text. However, not all of the blue font links (if using the digital PDF version) were still in working order. I encountered links that led to error pages or outdated “page not found” websites. While the links can be helpful, continued maintenance of these links could prove time-consuming.

I found the text easy to read and understand. I enjoyed that the viewpoints of all the different roles (patient, nurse, lab personnel, etc.) were articulated well and allowed the reader to connect and gain appreciation of the entire healthcare team. Medical jargon was noted to be appropriate for the intended audience of this text.

The terminology and organization of this text is consistent.

The text is divided into 8 case studies that follow a similar organizational structure. The case studies can further be divided to focus on individual learning objectives. For example, the case studies could be looked at as a whole for discussing communication or could be broken down into segments to focus on disease risk factors.

The case studies in this text follow a similar organizational structure and are consistent in their presentation. The flow of individual case studies is excellent and sets the reader on a clear path. As noted previously, this text is not meant to be read in a linear fashion.

This text is available in many different forms. I chose to review the text in the digital PDF version in order to use the embedded links. I did not encounter significant interface issues and did not find any images or features that would distract or confuse a reader.

No significant grammatical errors were noted.

The case studies in this text included patients and healthcare workers from a variety of backgrounds. Educators and students will benefit from expanding the case studies to include discussions and other learning opportunities to help develop culturally-sensitive healthcare providers.

I found the case studies to be very detailed, yet written in a way in which they could be used in various manners. The authors note a variety of ways in which the case studies could be employed with students; however, I feel the authors could also note that the case studies could be used as a basis for simulated clinical experiences. The case studies in this text would be an excellent tool for developing interprofessional communication and collaboration skills in a variety of healthcare students.

Reviewed by Darline Foltz, Assistant Professor, University of Cincinnati - Clermont College on 3/27/18

This book covers all areas listed in the Table of Contents. In addition to the detailed patient case studies, there is a helpful section of "How to Use this Resource". I would like to note that this resource "aligns with the open textbooks Clinical Procedures for Safer Patient Care and Anatomy and Physiology: OpenStax" as noted by the authors.

The book appears to be accurate. Although one of the learning outcomes is "Demonstrate an understanding of the Canadian healthcare delivery system," I did not find anything that is ONLY specific to the Canadian healthcare delivery system other than some of the terminology, e.g. "porter" instead of "transporter" and a few French words. I found that this makes the book more interesting for students rather than detracting from it. These are patient case studies that are relevant in any country.

The content is up-to-date. Changes in medical science may occur, e.g. a different test or treatment for a diagnosis that is included in one or more of the case studies; however, it would be easy and straightforward to implement these changes.

This book is written in lucid, accessible prose. The technical/medical terminology that is used is appropriate for medical and allied health professionals. Something that would improve this text would be to provide a glossary for the terms in blue font.

This book is consistent with current medical terminology.

This text is easily divided into each of the 6 case studies. The case studies can be used singly according to the body system being addressed or studied.

Because this text is a collection of case studies, flow doesn't pertain; however, the organization and structure of the case studies are excellent, as they are clear and easy to read.

There are no interface issues in this text that would distract or confuse the reader.

I did not identify any grammatical errors.

This text is not culturally insensitive or offensive in any way and uses patients and healthcare workers that are of a variety of races, ethnicities and backgrounds.

I believe that this text would not only be useful to students enrolled in healthcare professions involved in direct patient care but would also be useful to students in supporting healthcare disciplines such as health information technology and management, medical billing and coding, etc.

Table of Contents

  • Introduction

Case Study #1: Chronic Obstructive Pulmonary Disease (COPD)

  • Learning Objectives
  • Patient: Erin Johns
  • Emergency Room

Case Study #2: Pneumonia

  • Day 0: Emergency Room
  • Day 1: Emergency Room
  • Day 1: Medical Ward
  • Day 2: Medical Ward
  • Day 3: Medical Ward
  • Day 4: Medical Ward

Case Study #3: Unstable Angina (UA)

  • Patient: Harj Singh

Case Study #4: Heart Failure (HF)

  • Patient: Meryl Smith
  • In the Supermarket
  • Day 0: Medical Ward

Case Study #5: Motor Vehicle Collision (MVC)

  • Patient: Aaron Knoll
  • Crash Scene
  • Operating Room
  • Post Anaesthesia Care Unit (PACU)
  • Surgical Ward

Case Study #6: Sepsis

  • Patient: George Thomas
  • Sleepy Hollow Care Facility

Case Study #7: Colon Cancer

  • Patient: Fred Johnson
  • Two Months Ago
  • Pre-Surgery Admission

Case Study #8: Deep Vein Thrombosis (DVT)

  • Patient: Jamie Douglas

Appendix: Overview About the Authors

Ancillary Material

About the Book

Health Case Studies is composed of eight separate health case studies. Each case study includes the patient narrative or story that models the best practice (at the time of publishing) in healthcare settings. Associated with each case is a set of specific learning objectives to support learning and facilitate educational strategies and evaluation.

The case studies can be used online in a learning management system, in a classroom discussion, in a printed course pack or as part of a textbook created by the instructor. This flexibility is intentional and allows the educator to choose how best to convey the concepts presented in each case to the learner.

Because these case studies were primarily developed for an electronic healthcare system, they are based predominantly in an acute healthcare setting. Educators can augment each case study to include primary healthcare settings, outpatient clinics, assisted living environments, and other contexts as relevant.

About the Contributors

Glynda Rees teaches at the British Columbia Institute of Technology (BCIT) in Vancouver, British Columbia. She completed her MSN at the University of British Columbia with a focus on education and health informatics, and her BSN at the University of Cape Town in South Africa. Glynda has many years of national and international clinical experience in critical care units in South Africa, the UK, and the USA. Her teaching background has focused on clinical education, problem-based learning, clinical techniques, and pharmacology.

Glynda’s interests include the integration of health informatics in undergraduate education, open accessible education, and the impact of educational technologies on nursing students’ clinical judgment and decision making at the point of care to improve patient safety and quality of care.

Faculty member in the critical care nursing program at the British Columbia Institute of Technology (BCIT) since 2003, Rob has been a critical care nurse for over 25 years with 17 years practicing in a quaternary care intensive care unit. Rob is an experienced educator and supports student learning in the classroom, online, and in clinical areas. Rob’s Master of Education from Simon Fraser University is in educational technology and learning design. He is passionate about using technology to support learning for both faculty and students.

Part of Rob’s faculty position is dedicated to providing high fidelity simulation support for BCIT’s nursing specialties program along with championing innovative teaching and best practices for educational technology. He has championed the use of digital publishing and was the tech lead for Critical Care Nursing’s iPad Project which resulted in over 40 multi-touch interactive textbooks being created using Apple and other technologies.

Rob has successfully completed a number of specialist certifications in computer and network technologies. In 2015, he was awarded Apple Distinguished Educator for his innovation and passionate use of technology to support learning. In the past five years, he has presented and published abstracts on virtual simulation, high fidelity simulation, creating engaging classroom environments, and what the future holds for healthcare and education.

Janet Morrison is the Program Head of Occupational Health Nursing at the British Columbia Institute of Technology (BCIT) in Burnaby, British Columbia. She completed a PhD at Simon Fraser University, Faculty of Communication, Art and Technology, with a focus on health information technology. Her dissertation examined the effects of telehealth implementation in an occupational health nursing service. She has an MA in Adult Education from St. Francis Xavier University and an MA in Library and Information Studies from the University of British Columbia.

Janet’s research interests concern the intended and unintended impacts of health information technologies on healthcare students, faculty, and the healthcare workforce.

She is currently working with BCIT colleagues to study how an educational clinical information system can foster healthcare students’ perceptions of interprofessional roles.

Case Study 2: Visualizing Global Health Data

Learning Objectives

  • show mastery of data wrangling techniques, including combining data across different sources and formats, data harmonization, and creation of new variables
  • create effective visual displays of complex information using R and Tableau
  • create interactive visual presentations using Tableau
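
The wrangling steps named in the first objective (combining sources, harmonizing identifiers, and creating a new variable) can be sketched in a few lines. The course itself works in R and Tableau; the sketch below uses Python only to illustrate the idea. The function name, field names, country codes, and all values are hypothetical placeholders, not taken from the tutorial data files.

```python
# Minimal sketch: harmonize country codes across two sources, merge them,
# and derive a new variable. All names and values are illustrative assumptions.

def harmonize_and_merge(pop_rows, mortality_rows, code_map):
    """Join mortality rows (keyed by a source-specific code) to population
    rows (keyed by a common code) using a code-mapping table."""
    population_by_code = {row["code"]: row["population"] for row in pop_rows}
    merged = []
    for row in mortality_rows:
        common_code = code_map.get(row["source_code"])
        # Skip rows that cannot be matched in both sources.
        if common_code is None or common_code not in population_by_code:
            continue
        population = population_by_code[common_code]
        merged.append({
            "code": common_code,
            "population": population,
            "infant_mortality": row["infant_mortality"],
            # New derived variable: population expressed in millions.
            "population_millions": population / 1_000_000,
        })
    return merged

# Hypothetical inputs standing in for two sources and a code map.
pop_rows = [{"code": "AAA", "population": 2_500_000},
            {"code": "BBB", "population": 40_000_000}]
mortality_rows = [{"source_code": "aa", "infant_mortality": 12.5},
                  {"source_code": "zz", "infant_mortality": 30.0}]
code_map = {"aa": "AAA", "bb": "BBB"}

merged = harmonize_and_merge(pop_rows, mortality_rows, code_map)
print(merged)
```

Dropping unmatched rows is only one possible design choice; keeping them with missing values is equally reasonable when the gaps themselves are of interest.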

Case Study Goals

Using data from the World Bank, CIA Factbook, or a complex data source of your own choosing,

  • create a Shiny app in R to illustrate an aspect of our world in data
  • create an accompanying Tableau dashboard interactive visual presentation of our world in data

R Tutorial Data

  • popDF.RData
  • infMortDF.RData
  • codeMapDF.RData
  • country_codes.csv
  • allCtryData.RData

Full People and Society Data Set: full.RData

How to Obtain Data from Factbook: Factbook Tutorial

  • Final report: the final report will consist of a Shiny app and Tableau dashboard and will be presented to the class as part of a short (5 minute) oral presentation. Data sources must be clearly referenced, and a rationale for the data chosen to tell a story must be provided (one page maximum of written material)

Chapter 11 of Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving by Nolan and Temple Lang (free e-book through Duke Libraries)

Shiny Tutorials

Super simple Shiny app you can edit

Tableau for Students

Introduction to Tableau Video

Initial description of CIA Factbook data and plotting in R

Obtaining Tableau Guide

Tableau visualization of infant mortality

Tableau dashboard visualization of infant mortality

June 2020 to December 2021

Lead research institution: Hiroshima University (Japan); Johns Hopkins University (USA)

Other participating research institutions: Kibi International University (Japan); University of Occupational and Environmental Health (Japan); WHO Emergency Medical Team; Disaster Medical Assistance Team Secretariat (Japan); Ministry of Health, Mozambique; Nippon Medical School (Japan); Hyogo Emergency Medical Center (Japan)

Principal investigators: Tatsuhiko Kubo (Hiroshima University); Edbert Hsu (Johns Hopkins University)

The Sendai Framework on Disaster Risk Reduction 2015-2030 highlighted the health imperative in disaster risk management and the importance of scientific evidence for Health Emergency and Disaster Risk Management (Health EDRM). Reliable health data before, during and after emergencies and disasters are essential for evidence-based policies and programmes.

Recently two standardised health data collection tools were created and tested. These are the WHO Emergency Medical Team (EMT) Minimum Data Set (MDS) (global) and the Japan Surveillance in Post Extreme Emergencies and Disasters (J-SPEED) (for use in Japan). In this project, case studies were conducted on the application of the WHO EMT MDS and J-SPEED. A scoping review was also conducted to synthesise existing knowledge on evidence-gaps, as well as the various facilitators and barriers to using these types of standardised tools.

  • To understand the evidence gaps about health data management before, during and after emergencies and disasters.
  • To identify facilitators and barriers for successful implementation of standardised health data collection systems in the context of disasters and public health emergencies in different settings.
  • To highlight internationally accepted, standardised tools or methods for setting up essential public health data for disaster response, and to demonstrate their potential uses in research, epidemiology, and services planning.
  • Scoping review: A scoping review was performed in English and Japanese to identify studies pertaining to healthcare data collection and minimum data set criteria before, during, and after disasters and public health emergencies. Three electronic databases – PubMed & EMBASE (English), and the ICHUSHI database (Japanese), as well as grey literature were searched using a combination of terms related to data collection and minimum datasets that were applied to a wide range of emergency and disaster situations. Papers published prior to July 2021 were included with no other restrictions. After screening 8864 articles in English, Japanese or from grey literature, 68 studies were included in the review.
  • Case Studies: Five case studies were analysed. They included the application of J-SPEED during the Hokkaido Earthquake 2018 (case 1), the West Japan Heavy Rain 2018 (case 2), and the comparison between them (case 3). In addition, the use of J-SPEED for comparative purposes was analysed in two disasters: the West Japan heavy rain (2018), which occurred in the absence of COVID-19, and the Kumamoto heavy rain (2020), which occurred in the presence of COVID-19 (case 4). In the latter comparison, data on acute respiratory infections (ARI) from daily aggregated summaries were extracted and their frequencies compared. Finally, the first application of the WHO EMT MDS, during the Mozambique Cyclone Idai response in 2019, was described (case 5).

The data in each case study were sent to and collated by their respective EMT coordination cells. The data were then analysed to produce descriptive analyses with summary data about individual health events that had been encountered. The five most frequent health conditions were reported, with statistical analyses conducted where relevant to examine the differences between the observed frequencies of different types of health consultations, or between disasters, or to analyse the specific health problems among different sub-groups (e.g., by age or gender).
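
As a rough illustration of the descriptive step described above, the following Python sketch sums daily aggregated tallies and reports the most frequent conditions with their share of all reported conditions. The function name, the report structure, and the sample labels and counts are hypothetical; they are not the actual J-SPEED or MDS format or data.

```python
from collections import Counter

def top_conditions(daily_reports, k=5):
    """Sum per-day condition tallies and return the k most frequent
    conditions as (name, count, percent_of_all_reported) tuples."""
    totals = Counter()
    for report in daily_reports:
        totals.update(report)  # adds each day's counts to the running totals
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [(name, count, round(100 * count / grand_total, 1))
            for name, count in totals.most_common(k)]

# Hypothetical daily summaries with placeholder condition labels.
sample_reports = [
    {"condition_a": 3, "condition_b": 1},
    {"condition_a": 2, "condition_c": 4},
]
print(top_conditions(sample_reports, k=2))
```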

Scoping review

Findings from the scoping review revealed a range of critical operational, structural, and functional factors of relevance to the implementation of an EMT MDS.

Facilitating factors: a standardised system that is quick and adaptable to implement to a variety of disaster types; ease in data sharing and secure storage; designated data managers and sufficient human resources to optimise data collection, analysis and relevant training; standardisation of data collection forms, containing clear definitions and operational guidance; robust technological infrastructure to support data collection and security; and collaboration with stakeholders, including local authorities.

Key barriers: insufficient standardisation and operational guidance within and among data collection forms; the absence of reliable study designs to validate and compare collected data; limited collaboration among health facilities, countries, and relevant specialists; disaster-related logistical constraints; lack of trained personnel for data collection and entry; and unreliable technology infrastructure for data collection.

Knowledge gap: The main gap lies in the transition from the acute emergency phase to the recovery phase. This is due to a lack of standardised data collection during and after an emergency, resulting in suboptimal data quality and thus substantially limiting the ability to compare variables and populations during the transition.

The Case Studies

J-SPEED in Japanese Emergencies 2018 – 2020

Case 1 Hokkaido Earthquake 2018: The J-SPEED data detailed a total of 739 consultations over 32 days. The analysis showed that the highest number of health consultations (n=721; 97.6%) occurred between days 1 and 13 of the 32-day EMT response. Most consultations were with people over the age of 65. Women accounted for the majority of consultations. During the response period, disaster stress related symptoms were the most frequently reported health condition.

Case 2 West Japan Heavy Rain 2018: The J-SPEED data detailed a total of 3,617 health consultations, with the highest number of consultations (2,579; 71.3%) occurring between days 5 and 12 of the 65-day EMT response. Patients aged 15 to 64 comprised the majority of people seeking medical help. During the response period, the most frequently reported health complaint was skin disease followed by wounds.
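
The peak-period shares quoted for Cases 1 and 2 follow directly from the reported counts. A minimal Python check, using only the figures stated above (the helper name is an assumption for illustration):

```python
def share_percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)

# Counts as reported in the case summaries.
hokkaido_share = share_percent(721, 739)      # days 1 to 13 of 32
west_japan_share = share_percent(2579, 3617)  # days 5 to 12 of 65

print(f"Hokkaido Earthquake 2018: {hokkaido_share}%")
print(f"West Japan Heavy Rain 2018: {west_japan_share}%")
```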

Case 3 Comparing the West Japan Heavy Rain 2018 & the Hokkaido Earthquake 2018 using J-SPEED: Three major health conditions (disaster stress related symptoms, skin diseases and wounds) during the two health emergencies were compared and analysed. Disaster stress related symptoms were significantly more frequent (p<0.01) in the earthquake than in the heavy rain. On the other hand, skin diseases were a greater concern in the heavy rain (p<0.01) than in the earthquake. No significant difference in the prevalence of wounds was observed between the heavy rain and the earthquake (p>0.05).

Case 4 J-SPEED during the COVID-19 pandemic: Data for acute respiratory infections (ARIs) were compared between the West Japan heavy rain, which occurred before the COVID-19 pandemic (2018), and the Kumamoto heavy rain, which occurred during the pandemic (2020). ARIs accounted for 5.4% of total consultations in the 2018 event and only 1.2% in the 2020 event (p<0.001). This difference may be related to COVID-19 preventive measures implemented in 2020, which could also indicate how pre-disaster health-related behaviour can affect health outcomes during and after emergencies and disasters.
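
Comparisons of this kind, where the share of consultations for one condition is contrasted between two events, can be examined with a test for two proportions. The summary does not state which statistical test the project used, and it does not give the total number of consultations for the Kumamoto event, so the sketch below is a generic pooled two-proportion z-test in Python with clearly hypothetical counts chosen only to match the reported percentages.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for the equality of two proportions.
    Returns (z statistic, p-value)."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# HYPOTHETICAL counts: 5.4% and 1.2% of 1,000 consultations each.
# The real denominators are not given in the summary for both events.
z_stat, p_val = two_proportion_z_test(54, 1000, 12, 1000)
print(f"z = {z_stat:.2f}, p < 0.001: {p_val < 0.001}")
```

With real data, the observed ARI counts and total consultations for each event would replace the placeholder inputs; a chi-square test on the 2x2 table would be an equivalent alternative.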

WHO EMT in 2019

Case 5 First Use of WHO EMT MDS in Mozambique, Cyclone Idai, 2019: During the 110 days of the disaster response, there were a total of 18,468 consultations, 1,184 new admissions and 94 live births. Minor injuries (9.8%) and acute watery diarrhoea (9.4%) were the two most frequently reported conditions. This was the first application of the WHO EMT MDS, and it provided valuable information towards understanding the disaster situation in real time. This information also enabled the response manager to deploy the most effective plan for resource allocation. However, some daily reports were incomplete or contained errors, which seems to be related to insufficient pre-training.

Global Implications

Globally, unreliable and varying standards of documentation currently exist for reporting health data during emergencies and disasters. These research projects demonstrated the importance of EMT MDS data for initiating appropriate disaster responses. They reinforce the importance of standardised approaches to collecting a minimum data set.

Standardised MDS data provide perspectives about the types of health issues encountered during an emergency or disaster. This standardisation also allows for comparisons between different types of emergencies or, as we have seen in Japan, between similar emergencies under different circumstances. The ability to make comparisons may in the future illuminate whether practice and policy changes have had some impact at the national, community or individual levels. With standardised MDS data, governments have access to the information they need to appropriately allocate medical resources over the short term. These data will also contribute to the planning of health services responses for future disasters. Understanding the range of medical conditions suffered by a population during an emergency or disaster could also inform health services about whether long-term support will be needed for disaster survivors.

To maintain high quality data collection, continuous efforts to train EMT members to deploy the MDS correctly are important.  All countries should be encouraged to adopt an EMT MDS. However, the scoping review showed that a range of operational, structural, and functional factors affect the implementation of an EMT MDS. These factors need to be taken into consideration before, during and after the implementation of an EMT MDS. The consistent implementation of an EMT MDS requires a systematic plan for addressing the practical challenges to data collection throughout the course of emergencies and disasters. Local contexts and capacities will differ, and a system to conduct country assessments for capacity to implement an EMT-MDS will be needed.

Implications for Kansai

Disasters resulting from heavy rains, earthquakes, typhoons and public health emergencies are frequent and repeated problems that arise in Kansai. The adoption of a system for EMT-MDS such as J-SPEED, may allow local policy makers to track disaster and emergency responses and services demand through time. As seen with the West Japan and Kumamoto heavy rains case study, utilisation of an EMT-MDS may eventually allow for the comparisons of the impact of service or policy interventions from one disaster to another, or from one context to another, especially as the MDS database grows.

Adopting a common EMT-MDS like J-SPEED in Kansai, could improve disaster and emergency medical responses in the region, as well as provide data and information needed by local governments to plan health services for current and future disaster needs. By establishing the range of medical conditions suffered by a population during an emergency or disaster through the use of an EMT-MDS, it could also inform local services about whether or not long-term support will be needed for disaster survivors, especially in relationship to post-traumatic stress and other long term health impacts.


Publications

  • Mitchell AJ, Kubo T, Chang AH, Ochir OC, Salerno A, Yumiya Y, Barnett DJ, Nakase K, Hsu EB. (2022). Disaster and public health emergency health data collection and management: A scoping review. Am J Disaster Med. 2022 Fall;17(4):277-285. doi: 10.5055/ajdm.2022.0443. PMID: 37551899. (Published: 25 July 2023).
  • Yumiya Y, Chimed-Ochir O, Kayano R, Hitomi Y, Akahoshi K, Kondo H, Wakai A, Mimura S, Chishima K, Toyokuni Y, Koido Y, and Kubo T. (2023). Emergency Medical Team Response during the Hokkaido Eastern Iburi Earthquake 2018: J-SPEED Data Analysis. Prehospital and Disaster Medicine, 1-6. doi:10.1017/S1049023X23000432
  • Chimed-Ochir O, Yumiya Y, Taji A, Kishita E, Kondo H, Wakai A, Akahoshi K, Chishima K, Toyokuni Y, Koido Y, Kubo T. Emergency Medical Teams' Responses during the West Japan Heavy Rain 2018: J-SPEED Data Analysis. (2022). Prehosp Disaster Med. 28;37(2):1-7. doi:10.1017/S1049023X22000231. Epub ahead of print. PMID: 35225205; PMCID: PMC8958047. 
  • Sugimura M, Chimed-Ochir O, Yumiya Y, Taji A, Kishita E, Tsurugi Y, Kiwaki K, Wakai A, Kondo H, Akahoshi K, Chishima K, Toyokuni Y, Koido Y, Kubo T. Incidence of Acute Respiratory Infections during Disasters in the Absence and Presence of COVID-19 Pandemic. (2022). Prehosp Disaster Med. 11:1-10. doi: 10.1017/S1049023X22000085. Epub ahead of print. PMID: 35012691.  
  • Kubo T, Chimed-Ochir O, Cossa M, Ussene I, Toyokuni Y, Yumiya Y, Kayano, R, and Salio, F. (2022). First Activation of the WHO Emergency Medical Team Minimum Data Set in the 2019 Response to Tropical Cyclone Idai in Mozambique. Prehospital and Disaster Medicine, 37(6), 727-734. doi:10.1017/S1049023X22001406

Presentations at Conferences / Symposiums / Webinars

  • 避難所アセスメント 情報分析 (Assessment of shelter: Information analysis). Kumamoto University Hospital Disaster Medical Education and Research Center. 2021 Disaster Medical Worker Workshop-Practical Training, 26-28 November, 2021. Kumamoto, Japan.
  • 将来の新興感染症も見据えた広島県独自のデータ収集システム 広島県新型コロナウイルス感染症版 J-SPEED (Hiroshima Prefecture's unique data collection system with focus on future emerging infectious diseases: Hiroshima Prefecture New Coronavirus Infection J-SPEED Version). Hiroshima University Kasumi Campus Joint Homecoming Day. 13 November 2021. Hiroshima, Japan.
  • J-SPEEDによる診療概況可視化: 東日本大震災の教訓に基づく変革への挑戦災害医学に学ぶ診療現場データの収集・可視化 (Challenge to change based on lessons learned from J-SPEED Collection and visualization of medical field data learned from disaster medicine of the Great East Japan Earthquake). The 49th Annual Meeting of the Japanese Society of Emergency Medicine. 23 November 2021. Tokyo, Japan.
  •  J-SPEED専門職をつなぐ災害医療の取り組み(Disaster medical care initiatives connecting professionals). The 29th Annual Meeting of the Japanese Society of Clinical Behavior and the 39th Annual Meeting of the Society of Japan. Fukuoka, Japan.
  • J-SPEED 災害医療分野の日本発WHO国際標準の国際戦略 (J-SPEED International Strategy of WHO International Standards from Japan in Disaster Medicine). 28 October 2021. Japan.
  • 災害診療記録/J-SPEED 令和2年熊本豪雨等からの最新知見 (Disaster Medical Records/J-SPEED Latest knowledge from 2020 Kumamoto heavy rain). The 2nd Emergency and Disaster Medical Response Committee of the Japan Hospital Association. 26 October 2021. Japan.
  • Instruction for the EMT MDS Daily Report. WHO EMT MDS Working Group/Japan Disaster Relief EMT Initiative Corresponding Unit. 21 October 2022. Japan.
  • 災害時における保健医療の不易流行:災害とパブリックヘルス/J-SPEED (The epidemic of health care in the event of a disaster Disasters and Public Health/J-SPEED). International University of Health and Welfare. 1 October 2021. Tokyo, Japan.
  • 東日本大震災の教訓から災害医療の未来を創る: J-SPEED-災害時の診療情報管理 (Creating the Future of Disaster Medicine from the Lessons Learned from the Great East Japan Earthquake: J-SPEED - Management of Medical Information in the Event of a Disaster). The 80th POC Seminar (The 53rd Annual Meeting of the Japanese Society of Medical Laboratory Science). 8 October 2021. Yokohama, Japan.
  • J-SPEED ― 災害時の診療情報管理について―(J-SPEED - Management of medical information in the event of a disaster). 16 September 2021.
  • 災害医療をデータするJ-SPEED/MDS日本発WHO国際標準の国際戦略 (Data on disaster health J-SPEED/MDS International Strategy of WHO International Standards from Japan). International Conference on the Unity of the Sciences, ICUS. 3 September 2021.
  • 災害時診療概況報告システムJ-SPEEDと広島県の新型コロナウイルス感染症対応 (With the disaster medical care overview report system J-SPEED and Response to the new coronavirus infection in Hiroshima Prefecture). Disaster Response Medical Training Session, Nishi-ku Community Health Measures Industry Council, Hiroshima City. 9 September 2021.
  • 災害医療チームの診療情報管理:災害診療記録/J-SPEED. (Medical information management of disaster medical care team: Medical Records/J-SPEED). On-demand delivery Fukuoka Medical Association JMAT training (Basic Edition). March, 2021.
  • J-SPEEDをアタッチメントとした災害医療分野におけるAIの導入方向性 (Direction of Introduction of AI in disaster medicine with J-SPEED as an attachment). The 26th Annual Meeting of the Japanese Society of Disaster Medicine. 17 March 2021.
  • 新型コロナウィルス感染症を踏まえた今後の広島県健康危機管理 - DX推進のための突破口 (In light of the new coronavirus infection Future Hiroshima Prefecture Health Crisis Management - Breakthroughs for DX Promotion). Hiroshima Prefectural Assembly Budget Special Committee. 5 March 2021.
  • 新型コロナウイルス感染症 (New Coronavirus Infection). 17 March 2021. Hiroshima, Japan. 
  • 新型コロナウイルス感染症を踏まえた感染症対策と災害支援のあり方 (Measures against infectious diseases and disaster support based on the novel coronavirus infection). 2020 Chugoku-Shikoku Area Disaster Support Seminar. 26 January 2021. Hiroshima, Japan.
  • 災害医療チームの診療情報管理 災害診療記録/J-SPEED (Medical information management of disaster medical care team. Disaster Medical Records/J-SPEED). Education Material on J-SPEED for Japan Medical Association Team (JMAT). 16 January 2021. Japan.
  • EMT Minimum Data Set (MDS) for COVID-19. 2nd Webinar on Good Practice on Medical Response Against COVID-19 Outbreak. 8 December 2020.
  • 保健師等研究の基礎知識量的研究、やりましょう!(Basic knowledge of public health nurse research: Quantitative research, let's do it!). 2020 New Late And Mid-Term Public Health Nurse Training Program. 5 October 2020, Hiroshima, Japan.
  • 災害防止の実際から見えてきた公衆衛生学的課題とその対応~自然災害から何を学び、職場における緊急対応として何を備えるべきか (Public health issues and responses to disaster prevention - What should be learned from natural disasters and prepared as an emergency response in the workplace). Yamaguchi Medical Association Industrial Physician Workshop. 19 September 2020. Yamaguchi, Japan.
  • 災害診療記録/J-SPEED ― 豪雨災害を踏まえた避難所COVID-19 モニタリングを含めて(Disaster Medical Care Record/J-SPEED - Including COVID-19 Monitoring of Evacuation Centers in Light of Heavy Rain Disasters). 2020 Asa Medical Association Disaster Medical Lecture. 27 August 2020.
  • Introducing the WHO Emergency Medical Team Minimum Data Set (MDS). The ASEAN EOC NETWORK. 18 August 2020.

  • Open access
  • Published: 22 April 2024

Practical approaches in evaluating validation and biases of machine learning applied to mobile health studies

  • Johannes Allgaier   ORCID: orcid.org/0000-0002-9051-2004 1 &
  • Rüdiger Pryss   ORCID: orcid.org/0000-0003-1522-785X 1  

Communications Medicine volume 4, Article number: 76 (2024)

  • Health services
  • Medical research

Machine learning (ML) models are evaluated in a test set to estimate model performance after deployment. The design of the test set is therefore of importance because if the data distribution after deployment differs too much, the model performance decreases. At the same time, the data often contains undetected groups. For example, multiple assessments from one user may constitute a group, which is usually the case in mHealth scenarios.

In this work, we evaluate a model’s performance using several cross-validation train-test-split approaches, in some cases deliberately ignoring the groups. By sorting the groups (in our case: Users) by time, we additionally simulate a concept drift scenario for better external validity. For this evaluation, we use 7 longitudinal mHealth datasets, all containing Ecological Momentary Assessments (EMA). Further, we compared the model performance with baseline heuristics, questioning the essential utility of a complex ML model.

Hidden groups in the dataset lead to an overestimation of ML performance after deployment. For prediction, a user’s last completed questionnaire is a reasonable heuristic for the next response, and potentially outperforms a complex ML model. Because we included 7 studies, low variance appears to be a more fundamental phenomenon of mHealth datasets.

Conclusions

The way mHealth-based data are generated by EMA leads to questions of user and assessment level and appropriate validation of ML models. Our analysis shows that further research needs to follow to obtain robust ML models. In addition, simple heuristics can be considered as an alternative for ML. Domain experts should be consulted to find potentially hidden groups in the data.

Plain Language Summary

Computational approaches can be used to analyse health-related data collected using mobile applications from thousands of participants. We tested the impact of some participants being represented multiple times or some not being counted properly within the analysis. In this context, we label a multi-represented participant a group. We find that ignoring such groups can lead to misleading estimates for health-related predictions. In some cases, simpler quantitative methods can outperform complex computational models. This highlights the importance of monitoring and validating results produced by complex computational models and supports the use of simpler analytical methods in their place.

Introduction

When machine learning models are applied to medical data, an important question is whether the model learns subject-specific characteristics (an undesired effect) or disease-related characteristics (the desired effect) of the relationship between input and output. A recent paper by Kunjan et al. 1 illustrates this well using the example of classification for EEG-based disease diagnosis. Kunjan et al. discuss the question using different variants of cross-validation and show that the type of validation can cause extreme differences in the results. Older work has evaluated different cross-validation techniques on datasets and arrived at different recommendations for the optimal number of folds 2 , 3 . We transfer and adapt this idea to mHealth data and machine-learning-based classification, and raise new questions in the process. To this end, we briefly explain the background. Rudin et al. call for using simple, understandable models rather than complex black-box models, which motivates us to evaluate simple heuristics against complex models 4 . The Cross-Industry Standard Process for Data Mining (CRISP-DM) highlights the importance of subject matter experts for becoming familiar with a dataset 5 . In turn, familiarity with the dataset is necessary to detect hidden groups in it. In our mHealth use cases, one app user who fills out more than one questionnaire constitutes a group.

We have developed numerous applications in mobile health in recent years (e.g. 6 , 7 ), and the issue of disease-related versus subject-specific characteristics is particularly pronounced in these applications. mHealth applications very often use the principles of Patient-reported Outcome Measures (PROMs) and/or Ecological Momentary Assessments (EMAs) 8 . The major goal of EMAs is that users record symptoms several times a day over a longer period. As a result, users of an mHealth solution generate longitudinal data with many assessments. Since not all users respond equally frequently (as shown by many applications that have been in operation for a long time 9 ), the number of assessments per user varies widely. Therefore, when applying machine learning, the question arises of how the actual learning should take place. Should we group the ratings per user so that a user appears in either the training set or the test set, but never in both, which is correct by design? Or can we accept that a user’s ratings appear in both the training and test sets, since users with many ratings show such a high variance in their ratings? Finally, individual users may undergo concept drift in the way they answer questions across many assessments over a long period of time. In such a case, the question also arises as to whether it makes sense to use an individual’s ratings separately in the training and test sets.

In this context, we also see another question as relevant that is not given enough attention: What is an appropriate baseline for a machine learning outcome in studies? As mentioned earlier, some mHealth users fill out thousands of assessments, and do so for years. In this case, there may be questions about whether a previous assessment can reliably predict the next one, and the use of machine learning may be poorly targeted.

With respect to the above research questions, we use another component to strengthen the results: we selected seven studies from the pool of apps we have developed and use them for the analyses in this paper. Since a total of 7 studies are used, a more representative picture should emerge. However, since the studies do not all have the same research goals, classification tasks need to be found per app to make the overall results comparable. The studies also do not all have the same duration. Even though the studies are not always directly comparable, the setting is very promising, as the results will show. Before deriving specific research questions against this background, related work and technical background information will be briefly discussed.

This section surveys relevant literature to contextualize our contributions within the broader field of study. Cawley et al. address the question of how to minimize the error in the performance estimate relative to the ground truth. Using synthetic data sets, they argue that overfitting a model is as problematic as selection bias in the training data 10 . However, they do not address the phenomenon of groups in the data. Refaeilzadeh et al. give an overview of common cross-validation techniques such as leave-one-out, repeated k-fold, or hold-out validation 11 . They discuss the pros and cons of each kind and mention an underestimated performance variance for repeated k-fold cross-validation, but they also do not address the problem of (unknown) groups in the dataset 11 . Schratz et al. focus on spatial auto-correlation and spatial cross-validation rather than on groups and splitting approaches 12 . Spatial cross-validation is sometimes also referred to as block cross-validation 13 . They observe large performance differences between using and not using spatial cross-validation. With random sampling of train and test samples, a train and a test sample might be too close to each other in geographical space, which induces a selection bias and thus an overoptimistic estimate of the generalization error. They therefore use spatial cross-validation. We would like to briefly differentiate between space and group. Two samples belong to the same space if they are geographically close to each other 13 . They belong to the same group if a domain expert assigns them to a group. In our work, multiple assessments belonging to one user form a group. Meyer et al. also evaluate using a spatial cross-validation approach, but add a time dimension using Leave-Time-Out cross-validation, where samples belong to one fold if they fall into a specific time range 14 . This leave-time-out approach is similar to our time-cut approach, which will be introduced in the methods section. Yet, we are not aware of any related approach on mHealth data like the one we are pursuing in this work.

As written at the beginning of the introduction, we want to evaluate how much the model’s performance depends on specific users (synonyms: subjects, patients, persons) that are represented several times within our dataset, but with a varying number of assessments per user. From previous work, we already know that so-called power users, with many more assessments than most of the other users, have a high impact on the model’s training procedure 15 . We would further like to investigate whether a simple heuristic can outperform complex ensemble methods. Simple heuristics are interesting because they are easy to understand, have a low maintenance requirement, and have low variance, but they also generate high bias.

Technically, across the seven studies, we investigate simple heuristics at the user and assessment level and compare them to tree-based, non-tuned ML ensembles. Tree-based methods have already proven themselves in the literature on the specific mHealth data used, which is why we use only tree-based methods. We do not tune these models because we want the results to be more comparable across the studies used. With these levels of consideration, we would like to elaborate on the following two research questions: First, what is the variance in performance when using different splitting methods for the train and test sets of mHealth data (RQ1)? Second, in which cases is the development, deployment and maintenance of an ML model worthwhile compared to a simple baseline heuristic when used on mHealth data (RQ2)?

The present work compares the performance of a tree-based ensemble method when the data are split at two different levels: user and assessment. It further compares this performance to non-ML approaches that use simple heuristics to predict the target at the user or assessment level. To summarize the major findings: First, ignoring users in datasets during cross-validation leads to an overestimation of the model’s performance and robustness. Second, for some use cases, simple heuristics are as good as complicated tree-based ensemble methods. Within this domain, heuristics are more advantageous if they are trained or applied at the user level. ML models also work at the assessment level. And third, sorting users can simulate concept drift in training if the time span of data collection is large enough. The results in the test set change due to the shuffling of users.
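The difference between splitting at the assessment level and at the user level can be made concrete with a small sketch. It is not the authors' pipeline: the toy data, the variable names and the choice of scikit-learn's `KFold` and `GroupKFold` are assumptions made for illustration. The sketch only counts how many users end up on both sides of each split.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

# Hypothetical toy data: 6 users with 5 assessments each (30 rows in total).
user_ids = np.repeat(np.arange(6), 5)
X = np.zeros((user_ids.size, 3))  # feature values do not matter for this check


def shared_users_per_fold(splits, groups):
    """For each (train, test) index pair, count the users present on both sides."""
    return [len(set(groups[train]) & set(groups[test])) for train, test in splits]


# Assessment level: rows are split without looking at the user,
# so assessments of one user can appear in both train and test.
assessment_level = shared_users_per_fold(KFold(n_splits=4).split(X), user_ids)

# User level: every user is assigned to exactly one side of each split.
user_level = shared_users_per_fold(
    GroupKFold(n_splits=4).split(X, groups=user_ids), user_ids
)

print("users shared per fold, assessment level:", assessment_level)
print("users shared per fold, user level:", user_level)
```

Any fold with a non-zero count at the assessment level lets the model see other assessments of a test user during training, which is the situation the first finding warns about.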

In this section, we first describe how Ecological Momentary Assessments work and how they differentiate from assessments that are collected within a clinical environment. Second, we present the studies and ML use cases for each dataset. Next, we introduce the non-ML baseline heuristics and explain the ML preprocessing steps. Finally, we describe existing train-test-split approaches (cross-validation) and the splitting approaches at the user- and assessment levels.

Ecological momentary assessments

Within this context, ecological means “within the subject’s natural environment”, and momentary means “within this moment” and, ideally, in real time 16 . Assessments collected in research or clinical environments may cause recall bias in the subject’s answers and are not primarily designed to track changes in mood or behavior longitudinally. Ecological Momentary Assessments (EMA) thus increase validity and decrease recall bias. They are suitable for asking users in their daily environment about their state of being, which can change over time, by random or interval time sampling. Combining EMAs and mobile crowdsensing sensor measurements allows for multimodal analyses, which can yield new insights into, e.g., chronic diseases 8 , 15 . The datasets used within this work have EMA in common and are described in the following subsection.

The ML use cases

From ongoing projects of our team, we are constantly collecting mHealth data as well as Ecological Momentary Assessments 6 , 17 , 18 , 19 . To investigate how the machine learning performance varies based on the splits, we wanted different datasets with different use cases. However, to increase comparability between the use cases, we created multi-class classification tasks.

We train each model using historical assessments. The oldest assessment was collected at time $t_{\text{start}}$ and the latest historical assessment at time $t_{\text{last}}$. A current assessment is created and collected at time $t_{\text{now}}$, and a future assessment at time $t_{\text{next}}$. Depending on the study design, $t_{\text{next}}$ may lie a few hours or a few weeks after $t_{\text{now}}$. For each dataset and for each user, we want to predict a feature (synonym: a question of an assessment) at time $t_{\text{next}}$ using the features at time $t_{\text{now}}$. This feature at time $t_{\text{next}}$ is then called the target. For each use case, a model is trained using data between $t_{\text{start}}$ and $t_{\text{last}}$, and given the input data from $t_{\text{now}}$, it predicts the target at $t_{\text{next}}$. Figure 1 gives a schematic representation of the relevant points of time $t_{\text{start}}$, $t_{\text{last}}$, $t_{\text{now}}$, and $t_{\text{next}}$.

Figure 1: At time $t_{\text{start}}$, the first assessment is given; $t_{\text{last}}$ is the last known assessment used for training, whereas $t_{\text{now}}$ is the currently available assessment used as input for the classifier, and the target is predicted at time $t_{\text{next}}$.
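One way to turn a table of assessments into such training pairs is to shift the target within each user, so that every row holds the features of the current assessment and the target of the following one. The sketch below assumes a hypothetical long-format table with the columns `user_id`, `created_at`, `stress` and `distress`; these names and values are illustrative and do not come from the study datasets.

```python
import pandas as pd

# Hypothetical long-format EMA table: one row per filled-out assessment.
ema = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "created_at": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-04", "2021-01-01", "2021-01-08"]
    ),
    "stress": [0.2, 0.4, 0.6, 0.9, 0.7],
    "distress": [0.1, 0.3, 0.5, 0.8, 0.6],
})


def make_supervised_pairs(df, target_col):
    """Pair each assessment (time now) with the same user's next target (time next)."""
    df = df.sort_values(["user_id", "created_at"]).copy()
    # shift(-1) within a user pulls the following assessment's value into this row.
    df["target_next"] = df.groupby("user_id")[target_col].shift(-1)
    df["time_gap"] = df.groupby("user_id")["created_at"].shift(-1) - df["created_at"]
    # A user's last assessment has no successor yet, so it cannot be used for training.
    return df.dropna(subset=["target_next"]).reset_index(drop=True)


pairs = make_supervised_pairs(ema, "distress")
print(pairs[["user_id", "created_at", "stress", "target_next", "time_gap"]])
print("median time gap:", pairs["time_gap"].median())
```

The `time_gap` column corresponds to the distance between the two points in time and is the quantity whose median is reported for each study.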

To increase comparability between the approaches, we used the same model architecture with the same pseudo-random initialisation. The model is a Random Forest classifier with 100 trees and the Gini impurity as the splitting criterion. All code was written in Python 3.9, mostly using scikit-learn, pandas and Jupyter Notebooks. Details can be found on GitHub in the supplementary material.
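In scikit-learn, a model with these settings could be configured roughly as follows. The seed value and the random toy data are assumptions; the text only states that the initialisation was kept identical, not which seed was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

SEED = 42  # assumed value, chosen here only for illustration


def make_model():
    """Untuned Random Forest: 100 trees, Gini impurity, fixed pseudo-random seed."""
    return RandomForestClassifier(n_estimators=100, criterion="gini", random_state=SEED)


# Hypothetical toy data: 200 assessments, 8 features, 5 target classes.
rng = np.random.default_rng(SEED)
X = rng.random((200, 8))
y = rng.integers(0, 5, size=200)

# Two models built and trained the same way give identical predictions,
# so differences between experiments come from the data split, not the seed.
model_a = make_model().fit(X, y)
model_b = make_model().fit(X, y)
same_predictions = bool((model_a.predict(X) == model_b.predict(X)).all())
print("identical predictions:", same_predictions)
```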

The included apps and studies in more detail

For all datasets that we used in this study, we have ethical approvals (UNITI No. 20-1936-101, TYT No. 15-101-0204, Corona Check No. 71/20-me, and Corona Health No. 130/20-me). The following section provides an overview of the studies and the available datasets with their characteristics, and then describes each use case in more detail. A brief overview is given in Table 1, with baseline statistics for each dataset in Table 2.

To provide some more background on the studies: in all apps, the analyses are based on the so-called EMA questionnaires (synonym: assessments), i.e., the questionnaires that are filled out multiple times in each app and its respective studies. This can happen several times a day (e.g., in the tinnitus study TrackYourTinnitus (TYT)) or at weekly intervals (e.g., in the studies of the Corona Health (CH) app). In every case, the analysis uses the recurring questionnaires, which collect symptoms over time and in the real environment through unforeseen (i.e., random) notifications.

The TrackYourTinnitus (TYT) dataset has the most filled-out assessments, with more than 110,000 questionnaires as of 2022-10-24. The Corona Check (CC) study has the most users. This is because each time an assessment is filled out, a new user can optionally be created. Notably, this app has the largest ratio of non-German users and the youngest user group with the largest standard deviation. The Corona Health (CH) app, with its studies Mental health for adults, Mental health for adolescents and Physical health for adults, has the highest proportion of German users because it was developed in collaboration with the Robert Koch Institute and was primarily promoted in Germany. Unification of Treatments and Interventions for Tinnitus Patients (UNITI) is a European Union-wide project whose overall aim is to deliver a predictive computational model based on existing and longitudinal data 19 . The dataset from the UNITI randomized controlled trial is described by Simoes et al. 20 .

TrackYourTinnitus (TYT)

With this app, it is possible to record the individual fluctuations in tinnitus perception. With the help of a mobile device, users can systematically measure the fluctuations of their tinnitus. Via the TYT website or the app, users can also view the progress of their own data and, if necessary, discuss it with their physician.

The ML task at hand is a classification task with the target variable Tinnitus distress at time $t_{\text{now}}$ and the questions from the daily questionnaire as the features of the problem. The target’s values lie in [0, 1] on a continuous scale. To make it a classification task, we created bins with a step size of 0.2, resulting in 5 classes. The features are perception, loudness, and stressfulness of tinnitus, as well as the current mood, arousal and stress level of a user, the concentration level while filling out the questionnaire, and perception of the worst tinnitus symptom. A detailed description of the features is given in previous work 21 . Of note, the time delta between two assessments of one user at $t_{\text{next}}$ and $t_{\text{now}}$ varies between users. Its median value is 11 hours.
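As a minimal sketch of this binning step, `pandas.cut` can map the continuous values to five integer classes. Which bin a value exactly on a boundary (for example 0.2) falls into is not stated in the text; the right-inclusive intervals used here are an assumption, and the sample values are made up.

```python
import pandas as pd


def to_five_classes(values):
    """Map values in [0, 1] to classes 0..4 using bins of width 0.2."""
    edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    # Assumed convention: [0, 0.2], (0.2, 0.4], ..., (0.8, 1.0].
    return pd.cut(values, bins=edges, labels=False, include_lowest=True)


distress = pd.Series([0.0, 0.1, 0.2, 0.35, 0.79, 1.0])
classes = to_five_classes(distress)
print(classes.tolist())
```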

Unification of Treatments and Interventions for Tinnitus Patients (UNITI)

The overall goal of UNITI is to treat the heterogeneity of tinnitus patients on an individual basis. This requires understanding more about the patient-specific symptoms that are captured by EMA in real time.

The use case we created for UNITI is similar to that of TYT. The target variable encumbrance, coded as cumberness, which was also recorded on a continuous scale, was divided into an ordinal scale from 0 to 1 in 5 steps. Features also include momentary assessments of the user during completion, such as jawbone, loudness, movement, stress, emotion, and questions about momentary tinnitus. The data was collected using our mobile apps 7 . Of note, the median time gap between two assessments of a user is, on average, 24 hours.

Corona Check (CC)

At the beginning of the COVID-19 pandemic, it was not easy to get initial feedback about an infection, given the lack of knowledge about the novel virus and the absence of widely available tests. To assist all citizens in this regard, we launched the mobile health app Corona Check together with the Bavarian State Office for Health and Food Safety 22 .

The Corona Check use case predicts whether a user has a Covid infection based on a list of given symptoms 23 . The app was developed early in the pandemic in 2020 and helped people to get a quick estimate of a possible infection without an antigen test. The target variable has four classes: first, “suspected coronavirus (COVID-19) case”; second, “symptoms, but no known contact with confirmed corona case”; third, “contact with confirmed corona case, but currently no symptoms”; and last, “neither symptoms nor contact”.

The features are a list of Boolean variables for symptoms that were known at the time to be typically associated with a Covid infection, such as fever, a sore throat, a runny nose, cough, loss of smell, loss of taste, shortness of breath, headache, muscle pain, diarrhea, and general weakness. Depending on the answers given by a user, the application programming interface returned one of the classes. The median time gap between two assessments of the same user is, on average, 8 hours, with a much larger standard deviation of 24.6 days.

Corona Health ∣ Mental health for adults (CHA)

The last four use cases are all derived from a bigger Covid-related mHealth project called Corona Health 6 , 24 . The app was developed in collaboration with the Robert Koch Institute and was primarily promoted in Germany. It includes several studies about the mental or physical health, or the stress level, of a user. A user can download the app and then sign up for a study. He or she will then receive a one-time baseline questionnaire, followed by recurring follow-ups whose time gaps vary between studies. The follow-up assessment of CHA has a total of 159 questions, including a full PHQ9 questionnaire 25 . We then used the nine questions of the PHQ9 as features at $t_{\text{now}}$ to predict the level of depression for this user at $t_{\text{next}}$. Depression levels are ordinally scaled from None to Severe in a total of 5 classes. The median time gap between two assessments of the same user is 7.5 days. That is, the models predict this far into the future.
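To illustrate how nine PHQ9 answers can map to five ordinal depression levels, the sketch below applies the commonly published PHQ-9 scoring: each item is scored 0 to 3, the items are summed to a total of 0 to 27, and cut-offs at 5, 10, 15 and 20 separate the levels. Whether the study derived its five classes in exactly this way is an assumption, and the intermediate label names are the standard ones rather than names taken from the text.

```python
from bisect import bisect_right

# Standard PHQ-9 severity bands (assumed to match the study's 5 classes):
# 0-4 none, 5-9 mild, 10-14 moderate, 15-19 moderately severe, 20-27 severe.
SEVERITY_LABELS = ["none", "mild", "moderate", "moderately severe", "severe"]
LOWER_CUTOFFS = [5, 10, 15, 20]


def phq9_severity(item_scores):
    """Return the severity label for nine PHQ-9 item scores (each 0..3)."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected nine item scores, each between 0 and 3")
    total = sum(item_scores)
    # bisect_right counts how many cut-offs the total has reached.
    return SEVERITY_LABELS[bisect_right(LOWER_CUTOFFS, total)]


print(phq9_severity([1, 1, 1, 1, 1, 0, 0, 0, 0]))
print(phq9_severity([3] * 9))
```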

Corona Health ∣ Mental health for adolescents (CHY)

Similar to the adult cohort, the mental health of adolescents during the pandemic and its lock-downs is also captured by our app using EMA.

A lightweight version of the mental health questionnaire for adults was also offered to adolescents. However, this did not include a full PHQ9 questionnaire, so we created a different use case. The target variable, to be classified on a 4-level ordinal scale, is perceived dejection, taken from the PHQ instruments. The features are a subset of quality of life assessments and PHQ questions, such as concernment, tremor, comfort, leisure quality, lethargy, prostration, and irregular sleep. For this study, the median time gap between two follow-up assessments is 7.3 days.

Corona Health ∣ Physical health for adults (CHP)

Analogous to the mental health of adults, this study aims to track how the physical health of adults changes during the pandemic period.

Adults had the option to sign up for a study with recurring assessments asking about their physical health. The target variable to be classified asks about the constraints in everyday life that arise due to physical pain at $t_{\text{next}}$. The features for this use case include aspects like sport, nutrition, and pain at $t_{\text{now}}$. The median time gap between two assessments of the same user is 14.0 days.

Corona Health ∣ Stress (CHS)

This additional study within the Corona Health app asks users about their stress level on a weekly basis. Both features and target are assessed on a five-level ordinal scale from never to very often. The target asks for the ability to manage stress; the features include the first nine questions of the perceived stress scale instrument 26 . The median time gap between two assessments of the same user is 7.0 days.

Baseline heuristics instead of complex ML models?

We also want to compare the ML approaches with a baseline heuristic (synonym: baseline model). A baseline heuristic can be a simple ML model such as a linear regression or a small decision tree or, depending on the use case, a simple statement like “The next value equals the last one”. The typical approach for improving ML models is to estimate the generalization error of the model on a benchmark data set and compare it to a baseline heuristic. However, it is often not clear which baseline heuristic to consider: The same model architecture as the benchmark model, but without tuned hyperparameters? A simple, intrinsically explainable model with or without hyperparameter tuning? A random guess? A naive guess, in which the majority class is predicted?

Since we have approaches on a user-level (i.e., we consider users when splitting) and on an assessment-level (i.e., we ignore users when splitting), we should also create baseline heuristics on both levels. We additionally account for within-user variance in Ecological Momentary Assessments by averaging a user’s previously known assessments. Previously known here means that we calculate the mode or median of all assessments of a user that are older than the given timestamp. In total, this leads to four baseline heuristics (user-level latest, user-level average, assessment-level latest, assessment-level average) that do not use any machine learning but simple heuristics. On the assessment-level, the latest known target or the mean of all targets known so far is taken to predict the next target, regardless of the user id of this assessment. On the user-level, either the last known, the median, or the mode value of this user is taken to predict the target. This, in turn, leads to a cold-start problem for users that appear for the first time in a dataset. In this case, either the last known value, the mode, or the median of all assessments known so far is taken to predict the target.
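The four baseline heuristics, including the cold-start fallback, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the published implementation: assessments are given as hypothetical `(user_id, timestamp, target)` tuples, and the "average" variant uses the low median so that the prediction stays on the ordinal scale (the text equally allows the mode).

```python
from statistics import median_low


def baseline_predictions(assessments, level="user", kind="latest"):
    """Predict each target from previously known targets only (no ML).

    assessments: list of (user_id, timestamp, target) tuples in any order.
    level: "user" uses only this user's history, "assessment" ignores users.
    kind: "latest" takes the last known target, "average" the median so far.
    Returns (target, prediction) tuples; prediction is None if nothing is known yet.
    """
    history_all = []    # every target seen so far, across all users
    history_user = {}   # user_id -> targets seen so far for this user
    out = []
    for user_id, _, target in sorted(assessments, key=lambda a: a[1]):
        known = history_user.get(user_id, []) if level == "user" else history_all
        if not known:
            known = history_all  # cold start: fall back to all known assessments
        if not known:
            pred = None
        elif kind == "latest":
            pred = known[-1]
        else:
            pred = median_low(known)
        out.append((target, pred))
        history_all.append(target)
        history_user.setdefault(user_id, []).append(target)
    return out


# Hypothetical toy data for two users
toy = [("a", 1, 2), ("b", 2, 0), ("a", 3, 2), ("b", 4, 1), ("a", 5, 3)]
user_latest = baseline_predictions(toy, level="user", kind="latest")
assessment_latest = baseline_predictions(toy, level="assessment", kind="latest")
```

In the toy data, the first assessment of user `b` is predicted from user `a`'s earlier value, which is the cold-start fallback described above.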

ML preprocessing

Before the data and approaches could be compared, it was necessary to homogenize them. In order for all approaches to work on all data sets, at least the following information is necessary: assessment_id, user_id, timestamp, features, and the target. Any other information, such as GPS data or additional answers to questions of the assessment, was not included in the ML pipeline. Additionally, targets that were collected on a continuous scale had to be binned into an ordinal scale of five classes. For easier interpretation and readability of the outputs, we also created label encodings for each target. To keep the pre-processing consistent, we created helper utilities in Python so that the same function was applied to each dataset. For missing values, we created a user-wise missing value treatment. More precisely, if a user skipped a question in an assessment, we filled the missing value with the mean or mode (mode = most common value) of all other answers of this user to this question. If a user had only one assessment, we filled it with the overall mean for this question.
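The user-wise missing value treatment can be illustrated with a small sketch. It assumes that answers are stored per row under a question key (here the hypothetical `q1`), that a skipped question is stored as `None`, and that the mean rather than the mode is used; none of these details are confirmed by the text.

```python
from statistics import mean


def impute_user_wise(rows, question):
    """Fill None answers to `question` with the user's own mean, else the overall mean."""
    observed = [r[question] for r in rows if r[question] is not None]
    overall = mean(observed)  # raises StatisticsError if nobody answered the question

    per_user = {}
    for r in rows:
        if r[question] is not None:
            per_user.setdefault(r["user_id"], []).append(r[question])

    filled = []
    for r in rows:
        new_row = dict(r)  # copy so that the input rows stay untouched
        if new_row[question] is None:
            own = per_user.get(r["user_id"])
            new_row[question] = mean(own) if own else overall
        filled.append(new_row)
    return filled


# Hypothetical toy data: user "b" has a single assessment with a skipped answer
toy = [
    {"user_id": "a", "q1": 2},
    {"user_id": "a", "q1": None},
    {"user_id": "a", "q1": 4},
    {"user_id": "b", "q1": None},
    {"user_id": "c", "q1": 0},
]
filled = impute_user_wise(toy, "q1")
```

User `a` receives the mean of his or her own answers, while user `b` falls back to the overall mean because no other answer of this user exists.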

For each dataset and for each script, we set random states and seeds to enhance reproducibility. For the outer validation set, we assigned the first 80 % of all users that signed up for a study to the train set and the latest 20 % to the test set. To ensure comparability, the test users were the same for all approaches. We did not shuffle the users, in order to simulate a deployment scenario in which new users join the study. This also adds potential concept drift from the train to the test set and thus improves the simulation quality.
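The outer split by sign-up order can be sketched as follows. The function names and the input layout (a mapping from user id to sign-up timestamp) are assumptions made for illustration.

```python
def split_users_by_signup(signups, train_share=0.8):
    """signups: dict user_id -> sign-up timestamp. Returns (train_users, test_users)."""
    ordered = sorted(signups, key=signups.get)  # earliest sign-up first, no shuffling
    cut = int(len(ordered) * train_share)
    return ordered[:cut], ordered[cut:]


def split_assessments(assessments, train_users):
    """Send every assessment to the set its user belongs to."""
    train_users = set(train_users)
    train = [a for a in assessments if a["user_id"] in train_users]
    test = [a for a in assessments if a["user_id"] not in train_users]
    return train, test


# Hypothetical toy data
signups = {"u1": 5, "u2": 1, "u3": 3, "u4": 9, "u5": 7}
train_users, test_users = split_users_by_signup(signups)
assessments = [{"user_id": u, "target": 0} for u in ("u1", "u4", "u4", "u2")]
train_set, test_set = split_assessments(assessments, train_users)
```

Because the split is made on users and not on assessments, no test user contributes any assessment to the train set.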

For the cross-validation within the training set, which we call internal validation, we chose a total of 5 folds with 1 validation fold. We then applied the four baseline heuristics (on user level and assessment level with either latest target or average target as prediction) to calculate the within-train-set performance standard deviation and the mean of the weighted F1 scores for each train fold. The mean and standard deviation of the weighted F1 score are then the estimator of the performance of our model in the test set.

We call one approach superior to another if the final score is higher. The final score to evaluate an approach is calculated as:

\({f}_{1}^{final}={f}_{1}^{test}-\alpha \,\sigma \left({f}_{1}^{train}\right)\)

If the standard deviation between the folds during training is large, the final score is lower. The test set must not contain any selection bias against the underlying population. The pre-factor α of the standard deviation is another hyperparameter. The more important model robustness is for the use case, the higher α should be set.
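As a small numeric illustration of the final score, the sketch below subtracts α times the standard deviation of the validation-fold scores from the test score. The fold scores are made-up numbers, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not state which variant was used.

```python
from statistics import pstdev


def final_score(f1_test, f1_validation_folds, alpha=0.5):
    """Test score penalized by alpha times the spread of the validation-fold scores."""
    return f1_test - alpha * pstdev(f1_validation_folds)


# Hypothetical weighted F1 scores of five validation folds and of the test set
fold_scores = [0.60, 0.62, 0.58, 0.61, 0.59]
score_default = final_score(0.57, fold_scores)           # alpha = 0.5
score_robust = final_score(0.57, fold_scores, alpha=2.0)  # stronger robustness penalty
```

With identical test scores, the approach with the larger spread across folds receives the lower final score, and a higher α widens that gap.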

Existing train-test-split approaches

Within cross-validation, there exist several approaches on how to split up the data into folds and validate them, such as the k -fold approach with k as the number of folds in the training set. Here, k  − 1 folds form the training folds and one fold is the validation fold 27 . One can then calculate k performance scores and their standard deviation to get an estimator for the performance of the model in the test set, which itself is an estimator for the model’s performance after deployment (see also Fig.  2 ).
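The fold construction of a plain k-fold split can be sketched with the standard library alone. This is a simplified stand-in for library implementations and uses contiguous, unshuffled folds; the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) for k contiguous folds of near-equal size."""
    # The first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(12, k=5))
```

Each sample appears in exactly one validation fold, so training a model once per fold yields k performance scores whose mean and standard deviation can then be reported.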

Fig. 2: Schematic visualisation of the steps required to perform a k-fold cross-validation, here with k = 5.

In addition, there exist the following strategies: First, (repeated) stratified k-fold, in which the target distribution is retained in each fold, which can also be seen in Fig. 3. After shuffling the samples, the stratified split can be repeated 3 . Second, leave-one-out cross-validation 28 , in which the validation fold contains only one sample while the model has been trained on all other samples. And third, leave-p-out cross-validation, in which \(\binom{n}{p}\) train-test pairs are created, where n is the number of assessments (synonym: samples) 29 .
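To make the number of train-test pairs in leave-p-out cross-validation tangible, the following sketch enumerates them with the standard library and compares the count with the binomial coefficient. The function name is hypothetical.

```python
from itertools import combinations
from math import comb


def leave_p_out(n_samples, p):
    """Yield (train_idx, val_idx) for every way of holding out p of n samples."""
    indices = range(n_samples)
    for val_idx in combinations(indices, p):
        held_out = set(val_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, list(val_idx)


pairs = list(leave_p_out(5, 2))
n_pairs_small = len(pairs)       # equals comb(5, 2)
n_pairs_large = comb(1000, 2)    # pairs needed for 1000 assessments and p = 2
```

With p = 1, the generator reduces to leave-one-out cross-validation and yields exactly n pairs. The count grows quickly with n and p, which makes the exhaustive variant costly for large datasets.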

Fig. 3: While this approach retains the class distribution in each fold, it still ignores user groups. Each color represents a different class or user id.

These approaches, however, do not account for the peculiarities of our mHealth data. More specifically, they do not account for users (syn. groups, subjects) that generate daily assessments (syn. samples) with a high variance.

Splitting approaches related to EMA

To precisely explain the splitting approaches, we differentiate between the terms folds and sets. We call a chunk of samples (synonym: assessments, filled-out questionnaires) a set on the outer split of the data, for which we cut off the final test set. Within the training set, we then split further to create training and validation folds. That is, the term fold refers to the context of cross-validation, whereas the term set refers to the outer split of the ML pipeline. Figure 4 visualizes this approach. Following this, we define four different approaches to split the data. For one of them we ignore the fact that there are users, for the other three we do not. We call these approaches user-cut, average-user, user-wise, and time-cut. All approaches have in common that the first 80 % of all users are always in the training set and the remaining 20 % are in the test set. A schematic visualization of the splitting approaches is shown in Fig. 5. Within the training set, we then split on user-level for the approaches user-cut, average-user, and user-wise, and on assessment-level for the approach time-cut.
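A split on user-level means that no user contributes assessments to both a training fold and the validation fold. The sketch below shows one simple way to build such folds with the standard library. It assigns shuffled users to folds in round-robin order, which is an assumption made for brevity; library implementations such as scikit-learn's `GroupKFold` additionally balance the fold sizes.

```python
import random


def user_cut_folds(user_ids, k=5, seed=0):
    """user_ids: one user id per assessment. Yields (train_idx, val_idx) with disjoint users."""
    users = sorted(set(user_ids))
    random.Random(seed).shuffle(users)  # fixed seed for reproducibility
    fold_of_user = {u: i % k for i, u in enumerate(users)}
    for fold in range(k):
        val_idx = [i for i, u in enumerate(user_ids) if fold_of_user[u] == fold]
        train_idx = [i for i, u in enumerate(user_ids) if fold_of_user[u] != fold]
        yield train_idx, val_idx


# Hypothetical toy data: 10 users with 4 assessments each
user_ids = [f"u{i % 10}" for i in range(40)]
folds = list(user_cut_folds(user_ids, k=5))
```

A split on assessment-level, in contrast, would assign the 40 assessments to folds without looking at `user_ids`, so the same user could appear on both sides of a fold.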

Fig. 4: In the second step, users are ordered by their study registration time, with the initial 80 % designated as training users and the remaining 20 % as test users. Subsequently, assessments by training users are allocated to the training set, and those by test users to the test set. Within the training set, user grouping dictates the validation approach: group-cross-validation is applied if users are declared as a group, otherwise, standard cross-validation is utilized. We compute the average \({f}_{1}\) score, \({f}_{1}^{train}\) , from training folds and the \({f}_{1}\) score on the test set, \({f}_{1}^{test}\) . The standard deviation of \({f}_{1}^{train}\) , \(\sigma ({f}_{1}^{train})\) , indicates model robustness. The hyperparameter α adjusts the emphasis on robustness, with higher α values prioritizing it. Ultimately, \({f}_{1}^{final}\) , which is a more precise estimate if group-cross-validation is applied, offers a refined measure of model performance in real-world scenarios.

Fig. 5: Yellow means that this sample is part of the validation fold, green means it is part of a training fold. Crossed out means that the sample has been dropped in that approach because it does not meet the requirements. Users can be sorted by time to accommodate any concept drift.

In the following, we explain the splitting approaches in more detail. The time-cut approach ignores the groups in the dataset and simply creates validation folds based on the time at which the assessments arrive in the database. In this example, the month in which a sample was collected is known. More precisely, all samples from January until April are in the training set, while May is in the test set. The user-cut approach shuffles all user ids and creates five data folds with distinct user groups. It ignores the time dimension of the data, but provides user-distinct training and validation folds, which resembles the GroupKFold cross-validation approach as implemented in scikit-learn 30 . The average-user approach is very similar to the user-cut approach. However, each answer of a user is replaced by the median or mode answer of this user up to the point in question to reduce within-user variance.

While all the above-mentioned approaches require only one single model to be trained, the user-wise approach requires as many models as there are distinct users in the dataset. For each user, the first 80 % of his or her time-sorted assessments are used to train a user-specific model, and the remaining 20 % are used to test the model. This means that for this approach, we can directly evaluate on the test set, as each model is user-specific and the cold-start problem is solved by training the model on the first assessments of this user. If a user has fewer than 10 assessments, he or she is not evaluated with that approach.
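The user-wise approach can be sketched as a per-user, time-ordered 80/20 split that skips users with fewer than 10 assessments. The dictionary keys and the function name are illustrative assumptions; the model training itself is left out.

```python
from collections import defaultdict


def user_wise_splits(assessments, min_assessments=10, train_share=0.8):
    """Return {user_id: (train_rows, test_rows)} with a time-ordered split per user."""
    by_user = defaultdict(list)
    for a in assessments:
        by_user[a["user_id"]].append(a)

    splits = {}
    for user_id, rows in by_user.items():
        if len(rows) < min_assessments:
            continue  # too few assessments: user is not evaluated in this approach
        rows = sorted(rows, key=lambda r: r["timestamp"])
        cut = int(len(rows) * train_share)
        splits[user_id] = (rows[:cut], rows[cut:])
    return splits


# Hypothetical toy data: user "a" has 10 assessments, user "b" only 3
toy = [{"user_id": "a", "timestamp": t, "target": t % 5} for t in range(10, 0, -1)]
toy += [{"user_id": "b", "timestamp": t, "target": 1} for t in range(3)]
splits = user_wise_splits(toy)
train_a, test_a = splits["a"]
```

One model per entry in `splits` would then be trained on the first part and evaluated on the second part, so each model only ever sees the past of its own user.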

Approval for the UNITI randomized controlled trial and the UNITI app was obtained from the Ethics Committee of the University Clinic of Regensburg (ethical approval No. 20-1936-101). All users read and approved the informed consent before participating in the study. The study was carried out in accordance with relevant guidelines and regulations. The procedures used in this study adhere to the tenets of the Declaration of Helsinki. The Track Your Tinnitus (TYT) study was approved by the Ethics Committee of the University Clinic of Regensburg (ethical approval No. 15-101-0204). The Corona Check (CC) study was approved by the Ethics Committee of the University of Würzburg (ethical approval No. 71/20-me) and the university’s data protection officer and was carried out in accordance with the General Data Protection Regulations of the European Union. The procedures used in the Corona Health (CH) study were in accordance with the 1964 Helsinki declaration and its later amendments and were approved by the Ethics Committee of the University of Würzburg, Germany (No. 130/20-me). Ethical approvals include secondary use. The data from this study are available on request from the corresponding author. The data are not publicly available, as the informed consent of the participants did not provide for public publication of the data.

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

We will see in this results section that ignoring users in training leads to an overly optimistic estimate of the generalizability of the model, because the standard deviation is then too small. To further explain, a model is ranked first in the comparison of all computations if it has the highest final score, and last if it has the lowest final score. We recall the formula of the final score from the methods section: \({f}_{1}^{final}={f}_{1}^{test}-0.5{\sigma }\left(\,{f}_{1}^{train}\right)\) . For these use cases, we set α = 0.5. The greater the emphasis on model robustness and the greater the concerns regarding concept drift, the higher α should be set.

RQ1: What is the variance in performance when using different splitting methods for train and test set?

Considering performance aspects and ignoring the user groups in the data, the time-cut approach has on average the best performance on assessment level. As an additional variant, we sorted users once by time and once randomly. When sorting by time, the baseline heuristic with the last known assessment of a user follows at rank 2, whereas with randomly sorted users, the user-cut approach takes rank 2. The baseline heuristic with all known assessments on the user-level has the highest standard deviation in ranks, which means that this approach is highly dependent on the use case: For some datasets it works better, for others it does not. The user-wise model approach also has a higher standard deviation in the ranking score, which means that the success of this approach is more use-case specific. As we set the threshold for users to be included in this approach to a minimum of 10 assessments, we have a high chance of a selection bias for the train-test split for users with only a few assessments, which could be a reason for the larger variance in performance. Details for the result are given in Table 3.

Could there be a selection bias of users that are sorted and split by time? To answer this, we randomly drew 5 different user test sets for the whole pipeline and compared the approaches’ rankings with the variant where users were sorted by time. The approaches’ ranking changes by 0.44, which is less than one rank and can be calculated from Table 3. This shows that there is no easily classifiable group of test users.

Cross-validation within the train set helps to estimate the generalization error of the model for unseen data. On assessment-level, the standard deviation of the weighted F1 score within the train set varies across all datasets between 0.25 % for TrackYourTinnitus and 1.29 % for Corona Health Stress. On user-level, depending on the splitting approach, the standard deviation varies from 1.42 % to 4.69 %. However, on the test set, the estimator of the generalization error (i.e., the standard deviation of the F1 scores of the validation folds within the train set) is too low for all 7 datasets on assessment-level. On user-level, the estimator of the generalization error is too low for 4 out of 7 datasets. We define the estimator of the generalization error as in range if it is smaller than or equal to the performance drop between validation and test set. Details for the result are given in Table 4.

Both approaches, user-level and assessment-level, overestimate the performance of the model during training. However, the quality of the estimator of the generalization error increases if the data are split on user-level.

RQ2: In which cases is the development, deployment and maintenance of a ML model compared to a simple baseline heuristic worthwhile?

For our 7 datasets, the baseline heuristics on a user-level perform better than those on assessment-level. For the datasets Corona Check (CC), Corona Health Stress (CHS), TrackYourTinnitus (TYT), and UNITI, the last known user assessment is the best predictor within the baseline heuristics. For the psychological Corona Health study with adolescents (CHY) and adults (CHA), and physical health for adults (CHP), the average of the historic assessments is the best baseline predictor. The last known assessment on an assessment-level as a baseline heuristic performs worse for each dataset compared to the user level. The average of all assessments known so far as a predictor for the next assessment, independent of the user, has the worst performance within the baseline heuristics for all datasets except CHA. Notably, the larger the number of assessments, the more the all-instances approach on assessment-level converges to the mean of the target, which has high bias and minimum variance.

These results lead us to conclude that recognizing user groups in datasets leads to an improved baseline when trying to predict future assessments from historical ones. When these non-machine-learning baseline heuristics are then compared to machine learning models without hyperparameter tuning, it is found that they sometimes outperform the machine learning model or perform similarly to it.

The ranking of the approaches in Table 5 shows the general overestimation of the performance of the time-cut approach, as this approach is ranked best on average. It can also be seen that these approaches are ranked closely to each other. We chose α to be 0.5. Because we only subtract 0.5 (0.5 = α, our hyperparameter to control the importance of model robustness) of the standard deviation of the \({f}_{1}\) scores of the validation folds, approaches with a higher standard deviation are punished less. This means, in turn, that the overestimation of the performance of the splits on assessment-level would be higher if α was higher. Another reason for the similarity of the approaches is that the same model architecture was finally trained on all assessments of all train users to be evaluated on the test set. Thus, the only difference between the rankings results from the standard deviation of the \({f}_{1}\) scores of the validation folds.

To answer the question whether it is worthwhile to turn a prediction task into an ML project, further constraints should be considered. The above analysis shows that the baseline heuristics are competitive with the non-tuned random forest at much lower complexity. At the same time, the overall results are an \({f}_{1}\) score between 55 and 65 for a multi-class classification, with potential for improvement. Thus, the additional question is from which \({f}_{1}\) score onward a model can be deployed, which depends on the use case. In addition, it is not clear whether the ML approach can be significantly improved by a different model or the right tuning.

The present work compared the performance of a tree-based ensemble method when the split of the data happens on two different levels: user and assessment. It further compared this performance to non-ML approaches that use simple heuristics to also predict the target on a user- or assessment-level. We briefly summarize the findings and then discuss them in more detail in the sections below. Neglecting user data during cross-validation may result in an inflated estimation of model performance and robustness, a phenomenon critical to the integrity of model evaluation. In specific scenarios, empirical evidence suggests that straightforward heuristic approaches can rival the efficacy of complex tree-based ensemble methodologies. Particularly, heuristics tailored or applied at the user level manifest a distinct advantage, while machine learning models maintain efficacy at the assessment level. Additionally, the methodological sorting of users in the dataset can serve as a proxy for concept drift in longitudinal studies, given a sufficiently extensive data collection period. This manipulation affects the test set outcomes, underscoring the influence of temporal user behavior variations on model validation.

The still small number of 7 use cases itself carries a risk of selection bias in the data, features, or variables. This limits the generalizability of the statements. It is arguable whether the trends found would turn in a different direction if more use cases were included in the analysis; we do not believe that they would. We restricted the ML model to a random forest classifier with a default hyperparameter setup to increase the degree of comparability between use cases. We are aware that each use case is different and that direct comparability is not possible. Furthermore, we could have additionally evaluated the entire pipeline on other ML models that are not tree-based. However, this would have added another dimension to the comparison and further complicated the comparison of the results. Therefore, we cannot preclude that the results would have been substantially different for non-tree-based methods, which can be investigated further in future analyses.

Future research on this user-vs.-assessment-level comparison could include hyperparameter tuning of the model on each use case or a change of the model type (e.g., from a random forest to a support vector machine) to see whether this changes the ranking. The overarching goal remains to obtain the most accurate estimate of the model’s performance after deployment.

We cannot give a final answer to what can be chosen as a common baseline heuristic. In machine learning projects, a majority vote is typically used for classification tasks, and a simple model such as a linear regression can be used for regression tasks. These approaches can also be called naive approaches since they often do not do justice to the complexity of the use case. Nevertheless, the power of a simple non-ML heuristic should not be underestimated. If only a few percentage points more performance can be achieved by the maintenance- and development-intensive ML approach, it is worth considering whether the application of a simple heuristic such as “the next assessment will be the same as the last one” is sufficient for a use case. Notably, Cawley and Talbot argue that it might be easier to build domain expert knowledge into hierarchical models, which could also function as a baseline heuristic 10 .

To retain consistency and reproducibility, we kept the users sorted by sign-up date to draw train and test users. The advantage of sorting the users is that one can simulate potential concept drift during training. The disadvantage, however, is an inherent risk of a selection bias towards users that signed up earlier for a study. From Figure 3 , we can see that the overfitting of users increases when we shuffle them. We conclude this from the fact that the difference between the average ranks of the approaches time cut and user cut increases. The advantage of shuffling users is that the splitting methods seem to depend less on the dataset. This can be deduced from the reduced standard deviation of the ranks compared to the sorted users.

Regardless of the level of splitting (user- or assessment-level), one can expect a performance drop if unknown users with unknown assessments are withheld from the model in the test set. When splitting at the user-level, the performance drop is lower during training and validation compared to the assessment-level. However, it remains questionable why we see this performance drop in the test set at all, because both the validation folds and the test set contain unknown users with unknown assessments. A possible cause could be simple overfitting of the training data by the large random forest classifier with its 100 trees. However, a single tree with max depth = number of features and balanced class weights also shows this performance drop from the validation to the test set. One explanation for the persistent performance drop could be that, during cross-validation, information leaks from training folds to validation folds, but not to the test set.

A simple heuristic is not always trivial to beat with an ML model, depending on the use case and the complexity of the search space. Considering the complexity that an ML model adds to a project, a heuristic might be a valuable start to see how well the model fits into the workflow and improves the outcome. Frequent communication with the domain expert of the use case helps to set up a heuristic as a baseline. In a second step, it can be evaluated whether the performance gain from an ML model justifies the additional development effort.

Data availability

In relation to the individual data sets used (see Table  2 ), the availability is as follows: (1) TYT: The data presented in this study are available on request from the corresponding author. The data are not publicly available for data protection reasons. (2) UNITI, Corona Check, Corona Health: The investigators have access to the study data. Raw data (de-identified) can be made available on request from the corresponding author. Furthermore, only the mHealth data was used in this study on UNITI, but the entire UNITI RCT study contains even more data, which can be found here 20 .

Code availability

All code to replicate the results, models, numbers, figures, and tables is publicly available to anyone on https://github.com/joa24jm/UsAs 32 , DOI = 10.5281/zenodo.10401660.

Kunjan, S. et al. The necessity of leave one subject out (loso) cross validation for eeg disease diagnosis. In Brain Informatics: 14th International Conference, BI 2021, Virtual Event, September 17–19, 2021, Proceedings vol. 14, 558–567 (Springer, 2021).

Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI) , vol. 14, 1137–1145 (Montreal, Canada, 1995).

Dietterich, T. G. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10 , 1895–1923 (1998).

Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1 , 206–215 (2019).

Chapman, P. et al. Crisp-dm 1.0: Step-by-step data mining guide. SPSS Inc 9 , 1–73 (2000).

Beierle, F. et al. Corona health–a study-and sensor-based mobile app platform exploring aspects of the covid-19 pandemic. Int. J. Environ. Res. Public Health 18 , 7395 (2021).

Vogel, C., Schobel, J., Schlee, W., Engelke, M. & Pryss, R. Uniti mobile–emi-apps for a large-scale european study on tinnitus. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) , vol. 43, 2358–2362 (IEEE, 2021).

Kraft, R. et al. Combining mobile crowdsensing and ecological momentary assessments in the healthcare domain. Front. Neurosci. 14 , 164 (2020).

Schleicher, M. et al. Understanding adherence to the recording of ecological momentary assessments in the example of tinnitus monitoring. Sci. Rep. 10 , 22459 (2020).

Cawley, G. C. & Talbot, N. L. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11 , 2079–2107 (2010).

Refaeilzadeh, P., Tang, L. & Liu, H. Cross-validation. Encyclopedia Database Syst. 5 , 532–538 (2009).

Schratz, P., Muenchow, J., Iturritxa, E., Richter, J. & Brenning, A. Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data. Ecolog. Model. 406 , 109–120 (2019).

Shao, J. Linear model selection by cross-validation. J. Am. Stat. Associat. 88 , 486–494 (1993).

Meyer, H., Reudenbach, C., Hengl, T., Katurji, M. & Nauss, T. Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. Environ. Model. Software 101 , 1–9 (2018).

Allgaier, J., Schlee, W., Probst, T. & Pryss, R. Prediction of tinnitus perception based on daily life mhealth data using country origin and season. J. Clin. Med. 11 , 4270 (2022).

Shiffman, S., Stone, A. A. & Hufford, M. R. Ecological momentary assessment. Annu. Rev. Clin. Psychol. 4 , 1–32 (2008).

Holfelder, M. et al. Medical device regulation efforts for mhealth apps during the covid-19 pandemic–an experience report of corona check and corona health. J 4 , 206–222 (2021).

Pryss, R., Reichert, M., Herrmann, J., Langguth, B. & Schlee, W. Mobile crowd sensing in clinical and psychological trials–a case study. In 2015 IEEE 28th international symposium on computer-based medical systems , 23–24 (IEEE, 2015).

Schlee, W. et al. Towards a unification of treatments and interventions for tinnitus patients: The eu research and innovation action uniti. Progress Brain Res. 260 , 441–451 (2021).

Simoes, J. P. et al. The statistical analysis plan for the unification of treatments and interventions for tinnitus patients randomized clinical trial (uniti-rct). Trials 24 , 472 (2023).

Allgaier, J., Schlee, W., Langguth, B., Probst, T. & Pryss, R. Predicting the gender of individuals with tinnitus based on daily life data of the trackyourtinnitus mhealth platform. Sci. Rep. 11 , 1–14 (2021).

Beierle, F. et al. Self-assessment of having covid-19 with the corona check mhealth app. IEEE J Biomed Health Inform. 27 , 2794–2805 (2023).

Humer, E. et al. Associations of country-specific and sociodemographic factors with self-reported covid-19–related symptoms: Multivariable analysis of data from the coronacheck mobile health platform. JMIR Public Health Surveil. 9 , e40958 (2023).

Wetzel, B. et al. “How come you don’t call me?” Smartphone communication app usage as an indicator of loneliness and social well-being across the adult lifespan during the COVID-19 pandemic. Int. J. Environ. Res. Public Health 18 , 6212 (2021).

Kroenke, K., Spitzer, R. L. & Williams, J. B. The phq-9: validity of a brief depression severity measure. J. General Internal Med. 16 , 606–613 (2001).

Cohen, S., Kamarck, T. & Mermelstein, R. et al. Perceived stress scale. Measur. Stress: Guider Health Social Scient. 10 , 1–2 (1994).

Stone, M. Cross-validatory choice and assessment of statistical predictions. J. Royal Stat. Society: Series B (Methodological) 36 , 111–133 (1974).

Lachenbruch, P. A. & Mickey, M. R. Estimation of error rates in discriminant analysis. Technometrics 10 , 1–11 (1968).

Geisser, S. The predictive sample reuse method with applications. J. Am. Stat. Associa. 70 , 320–328 (1975).

Pedregosa, F. et al. Scikit-learn: Machine learning in python. J. Mach. Learn. Res. 12 , 2825–2830 (2011).

Schlee, W. et al. Innovations in doctoral training and research on tinnitus: The european school on interdisciplinary tinnitus research (esit) perspective. Front. Aging Neurosci 9 , 447 (2018).

Allgaier, J. Github repository ∣ from hidden groups to robust models: How to better estimate performance of mobile health models. Zenodo https://doi.org/10.5281/zenodo.10401660 (2023).

Acknowledgements

This work was partly funded by the ESIT (European School for Interdisciplinary Tinnitus Research 31 ) project, which is financed by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Sklodowska-Curie grant agreement number 722046, and the UNITI (Unification of Treatments and Interventions for Tinnitus Patients) project, financed by the European Union’s Horizon 2020 Research and Innovation Programme, grant agreement number 848261 19 . J.A. and R.P. are supported by grants in the projects COMPASS and NAPKON. The COMPASS and NAPKON projects are part of the German COVID-19 Research Network of University Medicine (“Netzwerk Universitätsmedizin”), funded by the German Federal Ministry of Education and Research (funding reference 01KX2021). This publication was supported by the Open Access Publication Fund of the University of Wuerzburg.

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and affiliations.

Institute of Clinical Epidemiology and Biometry, Julius-Maximilians-University Würzburg, Josef-Schneider-Straße 2, Würzburg, Germany

Johannes Allgaier & Rüdiger Pryss


Contributions

J.A. primarily wrote this paper, created the figures, tables and plots, and trained the machine learning algorithms. R.P. supervised and revised the paper.

Corresponding author

Correspondence to Johannes Allgaier.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Communications Medicine thanks Mostafa Rezapour, Koushik Howlader, and Wenyu Gao for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer review file, reporting summary, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Allgaier, J., Pryss, R. Practical approaches in evaluating validation and biases of machine learning applied to mobile health studies. Commun Med 4 , 76 (2024). https://doi.org/10.1038/s43856-024-00468-0


Received : 21 March 2023

Accepted : 27 February 2024

Published : 22 April 2024

DOI : https://doi.org/10.1038/s43856-024-00468-0


  • Research Article
  • Open access
  • Published: 25 April 2024

Assessing intersectional gender analysis in Nepal’s health management information system: a case study on tuberculosis for inclusive health systems

  • Ayuska Parajuli 1 ,
  • Sampurna Kakchapati 1 ,
  • Abriti Arjyal 1 ,
  • Deepak Joshi 1 ,
  • Chandani Kharel 1 ,
  • Mariam Otmani del Barrio 2 &
  • Sushil C Baral   ORCID: orcid.org/0000-0002-3425-6915 1  

Infectious Diseases of Poverty volume 13, Article number: 31 (2024)

Tuberculosis (TB) remains a major public health problem in Nepal, and its burden is high in settings marked by prevalent gender and social inequities. Various social stratifiers intersect, either privileging or oppressing individuals based on their characteristics and contexts, thereby increasing the risks, vulnerabilities and marginalisation associated with TB. This study aimed to assess the inclusiveness of gender and other social stratifiers in key health-related national policies and the Health Management Information System (HMIS) of the National Tuberculosis Programme (NTP) by conducting an intersectional analysis of TB cases recorded via HMIS.

A desk review of key policies and the NTP’s HMIS was conducted. Retrospective intersectional analysis utilized two secondary data sources: annual NTP reports (2017–2021) and records of 628 TB cases via HMIS 6.5 from two TB centres (2017/18–2018/19). Chi-square tests and multivariate analysis were used to assess the association between social stratifiers and types of TB, registration category and treatment outcome.

Gender, social inclusion and the concept of intersectionality are incorporated into various health policies and strategies but lack effective implementation. The NTP has collected age, sex, ethnicity and location data through the HMIS since 2014/15. However, only age- and sex-disaggregated data are routinely reported, leaving the recorded social stratifiers of TB patients static, without analysis or dissemination. Furthermore, findings from the intersectional analysis using TB secondary data showed that males older than 25 years exhibited higher odds [adjusted odds ratio (aOR) = 4.95, 95% confidence interval (CI): 1.60–19.06, P = 0.01] of a successful outcome compared to male TB patients younger than 25 years. Similarly, sex was significantly associated with types of TB (P < 0.05), whereas both age (P < 0.05) and sex (P < 0.05) were significantly associated with patient registration category (old/new cases).

Conclusions

The results highlight inadequacy in the availability of social stratifiers in the routine HMIS. This limitation hampers the NTP’s ability to conduct intersectional analyses, crucial for unveiling the roles of other social determinants of TB. Such limitation underscores the need for more disaggregated data in routine NTP to better inform policies and plans contributing to the development of a more responsive and equitable TB programme and effectively addressing disparities.

Nepal, in its early stage of federalisation, is a multi-ethnic, multi-lingual, multi-religious and multi-cultural state with diverse geography. The new state architecture comprises three tiers of government—one federal, seven provincial and 753 local governments. In this federal structure, health is among the most decentralized sectors, where basic health services fall under the exclusive functions of local government [ 1 , 2 ]. The local governments have the authority to plan, operate, and manage their own health systems, bringing health services closer to peoples’ home. This approach aims to narrow gaps in health service access and utilization caused by synergistic interaction with various social stratifiers, such as gender, education, occupation and socio-economic status of the individual [ 3 , 4 ].

Intersectional gender analysis involves analyzing how gender power relations intersect with other social factors (such as age, ethnicity, religion, gender, education, occupation, geography, migration status, etc.) to affect people’s lives and create differences in needs and experiences [ 5 ]. These factors intersect, either privileging or oppressing individuals based on their characteristics and contexts, thereby increasing risk, vulnerability and marginalisation. Such evidence can better inform policies, programmes and services to ensure that no one is left behind. Tuberculosis (TB) serves as an example: national TB data compared against yearly WHO disease estimates show that 10,000 TB patients are beyond the reach of the National Tuberculosis Programme (NTP), Nepal [ 6 ].

TB remains a public health challenge in Nepal. As of 2021, Nepal is one of the high TB burden countries, with an increasing prevalence of cases. A total of 28,677 cases were notified and registered within the NTP in 2020/21 [ 7 ]. National data indicate that males suffer two times more from TB than females [ 7 ]. The higher prevalence of TB among males is attributed to sex- and gender-specific behavioral factors, such as daily activities/occupation, risk behaviors, social roles and responsibilities [ 8 , 9 ]. Males travel more frequently, leading to more social contacts; spend more time in settings conducive to TB transmission (e.g. bars); and engage in occupations associated with a higher risk of infection, such as mining and labor work [ 8 , 9 ].

The TB burden is high in settings where gender and social inequities are common. Lower TB prevalence among females may suggest underreporting and underdiagnosis [ 10 , 11 ]. Women in Nepal experience a longer total delay before TB diagnosis (median 3.3 months) compared to men (2.3 months) [ 12 ]. Limited household decision-making power among females, particularly regarding healthcare, may contribute to this delay. In 2016, over 40% of women could not make decisions about their own healthcare due to reasons such as treatment cost, distance to health facilities, and lack of permission to seek treatment [ 13 ]. Apart from gender disparities, patients from rural areas experience longer delays in seeking care compared to the urban population [ 14 ]. Social barriers to healthcare access include fear of stigma and discrimination, linked to poverty, lower caste and TB [ 15 ]. Moreover, treatment outcomes of TB are also associated with sex, gender, age, education, race/ethnicity and residential area of TB patients [ 16 ]. Although TB drugs are provided free of cost, several disabling factors such as poor socioeconomic conditions, family liabilities, and the burden of losing income contribute to loss to follow up during TB treatment [ 17 ].

Taking TB as a case example, these literature findings provide evidence that various social stratifiers such as age, sex, education, occupation, gendered roles and responsibilities, largely influence the disease and treatment outcomes [ 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 ]. Additionally, traditionally inherited caste-based discrimination is one of the major barriers to accessing healthcare services in Nepal, often interacting synergistically with gender, education, occupation and socio-economic status of individuals [ 3 , 4 ]. Therefore, the application of an intersectional gender lens is critical to improving the health status of people, including those with TB. A toolkit for conducting intersectional gender analysis for infectious diseases of poverty has been established to comprehend concerns via an intersectional gender lens. This pilot study aimed to inform the toolkit "Incorporating Intersectional Gender Analysis into Research on Poverty-Related Infectious Diseases" [ 27 ]. The study objective was to explore inclusiveness of gender and social stratifiers in key health-related policy documents and Health Management Information System (HMIS) through a desk review. Using TB as a case example, this study also aimed to assess the feasibility of conducting disaggregated and intersectional analysis of TB patient data recorded via HMIS within the NTP in Nepal.

Study design

A retrospective study was conducted through desk review and secondary data analysis. A web-based search was performed on key health-related policy documents of relevant government ministries. The reviewed policy documents included the Constitution of Nepal (2015), Urban Health Policy (2015), National Population Policy (2014), National Health Policy (2019), National Strategy for Reaching the Unreached (2016), Gender Equality and Social Inclusion Strategy of the Health Sector (2018), and Health Sector Information System National Strategy (2006). The review focused on exploring the inclusiveness of gender, equity and social stratifiers within the policies and strategies [ 18 , 19 , 20 , 21 , 22 , 23 , 24 ]. Additionally, the national strategies related to TB, namely the National Strategic Plan for Tuberculosis Prevention, Care and Control 2016–2021 and the National Strategic Plan to End TB 2021/22–2025/26, were reviewed, considering TB as a case example in this study [ 6 , 25 ].

Moreover, the HMIS was reviewed to explore the availability of various social stratifiers, with a specific focus on TB as a case example [ 26 ]. TB data reported over the last five years (2017–2021) were obtained from the website of the National Tuberculosis Control Center (NTCC) for age- and sex-disaggregated trend analysis. Similarly, secondary analysis of recorded TB patients from two fiscal years (2017/18–2018/19) from two Directly Observed Therapy, Short-course (DOTS) centers was carried out to explore how sex, age and ethnicity interact with each other, shaping the treatment of TB patients enrolled in the NTP.

Study setting

Two selected DOTS centers were identified in the Metropolitan City of Kathmandu district in Bagmati province. This province had the highest number of notified TB patients in the year 2020/21 (6664) compared to other provinces in Nepal [ 7 ]. Among the districts within this province, Kathmandu alone accounted for around 44% (2982 TB cases), contributing approximately 10.39% to the national total (28,677) [ 7 ].
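The shares quoted above follow directly from the notification counts. As a minimal sketch, the arithmetic can be reproduced in Python from the three figures stated in the text; the variable and function names are illustrative only.

```python
# Notified TB cases in 2020/21, as quoted in the text.
national_total = 28_677
bagmati_total = 6_664
kathmandu_total = 2_982


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


kathmandu_of_province = share(kathmandu_total, bagmati_total)
kathmandu_of_nation = share(kathmandu_total, national_total)

print(f"Kathmandu share of Bagmati cases: {kathmandu_of_province:.1f}%")
print(f"Kathmandu share of national cases: {kathmandu_of_nation:.1f}%")
```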

According to the 2021 National Housing and Population Census, Kathmandu district had a total population of 2,017,532, with the male population (1,025,727) slightly outnumbering the female population (991,805) [ 27 ]. This district experiences high migration from various parts of the country. There were 45 DOTS centers in Kathmandu, located in various hospitals, referral centers and Urban Health Clinics (UHC) [ 28 , 29 ]. Based on the information obtained for the fiscal years 2017/18 and 2018/19, Swayambhu UHC was one of the UHCs with a high case load, handling approximately 200–250 TB cases per year. Similarly, a referral center named Nepal Anti-Tuberculosis Association (NATA), supported by the German Nepal Tuberculosis Project (GENETUP), was the largest TB referral center in Kathmandu, linked to various DOTS centres. This referral center documented approximately 400–450 cases in the last two fiscal years (2017/18 and 2018/19). Therefore, considering feasibility with limited time and resources, we purposively selected Swayambhu UHC and NATA for this study based on their TB case load.

Data collection

All DOTS centers use HMIS 6.5 as the TB treatment register. The same template was created in an Excel sheet for data collection. Before data collection, coordination meetings with officials of the NTCC and the Epidemiology and Disease Control Division (EDCD) of the Department of Health Services, Ministry of Health and Population were conducted. During these meetings, stakeholders were oriented about the study objectives and methodology. Official letters of support were received from the respective government offices, facilitating communication with officials at DOTS centres. Similarly, clinic supervisors and DOTS focal persons from the study sites were oriented on the objectives and methodology of the study.

Supervisors, who were the data custodians at the DOTS centers, were requested to provide anonymised data during data collection. Each patient was assigned a unique identification number and no other patient identifiable information was obtained during the data collection. Data was collected and entered at the DOTS center in the presence of data custodian. Any missing information in the register was immediately discussed with the data custodian, who validated data with other records of the respective patient present in the DOTS center. Data collection took place over a period of one month in July 2020.

Data management and analysis

MS Excel was used for the data cleaning, while statistical software STATA version 14 (StataCorp LLC, College Station, Texas, USA), and the R Programme (Lucent Technologies, Jasmine Mountain, USA) were used to analyse data and create graphs, respectively. The study variables were patients’ age, sex, ethnicity, types of TB, patient registration category (new or old cases), and treatment outcome. Ethnicity of patient was categorised into two groups: ‘advantaged caste groups’ and ‘dis-advantaged caste groups’ [ 30 ]. Advantaged caste groups included the upper caste group (from both hilly and Terai region) and relatively advantaged Janajatis (Newar, Thakali, Gurung) [ 30 ]. Dis-advantaged caste group included Dalit (from both hilly and Terai region), dis-advantaged Janajati (hilly and Terai), religious minorities (Muslim) and dis-advantaged non-dalit Terai caste groups [ 30 ].

The final treatment outcome was dichotomised into ‘successful treatment outcome’ and ‘unfavorable treatment outcome’ variables [ 26 ]. Successful treatment outcome comprised patients classified as ‘cured’ and ‘completed treatment’ [ 26 , 31 ], while unfavorable treatment outcome included ‘died’, ‘treatment failure’, ‘lost to follow up’ and ‘not evaluated’ [ 26 , 27 ]. Participant age was categorised into groups according to weightage of the data i.e. ≤ 14, 15–24, 25–54 and ≥ 55 years, to facilitate comparison [ 29 ].
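The recoding described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' procedure (the study used MS Excel and STATA): the function names are hypothetical, the outcome labels are those quoted in the text, and ages are assumed to be recorded in years.

```python
# Outcome labels quoted in the text, grouped as described above.
SUCCESSFUL = {"cured", "completed treatment"}
UNFAVORABLE = {"died", "treatment failure", "lost to follow up", "not evaluated"}


def dichotomise_outcome(outcome):
    """Map a recorded outcome to 'successful' or 'unfavorable'.

    Returns None for any other label (for example, a patient moved to
    second-line treatment or still on treatment) so that such records
    can be left out of the outcome analysis.
    """
    label = outcome.strip().lower()
    if label in SUCCESSFUL:
        return "successful"
    if label in UNFAVORABLE:
        return "unfavorable"
    return None


def age_group(age_years):
    """Assign one of the four age bands: <=14, 15-24, 25-54, >=55."""
    if age_years <= 14:
        return "<=14"
    if age_years <= 24:
        return "15-24"
    if age_years <= 54:
        return "25-54"
    return ">=55"


print(dichotomise_outcome("Cured"), age_group(34))
```

Returning None for unmatched labels keeps the exclusion of second-line and ongoing-treatment patients explicit instead of silently assigning them to a category.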

Data exploration involved descriptive and inferential statistics, following the process outlined in the WHO toolkit ‘Incorporating intersectional gender analysis into research on infectious diseases of poverty: A toolkit for health researchers’ [ 15 ]. Sex-disaggregated data analysis was conducted in each step to identify differences between males and females across different ages and ethnic groups. Furthermore, we assessed whether any statistical difference existed between different age groups and ethnic groups within each sex. Bivariate analysis employed the Chi-square test to measure the association between available social stratifiers and types of TB and patient registration category. For variables with an expected cell value of less than five in bivariate analysis, Fisher's exact test was applied. Multivariate logistic regression determined the most significant determinants associated with treatment outcome. Crude and adjusted odds ratios (ORs) were calculated during the analysis, with a 95% confidence interval (CI) used to report the OR.
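The rule for choosing between the two tests can be illustrated for a 2×2 table. The sketch below uses only the Python standard library and is not the STATA routine used in the study; the function names and the example counts are hypothetical, and the two-sided p-value follows the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]


def needs_fisher(table, threshold=5):
    """True when any expected cell count falls below the threshold."""
    return any(e < threshold for row in expected_counts(table) for e in row)


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical sparse table, for illustration only (not study data).
sparse = [[3, 1], [1, 3]]
print(needs_fisher(sparse), round(fisher_exact_two_sided(sparse), 4))
```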

Where do we stand in terms of understanding inequalities in health system of Nepal? What is being done?

The Constitution of Nepal (2015) provides for greater inclusion of women and of marginalized and disadvantaged groups [ 18 , 33 ]. Subsequently, there has been notable progress in addressing both biological sex and the social construct of gender in various policies and strategies. These initiatives mandate civil society and economic participation, as well as health service utilisation by women. Gender, social inclusion and the concept of intersectionality are well incorporated into the existing National Health Policy, Nepal Health Sector Strategy, Gender Equality and Social Inclusion Strategy of the Health Sector, Urban Health Policy, Population Policy and National Strategy for Reaching the Unreached [ 19 , 20 , 21 , 23 , 24 ].

In an endeavor to reach the unreached, the Ministry of Health and Population (MoHP) established a 'Gender Equality and Social Inclusion' (GESI) section in 2013. This proactive step aimed to address disparities and promote inclusivity by mainstreaming GESI in the health sector [ 34 ]. However, despite numerous efforts, the implementation of GESI policies faces challenges due to limited operational structures and capacity at various levels within the health system. Consequently, inequities in health outcomes persist across various social stratifiers [ 3 , 35 ]. Challenges continue with the implementation of gender-sensitive and gender-responsive legislation, policies, and acts, including the intersectional recognition of factors affecting men or women based on ethnicity, caste, religion, language, indigeneity, marital status, occupation, geographical location, ability, and access to health and education [ 34 , 36 , 37 , 38 ]. These interactions occur within connected systems where social determinants and the structure of power in society act synergistically and antagonistically, shaping the privilege and oppression of individuals [ 39 ].

Furthermore, in 2014/15, the MoHP revised the HMIS to include variables such as sex, age, caste/ethnicity, and location/address. This revision enables the assessment of disaggregated health data, offering a more comprehensive understanding of 11 selected health indicators [ 34 , 40 ]. HMIS is primarily used in the public sector for recording and reporting routine health services data from public health facilities at all three levels of government (local, provincial, and federal). The private sector maintains its own information systems for recording purposes, which are not yet integrated with the government's HMIS. However, a few private health facilities report to HMIS for selected programme indicators only.

Health Management Information System (HMIS): TB as a case example

HMIS in Nepal comprises distinct registers for recording TB service data, namely HMIS 6.1 Tuberculosis Sample Collection Form, 6.2 Tuberculosis Laboratory Register, 6.3 Tuberculosis Treatment Card (Health Facility), 6.4 Tuberculosis Treatment Card (Patient), 6.5 Tuberculosis Treatment Register, 6.6 Smoking Cessation Register, 6.7 Drug Resistant (DR) Tuberculosis Laboratory Register, and 6.8 DR Tuberculosis Treatment Register [ 26 ]. All these TB registers typically include fields for recording demographic information, including age, sex, ethnicity, address, name of the caregivers and contact number of the service recipients. The classification of sex is limited to male and female, with no provision for individuals with non-binary gender identities.

While social stratifiers such as age, sex and ethnicity are recorded at the health facility level, there are limitations in reporting these data to higher authorities. The standard reporting format predominantly focuses on sex- and age-disaggregated data. Information disseminated at the national level by the government through annual reports based on HMIS findings includes disaggregation by sex, age, and province. This highlights a gap: the health information system is limited in its ability to capture service utilisation patterns by different population groups and so to support tailored decisions and interventions (Fig. 1).

Fig. 1 Flowchart presenting loss of variables during the recording and reporting mechanism of TB. DHIS2: District Health Information Software 2; DoHS: Department of Health Services; HMIS: Health Management Information System
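The loss of variables between recording and reporting can be made concrete with a toy example. The sketch below is purely illustrative: the records are hypothetical (not study data), the field names are assumptions, and the aggregation only mimics a reporting format that keeps age and sex while dropping ethnicity.

```python
from collections import Counter

# Hypothetical facility-level records (not study data). Each record carries
# the stratifiers that the treatment register has fields for.
records = [
    {"age_group": "25-54", "sex": "F", "ethnicity": "disadvantaged"},
    {"age_group": "25-54", "sex": "F", "ethnicity": "advantaged"},
    {"age_group": "25-54", "sex": "M", "ethnicity": "disadvantaged"},
    {"age_group": ">=55", "sex": "M", "ethnicity": "advantaged"},
]

# A reporting format that keeps only age and sex.
reported = Counter((r["age_group"], r["sex"]) for r in records)

# The level of detail the register itself could support.
recorded = Counter(
    (r["age_group"], r["sex"], r["ethnicity"]) for r in records
)

# Both women aged 25-54 fall into one reported cell, so the difference
# in ethnicity between them is no longer recoverable downstream.
print(reported[("25-54", "F")])
print(len(recorded), "recorded strata versus", len(reported), "reported strata")
```

Once the data are aggregated in this way, an intersectional question such as the number of women from disadvantaged caste groups aged 25–54 cannot be answered from the reported counts alone.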

Scope of conducting disaggregated and intersectional analysis from the available HMIS data: taking TB as an example

Secondary data analysis was performed to assess the current limitations in conducting intersectional gender analysis with the available TB data through the HMIS, rather than producing new findings to inform disease (TB) perspective. It is essential to note that the TB programme is taken only as an illustrative example. The insights gained from this analysis could contribute to inform HMIS recording and reporting practices for various diseases and health programmes, promoting a more inclusive system.

Trend of annually reported TB cases disaggregated by ecological region, age and sex

There were pronounced variations in TB cases across different regions of Nepal. The Terai region (the lowland plains) consistently reported the highest number of TB cases, followed by the Hill region (the hilly areas) and the Mountain region (the mountainous areas), over the last five years. The highest proportion of TB cases was found among the population aged 65 years and above, whereas the lowest proportion was found among those younger than 14 years. In terms of sex-wise distribution, the proportion of TB cases was notably higher among males than females over the last five years. These findings provide important insights into the epidemiology of TB in Nepal, showcasing variations in regional prevalence, age-related patterns, and gender disparities (Fig. 2).

Fig. 2 Tuberculosis cases by region, age and sex (data source: National Tuberculosis Control Center) [ 41 ]

Disaggregated analysis of the recorded TB cases

We collected information from 628 TB patients from two DOTS centers, among whom 510 (81.2%) were new TB patients, while 118 (18.8%) had received previous TB treatment. During the data collection period, 152 (24.2%) were under TB DOTS treatment and 476 (75.8%) had completed their treatment. Among the patients, 338 (54.0%) had pulmonary TB (PTB), and 290 (46.0%) had extra-pulmonary TB (EPTB). Of those who completed treatment, 399 (83.8%) were successfully treated, 71 (14.9%) had an unfavorable treatment outcome and 6 (1.3%) moved to second line treatment (data not shown).

The overall male-to-female TB patient ratio was 1.1 (333/295). The age of male TB patients ranged widely, from 9 months to 92 years. The age of female TB patients ranged similarly, from one year to 93 years. However, the median (md) and inter-quartile range (IQR) for the age of males (md = 34 years; IQR = 22–50) were higher than those of females (md = 27 years; IQR = 21–38). In both sexes, the highest percentage of TB patients belonged to the 25–54 years age group [male (46.6%) and female (45.4%)], while the ≤ 14 years age group had the fewest TB cases [male (5.7%) and female (4.4%)]. Similarly, more than half of the male (55.6%) and female (56.3%) TB patients belonged to the advantaged caste group, while the remainder belonged to the disadvantaged caste group (Table 1).

Comparison of types of TB according to age, sex and ethnicity

There was a significant association between the sex of the patient and the type of TB (P < 0.05). Among the reported cases, the proportion of males with PTB was higher (61.3%) compared to females (45.4%), while the proportion of males with EPTB was lower (38.7%) than that of females (54.6%). Figure 3 shows the proportion of pulmonary TB patients and its 95% confidence interval among different age and ethnic groups, disaggregated by sex. The red horizontal line in Fig. 3 represents the proportion of pulmonary TB among total cases, i.e., 54.0%. Within males, the proportion of PTB increased with age, with the highest proportion observed in the ≥ 55 years age group. Males had a higher prevalence of PTB than females in both the advantaged and disadvantaged caste groups (Fig. 3).
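The association between sex and type of TB can be checked approximately from the figures quoted above. In the sketch below, the cell counts are reconstructed from the reported totals (333 males, 295 females) and percentages, so they are an assumption rather than values taken from the study tables. The statistic is the uncorrected Pearson chi-square; the text does not state whether a continuity correction was applied.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value
    for a 2x2 table, using the closed form for one degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(stat / 2))  # chi-square survival function, df = 1
    return stat, p_value


# Counts reconstructed from the reported totals and percentages:
# 333 males (61.3% PTB) and 295 females (45.4% PTB).
males_ptb = round(333 * 0.613)
females_ptb = round(295 * 0.454)
table = [
    [males_ptb, 333 - males_ptb],      # males: PTB, EPTB
    [females_ptb, 295 - females_ptb],  # females: PTB, EPTB
]

stat, p_value = chi_square_2x2(table)
print(f"chi-square = {stat:.2f}, p = {p_value:.5f}")
```

The reconstructed PTB counts sum to 338, matching the total reported earlier, and the resulting p-value is well below 0.05, in line with the association described in the text.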

Fig. 3 Comparison of pulmonary TB cases by age and ethnic groups, disaggregated by sex

Patient’s registration category (old/new cases) across age, sex and ethnicity

Age and sex were significantly associated with the patient registration category (new or retreatment) at enrolment into the TB regimen (P < 0.05). Males accounted for a significantly higher share of retreatment cases (61.9%) than females (38.1%) (P < 0.05). Similarly, patients in the 25–54 years age group constituted a significantly higher proportion (44.1%) of the retreatment category. Patients from the disadvantaged caste group made up a higher share of retreatment cases (59.3%) than those from the advantaged caste group (40.7%), although this difference was not statistically significant (Table 2).

Treatment outcome across age, sex and ethnicity

Out of 628 TB patients, a treatment outcome was obtained for 470 patients; a further 6 patients were moved to second-line treatment, which was not considered in the two categories of treatment outcome (successful and unfavorable) [ 32 ]. Figure 4 shows the successful treatment outcome and its 95% confidence interval among age groups and ethnic groups, disaggregated by sex. The red horizontal line represents the proportion of treatment success among total TB patients, i.e., 84.8%. The rate of successful treatment gradually decreased with age among both male and female TB patients. Female TB patients had a higher rate of successful treatment than males in both caste groups.
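The proportion and confidence interval plotted for each subgroup can be sketched for the overall figures. The text does not state which interval method was used, so the Wilson score interval below is an assumption chosen for illustration; the counts (399 successful outcomes among 470 evaluated patients) are those reported in the text.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p_hat = successes / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    return centre - half_width, centre + half_width


# Overall figures from the text: 399 successful outcomes among the
# 470 patients with an outcome in one of the two categories.
successes, evaluated = 399, 470
low, high = wilson_interval(successes, evaluated)
print(f"success = {successes / evaluated:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Applied to a small subgroup, the same function yields a much wider interval, which is one reason sparse strata limit what an intersectional breakdown can show.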

Fig. 4 Comparison of treatment success rate of TB cases by age and ethnic groups, disaggregated by sex

Multivariate logistic regression analysis was conducted to assess the relationship between the combined variables, i.e. ‘sex and age’ and ‘sex and ethnicity’, and treatment outcome. Age was categorized into two groups (≤ 25 years and > 25 years) because of the insufficient sample size within the four age categories. The results reveal that males older than 25 years exhibited higher odds (aOR = 4.95, 95% CI: 1.60–19.06, P = 0.01) of a successful outcome compared to male TB patients younger than 25 years (Table 3).

Despite numerous efforts to apply an intersectional gender lens in policies, implementation has been mixed, leading to evident health inequities across various social stratifiers [ 3 , 35 ]. The literature suggests that variations exist in the availability and utilization of health services, as well as in the health status of individuals, based on several factors, including gender, age group (with a special focus on vulnerable age groups), geography, urban/rural location, socio-economic status, caste, ethnicity and religion, the presence of disabilities (both physical and mental) and residence in disaster-affected areas [ 38 , 42 , 43 ]. Moreover, multiple layers of vulnerability are created when two or more of these determinants intersect, amplifying the risks faced by excluded or marginalised populations. This shows the critical need for gender and intersectional analysis in policy making and health planning, ensuring that no one is left behind and addressing the specific needs of diverse and vulnerable population groups.
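How an odds ratio and its 95% confidence interval are obtained can be illustrated with the crude (unadjusted) case. The sketch below is not the adjusted model reported in Table 3, which comes from multivariate logistic regression; it computes a crude OR with a Wald interval from a 2×2 table, and the counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval for a 2x2 table.

    a, b: successful / unfavorable outcomes in the group of interest
    c, d: successful / unfavorable outcomes in the reference group
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = exp(log(odds_ratio) - z * se_log_or)
    high = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hypothetical counts for illustration only (not taken from Table 3).
crude_or, low, high = odds_ratio_wald(a=90, b=10, c=60, d=20)
print(f"OR = {crude_or:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

The standard error depends on the reciprocal of every cell count, so a single sparse cell widens the interval considerably; this is consistent with the wide interval reported for the adjusted estimate and with the decision to collapse age into two groups.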

Although the importance of disaggregated data is emphasised in policies, its actual use for planning and developing programmes and interventions is limited. Another challenge is that the HMIS often has limited variables to record and report health service delivery data, restricting gender and intersectional analysis. This study highlights practical constraints in using the existing HMIS for inclusive gender and intersectional analysis. However, it is unclear whether, how and to what extent these information management systems in the public and private sectors provide gender- and equity-focused evidence, and how they inform decisions. All these challenges impede progress towards strengthening a health system that is more responsive and leaves no one behind in the federalized context. Therefore, more social stratifiers should be added to the HMIS recording and reporting forms, followed by the incorporation of an intersectional gender lens when analysing and reporting HMIS data. This context highlights the complexity of addressing gender and social inclusion issues within the health sector in Nepal. While efforts have been made to recognise and tackle these challenges, practical implementation remains a significant hurdle due to capacity gaps, resource constraints and limitations in comprehensive data collection and analysis. Addressing these challenges is crucial for achieving more equitable health outcomes across diverse social groups.

Intersectional analysis of HMIS recorded data conducted in this study illustrated various differences across sex, age and ethnicity. The proportion of females with EPTB was higher than males, consistent with studies in the United Denmark [ 16 ], and India [ 17 ]. Several factors, such as endocrine factors, smoking, and past history of TB exposure were thought to be related to this inequality [ 36 , 38 ].

Sex was significantly associated with the treatment success rate where a greater proportion of females had favorable treatment compared to males. Analysis of gender differentials has indicated that women who begin treatment for TB are more likely to adhere to the full course of treatment compared to men, resulting in a positive treatment outcome [ 40 , 41 ]. Men, being sole breadwinners, are engaged in various informal sectors and have less chance to become aware of the disease; hence, the probability of treatment non-adherence is high [ 32 ]. This continues as a cycle of TB, where a high proportion of male TB patients came for re-treatment of TB compared to female, as evidenced in this study. Furthermore, this study identified that the treatment success rate gradually decreases with an increase in age among both sexes, aligning with other studies [ 45 , 46 ]. This could be because older TB patients interrupt adherence to treatment more often than younger persons and are challenged by several determinants of health, such as low socioeconomic status, low immunity and poor access to health facility [ 47 ]. Therefore, older persons with TB might benefit from close monitoring in order to make their treatment successful [ 48 ].

Our study did not identify significant differences in TB-related outcomes across ethnic groups. However, various studies conducted in other countries have shown that the migrant population and ethnic minorities have a higher prevalence of TB in comparison to the general population [ 49 , 50 , 51 , 52 , 53 ]. This could be because of interactions between cultural and structural barriers to accessing healthcare [ 3 , 4 , 50 , 51 , 52 , 53 ]. Behind this, social power and structures have influenced vulnerability and treatment outcome of TB among people living in slums and densely populated urban settings, people living in congregate settings like factories, prisons, camps and refugees [ 44 ]. With limitations on disaggregated population data in the routine healthcare information system and a lack of context-specific models for identification and determining numbers and distribution of high-risk groups, there is less effective coverage of priority health interventions among these groups[ 14 ]. This has resulted in difficulties in the timely diagnosis of TB and prompt initiation of treatment [ 14 ].

Apart from these, other studies show that, even though anti-TB medicines are provided free of cost, various factors such as socio-economic conditions, fear of losing a job, lack of education, ethnicity as a cross-cutting factor, and family responsibilities contribute to loss to follow-up during TB treatment [ 16 , 17 ]. For these reasons, the sex, gender, age, education, occupation, race/ethnicity and residential area of TB patients also interact with each other to influence the treatment outcome of TB [ 16 , 17 ]. Hence, moving towards specific approaches to recording, reporting and analysing TB cases according to the social strata of TB patients (age, sex, ethnicity, education, occupation, province, etc.) would help to narrow the existing information gap and identify the unreached population.

Our study has some limitations. Secondary data were used, which limited the scope of variables because the social stratifiers recorded in the HMIS 6.5 TB register were confined to the age, sex and ethnicity of the patient. This narrowed the opportunity to conduct a wider intersectional gender analysis. In addition, treatment outcomes could not be analysed across social stratifiers for all TB patients in the collected data because 152 patients were still on a TB treatment regimen at the time of data collection and their treatment outcomes were awaited. This reduced our sample size for the analysis of ‘treatment outcome’.

The intersectional analysis, conducted with limited variables (age, sex and ethnicity), revealed differences in treatment outcome and type of TB across age groups and ethnicities of male and female TB patients. Hence, this study reflects the potential of intersectional gender analysis to reach unreached or vulnerable population groups when a range of social stratifiers is captured and analysed and evidence-based decisions are taken. Similarly, the findings highlight the inadequate availability of social stratifiers in routine HMIS TB data. This limitation hampers the NTP’s ability to conduct intersectional analysis, which is essential for unveiling the roles and impacts of various social determinants of TB. It underscores the need for more disaggregated and inclusive data in the routine NTP HMIS to better inform policies and plans for building a more responsive and equitable TB programme that can systematically address disparities in TB outcomes.

Availability of data and materials

The datasets generated during this study are not publicly available due to the data confidentiality policy but are available from the corresponding author on reasonable request.

Abbreviations

Adjusted odds ratio

Confidence interval

District Health Information Software 2

Department of Health Services

Drug resistant

Directly observed therapy, short-course

Epidemiology and Disease Control Division

Extra-pulmonary tuberculosis

Gender Equality and Social Inclusion

German Nepal Tuberculosis Project

Health Management Information System

Inter-quartile range

Ministry of Health and Population

Nepal Anti-Tuberculosis Association

National Tuberculosis Control Center

National Tuberculosis Programme

Pulmonary tuberculosis

Tuberculosis

Urban Health Clinics

World Health Organization

Vaidya A, Simkhada P, Simkhada B. The impact of federalization on health sector in Nepal: new opportunities and challenges. J Nepal Health Res Counc. 2019;17:558–9.

Thapa R, Bam K, Tiwari P, Sinha TK, Dahal S. Implementing federalism in the health system of Nepal: opportunities and challenges. Int J Heal Policy Manag. 2019;8:195–8.

Ghimire U, Manandhar J, Gautam A, Tuladhar S, Prasai Y, Gebreselassie T, et al. Inequalities in health outcomes and access to services by caste/ethnicity, province, and wealth quintile in Nepal. 2019. https://dhsprogram.com/publications/publication-fa117-further-analysis.cfm . Accessed 2 Jul 2021.

Prasad Pandey J, Dhakal MR, Karki S, Poudel P, Pradhan MS. Maternal and child health in Nepal: the effects of caste, ethnicity, and regional identity: Further analysis of the 2011 Nepal demographic and health survey. 2013. https://www.dhsprogram.com/pubs/pdf/FA73/FA73.pdf . Accessed 2 Jul 2021.

World Health Organization. Incorporating intersectional gender analysis into research on infectious diseases of poverty: a toolkit for health researchers. https://www.who.int/publications/i/item/9789240008458 . 2020. Accessed 10 Jan 2021.

National Tuberculosis Center. National strategic plan for tuberculosis prevention, care and control 2016–2021. https://nepalntp.gov.np/wp-content/uploads/2018/01/NSP-report-english-revised.pdf . 2016. Accessed 28 Jun 2021.

Department of Health Services. Annual report 2077/78 (2020/21). https://dohs.gov.np/wp-content/uploads/2022/07/DoHS-Annual-Report-FY-2077-78-date-5-July-2022-2022_FINAL.pdf . 2021. Accessed 20 Oct 2022.

Guerra-Silveira F, Abad-Franch F. Sex bias in infectious disease epidemiology: patterns and processes. PLoS ONE. 2013;8: e62390.

Nhamoyebonde S, Leslie A. Biological differences between the sexes and susceptibility to tuberculosis. J Infect Dis. 2014;209:S100–6.

Yang W-T, Gounder CR, Akande T, De Neve J-W, McIntire KN, Chandrasekhar A, et al. Barriers and delays in tuberculosis diagnosis and treatment services: does gender matter? Tuberc Res Treat. 2014;2014: 461935.

Krishnan L, Akande T, Shankar AV, McIntire KN, Gounder CR, Gupta A, et al. Gender-related barriers and delays in accessing tuberculosis diagnostic and treatment services: a systematic review of qualitative studies. Tuberc Res Treat. 2014;2014: 215059.

Khanal S, Elsey H, King R, Baral SC, Bhatta BR, Newell JN. Development of a patient-centred, psychosocial support intervention for multi-drug-resistant tuberculosis (MDR-TB) care in Nepal. PLoS ONE. 2017;12: e0167559.

Faramand TH, Dale K, Ivankovich M, Roberts K, Hall ML, Foster AA, et al. Desk review on gender issues affecting neglected tropical diseases. 2019. http://www.wi-her.org/wp-content/uploads/2019/10/Act-East-Gender-Desk-Review.pdf . Accessed 10 Jul 2023.

National Tuberculosis Control Centre. Epidemiological review of tuberculosis surveillance in Nepal. https://nepalntp.gov.np/wp-content/uploads/2021/04/EPI-Report-27-May-2020.pdf . 2019. Accessed 28 Jun 2021.

Baral SC, Karki DK, Newell JN. Causes of stigma and discrimination associated with tuberculosis in Nepal: a qualitative study. BMC Public Health. 2007;7:211.

Dos Santos JN, Sales CMM, do Prado TN, Maciel EL. Factors associated with cure in the treatment of tuberculosis in the state of Rio de Janeiro, 2011–2014. Epidemiol Serv Saúde. 2018;27:e2017464.

Heemanshu A, Satwanti K. Determinants of lost to follow up during treatment among tuberculosis patients in Delhi. Int J Med Res Heal Sci. 2016;5:145–52.

National Law Commission. The constitution of Nepal, 2015. https://lawcommission.gov.np/np/ . 2015. Accessed 11 Feb 2023.

Ministry of Health. National strategy for reaching the unreached 2016–2030. https://mohp.gov.np/uploads/Resources/1657873926209MoH’s NationalStrategyforReachingUnreached2016-2030(1).pdf. 2016. Accessed 1 Dec 2021.

Ministry of Health and Population. National health policy of Nepal, 2076 (2019). http://climate.mohp.gov.np/news/31-acts/164-national-health-policy-2076 . 2019. Accessed 10 Sep 2021.

Ministry of Health and Population. National population policy, 2071 (2014). https://mohp.gov.np/uploads/Resources/1657877798972Population_Policy.pdf . 2014. Accessed 24 Dec 2021.

Ministry of Health and Population. Health sector information system: National strategy. https://mohp.gov.np/uploads/Resources/1657873799807Health-Sector-Information-Strategy.pdf . 2006. Accessed 13 Oct 2022.

Ministry of Health and Population. Urban health policy, 2072 (2015). https://climate.mohp.gov.np/downloads/Urban_Health_Policy_2072.pdf . 2015. Accessed 8 Mar 2021.

Ministry of Health and Population. Gender equality and social inclusion strategy of the health sector, 2018. https://nepalindata.com/media/resources/items/0/bGENDER_EQUALITY_AND_SOCIAL_INCLUSION_STRATEGY_OF_THE_HEALTH_SECTOR_2018.pdf . 2018. Accessed 5 Jul 2021.

National Tuberculosis Control Center. National strategic plan to end TB 2021/22–2025/26. https://nepalntp.gov.np/wp-content/uploads/2022/07/TB-National-Strategic-Plan-English-report-UPDATED-July-15-2022.pdf . 2021. Accessed 11 Feb 2023.

Department of Health Services. Health management information system guidelines. https://dohs.gov.np/wp-content/uploads/2019/03/HMIS-Guideline-2075.pdf . 2018. Accessed 23 Dec 2021.

National Statistical Office. Preliminary report of census 2021. https://censusnepal.cbs.gov.np/Home/Details?tpid=5&dcid=3479c092-7749-4ba6-9369-45486cd67f30&tfsid=17 . 2021. Accessed 7 Jun 2022.

Department of Health Services. Annual report 2074/75 (2017/18). https://dohs.gov.np/wp-content/uploads/2019/07/DoHS-Annual-Report-FY-2074-75-date-22-Ashad-2076-for-web-1.pdf . 2018. Accessed 3 Sep 2020.

National Tuberculosis Center. National tuberculosis program Nepal, annual report 2074/75. 2018. https://nepalntp.gov.np/wp-content/uploads/2019/03/NTP-Annual-Report-2074-75-Up.pdf . Accessed 15 Oct 2020.

Pandey JP, Dhakal MR, Karki S, Poudel P, Pradham MS. Maternal and child health in Nepal: the effects of caste, ethnicity, and regional identity: Further analysis of the 2011 Nepal Demographic and Health Survey. 2013. https://www.dhsprogram.com/pubs/pdf/FA73/FA73.pdf . Accessed 2 Jul 2021.

World Health Organization. Definitions and reporting framework for tuberculosis - 2013 revision (updated Dec 2014 and Jan 2020). 2020. https://www.who.int/publications/i/item/9789241505345 . 2020. Accessed 19 Jul 2022.

Nanzaluka FH, Chibuye S, Kasapo CC, Langa N, Nyimbili S, Moonga G, et al. Factors associated with unfavourable tuberculosis treatment outcomes in Lusaka, Zambia, 2015: a secondary analysis of routine surveillance data. Pan Afr Med J. 2019;32:159.

Acharya KK. Local governance restructuring in Nepal: from government to governmentality. Dhaulagiri J Sociol Anthropol. 2018;12:37–49.

Ministry of Health and Population. Progress report on gender equality and social inclusion for NHSSP-2 2013/14. http://www.nhssp.org.np/NHSSP_Archives/jar/2015/08GESI_JAR_report_february2015.pdf . 2015. Accessed 11 Oct 2022.

Central Bureau of Statistics. Nepal multiple indicator cluster survey 2019, survey findings report. https://www.unicef.org/nepal/reports/multiple-indicator-cluster-survey-final-report-2019 . 2020. Accessed 19 Jul 2023.

Asian Development Bank. Sectoral perspectives on gender and social inclusion. 2011. https://www.adb.org/sites/default/files/publication/30354/spgsi-monograph-4-health.pdf . Accessed 3 Oct 2022.

Hankivsky O, Cormier R. Intersectionality and public policy: some lessons from existing models. Polit Res Q. 2011;64:217–29.

Daniels D, Ghimire K, Thapa P, Réveillon M, Pathak DR, Baral K, et al. Nepal health sector programme II (NHSP II): mid-term review. 2013. http://www.heart-resources.org/wp-content/uploads/2013/06/NHSP-II-MTR-Report-FINAL-15-02-13.pdf?08012f . Accessed 3 Jul 2021.

Wolfe R, Molyneux S, Morgan R, Gilson L. Using intersectionality to better understand health system resilience. 2017. https://resyst.lshtm.ac.uk/sites/resyst/files/content/attachments/2018-08-21/Resilience andintersectionalitybrief.pdf. Accessed 2 Jul 2021.

Department of Health Services. Annual report 2071/72 (2014/15). https://dohs.gov.np/wp-content/uploads/2016/06/Annual_Report_FY_2071_72.pdf. 2016. Accessed 7 Jul 2021.

National Tuberculosis Control Center. National tuberculosis program service data. 2022. https://nepalntp.gov.np/pub_cat/service_data/ . Accessed 11 Feb 2023.

Department of Health Services. Annual report 2076/77 (2019/2020). https://dohs.gov.np/wp-content/uploads/2021/07/DoHS-Annual-Report-FY-2076-77-for-website.pdf . 2020. Accessed 10 Jul 2021.

Saito E, Gilmour S, Yoneoka D, Gautam GS, Rahman MM, Shrestha PK, et al. Inequality and inequity in healthcare utilization in urban Nepal: a cross-sectional observational study. Health Policy Plan. 2016;31:817–24.

National Tuberculosis Control Centre. Strategic interventions. 2021. https://nepalntp.gov.np/strategic-interventions/ . Accessed 28 Jun 2021.

Izudi J, Tamwesigire IK, Bajunirwe F. Treatment success and mortality among adults with tuberculosis in rural eastern Uganda: a retrospective cohort study. BMC Public Health. 2020;20:501.

Atif M, Anwar Z, Fatima RK, Malik I, Asghar S, Scahill S. Analysis of tuberculosis treatment outcomes among pulmonary tuberculosis patients in Bahawalpur, Pakistan. BMC Res Notes. 2018;11:370.

Gabida M, Tshimanga M, Chemhuru M, Gombe N, Bangure D. Trends for tuberculosis treatment outcomes, new sputum smear positive patients in Kwekwe district, Zimbabwe, 2007–2011: a cohort analysis. J Tuberc Res. 2015;03:126–35.

Disassa H, Teklu T, Tafess K, Asebe G, Ameni G. Treatment outcome of tuberculosis patients under directly observed treatment of short course in Benishangul Gumuz region, Western Ethiopia: a ten-year retrospective study. Gen Med Open Access. 2015;3:4.

Gilmour B, Xu Z, Bai L, Alene KA, Clements ACA. The impact of ethnic minority status on tuberculosis diagnosis and treatment delays in Hunan Province, China. BMC Infect Dis. 2022;22:90.

Gopie FA, Hassankhan A, Ottevanger S, Krishnadath I, de Lange W, Zijlmans CWR, et al. Ethnic disparities in tuberculosis incidence and related factors among indigenous and other communities in ethnically diverse Suriname. J Clin Tuberc Other Mycobact Dis. 2021;23: 100227.

Maharjan B, Nakajima C, Isoda N, Thapa J, Poudel A, Shah Y, et al. Genetic diversity and distribution dynamics of multidrug-resistant Mycobacterium tuberculosis isolates in Nepal. Sci Rep. 2018;8:16634.

Hayward S, Harding RM, McShane H, Tanner R. Factors influencing the higher incidence of tuberculosis among migrants and ethnic minorities in the UK. F1000Research. 2018;7:461.

Adhikari N, Bhattarai RB, Basnet R, Joshi LR, Tinkari BS, Thapa A, et al. Prevalence and associated risk factors for tuberculosis among people living with HIV in Nepal. PLoS ONE. 2022;17: e0262720.

Acknowledgements

This study acknowledges support received from the study respondents, health institutions at federal, provincial, and local government of Nepal.

This research was funded by the UNICEF/UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases (TDR), World Health Organization, Geneva, Switzerland (Reference 2019/980668-1).

Author information

Authors and affiliations.

HERD International, Saibu Awas Cr-10 Marga, Bhaisepati, Lalitpur, Nepal

Ayuska Parajuli, Sampurna Kakchapati, Abriti Arjyal, Deepak Joshi, Chandani Kharel & Sushil C Baral

UNICEF/UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases (TDR), World Health Organization, Geneva, Switzerland

Mariam Otmani del Barrio

Contributions

AP: design, tool development, data collection, analysis and write up. SK: quantitative data analysis and write up. AA: analysis, write up and review. DJ: analysis and review. CK: design, analysis, and review. MOB: design and review. SCB: design, supervision and quality assurance, analysis, write up, review and submission.

Corresponding author

Correspondence to Sushil C Baral .

Ethics declarations

Ethics approval and consent to participate.

This study obtained ethical approval (Reg. No. 656/2019) from the Ethical Review Board of the Nepal Health Research Council (NHRC) and the Research Ethics Review Committee of the World Health Organization. All information collected during data collection was recorded on a password-protected computer, with access granted only to the core research team. As per HERD International’s data management policy, in alignment with NHRC’s data management guideline, the data collected for this research will be disposed of after 5 years. In this study, we did not collect data or information directly from participants. However, institutional consent to collect retrospective data was obtained from the supervisor of the DOTS center, who is also the data custodian. During data collection, the names and identities of the TB patients were anonymized, and the collected information was used solely for research purposes. Data were collected from the TB registers into an Excel template by our researchers under the direct observation of the data custodian of the respective DOTS center. To ensure anonymity, the data custodian completely covered the section of the register containing the patients’ names and addresses with an opaque sheet of chart paper. As each patient’s entry was given a unique identification number, we do not hold any personal information that could identify a TB patient. This process of collecting anonymized data, with the help of the data custodian, maintained the anonymity and confidentiality of patient information.

Consent for publication

Not applicable.

Competing interest

The authors declare that they have no competing interests.

The authors alone are responsible for the views expressed in this article and they do not necessarily represent the views, decisions or policies of the institutions with which they are affiliated.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Parajuli, A., Kakchapati, S., Arjyal, A. et al. Assessing intersectional gender analysis in Nepal’s health management information system: a case study on tuberculosis for inclusive health systems. Infect Dis Poverty 13 , 31 (2024). https://doi.org/10.1186/s40249-024-01194-4

Received : 16 October 2023

Accepted : 06 March 2024

Published : 25 April 2024

DOI : https://doi.org/10.1186/s40249-024-01194-4

  • Intersectional gender analysis
  • Gender and social inequities
  • Social determinant
  • Social inclusion

Infectious Diseases of Poverty

ISSN: 2049-9957

ORIGINAL RESEARCH article

The effects of ambient temperature on road traffic injuries in Jinan city: a time-stratified case-crossover study based on distributed lag nonlinear model

YinLu Li

  • 1 School of Public Health, Weifang Medical University, Weifang, China
  • 2 Department of Non-communicable Disease Control and Prevention, Shandong Center for Disease Control and Prevention, Jinan, China

Objectives: The impact of climate change, especially extreme temperatures, on health outcomes has become a global public health concern. Most previous studies focused on the impact on disease incidence or mortality, whereas much less work has been done on road traffic injuries (RTIs). This study aimed to explore the effects of ambient temperature, particularly extreme temperature, on road traffic deaths in Jinan city.

Methods: Daily data on road traffic deaths among all residents of Jinan city and on meteorological factors were collected for 2011–2020. We used a time-stratified case-crossover design with a distributed lag nonlinear model to evaluate the association between daily mean temperature, especially extreme temperature, and road traffic deaths, and its variation across subgroups of transportation mode, adjusting for meteorological confounders.

Results: A total of 9,794 road traffic deaths were included in our study. Extreme temperatures were associated with increased risks of death from road traffic injuries overall and for the four main transportation modes (walking, bicycle, motorcycle and motor vehicle other than motorcycle), with obvious lag effects. The adverse effects of extreme high temperatures were significantly greater than those of extreme low temperatures. Under low-temperature exposure, the highest cumulative lag effect was 1.355 (95% CI, 1.054, 1.742) for pedal cyclists, cumulated over lag 0 to 6 days; the effects for pedestrians, motorcycle occupants and motor vehicle occupants all persisted until 14 days, with ORs of 1.227 (95% CI, 1.102, 1.367), 1.453 (95% CI, 1.214, 1.740) and 1.202 (95% CI, 1.005, 1.438), respectively. Under high-temperature exposure, the highest cumulative lag effect was 3.106 (95% CI, 1.646, 5.861) for motorcycle occupants, cumulated over lag 0 to 12 days; the effects for pedestrians, pedal cyclists and motor vehicle occupants all peaked at 14 days, with ORs of 1.638 (95% CI, 1.281, 2.094), 2.603 (95% CI, 1.695, 3.997) and 1.603 (95% CI, 1.066, 2.411), respectively.

Conclusion: This study provides evidence that ambient temperature is significantly associated with the risk of road traffic injuries accompanied by obvious lag effect, and the associations differ by the mode of transportation. Our findings help to promote a more comprehensive understanding of the relationship between temperature and road traffic injuries, which can be used to establish appropriate public health policies and targeted interventions.

1 Introduction

Road traffic injuries (RTIs) have become not only a social safety problem that cannot be ignored worldwide ( 1 , 2 ) but also, with the rapid development of motorization, a huge challenge to economic development and public health ( 3 ). The Global Status Report on Road Safety 2018 showed that around 1.35 million people worldwide died from RTIs each year, and 50 million people suffered non-fatal injuries from traffic crashes, some of whom were left disabled ( 4 ). RTIs can lead to permanent injuries, such as traumatic brain injury, spinal injury and amputations, which might have devastating, lifelong effects on road users ( 5 , 6 ). The global economic cost of RTIs is estimated at $1.8 trillion for 2015–2030, equivalent to 0.12 percent of global gross domestic product ( 7 ). RTIs are more severe in developing countries than in developed countries, and 93 percent of road traffic deaths occur in low-income and middle-income countries ( 6 ).

The World Health Organization reported that one-fifth of global road traffic deaths occurred in China ( 8 ). RTIs are the leading cause of accidental death in China and have been on the rise since the end of the 1980s ( 9 ). Although some recent studies have reported a downward trend in RTIs attributed to road safety laws and various policies in China, RTI mortality is still much higher than in developed countries ( 9 – 11 ). RTIs therefore remain a very serious problem in China that deserves more attention ( 2 , 12 ).

In addition to human behaviors, road conditions and vehicles, environmental factors are an important cause of RTIs. In recent years, with increasing global warming and air pollution, a growing body of research has examined the influence of ambient temperature on human health. Extreme temperature is currently considered an important risk factor for RTIs, with direct or indirect effects on the occurrence of traffic injuries ( 13 , 14 ). Weather changes can affect, to varying degrees, the vehicle itself, road conditions, the driver’s judgment and reactions while driving, and the riding environment of the driver and passengers. However, far fewer studies have focused on the impact on RTIs, especially RTI fatalities, and the epidemiological evidence for the exact impact is inconsistent because of the limitations of traditional research methods ( 15 , 16 ). To reduce the incidence and mortality of RTIs under extreme temperature conditions, there is an urgent need to study the relationship between extreme temperature and RTIs specifically.

A time-stratified case-crossover design combined with a distributed lag nonlinear model can be used to study the relationship between environmental factors and health outcomes; it overcomes multiple biases and makes full use of the information in the sample data to obtain estimates that are as close to unbiased as possible. In this study, we used a time-stratified case-crossover design based on a distributed lag nonlinear model to examine the relationship between extreme temperature and the risk of RTI deaths, using 10 years of traffic accident statistics from Jinan city, Shandong Province, China. The study will help to develop public health policies and interventions to reduce the negative impacts of climate change on RTI fatalities in extreme weather conditions.

2 Materials and methods

2.1 Study area and data collection

Jinan is the capital city of Shandong Province, located in the east coast region of China at mid-latitudes. It has a temperate monsoon climate characterized by four distinct seasons: a warm and rainy summer, a cold and dry winter, and comfortable transitional seasons in spring and autumn.

We collected individual road traffic death records for all inhabitants of Jinan city during 2011–2020 from the Shandong Provincial Death Registration Information Reporting System. The extracted information included date of death, sex, age, and underlying cause of death. The underlying cause of road traffic death was coded according to the International Classification of Diseases (version 10), with a coding range of V00-V89. In this study, road traffic deaths were classified by mode of transport into walking, bicycle, motorcycle, motor vehicle (except motorcycle) and other subtypes. Due to the small sample size, deaths caused by injuries in other transportation accidents were excluded from the statistical analysis.

The meteorological data for the same period were obtained from the China Meteorological Data Sharing Service System,¹ including daily mean temperature (°C), relative humidity (%), wind speed (m/s), and barometric pressure (hPa). To adjust for the potential impacts of air pollutants, daily concentrations of PM2.5 (μg/m³), PM10 (μg/m³), SO2 (μg/m³), NO2 (μg/m³), CO (mg/m³), and O3 (μg/m³) were extracted from the National Environmental Monitoring Center (NEMC) of China.

2.2 Statistical analysis

A time-stratified case-crossover design ( 15 ), which is equivalent to a time-stratified self-matched case–control study, was used in this study to assess the effects of extreme temperature on RTIs. Poisson regression was implemented by setting dummy variables for the time strata (e.g., year, month, day of the week), so that days falling on the same day of the week in the same month of the same year as the case day served as control days (up to 4 control days per case). This design controls for the influence of time trends (such as seasonality and the “day of the week” effect) and meteorological factors and, because each case serves as its own control, avoids the bias caused by individual-level confounders (such as age, intelligence and heredity), so that unbiased parameter estimates can be obtained.
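The control-day rule described above can be illustrated with a short Python sketch. It is not the authors' implementation (the study used R); the function name `control_days` and the example date are hypothetical and serve only to show how same-weekday days within the same month and year are selected.

```python
import calendar
from datetime import date


def control_days(case_day: date) -> list[date]:
    """Return the control days for a case day under time stratification.

    Controls are all other days in the same year and month that share the
    case day's weekday, which gives 3 or 4 controls per case.
    """
    days_in_month = calendar.monthrange(case_day.year, case_day.month)[1]
    controls = []
    for day in range(1, days_in_month + 1):
        candidate = date(case_day.year, case_day.month, day)
        if candidate != case_day and candidate.weekday() == case_day.weekday():
            controls.append(candidate)
    return controls


# Hypothetical example: a death recorded on 15 July 2020 (not study data).
for d in control_days(date(2020, 7, 15)):
    print(d.isoformat())
```

Because controls are drawn from both before and after the case day within the same stratum, the selection does not depend on the direction of time, which is the property that lets this design absorb seasonality and day-of-week effects.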

Considering the possible nonlinear relationship between daily mean temperature and RTIs ( 13 , 16 , 17 ), we used conditional Poisson regression with Distributed Lag Non-linear Models (DLNM) to estimate the exposure-response and exposure-lag associations between temperature and RTI deaths ( 18 – 20 ). The results were reported as odds ratios (OR) with 95% confidence intervals (CI). A lag period of up to 14 days was used to adequately capture exposure effects and lag effects. Based on generalized cross-validation ( 21 ), we fitted the exposure-response and exposure-lag associations using a natural cubic spline with 2 degrees of freedom (df). To control for potential meteorological and pollutant confounders in the association between ambient temperature and RTIs ( 22 ), a natural cubic spline with 3 df was used for relative humidity, and one with 2 df was used for pollutants. All data were analyzed using R software (Version 4.2.2), where the “gam” and “dlnm” software packages were used to fit the conditional Poisson regression models and exposure-lag-response curves, respectively. The reference temperature, namely the minimum-mortality temperature, was determined as the temperature corresponding to the lowest death risk in the exposure-response curve. Extremely low and high temperatures were defined as the 1st and 99th percentiles of temperature, respectively.
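Two data-preparation steps in this paragraph, building the 0–14 day lagged exposure history and locating the 1st and 99th percentile temperatures, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' R code: the helper names `lagged_exposure` and `percentile` are hypothetical, the percentile uses simple linear interpolation, and the temperature series is invented for demonstration. The spline bases and model fitting performed by the DLNM software are not reproduced here.

```python
def percentile(values, q):
    """Percentile by linear interpolation between order statistics (q in 0..100)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def lagged_exposure(series, max_lag=14):
    """Build the lag matrix: one row per day, columns lag 0..max_lag.

    The row for day t holds series[t], series[t-1], ..., series[t-max_lag].
    Days without a full max_lag-day history are skipped.
    """
    return [
        [series[t - lag] for lag in range(max_lag + 1)]
        for t in range(max_lag, len(series))
    ]


# Invented daily mean temperatures in °C (assumption, not study data).
temps = [float(t) for t in range(-10, 31)]

history = lagged_exposure(temps, max_lag=14)
print("rows:", len(history), "lags per row:", len(history[0]))
print("extreme low:", percentile(temps, 1))
print("extreme high:", percentile(temps, 99))
```

In a DLNM analysis, a matrix of this shape is what the cross-basis functions are applied to, with one spline basis along the temperature dimension and another along the lag dimension.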

2.3 Sensitivity analyses

We carried out a series of analyses to test the robustness of the results. First, the number of lag days was changed from 0–14 days to 0–7 days and 0–21 days to test whether a 14-day lag was sufficient for exposure-response and exposure-lag effects. Second, the df of meteorological confounders were varied from 2 to 4 to check the robustness of the fitted model. Last, the 2.5th or 10th percentile of temperature was defined as extreme low temperature and the 97.5th or 90th percentile as extreme high temperature, and the temperature-road traffic death lag-response curve was refitted to check for changes in the model results.

3 Results

3.1 Descriptive analysis

Table 1 shows the descriptive statistics for daily deaths related to RTIs in Jinan city. A total of 9,794 road traffic injury deaths were recorded during 2011–2020, with a daily mean of 3 deaths. The percentage of deaths was much higher among males (73.01%) than females (26.99%), among those aged 35–64 years (57.31%) than in other age groups, and among pedestrians (45.9%) than for other transport modes.

Table 1. Descriptive statistics for daily RTI deaths among residents with different characteristics in Jinan city, 2011–2020.

Table 2 presents descriptive statistics for the daily levels of meteorological indicators and air pollutants. The mean of the daily mean temperature was 15.13°C, and its minimum and maximum values were −12.4°C and 33.8°C, respectively.

Table 2. Descriptive statistics for daily levels of meteorological and air pollution indicators in Jinan city, 2011–2020.

Spearman correlation tests were used to assess the correlation between meteorological and air pollution indicators, and the correlation coefficients (r) are shown in Table 3. An r ≥ 0.6 was regarded as a strong correlation. The daily mean temperature showed a strong correlation with the daily maximum 8-h average ozone and daily atmospheric pressure.
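
For readers unfamiliar with the Spearman coefficient, the sketch below computes it in plain Python as the Pearson correlation of average ranks and flags strong correlations at the 0.6 threshold. The data values are hypothetical, and applying the threshold to the absolute value of r is an assumption made here so that strong negative correlations are also flagged.

```python
from math import sqrt
from typing import List

STRONG = 0.6  # threshold from the text; applying it to |r| is an assumption


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical daily values, for illustration only.
temperature = [-5.0, 0.0, 8.0, 15.0, 22.0, 30.0]
pressure = [1030.0, 1026.0, 1020.0, 1012.0, 1008.0, 1001.0]
ozone = [40.0, 35.0, 60.0, 90.0, 120.0, 150.0]

r_pressure = spearman(temperature, pressure)
r_ozone = spearman(temperature, ozone)
print(abs(r_pressure) >= STRONG, abs(r_ozone) >= STRONG)
```

Strongly correlated covariates are usually not entered into the same model together, which is one reason such a screening step precedes the regression.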

Table 3. Correlation coefficients between meteorological and air pollution indicators in Jinan city, 2011–2020.

3.2 Relationship between ambient temperature and road traffic injuries

As shown in Figure 1, the cumulative exposure-response relationship between daily mean temperature and RTI deaths followed an inverted U-shaped curve. The minimum-mortality temperature (MMT), at which the death risk of RTIs was lowest, was −12.4°C. With increasing daily mean temperature, the death risk of RTIs increased gradually, peaked at 17.6°C (OR = 2.09, 95% CI: 1.66, 2.63), and then decreased. The exposure-lag-response 3D and contour plots showed a clearly nonlinear relationship between daily mean temperature and RTI fatalities. High and low temperatures appeared to have “protective” effects on RTI deaths on the current day. However, as the lag increased, both high and low temperatures increased the death risk of RTIs. The death risk was highest at lag day 14 and was clearly higher for high temperature than for low temperature.
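
How a reference temperature is read off an exposure-response curve can be sketched in a few lines of Python. The curve values below are hypothetical and are not the paper's estimates; the function name `recentre_on_mmt` is also hypothetical. The sketch assumes the curve is available as log-scale risks on a grid of temperatures.

```python
from math import exp
from typing import List, Tuple


def recentre_on_mmt(temps: List[float], log_risk: List[float]) -> Tuple[float, List[float]]:
    """Find the minimum-mortality temperature and the ORs relative to it."""
    if len(temps) != len(log_risk) or not temps:
        raise ValueError("temps and log_risk must be non-empty and of equal length")
    i_min = min(range(len(log_risk)), key=lambda i: log_risk[i])
    mmt = temps[i_min]
    # Subtracting the minimum on the log scale makes the OR equal to 1 at the MMT.
    odds_ratios = [exp(v - log_risk[i_min]) for v in log_risk]
    return mmt, odds_ratios


# Hypothetical inverted U-shaped curve on a coarse temperature grid.
grid = [-10.0, 0.0, 10.0, 20.0, 30.0]
log_risk = [0.10, 0.45, 0.70, 0.80, 0.55]

mmt, odds_ratios = recentre_on_mmt(grid, log_risk)
print(mmt, [round(v, 2) for v in odds_ratios])
```

With an inverted U-shaped curve the minimum necessarily falls at one end of the temperature range, which appears to be why the reported MMT coincides with the lowest daily mean temperature in the study period.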

Figure 1. Exposure-response relationship between total RTI deaths and daily mean temperature in Jinan city, 2011–2020. (A) Exposure-response curve. (B) 3D effect plot. (C) Contour plot.

The exposure-lag-response curves were plotted with the 1st (−6°C) and 99th (32°C) percentiles of daily mean temperature taken as the thresholds for extremely low and high temperatures, respectively (Figure 2). The effects of extreme low and high temperature on RTI deaths followed a similar trend across lag days, and extreme temperatures had clear lag effects on RTI fatalities. Neither extreme high nor extreme low temperature was associated with RTI deaths on the current day or at lags of up to 10 days; the association did not appear until lag day 11, and the cumulative effect peaked at lag day 14. The effect of extreme high temperature on RTI fatalities was clearly greater than that of extreme low temperature, for both single-day and cumulative effects.

Figure 2. Lag-response curves for the association between daily mean temperature and RTI fatalities in Jinan city, 2011–2020. (A) Single-day lagged low temperature. (B) Cumulative lagged low temperature. (C) Single-day lagged high temperature. (D) Cumulative lagged high temperature.

Exposure to low temperature had a significant lag effect on the risk of death from different types of RTIs. Under low-temperature exposure, the cumulative effect on road injury mortality risk for pedestrians and motorcycle riders was significant on Lag11 and Lag13, respectively, and the maximum single-day effect appeared on Lag14, with OR values of 1.227 (95% CI: 1.102, 1.367) and 1.453 (95% CI: 1.214, 1.74). The risk of death for cyclists first appeared on Lag3 and reached a maximum on Lag6 (OR = 1.355, 95% CI: 1.054, 1.742), disappeared after 6 days, then reappeared on Lag12 and reached its maximum on Lag14. The mortality risk for passengers of motor vehicles was statistically significant on Lag14 (OR = 1.202, 95% CI: 1.005, 1.438). Exposure to high temperature also had a lag effect on the risk of RTI death among residents. The cumulative effects of high temperature on the risk of death for pedestrians, cyclists, and motor vehicle passengers all reached their maximum on Lag14, with OR values of 1.638 (95% CI: 1.281, 2.094), 2.603 (95% CI: 1.695, 3.997), and 1.603 (95% CI: 1.066, 2.411), respectively. The mortality risk for motorcycle riders increased significantly on Lag6, reaching a maximum OR of 3.106 (95% CI: 1.646, 5.861) on Lag12 (Figure 3; Table 4).
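
The relationship between an OR and its 95% CI can be checked with a short Python sketch. It assumes Wald-type intervals that are symmetric on the log scale, which is the usual output of a regression model but is not stated explicitly in the text; the function names are hypothetical. The example uses the cyclist estimate at Lag6 reported above.

```python
from math import exp, log
from typing import Tuple

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def or_with_ci(beta: float, se: float) -> Tuple[float, float, float]:
    """Convert a log-scale estimate and its standard error to an OR and 95% CI."""
    return exp(beta), exp(beta - Z95 * se), exp(beta + Z95 * se)


def log_scale_from_ci(lower: float, upper: float) -> Tuple[float, float]:
    """Recover the log-scale estimate and standard error from a reported 95% CI."""
    beta = (log(lower) + log(upper)) / 2.0
    se = (log(upper) - log(lower)) / (2.0 * Z95)
    return beta, se


def excludes_one(lower: float, upper: float) -> bool:
    """A CI that excludes 1 corresponds to significance at the 5% level."""
    return lower > 1.0 or upper < 1.0


# Cyclists at Lag6 under extreme low temperature: reported 95% CI (1.054, 1.742).
beta, se = log_scale_from_ci(1.054, 1.742)
point, ci_low, ci_high = or_with_ci(beta, se)
print(round(point, 3), excludes_one(ci_low, ci_high))
```

The point estimate implied by the interval agrees with the reported OR of 1.355 to three decimal places, which is a quick consistency check that can be applied to any of the estimates in this section.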

Figure 3. Cumulative effects of extreme low and high temperature on RTI fatalities over lag days 0–14 by transport mode in Jinan city, 2011–2020. (A) Cumulative effect of extreme low temperature (−6°C) on pedestrian injuries in transport accidents. (B) Cumulative effect of extreme low temperature (−6°C) on cyclist injuries in transport accidents. (C) Cumulative effect of extreme low temperature (−6°C) on motorcyclist injuries in transport accidents. (D) Cumulative effect of extreme low temperature (−6°C) on motor vehicle (except motorcycles) occupant injuries in transport accidents. (E) Cumulative effect of extreme high temperature (32°C) on pedestrian injuries in transport accidents. (F) Cumulative effect of extreme high temperature (32°C) on cyclist injuries in transport accidents. (G) Cumulative effect of extreme high temperature (32°C) on motorcyclist injuries in transport accidents. (H) Cumulative effect of extreme high temperature (32°C) on motor vehicle (except motorcycles) occupant injuries in transport accidents.

Table 4. Cumulative effect estimates of extreme low and high temperature on RTI fatalities over lag days 0–14, by mode of transportation.

3.3 Results of sensitivity analysis

The results were robust in the sensitivity analyses, as described in the Supplementary material. The exposure-response curves and relative risks were similar when the maximum lag was changed from 14 days to 7 and 21 days (Supplementary Figure S1). The estimated effects of extreme low and high temperature on RTIs over lag days 0–14 remained essentially unchanged when the df for the confounders were varied (Supplementary Table S1). The associations between extreme low and high temperature and RTI fatalities over lag days 0–14 were stable under different threshold definitions of extreme temperature (Supplementary Figure S2).

4 Discussion

In this study, we examined the association between ambient temperature, particularly extreme temperatures, and road injury deaths in Jinan city. The results confirmed that extreme high and low temperatures were positively associated with the risk of RTI fatalities, with a significant lag effect. The effect of extreme high temperature on RTI fatalities was significantly greater than that of extreme low temperature. In addition, the effects of extreme high temperature on RTI fatalities were clearly stronger among cyclists and motorcyclists than among pedestrians and occupants of motor vehicles (except motorcycles).

Our study provided evidence for the association between extreme temperatures and road accident deaths in Jinan city, Shandong Province. We found an inverted U-shaped curve in the relationship between daily mean temperature and RTI fatalities, with the highest risk occurring at a moderate temperature (17.6°C). Similar to our findings, a study in Beijing (China) found that accidental injuries and deaths occurred more frequently on warm days, with the highest likelihood of emergency treatment for accidental injuries occurring at 26°C rather than at extreme temperatures (23). An Italian study also found that workplace accidents peaked at hot but not extreme temperatures (24). The increased risk of RTIs in warm temperatures may be related to the increased frequency with which people go out or travel (25). With the rapid development of road transportation and the urban service industry, increased traffic congestion and the poor self-protection of non-motor vehicle users make road traffic injuries more likely to occur (26).

Previous studies have shown that high and low temperatures were clearly correlated with RTIs, and that the effect of extreme hot weather was significantly greater than that of extreme cold weather (27), which is consistent with our results on the impact of high temperatures. Wu et al. (16) showed a significant positive correlation between fatal traffic accidents and heat waves in the United States. Bergel-Hayat et al. (28) reported that small temperature changes have a significant impact on the risk of traffic accidents: for every 1°C increase in monthly mean temperature, the number of collisions increases by 1–2%. These studies all suggest an association between ambient temperature and RTIs. In a meta-analysis using daily mean temperatures, higher temperatures were found to increase the risk of traffic injuries by 2.4% (RR = 1.024, 95% CI: 0.939, 1.116) (29). Basagaña et al. (30) found that for every 1°C increase in maximum temperature, there was a significant increase in the estimated risk (OR = 1.1, 95% CI: 0.1, 2.1) of crashes due to the driver performance factor. Susceptibility to crashes under high-temperature exposure is related to multiple mechanisms, such as human behavior, vehicle conditions, and environmental factors (31). Higher temperatures may reduce vigilance and attention, which directly results in poor driving behavior (32–35). At the same time, high temperatures may increase the likelihood of dangerous behaviors such as running red lights and driving in the wrong lane (36). In addition, changes in ambient temperature may trigger illness, which increases the risk of traffic accidents for road users when they travel (37). Together, these factors lead to an increased risk of road collisions in high-temperature environments.

Our study found that pedestrians, cyclists and motorcyclists are at greater risk of injury than occupants of motor vehicles (except motorcycles), regardless of exposure to low or high temperature, which is in line with the World Health Organization report on groups vulnerable to road traffic injuries. In our study, pedestrians, bicyclists and motorcyclists were the vulnerable groups, accounting for 77.84% of all RTI deaths. This may be related to their occupations and the modes of transport they use. For example, with the rapid development of takeaway, express delivery, bike-sharing and other industries, the use of bicycles and motorcycles has been increasing. Road crashes and injuries are more likely to occur as a result of increased traffic congestion and the poor self-protection of road users in these travel modes (38). In addition, ambient temperature has a greater effect on two-wheeler (bicycle and motorcycle) users than on four-wheeled vehicle users because of their direct exposure to the external environment. A study in Belgium showed an increase in the frequency of road traffic accidents among two-wheeled vehicle (bicycle and motorcycle) users in warm weather (39). Daanen et al. (40) noted that in extremely cold environments, the driving performance of motorcyclists worsens because of the cold, thus increasing the risk of traffic accidents. Similarly, studies have shown that under high-temperature exposure motorcyclists' attention is diverted by coping with thermal stress, which indirectly reduces their ability to handle various traffic conditions (41). Moreover, temperature may affect residents' choice of travel mode. Gan found that the effect of temperature on the number of bicycle trips was significant. Compared with spring, the number of bicycle trips increased in autumn but decreased in summer and winter (42). Hu suggested that residents in cold areas may reduce their use of bicycles and electric vehicles and make more use of motor vehicles (43).

The evidence regarding the lag effects of ambient temperature on RTIs is inconsistent. Some studies suggested that the risk of traffic injuries was significantly associated with high and low temperatures, with significant lag effects, which is consistent with the findings of our study (17, 27). A study in Dalian, China, using a distributed lag nonlinear model found that both high (RR = 1.198, 95% CI: 1.017–1.411) and low temperatures (RR = 1.017, 95% CI: 1.001–1.035) increased the risk of RTIs, with a cumulative lagged effect that persisted beyond day 7 (44). This indicates that the effects of ambient temperature on human health may continue to affect road users for several days, resulting in a lagged effect of ambient temperature on traffic injuries. The mechanism of the lagged response between temperature and road traffic accidents is not completely clear. Some scholars have suggested that high temperatures can cause prolonged heat stress and sleep disturbances, which increase the risk of daytime fatigue driving and ultimately lead to road traffic accidents (45). Ma et al. (23) pointed out that extreme temperature led to changes in the body's immune and thermoregulatory systems, and that the intensity of this regulation indirectly affected the state of road users over a period of time. In addition, high temperatures may lead to 5-hydroxytryptamine dysfunction (46) and brain damage (47), which may impair decision-making ability in the long term and increase the risk of road traffic accidents. However, other studies have shown little or no lag in the effect of temperature on RTIs. Lee et al. showed that the effect of high temperature on RTIs reached its maximum on the same day, while the effect of low temperature was significant with a 2-day lag (14). The inconsistency in research findings may be due to differences in the definition of injury types, subgroups of the target population, and meteorological and geographical conditions. Focusing on the lag effect will help us better understand the impact of environmental temperature, especially extreme temperature, on RTIs, so as to effectively control and prevent temperature-related traffic injuries.

There are some limitations in this study. First, there may be some error in the temperature exposure assigned to each road accident death, because daily temperature was measured at fixed monitoring sites in Jinan city and therefore may not reflect the actual temperature at the location where the deceased was exposed. Second, we did not specifically distinguish occupation-related road traffic injuries in our study. Given that most previous studies on unintentional injuries have focused on occupational injuries, direct comparisons with those studies should be made with caution. Third, we did not consider how early warning systems for heat or cold waves might have modified the lag effects of high and low temperatures. In addition, the impact of rainfall on traffic accidents is evident; however, because data were unavailable, this study did not consider the confounding effect of rainfall. Last, the subgroup analyses may be subject to some bias because of the small sample size in each subgroup of road traffic injuries. In the future, larger and more detailed studies are needed to determine the correlation between temperature and road traffic injuries.

5 Conclusion

In summary, extreme low and high temperatures were positively associated with an increased risk of road traffic fatalities in Jinan city, with a significant lag effect, and the effect of extreme high temperature on RTI fatalities was significantly greater than that of extreme low temperature. The strength of the association between extreme temperatures and RTI fatalities, and the number of lag days, differed across transport-mode subgroups. This study can help in developing public health policies and interventions to reduce the negative impacts of climate change on road traffic injuries in extreme weather conditions.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

YL: Writing – original draft, Writing – review & editing, Data curation, Formal analysis, Methodology, Visualization, Software. JR: Writing – review & editing, Project administration. WZ: Writing – review & editing, Supervision. JD: Writing – review & editing, Project administration. ZL: Writing – review & editing, Data curation, Formal analysis. ZZ: Writing – review & editing, Data curation. XG: Writing – review & editing, Conceptualization, Data curation, Funding acquisition, Project administration, Validation. JC: Writing – review & editing, Conceptualization, Data curation, Funding acquisition, Project administration, Validation. AX: Writing – review & editing, Conceptualization, Funding acquisition, Project administration, Validation.

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. We are grateful for funding support from Taishan Scholar Project (TS201511105).

Acknowledgments

We would like to thank the officials of the local health agencies and all of the participants and staff at the study sites for their cooperation.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2024.1324191/full#supplementary-material

1. ^ http://data.cma.cn/

1. Rosen, HE, Bari, I, Paichadze, N, Peden, M, Khayesi, M, Monclús, J, et al. Global road safety 2010-18: an analysis of global status reports. Injury . (2022). doi: 10.1016/j.injury.2022.07.030

2. Qi, M, Hu, X, Li, X, Wang, X, and Shi, X. Analysis of road traffic injuries and casualties in China: a ten-year nationwide longitudinal study. PeerJ . (2022) 10:e14046. doi: 10.7717/peerj.14046

3. Ahmed, SK, Mohammed, MG, Abdulqadir, SO, El-Kader, RGA, El-Shall, NA, Chandran, D, et al. Road traffic accidental injuries and deaths: a neglected global health issue. Health Sci Rep . (2023) 6:e1240. doi: 10.1002/hsr2.1240

4. Li, X, Ma, Q, Wang, W, and Wang, B. Influence of weather conditions on the intercity travel mode choice: a case of Xi'an. Comput Intell Neurosci . (2021) 2021:1–15. doi: 10.1155/2021/9969322

5. Bakhsh, A, Aljuzair, AH, and Eldawoody, H. An epidemiological overview of spinal trauma in the Kingdom of Saudi Arabia. Spine Surg Related Res . (2020) 4:300–4. doi: 10.22603/ssrr.2019-0118

6. Maas, AIR, Menon, DK, Manley, GT, Abrams, M, Åkerlund, C, Andelic, N, et al. Traumatic brain injury: progress and challenges in prevention, clinical care, and research. Lancet Neurol . (2022) 21:1004–60. doi: 10.1016/S1474-4422(22)00309-X

7. Bezabih, Y, Tesfaye, B, Melaku, B, and Asmare, H. Pattern of orthopedic injuries related to road traffic accidents among patients managed at the emergency Department in Black Lion Hospital, Addis Ababa, Ethiopia, 2021. Open Access Emerg Med . (2022) 14:347–54. doi: 10.2147/OAEM.S368324

8. Liu, G, Chen, S, Zeng, Z, Cui, H, Fang, Y, Gu, D, et al. Risk factors for extremely serious road accidents: results from national road accident statistical annual report of China. PLoS One . (2018) 13:e0201587. doi: 10.1371/journal.pone.0201587

9. Wang, SY, Li, YH, Chi, GB, Xiao, SY, Ozanne-Smith, J, Stevenson, M, et al. Injury-related fatalities in China: an under-recognised public-health problem. Lancet (London, England) . (2008) 372:1765–73. doi: 10.1016/S0140-6736(08)61367-7

10. Ren, K, Miao, L, and Lyu, J. The temporal trend of road traffic mortality in China from 2004 to 2020. SSM Popul Health . (2023) 24:101527. doi: 10.1016/j.ssmph.2023.101527

11. Chu, J, Xu, ML, Lu, ZL, Liu, J, Chen, XX, Dong, J, et al. Mortality level and tendency of road traffic injury in Shandong Province from 2012 to 2020. Zhonghua Yu Fang Yi Xue Za Zhi . (2022) 56:1307–13. doi: 10.3760/cma.j.cn112150-20220520-00510

12. Yuan, P, Qi, G, Hu, X, Qi, M, Zhou, Y, and Shi, X. Characteristics, likelihood and challenges of road traffic injuries in China before COVID-19 and in the postpandemic era. Humanit Soc Sci Commun . (2023) 10:2. doi: 10.1057/s41599-022-01482-0

13. Zhan, ZY, Yu, YM, Chen, TT, Xu, LJ, and Ou, CQ. Effects of hourly precipitation and temperature on road traffic casualties in Shenzhen, China (2010-2016): a time-stratified case-crossover study. Sci Total Environ . (2020) 720:137482. doi: 10.1016/j.scitotenv.2020.137482

14. Lee, H, Myung, W, Kim, H, Lee, EM, and Kim, H. Association between ambient temperature and injury by intentions and mechanisms: a case-crossover design with a distributed lag nonlinear model. Sci Total Environ . (2020) 746:141261. doi: 10.1016/j.scitotenv.2020.141261

15. Wu, Y, Li, S, and Guo, Y. Space-time-stratified case-crossover Design in Environmental Epidemiology Study. Health Data Sci . (2021) 2021:9870798. doi: 10.34133/2021/9870798

16. Wu, CYH, Zaitchik, BF, and Gohlke, JM. Heat waves and fatal traffic crashes in the continental United States. Accid Anal Prev . (2018) 119:195–201. doi: 10.1016/j.aap.2018.07.025

17. Gariazzo, C, Bruzzone, S, Finardi, S, Scortichini, M, Veronico, L, and Marinaccio, A. Association between extreme ambient temperatures and general indistinct and work-related road crashes. A nationwide study in Italy. Accid Anal Prev . (2021) 155:106110. doi: 10.1016/j.aap.2021.106110

18. Gasparrini, A. Distributed lag linear and non-linear models in R: the package dlnm. J Stat Softw . (2011) 43:1–20. doi: 10.18637/jss.v043.i08

19. Ai, H, Nie, R, and Wang, X. Evaluation of the effects of meteorological factors on COVID-19 prevalence by the distributed lag nonlinear model. J Transl Med . (2022) 20:170. doi: 10.1186/s12967-022-03371-1

20. Gasparrini, A, and Armstrong, B. Reducing and meta-analysing estimates from distributed lag non-linear models. BMC Med Res Methodol . (2013) 13:1. doi: 10.1186/1471-2288-13-1

21. Kucheryavskiy, S, Rodionova, O, and Pomerantsev, A. Procrustes cross-validation of multivariate regression models. Anal Chim Acta . (2023) 1255:341096. doi: 10.1016/j.aca.2023.341096

22. Guo, Y, Punnasiri, K, and Tong, S. Effects of temperature on mortality in Chiang Mai city, Thailand: a time series study. Environ Health . (2012) 11:36. doi: 10.1186/1476-069X-11-36

23. Ma, P, Wang, S, Fan, X, and Li, T. The impacts of air temperature on accidental casualties in Beijing, China. Int J Environ Res Public Health . (2016) 13:1073. doi: 10.3390/ijerph13111073

24. Morabito, M, Cecchi, L, Crisci, A, Modesti, PA, and Orlandini, S. Relationship between work-related accidents and hot weather conditions in Tuscany (Central Italy). Ind Health . (2006) 44:458–64. doi: 10.2486/indhealth.44.458

25. He, L, Liu, C, Shan, X, Zhang, L, Zheng, L, Yu, Y, et al. Impact of high temperature on road injury mortality in a changing climate, 1990-2019: a global analysis. Sci Total Environ . (2023) 857:159369. doi: 10.1016/j.scitotenv.2022.159369

26. Ma, C, Yang, D, Zhou, J, Feng, Z, and Yuan, Q. Risk riding behaviors of urban E-bikes: a literature review. Int J Environ Res Public Health . (2019) 16:2308. doi: 10.3390/ijerph16132308

27. Zare Sakhvidi, MJ, Yang, J, Mohammadi, D, FallahZadeh, H, Mehrparvar, A, Stevenson, M, et al. Extreme environmental temperatures and motorcycle crashes: a time-series analysis. Environ Sci Pollut Res Int . (2022) 29:76251–62. doi: 10.1007/s11356-022-21151-8

28. Bergel-Hayat, R, Debbarh, M, Antoniou, C, and Yannis, G. Explaining the road accident risk: weather effects. Accid Anal Prev . (2013) 60:456–65. doi: 10.1016/j.aap.2013.03.006

29. Liang, M, Min, M, Guo, X, Song, Q, Wang, H, Li, N, et al. The relationship between ambient temperatures and road traffic injuries: a systematic review and meta-analysis. Environ Sci Pollut Res Int . (2022) 29:50647–60. doi: 10.1007/s11356-022-19437-y

30. Basagaña, X, Escalera-Antezana, JP, Dadvand, P, Llatje, Ò, Barrera-Gómez, J, Cunillera, J, et al. High ambient temperatures and risk of motor vehicle crashes in Catalonia, Spain (2000-2011): a time-series analysis. Environ Health Perspect . (2015) 123:1309–16. doi: 10.1289/ehp.1409223

31. Theofilatos, A, and Yannis, G. A review of the effect of traffic and weather characteristics on road safety. Accid Anal Prev . (2014) 72:244–56. doi: 10.1016/j.aap.2014.06.017

32. Abdel-Aty, M, Ekram, AA, Huang, H, and Choi, K. A study on crashes related to visibility obstruction due to fog and smoke. Accid Anal Prev . (2011) 43:1730–7. doi: 10.1016/j.aap.2011.04.003

33. Abedi, L, and Sadeghi-Bazargani, H. Epidemiological patterns and risk factors of motorcycle injuries in Iran and eastern Mediterranean region countries: a systematic review. Int J Inj Control Saf Promot . (2017) 24:263–70. doi: 10.1080/17457300.2015.1080729

34. Hammad, HM, Ashraf, M, Abbas, F, Bakhat, HF, Qaisrani, SA, Mubeen, M, et al. Environmental factors affecting the frequency of road traffic accidents: a case study of sub-urban area of Pakistan. Environ Sci Pollut Res Int . (2019) 26:11674–85. doi: 10.1007/s11356-019-04752-8

35. Nazif-Munoz, JI, Martínez, P, Williams, A, and Spengler, J. The risks of warm nights and wet days in the context of climate change: assessing road safety outcomes in Boston, USA and Santo Domingo, Dominican Republic. Inj Epidemiol . (2021) 8:47. doi: 10.1186/s40621-021-00342-w

36. Fu, C, and Liu, H. Investigating influence factors of traffic violations at signalized intersections using data gathered from traffic enforcement camera. PLoS One . (2020) 15:e0229653. doi: 10.1371/journal.pone.0229653

37. Luo, J, He, G, Xu, Y, Chen, Z, Xu, X, Peng, J, et al. The relationship between ambient temperature and fasting plasma glucose, temperature-adjusted type 2 diabetes prevalence and control rate: a series of cross-sectional studies in Guangdong Province, China. BMC Public Health . (2021) 21:1534. doi: 10.1186/s12889-021-11563-5

38. Hou, K, Zhang, L, Xu, X, Yang, F, Chen, B, and Hu, W. Ambient temperatures associated with increased risk of motor vehicle crashes in New York and Chicago. Sci Total Environ . (2022) 830:154731. doi: 10.1016/j.scitotenv.2022.154731

39. Masterson, JM, and Richardson, FA. Humidex: a method of quantifying human discomfort due to excessive heat and humidity . Downsview: Environment Canada (1979).

40. Daanen, HA, van de Vliert, E, and Huang, X. Driving performance in cold, warm, and thermoneutral environments. Appl Ergon . (2003) 34:597–602. doi: 10.1016/S0003-6870(03)00055-3

41. Hancock, PA, Ross, JM, and Szalma, JL. A meta-analysis of performance response under thermal stressors. Hum Factors . (2007) 49:851–77. doi: 10.1518/001872007X230226

42. Sung, H. Causal impacts of the COVID-19 pandemic on daily ridership of public bicycle sharing in Seoul. Sustain Cities Soc . (2023) 89:104344. doi: 10.1016/j.scs.2022.104344

43. ZQ, H. Research on the influencing factors of winter travel mode choice for residents in cold regions . Lanzhou Gansu Province: Lanzhou Jiaotong University. (2023).

44. Liang, M, Zhao, D, Wu, Y, Ye, P, Wang, Y, Yao, Z, et al. Short-term effects of ambient temperature and road traffic accident injuries in Dalian, northern China: a distributed lag non-linear analysis. Accid Anal Prev . (2021) 153:106057. doi: 10.1016/j.aap.2021.106057

45. Zheng, G, Li, K, and Wang, Y. The effects of high-temperature weather on human sleep quality and appetite. Int J Environ Res Public Health . (2019) 16:270. doi: 10.3390/ijerph16020270

46. Brewerton, TD, Putnam, KT, Lewine, RRJ, and Risch, SC. Seasonality of cerebrospinal fluid monoamine metabolite concentrations and their associations with meteorological variables in humans. J Psychiatr Res . (2018) 99:76–82. doi: 10.1016/j.jpsychires.2018.01.004

47. Bazille, C, Megarbane, B, Bensimhon, D, Lavergne-Slove, A, Baglin, AC, Loirat, P, et al. Brain damage after heat stroke. J Neuropathol Exp Neurol . (2005) 64:970–5. doi: 10.1097/01.jnen.0000186924.88333.0d

Keywords: ambient temperature, extreme temperature, road traffic injuries, case-crossover study, distributed lag nonlinear model

Citation: Li Y, Ren J, Zheng W, Dong J, Lu Z, Zhang Z, Xu A, Guo X and Chu J (2024) The effects of ambient temperature on road traffic injuries in Jinan city: a time-stratified case-crossover study based on distributed lag nonlinear model. Front. Public Health . 12:1324191. doi: 10.3389/fpubh.2024.1324191

Received: 19 October 2023; Accepted: 03 April 2024; Published: 23 April 2024.

Copyright © 2024 Li, Ren, Zheng, Dong, Lu, Zhang, Xu, Guo and Chu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xiaolei Guo, [email protected] ; Jie Chu, [email protected]

PHL is a complex, inspiring case study in sexual, reproductive health, rights–UNFPA

  • Rizal Raoul Reyes
  • April 25, 2024

Despite global gains in sexual and reproductive health and rights over the last 30 years, millions of women and girls, including Filipinas, have been deprived of access and opportunities, according to the 2024 State of World Population report, released recently by UNFPA, the United Nations’ sexual and reproductive health agency.

According to the report, titled “Interwoven Lives, Threads of Hope: Ending inequalities in sexual and reproductive health and rights,” gender inequality and other forms of discrimination have thwarted the broad gains in sexual and reproductive health for women and girls. Women and girls who are poor, belong to ethnic, racial and indigenous minority groups, or are trapped in conflict settings, are more likely to die because they lack access to timely health care.

As for the Philippines, UNFPA Philippines Country Representative Dr. Leila Joudane described the country as a paradox marked by several contradictions.

“The Philippines serves as a complex, and in many ways, inspiring case study.  We’ve seen significant advances in some areas, yet profound inequalities persist,” she said.

“Access to contraception, reductions in maternal death, and the continued fight for gender equality are successes to be honored. Yet we know the fight for sexual and reproductive health and rights is far from over, especially for marginalized communities, young people, and those left furthest behind,” she added.

Strides made

The Philippines has made strides in maternal and reproductive health since 1994. Antenatal care for women increased by 30 percentage points (from 53 to 83 percent), unmet need for family planning has been more than halved (from 30 to 12 percent), and important laws such as the Responsible Parenthood and Reproductive Health Act of 2012 and the Act Prohibiting the Practice of Child Marriage, among many others, have been enacted.

Over half of all preventable maternal deaths are estimated to occur in countries with humanitarian crises and conflicts; that’s nearly 500 deaths per day. Women from indigenous ethnic groups are more likely to die of causes related to pregnancy and childbirth.

Women with disabilities are up to 10 times more likely to experience gender-based violence than their peers without disabilities. People of diverse sexual orientation and gender expression face rampant violence and steep barriers to care.

“This report underscores a sobering truth: inequalities are widening and the rights of women, girls, and gender-diverse people face increasing pushbacks. This report delves into the critical issues that continue to shape our world and provides a roadmap for realizing the promises we made in Cairo for the International Conference on Population and Development [30 years ago],” explained Joudane during the local launch of the report at the University of the Philippines.

Anniversary celebration

The event, attended by government officials such as those from the National Economic and Development Authority and the Commission on Population and Development, also celebrated the 30th anniversary of the International Conference on Population and Development (ICPD). From a focus on population control, the ICPD was pivotal in convincing the world’s nations to prioritize reproductive justice by recognizing the rights and choices of individuals—especially women and girls—to make informed choices about their bodies, their lives, and their futures. Through the ICPD in 1994, 179 governments, including that of the Philippines, committed to placing sexual and reproductive health and rights at the core of sustainable development.

Further, UNFPA Philippines presented additional data on how access to family planning, especially among the poorest or lowest socioeconomic quintile, has greatly expanded due to the implementation of the Responsible Parenthood and Reproductive Health Act of 2012. Nevertheless, while the law improved access for economically disadvantaged women, it has also resulted in lower access to family planning among adolescents due to the need for parental consent.

UNFPA Deputy Regional Director for Asia and the Pacific Dr. Aleksandar Sasha Bodiroza said the report is “a celebration of progress and a stark reminder of the work that remains.”

“Millions of women and girls remain far behind, and progress is slowing or stalled on key measures: 800 women die every day giving birth, unchanged since 2016; a quarter of women cannot make their own health-care decisions and around the same number cannot say no to sex with their partner. In 40 percent of countries with data, women’s bodily autonomy is diminishing,” Bodiroza said.


  • Open access
  • Published: 22 April 2024

The influence of maternal prepregnancy weight and gestational weight gain on the umbilical cord blood metabolome: a case–control study

  • Xianxian Yuan (ORCID: orcid.org/0000-0001-8762-8471) 1,
  • Yuru Ma 1,
  • Jia Wang 2,
  • Yan Zhao 1,
  • Wei Zheng 1,
  • Ruihua Yang 1,
  • Lirui Zhang 1,
  • Xin Yan 1 &
  • Guanghui Li (ORCID: orcid.org/0000-0003-2290-1515) 1

BMC Pregnancy and Childbirth volume 24, Article number: 297 (2024)

Background

Maternal overweight/obesity and excessive gestational weight gain (GWG) are frequently reported to be risk factors for obesity and other metabolic disorders in offspring. Cord blood metabolites provide information on fetal nutritional and metabolic health and could provide an early window of detection of potential health issues among newborns. The aim of the study was to explore the impact of maternal prepregnancy overweight/obesity and excessive GWG on cord blood metabolic profiles.

Methods

This case–control study included 33 mother–neonate pairs with maternal prepregnancy overweight/obesity, 30 mother–neonate pairs with excessive maternal GWG, and 32 control mother–neonate pairs. Untargeted metabolomic profiling of umbilical cord blood samples was performed using UHPLC‒MS/MS.

Results

Forty-six metabolites exhibited a significant increase and 60 metabolites exhibited a significant reduction in umbilical cord blood from overweight and obese mothers compared with mothers with normal body weight. Steroid hormone biosynthesis and neuroactive ligand‒receptor interactions were the two top-ranking pathways enriched with these metabolites (P = 0.01 and 0.03, respectively). Compared with mothers with normal GWG, in mothers with excessive GWG, the levels of 63 metabolites were increased and those of 46 metabolites were decreased in umbilical cord blood. Biosynthesis of unsaturated fatty acids was the most altered pathway enriched with these metabolites (P < 0.01).

Conclusions

Prepregnancy overweight and obesity affected the fetal steroid hormone biosynthesis pathway, while excessive GWG affected fetal fatty acid metabolism. This emphasizes the importance of preconception weight loss and maintaining an appropriate GWG, which are beneficial for the long-term metabolic health of offspring.

The obesity epidemic is an important public health problem in developed and developing countries [1] and is associated with the emergence of chronic noncommunicable diseases, including type 2 diabetes mellitus (T2DM), hypertension, cardiovascular disease, nonalcoholic fatty liver disease (NAFLD), and cancer [2, 3, 4]. Maternal obesity is the most common metabolic disturbance in pregnancy, and the prevalence of obesity among women of childbearing age ranges from 7.1% to 31.9% in some countries [5]. In China, the prevalence of overweight and obesity has also increased rapidly in the past four decades. Based on Chinese criteria, the latest national prevalence estimates for 2015–2019 were 34.3% for overweight and 16.4% for obesity in adults (≥ 18 years of age) [6].

Increasing evidence implicates overnutrition in utero as a major determinant of the health of offspring during childhood and adulthood, which is compatible with the developmental origins of health and disease (DOHaD) framework [ 7 ]. Maternal obesity and excessive gestational weight gain (GWG) are important risk factors for several adverse maternal outcomes, including gestational diabetes and hypertensive disorders, fetal death, and preterm birth [ 8 , 9 , 10 ]. More importantly, they have negative implications for offspring, both perinatally and later in life. Evidence from cohort studies focusing on offspring development confirms the relationship between maternal obesity/excessive GWG and offspring obesity programming [ 11 , 12 , 13 ]. Currently, there is no unified mechanism to explain the adverse outcomes associated with maternal obesity and excessive GWG, which may be the independent and interactive effects of the obese maternal phenotype itself and the diet associated with this phenotype. In addition to genetic and environmental factors, metabolic programming may also lead to the intergenerational transmission of obesity through epigenetic mechanisms.

Metabolomics, which reflects the metabolic phenotype of human subjects and animals, is the profiling of metabolites in biofluids, cells and tissues using high-throughput platforms, such as mass spectrometry. It has unique potential in identifying biomarkers for predicting occurrence, severity, and progression of diseases, as well as exploring underlying mechanistic abnormalities [ 14 , 15 ]. Umbilical cord metabolites can provide information about fetal nutritional and metabolic health, and may provide an early window for detection of potential health issues in newborns [ 16 ]. Previous studies have reported differences in umbilical cord metabolite profiles associated with maternal obesity [ 17 , 18 ]. However, the results were inconsistent due to differences in sample sizes, ethnicity and region, and mass spectrometry. In addition, most studies have not considered the difference in the effects of prepregnancy body mass index (BMI) and GWG on cord blood metabolites.

To investigate the relationship between early metabolic programming and the increased incidence of metabolic diseases in offspring, we studied the associations between elevated prepregnancy BMI/excessive GWG and umbilical cord metabolic profiles. Another purpose of this study was to explore whether there were differences in the effects of prepregnancy overweight/obesity and excessive GWG on cord blood metabolites.

Methods

Study population

This was a hospital-based case–control study that included women with singleton pregnancies who received prenatal care and delivered vaginally at Beijing Obstetrics and Gynecology Hospital, Capital Medical University, from January 2022 to March 2022. We selected 33 pregnant women with a prepregnancy BMI ≥ 24.0 kg/m² regardless of their gestational weight gain as the overweight/obese group, 30 pregnant women with a prepregnancy BMI of 18.5–23.9 kg/m² and a GWG > 14.0 kg as the excessive GWG group, and 32 pregnant women with a prepregnancy BMI of 18.5–23.9 kg/m² and a GWG of 8.0–14.0 kg as the control group. The three groups were matched for age (± 1.0 years), and the excessive GWG and control groups were matched for prepregnancy BMI (± 1.0 kg/m²).

The inclusion criteria were women with singleton pregnancies, those aged between 20 and 45 years, those with full-term delivery (gestational age ≥ 37 weeks), those with a prepregnancy BMI ≥ 18.5 kg/m², those without prepregnancy diabetes mellitus (DM) or hypertension, and those without gestational diabetes mellitus (GDM). The exclusion criteria were women with multiple pregnancies, those less than 20 years or more than 45 years old, those with a prepregnancy BMI < 18.5 kg/m², those with prepregnancy DM, hypertension or GDM, and those without cord blood samples.

We classified pregnant women into BMI categories based on Chinese guidelines [19]: normal weight (prepregnancy BMI 18.5–23.9 kg/m²), overweight (prepregnancy BMI 24.0–27.9 kg/m²), and obese (prepregnancy BMI ≥ 28.0 kg/m²). GWG guideline concordance was defined by the 2021 Chinese Nutrition Society recommendations according to prepregnancy BMI. The upper limits of GWG for normal weight, overweight, and obesity were 14.0 kg, 11.0 kg, and 9.0 kg, respectively.
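The cut-offs above can be expressed as a small lookup. The following Python sketch is illustrative only: the function and variable names are hypothetical, and treating each category as a half-open interval (for example, normal weight as 18.5 ≤ BMI < 24.0) is an assumption about how values between the one-decimal boundaries would be handled.

```python
# Illustrative sketch; names are hypothetical and not taken from the study.

# Upper GWG limits (kg) by prepregnancy BMI category, as stated in the text.
GWG_UPPER_LIMIT_KG = {"normal": 14.0, "overweight": 11.0, "obese": 9.0}


def bmi_category(bmi):
    """Classify a prepregnancy BMI (kg/m^2) using the Chinese cut-offs in the text.

    Assumption: categories are treated as half-open intervals.
    """
    if bmi < 18.5:
        return "underweight"  # below the study's inclusion threshold
    if bmi < 24.0:
        return "normal"
    if bmi < 28.0:
        return "overweight"
    return "obese"


def is_excessive_gwg(bmi, gwg_kg):
    """Return True when GWG exceeds the upper limit for the BMI category."""
    category = bmi_category(bmi)
    if category not in GWG_UPPER_LIMIT_KG:
        raise ValueError("no GWG upper limit is given in the text for this category")
    return gwg_kg > GWG_UPPER_LIMIT_KG[category]
```

For example, a prepregnancy BMI of 20.3 kg/m² with a GWG of 17.0 kg would be flagged as excessive under these limits, whereas the same BMI with a GWG of 11.8 kg would not.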

Written informed consent was obtained from all participants. The study was performed in accordance with the Declaration of Helsinki, and the procedures were approved by the ethics committee of Beijing Obstetrics and Gynecology Hospital, Capital Medical University (2021-KY-037).

Sample and data collection

Maternal and neonatal clinical data were collected from the electronic medical records system of Beijing Obstetrics and Gynecology Hospital. Maternal clinical characteristics included age, height, prepregnancy and predelivery weight, education level, smoking and drinking status during pregnancy, parity, conception method, comorbidities and complications of pregnancy, family history of DM and hypertension, gestational age, mode of delivery, and biochemical results during pregnancy. Prepregnancy BMI was calculated as prepregnancy weight in kilograms divided by the square of height in meters. GWG was determined by subtracting the prepregnancy weight in kilograms from the predelivery weight in kilograms. GDM was defined using the IADPSG diagnostic criteria at 24 to 28⁺⁶ weeks of gestation, based on the fasting and 1- and 2-h glucose concentrations from the oral glucose tolerance test (OGTT). Neonatal clinical characteristics included sex, birth weight and length. Macrosomia was defined as a birth weight of 4,000 g or more [20]. Low birth weight (LBW) was defined as a birth weight less than 2,500 g [21].
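As a minimal sketch of the derived variables defined above, the Python functions below compute prepregnancy BMI and GWG and apply the birth-weight definitions. All names are hypothetical, and the label used for weights that are neither macrosomic nor low is an illustrative choice, not a term from the study.

```python
# Illustrative sketch; names and the example values are hypothetical.


def prepregnancy_bmi(weight_kg, height_m):
    """BMI = prepregnancy weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2


def gestational_weight_gain(prepregnancy_weight_kg, predelivery_weight_kg):
    """GWG = predelivery weight minus prepregnancy weight, both in kg."""
    return predelivery_weight_kg - prepregnancy_weight_kg


def birth_weight_class(birth_weight_g):
    """Apply the macrosomia (>= 4,000 g) and LBW (< 2,500 g) definitions."""
    if birth_weight_g >= 4000:
        return "macrosomia"
    if birth_weight_g < 2500:
        return "low birth weight"
    return "neither"  # label chosen for illustration only


# Hypothetical mother: 55.0 kg before pregnancy, 67.0 kg before delivery, 1.65 m tall.
example_bmi = prepregnancy_bmi(55.0, 1.65)
example_gwg = gestational_weight_gain(55.0, 67.0)
```

With these hypothetical inputs, the BMI is about 20.2 kg/m² and the GWG is 12.0 kg.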

Umbilical cord blood samples were obtained by trained midwives after clamping the cord at delivery. Whole blood samples were collected in EDTA tubes, refrigerated for < 24 h, and centrifuged at 2,000 r.p.m. at 4 ℃ for 10 min. Plasma aliquots were stored at -80 ℃ until shipment on dry ice to Novogene, Inc. (Beijing, China) for untargeted metabolomic analysis.

Untargeted metabolomic analyses

Ultrahigh-performance liquid chromatography tandem mass spectrometry (UHPLC‒MS/MS) analyses were performed using a Vanquish UHPLC system (Thermo Fisher, Germany) coupled with an Orbitrap Q Exactive™ HF mass spectrometer (Thermo Fisher, Germany) at Novogene Co., Ltd. (Beijing, China). The sample preparation, mass spectrometry and automated metabolite identification procedures are described in detail in the Supplementary materials.

Statistical analysis

Clinical data statistical analysis

Quantitative data are shown as the mean ± standard deviation (SD) or median (interquartile range), and categorical data are presented as percentages. The Mann‒Whitney U test, chi-square test, and general linear repeated-measures model were used to assess the differences between the control and study groups when appropriate. A P value < 0.05 was considered statistically significant. All analyses were performed using the Statistical Package for the Social Sciences version 25.0 (SPSS 25.0) for Windows (SPSS Inc).

Umbilical cord metabolome statistical analysis

The metabolites were annotated using the Human Metabolome Database (HMDB) (https://hmdb.ca/metabolites), LIPID MAPS database (http://www.lipidmaps.org/), and Kyoto Encyclopedia of Genes and Genomes (KEGG) database (https://www.genome.jp/kegg/pathway.html). Principal component analysis (PCA) and partial least-squares discriminant analysis (PLS-DA) were performed using metaX. We applied univariate analysis (t test) to calculate the statistical significance (P value). Metabolites with a variable importance for the projection (VIP) > 1, a P value < 0.05 and a fold change (FC) ≥ 2 or FC ≤ 0.5 were considered to be differential metabolites. A false discovery rate (FDR) control was implemented to correct for multiple comparisons. The q-value in the FDR control was defined as the FDR analog of the P-value. In this study, the q-value was set at 0.2. For clustering heatmaps, the data were normalized using z scores of the intensity areas of differential metabolites and were plotted with the pheatmap package in R.
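The screening rule and the FDR step can be sketched in a few lines of Python. This is not the authors' pipeline (they used metaX): the text does not name the FDR procedure, so the Benjamini-Hochberg adjustment below is an assumption, and all function names and the example P values are hypothetical.

```python
# Illustrative sketch; Benjamini-Hochberg is an assumed choice of FDR method.


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def is_differential(vip, p_value, fold_change, fc_up=2.0, fc_down=0.5):
    """Apply the VIP, P value and fold-change thresholds stated in the text."""
    passes_fc = fold_change >= fc_up or fold_change <= fc_down
    return vip > 1.0 and p_value < 0.05 and passes_fc


# Hypothetical P values for four metabolites.
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

The fold-change cut-offs are left as parameters (`fc_up`, `fc_down`) so that other thresholds can be substituted without changing the rule itself.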

The correlations among differential metabolites were analyzed with cor() in R (method = "pearson"). The statistical significance of these correlations was calculated with cor.mtest() in R. A P value < 0.05 was considered statistically significant, and correlation plots were drawn with the corrplot package in R. The functions of these metabolites and metabolic pathways were studied using the KEGG database. In the metabolic pathway enrichment analysis of differential metabolites, a pathway was considered enriched when x/n > y/N and significantly enriched when P < 0.05.
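The enrichment criterion can be illustrated with a hypergeometric tail probability, which is the test named in the Fig. 4 notes. The text does not define x, n, y and N, so the sketch below assumes the common convention: x differential metabolites in the pathway, n differential metabolites with a pathway annotation, y metabolites in the pathway, and N annotated metabolites overall. Function names and the example counts are hypothetical.

```python
# Illustrative sketch; the meaning of x, n, y and N is an assumption (see lead-in).
from math import comb


def is_enriched(x, n, y, N):
    """Enrichment ratio criterion from the text: x/n > y/N."""
    return x / n > y / N


def enrichment_p_value(x, n, y, N):
    """Upper-tail hypergeometric probability P(X >= x).

    X counts pathway members among n metabolites drawn without replacement
    from N annotated metabolites, y of which belong to the pathway.
    """
    total = comb(N, n)
    tail = sum(comb(y, k) * comb(N - y, n - k) for k in range(x, min(n, y) + 1))
    return tail / total


# Hypothetical counts: 3 of 5 differential metabolites fall in a pathway
# that contains 10 of the 50 annotated metabolites.
example_p = enrichment_p_value(3, 5, 10, 50)
```

Under these hypothetical counts the pathway meets the ratio criterion (0.6 > 0.2), and the tail probability is below 0.05.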

Results

Demographic characteristics of study participants

The demographic and clinical characteristics of the three groups enrolled in the study are summarized in Table 1. The mothers did not differ significantly in age or gestational age. Compared to the mothers in the excessive GWG and control groups, those in the prepregnancy overweight/obesity group had a significantly higher prepregnancy BMI (25.6 (24.5, 27.2) kg/m²). However, there was no significant difference in prepregnancy BMI between mothers in the excessive GWG group (20.3 ± 1.2 kg/m²) and mothers in the control group (20.6 ± 1.5 kg/m²). Mothers in the excessive GWG group had the highest GWG (17.0 (15.5, 19.1) kg) among the three groups. The mean GWG of the mothers in the prepregnancy overweight/obesity group was 12.9 ± 3.8 kg, which was similar to that of the control group (11.8 ± 1.5 kg). Notably, among the 33 women with prepregnancy overweight/obesity, 20 had appropriate GWG, 1 had insufficient GWG, and 12 had excessive GWG. The proportion of mothers who underwent in vitro fertilization and embryo transfer (IVF-ET) in the prepregnancy overweight/obesity group (15.2%) was significantly higher than that in the excessive GWG and control groups. There were no statistically significant differences among the three groups in the proportions of pregnancy outcomes, including preeclampsia, premature rupture of membranes, postpartum hemorrhage, macrosomia, and LBW. The babies in the three groups did not differ significantly in birth weight or length.

The biochemical parameters of the mothers during pregnancy are shown in Table  2 . The levels of triglyceride (TG) and uric acid (UA) of mothers in the prepregnancy overweight/obesity group were significantly higher than those of the mothers in the excessive GWG and control groups in the first trimester. However, there was no significant difference in the blood glucose and lipid levels in the second and third trimesters of pregnancy among the three groups.

PCA and PLS-DA analysis of cord blood metabolites

Functional and taxonomic annotations of the identified metabolites included the HMDB classification annotations, LIPID MAPS classification annotations, and KEGG pathway annotations. Those cord blood metabolites included lipids and lipid-like molecules, organic acids and their derivatives, and organoheterocyclic compounds, which were mainly involved in metabolism. To better understand the structure of the cord blood metabolome in cases versus controls, we used unsupervised PCA to identify metabolites contributing the most to observed differences in the dataset. PCA did not clearly separate the three groups. We next used PLS-DA to identify metabolites that were predictive of case versus control status. PLS-DA clearly distinguished the cases from the controls (Fig. 1), the prepregnancy overweight/obesity group vs. the control group (R2Y = 0.82, Q2Y = 0.37; R2Y = 0.77, Q2Y = 0.13, respectively) (Fig. 1A), and the excessive GWG group vs. the control group (R2Y = 0.76, Q2Y = 0.16; R2Y = 0.81, Q2Y = 0.41) (Fig. 1B).

Fig. 1 PLS-DA of identified cord blood metabolites. A the prepregnancy overweight/obesity group vs. the control group; B the excessive GWG group vs. the control group. (a) PLS-DA score. The horizontal coordinates are the scores of the samples on the first principal component; the vertical coordinates are the scores of the samples on the second principal component; R2Y represents the interpretation rate of the model, Q2Y is used to evaluate the predictive ability of the PLS-DA model, and an R2Y greater than Q2Y indicates that the model is well established. (b) PLS-DA validation. Horizontal coordinates represent the correlation between randomly grouped Y and the original group Y, and vertical coordinates represent the scores of R2 and Q2. (1) POS, positive metabolites; (2) NEG, negative metabolites

Maternal prepregnancy overweight/obesity

Screening differential metabolites according to a PLS-DA VIP > 1.0, a FC > 1.2 or < 0.833 and a P value < 0.05, a total of 106 cord blood metabolites (77 positive metabolites and 29 negative metabolites) differed between the prepregnancy overweight/obesity group and the control group. Compared with those in the control group, the levels of 46 metabolites (19 positive metabolites and 27 negative metabolites) were increased in the prepregnancy overweight/obesity group, among which octopamine was the metabolite with the largest increase, followed by (2S)-4-oxo-2-phenyl-3,4-dihydro-2H-chromen-7-yl beta-D-glucopyranoside, N-tetradecanamide, stearamide, and methanandamide (Fig. 2A). Compared with the control group, in the prepregnancy overweight/obesity group, there were 60 metabolites (58 positive metabolites and 2 negative metabolites) with reduced concentrations, among which senecionine was the metabolite with the largest decrease, followed by 3-(methylsulfonyl)-2H-chromen-2-one, methyl eudesmate, cuminaldehyde, and 2-(tert-butyl)-1,3-thiazolane-4-carboxylic acid (Fig. 2A).

Fig. 2 Stem plots of differential cord blood metabolites. A the prepregnancy overweight/obesity group vs. the control group; B the excessive GWG group vs. the control group. (1) positive metabolites; (2) negative metabolites. Notes: The color of the dot represents the direction of change: blue indicates a decrease and red indicates an increase. The length of the rod represents the size of log2(FC), and the size of the dot represents the size of the VIP value

A hierarchical clustering analysis of the differential metabolites was carried out to show the differences in metabolic expression patterns between the two groups and within the same comparison (Fig. 3). KEGG pathway analysis of differential cord blood metabolites associated with the prepregnancy overweight/obesity group versus the control group is shown in Table 3 and Fig. 4A. The metabolite enrichment analysis revealed that steroid hormone biosynthesis (P value = 0.01) and neuroactive ligand‒receptor interactions (P value = 0.03) were the two pathways that were most altered between the prepregnancy overweight/obesity group and the control group. Nineteen metabolites were distributed in the pathway of steroid hormone biosynthesis, and 4 metabolites were distributed in the pathway of neuroactive ligand‒receptor interactions. In the steroid hormone biosynthesis pathway, the levels of corticosterone, 11-deoxycortisol, cortisol, testosterone, and 7α-hydroxytestosterone were decreased in the prepregnancy overweight/obesity group relative to those in the control group. In the neuroactive ligand‒receptor interaction pathway, the level of cortisol was decreased and the levels of trace amines were increased in the prepregnancy overweight/obesity group relative to the control group.

Fig. 3 Clustering heat maps of differential cord blood metabolites of the three groups. A positive metabolites; B negative metabolites. Notes: Longitudinal clustering of samples and transverse clustering of metabolites. The shorter the clustering branches, the higher the similarity. Horizontal comparison shows how metabolite contents cluster across the groups
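The z-score normalization applied to intensity areas before plotting such heat maps can be sketched as follows. This is an illustration in Python rather than the pheatmap call used by the authors; the function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
# Illustrative sketch; name and the choice of sample SD are assumptions.
from statistics import mean, stdev


def z_scores(intensities):
    """Scale one metabolite's intensity areas to mean 0 and unit sample SD."""
    mu = mean(intensities)
    sd = stdev(intensities)  # sample standard deviation, needs >= 2 values
    if sd == 0:
        # A constant profile carries no relative information across samples.
        return [0.0 for _ in intensities]
    return [(value - mu) / sd for value in intensities]
```

Scaling each metabolite separately puts metabolites with very different absolute intensities on a comparable color scale across samples.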

Fig. 4 KEGG enrichment scatterplots (a) and network plots (b) of differential cord blood metabolites. A the prepregnancy overweight/obesity group vs. the control group; B the excessive GWG group vs. the control group. (1) positive metabolites; (2) negative metabolites. Notes: (a) The horizontal coordinate is x/y (the number of differential metabolites in the corresponding metabolic pathway/the total number of metabolites identified in this pathway); the value represents the degree of enrichment of differential metabolites in the pathway. The color of the point represents the P-value of the hypergeometric test, and the size of the point represents the number of differential metabolites in the corresponding pathway. (b) The red dot represents a metabolic pathway, the yellow dot represents information on a substance-related regulatory enzyme, the green dot represents the background substance of a metabolic pathway, the purple dot represents the molecular module information of a class of substances, the blue dot represents a chemical reaction of a substance, and the green square represents the differential substance obtained by this comparison

Maternal excessive GWG

A total of 109 cord blood metabolites (52 positive metabolites and 57 negative metabolites) differed between the excessive GWG group and the control group. Compared with the control group, in the excessive GWG group, there were 63 metabolites (15 positive metabolites and 48 negative metabolites) with increased concentrations, among which 2-thio-acetyl MAGE was the metabolite with the largest increase, followed by PC (7:0/8:0), lysopc 16:2 (2 N isomer), MGMG (18:2), and thromboxane B2 (Fig.  2 B). Compared with the levels in the control group, the levels of 46 metabolites (37 positive metabolites and 9 negative metabolites) in the excessive GWG group were reduced, among which hippuric acid had the largest decrease, followed by 8-hydroxyquinoline, gamithromycin, 2-phenylglycine, and cefmetazole (Fig.  2 B).

A hierarchical analysis of differential metabolites obtained in the two groups was carried out, and the difference in metabolic expression patterns between the two groups and within the same comparison was obtained, which is shown in Fig.  3 . KEGG pathway analysis of the cord blood metabolites associated with the excessive GWG group versus the control group is shown in Table  4 and Fig.  4 B. The metabolite enrichment analysis revealed that biosynthesis of unsaturated fatty acids was the most altered pathway between the excessive GWG and control groups ( P value < 0.01). There were 13 metabolites distributed in the enriched pathway. The levels of docosapentaenoic acid (DPA), docosahexaenoic acid (DHA), arachidonic acid, adrenic acid, palmitic acid, stearic acid, behenic acid, lignoceric acid, and erucic acid were increased in the excessive GWG group relative to those in the control group.

Discussion

Our study found that both maternal prepregnancy overweight/obesity and excessive GWG affected umbilical cord blood metabolites, and that their effects differed. Regardless of gestational weight gain, 46 metabolites were increased and 60 metabolites were decreased in the umbilical cord blood of mothers with prepregnancy overweight or obesity compared with that of mothers with normal body weight and appropriate GWG. Steroid hormone biosynthesis and neuroactive ligand‒receptor interactions were the two top-ranking pathways enriched with these metabolites. Compared with mothers with normal prepregnancy BMI and appropriate GWG, in mothers with normal prepregnancy BMI but excessive GWG, the levels of 63 metabolites were increased and those of 46 metabolites were decreased in umbilical cord blood. Biosynthesis of unsaturated fatty acids was the most altered pathway enriched with these metabolites.

There were many differential metabolites in the cord blood between the prepregnancy overweight/obesity group and the control group and between the excessive GWG group and the control group. However, the roles of most of these differential metabolites are unknown. The levels of stearamide and methanandamide were increased in the prepregnancy overweight/obesity group. Stearamide, also known as octadecanamide or kemamide S, belongs to the class of organic compounds known as carboximidic acids. Stearamide, which is increased in the serum of patients with hepatic cirrhosis and sepsis, may be associated with the systemic inflammatory state [ 22 , 23 ]. Methanandamide is a stable analog of anandamide that participates in energy balance mainly by activating cannabinoid receptors. Methanandamide dose-dependently inhibits and excites tension-sensitive gastric vagal afferents (GVAs), which play a role in appetite regulation [ 24 ]. In mice fed a high-fat diet, only an inhibitory effect of methanandamide was observed, and GVA responses to tension were dampened [ 24 , 25 ]. These changes may contribute to the development and/or maintenance of obesity. Moreover, methanandamide can produce dose-related hypothermia and attenuate cocaine-induced hyperthermia by a cannabinoid 1-dopamine D2 receptor mechanism [ 26 ].

Metabolomic pathway analysis of the cord blood metabolite features in the prepregnancy overweight and obesity group identified two significant pathways: steroid hormone biosynthesis and neuroactive ligand‒receptor interaction. In the steroid hormone biosynthesis pathway, the levels of several steroid hormones (including corticosterone, 11-deoxycortisol, cortisol, testosterone, and 7α-hydroxytestosterone) were decreased in the prepregnancy overweight/obesity group. In addition to their physiological role in the healthy neuroendocrine development and maturation of fetuses and babies, glucocorticoids are essential to human health because they regulate different physiological events in mature organs and tissues, such as glucose metabolism, lipid biosynthesis and distribution, food intake, thermogenesis, and mood and learning patterns [27]. Glucocorticoids have been considered a link between adverse early-life conditions and the development of metabolic disorders in later life [28, 29, 30]. However, there is still much controversy regarding the role of maternal obesity in the fetal steroid hormone biosynthesis pathway. Studies of maternal obesity animal models showed that corticosterone and cortisol levels were increased in the offspring of obese mothers [31, 32]. A study by Satu M Kumpulainen et al. showed that young adults born to mothers with higher early pregnancy BMIs had lower average levels of diurnal cortisol, especially in the morning [33]. Laura I. Stirrat et al. found that increased maternal BMI was associated with lower maternal cortisol, corticosterone, and 11-dehydrocorticosterone levels. However, there were no associations between maternal BMI and glucocorticoid levels in the cord blood [34]. Differences in the protocols of these previous studies may explain the mixed findings, for example whether cortisol was measured in peripheral blood, cord blood or saliva, the measurement time points, and the number of samples. Although the effect of maternal obesity on fetal steroid hormone levels is controversial, dysregulation of glucocorticoids may be a plausible mechanism by which maternal obesity can increase the risk of metabolic disorders and mental health disorders in offspring.

The effect of excessive GWG on umbilical cord blood metabolites is different from that of maternal overweight and obesity. Compared with the control group, in the excessive GWG group, the level of thromboxane B2 was increased and the level of hippuric acid was decreased. Thromboxane B2, which is important in the platelet release reaction, is a stable, physiologically active compound formed in vivo from prostaglandin endoperoxides. Hippuric acid is an acyl glycine formed from the conjugation of benzoic acid with glycine. Several studies have confirmed that both thromboxane B2 and hippuric acid levels are associated with diet. Dietary fatty acids affect platelet thromboxane production [ 35 , 36 , 37 ]. In our study, several fatty acids (e.g., palmitic acid, stearic acid, behenic acid, and lignoceric acid) in the excessive GWG group were also increased, which may have led to the increase in thromboxane B2 levels. Hippuric acid can be detected after the consumption of whole grains and anthocyanin-rich bilberries [ 38 , 39 ]. A healthy diet intervention increased the signals for hippuric acid to incorporate polyunsaturated fatty acids [ 38 ], and the low level of hippuric acid was associated with lower fruit-vegetable intakes [ 39 ]. Maternal overnutrition and unhealthy dietary patterns are the main reasons for excessive GWG [ 40 , 41 ]. Therefore, we speculated that the differences in thromboxane B2 and hippuric acid between the excessive GWG and control groups were associated with maternal diet during pregnancy. The effect of these differential metabolites on the long-term metabolic health of offspring after birth needs further study.

Metabolomic pathway analysis of the cord blood metabolite features in the excessive GWG group identified biosynthesis of unsaturated fatty acids as the significant pathway after filtering. The levels of several fatty acids in this pathway were increased in the excessive GWG group, including long-chain saturated fatty acids (e.g., palmitic acid (C16:0), stearic acid (C18:0), behenic acid (C22:0), and lignoceric acid (C24:0)), monounsaturated fatty acids (erucic acid), and polyunsaturated fatty acids (e.g., DPA, DHA, arachidonic acid, and adrenic acid). Because perinatal fatty acid status can be influenced by maternal dietary modifications or supplementation [ 42 ], we speculated that maternal diet during pregnancy caused the difference in umbilical cord blood fatty acids between the excessive GWG and control groups. A large body of evidence from mechanistic studies supports the potential of fatty acids to influence later obesity. However, the possible mechanisms and observed relationships are complex and depend on the types and patterns of fatty acids [ 43 , 44 ]. Maternal dietary fatty acids have been found to induce hypothalamic inflammation, cause epigenetic changes, and alter the mechanisms of energy control in offspring [ 43 ]. Evidence from cell culture and rodent studies showed that polyunsaturated fatty acids might serve several complex roles in fetuses, including the stimulation and/or inhibition of adipocyte differentiation [ 44 ]. It remains unclear whether lower n-6 long-chain polyunsaturated fatty acid levels or higher n-3 long-chain polyunsaturated fatty acid levels are more relevant and whether the long-term effects differ with offspring age [ 44 ]. Although there is a biologically plausible case for the relevance of perinatal fatty acid status in later obesity risk, available data in humans suggest that the influence of achievable modification of perinatal n-3/n-6 status is not sufficient to influence offspring obesity risk in the general population [ 45 ]. Further studies seem justified to clarify the reasons.

The advantage of our present study is that we simultaneously analyzed the effects of prepregnancy overweight/obesity and excessive GWG on cord blood metabolites and explored their differences. In addition, to exclude the effect of hyperglycemia on cord blood metabolites, both women with prepregnancy diabetes mellitus and gestational diabetes mellitus were excluded from our study. The limitation of our study is that it was a single-center study with a small sample, especially in the prepregnancy overweight/obesity group. In the future, we can expand the sample size and conduct a subgroup analysis of the prepregnancy overweight/obesity group and analyze the differences in the effects of different degrees of obesity on cord blood metabolites. The prepregnancy overweight/obesity group can be further divided into an appropriate GWG group and an excessive GWG group, and the differences in the effects of these two groups on umbilical cord blood metabolites can be analyzed. Moreover, the dietary pattern of the pregnant woman could affect the production of cord blood metabolites. We did not investigate the dietary patterns of the mothers in this study, which is another limitation of this study. In future studies, we should investigate maternal dietary patterns as a very important confounding variable.

In conclusion, our present study confirmed that both prepregnancy overweight/obesity and excessive GWG could affect umbilical cord blood metabolites, and they had different effects on these metabolites. Prepregnancy overweight and obesity affected the fetal steroid hormone biosynthesis pathway, while normal prepregnancy body weight but excessive GWG affected fetal fatty acid metabolism. This emphasizes the importance of preconception weight loss and maintaining an appropriate GWG, which are beneficial for the long-term metabolic health of offspring.

Availability of data and materials

The data sets generated during the current study are not publicly available but are available from the corresponding author on reasonable request. Requests for the raw data will be reviewed by a committee including XXY and GHL.

Abbreviations

Excessive gestational weight gain

Ultrahigh-performance liquid chromatography tandem mass spectrometry

Type 2 diabetes mellitus

Nonalcoholic fatty liver disease

The developmental origins of health and disease

Body mass index

Diabetes mellitus

Gestational diabetes mellitus

Oral glucose tolerance test

Low birth weight

Standard deviation

The Human Metabolome Database

Kyoto Encyclopedia of Genes and Genomes

Principal component analysis

Partial least-squares discriminant analysis

Importance for the projection

Fold change

In vitro fertilization and embryo transfer

Triglyceride

Docosapentaenoic acid

Docosahexaenoic acid

Gastric vagal afferents

Collaborators GBDO, Afshin A, Forouzanfar MH, Reitsma MB, Sur P, Estep K, Lee A, Marczak L, Mokdad AH, Moradi-Lakeh M, et al. Health effects of overweight and obesity in 195 countries over 25 years. N Engl J Med. 2017;377(1):13–27.

Bjerregaard LG, Jensen BW, Angquist L, Osler M, Sorensen TIA, Baker JL. Change in overweight from childhood to early adulthood and risk of type 2 diabetes. N Engl J Med. 2018;378(14):1302–12.

Sharma V, Coleman S, Nixon J, Sharples L, Hamilton-Shield J, Rutter H, Bryant M. A systematic review and meta-analysis estimating the population prevalence of comorbidities in children and adolescents aged 5 to 18 years. Obes Rev. 2019;20(10):1341–9.

Llewellyn A, Simmonds M, Owen CG, Woolacott N. Childhood obesity as a predictor of morbidity in adulthood: a systematic review and meta-analysis. Obes Rev. 2016;17(1):56–67.

Poston L, Caleyachetty R, Cnattingius S, Corvalan C, Uauy R, Herring S, Gillman MW. Preconceptional and maternal obesity: epidemiology and health consequences. Lancet Diabetes Endocrinol. 2016;4(12):1025–36.

Pan XF, Wang L, Pan A. Epidemiology and determinants of obesity in China. Lancet Diabetes Endocrinol. 2021;9(6):373–92.

Barker DJ. The developmental origins of adult disease. J Am Coll Nutr. 2004;23(6 Suppl):588S-595S.

LifeCycle Project-Maternal O, Childhood Outcomes Study G, Voerman E, Santos S, Inskip H, Amiano P, Barros H, Charles MA, Chatzi L, Chrousos GP, et al. Association of gestational weight gain with adverse maternal and infant outcomes. JAMA. 2019;321(17):1702–15.

Aune D, Saugstad OD, Henriksen T, Tonstad S. Maternal body mass index and the risk of fetal death, stillbirth, and infant death: a systematic review and meta-analysis. JAMA. 2014;311(15):1536–46.

Ukah UV, Bayrampour H, Sabr Y, Razaz N, Chan WS, Lim KI, Lisonkova S. Association between gestational weight gain and severe adverse birth outcomes in Washington State, US: a population-based retrospective cohort study, 2004–2013. PLoS Med. 2019;16(12):e1003009.

Starling AP, Brinton JT, Glueck DH, Shapiro AL, Harrod CS, Lynch AM, Siega-Riz AM, Dabelea D. Associations of maternal BMI and gestational weight gain with neonatal adiposity in the Healthy Start study. Am J Clin Nutr. 2015;101(2):302–9.

Voerman E, Santos S, Patro Golab B, Amiano P, Ballester F, Barros H, Bergstrom A, Charles MA, Chatzi L, Chevrier C, et al. Maternal body mass index, gestational weight gain, and the risk of overweight and obesity across childhood: an individual participant data meta-analysis. PLoS Med. 2019;16(2):e1002744.

Heslehurst N, Vieira R, Akhter Z, Bailey H, Slack E, Ngongalah L, Pemu A, Rankin J. The association between maternal body mass index and child obesity: a systematic review and meta-analysis. PLoS Med. 2019;16(6):e1002817.

Newgard CB. Metabolomics and metabolic diseases: where do we stand? Cell Metab. 2017;25(1):43–56.

Johnson CH, Ivanisevic J, Siuzdak G. Metabolomics: beyond biomarkers and towards mechanisms. Nat Rev Mol Cell Biol. 2016;17(7):451–9.

Hivert MF, Perng W, Watkins SM, Newgard CS, Kenny LC, Kristal BS, Patti ME, Isganaitis E, DeMeo DL, Oken E, et al. Metabolomics in the developmental origins of obesity and its cardiometabolic consequences. J Dev Orig Health Dis. 2015;6(2):65–78.

Schlueter RJ, Al-Akwaa FM, Benny PA, Gurary A, Xie G, Jia W, Chun SJ, Chern I, Garmire LX. Prepregnant obesity of mothers in a multiethnic cohort is associated with cord blood metabolomic changes in offspring. J Proteome Res. 2020;19(4):1361–74.

Shokry E, Marchioro L, Uhl O, Bermudez MG, Garcia-Santos JA, Segura MT, Campoy C, Koletzko B. Impact of maternal BMI and gestational diabetes mellitus on maternal and cord blood metabolome: results from the PREOBE cohort study. Acta Diabetol. 2019;56(4):421–30.

Chen C, Lu FC, Department of Disease Control Ministry of Health PRC. The guidelines for prevention and control of overweight and obesity in Chinese adults. Biomed Environ Sci. 2004;17(Suppl):1–36.

The American College of Obstetricians and Gynecologists. Macrosomia: ACOG practice bulletin, number 216. Obstet Gynecol. 2020;135(1):e18–e35.

Goldenberg RL, Culhane JF. Low birth weight in the United States. Am J Clin Nutr. 2007;85(2):584S-590S.

Lian JS, Liu W, Hao SR, Guo YZ, Huang HJ, Chen DY, Xie Q, Pan XP, Xu W, Yuan WX, et al. A serum metabonomic study on the difference between alcohol- and HBV-induced liver cirrhosis by ultraperformance liquid chromatography coupled to mass spectrometry plus quadrupole time-of-flight mass spectrometry. Chin Med J (Engl). 2011;124(9):1367–73.

Ding W, Xu S, Zhou B, Zhou R, Liu P, Hui X, Long Y, Su L. Dynamic plasma lipidomic analysis revealed cholesterol ester and amides associated with sepsis development in critically Ill patients after cardiovascular surgery with cardiopulmonary bypass. J Pers Med. 2022;12(11):1838.

Christie S, O’Rielly R, Li H, Nunez-Salces M, Wittert GA, Page AJ. Modulatory effect of methanandamide on gastric vagal afferent satiety signals depends on nutritional status. J Physiol. 2020;598(11):2169–82.

Christie S, O’Rielly R, Li H, Wittert GA, Page AJ. High fat diet induced obesity alters endocannabinoid and ghrelin mediated regulation of components of the endocannabinoid system in nodose ganglia. Peptides. 2020;131:170371.

Rasmussen BA, Kim E, Unterwald EM, Rawls SM. Methanandamide attenuates cocaine-induced hyperthermia in rats by a cannabinoid CB1-dopamine D2 receptor mechanism. Brain Res. 2009;1260:7–14.

Facchi JC, Lima TAL, Oliveira LR, Costermani HO, Miranda GDS, de Oliveira JC. Perinatal programming of metabolic diseases: the role of glucocorticoids. Metabolism. 2020;104:154047.

Reynolds RM, Walker BR, Syddall HE, Andrew R, Wood PJ, Whorwood CB, Phillips DI. Altered control of cortisol secretion in adult men with low birth weight and cardiovascular risk factors. J Clin Endocrinol Metab. 2001;86(1):245–50.

Valtat B, Dupuis C, Zenaty D, Singh-Estivalet A, Tronche F, Breant B, Blondeau B. Genetic evidence of the programming of beta cell mass and function by glucocorticoids in mice. Diabetologia. 2011;54(2):350–9.

Jia Y, Li R, Cong R, Yang X, Sun Q, Parvizi N, Zhao R. Maternal low-protein diet affects epigenetic regulation of hepatic mitochondrial DNA transcription in a sex-specific manner in newborn piglets associated with GR binding to its promoter. PLoS ONE. 2013;8(5):e63855.

Rodriguez JS, Rodriguez-Gonzalez GL, Reyes-Castro LA, Ibanez C, Ramirez A, Chavira R, Larrea F, Nathanielsz PW, Zambrano E. Maternal obesity in the rat programs male offspring exploratory, learning and motivation behavior: prevention by dietary intervention pre-gestation or in gestation. Int J Dev Neurosci. 2012;30(2):75–81.

Tuersunjiang N, Odhiambo JF, Long NM, Shasa DR, Nathanielsz PW, Ford SP. Diet reduction to requirements in obese/overfed ewes from early gestation prevents glucose/insulin dysregulation and returns fetal adiposity and organ development to control levels. Am J Physiol Endocrinol Metab. 2013;305(7):E868-878.

Kumpulainen SM, Heinonen K, Kaseva N, Andersson S, Lano A, Reynolds RM, Wolke D, Kajantie E, Eriksson JG, Raikkonen K. Maternal early pregnancy body mass index and diurnal salivary cortisol in young adult offspring. Psychoneuroendocrinology. 2019;104:89–99.

Stirrat LI, Just G, Homer NZM, Andrew R, Norman JE, Reynolds RM. Glucocorticoids are lower at delivery in maternal, but not cord blood of obese pregnancies. Sci Rep. 2017;7(1):10263.

Prisco D, Filippini M, Francalanci I, Paniccia R, Gensini GF, Serneri GG. Effect of n-3 fatty acid ethyl ester supplementation on fatty acid composition of the single platelet phospholipids and on platelet functions. Metabolism. 1995;44(5):562–9.

Kaapa P, Uhari M, Nikkari T, Viinikka L, Ylikorkala O. Dietary fatty acids and platelet thromboxane production in puerperal women and their offspring. Am J Obstet Gynecol. 1986;155(1):146–9.

Teng KT, Chang CY, Kanthimathi MS, Tan AT, Nesaretnam K. Effects of amount and type of dietary fats on postprandial lipemia and thrombogenic markers in individuals with metabolic syndrome. Atherosclerosis. 2015;242(1):281–7.

Hanhineva K, Lankinen MA, Pedret A, Schwab U, Kolehmainen M, Paananen J, de Mello V, Sola R, Lehtonen M, Poutanen K, et al. Nontargeted metabolite profiling discriminates diet-specific biomarkers for consumption of whole grains, fatty fish, and bilberries in a randomized controlled trial. J Nutr. 2015;145(1):7–17.

Brunelli L, Davin A, Sestito G, Mimmi MC, De Simone G, Balducci C, Pansarasa O, Forloni G, Cereda C, Pastorelli R, et al. Plasmatic hippuric acid as a hallmark of frailty in an Italian cohort: the mediation effect of fruit-vegetable intake. J Gerontol A Biol Sci Med Sci. 2021;76(12):2081–9.

Ferreira LB, Lobo CV, Miranda A, Carvalho BDC, Santos LCD. Dietary patterns during pregnancy and gestational weight gain: a systematic review. Rev Bras Ginecol Obstet. 2022;44(5):540–7.

Tielemans MJ, Garcia AH, Peralta Santos A, Bramer WM, Luksa N, Luvizotto MJ, Moreira E, Topi G, de Jonge EA, Visser TL, et al. Macronutrient composition and gestational weight gain: a systematic review. Am J Clin Nutr. 2016;103(1):83–99.

Lewis RM, Wadsack C, Desoye G. Placental fatty acid transfer. Curr Opin Clin Nutr Metab Care. 2018;21(2):78–82.

Cesar HC, Pisani LP. Fatty-acid-mediated hypothalamic inflammation and epigenetic programming. J Nutr Biochem. 2017;42:1–6.

Demmelmair H, Koletzko B. Perinatal polyunsaturated fatty acid status and obesity risk. Nutrients. 2021;13(11):3882.

Hauner H, Brunner S. Early fatty acid exposure and later obesity risk. Curr Opin Clin Nutr Metab Care. 2015;18(2):113–7.

Acknowledgements

The authors thank the study participants for their involvement and research assistants for their help conducting the study.

This research was funded by the Beijing Natural Science Foundation, grant number 7214231.

Author information

Authors and affiliations.

Division of Endocrinology and Metabolism, Department of Obstetrics, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing Maternal and Child Health Care Hospital, No. 251, Yaojiayuan Road, Chaoyang District, Beijing, 100026, China

Xianxian Yuan, Yuru Ma, Yan Zhao, Wei Zheng, Ruihua Yang, Lirui Zhang, Xin Yan & Guanghui Li

Department of Obstetrics and Gynecology, The Second Hospital of Jilin University, Changchun, 130041, Jilin, China

Contributions

XXY designed the study. XXY, WZ, LRZ and XY analyzed the data. YRM, JW, YZ and RHY took part in data collection and management. XXY wrote the manuscript. XXY and GHL reviewed the manuscript and contributed to manuscript revision. All authors contributed to the article and approved the submitted version. All authors reviewed the manuscript.

Corresponding author

Correspondence to Guanghui Li.

Ethics declarations

Ethics approval and consent to participate.

This study has been performed in accordance with the Declaration of Helsinki and has been approved by the ethics committee of Beijing Obstetrics and Gynecology Hospital, Capital Medical University (2021-KY-037). Informed consent was obtained from all subjects involved in the study to publish this paper. All methods were carried out in accordance with relevant guidelines and regulations in the declaration.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Yuan, X., Ma, Y., Wang, J. et al. The influence of maternal prepregnancy weight and gestational weight gain on the umbilical cord blood metabolome: a case–control study. BMC Pregnancy Childbirth 24, 297 (2024). https://doi.org/10.1186/s12884-024-06507-x

Received: 30 September 2023

Accepted: 11 April 2024

Published: 22 April 2024

DOI: https://doi.org/10.1186/s12884-024-06507-x

  • Maternal obesity
  • Gestational weight gain
  • Offspring health
  • Metabolites
  • Umbilical cord blood

BMC Pregnancy and Childbirth

ISSN: 1471-2393

Science Update: Steroid treatment in late pregnancy does not appear to affect children’s neurodevelopment, NICHD-funded study suggests

Children who were exposed to a steroid at 34 to 36 weeks of pregnancy are no more likely to have cognitive effects than children whose mother did not receive a steroid, suggests a study funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). The findings help to allay concerns that fetal exposure to a steroid in the uterus—given to speed lung development in case of preterm birth—could affect a child’s neurodevelopment.

The study was conducted by Cynthia Gyamfi-Bannerman, M.D., M.S., and colleagues in the NICHD Maternal-Fetal Medicine Units Network. It appears in the Journal of the American Medical Association.

A previous study concluded that giving a single dose of the steroid betamethasone to pregnant people at risk of giving birth at 34 to 36 weeks of pregnancy significantly reduced the risk of respiratory complications in their newborns. However, the study also found that these infants were more likely to develop hypoglycemia (low blood sugar). Prolonged hypoglycemia in newborns is associated with brain injury. Other research suggests that multiple doses of steroids before birth could affect a child’s neurodevelopment.

For the current study, researchers evaluated children of the previous study’s participants when the children were six years old or older. A psychologist evaluated each child using a variety of tests that measured verbal and nonverbal reasoning and comprehension. A total of 949 children completed the testing (479 in the betamethasone group and 470 in the placebo group).

Both groups of children scored similarly across all measures of the test, called the Differential Ability Scales. A total of 17.1% in the betamethasone group received a score of less than 85, which did not differ significantly from the 18.5% of the placebo group. Similarly, the average score was 96.6 for both groups (compared to a national average of 100). Also similar between the groups were scores for verbal ability, nonverbal ability, spatial ability, social responsiveness, gross motor function, and behavior.
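As a rough illustration of why a 17.1% versus 18.5% difference can be statistically indistinguishable at these sample sizes, the sketch below runs an unadjusted two-proportion z-test. It is not the analysis the study authors performed. The counts are an assumption, reconstructed by rounding the reported percentages against the group sizes of 479 and 470, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for two independent proportions, using a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Group sizes are taken from the article text.
n_beta, n_placebo = 479, 470

# ASSUMPTION: counts of children scoring below 85 are reconstructed from the
# reported percentages (17.1% and 18.5%); the article does not state them.
x_beta = round(0.171 * n_beta)
x_placebo = round(0.185 * n_placebo)

z, p_value = two_proportion_z_test(x_beta, n_beta, x_placebo, n_placebo)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

Under these assumed counts the p-value is well above 0.05, which is consistent with the article's statement that the two groups did not differ significantly.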

Significance

The authors conclude that giving a steroid to pregnant people at risk for late preterm birth to reduce potential respiratory complications in their infants is not associated with adverse neurodevelopmental outcomes at age 6 or older. The results help support the prescribing of corticosteroids to pregnant people at risk for late preterm birth.

Gyamfi-Bannerman, C, et al. Neurodevelopmental Outcomes After late preterm antenatal corticosteroids: The ALPS follow-up study. The Journal of the American Medical Association. 2024. doi:10.1001/jama.2024.4303

COMMENTS

  1. Open Case Studies

    Addiction & Overdose. Opioids in the United States This case study examines the number of opioid pills (specifically oxycodone and hydrocodone, as they are the top two misused opioids) shipped to pharmacies and practitioners at the county-level around the United States from 2006 to 2014 using data from the Drug Enforcement Administration (DEA).

  2. Case Studies Apply Big Data Analytics to Public Health Research

    By Jessica Kent. December 10, 2020 - Researchers at Johns Hopkins Bloomberg School of Public Health have developed a series of case studies for public health issues that will enable healthcare leaders to use big data analytics tools in their work. The Open Case Studies project offers an interactive online hub made up of ten case studies that ...

  3. Data Analytics in Healthcare: 7 Big Data Use Cases

    Data Analytics in Healthcare: 7 Real-World Examples and Use Cases. There are few things in the world requiring such precision as clinical decision-making. The adoption of technologies supports healthcare organizations on different levels: from population monitoring, health records, diagnostics, and clinical decisions, to drug procurement, and ...

  4. Sharing Health Data Case Studies Special Publication

    Health data has proven its centrality in guiding action to change the course of individual and population health, if properly stewarded and used. In the context of the COVID-19 pandemic, both data and a lack of data illuminated profound shortcomings that affected health care and health equity. Yet, a silver lining of the pandemic was a surge in ...

  5. 10 top case studies: Big data analytics in healthcare

    Below are 10 case studies Health Data Management ran in the past year. Each offers an in-depth look at the technologies these organizations are using, the challenges they overcame and the results ...

  6. Impact Case Studies

    Impact Case Studies AHRQ's evidence-based tools and resources are used by organizations nationwide to improve the quality, safety, effectiveness, and efficiency of health care. The Agency's Impact Case Studies highlight these successes, describing the use and impact of AHRQ-funded tools by State and Federal policy makers, health systems ...

  7. Big data in healthcare: management, analysis and future prospects

    'Big data' is massive amounts of information that can work wonders. It has become a topic of special interest for the past two decades because of a great potential that is hidden in it. Various public and private sector industries generate, store, and analyze big data with an aim to improve the services they provide. In the healthcare industry, various sources for big data include hospital ...

  8. Data Science in Healthcare: COVID-19 and Beyond

    This increased data sharing, in combination with advances in health data management, works hand-in-hand with trends such as increased patient-centricity (with shared decision making), self-care (e.g., using wearables), and integrated healthcare delivery. ... A Case Study from Baja California, Mexico" by Rojas-Mendizabal et al. aims to ...

  9. Good practices for clinical data warehouse implementation: A case study

    Real-world data. Health information systems (HIS) are increasingly collecting routine care data [1-7].This source of real-world data (RWD) [] bears great promises to improve the quality of care.On the one hand, the use of this data translates into direct benefits—primary uses—for the patient by serving as the cornerstone of the developing personalized medicine [9,10].

  10. Case Studies

    Benefits of Health IT; Case Studies. A Solo Practitioner Uses EHR to Assess Quality of Care; A West Virginia Health Center Discusses Implementing Electronic Health Records; Care Coordination Improved through Health Information Exchange; EHRs Improving Care Coordination with Local Referral Network; Florida Physician uses EHR for Practice ...

  11. Big data in healthcare

    2. Promises. In the era of genomics, the volume of data being captured from biological experiments and routine health care procedures is growing at an unprecedented pace 4.This data trove has brought new promises for discovery in health care research and breakthrough treatments as well as new challenges in technology, management, and dissemination of knowledge.

  12. Evidence-based Operations Management in Health Information Management

    Abstract. This is a case study of the evidence-based management practices of a centralized health information management (HIM) department in a large integrated healthcare delivery system. The case study used interviews and focus groups, as well as de-identified dashboards, to explore the impact of reporting on the organization.

  13. Healthcare Data Management: Three Case Studies

    The HIMSS Davies Awards program promotes HIMSS's vision and mission by recognizing and sharing case studies, model practices and lessons learned on how to improve health and wellness through the power of information and technology. 2019 Davies Award winner Yale New Haven Hospital was recognized for enhancing healthcare data management in a variety of scenarios.

  14. Health Case Studies

    Health Case Studies is composed of eight separate health case studies. Each case study includes the patient narrative or story that models the best practice (at the time of publishing) in healthcare settings. Associated with each case is a set of specific learning objectives to support learning and facilitate educational strategies and evaluation.

  15. PDF Clinical Data Standards in Health Care: Five Case Studies

    Four of the five case studies are regional or national in scope and involve participa-tion by at least four distinct organizations. Explore alternative financing models. Adoption of clinical data standards requires not only an organizational commitment, but a financial one.

  16. Case studies

    When autocomplete results are available use up and down arrows to review and enter to select.

  17. Case Study 2: Visualizing Global Health Data

    Case Study Goals. Using data from the World Bank, CIA Factbook, or a complex data source of your own choosing, create a Shiny app in R to illustrate an aspect of our world in data. create an accompanying Tableau dashboard interactive visual presentation of our world in data.

  18. Case Study Methodology of Qualitative Research: Key Attributes and

    A case study is one of the most commonly used methodologies of social research. This article attempts to look into the various dimensions of a case study research strategy, the different epistemological strands which determine the particular case study type and approach adopted in the field, discusses the factors which can enhance the effectiveness of a case study research, and the debate ...

  19. Case Studies in Health Information Management

    Section One. Data Content, Structure, and Information Governance. Case 1-1. Subjective, Objective, Assessment, and Plan (SOAP) Statements and the Problem-Oriented Health Record (POHR) Case 1-2. Problem-Oriented Record Format. Case 1-3. Master Patient Index and Duplicate Health Record Number Assignment. Case 1-4.

  20. Scoping Review and Case Studies of Health Data Management Before

    The Case Studies. J-SPEED in Japanese Emergencies 2018 - 2020. Case 1 Hokkaido Earthquake 2018: The J-SPEED data detailed a total of 739 consultations over 32 days. The analysis of J-SPEED data showed that the highest number of health consultations (n=721; 97.6%) occurred between day 1 and 13 of the 32-day EMT response.

  21. Case Studies in Health Information Management

    Developed specifically for learning environments, CASE STUDIES IN HEALTH INFORMATION MANAGEMENT, 3rd Edition maps the latest AHIMA domains and competencies to case study content, helping readers prepare for RHIA® and RHIT® certification. More than a collection of readings, this versatile worktext offers extended online references and content ...

  22. Case Studies of Quality Improvement Initiatives

    The studies provide practical examples of efforts to improve performance on various aspects of patients' experience of health care as measured by the CAHPS surveys. Each case study presents a short overview of the steps an organization took in its quality improvement initiative, followed by a more detailed description of specific actions ...

  23. Practical approaches in evaluating validation and biases of ...

    (2) UNITI, Corona Check, Corona Health: The investigators have access to the study data. Raw data (de-identified) can be made available on request from the corresponding author.

  24. Assessing intersectional gender analysis in Nepal's health management

    Background Tuberculosis (TB) remains a major public health problem in Nepal, high in settings marked by prevalent gender and social inequities. Various social stratifiers intersect, either privileging or oppressing individuals based on their characteristics and contexts, thereby increasing risks, vulnerabilities and marginalisation associated with TB. This study aimed to assess the ...

  25. Frontiers

    The study will help to develop public health policies and interventions to reduce the negative impacts of climate change on RTI fatalities in extreme weather conditions. 2 Materials and methods. 2.1 Study area and data collection. Jinan is the capital city of Shandong Province, located on the east coast of China at mid-latitudes.

  26. PHL is a complex, inspiring case study in sexual, reproductive health

    Despite global gains in sexual and reproductive health and rights over the last 30 years, millions of women and girls, including Filipinas, have been deprived of access and opportunities ...

  27. Challenges

    Given the effect of urbanization on land use and the allocation and implementation of urban green spaces, this paper attempts to analyze the distribution and accessibility of public parks in India's Bengaluru city (previously known as Bangalore). Availability, accessibility, and utilization—the key measures of Urban Green Spaces (UGS)—are mostly used in health research and policy and are ...

  28. The influence of maternal prepregnancy weight and gestational weight

    Background Maternal overweight/obesity and excessive gestational weight gain (GWG) are frequently reported to be risk factors for obesity and other metabolic disorders in offspring. Cord blood metabolites provide information on fetal nutritional and metabolic health and could provide an early window of detection of potential health issues among newborns. The aim of the study was to explore the ...

  29. Age-friendly urban design: an Israeli national case study

    Inon Schenker is a Global Public Health Specialist, entrepreneur, and CEO of Impact for Healthy Futures. Holding a PhD, MPH, and BA from Hebrew University, he is an expert in municipal urban health programs. With 20+ years in management, planning, and partnerships, he advises the UN, local authorities and government ministries.

  30. Science Update: Steroid treatment in late pregnancy does not ...

    Children who were exposed to a steroid at 34 to 36 weeks of pregnancy are no more likely to have cognitive effects than children whose mother did not receive a steroid, suggests a study funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). The findings help to allay concerns that fetal exposure to a steroid in the uterus—given to speed lung ...