• Research article
  • Open access
  • Published: 04 June 2021

Coronavirus disease (COVID-19) pandemic: an overview of systematic reviews

  • Israel Júnior Borges do Nascimento 1 , 2 ,
  • Dónal P. O’Mathúna 3 , 4 ,
  • Thilo Caspar von Groote 5 ,
  • Hebatullah Mohamed Abdulazeem 6 ,
  • Ishanka Weerasekara 7 , 8 ,
  • Ana Marusic 9 ,
  • Livia Puljak   ORCID: orcid.org/0000-0002-8467-6061 10 ,
  • Vinicius Tassoni Civile 11 ,
  • Irena Zakarija-Grkovic 9 ,
  • Tina Poklepovic Pericic 9 ,
  • Alvaro Nagib Atallah 11 ,
  • Santino Filoso 12 ,
  • Nicola Luigi Bragazzi 13 &
  • Milena Soriano Marcolino 1

On behalf of the International Network of Coronavirus Disease 2019 (InterNetCOVID-19)

BMC Infectious Diseases volume 21, Article number: 525 (2021)

Background

Navigating the rapidly growing body of scientific literature on the SARS-CoV-2 pandemic is challenging, and ongoing critical appraisal of this output is essential. We aimed to summarize and critically appraise systematic reviews of coronavirus disease (COVID-19) in humans that were available at the beginning of the pandemic.

Methods

Nine databases (Medline, EMBASE, Cochrane Library, CINAHL, Web of Science, PDQ-Evidence, WHO’s Global Research, LILACS, and Epistemonikos) were searched from December 1, 2019, to March 24, 2020. Systematic reviews analyzing primary studies of COVID-19 were included. Two authors independently undertook screening, selection, extraction (data on clinical symptoms, prevalence, pharmacological and non-pharmacological interventions, diagnostic test assessment, laboratory, and radiological findings), and quality assessment (AMSTAR 2). A meta-analysis was performed of the prevalence of clinical outcomes.

Results

Eighteen systematic reviews were included; one was empty (did not identify any relevant study). Using AMSTAR 2, confidence in the results of all 18 reviews was rated as “critically low”. Identified symptoms of COVID-19 were (range values of point estimates): fever (82–95%), cough with or without sputum (58–72%), dyspnea (26–59%), myalgia or muscle fatigue (29–51%), sore throat (10–13%), headache (8–12%) and gastrointestinal complaints (5–9%). Severe symptoms were more common in men. Elevated C-reactive protein and lactate dehydrogenase, and slightly elevated aspartate and alanine aminotransferase, were commonly described. Thrombocytopenia and elevated levels of procalcitonin and cardiac troponin I were associated with severe disease. A frequent finding on chest imaging was uni- or bilateral multilobar ground-glass opacity. A single review investigated the impact of medication (chloroquine) but found no verifiable clinical data. All-cause mortality ranged from 0.3 to 13.9%.

Conclusions

In this overview of systematic reviews, we analyzed evidence from the first 18 systematic reviews that were published after the emergence of COVID-19. However, confidence in the results of all reviews was “critically low”. Thus, systematic reviews that were published early on in the pandemic were of questionable usefulness. Even during public health emergencies, studies and systematic reviews should adhere to established methodological standards.

Background

The spread of the “Severe Acute Respiratory Syndrome Coronavirus 2” (SARS-CoV-2), the causal agent of COVID-19, was characterized as a pandemic by the World Health Organization (WHO) in March 2020 and has triggered an international public health emergency [ 1 ]. The numbers of confirmed cases and deaths due to COVID-19 are rapidly escalating, counting in millions [ 2 ], causing massive economic strain, and escalating healthcare and public health expenses [ 3 , 4 ].

The research community has responded by publishing an impressive number of scientific reports related to COVID-19. The world was alerted to the new disease at the beginning of 2020 [ 1 ], and by mid-March 2020, more than 2000 articles had been published on COVID-19 in scholarly journals, with 25% of them containing original data [ 5 ]. The living map of COVID-19 evidence, curated by the Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI-Centre), contained more than 40,000 records by February 2021 [ 6 ]. More than 100,000 records on PubMed were labeled as “SARS-CoV-2 literature, sequence, and clinical content” by February 2021 [ 7 ].

Due to publication speed, the research community has voiced concerns regarding the quality and reproducibility of evidence produced during the COVID-19 pandemic, warning of the potential damaging approach of “publish first, retract later” [ 8 ]. It appears that these concerns are not unfounded, as it has been reported that COVID-19 articles were overrepresented in the pool of retracted articles in 2020 [ 9 ]. These concerns about inadequate evidence are of major importance because they can lead to poor clinical practice and inappropriate policies [ 10 ].

Systematic reviews are a cornerstone of today’s evidence-informed decision-making. By synthesizing all relevant evidence regarding a particular topic, systematic reviews reflect the current scientific knowledge. Systematic reviews are considered to be at the highest level in the hierarchy of evidence and should be used to make informed decisions. However, with high numbers of systematic reviews of different scope and methodological quality being published, overviews of multiple systematic reviews that assess their methodological quality are essential [ 11 , 12 , 13 ]. An overview of systematic reviews helps identify and organize the literature and highlights areas of priority in decision-making.

In this overview of systematic reviews, we aimed to summarize and critically appraise systematic reviews of coronavirus disease (COVID-19) in humans that were available at the beginning of the pandemic.

Methodology

Research question

This overview’s primary objective was to summarize and critically appraise systematic reviews that assessed any type of primary clinical data from patients infected with SARS-CoV-2. Our research question was purposefully broad because we wanted to analyze as many systematic reviews as possible that were available early following the COVID-19 outbreak.

Study design

We conducted an overview of systematic reviews. The idea for this overview originated in a protocol for a systematic review submitted to PROSPERO (CRD42020170623), which indicated a plan to conduct an overview.

Overviews of systematic reviews use explicit and systematic methods for searching and identifying multiple systematic reviews addressing related research questions in the same field to extract and analyze evidence across important outcomes. Overviews of systematic reviews are in principle similar to systematic reviews of interventions, but the unit of analysis is a systematic review [ 14 , 15 , 16 ].

We used the overview methodology instead of other evidence synthesis methods to allow us to collate and appraise multiple systematic reviews on this topic, and to extract and analyze their results across relevant topics [ 17 ]. The overview and meta-analysis of systematic reviews allowed us to investigate the methodological quality of included studies, summarize results, and identify specific areas of available or limited evidence, thereby strengthening the current understanding of this novel disease and guiding future research [ 13 ].

A reporting guideline for overviews of reviews is currently under development, i.e., Preferred Reporting Items for Overviews of Reviews (PRIOR) [ 18 ]. As the PRIOR checklist is still not published, this study was reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 statement [ 19 ]. The methodology used in this review was adapted from the Cochrane Handbook for Systematic Reviews of Interventions and also followed established methodological considerations for analyzing existing systematic reviews [ 14 ].

Approval of a research ethics committee was not necessary as the study analyzed only publicly available articles.

Eligibility criteria

Systematic reviews were included if they analyzed primary data from patients infected with SARS-CoV-2 as confirmed by RT-PCR or another pre-specified diagnostic technique. Eligible reviews covered all topics related to COVID-19 including, but not limited to, those that reported clinical symptoms, diagnostic methods, therapeutic interventions, laboratory findings, or radiological results. Both full manuscripts and abbreviated versions, such as letters, were eligible.

No restrictions were imposed on the design of the primary studies included within the systematic reviews, the last search date, whether the review included meta-analyses or language. Reviews related to SARS-CoV-2 and other coronaviruses were eligible, but from those reviews, we analyzed only data related to SARS-CoV-2.

No consensus definition exists for a systematic review [ 20 ], and debates continue about the defining characteristics of a systematic review [ 21 ]. Cochrane’s guidance for overviews of reviews recommends setting pre-established criteria for making decisions around inclusion [ 14 ]. That is supported by a recent scoping review about guidance for overviews of systematic reviews [ 22 ].

Thus, for this study, we defined a systematic review as a research report which searched for primary research studies on a specific topic using an explicit search strategy, had a detailed description of the methods with explicit inclusion criteria provided, and provided a summary of the included studies either in narrative or quantitative format (such as a meta-analysis). Cochrane and non-Cochrane systematic reviews were considered eligible for inclusion, with or without meta-analysis, and regardless of the study design, language restriction and methodology of the included primary studies. To be eligible for inclusion, reviews had to be clearly analyzing data related to SARS-CoV-2 (associated or not with other viruses). We excluded narrative reviews without those characteristics as these are less likely to be replicable and are more prone to bias.

Scoping reviews and rapid reviews were eligible for inclusion in this overview if they met our pre-defined inclusion criteria noted above. We included reviews that addressed SARS-CoV-2 and other coronaviruses if they reported separate data regarding SARS-CoV-2.

Information sources

Nine databases were searched for eligible records published between December 1, 2019, and March 24, 2020: Cochrane Database of Systematic Reviews via Cochrane Library, PubMed, EMBASE, CINAHL (Cumulative Index to Nursing and Allied Health Literature), Web of Science, LILACS (Latin American and Caribbean Health Sciences Literature), PDQ-Evidence, WHO’s Global Research on Coronavirus Disease (COVID-19), and Epistemonikos.

The comprehensive search strategy for each database is provided in Additional file 1 and was designed and conducted in collaboration with an information specialist. All retrieved records were primarily processed in EndNote, where duplicates were removed, and records were then imported into the Covidence platform [ 23 ]. In addition to database searches, we screened reference lists of reviews included after screening records retrieved via databases.

Study selection

All searches, screening of titles and abstracts, and record selection, were performed independently by two investigators using the Covidence platform [ 23 ]. Articles deemed potentially eligible were retrieved for full-text screening carried out independently by two investigators. Discrepancies at all stages were resolved by consensus. During the screening, records published in languages other than English were translated by a native/fluent speaker.

Data collection process

We custom designed a data extraction table for this study, which was piloted by two authors independently. Data extraction was performed independently by two authors. Conflicts were resolved by consensus or by consulting a third researcher.

We extracted the following data: article identification data (authors’ name and journal of publication), search period, number of databases searched, population or settings considered, main results and outcomes observed, and number of participants. From Web of Science (Clarivate Analytics, Philadelphia, PA, USA), we extracted journal rank (quartile) and Journal Impact Factor (JIF).

We categorized the following as primary outcomes: all-cause mortality, need for and length of mechanical ventilation, length of hospitalization (in days), admission to intensive care unit (yes/no), and length of stay in the intensive care unit.

The following outcomes were categorized as exploratory: diagnostic methods used for detection of the virus, male to female ratio, clinical symptoms, pharmacological and non-pharmacological interventions, laboratory findings (full blood count, liver enzymes, C-reactive protein, d-dimer, albumin, lipid profile, serum electrolytes, blood vitamin levels, glucose levels, and any other important biomarkers), and radiological findings (using radiography, computed tomography, magnetic resonance imaging or ultrasound).

We also collected data on reporting guidelines and requirements for the publication of systematic reviews and meta-analyses from journal websites where included reviews were published.

Quality assessment in individual reviews

Two researchers independently assessed the reviews’ quality using the “A MeaSurement Tool to Assess Systematic Reviews 2 (AMSTAR 2)”. We acknowledge that the AMSTAR 2 was created as “a critical appraisal tool for systematic reviews that include randomized or non-randomized studies of healthcare interventions, or both” [ 24 ]. However, since AMSTAR 2 was designed for systematic reviews of intervention trials, and we included additional types of systematic reviews, we adjusted some AMSTAR 2 ratings and reported these in Additional file 2 .

Adherence to each item was rated as follows: yes, partial yes, no, or not applicable (such as when a meta-analysis was not conducted). The overall confidence in the results of the review is rated as “critically low”, “low”, “moderate” or “high”, according to the AMSTAR 2 guidance based on seven critical domains, which are items 2, 4, 7, 9, 11, 13, 15 as defined by AMSTAR 2 authors [ 24 ]. We reported our adherence ratings for transparency of our decision with accompanying explanations, for each item, in each included review.
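The rating rule described above can be sketched in a few lines of Python. This is an illustrative reading of the published AMSTAR 2 guidance, not the adjusted scoring reported in Additional file 2. The function name `overall_confidence`, the input layout, and the decision to count only a “no” rating as a flaw or weakness are assumptions made for the example.

```python
# Illustrative sketch of the AMSTAR 2 overall-confidence rule.
# Assumption: only a "no" rating counts as a flaw (critical item) or
# weakness (non-critical item); "partial yes" and "not applicable" do not.

CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}  # critical domains named in the text


def overall_confidence(ratings):
    """Map item ratings (item number -> rating string) to an overall rating."""
    failed = {item for item, rating in ratings.items() if rating == "no"}
    critical_flaws = len(failed & CRITICAL_ITEMS)
    noncritical_weaknesses = len(failed - CRITICAL_ITEMS)

    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# Hypothetical review: all 16 items rated "yes" except critical items 2 and 7.
example = {item: "yes" for item in range(1, 17)}
example[2] = "no"
example[7] = "no"
print(overall_confidence(example))
```

Under this rule, a review with two or more flaws in critical domains is rated “critically low” regardless of how well it performs on the remaining items.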

One of the included systematic reviews was conducted by some members of this author team [ 25 ]. This review was initially assessed independently by two authors who were not co-authors of that review to prevent the risk of bias in assessing this study.

Synthesis of results

For data synthesis, we prepared a table summarizing each systematic review. Graphs illustrating the mortality rate and clinical symptoms were created. We then prepared a narrative summary of the methods, findings, study strengths, and limitations.

For analysis of the prevalence of clinical outcomes, we extracted data on the number of events and the total number of patients and performed a proportional meta-analysis in RStudio© using the “metaprop” function of the “meta” package (version 4.9–6). Case studies were excluded because of the absence of variance. For reviews that did not perform a meta-analysis, we presented pooled proportions with their 95% confidence intervals, calculated by the inverse variance method with a random-effects model, using the DerSimonian-Laird estimator for τ². Data were transformed using the Freeman-Tukey double arcsine transformation. Confidence intervals for individual studies were calculated using the Clopper-Pearson method. We created forest plots in RStudio© using the “forest” function of the “metafor” package (version 2.1–0).
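To make the pooling procedure concrete, the following Python sketch implements a random-effects meta-analysis of proportions with the Freeman-Tukey double arcsine transformation and the DerSimonian-Laird estimator. It is a stand-in written for illustration, not the R code used in this study. The function names and the input data are hypothetical, and the back-transformation uses the simple sin² formula rather than the more elaborate inversion that statistical packages may apply.

```python
import math


def freeman_tukey(events, total):
    """Freeman-Tukey double arcsine transform and its approximate variance."""
    t = 0.5 * (math.asin(math.sqrt(events / (total + 1)))
               + math.asin(math.sqrt((events + 1) / (total + 1))))
    return t, 1.0 / (4 * total + 2)


def pool_proportions(studies):
    """Pool (events, total) pairs with inverse-variance random-effects weights.

    Returns (pooled proportion, lower 95% limit, upper 95% limit, tau squared).
    """
    ys, vs = zip(*(freeman_tukey(e, n) for e, n in studies))
    k = len(ys)

    # Fixed-effect weights and Cochran's Q, needed for DerSimonian-Laird.
    w = [1.0 / v for v in vs]
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights add the between-study variance tau squared.
    w_re = [1.0 / (v + tau2) for v in vs]
    pooled = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def back(t):
        # Simplified back-transformation (assumption): p = sin(t)^2,
        # with t clamped to the valid range [0, pi/2].
        return math.sin(min(max(t, 0.0), math.pi / 2)) ** 2

    return back(pooled), back(pooled - 1.96 * se), back(pooled + 1.96 * se), tau2


# Hypothetical data: (patients with fever, patients assessed) in three studies.
example = [(82, 100), (180, 200), (45, 50)]
pooled, lower, upper, tau2 = pool_proportions(example)
print(f"pooled = {pooled:.3f}, 95% CI {lower:.3f} to {upper:.3f}, tau^2 = {tau2:.4f}")
```

The transformation stabilizes the variance of proportions close to 0 or 1, which matters for outcomes such as fever, where most studies report a prevalence above 80%.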

Managing overlapping systematic reviews

Included systematic reviews that address the same or similar research questions may include the same primary studies. Including such overlapping reviews may introduce bias when outcome data from the same primary study enter the analyses of an overview multiple times: in summaries of evidence, multiple counting of the same outcome data gives some primary studies too much influence [ 14 ]. In this overview, we did not exclude overlapping systematic reviews because, according to Cochrane’s guidance, it may be appropriate to include all relevant reviews’ results if the purpose of the overview is to present and describe the current body of evidence on a topic [ 14 ]. To avoid any bias in summary estimates associated with overlapping reviews, we generated forest plots showing data from individual systematic reviews, but the results were not pooled because some primary studies were included in multiple reviews.
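The extent of overlap can be tabulated by counting, for each primary study, the number of reviews that include it. The sketch below shows one way to do this in Python; the function name `overlap_summary`, the review labels, and the study identifiers are hypothetical and do not correspond to the included reviews.

```python
from collections import Counter


def overlap_summary(reviews):
    """Summarize overlap given a mapping of review label -> set of study IDs."""
    # Count in how many reviews each primary study appears.
    counts = Counter(study for studies in reviews.values() for study in set(studies))
    unique = len(counts)
    single = sum(1 for n in counts.values() if n == 1)
    return {
        "unique_studies": unique,
        "in_single_review": single,
        "share_single": single / unique if unique else 0.0,
        "max_reviews_per_study": max(counts.values(), default=0),
    }


# Hypothetical example with three reviews and five primary studies.
reviews = {
    "review_A": {"s1", "s2", "s3"},
    "review_B": {"s2", "s4"},
    "review_C": {"s2", "s3", "s5"},
}
print(overlap_summary(reviews))
```

A study that appears in several reviews would contribute its outcome data several times if review-level estimates were pooled, which is the reason for presenting forest plots without summary estimates.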

Results

Our search retrieved 1063 publications, of which 175 were duplicates. Most publications were excluded after the title and abstract analysis ( n = 860). Among the 28 studies selected for full-text screening, 10 were excluded for the reasons described in Additional file 3 , and 18 were included in the final analysis (Fig. 1 ) [ 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 ]. Reference list screening did not retrieve any additional systematic reviews.

Figure 1. PRISMA flow diagram
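The selection counts reported above are internally consistent, as the short arithmetic check below illustrates. The function name `prisma_counts` and its parameter names are introduced only for this example.

```python
def prisma_counts(retrieved, duplicates, excluded_screening, excluded_full_text):
    """Derive the record counts at each stage of the selection process."""
    screened = retrieved - duplicates           # records after de-duplication
    full_text = screened - excluded_screening   # records assessed in full text
    included = full_text - excluded_full_text   # reviews in the final analysis
    return {"screened": screened, "full_text": full_text, "included": included}


# Counts as reported in the text.
flow = prisma_counts(retrieved=1063, duplicates=175,
                     excluded_screening=860, excluded_full_text=10)
print(flow)
```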

Characteristics of included reviews

Summary features of 18 systematic reviews are presented in Table 1 . They were published in 14 different journals. Only four of these journals had specific requirements for systematic reviews (with or without meta-analysis): European Journal of Internal Medicine, Journal of Clinical Medicine, Ultrasound in Obstetrics and Gynecology, and Clinical Research in Cardiology . Two journals reported that they published only invited reviews ( Journal of Medical Virology and Clinica Chimica Acta ). Three systematic reviews in our study were published as letters; one was labeled as a scoping review and another as a rapid review (Table 2 ).

All reviews were published in English, in first quartile (Q1) journals, with JIF ranging from 1.692 to 6.062. One review was empty, meaning that its search did not identify any relevant studies; i.e., no primary studies were included [ 36 ]. The remaining 17 reviews included 269 unique studies; the majority ( N = 211; 78%) were included in only a single review included in our study (range: 1 to 12). Primary studies included in the reviews were published between December 2019 and March 18, 2020, and comprised case reports, case series, cohorts, and other observational studies. We found only one review that included randomized clinical trials [ 38 ]. In the included reviews, systematic literature searches were performed from 2019 (entire year) up to March 9, 2020. Ten systematic reviews included meta-analyses. The list of primary studies found in the included systematic reviews is shown in Additional file 4 , as well as the number of reviews in which each primary study was included.

Population and study designs

Most of the reviews analyzed data from patients with COVID-19 who developed pneumonia, acute respiratory distress syndrome (ARDS), or any other correlated complication. One review aimed to evaluate the effectiveness of using surgical masks on preventing transmission of the virus [ 36 ], one review was focused on pediatric patients [ 34 ], and one review investigated COVID-19 in pregnant women [ 37 ]. Most reviews assessed clinical symptoms, laboratory findings, or radiological results.

Systematic review findings

The summary of findings from individual reviews is shown in Table 2 . Overall, all-cause mortality ranged from 0.3 to 13.9% (Fig. 2 ).

Figure 2. A meta-analysis of the prevalence of mortality

Clinical symptoms

Seven reviews described the main clinical manifestations of COVID-19 [ 26 , 28 , 29 , 34 , 35 , 39 , 41 ]. Three of them provided only a narrative discussion of symptoms [ 26 , 34 , 35 ]. In the reviews that performed a statistical analysis of the incidence of different clinical symptoms, symptoms in patients with COVID-19 were (range values of point estimates): fever (82–95%), cough with or without sputum (58–72%), dyspnea (26–59%), myalgia or muscle fatigue (29–51%), sore throat (10–13%), headache (8–12%), gastrointestinal disorders, such as diarrhea, nausea or vomiting (5.0–9.0%), and others (including, in one study only: dizziness 12.1%) (Figs. 3 , 4 , 5 , 6 , 7 , 8 and 9 ). Three reviews assessed cough with and without sputum together; only one review assessed sputum production itself (28.5%).

Figure 3. A meta-analysis of the prevalence of fever

Figure 4. A meta-analysis of the prevalence of cough

Figure 5. A meta-analysis of the prevalence of dyspnea

Figure 6. A meta-analysis of the prevalence of fatigue or myalgia

Figure 7. A meta-analysis of the prevalence of headache

Figure 8. A meta-analysis of the prevalence of gastrointestinal disorders

Figure 9. A meta-analysis of the prevalence of sore throat

Diagnostic aspects

Three reviews described methodologies, protocols, and tools used for establishing the diagnosis of COVID-19 [ 26 , 34 , 38 ]. The use of respiratory swabs (nasal or pharyngeal) or blood specimens to assess the presence of SARS-CoV-2 nucleic acid using RT-PCR assays was the most commonly used diagnostic method mentioned in the included studies. These diagnostic tests have been widely used, but their precise sensitivity and specificity remain unknown. One review included a Chinese study with clinical diagnosis with no confirmation of SARS-CoV-2 infection (patients were diagnosed with COVID-19 if they presented with at least two symptoms suggestive of COVID-19, together with laboratory and chest radiography abnormalities) [ 34 ].

Therapeutic possibilities

Pharmacological and non-pharmacological interventions (supportive therapies) used in treating patients with COVID-19 were reported in five reviews [ 25 , 27 , 34 , 35 , 38 ]. Antivirals used empirically for COVID-19 treatment were reported in seven reviews [ 25 , 27 , 34 , 35 , 37 , 38 , 41 ]; the most commonly used were protease inhibitors (lopinavir, ritonavir, darunavir), a nucleoside reverse transcriptase inhibitor (tenofovir), nucleotide analogs (remdesivir, galidesivir, ganciclovir), and a neuraminidase inhibitor (oseltamivir). Umifenovir, a membrane fusion inhibitor, was investigated in two studies [ 25 , 35 ]. Possible supportive interventions analyzed were different types of oxygen supplementation and breathing support (invasive or non-invasive ventilation) [ 25 ]. The use of antibiotics, both empirically and to treat secondary pneumonia, was reported in six studies [ 25 , 26 , 27 , 34 , 35 , 38 ]. One review specifically assessed evidence on the efficacy and safety of the antimalarial drug chloroquine [ 27 ]. It identified 23 ongoing trials investigating the potential of chloroquine as a therapeutic option for COVID-19, but no verifiable clinical outcomes data. The use of mesenchymal stem cells, antifungals, and glucocorticoids was described in four reviews [ 25 , 34 , 35 , 38 ].

Laboratory and radiological findings

Of the 18 reviews included in this overview, eight analyzed laboratory parameters in patients with COVID-19 [ 25 , 29 , 30 , 32 , 33 , 34 , 35 , 39 ]; elevated C-reactive protein levels, associated with lymphocytopenia, elevated lactate dehydrogenase, as well as slightly elevated aspartate and alanine aminotransferase (AST, ALT) were commonly described in those eight reviews. Lippi et al. assessed cardiac troponin I (cTnI) [ 30 ], procalcitonin [ 32 ], and platelet count [ 33 ] in COVID-19 patients. Elevated levels of procalcitonin [ 32 ] and cTnI [ 30 ] were more likely to be associated with a severe disease course (requiring intensive care unit admission and intubation). Furthermore, thrombocytopenia was frequently observed in patients with complicated COVID-19 infections [ 33 ].

Chest imaging (chest radiography and/or computed tomography) features were assessed in six reviews, all of which described a frequent pattern of local or bilateral multilobar ground-glass opacity [ 25 , 34 , 35 , 39 , 40 , 41 ]. Those six reviews showed that septal thickening, bronchiectasis, pleural and cardiac effusions, halo signs, and pneumothorax were observed in patients suffering from COVID-19.

Quality of evidence in individual systematic reviews

Table 3 shows the detailed results of the quality assessment of 18 systematic reviews, including the assessment of individual items and summary assessment. A detailed explanation for each decision in each review is available in Additional file 5 .

Using AMSTAR 2 criteria, confidence in the results of all 18 reviews was rated as “critically low” (Table 3 ). Common methodological drawbacks were: omission of prospective protocol submission or publication; use of an inappropriate search strategy; lack of independent and dual literature screening and data extraction (or methodology unclear); absence of an explanation for heterogeneity among the studies included; and lack of reasons for study exclusion (or rationale unclear).

Risk of bias assessment, based on a reported methodological tool, and quality of evidence appraisal, in line with the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) method, were reported only in one review [ 25 ]. Five reviews presented a table summarizing bias, using various risk of bias tools [ 25 , 29 , 39 , 40 , 41 ]. One review analyzed “study quality” [ 37 ]. One review mentioned the risk of bias assessment in the methodology but did not provide any related analysis [ 28 ].

Discussion

This overview of systematic reviews analyzed the first 18 systematic reviews published after the onset of the COVID-19 pandemic, up to March 24, 2020, with primary studies involving more than 60,000 patients. Using AMSTAR 2, we judged that our confidence in all those reviews was “critically low”. Ten reviews included meta-analyses. The reviews presented data on clinical manifestations, laboratory and radiological findings, and interventions. We found no systematic reviews on the utility of diagnostic tests.

Symptoms were reported in seven reviews; most of the patients had a fever, cough, dyspnea, myalgia or muscle fatigue, and gastrointestinal disorders such as diarrhea, nausea, or vomiting. Olfactory dysfunction (anosmia or dysosmia) has been described in patients with COVID-19 [ 43 ]; however, this was not reported in any of the reviews included in this overview. During the SARS outbreak in 2002, there were reports of impairment of the sense of smell associated with the disease [ 44 , 45 ].

The reported mortality rates ranged from 0.3 to 13.9% in the included reviews. Mortality estimates are influenced by the transmissibility rate (basic reproduction number), availability of diagnostic tools, notification policies, asymptomatic presentations of the disease, resources for disease prevention and control, and treatment facilities; variability in the mortality rate fits the pattern of emerging infectious diseases [ 46 ]. Furthermore, the reported cases did not account for asymptomatic cases or mild cases in which individuals did not seek medical treatment, nor for the fact that many countries had limited access to diagnostic tests or implemented testing policies later than others. Considering the lack of reviews assessing diagnostic testing (sensitivity, specificity, and predictive values of RT-PCR or immunoglobulin tests), and the preponderance of studies that assessed only symptomatic individuals, considerable imprecision around the calculated mortality rates existed in the early stage of the COVID-19 pandemic.

Few reviews included treatment data. Those reviews described studies considered to be at a very low level of evidence: usually small, retrospective studies with very heterogeneous populations. Seven reviews analyzed laboratory parameters; those reviews could have been useful for clinicians who attend patients suspected of COVID-19 in emergency services worldwide, such as assessing which patients need to be reassessed more frequently.

All systematic reviews scored poorly on the AMSTAR 2 critical appraisal tool for systematic reviews. Most of the original studies included in the reviews were case series and case reports, impacting the quality of evidence. Such evidence has major implications for clinical practice and the use of these reviews in evidence-based practice and policy. Clinicians, patients, and policymakers can only have the highest confidence in systematic review findings if high-quality systematic review methodologies are employed. The urgent need for information during a pandemic does not justify poor quality reporting.

We acknowledge that there are numerous challenges associated with analyzing COVID-19 data during a pandemic [ 47 ]. High-quality evidence syntheses are needed for decision-making, but each type of evidence synthesis is associated with its inherent challenges.

The creation of classic systematic reviews requires considerable time and effort; with massive research output, they quickly become outdated, and preparing updated versions also requires considerable time. A recent study showed that updates of non-Cochrane systematic reviews are published a median of 5 years after the publication of the previous version [ 48 ].

Authors may register a review and then abandon it [ 49 ], but the existence of a public record that is not updated may lead other authors to believe that the review is still ongoing. A quarter of Cochrane review protocols remain unpublished as completed systematic reviews 8 years after protocol publication [ 50 ].

Rapid reviews can be used to summarize the evidence, but they involve methodological sacrifices and simplifications to produce information promptly, with inconsistent methodological approaches [ 51 ]. However, rapid reviews are justified in times of public health emergencies, and even Cochrane has resorted to publishing rapid reviews in response to the COVID-19 crisis [ 52 ]. Rapid reviews were eligible for inclusion in this overview, but only one of the 18 reviews included in this study was labeled as a rapid review.

Ideally, COVID-19 evidence would be continually summarized in a series of high-quality living systematic reviews, types of evidence synthesis defined as “ a systematic review which is continually updated, incorporating relevant new evidence as it becomes available ” [ 53 ]. However, conducting living systematic reviews requires considerable resources, calling into question the sustainability of such evidence synthesis over long periods [ 54 ].

Research reports about COVID-19 will contribute to research waste if they are poorly designed, poorly reported, or simply not necessary. In principle, systematic reviews should help reduce research waste as they usually provide recommendations for further research that is needed or may advise that sufficient evidence exists on a particular topic [ 55 ]. However, systematic reviews can also contribute to growing research waste when they are not needed, or poorly conducted and reported. Our present study clearly shows that most of the systematic reviews that were published early on in the COVID-19 pandemic could be categorized as research waste, as our confidence in their results is critically low.

Our study has some limitations. One is that for AMSTAR 2 assessment we relied on information available in publications; we did not attempt to contact study authors for clarifications or additional data. In three reviews, the methodological quality appraisal was challenging because they were published as letters, or labeled as rapid communications. As a result, various details about their review process were not included, leading to AMSTAR 2 questions being answered as “not reported”, resulting in low confidence scores. Full manuscripts might have provided additional information that could have led to higher confidence in the results. In other words, low scores could reflect incomplete reporting, not necessarily low-quality review methods. To make their review available more rapidly and more concisely, the authors may have omitted methodological details. A general issue during a crisis is that speed and completeness must be balanced. However, maintaining high standards requires proper resourcing and commitment to ensure that the users of systematic reviews can have high confidence in the results.

Furthermore, we used adjusted AMSTAR 2 scoring, as the tool was designed for critical appraisal of reviews of interventions. Some reviews may have received lower scores than actually warranted in spite of these adjustments.

Another limitation of our study may be the inclusion of multiple overlapping reviews, as some of the included reviews covered the same primary studies. According to the Cochrane Handbook, including overlapping reviews may be appropriate when the review’s aim is “to present and describe the current body of systematic review evidence on a topic” [12], which was our aim. To avoid bias in summarizing evidence from overlapping reviews, we presented the forest plots without summary estimates. The forest plots serve to inform readers about the effect sizes for outcomes that were reported in each review.

Several authors of this study contributed to one of the reviews identified [25]. To reduce the risk of bias, two authors who did not co-author the review in question initially assessed its quality and limitations.

Finally, we note that the systematic reviews included in our overview may have had issues that our analysis did not identify because we did not analyze their primary studies to verify the accuracy of the data and information they presented. We give two examples to substantiate this possibility. Lovato et al. wrote a commentary on the review of Sun et al. [41], in which they criticized the authors’ conclusion that sore throat is rare in COVID-19 patients [56]. Lovato et al. highlighted that multiple studies included in Sun et al. did not accurately describe participants’ clinical presentations, warning that only three studies clearly reported data on sore throat [56].

In another example, Leung [57] warned about the review of Li, L.Q. et al. [29]: “it is possible that this statistic was computed using overlapped samples, therefore some patients were double counted”. Li et al. responded to Leung that it was uncertain whether the data overlapped, as they had used data from published articles and did not have access to the original data. They also reported that they had requested the original data and planned to redo their analyses once they received them, and they urged readers to treat the data with caution [58]. This points to the evolving nature of evidence during a crisis.

Our study’s strength is that this overview adds to current knowledge by providing a comprehensive summary of all the evidence syntheses about COVID-19 available early after the onset of the pandemic. This overview followed strict methodological criteria, including a comprehensive and sensitive search strategy and a standard tool for methodological appraisal of systematic reviews.

In conclusion, in this overview of systematic reviews, we analyzed evidence from the first 18 systematic reviews that were published after the emergence of COVID-19. However, confidence in the results of all the reviews was “critically low”. Thus, systematic reviews that were published early on in the pandemic could be categorized as research waste. Even during public health emergencies, studies and systematic reviews should adhere to established methodological standards to provide patients, clinicians, and decision-makers with trustworthy evidence.

Availability of data and materials

All data collected and analyzed within this study are available from the corresponding author on reasonable request.

References

World Health Organization. Timeline - COVID-19. Available at: https://www.who.int/news/item/29-06-2020-covidtimeline . Accessed 1 June 2021.

COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). Available at: https://coronavirus.jhu.edu/map.html . Accessed 1 June 2021.

Anzai A, Kobayashi T, Linton NM, Kinoshita R, Hayashi K, Suzuki A, et al. Assessing the Impact of Reduced Travel on Exportation Dynamics of Novel Coronavirus Infection (COVID-19). J Clin Med. 2020;9(2):601.

Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, Merler S, et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020;368(6489):395–400. https://doi.org/10.1126/science.aba9757 .

Fidahic M, Nujic D, Runjic R, Civljak M, Markotic F, Lovric Makaric Z, et al. Research methodology and characteristics of journal articles with original data, preprint articles and registered clinical trial protocols about COVID-19. BMC Med Res Methodol. 2020;20(1):161. https://doi.org/10.1186/s12874-020-01047-2 .

EPPI Centre . COVID-19: a living systematic map of the evidence. Available at: http://eppi.ioe.ac.uk/cms/Projects/DepartmentofHealthandSocialCare/Publishedreviews/COVID-19Livingsystematicmapoftheevidence/tabid/3765/Default.aspx . Accessed 1 June 2021.

NCBI SARS-CoV-2 Resources. Available at: https://www.ncbi.nlm.nih.gov/sars-cov-2/ . Accessed 1 June 2021.

Gustot T. Quality and reproducibility during the COVID-19 pandemic. JHEP Rep. 2020;2(4):100141. https://doi.org/10.1016/j.jhepr.2020.100141 .

Kodvanj I, et al. Publishing of COVID-19 preprints in peer-reviewed journals, preprinting trends, public discussion and quality issues. Preprint. bioRxiv. 2020. https://doi.org/10.1101/2020.11.23.394577 .

Dobler CC. Poor quality research and clinical practice during COVID-19. Breathe (Sheff). 2020;16(2):200112. https://doi.org/10.1183/20734735.0112-2020 .

Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. 2010;7(9):e1000326. https://doi.org/10.1371/journal.pmed.1000326 .

Lunny C, Brennan SE, McDonald S, McKenzie JE. Toward a comprehensive evidence map of overview of systematic review methods: paper 1-purpose, eligibility, search and data extraction. Syst Rev. 2017;6(1):231. https://doi.org/10.1186/s13643-017-0617-1 .

Pollock M, Fernandes RM, Becker LA, Pieper D, Hartling L. Chapter V: Overviews of Reviews. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.1 (updated September 2020). Cochrane. 2020. Available from www.training.cochrane.org/handbook .

Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane handbook for systematic reviews of interventions version 6.1 (updated September 2020). Cochrane. 2020; Available from www.training.cochrane.org/handbook .

Pollock M, Fernandes RM, Newton AS, Scott SD, Hartling L. The impact of different inclusion decisions on the comprehensiveness and complexity of overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):18. https://doi.org/10.1186/s13643-018-0914-3 .

Pollock M, Fernandes RM, Newton AS, Scott SD, Hartling L. A decision tool to help researchers make decisions about including systematic reviews in overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):29. https://doi.org/10.1186/s13643-018-0768-8 .

Hunt H, Pollock A, Campbell P, Estcourt L, Brunton G. An introduction to overviews of reviews: planning a relevant research question and objective for an overview. Syst Rev. 2018;7(1):39. https://doi.org/10.1186/s13643-018-0695-8 .

Pollock M, Fernandes RM, Pieper D, Tricco AC, Gates M, Gates A, et al. Preferred reporting items for overviews of reviews (PRIOR): a protocol for development of a reporting guideline for overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):335. https://doi.org/10.1186/s13643-019-1252-9 .

Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Open Med. 2009;3(3):e123–30.

Krnic Martinic M, Pieper D, Glatt A, Puljak L. Definition of a systematic review used in overviews of systematic reviews, meta-epidemiological studies and textbooks. BMC Med Res Methodol. 2019;19(1):203. https://doi.org/10.1186/s12874-019-0855-0 .

Puljak L. If there is only one author or only one database was searched, a study should not be called a systematic review. J Clin Epidemiol. 2017;91:4–5. https://doi.org/10.1016/j.jclinepi.2017.08.002 .

Gates M, Gates A, Guitard S, Pollock M, Hartling L. Guidance for overviews of reviews continues to accumulate, but important challenges remain: a scoping review. Syst Rev. 2020;9(1):254. https://doi.org/10.1186/s13643-020-01509-0 .

Covidence - systematic review software. Available at: https://www.covidence.org/ . Accessed 1 June 2021.

Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.

Borges do Nascimento IJ, et al. Novel Coronavirus Infection (COVID-19) in Humans: A Scoping Review and Meta-Analysis. J Clin Med. 2020;9(4):941.

Adhikari SP, Meng S, Wu YJ, Mao YP, Ye RX, Wang QZ, et al. Epidemiology, causes, clinical manifestation and diagnosis, prevention and control of coronavirus disease (COVID-19) during the early outbreak period: a scoping review. Infect Dis Poverty. 2020;9(1):29. https://doi.org/10.1186/s40249-020-00646-x .

Cortegiani A, Ingoglia G, Ippolito M, Giarratano A, Einav S. A systematic review on the efficacy and safety of chloroquine for the treatment of COVID-19. J Crit Care. 2020;57:279–83. https://doi.org/10.1016/j.jcrc.2020.03.005 .

Li B, Yang J, Zhao F, Zhi L, Wang X, Liu L, et al. Prevalence and impact of cardiovascular metabolic diseases on COVID-19 in China. Clin Res Cardiol. 2020;109(5):531–8. https://doi.org/10.1007/s00392-020-01626-9 .

Li LQ, Huang T, Wang YQ, Wang ZP, Liang Y, Huang TB, et al. COVID-19 patients’ clinical characteristics, discharge rate, and fatality rate of meta-analysis. J Med Virol. 2020;92(6):577–83. https://doi.org/10.1002/jmv.25757 .

Lippi G, Lavie CJ, Sanchis-Gomar F. Cardiac troponin I in patients with coronavirus disease 2019 (COVID-19): evidence from a meta-analysis. Prog Cardiovasc Dis. 2020;63(3):390–1. https://doi.org/10.1016/j.pcad.2020.03.001 .

Lippi G, Henry BM. Active smoking is not associated with severity of coronavirus disease 2019 (COVID-19). Eur J Intern Med. 2020;75:107–8. https://doi.org/10.1016/j.ejim.2020.03.014 .

Lippi G, Plebani M. Procalcitonin in patients with severe coronavirus disease 2019 (COVID-19): a meta-analysis. Clin Chim Acta. 2020;505:190–1. https://doi.org/10.1016/j.cca.2020.03.004 .

Lippi G, Plebani M, Henry BM. Thrombocytopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: a meta-analysis. Clin Chim Acta. 2020;506:145–8. https://doi.org/10.1016/j.cca.2020.03.022 .

Ludvigsson JF. Systematic review of COVID-19 in children shows milder cases and a better prognosis than adults. Acta Paediatr. 2020;109(6):1088–95. https://doi.org/10.1111/apa.15270 .

Lupia T, Scabini S, Mornese Pinna S, di Perri G, de Rosa FG, Corcione S. 2019 novel coronavirus (2019-nCoV) outbreak: a new challenge. J Glob Antimicrob Resist. 2020;21:22–7. https://doi.org/10.1016/j.jgar.2020.02.021 .

Marasinghe KM. A systematic review investigating the effectiveness of face mask use in limiting the spread of COVID-19 among medically not diagnosed individuals: shedding light on current recommendations provided to individuals not medically diagnosed with COVID-19. Preprint. Research Square. 2020. https://doi.org/10.21203/rs.3.rs-16701/v1 .

Mullins E, Evans D, Viner RM, O’Brien P, Morris E. Coronavirus in pregnancy and delivery: rapid review. Ultrasound Obstet Gynecol. 2020;55(5):586–92. https://doi.org/10.1002/uog.22014 .

Pang J, Wang MX, Ang IYH, Tan SHX, Lewis RF, Chen JIP, et al. Potential Rapid Diagnostics, Vaccine and Therapeutics for 2019 Novel coronavirus (2019-nCoV): a systematic review. J Clin Med. 2020;9(3):623.

Rodriguez-Morales AJ, Cardona-Ospina JA, Gutiérrez-Ocampo E, Villamizar-Peña R, Holguin-Rivera Y, Escalera-Antezana JP, et al. Clinical, laboratory and imaging features of COVID-19: a systematic review and meta-analysis. Travel Med Infect Dis. 2020;34:101623. https://doi.org/10.1016/j.tmaid.2020.101623 .

Salehi S, Abedi A, Balakrishnan S, Gholamrezanezhad A. Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients. AJR Am J Roentgenol. 2020;215(1):87–93. https://doi.org/10.2214/AJR.20.23034 .

Sun P, Qie S, Liu Z, Ren J, Li K, Xi J. Clinical characteristics of hospitalized patients with SARS-CoV-2 infection: a single arm meta-analysis. J Med Virol. 2020;92(6):612–7. https://doi.org/10.1002/jmv.25735 .

Yang J, Zheng Y, Gou X, Pu K, Chen Z, Guo Q, et al. Prevalence of comorbidities and its effects in patients infected with SARS-CoV-2: a systematic review and meta-analysis. Int J Infect Dis. 2020;94:91–5. https://doi.org/10.1016/j.ijid.2020.03.017 .

Bassetti M, Vena A, Giacobbe DR. The novel Chinese coronavirus (2019-nCoV) infections: challenges for fighting the storm. Eur J Clin Investig. 2020;50(3):e13209. https://doi.org/10.1111/eci.13209 .

Hwang CS. Olfactory neuropathy in severe acute respiratory syndrome: report of a case. Acta Neurol Taiwanica. 2006;15(1):26–8.

Suzuki M, Saito K, Min WP, Vladau C, Toida K, Itoh H, et al. Identification of viruses in patients with postviral olfactory dysfunction. Laryngoscope. 2007;117(2):272–7. https://doi.org/10.1097/01.mlg.0000249922.37381.1e .

Rajgor DD, Lee MH, Archuleta S, Bagdasarian N, Quek SC. The many estimates of the COVID-19 case fatality rate. Lancet Infect Dis. 2020;20(7):776–7. https://doi.org/10.1016/S1473-3099(20)30244-9 .

Wolkewitz M, Puljak L. Methodological challenges of analysing COVID-19 data during the pandemic. BMC Med Res Methodol. 2020;20(1):81. https://doi.org/10.1186/s12874-020-00972-6 .

Rombey T, Lochner V, Puljak L, Könsgen N, Mathes T, Pieper D. Epidemiology and reporting characteristics of non-Cochrane updates of systematic reviews: a cross-sectional study. Res Synth Methods. 2020;11(3):471–83. https://doi.org/10.1002/jrsm.1409 .

Runjic E, Rombey T, Pieper D, Puljak L. Half of systematic reviews about pain registered in PROSPERO were not published and the majority had inaccurate status. J Clin Epidemiol. 2019;116:114–21. https://doi.org/10.1016/j.jclinepi.2019.08.010 .

Runjic E, Behmen D, Pieper D, Mathes T, Tricco AC, Moher D, et al. Following Cochrane review protocols to completion 10 years later: a retrospective cohort study and author survey. J Clin Epidemiol. 2019;111:41–8. https://doi.org/10.1016/j.jclinepi.2019.03.006 .

Tricco AC, Antony J, Zarin W, Strifler L, Ghassemi M, Ivory J, et al. A scoping review of rapid review methods. BMC Med. 2015;13(1):224. https://doi.org/10.1186/s12916-015-0465-6 .

COVID-19 Rapid Reviews: Cochrane’s response so far. Available at: https://training.cochrane.org/resource/covid-19-rapid-reviews-cochrane-response-so-far . Accessed 1 June 2021.

Cochrane. Living systematic reviews. Available at: https://community.cochrane.org/review-production/production-resources/living-systematic-reviews . Accessed 1 June 2021.

Millard T, Synnot A, Elliott J, Green S, McDonald S, Turner T. Feasibility and acceptability of living systematic reviews: results from a mixed-methods evaluation. Syst Rev. 2019;8(1):325. https://doi.org/10.1186/s13643-019-1248-5 .

Babic A, Poklepovic Pericic T, Pieper D, Puljak L. How to decide whether a systematic review is stable and not in need of updating: analysis of Cochrane reviews. Res Synth Methods. 2020;11(6):884–90. https://doi.org/10.1002/jrsm.1451 .

Lovato A, Rossettini G, de Filippis C. Sore throat in COVID-19: comment on “clinical characteristics of hospitalized patients with SARS-CoV-2 infection: a single arm meta-analysis”. J Med Virol. 2020;92(7):714–5. https://doi.org/10.1002/jmv.25815 .

Leung C. Comment on Li et al: COVID-19 patients’ clinical characteristics, discharge rate, and fatality rate of meta-analysis. J Med Virol. 2020;92(9):1431–2. https://doi.org/10.1002/jmv.25912 .

Li LQ, Huang T, Wang YQ, Wang ZP, Liang Y, Huang TB, et al. Response to Char’s comment: comment on Li et al: COVID-19 patients’ clinical characteristics, discharge rate, and fatality rate of meta-analysis. J Med Virol. 2020;92(9):1433. https://doi.org/10.1002/jmv.25924 .

Acknowledgments

We thank Catherine Henderson DPhil from Swanscoe Communications for pro bono medical writing and editing support. We acknowledge support from the Covidence Team, specifically Anneliese Arno. We thank the whole International Network of Coronavirus Disease 2019 (InterNetCOVID-19) for their commitment and involvement. Members of the InterNetCOVID-19 are listed in Additional file 6 . We thank Pavel Cerny and Roger Crosthwaite for guiding the team supervisor (IJBN) on human resources management.

This research received no external funding.

Author information

Authors and affiliations.

University Hospital and School of Medicine, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil

Israel Júnior Borges do Nascimento & Milena Soriano Marcolino

Medical College of Wisconsin, Milwaukee, WI, USA

Israel Júnior Borges do Nascimento

Helene Fuld Health Trust National Institute for Evidence-based Practice in Nursing and Healthcare, College of Nursing, The Ohio State University, Columbus, OH, USA

Dónal P. O’Mathúna

School of Nursing, Psychotherapy and Community Health, Dublin City University, Dublin, Ireland

Department of Anesthesiology, Intensive Care and Pain Medicine, University of Münster, Münster, Germany

Thilo Caspar von Groote

Department of Sport and Health Science, Technische Universität München, Munich, Germany

Hebatullah Mohamed Abdulazeem

School of Health Sciences, Faculty of Health and Medicine, The University of Newcastle, Callaghan, Australia

Ishanka Weerasekara

Department of Physiotherapy, Faculty of Allied Health Sciences, University of Peradeniya, Peradeniya, Sri Lanka

Cochrane Croatia, University of Split, School of Medicine, Split, Croatia

Ana Marusic, Irena Zakarija-Grkovic & Tina Poklepovic Pericic

Center for Evidence-Based Medicine and Health Care, Catholic University of Croatia, Ilica 242, 10000, Zagreb, Croatia

Livia Puljak

Cochrane Brazil, Evidence-Based Health Program, Universidade Federal de São Paulo, São Paulo, Brazil

Vinicius Tassoni Civile & Alvaro Nagib Atallah

Yorkville University, Fredericton, New Brunswick, Canada

Santino Filoso

Laboratory for Industrial and Applied Mathematics (LIAM), Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada

Nicola Luigi Bragazzi

Contributions

IJBN conceived the research idea and worked as a project coordinator. DPOM, TCVG, HMA, IW, AM, LP, VTC, IZG, TPP, ANA, SF, NLB and MSM were involved in data curation, formal analysis, investigation, methodology, and initial draft writing. All authors revised the manuscript critically for the content. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Livia Puljak .

Ethics declarations

Ethics approval and consent to participate.

Not required as data was based on published studies.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Appendix 1.

Search strategies used in the study.

Additional file 2: Appendix 2.

Adjusted scoring of AMSTAR 2 used in this study for systematic reviews of studies that did not analyze interventions.

Additional file 3: Appendix 3.

List of excluded studies, with reasons.

Additional file 4: Appendix 4.

Table of overlapping studies, containing the list of primary studies included, their visual overlap in individual systematic reviews, and the number of reviews in which each primary study was included.

Additional file 5: Appendix 5.

A detailed explanation of AMSTAR scoring for each item in each review.

Additional file 6: Appendix 6.

List of members and affiliates of International Network of Coronavirus Disease 2019 (InterNetCOVID-19).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Borges do Nascimento, I.J., O’Mathúna, D.P., von Groote, T.C. et al. Coronavirus disease (COVID-19) pandemic: an overview of systematic reviews. BMC Infect Dis 21 , 525 (2021). https://doi.org/10.1186/s12879-021-06214-4

Received : 12 April 2020

Accepted : 19 May 2021

Published : 04 June 2021

DOI : https://doi.org/10.1186/s12879-021-06214-4

  • Coronavirus
  • Evidence-based medicine
  • Infectious diseases

BMC Infectious Diseases

ISSN: 1471-2334

Open Access

Peer-reviewed

Research Article

Data-based analysis, modelling and forecasting of the COVID-19 outbreak

Roles Conceptualization, Data curation, Writing – original draft, Writing – review & editing

* E-mail: [email protected] (CS); [email protected] (CA)

Affiliation Department of Microbiology, Medical School, University of Athens, Athens, Greece

Roles Formal analysis, Methodology

Affiliation Consiglio Nazionale delle Ricerche, Science and Technology for Energy and Sustainable Mobility, Napoli, Italy

Roles Investigation, Writing – review & editing

Roles Conceptualization, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

Affiliation Dipartimento di Matematica e Applicazioni “Renato Caccioppoli”, Università degli Studi di Napoli Federico II, Napoli, Italy

  • Cleo Anastassopoulou, 
  • Lucia Russo, 
  • Athanasios Tsakris, 
  • Constantinos Siettos

PLOS

  • Published: March 31, 2020
  • https://doi.org/10.1371/journal.pone.0230405

Since the first suspected case of coronavirus disease-2019 (COVID-19) on December 1st, 2019, in Wuhan, Hubei Province, China, a total of 40,235 confirmed cases and 909 deaths have been reported in China up to February 10, 2020, evoking fear locally and internationally. Here, based on the publicly available epidemiological data for Hubei, China from January 11 to February 10, 2020, we provide estimates of the main epidemiological parameters. In particular, we provide an estimation of the case fatality and case recovery ratios, along with their 90% confidence intervals as the outbreak evolves. On the basis of a Susceptible-Infectious-Recovered-Dead (SIRD) model, we provide estimations of the basic reproduction number (R0), and the per day infection mortality and recovery rates. By calibrating the parameters of the SIRD model to the reported data, we also attempt to forecast the evolution of the outbreak at the epicenter three weeks ahead, i.e. until February 29. As the number of infected individuals, especially of those with asymptomatic or mild courses, is suspected to be much higher than the official numbers, which can be considered only as a subset of the actual numbers of infected and recovered cases in the total population, we have repeated the calculations under a second scenario that considers twenty times the number of confirmed infected cases and forty times the number of recovered, leaving the number of deaths unchanged. Based on the reported data, the expected value of R0 as computed considering the period from the 11th of January until the 18th of January, using the official counts of confirmed cases, was found to be ∼4.6, while the one computed under the second scenario was found to be ∼3.2. Thus, based on the SIRD simulations, the estimated average value of R0 was found to be ∼2.6 based on confirmed cases and ∼2 based on the second scenario. Our forecasting flashes a note of caution for the presently unfolding outbreak in China. Based on the official counts for confirmed cases, the simulations suggest that the cumulative number of infected could reach 180,000 (with a lower bound of 45,000) by February 29. Regarding the number of deaths, simulations forecast that, on the basis of the data reported up to the 10th of February, the death toll might exceed 2,700 (as a lower bound) by February 29. Our analysis further reveals a significant decline of the case fatality ratio from January 26, to which various factors may have contributed, such as the severe control measures taken in Hubei, China (e.g. quarantine and hospitalization of infected individuals), but which is mainly due to the fact that the actual cumulative numbers of infected and recovered cases in the population most likely are much higher than the reported ones. Thus, in a scenario where we have taken twenty times the confirmed number of infected and forty times the confirmed number of recovered cases, the case fatality ratio is ∼0.15% in the total population. Importantly, based on this scenario, simulations suggest a slowdown of the outbreak in Hubei at the end of February.

Citation: Anastassopoulou C, Russo L, Tsakris A, Siettos C (2020) Data-based analysis, modelling and forecasting of the COVID-19 outbreak. PLoS ONE 15(3): e0230405. https://doi.org/10.1371/journal.pone.0230405

Editor: Sreekumar Othumpangat, Center for Disease control and Prevention, UNITED STATES

Received: February 11, 2020; Accepted: March 1, 2020; Published: March 31, 2020

Copyright: © 2020 Anastassopoulou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data used in this paper were acquired from https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6 . In S1 Table we provide the data that we have used for this study, i.e. the cumulative confirmed cases of infected, recovered and deaths from January 11 to February 10.

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

An outbreak of “pneumonia of unknown etiology” in Wuhan, Hubei Province, China in early December 2019 has spiraled into an epidemic that is ravaging China and threatening to reach a pandemic state [1]. The causative agent soon proved to be a new betacoronavirus related to the Middle East Respiratory Syndrome virus (MERS-CoV) and the Severe Acute Respiratory Syndrome virus (SARS-CoV). The disease caused by the novel coronavirus SARS-CoV-2 has been named “COVID-19” by the World Health Organization (WHO) and on January 30, the COVID-19 outbreak was declared to constitute a Public Health Emergency of International Concern by the WHO Director-General [2]. Despite the lockdown of Wuhan and the suspension of all public transport, flights and trains on January 23, a total of 40,235 confirmed cases, including 6,484 (16.1%) with severe illness, and 909 deaths (2.2%) had been reported in China by the National Health Commission up to February 10, 2020; meanwhile, 319 cases and one death were reported outside of China, in 24 countries [3].

The origin of COVID-19 has not yet been determined, although preliminary investigations are suggestive of a zoonotic, possibly bat, origin [4, 5]. Similarly to SARS-CoV and MERS-CoV, the novel virus is transmitted from person to person principally by respiratory droplets, causing such symptoms as fever, cough, and shortness of breath after a period believed to range from 2 to 14 days following infection, according to the Centers for Disease Control and Prevention (CDC) [1, 6, 7]. Preliminary data suggest that older males with comorbidities may be at higher risk for severe illness from COVID-19 [6, 8, 9]. However, the precise virologic and epidemiologic characteristics, including transmissibility and mortality, of this third zoonotic human coronavirus are still unknown.

Using the serial intervals (SI) of the two other well-known coronavirus diseases, MERS and SARS, as approximations for the true unknown SI, Zhao et al. estimated the mean basic reproduction number (R0) of SARS-CoV-2 to range between 2.24 (95% CI: 1.96–2.55) and 3.58 (95% CI: 2.89–4.39) in the early phase of the outbreak [10]. Very similar estimates were obtained for R0 at the early stages of the epidemic by Imai et al., 2.6 (95% CI: 1.5–3.5) [11], as well as by Li et al., 2.2 (95% CI: 1.4–3.9), who also reported a doubling in size every 7.4 days [1]. Wu et al. estimated the R0 at 2.68 (95% CI: 2.47–2.86) with a doubling time of 6.4 days (95% CI: 5.8–7.1) and the epidemic growing exponentially in multiple major Chinese cities with a lag time behind the Wuhan outbreak of about 1–2 weeks [12].

Amidst such an important ongoing public health crisis that also has severe economic repercussions, we resorted to mathematical modelling, which can shed light on essential epidemiologic parameters that determine the fate of the epidemic [13]. Here, we present the results of the analysis of time series of epidemiological data available in the public domain [14–16] (WHO, CDC, ECDC, NHC and DXY) from January 11 to February 10, 2020, and attempt a three-week forecast of the spreading dynamics of the emerging coronavirus epidemic in the epicenter in mainland China.

Methodology

The basic reproduction number (R0) is one of the key values that can predict whether the infectious disease will spread in a population or die out. R0 represents the average number of secondary cases that result from the introduction of a single infectious case in a totally susceptible population during the infectiousness period. Based on the reported data of confirmed cases, we provide estimations of R0 from the 16th up to the 20th of January in order to satisfy as much as possible the hypothesis of S ≈ N, which is a necessary condition for the computation of R0.

Furthermore, we calibrated the parameters of the SIRD model to fit the reported data. We first provide a coarse estimation of the recovery (β) and mortality (γ) rates of the SIRD model using the first period of the outbreak. Then, an estimation of the infection rate α is accomplished by “wrapping” an optimization algorithm around the SIRD simulator to fit the reported data from the 11th of January to the 10th of February. We started our simulations with one infected person on the 16th of November, which has been suggested as a starting date of the epidemic, and ran the SIRD model until the 10th of February. Below, we describe our approach analytically.
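
A discrete-time SIRD system consistent with the quantities named in this section (population size N, infection rate α, recovery rate β, mortality rate γ) can be sketched as follows. This is the standard textbook form and should be read as an assumption, not as a transcription of the authors' own equations:

```latex
\begin{aligned}
S(t) &= S(t-1) - \frac{\alpha}{N}\, S(t-1)\, I(t-1), \\
I(t) &= I(t-1) + \frac{\alpha}{N}\, S(t-1)\, I(t-1) - \beta\, I(t-1) - \gamma\, I(t-1), \\
R(t) &= R(t-1) + \beta\, I(t-1), \\
D(t) &= D(t-1) + \gamma\, I(t-1), \\
R_0 &\approx \frac{\alpha}{\beta + \gamma} \quad \text{while } S \approx N .
\end{aligned}
```

In this form the total S + I + R + D stays equal to N at every step, which is a convenient sanity check for any implementation.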

The above system is defined in discrete time points t = 1, 2, …, with the corresponding initial condition at the very start of the epidemic: S (0) = N − 1, I (0) = 1, R (0) = D (0) = 0. Here, β and γ denote the “effective/apparent” per day recovery and fatality rates. Note that these parameters do not correspond to the actual per day recovery and mortality rates as the new cases of recovered and deaths come from infected cases several days back in time. However, one can attempt to provide some coarse estimations of the “effective/apparent” values of these epidemiological parameters based on the reported confirmed cases using an assumption and approach described in the next section.
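The model equations themselves are not recoverable from this text, so the following is only a minimal sketch of a standard discrete-time SIRD update that is consistent with the initial condition and the parameter names above. The function name `simulate_sird`, the population size `n_pop` and the number of steps are illustrative assumptions; the rate values are the Scenario I estimates quoted in the figure captions.

```python
def simulate_sird(alpha, beta, gamma, n_pop, days, i0=1.0):
    """Iterate an assumed discrete-time SIRD model for `days` daily steps.

    alpha: infection rate, beta: "effective" recovery rate,
    gamma: "effective" fatality rate (all per day).
    Returns a list of (S, I, R, D) tuples, one per day, starting at t = 0.
    """
    # Initial condition: S(0) = N - 1, I(0) = 1, R(0) = D(0) = 0
    s, i, r, d = n_pop - i0, i0, 0.0, 0.0
    history = [(s, i, r, d)]
    for _ in range(days):
        new_infections = alpha * s * i / n_pop
        new_recoveries = beta * i
        new_deaths = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries - new_deaths
        r += new_recoveries
        d += new_deaths
        history.append((s, i, r, d))
    return history


# Assumed population of about 59 million; 86 daily steps span
# 16 November 2019 to 10 February 2020.
trajectory = simulate_sird(alpha=0.191, beta=0.064, gamma=0.01,
                           n_pop=59_000_000, days=86)
s_end, i_end, r_end, d_end = trajectory[-1]
print(f"I = {i_end:,.0f}, R = {r_end:,.0f}, D = {d_end:,.0f}")
```

Because every person leaving one compartment enters another, the four values sum to `n_pop` at every step, which is a convenient sanity check on any implementation of this kind.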

Estimation of the basic reproduction number from the SIRD model

Let us also denote by Δ X ( t ) = [Δ X (1), Δ X (2), ⋯, Δ X ( t )] T the t × 1 column vector containing all the reported new cases up to time t and by C Δ X ( t ) = [ C Δ X (1), C Δ X (2), ⋯, C Δ X ( t )] T , the t × 1 column vector containing the corresponding cumulative numbers up to time t . On the basis of Eqs ( 2 ), ( 3 ) and ( 4 ), one can provide a coarse estimation of the parameters R 0 , β and γ as follows.

Note that one can use directly Eq (9) to compute R 0 with regression, without the need to compute first the other parameters, i.e. β , γ and α .

Here, we used Eq (10) to estimate R 0 in order to reduce the noise included in the differences. Note that the above expression is a valid approximation only at the beginning of the spread of the disease.
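The regression equations referred to as Eq (9) and Eq (10) are missing from this text. As a hedged illustration only: if one assumes the usual early-phase linearisation of an SIRD model (S ≈ N) together with R 0 = α /( β + γ ), the cumulative change in infected cases is proportional to the cumulative recovered plus dead cases with slope R 0 − 1, and that slope can be fitted by least squares through the origin. The function names and the synthetic series below are hypothetical and do not reproduce the paper's data or its exact equations.

```python
def estimate_r0(cum_delta_infected, cum_delta_recovered, cum_delta_dead):
    """Fit y = (R0 - 1) * x through the origin, with x = recovered + dead."""
    x = [r + d for r, d in zip(cum_delta_recovered, cum_delta_dead)]
    y = list(cum_delta_infected)
    sxx = sum(v * v for v in x)
    if sxx == 0:
        raise ValueError("no recoveries or deaths yet; slope is undefined")
    slope = sum(a * b for a, b in zip(x, y)) / sxx
    return 1.0 + slope


def linearised_series(alpha, beta, gamma, days, i0=1.0):
    """Synthetic early-phase series under S ~ N (illustration only)."""
    i, r, d = i0, 0.0, 0.0
    infected, recovered, dead = [], [], []
    for _ in range(days):
        # All right-hand sides use the previous day's value of i.
        i, r, d = i + (alpha - beta - gamma) * i, r + beta * i, d + gamma * i
        infected.append(i - i0)  # cumulative change since t = 0
        recovered.append(r)
        dead.append(d)
    return infected, recovered, dead


inf, rec, dead = linearised_series(alpha=0.191, beta=0.064, gamma=0.01, days=10)
r0_hat = estimate_r0(inf, rec, dead)
print(f"estimated R0 = {r0_hat:.3f}")
```

Working with cumulative sums rather than day-to-day differences averages out some of the reporting noise, which is the motivation the text gives for preferring Eq (10).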

Estimation of the case fatality and case recovery ratios for the period January 11-February 10

As the reported data are just a subset of the actual number of infected and recovered cases, including the asymptomatic and/or mild ones, we have repeated the above calculations considering twenty times the reported number of infected and forty times the reported number of recovered in the total population, while leaving the reported number of dead the same, given that their cataloguing is close to the actual number of deaths due to COVID-19.

Estimation of the “effective” SIRD model parameters

As discussed, we have derived results using two different scenarios (see Methodology). For each scenario, we first present the results for the basic reproduction number as well as the case fatality and case recovery ratios as obtained by solving the least squares problem using a rolling window with a one-day step. For their computation, we used the first six days, i.e. from the 11th up to the 16th of January, to provide the very first estimations. We then proceeded with the calculations by adding one day to the rolling window, as described in the methodology, until the 10th of February. We also report the corresponding 90% confidence intervals instead of the more standard 95% because of the small size of the data. For each window, we also report the corresponding coefficients of determination ( R 2 ), representing the proportion of the variance in the dependent variable that is predictable from the independent variables, and the root mean square error (RMSE). The estimation of R 0 was based on the data until January 20, in order to satisfy as much as possible the hypothesis underlying its calculation by Eq (9) .

Then, as described above, we provide coarse estimations of the “effective” per day recovery and mortality rates of the SIRD model based on the reported data by solving the corresponding least squares problems. Then, an estimation of the infection rate α was obtained by “wrapping” around the SIRD simulator an optimization algorithm as described in the previous section. Finally, we provide tentative forecasts for the evolution of the outbreak based on both scenarios until the end of February.

Scenario I: Results obtained using the exact numbers of the reported confirmed cases

The solid line corresponds to the mean value and dashed lines to lower and upper 90% confidence intervals.

https://doi.org/10.1371/journal.pone.0230405.g001

Solid lines correspond to the mean values and dashed lines to lower and upper 90% confidence intervals.

https://doi.org/10.1371/journal.pone.0230405.g002

https://doi.org/10.1371/journal.pone.0230405.g003

https://doi.org/10.1371/journal.pone.0230405.g004

https://doi.org/10.1371/journal.pone.0230405.g005

Finally, using the derived values of the parameters α , β , γ , we performed simulations until the end of February. The results of the simulations are given in Figs 6 , 7 and 8 . Solid lines depict the evolution, when using the expected (mean) estimations and dashed lines illustrate the corresponding lower and upper bounds as computed at the limits of the confidence intervals of the estimated parameters.

Dots correspond to the number of confirmed cases from the 16th of January to the 10th of February. The initial date of the simulations was the 16th of November with one infected, zero recovered and zero deaths. Solid lines correspond to the dynamics obtained using the estimated expected values of the epidemiological parameters α = 0.191, β = 0.064 d⁻¹, γ = 0.01; dashed lines correspond to the lower and upper bounds derived by performing simulations on the limits of the confidence intervals of the parameters.

https://doi.org/10.1371/journal.pone.0230405.g006

https://doi.org/10.1371/journal.pone.0230405.g007

Dots correspond to the number of confirmed cases from the 16th of January to the 10th of February. The initial date of the simulations was the 16th of November with one infected, zero recovered and zero deaths. Solid lines correspond to the dynamics obtained using the estimated expected values of the epidemiological parameters α = 0.191, β = 0.064 d⁻¹, γ = 0.01; dashed lines correspond to the lower and upper bounds derived by performing simulations on the limits of the confidence intervals of the parameters.

https://doi.org/10.1371/journal.pone.0230405.g008

As Figs 6 and 7 suggest, the forecast of the outbreak at the end of February, through the SIRD model is characterized by high uncertainty. In particular, simulations result in an expected number of ∼ 180,000 infected cases but with a high variation: the lower bound is at ∼ 45,000 infected cases while the upper bound is at ∼ 760,000 cases. Similarly for the recovered population, simulations result in an expected number of ∼ 60,000, while the lower and upper bounds are at ∼ 22,000 and ∼ 170,000, respectively. Finally, regarding the deaths, simulations result in an average number of ∼ 9,000, with lower and upper bounds, ∼ 2,700 and ∼ 34,000, respectively.

Thus, the expected trends of the simulations suggest that the mortality rate is lower than that estimated with the current data, and thus the death toll is expected to be significantly lower than the expected trends of the predictions.

As this paper was revised, the reported number of deaths on the 22nd of February was 2,344, while the expected number of the forecast was ∼4,300 with a lower bound of ∼1,300. Regarding the number of infected and recovered cases by February 20, the cumulative numbers of confirmed reported cases were 64,084 infected and 15,299 recovered, while the expected trends of the forecasts were ∼83,000 for the infected and ∼28,000 for the recovered cases. Hence, based on this estimation, the evolution of the epidemic was well within the bounds of our forecasting.

Scenario II: Results obtained by taking twenty times the number of infected and forty times the number of recovered people with respect to the confirmed cases

https://doi.org/10.1371/journal.pone.0230405.g009

It is interesting to note that the above estimation of R 0 is close to those reported in other studies (see the Introduction for a review).

https://doi.org/10.1371/journal.pone.0230405.g010

https://doi.org/10.1371/journal.pone.0230405.g011

https://doi.org/10.1371/journal.pone.0230405.g012

https://doi.org/10.1371/journal.pone.0230405.g013

The computed values of the “effective” per day mortality and recovery rates of the SIRD model were γ ∼ 0.0005 and β ∼ 0.16 d⁻¹ (corresponding to a recovery period of ∼ 6 d). Note that because of the extremely small number of data points used, the confidence intervals have been disregarded. Instead, for calculating the corresponding lower and upper bounds in our simulations, we have taken intervals of ±20% around the expected least squares solutions. Hence, for γ we have taken the interval (0.0004, 0.0006) and for β the interval (0.13, 0.19), corresponding to recovery periods from 5 to 8 days.

Again, we used the SIRD simulator to provide an estimation of the infection rate by optimization, setting w 1 = 1, w 2 = 400, w 3 = 1 to balance the residuals of deaths with the scaled numbers of the infected and recovered cases. Thus, to find the optimal infection transmission rate, we used the SIRD simulations with β = 0.16 d⁻¹ and γ = 0.0005, taking as initial conditions one infected, zero recovered and zero deaths on November 16th 2019, and ran them until the 10th of February.
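The idea of "wrapping" an optimizer around the simulator can be sketched as follows. This is an illustration under several assumptions rather than the authors' procedure: the SIRD update is the standard discrete-time form, a plain grid search stands in for the optimization algorithm (which this text does not name), the objective is a weighted sum of squared residuals whose exact form and weight-to-compartment mapping are not recoverable here (equal weights are used), and the "observations" are synthetic values generated from the model itself over a shortened 40-day horizon.

```python
def simulate_sird(alpha, beta, gamma, n_pop, days, i0=1.0):
    """Assumed discrete-time SIRD update; returns daily series (I, R, D)."""
    s, i, r, d = n_pop - i0, i0, 0.0, 0.0
    infected, recovered, dead = [i], [r], [d]
    for _ in range(days):
        new_inf = alpha * s * i / n_pop
        # All right-hand sides use the previous day's values.
        s, i, r, d = (s - new_inf, i + new_inf - (beta + gamma) * i,
                      r + beta * i, d + gamma * i)
        infected.append(i)
        recovered.append(r)
        dead.append(d)
    return infected, recovered, dead


def weighted_sse(model, observed, weights):
    """Weighted sum of squared residuals over the three series."""
    return sum(
        w * sum((m - o) ** 2 for m, o in zip(m_series, o_series))
        for w, m_series, o_series in zip(weights, model, observed)
    )


def fit_alpha(observed, beta, gamma, n_pop, days, weights, grid):
    """Return the alpha on `grid` that minimises the weighted residuals."""
    return min(grid, key=lambda a: weighted_sse(
        simulate_sird(a, beta, gamma, n_pop, days), observed, weights))


# Synthetic "observations" generated from the model (not real data).
BETA, GAMMA, N_POP, DAYS = 0.16, 0.0005, 59_000_000, 40
observed = simulate_sird(0.319, BETA, GAMMA, N_POP, DAYS)
grid = [0.200 + 0.001 * k for k in range(251)]  # candidate alphas 0.200..0.450
alpha_hat = fit_alpha(observed, BETA, GAMMA, N_POP, DAYS,
                      weights=(1.0, 1.0, 1.0), grid=grid)
print(f"fitted alpha = {alpha_hat:.3f}")
```

Only α is searched because β and γ are fixed beforehand from the least squares estimates, which turns the calibration into a one-dimensional problem.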

Finally, using the derived values of the parameters α , β , γ , we have run the SIRD simulator until the end of February. The simulation results are given in Figs 14 , 15 and 16 . Solid lines depict the evolution, when using the expected (mean) estimations and dashed lines illustrate the corresponding lower and upper bounds as computed at the limits of the confidence intervals of the estimated parameters.

Dots correspond to the number of confirmed cases from the 16th of January to the 10th of February. The initial date of the simulations was the 16th of November with one infected, zero recovered and zero deaths. Solid lines correspond to the dynamics obtained using the estimated expected values of the epidemiological parameters α = 0.319, β = 0.16 d⁻¹, γ = 0.0005; dashed lines correspond to the lower and upper bounds derived by performing simulations on the limits of the confidence intervals of the parameters.

https://doi.org/10.1371/journal.pone.0230405.g014

Dots correspond to the number of confirmed cases from the 16th of January to the 10th of February. The initial date of the simulations was the 16th of November with one infected, zero recovered and zero deaths. Solid lines correspond to the dynamics obtained using the estimated expected values of the epidemiological parameters α = 0.319, β = 0.16 d⁻¹, γ = 0.0005; dashed lines correspond to the lower and upper bounds derived by performing simulations on the limits of the confidence intervals of the parameters.

https://doi.org/10.1371/journal.pone.0230405.g015

Dots correspond to the number of confirmed cases from the 16th of January to the 10th of February. The initial date of the simulations was the 16th of November with one infected, zero recovered and zero deaths. Solid lines correspond to the dynamics obtained using the estimated expected values of the epidemiological parameters α = 0.319, β = 0.16 d⁻¹, γ = 0.0005; dashed lines correspond to the lower and upper bounds derived by performing simulations on the limits of the confidence intervals of the parameters.

https://doi.org/10.1371/journal.pone.0230405.g016

Again, as Figs 15 and 16 suggest, the forecast of the outbreak at the end of February through the SIRD model is characterized by high uncertainty. In particular, in Scenario II, by February 29, simulations result in an expected actual number of ∼8m infected cases (corresponding to ∼13% of the total population) with a lower bound at ∼720,000 and an upper bound at ∼37m cases. Similarly, for the recovered population, simulations result in an expected actual number of ∼4.5m (corresponding to ∼8% of the total population), while the lower and upper bounds are at ∼430,000 and ∼23m, respectively. Finally, regarding the deaths, simulations under this scenario result in an average number of ∼14,000, with lower and upper bounds at ∼900 and ∼100,000, respectively.

Importantly, under this scenario, the simulations shown in Fig 14 suggest a decline of the outbreak at the end of February. Table 1 summarizes the above results for both scenarios.

https://doi.org/10.1371/journal.pone.0230405.t001

We note that the results derived under Scenario II seem to predict a slowdown of the outbreak in Hubei after the end of February.

We have proposed a methodology for the estimation of the key epidemiological parameters as well as the modelling and forecasting of the spread of the COVID-19 epidemic in Hubei, China by considering publicly available data from the 11th of January 2020 to the 10th of February 2020.

By the time of the acceptance of our paper, according to the official data released on the 29th of February, the cumulative number of confirmed infected cases in Hubei was ∼67,000, that of recovered was ∼31,300 and the death toll was ∼2,800. These numbers are within the lower bounds and expected trends of our forecasts from the 10th of February that are based on Scenario I. Importantly, by assuming a 20-fold scaling of the confirmed cumulative number of the infected cases and a 40-fold scaling of the confirmed number of the recovered cases in the total population, forecasts show a decline of the outbreak in Hubei at the end of February. Based on this scenario the case fatality rate in the total population is of the order of ∼0.15%.

At this point we should note that our SIRD modelling approach did not take into account many factors that play an important role in the dynamics of the disease, such as the effect of the incubation period on the transmission dynamics, the heterogeneous contact transmission network, the effect of the measures already taken to combat the epidemic, and the characteristics of the population (e.g. age and pre-existing health problems). Also, the estimation of the model parameters is based on an assumption, considering just the first period in which the first cases were confirmed and reported. Of note, COVID-19, which is thought to be principally transmitted from person to person by respiratory droplets and fomites, without excluding the possibility of the fecal-oral route [ 21 ], had been spreading for at least a month and a half before the imposed lockdown and quarantine of Wuhan on January 23, having thus infected unknown numbers of people. The number of asymptomatic and mild cases with subclinical manifestations that probably did not present to hospitals for treatment may be substantial; these cases, which possibly represent the bulk of the COVID-19 infections, remain unrecognized, especially during the influenza season [ 22 ]. This highly likely gross under-detection and underreporting of mild or asymptomatic cases inevitably throws calculations of severe disease courses and death rates out of context, distorting epidemiologic reality.

Another important factor that should be taken into consideration pertains to the diagnostic criteria used to determine infection status and confirm cases. A positive PCR test was required to be considered a confirmed case by China’s Novel Coronavirus Pneumonia Diagnosis and Treatment program in the early phase of the outbreak [ 14 ]. However, the sensitivity of nucleic acid testing for this novel viral pathogen may only be 30-50%, thereby often resulting in false negatives, particularly early in the course of illness. To complicate matters further, the guidance changed in the recently-released fourth edition of the program on February 6 to allow for diagnosis based on clinical presentation, but only in Hubei province [ 14 ].

The swiftly growing epidemic seems to be overwhelming even for the highly efficient Chinese logistics that did manage to build two new hospitals in record time to treat infected patients. Supportive care with extracorporeal membrane oxygenation (ECMO) in intensive care units (ICUs) is critical for severe respiratory disease. Providing large-scale capacity for such a level of medical care in Hubei province, or elsewhere in the world for that matter, amidst this public health emergency may prove particularly challenging. We hope that the results of our analysis contribute to the elucidation of critical aspects of this outbreak so as to contain the novel coronavirus as soon as possible and mitigate its effects regionally, in mainland China, and internationally.

In the digital and globalized world of today, new data and information on the novel coronavirus and the evolution of the outbreak become available at an unprecedented pace. Still, crucial questions remain unanswered and accurate answers for predicting the dynamics of the outbreak simply cannot be obtained at this stage. We emphatically underline the uncertainty of available official data, particularly pertaining to the true baseline number of infected (cases), that may lead to ambiguous results and inaccurate forecasts by orders of magnitude, as also pointed out by other investigators [ 1 , 17 , 22 ].

Supporting information

S1 Table. Reported cumulative numbers of cases for the Hubei region, China, for the period January 11-February 10.

https://doi.org/10.1371/journal.pone.0230405.s001

  • 1. Li Q, Guan X, Wu P, et al. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia; 2020. Available from: https://doi.org/10.1088%2F0951-7715%2F16%2F2%2F308 .
  • 2. Organization WH. WHO Statement Regarding Cluster of Pneumonia Cases in Wuhan, China; 2020. Available from: https://www.who.int/china/news/detail/09-01-2020-who-statement-regarding-cluster-of-pneumonia-cases-in-wuhan-china .
  • 3. Organization WH. Novel coronavirus(2019-nCoV). Situation report 21. Geneva, Switzerland: World Health Organization; 2020; 2020. Available from: https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200210-sitrep-21-ncov.pdf?sfvrsn=947679ef_2 .
  • 14. The Johns Hopkins Center for Health Security. Daily updates on the emerging novel coronavirus from the Johns Hopkins Center for Health Security. February 9, 2020; 2020. Available from: https://hub.jhu.edu/2020/01/23/coronavirus-outbreak-mapping-tool-649-em1-art1-dtd-health/ .
  • 15. The Johns Hopkins Center for Health Security. Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE; 2020. Available from: https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6 .
  • 18. NHC. NHS Press Conference, Feb. 4 2020—National Health Commission (NHC) of the People’s Republic of China; 2020.
  • 20. MATLAB R2018b; 2018.
  • 21. Gale J. Coronavirus May Transmit Along Fecal-Oral Route, Xinhua Reports; 2020. Available from: https://www.bloomberg.com/news/articles/2020-02-02/coronavirus-may-transmit-along-fecal-oral-route-xinhua-reports .
  • Research article
  • Open access
  • Published: 07 June 2021

Predicting the incidence of COVID-19 using data mining

  • Fatemeh Ahouz 1 &
  • Amin Golabpour   ORCID: orcid.org/0000-0001-7649-4033 2  

BMC Public Health volume 21, Article number: 1087 (2021)

The high prevalence of COVID-19 has made it a new pandemic. Predicting both its prevalence and incidence throughout the world is crucial to help health professionals make key decisions. In this study, we aim to predict the incidence of COVID-19 within a two-week period to better manage the disease.

The COVID-19 datasets provided by Johns Hopkins University contain information on COVID-19 cases in different geographic regions since January 22, 2020 and are updated daily. Data from 252 such regions were analyzed as of March 29, 2020, with 17,136 records and 4 variables, namely latitude, longitude, date, and records. To design the incidence pattern for each geographic region, we used the information on the region and its neighboring areas gathered over the preceding 2 weeks. Then, a model was developed to predict the incidence rate for the coming 2 weeks via a Least-Square Boosting Classification algorithm.

The model was presented for three groups based on the incidence rate: less than 200, between 200 and 1000, and above 1000. The mean absolute errors of the model evaluation were 4.71%, 8.54%, and 6.13%, respectively. Also, comparing the forecast results with the actual values in the period in question showed that the proposed model predicted the number of globally confirmed cases of COVID-19 with a very high accuracy of 98.45%.

Using data from different geographical regions within a country and discovering the pattern of prevalence in a region and its neighboring areas, our boosting-based model was able to accurately predict the incidence of COVID-19 within a two-week period.

On December 8, 2019 the Chinese government reported the death of one patient and the hospitalization of 41 others with a disease of unknown etiology in Wuhan [ 1 ]. This cluster marked the start of the novel coronavirus disease (COVID-19) epidemic. While the early cases were linked to the wet market, human-to-human transmission led to a widespread outbreak of the virus nationwide [ 2 ]. On January 30, 2020 the World Health Organization (WHO) declared COVID-19 a Public Health Emergency of International Concern (PHEIC) [ 3 ].

On the basis of the global spread and severity of the disease, on March 11, 2020 the Director-General of WHO officially declared the COVID-19 outbreak a pandemic [ 4 ]. The pandemic thus entered a new stage, with rapid spread in countries outside China [ 5 ]. According to the 56th WHO situation report [ 6 ], as of March 16, 2020 the number of COVID-19 confirmed cases outside China exceeded that inside. Consequently, after March 17, 2020 WHO began to report the number of confirmed cases and deaths on each continent as opposed to merely providing patient statistics in and out of China.

According to the 70th WHO situation report [ 7 ], by March 30, 2020 the number of people infected with COVID-19 worldwide was 693,282, of whom 392,815 (about 57%) were in Europe, 142,081 (about 20%) in the Americas, 103,775 (about 15%) in the Western Pacific, 46,329 (about 7%) in the Eastern Mediterranean, 4084 (about 0.5%) in South-East Asia, and 3486 (about 0.5%) in Africa. Of that total, 33,106 died worldwide, 23,962 of whom (around 72% of all deaths) were in Europe, 3649 (around 11%) in the Western Pacific, and 5488 (around 17%) in other regions collectively.

Due to the growing prevalence of COVID-19 across the world, several works have examined different aspects of the disease. They involve identifying the source of the virus as well as analyzing its gene sequences [ 8 , 9 ], patient information [ 10 ], early cases in the countries infected [ 11 , 12 , 13 ], methods of virus detection [ 14 , 15 ], the epidemiological outbreak [ 16 , 17 ], and predicting COVID-19 cases [ 2 , 17 , 18 , 19 , 20 ].

In [ 18 ], using a heuristic method and WHO situation reports, an exponential curve was proposed to predict the number of cases in the next 2 weeks by March 30, 2020. The model was then tested on the 58th situation report, for which the authors reported an error of 1.29%. Afterwards, on the assumption that the current trend could continue for the next 17 days, they predicted that by March 30, 1 million cases outside China would be reported in the 70/71th WHO situation report. Given that the number of confirmed cases outside China was 693,176 on March 30 [ 21 ], their forecast error was 44.26%.

In [ 17 ], the CoronaTracker team proposed a Susceptible-Exposed-Infectious-Recovered (SEIR) model based on data queried from their website, and made a 240-day prediction of COVID-19 cases in and out of China, starting on 20 January 2020. They predicted that the outbreak would reach its peak on May 23, 2020 and that the maximum number of infected individuals would amount to 425.066 million globally. In addition, the authors stated that this number would start to drop around early July 2020 and fall below 10,000 on 14 Sep 2020. Given the information available now, these predictions were far from what really happened around the world.

Elsewhere [ 19 ], the authors examined some available models to produce 5- and 10-day-ahead forecasts of cumulative cases in Guangdong and Zhejiang by February 23, 2020. They used generalized logistic growth, Richards growth, and a sub-epidemic wave model, which had previously been utilized to forecast other infectious disease outbreaks.

Although some works have proposed methods for predicting COVID-19 cases, to our knowledge at the time of writing this paper, none has been comprehensive or has predicted the new cases in each geographical region as well as on each continent. In this study, using the COVID-19 Cases dataset provided by Johns Hopkins University [ 22 ], we aim to predict the number of COVID-19 infected people in each geographical region included in the dataset, as well as on each continent, in the coming 2-week period. Predicting the situation in the current pandemic is crucial to containing the threat because it helps in taking timely measures, e.g. equipping medical facilities, managing resource allocation, sending more personnel to high-risk areas, deciding whether to close borders or resume traffic, and suspending or resuming community services.

COVID-19 epidemiological data have been compiled by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) [ 22 ]. The data have been provided in three separate datasets for confirmed, recovered, and death cases since January 22, 2020 and are updated daily. In each of these datasets, there is a record (row) for every geographic region. The variables in each dataset are province/state, country/region, latitude, longitude, and the incremental dates since January 22. For each region, the value for any date indicates the cumulative number of confirmed/recovered/death cases from January 22, 2020.

In this study, according to the input requirements of the proposed model, we changed the data representation so that instead of three separate datasets for three groups of confirmed, recovered, and death cases, only one dataset containing the information of all three groups was arranged. In this new dataset, each record (or row) of the dataset contains information about the number of confirmed, recovered, or deaths per day for each geographic region. As a result, the variables in this new dataset are: Province / State, Country / Region, Latitude (Lat), Longitude (Long), Date (specifying a certain date), Cases (indicating the number of confirmed, recovered, or death cases on the certain date), and Type (specifying the type of cases, i.e. confirmed, recovered, or death) as suggested by Rami Krispin [ 23 ].
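The change of representation described above, from one wide table per case type to a single long table, can be sketched as follows. This is only an illustration: the helper `wide_to_long`, the sample row and the date-column labels are made up, and the conversion from cumulative counts to per-day counts is an assumption based on the statement that each record holds the number of cases per day.

```python
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def wide_to_long(wide_rows, case_type):
    """Turn one wide row per region into one long row per region and date.

    Assumes the date columns appear in chronological order and hold
    cumulative counts; "Cases" in the output is the daily increment.
    """
    long_rows = []
    for row in wide_rows:
        ids = {col: row[col] for col in ID_COLUMNS}
        previous = 0
        for col, cumulative in row.items():
            if col in ID_COLUMNS:
                continue
            long_rows.append({**ids, "Date": col,
                              "Cases": cumulative - previous,
                              "Type": case_type})
            previous = cumulative
    return long_rows


# Made-up row for illustration only.
confirmed_wide = [{"Province/State": "", "Country/Region": "Exampleland",
                   "Lat": 10.0, "Long": 20.0,
                   "1/22/20": 1, "1/23/20": 3, "1/24/20": 2}]
long_rows = wide_to_long(confirmed_wide, "confirmed")
for rec in long_rows:
    print(rec["Date"], rec["Cases"], rec["Type"])
```

Under this assumption, a downward correction in the cumulative series (3 followed by 2 in the sample row) shows up as a negative daily value, which is the kind of record the preprocessing step treats as noise.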

In this study, data up to March 29, 2020 were included in the analysis, comprising 50,660 records and 7 variables. This period includes information about parts of winter and spring in the northern hemisphere and summer and autumn in the southern hemisphere. By March 29, the dataset consisted of cases from 177 countries and 252 different regions around the world. There were 720,139 confirmed, 33,925 death, and 149,082 recovered cases in the dataset.

Preprocessing step

Pre-processing was carried out on the dataset before training the proposed model. Figure  1 shows the preprocessing steps. The dataset was first examined for noise, which was defined as negative values in the Cases variable. The dataset contained 42 negative values in this variable. After deleting these records, the number of records was reduced to 50,618.

Fig. 1 Preprocessing steps on the COVID-19 dataset

Subsequently, the Date variable was written in numerical format and renamed into “Day” variable. To that effect, January 22, 2020 marked the beginning of the outbreak and the next days were calculated in terms of distance from the origin. As a result, January 22 and March 29 were considered as Day 1 and Day 68, respectively.

Since each region is uniquely identified by its latitude and longitude, the Province/State and Country/Region variables were excluded from the dataset. Moreover, as the study aimed at predicting the incidence in any geographical region, we considered only those records providing information on the confirmed cases (17,179 records), but not on the dead or the recovered. So, after keeping only the records with the “Confirmed” value in the Type variable, that variable was dropped from the dataset. In this study, “Cases” is considered the dependent variable.
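The preprocessing steps just described (dropping negative counts, converting dates to a day index with January 22 as Day 1, keeping only confirmed cases, and retaining latitude, longitude, day and cases) can be sketched in a few lines. The record layout, the lower-case type labels and the function names are assumptions made for illustration; the sample records are made up.

```python
from datetime import date

ORIGIN = date(2020, 1, 22)  # treated as Day 1 in the text


def day_index(d):
    """Number of days since the origin, counting the origin as Day 1."""
    return (d - ORIGIN).days + 1


def preprocess(records):
    """Drop negative counts, keep confirmed cases, keep Lat/Long/Day/Cases."""
    cleaned = []
    for rec in records:
        if rec["Cases"] < 0 or rec["Type"] != "confirmed":
            continue
        cleaned.append({"Lat": rec["Lat"], "Long": rec["Long"],
                        "Day": day_index(rec["Date"]), "Cases": rec["Cases"]})
    return cleaned


# Made-up records for illustration only.
records = [
    {"Lat": 10.0, "Long": 20.0, "Date": date(2020, 1, 22), "Cases": 1, "Type": "confirmed"},
    {"Lat": 10.0, "Long": 20.0, "Date": date(2020, 1, 23), "Cases": -1, "Type": "confirmed"},
    {"Lat": 10.0, "Long": 20.0, "Date": date(2020, 3, 29), "Cases": 5, "Type": "confirmed"},
    {"Lat": 10.0, "Long": 20.0, "Date": date(2020, 3, 29), "Cases": 2, "Type": "death"},
]
cleaned = preprocess(records)
print(cleaned)
```

With this origin, March 29, 2020 maps to Day 68, matching the day numbering given in the text.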

Constructing the prediction model

An ensemble method of regression learners was utilized to predict the incidence of COVID-19 in different regions. The idea of ensemble learning is to build a prediction model by combining the strengths of a collection of simpler base models called weak learners [ 24 ]. At every step, the ensemble fits a new learner to the difference between the observed response and the aggregated prediction of all learners grown previously. One of the most commonly used loss functions is least-squares (LS) error [ 25 ].

In this study, the model employed a set of individual Least-squares boosting (LSBoost) learners trying to minimize the mean squared error (MSE). The output of the model in step m, F m (x), was calculated using Eq. 1 :

where x is input variable and h(x;a) is the parameterized function of x, characterized by parameters a [ 25 ]. The values of ρ and a were obtained from Eq. 2 :

Where N is the number of training data and \( \tilde{y}_{i} \) is the difference between the observed response and the aggregated prediction up to the previous step.
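Eq. 1 and Eq. 2 are not reproduced in this text. The following is a minimal sketch of least-squares boosting as it is commonly described: start from a constant prediction, repeatedly fit a weak learner to the current residuals (the \( \tilde{y}_{i} \) terms), and add it to the ensemble. One-dimensional regression stumps are assumed as the weak learners h(x;a) purely for illustration, the data are a toy example, and all function names are hypothetical; the paper's actual implementation was in MATLAB.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def ls_boost(xs, ys, n_rounds):
    """Return (initial constant, list of stumps) after n_rounds of boosting."""
    f0 = sum(ys) / len(ys)
    predictions = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # For a stump fitted by least squares, the optimal multiplier rho is 1.
        predictions = [p + stump_value(stump, x) for x, p in zip(xs, predictions)]
    return f0, stumps


def predict(model, x):
    f0, stumps = model
    return f0 + sum(stump_value(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = list(range(10))
ys = [x * x for x in xs]  # toy data, not COVID-19 counts
for rounds in (0, 1, 50):
    print(rounds, mse(ls_boost(xs, ys, rounds), xs, ys))
```

Each round can only lower the training error, so the printed values decrease with the number of rounds; in practice a held-out set is needed to decide when to stop.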

Due to the recent major changes in the incidence of COVID-19 worldwide over the past 2 weeks, we aimed to predict the number of new cases as an indicator of prevalence over the next 2 weeks. The structure of the proposed method is shown in Fig.  2 .

Fig. 2 The structure of the proposed model

Since the incubation period of COVID-19 can be as long as 14 days [ 26 ], we assumed that at least 14 days of prior information were needed to predict the incidence of COVID-19 on a given day. Therefore, the proposed model examined all possible intervals between the first and the last 14 days to find the optimal time period whose information should be used to predict the number of cases in the coming days.

We hypothesized that the incidence in any region might follow the pattern of recent days in the same region and nearby. Therefore, after determining the optimal time period, the model added the information on confirmed cases in each region and nearby in the specified period to the same region’s record in the dataset.

After setting the time interval, [A, B], and the number of neighbors, the dataset was rearranged. In this case, the number of records was reduced from N to M, where M is calculated from Eq. 3 :

Where R is the number of different regions in the dataset and B is the last day of the time period. Similarly, the number of variables stored for each record increased from the first 4 variables (latitude, longitude, Day and Cases) to F, which is calculated from Eq. 4 :

Where NN is the number of neighbors and 4 is the number of variables in the original data set because for each geographical region, Lat, Long, Day and Cases are stored. |B-A + 1| is the number of days within the period that participate in the forecast of the next 14 days. The value of NN is multiplied by 2 because for each neighbor, latitude and longitude are added to the record information. Furthermore, for each day within the period of forecast, the Cases were added to the record information, so NN was multiplied by|B-A + 1|. For each region, the Day and Cases data during the period were added as well. Thus, |B-A + 1| was multiplied by 2. It should be noted, however, that the dependent variable remained the Cases of current day.
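Eq. 3 and Eq. 4 are missing from this text, but the verbal description lists the terms that make up the feature count. As a hedged reconstruction, the sketch below adds those terms together; the record count M = N − R × B is a guess based only on the symbols the text names, and both function names are hypothetical. The example values (3 neighbors, interval [1, 14], 17,136 records, 252 regions) are chosen for illustration.

```python
def feature_count(n_neighbors, a, b, base_vars=4):
    """Assumed form of Eq. 4, built from the terms named in the text."""
    days = abs(b - a + 1)  # days in the interval [A, B]
    return (base_vars              # Lat, Long, Day, Cases of the record itself
            + 2 * n_neighbors      # latitude and longitude of each neighbor
            + n_neighbors * days   # Cases of each neighbor on each day
            + 2 * days)            # Day and Cases of the region on each day


def record_count(n_records, n_regions, b):
    """Assumed form of Eq. 3: each region loses its first B days."""
    return n_records - n_regions * b


print(feature_count(n_neighbors=3, a=1, b=14))
print(record_count(n_records=17_136, n_regions=252, b=14))
```

The feature count grows linearly in both the number of neighbors and the interval length, which keeps the exhaustive search over these two settings tractable.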

Since the number of both the nearby regions and the previous days effective in forecasting were unknown, we assumed these values to be unknown variables and obtained the most accurate model by examining all possible combinations of such variables in an iterative process.
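The exhaustive search described above can be sketched as a plain enumeration. The bounds used as defaults (0 to 10 neighbors, days 14 to 54) are the ones reported under Model construction; whether the study let the near end of the interval vary or fixed it at 14 is not stated, so letting both ends vary is an assumption. The Python below is an illustration, not the authors' MATLAB code, and `toy_error` is a hypothetical stand-in for "rearrange the data, train the model, and score it on the validation set".

```python
def candidate_settings(max_neighbors=10, nearest=14, farthest=54):
    """Yield every (NN, A, B) with 0 <= NN <= max_neighbors and nearest <= A <= B <= farthest."""
    for nn in range(max_neighbors + 1):
        for a in range(nearest, farthest + 1):
            for b in range(a, farthest + 1):
                yield nn, a, b


def best_setting(validation_error, settings):
    """Return the setting with the lowest validation error."""
    return min(settings, key=validation_error)


def toy_error(setting):
    """Hypothetical error surface with an arbitrary minimum, for illustration only."""
    nn, a, b = setting
    return abs(nn - 3) + abs(a - 15) + abs(b - 20)


all_settings = list(candidate_settings())
print(len(all_settings))
print(best_setting(toy_error, all_settings))
```

Under these bounds there are 11 neighbor counts and 861 intervals, so 9,471 candidate models would have to be trained for each group of regions, which is why the search is described as iterative.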

The accuracy of the model was evaluated in terms of Mean Squared Error (MSE) and Mean Absolute Error (MAE). Because MAE was normalized to [0, 1], the evaluation error is equal to 2 times MAE. For this evaluation, the data from the last 2 weeks for all regions were used as the validation set, and the model was trained on the remaining data in the dataset.
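A minimal Python sketch of the two error measures and of the time-based hold-out follows. It assumes each record carries a `day` field and holds out the final 14 days; the normalization of MAE mentioned above is not reproduced because the text does not say how it was done. Function and field names are illustrative.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def split_last_days(records, holdout_days=14):
    """Use the final `holdout_days` days as validation and the rest as training."""
    last_day = max(r["day"] for r in records)
    cutoff = last_day - holdout_days
    training = [r for r in records if r["day"] <= cutoff]
    validation = [r for r in records if r["day"] > cutoff]
    return training, validation


# Toy data: one region observed for 68 days (values are made up).
records = [{"region": "X", "day": d, "cases": 10 * d} for d in range(1, 69)]
training, validation = split_last_days(records)
print(len(training), len(validation))
print(mean_absolute_error([10, 20, 30], [12, 18, 33]))
print(mean_squared_error([10, 20, 30], [12, 18, 33]))
```

Splitting by date rather than at random keeps the validation period strictly after the training period, which matches the forecasting setting described in the text.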

Forecast incidence in the next 2 weeks

A new test set was created to predict incidence in the next 2 weeks (by April 12, 2020). The number of records in this dataset was equal to the number of unique geographical regions in the COVID-19 dataset. Then, according to the best neighborhood and optimal time interval identified in the previous step, the necessary features were generated for each record. After that, the best model created in the previous step was retrained on the entire dataset as a training set. Finally, this model was applied to the new test set to predict the incidence rate.
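The abbreviations list names least-squares boosting, but this section gives no model settings, so the following is only a generic illustration of that technique and of the "fit on all data, then predict" step. It is a pure-Python sketch with a single input feature and one-split regression stumps; the number of rounds, the learning rate and the toy data are assumptions, and it is not MATLAB's implementation or the authors' configuration.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_ls_boost(xs, ys, rounds=50, learning_rate=0.1):
    """Least-squares boosting: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    fitted = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - f for y, f in zip(ys, fitted)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        fitted = [
            f + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, f in zip(xs, fitted)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


# Toy step-shaped data (made up): the "entire dataset" used for the final fit.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 10, 10, 10, 10]
model = fit_ls_boost(xs, ys)
print(predict(model, 2), predict(model, 7))
```

In the study's setting the inputs would be the rearranged multi-variable records rather than a single feature, and the fitted model would be applied to one new record per region.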

Evaluation of the actual performance of the proposed model

Given that the actual number of confirmed cases for the March 30–April 12, 2020 period was available at the time of review, the performance of the proposed model was measured by the percent error between the predicted and the actual values. The percent error was calculated from Eq. 5 :

Where δ is the percent error, v_A is the actual observed value and v_E is the expected (predicted) value. Furthermore, according to the predicted and actual confirmed cases in 252 geographical regions in the dataset, the continental incidence rate was calculated using Eq. 6 :

where I_C is the incidence in each continent and I_W is the global incidence of COVID-19 from March 30 to April 12, 2020.
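Both measures can be written out in a few lines of Python. The equations are not reproduced in this text, so the forms below are assumptions: the percent error follows the common textbook definition that uses the same symbols, |v_A − v_E| / |v_E| × 100 (the denominator is not shown in the text and could instead be the actual value), and the continental incidence rate is taken as I_C / I_W × 100. Function names are illustrative.

```python
def percent_error(actual, expected):
    """Assumed form of Eq. 5: |v_A - v_E| / |v_E| * 100."""
    return abs(actual - expected) / abs(expected) * 100.0


def continental_incidence_rate(continent_cases, world_cases):
    """Assumed form of Eq. 6: I_C / I_W * 100."""
    return continent_cases / world_cases * 100.0


# Europe's forecast new cases and the worldwide forecast reported in the results.
print(round(continental_incidence_rate(687_665, 1_134_018), 2))

# Made-up values to show the percent error calculation.
print(percent_error(actual=110.0, expected=100.0))
```

The second form reproduces the continental shares quoted in the results: 687,665 of the 1,134,018 forecast new cases gives 60.64% for Europe.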

The experiments were run on an Intel® Core™ i7-8550U CPU @ 1.80 GHz (1.99 GHz) with 12.0 GB of RAM under a 64-bit MS Windows operating system. Pre-processing and model construction were implemented in MATLAB.

Model construction

The number of neighbors ranged from zero to 10; the upper limit of 10 was obtained by trial and error. Euclidean distance based on latitude and longitude was used to find the nearest neighbors. Given that the dataset contains data from January 22, 2020 to March 29, 2020, the nearest and farthest days before the day whose incidence is to be predicted were set to 14 and 54, respectively. Because the number of confirmed cases varies greatly from region to region, the proposed algorithm was run separately for 3 groups of regions: those with fewer than 200 confirmed cases per day (16,825 records), those with 200 to 1000 cases per day (220 records), and those with over 1000 cases per day (152 records).
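The neighbor search and the grouping by daily case count can be sketched in Python as follows. As in the description above, distance is plain Euclidean distance on latitude and longitude, which ignores the curvature of the Earth. The region names, coordinates and group labels are hypothetical, and placing exactly 1000 cases in the middle group is an assumption about the boundary.

```python
import math


def nearest_neighbors(target, regions, k):
    """Return the k regions closest to `target` by Euclidean distance on (lat, long)."""
    others = [r for r in regions if r is not target]
    others.sort(
        key=lambda r: math.hypot(r["lat"] - target["lat"], r["long"] - target["long"])
    )
    return others[:k]


def case_group(cases_per_day):
    """Assign a record to one of the three groups by confirmed cases per day."""
    if cases_per_day < 200:
        return "under 200"
    if cases_per_day <= 1000:
        return "200 to 1000"
    return "over 1000"


# Hypothetical regions for illustration only.
regions = [
    {"name": "A", "lat": 0.0, "long": 0.0},
    {"name": "B", "lat": 1.0, "long": 1.0},
    {"name": "C", "lat": 5.0, "long": 5.0},
    {"name": "D", "lat": 0.0, "long": 2.0},
]
print([r["name"] for r in nearest_neighbors(regions[0], regions, k=2)])
print(case_group(150), case_group(1000), case_group(1019))
```

Setting k to zero returns an empty list, which corresponds to a model that uses only the region's own history.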

Table 1 shows the results of the best proposed model for the different combinations of neighborhood size and preceding days. For regions with more than 1000 confirmed cases per day, the proposed model performed best, with an MAE of 6.13%, when it used the information of the last 14 to 17 days of the region and its two neighboring areas. In the dataset, the number of cases recorded in these regions varied from 1019 to 19,821.

For regions with 200 to 1000 cases per day, the proposed model performed best with the 9 nearest neighboring areas and data from the last 14 to 20 days, with an MAE of 8.54% on the validation set. For regions with fewer than 200 cases per day, on the other hand, the proposed model performed best, with an MAE of 4.71%, when it took into account the region’s data for the last 14 to 34 days.

Prediction of incidence by April 12, 2020

Figure 3 shows the prevalence of COVID-19 from the first week to the tenth week in different regions, based on the information provided by the COVID-19 epidemiological dataset [ 22 ]. In this figure, the diameter of each circle is proportional to the prevalence in that region, and the center of each circle matches the geographical coordinates of the region.

Fig. 3 Visualization of the outbreak over time (created by the authors with GIMP, open-source software)

Table 2 shows the forecast of the number of new cases per day on the different continents. Given the location of the continents in the northern and southern hemispheres, the period in question covers winter and early spring in North America, Europe and almost all of Asia. It covers summer and part of autumn in Australia and nearly all of South America. Given that Africa lies in all four hemispheres, the data recorded for this continent in this period include all seasons.

By April 12, 1,134,018 new cases were expected to be on record worldwide. Of these, Europe with 687,665 (60.64%), North America with 272,957 (24.07%) and Asia with 107,000 (9.44%) new cases accounted for the largest shares, whereas Australia with 14,526 (1.28%), Africa with 19,131 (1.69%) and South America with 32,739 (2.89%) new cases accounted for the smallest. Africa, Europe and South America had the highest rates of COVID-19 incidence, with 283, 221.23, and 178.87%, respectively. Asia was the only continent where growth had slowed, with an incidence rate of −34%.

Figure  4 shows the prediction of incidence rates in different regions. Accordingly, the prevalence would decrease over the next 2 weeks in the Middle East, yet it would increase in North America and Europe. Outbreak forecasts for 244 geographic regions are provided in Additional file  1 : Appendix 1.

Fig. 4 Prediction of the incidence in weeks 10 and 11 (created by the authors with GIMP, open-source software)

Comparison of predicted and actual cases from March 30 to April 12, 2020

Table 3 shows the total number of daily cases in the 252 regions surveyed between March 30 and April 12, 2020. As shown, the daily percent error is below 20%. The best accuracy of the proposed model in predicting the incidence of COVID-19 was obtained on April 10 with 99.6%, and the worst on April 11 with 81.3%. The two-week continental incidence rates are also compared in Fig. 5. The best-predicted continental incidence rates were those of South America and Asia, with 18.15 and 21.04% error, respectively. The worst, in contrast, were those of Africa and Australia, with errors of more than 80%.

Fig. 5 Comparison of predicted and actual continental incidence rates between March 30 and April 12, 2020

Data mining is capable of presenting a predictive model and extracting new knowledge from retrospective data. The way data are processed, as well as the variables selected, has a significant impact on knowledge discovery. Various data mining techniques are used to predict outbreaks. As a global health concern, COVID-19 had already developed into one of the world’s major emergencies. The present study set out to forecast its worldwide outbreak over a two-week period with a predictive model based on retrospective data. It was concluded that such a model could be built with acceptable error rates.

The study used a coronavirus dataset to design a model for predicting the incidence of COVID-19. According to the incidence per day, the model was trained on three groups: below 200, 200–1000 and above 1000 cases. One-way ANOVA showed a statistically significant difference between the prevalence rates in the three groups ( p -value < 0.001). For each group, the prediction model was implemented and the incidence was predicted for the next 2 weeks. The proposed model achieved about 10% error (90% accuracy) for the group with fewer than 200 cases, 18% error (82% accuracy) for the group with 200–1000 cases, and 13% error (87% accuracy) for the group with more than 1000 cases.

In this study, the incidence of COVID-19 was evaluated for 68 days worldwide, and a prediction model was presented for the following two-week period (i.e., March 30–April 12, 2020). More than 1,000,000 people were expected to contract the disease within those 2 weeks, up 58% from the roughly 700,000 cases recorded by March 29, 2020.

The study found that adjacent regions with a prevalence of less than 1000 had similar incidence, so the incidence of each of these regions could be determined from information on neighboring areas. The use of neighborhood information enables the model to indirectly consider the effective policies of other regions in predicting the incidence of COVID-19 in each region.

Given that the proposed model was trained using only 68 days of data (the most up-to-date information at the time of writing), a prediction accuracy above 81% was deemed acceptable for such an unknown disease. Further, according to the results shown in Table 3, the model prediction error over a total of 12 days for 252 regions was less than 2%. Therefore, if the data of each country were recorded at a finer geographical resolution, it should be possible to build an accurate model for predicting the incidence of COVID-19 over a two-week period in each country. While many unknowns are to be expected in a new pandemic, such information can guide planning and resource allocation for prevention, treatment, and palliative care.

Although time series usually need to be long enough (normally a few years) to adequately account for seasonality, based on the results of the model implementations, we believe that this model, even with that short a time series, is able to manage seasonality and can predict the number of cases with acceptable accuracy (see Additional file 1 : Appendices 2 and 3 for the results of all analyses). However, it is suggested that future research specifically address the effect of seasonal changes on the prevalence of this disease.

One of the limitations of the study was that the dataset did not provide sufficient information from all continents. Given that the disease did not occur simultaneously on all continents, and that on most continents it became prevalent more than 40 days after the first case in China, 68 days of data did not seem sufficient to predict the prevalence of such an unknown disease.

In Africa, more than 80% of the 45 geographical regions reported their first case on or after the 50th day. The number of cases confirmed since then was 4682, which was 97.83% of the total of 4786 confirmed cases in Africa. In Australia, more than 45% of the 11 geographical regions reported their first case from the 40th day onwards. Also, out of a total of 4504 cases on the continent, 4478 (99.4%) were confirmed from then on.

In Europe, 60 of the 69 geographic regions in the dataset reported their first case from the 40th day onwards. Out of a total of 385,735 cases, 384,268 (i.e. 99.62%) were recorded from that day on. Similarly, South America confirmed its first case after the 40th day in 16 out of 17 regions. It is noteworthy that out of a total of 11,642 cases, 11,542 (99.14%) were confirmed from day 50 onwards.

In contrast, 88% of the North American regions had their first cases confirmed since day 50. In addition, of the 46 confirmed cases by March 29, 2020 on the continent, 38 were reported since day 50 (82.61%) and 41 were confirmed from day 40 onwards (89.13%).

Because some continents were affected later than the declared beginning of the outbreak, information on them was insufficient, and the effect of measures in place from March 30 to April 12, such as the increased number of tests per day and the quarantine restrictions on continents such as Europe, was not reflected in the dataset.

The inaccurate prediction of the number of cases in Africa could, in turn, be attributed to the insufficient information about the continent in the dataset. In 80% of the African regions, the first confirmed case was recorded at least 50 days into the outbreak. Out of a total of 4786 cases there up to the 68th day, 4682 (more than 97%) were reported from day 50 onwards.

In addition, because latitude and longitude are two important indicators in the dataset, the non-uniform recording of this information across geographical regions is another limitation of the work: for some areas the information covers one state of a country, and for others it covers the whole country. For example, in the dataset all cases for the USA are given under a single latitude and longitude, whereas for the Netherlands the COVID-19 cases are given for four different latitude and longitude pairs.

Another limitation of this study was the use of data from all countries coping with COVID-19, each with its own protocols for testing and identifying patients. However, this is the only global dataset for COVID-19, and it has been used in other studies [ 16 , 17 ]. Moreover, to reduce this limitation, the proposed model used each country’s own earlier data to predict the incidence in that country.

It is worth noting that the model rests on both the information provided by the dataset and the measures currently taken to deal with the disease. Hence, if governments’ policies to tackle the disease change, so will the accuracy of the model.

Conclusions

Since epidemiological models such as SIR failed to accurately predict COVID-19 cases, as stated in [ 17 , 27 , 28 ], the current study relied on data from January 22 to March 29 provided by Johns Hopkins University and proposed a more complex model based on machine learning methods. The mean absolute error of the proposed model was 6.13% in predicting the incidence of COVID-19 in the two-week period of March 16–29 for regions with more than 1000 cases per day. The MAE was 8.45 and 4.71% for regions with between 200 and 1000 cases per day and with fewer than 200 cases per day, respectively. An accuracy of more than 82% on the evaluation set supports our hypothesis that the pattern of incidence in a region is influenced by the pattern of disease in recent days in the same region and in neighboring areas.

Last but not least, despite the numerous limitations of the dataset, the lack of knowledge about such an unknown disease and the changes in disease control policies in different countries during the period under scrutiny, the proposed model proved effective in predicting the global incidence of COVID-19 in the two-week period from March 30 to April 12, with 98.45% accuracy. In addition, the accuracy of the proposed model in predicting daily cases in the worst case was 81.31%.

The model is written in a general form and can be run for different intervals (see Additional file 1 : Appendix 4). It is suggested that the model be applied to future data as well.

Availability of data and materials

The dataset analyzed during the current study is public and available at https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases and at https://codeload.github.com/RamiKrispin/coronavirus-csv/zip/master .

Abbreviations

WHO: World Health Organization

PHEIC: Public Health Emergency of International Concern

SEIR: Susceptible-Exposed-Infectious-Recovered

JHU CCSE: Johns Hopkins University Center for Systems Science and Engineering

LSBoost: Least-squares boosting

MSE: Mean Squared Error

MAE: Mean Absolute Error

1. Nkengasong J. Author Correction: China’s response to a novel coronavirus stands in stark contrast to the 2002 SARS outbreak response. Nat Med. 2020;26(3):441. https://doi.org/10.1038/s41591-020-0816-5 .

2. Roosa K, Lee Y, Luo R, Kirpich A, Rothenberg R, Hyman JM, et al. Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th, 2020. Infect Dis Model. 2020;5:256–63. https://doi.org/10.1016/j.idm.2020.02.002 .

3. Eurosurveillance Editorial Team. Note from the editors: World Health Organization declares novel coronavirus (2019-nCoV) sixth public health emergency of international concern. Eurosurveillance. 2020;25(5):2–3.

4. World Health Organization. WHO Director-General's opening remarks at the media briefing on COVID-19 - 11 March 2020. 2020. Available from: https://www.who.int/dg/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19%2D%2D-11-march-2020 . Accessed 27 May 2021.

5. Bedford J, et al. COVID-19: towards controlling of a pandemic. 2020.

6. World Health Organization. Coronavirus disease 2019 (COVID-19) Situation Report – 60. 2020.

7. World Health Organization. Coronavirus disease 2019 (COVID-19) Situation Report – 70. 2020 [updated 19 March 2020]. Available from: https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200330-sitrep-70-covid-19.pdf?sfvrsn=7e0fe3f8_4 . Accessed 27 May 2021.

8. Ji W, Wang W, Zhao X, Zai J, Li X. Cross-species transmission of the newly identified coronavirus 2019-nCoV. J Med Virol. 2020;92(4):433–40. https://doi.org/10.1002/jmv.25682 .

9. Paraskevis D, Kostaki EG, Magiorkinis G, Panayiotakopoulos G, Sourvinos G, Tsiodras S. Full-genome evolutionary analysis of the novel corona virus (2019-nCoV) rejects the hypothesis of emergence as a result of a recent recombination event. Infect Genet Evol. 2020;79:104212. https://doi.org/10.1016/j.meegid.2020.104212 .

10. Huang C, Wang Y, Li X. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China (vol 395, pg 497, 2020). Lancet. 2020;395(10223):496.

11. Kim JY, Choe PG, Oh Y, Oh KJ, Kim J, Park SJ, et al. The first case of 2019 novel coronavirus pneumonia imported into Korea from Wuhan, China: implication for infection prevention and control measures. J Korean Med Sci. 2020;35(5):e61. https://doi.org/10.3346/jkms.2020.35.e61 .

12. Bernard Stoecklin S, Rolland P, Silue Y, Mailles A, Campese C, Simondon A, et al. First cases of coronavirus disease 2019 (COVID-19) in France: surveillance, investigations and control measures, January 2020. Euro Surveill. 2020;25(6):2000094. https://doi.org/10.2807/1560-7917.ES.2020.25.6.2000094 .

13. Giovanetti M, Benvenuto D, Angeletti S, Ciccozzi M. The first two cases of 2019-nCoV in Italy: Where they come from? J Med Virol. 2020;92(5):518–21. https://doi.org/10.1002/jmv.25699 .

14. Corman VM, et al. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Eurosurveillance. 2020;25(3):23–30.

15. Zhang NR, et al. Recent advances in the detection of respiratory virus infection in humans. J Med Virol. 2020;92(4):408–17. https://doi.org/10.1002/jmv.25674 .

16. Dey SK, Rahman MM, Siddiqi UR, Howlader A. Analyzing the epidemiological outbreak of COVID-19: a visual exploratory data analysis approach. J Med Virol. 2020;92(6):632–8. https://doi.org/10.1002/jmv.25743 .

17. Binti Hamzah FA, et al. CoronaTracker: world-wide COVID-19 outbreak data analysis and prediction. 2020.

18. Koczkodaj WW, Mansournia MA, Pedrycz W, Wolny-Dominiak A, Zabrodskii PF, Strzałka D, et al. 1,000,000 cases of COVID-19 outside of China: The date predicted by a simple heuristic. Glob Epidemiol. 2020;2:100023. https://doi.org/10.1016/j.gloepi.2020.100023 .

19. Roosa K, Lee Y, Luo R, Kirpich A, Rothenberg R, Hyman JM, et al. Short-term Forecasts of the COVID-19 Epidemic in Guangdong and Zhejiang, China: February 13–23, 2020. J Clin Med. 2020;9(2):596. https://doi.org/10.3390/jcm9020596 .

20. Nishiura H, Jung SM, Linton NM, Kinoshita R, Yang YC, Hayashi K, et al. The extent of transmission of novel coronavirus in Wuhan, China, 2020. J Clin Med. 2020;9(2):330. https://doi.org/10.3390/jcm9020330 .

21. World Health Organization. Coronavirus disease 2019 (COVID-19) Situation Report – 70. 2020. Available from: https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200330-sitrep-70-covid-19.pdf?sfvrsn=7e0fe3f8_4 .

22. Johns Hopkins University Center for Systems Science and Engineering (CCSE). Novel Coronavirus (COVID-19) Cases Data. 2020. Available from: https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases .

23. Krispin R. Coronavirus. 2020. Available from: https://github.com/RamiKrispin/coronavirus .

24. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning, second edition. Springer Series in Statistics. New York: Springer-Verlag; 2008.

25. Friedman J. Greedy function approximation: a gradient boosting machine. Ann Stat. 2000;29:1189–232. https://doi.org/10.1214/aos/1013203451 .

26. World Health Organization. Transmission of SARS-CoV-2: implications for infection prevention precautions. 2020. Available from: https://www.who.int/news-room/commentaries/detail/transmission-of-sars-cov-2-implications-for-infection-prevention-precautions#:~:text=The%20incubation%20period%20of%20COVID,to%20a%20confirmed%20case .

27. Postnikov EB. Estimation of COVID-19 dynamics “on a back-of-envelope”: Does the simplest SIR model provide quantitative parameters and predictions? Chaos, Solitons Fractals. 2020;135:109841. https://doi.org/10.1016/j.chaos.2020.109841 .

28. Cooper I, Mondal A, Antonopoulos CG. A SIR model assumption for the spread of COVID-19 in different communities. Chaos, Solitons Fractals. 2020;139:110057.


Acknowledgements

The authors thank the Deputy of Research and Technology of Khatam Alanbia University of Technology.

Not applicable.

Author information

Authors and affiliations.

Department of Computer Engineering, School of Engineering, Behbahan Khatam Alanbia University of Technology, Behbahan, Iran

Fatemeh Ahouz

School of Medicine, Shahroud University of Medical Sciences, Shahroud, Iran

Amin Golabpour


Contributions

‘FA’ and ‘AG’ equally contributed to the conception, design of the work, analysis and interpretation of data. In addition, they read and approved the final manuscript.

Corresponding author

Correspondence to Amin Golabpour .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Appendix 1. Point-to-point forecast for all areas in the dataset. Appendix 2. Investigation of the effect of seasonal changes on model performance. Appendix 3. The performance of the proposed method on randomly selected regions. Appendix 4. The results of the proposed method on the updated data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Ahouz, F., Golabpour, A. Predicting the incidence of COVID-19 using data mining. BMC Public Health 21 , 1087 (2021). https://doi.org/10.1186/s12889-021-11058-3


Received : 03 April 2020

Accepted : 13 May 2021

Published : 07 June 2021

DOI : https://doi.org/10.1186/s12889-021-11058-3


  • Data mining

BMC Public Health

ISSN: 1471-2458


Research Papers

Jhu has stopped collecting data as of.

After three years of around-the-clock tracking of COVID-19 data from...

The Johns Hopkins Coronavirus Resource Center has collected, verified, and published local, regional, national, and international pandemic data since it launched in March 2020. From the beginning, the information has been freely available to all — researchers, institutions, the media, the public, and policymakers. As a result, the CRC and its data have been cited in many published research papers and reports. Here we have gathered publications authored by CRC team members that focus on the CRC or its data.

July 14, 2022

Misaligned Federal and State Covid data limits demographic insights

CDC underreports cases and deaths among African American and Hispanic or Latino individuals.

February 17, 2022

Experts Call for Open Public Health Data

Johns Hopkins team highlighted the urgent need for better COVID data collection.

Unifying Epidemiologists and Economists

Researchers from disparate fields join to chart a new path for formulating policies in response to future pandemics.

Mobility Data Supported Social Distancing

Study found that physical distancing was an effective COVID mitigation strategy.

Johns Hopkins Engineers Build COVID Dashboard

Lancet Infectious Diseases published first paper detailing how the global map was built.

Researchers Identify Disparities in COVID Testing

Johns Hopkins team conducted an analysis of state-published demographic data

Data Analytics for the COVID-19 Epidemic


Methodologies for COVID-19 research and data analysis

This collection has closed and is no longer accepting new submissions. 


This  BMC Medical Research Methodology collection of articles has not been sponsored and articles undergo the journal’s standard peer-review process overseen by our Guest Editors, Prof. Dr. Livia Puljak (Catholic University of Croatia in Zagreb, Croatia) and Prof. Dr. Martin Wolkewitz (University of Freiburg, Germany).  

Delays in reporting and publishing trial results during pandemics: cross sectional analysis of 2009 H1N1, 2014 Ebola, and 2016 Zika clinical trials

Pandemic events often trigger a surge of clinical trial activity aimed at rapidly evaluating therapeutic or preventative interventions. Ensuring rapid public access to the complete and unbiased trial record is...


Open science saves lives: lessons from the COVID-19 pandemic

In the last decade Open Science principles have been successfully advocated for and are being slowly adopted in different research communities. In response to the COVID-19 pandemic many publishers and research...

Clinical research activities during COVID-19: the point of view of a promoter of academic clinical trials

During the COVID-19 emergency, IRST IRCCS, an Italian cancer research institute and promoter of no profit clinical studies, adapted its activities and procedures as per European and national guidelines to main...

Instruments to measure fear of COVID-19: a diagnostic systematic review

The COVID-19 pandemic has become a source of fear across the world. Measuring the level or significance of fear in different populations may help identify populations and areas in need of public health and edu...

A risk assessment tool for resumption of research activities during the COVID-19 pandemic for field trials in low resource settings

The spread of severe acute respiratory syndrome coronavirus-2 has suspended many non-COVID-19 related research activities. Where restarting research activities is permitted, investigators need to evaluate the ...

Outbreaks of publications about emerging infectious diseases: the case of SARS-CoV-2 and Zika virus

Outbreaks of infectious diseases generate outbreaks of scientific evidence. In 2016 epidemics of Zika virus emerged, and in 2020, a novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused a p...

Analysis of clinical and methodological characteristics of early COVID-19 treatment clinical trials: so much work, so many lost opportunities

The COVID-19 pandemic continues to rage on, and clinical research has been promoted worldwide. We aimed to assess the clinical and methodological characteristics of treatment clinical trials that have been set...

The unintended consequences of COVID-19 mitigation measures matter: practical guidance for investigating them

COVID-19 has led to the adoption of unprecedented mitigation measures which could trigger many unintended consequences. These unintended consequences can be far-reaching and just as important as the intended o...

Short-term real-time prediction of total number of reported COVID-19 cases and deaths in South Africa: a data driven approach

The rising burden of the ongoing COVID-19 epidemic in South Africa has motivated the application of modeling strategies to predict the COVID-19 cases and deaths. Reliable and accurate short and long-term forec...

Incorporating and addressing testing bias within estimates of epidemic dynamics for SARS-CoV-2

The disease burden of SARS-CoV-2 as measured by tests from various localities, and at different time points present varying estimates of infection and fatality rates. Models based on these acquired data may su...

COVID-19-related medical research: a meta-research and critical appraisal

Since the start of the COVID-19 outbreak, a large number of COVID-19-related papers have been published. However, concerns about the risk of expedited science have been raised. We aimed at reviewing and catego...

Use of out-of-hospital cardiac arrest registries to assess COVID-19 home mortality

In most countries, the official statistics for the coronavirus disease 2019 (COVID-19) take account of in-hospital deaths but not those that occur at home. The study’s objective was to introduce a methodology ...

Strengthening policy coding methodologies to improve COVID-19 disease modeling and policy responses: a proposed coding framework and recommendations

In recent months, multiple efforts have sought to characterize COVID-19 social distancing policy responses. These efforts have used various coding frameworks, but many have relied on coding methodologies that ...

Predictive accuracy of a hierarchical logistic model of cumulative SARS-CoV-2 case growth until May 2020

Infectious disease predictions models, including virtually all epidemiological models describing the spread of the SARS-CoV-2 pandemic, are rarely evaluated empirically. The aim of the present study was to inv...

Alternative graphical displays for the monitoring of epidemic outbreaks, with application to COVID-19 mortality

Classic epidemic curves – counts of daily events or cumulative events over time – emphasise temporal changes in the growth or size of epidemic outbreaks. Like any graph, these curves have limitations: they are ...

The Correction to this article has been published in BMC Medical Research Methodology 2020 20 :265

COVID19-world: a shiny application to perform comprehensive country-specific data visualization for SARS-CoV-2 epidemic

Data analysis and visualization is an essential tool for exploring and communicating findings in medical research, especially in epidemiological surveillance.

Social network analysis methods for exploring SARS-CoV-2 contact tracing data

Contact tracing data of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic is used to estimate basic epidemiological parameters. Contact tracing data could also be potentially used for assessin...

Establishment of a pediatric COVID-19 biorepository: unique considerations and opportunities for studying the impact of the COVID-19 pandemic on children

COVID-19, the disease caused by the highly infectious and transmissible coronavirus SARS-CoV-2, has quickly become a morbid global pandemic. Although the impact of SARS-CoV-2 infection in children is less clin...

Statistical design of Phase II/III clinical trials for testing therapeutic interventions in COVID-19 patients

Because of unknown features of the COVID-19 and the complexity of the population affected, standard clinical trial designs on treatments may not be optimal in such patients. We propose two independent clinical...

Rapid establishment of a COVID-19 perinatal biorepository: early lessons from the first 100 women enrolled

Collection of biospecimens is a critical first step to understanding the impact of COVID-19 on pregnant women and newborns - vulnerable populations that are challenging to enroll and at risk of exclusion from ...

Disease progression of cancer patients during COVID-19 pandemic: a comprehensive analytical strategy by time-dependent modelling

As the whole world is experiencing the cascading effect of a new pandemic, almost every aspect of modern life has been disrupted. Because of health emergencies during this period, widespread fear has resulted ...

A four-step strategy for handling missing outcome data in randomised trials affected by a pandemic

The coronavirus pandemic (Covid-19) presents a variety of challenges for ongoing clinical trials, including an inevitably higher rate of missing outcome data, with new and non-standard reasons for missingness....

Joint analysis of duration of ventilation, length of intensive care, and mortality of COVID-19 patients: a multistate approach

The clinical progress of patients hospitalized due to COVID-19 is often associated with severe pneumonia which may require intensive care, invasive ventilation, or extracorporeal membrane oxygenation (ECMO). T...

COVID-19 prevalence estimation by random sampling in population - optimal sample pooling under varying assumptions about true prevalence

The number of confirmed COVID-19 cases divided by population size is used as a coarse measurement for the burden of disease in a population. However, this fraction depends heavily on the sampling intensity and...

Coronavirus disease 2019 (COVID-19): an evidence map of medical literature

Since the beginning of the COVID-19 outbreak in December 2019, a substantial body of COVID-19 medical literature has been generated. As of June 2020, gaps and longitudinal trends in the COVID-19 medical litera...

Group testing performance evaluation for SARS-CoV-2 massive scale screening and testing

The capacity of the current molecular testing convention does not allow high-throughput and community level scans of COVID-19 infections. The diameter in the current paradigm of shallow tracing is unlikely to ...

Research methodology and characteristics of journal articles with original data, preprint articles and registered clinical trial protocols about COVID-19

The research community reacted rapidly to the emergence of COVID-19. We aimed to assess characteristics of journal articles, preprint articles, and registered trial protocols about COVID-19 and its causal agen...

Using online technologies to improve diversity and inclusion in cognitive interviews with young people

We aimed to assess the feasibility of using multiple technologies to recruit and conduct cognitive interviews among young people across the United States to test items measuring sexual and reproductive empower...

Towards reduction in bias in epidemic curves due to outcome misclassification through Bayesian analysis of time-series of laboratory test results: case study of COVID-19 in Alberta, Canada and Philadelphia, USA

Despite widespread use, the accuracy of the diagnostic test for SARS-CoV-2 infection is poorly understood. The aim of our work was to better quantify misclassification errors in identification of true cases of...

The semi-automation of title and abstract screening: a retrospective exploration of ways to leverage Abstrackr’s relevance predictions in systematic and rapid reviews

We investigated the feasibility of using a machine learning tool’s relevance predictions to expedite title and abstract screening.

Social media as a recruitment platform for a nationwide online survey of COVID-19 knowledge, beliefs, and practices in the United States: methodology and feasibility analysis

The COVID-19 pandemic has evolved into one of the most impactful health crises in modern history, compelling researchers to explore innovative ways to efficiently collect public health data in a timely manner....

Current methods for development of rapid reviews about diagnostic tests: an international survey

Rapid reviews (RRs) have emerged as an efficient alternative to time-consuming systematic reviews—they can help meet the demand for accelerated evidence synthesis to inform decision-making in healthcare. The s...

Methodological challenges of analysing COVID-19 data during the pandemic


RCT indicates randomized clinical trial.

CDC indicates Centers for Disease Control and Prevention; NICE, UK National Institute for Health and Care Excellence; WHO, World Health Organization.


  • Toward a Universal Definition of Post–COVID-19 Condition JAMA Network Open Invited Commentary April 5, 2023 Daniel Pan, MRCP; Manish Pareek, PhD, MRCP


Chaichana U , Man KKC , Chen A, et al. Definition of Post–COVID-19 Condition Among Published Research Studies. JAMA Netw Open. 2023;6(4):e235856. doi:10.1001/jamanetworkopen.2023.5856


Definition of Post–COVID-19 Condition Among Published Research Studies

  • 1 UCL School of Pharmacy, London, United Kingdom
  • 2 Laboratory of Data Discovery for Health (D24H), Hong Kong Science Park, Hong Kong Special Administrative Region, China
  • 3 Centre for Medicines Optimisation Research and Education, University College London Hospitals National Health Service (NHS) Foundation Trust, London, United Kingdom
  • 4 Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region, China
  • 5 Ninewells Hospital, University of Dundee Medical School, Dundee, United Kingdom
  • 6 University College London Hospitals NHS Foundation Trust, London, United Kingdom

As of February 2023, there have been approximately 759 million confirmed cases of COVID-19 globally, 1 and some individuals have experienced persistent symptoms, such as fatigue and shortness of breath, after recovering from the initial illness. The UK National Institute for Health and Care Excellence (NICE), 2 the World Health Organization (WHO), 3 and the US Centers for Disease Control and Prevention (CDC) 4 published their definitions of post–COVID-19 condition (PCC) between December 2020 and October 2021, with some discrepancies between them. Despite the growing volume of research on lasting symptoms of COVID-19, no definition has been universally agreed on. This study aimed to describe how post–COVID-19 condition has been defined to date in studies on this topic.

We conducted a descriptive study on PCC definition following the STROBE reporting guideline and performed the literature search using the PRISMA checklist in PubMed on October 26, 2022. A total of 7087 studies containing information on PCC were identified from February 1, 2020, to October 26, 2022. Definition of PCC (eAppendix in Supplement 1 ), study type, country where the study was conducted, and manuscript submission date were extracted from the publications and are presented chronologically (eAppendix in Supplement 1 ).

Two investigators (U.C. and A.C.) reviewed the studies and screened titles and abstracts independently and cross-checked a 10% sample of the data collected from the studies. When submission dates were not available, the publication dates were used to determine the study time. Exemption from ethical approval was indicated by the University College of London Ethics Committee. SPSS Statistics for Windows, version 28 (IBM Corp) was used for data analysis.

Among 7087 studies, we excluded 6792 that were not relevant to PCC (eg, SARS-CoV-2 vaccines, commentary, systematic review, and full articles in languages other than English). The remaining 295 studies were included, consisting of 2 randomized clinical trials (0.7%), 134 cohort studies (45.4%), 66 cross-sectional studies (22.4%), 13 case-control studies (4.4%), 45 case reports or case series (15.3%), and 35 studies using other designs (11.9%) ( Figure 1 ). Of these, 167 studies (56.6%) were conducted in European countries. We found that only 102 studies (34.6%) used 1 of the 3 organizational definitions for their studies (NICE: 56, WHO: 31, and CDC: 15). A total of 193 studies (65.4%) did not follow any of the 3 definitions for PCC and 6 studies were submitted for publication before NICE released their PCC definition (ie, before December 18, 2020) ( Figure 2 ).

Of 193 studies that did not follow any of 3 definitions, 129 studies (66.8%) used their own definitions for PCC (eg, presence of chronic symptoms that last >5 months or after 2 weeks of SARS-CoV-2 infection), while 64 studies (33.2%) did not define PCC.
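The counts reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the figures quoted in this section; the variable and function names are illustrative and are not part of the study's own SPSS analysis.

```python
# Counts reported in the text; all names here are illustrative.
identified = 7087
excluded = 6792
included = identified - excluded

designs = {
    "randomized clinical trial": 2,
    "cohort": 134,
    "cross-sectional": 66,
    "case-control": 13,
    "case report or case series": 45,
    "other": 35,
}
organizational = {"NICE": 56, "WHO": 31, "CDC": 15}
own_definition = 129
no_definition = 64


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


used_org = sum(organizational.values())
not_org = own_definition + no_definition

# Study designs and definition groups should each add up to the included studies.
assert sum(designs.values()) == included
assert used_org + not_org == included

design_pct = {name: pct(n, included) for name, n in designs.items()}
summary = {
    "used organizational definition": pct(used_org, included),
    "no organizational definition": pct(not_org, included),
    "own definition (of those not following)": pct(own_definition, not_org),
    "undefined (of those not following)": pct(no_definition, not_org),
}

for label, value in {**design_pct, **summary}.items():
    print(f"{label}: {value}%")
```

Note that the last two percentages use the 193 studies that followed none of the 3 definitions as the denominator, not all 295 included studies.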

We found substantial heterogeneity in defining PCC in the published studies, with almost two-thirds (65.4%) not complying with the definitions from the NICE, CDC, or WHO. This study highlights major issues in comparing interventions and outcomes between these reported studies in PCC due to differences in definition. The differences also result in considerable variation when translating findings into clinical management and cost-effectiveness assessments of interventions in patients with PCC. The clinical management of PCC must be evidence-based and include a personalized approach. A clearer definition of PCC is timely so that clinical trial evidence can reliably be applied to clinical management and the well-being of patients with PCC can be improved.

Our study has some limitations. We conducted the literature search only in PubMed. Furthermore, the NICE updated their PCC definition in November 2022 after we finished the study screening. However, the updated definition would not affect our study and would only apply to studies conducted after November 2022.

Accepted for Publication: February 8, 2023.

Published: April 5, 2023. doi:10.1001/jamanetworkopen.2023.5856

Open Access: This is an open access article distributed under the terms of the CC-BY License . © 2023 Chaichana U et al. JAMA Network Open .

Corresponding Author: Li Wei, PhD, UCL School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, United Kingdom ( [email protected] ).

Author Contributions: Ms Chaichana and Dr Wei had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Chaichana, Man, Wong, Wei.

Acquisition, analysis, or interpretation of data: Chaichana, Man, Chen, George, Wilson, Wei.

Drafting of the manuscript: Chaichana, Chen.

Critical revision of the manuscript for important intellectual content: All authors.

Statistical analysis: Chaichana, Chen.

Obtained funding: Wong.

Administrative, technical, or material support: Chen.

Supervision: Man, Wilson, Wei.

Conflict of Interest Disclosures: Ms Chaichana reported receiving a scholarship from the Royal Thai Government outside the submitted work. Dr Man reported receiving grants from the Hong Kong Research Grant Council during the conduct of the study; grants from CW Maplethorpe Fellowship, European Commission Horizon 2020, and the National Institute for Health and Care Research; and personal fees from IQVIA Ltd outside the submitted work. Dr Wong reported receiving grants from the Hong Kong Health and Medical Research Fund, Amgen, Bristol-Myers Squibb, Pfizer, Janssen, Bayer, GSK, Novartis, the Food and Health Bureau of the Government of the Hong Kong Special Administrative Region, the UK National Institute for Health and Care Research, the European Commission, and the National Health and Medical Research Council in Australia outside the submitted work; receiving consulting fees from IQVIA outside the submitted work; and serving as a paid nonexecutive director of Jacobson Medical in Hong Kong and a paid consultant to the World Health Organization. Dr Wilson reported receiving personal fees from the Pfizer Advisory Board and the Roche Drug Safety Monitoring Board outside the submitted work. Dr Wei reported receiving grants from the National Institute Health Research Health Technology Assessment, Hong Kong Innovation and Technology Commission, Diabetes UK, The Cure Parkinson’s Trust, and BOPA-PRUK outside the submitted work. No other disclosures were reported.

Funding/Support: This work was partially supported by grant C7154-20G from the Research Grants Council of Hong Kong under the Collaborative Research Fund Scheme.

Role of the Funder/Sponsor: The funding organization had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Data Sharing Statement: See Supplement 2 .

  • Open access
  • Published: 09 April 2024

Socioeconomic factors associated with use of telehealth services in outpatient care settings during the COVID-19

  • Pearl C. Kim 1 ,
  • Lo-Fu Tan 2 ,
  • Jillian Kreston 3 ,
  • Haniyeh Shariatmadari 1 ,
  • Estella Sky Keyoung 4 ,
  • Jay J. Shen 1 , 5 &
  • Bing-Long Wang 6  

BMC Health Services Research volume 24, Article number: 446 (2024)

81 Accesses


To examine potential changes and socioeconomic disparities in utilization of telemedicine in non-urgent outpatient care in Nevada since the COVID-19 pandemic.

This retrospective cross-sectional analysis of telemedicine used the first nine months of 2019 and 2020 electronic health record data from regular non-urgent outpatient care in a large healthcare provider in Nevada. The dependent variables were the use of telemedicine among all outpatient visits and using telemedicine more than once among those patients who did use telemedicine. The independent variables were race/ethnicity, insurance status, and language preference.

Telemedicine services increased from virtually zero (16 visits out of 237,997 visits) in 2019 to 10.8% (24,159 visits out of 222,750 visits) in 2020. Asians (odds ratio [OR] = 0.85; 95% confidence interval [CI] = 0.81, 0.90) and Latinos/Hispanics (OR = 0.89; 95% CI = 0.85, 0.94) were less likely to use telehealth; Spanish-speaking patients (OR = 0.68; 95% CI = 0.63, 0.73) and other non-English-speaking patients (OR = 0.93; 95% CI = 0.88, 0.97) were less likely to use telehealth; and both Medicare (OR = 0.94; 95% CI = 0.89, 0.99) and Medicaid patients (OR = 0.91; 95% CI = 0.87, 0.97) were less likely to use telehealth than their privately insured counterparts. Patients treated in pediatric (OR = 0.76; 95% CI = 0.60, 0.96) and specialty care (OR = 0.67; 95% CI = 0.65, 0.70) were less likely to use telemedicine than patients who were treated in adult medicine.

Conclusions

Racial/ethnic and linguistic factors were significantly associated with the utilization of telemedicine in non-urgent outpatient care during COVID-19, with a dramatic increase in telemedicine utilization during the onset of the pandemic. Barriers related to socioeconomic factors can be reduced via policy and program interventions.


The coronavirus disease 2019 (COVID-19) pandemic has significantly impacted the landscape of the health care delivery system, accelerating changes in traditional medical care infrastructure; there has been a transition from face-to-face healthcare services to more information and technology-based care through telemedicine. In order to reduce unnecessary exposure to COVID-19, mitigate the spread of the virus, and reduce surges in hospitals and clinics, the implementation of telemedicine as an alternative to in-person outpatient visits became a necessary component for non-emergency healthcare, with many hospitals choosing to discontinue in-person treatment [ 1 , 2 ]. Doctors immediately started offering treatments via telehealth due to payment parity, the removal of administrative obstacles, and the Health Insurance Portability and Accountability Act (HIPAA) of 1996 exemptions by the Department of Health and Human Services [ 3 ]. Additionally, almost all federal, state, and commercial insurers altered their telehealth coverage in reaction to the COVID-19 outbreak, which led to an increase in the usage of telehealth in actual practice. Increasing the population eligible for telehealth was one of many adjustments made as part of the expansion to make it easier for patients to get clinical treatment outside of in-person, face-to-face appointments [ 4 ].

Telehealth refers to the utilization of digital data and telecommunication services to help and encourage clinical health care, according to the Health Resources and Services Administration (HRSA) of the U.S. Department of Health and Human Services. The World Health Organization (WHO) defines telemedicine as the delivery of healthcare services by all healthcare professionals using information and communication technologies where distance is a critical factor [ 5 , 6 ]. Although the terms telehealth and telemedicine are often used interchangeably, telemedicine refers specifically to remote doctor-patient consultations, that is, services delivered by physicians only [ 7 ]. Remote diagnoses, e-prescriptions, e-consultations/specialist appointments, and digital transmissions of medical imaging are examples of telemedicine [ 7 ]. Technologies for telemedicine typically include videoconferencing, the internet, store-and-forward imaging, streaming media, and terrestrial and wireless communications.

Through video imaging and other technologies, telehealth allows medical practitioners to perform services remotely [ 8 ]. According to previous research, telehealth visits may provide patients with health outcomes that are equivalent to regular in-person doctor’s visits, with the added advantage of better access to treatment [ 9 ]. However, increases in telemedicine usage during the pandemic have raised concerns of exacerbating health disparities [ 10 ]. Individuals with insurance coverage, those who do not understand English, and elderly patients are less likely to utilize telehealth than in-person appointments [ 11 , 12 , 13 ]. Conversely, research from 2016 demonstrates that patients without health insurance had a 21% higher likelihood of favoring a telehealth consultation over a regular one [ 14 ]. Pediatric patients who did not speak English (1.7% vs. 2.7%) or who had Medicaid insurance (32.0% vs. 35.9%) had a lower likelihood of completing a consultation via telemedicine [ 15 ]. Also, compared to those aged 31–45, individuals aged 46–60 were more likely to schedule telehealth meetings; compared to White patients, Black patients and patients of other races used telehealth less frequently [ 13 , 16 ]. Another study found that telehealth visits made up more than 50% of all visits, with Asian patients using telehealth less frequently than both Black and White patients. There was no difference in telehealth utilization between Black and White patients below 65 years old; however, utilization among Asian and Hispanic patients was lower than that of same-aged White patients [ 17 ].

Since the start of the COVID-19 pandemic, there has been a large influx of literature on different aspects of telemedicine use. However, these studies offered limited socioeconomic analyses, and very few used clinical data to examine the use of telemedicine in outpatient care during the pandemic. Therefore, the objective of this study was to examine potential changes in the utilization of telemedicine in non-urgent outpatient care in Nevada during the beginning of the COVID-19 pandemic and investigate potential factors associated with telemedicine use before and after the COVID-19 outbreak.

This study was a retrospective secondary data analysis using the first nine months of 2019 and 2020 non-urgent outpatient visit data abstracted from the electronic health record (EHR) system of a large healthcare provider in the Southwest region of the United States. The EHR data were completely de-identified by the healthcare provider organization before being given to the research team for analysis to maintain the anonymity of patients. A Data Use Agreement was approved by both the healthcare organization and the affiliated institution of the academic researchers in order to protect the privacy of individual health information under the Health Insurance Portability and Accountability Act (HIPAA). Further, the study was reviewed and then exempted by the Institutional Review Board (IRB) of the University of Nevada, Las Vegas. The non-urgent outpatient services include three departments: adult care, pediatric care, and specialty care. The utilization of telemedicine services in the two years was compared. The 15 most frequent principal diagnoses for outpatient care, defined using the International Classification of Diseases, Tenth Revision, Clinical Modification, were also compared between 2019 (before the pandemic) and 2020 (during the pandemic).

Sociodemographic factors associated with the use of telemedicine services were examined through multivariable analysis. The dependent variables were the use of telemedicine among all outpatient visits and using telemedicine more than once among those patients who did use telemedicine. The independent variables (i.e., sociocultural factors) included age (age groups: 5 years old or younger, 6–17, 18–24, 25–34, 35–44, 45–54, 55–64, 65–74, >=75 years old), sex (female or male), race/ethnicity (White, Black, Hispanic/Latino, Asian/Pacific Islander, and other), primary spoken language (English, Spanish, and other), and health insurance program (Medicare, Medicaid, and private insurance). Given the small sample size, American Indians and Alaska Natives were included in the other group.

Multiple logistic regression was used to analyze factors associated with telemedicine use in 2020 only, because there was virtually no telemedicine use in 2019. Clinical categories such as adult care, pediatric care, and specialty care (cardiology, gastroenterology, rheumatology, endocrinology, and neurology) were controlled for in the analysis. Sensitivity analysis was conducted by analyzing the three clinical categories separately, and the results were fairly similar to those of analyzing all three together. It is important to note that the EHR data in this study include only audiovisual telemedicine visits.

There were 237,993 and 222,750 outpatient visits in the first nine months of 2019 and 2020, respectively, among which there were 16 and 24,228 telehealth visits in 2019 and 2020, respectively. The 24,228 telehealth visits in 2020 were made by 19,503 patients. Telemedicine services were virtually zero (16 visits out of 237,997 visits) in the first nine months of 2019 but increased to 10.8% (24,159 visits out of 222,750 visits) in the first nine months of 2020. Figure  1 depicts the most frequent principal diagnoses of outpatient visits prior to and during the COVID-19 pandemic. The top 15 most frequent diagnoses for all patient visits were similar between the two years, while the volume of long-term (current) drug therapy increased by 158% (21,907 in 2019 to 56,611 in 2020). The five most frequent primary diagnoses for all patients in 2019 and 2020 were exam, long-term drug therapy, type 2 diabetes without complications, disorders of lipoprotein metabolism, and hypertension. Specifically for telemedicine users in 2020, type 2 diabetes, disorders of lipoprotein metabolism, and hypertension were the top three primary diagnoses, showing consistency between telemedicine and in-person diagnoses (Fig.  1 ). Characteristics of telemedicine users in outpatient care during the COVID-19 period in 2020 are shown in Table  1 . Among the 19,503 patients who used telemedicine in 2020, over 60% were females, about half were minorities, about 13% spoke languages other than English, 42.7% were covered by Medicare, and 16.0% were covered by Medicaid. In addition, about two-thirds were treated in adult medicine, and 18.1% had more than one telemedicine visit (Table  1 ).
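The headline proportions in this paragraph follow from simple ratios. As a minimal sketch, the Python snippet below recomputes the telemedicine share of visits and the relative change in long-term drug therapy volume. It uses the counts from the sentence that reports the 10.8% share; the paragraph also quotes slightly different totals (237,993 visits in 2019 and 24,228 telehealth visits in 2020), which would shift the shares only marginally. The function names are illustrative.

```python
def share(part, whole):
    """Return part/whole as a percentage."""
    return 100 * part / whole


def percent_change(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return 100 * (after - before) / before


# Figures quoted in the paragraph above.
tele_2019 = share(16, 237_997)
tele_2020 = share(24_159, 222_750)
drug_therapy_change = percent_change(21_907, 56_611)

print(f"Telemedicine share 2019: {tele_2019:.3f}%")
print(f"Telemedicine share 2020: {tele_2020:.1f}%")
print(f"Long-term drug therapy change: {drug_therapy_change:.0f}%")
```

A 158% increase corresponds to roughly 2.6 times the 2019 volume, which is more than a doubling.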

Fig. 1 The 15 most frequent principal diagnoses of outpatient visits before and during the COVID-19 pandemic. Notes: [ ] indicates ICD-10 codes; COPD = chronic obstructive pulmonary disease; T2D = type 2 diabetes

Table  2 describes factors associated with the utilization of telemedicine services among all outpatient visits during the COVID-19 period. Among all outpatient visits in 2020, older and younger age groups were less likely to use telemedicine compared to the median age group of 35–44 years (adjusted odds ratio [aOR] = 0.74, 95% confidence interval [CI] = [0.68–0.81] for ages 18–24; aOR = 0.85, CI = [0.80–0.89] for ages 55–64; aOR = 0.71, CI = [0.67–0.76] for ages 65–74; and aOR = 0.69, CI = [0.65–0.74] for ages 75 or older); males, as compared to females, were less likely to use telemedicine (aOR = 0.86, CI = [0.83–0.88]); Asians and Latinos/Hispanics, as compared to Whites, were less likely to use telehealth (aOR = 0.85, CI = [0.81–0.90] for Asians, and aOR = 0.89, CI = [0.85–0.94] for Latinos/Hispanics); Spanish-speaking patients and other non-English-speaking patients, as compared with English-speaking patients, were less likely to use telehealth (aOR = 0.68, CI = [0.63–0.73] for Spanish-speaking patients and aOR = 0.93, CI = [0.88–0.97] for patients speaking other languages); and both Medicare and Medicaid patients were less likely to use telehealth than their privately insured counterparts (aOR = 0.94, CI = [0.89–0.99] for Medicare patients and aOR = 0.91, CI = [0.87–0.97] for Medicaid patients). In addition, patients treated in pediatric and specialty care were less likely to use telemedicine (aOR = 0.76, CI = [0.60–0.96] for pediatric care, and aOR = 0.67, CI = [0.65–0.70] for specialty care), as compared with patients who were treated in adult medicine.
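An odds ratio from a logistic regression is the exponential of a model coefficient, and a Wald 95% interval is exp(beta ± 1.96 × SE), so a reported interval should be roughly symmetric around the point estimate on the log scale. The sketch below back-calculates the implied coefficient and standard error from some of the adjusted odds ratios quoted above. It assumes Wald intervals, which the source does not state, and the function name is hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def implied_log_scale(or_point, ci_low, ci_high):
    """Back-calculate the log-odds coefficient and standard error.

    Assumes a Wald interval, exp(beta +/- Z_95 * se); this is an assumption,
    since the source does not say how its intervals were computed.
    """
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    # Geometric midpoint of the interval; for a Wald interval it should be
    # close to the reported odds ratio (up to rounding of published values).
    midpoint = math.sqrt(ci_low * ci_high)
    return beta, se, midpoint


# Adjusted odds ratios quoted in the paragraph above (Table 2 of the source).
reported = {
    "male vs female": (0.86, 0.83, 0.88),
    "Asian vs White": (0.85, 0.81, 0.90),
    "Hispanic/Latino vs White": (0.89, 0.85, 0.94),
    "Spanish vs English": (0.68, 0.63, 0.73),
    "specialty vs adult care": (0.67, 0.65, 0.70),
}

for label, (or_point, low, high) in reported.items():
    beta, se, mid = implied_log_scale(or_point, low, high)
    print(f"{label}: beta={beta:.3f}, se={se:.3f}, interval midpoint={mid:.3f}")
```

An interval whose geometric midpoint sits far from the reported odds ratio, or whose bound equals the point estimate, would suggest a transcription error rather than a genuine Wald interval.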

Furthermore, factors associated with frequent telemedicine use are shown in Table  3 . Among patients who did use telehealth services, males, as compared to females, were less likely to have multiple telehealth visits (aOR = 0.89, CI = [0.82–0.96]); Asians, as compared to their White counterparts, were less likely to have multiple telehealth visits (aOR = 0.85, CI = [0.81–0.90]); patients speaking languages other than English or Spanish, as compared with their English-speaking counterparts, were less likely to have multiple telehealth outpatient visits (aOR = 0.86, CI = [0.74–0.99]); whereas both Medicare and Medicaid patients were more likely to have multiple telehealth visits than patients covered by private health insurance (aOR = 1.33, CI = [1.16–1.52] for Medicare patients and aOR = 1.19, CI = [1.07–1.34] for Medicaid patients).

This cross-sectional analysis of telemedicine use during the emergence of the COVID-19 pandemic (January–September 2020) demonstrated dramatic increases in telemedicine utilization in non-urgent outpatient care compared to the pre-pandemic period (January–September 2019). Contributing factors that could have affected telemedicine use during the pandemic include the following policy adaptations and guidance regulations. In February 2020, the Centers for Disease Control and Prevention (CDC) issued guidelines for social distancing practices and recommended that healthcare facilities and providers offer clinical services through virtual care in order to mitigate the spread of the virus and reduce hospital case surges. The increase in telemedicine utilization during COVID-19 might also be related to the policy changes and regulatory waivers from the CDC in response to COVID-19 and provisions of the U.S. Coronavirus Aid, Relief and Economic Security Act on March 6, 2020. Under the emergency policies and waivers, no preexisting patient-provider relationship was required, virtual visits from the patient’s home and audio-only visits were allowed, and reimbursement for telemedicine improved dramatically through the expansion of Medicare’s telehealth coverage [6].

During the pandemic, the increase in telemedicine utilization diverged significantly across clinical specialties, medical conditions, and patient demographics. Delays in medical care were observed in the U.S., driven by concerns about possible COVID-19 exposure and by limits on non-essential care; the CDC reported that about 41% of U.S. adults experienced delayed medical care in 2020 [18]. The scope of deferred care during the pandemic was also evident in our study. We observed that the number of visits for the fifteen most frequent principal diagnoses in outpatient care remained similar between the pre-pandemic and pandemic periods; however, the volume of long-term (or current) drug therapy doubled from 2019 to 2020. Our findings indicate that telemedicine may have provided the opportunity to dramatically increase drug therapy and thereby defer patient care during the pandemic. Studies showed that routine and chronic care management were the most reported types of deferred care [19, 20], yet patients with common chronic conditions showed lower use of telemedicine during the pandemic [10]. Contrary to previous studies, our study indicated that the scope of outpatient care for common chronic conditions such as hypertension and diabetes did not change with the use of telemedicine during the pandemic.

Telemedicine use also varied across socioeconomic characteristics of patients. Our study demonstrates that inequities exist with regard to utilization of outpatient care via telemedicine for socioeconomically diverse patients during the pandemic. We found that younger and older age, Asian and Hispanic/Latino race, Spanish-speaking patients, and patients with Medicaid insurance were independently associated with lower utilization of telemedicine. Our findings are consistent with prior research regarding telemedicine use in overall and primary care [10, 13, 19, 21, 22].

As older adults tend to have slower rates of technology adoption, less experience with technology, and low digital health literacy, a common concern has arisen regarding their access to telemedicine and virtual care [13, 23, 24, 25]. Our findings highlight this disparity in telemedicine access, as older adults are less likely to utilize telemedicine than their younger counterparts. Notably, patients aged 55 to 64, 65 to 74, and 75 and older had 15%, 29%, and 31% lower odds of accessing telemedicine than adults aged 35 to 44. More studies must be done to identify potential barriers to telemedicine access in young people; a potential hypothesis is that younger patients may have different perceptions of their medical needs, potentially contributing to less telemedicine usage. Additionally, our study found that males had less telemedicine use compared to females during the pandemic [13, 26]. This is consistent with past studies that indicated that females tend to have more medical visits and prefer telemedicine care over in-person care [26].
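The percentages quoted for the older age groups follow directly from the adjusted odds ratios reported in the results (0.85, 0.71, and 0.69). A minimal sketch of that arithmetic is shown here; the function name `percent_change_in_odds` is hypothetical, and the result describes a change in odds, which is not the same as a change in probability.

```python
def percent_change_in_odds(aor):
    """Convert an odds ratio into a percent change in odds versus the reference group."""
    return round((aor - 1.0) * 100)

# aORs reported for ages 55-64, 65-74, and older than 75 (reference: ages 35-44)
for label, aor in [("55-64", 0.85), ("65-74", 0.71), ("75+", 0.69)]:
    print(label, percent_change_in_odds(aor))
```

A negative value means lower odds than the reference group; an aOR above 1, such as the 1.33 reported for Medicare patients with multiple telehealth visits, yields a positive value.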

Furthermore, our study demonstrated that socioeconomic disparities in accessing telemedicine care were present during the pandemic. Both Asians and Hispanics/Latinos were less likely to use telemedicine compared to Whites. While Asians have been known to have high technology adoption rates and broadband service use, sociocultural factors may have played a role in our findings [27]. Asian and Hispanic/Latino patients may experience specific language and cultural barriers, which may explain lower telemedicine use, as they may prefer to see their providers in person in order to retain communication and familiarity. While their providers might have conducted some telemedicine visits, follow-up visits could be done by mid-level providers, leading to discontinuities in the patient-doctor relationship. A recent study in California identified language as an essential barrier to telehealth use, and studies of primary care and specialty ambulatory clinics have shown that non-English speaking patients were less likely to utilize telemedicine [13, 28]. Our study also showed language as an essential barrier to telemedicine use in outpatient care, as non-English speaking patients were less likely to use telemedicine. Compared with in-person visits, effective communication was a common challenge during telemedicine [29]. Barriers to speaking up, asking questions, and establishing a provider-patient relationship via telemedicine can create challenges for non-English speaking patients. Federal laws require Medicare and Medicaid to provide translation services to patients in their preferred language; however, not all providers offer language services [28]. The telemedicine platform of the healthcare provider in our study offered a third-party vendor that could be contacted for translation services during live telemedicine visits. However, even with these accommodations, our study showed that telemedicine was still challenging for patients with limited English proficiency, making them less likely to repeatedly use telemedicine. The provision of linguistically appropriate care in telemedicine will not only help reduce language disparities but also eliminate other systematic and cultural barriers for patients with limited English proficiency.

There are three main types of healthcare payers in the United States. Government payers include U.S. government-funded health insurance plans such as Medicare and Medicaid. Medicare is federal health insurance for anyone age 65 and older or people with certain disabilities and conditions. Medicaid is a state and federal program that provides health coverage to people with limited income and resources. Most individual and group health insurance plans are covered by private payers, which include non-insurance payment such as paying cash directly. Our study defines private insurance as both commercial and private payers. Our study found that patients with government-funded insurance, both Medicare and Medicaid patients, were less likely to use telemedicine. However, patients who did use telemedicine tended to use it multiple times. Past studies have shown mixed findings regarding telehealth use and payer status. Previous studies have shown that Medicaid and Medicare coverage is associated with lower healthcare utilization and telemedicine use [13]. By contrast, a Missouri study showed that Medicaid and Medicare patients had relatively higher odds of telehealth use compared to patients with private insurance during the COVID-19 telehealth expansion [30]. Findings from our study may reflect the early implementation of telemedicine at our healthcare system. Our health system has provided telemedicine in urgent care for private insurance patients since 2014, Medicaid patients since 2016, and Medicare patients since 2018. Therefore, patients with private insurance were more familiar with telemedicine than patients with government payers, potentially leading to more telemedicine visits for those patients during the pandemic. While telemedicine was used only for urgent care prior to the pandemic, familiarity with the technology may have led to increases in non-urgent outpatient visits. Despite having less telemedicine use overall, patients with government-funded health insurance who had telemedicine visits tended to utilize telemedicine more frequently. A potential explanation for this finding lies in the advantages of telemedicine: it reduces transportation costs, is convenient, and provides quality care to patients with mobility limitations.

Our study also showed medical specialty as a potential contributing factor associated with telemedicine use during the pandemic. Patients in pediatric and specialty care were less likely to use telemedicine compared to patients in adult medicine. This may have been due to patient concerns regarding quality of care; providers' preference for telemedicine care may also have differed by medical specialty, in addition to the need for in-person care [31]. However, reasons for this preference among physicians are unclear and may suggest an area for further research in order to identify specific barriers.

We acknowledge that our study had limitations. First, the data in this study were from a sample of large outpatient care providers in a single state, meaning that our findings may not be generalizable across all virtual encounters conducted during the study period. Second, our study defined the first nine months of 2020 as a post-COVID-19 period without considering any COVID-19-related policy implementations such as Medicare’s telehealth coverage expansion on March 17, 2020 and the healthcare system shutdown in Nevada on March 15, 2020 due to nationwide guidelines. Therefore, our findings might underestimate the impact of the COVID-19 pandemic on telemedicine use. Further, studies of telemedicine utilization beyond 2020 will help us to understand the post-pandemic impact on socioeconomic factors associated with telehealth use.

Telemedicine utilization has increased dramatically since the pandemic, assisted by technological advancement and availability. Nevertheless, disparities exist for different races/ethnicities and non-English speakers. To reduce barriers related to socioeconomic factors, policy and program interventions must be improved to meet the new healthcare demands created by the transformation of healthcare delivery models. For example, enhancing language-related communication supports can likely reduce disparities in patient telehealth use for those with varied sociocultural backgrounds and socioeconomic statuses.

Data availability

Data will be available based on request. Please contact Jillian Kreston at [email protected].

Colbert GB, Venegas-Vera AV, Lerma EV. Utility of telemedicine in the COVID-19 era. Rev Cardiovasc Med. 2020;21(4):583–7.

Keesara S, Jonas A, Schulman K. Covid-19 and Health Care’s Digital Revolution. N Engl J Med. 2020;382(23):e82.

Kruse CS, Krowski N, Rodriguez B, Tran L, Vela J, Brooks M. Telehealth and patient satisfaction: a systematic review and narrative analysis. BMJ Open. 2017;7(8):e016242.

Campos-Castillo C, Anthony D. Racial and ethnic differences in self-reported telehealth use during the COVID-19 pandemic: a secondary analysis of a US survey of internet users from late March. J Am Med Inf Assoc. 2021;28(1):119–25.

A health telematics policy in support of WHO’s Health-for-all strategy for global health development: report of the WHO Group Consultation on Health Telematics, 11–16 December, Geneva, 1997. Geneva, Switzerland: World Health Organization; 1998.

Shaver J. The state of Telehealth before and after the COVID-19 pandemic. Prim Care. 2022;49(4):517–30.

Doraiswamy S, Abraham A, Mamtani R, Cheema S. Use of Telehealth during the COVID-19 pandemic: scoping review. J Med Internet Res. 2020;22(12):e24087.

Weinstein RS, Lopez AM, Joseph BA, Erps KA, Holcomb M, Barker GP, et al. Telemedicine, telehealth, and mobile health applications that work: opportunities and barriers. Am J Med. 2014;127(3):183–7.

Uscher-Pines L, Mehrotra A. Analysis of teladoc use seems to indicate expanded access to care for patients without prior connection to a provider. Health Aff (Millwood). 2014;33(2):258–64.

Patel SY, Mehrotra A, Huskamp HA, Uscher-Pines L, Ganguli I, Barnett ML. Variation in Telemedicine Use and Outpatient Care during the COVID-19 pandemic in the United States. Health Aff (Millwood). 2021;40(2):349–58.

Wegermann K, Wilder JM, Parish A, Niedzwiecki D, Gellad ZF, Muir AJ, et al. Racial and socioeconomic disparities in utilization of Telehealth in patients with Liver Disease during COVID-19. Dig Dis Sci. 2022;67(1):93–9.

Hsueh L, Huang J, Millman AK, Gopalan A, Parikh RK, Teran S, et al. Disparities in Use of Video Telemedicine among patients with Limited English proficiency during the COVID-19 pandemic. JAMA Netw Open. 2021;4(11):e2133129.

Eberly LA, Kallan MJ, Julien HM, Haynes N, Khatana SAM, Nathan AS, et al. Patient characteristics Associated with Telemedicine Access for primary and Specialty Ambulatory Care during the COVID-19 pandemic. JAMA Netw Open. 2020;3(12):e2031640.

Polinski JM, Barker T, Gagliano N, Sussman A, Brennan TA, Shrank WH. Patients’ satisfaction with and preference for Telehealth visits. J Gen Intern Med. 2016;31(3):269–75.

Tilden DR, Datye KA, Moore DJ, French B, Jaser SS. The Rapid Transition to Telemedicine and its Effect on Access to care for patients with type 1 diabetes during the COVID-19 pandemic. Diabetes Care. 2021;44(6):1447–50.

Friedman EE, Devlin SA, Gilson SF, Ridgway JP. Age and racial disparities in Telehealth Use among people with HIV during the COVID-19 pandemic. AIDS Behav. 2022;26(8):2686–91.

Stevens JP, Mechanic O, Markson L, O’Donoghue A, Kimball AB. Telehealth Use by Age and Race at a single Academic Medical Center during the COVID-19 pandemic: Retrospective Cohort Study. J Med Internet Res. 2021;23(5):e23905.

Czeisler M, Marynak K, Clarke KEN, Salah Z, Shakya I, Thierry JM, et al. Delay or Avoidance of Medical Care because of COVID-19-Related concerns - United States, June 2020. MMWR Morb Mortal Wkly Rep. 2020;69(36):1250–7.

Ponce SA, Wilkerson M, Le R, Nápoles AM, Strassle PD. Inability to get needed health care during the COVID-19 pandemic among a nationally representative, diverse population of U.S. adults with and without chronic conditions. BMC Public Health. 2023;23(1):1868.

Rose GL, Bonnell LN, Clifton J, Natkin LW, Hitt JR, O’Rourke-Lavoie J. Outcomes of Delay of Care after the onset of COVID-19 for patients managing multiple chronic conditions. J Am Board Fam Med. 2022;35(6):1081–91.

Lucas JW, Villarroel MA. Telemedicine Use among adults: United States, 2021. NCHS Data Brief. 2022;445:1–8.

Lahat A, Shatz Z. Telemedicine in clinical gastroenterology practice: what do patients prefer? Th Adv Gastroenterol. 2021;14:1756284821989178.

Gordon NP, Hornbrook MC. Differences in Access to and preferences for using patient portals and other eHealth technologies based on race, ethnicity, and age: a database and survey study of seniors in a large Health Plan. J Med Internet Res. 2016;18(3):e50.

Levine DM, Lipsitz SR, Linder JA. Trends in seniors’ use of Digital Health Technology in the United States, 2011–2014. JAMA. 2016;316(5):538–40.

Tartaglia E, Vozzella EA, Iervolino A, Egidio R, Buonocore G, Perrone A, et al. Telemedicine: a cornerstone of healthcare assistance during the SARS-Cov2 pandemic outbreak but also a great opportunity for the near future. Smart Health (Amst). 2022;26:100324.

Hung M, Ocampo M, Raymond B, Mohajeri A, Lipsky MS. Telemedicine among adults living in America during the COVID-19 pandemic. Int J Environ Res Public Health. 2023;20(9).

Ryan C. Computer and internet use in the United States: 2016. United States Census Bureau, Commerce USDo; 2018 August.

Rodriguez JA, Saadi A, Schwamm LH, Bates DW, Samal L. Disparities in Telehealth Use among California patients with Limited English proficiency. Health Aff (Millwood). 2021;40(3):487–95.

Gordon HS, Solanki P, Bokhour BG, Gopal RK. I’m not feeling like I’m part of the conversation patients’ perspectives on communicating in clinical video telehealth visits. J Gen Intern Med. 2020;35(6):1751–8.

Pierce RP, Stevermer JJ. Disparities in the use of telehealth at the onset of the COVID-19 public health emergency. J Telemed Telecare. 2023;29(1):3–9.

Nies S, Patel S, Shafer M, Longman L, Sharif I, Pina P. Understanding Physicians’ preferences for Telemedicine during the COVID-19 Pandemic: cross-sectional study. JMIR Form Res. 2021;5(8):e26565.

Acknowledgements

Not applicable.

Author information

Authors and affiliations.

Department of Health Care Administration and Policy, School of Public Health, University of Nevada in Las Vegas, Las Vegas, USA

Pearl C. Kim, Haniyeh Shariatmadari & Jay J. Shen

InnovAge PACE, San Bernardino, USA

Optum Care, United Health Group, Las Vegas, USA

Jillian Kreston

Orange County School of the Arts, Santa Ana, USA

Estella Sky Keyoung

Center for Health Disparities and Research, School of Public Health, University of Nevada in Las Vegas, Las Vegas, USA

Jay J. Shen

School of Health Policy and Management, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China

Bing-Long Wang

Contributions

P.K. wrote the main manuscript text, interpreted the data, and substantively revised it. L.T. drafted the manuscript and interpreted the data. J.K. made substantial contributions to the acquisition and analysis of data. E.K. made a substantial contribution to revisions and prepared Fig. 1 and Tables 1, 2 and 3. H.S. drafted the manuscript. J.S. made substantial contributions to the design of the work, data analysis, and revision. B.W. made substantial contributions to data interpretation and revision.

Corresponding authors

Correspondence to Jay J. Shen or Bing-Long Wang .

Ethics declarations

Ethics approval and consent to participate.

Our study used completely deidentified EHR data in accordance with relevant guidelines and regulations. The Institutional Review Board (IRB) of the University of Nevada, Las Vegas waived the need for ethical approval and for informed consent.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Kim, P.C., Tan, LF., Kreston, J. et al. Socioeconomic factors associated with use of telehealth services in outpatient care settings during the COVID-19. BMC Health Serv Res 24 , 446 (2024). https://doi.org/10.1186/s12913-024-10797-4

Received : 11 October 2023

Accepted : 28 February 2024

Published : 09 April 2024

DOI : https://doi.org/10.1186/s12913-024-10797-4

  • Telemedicine
  • Socioeconomic disparities

BMC Health Services Research

ISSN: 1472-6963

National Center for Science and Engineering Statistics

International Collaboration in Selected Critical and Emerging Fields: COVID-19 and Artificial Intelligence

April 11, 2024

Research collaboration is a critical strategy for pooling resources, sharing expertise, and accelerating innovation, and institutions may use collaboration to synthesize novel ideas and bridge knowledge or material gaps (Katz and Hicks 1997; Lee, Walsh, and Wang 2015; Wagner et al. 2001). Ongoing research on the transformative potential of artificial intelligence (AI) and the mitigation and treatment of COVID-19 in 2020 are two cases in which scientific progress has been important. Both fields have been recognized as national priorities ( https://www.whitehouse.gov/priorities/ ) and have complex challenges that both domestic and international institutions are motivated to overcome.

A country’s collaboration patterns, both domestic and international, can indicate the presence of expertise or the necessity of knowledge and resource sharing, as countries tend to collaborate internationally less in fields when they have sufficient resources within their own borders (Chinchilla-Rodríguez, Sugimoto, and Larivière 2019). International research collaboration can provide a rapid response to societal challenges, including public health crises (Carvalho et al. 2023) or technological paradigm shifts, and strong international collaborators play a large role in shaping the direction and priorities of research fields worldwide (Leydesdorff and Wagner 2009). A concentration on domestic research can indicate the presence of sufficient domestic knowledge and resources or an interest in preserving in-house expertise. This InfoBrief examines the extent to which top producers of science and engineering (S&E) articles engaged in domestic and international collaborations in AI and COVID-19 research.

Growth in Artificial Intelligence Articles

Between 2003 and 2022, the number of published articles in AI grew faster relative to the number of articles in computer science (see table SPBS-22 in National Science Board, National Science Foundation. 2023. Publications Output: U.S. Trends and International Comparisons. Science and Engineering Indicators 2024. NSB-2023-33. Available at https://ncses.nsf.gov/pubs/nsb202333 ), due in part to the newness of the AI field compared with the more established field of computer science. AI articles worldwide grew by 1,100% during this period, reaching 123,402 articles in 2022 (see NSB-2023-33, table SPBS-99), or 4% of all S&E publications globally (see NSB-2023-33, figure PBS-3), compared with 290% growth in computer science articles (see NSB-2023-33, table SPBS-22). From 2017 to 2022, the six countries with the highest overall publication outputs (see NSB-2023-33, figure PBS-3) were also the countries with the highest AI research output (China, India, the United States, Japan, the United Kingdom, and Germany) (figure 1). In 2022, the top two producers of AI research articles were China (42,524 articles, or 35% of total AI publication output) and India (22,557, or 18%), followed by the United States (12,642, or 10%). Germany, Japan, and the United Kingdom published similar numbers of publications, ranging between 3,700 and 4,700 articles (3%–4%).

AI articles, by selected country: 2003–22

AI = artificial intelligence.

AI article counts refer to publications from a selection of conference proceedings and peer-reviewed journals in science and engineering fields from Scopus. The subset of AI articles was determined by All Science Journal Classification subject matter classification, supplemented by an algorithm that used a series of article characteristics to determine the field of papers published in multidisciplinary journals. Articles are classified by their year of publication and are assigned to a region, country, or economy on the basis of the institutional addresses of the authors listed in the article. Articles are credited on a whole count basis (i.e., for articles produced by authors from different countries, each country is credited for one article). Data for all regions, countries, and economies are available in supplemental table SPBS-99 in Publications Output: U.S. Trends and International Comparisons ( https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-99 ).
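The whole count basis described in the note above can be illustrated with a minimal sketch. The input format (one list of author countries per article) and the function name `whole_counts` are assumptions made for illustration and do not reflect the actual Scopus processing pipeline; the sample articles are invented.

```python
from collections import Counter

def whole_counts(articles):
    """Credit each country with one article for every article it appears on.

    `articles` is a list in which each item lists the countries of the
    authors' institutional addresses for one article (hypothetical format).
    """
    counts = Counter()
    for author_countries in articles:
        # A country is credited once per article, however many of its authors appear
        for country in set(author_countries):
            counts[country] += 1
    return counts

# Three invented articles, used only to show the counting rule
sample = [
    ["United States", "China", "China"],
    ["India"],
    ["United States", "Germany"],
]
print(whole_counts(sample))
```

Under whole counting, country totals can sum to more than the number of articles, because an internationally coauthored article is credited in full to each participating country.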

National Center for Science and Engineering Statistics; Science-Metrix; Elsevier, Scopus abstract and citation database, accessed April 2023.

Collaboration Trends in Artificial Intelligence Articles

Coauthorship trends on S&E articles shed light on overall collaboration practices. The affiliations of authors to their home institutions and countries are used to infer whether collaboration has occurred across institutions, both domestically and internationally. Three types of collaboration are detailed in this InfoBrief, and an article is the unit of analysis. An article with at least one author from an institution of a given country is classified as one of three categories: an international collaboration , if an author from any other country is present; a domestic collaboration , if all authors are from the same country, but are affiliated with more than one institution; or a single institution article if all authors share the same institutional affiliation or the article is solo authored.
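The three categories defined above can be expressed as a short decision rule. The sketch below assumes each article is represented as a list of (institution, country) pairs, one per author; that representation, the function name `classify_article`, and the example institutions are hypothetical.

```python
def classify_article(affiliations, country):
    """Classify one article from the perspective of `country`.

    `affiliations` is a list of (institution, country) tuples, one per author.
    Returns None when no author is affiliated with `country`.
    """
    countries = {c for _, c in affiliations}
    if country not in countries:
        return None
    # Any author from another country makes the article an international collaboration
    if len(countries) > 1:
        return "international collaboration"
    # Same country, more than one institution
    institutions = {inst for inst, _ in affiliations}
    if len(institutions) > 1:
        return "domestic collaboration"
    # One institution only, which also covers solo-authored articles
    return "single institution"

print(classify_article([("Univ A", "Japan"), ("Univ B", "Japan")], "Japan"))
```

The international check comes first, so an article with authors from two institutions in one country plus a foreign coauthor counts as international rather than domestic, matching the definitions in the text.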

Collaboration Trends

From 2017 to 2022, 37% of U.S. research papers on AI were the result of international collaboration, placing the United States in the middle of the six top producers of AI research papers: the United Kingdom (61%) and Germany (40%) had higher rates of internationally collaborative research, and Japan (25%), China (17%), and India (10%) had lower rates ( figure 2 ). Rates of international collaboration for the United States were slightly lower for AI research papers than for all S&E research papers (37% versus 39%). Likewise, across the other five top producers of AI research papers, rates of international collaboration were lower for AI research papers than for all S&E research papers. Compared with other countries, China had the greatest proportion of AI papers that were domestic collaborations (41%). Across the six top-producing countries, articles produced by a single institution were more common in AI research than in all S&E research (42% versus 26%).

International collaboration, domestic collaboration, and single institution publications on AI research and overall international collaboration on all S&E research, by selected country: 2017–22

AI = artificial intelligence; S&E = science and engineering.

AI articles are assigned to a country, or economy on the basis of the institutional addresses of the authors listed in the article. The subset of AI articles was determined by All Science Journal Classification subject matter classification, supplemented by an algorithm that used a series of article characteristics to determine the field of papers published in multidisciplinary journals. Articles are credited on a whole count basis (i.e., for articles produced by authors from different countries, each country is credited for one article). The percentages refer to the proportion of AI articles to feature collaboration or to the proportion of general articles across all fields to feature collaboration. Articles were excluded when one or more coauthored publications had incomplete address information in the Scopus database; therefore, they cannot be reliably identified as international or domestic collaborations. Data for all regions, countries, and economies are available in supplemental table SPBS-99 and supplemental table SPBS-33 in Publications Output: U.S. Trends and International Comparisons ( https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-99 and https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-33 ).

International Collaboration

Overall, scientific research has become increasingly collaborative over time (Gazni, Sugimoto, and Didegah 2012; Wuchty, Jones, and Uzzi 2007). Although the rate of international collaboration in AI publications has been smaller than the rate of international collaboration across all S&E fields over the past 5 years, international collaboration in AI articles has gradually increased overall between 2003 and 2022. By country, international collaborations in AI increased in Japan (from 15% to 28%), the United States (from 24% to 39%), Germany (from 37% to 42%), and the United Kingdom (from 36% to 66%) ( figure 3 ). Over this same time period, India and China did not show an increasing trend, despite some fluctuation. For example, after China exhibited a period of increased international collaboration in AI research, from 7% in 2009 to 23% in 2015, the rate has since decreased to 16% in 2022.

International collaboration on AI articles, by selected country: 2003–22

AI article counts refer to publications from a selection of conference proceedings and peer-reviewed journals in science and engineering fields from Scopus. The subset of AI articles was determined by All Science Journal Classification subject matter classification, supplemented by an algorithm that used a series of article characteristics to determine the field of papers published in multidisciplinary journals. Articles are assigned to a country, or economy on the basis of the institutional addresses of the authors listed in the article. Articles are credited on a whole count basis (i.e., for articles produced by authors from different countries, each country is credited for one article). The percentages refer to the proportion of AI articles to feature collaboration. Data for all regions, countries, and economies are available in supplemental table SPBS-99 in Publications Output: U.S. Trends and International Comparisons ( https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-99 ).

Domestic Collaborations and Single Institution Publications

The proportion of single institution publications in AI decreased over time in the United States, from 48% in 2003 to 31% in 2022 ( figure 4 ). Despite this decrease, the proportion of U.S. single institution publications remained higher in AI research than in all S&E research, which decreased from 36% to 20% over the same time period. Over time, the rate of domestic collaboration in AI between U.S. institutions remained relatively stable from 2003 to 2022, ranging between 25% and 30%. In China, the proportion of single institution publications in AI decreased from 59% to 38% between 2003 and 2022, albeit with more fluctuation. China’s proportion of single institution publications both in AI papers and among all S&E fields were similar until 2007, after which the proportion of single institution papers in AI research became higher, while the overall proportion of single institution papers in all S&E research continued to decrease.

Collaborative and single institution articles on AI and single institution articles on all S&E research in the United States and China: 2003–22

Article counts refer to publications from a selection of conference proceedings and peer-reviewed journals in S&E fields from Scopus. The subset of AI articles was determined by All Science Journal Classification subject matter classification, supplemented by an algorithm that used a series of article characteristics to determine the field of papers published in multidisciplinary journals. Articles are assigned to a country or economy on the basis of the institutional addresses of the authors listed in the article. Articles are credited on a whole count basis (i.e., for articles produced by authors from different countries, each country is credited for one article). The percentages refer to the proportion of AI articles that feature collaboration or to the proportion of general articles across all fields that feature collaboration. Coauthored articles with incomplete address information in the Scopus database were excluded because they cannot be reliably identified as international or domestic collaborations. Data for all regions, countries, and economies are available in supplemental table SPBS-99 and supplemental table SPBS-33 in Publications Output: U.S. Trends and International Comparisons ( https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-99 and https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-33 ).
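The counting rules in the note above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the address representation (a list of institution and country pairs, with `None` marking an incomplete address), the function names, and the toy records are all hypothetical and are not taken from the NCSES or Science-Metrix processing pipeline.

```python
from collections import Counter

def classify(addresses):
    """Classify one article from its author addresses.

    addresses: list of (institution, country) tuples; country is None when
    the address is incomplete (hypothetical representation).
    """
    if not addresses or any(country is None for _, country in addresses):
        return "excluded"  # cannot be identified as international or domestic
    countries = {country for _, country in addresses}
    institutions = set(addresses)
    if len(countries) > 1:
        return "international"
    if len(institutions) > 1:
        return "domestic"
    return "single_institution"

def whole_count_shares(articles):
    """Return {country: {category: share}} using whole counting."""
    counts = {}
    for addresses in articles:
        category = classify(addresses)
        if category == "excluded":
            continue
        # Whole counting: every participating country gets one full credit.
        for country in {c for _, c in addresses}:
            counts.setdefault(country, Counter())[category] += 1
    return {
        country: {cat: n / sum(tally.values()) for cat, n in tally.items()}
        for country, tally in counts.items()
    }

# Toy records, invented for illustration only.
articles = [
    [("Univ A", "US"), ("Univ B", "CN")],  # international
    [("Univ A", "US"), ("Univ C", "US")],  # domestic
    [("Univ A", "US")],                    # single institution
    [("Univ B", "CN")],                    # single institution
    [("Univ D", None), ("Univ A", "US")],  # excluded: incomplete address
]
shares = whole_count_shares(articles)
```

Because each country receives a full credit for a shared article, the per-country article counts can sum to more than the number of distinct articles.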

COVID-19 Research Collaboration

In 2020, COVID-19 was identified as a national priority ( https://www.whitehouse.gov/priorities/ ), and this shifting priority in research may have impacted collaboration patterns for this research area in 2020. In the same year, 35% of the United States’ published research on COVID-19 involved international collaborations, which was lower than the rates in the United Kingdom (55%), Germany (52%), and Japan (45%) but was higher than the rates in China (27%) and India (28%) ( figure 5 ). The overall rates of international collaboration in the United Kingdom and Germany were higher for all S&E research than for COVID-19 research (65% and 55%, respectively).

International collaboration, domestic collaboration, and single institution publications on COVID-19 research and overall international collaboration on all S&E research, by selected country: 2020

S&E = science and engineering.

Article counts refer to publications from a selection of conference proceedings and peer-reviewed journals in S&E fields from Scopus. Articles are assigned to a country or economy on the basis of the institutional addresses of the authors listed in the article. Articles are credited on a whole count basis (i.e., for articles produced by authors from different countries, each country is credited for one article). The percentages refer to the proportion of COVID-19 articles that feature collaboration or to the proportion of general articles across all fields that feature collaboration. Coauthored articles with incomplete address information in the Scopus database were excluded because they cannot be reliably identified as international or domestic collaborations. Data for all regions, countries, and economies are available in supplemental table SPBS-91 and supplemental table SPBS-35 in Publications Output: U.S. Trends and International Comparisons ( https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-91 and https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-35 ).

National Center for Science and Engineering Statistics; Science-Metrix; Elsevier, Scopus abstract and citation database, accessed April 2021.

Although each of the top producing countries had a lower rate of international collaboration in AI research than in all S&E research, the results were mixed for COVID-19. As the number of articles in AI has increased, the rate of international collaboration has also increased. For COVID-19 research in 2020, only some of the top producing countries had lower rates of international collaboration than in all S&E research.

Data Sources, Limitations, and Availability

Publication data are derived from a large database of publication records that were developed for Science and Engineering Indicators 2024, Publications Output: U.S. Trends and International Comparisons (NSB-2023-33), from the Scopus database by Elsevier. The publication counts and coauthorship information presented are derived from information about research articles and conference papers (hereafter referred to collectively as articles) published in conference proceedings and peer-reviewed scientific and technical journals. Elsevier selects journals and conference proceedings for the Scopus database based on evaluation by an international group of subject-matter experts (see NSB-2023-33, Technical Appendix ), and the National Center for Science and Engineering Statistics (NCSES) undertakes additional filtering of the Scopus data to ensure that the statistics presented in Science and Engineering Indicators measure original and high-quality research publications (Science-Metrix 2023). Although the listed affiliation is generally reflective of the locations where research was conducted, authors may have honorary affiliations, have moved, or have experienced other circumstances preventing their affiliations from being an exact corollary to the research environment.

The subset of AI articles was determined by All Science Journal Classification subject matter classification. Global coronavirus publication output data for 2020 were extracted from two different sources. The COVID-19 Open Research Dataset (CORD-19) was created through a partnership between the Office of Science and Technology Policy, the Allen Institute for Artificial Intelligence, the Chan Zuckerberg Initiative, Microsoft Research, Kaggle, and the National Library of Medicine at the National Institutes of Health, coordinated by Georgetown University’s Center for Security and Emerging Technology. CORD-19 is a highly inclusive, noncurated database. The other coronavirus publication output data source was the Scopus database, which permits more refined analysis because it includes more fields (e.g., institutional country of each author) (see NSB-2021-4, Technical Appendix).

1 See table SPBS-22 in National Science Board, National Science Foundation. 2023. Publications Output: U.S. Trends and International Comparisons. Science and Engineering Indicators 2024. NSB-2023-33. Available at https://ncses.nsf.gov/pubs/nsb202333 .

2 See NSB-2023-33, table SPBS-99 .

3 See NSB-2023-33, figure PBS-3 .

4 See NSB-2023-33, table SPBS-22 .

5 See NSB-2023-33, figure PBS-3 .

Carvalho DS, Felipe LL, Albuquerque PC, Zicker F, Fonseca BDP. 2023. Leadership and International Collaboration on COVID-19 Research: Reducing the North–South Divide? Scientometrics 128:4689–705. Available at https://doi.org/10.1007/s11192-023-04754-x .

Chinchilla-Rodríguez Z, Sugimoto CR, Larivière V. 2019. Follow the Leader: On the Relationship between Leadership and Scholarly Impact in International Collaborations. PLOS ONE 14:e0218309. Available at https://doi.org/10.1371/journal.pone.0218309 .

Gazni A, Sugimoto CR, Didegah F. 2012. Mapping World Scientific Collaboration: Authors, Institutions, and Countries. Journal of the American Society for Information Science and Technology 63:323–35. Available at https://doi.org/10.1002/asi.21688 .

Katz JS, Hicks D. 1997. How Much Is a Collaboration Worth? A Calibrated Bibliometric Model. Scientometrics 40:541–54. Available at https://doi.org/10.1007/BF02459299 .

Lee Y-N, Walsh JP, Wang J. 2015. Creativity in Scientific Teams: Unpacking Novelty and Impact. Research Policy 44:684–97. Available at https://doi.org/10.1016/j.respol.2014.10.007 .

Leydesdorff L, Wagner CS. 2008. International Collaboration in Science and the Formation of a Core Group. Journal of Informetrics 2:317–25. Available at https://doi.org/10.1016/j.joi.2008.07.003 .

Science-Metrix. 2023. Bibliometric Indicators for the Science and Engineering Indicators 2024. Technical Documentation. Available at https://science-metrix.com/bibliometrics-indicators-for-the-science-and-engineering-indicators-2024-technical-documentation/ . Accessed 26 August 2023.

Wagner CS, Brahmakulam IT, Jackson BA, Wong A, Yoda T. 2001. Science and Technology Collaboration: Building Capacity in Developing Countries? Santa Monica, CA: RAND Corporation. Available at https://www.rand.org/pubs/monograph_reports/MR1357z0.html .

Wuchty S, Jones BF, Uzzi B. 2007. The Increasing Dominance of Teams in Production of Knowledge. Science 316:1036. Available at https://doi.org/10.1126/science.1136099 .

Suggested Citation

Boothby C, Schneider B; National Center for Science and Engineering Statistics (NCSES). 2024. International Collaboration in Selected Critical and Emerging Fields: COVID-19 and Artificial Intelligence. NSF 24-323. Alexandria, VA: National Science Foundation. Available at https://ncses.nsf.gov/pubs/nsf24323 .

Report Authors

Clara Boothby ORISE Fellow NCSES E-mail: [email protected]

Benjamin Schneider Interdisciplinary Science Analyst NCSES Tel: 703.292.8828 E-mail: [email protected]

National Center for Science and Engineering Statistics Directorate for Social, Behavioral and Economic Sciences National Science Foundation 2415 Eisenhower Avenue, Suite W14200 Alexandria, VA 22314 Tel: (703) 292-8780 FIRS: (800) 877-8339 TDD: (800) 281-8749 E-mail: [email protected]


  • Open access
  • Published: 02 April 2024

The effect of antifibrotic agents on acute respiratory failure in COVID-19 patients: a retrospective cohort study from TriNetX US collaborative networks

  • Hsin-Yi Wang 1 , 2 ,
  • Shih-Chuan Tsai 1 , 3 ,
  • Yi-Ching Lin 1 , 4 ,
  • Jing-Uei Hou 1 &
  • Chih-Hao Chao 5  

BMC Pulmonary Medicine volume 24, Article number: 160 (2024)


The coronavirus disease 2019 (COVID-19) pandemic has had a significant impact on global health and economies, resulting in millions of infections and deaths. This retrospective cohort study aimed to investigate the effect of antifibrotic agents (nintedanib and pirfenidone) on 1-year mortality in COVID-19 patients with acute respiratory failure.

Data from 61 healthcare organizations in the TriNetX database were analyzed. Adult patients with COVID-19 and acute respiratory failure were included. Patients with a pre-existing diagnosis of idiopathic pulmonary fibrosis before their COVID-19 diagnosis were excluded. The study population was divided into an antifibrotic group and a control group. Propensity score matching was used to compare outcomes, and hazard ratios (HR) for 1-year mortality were calculated.

The antifibrotic group exhibited a significantly lower 1-year mortality rate compared to the control group. The survival probability at the end of the study was 84.42% in the antifibrotic group and 69.87% in the control group. The Log-Rank test yielded a p-value of less than 0.001. The hazard ratio was 0.434 (95% CI: 0.264–0.712), indicating a significant reduction in 1-year mortality in the antifibrotic group. Subgroup analysis demonstrated significantly improved 1-year survival in patients receiving nintedanib treatment and during periods when the Wuhan strain was predominant.

This study is the first to demonstrate a survival benefit of antifibrotic agents in COVID-19 patients with acute respiratory failure. Further research and clinical trials are needed to confirm the efficacy of these antifibrotic agents in the context of COVID-19 and acute respiratory failure.


Coronavirus disease 2019 (COVID-19), caused by the novel coronavirus SARS-CoV-2, has had a major impact on global health and economies [ 1 ]. As of May 2023, there have been more than 760 million confirmed infections and 6.9 million deaths worldwide [ 2 ].

SARS-CoV-2 mainly infects the respiratory system and causes mild symptomatic or asymptomatic disease. However, approximately 14% of patients will require oxygen and hospital treatment, and approximately 5–6% will progress to severe pneumonia or acute respiratory distress syndrome (ARDS) and require intensive care unit (ICU) treatment [ 3 ]. The mortality rate in COVID-19 patients with ARDS requiring mechanical ventilation ranges from 35.7 to 94% [ 4 , 5 ].

An emerging complication of COVID-19 is pulmonary fibrosis [ 6 , 7 ]. Several studies have suggested that pulmonary fibrosis is a common complication that results in poor survival and functional outcomes in COVID-19 patients [ 8 , 9 ]. Novel antifibrotic agents such as nintedanib and pirfenidone have demonstrated effectiveness in managing patients with idiopathic pulmonary fibrosis (IPF). Nintedanib functions as an intracellular inhibitor targeting vascular endothelial growth factor receptors 1–3, fibroblast growth factor receptors 1–3, and platelet-derived growth factor receptors a and b [ 10 ]. On the other hand, pirfenidone is an orally bioavailable pyridone derivative known for its anti-inflammatory, antifibrotic, and antioxidant properties [ 11 ]. Both medications have been proven to reduce the rate of annual decline in forced vital capacity and mortality in patients with IPF [ 12 , 13 , 14 , 15 ]. Nintedanib also exhibits inhibitory effects on fibrogenesis across various pulmonary disorders, including connective tissue-associated interstitial lung diseases [ 16 ]. Notably, a recent study showed that antifibrotic therapy had similar efficacy in treating patients with progressive pulmonary fibrosis regardless of whether the underlying disease was IPF or non-IPF [ 17 ]. Novel antifibrotic agents are potential treatments for COVID-19-induced acute respiratory failure. We conducted this retrospective study to investigate the effect of antifibrotic agents on patients with acute respiratory failure due to COVID-19.

In this retrospective cohort study, we utilized the US Collaborative Network in the TriNetX database, which comprises 61 healthcare organizations (HCOs). TriNetX is a global federated health research network that provides access to electronic medical records, including diagnoses, procedures, medications, laboratory values, and genomic information across large HCOs.

The TriNetX platform complies with the Health Insurance Portability & Accountability Act and the General Data Protection Regulation. The Western Institutional Review Board has granted TriNetX a waiver of informed consent, as the platform only aggregates counts and statistical summaries of deidentified information.

We included adult patients (≥ 20 years old) with a positive SARS-CoV-2 PCR test or a COVID-19 diagnosis (ICD10: U07.1, U07.2, or J12.82) during the study period in the TriNetX database. Patients with a past medical history of idiopathic pulmonary fibrosis (ICD10: J84.12) prior to their COVID-19 diagnosis were excluded. The index date was set as the day acute respiratory failure (ICD10: J96.00, J96.0, J96.01, and J96.02) developed within 3 days before to 1 month after the COVID-19 diagnosis. The study population was then divided into 2 groups: those who received oral antifibrotic treatment with nintedanib or pirfenidone (antifibrotic group) and those who did not receive antifibrotic treatment (control group). A flowchart of the cohort construction from participants enrolled between 1 June 2019 and 23 August 2023 is shown in Fig.  1 . The main outcome of this study was 1-year mortality. Detailed query criteria are shown in Additional file 1.
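The index-date rule described above can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, "1 month" is approximated as 30 days, and taking the earliest qualifying date is an assumption, since the text does not say how multiple acute respiratory failure records were handled. The actual cohort was built with TriNetX queries (Additional file 1), not with code like this.

```python
from datetime import date, timedelta

DAYS_BEFORE = 3   # from the text: within 3 days before the COVID-19 diagnosis
DAYS_AFTER = 30   # assumption: "1 month" approximated as 30 days

def index_date(covid_dx, arf_dates):
    """Return the earliest acute respiratory failure date in the window, or None."""
    start = covid_dx - timedelta(days=DAYS_BEFORE)
    end = covid_dx + timedelta(days=DAYS_AFTER)
    eligible = [d for d in arf_dates if start <= d <= end]
    return min(eligible) if eligible else None

# Toy dates, invented for illustration only.
example = index_date(
    date(2021, 3, 10),
    [date(2021, 3, 1), date(2021, 3, 8), date(2021, 3, 20)],
)
```

In this toy case the record from 1 March falls outside the window, so the index date is 8 March.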

figure 1

A flowchart of the cohort construction from participants

Subgroup analysis was conducted to compare 1-year mortality rates between patients treated with nintedanib and those who received no antifibrotic agent, as well as between patients treated with pirfenidone and those who received no antifibrotic agent. Detailed query criteria are shown in Additional files 2 and 3. Additionally, we examined 1-year mortality rates between the antifibrotic and control groups using the same method but within different time frames to assess the effects of antifibrotic treatment across various COVID-19 strains. The study period was divided into four distinct time frames based on dominant strains in America: Wuhan strain (December 2019 to April 2021, 88 patients), Alpha strain (May 2021 to June 2021, 29 patients), Delta strain (July 2021 to October 2021, 48 patients), and Omicron strain (November 2021 to August 2023, 95 patients). Detailed query criteria are shown in Additional files 4–7.
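The time-frame assignment can be expressed as a simple date lookup, sketched below in Python. The period labels and months come from the text; treating each period as whole calendar months and ending the last period on the 23 August 2023 enrollment cutoff are assumptions, and the function name is hypothetical. The label is a proxy for the dominant strain at the time, not the strain that infected a given patient.

```python
from datetime import date

# Time frames as listed in the text. Whole-month boundaries are an assumption;
# the final end date follows the 23 August 2023 enrollment cutoff.
PERIODS = [
    ("Wuhan", date(2019, 12, 1), date(2021, 4, 30)),
    ("Alpha", date(2021, 5, 1), date(2021, 6, 30)),
    ("Delta", date(2021, 7, 1), date(2021, 10, 31)),
    ("Omicron", date(2021, 11, 1), date(2023, 8, 23)),
]

def strain_period(index_date):
    """Return the dominant-strain time frame for an index date, or None."""
    for name, start, end in PERIODS:
        if start <= index_date <= end:
            return name
    return None
```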

Statistical analyses

We performed propensity score matching at a 1:1 ratio on age at index, gender, comorbidities, and corticosteroid use. We used the TriNetX built-in function to compare outcomes. We calculated the hazard ratio (HR) of 1-year mortality between the two groups. We tested the proportional hazards assumption using the generalized Schoenfeld approach built into the TriNetX platform. We used the Kaplan–Meier method to estimate survival probability. We defined statistical significance as a p value < 0.05. A 95% confidence interval (95% CI) for the HR that excluded 1 was also considered evidence of statistical significance.
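For readers unfamiliar with 1:1 propensity score matching, the following Python sketch shows one common variant: greedy nearest-neighbor matching without replacement, with a caliper. It is not the TriNetX built-in procedure, whose internals are not described here. The propensity scores are assumed to have been estimated beforehand (for example, by a logistic regression on the matching covariates), and the caliper value, function name, and toy scores are hypothetical.

```python
def greedy_match(treated, controls, caliper=0.1):
    """1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping patient id -> propensity score.
    caliper: maximum allowed score difference (hypothetical default).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)
    pairs = []
    # Process treated patients in a fixed order so the result is reproducible.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs

# Toy scores, invented for illustration only.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
controls = {"c1": 0.28, "c2": 0.35, "c3": 0.60, "c4": 0.10}
pairs = greedy_match(treated, controls)
```

In this toy example, t3 remains unmatched because no control lies within the caliper, which shows how matching can shrink the treated cohort.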

Baseline characteristics of the study population

Table 1 summarizes the demographic and lifestyle characteristics, comorbidities, and use of adrenal corticosteroids, antiviral agents, and biologic agents in the antifibrotic and control groups before and after propensity score matching. After matching, the mean age at the index date was approximately 64.5 years in the antifibrotic group and 65.8 years in the control group. Approximately 59% of the individuals were male, and Caucasian was the predominant race (75.4% and 64.1% in the antifibrotic and control groups, respectively). The two groups were well matched regarding demographic, lifestyle, and comorbidity characteristics. The proportions of patients receiving corticosteroids and remdesivir were similar in both groups. However, a significantly higher percentage of individuals in the antifibrotic group used Tocilizumab (6% vs. 0%, respectively, p = 0.001) (Table 1).

Figure 2 shows the Kaplan–Meier curve of survival probability. The survival probability at the end of the study was 84.42% in the antifibrotic group and 69.87% in the control group. The Log-Rank test yielded a p value of less than 0.001. The hazard ratio was 0.434 (95% CI: 0.264–0.712). The chi-square was 6.721, and the p value was 0.01 for the proportionality assessment (Table 2). The raw data for the Kaplan–Meier graph are shown in Additional file 8.
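As a rough consistency check on the reported hazard ratio and confidence interval, the log-scale standard error and a Wald-type z statistic can be recovered from the interval itself. The Python sketch below assumes the interval is symmetric on the log scale and based on a normal approximation; this is an assumption, and the resulting p value is a Wald approximation that need not equal the Log-Rank p value reported above. The function name is hypothetical.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the log-scale standard error, z statistic, and two-sided p value."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# Values reported in the text: HR 0.434 (95% CI: 0.264-0.712).
se, z, p = wald_from_ci(0.434, 0.264, 0.712)
excludes_one = not (0.264 <= 1.0 <= 0.712)
```

Under these assumptions the standard error of the log hazard ratio is about 0.25 and the z statistic is about -3.3. The interval excludes 1, in line with the significant Log-Rank result.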

figure 2

Kaplan–Meier survival curve of 1-year mortality
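The survival probabilities plotted in such a curve come from the Kaplan–Meier product-limit estimator. The following minimal Python sketch shows the calculation on toy data. The follow-up times and event indicators are invented for illustration and are not the study data (the study's raw data are in the additional files), and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Toy data (days of follow-up), not taken from the study.
curve = kaplan_meier([30, 90, 90, 200, 365, 365], [1, 1, 0, 0, 1, 0])
```

Censored patients leave the risk set without lowering the curve, which is why the estimate drops only at the times when deaths occur.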

Subgroup analysis revealed that out of the total participants, 127 patients received nintedanib, while 39 patients received pirfenidone. We observed a statistically significant improvement in 1-year mortality rates in the nintedanib group compared to the control group ( p = 0.013). Although there was a noted improvement in the pirfenidone group, it was not statistically significant ( p = 0.601) (Fig. 3 a and b). The raw data for the Kaplan–Meier graphs are shown in Additional files 9 and 10.

figure 3

Kaplan–Meier survival curve of 1-year mortality in (a) nintedanib and (b) pirfenidone treatment group

After dividing the study period into four distinct time frames, our analysis indicated better survival rates in the antifibrotic group across all time frames, with statistical significance observed only in the Wuhan strain period ( p = 0.002) (Fig. 4 a-d). The raw data for the Kaplan–Meier graphs are shown in Additional files 11–14.

figure 4

Kaplan–Meier survival curve of 1-year mortality in (a) Wuhan strain, (b) Alpha strain, (c) Delta strain, and (d) Omicron strain time frame

The present retrospective multi-institutional study, conducted using the TriNetX US Collaborative Network, demonstrated that the novel antifibrotic agents nintedanib and pirfenidone effectively reduced the 1-year mortality rate among COVID-19 patients with acute respiratory failure. To our knowledge, this is the first study to report such a survival benefit.

We observed significantly better 1-year survival in patients receiving nintedanib than in controls, whereas the improvement in patients receiving pirfenidone was not significant. However, only 39 patients received pirfenidone in our cohort, so the small sample size may have contributed to the nonsignificant result. Likewise, the small sample sizes during the Alpha and Delta strain periods could also have led to nonsignificant findings. The characteristics of the Omicron strain, including its tendency to infect the upper respiratory tract rather than the lower respiratory tract and its lower IL-6 secretion, may have influenced the effectiveness of antifibrotic agents [ 18 ]. However, because the TriNetX database does not specify the virus strain with which patients were infected, comparing the effects of antifibrotic treatment across different strains was challenging. There were likely cases of mixed-strain infection across the four time frames. As a result, we are unable to ascertain the actual effect of antifibrotic agents on different virus strains. Further studies are warranted to validate these findings.

Our results revealed that a higher percentage of patients in the antifibrotic group received Tocilizumab. Elevated IL-6 concentrations are associated with severe outcomes in COVID-19 [ 19 ]. Tocilizumab, an IL-6 receptor antagonist, has been demonstrated to reduce inflammatory responses and improve 28-day mortality in COVID-19 patients requiring oxygen therapy [ 20 ]. Currently, the World Health Organization suggests using Tocilizumab in patients with severe or critical COVID-19 who exhibit signs of desaturation on room air, severe respiratory distress, ARDS, sepsis, or septic shock, or who require life-sustaining treatment [ 21 ]. However, Tocilizumab may also exacerbate bacterial infections in COVID-19 patients, which could restrict its usage. In the RECOVERY trial, 16% of patients in the Tocilizumab group ultimately did not receive this treatment [ 20 ]. Tocilizumab would therefore be administered only to COVID-19 patients with higher severity and fewer signs of complicating bacterial infection. The higher rate of Tocilizumab use may thus reflect a higher severity of illness among patients in the antifibrotic group.

Nintedanib functions by binding to intracellular ATP pockets and inhibiting profibrotic signaling pathways, including platelet-derived growth factor, fibroblast growth factor, vascular endothelial growth factor, and transforming growth factor-beta (TGF-β) [ 10 ]. Similarly, pirfenidone regulates pro-fibrotic and pro-inflammatory cytokines such as TGF-β, tumor necrosis factor-α, interferon γ, interleukin-1β, and interleukin-6, thus inhibiting fibroblast proliferation and collagen synthesis [ 22 , 23 ]. Moreover, pirfenidone has demonstrated a capacity to reduce ACE receptor expression, which is considered a major cellular receptor for SARS-CoV-2 virus entry [ 24 ]. Notably, case reports have indicated successful treatment outcomes using nintedanib for COVID-19-related ARDS [ 25 , 26 ]. While ongoing studies on the treatment of COVID-19 patients with acute respiratory failure or pulmonary fibrosis exist, only limited published data are currently available. Umemura et al. observed that nintedanib significantly shortened the duration of mechanical ventilation and reduced the high attenuation area percentage on computed tomography volumetry in COVID-19 patients admitted to the ICU and requiring mechanical ventilation. However, no significant differences in 28-day mortality were found [ 27 ]. Similarly, Zhang et al. demonstrated that pirfenidone significantly decreased inflammatory biomarkers, including interleukin-2R, tumor necrosis factor-alpha (TNF-α), and D-dimer, although clinical parameters such as clinical improvement time, duration of oxygen therapy, time from randomization to death, and interstitial changes on CT images exhibited insignificant improvement [ 28 ].

Epithelial injury followed by a fibroproliferative cascade has been recognized as a key pathogenic mechanism of pulmonary fibrosis shared between COVID-19-related ARDS and idiopathic pulmonary fibrosis [ 29 , 30 , 31 , 32 ]. Type 2 alveolar epithelial cells (ATII cells) play a crucial role in IPF development [ 33 , 34 , 35 , 36 ]. SARS-CoV-2 enters lower respiratory tract cells via the angiotensin-converting enzyme 2 (ACE2) receptor in conjunction with transmembrane protease serine 2, expressed by ATII cells. Inadequate responses of ATII cells to lung injury lead to aberrant tissue repair, characterized by fibroblast activation, collagen deposition, connective tissue accumulation, and angiogenesis [ 37 , 38 ]. Both idiopathic pulmonary fibrosis and COVID-19-induced pulmonary fibrosis involve the renin-angiotensin system (RAS) in their disease progression. A pivotal player in this context, ACE2, interacts with other components of the RAS. ACE2-mediated SARS-CoV-2 entry into lung cells is believed to result in reduced ACE2 expression, disturbing the balance of the RAS and subsequently triggering inflammation and fibrosis [ 39 ].

Dysregulation of microRNAs (miRNAs) has been implicated in the development of pulmonary fibrosis in COVID-19 patients, contributing to collagen deposition and myofibroblast transformation [ 40 ]. This miRNA imbalance has also been observed in idiopathic pulmonary fibrosis patients [ 41 ]. Lacedonia et al. analyzed the expression of exosomal miRNAs and confirmed the key involvement of let-7d down-regulation and miR-16 dysregulation in IPF [ 42 ]. Notably, Guiot et al. identified a total of 34 dysregulated miRNAs that overlapped between COVID-19 and idiopathic pulmonary fibrosis [ 40 ].

Despite these findings, our study has certain limitations. The use of retrospective electronic records introduces inherent weaknesses. Lack of access to raw data prevents an accurate assessment of disease severity, identification of the causes of acute respiratory failure, determination of whether SARS-CoV-2-related pneumonia was present, and specification of the dosage, timing, or reasons for initiating antifibrotic treatment. Potential miscoding, inaccurate coding, or incomplete clinical information about comorbidities, socioeconomic status, and lifestyle habits may also introduce biases. Moreover, the TriNetX database lacks certain clinical information, such as ventilator-free days, ICU days, ventilator weaning rates, and hospitalization duration. Consequently, the efficacy of antifibrotic therapy cannot be adequately evaluated based on these parameters.

Conclusions

This retrospective multi-institutional study highlights the potential benefits of novel antifibrotic agents, nintedanib and pirfenidone, in reducing mortality among COVID-19 patients with acute respiratory failure. This study offers important insights into the therapeutic potential of these agents in managing the complex pathogenesis of both COVID-19-induced pulmonary fibrosis and idiopathic pulmonary fibrosis. The findings underscore the significance of targeting fibroproliferative pathways and the RAS in mitigating inflammation and fibrosis. However, it is important to acknowledge the study’s limitations, particularly the retrospective nature of data analysis and the potential for biases. Future research and clinical trials are needed to further validate the efficacy of these antifibrotic agents and explore their precise mechanisms of action in the context of COVID-19 and acute respiratory failure.

Data availability

All data generated or analysed during this study are included in this published article and its supplementary information files.

Abbreviations

COVID-19: coronavirus disease 2019

ARDS: acute respiratory distress syndrome

ICU: intensive care unit

IPF: idiopathic pulmonary fibrosis

HCOs: healthcare organizations

HR: hazard ratio

CI: confidence interval

TGF-β: transforming growth factor beta

TNF-α: tumor necrosis factor-alpha

ATII cells: type 2 alveolar epithelial cells

RAS: renin-angiotensin system

ACE2: angiotensin-converting enzyme 2

Lai CC, Shih TP, Ko WC, Tang HJ, Hsueh PR. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): the epidemic and the challenges. Int J Antimicrob Agents. 2020;55(3):105924.


WHO Coronavirus (COVID-19) Dashboard. Available from: https://covid19.who.int/ .

Guan WJ, Ni ZY, Hu Y, Liang WH, Ou CQ, He JX, et al. Clinical characteristics of Coronavirus Disease 2019 in China. N Engl J Med. 2020;382(18):1708–20.


Auld SC, Caridi-Scheible M, Blum JM, Robichaux C, Kraft C, Jacob JT, et al. ICU and ventilator mortality among critically ill adults with Coronavirus Disease 2019. Crit Care Med. 2020;48(9):e799–804.

Sinha S, Sardesai I, Galwankar SC, Nanayakkara PWB, Narasimhan DR, Grover J, et al. Optimizing respiratory care in coronavirus disease-2019: a comprehensive, protocolized, evidence-based, algorithmic approach. Int J Crit Illn Inj Sci. 2020;10(2):56–63.


George PM, Wells AU, Jenkins RG. Pulmonary fibrosis and COVID-19: the potential role for antifibrotic therapy. Lancet Respir Med. 2020;8(8):807–15.

John AE, Joseph C, Jenkins G, Tatler AL. COVID-19 and pulmonary fibrosis: a potential role for lung epithelial cells and fibroblasts. Immunol Rev. 2021;302(1):228–40.

Pan Y, Guan H, Zhou S, Wang Y, Li Q, Zhu T, et al. Initial CT findings and temporal changes in patients with the novel coronavirus pneumonia (2019-nCoV): a study of 63 patients in Wuhan, China. Eur Radiol. 2020;30(6):3306–9.

Schwensen HF, Borreschmidt LK, Storgaard M, Redsted S, Christensen S, Madsen LB. Fatal pulmonary fibrosis: a post-COVID-19 autopsy case. J Clin Pathol. 2020.

Wollin L, Wex E, Pautsch A, Schnapp G, Hostettler KE, Stowasser S, et al. Mode of action of nintedanib in the treatment of idiopathic pulmonary fibrosis. Eur Respir J. 2015;45(5):1434–45.

Conte E, Gili E, Fagone E, Fruciano M, Iemmolo M, Vancheri C. Effect of pirfenidone on proliferation, TGF-beta-induced myofibroblast differentiation and fibrogenic activity of primary human lung fibroblasts. Eur J Pharm Sci. 2014;58:13–9.

Richeldi L, Costabel U, Selman M, Kim DS, Hansell DM, Nicholson AG, et al. Efficacy of a tyrosine kinase inhibitor in idiopathic pulmonary fibrosis. N Engl J Med. 2011;365(12):1079–87.

Richeldi L, Kreuter M, Selman M, Crestani B, Kirsten AM, Wuyts WA, et al. Long-term treatment of patients with idiopathic pulmonary fibrosis with nintedanib: results from the TOMORROW trial and its open-label extension. Thorax. 2018;73(6):581–3.


King TE Jr., Bradford WZ, Castro-Bernardini S, Fagan EA, Glaspole I, Glassberg MK, et al. A phase 3 trial of pirfenidone in patients with idiopathic pulmonary fibrosis. N Engl J Med. 2014;370(22):2083–92.

Taguchi Y, Ebina M, Hashimoto S, Ogura T, Azuma A, Taniguchi H, et al. Efficacy of pirfenidone and disease severity of idiopathic pulmonary fibrosis: extended analysis of phase III trial in Japan. Respir Investig. 2015;53(6):279–87.

Flaherty KR, Wells AU, Cottin V, Devaraj A, Walsh SLF, Inoue Y, et al. Nintedanib in Progressive Fibrosing interstitial lung diseases. N Engl J Med. 2019;381(18):1718–27.

Finnerty JP, Ponnuswamy A, Dutta P, Abdelaziz A, Kamil H. Efficacy of antifibrotic drugs, nintedanib and pirfenidone, in treatment of progressive pulmonary fibrosis in both idiopathic pulmonary fibrosis (IPF) and non-IPF: a systematic review and meta-analysis. BMC Pulm Med. 2021;21(1):411.

Zaderer V, Abd El Halim H, Wyremblewsky AL, Lupoli G, Dachert C, Muenchhoff M, et al. Omicron subvariants illustrate reduced respiratory tissue penetration, cell damage and inflammatory responses in human airway epithelia. Front Immunol. 2023;14:1258268.

Del Valle DM, Kim-Schulze S, Huang HH, Beckmann ND, Nirenberg S, Wang B, et al. An inflammatory cytokine signature predicts COVID-19 severity and survival. Nat Med. 2020;26(10):1636–43.

Group RC. Tocilizumab in patients admitted to hospital with COVID-19 (RECOVERY): a randomised, controlled, open-label, platform trial. Lancet. 2021;397(10285):1637–45.

Agarwal A, Hunt BJ, Stegemann M, Rochwerg B, Lamontagne F, Siemieniuk RA, et al. A living WHO guideline on drugs for covid-19. BMJ. 2020;370:m3379.

Raghu G, Richeldi L. Current approaches to the management of idiopathic pulmonary fibrosis. Respir Med. 2017;129:24–30.

Yue X, Shan B, Lasky JA. TGF-beta: Titan of Lung Fibrogenesis. Curr Enzym Inhib. 2010;6(2).

Li C, Han R, Kang L, Wang J, Gao Y, Li Y, et al. Pirfenidone controls the feedback loop of the AT1R/p38 MAPK/renin-angiotensin system axis by regulating liver X receptor-alpha in myocardial infarction-induced cardiac fibrosis. Sci Rep. 2017;7:40523.

Marwah V, Choudhary R, Bhati G, Peter DK. Early experience with anti-interleukin-6 therapy in COVID-19 hyperinflammation. Lung India. 2021;38(Supplement):S119–21.

Ogata H, Nakagawa T, Sakoda S, Ishimatsu A, Taguchi K, Kadowaki M, et al. Nintedanib treatment for pulmonary fibrosis after coronavirus disease 2019. Respirol Case Rep. 2021;9(5):e00744.

Umemura Y, Mitsuyama Y, Minami K, Nishida T, Watanabe A, Okada N, et al. Efficacy and safety of nintedanib for pulmonary fibrosis in severe pneumonia induced by COVID-19: an interventional study. Int J Infect Dis. 2021;108:454–60.

Zhang FWY, He L, Zhang H, Hu Q, Yue H, He J, Dai H. A trial of pirfenidone in hospitalized adult patients with severe coronavirus disease 2019. Chin Med J (Engl). 2022;135:368–70.

Wijsenbeek M, Cottin V. Spectrum of Fibrotic Lung diseases. N Engl J Med. 2020;383(10):958–68.

Richeldi L, Collard HR, Jones MG. Idiopathic pulmonary fibrosis. Lancet. 2017;389(10082):1941–52.

Lee JM, Yoshida M, Kim MS, Lee JH, Baek AR, Jang AS, et al. Involvement of alveolar epithelial cell necroptosis in idiopathic pulmonary fibrosis pathogenesis. Am J Respir Cell Mol Biol. 2018;59(2):215–24.

Minagawa S, Yoshida M, Araya J, Hara H, Imai H, Kuwano K. Regulated necrosis in Pulmonary Disease. A focus on necroptosis and Ferroptosis. Am J Respir Cell Mol Biol. 2020;62(5):554–62.

Confalonieri P, Volpe MC, Jacob J, Maiocchi S, Salton F, Ruaro B et al. Regeneration or repair? The role of alveolar epithelial cells in the pathogenesis of idiopathic pulmonary fibrosis (IPF). Cells. 2022;11(13).

Calkovska A, Kolomaznik M, Calkovsky V. Alveolar type II cells and pulmonary surfactant in COVID-19 era. Physiol Res. 2021;70(S2):S195–208.

Ziegler CGK, Allon SJ, Nyquist SK, Mbano IM, Miao VN, Tzouanas CN, et al. SARS-CoV-2 receptor ACE2 is an Interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues. Cell. 2020;181(5):1016–1035.e19.

Carcaterra M, Caruso C. Alveolar epithelial cell type II as main target of SARS-CoV-2 virus and COVID-19 development via NF-Kb pathway deregulation: a physio-pathological theory. Med Hypotheses. 2021;146:110412.

Parimon T, Yao C, Stripp BR, Noble PW, Chen P. Alveolar epithelial type II cells as drivers of lung fibrosis in idiopathic pulmonary fibrosis. Int J Mol Sci. 2020;21(7).

Selman M, Pardo A. The leading role of epithelial cells in the pathogenesis of idiopathic pulmonary fibrosis. Cell Signal. 2020;66:109482.

Tan WSD, Liao W, Zhou S, Mei D, Wong WF. Targeting the renin-angiotensin system as novel therapeutic strategy for pulmonary diseases. Curr Opin Pharmacol. 2018;40:9–17.

Guiot J, Henket M, Remacle C, Cambier M, Struman I, Winandy M, et al. Systematic review of overlapping microRNA patterns in COVID-19 and idiopathic pulmonary fibrosis. Respir Res. 2023;24(1):112.

Bagnato G, Roberts WN, Roman J, Gangemi S. A systematic review of overlapping microRNA patterns in systemic sclerosis and idiopathic pulmonary fibrosis. Eur Respir Rev. 2017;26(144).

Lacedonia D, Scioscia G, Soccio P, Conese M, Catucci L, Palladino GP, et al. Downregulation of exosomal let-7d and miR-16 in idiopathic pulmonary fibrosis. BMC Pulm Med. 2021;21(1):188.

Acknowledgements

Not applicable.

The funder of the study had no role in the study design, data collection, data analysis, data interpretation, or writing of the report.

Author information

Authors and affiliations.

Department of Nuclear Medicine, Taichung Veterans General Hospital, Taichung, Taiwan

Hsin-Yi Wang, Shih-Chuan Tsai, Yi-Ching Lin & Jing-Uei Hou

School of Medicine, National Defense Medical Center, Taipei, Taiwan

Hsin-Yi Wang

Department of Medical Imaging and Radiological Technology, Institute of Radiological Science, Central Taiwan University of Science and Technology, Taichung, Taiwan

Shih-Chuan Tsai

Department of Public Health, China Medical University, Taichung, Taiwan

Yi-Ching Lin

Division of Chest Medicine, Department of Internal Medicine, Chang Bing Show Chwan Memorial Hospital, Changhua, Taiwan

Chih-Hao Chao

Contributions

HY-W and CH-C wrote the draft of the manuscript; CH-C performed data analysis; HY-W and CH-C designed and supervised the study. SC-T, YC-L, JU-H and CH-C revised the manuscript. All authors read and approved the submitted version.

Corresponding author

Correspondence to Chih-Hao Chao.

Ethics declarations

Ethics approval and consent to participate.

The use of TriNetX for this study was approved under the authority of the Institutional Review Board of Taichung Veterans General Hospital (TCVGH-IRB No. SE22220A-1). The Western Institutional Review Board has granted TriNetX a waiver of informed consent, as the platform only aggregates counts and statistical summaries of deidentified information.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Wang, HY., Tsai, SC., Lin, YC. et al. The effect of antifibrotic agents on acute respiratory failure in COVID-19 patients: a retrospective cohort study from TriNetX US collaborative networks. BMC Pulm Med 24, 160 (2024). https://doi.org/10.1186/s12890-024-02947-5

Received: 05 November 2023

Accepted: 04 March 2024

Published: 02 April 2024

DOI: https://doi.org/10.1186/s12890-024-02947-5

  • Antifibrotic agent
  • Pirfenidone
  • Acute respiratory failure

BMC Pulmonary Medicine

ISSN: 1471-2466

A systematic review and meta-analysis of published research data on COVID-19 infection fatality rates

Gideon Meyerowitz-Katz

a Western Sydney Local Health District, Australia

b University of Wollongong, Australia

c James Cook University, Australia

d Tropical Public Health Service, Cairns, Australia

An important unknown during the coronavirus disease-2019 (COVID-19) pandemic has been the infection fatality rate (IFR). This differs from the case fatality rate (CFR) in that it estimates the number of deaths as a proportion of the total number of infections, including those that are mild and asymptomatic, rather than of confirmed cases only. While the CFR is extremely valuable for experts, IFR is increasingly being called for by policy makers and the lay public as an estimate of the overall mortality from COVID-19.

PubMed, MEDLINE, SSRN, and medRxiv were searched using a set of terms and Boolean operators on 25/04/2020 and re-searched on 14/05/2020, 21/05/2020 and 16/06/2020. Articles were screened for inclusion by both authors. Meta-analysis was performed in Stata 15.1 by using the metan command, based on IFR and confidence intervals extracted from each study. Google/Google Scholar was used to assess the grey literature relating to government reports.

After exclusions, there were 24 estimates of IFR included in the final meta-analysis, from a wide range of countries, published between February and June 2020.

The meta-analysis demonstrated a point estimate of IFR of 0.68% (0.53%–0.82%) with high heterogeneity (p < 0.001).

Based on a systematic review and meta-analysis of published evidence on COVID-19 until July 2020, the IFR of the disease across populations is 0.68% (0.53%–0.82%). However, due to very high heterogeneity in the meta-analysis, it is difficult to know if this represents a completely unbiased point estimate. It is likely that, due to age and perhaps underlying comorbidities in the population, different places will experience different IFRs due to the disease. Given issues with mortality recording, it is also likely that this represents an underestimate of the true IFR figure. More research looking at age-stratified IFR is urgently needed to inform policymaking on this front.

Introduction

The year 2020 saw the emergence of a global pandemic, coronavirus disease-2019 (COVID-19), caused by the SARS-CoV-2 virus, which began in China and has since spread across the world. One of the most challenging questions to answer during the COVID-19 pandemic has been regarding the true infection fatality rate (IFR) of the disease. While case fatality rates (CFR) are eminently calculable from various published data sources ( Kahathuduwa et al., 2020 ) – CFR being the number of deaths divided by the number of confirmed cases – it is far more difficult to extrapolate to the proportion of all infected individuals who have died due to the infection because those who have very mild, atypical or asymptomatic disease are frequently left undetected and therefore omitted from fatality rate calculations ( Rinaldi and Paradisi, 2020 ). Given the issues with obtaining accurate estimates, it is not unexpected that there are wide disparities in the published estimates of case numbers. This is an issue for several reasons, most importantly in that policy is dependent on modelling, and modelling is dependent on assumptions. If we do not have a robust estimate of IFR, it is challenging to make predictions about the true impact of COVID-19 in any given susceptible population, which may stymie policy development and may have serious consequences for decision-making into the future. While CFR is a more commonly used statistic, and is very widely understood among experts, IFR provides important context for policymakers that is hard to convey, particularly given the wide variation in CFR estimates. While CFR is naturally a function of the denominator – i.e. how many people have been tested for the disease – policymakers are often most interested in the total burden in the population rather than the biased estimates given from testing only the acutely unwell patients.
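To make the distinction between the two rates concrete, the following minimal sketch computes both for a hypothetical outbreak. All numbers are invented for illustration and are not taken from any study in this review; the function name is likewise only illustrative.

```python
def fatality_rate(deaths: int, denominator: int) -> float:
    """Return deaths as a fraction of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return deaths / denominator


# Hypothetical outbreak (illustrative numbers only).
deaths = 50
confirmed_cases = 1_000      # people who tested positive
total_infections = 10_000    # confirmed cases plus undetected infections

cfr = fatality_rate(deaths, confirmed_cases)    # deaths / confirmed cases
ifr = fatality_rate(deaths, total_infections)   # deaths / all infections

print(f"CFR = {cfr:.2%}")  # CFR = 5.00%
print(f"IFR = {ifr:.2%}")  # IFR = 0.50%
```

Because the number of infections can never be smaller than the number of confirmed cases, the IFR computed this way can never exceed the CFR for the same set of deaths.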

This is particularly important when considering the reopening of countries post ‘lockdown’. Depending on the severity of the disease, it may be reasonable to reopen services such as schools, bars, and clubs, at different timings. Another salient point is the expected burden of disease in younger age groups — while there are likely long-term impacts other than death, it will be important for future planning to know how many people in various age groups are likely to die if the infection becomes widespread across societies. Age-stratified estimates are also important as it may give countries some way to predict the number of deaths expected given their demographic breakdown.

There are a number of methods for investigating the IFR in a population. Retrospective modelling studies of influenza, as a common cause of global pandemics, have successfully predicted the true number of cases and deaths from influenza-like illness records and excess mortality estimates ( Wong et al., 2013 , Thompson et al., 2009 ). However, these may not be accurate, in part due to the general difficulty in attributing influenza cases to subsequent mortality, meaning that CFRs may both overestimate and equally underestimate the true number of deaths due to the disease in a population ( Spychalski et al., 2020 ).

The standard test for COVID-19 involves polymerase chain reaction (PCR) testing of nasopharyngeal swabs from patients suspected of having contracted the virus. This can produce some false negatives ( Anon, 2020a ), with one study demonstrating almost a quarter of patients experiencing a positive result following up to two previous false negatives ( Xiao et al., 2020 ). The sensitivity of PCR is believed to be around 70%, which may lead to the underdiagnosis of COVID-19 ( Fernández-Barat et al., 2020 ). PCR is also limited in that it cannot test for previous infection. Serology testing is more invasive, requiring a blood sample. However, it can determine whether there has been previous infection and can be performed rapidly at the point of care (PoC). Serology PoC testing cannot determine if a person is infectious or if infection is recent and there is a risk of misinterpretation of results ( Winter and Hegde, 2020 ). Generally, serology testing is more sensitive and specific than PCR, but will still likely overestimate prevalence when few people have been infected with COVID-19 and underestimate in populations with more infections ( Lisboa Bastos et al., 2020 ). Additionally, there has been great variation noted in the sensitivity, the ability of the test to detect truly positive cases, of COVID-19 serology tests ( Ghaffari et al., 2020 ). Serological tests are reliant on seroconversion, which in COVID-19 occurs several days after the viral load has peaked, meaning serology is less effective in the earlier stages of the disease ( Ghaffari et al., 2020 ). Some studies suggest that there are those who do not seroconvert at all ( Staines et al., 2020 ). The lack of reliable testing may be problematic for estimating CFRs and IFRs.
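The direction of the bias described above follows from the standard relationship between true and apparent prevalence for an imperfect test. The sketch below is not part of the reviewed studies' methods; the sensitivity and specificity values are assumptions chosen only to illustrate the effect, and the correction shown is the widely used Rogan-Gladen estimator.

```python
def apparent_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Expected fraction of positive tests: true positives plus false positives."""
    return sensitivity * true_prev + (1 - specificity) * (1 - true_prev)


def rogan_gladen(apparent_prev: float, sensitivity: float, specificity: float) -> float:
    """Correct an apparent prevalence for test error, clipped to [0, 1]."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    estimate = (apparent_prev + specificity - 1) / denom
    return min(1.0, max(0.0, estimate))


# Assumed test characteristics (illustrative only).
SENS, SPEC = 0.90, 0.98

low = apparent_prevalence(0.01, SENS, SPEC)   # few infections: false positives dominate
high = apparent_prevalence(0.50, SENS, SPEC)  # many infections: false negatives dominate

print(f"true 1%  -> apparent {low:.2%}")
print(f"true 50% -> apparent {high:.2%}")
print(f"corrected low-prevalence estimate: {rogan_gladen(low, SENS, SPEC):.2%}")
```

With these assumed values the uncorrected survey overstates a 1% prevalence and understates a 50% prevalence, which in turn would understate and overstate the inferred IFR respectively.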

Given the emergence of COVID-19 as a global pandemic, it is somewhat unlikely that the issues seen with influenza apply in exactly the same way to the newer disease, but there are likely similarities between the two. Some analysis in mainstream media publications and pre-prints has implied that there is a large burden of deaths that remains unattributed to COVID-19. Similarly, serological surveys have demonstrated that there is a large proportion of cases that have not been captured in the case numbers reported in the US, Europe and potentially worldwide (Bendavid et al., 2020, Erikstrup et al., 2020, Simon, 2020).

This paper presents a systematic effort to collate and aggregate these disparate estimates of IFR using an easily replicable method. While any meta-analysis is only as reliable as the quality of included studies, this will at least put a realistic estimate to the IFR given current published evidence.

This study used a simple systematic review protocol. PubMed, MEDLINE, and medRxiv were searched on 25/04/2020 using the terms and Boolean operators: (infection fatality rate OR ifr OR seroprevalence) AND (COVID-19 OR SARS-CoV-2). This search was repeated on 14/05/2020, 25/05/2020 and 16/06/2020. The pre-print server SSRN was also searched on 25/05/2020; however, as it does not allow this format, the Boolean operators and brackets were removed. While medRxiv and SSRN would usually be excluded from a systematic review, given that the papers they host are not peer-reviewed, during the pandemic they have been important sources of information and contain many of the most recent estimates for epidemiological information about COVID-19. Inclusion criteria for the studies were:

  • Regarding COVID-19/SARS-CoV-2 (i.e. not SARS-CoV-1 extrapolations).
  • Presented an estimated population IFR (or allowed the calculation of such from publicly available data).

Titles and abstracts were screened for eligibility and discarded if they did not meet the inclusion criteria. GMK then conducted a simple Google and Google Scholar search using the same terms to assess the grey literature, in particular published estimates from government agencies that may not appear on formal academic databases. LM assessed the articles to ensure congruence. If these met the inclusion criteria, they were included in the systematic review and meta-analysis. Similarly, Twitter searches were performed using similar search terms to assess the evidence available on social media. Estimates for IFR and the confidence interval were extracted for each study.

All analysis and data transformation were performed in Stata 15.1. The meta-analysis was performed using the metan command for continuous estimates, with IFR and the lower/upper bounds of the confidence interval as the variables entered. This model used the DerSimonian and Laird random-effects method. The metan command in Stata automatically generates an I² statistic that was used to investigate heterogeneity. Histograms were visually inspected to ensure that there was no significant positive or negative skew to the results that would invalidate this methodology. For the studies where no confidence interval was provided, one was calculated.
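For readers without access to Stata, the DerSimonian and Laird calculation can be sketched in a few lines of Python. This is an illustration of the general method under stated assumptions, not a reproduction of the metan implementation: standard errors are recovered from the confidence bounds on the assumption that the intervals are symmetric 95% intervals, and the three input studies are hypothetical.

```python
import math


def dersimonian_laird(estimates, lowers, uppers, z=1.96):
    """Pool study estimates with the DerSimonian and Laird random-effects method."""
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    # Standard errors recovered from the CI width (assumes symmetric 95% CIs).
    ses = [(u - l) / (2 * z) for l, u in zip(lowers, uppers)]
    w = [1 / se ** 2 for se in ses]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - z * se_pooled, pooled + z * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical IFR estimates in percent, with 95% CI bounds (not real data).
ifr = [0.4, 0.7, 1.0]
ci_low = [0.3, 0.5, 0.8]
ci_high = [0.5, 0.9, 1.2]

result = dersimonian_laird(ifr, ci_low, ci_high)
print(f"pooled IFR = {result['pooled']:.2f}% "
      f"({result['ci'][0]:.2f}%-{result['ci'][1]:.2f}%), I2 = {result['i2']:.0f}%")
```

When the between-study variance is large relative to the within-study variances, the random-effects weights become nearly equal, so small studies influence the pooled estimate almost as much as large ones.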

A PRISMA flow diagram of the search methods.

Sensitivity analyses were performed stratifying the results by type of study (serological vs non-serological), by country, and by the month of calculation.

The metabias and metafunnel commands were used to examine publication bias in the included research, with Egger’s test used for the metabias estimation. It was challenging to formally rate the risk of bias of the included modelling studies, as there was very significant heterogeneity in methodology and implementation; the risk of bias in these studies was therefore considered to be high across all included research. Serological surveys were rated using the risk of bias tool for prevalence studies, with a resulting rating of low, moderate or high in line with Cochrane GRADE criteria (Hoy et al., 2012). This tool asks a series of 10 questions about the sampling and data collection of prevalence studies, with a final rating based on the previous questions. Each question is answered yes/no, with a lack of information presumed to be no/unclear. A separate sensitivity analysis was conducted using only serological survey results stratified by the risk of bias.
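The idea behind Egger’s test can be illustrated with a short sketch: each study’s standardized effect (estimate divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero suggests funnel-plot asymmetry. The code below is a simplified illustration, not the metabias implementation; it reports the intercept and its t statistic but leaves out the p-value, which would require a t distribution with n − 2 degrees of freedom. The input values are hypothetical.

```python
import math


def egger_regression(estimates, standard_errors):
    """Ordinary least squares of (estimate / SE) on (1 / SE).

    Returns (intercept, slope, t statistic of the intercept).
    """
    n = len(estimates)
    if n < 3:
        raise ValueError("need at least three studies")
    x = [1 / se for se in standard_errors]                           # precision
    y = [est / se for est, se in zip(estimates, standard_errors)]    # standardized effect
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and the standard error of the intercept.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(sse / (n - 2) * (1 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, slope, t_stat


# Hypothetical data in which less precise studies report larger estimates.
est = [0.3, 0.6, 1.2, 2.0]
ses = [0.05, 0.1, 0.3, 0.6]
intercept, slope, t_stat = egger_regression(est, ses)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, t = {t_stat:.2f}")
```

In a perfectly symmetric funnel every study estimates the same effect, the standardized effect is proportional to precision, and the fitted intercept is zero.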

Because of a recent surge in the number of serological surveys being published, these were included in the infection fatality estimate despite not formally calculating an IFR in the study text itself. Where no IFR was calculated, regional death counts were taken from the Johns Hopkins University CSSE dashboard (Dong et al., 2020) 10 days after the serosurvey completion, to account for right-censoring of these estimates (Giorgi Rossi et al., 2020), and used to estimate the IFR given the population.
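The arithmetic implied by this approach can be sketched as follows. The inputs are hypothetical, and the way the interval is obtained here (dividing deaths by the infections implied by each bound of the seroprevalence interval) is an assumption made for illustration; the text does not state how intervals were derived for these surveys.

```python
def ifr_from_serosurvey(deaths, population, seroprevalence, prev_low, prev_high):
    """Estimate IFR as deaths divided by the infections implied by a serosurvey.

    Returns (point estimate, lower bound, upper bound) as fractions.
    """
    if not 0 < prev_low <= seroprevalence <= prev_high <= 1:
        raise ValueError("seroprevalence bounds must satisfy 0 < low <= point <= high <= 1")
    point = deaths / (seroprevalence * population)
    # A higher seroprevalence implies more infections and therefore a lower IFR.
    lower = deaths / (prev_high * population)
    upper = deaths / (prev_low * population)
    return point, lower, upper


# Hypothetical region (illustrative numbers only).
point, lower, upper = ifr_from_serosurvey(
    deaths=300,            # cumulative deaths 10 days after the survey ended
    population=1_000_000,
    seroprevalence=0.05,   # 5% of the population with antibodies
    prev_low=0.04,
    prev_high=0.06,
)
print(f"IFR = {point:.2%} ({lower:.2%}-{upper:.2%})")
```

This simple propagation ignores uncertainty in the death count and in the test characteristics, so the resulting interval is narrower than a full analysis would give.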

All code and data files are available (in .do and .csv format) upon request.

Initial searches identified 252 studies across all databases. Later searches on Google and social media, as well as re-searching the included databases, revealed a further 17 estimates to include in the study. These came from a variety of sources, with some appearing in blog posts, others posted on Twitter, and some government documents being found through Google. There were no exact duplicates; however, two pre-prints had since been published and so appeared in slightly different forms in both databases. In this case, the published study was used rather than the pre-print. Results are collated in Table 1.

Results of the systematic review of published research data on COVID-19 infection fatality rates.

Studies were excluded for a variety of reasons. Some studies only looked at COVID-19 incidence, rather than the prevalence of antibodies, and were thus considered potentially unreliable as population estimates ( Gudbjartsson et al., 2020 ). The most common reason for exclusion was selection bias — many studies only looked at targeted populations in their seroprevalence data, and thus could not be used as population estimators of IFR ( Erikstrup et al., 2020 , Doi et al., 2020 , Takita et al., 2020 , Jerkovic et al., 2020 , Valenti et al., 2020 , Garcia-Basteiro et al., 2020 , Fontanet et al., 2020 , Thompson et al., 2020 , Ed Slot and Reusken, 2020 ). For some data, it was difficult to determine the numerator (i.e. number of deaths) associated with the seroprevalence estimate or the denominator (i.e. population) was not well defined and thus we did not calculate an IFR ( Silveira et al., 2020 , Bryan et al., 2020 ). One study explicitly warned against using its data to obtain an IFR ( Sood et al., 2020 ). Another study calculated an IFR, but did not allow for an estimate of confidence bounds and thus could not be included in the quantitative synthesis ( Wilson, 2020 ).

After screening titles and abstracts, 227 studies were removed. Many of these looked at case fatality estimates or discussed IFR as a concept and/or a model input, rather than estimate the figure themselves. Forty papers were assessed for eligibility for inclusion in the study, which resulted in a final 25 to be included in the qualitative synthesis.

Studies varied widely in design, with 3 entirely modelled estimates ( Nishiura et al., 2020 , Jung et al., 2020 , Salje et al., 2020 ), 4 observational studies ( Bendavid et al., 2020 , Verity et al., 2020 , Tian et al., 2020 , Russell et al., 2020 ), 5 pre-prints that were challenging to otherwise classify ( Rinaldi and Paradisi, 2020 , Roques et al., 2020 , Villa et al., 2020 , Modi et al., 2020a , Streeck et al., 2020 ), and a number of serological surveys of varying types reported by government agencies ( Bassett, 2020 , Anon, 2020b , IU, 2020 , Snoeck et al., 2020 , Slovenia RO, 2020 , Anon, 2020c , Shakiba et al., 2020 , Statistics OFN, 2020 , Hallal et al., 2020 , Institut SS, 2020 , Folkhälsomyndigheten, 2020a , Anon, 2020e , Stringhini et al., 2020b ). For the purposes of this research, an estimate for New York City was calculated from official statistics and the serosurvey; however, this was correlated with a published estimate ( Wilson, 2020 ) to ensure validity.

The main result from the random effects meta-analysis is presented in Figure 1. Overall, the aggregated estimate across all 24 studies indicated an IFR of 0.68% (95% CI 0.53%–0.82%), or 68 deaths per 10,000 infections. Heterogeneity was extremely high, with the overall I² exceeding 99% (p < 0.0001) (Figure 2).

The monthly sensitivity analysis from Figure 3 showed that earlier estimates of IFR were lower, with later estimates showing a higher figure, although this appears to have stabilised in May.

Analysing by the region of origin did not appear to have a substantial effect on the findings, although there was a slightly lower estimate seen in Asia. As the Middle East was only represented by one study, this region was excluded from the meta-synthesis by region. Two studies were also excluded as they did not present an IFR for a specific region (i.e. Diamond Princess).

Of note, there was some difference in IFR between estimates based on serosurveys and modelled or PCR-based estimates. The overall estimate from serosurvey studies was 0.60% (0.42%–0.77%), although again with very high heterogeneity, as can be seen in Figure 4.

There were insufficient data in the included research to perform a meta-analysis of IFR by age. However, qualitatively synthesizing the data that were presented indicates that the expected IFR below the age of 60 years is likely to be reduced by a large factor. This is supported by studies examining the CFR, which were not included in the quantitative synthesis and studies examining IFR in selected populations younger than 70 years of age that demonstrate a strong age-related gradient to the death rate from COVID-19.

Plotting the studies using a funnel plot produced some visual indication of publication bias, with more high estimates than would be expected; however, the Egger’s regression was not significant (p = 0.74).

Risk of bias

As previously noted, all estimates obtained from modelling studies are considered to be at a high risk of bias due to the heterogeneity and difficulty in rating these studies for accuracy. After applying the risk of bias tool for prevalence studies, 6 studies were considered to be at a low risk of bias, 4 studies at a moderate risk of bias, and the remaining 5 estimates at a high risk of bias. This is summarized in the table below (full scoring in Supplementary materials) (Table 2).

Risk of bias in included serosurveys.

In general, the primary reason for down-rating studies was non-response bias, the lack of representativeness of the population sample, and a lack of information across all fields. Some reports were published with minimal information, which substantially increased the uncertainty and thus the risk of bias in these estimates.

The results of the sensitivity analysis by study quality are given below. Broadly, higher study quality was correlated with a higher inferred IFR, with lower-quality serosurveys reporting higher estimates of population prevalence than randomly sampled population-wide prevalence estimates. Restricting the analysis to only those studies at a low risk of bias resulted in modestly reduced heterogeneity and an increased IFR of 0.76% (0.37%–1.15%) (Figure 5).

Other estimates of IFR

Several estimates of IFR were identified but not included in the meta-analysis as they did not meet the inclusion criteria. The aggregated best estimate from the Centre for Evidence-Based Medicine at Oxford University of 0.1%–0.41% (Jason Oke, 2020), and the pre-print estimate reported by Grewelle and De Leo of 1.04% (0.77%–1.38%) (Grewelle and De Leo, 2020) were both pertinent but could not be included because of collinearity.

Similarly, the estimate of symptomatic IFR produced by Basu of 1.3% (0.6%–2.1%) ( Basu, 2020 ) was excluded because of the exclusion of asymptomatic cases. Using reported estimates of asymptomatic cases, this estimate would likely match the meta-analytic IFR; however, this correction could not be applied for the estimates in this study as it could easily introduce bias in the results.

As the COVID-19 pandemic progresses, it is useful to use the IFR when reporting figures, particularly as some countries begin to engage in enhanced screening and surveillance, and observe an increase in positive cases that are asymptomatic and/or mild enough that they have so far avoided testing (Sutton et al., 2020). It has been acknowledged that COVID-19 is often spread from asymptomatic and/or very mildly symptomatic cases, potentially up to 50% of all patients, and that asymptomatic transmission may also be possible with COVID-19 (Nishiura et al., 2020, Bai et al., 2020). The use of IFR would aid the capture of these individuals in mortality figures. IFR modelling, calculation and figures, however, are inconsistent.

The main finding of this research is that there is very high heterogeneity among estimates of IFR for COVID-19 and therefore, it is difficult to draw a single conclusion regarding the number. Aggregating the results together provides a point estimate of 0.68% (0.53%–0.82%), but there remains considerable uncertainty about whether this is a reasonable figure or simply a best guess. It appears likely, however, that the true population IFR in most places from COVID-19 will lie somewhere between the lower and upper bounds of this estimate.

One reason for the very high heterogeneity is likely that different countries and regions will experience different death rates due to the disease. One factor that may impact this is government response, with more prepared countries suffering lower death rates than those that lack sufficient resources to combat a large outbreak (Scally et al., 2020). Moreover, it is very likely, given the evidence around age-related fatality, that a country with a significantly younger population would see fewer deaths on average than one with a far older population, given similar levels of healthcare provision between the two. For example, Israel, with a median age of 30 years, would expect a lower IFR than Italy, with a much higher median age (45.4 years).

Some included studies ( Rinaldi and Paradisi, 2020 , Modi et al., 2020a ) compared fatality during the COVID-19 pandemic with the average fatality of previous years and determined that mortality has been higher during the pandemic. Whilst correlation does not necessarily equate to causation, it is reasonable to link the events as causal given the high CFR observed across countries. It is highly likely from the data analysed that IFR increases with age group, with those aged over 60 years old experiencing the highest IFR, in one case close to 15% ( Modi et al., 2020a ). Given the elderly are the most vulnerable in society to illness and likely to carry a higher disease burden owing to increased susceptibility and comorbidity ( Liu et al., 2020 , Rothan and Byrareddy, 2020 ), the lower IFRs observed in the younger populations may skew the figure somewhat. There are some reasonable estimates of fatality in younger age groups that were not included in the population estimates ( Erikstrup et al., 2020 , Valenti et al., 2020 , Thompson et al., 2020 ), which imply a substantially lower rate of death in the population below 70 years of age. While these studies were not considered applicable for quantitative synthesis, they do imply a lower IFR for those aged 18–70 years. Indeed, a recently published estimate stratified infection fatality by age and found a very low risk for the under-50s that increased exponentially with age, from 0.0016% for those under 50 years to 0.14% for 50–64 year olds and up to 5.6% for those 65 years and older ( Perez-Saez et al., 2020 ). This has also been demonstrated in a pre-print meta-analysis of age-stratified IFR that found an exponential increase in IFR by age, from 0.005% for children to 0.2% at age 50, 0.75% at age 60, and 27% for ages 85 and above ( Levin et al., 2020 ).
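The dependence of a population-level IFR on age structure can be sketched as a weighted average of age-specific IFRs. The example below uses the three age-band figures quoted above from Perez-Saez et al. (2020); the two age distributions are hypothetical and do not describe any real country, and the calculation assumes that infections are spread across age bands in proportion to population share, which need not hold in practice.

```python
# Age-specific IFRs (in %) as quoted above from Perez-Saez et al., 2020.
AGE_IFR_PERCENT = {"<50": 0.0016, "50-64": 0.14, "65+": 5.6}

def population_ifr(age_shares, age_ifr=AGE_IFR_PERCENT):
    """Weighted-average IFR (%) for a population.

    Assumes infections fall on each age band in proportion to its
    population share (an illustrative simplification).
    """
    if abs(sum(age_shares.values()) - 1.0) > 1e-9:
        raise ValueError("age shares must sum to 1")
    return sum(age_shares[band] * age_ifr[band] for band in age_ifr)

# Hypothetical age structures, not data for any real country.
younger_population = {"<50": 0.75, "50-64": 0.15, "65+": 0.10}
older_population = {"<50": 0.55, "50-64": 0.22, "65+": 0.23}

print(f"younger population IFR: {population_ifr(younger_population):.2f}%")
print(f"older population IFR:   {population_ifr(older_population):.2f}%")
```

Under these assumptions the oldest band contributes almost all of the overall figure, so a modest shift in the share of people aged 65 and over changes the population IFR considerably.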

While not included in the quantitative synthesis, one paper did examine the extreme lower bound of IFR of COVID-19 in situations where the healthcare system has been overwhelmed. This is likely to be higher than the IFR in a less problematic situation but demonstrates that the absolute minimum in such a situation cannot be lower than 0.2%, and is likely much higher than this figure in most scenarios involving overburdened hospitals.

Of note, there appears to be a divergence between estimates based on serosurveys and those that are modelled or inferred from other forms of testing, with the IFR based purely on serosurveillance being 0.60% (0.43%–0.77%). Some have argued that serological surveys are the only proper way to estimate IFR, which would lead to the acceptance of this slightly lower IFR as the most likely estimate ( Ioannidis, 2020 ). However, even these estimates are very heterogeneous in quality, with some extremely robust data such as that reported from the Spanish and Swedish health agencies ( Anon, 2020b , Folkhälsomyndigheten, 2020b ), and some that have clear and worrying flaws such as a study from Iran where death estimates are reportedly substantially lower than the true figure ( Shakiba et al., 2020 ). However, when taking quality into account, and only analysing those serosurveys that had a low risk of bias, it is interesting to note that the inferred IFR rises substantially to 0.76% (0.37%–1.15%). This may be due to the bias in lower-quality serosurveys being towards a higher prevalence ( Sood et al., 2020 ), which in turn lowers the IFR substantially.

Another key issue is accounting for deaths. While official death counts were used for all serosurvey estimates, and included in all modelled estimates, these counts are increasingly being recognized as undercounts of the true death figure ( Modi et al., 2020a ). Published research is already estimating that, even in many wealthy countries with excellent death-reporting systems, more than 50% of COVID-19 deaths are likely being missed ( Modi et al., 2020b , Modig and Ebeling, 2020 ). It is not unlikely that, after correcting for excess mortality not captured in official death-reporting systems, the IFR of COVID-19 in most populations would be substantially higher than our analysis suggests. It is also possible that the IFR of the disease will drop over time as treatments improve; however, our analysis at least does not demonstrate that this has been the case in the first half of 2020.

Conversely, there is evidence that the tests used in these serosurveys have drawbacks despite their high specificity and sensitivity. For example, in asymptomatic/mild cases, the tests may have reduced sensitivity, leading to a biased overestimation of the IFR ( Takahashi et al., 2020 ). A recent systematic review and meta-analysis of serological tests for COVID-19 found that even the better serology tests would likely overestimate prevalence in an area with few cases and underestimate prevalence when many people had already been infected ( Lisboa Bastos et al., 2020 ). In areas with a prevalence of 1%–2%, for example, the systematic review implies that a study employing an enzyme-linked immunosorbent assay to examine antibodies would produce an estimated infection rate almost double the true prevalence. This would in turn cause the IFR to be underestimated by a corresponding factor, to roughly half of its true value.
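The direction of this bias follows from simple arithmetic on test sensitivity and specificity. The sketch below is illustrative only: the sensitivity (85%), specificity (98.5%), population size and death count are hypothetical values, not figures from Lisboa Bastos et al. (2020), and the function names are introduced here for the example. It shows the apparent prevalence a test would report, the standard Rogan-Gladen correction, and the effect on a naive IFR.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Share of a sample expected to test positive."""
    true_positives = sensitivity * true_prev
    false_positives = (1.0 - specificity) * (1.0 - true_prev)
    return true_positives + false_positives

def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Correct an apparent prevalence for imperfect test accuracy."""
    return (apparent_prev + specificity - 1.0) / (sensitivity + specificity - 1.0)

def ifr(deaths, population, prevalence):
    """Deaths divided by the estimated number of infections."""
    return deaths / (population * prevalence)

# Hypothetical inputs, chosen only to illustrate the mechanism.
SENSITIVITY, SPECIFICITY = 0.85, 0.985
POPULATION, DEATHS = 100_000, 10

for true_prev in (0.015, 0.50):
    observed = apparent_prevalence(true_prev, SENSITIVITY, SPECIFICITY)
    corrected = rogan_gladen(observed, SENSITIVITY, SPECIFICITY)
    print(f"true {true_prev:.1%}, observed {observed:.2%}, corrected {corrected:.2%}")

low_prev_observed = apparent_prevalence(0.015, SENSITIVITY, SPECIFICITY)
print(f"naive IFR {ifr(DEATHS, POPULATION, low_prev_observed):.2%} "
      f"vs true IFR {ifr(DEATHS, POPULATION, 0.015):.2%}")
```

At low prevalence, false positives outnumber missed infections and inflate the denominator of the IFR; at high prevalence, imperfect sensitivity dominates and the bias reverses.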

There are a number of limitations to this research. Importantly, the heterogeneity in the meta-analysis was very high. This may mean that the point estimates are less reliable than would be expected. It is also notable that any meta-analysis is only as reliable as the data contained within it: this research included a very broad range of studies that address slightly different questions with a very wide range of methodological rigor, and thus cannot provide certainty of any kind. While modelling studies were not formally graded, at least one has already been critiqued for simple mathematical errors, and given that many were pre-prints, it is hard to ascertain if they have provided accurate representations of the data. Serology studies were at a variable risk of bias, and analysing only the highest quality serosurveys produced a higher estimate than relying on lower quality studies.

Moreover, the quality of included serosurvey estimates was often questionable. Many countries have a clear political motivation to present lower estimates, making it challenging to ascertain whether these may have biased the reporting of results, particularly for those places that have only presented results as press releases thus far. Some have also been criticized for sampling issues that would likely lead to a biased overestimate of population infection rates ( Bendavid et al., 2020 ).

Accounting for right-censoring in these estimates was also a challenge. Using a 10-day cut-off for deaths is far too crude a method to create a reliable estimate. In some cases, this could be an overestimate, due to the seroconversion process taking almost as much time as the median time until death. Conversely, there is a long tail for COVID-19 deaths ( Giorgi Rossi et al., 2020 ), and therefore it is almost certain that some proportion of the ‘true’ number of deaths will be missed by using a 10-day cut-off, biasing the estimated IFRs down. This may be why serosurvey estimates at first appear to result in somewhat lower IFRs than modelled and observational data suggest.

It is also important to recognize that this is a living estimate. With new data being published every single day during this pandemic, in a wide variety of languages and in innumerable formats, it is impossible to collate every single piece of information into one document no matter how rigorous. Moreover, this aggregated estimate is only as correct as the most recent search — the point estimate has not shifted substantially because of the inclusion of new research, but the confidence interval has changed. It is almost certain that, over the course of coming months and years, the IFR will be revised a number of times. In particular, it is vital that future research stratifies this estimate by age, as this appears to be the most significant factor in the risk of death from COVID-19.

This research has a range of very important implications. Some countries have announced the aim of pursuing herd immunity with regard to COVID-19 in the absence of a vaccine. The aggregated IFR would suggest that, at a minimum, one would expect 0.45%–0.53% of a population to die before the herd immunity threshold of the disease (based on R0 of 2.5–3 ( Russell et al., 2020 )) was reached ( Mahase, 2020 ). As an example, in the US this would imply more than 1 million deaths at the lower end of the scale. Even with a lower herd immunity threshold suggested by more recent modelling ( Aguas et al., 2020 ), this would imply an unmanageable number of deaths to reach the threshold across a country.
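The reasoning behind such figures can be sketched with the classical herd immunity threshold, 1 − 1/R0, multiplied by the IFR. This is a rough illustration under stated assumptions: it uses the simple homogeneous-mixing threshold, the pooled IFR point estimate of 0.68%, and a round population of 330 million chosen here for the example. The exact inputs behind the range quoted above are not spelled out in the text, so the outputs need not reproduce it.

```python
def herd_immunity_threshold(r0):
    """Classical threshold 1 - 1/R0 (homogeneous-mixing assumption)."""
    if r0 <= 1.0:
        return 0.0
    return 1.0 - 1.0 / r0

def deaths_before_threshold(ifr, r0, population):
    """Expected deaths if infections stop exactly at the threshold."""
    return ifr * herd_immunity_threshold(r0) * population

POOLED_IFR = 0.0068               # 0.68%, the point estimate reported in the text
ILLUSTRATIVE_POPULATION = 330_000_000  # assumed round figure, for scale only

for r0 in (2.5, 3.0):
    share = POOLED_IFR * herd_immunity_threshold(r0)
    deaths = deaths_before_threshold(POOLED_IFR, r0, ILLUSTRATIVE_POPULATION)
    print(f"R0 = {r0}: {share:.2%} of the population, about {deaths:,.0f} deaths")
```

Because infections typically overshoot the threshold before an epidemic subsides, a calculation of this kind is better read as a floor than as a forecast.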

This also has implications for future planning. Governments looking to exit lockdowns should be prepared to see a relatively high IFR within the population that is infected if COVID-19 re-emerges. This should inform the decision to relax restrictions, given that the IFR for people infected with COVID-19 appears to be not insignificant even in places with very robust healthcare systems.

Conclusions

Based on a systematic review and meta-analysis of published evidence on COVID-19 until July 2020, the IFR of the disease across populations is 0.68% (0.53%–0.82%). However, because of very high heterogeneity in the meta-analysis, it is difficult to know if this represents the ‘true’ point estimate. In particular, higher quality serosurveys with lower risk of bias appeared to generate higher IFRs. It is likely that, because of age and perhaps underlying comorbidities in the population, different places will experience different IFRs due to the disease. Given the issues with mortality recording, it is also likely that this represents an underestimate of the true IFR figure. More research looking at age-stratified IFR is urgently needed to inform policymaking on this front.

Authors’ declarations

The authors declare no conflicts of interest. No funding was received for this study. A pre-print version can be found here: https://www.medrxiv.org/content/10.1101/2020.05.03.20089854v1 .

No ethical approval was sought for this study.

Appendix A Supplementary material related to this article can be found, in the online version, at doi: https://doi.org/10.1016/j.ijid.2020.09.1464 .

References

  • Aguas R., Corder R.M., King J.G., Goncalves G., Ferreira M.U., Gomes M.G.M. Herd immunity thresholds for SARS-CoV-2 estimated from unfolding epidemics. medRxiv. 2020 2020.07.23.20160762.
  • Variation in false-negative rate of reverse transcriptase polymerase chain reaction-based SARS-CoV-2 tests by time since exposure. Ann Intern Med.
  • Anon . Instituto de Salud Carlos III; 2020. ESTUDIO NACIONAL DE SERO-EPIDEMIOLOGÍA DE LA INFECCIÓN POR SARS-COV-2 EN ESPAÑA.
  • Anon . Czech Republic Ministry of Health; 2020. Collective immunity study SARS-COV-2-CZ-Preval: preliminary results.
  • Anon . Finnish Government; Finland: 2020. Weekly Report of the Population Serology Survey of the Corona Epidemic.
  • Bai Y., Yao L., Wei T., Tian F., Jin D.-Y., Chen L. Presumed asymptomatic carrier transmission of COVID-19. JAMA. 2020; 323 (14):1406–1407.
  • Bassett B.A. Strict lower bound on the COVID-19 fatality rate in overwhelmed healthcare systems. medRxiv. 2020 2020.04.22.20076026.
  • Basu A. Estimating the infection fatality rate among symptomatic COVID-19 cases in the United States. Health Aff (Project Hope) 2020 101377hlthaff202000455.
  • Bendavid E., Mulaney B., Sood N., Shah S., Ling E., Bromley-Dulfano R. COVID-19 antibody seroprevalence in Santa Clara County, California. medRxiv. 2020 2020.04.14.20062463.
  • Bryan A., Pepper G., Wener M.H., Fink S.L., Morishima C., Chaudhary A. Performance characteristics of the Abbott Architect SARS-CoV-2 IgG assay and seroprevalence in Boise, Idaho. J Clin Microbiol. 2020 JCM.00941-20.
  • Doi A., Iwata K., Kuroda H., Hasuike T., Nasu S., Kanda A. Estimation of seroprevalence of novel coronavirus disease (COVID-19) using preserved serum at an outpatient setting in Kobe, Japan: a cross-sectional study. medRxiv. 2020 2020.04.26.20079822.
  • Dong E., Du H., Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020; 20 (5):533–534.
  • Ed Slot B.M.H., Reusken Chantal B.E.M. Herd immunity is not a realistic exit strategy during a COVID-19 outbreak. Epidemiology. 2020 28 April 2020, PREPRINT (Version 1) available at Research Square https://doi.org/10.21203/rs.3.rs-25862/v1. In Review.
  • Erikstrup C., Hother C.E., Pedersen O.B.V., Mølbak K., Skov R.L., Holm D.K. Estimation of SARS-CoV-2 infection fatality rate by real-time antibody screening of blood donors. medRxiv. 2020 2020.04.24.20075291.
  • Fernández-Barat L., López-Aladid R., Torres A. The value of serology testing to manage SARS-CoV-2 infections. Eur Respir J. 2020:2002411.
  • Folkhälsomyndigheten . Public Health Agency of Sweden; Sweden: 2020. The infection fatality rate of COVID-19 in Stockholm: technical report.
  • Folkhälsomyndigheten, editor. Första resultaten från pågående undersökning av antikroppar för covid-19-virus. Folkhälsomyndigheten; Sweden: 2020.
  • Fontanet A., Tondeur L., Madec Y., Grant R., Besombes C., Jolly N. Cluster of COVID-19 in northern France: a retrospective closed cohort study. medRxiv. 2020 2020.04.18.20071134.
  • Garcia-Basteiro A.L., Moncunill G., Tortajada M., Vidal M., Guinovart C., Jimenez A. Seroprevalence of antibodies against SARS-CoV-2 among health care workers in a large Spanish reference hospital. medRxiv. 2020 2020.04.27.20082289.
  • Ghaffari A., Meurant R., Ardakani A. COVID-19 serological tests: how well do they actually perform? Diagnostics. 2020; 10 (7):453.
  • Giorgi Rossi P., Emilia-Romagna COVID-19 working group, Broccoli S., Angelini P. Case fatality rate in patients with COVID-19 infection and its relationship with length of follow up. J Clin Virol. 2020; 128 :104415.
  • Grewelle R., De Leo G. Estimating the global infection fatality rate of COVID-19. medRxiv. 2020 2020.05.11.20098780.
  • Gudbjartsson D.F., Helgason A., Jonsson H., Magnusson O.T., Melsted P., Norddahl G.L. Spread of SARS-CoV-2 in the Icelandic population. N Engl J Med. 2020.
  • Hallal P., Hartwig F., Horta B., Victora G.D., Silveira M., Struchiner C. Remarkable variability in SARS-CoV-2 antibodies across Brazilian regions: nationwide serological household survey in 27 states. medRxiv. 2020 2020.05.30.20117531.
  • Herzog S., De Bie J., Abrams S., Wouters I., Ekinci E., Patteet L. Seroprevalence of IgG antibodies against SARS coronavirus 2 in Belgium: a serial prospective cross-sectional nationwide study of residual samples. medRxiv. 2020 2020.06.08.20125179.2.
  • Hoy D., Brooks P., Woolf A., Blyth F., March L., Bain C. Assessing risk of bias in prevalence studies: modification of an existing tool and evidence of interrater agreement. J Clin Epidemiol. 2012; 65 (9):934–939.
  • Institut SS, editor. Notat: Foreløbige resultater fra den repræsentative seroprævalensundersøgelse af COVID-19. Statens Serum Institut; Denmark: 2020.
  • Ioannidis J. The infection fatality rate of COVID-19 inferred from seroprevalence data. medRxiv. 2020 2020.05.13.20101253.
  • IU . Indiana University; Indiana: 2020. ISDH release preliminary findings about impact of COVID-19 in Indiana.
  • Jason Oke C.H. Centre for Evidence-Based Medicine: University of Oxford; 2020. Global Covid-19 case fatality rates.
  • Jerkovic I., Ljubic T., Basic Z., Kruzic I., Kunac N., Bezic J. SARS-CoV-2 antibody seroprevalence in industry workers in Split-Dalmatia and Sibenik-Knin County, Croatia. medRxiv. 2020 2020.05.11.20095158.
  • Jung S.M., Akhmetzhanov A.R., Hayashi K., Linton N.M., Yang Y., Yuan B. Real-time estimation of the risk of death from novel coronavirus (COVID-19) infection: inference using exported cases. J Clin Med. 2020; 9 (2).
  • Kahathuduwa C.N., Dhanasekara C.S., Chin S.-H. Case fatality rate in COVID-19: a systematic review and meta-analysis. medRxiv. 2020 2020.04.01.20050476.
  • Levin A.T., Meyerowitz-Katz G., Owusu-Boaitey N., Cochran K.B., Walsh S.P. Assessing the age specificity of infection fatality rates for COVID-19: systematic review, meta-analysis, and public policy implications. medRxiv. 2020 2020.07.23.20160895.
  • Lisboa Bastos M., Tavaziva G., Abidi S.K., Campbell J.R., Haraoui L.-P., Johnston J.C. Diagnostic accuracy of serological tests for covid-19: systematic review and meta-analysis. BMJ. 2020; 370 :m2516.
  • Liu K., Chen Y., Lin R., Han K. Clinical features of COVID-19 in elderly patients: a comparison with young and middle-aged patients. J Infect. 2020 S0163-4453(20)30116-X.
  • Mahase E. Covid-19: UK starts social distancing after new model points to 260 000 potential deaths. BMJ. 2020; 368 (1089).
  • Modi C., Boehm V., Ferraro S., Stein G., Seljak U. Total COVID-19 mortality in Italy: excess mortality and age dependence through time-series analysis. medRxiv. 2020 2020.04.15.20067074.
  • Modi C., Boehm V., Ferraro S., Stein G., Seljak U. How deadly is COVID-19? A rigorous analysis of excess mortality and age-dependent fatality rates in Italy. medRxiv. 2020 2020.04.15.20067074.
  • Modig K., Ebeling M. Excess mortality from COVID-19. Weekly excess death rates by age and sex for Sweden. medRxiv. 2020 2020.05.10.20096909.
  • Nishiura H., Kobayashi T., Yang Y., Hayashi K., Miyama T., Kinoshita R. The rate of underascertainment of novel coronavirus (2019-nCoV) infection: estimation using Japanese passengers data on evacuation flights. J Clin Med. 2020; 9 (2).
  • Perez-Saez J., Lauer S.A., Kaiser L., Regard S., Delaporte E., Guessous I. Serology-informed estimates of SARS-COV-2 infection fatality risk in Geneva, Switzerland. medRxiv. 2020 2020.06.10.20127423.
  • Rinaldi G., Paradisi M. An empirical estimate of the infection fatality rate of COVID-19 from the first Italian outbreak. medRxiv. 2020 2020.04.18.20070912.
  • Roques L., Klein E., Papaix J., Sar A., Soubeyrand S. Using early data to estimate the actual infection fatality ratio from COVID-19 in France. medRxiv. 2020 2020.03.22.20040915.
  • Rothan H.A., Byrareddy S.N. The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak. J Autoimmun. 2020; 109 :102433.
  • Russell T.W., Hellewell J., Jarvis C.I., van Zandvoort K., Abbott S., Ratnayake R. Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship, February 2020. Euro Surveill. 2020; 25 (12).
  • Salje H., Tran Kiem C., Lefrancq N., Courtejoie N., Bosetti P., Paireau J. Estimating the burden of SARS-CoV-2 in France. Science. 2020:eabc3517.
  • Scally G., Jacobson B., Abbasi K. The UK’s public health response to covid-19. BMJ. 2020; 369 :m1932.
  • Shakiba M., Hashemi Nazari S.S., Mehrabian F., Rezvani S.M., Ghasempour Z., Heidarzadeh A. Seroprevalence of COVID-19 virus infection in Guilan province, Iran. medRxiv. 2020 2020.04.26.20079244.
  • Silveira M., Barros A., Horta B., Pellanda L., Victora G., Dellagostin O. Repeated population-based surveys of antibodies against SARS-CoV-2 in Southern Brazil. medRxiv. 2020 2020.05.01.20087205.
  • Simon P. Robust estimation of infection fatality rates during the early phase of a pandemic. medRxiv. 2020 2020.04.08.20057729.
  • Slovenia RO, editor. First study carried out on herd immunity of the population in the whole territory of Slovenia. Republic of Slovenia; Slovenia: 2020.
  • Snoeck C.J., Vaillant M., Abdelrahman T., Satagopam V.P., Turner J.D., Beaumont K. Prevalence of SARS-CoV-2 infection in the Luxembourgish population: the CON-VINCE study. medRxiv. 2020 2020.05.11.20092916.
  • Sood N., Simon P., Ebner P., Eichner D., Reynolds J., Bendavid E. Seroprevalence of SARS-CoV-2-specific antibodies among adults in Los Angeles County, California, on April 10–11, 2020. JAMA. 2020.
  • Spychalski P., Błażyńska-Spychalska A., Kobiela J. Estimating case fatality rates of COVID-19. Lancet Infect Dis. 2020.
  • Staines H.M., Kirwan D.E., Clark D.J., Adams E.R., Augustin Y., Byrne R.L. Dynamics of IgG seroconversion and pathophysiology of COVID-19 infections. medRxiv. 2020 2020.06.07.20124636.
  • Statistics OFN, editor. Coronavirus (COVID-19) Infection Survey pilot: 5 June 2020. Office for National Statistics; England: 2020.
  • Streeck H., Schulte B., Kuemmerer B., Richter E., Hoeller T., Fuhrmann C. Infection fatality rate of SARS-CoV-2 infection in a German community with a super-spreading event. medRxiv. 2020 2020.05.04.20090076.3.
  • Stringhini S., Wisniak A., Piumatti G., Azman A.S., Lauer S.A., Baysson H. Repeated seroprevalence of anti-SARS-CoV-2 IgG antibodies in a population-based sample from Geneva, Switzerland. medRxiv. 2020 2020.05.02.20088898.
  • Stringhini S., Wisniak A., Piumatti G., Azman A.S., Lauer S.A., Baysson H. Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva, Switzerland (SEROCoV-POP): a population-based study. Lancet. 2020.
  • Sutton D., Fuchs K., D’Alton M., Goffman D. Universal screening for SARS-CoV-2 in women admitted for delivery. N Engl J Med. 2020.
  • Takahashi S., Greenhouse Bryan, Rodríguez-Barraquer Isabel. OSF Preprints; 2020. Are SARS-CoV-2 seroprevalence estimates biased?
  • Takita M., Matsumura T., Yamamoto K., Yamashita E., Hosoda K., Hamaki T. Preliminary results of seroprevalence of SARS-CoV-2 at community clinics in Tokyo. medRxiv. 2020 2020.04.29.20085449.
  • Thompson W.W., Moore M.R., Weintraub E., Cheng P.-Y., Jin X., Bridges C.B. Estimating influenza-associated deaths in the United States. Am J Public Health. 2009; 99 (S2):S225–S230.
  • Thompson C., Grayson N., Paton R., Lourenço J., Penman B., Lee L.N. Neutralising antibodies to SARS coronavirus 2 in Scottish blood donors: a pilot study of the value of serology to determine population exposure. medRxiv. 2020 2020.04.13.20060467.
  • Tian S., Hu N., Lou J., Chen K., Kang X., Xiang Z. Characteristics of COVID-19 infection in Beijing. J Infect. 2020; 80 (4):401–406.
  • Valenti L., Bergna A., Pelusi S., Facciotti F., Lai A., Tarkowski M. SARS-CoV-2 seroprevalence trends in healthy blood donors during the COVID-19 Milan outbreak. medRxiv. 2020 2020.05.11.20098442.
  • Verity R., Okell L.C., Dorigatti I., Winskill P., Whittaker C., Imai N. Estimates of the severity of coronavirus disease 2019: a model-based analysis. Lancet Infect Dis. 2020.
  • Villa M., Myers J.F., Turkheimer F. COVID-19: recovering estimates of the infected fatality rate during an ongoing pandemic through partial data. medRxiv. 2020 2020.04.10.20060764.
  • Wilson L. SSRN; 2020. SARS-CoV-2, COVID-19, Infection Fatality Rate (IFR) implied by the serology, antibody, testing in New York City.
  • Winter A.K., Hegde S.T. The important role of serology for COVID-19 control. Lancet Infect Dis. 2020.
  • Wong J.Y., Wu P., Nishiura H., Goldstein E., Lau E.H.Y., Yang L. Infection fatality risk of the pandemic A(H1N1)2009 virus in Hong Kong. Am J Epidemiol. 2013; 177 (8):834–840.
  • Xiao AT, Tong YX, Zhang S. False-negative of RT-PCR and prolonged nucleic acid conversion in COVID-19: rather than recurrence. J Med Virol.

Child mask mandates for COVID-19: a systematic review

Volume 109, Issue 3

  • http://orcid.org/0009-0000-5534-1839 Johanna Sandlund 1 ,
  • Ram Duriseti 2 ,
  • http://orcid.org/0000-0002-0856-2476 Shamez N Ladhani 3 , 4 ,
  • Kelly Stuart 5 ,
  • Jeanne Noble 6 ,
  • http://orcid.org/0000-0002-2341-6573 Tracy Beth Høeg 7 , 8
  • 1 Board-Certified Clinical Microbiologist and Independent Scholar , Alameda , California , USA
  • 2 Stanford University School of Medicine , Stanford , California , USA
  • 3 Immunisation Department , UK Health Security Agency , London , UK
  • 4 Centre for Neonatal and Paediatric Infection , St. George's University of London , London , UK
  • 5 SmallTalk Pediatric Therapy , San Diego , California , USA
  • 6 Emergency Medicine , University of California San Francisco , San Francisco , California , USA
  • 7 Epidemiology and Biostatistics , University of California San Francisco , San Francisco , California , USA
  • 8 Clinical Research , University of Southern Denmark , Odense , Denmark
  • Correspondence to Dr Johanna Sandlund, Independent, Alameda, USA; johanna.sandlund{at}gmail.com

Background Mask mandates for children during the COVID-19 pandemic varied in different locations. A risk-benefit analysis of this intervention has not yet been performed. In this study, we performed a systematic review to assess research on the effectiveness of mask wearing in children.

Methods We performed database searches up to February 2023. The studies were screened by title and abstract, and included studies were further screened as full-text references. A risk-of-bias analysis was performed by two independent reviewers and adjudicated by a third reviewer.

Results We screened 597 studies and included 22 in the final analysis. There were no randomised controlled trials in children assessing the benefits of mask wearing to reduce SARS-CoV-2 infection or transmission. The six observational studies reporting an association between child masking and lower infection rate or antibody seropositivity had critical (n=5) or serious (n=1) risk of bias; all six were potentially confounded by important differences between masked and unmasked groups and two were shown to have non-significant results when reanalysed. Sixteen other observational studies found no association between mask wearing and infection or transmission.

Conclusions Real-world effectiveness of child mask mandates against SARS-CoV-2 transmission or infection has not been demonstrated with high-quality evidence. The current body of scientific data does not support masking children for protection against COVID-19.

  • Infectious Disease Medicine
  • Child Health

Data availability statement

Data are available upon reasonable request. All data relevant to the study are included in the article or uploaded as supplementary information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/archdischild-2023-326215

WHAT IS ALREADY KNOWN ON THIS TOPIC

Child mask mandates have been extensively used as a public health measure during the COVID-19 pandemic.

Masking recommendations appear to be entirely based on mechanistic and observational data, and a systematic review assessing the evidence has not been performed.

WHAT THIS STUDY ADDS

In this systematic review, 16 studies found no effect of mask wearing on infection or transmission, while six studies reporting a protective association had critical or serious risk of bias.

Because benefits of masking for COVID-19 have not been identified, it should be recognised that mask recommendations for children are not supported by scientific evidence.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

COVID-19-related policy recommendations should be informed by high-quality evidence and consider the possibility of harm, especially for children, who are vulnerable and an ethically protected group.

Healthcare providers and adults working with children should be educated about the absence of high-quality data supporting masking to lower SARS-CoV-2 infection and transmission risks.

Because absence of harm is not established, recommending child masking does not meet the accepted practice of promulgating only medical interventions where benefits clearly outweigh harms.

Introduction

Mandating masks for children has been one of the most polarising public-health measures implemented during the COVID-19 pandemic. Two Cochrane reviews of randomised controlled trials (RCTs) of masking for prevention of upper respiratory infections failed to find a benefit against infection or transmission. 1 2 Most countries have now removed all public mask mandates, while the USA’s Centers for Disease Control and Prevention (CDC) and American Academy of Pediatrics continue to recommend masking down to the age of two. 3 4 This recommendation appears to be entirely based on observational data finding associations with lower case rates in masked versus unmasked individuals but does not take into account the potential adverse consequences of masking, especially in young children, including but not limited to impact on speech, language, learning, mental health and physiological factors. Seeing mouth movements and facial gestures accelerates word recognition and speech comprehension, 5–8 the integration of facial information is important for speech perception, 9 10 and recognition of facial expressions is critical for children’s abilities to communicate and to understand and show emotions. 7 11 12 Mask wearing may also cause breathing difficulties, headaches, dermatitis, general discomfort and pain. 2 13–17

There is an urgent need to base pandemic-related policy recommendations on robust scientific data that include risk-benefit analyses, preferably with the long-term goals and the beneficiaries of the intervention clearly defined. 18 Ethically, children should be treated as a protected group, where the benefits of any intervention should clearly outweigh harms.

The aim of this systematic review is to evaluate the body of literature on mask wearing in children to assess the existing evidence regarding protection offered by face masks against SARS-CoV-2 infection or transmission.

We conducted a systematic review to evaluate the evidence for effectiveness of child mask mandates in reducing transmission or disease severity in COVID-19.

References were identified through searches of PubMed, Google Scholar, three major preprint servers (SSRN, MedRxiv and Research Square) and major public health agency publication databases and websites until February 2023 (online supplemental appendix 1). We included primary studies of any design investigating mask effectiveness against COVID-19 (SARS-CoV-2) transmission, infection and disease in individuals <18 years old. Publications of case reports, case series, reviews and comments without new data were excluded, as were studies where age groups were not specified or out of the paediatric range, or when the setting or study objective/design were not applicable. The systematic review was prepared according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Risk of bias (ROB) was estimated using the ROB-2 and ROBINS-I tools, 19 a structured approach for assessing the ROB utilising different domains of bias and an overall judgement. All ROB assessments were conducted by two independent reviewers (RD and SNL), and disagreements were resolved by a third reviewer (JS).

Our literature search identified 597 publications that were screened by title and abstract. We then screened 40 full-text references and excluded 18 that did not meet the inclusion criteria ( figure 1 ). Details of the screened publications are presented in table 1 . The ROB analysis by the two reviewers resulted in 18 differences in ratings and four differences in overall ROB that needed to be adjudicated.

PRISMA flow diagram. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Characteristics of included observational studies

To date, there are no RCTs assessing the effects of masking children in reducing COVID-19 transmission or disease. Among the 22 observational studies identified, the overall ROB was critical in six studies (27.2%), serious in 10 studies (45.5%), moderate in five studies (22.7%) and low in none of the studies ( table 2 ). Of the six studies reporting a significant negative correlation between masking and COVID-19 cases, five had critical and one had serious ROB. Of the 16 studies failing to find a significant correlation, 1 (6.3%) had critical, 10 (62.5%) had serious, 5 (31.3%) had moderate and none had low ROB.
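The subgroup percentages quoted above for the 16 studies that did not find a significant correlation can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the counts are taken from the paragraph above, while the helper name `pct` and the half-up rounding rule are assumptions made here so that the output matches the one-decimal figures as printed.

```python
from decimal import Decimal, ROUND_HALF_UP


def pct(count: int, total: int) -> Decimal:
    """Return count/total as a percentage, rounded half-up to one decimal."""
    value = Decimal(count) * 100 / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Overall ROB ratings for the 16 studies without a significant correlation,
# as stated in the text above.
no_correlation = {"critical": 1, "serious": 10, "moderate": 5, "low": 0}
total = sum(no_correlation.values())

# Share of each rating within this subgroup.
shares = {rating: pct(n, total) for rating, n in no_correlation.items()}

for rating, share in shares.items():
    print(f"{rating}: {no_correlation[rating]}/{total} = {share}%")
```

Half-up rounding is chosen deliberately: Python's built-in `round` rounds exact halves to the nearest even digit, so 6.25 would come out as 6.2 instead of the 6.3 reported in the text.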

All six studies 20–25 reporting a negative association were potentially confounded by crucial differences between masked and unmasked groups, including the number of instructional school days, differences in school size, systematic baseline differences in case rates in all phases of the pandemic, testing policies, contact-tracing policy differences and teacher vaccination rates. These confounders, alone and in combination, resulted in a failure to demonstrate an isolated effect of masks themselves. 20–22

One study from Boston found that lifting of school mask mandates was associated with increased number of COVID-19 cases, 23 which was questioned upon re-analysis. 26 US studies in North Carolina 24 and Arizona 21 found that mask requirements had negative associations with in-school transmission and COVID-19 outbreaks, respectively. In a 2020 Canadian study published as a preprint, children who did not wear a mask had higher seropositivity than children who wore masks, but the overall seropositivity was low (9/541 or 1.7% in total) and findings were confounded by multiple external factors including social distancing and attendance in schools, social functions and organised sports. 25

In a Spanish study of almost 600 000 children, the researchers did not find a significant difference in cases between unmasked 5-year-olds and masked 6-year-olds; instead, case rates correlated closely with the age of children, 27 which was also observed in another Spanish study. 28 An observational CDC-funded US study 20 found no significant association between county-wide mask mandates and paediatric case counts on expanded reanalysis. 29 A lack of significant association between masking children and risk of COVID-19 was also reported by the UK Department of Education. 30 In three US studies, there was no correlation between mask mandates and COVID-19 rates, 31 no significant association between COVID-19 incidence and face mask use, 32 and no risk reduction for COVID-19-related outcomes with student mask mandates. 33 Spanish and Irish studies have independently observed similar primary-school COVID-19 transmission in young children with or without masking, respectively. 28 34 In another CDC study, there was no reduction in COVID-19 incidence in schools requiring student masking compared with those with optional masking. 35 When comparing adjacent school districts with and without mask mandates, multiple studies have reported no difference in transmission. 36–38 A Finnish study compared case rates in children with and without mask mandates in 10–12 year-olds, and the authors found no reduction in COVID-19 case rates when mask recommendations were extended to include 10–12 year olds. 39 Face-mask use among high school athletes was not found to have an impact on transmission. 32

Regarding disease severity, one study found no association between viral load of index cases with confirmed COVID-19 and disease severity among secondary cases. 40 In Sweden, where schools remained open and masks were not required, only 15 of the nearly 2 million children were hospitalised and none died during the spring of 2020; also, the infection rate among teachers was similar to that of other professions. 41 In Finland, where children have not worn masks under the age of 10–12 years, no child died from COVID-19. 42 In Norway, where masks in schools have not been recommended, in-school transmission was <1% among children and <2% in child-adult contacts during August–November 2020. 43 During a SARS-CoV-2 Delta variant outbreak in a US elementary school in May–June 2021, mask use for staff and students in classrooms did not significantly prevent transmission from symptomatic adults, while very few children went on to infect their family members. 44 In New York City public schools with more than 1600 schools and 1 million enrolled students, the transmission rate (secondary attack rate) during the Delta variant period (October–December 2021) was estimated to be 0.5%. 45

Risk-of-bias rating per study.

In this systematic review on benefits of child masking against SARS-CoV-2, we identified no RCT on the efficacy for use of face masks and the risk of transmission or disease. Among the 22 identified observational studies of masking for prevention of COVID-19, more than 70% of the studies had a critical or serious overall ROB. None of the observational studies reporting a negative correlation between masking and COVID-19 cases had a level of bias that was less than “serious.”

Specifically, of the 6 out of 22 observational studies that reported a significant negative correlation between masking and COVID-19 cases, five had critical and one had serious ROB. Of the 16 out of 22 studies failing to find a significant correlation, only 6.3% had critical ROB, while 62.5% had serious and 31.3% had moderate ROB. Importantly, the largest studies with the lowest ROB did not identify a benefit from masking. 27 28 30 The study (currently in preprint publication) with the most robust internal control showed no benefit from a mask mandate. 38 Observational studies reporting a negative association between masking and COVID-19 rates have failed to demonstrate a benefit when confounding factors have adequately been considered. 20–24 Larger observational studies, 28 31 including a regression-discontinuity analysis 39 and a more robust reanalysis 29 of a prior publication, 20 as well as other observational studies, 27 30 32–38 41–44 failed to find benefit of masking against COVID-19. Observational studies in adults also repeatedly fail to properly adjust for confounding factors to avoid bias. 46–48 Furthermore, the Boston observational study 23 stated they could infer causality between lifting school mask mandates and increases in student and staff cases by using a difference-in-differences technique. However, a subsequent reanalysis called the methodology and results of this study into question and failed to find the same association when expanding the population to include the entire state or using different statistical analysis and also found the initial study’s results were likely confounded by differences in prior infection rates. 26
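For readers unfamiliar with the difference-in-differences technique mentioned above, a minimal sketch of the basic two-group, two-period estimator follows. It is not the Boston study's model or the reanalysis; the function name `diff_in_diff` and all case rates are hypothetical values invented purely for illustration.

```python
def diff_in_diff(changed_pre: float, changed_post: float,
                 comparison_pre: float, comparison_post: float) -> float:
    """Basic 2x2 difference-in-differences estimate.

    Subtracts the before/after change in the comparison group from the
    before/after change in the group whose policy changed.
    """
    return (changed_post - changed_pre) - (comparison_post - comparison_pre)


# Hypothetical weekly case rates per 1000 people (illustrative numbers only).
estimate = diff_in_diff(
    changed_pre=40.0,      # policy-change group, before the change
    changed_post=70.0,     # policy-change group, after the change
    comparison_pre=35.0,   # comparison group, same earlier period
    comparison_post=55.0,  # comparison group, same later period
)

print(f"Difference-in-differences estimate: {estimate:.1f} cases per 1000")
```

The estimate can be read causally only if the two groups would have followed parallel trends without the policy change. Baseline differences between groups, such as differing prior infection rates, can violate that assumption.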

Observational studies have also failed to find an association between voluntary mask wearing among adults in schools and lower odds of COVID-19 in the school 49 or between mask mandates or mask use and reduced transmission. 50 In addition, a systematic review showed a 10-fold lower secondary attack rate in schools compared with community/household settings. 51

In adults, there are only a limited number of published RCTs of mask wearing and COVID-19 prevention. DANMASK-19 failed to find a 50% reduction in COVID-19 infections in surgical mask wearers in the community. 52 A cluster RCT in Bangladesh found no effect of community cloth masking on COVID-19 infections, no reduction from surgical masking for anyone under age 50, and only a marginal reduction among >50-year olds and in the context of observer-enforced physical distancing, 53 an association that was found to be insignificant after re-analysis. 54 In a predominantly adult cluster RCT of almost 40 000 participants from age 10 and up (but not reported by age group and, therefore, not included in our systematic review), there was no difference in COVID-19-like illness or mortality between masked and unmasked groups. 55 A Cochrane systematic review published in 2020 similarly found use of surgical masks and respirators in adults to have ‘little to no effect’ on the transmission of respiratory viruses, while side effects included discomfort. 1 In the 2023 updated version that included COVID-19, these conclusions remained unchanged. 2

Perpetual masking in early childhood is without historical precedent. In children, the harms associated with masking are often challenging to identify, measure and quantify with correlational studies, and many of these outcomes will take years to fully evaluate. An extensive body of research has found harms associated with mask wearing or mask requirements in children. 56 These associated harms include negative impacts on speech, language and learning. Mask wearing causes reduced word identification 57–59 and impedes the ability to teach and evaluate speech. 60 There is a link between observation of the mouth and language processing, and people of all ages continue to focus on the mouth when listening to non-native speech. 61 The sensitive period for language development is through age 4, and development of connected speech is ongoing beyond age 10. 62

Mask wearing may also impact mental health and social-emotional well-being by limiting the ability to accurately interpret emotions, particularly in younger children. 63–66 There is also evidence that masks hinder social-emotional learning and language/literacy development in young children. 67 Children with special-education needs and autism may be disproportionately impacted by mask requirements as they rely heavily on facial expressions to pick up social cues. 68 Misinterpretation of facial expressions increases anxiety and depression in individuals. 69 School environments with mask mandates were also found to have increased anxiety levels compared to those without mandates. 70 In addition, mask wearing has been associated with physiological harms, 2 13–17 many of which are more frequently reported in children than in adults. 2 17 71 These harms may have multiple negative downstream effects, including reduced time and intensity of exercise, additional sick days, reduced learning capacity, and increased anxiety. Masking has also been found to lead to a rapid increase in CO2 content in inhaled air (higher in children than in adults) and to levels above acceptable safety standards for healthy adult workers, which may rise further with physical exertion. 72–74

In medicine, new interventions with unknown benefit but known or potential risks cannot be ethically recommended or enforced until absence of harm is demonstrated. Rather, the accepted standard is that an intervention should only be employed after benefit has been demonstrated, ideally through an RCT, together with safety data to ensure that proven benefits outweigh harms. The burden of proof to show that an intervention is both safe and beneficial is the responsibility of the person, institution or body implementing and recommending that intervention. 75

In this systematic review, we fail to find any evidence of benefit from masking children, to either protect themselves or those around them, from COVID-19. Harms of masking may include affected speech, language and emotional development, and physical discomfort contributing to reduced time and intensity of exercise and learning activities, and the long-term effects are too early to be measured. Adults who work with children should be educated about the lack of clear benefits and the potential harms of masking children, and there is no scientific evidence supporting a recommendation for masking in these professions.

In summary, child mask mandates fail a basic risk-benefit analysis. Recommending child masking to prevent the spread of COVID-19 is unsupported by current scientific data and inconsistent with accepted ethical norms that aim to provide additional protection from harm for vulnerable populations.

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval

  • Jefferson T ,
  • Del Mar CB ,
  • Dooley L , et al
  • Ferroni E , et al
  • American Academy of Pediatrics
  • Chipot J , et al
  • Mohamadi T ,
  • Pascalis O ,
  • Loevenbruck H ,
  • Quinn PC , et al
  • Charney SA ,
  • Camarata SM ,
  • Kritzinger A ,
  • van der Linde J
  • Worster E ,
  • Pimperton H ,
  • Ralph-Lewis A , et al
  • Vaillant-Molina M ,
  • Bahrick LE ,
  • Anzures G ,
  • Izard CE , et al
  • Krzyzaniak N ,
  • Scott AM , et al
  • Bharatendu C ,
  • Goh Y , et al
  • Andréoletti L ,
  • Ferrari P , et al
  • Hughes RC ,
  • Bhopal SS ,
  • Tomlinson M
  • Higgins J ,
  • Chandler J , et al
  • Budzyn SE ,
  • Panaggio MJ ,
  • Parks SE , et al
  • McCullough JM ,
  • Dale AP , et al
  • Nelson SB ,
  • Dugdale CM ,
  • Bilinski A , et al
  • Cowger TL ,
  • Murray EJ ,
  • Clarke J , et al
  • Boutzoukas AE ,
  • Zimmerman KO ,
  • Benjamin DK
  • Carroll A ,
  • Charlton C , et al
  • Chandra A ,
  • Duriseti R , et al
  • Méndez-Boo L , et al
  • Alvarez-Lacalle E ,
  • Català M , et al
  • U.K. Department for Education
  • Halloran C , et al
  • McGuine TA ,
  • Haraldsdottir K , et al
  • Lessler J ,
  • Grabowski MK ,
  • Grantz KH , et al
  • Kennedy E , et al
  • Gettings J ,
  • Czarnik M ,
  • Morris E , et al
  • Tennessee Department of Health and Census
  • Stevenson J , et al
  • Juutinen A ,
  • Sarvikivi E ,
  • Laukkanen-Nevala P , et al
  • Trunfio M ,
  • Alladio F , et al
  • Ludvigsson JF
  • Suryawijaya Ong D ,
  • Brandal LT ,
  • Ofitserova TS ,
  • Meijerink H , et al
  • Lam-Hine T ,
  • McCurdy SA ,
  • Santora L , et al
  • Thamkittikasem J ,
  • Whittemore K , et al
  • Ginther DK ,
  • Adjodah D ,
  • Dinakar K ,
  • Chinazzi M , et al
  • Fisher BT ,
  • Tam V , et al
  • Marchant E ,
  • Griffiths L ,
  • Crick T , et al
  • Guerra DD ,
  • Waddington C ,
  • Mytton O , et al
  • Bundgaard H ,
  • Bundgaard JS ,
  • Raaschou-Pedersen DET , et al
  • Abaluck J ,
  • Styczynski A , et al
  • Chikina M ,
  • Nanque LM ,
  • Jensen AM ,
  • Diness AR , et al
  • González-Dambrauskas S ,
  • Caldwell-Kurtzman J ,
  • Motlagh Zadeh L , et al
  • Sfakianaki A ,
  • Kafentzis GP ,
  • Kiagiadaki D
  • Lewkowicz DJ ,
  • Hansen-Tift AM
  • Glaspey AM ,
  • Wilson JJ ,
  • Reeder JD , et al
  • Schiatti L ,
  • Paglieri F , et al
  • Grundmann F ,
  • Epstude K ,
  • Pazhoohi F ,
  • Kingstone A
  • Harmer CJ ,
  • Goodwin GM ,
  • Powell AA ,
  • Ireland G ,
  • Aiano F , et al
  • Remschmidt C ,
  • Schink SB , et al
  • Traindl H ,
  • Prentice J , et al
  • Brooks JP ,
  • Martellucci CA ,
  • Flacco ME ,
  • Martellucci M , et al

Supplementary materials

Supplementary data.

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Data supplement 1

X @shamezladhani

Correction notice This article has been corrected since it was first published. There were two minor spelling mistakes in ‘What this study adds’.

Contributors JS, RD, SNL, KS, JN and TBH participated in the search selection and directly accessed and verified the underlying data reported in the manuscript. JS wrote the first draft of the manuscript, with input from RD, SNL, KS, JN and TBH. JS is guarantor.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

  • NATURE PODCAST
  • 17 December 2020

Coronapod: The big COVID research papers of 2020

  • Benjamin Thompson ,
  • Noah Baker &
  • Traci Watson

Benjamin Thompson, Noah Baker and Traci Watson discuss some of 2020's most significant coronavirus research papers.

In the final Coronapod of 2020, we dive into the scientific literature to reflect on the COVID-19 pandemic. Researchers have discovered so much about SARS-CoV-2 – information that has been vital for public health responses and the rapid development of effective vaccines. But we also look forward to 2021, and the critical questions that remain to be answered about the pandemic.

Papers discussed

A Novel Coronavirus from Patients with Pneumonia in China, 2019 - New England Journal of Medicine, 24 January

Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China - The Lancet, 24 January

A pneumonia outbreak associated with a new coronavirus of probable bat origin - Nature, 3 February

A new coronavirus associated with human respiratory disease in China - Nature, 3 February

Temporal dynamics in viral shedding and transmissibility of COVID-19 - Nature Medicine, 15 April

Spread of SARS-CoV-2 in the Icelandic Population - New England Journal of Medicine, 11 June

High SARS-CoV-2 Attack Rate Following Exposure at a Choir Practice — Skagit County, Washington, March 2020 - Morbidity & Mortality Weekly Report, 15 August

Respiratory virus shedding in exhaled breath and efficacy of face masks - Nature Medicine, 3 April

Aerosol and Surface Stability of SARS-CoV-2 as Compared with SARS-CoV-1 - New England Journal of Medicine, 13 April

Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period - Science, 22 May

Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe - Nature, 8 June

The effect of large-scale anti-contagion policies on the COVID-19 pandemic - Nature, 8 June

Retraction—Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis - The Lancet, 20 June

A Randomized Trial of Hydroxychloroquine as Postexposure Prophylaxis for Covid-19 - New England Journal of Medicine, 3 June

Association Between Administration of Systemic Corticosteroids and Mortality Among Critically Ill Patients With COVID-19 - JAMA, 2 September

Immunological memory to SARS-CoV-2 assessed for greater than six months after infection - bioRxiv, 16 November

Coronavirus Disease 2019 (COVID-19) Re-infection by a Phylogenetically Distinct Severe Acute Respiratory Syndrome Coronavirus 2 Strain Confirmed by Whole Genome Sequencing - Clinical Infectious Diseases, 25 August

Nature’s COVID research updates – summarising key coronavirus papers as they appear

Never miss an episode: Subscribe to the Nature Podcast on Apple Podcasts, Google Podcasts, Spotify or your favourite podcast app. Head here for the Nature Podcast RSS feed.

doi: https://doi.org/10.1038/d41586-020-03609-2

COMMENTS

  1. A statistical analysis of the novel coronavirus (COVID-19) in ...

    The novel coronavirus (COVID-19) that was first reported at the end of 2019 has impacted almost every aspect of life as we know it. This paper focuses on the incidence of the disease in Italy and Spain—two of the first and most affected European countries. Using two simple mathematical epidemiological models—the Susceptible-Infectious-Recovered model and the log-linear regression model, we ...

  2. Applications of Big Data Analytics to Control COVID-19 Pandemic

    The study presents as a taxonomy several applications used to manage and control the pandemic. Moreover, this study discusses several challenges encountered when analyzing COVID-19 data. The findings of this paper suggest valuable future directions to be considered for further research and applications.

  3. Data interpretation and visualization of COVID-19 cases using R

    Data analysis and visualization are essential for exploring and communicating medical research findings, especially when working with COVID records. Results. Data on COVID-19 diagnosed cases ... American College of Academic International Medicine-World Academic Council of Emergency Medicine Multidisciplinary COVID-19 working group consensus paper.

  4. A critical analysis of the impacts of COVID-19 on the global economy

    Considering the above, this paper employed archival data consisting of journal articles, documented news in the media, expert reports, government and relevant stakeholders' policy documents, published expert interviews and policy feedback literature that are relevant to COVID-19 and the concept of CE.

  5. The prediction and analysis of COVID-19 epidemic trend by ...

    Aktar, S. et al. Machine learning approach to predicting COVID-19 disease severity based on clinical blood test data: Statistical analysis and model development. JMIR Med. Inf. 9 , e25884 (2021).

  6. Coronavirus disease (COVID-19) pandemic: an overview of systematic

    The spread of the "Severe Acute Respiratory Coronavirus 2" (SARS-CoV-2), the causal agent of COVID-19, was characterized as a pandemic by the World Health Organization (WHO) in March 2020 and has triggered an international public health emergency. The numbers of confirmed cases and deaths due to COVID-19 are rapidly escalating, counting in millions, causing massive economic strain ...

  7. Data science approaches to confronting the COVID-19 pandemic: a

    For citing papers, half were not in medicine, biology or public health. Most of these fields are computational sciences. These bibliographic analysis results suggest that COVID-19 research is highly multidisciplinary and there is strong evidence of knowledge transfer between different disciplines. Figure 5.

  8. COVID-19 Open-Data a global-scale spatially granular meta ...

    In this paper, we introduce the COVID-19 Open Dataset (COD), available at goo.gle/covid-19-open-data. A static copy of the dataset is also available at figshare 2.

  9. Data-based analysis, modelling and forecasting of the COVID-19 outbreak

    Since the first suspected case of coronavirus disease-2019 (COVID-19) on December 1st, 2019, in Wuhan, Hubei Province, China, a total of 40,235 confirmed cases and 909 deaths have been reported in China up to February 10, 2020, evoking fear locally and internationally. Here, based on the publicly available epidemiological data for Hubei, China from January 11 to February 10, 2020, we provide ...

  10. A machine learning analysis of COVID-19 mental health data

    Introduction. In late December 2019, the novel coronavirus (Sars-Cov-2) and the resulting disease COVID-19 were first identified in Wuhan China 1. The disease slipped through containment measures ...

  11. Predicting the incidence of COVID-19 using data mining

    Background The high prevalence of COVID-19 has made it a new pandemic. Predicting both its prevalence and incidence throughout the world is crucial to help health professionals make key decisions. In this study, we aim to predict the incidence of COVID-19 within a two-week period to better manage the disease. Methods The COVID-19 datasets provided by Johns Hopkins University, contain ...

  12. Big Data Visualization and Visual Analytics of COVID-19 Data

    In this paper, we present a big data visualization and visual analytics tool for visualizing and analyzing COVID-19 epidemiological data. The tool helps users to get a better understanding of information about the confirmed cases of COVID-19. Although this tool is designed for visualization and visual analytics of epidemiological data, it is ...

  13. Analysis and Prediction of COVID-19 using Regression Models and Time

    The COVID-19 dataset for India is being used to serve the research of this paper. The model is predicting the number of confirmed, recovered, and death cases based on the data available from March 12 to October 31,2020. For forecasting the future trend of these cases, we are utilizing the time series forecasting approach of tableau.

  14. Data capture and sharing in the COVID-19 pandemic: a cause for concern

    Routine health care and research have been profoundly influenced by digital-health technologies. These technologies range from primary data collection in electronic health records (EHRs) and administrative claims to web-based artificial-intelligence-driven analyses. There has been increased use of such health technologies during the COVID-19 pandemic, driven in part by the availability of ...

  15. Research Papers

    The Johns Hopkins Coronavirus Resource Center has collected, verified, and published local, regional, national and international pandemic data since it launched in March 2020. From the beginning, the information has been freely available to all — researchers, institutions, the media, the public, and policymakers. As a result, the CRC and its data have been cited in many published research ...

  16. Data Analytics for the COVID-19 Epidemic

    Abstract: With the spread of COVID-19 worldwide, people's production and life have been significantly affected. Artificial intelligence and big data technologies have been vigorously developed in recent years. It is very significant to use data science and technology to help humans in a timely and accurate manner to prevent and control the development of the epidemic, maintain social ...

  17. A data science perspective of real-world COVID-19 databases

    The COVID-19 pandemic is the biggest public health crisis faced by mankind since the 1918 Spanish flu. As of January 2021, more than 91 million people have been infected by COVID-19 and more than 1.9 million people have died worldwide (Johns Hopkins University, 2021). The economy of many countries has been damaged by COVID-19 due to mandatory lockdowns and billions of ...

  18. Methodologies for COVID-19 research and data analysis

    This collection has closed and is no longer accepting new submissions. In March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic, caused by the novel SARS-CoV-2 virus. Following the call from the WHO to immediately assess available data to learn what care approaches ...

  19. Enhancing CNN-LSTM neural networks using ...

    For the image, imagine a detailed and intricate network graph visualized in VOSviewer, showcasing nodes representing critical terms related to COVID-19, Machine Learning, and Deep Learning. Each node's size corresponds to the term's prevalence within the dataset, while lines connecting the nodes indicate the strength and frequency of their co-occurrence in the literature.

  20. Parent perceptions of social well‐being in children with special

    In the current paper, we report on the survey results related to the perceptions of parents regarding their children's peer relationships and social well-being during the first wave of COVID-19, drawing on quantitative (scaled and closed-ended questions) and qualitative (open-ended questions) data.

  21. Comparison of T cells mediated immunity and side effects of mRNA

    Since late 2019, COVID-19, ... Data analysis. Statistical analysis was performed using Statistical Package for Social Sciences (SPSS) version 26. Categorical parameters were presented as absolute numbers with percentages and continuous parameters as medians with interquartile ranges. ... Saja Ebdah: collecting data and samples, lab analysis and ...

  22. Methodological quality of COVID-19 clinical research

    A total of 9895 titles and abstracts were screened and 686 COVID-19 articles were included in the final analysis. Comparative analysis of COVID-19 to historical articles reveals a shorter time to ...

  23. Definition of Post-COVID-19 Condition Among Published Research Studies

    Despite the growing volume of research on lasting symptoms of COVID-19, the definition has not been universally agreed on. This study aimed to describe how post-COVID-19 condition has been defined to date in studies on this topic. ... Acquisition, analysis, or interpretation of data: Chaichana, Man, Chen, George, Wilson, Wei.

  24. Socioeconomic factors associated with use of telehealth services in

    To examine potential changes and socioeconomic disparities in utilization of telemedicine in non-urgent outpatient care in Nevada since the COVID-19 pandemic. This retrospective cross-sectional analysis of telemedicine used the first nine months of 2019 and 2020 electronic health record data from regular non-urgent outpatient care in a large healthcare provider in Nevada.

  25. International Collaboration in Selected Critical and Emerging Fields

    Artificial intelligence (AI) and COVID-19 research are two areas that have complex challenges that both domestic and international institutions are motivated to overcome. A concentration on domestic research can indicate the presence of sufficient domestic knowledge and resources or an interest in preserving in-house expertise. This InfoBrief examines the extent to which top producers of ...

  26. The effect of antifibrotic agents on acute respiratory failure in COVID

    Background: The coronavirus disease 2019 (COVID-19) pandemic has had a significant impact on global health and economies, resulting in millions of infections and deaths. This retrospective cohort study aimed to investigate the effect of antifibrotic agents (nintedanib and pirfenidone) on 1-year mortality in COVID-19 patients with acute respiratory failure. Methods: Data from 61 healthcare ...

  27. A systematic review and meta-analysis of published research data on

    This paper presents a systematic effort to collate and aggregate these disparate estimates of IFR using an easily replicable method. While any meta-analysis is only as reliable as the quality of included studies, this will at least put a ...

  28. Policing during a pandemic: A case study analysis of body ...

    Using a population of 136 interactions involving suspected violations of COVID-19 ordinances between March 2020 and November 2020, this study uses convergent holistic triangulation within a mixed-method research design to extract data for quantitative analysis paired with qualitative analysis of transcripts and BWC footage.

  29. Child mask mandates for COVID-19: a systematic review

    Background: Mask mandates for children during the COVID-19 pandemic varied in different locations. A risk-benefit analysis of this intervention has not yet been performed. In this study, we performed a systematic review to assess research on the effectiveness of mask wearing in children. Methods: We performed database searches up to February 2023. The studies were screened by title and abstract ...

  30. Coronapod: The big COVID research papers of 2020

    In the final Coronapod of 2020, we dive into the scientific literature to reflect on the COVID-19 pandemic. Researchers have discovered so much about SARS-CoV-2 - information that ...