The Future Is: Data Science for Health

An unprecedented volume of health data demands a new generation of scientists equipped to translate data into improved health outcomes, and to do so ethically.

Long before the advent of machine learning, interactive data visualizations, or the flurry of concern and wonder surrounding ChatGPT, there were public health officials manually collecting and cataloging health data, then analyzing those data by hand with the aim of improving the health of their communities. “Public health has always been a very data-oriented discipline,” says Jeff Goldsmith, PhD, associate dean of data science and associate professor of Biostatistics. “Today, we are seeing a natural progression and growing sophistication of analytic techniques that public health researchers are using to address the same fundamental questions that we always have.”

Goldsmith sees the arrival of artificial intelligence (AI), augmented intelligence, and machine learning as a natural evolution in public health. (Augmented intelligence itself evolved out of AI; it involves applying AI to enhance, rather than replace, human tasks and decision-making.) These tools are becoming increasingly essential to translate an unprecedented volume of data into population-wide health improvements. “We’ve moved from a world with a paucity of data to one with an overabundance of it,” says Moise Desvarieux, MD, PhD, MPH ’91, associate professor of Epidemiology. According to Nature Genetics, there were an estimated 2,314 exabytes of health data produced worldwide in 2020, up from 153 exabytes in 2013. (Five exabytes is thought to be equal to all the words ever spoken by humanity.) With this explosion of data comes tremendous potential to improve public health, but also the dangerous possibility that technology, or those who wield it, will exacerbate disparities.

Big Data, Getting Bigger

Health data now extend far beyond information that has traditionally been collected—demographics, environmental exposures, medical history, family history—to new sources such as continuously collected activity levels. Desvarieux offers the example of renting a Citi Bike in New York. “We know when and where the person got on and off the bike, the distance they rode, whether there was a hill, and the amount of time they spent riding.” Data from sources such as Citi Bikes, smartphones, and wearable devices present a rich opportunity for public health. “We have not only personal data, but also data on our environment, the quality of the air we breathe, the soil quality,” Desvarieux says. 

In research at Columbia Mailman School, Desvarieux and colleagues are using personal and environmental data, as well as genetic sequencing data, to pinpoint personalized risk estimates for someone’s likelihood of developing a given chronic condition. Genetic sequencing technology can now paint deep and comprehensive pictures of individual genomes, too. Taken together, the data on behaviors, biology, risks, environment, genomics, and more can help public health researchers determine who may be at a greater risk for adverse health outcomes, and the best ways to mitigate those risks. This quantity and diversity of information mark what Desvarieux calls “the new world” in public health data science. However it’s characterized, this abundance of data requires new skills from public health professionals.   

Equipping a New Generation

The School recognizes this demand, and just graduated its first cohort of students from the MS Public Health Data Science track. Introduced three years ago, it has quickly become the most popular MS degree program track, with 54 new students this fall. “I don’t see demand slowing down any time soon,” says Kiros Berhane, PhD, the chair of Biostatistics. “All signs point to the need for more computationally heavy techniques.”

Berhane describes data science as an umbrella term encompassing a fusion of rigorous statistical principles (vitally important where health is concerned) and quickly evolving computer science–driven machine learning and AI techniques. “The discipline is about the ability to arrive at conclusions based on evidence you get from the data, coupled with machine learning and artificial intelligence techniques able to handle huge quantities of data,” he says. Students in the MS Public Health Data Science track learn skills including data reproducibility, management, and manipulation; how to use graphics effectively; dissemination and visualization of data; and web scraping, which involves gathering data from many different web sources when those data aren’t formatted or structured the same way. These students also focus on data science methods, including supervised learning (machine learning on labeled data) and unsupervised learning (machine learning on unlabeled data). With unsupervised methods, students learn how machines can identify hidden patterns in data that humans may not have observed.
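The distinction between supervised and unsupervised learning can be made concrete with a small sketch. The example below is only an illustration and is not drawn from the School's curriculum: the measurements (daily active minutes, resting heart rate), the labels "higher_risk" and "lower_risk", and all function names are hypothetical. It uses a nearest-centroid classifier as a simple stand-in for supervised learning and a basic k-means loop as a stand-in for unsupervised learning.

```python
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_fit(points, labels):
    """Supervised: labels are given, so compute one centroid per label."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(mean(coord) for coord in zip(*members))
        for label, members in groups.items()
    }


def nearest_centroid_predict(centroids, point):
    """Assign a new point to the label whose centroid is closest."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], point))


def kmeans(points, k, iters=20):
    """Unsupervised: no labels; alternate assigning points and moving centroids."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for point in points:
            idx = min(range(k), key=lambda i: sq_dist(centroids[i], point))
            clusters[idx].append(point)
        centroids = [
            tuple(mean(coord) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    # Return the cluster index of each input point.
    return [min(range(k), key=lambda i: sq_dist(centroids[i], p)) for p in points]


# Hypothetical records: (daily active minutes, resting heart rate).
records = [(20, 80), (25, 78), (22, 82), (70, 60), (75, 58), (68, 62)]
labels = ["higher_risk"] * 3 + ["lower_risk"] * 3  # invented labels

model = nearest_centroid_fit(records, labels)     # learns from labeled data
prediction = nearest_centroid_predict(model, (30, 75))
clusters = kmeans(records, k=2)                   # finds groups without labels
```

In the supervised half, the labels steer what the model learns; in the unsupervised half, the algorithm sees only the measurements and groups similar records on its own, which is the sense in which such methods can surface patterns nobody labeled in advance.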

Although required courses in the Public Health Data Science track are designed to give students a strong grasp of the theories, tools, methods, and terms that make up the vast world of AI, the practical application of these methods can’t be taught in a classroom alone. To that end, each student is required to complete a practicum. Students work on designing and proposing their practicum projects with their faculty advisors, then submit a report and give an oral presentation.

These practical applications tap into a range of tools and methods that are inherently broad—just as AI itself is broad. Goldsmith acknowledges how this breadth can make defining terms challenging. “It’s hard to pin down exactly what we mean by AI,” he says. “Artificial intelligence encompasses traditional statistical approaches but also convolutional neural networks,” which are used to identify patterns, including in images, as is the case with facial recognition.  

 When defining AI in a public health context, Goldsmith says, “We’re trying to take relatively big, complicated information on individuals and understand how that changes their health outcomes.” In the case of cancer, for instance, a researcher might utilize AI to pull from an individual’s biological and genomic sequencing information but would also explore vast troves of data from the broader population. Then the insights could be coupled with environmental or exposure data to calculate an individual’s risk.

Beyond graduate program curricula, the School is also building a pipeline for undergraduates to enter the public health data science field. Several years ago, it received funding from the National Institutes of Health (NIH) to launch the Summer Institute in Biostatistics and Data Science at Columbia. Undergraduate students spend seven weeks learning about data science software, analytic tools, and responsible research conduct. They also tackle data analysis projects using data from the National Heart, Lung, and Blood Institute and the National Institute of Allergy and Infectious Diseases. The data come from clinical studies of chronic disease and infectious disease treatment and prevention.

In working with practicing biostatisticians and the investigators actively engaged in these studies, students have a chance to see and experience firsthand how the principles of data science shape public health—and to contribute to the field. Students enrolled in this summer program have applied data science to projects ranging from analyses of dementia biomarkers to comparisons of schizophrenia treatments to the effect of expanded access to HIV treatment in Lesotho. The school welcomed its second cohort into the free program last summer.  

Columbia Mailman School is also focused on partnering with scientists in the international community to share knowledge, including through a program with the Addis Ababa University in Ethiopia and the University of Nairobi in Kenya. “Data science is a global phenomenon,” Berhane says. This program, born out of a $1.7 million award from the NIH, is meant to create new training opportunities in health data science in Eastern Africa. It’s part of a five-year, $74.5 million NIH initiative, and its goal is to support research projects focused on the ethical, legal, and social implications of data science research.

Through this grant, faculty members abroad are paired with Columbia faculty mentors who work with them through research, coursework, boot camps, and training. Then, for a week in the fall, these faculty members visit Columbia Mailman School, after which they bring their knowledge back to their own institutions and serve as peer mentors to subsequent scholars.

AI Tools for Better Health

As Columbia Mailman School researchers mentor and train the next generation of public health data scientists both on Columbia’s campus and abroad, they themselves continue to pave the way for data science’s role in improved public health outcomes. Desvarieux, for instance, is part of a team using AI to develop tools to personalize prevention for chronic conditions including asthma, cardiovascular disease, diabetes, respiratory diseases, and cancer. The project was one of eight projects to receive a Centennial Grand Challenge research grant from the School in 2022, a signal of its critical importance to the field of public health. 

“We’re hoping to develop a semi-automatic algorithm that we could translate into tools that are impactful for any person or health professional at the user level,” Desvarieux says. The algorithm behind the tools learns from clinical research and real-world information spanning medical data, environmental exposure data, behavioral data, and biological data, among other categories.

To date, the health applications for these sophisticated data science tools have mostly involved personalizing treatment decisions for patients who already have a disease or condition. In contrast, Desvarieux’s tools would help determine when to intervene with certain prevention strategies depending on an individual’s risk factors. 

Another data science research project underway is the Interstitial Lung Disease Diagnostics Tool, through which Qixuan Chen, PhD, associate professor of Biostatistics, and her colleagues aim to help radiologists more accurately and quickly determine diagnoses. The tool, which is meant to help radiologists distinguish between chronic hypersensitivity pneumonitis, usual interstitial pneumonia, and nonspecific interstitial pneumonia, is designed to be accessed through a free, user-friendly app. It asks radiologists to specify the presence or absence of four CT scan features that Chen and her team identified as the most important in differentiating between diagnoses. Although the list of potential features could have been longer, Chen and her Biostatistics colleagues used a statistical method called a Bayesian additive regression tree to pinpoint four. “The radiologists don’t need to answer 20 questions,” Chen says, explaining how they are already spread thin. “When we only talk about four important features, it makes their life easier, and they are more likely to adopt it.”

Chen and her team are still fine-tuning the tool, but the algorithm has already demonstrated a strong predictive ability she believes could help minimize errors and ensure patients get the right intervention at the right time.

Acknowledging and Ending Bias

Becoming a public health data scientist in 2023 means building and applying new algorithms, tools, code, and techniques to vast troves of data to improve health outcomes. Increasingly, it also means learning to recognize biases, and to understand that these tools have a dangerous capacity to deepen health disparities. “Public health must take the lead in protecting health while leveraging new technologies to improve human health,” says Gary Miller, PhD, vice dean for research strategy and innovation. “There is extraordinary potential for AI and data science to improve human health, but there is also extraordinary potential for these systems to exacerbate disparities and create new health problems.” Berhane notes that we have seen this risk already in many settings (for example, criminal profiling, in which AI has fueled dangerous biases). During the COVID-19 pandemic, it came to light that the pulse oximeters used to measure blood oxygen levels were more often inaccurate when used on Black patients versus white patients. If complex algorithms meant to predict disease risk are trained using data that don’t adequately represent certain subgroups, those tools cannot be expected to work in those populations. The same is true of drugs and treatments clinically tested on homogeneous patient populations.

“Big data can give you a false sense of comfort if misused,” Berhane says. “If the millions or billions of data points you have are not coming from the entire population that is being targeted for subsequent actions, then it’s actually dangerous.” Safeguarding the next generation of public health data scientists against these biases and disparities involves continuous learning, ethical discussions, and open discourse. It also involves ensuring that the researchers, faculty, and students engaging in this discourse themselves represent diverse populations. Diversity in the public health data science field, to that end, is crucial both in the U.S. and globally, Berhane says. “There are many sections of the world that don’t have the capacity to collect data, but decisions are being made for them based on data from elsewhere,” he adds. “A seat at the table in data science is powerful.”

Health and science reporter Caroline Hopkins is a 2019 graduate of Columbia Journalism School.

The Future Is: Data Science for Health was first published in the 2023-2024 issue of Columbia Public Health Magazine.

  • Open access
  • Published: 19 October 2021

Genomic health data generation in the UK: a 360 view

  • Elizabeth Ormondroyd (ORCID: orcid.org/0000-0002-9116-4064) 1,2,
  • Peter Border 3,
  • Judith Hayward 4,5 &
  • Andrew Papanikitas 6

European Journal of Human Genetics, volume 30, pages 782–789 (2022)

  • Health policy
  • Risk factors

In the UK, genomic health data is being generated in three major contexts: the healthcare system (based on clinical indication), in large scale research programmes, and for purchasers of direct-to-consumer genetic tests. The recently delivered hybrid clinical/research programme, the 100,000 Genomes Project, set the scene for a new Genomic Medicine Service, through which the National Health Service aims to deliver consistent and equitable care informed by genomics, while providing data to inform academic and industry research and development. In parallel, a large scale research study, Our Future Health, has UK Government and Industry investment and aims to recruit 5 million volunteers to support research intended to improve early detection, risk stratification, and early intervention for chronic diseases. To explore how current models of genomic health data generation intersect, and to understand clinical, ethical, legal, policy and social issues arising from this intersection, we conducted a series of five multidisciplinary panel discussions attended by 28 invited stakeholders. Meetings were recorded and transcribed. We present a summary of issues identified: genomic test attributes; reasons for generating genomic health data; individuals’ motivation to seek genomic data; health service impacts; role of genetic counseling; equity; data uses and security; consent; governance and regulation. We conclude with some suggestions for policy consideration.

Introduction

Once the preserve of specialist clinical genetics departments, patient genomic health data are now generated in mainstream medicine and from research participants. In the UK, the 100,000 Genomes Project (100kGP), a hybrid clinical/research programme, aimed to deliver genome sequencing to individual patients and families in the National Health Service (NHS) [ 1 ] while simultaneously developing a resource for research and development. 100kGP informed the recently implemented NHS Genomic Medicine Service (GMS) [ 2 ], which, through a ‘National Genomic Test Directory’ (NGTD) aims to promote consistent and equitable access to genomic tests for patients, and build a national genomic knowledge base to inform academic and industry research and development. In parallel with such initiatives, ‘Direct-to-consumer’ genetic tests (DTC-GT) can be purchased via the internet and from pharmacies. Whilst DTC-GT may have clear disclaimers about their use for healthcare decision-making, they nonetheless offer health information on which customers may base such decisions.

For individuals, genomic information can support (or reduce likelihood of) a clinical diagnosis, warn about future disease risk that might be managed through targeting of healthcare resources and behaviour change, or alert to asymptomatic disease. Generating genomic data on a large scale might prove a powerful means of improving population health through greater understanding of genomic contributions to health and disease. This potential has driven public investment in endeavours such as UK Biobank [ 3 ], AllofUs [ 4 ] in the US, and the developing Our Future Health (OFH) initiative in the UK [ 5 ]. The commercial sector has recognised the appeal of marketing genetic health data agnostic to disease presentation [ 6 ], as well as the commercial value of genomic information [ 7 , 8 ]. Clearly, from a public policy perspective, there are implications for healthcare services of reporting non clinically-directed genomic health data to individuals [ 9 ].

In 2019, the UK Parliament Science and Technology Select Committee (STC) set up an inquiry into commercial genomics to collect and assess evidence to inform future policy. To contribute comprehensive evidence to the inquiry, and to explore broadly how current models of genomic health information intersect, academic researchers (EO, AP) partnered with the Parliamentary Office of Science and Technology (POST; PB), and the Health Education England Genomics Education Programme (JH). We conducted a series of five multidisciplinary panel discussions during May–July 2020. Our aim was to generate an exchange of ideas from informed stakeholders rather than a series of ‘official positions’. Representation was successfully sought from clinical, genomic science, ethical, legal, DTC-GT provider, public health and patient support groups. Discussions focused on three sources of genomic health data generation: NHS, research, and commercial sector (Table 1). A total of 28 professionals took part; many had more than one concurrent role, or had previous experience of alternative sectors. The exercise began with a strengths, weaknesses, opportunities and threats analysis in the first two meetings to generate a framework for the subsequent meetings, which the authors discussed and developed after each meeting. All meetings were conducted under Chatham House rule [ 10 ], recorded and transcribed verbatim. A near-final draft of this manuscript was sent to all discussants; comments received were incorporated.

This process identified a number of issues as discussed in this paper: genomic test attributes; reasons for generating genomic health data; motivation to seek genomic data; health service impacts; role of genetic counseling; equity; data uses and security; consent; governance and regulation. We conclude with some suggestions for policy consideration.

Test attributes

In this article we use the term ‘test’ although in some contexts genomic health data generation might be considered a screen rather than a test [ 11 ].

We considered two types of genetic test technology: sequencing of one or more genes, including the ‘whole’ genome (or the protein-coding sections, the exome), and single nucleotide polymorphism (SNP) genotyping. While sequencing ‘reads’ the genetic code and can detect individual sequence variation, SNP genotyping samples genetic variation at hundreds of thousands of specific locations across the genome. Through the NGTD [ 2 ], the GMS specifies which genomic tests are commissioned by the NHS, appropriate test technology and which patients are eligible; exome or genome sequencing are specified for some disorders. A significant proportion of rare disease develops as a result of variants in a single gene that are very rare in the population; clinical testing to investigate rare disease involves sequencing of one or a panel of genes with proven disease association. Many ‘common’ or multifactorial diseases are influenced by multiple variants that occur at higher frequency and individually have small effect size; a polygenic risk score (PRS) for a given disease can be calculated by combining information from multiple SNPs. SNP-based testing is not part of the NGTD at present. Discussants reported that most of the prominent DTC-GT companies are using SNP genotyping [ 12 ], although some offer genome sequencing, and one provider predicted that more will move towards offering sequencing. Individuals may also upload their raw data to automated ‘third party’ interpretation services for a small fee, allowing consumers to access much more data than the test provider offers [ 13 , 14 ]. Research programmes use SNP arrays or genome/exome sequencing, or both; for example, UK Biobank has recently added exome sequencing data on 200,000 of its 500,000 participants [ 15 ].
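As an illustration of how a PRS combines information from multiple SNPs, the sketch below uses the common formulation of a PRS as a weighted sum of risk-allele counts (0, 1 or 2 per SNP). The SNP identifiers, weights and genotypes are hypothetical and chosen only for illustration; real scores draw their weights from association studies and typically involve far more variants, and this is not the method of any particular provider or programme discussed here.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts over the SNPs that carry a weight.

    dosages: maps SNP id -> number of risk alleles carried (0, 1 or 2)
    weights: maps SNP id -> per-allele effect size (e.g. a log odds ratio)
    """
    score = 0.0
    for snp, weight in weights.items():
        if snp not in dosages:
            raise KeyError(f"missing genotype for {snp}")
        dosage = dosages[snp]
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {snp} must be 0, 1 or 2, got {dosage}")
        score += weight * dosage
    return score


# Hypothetical per-allele weights and one individual's genotypes.
example_weights = {"snp_a": 0.10, "snp_b": 0.05, "snp_c": -0.02}
example_dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

example_score = polygenic_risk_score(example_dosages, example_weights)
```

Each SNP contributes only a small amount, which reflects the small individual effect sizes described above; the raw score is usually interpreted relative to the distribution of scores in a reference population rather than on its own.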

In discussions, SNP genotyping of rare variants conferring high disease risk (such as BRCA1/2 mediated cancer predisposition) was considered problematic since there may be a high proportion of false variant calling [ 16 , 17 ] when analytic validation has not been performed. SNP arrays have a positive predictive value of <16% for detecting very rare variants [ 18 ]. Individuals may receive erroneous reports informing of a high risk of disease, potentially causing anxiety, wasted clinical time and expense repeating the test using appropriate technology [ 19 ]. However, one participating DTC-GT provider has received regulatory authorisation for reporting on variants with high disease risk, having met a >99% accuracy and reproducibility standard for those variants.
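The low positive predictive value for very rare variants follows from Bayes' rule: when true carriers are rare, even a small false-positive rate produces more false calls than true ones. The sketch below works through that arithmetic. The carrier frequencies, sensitivity and specificity are hypothetical values chosen for illustration and are not taken from the cited study.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive call is a true positive (Bayes' rule)."""
    for name, value in (("prevalence", prevalence),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    if true_positives + false_positives == 0.0:
        raise ValueError("no positive calls are possible with these inputs")
    return true_positives / (true_positives + false_positives)


# Hypothetical assay: 99% sensitivity and 99.9% specificity.
# Very rare variant carried by 1 in 10,000 people.
ppv_rare = positive_predictive_value(0.0001, 0.99, 0.999)
# Common variant carried by 1 in 10 people, same assay performance.
ppv_common = positive_predictive_value(0.1, 0.99, 0.999)
```

Under these assumed figures, most positive calls for the rare variant are false even though the assay is wrong only once in a thousand non-carriers, whereas nearly all positive calls for the common variant are correct. This is why confirmation with appropriate technology matters most for rare, high-risk variants.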

Clinical validity

In rare disease genetics, variant interpretation—understanding the contribution of a specific variant to disease—remains a challenge, and a focus for international efforts [ 20 ]. There is often insufficient evidence supporting variant association with disease, and interpretation can change over time as more data become available. Very large population databases are now available to aid in variant interpretation [ 21 ], with a major caveat that diverse ethnicities are under-represented. Discussants noted that increasing generation of genome sequencing data has already improved understanding of rare disease-gene relationships, and were optimistic this would continue.

Genetic disease risk prediction is often uncertain, particularly for healthy individuals receiving a rare variant result, since understanding of rare variant penetrance is based on families showing penetrant disease [ 22 ]. In the NHS, expert scientists in accredited laboratories interpret rare variants in genes with proven association with the clinical presentation, and report only those with a probability of clinical significance. In contrast, when individuals present to the NHS with ‘results’ from DTC-GT, the provenance and interpretation of data may be unclear. Further, SNP arrays allow assessment of a limited number of variants: a provider may design SNP arrays to include certain rare pathogenic variants, such as three BRCA variants that are over-represented in people of Ashkenazi Jewish descent [ 23 ]. However, absence of a variant does not preclude the presence of another pathogenic variant not assayed by the specific SNP-genotyping, and customers with a clinical or family history of disease might be falsely reassured by a ‘negative’ result. Discussants saw this as a direct harm, compounded by misunderstandings about genetic contributions to disease, although the regulatory authorisation process completed by one DTC-GT provider had explicitly sought to mitigate false reassurance and misunderstanding through user comprehension of key concepts.

Clinical utility

Clinical utility [ 24 , 25 ] was described as ‘multi-layered’, and lacking a consistent definition. Some suggested it should encompass some element of ‘personal utility’ [ 26 ]; however, some argue that personal utility is contingent upon clinical validity [ 27 ]. Risk management strategies can be harmful to individuals as well as economically costly; evidence is required to inform decisions about frequency, timing and effectiveness of genomics-based interventions. Most discussants thought PRS are not ready for clinical application; further evidence is required around data interpretation, risk magnitude and risk management, communication and decision support to enact lifestyle changes [ 28 ]. While behaviour change is central to the premise of preventive risk prediction, and therefore to the utility of PRS in common disease, discussants agreed that it is difficult to measure. Several noted the lack of evidence that PRS change health behaviours [ 29 , 30 ], and questioned the added value of PRS over and above well-established lifestyle advice. The view was expressed that behaviour change can be a long process; one person speculated that learning risk information at a younger age might be more effective, since people would have longer to ‘build good habits’. A further view was that patient benefit would accrue from clinicians’ responses to PRS. DTC-GT providers reported that customer surveys show that there is significant demand for genomic health information, but accepted that demand alone does not imply utility.

Reasons for generating genomic health data

Clinically-directed genetic testing has been available in the NHS for several decades for molecular genetic investigation of a clinical presentation, and predictive testing for family members. Patient management is often informed by genetic test results. In discussions, genome sequencing was considered a powerful additional tool for rare disease genetic diagnosis in a clinical setting, including through 100kGP and other UK clinical research programmes such as Deciphering Developmental Disorders (DDD) [ 31 ] and the NIHR BioResource for Rare Disease [ 32 ]. Clinical contributors recognised the potential for genomic data to identify new gene/disease associations and refine phenotypic spectrum, and were enthusiastic about a service delivery model which combines capture of genome sequencing with phenotype data allowing aggregate and reiterative analysis such as that proposed by the GMS [ 2 ].

The dual clinical and research aims were seen as essential to the success of 100kGP, but they also created tensions, for example that patients are also research participants (consent to data collection and research access was a condition of participation), and clinicians may also be researchers, or acting on researchers’ behalf.

Discussants accepted that a key aim of commercial DTC–GT is to generate revenue, achieved in part by creating a large dataset to which access is sold for secondary research [ 33 ]. Some commercial providers mentioned specific data access partnerships with global reach, commenting that partnerships with pharmaceutical companies can be lucrative.

The explicit purpose of established population-scale research projects is to improve understanding of human health and disease; significant advancements have been made through initiatives such as UK Biobank [ 3 ], which links to participant NHS datasets from the vast majority registered with the NHS. UK Biobank provides access for proposed research that is in the public interest; researchers must undertake to publish results and to make derived data and methods available to other approved researchers [ 34 ]. OFH (previously named Early Disease Detection) aspires to create health and wealth, providing a platform for discovery research to improve the early detection or diagnosis, prevention and interventions for chronic diseases [ 5 ]. OFH will create a large dataset comprising phenotype and lifestyle data, and some genomic information derived from SNP based array technology. As envisioned by the UK Industrial Strategy Challenge Fund, OFH will partner with the NHS, potentially both for phenotype data collection and participant recruitment. OFH plans to offer some feedback of genetic information to participants. Some discussants were sceptical about the individual benefits of data collection initiatives which do not aim to answer a clinical question, and considered that maintaining realistic participant expectations should be prioritised. Some did not view large research programmes with a commercial funding component as being significantly different from DTC-GT.

Individuals’ motivation to seek genomic information

100kGP participants took part both to derive personal or family health information, often in the context of a diagnostic odyssey, and for altruistic reasons [ 35 ]. Patient representatives had a view that patients were ‘donating’ in order to further aims with intrinsic value, comparable with making a charitable donation, and on that basis had supported 100kGP. Several discussants listed the benefits for families: offering a diagnosis and promoting establishment of patient groups for mutual support, as well as hope for treatments; conversely, some discussed clinical experience of distress to patients when a diagnostic label was removed. Discussants believed there is a strong element of social solidarity in NHS-delivered services; however, 100kGP was perceived as having become politicised, with one result being that the benefits for patients (the likelihood of deriving clinically useful information) were overemphasised to patients and clinicians [ 36 , 37 ]. Some suggested that patients in a clinical care pathway, offered appropriate genetic testing, might be less likely to seek non-NHS testing.

Primary care discussants (UK-based) reported that relatively few of their patients have so far presented with DTC-GT information; those who do tend to be relatively young, educated and affluent. One person suggested that some consumers perceive they are helping the NHS. Marketing materials presenting genetic information, particularly risk information, were considered a strength of commercial provision. However, some suggested that advertising might create demand for information that people hadn’t known they wanted. Marketing may promote tests as recreational, suitable as gifts [ 38 ], and consumers may not appreciate the potential for adverse health information. Evidence was discussed showing that DTC-GT companies expose people to more positive messages about the health benefits of test purchase and little about the limitations, accuracy and risks, which can often only be found in the contract as disclaimers [ 39 ]. Indeed, many commercial providers state that information provided does not constitute medical advice, but some discussants perceived a disconnect between sales promotion messaging and contract content.

From a healthcare perspective, a view was expressed that the NHS remains paternalistic and inadequately educates for agency in healthcare. Some suggested that patients have limited control over the information they can access in the NHS, whereas in the commercial sector consumers can theoretically access information on demand.

Health service impacts

When genomic health data is generated for reasons other than to investigate a clinical question, a key issue for considering potential impacts on healthcare services is the type and extent of individual data made available to contributors. For example, rare variant information would warrant specialist genetics referral, genetic counselling, laboratory confirmation, clinical follow up, and family cascading.

Primary care contributors considered that health risk information can prompt useful conversations with patients. However, they were wary of information that may be unclear, have an uncertain evidence base, or that requires specialist referral yet is unprompted by clinical need. Primary care and clinical genetics discussants agreed that the small numbers of patients coming to the NHS with DTC-GT information take significant amounts of resources and have high expectations of a clinical response. Clinical genetics professionals reported that such patients are often anxious and confused about their disease risks. Referrals are usually accompanied by a well-presented multiple-page readout which health professionals struggle to ‘unravel, explain and undo any emotional damage’. Investigation may be hampered by lack of transparency about how the data were generated and interpreted, yet patients might have begun psychological and clinical preparation for risk-reducing surgery. Conversely, it has been argued that current testing guidelines are too conservative [ 23 , 40 , 41 ].

Research programmes have been encouraged to formulate a plan for returning individual results; UK Biobank, for example, does not provide individual feedback about information derived from analyses of data or samples [ 42 ], but may report ‘incidental’ findings detected during data collection [ 3 ]. One discussant raised the analogous example of reporting potentially serious incidental findings from collection of research imaging data to participants’ GPs; concerns centred on the meaning of findings, and the capacity of the NHS to manage them [ 43 ].

Several discussants’ clinical experience suggested that many people struggle to understand risk, and often consider genetic information ‘deterministic’. There was a perception that patients, the public and many healthcare professionals overestimate genetic contributions to disease and the potential of genomics to explain and predict disease [ 44 ]; an example was discussed of a UK minister who publicly misinterpreted personal risk from a PRS [ 45 ]. Such beliefs could be amplified if people believe the private sector can outperform the NHS, or that personal expenditure for a test correlates with quality [ 46 ]. These factors may lead to unrealistic expectations of clinical action based on results.

Discussants talked about the importance of integrating genomics education into curricula [ 47 ]; continuing education of all health professionals was considered a crucial task requiring multidisciplinary models. Work on genomic medicine education is ongoing [ 48 ], but some felt that competence is currently patchy, potentially compromising equity of access and service delivery. Educational content should include test attributes such as analytic validity, but also ethical and social issues around genetic testing; some believed that these issues are embedded in the extensive training undergone by health professionals which provides an aspirational framework. Educating the public was also seen as important, to promote the concept that genetics is part of a longer term understanding of human health, and promote agency. Some felt this would ideally form part of the school national curriculum. One DTC-GT provider suggested that ancestry testing is a useful entry point for understanding genetics.

Role of genetic counselling

Discussants had divergent ideas about the role of genetic counselling and whether, how, and when it should be provided. In a clinical setting, genetic counselling provides individually contextualised information about risk, aids adaptation to increasingly complex results and directs appropriate follow-up care. Individual genetic test results can be important for relatives and the onus is on the individual to inform their relatives; discussing familial risk provides a forum for identifying ethical complexities and supporting communication. The role and purpose of genetic counselling in DTC-GT, or in research when patients are offered test results in return for data contribution, is unclear. Although some have encouraged genetic counselling provision in DTC-GT [ 49 ], genetic counselling arguably sits uneasily with a commercial transaction. DTC-GT providers recognised that genetic risk information can have a range of impacts; one provider explained that some companies employ genetic counsellors, but only about 3–4% of customers use this service. Another DTC-GT provider had designed education modules to be viewed before opening results; this was thought potentially useful, but dependent on the reader’s ability to understand and process probabilities.

Good quality DTC-GT was acknowledged to have an important role where clinical services are inaccessible; one contributor suggested that recent scrutiny has improved larger DTC-GT companies’ products. Some discussants were concerned that genomic health data might increase health disparities; they felt strongly that skewing resources towards people who already access healthcare efficiently is unhelpful. Concerns were expressed that the pay-for-service model inevitably impacts equity; however, it was acknowledged that although equity of access is a linchpin of the NHS, not all sections of society receive equitable care. Discussants mentioned that a key aim of the GMS is to increase equity of access, but that delivery would also need equitable awareness among health professionals.

In the context of research recruiting from the population, some discussed challenges with representativeness, an acknowledged limitation of UK Biobank [ 50 ]. Many genetic databases lack ethnic diversity, which hampers efforts to generate equitable healthcare benefits. Some spoke about dilemmas experienced by indigenous peoples [ 51 ], which have led to a strong data sovereignty movement, limiting participation in genetic research [ 52 ]. Addressing ethnic representation was considered desirable for all sectors, and several people discussed ideas to increase uptake.

Aggregate data uses

Aggregate genomic data linked to phenotypic and health outcomes data were considered to have various values. Economic value of a dataset is dependent on a number of factors including sample size and representativeness, length of follow up, type of genomic data (SNP, exome/genome sequence), depth of phenotype data, ethically approved accessibility, and recontactability. Genetic databases represent a very significant commercial asset. Linkage with primary care and hospital episode data (the generation, capture and curation of which incurs significant cost to healthcare systems) clearly increases economic value. Some saw parallels between DTC-GT and ‘tech giants’ who provide a service but also generate wealth from collection and sale of data. According to DTC-GT provider discussants, potential users of DTC-GT datasets include the military, pharmaceutical and insurance companies as well as academic researchers; however, a DTC-GT provider stated that their company does not share information with third parties for research purposes without the explicit consent of the customer.

Hope was expressed that large datasets could leverage improved healthcare, with benefits for individuals and society; some suggested that data-sharing in healthcare would become the norm, benefitting from partnerships with technology and possibly pharmaceutical companies. Examples discussed included using artificial intelligence to create algorithms for predictive diagnostics, achieving rare disease diagnoses, and studying genetic susceptibility to infectious diseases such as COVID-19 or HIV.

Several suggested that patients and the public participate in health data-sharing initiatives in order to further understanding of disease: they assume their data is contributing to the ‘greater good’ [ 53 ]. Public views around data access and uses are often strong, but nuanced [ 54 ] and difficult to articulate and capture. Commercialising data in the public sector was a concern; ‘red lines’—distinguishing the acceptable from the unacceptable—appeared when people considered how data might be used against, rather than for, public benefit: using data to discriminate or inform the benefits system, or as the basis for surveillance or insurance [ 55 ]. Some expressed the concern that many people do not feel informed enough to ask who might access their data, for what purposes and who might derive commercial benefit. One discussant stated that insurance companies are investing in DTC-GT data; this was corroborated by a DTC-GT provider who suggested that insurance companies use the concept of genetic information as a way of encouraging clients to think about their health proactively. The same person felt this underlined the importance of understanding the limitations of genetic information.

The right of ‘individuals to enjoy effective control over their personal information’ is enshrined in European Law [ 56 ], and underpins the EU General Data Protection Regulation (GDPR), the legal framework for processing (including storage) of the personal data of living individuals. GDPR includes genetic data alongside health data or biometric data as categories of data that deserve higher protection. However, since genomic data are shared (it is possible to infer identities of individuals from their relatives in genetic databases [ 57 ]), it can be challenging to determine the extent to which genomic data are ‘personal’ data and fall within the scope of GDPR [ 58 ].

The potential for a major security breach was seen as a threat to the public’s faith in genetic testing and data storage in all sectors; the risk of a breach was considered higher if there is pressure to streamline costs at the possible expense of data protection. Cyber security is an acknowledged challenge, requiring resources to stay abreast of hackers. One DTC-GT provider discussed standards for data protection, and considered the sector has a responsibility both to meet minimum standards and to provide potential consumers with information about compliance.

Consent is a legal basis for processing data, and all discussants recognised its central role in any endeavour collecting genomic data for future access. Many considered that informed consent in genetic testing should also include understanding the extent and implications of possible findings for the recipient and their relatives. Precisely what information should be conveyed, and how to ensure that consent is ‘informed’, were unresolved. Designing material which is comprehensive and protects an organisation yet is readable is a challenge. Discussion covered the process of gaining consent in a clinical setting, and legal contributors noted that there is now an expectation that consent conversations are individually tailored and take account of personal relevance and level of risk.

Representatives of all sectors acknowledged extensive efforts to optimise consent processes. In DTC-GT, several different types of consent approaches are in use, covering privacy, what data are collected and how they are used, and what choices are available to customers. Disclaimers in consent documents might include reservation of the right to change stated policy at any time. DTC-GT providers discussed separate research consent forms, and how they may re-contact customers to take part in specific research projects; they aim to balance ‘making consent as frictionless as possible while putting enough of a barrier there that people stop and look’. A DTC-GT provider noted that best practice is to have a consent document separate from the terms of service, and for the consent process to be overseen by an independent review body, to ensure compliance with ethical and legal guidelines. For OFH, consent materials have been co-designed with the public, following extensive focus group and interview research, in concert with developing data access and governance arrangements.

In clinical genomics research, participants are patients; consent to past, present and future health data access could be a condition of participation (such as for 100kGP) or optional as in the GMS, where the ‘research offer’ is communicated by patient choice documents. 100kGP was set up with an ethics advisory committee [ 1 ]; embedding recruitment into routine care received much deliberation. Participant information and consent forms were long and complex, and several shared the opinion that 100kGP was positioned primarily as a clinical test to patients who underappreciated its commercial nature. While consent paperwork was standardised, its complexity meant that the extent to which consent could be considered informed was dependent on the individual health professional [ 59 ] delivering an in-person interaction.

Consent for genetic testing/research participation usually occurs at a single point in time; research was discussed showing that retrospectively, patients thought they had underestimated the complexities [ 60 ]. It was felt that for consent to be informed, people need to understand implications beyond personal benefits, including the broader values of personal and aggregate data. Concern was expressed about how gaining informed consent fits alongside clinical activities, and some perceived a possible conflict between seeking informed consent and representing patients’ interests in terms of the balance of potential benefits and harms.

Consent may become problematic as soon as it loses specificity for the immediate question: for example, consent questions asking about feedback of certain types of information beyond the immediate clinical question (‘secondary’ findings), where research has shown that some participants either did not recall consent choices or recalled them incorrectly [ 54 , 61 ]. Attitudes towards the types of health information people might want to learn might change over the life course. Consent forms that combine questions asking for affirmation with questions asking participants to make a choice might have the unintended consequence that more time and effort is spent thinking about the latter than about the larger context of the project.

The impression that generating commercial profit from research data (with or without public benefit) is unpalatable to the public led to discussions about transparency and trust. Some considered that the commercial component of 100kGP presented an ethical dilemma; it was suggested that any commercial company—including when set up to incorporate ethical principles—could be sold on. A view was also expressed that if the state has invested in creating a dataset (recruitment, genotyping and phenotyping, data input and curation), prioritising benefits to the NHS is reasonable.

The ‘broad consent’ model [ 60 , 62 , 63 ] also relies on participant consent at a single time, but many factors can change the use, and usefulness of the data: data interpretation, technology, governance laws, acceptable uses and commercial partners were listed as examples. Equally, social and environmental circumstances may change abruptly, as in the onset of a pandemic, or there may be slower changes to the ‘social contract’. None of these factors can be known or predicted at the time of consent, and some saw broad consent as a kind of ‘get-out clause’. Seeking re-consent was seen as respectful of autonomy but very resource intensive.

The challenges discussed above suggest that consent cannot be the sole means of ensuring that data uses are ethically valid, and that data contributors’ values are respected. Consent is necessary, but not sufficient; it forms one component in the framework in which the test sits. The concept of trust in those frameworks is critical [ 64 ]; several discussants with clinical experience considered that 100kGP benefitted from the trust accrued by the NHS as a beneficent service provider. Participants were ascertained and recruited by NHS personnel in teams often known to them [ 54 ]; recruitment materials carried ‘those three letters in the little blue box’, and participants ‘assumed [their donation] is going to the greater good’. Publicity materials were perceived to have emphasised the personal benefits of participation, including families who had been provided with a valued diagnosis. Some felt that this message had been disingenuous. For OFH, still in planning stages, reaching an agreement to partner with the NHS for recruitment and data gathering was recognised as critical.

Governance and regulation

Trustworthiness was considered to encompass ideas of public (including future public) benefit, evidence-based standards, governance [ 65 ], transparency, consistency and communication. Trustworthiness applies to people, infrastructure and processes. Some felt that the sustainability of genomic research depends on public trust, and that data access needs appropriate governance. In the research environment, participants may be told that data uses are unpredictable, but that there are usage restrictions, and discussants mentioned that some very sensitive datasets—for example where ethnic tensions are involved—are not released. Regulating access can happen at several levels, including data access agreements, by affiliation, project review, restriction to secure servers, as well as having data access committees who make strategic decisions. As an example of an adverse event, discussants talked about the sharing with Google DeepMind of identifiable medical information of millions of UK patients through a data-sharing agreement with an NHS Trust without appropriate consent [ 66 ].

One commercial sector provider acknowledged the potential that the prospect of ‘great revenues’ might tempt some to take shortcuts around handling of data and ethical frameworks, and would welcome government-mandated mechanisms to protect consumers. Another commercial provider stated a preference for a balanced approach to regulation, recognising that forthcoming European Union In Vitro Diagnostic Medical Device regulations (IVDR) [ 67 ] are challenging for DTC-GT providers. The need for regulation was reiterated by submissions to the STC committee hearing. Discussants suggested that the scope of regulation should include data flows, data access, test quality, analytic and evidence-based clinical validity, and variant interpretation. In view of GDPR giving individuals rights over their data, regulation of information and the uses consumers can make of ‘their’ information was considered less feasible. IVDR requires manufacturers to provide evidence for analytical and clinical performance, while leaving member states free to determine how informed consent and genetic counselling should be provided [ 68 ]. From January 2021, all medical devices including in vitro diagnostic medical devices placed on the Great Britain market need to be registered with the Medicines and Healthcare products Regulatory Agency (MHRA) [ 69 ]. The recent House of Commons Science and Technology Committee report makes several recommendations for regulation, and urges the Government to set out a specific timeframe in which it intends to review the case for introducing new regulations for genomic tests provided directly to consumers [ 70 ].

Conclusions

Generation of genomic health data in the UK involves public and private sector interaction at multiple points, and care is required to balance rights and protections of patients, publics and public healthcare systems. Despite the concerns outlined in this article, there was broad consensus that plurality in provision of genomic health information is inevitable. A publicly funded health system such as the NHS requires buffering against tests which provide information of lower quality or benefit. Protecting consumers requires that tests marketed commercially should, at a minimum, measure what they claim to measure, make accurate claims, and be safe [ 9 ].

DTC-GT and population research tests are not part of routine healthcare: their role, and that of PRS, in healthcare is uncertain. More research is needed around the clinical utility of different types of risk information including PRS. There is a danger that tests that are not clinically indicated could overdiagnose, transforming ‘well’ people, whose risk is low, into patients receiving NHS care. In order to provide equitable care, the NHS prioritises clinical need; recent guidance [ 71 ] represents an attempt to manage DTC-GT in the NHS. We also highlight the need for protection of individual healthcare data, and for its use to be transparent and to respect public preferences. There was a consensus that governance and ongoing review are required to ensure trustworthiness and maintain the trust of current and future data contributors. We make the following suggestions for policy attention:

International regulatory standards for genetic testing, including test technology, variant calling and reporting, should apply to all individual genomic data that are reported.

Patients, research participants and consumers should expect clear information about the test and results. Information should be evidence-based, regularly reviewed and updated, and purported benefits and limitations responsibly balanced. Information about data uses, privacy and security should also be provided, with options for full withdrawal of data, including de-identified data.

Data collection initiatives benefiting from public sector investment, or individual health data harvesting, should prioritise and resource efforts to understand and respect public opinion, put in place transparent and robust governance structures, and include a principle of re-investment of revenue into public healthcare and health promotion.

Consideration should be given to models of joint provision, for example a ‘third way’, in which commercial providers fund an independent organisation staffed by trained and professionally regulated personnel such as clinical scientists and genetic counsellors. Under this model, individuals could be triaged and those who meet clinical risk criteria managed within the NHS. Clinical outcomes data on rare variants identified outside the context of clinically ascertained families is required to inform clinical utility, and consent should be sought for data capture.

Data availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

References

https://www.genomicsengland.co.uk/about-genomics-england/the-100000-genomes-project/ . Accessed 26/10/2020.

https://www.england.nhs.uk/genomics/nhs-genomic-med-service/ . Accessed 26/10/20.

https://www.ukbiobank.ac.uk/ . Accessed 27/10/20.

https://allofus.nih.gov/ . Accessed 26/10/20.

https://www.ukri.org/innovation/industrial-strategy-challenge-fund/accelerating-detection-of-disease/ . Accessed 26/10/20.

Phillips AM. ‘Only a click away - DTC genetics for ancestry, health, love…and more: a view of the business and regulatory landscape’. Appl Transl Genom. 2016;8:16–22.

Direct-to-consumer genetic testing. Opportunities and risks in a rapidly evolving market. KPMG International, 2018. https://assets.kpmg/content/dam/kpmg/xx/pdf/2018/08/direct-to-consumer-genetic-testing.pdf .

Covolo L, Rubinelli S, Ceretti E, Gelatti U. Internet-Based Direct-to-Consumer Genetic Testing: A Systematic Review. J Med Internet Res. 2015;17:e279.

McGuire AL, Burke W. Health System Implications of Direct-to-Consumer Personal Genome Testing. Public Health Genom. 2011;14:53–8.

https://www.chathamhouse.org/about-us/chatham-house-rule . Accessed 1.4.21.

Wright CF, Hall A, Zimmern RL. Regulating direct-to-consumer genetic tests: What is all the fuss about? Genet Med. 2011;13:295–300.

Horton R, Crawford G, Freeman L, Fenwick A, Wright CF, Lucassen A. Direct-to-consumer genetic testing. BMJ. 2019;367:l5688.

Nelson SC, Bowen DJ, Fullerton SM. Third-Party Genetic Interpretation Tools: A Mixed-Methods Study of Consumer Motivation and Behavior. Am J Hum Genet. 2019;105:122–31.

Guerrini CJ, Wagner JK, Nelson SC, Javitt GH, McGuire AL. Who’s on third? Regulation of third-party genetic interpretation services. Genet Med. 2020;22:4–11.

https://www.ukbiobank.ac.uk/2020/10/uk-biobank-makes-available-new-exome-sequencing-data/ . Accessed 26/10/2020.

Tandy-Connor S, Guiltinan J, Krempely K, Laduca H, Reineke P, Gutierrez S, et al. False-positive results released by direct-to-consumer genetic tests highlight the importance of clinical confirmation testing for appropriate patient care. Genet Med. 2018;20:1515–21.

Millward M, Tiller J, Bogwitz M, Kincaid H, Taylor S, Trainer AH, et al. Impact of direct-to-consumer genetic testing on Australian clinical genetics services. Eur J Med Genet. 2020;63:103968.

Weedon MN, Jackson L, Harrison JW, Ks R, Tyrell J, Hattersley AT, et al. Use of SNP chips to detect rare pathogenic variants: retrospective, population based diagnostic evaluation. BMJ. 2021;372:n214.

Moscarello T, Murray B, Reuter CM, Demo E. Direct-to-consumer raw genetic data and third-party interpretation services: more burden than bargain? Genet Med. 2019;21:539–41.

[…] of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17:405–23.

El Mecky J, Johansson L, Plantinga M, Fenwick A, Lucassen A, Dijkhuizen T, et al. Reinterpretation, reclassification, and its downstream effects: challenges for clinical laboratory geneticists. BMC Med Genom. 2019;12:170.

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–91.

Wright CF, West B, Tuke M, Jones SE, Patel K, Laver TW, et al. Assessing the Pathogenicity, Penetrance, and Expressivity of Putative Disease-Causing Variants in a Population Setting. Am J Hum Genet. 2019;104:275–86.

Tennen RI, Laskey SB, Koelsch BL, McIntyre MH, Tung JY. Identifying Ashkenazi Jewish BRCA1/2 founder variants in individuals who do not self-report Jewish ancestry. Scientific Rep. 2020;10:7669.

Grosse SD, Khoury MJ. What is the clinical utility of genetic testing? Genet Med. 2006;8:448–50.

Walcott SE, Miller FA, Dunsmore K, Lazor T, Feldman BM, Hayeems RZ. Measuring clinical utility in the context of genetic testing: a scoping review. Eur J Human Genet. 2020;29:378–86.

Savard J, Hickerton C, Metcalfe SA, Gaff C, Middleton A, Newson AJ. From Expectations to Experiences: Consumer Autonomy and Choice in Personal Genomic Testing. AJOB Empirical Bioethics. 2020;11:63–76.

Bunnik EM, Janssens ACJW, Schermer MHN. Personal utility in genomic testing: is there such a thing? J Med Ethics. 2015;41:322–6.

Torkamani A, Wineinger NE, Topol EJ. The personal and clinical utility of polygenic risk scores. Nat Rev Genet. 2018;19:581–90.

Stewart KFJ, Wesselius A, Schreurs MAC, Schols AMWJ, Zeegers MP. Behavioural changes, sharing behaviour and psychological responses after receiving direct-to-consumer genetic test results: a systematic review and meta-analysis. J Community Genet. 2018;9:1–18.

PHG Foundation. Polygenic scores, risk and cardiovascular disease. 2019. https://www.phgfoundation.org/documents/prs-report-final-web.pdf .

Fitzgerald TW. Large-scale discovery of novel genetic causes of developmental disorders. Nature. 2015;519:223–8.

Turro E, Astle WJ, Megy K, Gräf S, Greene D, Shamardina O, et al. Whole-genome sequencing of patients with rare diseases in a national health system. Nature. 2020;583:96–102.

Hogarth S, Saukko P. A market in the making: the past, present and future of direct-to-consumer genomics. N Genet Soc. 2017;36:197–208.

Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLOS Med. 2015;12:e1001779.

Policy Innovation and Evaluation Research Unit. Understanding experiences of recruiting for, and participating in, genomics research and service transformation: the 100,000 Genomes Project, 2015–17. 2020. https://piru.ac.uk/assets/files/100k%20genomes%20project-Summary.pdf .

Sterckx S, Dheensa S, Cockbain J. Presuming the Promotion of the Common Good by Large-Scale Health Research. In: Van Beers, B., Sterckx, S., & Dickenson, D. editors. Personalised Medicine, Individual Choice and the Common Good (Cambridge Bioethics and Law). Cambridge: Cambridge University Press; 2018.

Samuel GN, Farsides B. The UK’s 100,000 Genomes Project: manifesting policymakers’ expectations. N Genet Soc. 2017;36:336–53.

Hall JA, Gertz R, Amato J, Pagliari C. Transparency of genetic testing services for ‘health, wellness and lifestyle’: analysis of online prepurchase information for UK consumers. Eur J Hum Genet. 2017;25:908–17.

Phillips AM. Reading the fine print when buying your genetic self online: direct-to-consumer genetic testing terms and conditions. N Genet Soc. 2017;36:273–95.

Manchanda R, Loggenberg K, Sanderson S, Burnell M, Wardle J, Gessler S, et al. Population Testing for Cancer Predisposing BRCA1/BRCA2 Mutations in the Ashkenazi-Jewish Community: a Randomized Controlled Trial. J Natl Cancer Inst. 2015;107:dju379.

Manickam K, Buchanan AH, Schwartz MLB, Hallquist MLG, Williams JL, Rahm AK, et al. Exome Sequencing–Based Screening for BRCA1/2 Expected Pathogenic Variants Among Adult Biobank Participants. JAMA Netw Open. 2018;1:e182140.

Graham M, Hallowell N, Solberg B, Haukkala A, Holliday J, Kerasidou A, et al. Taking it to the bank: the ethical management of individual findings arising in secondary research. J Med Ethics. 2021:medethics-2020-106941.

Gibson LM, Littlejohns TJ, Adamska L, Garratt S, Doherty N, Wardlaw JM, et al. Impact of detecting potentially serious incidental findings during multi-modal imaging. Wellcome Open Res. 2018;2:114.

Wynn J, Lewis K, Amendola LM, Bernhardt BA, Biswas S, Joshi M, et al. Clinical providers’ experiences with returning results from genomic sequencing: an interview study. BMC Med Genom. 2018;11:45.

https://www.independent.co.uk/news/health/matt-hancock-genetic-test-prostate-cancer-nhs-genomics-a8832081.html . Accessed 03/11/2020.

Liu W, Outlaw JJ, Wineinger N, Boeldt D, Bloss CS. Effect of co-payment on behavioral response to consumer genomic testing. Transl Behav Med. 2018;8:130–6.

Dickenson D, Rafi I, Spicer J, Papanikitas A. Should UK primary care be an early adopter of genomic medicine? Br J Gen Pract. 2019;69:330–1.

https://www.genomicseducation.hee.nhs.uk/ . Accessed 29/10/2020.

Middleton A, Mendes Á, Benjamin CM, Howard HC. Direct-to-consumer genetic testing: where and how does genetic counseling fit? Personalized Med. 2017;14:249–57.

Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, et al. Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. Am J Epidemiol. 2017;186:1026–34.

Tsosie KS, Yracheta JM, Dickenson D. Overvaluing individual consent ignores risks to tribal participants. Nat Rev Genet. 2019;20:497–8.

Begay RL, Garrison NA, Sage F, Bauer M, Knoki-Wilson U, Begay DH, et al. Weaving the Strands of Life (<em>Iiná Bitł'ool</em>): History of Genetic Research Involving Navajo People. Hum Biol. 2019;91:189–208.

Ipsos MORI. A public dialogue on genomic medicine: time for a new social contract? https://www.ipsos.com/sites/default/files/ct/publication/documents/2019-04/public-dialogue-on-genomic-medicine-full-report.pdf . 2019.

Dheensa S, Lucassen A, Fenwick A. Fostering trust in healthcare: Participants’ experiences, views, and concerns about the 100,000 genomes project. Eur J Med Genet. 2019;62:335–41.

Ipsos MORI. The One-Way Mirror: Public attitudes to commercial access to health data. 2016 https://wellcome.org/sites/default/files/public-attitudes-to-commercial-access-to-health-data-wellcome-mar16.pdf . 2016.

European Commission. Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL on the protection of individuals with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation) 2012. https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=COM:2012:0011:FIN:EN:PDF . Brussels; 2012.

Erlich Y, Shor T, Pe’Er I, Carmi S. Identity inference of genomic data using long-range familial searches. Science 2018;362:690–4.

PHG Foundation. The GDPR and genomic data https://www.phgfoundation.org/documents/gdpr-and-genomic-data-report.pdf . 2020.

Sanderson SC, Lewis C, Patch C, Hill M, Bitner-Glindzicz M, Chitty LS. Opening the “black box” of informed consent appointments for genome sequencing: a multisite observational study. Genet Med. 2019;21:1083–91.

Ballard LM, Horton RH, Dheensa S, Fenwick A, Lucassen AM. Exploring broad consent in the context of the 100,000 Genomes Project: a mixed methods study. Eur J Hum Genet. 2020;28:732–41.

Mackley MP, Blair E, Parker M, Taylor JC, Watkins H, Ormondroyd E. Views of rare disease participants in a UK whole-genome sequencing study towards secondary findings: a qualitative study. Eur J Hum Genet. 2018;26:652–9.

Caulfield T, Kaye J. Broad Consent in Biobanking: reflections on Seemingly Insurmountable Dilemmas. Med Law Int. 2009;10:85–100.

Steinsbekk KS, Kåre Myskja B, Solberg B. Broad consent versus dynamic consent in biobank research: is passive participation an ethical problem? Eur J Hum Genet. 2013;21:897–902.

Samuel G, Dheensa S. Perspectives on Achieving Institutional Trust in Personalized Medicine. Am J Bioeth. 2018;18:39–41.

O’Doherty KC, Shabani M, Dove ES, Bentzen HB, Borry P, Burgess MM, et al. Toward better governance of human genomic data. Nat Genet. 2021;53:2–8.

Article   PubMed   PubMed Central   CAS   Google Scholar  

Hodson, H. New Scientist 13 May 2106. Did Google’s NHS patient data deal need ethical approval? https://www.newscientist.com/article/2088056-did-googles-nhs-patient-data-deal-need-ethical-approval/ . Accessed 27.10.2068.

REGULATION (EU) 2017/746 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL on in vitro diagnostic medical devices. Official Journal of the European Union. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32017R0746&from=cs . 2017.

Kalokairinou L, Howard HC, Slokenberga S, Fisher E, Flatscher-Thöni M, Hartlev M, et al. Legislation of direct-to-consumer genetic testing in Europe: a fragmented regulatory landscape. J Community Genet. 2018;9:117–32.

https://www.gov.uk/guidance/regulating-medical-devices-in-the-uk . Accessed 29.4.21.

House of CommonsScience and Technology Committee: Direct-to-consumer genomic testing. https://committees.parliament.uk/publications/6347/documents/69832/default/ . June 2021.

BSGM/RCGP (2019) Position Statement on DTC-GT. https://www.rcgp.org.uk/-/media/Files/CIRC/Clinical-Policy/Position-statements/RCGP-position-statement-on-direct-to-consumer-genomic-testing-oct-2019.ashx?la=en .

Download references

Acknowledgements

We thank the following for generously giving their time and expertise: Adam Barnett (Dipex Charity), Dr Edward Blair (Oxford University Hospitals NHS Foundation Trust), Dr Sarion Bowers (Wellcome Sanger Centre), Dr Tara Clancy (Central Manchester University Hospitals NHS Foundation Trust), Prof Donna Dickenson (University of London), Alison Hall (PHG Foundation), Dr Rachel Horton (University of Southampton), Dr Anant Jani (University of Oxford), Avi Lasarow (DNAFit), James Lawford Davies (Hill Dickinson LLP), Prof Anneke Lucassen (University of Southampton), Dr Jo Mason (Yourgene Health), Dr Richard Milne (Wellcome Connecting Science), Prof Michael Parker (Ethox), Dr Andelka Phillips (University of Waikato, New Zealand; HeLEX Centre, University of Oxford), Dr Imran Rafi (University of Oxford), Dr Sian Rees (Oxford Academic Health Science Network), Joel Rose (Cardiomyopathy UK), Dr Helen Salisbury (Academic GP, Oxford), Dr Saskia Sanderson (OFH), Dr Joyce Solomons (Oxford University Hospitals NHS Foundation Trust), Dr John Spicer (GP, London), Dr Ellen Thomas (100,000 Genomes Project), Dr Kate Thomson (Oxford University Hospitals NHS Foundation Trust), Prof Caroline Wright (University of Exeter), Dr Shirley Wu (23andMe, California, USA).

NIHR Oxford Biomedical Research Centre; Research England Higher Education Innovation Fund, via the University of Oxford Research and Public Policy Partnership Scheme, and University of Oxford Strategic Priorities Fund allocation for policy engagement.

Author information

Authors and affiliations.

Radcliffe Department of Medicine, University of Oxford, Oxford, UK

Elizabeth Ormondroyd

National Institute for Health Research (NIHR) Comprehensive Biomedical Research Centre, Oxford, UK

Parliamentary Office of Science and Technology, London, UK

Peter Border

Health Education England Genomics Education Programme, London, UK

Judith Hayward

Yorkshire Regional Genetics Service, Chapel Allerton Hospital, Leeds Teaching Hospitals Trust, Leeds, UK

Nuffield Department of Primary Care Health Science, University of Oxford, Oxford, UK

Andrew Papanikitas

Contributions

Conceived by EO, PB, AP; data collection/analysis by EO, PB, AP. All authors contributed to manuscript drafting.

Corresponding author

Correspondence to Elizabeth Ormondroyd.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article.

Ormondroyd, E., Border, P., Hayward, J. et al. Genomic health data generation in the UK: a 360 view. Eur J Hum Genet 30, 782–789 (2022). https://doi.org/10.1038/s41431-021-00976-w

Received: 27 May 2021

Revised: 25 August 2021

Accepted: 27 September 2021

Published: 19 October 2021

Issue Date: July 2022

DOI: https://doi.org/10.1038/s41431-021-00976-w


The Health Data Research Innovation Gateway provides a common entry point to discover and request access to UK health datasets. Users can search for health data tools, research projects, publications and collaborate via a community forum. Find out more

We are the national institute for health data science.

Our mission is to unite the UK's health data to enable discoveries that improve people's lives.

Our Research Priorities

Five UK-wide Research Driver programmes are focusing on areas where data science has great potential to improve public health, prevent people from becoming unwell, and enhance patient care.

The Alliance marks its fifth anniversary

The UK Health Data Research Alliance is marking five remarkable years of progress and commitment to maximising the benefits of health data research for all.

CVD-COVID-UK / COVID-IMPACT

CVD-COVID-UK aims to understand the relationship between COVID-19 and cardiovascular diseases such as heart attack, heart failure, stroke, and blood clots in the lungs through analyses of de-identified, linked, nationally collated healthcare datasets across the four nations of the UK.

Health Data Science Black Internship Programme

As the UK’s national institute for health data research, we are helping to transform the prospects of talented Black health data scientists in the UK by providing opportunities to flourish in STEM careers through our health data research Black Internship Programme.

HDR UK Futures

HDR UK Futures is a free and flexible virtual learning platform which lets you learn from national and international experts however and whenever suits you.

New insights to tackle major global health data research challenges

An analysis of the challenges faced by researchers during the International COVID-19 Data Alliance (ICODA) programme has shed new light on solutions that could accelerate global health data research.

HDR UK and GA4GH form a strategic partnership to unite genomic and health data

New AI tool may offer insights into patients’ future health

Impact at HDR UK

Our vision is that every patient, clinical trial, biomedical discovery and public health policy benefits from the use of large scale data and advanced analytics.

Research Driver Programmes

Five UK-wide Research Driver Programmes will harness the power of large-scale data to deliver ground-breaking scientific impacts and infrastructure innovation.

Using Health Data

Case Studies

Case study examples of how we have enabled research across the UK.

Discover Data on the Gateway

The Health Data Research Innovation Gateway (the 'Gateway') search engine or 'portal' provides a common entry point for researchers and innovators to discover and request access to health...

Health Data Research Hubs

The Health Data Research Hubs provide a rich toolkit of healthcare datasets, infrastructure and expertise that enable users to identify, access, understand and use data to improve people’s lives.

UK Health Data Research Alliance

The UK Health Data Research Alliance (the 'Alliance') is an independent alliance of leading healthcare and research organisations united to establish best practice for the ethical use of UK health...

  • Health IT Buzz

Setting Our Sights Toward a Healthier, More Innovative, Data-Driven Future

Seth Pazinski, Peter Karras, and Dustin Charles | March 27, 2024

The draft 2024–2030 Federal Health IT Strategic Plan (the draft Plan) is now open for public comment. The public comment period on the draft Plan ends on May 28, 2024, at 11:59:59 PM ET.

The draft Plan is a comprehensive and strategic effort developed by ONC in collaboration with more than 25 federal organizations.

The preceding 2020-2025 Federal Health IT Strategic Plan promoted a modern health IT infrastructure and drove significant progress across government and industry to advance the access, exchange, and use of electronic health information (EHI). The 2024–2030 Federal Health IT Strategic Plan seeks to build upon this progress and includes an increased emphasis in areas such as public health, health equity, and artificial intelligence (AI).

ONC and its federal partners addressed several considerations to ensure the draft Plan’s effectiveness and relevance in an ever-evolving health care landscape. For example, to address the challenges presented during the COVID-19 pandemic, the draft Plan identifies technological gaps and opportunities to modernize public health data systems.

Additionally, the draft Plan recognizes the persistent disparities in health care access and outcomes and therefore seeks to promote equitable access to EHI and communications technology and fair representation in research endeavors, and to ensure that equity is built into health IT by design.

Lastly, the draft Plan acknowledges the swift evolution of AI and its increased use in health care, emphasizing the urgent need for the federal government to navigate this transformative landscape both responsibly and effectively.

What are the Goals and Objectives of the draft Plan?

The draft Plan is deliberately outcomes-driven, with goals and objectives focused on improving access to EHI, delivering a better, more equitable health care experience, and modernizing our nation’s public health infrastructure. The goals are divided into distinct categories, with goals 1 through 3 addressing plans to improve the experiences and outcomes for health IT users, while goal 4 is focused on the policies and technologies needed to support those users.

Image describing the Federal Health IT Strategic Plan Framework

How will the final 2024–2030 Federal Health IT Strategic Plan be used?

This Plan is intended to serve as a roadmap for federal health IT initiatives and activities, and as a catalyst for activities in the private sector. Specifically, the final Plan will help to:

  • Prioritize resources within federal agencies
  • Align and coordinate federal health IT initiatives and activities
  • Signal priorities to the private sector
  • Benchmark and assess progress of federal health IT initiatives and activities

Submit your Feedback

Please help inform the federal government’s health IT and EHI sharing direction by submitting your feedback on the draft 2024–2030 Federal Health IT Strategic Plan.

Welcome to Health Data Research UK (HDR UK) - we are the UK's national institute for health data science. We are uniting the UK’s health data to enable discoveries that improve people’s lives. Our vision is that every health and care interaction and research endeavour will be enhanced by access to large scale data and advanced analytics.

Our mission is to unite the UK’s health and care data to enable discoveries that improve people’s lives. We do this by uniting, improving and using health and care data as one national institute.

Our vision is that every health interaction and research endeavour will be enhanced by access to large scale data and advanced analytics.

We are an independent charity supported by 9 funders, with work based at 31 locations across the UK. At Health Data Research UK (HDR UK), our work spans academia, healthcare, industry, and charities, as well as patients and the public.

Our strategy

Our strategy sets out every location and team, and outlines how they are working together for the collective effort.

Our funders

Our funders recognise the pivotal contribution of health data science towards achieving transformative health benefits.

Our partners

Through partnership, we aim to identify and act on opportunities to make bigger and better health data science improvements.

View our FAQs

We are a registered charity No. 1194431, funded by UK Research and Innovation, the Department of Health and Social Care in England and equivalents in Northern Ireland, Wales and Scotland, and leading medical research charities.

Support for Ukraine

We stand in solidarity with the people of Ukraine, especially those members of the healthcare and research professions.

Get in touch with us

We're here to help you with your health data research enquiries.

MIT Technology Review

Building a data-driven health-care ecosystem

Harnessing data to improve the equity, affordability, and quality of the health care system.

  • MIT Technology Review Insights archive page

In association with JPMorgan Chase

The application of AI to health-care data holds promise for aligning the U.S. health-care system with quality care and positive health outcomes. But AI for health care hasn’t reached its full capacity. One reason is the inconsistent quality and integrity of the data that AI depends on. The industry (hospitals, providers, insurers, and administrators) uses diverse systems. The resulting data can be difficult to share because of incompatibility, privacy regulations, and the unstructured nature of much of the data. The data can carry errors, omissions, and duplications, making it difficult to access, analyze, and use. Even the best data can introduce bias: the data used to train AI models can reinforce underrepresentation of historically marginalized populations. The growth of AI in all industries means data quality is increasingly vital.

While AI-driven innovation is still growing, the U.S. continues to spend more than twice as much as the average high-income country on its health care, while its health outcomes are falling: the latest data from the National Center for Health Statistics at the U.S. Centers for Disease Control and Prevention indicate that U.S. life expectancy dropped for the second year in a row in 2021.

To spark innovation by identifying gaps and pain points in the employer-based health-care system, JPMorgan Chase launched Morgan Health in 2021. Morgan Health’s chief technology officer of corporate responsibility, Tiffany West Polk, says Morgan Health is driven to improve health outcomes, affordability, and equity, with data at its foundation. Gaining insights from large data streams means optimizing analytical platforms and ensuring data remains secure, while also remaining compliant with HIPAA and Health Resources and Services Administration (HRSA) requirements, she says.

Currently, Polk says, the U.S. health-care system seems to be “quite stuck” in terms of keeping health-care quality and positive outcomes in line with rising costs.

  • “If you look across the broader U.S. environment in particular, employer sponsored insurance is a huge part of the health-care net for the United States, and employers make significant financial investment to provide health benefits to their employees. It's one of the main things that people look at when they're looking across an employer landscape and thinking about who they want to work for.”

Investing in new ways to provide health care

Nearly 160 million people in the U.S. have employer-sponsored health insurance as of 2022, according to health-care policy research non-profit KFF (formerly the Kaiser Family Foundation). JPMorgan Chase launched Morgan Health because of its focus on improving employer-sponsored health care, not least for its 165,000 employees.

Morgan Health has invested $130 million in capital during the past 18-plus months in five innovative health-care companies: advanced primary care provider Vera Whole Health; health-care data analytics specialist Embold Health; Kindbody, a fertility clinic network and global family-building benefits provider; LetsGetChecked, which creates home-monitoring clinical tools; and Centivo, which provides health care plans for self-insured employers.

All of these companies offer new approaches to conventional employer-sponsored health care to deliver a higher standard of care. Morgan Health’s collaboration with these enterprises will examine how these approaches change patient outcomes, health-care equity, and affordability, and how to scale their successes.

“Many Americans today face real barriers to receiving high-quality, affordable, and equitable health care, even with employer-sponsored insurance,” Polk says. This calls for breaking the paradigm of delivery-incentivized health care, she says, which rewards providers for delivering services, but pays insufficient attention to outcomes.  

  • “We have a model today where our health-care providers are incentivized based on the number of patients they see or the number of services they perform. What that means is that they're not incentivized based on improvements, patient's health, and wellbeing. And so when you have a model that thinks volume versus value, those challenges then serve to compound the disparities that we have. And that then also means that those who have employer-sponsored insurance are also similarly challenged.”

For Morgan Health, AI and machine learning (ML) will be a key to problem-solving with health-care technology, Polk says. AI is ubiquitous across industries, and is the go-to when we think about innovation, she says, but the hype can mean we forget about the importance of data accessibility and quality.

Polk says solving this data challenge makes this an exciting and transformational time to be a chief technology officer and a technologist. The next stage of evolution in health care can’t proceed without better data, Polk says, and this is what the data and analytics team at Morgan Health are addressing.

  • “[AI] has become so ubiquitous in terms of how we think about everything. And we think that it is the thing that's going to fix anything and everything in technology. And it has become so ubiquitous and so the go-to when you think about innovation, that I think that sometimes, there's this way in which people kind of forget about what AI actually is underneath the covers.”

Garnering data-based insights

To strengthen health-care data, the industry is moving increasingly toward standardized electronic health records (EHRs) for patients. A 2023 Deloitte study says use of EHRs and health information exchanges (HIEs) is growing rapidly, with organizations building data lakes and using AI to combine and cleanse data. These measures provide a “strong digital backbone” for building connections between hospitals, primary care centers, and payment tools, the study says, and this should help reduce errors, unnecessary readmissions, and duplicate testing.

The U.S. Department of Health and Human Services (HHS) is also building a network for digital connection in the health-care industry, to allow data to flow among multiple providers and geographies. Its Office of the National Coordinator for Health Information Technology (ONC) announced in December 2023 that its national health data exchange, the Trusted Exchange Framework and Common Agreement (TEFCA), is operational. The exchange connects Qualified Health Information Networks, which it certifies and onboards, with standard policies and technical requirements.

Polk says Morgan Health is improving foundations to incentivize better outcomes for patients. Morgan Health’s work can create standards—grounded in data—that incentivize better performance, which can then be shared across the employer-sponsored insurance network, and among broader communities. Using AI features such as metadata tagging (algorithms that can group and label data that has a common purpose), she says, “is one way health-care companies can simplify tasks and open up more time for providing care.”

  • “If you do your data ingestion right, if you cleanse your data right, if you make sure that your metadata tagging is correct, and then you are very aware of the way in which your algorithms have been biased in the past, you can be aware of that so that you can make sure that your algorithms are inclusive moving forward.”

“I think the most important thing is incentivizing our health-care partners who provide for our employees to meaningfully improve health-care quality, equity, and affordability through incentivizing outcomes, not incentivizing volume, not incentivizing visits, but really incentivizing outcomes,” Polk says.

This article is for informational purposes only and it is not intended as legal, tax, financial, investment, accounting or regulatory advice. Opinions expressed herein are the personal views of the individual(s) and do not represent the views of JPMorgan Chase & Co. The accuracy of any statements, linked resources, reported findings or quotations are not the responsibility of JPMorgan Chase & Co.

‘Our health data is about to flow more freely, like it or not’: big tech’s plans for the NHS

The government is about to award a £480m contract to build a vast new database of patient data. But if people don’t trust it, they’ll opt out – I know, because I felt I had to

Last December, I had an abortion. Most unwanted pregnancies set a panic-timer ticking, but I was almost counting down the seconds until I had to catch a flight from London, where I live, to Texas, where I grew up, and where the provision of abortion care was recently made a felony. You bleed for a while after most abortions, and I was still bleeding when I boarded the plane home for Christmas.

Going to Texas so soon after the procedure made me consider where the record of my abortion – my health data – would end up. When I phoned an abortion clinic in late November to book an appointment, one of the first questions staff asked was: “May we share a record of your treatment with your GP?”

I hesitated. There’s nothing wrong, in principle, with this question, and a lot that’s right. It’s not just that a complete health record helps my GP treat me. My Texan parents, both scientists, taught me that sharing information with organisations like the NHS can help it plan services and research ways to improve care. I’ve joined clinical studies in the past.

But I also help run a legal campaign group, Foxglove, that takes action against the government and tech companies when they infringe people’s rights. In a series of cases about NHS data since the start of the pandemic, we have defended people’s right to a say about who sees their medical information. This work has exposed me to worrying details about how our medical data can be used, including the Home Office practice of tracking migrants using their health records.

NHS data is special. For decades the government has required GPs to store patients’ records in a standardised way: as well as longhand notes, every interaction with a GP is saved on a computer database in a simple, consistent code. The NHS may hold the richest set of population-wide, machine-readable health data in the world. Many see in that data a source of immense potential – and profit. Ernst & Young has valued NHS patient data at £9.6bn a year. It is particularly valuable to tech giants, who would like to get their hands on NHS datasets to build AI machine-learning systems.

As things stand today, I believe my local GP would safeguard the record of my abortion. By Christmas next year? I’m not so sure. It may no longer be up to them. The government is seeking to overhaul the way it handles every health record in England, and its plans have filled some healthcare workers with alarm.

There is clearly a need to make patient data more consistent and accessible across the NHS. At the moment, caregivers in one part of the NHS often can’t access the records of care their patients received elsewhere. Imagine you have had knee pain for years. Doctors want to operate. At various times you have seen your GP, and been to a hospital in south London, but the surgery is due to be carried out in a different borough. (This kind of thing happens all the time.) A doctor is trying to judge how risky your operation is, to decide if you can go to a health centre, where you might be seen sooner and released quicker, or must be sent to a hospital.

Because your health data is held in many places – at your GP and across various hospitals where you have been – there is not one complete picture of you as a patient they can use to check. Of course the clinician will ask about your health before the operation; but you may not remember, or even know, every detail that could affect how your surgery could go. If this data was better shared, they could create a clearer picture of your health. Without this information, their assessment of your risk for surgery is, at best, an educated guess.

This partial view of the patient is commonplace in the NHS, and it is one reason the government is trying to create a vast new NHS England database, known as the federated data platform (FDP). (Health data is managed nationally, so this system is England-only.) If data from the nation’s hospitals, GPs and social care were fed into a single system, accessible in one place to health service doctors and planners, it could potentially help planners by showing trends across regions, and the population as a whole. The contract to build and run the platform is worth a huge sum – £480m over five years – with the winning bidder expected to be announced any day.

Putting so much data under central control may increase efficiency – but it also risks failing because of poor consultation and low patient trust. Many would prefer that NHS England invested in its own capacity, instead of farming out to private enterprise. The frontrunner for the contract to run the FDP is the US tech firm Palantir, which has performed data analytics work for the US security services, border forces and police. If you don’t know Palantir, you may be familiar with the company’s chair, Peter Thiel, a tech billionaire and Trump supporter, who has funded anti-abortion candidates and invested in anti-birth-control startups. Contacted for this article, Palantir emphasised that it is “not in the business of mining data, nor do we sell or monetise it in any way. What we do is provide tools that help customers understand and organise the information that they hold.” NHS England said that it, not the platform, will control the patient data inside it. It also points to contractual safeguards “to prevent any supplier gaining a dominant role in NHS data management”.

Still, the FDP gives government a lot of power over health data – and it hasn’t always earned our trust. In 2019, the head of NHS England and other top health officials discussed nine “commercial models” for access to the nation’s health data with tech and drug executives. A bill currently going through parliament, the data protection and digital information bill, would significantly dilute some of the laws that protect patient privacy. Data protection law – including the EU 2018 General Data Protection Regulation (GDPR) and its UK equivalent – limits what can be done with sensitive data such as health records. We mostly associate GDPR with infuriating website pop-ups, but it is a useful safeguard. It was one of the reasons the abortion clinic needed my permission to share my medical record with anyone, including my own doctor.

One of the most important of the proposed changes is the move to redefine what counts as personal data. Through a process of “pseudonymisation”, the system can detach your name and other identifying details. But this process is imperfect and reversible – and patients consistently say they want control over how their health data is used. If the government amends or interprets data protection law to take this choice away from us, it is hard to see how patient confidentiality as we know it will still exist. Our health records are precious to us. We come to the doctor as supplicants. We share worry, fear, pain. Our ability to do so depends on trust, which itself depends on the guarantee of privacy.

After a summer of strikes over pay and conditions, everyone can see that the NHS urgently needs more resources. The NHS could use our data – with strict safeguards – to streamline clinical decision-making and take pressure off doctors and nurses. But the government’s weakening of data protection rules, set against its cosying up to the tech industry, is worrying. In the US in 2021, Google won a deal to access patient records at more than 2,000 clinics to develop healthcare algorithms. Google, like Microsoft and IBM, seeks to access patient records at scale in order to train algorithms to develop apps or organise care. In this troubling scenario, my pain, and yours, may become training data for someone else’s AI.

All this lingered in my mind on the call with the abortion clinic.

“I asked, was it all right if we shared this record with your doctor?”

“I’m sorry, but no,” I said.

Having an abortion was the most isolating experience of my adult life. It taught me how stigma lingers over abortion even in the UK, where treatment is widely available. It also changed how I think about health data: not only mine, but that of my family in Texas, and of patients across the UK. It spurred me to reflect on what is at stake if these changes to patient data go ahead.

With the drive towards AI in healthcare (from triage to diagnosis to drug research), we in England face a choice: will we privilege care and the public sector, or profit? People in the UK treasure the sanctity of our health record – we want control over who sees it, when, and why. What will it mean when we no longer believe a conversation with an NHS doctor is private?

As I was flicking through Twitter on the early morning train to the clinic, a promoted post I’d never seen before popped up. It was a jokey meme showing a contraceptive pill pack, with the line: “Take your magic pills ladies, you don’t want any Baby Jesuses for Christmas.” Living under surveillance capitalism makes you paranoid. I snapped a screenshot, sent it to two female friends on Signal, then panicked, deleted the picture and turned my phone off.

It was 8am when I sat down at the abortion clinic in Ealing, west London, where a range of women awaited their appointments. Some wore fur coats, others tracksuits and false eyelashes; a few had been crying. A middle-aged woman, who looked as if she was rushing out to her next meeting, just seemed very pissed off. The only man in the waiting room was a teenager who had come along with his girlfriend.

At first I thought my reluctance to share my abortion record had mainly to do with data. I later realised it also had to do with shame. A cloud hangs over abortion. There is a strong residual stereotype of the woman who aborts: someone young, a bit careless, who knows no better.

In the parallel system you enter in England to access abortion, the language reveals a world shadowed by female vulnerability and male violence. I could hear it when I phoned the clinic: “Are you alone?” “Are you safe?” “Please choose a password and a memorable number; we will only use these to contact you about your treatment.” The clinic never used the word “abortion” by text or email. All this signals that you have ventured to a space outside normal healthcare. Nobody hassled me outside the clinic that day, but I’ve seen the protesters with signs outside another London clinic.

Go for surgery, a friend at a party had whispered; the pills cause far more pain. I remembered lying on the bathroom floor a decade ago during a miscarriage, the weeks of bleeding and pain, and decided to investigate. At the clinic, a seasoned Danish nurse told me I needed an internal ultrasound to assess possible treatments. I asked how soon I could have the surgery. The nurse said there would be a wait, until just before I was due to go to Texas, where there would be no aftercare.

It was only then, when I was lying on the table and the nurse prodded my ovaries and my vagina, that the sobs came. I didn’t want another child. But bodies tend to hold on to the remnants of pregnancies, miscarriages and births; I remembered watching other ultrasounds: the child who developed and the child who did not. I lay on the shore of life and death and felt utterly alone.

The nurse told me it was early, about six weeks, and handed me a tissue. (Six weeks is the criminal threshold in Texas.) She had had this conversation thousands of times. “Were you hoping I wouldn’t see anything?” No, I blubbered, I am just tired. I didn’t mention my rage: at every Baptist or Lutheran or Methodist minister in the small town where I was raised, at Texas, at myself. It was clear the surgery would be too late, as I had to fly to the US. Before leaving the clinic I took the first lot of pills.

You may support abortion rights on principle, as I do. You may also, as I do, admit privately that you find its ethics ambiguous. Having an abortion taught me that a woman’s ability to sustain or end life is simultaneously an awful burden and an awful power. Mulling over the complexity of it all now seems like wallowing in a warm bath. When the reality appears, you are thrown in a cold river. The cells are dividing; a choice must be made. Swim on, or turn back? I felt the power and I hated and feared it, and I realised why so many men hate it too, and seek to control it.

In most of the UK, abortion is still technically a crime. The 1967 Abortion Act does not legalise abortion; it excuses it if a doctor considers that pregnancy (before 24 weeks) poses a risk to the mental or physical health of the mother. (Northern Ireland is different, with abortion legal on demand to 12 weeks.) This teaches women that to access care they must construct a narrative that continuing the pregnancy would be damaging to mother or baby, or both.

Perhaps that’s why I didn’t want the clinic to share my data. Having reduced a complex experience into a just-so story once, I didn’t want that story to be fed into a mass data system that could, I worried, one day be shared with the health division of Google, or the German drug giant Merck, or some other company. I saw my abortion as my story, and not one from which I was keen to permit any corporation, however modestly, to profit.

You emerge from this ordeal with a story you guard fiercely. This doesn’t mean you don’t tell it to others: it means you want absolute authority to tell the story as and when you see fit. It was months before I told my Texan family about the abortion; in the service of a wider point, I am writing about it now.

In German legal language, this is “informational self-determination”, or informationelle Selbstbestimmung. This idea means something slightly different to me from privacy. Privacy connotes a turning away, a hiding. But I didn’t want to hide my abortion: I wanted agency over when the story was told. Real data protection means you deal with your record on your terms.

In some places, the data trails we leave have become a way to criminalise us. Earlier this year, a heartrending photo from Nebraska showed a girl of 19 sobbing as she was escorted from court after being sentenced to 90 days in prison after having an abortion at the start of her third trimester. Police had obtained her Facebook messages with her mother about organising the pills. Her mother was given five years for procuring the pills and hiding the body.

In Texas, over Christmas lunch with my sister and her teenage daughter, I wondered what I would do if they needed help. On what platform could we speak safely? How could I send them information, or bootleg them pills? Texas, like most of the US, has no GDPR equivalent, meaning women have no real control over their data. This allows apps such as period trackers to share location and other data with the authorities when the law requires.

This summer, a British woman was sentenced to prison after her search history revealed that, during the pandemic, she had a late-stage abortion. A couple of phrases typed into Google – one was “I need to have an abortion but I’m past 24 weeks” – helped to convict her. (The sentence, which drew broad condemnation from MPs, doctors and feminist groups, was later suspended, but her criminal record stands.)

“The fundamental thing to understand about NHS England is that it is the government,” says Marcus Baw, a GP and IT specialist. By this he means that officials, not clinicians, will make many of the critical decisions about which data flows into the new NHS platform. These officials, not doctors, will interpret the law about who will have access to it, including other government departments. I have yet to see an official give a clear answer to this question: once a copy of my health data has gone into the central platform, who decides what happens to it? The answer will seriously affect the public’s trust in the scheme.

This is not a new problem. In 2013, NHS Digital was set up to develop and operate digital services for NHS England. Its first chair, Kingsley Manning, stepped down in 2016 because he said he got fed up with the Home Office demanding to see migrants’ health data, to check their location for border enforcement. This, understandably, deterred so many migrants from seeking care that the government had to run an information campaign during Covid to encourage people who had migrated to Britain to come for their jabs.

NHS Digital was also set up to protect patient data, with decisions about data-sharing theoretically insulated from central government control. It was abolished last year, in a move Manning described as a “retrograde step” with “the potential for undermining the relationship between clinicians and their patients”.

It is jarring to set these serious questions of patient trust against Rishi Sunak’s breezy promises, last November, about “using our new Brexit freedoms to create the most pro-innovation regulatory environment in the world in sectors like life sciences, financial services, artificial intelligence and data”. Palantir’s CEO, Alex Karp, has praised the UK’s “pragmatic but serious understanding of data protection”. He recently told the BBC: “If you’re a pharmaceutical company and you want to do research on a medicine, you’re going to be able to do things in the UK you would not be able to do easily in the continent.”

The winning bidder for the contract to run the federated data platform will soon be announced. The data protection and digital information bill, which weakens individual control over personal data, has had a second reading, a key step on its way to passing into law. In the name of progress, our health data is about to flow more freely, like it or not.

Palantir has spent years preparing the ground for its FDP bid. At the start of the pandemic, with the normal procurement rules suspended, Palantir was given a contract to build a national Covid-19 data store for the NHS for the nominal fee of £1, collecting vast sets of patient data to track the pandemic and, for example, create a list of vulnerable people who needed to shield at home. This data was used to distribute ventilators and manage the vaccine rollout. But the scale of the data collection caused alarm: one Whitehall observer described the level of sensitive patient data being swept into the Covid data store as “unprecedented”, and claimed there was insufficient regard to privacy and data protection. In December 2020, Palantir was given a two-year, £23m contract to continue working on the project. That was renewed for another six months at a cost of £11.5m in January this year. Officials had promised to delete that data store when the pandemic was over – instead, Palantir is being paid £24.9m to migrate its data store to the new system. The procurement documents describe the purpose of the FDP as “to replace the Covid-19 datastore provided by Palantir”. If it’s anything like its predecessor, the central platform of the FDP will retain a copy of at least some patient data for all of England.

The rules of the FDP – the framework for handling patient records – are crucial. In particular, the public needs to know exactly which parts of their health record are being pooled or accessed nationally. Patients are also likely to ask what specific uses the FDP will be put to, and what may be bolted on later. (The procurement documents for the FDP explicitly say the purposes are set to expand.) Once the FDP is built and a permanent national flow of patient data is established, the main thing standing between that data and other uses is the health minister. There will be information governance advisers, and a watchdog, the national data guardian, but at this scale, these soft power checks and balances will be severely tested.

If the data protection and digital information bill passes, there will be even more reason to be concerned about the security of national health data. “Taken together, this bill maximises data and minimises governance of data fed to AIs,” says the privacy group medConfidential. Given Sunak’s stated enthusiasm for the use of AI in healthcare, the concern is that a senior official will later seek to open up the data for commercial purposes. Without clear, strong safeguards, more people will opt out, and the FDP may fail.

The NHS is a boneyard of failed data-centralisation programmes. The first version of a national IT structure for the NHS was proposed by Tony Blair’s government in 2002. The idea was to replace paper records with a single digital version accessible to patients and doctors. The National Programme for IT ran into serious problems with software, and an over-centralised delivery. It cost more than £12bn over 10 years, and it failed partly because the proposed scale of data-sharing was seen as “highly problematic”. In 2014, David Cameron’s government tried again, with a programme called care.data. Again, unresolved questions about corporate access, particularly insurance companies, sparked an outcry, provoking more than a million people to opt out of sharing health records. The government eventually gave up and cancelled the programme in 2016. Five years later, the government launched another project, General Practice Data for Planning and Research, to pool the most detailed data the NHS has: GP records. Following an outcry over government plans to make health data available to private companies, there was another huge wave of opt-outs. In August 2021, the scheme was paused. (My organisation, Foxglove, was involved in a legal case that precipitated the freeze.)

This adds up to more than 20 years of health centralisation plans – costing billions – and all of it foundered on the shoals of patient trust. Every scandal over centralising patient data also tends to increase the numbers of people exercising their right to “opt out” of sharing.

Statistics for the national opt-out serve as a rough measure of patient trust. It’s not a good picture: the number of patients who have opted out has ballooned over the years to some 3.3 million. (More than a million of those were added during the row over GP records in 2021.)

Opting out is currently the only practical way to register disapproval with state health plans, but it’s a blunt instrument. If you don’t want to give Google access to healthcare records to develop its algorithms, for example, or support the tobacco firm Philip Morris to develop inhalers, you’re also refusing positive uses of health data – say, helping NHS wards to plan ahead for numbers of patients vulnerable to flu. The current rate of opt-out is “incredibly problematic”, warns Dr Jess Morley, a researcher at Oxford University who has worked on patient data systems, because it makes the dataset “more random” and damages the value of the health data to the NHS. But to get these 3.3 million people to opt back in, the state will have to persuade them of its intentions – and its competence.

Recent polling by YouGov that Foxglove organised for a report also suggests that if the FDP were to go ahead in a form that is both managed by and accessible to private companies, around 48% of the residual population of England who have not opted out will do so. Anything like that figure will be disastrous for the NHS, because it would make the data useless.

Palantir isn’t the only contender for the platform. A team at Oxford University developed an open-source system called openSafely to support researchers during the pandemic. The system was secure and transparent – every question the system is asked is logged, and all approved projects are published. But it seems no one asked openSafely to help set up the FDP. There were other bidders, from Oracle Cerner to IBM to a “best of British” consortium of small UK companies and universities – but it’s not clear any of them had a serious shot. By the time the procurement opened in January, Palantir had its feet under the NHS table.

Palantir seems an odd choice of partner for the NHS. Originally funded by the CIA’s venture capital arm, In-Q-Tel, its early corporate history mainly involves supporting mass surveillance and the wars in Iraq and Afghanistan. In 2021, Palantir’s CTO, Shyam Sankar, said he hoped to see “something that looks like Palantir inside of every missile, inside of every drone”. (The company notes its services are also used by Ukrainian armed forces to resist Russian aggression, and by the World Food Programme.) Palantir helped the US and UK’s digital spy agencies (NSA and GCHQ) manage mass surveillance programmes such as XKeyscore, a system that tracked millions of people online. In Los Angeles, the police have used Palantir to build a tool, Laser, that claimed it would “extract” suspected offenders from the community like “tumors,” until community organisers forced the LAPD to scrap it. Not once have I seen a senior British official publicly reflect on how this history may affect trust in public healthcare among, say, migrant groups, or Muslims.

Palantir’s values appear to sit uncomfortably with the NHS. In January this year, Thiel described Britons’ affection for the NHS as “Stockholm syndrome” and said the best thing to do would be to “rip the whole thing from the ground and start over”. The company quickly distanced itself from its chairman’s remarks, which it said he made as a private individual.

This summer Karp hosted Sunak at a baseball game in the US, one in a string of lobbying efforts over the past four years. In 2019, Palantir served watermelon cocktails to the then-chair of the NHS, and Palantir representatives joined a closed-door chat at Davos with the then trade secretary Liam Fox and other tech executives. Fox’s briefing notes for that meeting said: “We have a huge and as yet untapped resource through the UK’s single healthcare system … We aim to treble industry contract and R&D collaborative research in the NHS over 10 years, to nearly £1bn.” (Again, there’s no suggestion that Palantir was seeking to sell data – just that UK officials were courting it and others to help the UK make health data commercially available.)

Views of Palantir’s usefulness to NHS staff vary. One hospital, Chelsea & Westminster, says Palantir’s operating platform, Foundry, which it has been trialling since early 2022, has helped cut waiting lists by 28%. Other local trials of Palantir’s software were apparently less successful. An answer to a parliamentary question earlier this year said that of the pilots of Foundry software at local NHS hospitals, 11 had been paused or suspended. Some have resumed, but a spokesperson for Liverpool Heart & Chest hospital stated that it had stopped using Foundry because staff had decided “it didn’t meet our needs”. Other trusts said they had more pressing problems to fix, and at least one said they had no one who could manage the system. A more recent letter to the health and social care committee updated these figures to say there are now 36 trials underway at various stages of rollout, with three of those paused but due to restart, and five still suspended.

Turning to Palantir for so much of England’s infrastructure and analysis exposes a deeper problem: our own lack of data science staff. The NHS badly needs data scientists to make it fit for the future. (Palantir has hired two of its senior people.) In February, it was reported that the NHS has more than 3,000 vacant tech roles. NHS data scientists have warned that Foundry is not “user friendly”, and that the cost is “ridiculous”.

What would a better future for NHS data look like? It would probably start more humbly, with broad democratic engagement. The government could ask the public how they would like their data managed. Who do you trust to handle it, who would you like to be able to use it, and under what conditions? Should the opt-outs be reformed to give you more choice?

With a £480m contract about to be awarded, and the DPDI bill trundling ahead, expect to see a pitched battle over these questions in the coming months. It is a battle the government seemingly hopes to avoid, as it so often has in the past, by dodging debate rather than meeting tricky questions head on. Most people want to see the NHS use data better. But they also care deeply about who is given the keys to the kingdom.

Every prior effort to centralise NHS data has failed; this chapter is yet to be written. We can still get it right, and win trust for a better solution. If we don’t, many patients will do as I did, and turn away.

This article was amended on 14 September 2023 to update the number of ongoing pilots of Foundry software.

Clinical Futures, formerly The Center for Pediatric Clinical Effectiveness (CPCE), gathers and manages pediatric health information as complex data resources that can be accessed or utilized by our researchers to answer clinical effectiveness research questions.

THE DATA SCIENCE AND BIOSTATISTICS UNIT

The Data Science and Biostatistics Unit (DSBU), a core facility of CHOP’s Research Institute, administers these resources. DSBU’s staff has expertise in managing and using various data sources, ranging from electronic health records and clinical trial or registry data to administrative, claims, or survey data.

The DSBU serves as a resource for Clinical Futures, PolicyLab, and other CHOP investigators using complex data to address research questions. DSBU provides services in data extraction and management, statistical programming, biostatistics analysis, and analytics data consultation.

EXAMPLES OF DATABASES

Examples of the pediatric healthcare data resources accessible to Clinical Futures investigators include:

  • Pediatric Health Information System (PHIS): The PHIS is a comparative pediatric database that includes clinical and resource utilization data for inpatient, ambulatory surgery, Emergency Department, and observation unit patient encounters for 45 children's hospitals.
  • Premier Healthcare Database (PHD): PHD is a large, U.S. hospital-based, service-level, all-payer database that contains information on inpatient discharges, primarily from geographically diverse non-profit, non-governmental and community and teaching hospitals and health systems from rural and urban areas. Inpatient admissions include over 121 million visits, representing approximately 25% of annual United States inpatient admissions. The PHD also includes over 897 million outpatient visits to emergency departments, ambulatory surgery centers and alternate sites of care.
  • IBM MarketScan: IBM® MarketScan® Research Databases provide one of the longest-running and largest collections of proprietary de-identified claims data for privately and publicly insured people in the US. Insights from this integrated, patient-level data can help demonstrate the clinical and commercial value of treatments.

Published on 24.4.2024 in Vol 10 (2024)

Now Is the Time to Strengthen Government-Academic Data Infrastructures to Jump-Start Future Public Health Crisis Response

Authors of this article:

  • Jian-Sin Lee 1, MSc;
  • Allison R B Tyler 2, PhD;
  • Tiffany Christine Veinot 1, 3, 4, MLS, PhD;
  • Elizabeth Yakel 1, PhD

1 School of Information, University of Michigan, Ann Arbor, MI, United States

2 UK Data Archive, University of Essex, Colchester, United Kingdom

3 Department of Health Behavior and Health Education, School of Public Health, University of Michigan, Ann Arbor, MI, United States

4 Department of Learning Health Sciences, School of Medicine, University of Michigan, Ann Arbor, MI, United States

Corresponding Author:

Jian-Sin Lee, MSc

School of Information

University of Michigan

105 S State St

Ann Arbor, MI, 48109-1285

United States

Phone: 1 734 389 9552

Email: [email protected]

During public health crises, the significance of rapid data sharing cannot be overstated. In attempts to accelerate COVID-19 pandemic responses, discussions within society and scholarly research have focused on data sharing among health care providers, across government departments at different levels, and on an international scale. A lesser-addressed yet equally important approach to sharing data during the COVID-19 pandemic and other crises involves cross-sector collaboration between government entities and academic researchers. Specifically, this refers to dedicated projects in which a government entity shares public health data with an academic research team for data analysis to receive data insights to inform policy. In this viewpoint, we identify and outline documented data sharing challenges in the context of COVID-19 and other public health crises, as well as broader crisis scenarios encompassing natural disasters and humanitarian emergencies. We then argue that government-academic data collaborations have the potential to alleviate these challenges, which should place them at the forefront of future research attention. In particular, for researchers, data collaborations with government entities should be considered part of the social infrastructure that bolsters their research efforts toward public health crisis response. Looking ahead, we propose a shift from ad hoc, intermittent collaborations to cultivating robust and enduring partnerships. Thus, we need to move beyond viewing government-academic data interactions as 1-time sharing events. Additionally, given the scarcity of scholarly exploration in this domain, we advocate for further investigation into the real-world practices and experiences related to sharing data from government sources with researchers during public health crises.

Introduction

Although the world appears to be recovering from the intense impact of the COVID-19 pandemic, it is imperative to recognize that the far-reaching effects of this disease continue to endure across the globe. Its significant ramifications, such as economic fallout, disruptions in education, mental health challenges, and racial and socioeconomic health inequities, have left indelible marks on humanity, showcasing the vulnerabilities of human society when confronted with a global public health crisis. Looking back, we should now ask: “What actions could have been done better to save more lives and reduce the aforementioned negative effects?” Moving forward, it is crucial to reflect on the valuable lessons that appear when we pose this question. Such reflection may prepare us better for future public health crises.

Amid any effort to effectively manage the spread of the COVID-19 pandemic, the rapid sharing of data stands out as a key strategy. However, instances of failure have been exposed, signaling opportunities for improvement in this critical facet of the pandemic response. Just 1 year into the COVID-19 outbreak, the British Medical Association’s chair of council voiced criticism, suggesting that “our devastating mortality figures could in part be a result of the failure of the government to properly and openly share data, communicate accurately, and act swiftly” [ 1 ]. He stressed the problematic absence of transparency concerning the actual availability of personal protective equipment supplies and the formulation of decisions on restrictions and tiers (ie, a classification system indicating local COVID-19 alert levels adopted in the United Kingdom) without also revealing the underlying statistics that supported them [ 1 ]. The US Centers for Disease Control and Prevention (CDC) itself faced criticism for its perceived “slow and siloed approach to sharing data,” which resulted in overly optimistic evaluations of the evolving vaccine effectiveness against the delta variant and contributed to the country’s falling behind in addressing that new viral mutation [ 2 ].

Government entities, a primary source of public health data, bear the responsibility for ensuring timely availability and facilitating the subsequent use of such data. To expedite the pandemic response efforts, social debates and academic studies have centered on enhancing data sharing among health care providers [ 3 ], across government departments at different levels [ 4 ], and on an international scale [ 5 ]. However, a data sharing approach that receives less attention but carries equal significance during public health crises is cross-sector data sharing collaborations between government entities and academic researchers. Specifically, this involves dedicated projects wherein a government entity shares the public health data it aggregates with an academic research team for further data analysis. The resulting data insights are then leveraged to inform policy making.

In this viewpoint, we outline the common challenges in data sharing during COVID-19, as well as other public health and general crises such as natural disasters and humanitarian emergencies. We then argue that government-academic data collaboration has the potential to alleviate these challenges, making it a topic worthy of deeper scholarly exploration. We aim to initiate a constructive discussion on effective strategies to foster this kind of cross-sector collaboration, thus paving the way for more robust and resilient responses to future public health crises.

The Significance of Data Sharing During Crises Cannot Be Overstated

In times of crisis, the saying “speed is everything” resonates more profoundly than ever. The landscape of a crisis is defined by difficulties related to uncertainty, urgency, and the relentless pursuit of solutions. Among these difficulties, data emerge as a beacon of understanding—a key resource that can illuminate facts, accelerate actions, and prevent unnecessary public panic. The significance of data sharing cannot be overstated during such moments when rapid access to comprehensive, accurate, and timely data becomes the linchpin of effective crisis management.

The lessons learned from previous natural disasters and humanitarian crises have reinforced the importance of rapid data sharing [ 6 - 8 ]. This demand originated from not only frontline responders, decision makers, and those directly impacted but also those capable of and willing to contribute to crisis resolution, such as academic researchers. Similarly, even before the arrival of the unprecedented COVID-19 pandemic, a collection of public health crises highlighted the critical role of sharing data to safeguard global well-being. The outcomes of not sharing data or failing to integrate and coordinate data access during public health emergencies were studied, as seen during the 2003 outbreak of SARS [ 9 , 10 ]. In the past, data sharing was limited by technological constraints or underdeveloped legal frameworks. However, the necessity of data exchange was evident. Today, with advancements in data science knowledge and techniques, data sharing holds the potential for even greater advances that may improve the responses to public health crises.

Several characteristics collectively set COVID-19 apart from preceding crisis events: its expansive geographical reach, rapid transmission, prolonged duration, and far-reaching socioeconomic consequences. Moreover, the exceptional need for rapid data dissemination across jurisdictional and organizational boundaries during this crisis added to its distinctiveness [ 11 ]. As stated by epidemiologist Dr Maria van Kerkhove, the COVID-19 technical lead at the World Health Organization (WHO), “When there’s so little information on a novel pathogen, any information that you can get your hands on is absolutely critical” [ 12 ]. During the COVID-19 pandemic, data sharing played a crucial role in enhancing decision makers’ contextual understanding of the situation, which the US government deemed a key aspect of “public health situational awareness” [ 13 ]. Ideally, such heightened awareness would have facilitated more rapid responses, thus enabling the implementation of appropriate measures, such as resource allocation and public health interventions. When successful, data sharing also enables the examination of health disparities by providing access to demographic data, which was particularly significant in COVID-19 due to the disproportionate impact on marginalized socioeconomic, racial, and ethnic groups worldwide [ 14 , 15 ].

During the early stages of COVID-19, we did witness efforts to share data, such as information about potential treatments and disease spread. However, in many instances, the underlying data were of varying or low quality, as evidenced by the pandemic-related preprints that emerged [ 16 ]. In addition, considering the unpredictability of the COVID-19 pandemic, rapid data exchange necessitated effective coordination particularly among various stakeholders engaged in the “learning health system” cycle [ 17 ], which involved assembling, analyzing, and transforming data into knowledge and then performance, all within a limited time frame. Different actors’ data needs also varied across pandemic phases, such as whether a situation occurred suddenly or insidiously or whether consequences rapidly abated or lingered. Given the complexity of crisis data–sharing relationships, in the next section, we examine documented challenges in data sharing during crises that provide us with a better understanding of this issue.

Data Sharing Challenges During Public Health Crises

Public health crises have the potential for global ramifications; however, addressing them necessitates a localized approach [ 18 ]. During the COVID-19 outbreak, there existed attempts to share key data points (eg, viral genomes and methods of transmission) at the international, national, and local levels [ 19 ]. However, a difficulty lay in effectively acquiring public health insights due to the complexity of integrating disparate data across geopolitical boundaries and jurisdictional levels.

In the United States, the COVID-19 response was impaired by the public health data infrastructures’ inability to effectively share data across and within jurisdictions. As widely reported, the efficiency and timeliness of data sharing supported by these data infrastructures have not always been satisfactory. The desired data are “scattered” across unconnected or proprietary databases, exist in incompatible formats, or are of dubious quality and provenance [ 20 ]. Present methods of collecting public health data primarily depend on manual processes for reporting instances of certain communicable diseases and outbreaks of new diseases. Data accessible in electronic health record systems are seldom used, even when doing so is possible. This disconnect impedes the effective use of available data. Unsurprisingly, secondary use of clinical data for public health purposes is usually insufficient [ 21 ]. Furthermore, many researchers were and still are unable to swiftly integrate with existing public health data infrastructures and “find the right antidote” for research demands when facing unforeseen public health crises [ 22 ].

The long-standing underinvestment in the maturity and agility of US public health data infrastructures has been frequently emphasized at the federal as well as state and local levels [ 23 ]. This gap became particularly problematic during the COVID-19 pandemic. From a broader perspective, neither early legislative efforts, such as the HITECH Act of 2009, nor more recent programs dedicated to COVID-19, such as the CDC’s Data Modernization Initiative, have successfully addressed the country’s fragilities in constructing critical infrastructures to meet the ongoing public health surveillance needs [ 21 ]. Admittedly, modern data-driven technologies have been widely implemented in both research and clinical contexts. Nevertheless, the aforementioned deficiencies hindered not only comprehensive real-time data analyses at the technical level [ 24 ] but also large-scale coordination between data holders and requesters at the social and organizational levels [ 25 ]. For example, in response to COVID-19, a plethora of data aggregation initiatives emerged, involving key actors, such as academic medical center networks as data holders and public health departments as data consumers. Subsequently, data requests from these federal- or state-level public health departments turned out to impose a substantial burden on academic medical centers’ reporting mechanisms [ 25 ].

While measures taken in response to the COVID-19 pandemic brought some data sharing challenges into sharp relief, the challenges arising at that time have a long history. A succession of diverse crises exposed the challenges that responders encountered when aiming to achieve rapid data sharing. By mapping the nature of different types of crises alongside their documented challenges, we can enhance our understanding of the most important data sharing approaches to study and maintain during such crises. In Table 1 [ 25 - 97 ], based on the extant research literature, we summarize 5 categories and 20 subcategories of common data sharing challenges during COVID-19, other public health crises, as well as other natural disasters and humanitarian crises.

After conducting a comprehensive search of scholarly literature in various fields such as public health, biomedicine, crisis informatics, and broader information science, we developed a corpus of papers relevant to data sharing challenges. Starting from this corpus, the aforementioned categories and subcategories were developed through an iterative qualitative analysis process to describe, conceptually order, and classify the bodies of text [ 98 ] as follows. First, we discerned challenges documented in the literature. Identifiable patterns then emerged, leading to the inductive grouping of specific challenges into common broad categories. These categories were further refined into their respective subcategories, taking into account the nuances of specific challenges and whether they were experienced during the COVID-19 pandemic, other public health crises, or additional types of crises (ie, natural disasters and humanitarian crises). Given the vast volume of publications discussing data sharing challenges, the categories and subcategories we developed are not meant to be exhaustive. Rather, they function as a starting framework for an interpretive critique of existing research gaps.

In all, the 5 categories of data sharing challenges we have identified are (1) data availability and quality, (2) data management and sharing, (3) information systems and data interoperability, (4) resource limitations, and (5) multiparty collaboration and coordination. In Table 1 , we use omnibus terms, such as “stakeholders” and “data sharing entities,” to encompass a diverse range of individuals (eg, researchers, clinicians, and first responders) and organizations (eg, government agencies, health care providers, and humanitarian groups) actively involved in data exchange during crises. In the next section, we describe government-academic data collaborations—an often-underexplored type of data interaction during public health crises—and how to mitigate the challenges outlined below in these collaborations.

Navigating Challenges: Government-Academic Collaboration as Part of the Social Infrastructure for Public Health Crisis Response

Data collected, aggregated, or provided by government entities have been observed to play a key part in managing the COVID-19 crisis. As an example that involves government data for internal use, in the United States, the city government of Boston used its preexisting data warehouse, aggregating data from 31 departments, to rapidly develop a public dashboard at the outset of the pandemic [ 99 ].

Simultaneously, there were instances globally in which academia used government data to understand the pandemic through various forms of cross-sector initiatives. For example, the Israeli government orchestrated a “datathon” competition, an event that united participants from diverse sectors, including academic scientists, to devise practical, data-driven models and insights [ 100 ]. Another approach to effectively leveraging government data is by directly making them available to citizens, including academic researchers. The term “open government data” (OGD) refers to government-held data made accessible to the public to enhance transparency regarding government operations [ 101 ]. One common method for obtaining OGD is through government agencies’ open data portals [ 102 ]. This method has enabled various stakeholders to engage in data analysis before and during the COVID-19 period [ 103 , 104 ].

Nonetheless, there are limitations to the aforementioned data interactions between government entities and academia. Specifically, short-term data analysis competitions may not adequately support government entities’ medium- to long-term needs for pandemic response policy planning. Participants from various sectors are unlikely to remain within the government’s data collaboration network after the competition ends. As for OGD, a critical limitation lies in how they are accessed: users download data independently, without direct interaction with the government agencies that provide them. Such a relatively unilateral data access approach may give rise to issues related to data quality and usability (eg, data integrity, granularity, and timeliness [ 105 ]), potentially impeding users’ understanding and appropriate interpretation of OGD and, in turn, making it harder to use the data accurately, appropriately, and efficiently.

Consequently, there is a growing demand for enhanced “direct” collaboration between government entities and researchers on data-centric research projects. This form of collaboration entails government entities sharing data with researchers who have advanced data science skills, leading to productive “research-policy partnerships” that subsequently inform policy-making processes [ 106 ]. When successful, such partnerships serve as a catalyst for “rapid response data science” to address public health crises [ 107 ]. In particular, governments possess valuable data but may lack the necessary resources to analyze them [ 108 ], while academic researchers often face challenges in collecting or accessing critical data due to legal, technical, or financial limitations [ 109 ]. An ideal scenario therefore involves researchers with data science and other methodological expertise effectively using these public health data to conduct studies that surpass the capabilities of the government’s in-house efforts [ 110 ], thereby facilitating greater evidence-based policy making. Notably, beyond merely handing over the data, governmental collaborators play a role in identifying the critical problems to be solved, as well as in pinpointing the strengths and limitations of the shared data. This helps academic collaborators develop solutions that truly address the issues at hand and mitigate the risk of misinterpretation or misuse of the data they acquire. However, despite the needs and benefits, there is a lack of in-depth investigations into the precise nature of such cross-sector collaborations driven by data flows specifically from government entities to academic researchers.

Transitioning into the postpandemic era, the current juncture presents an opportune moment for conducting more systematic research into the aforementioned form of government-academic data collaborations, whether examining individual cases or identifying patterns across multiple cases. Notably, the sole, relatively detailed examination of similar government-researcher data collaborations during COVID-19 that we are aware of is a report [ 111 ] revealing the partnership between the Washington State Department of Health (DOH), the University of Washington, and multiple research institutes, including the Institute for Disease Modeling (IDM), which operates as an embedded research group in a nonprofit foundation. Within this collaboration, the Washington State DOH was able to successfully share data with IDM, ultimately leveraging insights derived from IDM researchers’ modeling and analytical findings to inform the state’s pandemic response strategies.

We possess limited information about the development and maintenance of other government-academic collaborations, primarily in the United States. These include a project in which a Stanford University team of data modeling experts used data provided by California state agencies to forecast disease trends for public health officials [ 112 ]. Also in California, in response to the temporary closure of most daycare centers, the state’s Health and Human Services Agency and the University of Southern California built on their preexisting data integration program, attempting to connect essential workers with available childcare providers [ 113 ]. Additionally, a multidisciplinary team at the University of Michigan partnered with the Michigan Department of Health and Human Services to develop a series of data-driven tools (eg, symptom and vaccination monitoring applications) to track the pandemic within the state [ 114 ]. Despite this limited information, these additional cases make it clear that such collaborations exist more broadly, and that they potentially hold value in responding to public health crises.

Our call for research into government-academic collaborations is vital given the history of previous efforts. For example, between 2005 and 2011, the CDC launched the Centers of Excellence in Public Health Informatics program. This initiative shared similarities with the form of government-academic collaboration we are advocating as it aimed to bridge the gap between public health research and practice through collaborations among academia, local or state public health departments, and other health informatics professionals [ 115 ]. The program financially supported academic institutions, such as Harvard University and Indiana University [ 116 ], to establish research centers that would translate research outcomes into public health practice. Data sharing and information exchange were integral components as well, though primarily from clinical sources to public health information systems [ 117 ]. However, after the funding concluded, the infrastructures of the research centers necessitated institutional or external backing for sustainability. This indicates the difficulty in maintaining the long-term viability of such short- or medium-term programs.

Up to this point, we have outlined several forms of government-academic data interactions and their limitations in terms of effectiveness, efficiency, and sustainability. In the following subsection, we elaborate on an alternative collaborative form that may function better during public health crises than the other aforementioned government-academic data interactions.

Government-Academic Collaborations as a Promising Solution to Data Sharing Challenges

A public health data infrastructure can be defined as “an ecosystem composed of the people, processes, procedures, tools, facilities, and technologies, which supports the capture, storage, management, exchange, and creation of data and information to support individual patient care and population health” [ 118 ]. Nonetheless, it is notable that existing discussions around these kinds of infrastructures mostly center on technological aspects, such as health information systems and their data standards for interoperability. At the same time, other key components involving people, processes, and norms have received less attention [ 119 ]. In fact, for researchers, we contend that data collaborations with government entities should be considered part of the social infrastructure that supports their research efforts toward public health crisis response. Specifically, as we argue below, many of the data sharing challenges displayed in Table 1 may be effectively mitigated through well-planned government-academic collaborations. In the following subsections, we explain how the 5 types of challenges (data availability and quality, data management and sharing, information systems and data interoperability, resource limitations, and multiparty collaboration and coordination) can be navigated by developing partnerships between government entities and academia, particularly during public health emergencies:

Data Availability and Quality

Government-academic data collaborations are goal oriented and usually provide researchers with prepared data sources, alleviating concerns about data location and the identification of relevant information from the vast pool of available data “outside,” which are often collected by researchers themselves. In the exemplar case described above, the IDM researchers highlighted that during their collaboration with the Washington State DOH, the COVID-19 data shared by the government was of high quality, which significantly expedited their work, enabling them to deliver outputs swiftly [ 111 ]. Furthermore, this collaborative model often implies prioritized communication channels, facilitating prompt resolution of data quality issues by both parties.

Data Management and Sharing

Intentionally built collaborations often smooth data exchange activities by facilitating the creation of a preestablished data sharing framework, which comprises agreements on data management and use, as well as clearly defined obligations and codes of conduct for both parties involved [ 120 ]. For example, in response to the external demands for Zika-related research projects, Brazil’s Secretariat of Health took a proactive approach by initiating collaboration protocols with researchers [ 74 ]. These data-related agreements clearly regulated the conditions for accessing data and thus laid the foundations for project execution at maximum speed [ 74 ]. Besides, such data sharing frameworks function as a common space where collaborators can fine-tune their collective data activities based on project performance.

Information Systems and Data Interoperability

As government entities and researchers work together around data resources, potential issues, such as inadequate technical infrastructures and inconsistent data standards, may come to light during the early stages [ 121 ]. Nonetheless, such a collaboration also presents an opportunity for both parties to acknowledge these problems and proactively work toward resolving them. To illustrate, although not identical to the collaborative model we advocate, the well-known National COVID Cohort Collaborative initiative aimed to overcome interoperability barriers. Ultimately, it managed to build a scalable data analytics infrastructure by uniting US federal agencies, health care providers, and research leaders to harmonize pandemic data across different organizations [ 122 ]. Notably, there remains much to investigate regarding the implementation of data standardization within collaborative efforts on a smaller scale, specifically between government entities and researchers.

Resource Limitations

Government-academic collaborations make dedicated investments, including workforce and funding, in their data projects. This commitment enables the efficient integration of complementary resources from both sectors, facilitating a synergistic approach to data-driven initiatives. In particular, public health and biomedical informatics experts recently stressed the need to build “a public health workforce that is skilled in informatics and data science...to meet 21st century health threats” [ 21 ]. Nonetheless, they simultaneously pointed out the challenges in recruiting incoming talent as well as in training this workforce, an ongoing problem that state and local public health departments have historically faced [ 21 ]. This further highlights the value of government-academic collaborations, in which public health authorities can borrow well-established expertise from academics in “rapid response data science” [ 107 ].

Multiparty Collaboration and Coordination

Preexisting, ongoing collaborative relationships help to build trust in advance, eliminating the need for a cumbersome initiation period characterized by misaligned organizational priorities and conflicting power dynamics that impede the rapid circulation of data assets. In the case of the Washington State DOH and IDM described above, the IDM researchers achieved favorable outcomes in their collaboration with DOH employees by recognizing the significance of building trust [ 111 ]. Based on their accounts, the crucial element that contributed to the success of the collaboration was the ability to align needs and tasks at an early stage of the partnership [ 111 ]. In addition, such early coordination efforts may help data holders preempt the challenges posed by the previously discussed scenario of receiving a sudden surge in requests from data consumers [ 25 ].

To summarize, we contend that placing sole emphasis on the exchange of data through technological infrastructures falls short when confronted with the challenges of a public health crisis. Government-academic data collaborations, as essential social infrastructures, not only encompass people, processes, and norms but also rely on trusting relationships within the larger legal and political context. These elements are all indispensable to the success of the data sharing enterprise. Ultimately, sharing data is not just a technical process; it should be a collaborative endeavor that transcends boundaries. Thus, we should study and implement such collaborations now, before the next public health crisis is upon us. Doing so may help to establish greater readiness and more rapid responses in the future. We now outline recommendations that, if implemented, may assist in developing this crucial sociotechnical infrastructure.

Conclusion and Recommendations

In this viewpoint paper, we investigate the challenges associated with sharing data in public health crises, many of which stem from the long-standing inadequacy in the US public health data infrastructures. In particular, we have witnessed repeated appeals for increased data sharing endeavors spanning various sectors and extending in multiple directions, such as data scientists in the health care industry stressing that “sharing data should not just be a one-way street from the clinician to the researcher” [ 123 ]. However, the factors for successful collaborative data sharing across sectors in public health crises—in which government entities share data with academic researchers for effective use—need further attention. Therefore, resulting from a synthesis of extant research and our arguments, we call for more effort to be invested in building data sharing infrastructures capable of bridging and leveraging the respective strengths of government entities and academic researchers. Such infrastructures need to be established within an ecosystem that incorporates not only technologies but also policies, processes, and personnel. This holistic framework is ideally designed to facilitate researchers in seamlessly accessing and employing data aggregated and managed by government entities for their mutual benefit.

The COVID-19 pandemic has taught us a valuable lesson, which surpasses those gained from any previous public health emergencies: that the aforementioned infrastructure for rapid and effective data sharing should be established well in advance of a crisis. In particular, we argue that government-academic data interactions should not be treated as 1-time data sharing events. Instead, we recommend emphasizing the construction of robust and enduring collaborative infrastructure that not only outlasts a specific public health crisis but also is in place to respond to the next one. Ideally, these data collaborations should not be confined to emergencies or a small number of high-priority threats [ 51 ]. After all, data sharing practices both during and between crises affect crisis response efforts, albeit potentially in distinct manners. To be specific, routine data sharing practices in scientific research and the availability of preexisting baseline data before a crisis can play a crucial role in facilitating prompt planning for health relief activities [ 32 , 55 ]. In addition, even before crisis response measures are implemented, persistent data partnerships may enhance the detection and early characterization of issues arising during a crisis by accelerating the exchange of information between government entities and academics.

In May 2023, the WHO and the US government ceased to classify COVID-19 as a public health emergency. While most individuals have moved on, for those who have compromised immune systems or are otherwise at greater risk for negative outcomes from the virus, exchanging data to facilitate accurate disease-level reporting remains crucial for evaluating their safety. However, the termination of certain data sharing mandates and data-collection initiatives could hinder government bodies and research institutions from maintaining uninterrupted access to vital disease-related metrics [ 124 ]. With the resurgence of COVID-19 hospital admissions since July 2023 in the United States [ 125 ] and the possibility of “a new norm of summer surges” [ 126 ], it is worth considering whether we want to revert to a just-in-time approach to data sharing practices or whether we should be proactive and build resilient, long-term, just-in-case data infrastructures for forthcoming public health scenarios. We strongly assert that our choice should be the latter.

Future Research Agenda

As mentioned, it is critical to begin now to establish more effective government-academic collaborative infrastructures for public health crisis response. To do so, we must develop more systematic research on the facilitating and impeding factors for such data collaborations. In this viewpoint paper, we reviewed existing research literature and summarized data sharing challenges during different crisis scenarios ( Table 1 ). Significantly, we conclude from the literature review that there is a conspicuous scarcity of scholarship addressing the practices and experiences related to disseminating data from government sources to researchers throughout extended or ongoing crisis response situations, including instances of global health crises. In particular, in terms of data-exchange partnerships during public health crises, the public health and biomedical informatics literature often enumerates a wide range of stakeholders [ 21 , 127 ] but generally lacks a specialized focus on the connections between government entities and researchers. On the other hand, literature within the realm of crisis informatics more often addresses the circulation of information and data among the public, frontline responders, and governmental bodies in natural disaster scenarios (eg, earthquakes [ 88 ], hurricanes [ 128 ], and wildfires [ 129 ]).

While government-academic collaborations that allow data exchange did exist at different administrative levels during COVID-19, there is a notable dearth of research studying these relationships. To initiate further discussions, we draw on the data sharing challenges outlined earlier and propose 3 key research questions to foster more substantive dialogues and shape the future research agenda: (1) What types of government-academic collaborative infrastructures should we be developing? How can these infrastructures be best sustained? (2) Considering the unique characteristics of public health crises, what are the best practices for implementing data sharing and data collaborations? and (3) From the respective views of government entities and researchers, what are the incentives and disincentives that influence their willingness and capacity to engage in developing and sustaining collaborative data infrastructures?

In conclusion, the COVID-19 pandemic has emphasized the imperative for robust and durable government-academic partnerships in public health crises. As we transition beyond the pandemic, it is crucial to develop systematic research on the factors influencing these collaborations. Before the next public health crisis arises, we invite decision makers, researchers, and practitioners across government entities, academia, and various sectors to leverage the collective knowledge and expertise of diverse stakeholders, strengthening existing and building new government-academic data collaborative infrastructures. The time to act is now, and the path to a more resilient future begins with our commitment to addressing these critical challenges.

Acknowledgments

The authors would like to thank the Academic Data Science Alliance and Dr Jing Liu at the Michigan Institute for Data Science for their inspiration in the development of this manuscript. TCV’s effort on this manuscript was funded by an unrestricted gift from Google Health as part of its COVID-19 relief initiatives.

Authors' Contributions

JSL led the content development of the manuscript, prepared the initial draft, and coordinated contributions and subsequent revisions. ARBT took the lead in drafting an earlier version of the manuscript and contributed to subsequent revisions of the current version. TCV and EY provided critical revisions to this manuscript, enhancing its intellectual content. All authors contributed content and feedback that informed the drafting process and agreed to be responsible for all aspects of the work, ensuring its integrity and accuracy.

Conflicts of Interest

None declared.

Edited by A Mavragani; submitted 27.10.23; peer-reviewed by C Staes, F Wirth; comments to author 08.12.23; revised version received 24.02.24; accepted 05.03.24; published 24.04.24.

©Jian-Sin Lee, Allison R B Tyler, Tiffany Christine Veinot, Elizabeth Yakel. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 24.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.


  10. Developing the next generation of health data scientists

    Press Release Developing the next generation of health data scientists . 20 May 2021 . Health Data Research UK (HDR UK) launches "HDR UK Futures" - a new virtual learning platform containing free, expert-led online training courses, adding a new dimension to the UK's contribution to health data science training globally.

  11. Genomic health data generation in the UK: a 360 view

    Abstract. In the UK, genomic health data is being generated in three major contexts: the healthcare system (based on clinical indication), in large scale research programmes, and for purchasers of ...

  12. Health forecasting

    Health forecasting. We create forecasts and custom future scenarios to study the impact of new policies, technologies, or interventions on health outcomes. 9.73 billion people are expected to be alive at the projected population peak in 2064. 8.79 billion people are expected to be alive as the world's population declines in 2100.

  13. Future of Health and Data Science

    Future of Health and Data Science. Remote healthcare has seen an explosion in implementation with a 3,000 percent increase in usage during the height of the COVID-19 pandemic. Examine the theoretical foundations as well as the practical, technical, and business aspects of telemedicine and how to best utilize technology to enhance equitable ...

  14. Learn with HDR UK Futures

    A Revolution in Health Data Science Learning We're excited to announce that HDR UK Futures, your trusted learning platform, is undergoing a major transformation! While we work on bringing you a revamped and more powerful learning experience, the current version of our learning platform will be offline (from 4th March).

  15. Futures for Health Research Data Platforms From the Participants

    After all, the future of health research data platforms should be guided by the participants' perspectives, because they are the ones putting themselves in a vulnerable position for the common good. ACKNOWLEDGMENTS. This project is part of TEAM (Technology Enabled Mental Health for Young People), which is funded by the European Union's ...

  16. Home

    The Health Data Research Innovation Gateway provides a common entry point to discover and request access to UK health datasets. Users can search for health data tools, research projects, publications and collaborate via a community forum. Find out more. ... HDR UK Futures.

  17. Setting Our Sights Toward a Healthier, More Innovative, Data-Driven Future

    Setting Our Sights Toward a Healthier, More Innovative, Data-Driven Future. The draft 2024-2030 Federal Health IT Strategic Plan [PDF - 2.3 MB] (the draft Plan) is now open for public comment. The public comment period on the draft Plan ends on May 28, 2024 at 11:59:59 PM ET. The draft Plan is a comprehensive and strategic effort developed ...

  18. Our Research

    Advancing Health Research. Our research will use health data in all its forms - including NHS patient data, genomics, biomedicine and wearable data - to understand predispositions to disease, develop targeted treatments and deliver new discoveries from real-world data to improve patient care. Our national health data scientific programmes ...

  19. Medical, health, and genomics

    Our organization, Microsoft Health Futures, is working at the intersection of Large Language Models (LLM), User Experience (UX) Innovation, and biomedicine. We are an interdisciplinary group of researchers, data scientists, computational biologists, bioinformaticians, engineers, medical…

  20. About

    About. Welcome to Health Data Research UK (HDR UK) - we are the UK's national institute for health data science. We are uniting the UK's health data to enable discoveries that improve people's lives. Our vision is that every health and care interaction and research endeavour will be enhanced by access to large scale data and advanced analytics.

  21. Building a data-driven health-care ecosystem

    The application of AI to health-care data has promise to align the U.S. health-care system to quality care and positive health outcomes. But AI for health care hasn't reached its full capacity.

  22. Health forecasting

    Health policy and planning. Health forecasting. We create forecasts and custom future scenarios to study the impact of new policies, technologies, or interventions on health outcomes. 9.73 billionpeople are expected to be alive at the projected population peak in 2064. 8.79 billionpeople are expected to be alive as the world's population ...

  23. The U.S. wants to change how researchers get access to a huge ...

    Health researchers are urging the U.S. government to rethink a plan that would require them to use an in-house government system, and pay substantially more, to access a massive trove of data assembled by federal programs that support medical care for some 140 million people.

  24. 'Our health data is about to flow more freely, like it or not': big

    We aim to treble industry contract and R&D collaborative research in the NHS over 10 years, to nearly £1bn." (Again, there's no suggestion that Palantir was seeking to sell data - just that ...

  25. Towards the European Health Data Space (EHDS) ecosystem: A survey

    In 2021, the DayOne healthcare innovation initiative conducted a study on shaping the health data future based on the insights of 50 experts [10].The study utilized the scenario-building technique [11] to effectively address the health data ecosystem challenges through understanding and analyzing current and historical trends and events. This study examined 20 influencing factors to draw the ...

  26. 3 Companies Hope to Advance Health Research in a Quantum Leap

    Qradle. The final company chosen, Qradle Inc., provides quantum software used to make drug discoveries. The team plans to develop a group of programs that will work together to aid drug-discovery research. This includes a classical computing-to-quantum computing conversion tool that can leverage existing classical AI/ML solutions for drug ...

  27. Our Future Health joins the UK Health Data Research Alliance

    Convened by Health Data Research UK (HDR UK), the Alliance includes members from across the healthcare and research sector, including the NHS, medical research charities, academia, health data research hubs and AI Centres of Excellence.Our Future Health is one of ten new organisations working together to develop and coordinate the adoption of standards, tools and technologies for the ...

  28. Data Resources

    DSBU's staff has expertise in managing and using various data sources, ranging from electronic health records and clinical trial or registry data to administrative, claims, or survey data. The DSBU serves as a resource for Clinical Futures, PolicyLab, and other CHOP investigators using complex data to address research questions. DSBU provides ...

  29. JMIR Public Health and Surveillance

    During public health crises, the significance of rapid data sharing cannot be overstated. In attempts to accelerate COVID-19 pandemic responses, discussions within society and scholarly research have focused on data sharing among health care providers, across government departments at different levels, and on an international scale. A lesser-addressed yet equally important approach to sharing ...

  30. Bridging the gap between research evidence and its ...

    Aim To investigate the potential of embedded research in bridging the gap between research evidence and its implementation in public health practice. Methods Using a case study methodology, semi-structured interviews were conducted with 4 embedded researchers, 9 public health practitioners, and 4 other stakeholders (2 teachers and 2 students) across four case study sites. Sites and individuals ...