  • v.23(7); 2013 Jul

Hypothesis-generating research and predictive medicine

Leslie G. Biesecker

National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA

Genomics has profoundly changed biology by scaling data acquisition, which has provided researchers with the opportunity to interrogate biology in novel and creative ways. No longer constrained by low-throughput assays, researchers have developed hypothesis-generating approaches to understand the molecular basis of nature—both normal and pathological. The paradigm of hypothesis-generating research does not replace or undermine hypothesis-testing modes of research; instead, it complements them and has facilitated discoveries that may not have been possible with hypothesis-testing research. The hypothesis-generating mode of research has been primarily practiced in basic science but has recently been extended to clinical-translational work as well. Just as in basic science, this approach to research can facilitate insights into human health and disease mechanisms and provide the crucially needed data set of the full spectrum of genotype–phenotype correlations. Finally, the paradigm of hypothesis-generating research is conceptually similar to the underpinning of predictive genomic medicine, which has the potential to shift medicine from a primarily population- or cohort-based activity to one that instead uses individual susceptibility, prognostic, and pharmacogenetic profiles to maximize the efficacy and minimize the iatrogenic effects of medical interventions.

The goal of this article is to describe how recent technological changes provide opportunities to undertake novel approaches to biomedical research and to practice genomic preventive medicine. Massively parallel sequencing is the primary technology that will be addressed here (Mardis 2008), but the principles apply to many other technologies, such as proteomics, metabolomics, and transcriptomics. Readers of this journal are well aware of the precipitous fall of sequencing costs over the last several decades. The consequence of this fall is that we are no longer in a scientific and medical world where the throughput (and the costs) of testing is the key limiting factor around which these enterprises are organized. Once one is released from this limiting factor, one may ask whether these enterprises should be reorganized. Here I outline the principles of how these enterprises are organized, show how high-throughput biology can allow alternative organizations of these enterprises to be considered, and show how biology and medicine are in many ways similar. The discussion includes three categories of enterprises: basic research, clinical research, and medical practice.

The basic science hypothesis-testing paradigm

The classical paradigm for basic biological research has been to develop a specific hypothesis that can be tested by the application of a prospectively defined experiment (see Box 1 ). I suggest that one of the major (although not the only) factors that led to the development of this paradigm is that experimental design was limited by the throughput of available assays. This low throughput mandated that the scientific question had to be focused narrowly to make the question tractable. However, the paradigm can be questioned if the scientist has the ability to assay every potential attribute of a given type (e.g., all genes). If the hypothesis is only needed to select the assay, one does not need a hypothesis to apply a technology that assays all attributes. In the case of sequencing, the radical increase in throughput can release scientists from the constraint of the specific hypothesis because it has allowed them to interrogate essentially all genotypes in a genome in a single assay. This capability facilitates fundamental biological discoveries that were impossible or impractical with a hypothesis-testing mode of scientific inquiry. Examples of this approach are well demonstrated by several discoveries that followed the sequencing of a number of genomes. An example was the discovery that the human gene count was just over 20,000 ( International Human Genome Sequencing Consortium 2004 ), much lower than prior estimates. This result, although it was much debated and anticipated, was not a hypothesis that drove the human genome project, but nonetheless was surprising and led to insights into the nuances of gene regulation and transcriptional isoforms to explain the complexity of the human organism. The availability of whole genome sequence data from multiple species facilitated analyses of conservation. 
While it was expected that protein-coding regions, and to a lesser extent promoters and 5′- and 3′-untranslated regions of genes, would exhibit recognizable sequence conservation, it was unexpected that an even larger fraction of the genomes outside of genes are highly conserved ( Mouse Genome Sequencing Consortium 2002 ). This surprising and unanticipated discovery has spawned a novel field of scientific inquiry to determine the functional roles of these elements, which are undoubtedly important in physiology and pathophysiology. These discoveries demonstrate the power of hypothesis-generating basic research to illuminate important biological principles.

Box 1. Basic science hypothesis-testing and hypothesis-generating paradigms [box image: 1051tbl1.jpg]

Clinical and translational research

The approach to clinical research grew out of the basic science paradigm as described above. The first few steps of selecting a scientific problem and developing a hypothesis are similar, with the additional step (Box 2) of rigorously defining a phenotype and then carefully selecting research participants with and without that trait. As in the basic science paradigm, the hypothesis is tested by the application of a specific assay to the cases and controls. Again, this paradigm has been incredibly fruitful and should not be abandoned, but the hypothesis-generating approach can be used here as well. In this approach, a cohort of participants is consented, basic information is gathered on their health, and then a high-throughput assay, such as genome or exome sequencing, is applied to all of the participants. Again, because the assay tests all such attributes, the research design does not necessitate a priori selections of phenotypes and genes to be interrogated. Then, the researcher can examine the sequence data set for patterns and perturbations, form hypotheses about how such perturbations might affect the phenotype of the participants, and test that hypothesis with a clinical research evaluation. This approach has been used with data from genome-wide copy number assessments (array CGH and SNP arrays), but sequencing takes it to a higher level of interrogation and provides innumerable variants with which to work.

Box 2. Clinical research paradigms [box image: 1051tbl2.jpg]

An example of this type of sequence-based hypothesis-generating clinical research started with a collaborative project in which we showed that mutations in the gene ACSF3 caused the biochemical phenotype of combined malonic and methylmalonic acidemia ( Sloan et al. 2011 ). At that time, the disorder was believed to be a classic pediatric, autosomal-recessive severe metabolic disorder with decompensation and sometimes death. We then queried the ClinSeq cohort ( Biesecker et al. 2009 ) to assess the carrier frequency, to estimate the population frequency of this rare disorder. Because ClinSeq is a cohort of adults with a range of atherosclerosis severity, we reasoned that this would serve as a control population for an unbiased estimate of ACSF3 heterozygote mutant alleles. Surprisingly, we identified a ClinSeq participant who was homozygous for one of the mutations identified in the children with the typical phenotype. Indeed, one potential interpretation of the data would be that the variant is, in fact, benign and was erroneously concluded to be pathogenic, based on finding it in a child with the typical phenotype. It has been shown that this error is common, with up to 20% of variants listed in databases as pathogenic actually being benign ( Bell et al. 2011 ). Further clinical research on this participant led to the surprising result that she had severely abnormal blood and urine levels of malonic and methylmalonic acid ( Sloan et al. 2011 ). This novel approach to translational research was a powerful confirmation that the mutation was indeed pathogenic, but there was another, even more important conclusion. We had conceptualized the disease completely incorrectly. Instead of being only a severe, pediatric metabolic disorder, it was instead a disorder with a wide phenotypic spectrum in which one component of the disease is a metabolic perturbation and another component is a susceptibility to severe decompensation and strokes. 
This research indeed raises many questions about the natural history of the disorder, whether the pediatric decompensation phenotype is attributable to modifiers, what the appropriate management of such an adult would be, etc.
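
The step from an observed carrier frequency to an estimated disease frequency, mentioned above, is not spelled out in the text. A minimal sketch, assuming Hardy-Weinberg equilibrium (an assumption introduced here, not a method stated by the author), with q the combined frequency of pathogenic alleles and p = 1 - q:

```latex
\begin{align*}
\text{carrier frequency} &= 2pq \approx 2q \quad (q \ll 1), \\
\text{frequency of affected genotypes} &= q^{2} \approx \left(\frac{\text{carrier frequency}}{2}\right)^{2}.
\end{align*}
```

For a purely illustrative carrier frequency of 1 in 100 (not a figure from the study), q is about 0.005 and the expected frequency of individuals carrying two pathogenic alleles is about 1 in 40,000. The estimate is only as unbiased as the cohort: it depends on participants not having been ascertained for the phenotype in question.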

Irrespective of these limitations, the understanding of the disease has markedly advanced, and the key to understanding the broader spectrum of this disease was the hypothesis-generating approach enabled by the massively parallel sequence data and the ability to phenotype patients iteratively from ClinSeq. The iterative phenotyping was essential because we could not have anticipated when the patients were originally ascertained that we would need to assay malonic and methylmalonic acid. Nor did we recognize prospectively that we should be evaluating apparently healthy patients in their seventh decade for this phenotype. Indeed, it is impossible to evaluate patients for all potential phenotypes prospectively, and it is essential to minimize ascertainment bias for patient recruitment in order to allow the discovery of the full spectrum of phenotypes associated with genomic variations. This latter issue has become a critical challenge for implementing predictive medicine, as described below.

Predictive genomic medicine in practice

The principles of scientific inquiry are parallel to the processes of clinical diagnosis (Box 3). In the classic, hypothesis-testing paradigm, clinicians gather background information including chief complaint,[2] medical and family history, and physical examination, and use these data to formulate the differential diagnosis, which is a set of potential medical diagnoses that could explain the patient's signs and symptoms. Then, the clinician selects, among the myriad of tests (imaging, biochemical, genetic, physiologic, etc.), a few tests, the results of which should distinguish among (or possibly exclude entirely) the disorders on the differential diagnosis. Like the scientist, the physician must act as a test selector, because each of the tests is low throughput, time consuming, and expensive.

Box 3. Clinical practice paradigms: hypothesis testing and hypothesis generating [box image: 1051tbl3.jpg]

As in the basic and translational research discussion above, the question could be raised as to whether the differential diagnostic paradigm is necessary for genetic disorders. Indeed, the availability of clinical genome and exome sequencing heralds an era when the test could be ordered relatively early in the diagnostic process, with the clinician serving in a more interpretative role, rather than as a test selector ( Hennekam and Biesecker 2012 ). This approach has already been adopted for copy number variation, because whole genome array CGH- or SNP-based approaches have mostly displaced more specific single-gene or single-locus assays and standard chromosome analyses ( Miller et al. 2010 ). But the paradigm can be taken beyond hypothesis-generating clinical diagnosis into predictive medicine. One can now begin to envision how whole genome approaches could be used to assess risks prospectively for susceptibility to late-onset disorders or occult or subclinical disorders. The heritable cancer susceptibility syndromes are a good example of this. The current clinical approach is to order a specific gene test if a patient presents with a personal history of an atypical or early-onset form of a specific cancer syndrome, or has a compelling family history of the disease. As in the prior examples, this is because individual cancer gene testing is expensive and low throughput. One can ask the question whether this is the ideal approach or if we could be screening for these disorders from genome or exome data. Again, we applied sequencing analysis for these genes to the ClinSeq cohort because they were not ascertained for that phenotype. In a published study of 572 exomes ( Johnston et al. 2012 ), updated here to include 850 exomes, we have identified 10 patients with seven distinct cancer susceptibility syndrome mutations. 
These were mostly familial breast and ovarian cancer (BRCA1 and BRCA2), plus one patient with paraganglioma and pheochromocytoma (SDHC) and one with Lynch syndrome (MSH6). What is remarkable about these diagnoses is that only about half of the patients had a convincing personal or family history of the disease, and thus many would not have been offered testing under the current, hypothesis-testing clinical paradigm. These data suggest that screening for these disorders using genome or exome sequencing could markedly improve our ability to identify such families before they develop or die from these diseases, which is the ideal of predictive genomic medicine.
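
To put the reported counts in proportion, the sketch below computes the fraction of exomes with a cancer susceptibility mutation (10 of 850, as stated above) together with a Wilson score interval. The interval method and the function name `wilson_interval` are choices made here for illustration; the source reports only the counts.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the text: 10 patients among 850 exomes.
carriers, exomes = 10, 850
proportion = carriers / exomes
low, high = wilson_interval(carriers, exomes)
print(f"{proportion:.2%} of exomes (approx. 95% interval {low:.2%} to {high:.2%})")
```

The point estimate is about 1 in 85 participants. The interval reflects sampling variation only; it does not make the cohort representative of the general population.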

Despite these optimistic scenarios and examples, our ability to perform true predictive medicine remains limited. The limitations include technical factors, such as incomplete sequence coverage and imperfect sequence quality, as well as inadequate knowledge regarding the penetrance and expressivity of most variants, uncertainty about the appropriate medical approaches to, and the utility of, pursuing variants from genomic sequencing, and the poor preparation of most clinicians for addressing genomic concerns in the clinic (Biesecker 2013). Given all of these limitations, it is clear that we are not prepared to launch broad-scale implementation of predictive genomic medicine, nor should all research be structured using the hypothesis-generating approach.

Hypothesis-testing approaches to science and medicine have served us well and should continue. However, the advent of massively parallel sequencing and other high-throughput technologies provides opportunities to undertake hypothesis-generating approaches to science and medicine, which in turn provide unprecedented opportunities for discovery in the research realm. This can allow the discovery of results that were not anticipated or intended by the research design, yet provide critical insights into biology and pathophysiology. Similarly, hypothesis-generating clinical research has the potential to provide these same insights and, in addition, has the potential to provide us with data that will illuminate the full spectrum of genotype–phenotype correlations, eliminating the biases that have limited this understanding in the past. Finally, applying these principles to clinical medicine can provide new pathways to diagnosis and provide the theoretical basis for predictive medicine that can detect disease susceptibility and allow health to be maintained, instead of solely focusing on the treatment of evident disease.

Article is online at http://www.genome.org/cgi/doi/10.1101/gr.157826.113.

[2] The chief complaint is a brief description of the problem that led the patient to the clinician, such as “I have a cough and fever.”

  • Bell CJ, Dinwiddie DL, Miller NA, Hateley SL, Ganusova EE, Mudge J, Langley RJ, Zhang L, Lee CC, Schilkey FD, et al. 2011. Carrier testing for severe childhood recessive diseases by next-generation sequencing. Sci Transl Med 3: 65ra64.
  • Biesecker LG 2013. Incidental findings are critical for genomics. Am J Hum Genet 92: 648–651.
  • Biesecker LG, Mullikin JC, Facio FM, Turner C, Cherukuri PF, Blakesley RW, Bouffard GG, Chines PS, Cruz P, Hansen NF, et al. 2009. The ClinSeq Project: Piloting large-scale genome sequencing for research in genomic medicine. Genome Res 19: 1665–1674.
  • Hennekam RC, Biesecker LG 2012. Next-generation sequencing demands next-generation phenotyping. Hum Mutat 33: 884–886.
  • International Human Genome Sequencing Consortium 2004. Finishing the euchromatic sequence of the human genome. Nature 431: 931–945.
  • Johnston JJ, Rubinstein WS, Facio FM, Ng D, Singh LN, Teer JK, Mullikin JC, Biesecker LG 2012. Secondary variants in individuals undergoing exome sequencing: Screening of 572 individuals identifies high-penetrance mutations in cancer-susceptibility genes. Am J Hum Genet 91: 97–108.
  • Mardis ER 2008. The impact of next-generation sequencing technology on genetics. Trends Genet 24: 133–141.
  • Miller DT, Adam MP, Aradhya S, Biesecker LG, Brothman AR, Carter NP, Church DM, Crolla JA, Eichler EE, Epstein CJ, et al. 2010. Consensus statement: Chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet 86: 749–764.
  • Mouse Genome Sequencing Consortium 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520–562.
  • Sloan JL, Johnston JJ, Manoli I, Chandler RJ, Krause C, Carrillo-Carrasco N, Chandrasekaran SD, Sysol JR, O'Brien K, Hauser NS, et al. 2011. Exome sequencing identifies ACSF3 as a cause of combined malonic and methylmalonic aciduria. Nat Genet 43: 883–886.

The Research Hypothesis: Role and Construction

  • First Online: 01 January 2012

  • Phyllis G. Supino EdD

A hypothesis is a logical construct, interposed between a problem and its solution, which represents a proposed answer to a research question. It gives direction to the investigator’s thinking about the problem and, therefore, facilitates a solution. There are three primary modes of inference by which hypotheses are developed: deduction (reasoning from a general proposition to specific instances), induction (reasoning from specific instances to a general proposition), and abduction (formulation/acceptance on probation of a hypothesis to explain a surprising observation).

A research hypothesis should reflect an inference about variables; be stated as a grammatically complete, declarative sentence; be expressed simply and unambiguously; provide an adequate answer to the research problem; and be testable. Hypotheses can be classified as conceptual versus operational, single versus bi- or multivariable, causal or not causal, mechanistic versus nonmechanistic, and null or alternative. Hypotheses most commonly entail statements about “variables” which, in turn, can be classified according to their level of measurement (scaling characteristics) or according to their role in the hypothesis (independent, dependent, moderator, control, or intervening).

A hypothesis is rendered operational when its broadly (conceptually) stated variables are replaced by operational definitions of those variables. Hypotheses stated in this manner are called operational hypotheses, specific hypotheses, or predictions and facilitate testing.

Wrong hypotheses, rightly worked from, have produced more results than unguided observation

Augustus De Morgan, 1872 [1]

De Morgan A, De Morgan S. A budget of paradoxes. London: Longmans Green; 1872.

Leedy Paul D. Practical research. Planning and design. 2nd ed. New York: Macmillan; 1960.

Bernard C. Introduction to the study of experimental medicine. New York: Dover; 1957.

Erren TC. The quest for questions—on the logical force of science. Med Hypotheses. 2004;62:635–40.

Peirce CS. Collected papers of Charles Sanders Peirce, vol. 7. In: Hartshorne C, Weiss P, editors. Boston: The Belknap Press of Harvard University Press; 1966.

Aristotle. The complete works of Aristotle: the revised Oxford Translation. In: Barnes J, editor. vol. 2. Princeton/New Jersey: Princeton University Press; 1984.

Polit D, Beck CT. Conceptualizing a study to generate evidence for nursing. In: Polit D, Beck CT, editors. Nursing research: generating and assessing evidence for nursing practice. 8th ed. Philadelphia: Wolters Kluwer/Lippincott Williams and Wilkins; 2008. Chapter 4.

Jenicek M, Hitchcock DL. Evidence-based practice. Logic and critical thinking in medicine. Chicago: AMA Press; 2005.

Bacon F. The novum organon or a true guide to the interpretation of nature. A new translation by the Rev G.W. Kitchin. Oxford: The University Press; 1855.

Popper KR. Objective knowledge: an evolutionary approach (revised edition). New York: Oxford University Press; 1979.

Morgan AJ, Parker S. Translational mini-review series on vaccines: the Edward Jenner Museum and the history of vaccination. Clin Exp Immunol. 2007;147:389–94.

Pead PJ. Benjamin Jesty: new light in the dawn of vaccination. Lancet. 2003;362:2104–9.

Lee JA. The scientific endeavor: a primer on scientific principles and practice. San Francisco: Addison-Wesley Longman; 2000.

Allchin D. Lawson’s shoehorn, or should the philosophy of science be rated, ‘X’? Science and Education. 2003;12:315–29.

Lawson AE. What is the role of induction and deduction in reasoning and scientific inquiry? J Res Sci Teach. 2005;42:716–40.

Peirce CS. Collected papers of Charles Sanders Peirce, vol. 2. In: Hartshorne C, Weiss P, editors. Boston: The Belknap Press of Harvard University Press; 1965.

Bonfantini MA, Proni G. To guess or not to guess? In: Eco U, Sebeok T, editors. The sign of three: Dupin, Holmes, Peirce. Bloomington: Indiana University Press; 1983. Chapter 5.

Peirce CS. Collected papers of Charles Sanders Peirce, vol. 5. In: Hartshorne C, Weiss P, editors. Boston: The Belknap Press of Harvard University Press; 1965.

Flach PA, Kakas AC. Abductive and inductive reasoning: background issues. In: Flach PA, Kakas AC, editors. Abduction and induction. Essays on their relation and integration. The Netherlands: Kluwer; 2000. Chapter 1.

Murray JF. Voltaire, Walpole and Pasteur: variations on the theme of discovery. Am J Respir Crit Care Med. 2005;172:423–6.

Danemark B, Ekstrom M, Jakobsen L, Karlsson JC. Methodological implications, generalization, scientific inference, models (Part II) In: explaining society. Critical realism in the social sciences. New York: Routledge; 2002.

Pasteur L. Inaugural lecture as professor and dean of the faculty of sciences. In: Peterson H, editor. A treasury of the world’s greatest speeches. Douai, France: University of Lille 7 Dec 1954.

Swineburne R. Simplicity as evidence for truth. Milwaukee: Marquette University Press; 1997.

Sakar S, editor. Logical empiricism at its peak: Schlick, Carnap and Neurath. New York: Garland; 1996.

Popper K. The logic of scientific discovery. New York: Basic Books; 1959. 1934, trans. 1959.

Caws P. The philosophy of science. Princeton: D. Van Nostrand Company; 1965.

Popper K. Conjectures and refutations. The growth of scientific knowledge. 4th ed. London: Routledge and Kegan Paul; 1972.

Feyerabend PK. Against method, outline of an anarchistic theory of knowledge. London, UK: Verso; 1978.

Smith PG. Popper: conjectures and refutations (Chapter IV). In: Theory and reality: an introduction to the philosophy of science. Chicago: University of Chicago Press; 2003.

Blystone RV, Blodgett K. WWW: the scientific method. CBE Life Sci Educ. 2006;5:7–11.

Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiological research. Principles and quantitative methods. New York: Van Nostrand Reinhold; 1982.

Fortune AE, Reid WJ. Research in social work. 3rd ed. New York: Columbia University Press; 1999.

Kerlinger FN. Foundations of behavioral research. 1st ed. New York: Holt, Rinehart and Winston; 1970.

Hoskins CN, Mariano C. Research in nursing and health. Understanding and using quantitative and qualitative methods. New York: Springer; 2004.

Tuckman BW. Conducting educational research. New York: Harcourt, Brace, Jovanovich; 1972.

Wang C, Chiari PC, Weihrauch D, Krolikowski JG, Warltier DC, Kersten JR, Pratt Jr PF, Pagel PS. Gender-specificity of delayed preconditioning by isoflurane in rabbits: potential role of endothelial nitric oxide synthase. Anesth Analg. 2006;103:274–80.

Beyer ME, Slesak G, Nerz S, Kazmaier S, Hoffmeister HM. Effects of endothelin-1 and IRL 1620 on myocardial contractility and myocardial energy metabolism. J Cardiovasc Pharmacol. 1995;26(Suppl 3):S150–2.

Stone J, Sharpe M. Amnesia for childhood in patients with unexplained neurological symptoms. J Neurol Neurosurg Psychiatry. 2002;72:416–7.

Naughton BJ, Moran M, Ghaly Y, Michalakes C. Computer tomography scanning and delirium in elder patients. Acad Emerg Med. 1997;4:1107–10.

Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet. 1991;337:867–72.

Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ. 1997;315:640–5.

Stevens SS. On the theory of scales and measurement. Science. 1946;103:677–80.

Knapp TR. Treating ordinal scales as interval scales: an attempt to resolve the controversy. Nurs Res. 1990;39:121–3.

The Cochrane Collaboration. Open Learning Material. www.cochrane-net.org/openlearning/html/mod14-3.htm . Accessed 12 Oct 2009.

MacCorquodale K, Meehl PE. On a distinction between hypothetical constructs and intervening variables. Psychol Rev. 1948;55:95–107.

Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: conceptual, strategic and statistical considerations. J Pers Soc Psychol. 1986;51:1173–82.

Williamson GM, Schultz R. Activity restriction mediates the association between pain and depressed affect: a study of younger and older adult cancer patients. Psychol Aging. 1995;10:369–78.

Song M, Lee EO. Development of a functional capacity model for the elderly. Res Nurs Health. 1998;21:189–98.

MacKinnon DP. Introduction to statistical mediation analysis. New York: Routledge; 2008.

Author information

Authors and affiliations.

Department of Medicine, College of Medicine, SUNY Downstate Medical Center, 450 Clarkson Avenue, 1199, Brooklyn, NY, 11203, USA

Phyllis G. Supino EdD

Corresponding author

Correspondence to Phyllis G. Supino EdD .

Editor information

Editors and affiliations.

Cardiovascular Medicine, SUNY Downstate Medical Center, Clarkson Avenue, box 1199 450, Brooklyn, 11203, USA

Phyllis G. Supino

Cardiovascular Medicine, SUNY Downstate Medical Center, Clarkson Avenue 450, Brooklyn, 11203, USA

Jeffrey S. Borer

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Supino, P.G. (2012). The Research Hypothesis: Role and Construction. In: Supino, P., Borer, J. (eds) Principles of Research Methodology. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3360-6_3

DOI : https://doi.org/10.1007/978-1-4614-3360-6_3

Published : 18 April 2012

Publisher Name : Springer, New York, NY

Print ISBN : 978-1-4614-3359-0

Online ISBN : 978-1-4614-3360-6

eBook Packages : Medicine Medicine (R0)

How to Write a Great Hypothesis

Hypothesis Definition, Format, Examples, and Tips

A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process.

Consider a study designed to examine the relationship between sleep deprivation and test performance. The hypothesis might be: "This study is designed to assess the hypothesis that sleep-deprived people will perform worse on a test than individuals who are not sleep-deprived."

At a Glance

A hypothesis is crucial to scientific research because it offers a clear direction for what the researchers are looking to find. This allows them to design experiments to test their predictions and add to our scientific knowledge about the world. This article explores how a hypothesis is used in psychology research, how to write a good hypothesis, and the different types of hypotheses you might use.

The Hypothesis in the Scientific Method

In the scientific method, whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment. The scientific method involves the following steps:

  • Forming a question
  • Performing background research
  • Creating a hypothesis
  • Designing an experiment
  • Collecting data
  • Analyzing the results
  • Drawing conclusions
  • Communicating the results

The hypothesis is a prediction, but it involves more than a guess. Most of the time, the hypothesis begins with a question, which is then explored through background research. Only at this point do researchers begin to develop a testable hypothesis.

Unless you are creating an exploratory study, your hypothesis should always explain what you expect to happen.

In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness. In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.

Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore numerous factors to determine which ones might contribute to the ultimate outcome.

In many cases, researchers may find that the results of an experiment do not support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.

Researchers often draw a hypothesis from a specific theory or build on previous research. For example, prior research has shown that stress can impact the immune system. So a researcher might hypothesize: "People with high stress levels will be more likely to contract a common cold after being exposed to the virus than people who have low stress levels."

In other instances, researchers might look at commonly held beliefs or folk wisdom. "Birds of a feather flock together" is one example of a folk adage that a psychologist might try to investigate. The researcher might pose a specific hypothesis that "People tend to select romantic partners who are similar to them in interests and educational level."

Elements of a Good Hypothesis

So how do you write a good hypothesis? When trying to come up with a hypothesis for your research or experiments, ask yourself the following questions:

  • Is your hypothesis based on your research on a topic?
  • Can your hypothesis be tested?
  • Does your hypothesis include independent and dependent variables?

Before you come up with a specific hypothesis, spend some time doing background research. Once you have completed a literature review, start thinking about potential questions you still have. Pay attention to the discussion section in the journal articles you read. Many authors will suggest questions that still need to be explored.

How to Formulate a Good Hypothesis

To form a hypothesis, you should take these steps:

  • Collect as many observations about a topic or problem as you can.
  • Evaluate these observations and look for possible causes of the problem.
  • Create a list of possible explanations that you might want to explore.
  • After you have developed some possible hypotheses, think of ways that you could confirm or disprove each hypothesis through experimentation. This is known as falsifiability.

Falsifiability of a Hypothesis

In the scientific method, falsifiability is an important part of any valid hypothesis. To test a claim scientifically, it must be possible for the claim to be proven false.

Students sometimes mistake falsifiability for the claim that something is false, which is not the case. Falsifiability means that if something were false, it would be possible to demonstrate that it is false.

One of the hallmarks of pseudoscience is that it makes claims that cannot be refuted or proven false.

The Importance of Operational Definitions

A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define how the variable will be manipulated and measured in the study.

Operational definitions are specific definitions for all relevant factors in a study. This process helps make vague or ambiguous concepts detailed and measurable.

For example, a researcher might operationally define the variable "test anxiety" as the results of a self-report measure of anxiety experienced during an exam. A "study habits" variable might be defined by the amount of studying that actually occurs as measured by time.

These precise descriptions are important because many things can be measured in various ways. Clearly defining these variables and how they are measured helps ensure that other researchers can replicate your results.

Replicability

One of the basic principles of any type of scientific research is that the results must be replicable.

Replication means repeating an experiment in the same way to produce the same results. By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.

Some variables are more difficult than others to define. For example, how would you operationally define a variable such as aggression? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others.

To measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming others. The researcher might utilize a simulated task to measure aggressiveness in this situation.

Hypothesis Checklist

  • Does your hypothesis focus on something that you can actually test?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate the variables?
  • Can your hypothesis be tested without violating ethical standards?

Hypothesis Types

The hypothesis you use will depend on what you are investigating and hoping to find. Some of the main types of hypotheses that you might use include:

  • Simple hypothesis: This type of hypothesis suggests there is a relationship between one independent variable and one dependent variable.
  • Complex hypothesis: This type suggests a relationship between three or more variables, such as two independent variables and one dependent variable.
  • Null hypothesis: This hypothesis suggests no relationship exists between two or more variables.
  • Alternative hypothesis: This hypothesis states the opposite of the null hypothesis.
  • Statistical hypothesis: This hypothesis uses statistical analysis to evaluate a representative population sample and then generalizes the findings to the larger group.
  • Logical hypothesis: This hypothesis assumes a relationship between variables without collecting data or evidence.

Hypothesis Format

A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the dependent variable if you change the independent variable.

The basic format might be: "If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."
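The if-then template above can be sketched in a few lines of Python. This is only an illustration: the function name `format_hypothesis`, its parameters, and the example wording are hypothetical and do not come from any library or from the original article.

```python
def format_hypothesis(independent_change: str, dependent_change: str) -> str:
    """Fill the "If ..., then we will observe ..." template.

    independent_change: what is done to the independent variable.
    dependent_change:   the change expected in the dependent variable.
    """
    return f"If {independent_change}, then we will observe {dependent_change}."


# Hypothetical example built on the sleep-deprivation study described earlier.
statement = format_hypothesis(
    "participants are deprived of sleep the night before a test",
    "lower test scores than in well-rested participants",
)
print(statement)
```

Keeping the two variables as separate arguments makes it harder to write a hypothesis that names an outcome without saying what is being changed.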

Hypotheses Examples

A few examples of simple hypotheses:

  • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast."
  • "Students who experience test anxiety before an English exam will get lower scores than students who do not experience test anxiety."
  • "Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone."
  • "Children who receive a new reading intervention will have higher reading scores than children who do not receive the intervention."

Examples of a complex hypothesis include:

  • "People with high-sugar diets and sedentary activity levels are more likely to develop depression."
  • "Younger people who are regularly exposed to green, outdoor areas have better subjective well-being than older adults who have limited exposure to green spaces."

Examples of a null hypothesis include:

  • "There is no difference in anxiety levels between people who take St. John's wort supplements and those who do not."
  • "There is no difference in scores on a memory recall task between children and adults."
  • "There is no difference in aggression levels between children who play first-person shooter games and those who do not."

Examples of an alternative hypothesis:

  • "People who take St. John's wort supplements will have less anxiety than those who do not."
  • "Adults will perform better on a memory task than children."
  • "Children who play first-person shooter games will show higher levels of aggression than children who do not." 

Collecting Data on Your Hypothesis

Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method depends largely on exactly what they are studying. There are two basic types of research methods: descriptive research and experimental research.

Descriptive Research Methods

Descriptive research methods such as case studies, naturalistic observations, and surveys are often used when conducting an experiment is difficult or impossible. These methods are best used to describe different aspects of a behavior or psychological phenomenon.

Once a researcher has collected data using descriptive methods, a correlational study can examine how the variables are related. This research method might be used to investigate a hypothesis that is difficult to test experimentally.
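As a rough sketch of what a correlational analysis computes, the Python below calculates a Pearson correlation coefficient by hand. The helper name `pearson_r` and the survey numbers are made up for illustration, and Pearson's r is only one of several measures a researcher might choose.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples.

    Assumes neither sample is constant (otherwise the denominator is zero).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical survey data: hours slept and test score for six students.
hours_slept = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
test_scores = [55.0, 60.0, 62.0, 70.0, 75.0, 78.0]
r = pearson_r(hours_slept, test_scores)
print(f"r = {r:.2f}")
```

A value of r near 1 or -1 indicates a strong linear relationship, but on its own it says nothing about which variable, if either, causes the other.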

Experimental Research Methods

Experimental methods are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).

Unlike correlational studies, which can only be used to determine if there is a relationship between two variables, experimental methods can be used to determine the actual nature of the relationship, that is, whether changes in one variable actually cause another to change.

The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another. It also helps us develop new hypotheses that can then be tested in the future.


By Kendra Cherry, MSEd. Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Hypothesis Generation from Literature for Advancing Biological Mechanism Research: A Perspective



Machine Learning as a Tool for Hypothesis Generation

While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a systematic procedure to generate novel hypotheses about human behavior, which uses the capacity of machine learning algorithms to notice patterns people might not. We illustrate the procedure with a concrete application: judge decisions about who to jail. We begin with a striking fact: The defendant’s face alone matters greatly for the judge’s jailing decision. In fact, an algorithm given only the pixels in the defendant’s mugshot accounts for up to half of the predictable variation. We develop a procedure that allows human subjects to interact with this black-box algorithm to produce hypotheses about what in the face influences judge decisions. The procedure generates hypotheses that are both interpretable and novel: They are not explained by demographics (e.g. race) or existing psychology research; nor are they already known (even if tacitly) to people or even experts. Though these results are specific, our procedure is general. It provides a way to produce novel, interpretable hypotheses from any high-dimensional dataset (e.g. cell phones, satellites, online behavior, news headlines, corporate filings, and high-frequency time series). A central tenet of our paper is that hypothesis generation is in and of itself a valuable activity, and we hope this encourages future work in this largely “pre-scientific” stage of science.

This is a revised version of Chicago Booth working paper 22-15 “Algorithmic Behavioral Science: Machine Learning as a Tool for Scientific Discovery.” We gratefully acknowledge support from the Alfred P. Sloan Foundation, Emmanuel Roman, and the Center for Applied Artificial Intelligence at the University of Chicago. For valuable comments we thank Andrei Shliefer, Larry Katz and five anonymous referees, as well as Marianne Bertrand, Jesse Bruhn, Steven Durlauf, Joel Ferguson, Emma Harrington, Supreet Kaur, Matteo Magnaricotte, Dev Patel, Betsy Levy Paluck, Roberto Rocha, Evan Rose, Suproteem Sarkar, Josh Schwartzstein, Nick Swanson, Nadav Tadelis, Richard Thaler, Alex Todorov, Jenny Wang and Heather Yang, as well as seminar participants at Bocconi, Brown, Columbia, ETH Zurich, Harvard, MIT, Stanford, the University of California Berkeley, the University of Chicago, the University of Pennsylvania, the 2022 Behavioral Economics Annual Meetings and the 2022 NBER summer institute. For invaluable assistance with the data and analysis we thank Cecilia Cook, Logan Crowl, Arshia Elyaderani, and especially Jonas Knecht and James Ross. This research was reviewed by the University of Chicago Social and Behavioral Sciences Institutional Review Board (IRB20-0917) and deemed exempt because the project relies on secondary analysis of public data sources. All opinions and any errors are of course our own. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.


Published Versions

Jens Ludwig & Sendhil Mullainathan, 2024. " Machine Learning as a Tool for Hypothesis Generation, " The Quarterly Journal of Economics, vol 139(2), pages 751-827.



Hypothesis-generating research and predictive medicine

Affiliation.

  • 1 National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
  • PMID: 23817045
  • PMCID: PMC3698497
  • DOI: 10.1101/gr.157826.113

Genomics has profoundly changed biology by scaling data acquisition, which has provided researchers with the opportunity to interrogate biology in novel and creative ways. No longer constrained by low-throughput assays, researchers have developed hypothesis-generating approaches to understand the molecular basis of nature, both normal and pathological. The paradigm of hypothesis-generating research does not replace or undermine hypothesis-testing modes of research; instead, it complements them and has facilitated discoveries that may not have been possible with hypothesis-testing research. The hypothesis-generating mode of research has been primarily practiced in basic science but has recently been extended to clinical-translational work as well. Just as in basic science, this approach to research can facilitate insights into human health and disease mechanisms and provide the crucially needed data set of the full spectrum of genotype-phenotype correlations. Finally, the paradigm of hypothesis-generating research is conceptually similar to the underpinning of predictive genomic medicine, which has the potential to shift medicine from a primarily population- or cohort-based activity to one that instead uses individual susceptibility, prognostic, and pharmacogenetic profiles to maximize the efficacy and minimize the iatrogenic effects of medical interventions.



What is a Hypothesis – Types, Examples and Writing Guide


What is a Hypothesis

Definition:

A hypothesis is an educated guess or proposed explanation for a phenomenon, based on some initial observations or data. It is a tentative statement that can be tested and then supported or refuted through further investigation and experimentation.

Hypotheses are often used in scientific research to guide the design of experiments and the collection and analysis of data. They are an essential element of the scientific method, as they allow researchers to make predictions about the outcome of their experiments and to test those predictions to determine their accuracy.

Types of Hypothesis

The main types of hypothesis are as follows:

Research Hypothesis

A research hypothesis is a statement that predicts a relationship between variables. It is usually formulated as a specific statement that can be tested through research, and it is often used in scientific research to guide the design of experiments.

Null Hypothesis

The null hypothesis is a statement that assumes there is no significant difference or relationship between variables. It is often used as a starting point for testing the research hypothesis: if the results of the study lead researchers to reject the null hypothesis, this suggests that there is a significant difference or relationship between variables.

Alternative Hypothesis

An alternative hypothesis is a statement that assumes there is a significant difference or relationship between variables. It is often used as an alternative to the null hypothesis and is tested against the null hypothesis to determine which statement is more accurate.

Directional Hypothesis

A directional hypothesis is a statement that predicts the direction of the relationship between variables. For example, a researcher might predict that increasing the amount of exercise will result in a decrease in body weight.

Non-directional Hypothesis

A non-directional hypothesis is a statement that predicts the relationship between variables but does not specify the direction. For example, a researcher might predict that there is a relationship between the amount of exercise and body weight, but they do not specify whether increasing or decreasing exercise will affect body weight.

Statistical Hypothesis

A statistical hypothesis is a statement that assumes a particular statistical model or distribution for the data. It is often used in statistical analysis to test the significance of a particular result.

Composite Hypothesis

A composite hypothesis is a statement that assumes more than one condition or outcome. It can be divided into several sub-hypotheses, each of which represents a different possible outcome.

Empirical Hypothesis

An empirical hypothesis is a statement that is based on observed phenomena or data. It is often used in scientific research to develop theories or models that explain the observed phenomena.

Simple Hypothesis

A simple hypothesis is a statement that assumes only one outcome or condition. It is often used in scientific research to test a single variable or factor.

Complex Hypothesis

A complex hypothesis is a statement that assumes multiple outcomes or conditions. It is often used in scientific research to test the effects of multiple variables or factors on a particular outcome.

Applications of Hypothesis

Hypotheses are used in various fields to guide research and make predictions about the outcomes of experiments or observations. Here are some examples of how hypotheses are applied in different fields:

  • Science: In scientific research, hypotheses are used to test the validity of theories and models that explain natural phenomena. For example, a hypothesis might be formulated to test the effects of a particular variable on a natural system, such as the effects of climate change on an ecosystem.
  • Medicine: In medical research, hypotheses are used to test the effectiveness of treatments and therapies for specific conditions. For example, a hypothesis might be formulated to test the effects of a new drug on a particular disease.
  • Psychology: In psychology, hypotheses are used to test theories and models of human behavior and cognition. For example, a hypothesis might be formulated to test the effects of a particular stimulus on the brain or behavior.
  • Sociology: In sociology, hypotheses are used to test theories and models of social phenomena, such as the effects of social structures or institutions on human behavior. For example, a hypothesis might be formulated to test the effects of income inequality on crime rates.
  • Business: In business research, hypotheses are used to test the validity of theories and models that explain business phenomena, such as consumer behavior or market trends. For example, a hypothesis might be formulated to test the effects of a new marketing campaign on consumer buying behavior.
  • Engineering: In engineering, hypotheses are used to test the effectiveness of new technologies or designs. For example, a hypothesis might be formulated to test the efficiency of a new solar panel design.

How to Write a Hypothesis

Here are the steps to follow when writing a hypothesis:

Identify the Research Question

The first step is to identify the research question that you want to answer through your study. This question should be clear, specific, and focused. It should be something that can be investigated empirically and that has some relevance or significance in the field.

Conduct a Literature Review

Before writing your hypothesis, it’s essential to conduct a thorough literature review to understand what is already known about the topic. This will help you to identify the research gap and formulate a hypothesis that builds on existing knowledge.

Determine the Variables

The next step is to identify the variables involved in the research question. A variable is any characteristic or factor that can vary or change. There are two types of variables: independent and dependent. The independent variable is the one that is manipulated or changed by the researcher, while the dependent variable is the one that is measured or observed as a result of the independent variable.

Formulate the Hypothesis

Based on the research question and the variables involved, you can now formulate your hypothesis. A hypothesis should be a clear and concise statement that predicts the relationship between the variables. It should be testable through empirical research and based on existing theory or evidence.

Write the Null Hypothesis

The null hypothesis is the opposite of the alternative hypothesis, which is the hypothesis that you are testing. The null hypothesis states that there is no significant difference or relationship between the variables. It is important to write the null hypothesis because it allows you to compare your results with what would be expected by chance.

Refine the Hypothesis

After formulating the hypothesis, it’s important to refine it and make it more precise. This may involve clarifying the variables, specifying the direction of the relationship, or making the hypothesis more testable.

Examples of Hypothesis

Here are a few examples of hypotheses in different fields:

  • Psychology : “Increased exposure to violent video games leads to increased aggressive behavior in adolescents.”
  • Biology : “Higher levels of carbon dioxide in the atmosphere will lead to increased plant growth.”
  • Sociology : “Individuals who grow up in households with higher socioeconomic status will have higher levels of education and income as adults.”
  • Education : “Implementing a new teaching method will result in higher student achievement scores.”
  • Marketing : “Customers who receive a personalized email will be more likely to make a purchase than those who receive a generic email.”
  • Physics : “An increase in temperature will cause an increase in the volume of a gas, assuming all other variables remain constant.”
  • Medicine : “Consuming a diet high in saturated fats will increase the risk of developing heart disease.”

Purpose of Hypothesis

The purpose of a hypothesis is to provide a testable explanation for an observed phenomenon or a prediction of a future outcome based on existing knowledge or theories. A hypothesis is an essential part of the scientific method and helps to guide the research process by providing a clear focus for investigation. It enables scientists to design experiments or studies to gather evidence and data that can support or refute the proposed explanation or prediction.

The formulation of a hypothesis is based on existing knowledge, observations, and theories, and it should be specific, testable, and falsifiable. A specific hypothesis helps to define the research question, which is important in the research process as it guides the selection of an appropriate research design and methodology. Testability of the hypothesis means that it can be proven or disproven through empirical data collection and analysis. Falsifiability means that the hypothesis should be formulated in such a way that it can be proven wrong if it is incorrect.

In addition to guiding the research process, the testing of hypotheses can lead to new discoveries and advancements in scientific knowledge. When a hypothesis is supported by the data, it can be used to develop new theories or models to explain the observed phenomenon. When a hypothesis is not supported by the data, it can help to refine existing theories or prompt the development of new hypotheses to explain the phenomenon.

When to use Hypothesis

Here are some common situations in which hypotheses are used:

  • In scientific research, hypotheses are used to guide the design of experiments and to help researchers make predictions about the outcomes of those experiments.
  • In social science research, hypotheses are used to test theories about human behavior, social relationships, and other phenomena.
  • In business, hypotheses can be used to guide decisions about marketing, product development, and other areas. For example, a hypothesis might be that a new product will sell well in a particular market, and this hypothesis can be tested through market research.

Characteristics of Hypothesis

Here are some common characteristics of a hypothesis:

  • Testable : A hypothesis must be able to be tested through observation or experimentation. This means that it must be possible to collect data that will either support or refute the hypothesis.
  • Falsifiable : A hypothesis must be able to be proven false if it is not supported by the data. If a hypothesis cannot be falsified, then it is not a scientific hypothesis.
  • Clear and concise : A hypothesis should be stated in a clear and concise manner so that it can be easily understood and tested.
  • Based on existing knowledge : A hypothesis should be based on existing knowledge and research in the field. It should not be based on personal beliefs or opinions.
  • Specific : A hypothesis should be specific in terms of the variables being tested and the predicted outcome. This will help to ensure that the research is focused and well-designed.
  • Tentative: A hypothesis is a tentative statement or assumption that requires further testing and evidence to be confirmed or refuted. It is not a final conclusion or assertion.
  • Relevant : A hypothesis should be relevant to the research question or problem being studied. It should address a gap in knowledge or provide a new perspective on the issue.

Advantages of Hypothesis

Hypotheses have several advantages in scientific research and experimentation:

  • Guides research: A hypothesis provides a clear and specific direction for research. It helps to focus the research question, select appropriate methods and variables, and interpret the results.
  • Predictive power: A hypothesis makes predictions about the outcome of research, which can be tested through experimentation. This allows researchers to evaluate the validity of the hypothesis and make new discoveries.
  • Facilitates communication: A hypothesis provides a common language and framework for scientists to communicate with one another about their research. This helps to facilitate the exchange of ideas and promotes collaboration.
  • Efficient use of resources: A hypothesis helps researchers to use their time, resources, and funding efficiently by directing them towards specific research questions and methods that are most likely to yield results.
  • Provides a basis for further research: A hypothesis that is supported by data provides a basis for further research and exploration. It can lead to new hypotheses, theories, and discoveries.
  • Increases objectivity: A hypothesis can help to increase objectivity in research by providing a clear and specific framework for testing and interpreting results. This can reduce bias and increase the reliability of research findings.

Limitations of Hypothesis

Some Limitations of the Hypothesis are as follows:

  • Limited to observable phenomena: Hypotheses are limited to observable phenomena and cannot account for unobservable or intangible factors. This means that some research questions may not be amenable to hypothesis testing.
  • May be inaccurate or incomplete: Hypotheses are based on existing knowledge and research, which may be incomplete or inaccurate. This can lead to flawed hypotheses and erroneous conclusions.
  • May be biased: Hypotheses may be biased by the researcher’s own beliefs, values, or assumptions. This can lead to selective interpretation of data and a lack of objectivity in research.
  • Cannot prove causation: Support for a hypothesis from observational or correlational data shows only an association between variables; it does not by itself prove causation. Establishing causation requires further experimentation and analysis.
  • Limited to specific contexts: Hypotheses are limited to specific contexts and may not be generalizable to other situations or populations. This means that results may not be applicable in other contexts or may require further testing.
  • May be affected by chance : Hypotheses may be affected by chance or random variation, which can obscure or distort the true relationship between variables.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

Encyclopedia Britannica

scientific hypothesis

scientific hypothesis, an idea that proposes a tentative explanation about a phenomenon or a narrow set of phenomena observed in the natural world. The two primary features of a scientific hypothesis are falsifiability and testability, which are reflected in an “If…then” statement summarizing the idea and in the ability to be supported or refuted through observation and experimentation. The notion of the scientific hypothesis as both falsifiable and testable was advanced in the mid-20th century by Austrian-born British philosopher Karl Popper.

The formulation and testing of a hypothesis is part of the scientific method, the approach scientists use when attempting to understand and test ideas about natural phenomena. The generation of a hypothesis frequently is described as a creative process and is based on existing scientific knowledge, intuition, or experience. Therefore, although scientific hypotheses commonly are described as educated guesses, they actually are more informed than a guess. In addition, scientists generally strive to develop simple hypotheses, since these are easier to test relative to hypotheses that involve many different variables and potential outcomes. Such complex hypotheses may be developed as scientific models (see scientific modeling).

Depending on the results of scientific evaluation, a hypothesis typically is either rejected as false or accepted as true. However, because a hypothesis inherently is falsifiable, even hypotheses supported by scientific evidence and accepted as true are susceptible to rejection later, when new evidence has become available. In some instances, rather than rejecting a hypothesis because it has been falsified by new evidence, scientists simply adapt the existing idea to accommodate the new information. In this sense a hypothesis is never incorrect but only incomplete.

The investigation of scientific hypotheses is an important component in the development of scientific theory. Hence, hypotheses differ fundamentally from theories; whereas the former is a specific tentative explanation and serves as the main tool by which scientists gather data, the latter is a broad general explanation that incorporates data from many different scientific investigations undertaken to explore hypotheses.

Countless hypotheses have been developed and tested throughout the history of science. Several examples include the idea that living organisms develop from nonliving matter, which formed the basis of spontaneous generation, a hypothesis that ultimately was disproved (first in 1668, with the experiments of Italian physician Francesco Redi, and later in 1859, with the experiments of French chemist and microbiologist Louis Pasteur); the concept proposed in the late 19th century that microorganisms cause certain diseases (now known as germ theory); and the notion that oceanic crust forms along submarine mountain zones and spreads laterally away from them (seafloor spreading hypothesis).

Definition of Scientific Hypothesis: A Generalization or a Causal Explanation?

  • January 2006
  • Journal of The Korean Association For Science Education 26(5)

hypothesis-generating method

Quick reference.

A data-structuring technique, such as a classification and ordination method which, by grouping and ranking data, suggests possible relationships with other factors (i.e. generates an hypothesis). Appropriate data may then be collected to test the hypothesis statistically.

From:   hypothesis-generating method   in  A Dictionary of Zoology »

Subjects: Science and technology — Life Sciences

  • Oxford University Press

PRINTED FROM OXFORD REFERENCE (www.oxfordreference.com). (c) Copyright Oxford University Press, 2023. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single entry from a reference work in OR for personal use (for details see Privacy Policy and Legal Notice ).

date: 21 August 2024

Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H0) and an alternate hypothesis (Ha or H1).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test.
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Frequently asked questions about hypothesis testing

Step 1: State your null and alternate hypothesis

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H0: Men are, on average, not taller than women.
  • Ha: Men are, on average, taller than women.

Step 2: Collect data

For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

Step 3: Perform a statistical test

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p-value. This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p-value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data.

For the height example, a statistical test comparing the two groups would give you:

  • an estimate of the difference in average height between the two groups.
  • a p-value showing how likely you are to see this difference if the null hypothesis of no difference is true.
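The two outputs described above (an estimated difference and a p-value) can be sketched with a permutation test, which is a technique swapped in here for illustration and not necessarily the test the article has in mind. The height values, the function name `permutation_p_value`, and the number of permutations are all assumptions made for this example.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """One-sided p-value for mean(group_a) > mean(group_b) under random relabeling."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled data simulates "no difference between groups".
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            at_least_as_extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical heights in centimetres, invented for illustration only.
men = [178, 182, 175, 180, 185, 177, 181, 179]
women = [165, 168, 162, 170, 166, 163, 169, 164]

difference, p_value = permutation_p_value(men, women)
print(f"estimated difference: {difference:.2f} cm, p-value: {p_value:.4f}")
```

A permutation test avoids distributional assumptions, which keeps the sketch dependent on the standard library only; a t-test from a statistics package would be a common alternative.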

Step 4: Decide whether to reject or fail to reject your null hypothesis

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis (Type I error).

Step 5: Present your findings

The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis. This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis. But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis.

Frequently asked questions about hypothesis testing

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess; it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved August 21, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/

Lesson 10 of 24 By Avijeet Biswal

What Is Hypothesis Testing in Statistics? Types and Examples

In today’s data-driven world, decisions are based on data all the time. Hypotheses play a crucial role in that process, whether in business decisions, the health sector, academia, or quality improvement. Without hypotheses and hypothesis tests, you risk drawing the wrong conclusions and making bad decisions. In this tutorial, you will look at hypothesis testing in statistics.

What Is Hypothesis Testing in Statistics?

Hypothesis Testing is a type of statistical analysis in which you put your assumptions about a population parameter to the test. It is used to estimate the relationship between 2 statistical variables.

Let's discuss a few real-life examples of statistical hypotheses:

  • A teacher assumes that 60% of his college's students come from lower-middle-class families.
  • A doctor believes that 3D (Diet, Dose, and Discipline) is 90% effective for diabetic patients.

Now that you know what hypothesis testing is, look at the formula for a commonly used test statistic.

Hypothesis Testing Formula

Z = ( x̅ – μ0 ) / (σ /√n)

  • Here, x̅ is the sample mean,
  • μ0 is the hypothesized population mean (the value stated in the null hypothesis),
  • σ is the population standard deviation,
  • n is the sample size.
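As a minimal sketch, the formula above can be written as a small Python function. The function names and the numbers in the usage lines are hypothetical, and the two-sided p-value assumes the test statistic follows a standard normal distribution under the null hypothesis.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (sample mean - mu0) / (sigma / sqrt(n)) for a one-sample z-test."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def two_sided_p_value(z):
    """Probability that a standard normal value is at least as far from 0 as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers chosen for illustration only.
z = z_statistic(sample_mean=52.0, mu0=50.0, sigma=10.0, n=25)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Here the standard error is 10 / √25 = 2, so the sample mean lies one standard error above the hypothesized mean.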

How Does Hypothesis Testing Work?

An analyst performs hypothesis testing on a statistical sample to present evidence of the plausibility of the null hypothesis. Measurements and analyses are conducted on a random sample of the population to test a theory. Analysts use a random population sample to test two hypotheses: the null and alternative hypotheses.

The null hypothesis is typically an equality hypothesis between population parameters; for example, a null hypothesis may claim that the population mean return equals zero. The alternate hypothesis is essentially the inverse of the null hypothesis (e.g., the population mean return is not equal to zero). As a result, they are mutually exclusive: only one can be correct, and one of the two will always be correct.

Null Hypothesis and Alternative Hypothesis

The Null Hypothesis is the assumption that the event will not occur. A null hypothesis has no bearing on the study's outcome unless it is rejected.

H0 is the symbol for it, and it is pronounced H-naught.

The Alternate Hypothesis is the logical opposite of the null hypothesis. The acceptance of the alternative hypothesis follows the rejection of the null hypothesis. H1 is the symbol for it.

Let's understand this with an example.

A sanitizer manufacturer claims that its product kills 95 percent of germs on average. 

To put this company's claim to the test, create a null and alternate hypothesis.

H0 (Null Hypothesis): Average = 95%.

Alternative Hypothesis (H1): The average is less than 95%.

Another straightforward example to understand this concept is determining whether or not a coin is fair and balanced. The null hypothesis states that the probability of heads is equal to the probability of tails. In contrast, the alternate hypothesis states that the probabilities of heads and tails differ.
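The coin example can be sketched as an exact binomial test. The flip counts below are invented for illustration, the function name is hypothetical, and the two-sided p-value is computed by one common convention (summing the probabilities of all outcomes that are no more likely than the observed one under the null hypothesis).

```python
from math import comb

def binomial_two_sided_p(heads, flips, p_heads=0.5):
    """Exact two-sided p-value for observing `heads` in `flips` tosses under H0."""
    probs = [
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(flips + 1)
    ]
    observed = probs[heads]
    # Small relative tolerance so equally likely outcomes are not lost to rounding.
    return sum(pr for pr in probs if pr <= observed * (1 + 1e-9))

# Hypothetical experiment: 60 heads in 100 flips.
p_value = binomial_two_sided_p(heads=60, flips=100)
alpha = 0.05
print(f"p-value = {p_value:.4f}")
print("reject H0 (coin looks unfair)" if p_value <= alpha else "fail to reject H0")
```

With these made-up counts the p-value lands slightly above 0.05, so the sketch fails to reject the hypothesis of a fair coin at that level.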

Hypothesis Testing Calculation With Examples

Let's consider a hypothesis test for the average height of women in the United States. Suppose our null hypothesis is that the average height is 5'4". We gather a sample of 100 women and determine that their average height is 5'5". The population standard deviation is 2".

To calculate the z-score, we would use the following formula:

z = ( x̅ – μ0 ) / (σ /√n)

z = (5'5" - 5'4") / (2" / √100)

z = 1 / (0.2)

z = 5

We will reject the null hypothesis as the z-score of 5 is very large and conclude that there is evidence to suggest that the average height of women in the US is greater than 5'4".
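A quick numeric check of this calculation, with the heights converted to inches (65 and 64) and the standard deviation assumed to be 2 inches, gives a standard error of 2 / √100 = 0.2 and a z-score of 1 / 0.2 = 5. The variable names are illustrative, and the one-sided p-value assumes a standard normal reference distribution.

```python
import math
from statistics import NormalDist

sample_mean = 65.0  # 5'5" expressed in inches
mu0 = 64.0          # 5'4" expressed in inches (null hypothesis value)
sigma = 2.0         # population standard deviation, assumed to be in inches
n = 100             # sample size

standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu0) / standard_error

# One-sided p-value for the alternative "average height is greater than 5'4"".
p_one_sided = 1 - NormalDist().cdf(z)

print(f"standard error = {standard_error}, z = {z}")
print(f"one-sided p-value = {p_one_sided:.2e}")
```

A z-score this large corresponds to a p-value far below 0.05, which is consistent with rejecting the null hypothesis.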

Steps in Hypothesis Testing

Hypothesis testing is a statistical method to determine if there is enough evidence in a sample of data to infer that a certain condition is true for the entire population. Here’s a breakdown of the typical steps involved in hypothesis testing:

Formulate Hypotheses

  • Null Hypothesis (H0): This hypothesis states that there is no effect or difference, and it is the hypothesis you attempt to reject with your test.
  • Alternative Hypothesis (H1 or Ha): This hypothesis is what you might believe to be true or hope to prove true. It is usually considered the opposite of the null hypothesis.

Choose the Significance Level (α)

The significance level, often denoted by alpha (α), is the probability of rejecting the null hypothesis when it is true. Common choices for α are 0.05 (5%), 0.01 (1%), and 0.10 (10%).
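To illustrate what α means in practice, the following sketch simulates many z-tests in which the null hypothesis is actually true and counts how often it is rejected anyway. The sample size, number of trials, seed, and function name are assumptions chosen for this example.

```python
import math
import random
from statistics import NormalDist, fmean

def false_positive_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Share of two-sided z-tests that reject H0 when H0 is in fact true."""
    rng = random.Random(seed)
    standard_normal = NormalDist()
    mu0, sigma = 0.0, 1.0
    rejections = 0
    for _ in range(trials):
        # Data are drawn with the true mean equal to mu0, so H0 holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        p = 2 * (1 - standard_normal.cdf(abs(z)))
        if p <= alpha:
            rejections += 1
    return rejections / trials

rate = false_positive_rate()
print(f"observed false positive rate: {rate:.3f}")
```

The observed rate should come out close to the chosen α, since each rejection here is a Type I error by construction.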

Select the Appropriate Test

Choose a statistical test based on the type of data and the hypothesis. Common tests include t-tests, chi-square tests, ANOVA, and regression analysis. The selection depends on data type, distribution, sample size, and whether the hypothesis is one-tailed or two-tailed.

Collect Data

Gather the data that will be analyzed in the test. This data should be representative of the population to infer conclusions accurately.

Calculate the Test Statistic

Based on the collected data and the chosen test, calculate a test statistic that reflects how much the observed data deviates from the null hypothesis.

Determine the p-value

The p-value is the probability of observing test results at least as extreme as the results observed, assuming the null hypothesis is correct. It helps determine the strength of the evidence against the null hypothesis.

Make a Decision

Compare the p-value to the chosen significance level:

  • If the p-value ≤ α: Reject the null hypothesis, suggesting sufficient evidence in the data supports the alternative hypothesis.
  • If the p-value > α: Do not reject the null hypothesis, suggesting insufficient evidence to support the alternative hypothesis.

Report the Results

Present the findings from the hypothesis test, including the test statistic, p-value, and the conclusion about the hypotheses.

Perform Post-hoc Analysis (if necessary)

Depending on the results and the study design, further analysis may be needed to explore the data more deeply or to address multiple comparisons if several hypotheses were tested simultaneously.
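The text does not name a specific method for handling multiple comparisons. One common option is the Bonferroni correction, sketched below under that assumption: each of m tests is compared against α / m instead of α. The p-values are invented for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Return one reject/keep decision per test using the Bonferroni correction."""
    m = len(p_values)
    # Each test is held to the stricter threshold alpha / m.
    threshold = alpha / m
    return [p <= threshold for p in p_values]

# Hypothetical p-values from four tests run on the same data set.
p_values = [0.003, 0.020, 0.040, 0.300]
decisions = bonferroni(p_values, alpha=0.05)
print(decisions)
```

With four tests the per-test threshold becomes 0.0125, so only the first hypothetical p-value still leads to rejection.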

Types of Hypothesis Testing

Z-Test

A z-test determines whether a discovery or relationship is statistically significant. It usually checks whether two means are the same (the null hypothesis). As a rule of thumb, a z-test can be applied only when the population standard deviation is known and the sample contains 30 or more data points.

T-Test

A t-test compares the means of two groups. It is frequently used in hypothesis testing to determine whether two groups differ or whether a procedure or treatment affects the population of interest.
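As a rough illustration of comparing two group means, the sketch below computes a two-sample t statistic. It assumes Welch's unequal-variance form, which is one of several t-test variants and is not specified in the text. The data are hypothetical. The standard library has no t distribution, so the sketch stops at the statistic and degrees of freedom, which would then be compared against a t table or passed to a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error contributed by each group (sample variance / n).
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical measurements for a treated group and a control group.
treated = [12, 14, 16, 18, 20]
control = [8, 10, 12, 14, 16]
t_stat, df = welch_t(treated, control)
print(t_stat, df)
```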

Chi-Square 

A chi-square test checks whether your data match what was predicted. It compares the observed counts of a categorical variable in a random sample with the counts that would be expected if the null hypothesis were true, and so measures how well the expected and observed results fit.
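The comparison of observed and expected counts can be sketched as follows. The coin-flip counts are hypothetical. The p-value line relies on a closed form that holds only for 1 degree of freedom (two categories); with more categories a chi-square table or a statistics package would be needed.

```python
from math import erfc, sqrt

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical coin: 100 flips, 60 heads and 40 tails; H0 says the coin is fair.
observed = [60, 40]
expected = [50, 50]
chi2 = chi_square_statistic(observed, expected)

# With two categories there is 1 degree of freedom, and for that special case
# the upper-tail probability has the closed form erfc(sqrt(chi2 / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(chi2, p_value)
```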

Hypothesis Testing and Confidence Intervals

Both confidence intervals and hypothesis tests are inferential techniques that rely on an approximation of the sampling distribution. A confidence interval uses sample data to estimate a population parameter. A hypothesis test uses sample data to evaluate a stated hypothesis, so it requires a hypothesized value of the parameter.

Bootstrap distributions and randomization distributions are created using similar simulation techniques. A bootstrap distribution is centered on the observed sample statistic, whereas a randomization distribution is centered on the null hypothesis value.
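The difference in centering can be seen in a small simulation. The sketch below is one possible construction, not necessarily the one the lesson had in mind: the bootstrap resamples the data as observed, while the randomization distribution first shifts the data so that the null hypothesis holds. The sample, the null value of 50, and the number of resamples are assumptions.

```python
import random
from statistics import mean

def resampled_means(data, n_resamples, rng):
    """Means of samples drawn with replacement from data."""
    n = len(data)
    return [mean(rng.choices(data, k=n)) for _ in range(n_resamples)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = [52, 55, 49, 53, 51, 54, 50, 56, 52, 53]  # hypothetical data
mu0 = 50  # hypothesized mean under H0 (an assumption for illustration)

# Bootstrap distribution: resample the data as observed.
bootstrap = resampled_means(sample, 2000, rng)

# Randomization distribution: first shift the data so its mean equals mu0,
# which makes H0 true in the resampling population, then resample.
shift = mu0 - mean(sample)
shifted = [x + shift for x in sample]
randomization = resampled_means(shifted, 2000, rng)

print(round(mean(bootstrap), 2), round(mean(randomization), 2))
```

The first printed value should sit near the sample mean of 52.5 and the second near the null value of 50.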

A confidence interval contains a range of plausible values for the population parameter. This lesson covers only two-tailed confidence intervals, which correspond directly to two-tailed hypothesis tests, and the two typically lead to the same conclusion. In other words, if the 95% confidence interval contains the hypothesized value, a hypothesis test at the 0.05 level will almost always fail to reject the null hypothesis. If the 95% confidence interval does not contain the hypothesized value, a hypothesis test at the 0.05 level will almost always reject the null hypothesis.
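The correspondence between a 95% interval and a test at the 0.05 level can be checked numerically. The sketch below assumes a z-based interval with a known standard deviation. The summary statistics are hypothetical, and both function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a mean with known sigma."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

def z_test_rejects(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test decision at level alpha."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= alpha

# Hypothetical summary statistics: mean 52.5 from n = 16, sigma assumed to be 4.
low, high = z_interval(52.5, sigma=4, n=16)
for mu0 in (50, 52):
    inside = low <= mu0 <= high
    print(mu0, inside, z_test_rejects(52.5, mu0, sigma=4, n=16))
```

For each hypothesized value, the test rejects exactly when the value falls outside the interval.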

Simple and Composite Hypothesis Testing

Depending on how fully it specifies the population distribution, a statistical hypothesis is classified as one of two types.

Simple Hypothesis: A simple hypothesis specifies an exact value for the parameter.

Composite Hypothesis: A composite hypothesis specifies a range of values.

A company is claiming that their average sales for this quarter are 1000 units. This is an example of a simple hypothesis.

Suppose the company claims that the sales are in the range of 900 to 1000 units. Then this is a case of a composite hypothesis.

One-Tailed and Two-Tailed Hypothesis Testing

A one-tailed test, also called a directional test, places the critical region in a single tail of the distribution. If the test statistic falls into that region, the null hypothesis is rejected in favor of the alternative hypothesis.

In a one-tailed test, the critical region is one-sided: the test checks whether the parameter is greater than a specific value or whether it is less than that value, but not both.

In a two-tailed test, the critical region is two-sided: the test checks whether the parameter differs from the hypothesized value in either direction, so the critical region is split between both tails.

If the test statistic falls in either tail, outside the range of values consistent with the null hypothesis, the null hypothesis is rejected in favor of the alternative hypothesis.
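How the choice of tail changes the p-value can be sketched for a z statistic. The function name, the `alternative` labels, and the statistic 1.8 are illustrative assumptions.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """p-value for a z statistic under the chosen alternative hypothesis."""
    cdf = NormalDist().cdf
    if alternative == "greater":    # right-tailed: H1 says parameter > null value
        return 1 - cdf(z)
    if alternative == "less":       # left-tailed: H1 says parameter < null value
        return cdf(z)
    if alternative == "two-sided":  # H1 says parameter != null value
        return 2 * (1 - cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")

z = 1.8  # hypothetical test statistic
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value_from_z(z, alt), 4))
```

For this statistic the right-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is why the direction of the test should be fixed before looking at the data.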

Right Tailed Hypothesis Testing

If the greater-than sign (>) appears in your alternative hypothesis, you are using a right-tailed test, also known as an upper-tailed test. In other words, the critical region lies in the right tail. For instance, you can compare battery life before and after a change in production. If you want to know whether battery life is now longer than the original (say, 90 hours), your hypothesis statements can be the following:

  • The null hypothesis is that battery life has not increased (H0: mean ≤ 90).
  • The alternative hypothesis is that battery life has increased (H1: mean > 90).

The crucial point is that the alternative hypothesis (H1), not the null hypothesis, determines whether the test is right-tailed.
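The battery example can be worked through with the critical-value approach. Only the 90-hour null value comes from the text. The standard deviation, sample size, and observed mean below are hypothetical, and a z-test with known standard deviation is assumed.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
mu0 = 90            # battery life under H0, in hours (from the example above)
sigma = 5           # assumed known population standard deviation (hypothetical)
n = 25              # hypothetical sample size
sample_mean = 92.0  # hypothetical observed mean after the production change

# Right-tailed test: all of alpha sits in the upper tail.
z_crit = NormalDist().inv_cdf(1 - alpha)
z = (sample_mean - mu0) / (sigma / sqrt(n))
reject_h0 = z > z_crit
print(round(z_crit, 3), z, reject_h0)
```

A left-tailed version would instead use `NormalDist().inv_cdf(alpha)` as the critical value and reject when the statistic falls below it.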

Left Tailed Hypothesis Testing

A left-tailed test is used when the alternative hypothesis asserts that the true value of a parameter is lower than the value in the null hypothesis; such alternatives are indicated by the less-than sign (<).

Suppose H0: mean = 50 and H1: mean ≠ 50

According to H1, the mean can be greater than or less than 50. This is an example of a two-tailed test.

In a similar manner, if H0: mean ≥ 50, then H1: mean < 50

Here H1 states that the mean is less than 50. This is a one-tailed test, specifically a left-tailed test.

Type 1 and Type 2 Error

A hypothesis test can result in two types of errors.

Type 1 Error: A Type I error occurs when the sample results lead you to reject the null hypothesis even though it is true.

Type 2 Error: A Type II error occurs when the null hypothesis is not rejected even though it is false.

Suppose a teacher evaluates the examination paper to decide whether a student passes or fails.

H0: Student has passed

H1: Student has failed

A Type I error occurs if the teacher fails the student [rejects H0] although the student scored the passing marks [H0 was true].

A Type II error occurs if the teacher passes the student [does not reject H0] although the student did not score the passing marks [H1 is true].
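The two error rates can also be put in numbers. In the sketch below the Type I error rate is simply the chosen α, and the Type II error rate (β) is computed for a right-tailed z-test. All parameter values are hypothetical and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for a right-tailed z-test when the true mean is mu_true > mu0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the true mean, the z statistic is normal with this mean and sd 1.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # H0 is (wrongly) retained whenever the statistic stays at or below z_crit.
    return std_normal.cdf(z_crit - shift)

alpha = 0.05  # Type I error rate, fixed by the analyst
beta = type_ii_error(mu0=90, mu_true=92, sigma=5, n=25, alpha=alpha)
power = 1 - beta
print(round(beta, 3), round(power, 3))
```

Increasing the sample size in this sketch lowers β while α stays fixed, which is the usual way to reduce Type II errors without accepting more Type I errors.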

Limitations of Hypothesis Testing

Hypothesis testing has some limitations that researchers should be aware of:

  • It cannot prove or establish the truth: Hypothesis testing provides evidence to support or reject a hypothesis, but it cannot confirm the absolute truth of the research question.
  • Results are sample-specific: Hypothesis testing analyzes a sample from a population, so the conclusions depend on that particular sample and may differ for another sample.
  • Possible errors: During hypothesis testing, there is a chance of committing type I error (rejecting a true null hypothesis) or type II error (failing to reject a false null hypothesis).
  • Assumptions and requirements: Different tests have specific assumptions and requirements that must be met to accurately interpret results.

After reading this tutorial, you should have a much better understanding of hypothesis testing, one of the most important concepts in the field of data science. The majority of hypotheses are based on speculation about observed behavior, natural phenomena, or established theories.

1. What is hypothesis testing in statistics with example?

Hypothesis testing is a statistical method used to determine whether sample data provide enough evidence to draw conclusions about a population. It involves formulating two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (Ha), and then collecting data to assess the evidence. An example: testing whether a new drug improves patient recovery (Ha) compared to the standard treatment (H0) based on collected patient data.

2. What is H0 and H1 in statistics?

In statistics, H0 and H1 represent the null and alternative hypotheses. The null hypothesis, H0, is the default assumption that no effect or difference exists between groups or conditions. The alternative hypothesis, H1, is the competing claim suggesting an effect or a difference. Statistical tests determine whether to reject the null hypothesis in favor of the alternative hypothesis based on the data.

3. What is a simple hypothesis with an example?

A simple hypothesis is a specific statement predicting a single relationship between two variables. It posits a direct and uncomplicated outcome. For example, a simple hypothesis might state, "Increased sunlight exposure increases the growth rate of sunflowers." Here, the hypothesis suggests a direct relationship between the amount of sunlight (independent variable) and the growth rate of sunflowers (dependent variable), with no additional variables considered.

4. What are the 3 major types of hypothesis?

The three major types of hypotheses are:

  • Null Hypothesis (H0): Represents the default assumption, stating that there is no significant effect or relationship in the data.
  • Alternative Hypothesis (Ha): Contradicts the null hypothesis and proposes a specific effect or relationship that researchers want to investigate.
  • Nondirectional Hypothesis: An alternative hypothesis that doesn't specify the direction of the effect, leaving it open for both positive and negative possibilities.

About the Author

Avijeet Biswal

Avijeet is a Senior Research Analyst at Simplilearn. Passionate about Data Analytics, Machine Learning, and Deep Learning, Avijeet is also interested in politics, cricket, and football.

Society for Epidemiologic Research

Hypothesis Generation During Foodborne-Illness Outbreak Investigations

Alice E White, Kirk E Smith, Hillary Booth, Carlota Medus, Robert V Tauxe, Laura Gieraltowski, Elaine Scallan Walter, Hypothesis Generation During Foodborne-Illness Outbreak Investigations, American Journal of Epidemiology, Volume 190, Issue 10, October 2021, Pages 2188–2197, https://doi.org/10.1093/aje/kwab118

Hypothesis generation is a critical, but challenging, step in a foodborne outbreak investigation. The pathogens that contaminate food have many diverse reservoirs, resulting in seemingly limitless potential vehicles. Identifying a vehicle is particularly challenging for clusters detected through national pathogen-specific surveillance, because cases can be geographically dispersed and lack an obvious epidemiologic link. Moreover, state and local health departments could have limited resources to dedicate to cluster and outbreak investigations. These challenges underscore the importance of hypothesis generation during an outbreak investigation. In this review, we present a framework for hypothesis generation focusing on 3 primary sources of information, typically used in combination: 1) known sources of the pathogen causing illness; 2) person, place, and time characteristics of cases associated with the outbreak (descriptive data); and 3) case exposure assessment. Hypothesis generation can narrow the list of potential food vehicles and focus subsequent epidemiologic, laboratory, environmental, and traceback efforts, ensuring that time and resources are used more efficiently and increasing the likelihood of rapidly and conclusively implicating the contaminated food vehicle.

Abbreviations

STEC: Shiga toxin-producing Escherichia coli

PFGE: pulsed-field gel electrophoresis

WGS: whole-genome sequencing

HGQ: hypothesis-generating questionnaire

Foodborne diseases are a continuing public health problem in the United States, where they cause an estimated 48 million illnesses, 128,000 hospitalizations, and 3,000 deaths annually ( 1 ). Public health and regulatory agencies rely on data from foodborne disease surveillance and outbreak investigations to prioritize food safety regulations, policies, and practices aimed at reducing the burden of disease ( 2 ). In particular, foodborne illness outbreaks provide critical information on the foods causing illness, common food-pathogen pairs, and high-risk production technologies and practices. However, only half of the foodborne outbreaks reported each year identify a pathogen, and less than half implicate a food vehicle, decreasing the utility of these data ( 3 ).

Figure 1. A model framework for hypothesis generation during a foodborne-illness outbreak investigation.

Foodborne disease outbreaks require rapid public health response to quickly identify potential sources and prevent future exposures; however, implicating a food vehicle in an outbreak can be challenging. The pathogens that contaminate food have many diverse reservoirs and can be transmitted in other ways (e.g., from one person to another or through contact with animals or contaminated water), resulting in seemingly limitless potential vehicles ( 2 ). Identifying a food vehicle is particularly challenging for clusters detected through national pathogen-specific surveillance: Cases can be geographically dispersed and lack an obvious epidemiologic link ( 4 ). Moreover, state and local health departments might have limited resources to dedicate to cluster and outbreak investigations ( 5 ). These challenges underscore the importance of hypothesis generation during an outbreak investigation. Hypothesis generation can narrow the list of potential food vehicles and focus subsequent epidemiologic, laboratory, environmental, and traceback efforts, ensuring that time and resources are used more efficiently and increasing the likelihood of timely identification of the vehicle. Timely investigations can prevent additional illnesses and increase the likelihood of identifying factors contributing to the outbreak.

The Integrated Food Safety Centers of Excellence were established in 2012 under the Food Safety Modernization Act to serve as resources for federal, state, and local public health professionals who detect and respond to foodborne illness outbreaks. The Integrated Food Safety Centers of Excellence aim to improve the quality of foodborne-illness outbreak investigations by providing public health professionals with training, tools, and model practices. In this paper, we provide a framework for generating hypotheses early during investigation of an outbreak or cluster detected through pathogen-specific surveillance; highlight tools to support rapid and effective hypothesis generation; and illustrate the practice of hypothesis generation using example outbreak case studies.

A hypothesis is “a supposition, arrived at from observation or reflection, that leads to refutable predictions; (or) any conjecture cast in a form that will allow it to be tested and refuted” ( 6 ). In a foodborne outbreak, the hypothesis states which food vehicle(s) could be the source of the outbreak and warrant further investigation. In practice, hypothesis generation is dynamic and iterative. It begins in the earliest stages of an investigation as investigators review available information and look for a pattern or “signal” that might emerge. As more information becomes available, hypotheses are frequently evaluated and refined.

The framework presented here focuses on 3 primary sources of information for generating hypotheses, typically used in combination: 1) known sources of the pathogen causing illness; 2) person, place, and time characteristics of cases associated with the outbreak (descriptive data); and 3) case exposure assessment ( Figure 1 ). We discuss the approach for collecting, summarizing, and interpreting each of these sources of information and provide example outbreak case studies ( Table 1 ). We focus primarily on food exposures. However, at the onset of an investigation the transmission route is often unknown, and many pathogens commonly transmitted through food can also be transmitted through other routes (e.g., animal contact, person-to-person, waterborne). Thus, hypothesis generation should consider all potential transmission routes early in the investigation. Moreover, hypothesis generation should involve a multidisciplinary outbreak investigation team, including experienced colleagues who can provide information about past outbreaks and known sources of the pathogen causing illness.

Table 1. Foodborne-Illness Outbreak Case Studies Highlighting Hypothesis-Generation Methods, United States, 2006–2018

STEC O157 outbreaks

STEC O157 associated with spinach
Cases: 225. States: 27. Dates: August–September 2006.
Of cases, 72% were female, with a median age of 27 years (range, 1–84 years), similar to descriptive data for other leafy greens outbreaks. Cases were interviewed using the Oregon “shotgun” questionnaire and results compared with the FoodNet Population Survey using binomial probability calculations; the proportion of outbreak cases reporting fresh spinach consumption was statistically significantly higher than the surveyed population.
HG methods: descriptive data, “shotgun” questionnaire, binomial probability calculations

STEC O157 associated with cookie dough
Cases: 77. States: 30. Dates: March–July 2009.
Investigators initially focused on known sources for STEC O157 (e.g., ground beef, raw dairy products), but no commonalities were identified. Then, a single interviewer conducted conversational open-ended interviews with 5 cases; all reported consuming ready-to-bake commercial prepackaged cookie dough. This hypothesis aligned with descriptive data (median age of 15 years, range, 2–65 years, 71% female).
HG methods: open-ended interviews, single interviewer, descriptive data

STEC O157 associated with hazelnuts
Cases: 8. States: 3. Dates: December 2010 to February 2011.
In HG interviews most cases reported eating ground beef and in-shell mixed-nuts or in-shell hazelnuts. The ground beef hypothesis was ruled out because cases reported purchasing ground beef that was locally processed and distributed (i.e., inconsistent with cases in multiple states). The hazelnuts hypothesis was supported by binomial probability and case-case comparison studies, and confirmed using traceback investigations.
HG methods: specific product information, binomial probability calculations, case-case comparisons

Salmonella outbreaks

Salmonella serotypes Wandsworth and Typhimurium associated with vegetable-coated snack food
Cases: 69. States: 23. Dates: February–June 2007.
Investigators in multiple states interviewed parents of cases (96% of whom were <6 years of age) using HGQs. After no signal emerged, a single interviewer conducted 10 open-ended interviews, while 6 interviews were conducted using a questionnaire that included previously mentioned items and foods commonly consumed by young children. After multiple cases reported eating a vegetable-coated snack food, a formal multistate case-control study was conducted. The serotype Wandsworth strain was identified in product testing, along with multiple other serotypes. In a “backward” investigation, cases in PulseNet with the other outbreak strains were interviewed and found to have also consumed the snack food.
HG methods: single interviewer, open-ended interviews, iterative interviewing, backward investigation

Salmonella I 4:[5]:12:i:- associated with Banquet turkey pot pies
Cases: 272. States: 35. Dates: August–October 2007.
During HG interviews, the first 2 cases reported frequent consumption of various microwaveable entrees. The third case reported daily consumption of Banquet pot pies. This prompted investigators to implement the iterative interviewing approach. When investigators specifically asked the first 2 cases, they both reported eating Banquet pot pies. A specific question about Banquet pot pies was added to the HG interviews for new cases, and the fourth case also reported having eaten them. The hypothesis was quickly confirmed by other states who asked a handful of cases specifically about their consumption of Banquet pot pies.
HG methods: specific product information, iterative interviewing

Salmonella Typhimurium associated with peanut butter
Cases: 714. States: 46. Dates: September 2008 to March 2009.
During HG interviews, 58% of cases reported exposure to institutional settings, 71% reported eating peanut butter, and 86% reported eating chicken, although cases reported eating multiple brands of both peanut butter and chicken. Then, investigators in one state were able to identify a common food distributor (of peanut butter) for subclusters of cases at 2 different long-term care facilities and an elementary school. Testing of an open 5-lb. container of peanut butter from one of the long-term care facilities yielded the outbreak strain. The company that produced the peanut butter also produced peanut paste used in packaged peanut butter crackers consumed by numerous cases in another state. Additional traceback investigations and testing of intact food products in other states ultimately confirmed the source as peanut butter.
HG methods: subcluster investigation, food testing

Salmonella Virchow associated with Garden of Life Raw Meal Replacement
Cases: 33. States: 23. Dates: December 2015 to March 2016.
Garden of Life Raw Meal Replacement emerged as a strong hypothesis after it was mentioned by 3 cases in 3 different states and was quickly confirmed by interviewing a few additional cases. Three different questionnaires were used by state investigators, which shows it is not necessarily questionnaire design that is most important, but rather doing a quality interview and obtaining product details (either at the time of initial interview or upon re-interview).
HG methods: specific product information

Salmonella Montevideo associated with black and red pepper
Cases: 272. States: 44. Dates: July 2009 to April 2010.
Investigators conducted HG interviews, which did not lead to a hypothesis, but they did identify 3 subclusters. During open-ended interviews, cases reported consuming Italian-style meats and salami, and shopping at a national warehouse store chain. Using warehouse store membership cards, investigators confirmed that multiple cases had purchased the same pepper-encrusted salami product.
HG methods: subcluster investigation, shopper membership-card purchase information

Multiple serotypes associated with kratom
Cases: 199. States: 41. Dates: January 2017 to May 2018.
On the first multistate coordinating call, an investigator stated that a case mentioned “kratom” on a routine interview when asked about dietary supplements. This novel exposure was added to a supplemental question list for the outbreak shared with investigators and many others quickly collected reports of kratom consumption. Testing samples of kratom identified other serotypes, which matched more cases in PulseNet, who on interview had also consumed kratom. Ultimately, there were dozens of distinct PFGE patterns and 6 serotypes.
HG methods: iterative interviewing, backward investigation

Listeria monocytogenes outbreaks

Listeria monocytogenes associated with Crave Brothers Cheese
Cases: 6. States: 5. Dates: May–July 2013.
During interviews using the Listeria Initiative questionnaire (44), all 5 cases in a 4-state cluster reported eating soft cheeses at restaurants or from grocery stores. Investigators identified Crave Brothers as the common producer. A search of the PulseNet database revealed a large number of matching (by PFGE) environmental isolates collected 2 years prior, and all had come from the Crave Brothers plant.
HG methods: specific product information, iterative interviewing, historical environmental isolates in PulseNet

Listeria monocytogenes associated with prepackaged caramel apples
Cases: 35. States: 12. Dates: October 2014 to January 2015.
Investigators conducted an open-ended interview with the first case. Then, investigators conducted an open-ended interview with the second case, along with adding objective questions about some foods mentioned by the first case. Specifically, a local investigator asked the second case about caramel apples based on the first case’s interview. The hypothesis was strengthened by other states quickly re-interviewing their cases.
HG methods: open-ended interviews, iterative interviewing
STEC O157 outbreaks
STEC O157 associated with spinach22527August–September 2006Of cases, 72% were female, with a median age of 27 years (range, 1–84 years), similar to descriptive data for other leafy greens outbreaks. Cases were interviewed using the Oregon “shotgun” questionnaire and results compared with the FoodNet Population Survey using binomial probability calculations; the proportion of outbreak cases reporting fresh spinach consumption was statistically significantly higher than the surveyed population ( ).
HG methods: descriptive data, “shotgun” questionnaire, binomial probability calculations
STEC O157 associated with cookie dough7730March–July 2009Investigators initially focused on known sources for STEC O157 (e.g., ground beef, raw dairy products), but no commonalities were identified. Then, a single interviewer conducted conversational open-ended interviews with 5 cases; all reported consuming ready-to-bake commercial prepackaged cookie dough. This hypothesis aligned with descriptive data (median age of 15 years, range, 2–65 years, 71% female) ( ).
HG methods: open-ended interviews, single interviewer, descriptive data
STEC O157 associated with hazelnuts83December 2010 to February 2011In HG interviews most cases reported eating ground beef and in-shell mixed-nuts or in-shell hazelnuts. The ground beef hypothesis was ruled out because cases reported purchasing ground beef that was locally processed and distributed (i.e., inconsistent with cases in multiple states). The hazelnuts hypothesis was supported by binomial probability and case-case comparison studies, and confirmed using traceback investigations ( ).
HG methods: specific product information, binomial probability calculations, case-case comparisons
outbreaks
serotypes Wandsworth and Typhimurium associated with vegetable-coated snack food6923February–June 2007Investigators in multiple states interviewed parents of cases (96% of whom were <6 years of age) using HGQs. After no signal emerged, a single interviewer conducted 10 open-ended interviews, while 6 interviews were conducted using a questionnaire that included previously mentioned items and foods commonly consumed by young children. After multiple cases reported eating a vegetable-coated snack food, a formal multistate case-control study was conducted. The serotype Wandsworth strain was identified in product testing, along with multiple other serotypes. In a “backward” investigation, cases in PulseNet with the other outbreak strains were interviewed and found to have also consumed the snack food ( ).
HG methods: single interviewer, open-ended interviews, iterative interviewing, backward investigation
I 4:[5]:12:i:- associated with Banquet turkey pot pies27235August–October 2007During HG interviews, the first 2 cases reported frequent consumption of various microwaveable entrees. The third case reported daily consumption of Banquet pot pies. This prompted investigators to implement the iterative interviewing approach. When investigators specifically asked the first 2 cases, they both reported eating Banquet pot pies. A specific question about Banquet pot pies was added to the HG interviews for new cases, and the fourth case also reported having eaten them. The hypothesis was quickly confirmed by other states who asked a handful of cases specifically about their consumption of Banquet pot pies.
HG methods: specific product information, iterative interviewing
Typhimurium associated with peanut butter71446September 2008 to March 2009During HG interviews, 58% of cases reported exposure to institutional settings, 71% reported eating peanut butter, and 86% reported eating chicken, although cases reported eating multiple brands of both peanut butter and chicken. Then, investigators in one state were able to identify a common food distributor (of peanut butter) for subclusters of cases at 2 different long-term care facilities and an elementary school. Testing of an open 5-lb. container of peanut butter from one of the long-term care facilities yielded the outbreak strain. The company that produced the peanut butter also produced peanut paste used in packaged peanut butter crackers consumed by numerous cases in another state. Additional traceback investigations and testing of intact food products in other states ultimately confirmed the source as peanut butter ( ).
HG methods: subcluster investigation, food testing
Virchow associated with Garden of Life Raw Meal Replacement3323December 2015 to March 2016Garden of Life Raw Meal Replacement emerged as a strong hypothesis after it was mentioned by 3 cases in 3 different states and was quickly confirmed by interviewing a few additional cases. Three different questionnaires were used by state investigators, which shows it is not necessarily questionnaire design that is most important, but rather doing a quality interview and obtaining product details (either at the time of initial interview or upon re-interview) ( ).
HG methods: specific product information
Montevideo associated with black and red pepper27244July 2009 to April 2010Investigators conducted HG interviews, which did not lead to a hypothesis, but they did identify 3 subclusters. During open-ended interviews, cases reported consuming Italian-style meats and salami, and shopping at a national warehouse store chain. Using warehouse store membership cards, investigators confirmed that multiple cases had purchased the same pepper-encrusted salami product ( ).
HG methods: subcluster investigation, shopper membership-card purchase information
Multiple serotypes associated with kratom19941January 2017 to May 2018On the first multistate coordinating call, an investigator stated that a case mentioned “kratom” on a routine interview when asked about dietary supplements. This novel exposure was added to a supplemental question list for the outbreak shared with investigators and many others quickly collected reports of kratom consumption. Testing samples of kratom identified other serotypes, which matched more cases in PulseNet, who on interview had also consumed kratom. Ultimately, there were dozens of distinct PFGE patterns and 6 serotypes ( ).
HG methods: iterative interviewing, backward investigation
outbreaks
associated with Crave Brothers Cheese65May–July 2013During interviews using the Initiative questionnaire (44), all 5 cases in a 4-state cluster reported eating soft cheeses at restaurants or from grocery stores. Investigators identified Crave Brothers as the common producer. A search of the PulseNet database revealed a large number of matching (by PFGE) environmental isolates collected 2 years prior, and all had come from the Crave Brothers plant.
HG methods: specific product information, iterative interviewing, historical environmental isolates in PulseNet
Listeria monocytogenes associated with prepackaged caramel apples; 35 cases; 12 states; October 2014 to January 2015. Investigators conducted an open-ended interview with the first case. They then conducted an open-ended interview with the second case, adding objective questions about some foods mentioned by the first case. Specifically, a local investigator asked the second case about caramel apples based on the first case’s interview. The hypothesis was strengthened when other states quickly re-interviewed their cases.
HG methods: open-ended interviews, iterative interviewing

Abbreviations: STEC: Shiga toxin-producing Escherichia coli; HG: hypothesis generation; HGQ: hypothesis-generating questionnaires; PFGE: pulsed-field gel electrophoresis.

Known pathogen sources

When generating a hypothesis, investigators should consider historical information about the causative pathogen, including known reservoirs; foods (and animals) implicated in past outbreaks; findings from case-control studies of sporadic illnesses (i.e., diagnosed cases investigated during routine surveillance not linked to other cases); and molecular subtyping information of the pathogen, including information about nonhuman isolates (i.e., food, animal, or environmental sources).

The reservoir of the infectious agent can indicate potential sources and contributing factors. Pathogens with a human reservoir (e.g., norovirus, hepatitis A virus, and Shigella ) are commonly associated with infected food handlers or ready-to-eat foods that have been contaminated with human feces. In contrast, pathogens with animal reservoirs (e.g., Shiga toxin-producing Escherichia coli (STEC), nontyphoidal Salmonella , and Campylobacter ) are often associated with food sources of animal origin or foods that have been contaminated by animal feces during production (e.g., fresh produce). Pathogens with environmental reservoirs (e.g., Vibrio spp., Listeria monocytogenes , Clostridium botulinum ) are commonly associated with foods that can become contaminated by soil or water. Tools that help identify known pathogen sources include the National Outbreak Reporting System Dashboard ( 7 ), the Food and Drug Administration Bad Bug Book ( 8 ), and An Atlas of Salmonella in the United States ( 9 ).

Food-pathogen pairs identified in past outbreaks and case-control studies of sporadic illnesses provide information on common food vehicles associated with a pathogen. Using data on reported outbreaks from 1998–2016, the Interagency Food Safety Analytics Collaboration estimated the proportion of illnesses attributable to 17 major food categories ( 10 ). The foods most commonly associated with Salmonella illnesses were seeded vegetables (e.g., tomatoes and cucumbers), chicken, pork, and fruit, whereas most STEC illnesses were attributed to leafy greens or beef, and most Listeria illnesses to dairy products or fruits. Similarly, case-control studies of sporadic illnesses have found associations between pathogens and specific foods; for example, Campylobacter and poultry ( 11 ) and Listeria monocytogenes and melons and hummus ( 12 ).

For pathogens with multiple reservoirs, information that distinguishes isolates of the same species by phenotypic or genotypic characteristics can provide increased specificity. For example, there are over 2,600 serotypes of Salmonella ; however, some serotypes have been associated with specific food vehicles, such as Salmonella enterica serotype Enteritidis (SE) and eggs and chicken; serotypes Uganda and Infantis and pork; and serotypes Litchfield, Poona, Oranienburg, and Javiana and fruit ( 13 ). Antimicrobial resistance has also proven useful in differentiating major sources of Salmonella serotypes found in both animal- and plant-derived food commodities. For example, antimicrobial-resistant Salmonella outbreaks were more likely to be associated with meat and poultry (e.g., beef, chicken, and turkey), whereas foods commonly associated with susceptible Salmonella outbreaks were eggs, tomatoes, and melons ( 14 ).

Molecular subtyping with pulsed-field gel electrophoresis (PFGE) has been an essential subtyping tool for outbreak detection, and PFGE patterns have been associated with specific foods. For example, SE isolates with PFGE PulseNet pattern JEGX01.0004 have commonly been associated with eggs (and more recently, chicken), pattern JEGX01.0005 with chicken, and pattern JEGX01.0002 with travel or exposure to the US Pacific Northwest region and Mexico. Similarly, the same PFGE pattern of STEC O157:H7 has been associated with recurrent romaine lettuce outbreaks (15, 16). In July 2019, whole-genome sequencing (WGS) replaced PFGE as the standard molecular subtyping method for the national PulseNet network, providing greater discrimination and a more reliable indication of genetically related groupings than PFGE. This change in molecular method might temporarily limit historical comparisons, particularly to isolates from before the transition, because PFGE patterns and WGS results are not readily comparable. However, WGS allele codes have been applied to sequenced historical isolates in PulseNet, and although these represent a small proportion of all isolates in PulseNet, the representativeness of the WGS database will increase with time. As historical isolates and regulatory isolates from the Food and Drug Administration and US Department of Agriculture Food Safety and Inspection Service are sequenced, information about recent findings in foods and animals will fill the national database maintained at the National Center for Biotechnology Information (17) and be readily comparable to sequenced human clinical isolates.

Subtyping of nonhuman isolates collected by regulatory agencies from foods and food chain environments through routine testing or special studies can lead to the identification of outbreaks of human illness by searching the PulseNet database for the same molecular subtypes in human infections, sometimes referred to as “backward” outbreaks. For example, in 2007 public health authorities were investigating a multistate outbreak of Salmonella serotype Wandsworth in which patients reported consuming a puffed vegetable-coated snack food. Food testing yielded the outbreak strain of Salmonella serotype Wandsworth, but it also yielded Salmonella serotype Typhimurium; a search in the PulseNet database identified matching isolates from human cases of Salmonella serotype Typhimurium infection, and these cases confirmed consumption of the same snack food upon re-interview ( 18 ). Importantly, identifying a close genetic match between strains from a product and an illness does not alone establish causation; epidemiologic investigation and traceback are needed to connect the product and patient.

Descriptive data

Descriptive epidemiology of cases, including person, place, or time characteristics, remains a powerful tool for hypothesis generation. Person characteristics can suggest foods that are more likely to be eaten by certain groups, whereas place and time characteristics can provide clues about the geographic distribution and shelf life of the food.

Person characteristics suggestive of certain foods include, but are not limited to, sex, age, race, and ethnicity. For example, the median percentage of female cases in vegetable-associated STEC outbreaks was 64%, compared with 50% in beef STEC outbreaks (19). Likewise, food consumption patterns differ by age, with the lowest median percentage of children and adolescents in vegetable-associated STEC outbreaks and the highest in STEC dairy outbreaks (19). Similar trends are evident in the Centers for Disease Control and Prevention FoodNet Population Survey, a population-based survey to estimate the prevalence of risk factors for foodborne illness, which found that women reported consuming more fruits and vegetables than men, and men reported consuming more meat and poultry (20).

Time characteristics, displayed by the shape and pattern of an epidemic curve, can indicate the shelf life of a product or the harvest duration of a contaminated field. For example, cases spread over a longer time period might suggest a shelf-stable or frozen food item, ongoing harborage of the contaminating pathogen in a food processing plant, or other sustained mechanism of contamination. Conversely, cases with illness onset dates spread over a limited duration of time might suggest a perishable item, such as fresh produce. However, some fresh produce items have longer shelf lives than others and can cause more protracted outbreaks. Additionally, there are “special case” produce types. For example, outbreaks associated with sprouted seeds or beans, which have a short shelf life, are typically driven by a single contaminated seed lot, and un-sprouted seeds and beans can have a shelf life of months to years. Thus, single batches might be sprouted from the same contaminated lot of seeds at different times and in different places leading to a more sustained outbreak, or resulting in temporally and geographically distinct outbreaks ( 21 ). If an outbreak is detected early and exposure is ongoing, the temporal distribution of cases might be less clear early in an investigation. Thus, epidemic curves can provide supporting evidence that adds to the plausibility of a suspected food vehicle; however, depending on the outbreak, epidemic curves might provide more relevant information as the outbreak progresses.
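As a rough illustration of how an epidemic curve is tabulated, the following Python sketch bins illness onset dates by week and prints a simple text histogram. The function name `epidemic_curve`, the weekly bin width, and the onset dates are assumptions made for this example; they are not drawn from any outbreak described here.

```python
from collections import Counter
from datetime import date, timedelta


def epidemic_curve(onset_dates):
    """Return (week_start, case_count) pairs, including weeks with zero cases."""
    if not onset_dates:
        return []
    # Bucket each onset date by the Monday that starts its week.
    week_starts = [d - timedelta(days=d.weekday()) for d in onset_dates]
    counts = Counter(week_starts)
    curve = []
    week = min(week_starts)
    last_week = max(week_starts)
    while week <= last_week:
        curve.append((week, counts.get(week, 0)))
        week += timedelta(weeks=1)
    return curve


# Hypothetical onset dates, invented for illustration only.
onsets = [
    date(2021, 3, 1),
    date(2021, 3, 3),
    date(2021, 3, 4),
    date(2021, 3, 9),
    date(2021, 3, 24),
]

for week_start, n in epidemic_curve(onsets):
    print(f"{week_start.isoformat()}  {'#' * n}")
```

Keeping the empty weeks in the output matters here, because gaps between cases are part of what distinguishes a short, perishable-product pattern from a protracted one.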

Geographical mapping of cases can also help assess the plausibility of a suspected vehicle by comparing the distribution of cases with the distribution pattern of that food item, in consultation with regulatory and industry partners. For example, widespread outbreaks are caused by widely distributed commercial products, and some foods are more likely to be distributed nationally (e.g., bagged leafy greens, packaged cereal, national meat brands), whereas others are more likely to be distributed regionally (e.g., popular brands of ice cream) or locally (e.g., raw milk) (22). Likewise, if some outbreak-associated illnesses are clearly related to travel to a specific country and others are in nontravelers, the latter might be associated with a product imported from that country. For example, a 2018 outbreak of Salmonella serotype Typhimurium infections in Canada occurred among persons traveling to Thailand and among others who shopped at particular stores in Western Canada; the outbreak was ultimately traced to contaminated frozen profiteroles imported from Thailand (23). Similarly, in a 2011 multistate outbreak in the United States, a subset of cases traveled to Mexico and ate papaya there, and nontravel-associated cases ate papaya imported from Mexico (24).

Outbreak size and distribution can suggest certain food-pathogen pairs. For example, seafood toxins like ciguatoxin are typically produced or concentrated in an individual fish and therefore cause illness in a limited number of people in a single jurisdiction, whereas Salmonella and other bacterial pathogens can contaminate large amounts of a widely distributed product (22). The distribution of cases can be misleading or incomplete early in an outbreak, so investigators must use caution when using these parameters to rule out hypotheses and should revisit them as additional cases are identified. Moreover, an apparently local outbreak can be an early indicator of a larger problem. For example, in 2018, a large multistate outbreak of E. coli O157:H7 infections linked to romaine lettuce was initially detected in New Jersey in association with a single restaurant chain; within 8 days of detection, the cluster had expanded to include many more cases with a variety of different exposure locations as far away as Nome, Alaska (15).

Case exposure assessment

Rapidly collecting detailed food histories from cases in an outbreak is the most critical step in identifying commonalities between these cases. Before a cluster is detected, local or state public health agencies typically attempt to interview each individual, reportable enteric-pathogen case using a standard pathogen-specific questionnaire. If a cluster is detected, a review of these routine interviews can provide information on obvious high-risk exposures. In most jurisdictions, detailed hypothesis-generating questionnaires (HGQs) historically have been used only if commonalities are not identified from the initial routine interviews or if the hypotheses identified from routine interviews collapse under further investigation. However, a growing number of state health jurisdictions are conducting hypothesis-generating interviews with all cases of laboratory-confirmed Salmonella and STEC infection, opting to gather this information during the initial interview. This method is considered a best practice to maximize exposure recall ( 25 ), shaving days or weeks off the delay between case exposure and hypothesis-generating interview.

There are 3 major types of HGQs used in the United States ( 26 ):

Oregon “shotgun” questionnaire: This questionnaire uses a “shotgun,” or “trawling” approach of asking mostly close-ended questions for a long list of individual food items. The section order is designed to prompt recall of specific food exposures through review of places where food was purchased or eaten out, and specific repetitive questions for high-risk exposures such as raw foods or sprouts.

Minnesota “long form” hypothesis-generating questionnaire: This questionnaire combines close-ended questions about fewer food items with open-ended questions that seek details on dining/purchase location and brand-variety details for all foods.

National Hypothesis Generating Questionnaire: This questionnaire is a hybridized approach developed by the Centers for Disease Control and Prevention that contains elements of both the Oregon and Minnesota models. Close-ended questions are asked about an intermediate number of food items, and brand/variety details are obtained only for commonly eaten types of foods. During national cluster investigations, the National Hypothesis Generating Questionnaire is deployed across state and local health departments to improve standardization across jurisdictions.

In addition to these questionnaires, there are many modified state-specific versions and national pathogen-specific HGQs (e.g., the Listeria Initiative questionnaire and a Cyclospora questionnaire). The use of HGQs can be enhanced by adopting a dynamic or iterative cluster investigation approach. In this approach, if a suspected food item or branded product emerges during interviews, that food item can be added to questionnaires administered to subsequent cases, and individuals who have already been interviewed can be re-interviewed to systematically collect information about that exposure (27). Decisions about which exposures should be pursued through re-interviews can be informed by descriptive data, as well as incubation periods, which can help define the most likely exposure period (28).
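To make the role of incubation periods concrete, the short Python sketch below works backward from an illness onset date to a plausible exposure window. The function name `exposure_window` and the 1 to 7 day incubation range are placeholders chosen for illustration, not reference values for any particular pathogen; investigators would substitute pathogen-specific estimates.

```python
from datetime import date, timedelta


def exposure_window(onset, min_incubation_days, max_incubation_days):
    """Return (earliest, latest) plausible exposure dates for one case."""
    if not 0 <= min_incubation_days <= max_incubation_days:
        raise ValueError("expected 0 <= min_incubation_days <= max_incubation_days")
    # The longest incubation gives the earliest possible exposure date,
    # and the shortest incubation gives the latest.
    earliest = onset - timedelta(days=max_incubation_days)
    latest = onset - timedelta(days=min_incubation_days)
    return earliest, latest


# Hypothetical case with onset on June 15, 2021, and an assumed
# incubation range of 1 to 7 days (placeholder values).
earliest, latest = exposure_window(date(2021, 6, 15), 1, 7)
print(f"Ask about foods eaten from {earliest.isoformat()} to {latest.isoformat()}")
```

Interview and re-interview questions can then be limited to this window for each case, which keeps the food history focused on the period when exposure most plausibly occurred.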

The number of interviewers participating in hypothesis-generating interviews can depend on resources and the specifics of the outbreak. A single interviewer approach can be advantageous in that a single interviewer might more clearly remember what previously interviewed persons mentioned and pursue clues as they arise during a live interview. However, this approach could slow investigations, particularly in sizable multistate clusters. An alternative is the “lead investigator model,” in which a single person directs the interviewing team with a limited number of interviewers, reviews completed interviews, and decides which exposures to pursue. This approach can be faster and more efficient than the single interviewer approach. When interviews are done by multiple agencies, it is important that the completed interviews be forwarded to the lead investigator promptly and that the group meet regularly and review results of interviews as the investigation proceeds.

If interviews with HGQs do not yield an actionable hypothesis, investigators should consider alternative approaches, such as questionnaire modification or open-ended interviews. Deciding when to attempt an alternative approach depends on cluster size, velocity of incident cases, and investigation effort expended and time elapsed without identification of a solid hypothesis. Questionnaire modification could include adding questions, such as open-ended questions or supplemental questions about exposures that came up on previous interviews, or pruning questions. For example, after 8–10 interviews, items that no case reported “yes” or “maybe” to eating may be removed. Removal of questions should be done cautiously because certain foods (e.g., stealth ingredients such as cilantro and sprouts) might be reported by a low proportion of cases who ate them. Another approach is open-ended interviews of recent cases, which could be considered after 20–25 initial cases in a large multistate investigation have been interviewed without yielding solid hypotheses. Conducted by a single interviewer, if possible, open-ended interviews should cover everything that a case ate or drank in the exposure period of interest, as well as other exposures including animals, grocery stores, restaurants, travel, parties or events, and details about how they prepare their food at home, including recipes. After the first person is interviewed, objective questions about specific exposures can be added to the open-ended interviews of subsequent cases, creating a hybrid open-ended/iterative model. This requires cooperative patients and a persistent investigative approach but has yielded correct hypotheses with as few as 2 interviews ( 29 ).
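The question-pruning step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `prune_items`, the threshold of 8 interviews (the lower end of the 8–10 range mentioned above), the `keep_always` safeguard for stealth ingredients, and the interview data are all assumptions made for the example.

```python
def prune_items(interviews, min_interviews=8, keep_always=frozenset()):
    """Return the food items to keep on the questionnaire.

    interviews: list of dicts mapping food item -> "yes", "maybe", or "no".
    Items are dropped only once `min_interviews` interviews are complete and
    only if no case answered "yes" or "maybe". Items listed in `keep_always`
    (for example, stealth ingredients) are never dropped.
    """
    all_items = sorted({item for answers in interviews for item in answers})
    if len(interviews) < min_interviews:
        return all_items  # too few interviews to prune safely
    reported = {
        item
        for answers in interviews
        for item, answer in answers.items()
        if answer in ("yes", "maybe")
    }
    return [item for item in all_items if item in reported or item in keep_always]


# Hypothetical responses from 8 interviews, invented for illustration.
interviews = [{"romaine": "yes", "sprouts": "no", "papaya": "no"}] * 7
interviews.append({"romaine": "maybe", "sprouts": "no", "papaya": "no"})

print(prune_items(interviews, keep_always={"sprouts"}))
```

The explicit keep list reflects the caution noted above: an ingredient such as sprouts can be retained even when no case has reported it, because cases who ate it might not remember doing so.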

Additional methods to ascertain exposures, such as obtaining consumer food purchase data, can be appropriate, particularly for outbreaks where obtaining a food history is challenging ( 30 ). For example, during a multistate Salmonella serotype Montevideo outbreak, initial hypothesis-generating interviews did not identify a clear signal beyond shopping at the same warehouse store. Investigators used shopper membership card purchase information to generate hypotheses, which ultimately helped identify red and black peppercorns coating a ready-to-eat salami as the vehicle ( 31 ). In addition, information from services for grocery home delivery, restaurant take-out delivery, and meal kits might help to clarify specific exposures. Other potential methods include focus-group interviews and household inspections, although these are used more rarely and in specific scenarios, with mixed results ( 32 ).

Binomial probability comparisons can further refine hypotheses by comparing the proportion of cases in an outbreak reporting a food exposure with the expected background proportion of the population reporting the food exposure (33, 34). Binomial probability calculations in foodborne-disease outbreak investigations emerged in Oregon in 2003 as a complement to the pioneering “shotgun” questionnaire. They rely on independent data sources on food exposure frequency from sporadic cases, past outbreak cases, or well persons sampled from the population. Such data sources include data from healthy people surveyed as part of the FoodNet Population Survey, standardized data collected in previous outbreaks, or data from sporadic cases, as is done with the Listeria Initiative and Project Hg (33, 35, 36).
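A minimal Python sketch of such a comparison follows. It computes the upper-tail binomial probability, that is, the chance of seeing at least the observed number of exposed cases if each case had only the background probability of reporting the food. The function name `binomial_tail` and the numbers in the example are hypothetical, and the sketch is not a reproduction of the Oregon tool.

```python
from math import comb


def binomial_tail(n_cases, n_exposed, background):
    """Return P(X >= n_exposed) for X ~ Binomial(n_cases, background)."""
    if not 0.0 <= background <= 1.0:
        raise ValueError("background must be a proportion between 0 and 1")
    if not 0 <= n_exposed <= n_cases:
        raise ValueError("n_exposed must be between 0 and n_cases")
    return sum(
        comb(n_cases, k) * background**k * (1.0 - background) ** (n_cases - k)
        for k in range(n_exposed, n_cases + 1)
    )


# Hypothetical example: 8 of 10 cases report a food that 20% of the
# comparison population reports eating.
p_value = binomial_tail(n_cases=10, n_exposed=8, background=0.20)
print(f"P(at least 8 of 10 exposed | background 20%) = {p_value:.6f}")
```

A small tail probability flags an exposure reported more often than the background rate would predict. Because many food items are usually screened at once, some small values are expected by chance, so the result is best read as a way to rank hypotheses for follow-up.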

Hypothesis generation is a critical, but challenging, step in a foodborne outbreak investigation. A well-informed hypothesis can increase the likelihood of rapidly and conclusively implicating the contaminated food vehicle; conversely, the chances of implicating a food item are small if that item is not considered as part of the outbreak investigation. Inadequate hypothesis generation can delay investigation progress and limit investigators’ ability to rapidly identify the outbreak source, potentially leading to prolonged exposure and more illnesses. The 3 primary sources of information presented as part of this framework—known sources of the pathogen causing illness, descriptive data, and case exposure assessment—provide vital information for hypothesis generation, particularly when used in combination and revisited throughout the outbreak investigation.

Despite these sources of information, there are certain types of outbreaks for which hypothesis generation is inherently more challenging. These include outbreaks for which the vehicle has a high background rate of consumption (e.g., chicken) or outbreaks associated with a “stealth” food (e.g., garnishes, spices, chili peppers, or sprouts) that many cases could have consumed, but few remember eating. These challenges can sometimes be overcome by obtaining details on food exposures such as brand/variety and point of purchase. Obtaining this information is also critical to rapidly initiating a traceback investigation. An outbreak might also be caused by multiple contaminated food products when, for example, multiple foods have a single common ingredient or when poor sanitation or contaminated equipment leads to cross-contamination. Furthermore, the key exposure might not be a food at all, but rather an environmental or animal exposure, emphasizing that food should not be the default hypothesis.

There might be specific clues or “toe-holds” that help identify a hypothesis and accelerate an investigation. For example, cases with restricted diets, food diaries, or highly unusual or specific exposures can narrow the list of potential foods. This could include cases who traveled briefly to the outbreak location, and thus had a limited number of exposures. Smaller, localized clusters within a larger outbreak associated with restaurants, events, stores, or institutions, or “subclusters,” are often crucial to hypothesis generation, providing a finite list of foods. For example, in a multistate outbreak of Salmonella serotype Typhimurium infections associated with consumption of tomatoes, comparison of 4 restaurant-associated subclusters was instrumental in rapidly identifying a small set of potential vehicles ( 4 ). Subcluster investigations are precisely focused and as such can lead to much more rapid and efficient hypothesis generation and testing than attempts to assess all exposures among all cases in a large outbreak. Because of the immense value of subclusters, every effort should be made to quickly identify them through initial interviews and the iterative interviewing approach ( 25 ).
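The value of comparing subclusters can be illustrated with a small Python sketch that counts how many subclusters served each ingredient and ranks the ingredients accordingly. The function name `rank_shared_items`, the restaurant labels, and the ingredient lists are all invented for illustration and do not represent data from the tomato outbreak cited above.

```python
from collections import Counter


def rank_shared_items(subcluster_menus):
    """Count how many subclusters served each ingredient, most shared first."""
    counts = Counter()
    for ingredients in subcluster_menus.values():
        counts.update(set(ingredients))  # count each ingredient once per subcluster
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical ingredient lists for 4 restaurant-associated subclusters.
menus = {
    "restaurant_a": {"tomato", "lettuce", "onion"},
    "restaurant_b": {"tomato", "cilantro", "onion"},
    "restaurant_c": {"tomato", "lettuce"},
    "restaurant_d": {"tomato", "cheese"},
}

for ingredient, n in rank_shared_items(menus):
    print(f"{ingredient}: served in {n} of {len(menus)} subclusters")
```

Ranking by count, instead of taking a strict intersection, keeps an ingredient in view even when one subcluster's menu information is incomplete.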

The majority of outbreaks are associated with common foods previously associated with that pathogen. In an investigation, it is important to both rule in and rule out common vehicles, while keeping an open mind about potential novel vehicles. If investigators suspect a novel vehicle, they should still rule out the most common vehicles when designing epidemiologic studies. For example, if an STEC outbreak investigation implicates cucumbers, regulatory partners will want to confirm that investigators have eliminated common STEC vehicles such as ground beef, leafy greens, and sprouts. That said, food vehicles change over time, reflecting changing food preferences and trends in food safety measures, and new vehicles continue to emerge (e.g., in recent years: SoyNut butter, raw flour, caramel apples, kratom, and chia seed powder). HGQs are biased toward previously implicated foods and a finite list of foods. If cases continue without a clear hypothesis emerging, it might be necessary to try open-ended hypothesis-generating interviews.

Hypothesis generation during foodborne outbreak investigation will evolve as laboratory techniques advance. Molecular sequencing techniques based on WGS might give investigators more conviction in devoting resources to following leads because there is more confidence that the cases have a common source for their illnesses ( 17 , 37 ). Concurrent or recent nonhuman isolates (e.g., food isolates) that match human case isolates by sequencing will be considered even more likely to be related to the human cases and become a priori hypotheses during investigations.

Foodborne-outbreak investigation methods are constantly evolving. Food production, processing, and distribution are changing to meet consumer demands. Outbreak investigations are more complex, given that laboratory methods for subtyping, strategies for epidemiologic investigation, and environmental assessments are also changing. Rapid investigation is essential, because with mass production and distribution, food safety errors can cause large and widespread outbreaks. Outbreak investigations balance the need for expediency to implement control measures with the need for accuracy. If hastily developed hypotheses are incorrect or insufficiently refined, analytical studies are unlikely to succeed and can waste time and resources. Alternatively, a refined hypothesis can lead directly to effective public health interventions, sometimes bypassing the need for an analytical study, if accompanied by other compelling evidence, such as laboratory evidence or traceback information.

Effectively and swiftly sharing data across jurisdictions increases an investigation team’s ability to quickly develop hypotheses and implicate food vehicles. Successful investigations depend on including the correct hypothesis, which is the result of a systematic approach to hypothesis generation. The exact path to identifying a hypothesis is rarely the same between outbreaks. Therefore, investigators should be familiar with different hypothesis-generating strategies and be flexible in deciding which strategies to employ.

Author affiliations: Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado, United States (Alice E. White, Elaine Scallan Walter); Minnesota Department of Health, St. Paul, Minnesota, United States (Kirk E. Smith, Carlota Medus); Washington State Department of Health, Tumwater, Washington, United States (Hillary Booth); and Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging Zoonotic and Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States (Robert V. Tauxe, Laura Gieraltowski).

This work was funded in part by the Colorado and Minnesota Integrated Food Safety Centers of Excellence, which are supported by the Epidemiology and Laboratory Capacity for Infectious Disease Cooperative Agreement through the Centers for Disease Control and Prevention.

Conflict of interest: none declared.

Scallan E , Hoekstra RM , Angulo FJ , et al.  Foodborne illness acquired in the United States—major pathogens . Emerg Infect Dis . 2011 ; 17 ( 1 ): 7 – 15 .

Tauxe RV . Surveillance and investigation of foodborne diseases; roles for public health in meeting objectives for food safety . Food Control . 2002 ; 13 ( 6-7 ): 363 – 369 .

Dewey-Mattia D , Manikonda K , Hall AJ , et al.  Surveillance for foodborne disease outbreaks—United States, 2009–2015 . MMWR Morb Mortal Wkly Rep . 2018 ; 67 ( 10 ): 1 – 11 .

Behravesh CB , Blaney D , Medus C , et al.  Multistate outbreak of Salmonella serotype Typhimurium infections associated with consumption of restaurant tomatoes, USA, 2006: hypothesis generation through case exposures in multiple restaurant clusters . Epidemiol Infect . 2012 ; 140 ( 11 ): 2053 – 2061 .

Boulton ML , Rosenberg LD . Food safety epidemiology capacity in state health departments—United States, 2010 . MMWR Morb Mortal Wkly Rep . 2011 ; 60 ( 50 ): 1701 – 1704 .

Porta MA A Dictionary of Epidemiology . 5th ed. New York, NY : Oxford University Press ; 2008 ( 4 ): 82 .

Centers for Disease Control and Prevention . National Outbreak Reporting System Dashboard. https://wwwn.cdc.gov/norsdashboard/ . Updated December 7, 2018 . Accessed April 9, 2021 .

Lampel KA , Al-Khaldi S , Cahill SM , eds. Bad Bug Book, Foodborne Pathogenic Microorganisms and Natural Toxins . 2nd ed. Washington, DC : Food and Drug Administration ; 2012 .

Centers for Disease Control and Prevention . An Atlas of Salmonella in the United States, 1968–2011: Laboratory-Based Enteric Disease Surveillance . Atlanta, GA : US Department of Health and Human Services, CDC ; 2013 . https://www.cdc.gov/salmonella/pdf/salmonella-atlas-508c.pdf . Accessed April 9, 2021 .

Interagency Food Safety Analytics Collaboration . Foodborne Illness Source Attribution Estimates for 2017 for Salmonella , Escherichia coli O157, Listeria monocytogenes , and Campylobacter Using Multi-Year Outbreak Surveillance Data, United States . Atlanta, GA and Washington DC : US Department of Health and Human Services ; 2019 . https://www.cdc.gov/foodsafety/ifsac/pdf/P19-2017-report-TriAgency-508-archived.pdf . Accessed April 9, 2021 .

Friedman CR , Hoekstra RM , Samuel M , et al.  Risk factors for sporadic Campylobacter infection in the United States: a case‐control study in FoodNet sites . Clin Infect Dis . 2004 ; 38 ( suppl 3 ): S285 – S296 .

Varma J , Samuel M , Marcus R , et al.  Listeria monocytogenes infection from foods prepared in a commercial establishment: a case-control study of potential sources of sporadic illness in the United States . Clin Infect Dis . 2007 ; 44 ( 4 ): 521 – 528 .

Jackson BR , Griffin PM , Cole D , et al.  Outbreak-associated Salmonella enterica serotypes and food commodities, United States, 1998–2008 . Emerg Infect Dis . 2013 ; 19 ( 8 ): 1239 – 1244 .

Brown AC , Grass JE , Richardson LC , et al.  Antimicrobial resistance in Salmonella that caused foodborne disease outbreaks: United States, 2003–2012 . Epidemiol Infect . 2017 ; 145 ( 4 ): 766 – 774 .

Centers for Disease Control and Prevention . Multistate outbreak of E. coli O157:H7 infections linked to romaine lettuce. https://www.cdc.gov/ecoli/2018/o157h7-04-18/index.html . Published June 28, 2018 . Accessed August 6, 2020 .

  • disease outbreaks
  • pathogenic organism
  • foodborne disease
Affiliations

  • Online ISSN 1476-6256
  • Print ISSN 0002-9262
  • Copyright © 2024 Johns Hopkins Bloomberg School of Public Health

Understanding Hypothesis Testing

Hypothesis testing involves formulating assumptions about population parameters based on sample statistics and rigorously evaluating these assumptions against empirical evidence. This article sheds light on the significance of hypothesis testing and the critical steps involved in the process.

What is Hypothesis Testing?

A hypothesis is an assumption or idea, specifically a statistical claim about an unknown population parameter. For example, a judge assumes a person is innocent and verifies this by reviewing evidence and hearing testimony before reaching a verdict.

Hypothesis testing is a statistical method used to make a decision from experimental data. It starts from an assumption about a population parameter and evaluates two mutually exclusive statements about the population to determine which one is better supported by the sample data.

To test the validity of the claim or assumption about the population parameter:

  • A sample is drawn from the population and analyzed.
  • The results of the analysis are used to decide whether the claim is true or not.

Example: You claim that the average height in the class is 30, or that boys are taller than girls. Each of these is an assumption, and we need a statistical method to check it. We need a mathematical basis for concluding whether what we assumed is supported by the data.

Defining Hypotheses

  • Null hypothesis (H 0 ): In statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured cases or no relationship among groups. In other words, it is a basic assumption made from knowledge of the problem. Example: A company's mean production is 50 units per day, i.e. H 0 : [Tex]\mu [/Tex] = 50.
  • Alternative hypothesis (H 1 ): The alternative hypothesis is the hypothesis used in hypothesis testing that is contrary to the null hypothesis. Example: A company's mean production is not equal to 50 units per day, i.e. H 1 : [Tex]\mu [/Tex] [Tex]\ne [/Tex] 50.

Key Terms of Hypothesis Testing

  • Level of significance: The probability threshold used to decide whether to reject the null hypothesis. Since 100% certainty is not possible, we select a level of significance, normally denoted [Tex]\alpha[/Tex] and usually set to 0.05 (5%). This means we accept a 5% risk of rejecting the null hypothesis when it is actually true.
  • P-value: The P value, or calculated probability, is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis (H 0 ) is true. If the P-value is less than the chosen significance level, you reject the null hypothesis, i.e. the sample supports the alternative hypothesis.
  • Test Statistic: The test statistic is a numerical value calculated from sample data during a hypothesis test, used to determine whether to reject the null hypothesis. It is compared to a critical value, or converted to a p-value, to make decisions about the statistical significance of the observed results.
  • Critical value: The critical value in statistics is a threshold or cutoff point used to determine whether to reject the null hypothesis in a hypothesis test.
  • Degrees of freedom: Degrees of freedom are associated with the variability or freedom one has in estimating a parameter. They are related to the sample size and determine the shape of the test distribution (for example, the t-distribution).

Why do we use Hypothesis Testing?

Hypothesis testing is an important procedure in statistics. It evaluates two mutually exclusive statements about a population to determine which statement is better supported by sample data. When we say that findings are statistically significant, it is hypothesis testing that justifies the claim.

One-Tailed and Two-Tailed Test

One tailed test focuses on one direction, either greater than or less than a specified value. We use a one-tailed test when there is a clear directional expectation based on prior knowledge or theory. The critical region is located on only one side of the distribution curve. If the sample falls into this critical region, the null hypothesis is rejected in favor of the alternative hypothesis.

One-Tailed Test

There are two types of one-tailed test:

  • Left-Tailed (Left-Sided) Test: The alternative hypothesis asserts that the true parameter value is less than the null hypothesis. Example: H 0 ​: [Tex]\mu \geq 50 [/Tex] and H 1 : [Tex]\mu < 50 [/Tex]
  • Right-Tailed (Right-Sided) Test : The alternative hypothesis asserts that the true parameter value is greater than the null hypothesis. Example: H 0 : [Tex]\mu \leq50 [/Tex] and H 1 : [Tex]\mu > 50 [/Tex]

Two-Tailed Test

A two-tailed test considers both directions, greater than and less than a specified value. We use a two-tailed test when there is no specific directional expectation and we want to detect any significant difference.

Example: H 0 : [Tex]\mu = [/Tex] 50 and H 1 : [Tex]\mu \neq 50 [/Tex]
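
To make the distinction concrete, the following minimal sketch computes left-tailed, right-tailed, and two-tailed p-values for the same z statistic. It uses only Python's standard library. The value z = 2.04 and the helper name `p_values` are assumptions chosen for illustration.

```python
from statistics import NormalDist


def p_values(z):
    """Return left-, right-, and two-tailed p-values for a z statistic."""
    std_normal = NormalDist()            # standard normal: mean 0, sd 1
    p_left = std_normal.cdf(z)           # P(Z <= z), for H1: mu < mu0
    p_right = 1.0 - std_normal.cdf(z)    # P(Z >= z), for H1: mu > mu0
    p_two = 2.0 * min(p_left, p_right)   # both tails, for H1: mu != mu0
    return p_left, p_right, p_two


# Hypothetical test statistic chosen for illustration
p_left, p_right, p_two = p_values(2.04)
print(f"left-tailed:  {p_left:.4f}")
print(f"right-tailed: {p_right:.4f}")
print(f"two-tailed:   {p_two:.4f}")
```

The same statistic can lead to different decisions depending on the tail chosen, which is why the direction of the alternative hypothesis should be fixed before looking at the data.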


What are Type 1 and Type 2 errors in Hypothesis Testing?

In hypothesis testing, Type I and Type II errors are two possible errors that researchers can make when drawing conclusions about a population based on a sample of data. These errors are associated with the decisions made regarding the null hypothesis and the alternative hypothesis.

  • Type I error: When we reject the null hypothesis although it is true. The probability of a Type I error is denoted by alpha ( [Tex]\alpha [/Tex] ).
  • Type II error: When we fail to reject the null hypothesis although it is false. The probability of a Type II error is denoted by beta ( [Tex]\beta [/Tex] ).


| Decision | Null Hypothesis is True | Null Hypothesis is False |
|---|---|---|
| Fail to reject (accept) the null hypothesis | Correct Decision | Type II Error (False Negative) |
| Reject the null hypothesis | Type I Error (False Positive) | Correct Decision |
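
The two error probabilities can be put into numbers. The sketch below computes the Type II error rate for a two-tailed z-test analytically, assuming a known population standard deviation. The function name `type_ii_error` and the values (null mean 50, true mean 52, σ = 5, n = 25) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu1, sigma, n, alpha=0.05):
    """Probability of failing to reject H0: mu = mu0 when the true mean is mu1.

    Assumes a two-tailed z-test with known population standard deviation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    shift = (mu1 - mu0) / (sigma / sqrt(n))      # true mean in standard-error units
    # H0 is retained when the z statistic lands between -z_crit and +z_crit
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


alpha = 0.05   # Type I error rate, fixed by the analyst (assumed value)
beta = type_ii_error(mu0=50, mu1=52, sigma=5, n=25, alpha=alpha)
power = 1 - beta
print(f"alpha = {alpha}, beta = {beta:.3f}, power = {power:.3f}")
```

Lowering alpha makes a Type I error less likely but, for a fixed sample size, raises beta. Increasing the sample size is the usual way to reduce both.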

How does Hypothesis Testing work?

Step 1 – Define Null and Alternative Hypotheses

State the null hypothesis ( [Tex]H_0 [/Tex] ), representing no effect, and the alternative hypothesis ( [Tex]H_1 [/Tex] ), suggesting an effect or difference.

We first identify the problem about which we want to make an assumption, keeping in mind that the null and alternative hypotheses must contradict one another. The tests described here assume normally distributed data.

Step 2 – Choose Significance Level

Select a significance level ( [Tex]\alpha [/Tex] ), typically 0.05, to set the threshold for rejecting the null hypothesis. The significance level is the probability of a Type I error that we are willing to tolerate, and it should be fixed before the test is run. The p-value computed from the data is later compared with this threshold.

Step 3 – Collect and Analyze Data

Gather relevant data through observation or experimentation. Analyze the data using appropriate statistical methods to obtain a test statistic.

Step 4 – Calculate Test Statistic

In this step, the data are evaluated and a test statistic is computed from them. The choice of test statistic depends on the type of hypothesis test being conducted.

There are various hypothesis tests, each appropriate for a different goal and type of data, such as the Z-test, Chi-square test, T-test, and so on.

  • Z-test: Used when the population mean and standard deviation are known.
  • t-test: Used when the population standard deviation is unknown and the sample size is small.
  • Chi-square test: Used for categorical data or for testing independence in contingency tables.
  • F-test: Often used in analysis of variance (ANOVA) to compare variances or test the equality of means across multiple groups.

For example, when the dataset is small and the population standard deviation is unknown, a t-test is more appropriate.

The T-statistic is a measure of the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.

Step 5 – Compare the Test Statistic

In this stage, we decide whether to reject or fail to reject the null hypothesis. There are two ways to make this decision.

Method A: Using Critical Values

Compare the test statistic with the tabulated critical value (for a two-tailed test, use the absolute value of the test statistic):

  • If Test Statistic > Critical Value: Reject the null hypothesis.
  • If Test Statistic ≤ Critical Value: Fail to reject the null hypothesis.

Note: Critical values are predetermined threshold values that are used to make a decision in hypothesis testing. To determine critical values for hypothesis testing, we typically refer to a statistical distribution table, such as the normal distribution or t-distribution tables.
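
Instead of a printed table, critical values can be obtained from the inverse cumulative distribution function. The following sketch uses `scipy.stats.norm.ppf` and `scipy.stats.t.ppf`. The helper names `critical_value` and `reject_null`, and the inputs (α = 0.05, 9 degrees of freedom, t = -9.0), are assumptions for illustration.

```python
from scipy import stats


def critical_value(alpha, df=None, two_tailed=True):
    """Positive critical value from the normal (df=None) or t distribution."""
    tail_area = alpha / 2 if two_tailed else alpha
    if df is None:
        return stats.norm.ppf(1 - tail_area)
    return stats.t.ppf(1 - tail_area, df)


def reject_null(test_statistic, crit):
    """Two-tailed rule: reject when |statistic| exceeds the critical value."""
    return abs(test_statistic) > crit


z_crit = critical_value(0.05)         # normal distribution
t_crit = critical_value(0.05, df=9)   # t distribution, 9 degrees of freedom
print(f"z critical: {z_crit:.3f}, t critical (df=9): {t_crit:.3f}")
print("Reject H0 for t = -9.0?", reject_null(-9.0, t_crit))
```

The t critical value is larger than the z critical value at the same significance level, reflecting the extra uncertainty from estimating the standard deviation in a small sample.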

Method B: Using P-values

We can also come to a conclusion using the p-value:

  • If the p-value is less than or equal to the significance level, i.e. ( [Tex]p\leq\alpha [/Tex] ), you reject the null hypothesis. This indicates that the observed results are unlikely to have occurred by chance alone, providing evidence in favor of the alternative hypothesis.
  • If the p-value is greater than the significance level, i.e. ( [Tex]p > \alpha[/Tex] ), you fail to reject the null hypothesis. This suggests that the observed results are consistent with what would be expected under the null hypothesis.

Note: The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed in the sample, assuming the null hypothesis is true. To determine the p-value for hypothesis testing, we typically refer to a statistical distribution table, such as the normal distribution or t-distribution tables, or use statistical software.

Step 6 – Interpret the Results

At last, we can conclude our experiment using method A or B.

Calculating test statistic

To validate our hypothesis about a population parameter we use statistical functions . We use the z-score, p-value, and level of significance(alpha) to make evidence for our hypothesis for normally distributed data .

1. Z-statistics:

The Z-statistic is used when the population mean and standard deviation are known.

[Tex]z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}[/Tex]

  • [Tex]\bar{x} [/Tex] is the sample mean,
  • μ represents the population mean,
  • σ is the population standard deviation,
  • and n is the size of the sample.
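
As a quick worked instance of the formula, the sketch below evaluates the z statistic with the standard library only. The function name `z_statistic` and the inputs (sample mean 52, population mean 50, σ = 5, n = 25) are hypothetical values chosen for illustration.

```python
from math import sqrt


def z_statistic(sample_mean, population_mean, population_sd, n):
    """z = (x_bar - mu) / (sigma / sqrt(n))"""
    standard_error = population_sd / sqrt(n)   # sigma / sqrt(n)
    return (sample_mean - population_mean) / standard_error


# Hypothetical inputs: the standard error is 5 / sqrt(25) = 1
z = z_statistic(sample_mean=52, population_mean=50, population_sd=5, n=25)
print(z)
```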

2. T-Statistics

The T-test is used when the sample is small (commonly n < 30) and the population standard deviation is unknown.

The t-statistic is given by:

[Tex]t=\frac{\bar{x}-\mu}{s/\sqrt{n}} [/Tex]

  • t = t-score,
  • x̄ = sample mean
  • μ = population mean,
  • s = standard deviation of the sample,
  • n = sample size
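
The formula can be checked against SciPy's one-sample t-test. In the sketch below the sample values and the hypothesized mean of 50 are made up for illustration. `ddof=1` gives the sample standard deviation s used in the formula.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (n = 8) and hypothesized population mean
sample = np.array([48, 52, 51, 49, 53, 50, 47, 54])
mu = 50

x_bar = sample.mean()
s = sample.std(ddof=1)    # sample standard deviation (divides by n - 1)
n = len(sample)
t_manual = (x_bar - mu) / (s / np.sqrt(n))

# SciPy's one-sample t-test should give the same statistic
t_scipy, p_value = stats.ttest_1samp(sample, popmean=mu)

print("t (manual):", t_manual)
print("t (scipy): ", t_scipy)
print("p-value:   ", p_value)
```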

3. Chi-Square Test

The Chi-Square Test for Independence is used for categorical (non-normally distributed) data:

[Tex]\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}[/Tex]

  • [Tex]O_{ij}[/Tex] is the observed frequency in cell [Tex]{ij} [/Tex]
  • i,j are the rows and columns index respectively.
  • [Tex]E_{ij}[/Tex] is the expected frequency in cell [Tex]{ij}[/Tex] , calculated as : [Tex]\frac{{\text{{Row total}} \times \text{{Column total}}}}{{\text{{Total observations}}}}[/Tex]
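
The sketch below computes the chi-square statistic by hand from a 2×2 table and compares it with `scipy.stats.chi2_contingency`. The observed counts are hypothetical. `correction=False` is passed because SciPy applies Yates' continuity correction to 2×2 tables by default, which would not match the plain formula above.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table of observed counts
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2_manual = 0.0
for i, row in enumerate(observed):
    for j, o_ij in enumerate(row):
        # Expected count: row total * column total / total observations
        e_ij = row_totals[i] * col_totals[j] / total
        chi2_manual += (o_ij - e_ij) ** 2 / e_ij

# Same statistic from SciPy, without the continuity correction
chi2_scipy, p_value, dof, expected = chi2_contingency(observed, correction=False)

print("chi-square (manual):", chi2_manual)
print("chi-square (scipy): ", chi2_scipy)
print("degrees of freedom: ", dof)
print("p-value:            ", p_value)
```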

Real life Examples of Hypothesis Testing

Let’s examine hypothesis testing using two real life situations,

Case A: Does a New Drug Affect Blood Pressure?

Imagine a pharmaceutical company has developed a new drug that they believe can effectively lower blood pressure in patients with hypertension. Before bringing the drug to market, they need to conduct a study to assess its impact on blood pressure.

  • Before Treatment: 120, 122, 118, 130, 125, 128, 115, 121, 123, 119
  • After Treatment: 115, 120, 112, 128, 122, 125, 110, 117, 119, 114

Step 1 : Define the Hypothesis

  • Null Hypothesis : (H 0 )The new drug has no effect on blood pressure.
  • Alternate Hypothesis : (H 1 )The new drug has an effect on blood pressure.

Step 2: Define the Significance level

Let's set the significance level at 0.05. We will reject the null hypothesis if there is less than a 5% chance of observing results this extreme due to random variation alone.

Step 3 : Compute the test statistic

Using a paired T-test, analyze the data to obtain a test statistic and a p-value.

The test statistic (here, the T-statistic) is calculated from the differences between blood pressure measurements before and after treatment.

t = m/(s/√n)

  • m = mean of the differences d i = X after, i − X before, i
  • s = standard deviation of the differences d i
  • n = sample size

Here, m = -3.9, s ≈ 1.37 and n = 10.

Applying the paired t-test formula gives a T-statistic of -9.

Step 4: Find the p-value

With the calculated t-statistic of -9 and df = n − 1 = 9 degrees of freedom, the p-value can be found using statistical software or a t-distribution table.

thus, p-value = 8.538051223166285e-06
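
As a sketch of how that p-value can be obtained in software, the two-tailed probability is twice the upper-tail area of the t-distribution beyond |t|. The snippet uses `scipy.stats.t.sf` (the survival function, 1 − CDF) with the t-statistic and degrees of freedom from this example.

```python
from scipy import stats

t_statistic = -9.0
df = 9   # degrees of freedom: n - 1 for a paired t-test with n = 10 pairs

# Two-tailed p-value: probability of a t value at least this extreme
# in either direction, assuming the null hypothesis is true
p_value = 2 * stats.t.sf(abs(t_statistic), df)
print(p_value)
```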

Step 5: Result

  • If the p-value is less than or equal to 0.05, the researchers reject the null hypothesis.
  • If the p-value is greater than 0.05, they fail to reject the null hypothesis.

Conclusion: Since the p-value (8.538051223166285e-06) is less than the significance level (0.05), the researchers reject the null hypothesis. There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.

Python Implementation of Case A

Let's implement this hypothesis test in Python, testing whether a new drug affects blood pressure. For this example, we will use a paired T-test from the scipy.stats library.

SciPy is a Python library for scientific and mathematical computing.

The implementation of our first real-life problem follows:

import numpy as np
from scipy import stats

# Data
before_treatment = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
after_treatment = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

# Step 1: Null and Alternate Hypotheses
# Null Hypothesis: The new drug has no effect on blood pressure.
# Alternate Hypothesis: The new drug has an effect on blood pressure.
null_hypothesis = "The new drug has no effect on blood pressure."
alternate_hypothesis = "The new drug has an effect on blood pressure."

# Step 2: Significance Level
alpha = 0.05

# Step 3: Paired T-test
t_statistic, p_value = stats.ttest_rel(after_treatment, before_treatment)

# Step 4: Calculate T-statistic manually
m = np.mean(after_treatment - before_treatment)
s = np.std(after_treatment - before_treatment, ddof=1)  # using ddof=1 for sample standard deviation
n = len(before_treatment)
t_statistic_manual = m / (s / np.sqrt(n))

# Step 5: Decision
if p_value <= alpha:
    decision = "Reject"
else:
    decision = "Fail to reject"

# Conclusion
if decision == "Reject":
    conclusion = "There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different."
else:
    conclusion = "There is insufficient evidence to claim a significant difference in average blood pressure before and after treatment with the new drug."

# Display results
print("T-statistic (from scipy):", t_statistic)
print("P-value (from scipy):", p_value)
print("T-statistic (calculated manually):", t_statistic_manual)
print(f"Decision: {decision} the null hypothesis at alpha={alpha}.")
print("Conclusion:", conclusion)

T-statistic (from scipy): -9.0
P-value (from scipy): 8.538051223166285e-06
T-statistic (calculated manually): -9.0
Decision: Reject the null hypothesis at alpha=0.05.
Conclusion: There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.

In the above example, the T-statistic of approximately -9 and the extremely small p-value make a strong case for rejecting the null hypothesis at a significance level of 0.05.

  • The results suggest that the new drug has a significant effect on lowering blood pressure.
  • The negative T-statistic indicates that the mean blood pressure after treatment is significantly lower than the mean blood pressure before treatment.

Case B : Cholesterol level in a population

Data: A sample of 25 individuals is taken, and their cholesterol levels are measured.

Cholesterol Levels (mg/dL): 205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205.

Population Mean (μ): 200 mg/dL

Population Standard Deviation (σ): 5 mg/dL(given for this problem)

Step 1: Define the Hypothesis

  • Null Hypothesis (H 0 ): The average cholesterol level in a population is 200 mg/dL.
  • Alternate Hypothesis (H 1 ): The average cholesterol level in a population is different from 200 mg/dL.

Step 2: Define the Significance Level

As the direction of deviation is not given, we assume a two-tailed test. Based on a normal distribution table, the critical values for a significance level of 0.05 (two-tailed) are approximately -1.96 and 1.96.

Step 3: Compute the test statistic

The sample mean of the 25 measurements is 202.04 mg/dL. The test statistic is calculated using the z formula, Z = [Tex](202.04 - 200) / (5 \div \sqrt{25}) [/Tex], which gives Z ≈ 2.04.

Step 4: Result

Since the absolute value of the test statistic (2.04) is greater than the critical value (1.96), we reject the null hypothesis and conclude that there is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.

Python Implementation of Case B

import scipy.stats as stats
import math
import numpy as np

# Given data
sample_data = np.array(
    [205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205])
population_std_dev = 5
population_mean = 200
sample_size = len(sample_data)

# Step 1: Define the Hypotheses
# Null Hypothesis (H0): The average cholesterol level in a population is 200 mg/dL.
# Alternate Hypothesis (H1): The average cholesterol level in a population is different from 200 mg/dL.

# Step 2: Define the Significance Level
alpha = 0.05  # Two-tailed test

# Critical values for a significance level of 0.05 (two-tailed)
critical_value_left = stats.norm.ppf(alpha / 2)
critical_value_right = -critical_value_left

# Step 3: Compute the test statistic
sample_mean = sample_data.mean()
z_score = (sample_mean - population_mean) / \
    (population_std_dev / math.sqrt(sample_size))

# Step 4: Result
# Check if the absolute value of the test statistic is greater than the critical values
if abs(z_score) > max(abs(critical_value_left), abs(critical_value_right)):
    print("Reject the null hypothesis.")
    print("There is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.")
else:
    print("Fail to reject the null hypothesis.")
    print("There is not enough evidence to conclude that the average cholesterol level in the population is different from 200 mg/dL.")

Reject the null hypothesis.
There is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.

Limitations of Hypothesis Testing

  • Although a useful technique, hypothesis testing does not offer a comprehensive understanding of the topic being studied. It concentrates on specific hypotheses and statistical significance without fully reflecting the complexity or whole context of the phenomenon.
  • The accuracy of hypothesis testing results is contingent on the quality of available data and the appropriateness of statistical methods used. Inaccurate data or poorly formulated hypotheses can lead to incorrect conclusions.
  • Relying solely on hypothesis testing may cause analysts to overlook significant patterns or relationships in the data that are not captured by the specific hypotheses being tested. This limitation underscores the importance of complementing hypothesis testing with other analytical approaches.

Hypothesis testing stands as a cornerstone in statistical analysis, enabling data scientists to navigate uncertainties and draw credible inferences from sample data. By systematically defining null and alternative hypotheses, choosing significance levels, and leveraging statistical tests, researchers can assess the validity of their assumptions. The article also elucidates the critical distinction between Type I and Type II errors, providing a comprehensive understanding of the nuanced decision-making process inherent in hypothesis testing. The real-life example of testing a new drug’s effect on blood pressure using a paired T-test showcases the practical application of these principles, underscoring the importance of statistical rigor in data-driven decision-making.

Frequently Asked Questions (FAQs)

1. What are the 3 types of hypothesis test?

There are three types of hypothesis tests: right-tailed, left-tailed, and two-tailed. A right-tailed test assesses whether a parameter is greater than a specified value, and a left-tailed test whether it is less. A two-tailed test checks for a difference in either direction.

2. What are the 4 components of hypothesis testing?

  • Null Hypothesis ( [Tex]H_0 [/Tex] ): No effect or difference exists.
  • Alternative Hypothesis ( [Tex]H_1 [/Tex] ): An effect or difference exists.
  • Significance Level ( [Tex]\alpha [/Tex] ): Risk of rejecting the null hypothesis when it is true (Type I error).
  • Test Statistic: Numerical value representing the observed evidence against the null hypothesis.

3. What is hypothesis testing in ML?

It is a statistical method for evaluating the performance and validity of machine learning models. It tests specific hypotheses about model behavior, such as whether features influence predictions or whether a model generalizes well to unseen data.

4. What is the difference between Pytest and Hypothesis in Python?

Pytest is a general-purpose testing framework for Python code, while Hypothesis is a property-based testing library for Python that generates test cases from specified properties of the code.


Outbreak Toolkit

Hypothesis generation

Generating hypotheses is an important, but often challenging, step in an outbreak investigation. When generating hypotheses, it is best to keep an open mind and to cast a wide net. A good starting place is to identify exposures that have previously been associated with the pathogen under investigation. This can be done by:

  • searching an outbreak database such as Outbreak Summaries , the Marler-Clark database, and the CDC Foodborne Outbreak Online database (see Tools  for links to these databases)
  • reviewing the published literature using a search engine such as PubMed or Google Scholar.

If the case definition for the illnesses under investigation includes laboratory information in the form of Whole Genome Sequencing (WGS) results, consider investigating where and when the sequence has been seen before. Provincial and federal public health laboratories maintain WGS databases that can contain valuable information for outbreak investigation purposes. PulseNet Canada can provide information about how common or rare the serotype or sequence is nationally, where and when it was last seen, and if it has been detected in any food samples in the past. PulseNet Canada will also be able to check the United States’ PulseNet WGS databases for matches. FoodNet Canada can provide information about whether the sequence has previously been seen in farm or retail samples from its sentinel sites.

While it is important to gather such historical information, the most effective way to generate a high-quality hypothesis is to identify common exposures amongst cases. This can be achieved by interviewing cases using a hypothesis generating questionnaire and analysing exposures. 


Hypothesis generating questionnaires

Hypothesis generating questionnaires (or shotgun questionnaires) are intended to obtain detailed information on what a person’s exposures were in the days leading up to their illness. They are typically quite long and ask about many exposures such as travel history, contact with animals, restaurants, events attended, and a comprehensive food history. The time period of interest varies between pathogens, as the exposure period is equal to the maximum incubation period of the pathogen.

When designing a questionnaire, it is important to ensure that the questions are gathering the intended information. Questions should be concise, informal, and specific. Before interviewing cases, questionnaires should be tested to ensure clarity and identify any potential errors.


Case interviewing

Once the questionnaire is developed and piloted, it should be administered to cases in a consistent and unbiased manner. Case interviews can be conducted by one or multiple interviewers. A centralized approach allows a single interviewer to standardize interviews, detect patterns, and probe for items of interest. However, a multiple-interviewer approach is more time-efficient and allows for multiple perspectives when it comes time to identify the source.

Although case interviewing is an important outbreak investigation tool, it is not without its challenges. By the time the outbreak team is ready to conduct the interview, it could be weeks to months after the onset of symptoms. It is difficult for people to recall what they ate over a month ago. Sometimes cases might need to be interviewed multiple times as the hypothesis is developed and refined.

Read more – Case interviews

Once the interviews are complete, the data can be entered into a database or line list. The frequency of exposures for the cases is then obtained (e.g., % of cases that consumed each food item).
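The frequency calculation can be sketched in a few lines. The line-list layout below (one dictionary per case with a set of reported exposures) and the example food items are assumptions made for illustration, not a prescribed format.

```python
from collections import Counter


def exposure_frequencies(line_list):
    """Return {item: (cases_exposed, percent_of_cases)}, most frequent first.

    line_list is assumed to be a list of dicts, each with an 'exposures'
    collection naming the items that case reported.
    """
    n_cases = len(line_list)
    if n_cases == 0:
        return {}
    counts = Counter()
    for case in line_list:
        # set() guards against an item being listed twice for the same case
        counts.update(set(case["exposures"]))
    return {
        item: (count, 100.0 * count / n_cases)
        for item, count in counts.most_common()
    }


# Hypothetical line list with four cases.
line_list = [
    {"case_id": 1, "exposures": {"chicken", "tomatoes"}},
    {"case_id": 2, "exposures": {"tomatoes"}},
    {"case_id": 3, "exposures": {"tomatoes", "cherries"}},
    {"case_id": 4, "exposures": {"chicken"}},
]
for item, (count, pct) in exposure_frequencies(line_list).items():
    print(f"{item}: {count}/{len(line_list)} cases ({pct:.0f}%)")
```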

It is tempting to conclude that the most commonly consumed food items are the most likely suspects, but these foods may simply be commonly consumed by the general population as well. What is needed is a baseline proportion against which to compare the exposure frequencies. Reference population studies, such as the CDC Food Atlas, the Nesbitt Waterloo study and Foodbook (see Tools), can be used for this purpose. These studies provide investigators with the expected food frequencies based on 7-day food histories from thousands of respondents. These data can be used as a point of comparison for questionnaire data to identify exposures, such as food items, with higher than expected frequencies. Statistical tests (e.g., binomial probability tests) can then be used to test whether the proportion of cases exposed is significantly different from the proportion of “controls” (i.e., people included in the population studies) (see Tools).
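One way to carry out such a comparison is a one-sided binomial probability: the chance of seeing at least the observed number of exposed cases if cases ate the item at the reference rate. The sketch below uses only the Python standard library. The function name and the numbers are hypothetical, and this is not necessarily the exact calculation performed by any particular tool.

```python
from math import comb


def binomial_tail(exposed: int, cases: int, expected: float) -> float:
    """P(X >= exposed) for X ~ Binomial(cases, expected).

    exposed  : number of cases reporting the exposure
    cases    : number of cases who answered the question
    expected : proportion exposed in the reference population (0 to 1)
    """
    if not 0 <= exposed <= cases:
        raise ValueError("exposed must be between 0 and cases")
    if not 0.0 <= expected <= 1.0:
        raise ValueError("expected must be a proportion between 0 and 1")
    return sum(
        comb(cases, k) * expected**k * (1.0 - expected) ** (cases - k)
        for k in range(exposed, cases + 1)
    )


# Hypothetical item eaten by 8 of 10 cases, expected in 20% of the population.
rare_item = binomial_tail(8, 10, 0.20)
# Hypothetical item eaten by 9 of 10 cases, expected in 80% of the population.
common_item = binomial_tail(9, 10, 0.80)
print(f"rare item: {rare_item:.6f}, common item: {common_item:.4f}")
```

In this made-up example the first item yields a probability of roughly 0.00008 and would be flagged, while the second yields roughly 0.38 even though more cases reported it. A small probability only marks an exposure for follow-up; it does not by itself identify the source.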

There are many limitations to using expected food frequencies, such as some studies not accounting for:

  • Seasonality (e.g., consumption of cherries is higher in the summer, but the expected levels are the same year-round)
  • Differences in consumption between sexes and between adults and children
  • Geographic location
  • Various ethnic/religious/cultural groups

Further, since specific questions differ among surveys, it is often difficult to find the most appropriate comparison group. For example, the CDC Atlas of Exposures differentiates between hamburgers eaten at home or outside the home, while questionnaires used in investigations typically do not. Such differences in food definitions can make it challenging to determine which reference variable is the most appropriate to use as an “expected” level.

It is important to keep in mind that some foods with high expected consumption levels (e.g., chicken) may not flag statistically, but could still be potential sources. Further, there are other common exposures amongst cases that can carry important clues about the source of the outbreak. Cases that report common restaurants, events, or grocery stores can be considered sub-clusters. These sub-clusters should be investigated thoroughly by obtaining menus, receipts, or shopper card information if possible.
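Spotting sub-clusters amounts to grouping cases by the venues they report and keeping any venue named by more than one case. The sketch below assumes a simple mapping from case identifier to reported venues; the identifiers, venue names and the `min_cases` threshold are hypothetical.

```python
from collections import defaultdict


def find_subclusters(case_venues, min_cases=2):
    """Return {venue: sorted case ids} for venues reported by >= min_cases cases.

    case_venues is assumed to map a case id to the restaurants, events or
    grocery stores that case reported.
    """
    venue_cases = defaultdict(set)
    for case_id, venues in case_venues.items():
        for venue in venues:
            venue_cases[venue].add(case_id)
    return {
        venue: sorted(ids)
        for venue, ids in venue_cases.items()
        if len(ids) >= min_cases
    }


# Hypothetical interview data for three cases.
case_venues = {
    "A": ["Restaurant X", "Store Y"],
    "B": ["Restaurant X"],
    "C": ["Store Z"],
}
print(find_subclusters(case_venues))
```

In practice venue names from interviews usually need cleaning (spelling variants, chain versus specific location) before a grouping like this is meaningful.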

  • Case study, Module 2: Hypothesis generation
  • Case study, Module 2 – Exercise 3: Food frequency analysis in Excel
  • Case study, Module 2: Interpreting hypothesis generation results
  • Hypothesis generation through case exposures in multiple restaurant clusters example: Barton Behravesh C, et al. 2012. Multistate outbreak of Salmonella serotype Typhimurium infections associated with consumption of restaurant tomatoes, USA, 2006: hypothesis generation through case exposures in multiple restaurant clusters. Epidemiol Infect. 140(11): 2053-2061.
  • Case-case study example: Galanis E, et al. 2014. The association between campylobacteriosis, agriculture and drinking water: a case-case study in a region of British Columbia, Canada, 2005–2009. Epidemiol Infect. 142(10): 2075-2084.
  • Exact probability calculation and case-case study example: Gaulin C, et al. 2012. Escherichia coli O157:H7 outbreak linked to raw milk cheese in Quebec, Canada: use of exact probability calculation and case-case study approaches to foodborne outbreak investigation. J Food Prot. 75(5): 812-818.

Toolkit binomial probability calculation tool for food exposures

  • This Microsoft Excel document allows users to enter outbreak case food exposure numbers for 300 food items and automatically calculates binomial probabilities using two reference populations and flags exposures of interest for follow-up (Reference populations: CDC Population Survey Atlas of Exposures, 2006-2007 and Waterloo Region, Ontario Food Consumption Survey, November 2005 to March 2006).

Toolkit Outbreak Summaries overview

  • This PDF document provides an overview of the Outbreak Summaries application, its key features and benefits, and an example of how it can be used during an outbreak investigation.

CDC National Outbreak Reporting System (NORS) Dashboard

  • The NORS dashboard allows users to view and download data on disease outbreaks reported to CDC. Data can be filtered by type of outbreak, year, state, etiology (genus only), setting, food/ingredient, water exposure, and water type.

Food Consumption Patterns in the Waterloo Region

  • This food frequency study by Nesbitt et al. was conducted in Waterloo, Ontario in 2005-2006. The study collected 7-day food consumption data from 2,332 Canadians.

CDC Food Atlas 2006-2007

  • This study by CDC was conducted in 10 U.S. states in 2006-2007. The study asked 17,000 respondents about their exposure to a comprehensive list of foods as well as animal exposure.

FoodNet Canada Reports and Publications

  • FoodNet Canada reports and publications provide information on the areas of greatest risk to human health to help direct food safety actions and programming as well as public health interventions, and to evaluate their effectiveness.

CDC FoodNet Reports

  • The Foodborne Diseases Active Surveillance Network (FoodNet) Annual Reports are summaries of information collected through active surveillance of nine pathogens.

Marler Clark Foodborne Illness Outbreak Database

  • This database provides summaries of food and water related outbreaks caused by various enteric pathogens dating back to 1984.

FDA Foodborne Illness-Causing Organisms Cheat Sheet

  • A quick summary chart on foodborne illnesses, organisms involved, symptom onset times, signs and symptoms to expect, and food sources.

CFIA: Canada’s 10 Least Wanted Foodborne Pathogens

  • This infographic prepared by the CFIA includes information on symptoms, onset time, transmission, potential sources, and preventative measures for ten foodborne pathogens.

Foodbook: Canadian Food Exposure Study to Strengthen Outbreak Response

  • Foodbook is a population-based telephone survey that was conducted in all Canadian provinces and territories. It provides essential data on food, animal and water exposure which is used by the Agency, as well as other federal, provincial, and territorial (F/P/T) partners to understand, respond to, control and prevent enteric illness in Canada.

Toolkit outbreak response database*

*Due to the Government of Canada’s Standard on Web Accessibility, this tool cannot be posted, but it is available upon request. Please contact us at [email protected] to request a copy. Please let us know if you need support or an accessible format.

Hypothesis Generation

Write down all the hypotheses and assumptions as a starting point for the project.

Applied for: Stakeholders

Also called: How might we...

Related content: Research Plan, Interview Guide, Empathy Map

Hypothesis generation is a quick exercise that allows the team to reflect on all the already-known assumptions and insights related to user needs and behaviours, share them amongst team members, and derive initial ideas for service experiences or features that could be offered.

Ground the first step of the project on existing knowledge.

Remember to: put everything on the table, without hiding or saving ideas for later.

