Alice E White, Kirk E Smith, Hillary Booth, Carlota Medus, Robert V Tauxe, Laura Gieraltowski, Elaine Scallan Walter, Hypothesis Generation During Foodborne-Illness Outbreak Investigations, American Journal of Epidemiology, Volume 190, Issue 10, October 2021, Pages 2188–2197, https://doi.org/10.1093/aje/kwab118
Hypothesis generation is a critical, but challenging, step in a foodborne outbreak investigation. The pathogens that contaminate food have many diverse reservoirs, resulting in seemingly limitless potential vehicles. Identifying a vehicle is particularly challenging for clusters detected through national pathogen-specific surveillance, because cases can be geographically dispersed and lack an obvious epidemiologic link. Moreover, state and local health departments could have limited resources to dedicate to cluster and outbreak investigations. These challenges underscore the importance of hypothesis generation during an outbreak investigation. In this review, we present a framework for hypothesis generation focusing on 3 primary sources of information, typically used in combination: 1) known sources of the pathogen causing illness; 2) person, place, and time characteristics of cases associated with the outbreak (descriptive data); and 3) case exposure assessment. Hypothesis generation can narrow the list of potential food vehicles and focus subsequent epidemiologic, laboratory, environmental, and traceback efforts, ensuring that time and resources are used more efficiently and increasing the likelihood of rapidly and conclusively implicating the contaminated food vehicle.
Abbreviations: HGQ, hypothesis-generating questionnaire; PFGE, pulsed-field gel electrophoresis; STEC, Shiga toxin-producing Escherichia coli; WGS, whole-genome sequencing.
Foodborne diseases are a continuing public health problem in the United States, where they cause an estimated 48 million illnesses, 128,000 hospitalizations, and 3,000 deaths annually ( 1 ). Public health and regulatory agencies rely on data from foodborne disease surveillance and outbreak investigations to prioritize food safety regulations, policies, and practices aimed at reducing the burden of disease ( 2 ). In particular, foodborne illness outbreaks provide critical information on the foods causing illness, common food-pathogen pairs, and high-risk production technologies and practices. However, only half of the foodborne outbreaks reported each year identify a pathogen, and less than half implicate a food vehicle, decreasing the utility of these data ( 3 ).
Figure 1. A model framework for hypothesis generation during a foodborne-illness outbreak investigation.
Foodborne disease outbreaks require rapid public health response to quickly identify potential sources and prevent future exposures; however, implicating a food vehicle in an outbreak can be challenging. The pathogens that contaminate food have many diverse reservoirs and can be transmitted in other ways (e.g., from one person to another or through contact with animals or contaminated water), resulting in seemingly limitless potential vehicles ( 2 ). Identifying a food vehicle is particularly challenging for clusters detected through national pathogen-specific surveillance: Cases can be geographically dispersed and lack an obvious epidemiologic link ( 4 ). Moreover, state and local health departments might have limited resources to dedicate to cluster and outbreak investigations ( 5 ). These challenges underscore the importance of hypothesis generation during an outbreak investigation. Hypothesis generation can narrow the list of potential food vehicles and focus subsequent epidemiologic, laboratory, environmental, and traceback efforts, ensuring that time and resources are used more efficiently and increasing the likelihood of timely identification of the vehicle. Timely investigations can prevent additional illnesses and increase the likelihood of identifying factors contributing to the outbreak.
The Integrated Food Safety Centers of Excellence were established in 2012 under the Food Safety Modernization Act to serve as resources for federal, state, and local public health professionals who detect and respond to foodborne illness outbreaks. The Integrated Food Safety Centers of Excellence aim to improve the quality of foodborne-illness outbreak investigations by providing public health professionals with training, tools, and model practices. In this paper, we provide a framework for generating hypotheses early during investigation of an outbreak or cluster detected through pathogen-specific surveillance; highlight tools to support rapid and effective hypothesis generation; and illustrate the practice of hypothesis generation using example outbreak case studies.
A hypothesis is “a supposition, arrived at from observation or reflection, that leads to refutable predictions; (or) any conjecture cast in a form that will allow it to be tested and refuted” ( 6 ). In a foodborne outbreak, the hypothesis states which food vehicle(s) could be the source of the outbreak and warrant further investigation. In practice, hypothesis generation is dynamic and iterative. It begins in the earliest stages of an investigation, as investigators review available information and look for a pattern or “signal” that might emerge. As more information becomes available, hypotheses are repeatedly evaluated and refined.
The framework presented here focuses on 3 primary sources of information for generating hypotheses, typically used in combination: 1) known sources of the pathogen causing illness; 2) person, place, and time characteristics of cases associated with the outbreak (descriptive data); and 3) case exposure assessment ( Figure 1 ). We discuss the approach for collecting, summarizing, and interpreting each of these sources of information and provide example outbreak case studies ( Table 1 ). We focus primarily on food exposures. However, at the onset of an investigation the transmission route is often unknown, and many pathogens commonly transmitted through food can also be transmitted through other routes (e.g., animal contact, person-to-person, waterborne). Thus, hypothesis generation should consider all potential transmission routes early in the investigation. Moreover, hypothesis generation should involve a multidisciplinary outbreak investigation team, including experienced colleagues who can provide information about past outbreaks and known sources of the pathogen causing illness.
Table 1. Foodborne-Illness Outbreak Case Studies Highlighting Hypothesis-Generation Methods, United States, 2006–2018
Outbreak | No. of cases | No. of states | Dates | Hypothesis-generation summary |
---|---|---|---|---|
STEC O157 outbreaks | ||||
STEC O157 associated with spinach | 225 | 27 | August–September 2006 | Of cases, 72% were female, with a median age of 27 years (range, 1–84 years), similar to descriptive data for other leafy greens outbreaks. Cases were interviewed using the Oregon “shotgun” questionnaire and results compared with the FoodNet Population Survey using binomial probability calculations; the proportion of outbreak cases reporting fresh spinach consumption was statistically significantly higher than in the surveyed population. |
HG methods: descriptive data, “shotgun” questionnaire, binomial probability calculations | ||||
STEC O157 associated with cookie dough | 77 | 30 | March–July 2009 | Investigators initially focused on known sources for STEC O157 (e.g., ground beef, raw dairy products), but no commonalities were identified. Then, a single interviewer conducted conversational open-ended interviews with 5 cases; all reported consuming ready-to-bake commercial prepackaged cookie dough. This hypothesis aligned with descriptive data (median age, 15 years; range, 2–65 years; 71% female). |
HG methods: open-ended interviews, single interviewer, descriptive data | ||||
STEC O157 associated with hazelnuts | 8 | 3 | December 2010 to February 2011 | In HG interviews, most cases reported eating ground beef and in-shell mixed nuts or in-shell hazelnuts. The ground beef hypothesis was ruled out because cases reported purchasing ground beef that was locally processed and distributed (i.e., inconsistent with cases in multiple states). The hazelnuts hypothesis was supported by binomial probability and case-case comparison studies, and confirmed using traceback investigations. |
HG methods: specific product information, binomial probability calculations, case-case comparisons | ||||
Salmonella outbreaks | ||||
Salmonella serotypes Wandsworth and Typhimurium associated with vegetable-coated snack food | 69 | 23 | February–June 2007 | Investigators in multiple states interviewed parents of cases (96% of whom were <6 years of age) using HGQs. After no signal emerged, a single interviewer conducted 10 open-ended interviews, while 6 interviews were conducted using a questionnaire that included previously mentioned items and foods commonly consumed by young children. After multiple cases reported eating a vegetable-coated snack food, a formal multistate case-control study was conducted. The serotype Wandsworth strain was identified in product testing, along with multiple other serotypes. In a “backward” investigation, cases in PulseNet with the other outbreak strains were interviewed and found to have also consumed the snack food. |
HG methods: single interviewer, open-ended interviews, iterative interviewing, backward investigation | ||||
Salmonella I 4:[5]:12:i:- associated with Banquet turkey pot pies | 272 | 35 | August–October 2007 | During HG interviews, the first 2 cases reported frequent consumption of various microwaveable entrees. The third case reported daily consumption of Banquet pot pies. This prompted investigators to implement the iterative interviewing approach. When investigators specifically asked the first 2 cases, they both reported eating Banquet pot pies. A specific question about Banquet pot pies was added to the HG interviews for new cases, and the fourth case also reported having eaten them. The hypothesis was quickly confirmed by other states, which asked a handful of cases specifically about their consumption of Banquet pot pies. |
HG methods: specific product information, iterative interviewing | ||||
Salmonella Typhimurium associated with peanut butter | 714 | 46 | September 2008 to March 2009 | During HG interviews, 58% of cases reported exposure to institutional settings, 71% reported eating peanut butter, and 86% reported eating chicken, although cases reported eating multiple brands of both peanut butter and chicken. Then, investigators in one state were able to identify a common food distributor (of peanut butter) for subclusters of cases at 2 different long-term care facilities and an elementary school. Testing of an open 5-lb. container of peanut butter from one of the long-term care facilities yielded the outbreak strain. The company that produced the peanut butter also produced peanut paste used in packaged peanut butter crackers consumed by numerous cases in another state. Additional traceback investigations and testing of intact food products in other states ultimately confirmed the source as peanut butter. |
HG methods: subcluster investigation, food testing | ||||
Salmonella Virchow associated with Garden of Life Raw Meal Replacement | 33 | 23 | December 2015 to March 2016 | Garden of Life Raw Meal Replacement emerged as a strong hypothesis after it was mentioned by 3 cases in 3 different states and was quickly confirmed by interviewing a few additional cases. Three different questionnaires were used by state investigators, which shows that questionnaire design is not necessarily the most important factor; what matters more is conducting a quality interview and obtaining product details (either at the time of initial interview or upon re-interview). |
HG methods: specific product information | ||||
Salmonella Montevideo associated with black and red pepper | 272 | 44 | July 2009 to April 2010 | Investigators conducted HG interviews, which did not lead to a hypothesis, but they did identify 3 subclusters. During open-ended interviews, cases reported consuming Italian-style meats and salami, and shopping at a national warehouse store chain. Using warehouse store membership cards, investigators confirmed that multiple cases had purchased the same pepper-encrusted salami product. |
HG methods: subcluster investigation, shopper membership-card purchase information | ||||
Multiple Salmonella serotypes associated with kratom | 199 | 41 | January 2017 to May 2018 | On the first multistate coordinating call, an investigator stated that a case had mentioned “kratom” in a routine interview when asked about dietary supplements. This novel exposure was added to a supplemental question list for the outbreak that was shared with investigators, and many others quickly collected reports of kratom consumption. Testing samples of kratom identified other serotypes, which matched more cases in PulseNet, who on interview had also consumed kratom. Ultimately, there were dozens of distinct PFGE patterns and 6 serotypes. |
HG methods: iterative interviewing, backward investigation | ||||
Listeria outbreaks | ||||
Listeria monocytogenes associated with Crave Brothers Cheese | 6 | 5 | May–July 2013 | During interviews using the Listeria Initiative questionnaire (44), all 5 cases in a 4-state cluster reported eating soft cheeses at restaurants or from grocery stores. Investigators identified Crave Brothers as the common producer. A search of the PulseNet database revealed a large number of matching (by PFGE) environmental isolates collected 2 years prior, and all had come from the Crave Brothers plant. |
HG methods: specific product information, iterative interviewing, historical environmental isolates in PulseNet | ||||
Listeria monocytogenes associated with prepackaged caramel apples | 35 | 12 | October 2014 to January 2015 | Investigators conducted an open-ended interview with the first case. Then, investigators conducted an open-ended interview with the second case, adding objective questions about some foods mentioned by the first case. Specifically, a local investigator asked the second case about caramel apples based on the first case’s interview. The hypothesis was strengthened by other states quickly re-interviewing their cases. |
HG methods: open-ended interviews, iterative interviewing |
Abbreviations: HG, hypothesis generation; HGQ, hypothesis-generating questionnaire; PFGE, pulsed-field gel electrophoresis; STEC, Shiga toxin-producing Escherichia coli.
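The binomial probability calculations named in several of the case studies can be sketched as follows. This is a minimal illustration under stated assumptions, not the investigators' actual procedure: it asks how likely it would be to see at least the observed number of exposed cases if the cases ate the food at the same rate as the background population. The function name `binomial_tail` and all numbers (10 cases, 8 exposed, 20% background consumption) are hypothetical and are not taken from any outbreak or from the FoodNet Population Survey.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    k: number of interviewed cases reporting the exposure
    n: number of interviewed cases
    p: background proportion of the population reporting the exposure
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    # Sum the binomial probability mass from k up to n (upper tail).
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical illustration only: 8 of 10 cases report a food that
# 20% of the surveyed population is assumed to eat in a comparable period.
n_cases = 10
n_exposed = 8
background = 0.20

p_value = binomial_tail(n_exposed, n_cases, background)
print(f"P(at least {n_exposed} of {n_cases} exposed by chance) = {p_value:.6f}")
```

A small tail probability flags an exposure as reported more often than expected. It does not by itself implicate the food, because commonly eaten items and correlated exposures can also produce low values when many foods are screened at once.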
When generating a hypothesis, investigators should consider historical information about the causative pathogen, including known reservoirs; foods (and animals) implicated in past outbreaks; findings from case-control studies of sporadic illnesses (i.e., diagnosed cases investigated during routine surveillance not linked to other cases); and molecular subtyping information of the pathogen, including information about nonhuman isolates (i.e., food, animal, or environmental sources).
The reservoir of the infectious agent can indicate potential sources and contributing factors. Pathogens with a human reservoir (e.g., norovirus, hepatitis A virus, and Shigella) are commonly associated with infected food handlers or ready-to-eat foods that have been contaminated with human feces. In contrast, pathogens with animal reservoirs (e.g., Shiga toxin-producing Escherichia coli (STEC), nontyphoidal Salmonella, and Campylobacter) are often associated with food sources of animal origin or foods that have been contaminated by animal feces during production (e.g., fresh produce). Pathogens with environmental reservoirs (e.g., Vibrio spp., Listeria monocytogenes, Clostridium botulinum) are commonly associated with foods that can become contaminated by soil or water. Tools that help identify known pathogen sources include the National Outbreak Reporting System Dashboard ( 7 ), the Food and Drug Administration Bad Bug Book ( 8 ), and An Atlas of Salmonella in the United States ( 9 ).
Food-pathogen pairs identified in past outbreaks and case-control studies of sporadic illnesses provide information on common food vehicles associated with a pathogen. Using data on reported outbreaks from 1998–2016, the Interagency Food Safety Analytics Collaboration estimated the proportion of illnesses attributable to 17 major food categories ( 10 ). The foods most commonly associated with Salmonella illnesses were seeded vegetables (e.g., tomatoes and cucumbers), chicken, pork, and fruit, whereas most STEC illnesses were attributed to leafy greens or beef, and most Listeria illnesses to dairy products or fruits. Similarly, case-control studies of sporadic illnesses have found associations between pathogens and specific foods; for example, Campylobacter and poultry ( 11 ) and Listeria monocytogenes and melons and hummus ( 12 ).
For pathogens with multiple reservoirs, information that distinguishes isolates of the same species by phenotypic or genotypic characteristics can provide increased specificity. For example, there are over 2,600 serotypes of Salmonella; however, some serotypes have been associated with specific food vehicles, such as Salmonella enterica serotype Enteritidis (SE) and eggs and chicken; serotypes Uganda and Infantis and pork; and serotypes Litchfield, Poona, Oranienburg, and Javiana and fruit ( 13 ). Antimicrobial resistance has also proven useful in differentiating major sources of Salmonella serotypes found in both animal- and plant-derived food commodities. For example, antimicrobial-resistant Salmonella outbreaks were more likely to be associated with meat and poultry (e.g., beef, chicken, and turkey), whereas foods commonly associated with susceptible Salmonella outbreaks were eggs, tomatoes, and melons ( 14 ).
Molecular subtyping with pulsed-field gel electrophoresis (PFGE) has been an essential subtyping tool for outbreak detection, and PFGE patterns have been associated with specific foods. For example, SE isolates with PFGE PulseNet pattern JEGX01.0004 have commonly been associated with eggs (and more recently, chicken), pattern JEGX01.0005 with chicken, and pattern JEGX01.0002 with travel or exposure to the US Pacific Northwest region and Mexico. Similarly, the same PFGE pattern of STEC O157:H7 has been associated with recurrent romaine lettuce outbreaks ( 15 , 16 ). In July 2019, whole-genome sequencing (WGS) replaced PFGE as the standard molecular subtyping method for the national PulseNet network, providing greater discrimination and more reliable indication of genetically related groupings than PFGE. This change in molecular method might limit historical comparisons temporarily, particularly to isolates from before the transition, as PFGE patterns and WGS results are not readily comparable. However, WGS allele codes have been applied to sequenced historical isolates in PulseNet, and although this represents a small proportion of all isolates in PulseNet, the representativeness of the WGS database will increase with time. As historical isolates and regulatory isolates from the Food and Drug Administration and US Department of Agriculture Food Safety and Inspection Service are sequenced, information about recent findings in foods and animals will fill the national database maintained at the National Center for Biotechnology Information ( 17 ) and be readily comparable to sequenced human clinical isolates.
Subtyping of nonhuman isolates collected by regulatory agencies from foods and food chain environments through routine testing or special studies can lead to the identification of outbreaks of human illness by searching the PulseNet database for the same molecular subtypes in human infections, sometimes referred to as “backward” outbreaks. For example, in 2007 public health authorities were investigating a multistate outbreak of Salmonella serotype Wandsworth in which patients reported consuming a puffed vegetable-coated snack food. Food testing yielded the outbreak strain of Salmonella serotype Wandsworth, but it also yielded Salmonella serotype Typhimurium; a search in the PulseNet database identified matching isolates from human cases of Salmonella serotype Typhimurium infection, and these cases confirmed consumption of the same snack food upon re-interview ( 18 ). Importantly, identifying a close genetic match between strains from a product and an illness does not alone establish causation; epidemiologic investigation and traceback are needed to connect the product and patient.
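The logic of a “backward” search can be sketched in a few lines. The example below is an illustration under stated assumptions only: the record layout, the subtype codes, the case identifiers, and the function `find_matching_cases` are all hypothetical, and it does not represent the PulseNet database or any real query interface. It simply selects human isolates whose subtype matches one found in a tested product, yielding a list of cases to re-interview.

```python
from typing import NamedTuple


class HumanIsolate(NamedTuple):
    case_id: str   # hypothetical case identifier
    subtype: str   # hypothetical molecular subtype code (e.g., PFGE pattern or WGS allele code)


def find_matching_cases(product_subtypes, human_isolates):
    """Return case IDs whose isolate subtype matches any subtype found in the product."""
    wanted = set(product_subtypes)
    return [iso.case_id for iso in human_isolates if iso.subtype in wanted]


# Hypothetical subtype codes recovered from product testing.
product_subtypes = ["SUBTYPE-A", "SUBTYPE-B"]

# Hypothetical human clinical isolates from a surveillance database.
human_isolates = [
    HumanIsolate("case-001", "SUBTYPE-A"),
    HumanIsolate("case-002", "SUBTYPE-C"),
    HumanIsolate("case-003", "SUBTYPE-B"),
]

to_reinterview = find_matching_cases(product_subtypes, human_isolates)
print("Cases to re-interview about the product:", to_reinterview)
```

The output is a list of interview candidates, not a list of confirmed outbreak cases. As noted above, a subtype match alone does not establish causation; exposure must still be confirmed through interview and traceback.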
Descriptive epidemiology of cases, including person, place, or time characteristics, remains a powerful tool for hypothesis generation. Person characteristics can suggest foods that are more likely to be eaten by certain groups, whereas place and time characteristics can provide clues about the geographic distribution and shelf life of the food.
Person characteristics suggestive of certain foods include, but are not limited to, sex, age, race, and ethnicity. For example, the median percentage of female cases in vegetable-associated STEC outbreaks was 64%, compared with 50% in beef STEC outbreaks ( 19 ). Likewise, there are differences in food consumption patterns by age, with the lowest median percentage of children and adolescents in vegetable-associated STEC outbreaks and the highest in STEC dairy outbreaks ( 19 ). Similar trends are evident in the Centers for Disease Control and Prevention FoodNet Population Survey, a population-based survey to estimate the prevalence of risk factors for foodborne illness, which found that women reported consuming more fruits and vegetables than men, and men reported consuming more meat and poultry ( 20 ).
Time characteristics, displayed by the shape and pattern of an epidemic curve, can indicate the shelf life of a product or the harvest duration of a contaminated field. For example, cases spread over a longer time period might suggest a shelf-stable or frozen food item, ongoing harborage of the contaminating pathogen in a food processing plant, or other sustained mechanism of contamination. Conversely, cases with illness onset dates spread over a limited duration of time might suggest a perishable item, such as fresh produce. However, some fresh produce items have longer shelf lives than others and can cause more protracted outbreaks. Additionally, there are “special case” produce types. For example, outbreaks associated with sprouted seeds or beans, which have a short shelf life, are typically driven by a single contaminated seed lot, and un-sprouted seeds and beans can have a shelf life of months to years. Thus, single batches might be sprouted from the same contaminated lot of seeds at different times and in different places leading to a more sustained outbreak, or resulting in temporally and geographically distinct outbreaks ( 21 ). If an outbreak is detected early and exposure is ongoing, the temporal distribution of cases might be less clear early in an investigation. Thus, epidemic curves can provide supporting evidence that adds to the plausibility of a suspected food vehicle; however, depending on the outbreak, epidemic curves might provide more relevant information as the outbreak progresses.
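An epidemic curve of the kind described above is a count of cases by illness-onset date, binned by a fixed interval. The sketch below bins by ISO week and prints a simple text histogram. It is an illustration under stated assumptions: the onset dates are invented for the example, and the function name `weekly_epi_curve` and the choice of weekly bins are not taken from the source.

```python
from collections import Counter
from datetime import date


def weekly_epi_curve(onset_dates):
    """Count cases per ISO (year, week) of illness onset, in chronological order.

    Weeks with no cases are omitted from the result.
    """
    counts = Counter(d.isocalendar()[:2] for d in onset_dates)
    return sorted(counts.items())


# Hypothetical illness-onset dates for illustration only.
onsets = [
    date(2024, 1, 1),
    date(2024, 1, 3),
    date(2024, 1, 8),
    date(2024, 1, 10),
    date(2024, 1, 12),
    date(2024, 1, 16),
]

curve = weekly_epi_curve(onsets)
for (year, week), n in curve:
    # One '#' per case gives a rough text rendering of the curve.
    print(f"{year}-W{week:02d} {'#' * n} ({n})")
```

A tight cluster of bars would be consistent with a perishable item, and a long, flat run with a shelf-stable product or sustained contamination. Because reporting lags leave the most recent weeks incomplete, the right-hand edge of the curve should be read with caution early in an investigation.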
Geographical mapping of cases can also help assess the plausibility of a suspected vehicle by comparing the distribution of cases with the distribution pattern of that food item, in consultation with regulatory and industry partners. For example, widespread outbreaks are caused by widely distributed commercial products, and some foods are more likely to be distributed nationally (e.g., bagged leafy greens, packaged cereal, national meat brands), whereas others are more likely to be distributed regionally (e.g., popular brands of ice cream) or locally (e.g., raw milk) ( 22 ). Likewise, if some outbreak-associated illnesses are clearly related to travel to a specific country, and others are in nontravelers, it suggests the latter might be associated with a product imported from that country. For example, a 2018 outbreak of Salmonella serotype Typhimurium infections in Canada occurred among persons traveling to Thailand, and among others who shopped at particular stores in Western Canada; the outbreak was ultimately traced to contaminated frozen profiteroles imported from Thailand ( 23 ). Similarly, in a 2011 multistate outbreak in the United States, a subset of cases traveled to Mexico and ate papaya there, and nontravel-associated cases ate papaya imported from Mexico ( 24 ).
Outbreak size and distribution can suggest certain food-pathogen pairs. For example, seafood toxins like ciguatoxin are typically produced or concentrated in an individual fish and therefore cause illness in a limited number of people in a single jurisdiction, whereas Salmonella and other bacterial pathogens can contaminate large amounts of a widely distributed product ( 22 ). The distribution of cases can be misleading or incomplete early in an outbreak, so investigators must use caution when using these parameters to rule out hypotheses and revisit as additional cases are identified. Moreover, an apparently local outbreak can be an early indicator of a larger problem. For example, in 2018, a large multistate outbreak of E. coli O157:H7 infections linked to romaine lettuce was initially detected in New Jersey in association with a single restaurant chain; within 8 days of detecting the cluster it had expanded to include many more cases with a variety of different exposure locations as far away as Nome, Alaska ( 15 ).
Rapidly collecting detailed food histories from cases in an outbreak is the most critical step in identifying commonalities between these cases. Before a cluster is detected, local or state public health agencies typically attempt to interview each individual, reportable enteric-pathogen case using a standard pathogen-specific questionnaire. If a cluster is detected, a review of these routine interviews can provide information on obvious high-risk exposures. In most jurisdictions, detailed hypothesis-generating questionnaires (HGQs) historically have been used only if commonalities are not identified from the initial routine interviews or if the hypotheses identified from routine interviews collapse under further investigation. However, a growing number of state health jurisdictions are conducting hypothesis-generating interviews with all cases of laboratory-confirmed Salmonella and STEC infection, opting to gather this information during the initial interview. This method is considered a best practice to maximize exposure recall ( 25 ), shaving days or weeks off the delay between case exposure and hypothesis-generating interview.
There are 3 major types of HGQs used in the United States ( 26 ):
Oregon “shotgun” questionnaire: This questionnaire uses a “shotgun,” or “trawling” approach of asking mostly close-ended questions for a long list of individual food items. The section order is designed to prompt recall of specific food exposures through review of places where food was purchased or eaten out, and specific repetitive questions for high-risk exposures such as raw foods or sprouts.
Minnesota “long form” hypothesis-generating questionnaire: This questionnaire combines close-ended questions about fewer food items with open-ended questions that seek details on dining/purchase location and brand-variety details for all foods.
National Hypothesis Generating Questionnaire: This questionnaire is a hybridized approach developed by Centers for Disease Control and Prevention that contains elements of both the Oregon and Minnesota models. Close-ended questions are asked about an intermediate number of food items, and brand/variety details are obtained only for commonly eaten types of foods. During national cluster investigations, the National Hypothesis Generating Questionnaire is deployed across state and local health departments to improve standardization across jurisdictions.
In addition to these questionnaires, there are many modified state-specific versions and national pathogen-specific HGQs (e.g., Listeria Initiative questionnaire, Cyclospora ). The use of HGQs can be enhanced by adopting a dynamic or iterative cluster investigation approach. In this approach, if a suspected food item or branded product emerges during interviews, that food item can be added to questionnaires administered to subsequent cases, and individuals who have already been interviewed can be re-interviewed to systematically collect information about that exposure ( 27 ). Decisions about which exposures should be pursued through re-interviews can be informed by descriptive data, as well as incubation periods, which can help define the most likely exposure period ( 28 ).
The number of interviewers participating in hypothesis-generating interviews can depend on resources and the specifics of the outbreak. A single interviewer approach can be advantageous in that a single interviewer might more clearly remember what previously interviewed persons mentioned and pursue clues as they arise during a live interview. However, this approach could slow investigations, particularly in sizable multistate clusters. An alternative is the “lead investigator model,” in which a single person directs the interviewing team with a limited number of interviewers, reviews completed interviews, and decides which exposures to pursue. This approach can be faster and more efficient than the single interviewer approach. When interviews are done by multiple agencies, it is important that the completed interviews be forwarded to the lead investigator promptly and that the group meet regularly and review results of interviews as the investigation proceeds.
If interviews with HGQs do not yield an actionable hypothesis, investigators should consider alternative approaches, such as questionnaire modification or open-ended interviews. Deciding when to attempt an alternative approach depends on cluster size, velocity of incident cases, and investigation effort expended and time elapsed without identification of a solid hypothesis. Questionnaire modification could include adding questions, such as open-ended questions or supplemental questions about exposures that came up on previous interviews, or pruning questions. For example, after 8–10 interviews, items that no case reported “yes” or “maybe” to eating may be removed. Removal of questions should be done cautiously because certain foods (e.g., stealth ingredients such as cilantro and sprouts) might be reported by a low proportion of cases who ate them. Another approach is open-ended interviews of recent cases, which could be considered after 20–25 initial cases in a large multistate investigation have been interviewed without yielding solid hypotheses. Conducted by a single interviewer, if possible, open-ended interviews should cover everything that a case ate or drank in the exposure period of interest, as well as other exposures including animals, grocery stores, restaurants, travel, parties or events, and details about how they prepare their food at home, including recipes. After the first person is interviewed, objective questions about specific exposures can be added to the open-ended interviews of subsequent cases, creating a hybrid open-ended/iterative model. This requires cooperative patients and a persistent investigative approach but has yielded correct hypotheses with as few as 2 interviews ( 29 ).
Additional methods to ascertain exposures, such as obtaining consumer food purchase data, can be appropriate, particularly for outbreaks where obtaining a food history is challenging ( 30 ). For example, during a multistate Salmonella serotype Montevideo outbreak, initial hypothesis-generating interviews did not identify a clear signal beyond shopping at the same warehouse store. Investigators used shopper membership card purchase information to generate hypotheses, which ultimately helped identify red and black peppercorns coating a ready-to-eat salami as the vehicle ( 31 ). In addition, information from services for grocery home delivery, restaurant take-out delivery, and meal kits might help to clarify specific exposures. Other potential methods include focus-group interviews and household inspections, although these are used more rarely and in specific scenarios, with mixed results ( 32 ).
Binomial probability comparisons can further refine hypotheses by comparing the proportion of cases in an outbreak reporting a food exposure with the expected background proportion of the population reporting the food exposure ( 33 , 34 ). Binomial probability calculations in foodborne-disease outbreak investigations emerged in Oregon in 2003 as a complement to the pioneered “shotgun” questionnaire and use independent data sources on food exposure frequency from sporadic cases, past outbreak cases, or well persons sampled from the population. Such data sources include data from healthy people surveyed as part of the FoodNet Population Survey, standardized data collected in previous outbreaks, or sporadic cases as is done with the Listeria Initiative and Project Hg ( 33 , 35 , 36 ).
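The binomial comparison described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The counts (8 of 10 interviewed cases reporting a food) and the 20% background proportion are hypothetical values chosen for the example, not figures from any survey or outbreak.

```python
from math import comb

def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical numbers, for illustration only
cases_interviewed = 10
cases_exposed = 8
background_rate = 0.20  # assumed proportion of the population reporting the food

p_value = binomial_tail(cases_exposed, cases_interviewed, background_rate)
print(f"P(at least {cases_exposed} of {cases_interviewed} exposed by chance) = {p_value:.2e}")
```

A very small probability suggests that the food is reported more often among cases than the background proportion would explain, which can help prioritize it for further investigation.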
Hypothesis generation is a critical, but challenging, step in a foodborne outbreak investigation. A well-informed hypothesis can increase the likelihood of rapidly and conclusively implicating the contaminated food vehicle; conversely, the chances of implicating a food item are small if that item is not considered as part of the outbreak investigation. Inadequate hypothesis generation can delay investigation progress and limit investigators’ ability to rapidly identify the outbreak source, potentially leading to prolonged exposure and more illnesses. The 3 primary sources of information presented as part of this framework—known sources of the pathogen causing illness, descriptive data, and case exposure assessment—provide vital information for hypothesis generation, particularly when used in combination and revisited throughout the outbreak investigation.
Despite these sources of information, there are certain types of outbreaks for which hypothesis generation is inherently more challenging. These include outbreaks for which the vehicle has a high background rate of consumption (e.g., chicken) or outbreaks associated with a “stealth” food (e.g., garnishes, spices, chili peppers, or sprouts) that many cases could have consumed, but few remember eating. These challenges can sometimes be overcome by obtaining details on food exposures such as brand/variety and point of purchase. Obtaining this information is also critical to rapidly initiating a traceback investigation. An outbreak might also be caused by multiple contaminated food products when, for example, multiple foods have a single common ingredient or when poor sanitation or contaminated equipment leads to cross-contamination. Furthermore, the key exposure might not be a food at all, but rather an environmental or animal exposure, emphasizing that food should not be the default hypothesis.
There might be specific clues or “toe-holds” that help identify a hypothesis and accelerate an investigation. For example, cases with restricted diets, food diaries, or highly unusual or specific exposures can narrow the list of potential foods. This could include cases who traveled briefly to the outbreak location, and thus had a limited number of exposures. Smaller, localized clusters within a larger outbreak associated with restaurants, events, stores, or institutions, or “subclusters,” are often crucial to hypothesis generation, providing a finite list of foods. For example, in a multistate outbreak of Salmonella serotype Typhimurium infections associated with consumption of tomatoes, comparison of 4 restaurant-associated subclusters was instrumental in rapidly identifying a small set of potential vehicles ( 4 ). Subcluster investigations are precisely focused and as such can lead to much more rapid and efficient hypothesis generation and testing than attempts to assess all exposures among all cases in a large outbreak. Because of the immense value of subclusters, every effort should be made to quickly identify them through initial interviews and the iterative interviewing approach ( 25 ).
The majority of outbreaks are associated with common foods previously associated with that pathogen. In an investigation, it is important to both rule in and rule out common vehicles, while keeping an open mind about potential novel vehicles. If investigators suspect a novel vehicle, they should still rule out the most common vehicles when designing epidemiologic studies. For example, if an STEC outbreak investigation implicates cucumbers, regulatory partners will want to confirm that investigators have eliminated common STEC vehicles such as ground beef, leafy greens, and sprouts. That said, food vehicles change over time, reflecting changing food preferences and trends in food safety measures, and new vehicles continue to emerge (e.g., in recent years: SoyNut butter, raw flour, caramel apples, kratom, and chia seed powder). HGQs are biased toward previously implicated foods and a finite list of foods. If cases continue without a clear hypothesis emerging, it might be necessary to try open-ended hypothesis-generating interviews.
Hypothesis generation during foodborne outbreak investigation will evolve as laboratory techniques advance. Molecular sequencing techniques based on WGS might give investigators more conviction in devoting resources to following leads because there is more confidence that the cases have a common source for their illnesses ( 17 , 37 ). Concurrent or recent nonhuman isolates (e.g., food isolates) that match human case isolates by sequencing will be considered even more likely to be related to the human cases and become a priori hypotheses during investigations.
Foodborne-outbreak investigation methods are constantly evolving. Food production, processing, and distribution are changing to meet consumer demands. Outbreak investigations are more complex, given that laboratory methods for subtyping, strategies for epidemiologic investigation, and environmental assessments are also changing. Rapid investigation is essential, because with mass production and distribution, food safety errors can cause large and widespread outbreaks. Outbreak investigations balance the need for expediency to implement control measures with the need for accuracy. If hastily developed hypotheses are incorrect or insufficiently refined, analytical studies are unlikely to succeed and can waste time and resources. Alternatively, a refined hypothesis can lead directly to effective public health interventions, sometimes bypassing the need for an analytical study, if accompanied with other compelling evidence, such as laboratory evidence or traceback information.
Effectively and swiftly sharing data across jurisdictions increases an investigation team’s ability to quickly develop hypotheses and implicate food vehicles. Successful investigations depend on including the correct hypothesis, the result of a systematic approach to hypothesis generation. The exact path to identifying a hypothesis is rarely the same between outbreaks. Therefore, investigators should be familiar with different hypothesis-generating strategies and be flexible in deciding which strategies to employ.
Author affiliations: Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado, United States (Alice E. White, Elaine Scallan Walter); Minnesota Department of Health, St. Paul, Minnesota, United States (Kirk E. Smith, Carlota Medus); Washington State Department of Health, Tumwater, Washington, United States (Hillary Booth); and Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging Zoonotic and Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States (Robert V. Tauxe, Laura Gieraltowski).
This work was funded in part by the Colorado and Minnesota Integrated Food Safety Centers of Excellence, which are supported by the Epidemiology and Laboratory Capacity for Infectious Disease Cooperative Agreement through the Centers for Disease Control and Prevention.
Conflict of interest: none declared.
Scallan E, Hoekstra RM, Angulo FJ, et al. Foodborne illness acquired in the United States—major pathogens. Emerg Infect Dis. 2011;17(1):7–15.
Tauxe RV. Surveillance and investigation of foodborne diseases; roles for public health in meeting objectives for food safety. Food Control. 2002;13(6-7):363–369.
Dewey-Mattia D, Manikonda K, Hall AJ, et al. Surveillance for foodborne disease outbreaks—United States, 2009–2015. MMWR Morb Mortal Wkly Rep. 2018;67(10):1–11.
Behravesh CB, Blaney D, Medus C, et al. Multistate outbreak of Salmonella serotype typhimurium infections associated with consumption of restaurant tomatoes, USA, 2006: hypothesis generation through case exposures in multiple restaurant clusters. Epidemiol Infect. 2012;140(11):2053–2061.
Boulton ML, Rosenberg LD. Food safety epidemiology capacity in state health departments—United States, 2010. MMWR Morb Mortal Wkly Rep. 2011;60(50):1701–1704.
Porta MA. A Dictionary of Epidemiology. 5th ed. New York, NY: Oxford University Press; 2008(4):82.
Centers for Disease Control and Prevention. National Outbreak Reporting System Dashboard. https://wwwn.cdc.gov/norsdashboard/. Updated December 7, 2018. Accessed April 9, 2021.
Lampel KA, Al-Khaldi S, Cahill SM, eds. Bad Bug Book, Foodborne Pathogenic Microorganisms and Natural Toxins. 2nd ed. Washington, DC: Food and Drug Administration; 2012.
Centers for Disease Control and Prevention. An Atlas of Salmonella in the United States, 1968–2011: Laboratory-Based Enteric Disease Surveillance. Atlanta, GA: US Department of Health and Human Services, CDC; 2013. https://www.cdc.gov/salmonella/pdf/salmonella-atlas-508c.pdf. Accessed April 9, 2021.
Interagency Food Safety Analytics Collaboration. Foodborne Illness Source Attribution Estimates for 2017 for Salmonella, Escherichia coli O157, Listeria monocytogenes, and Campylobacter Using Multi-Year Outbreak Surveillance Data, United States. Atlanta, GA and Washington DC: US Department of Health and Human Services; 2019. https://www.cdc.gov/foodsafety/ifsac/pdf/P19-2017-report-TriAgency-508-archived.pdf. Accessed April 9, 2021.
Friedman CR, Hoekstra RM, Samuel M, et al. Risk factors for sporadic Campylobacter infection in the United States: a case-control study in FoodNet sites. Clin Infect Dis. 2004;38(suppl 3):S285–S296.
Varma J, Samuel M, Marcus R, et al. Listeria monocytogenes infection from foods prepared in a commercial establishment: a case-control study of potential sources of sporadic illness in the United States. Clin Infect Dis. 2007;44(4):521–528.
Jackson BR, Griffin PM, Cole D, et al. Outbreak-associated Salmonella enterica serotypes and food commodities, United States, 1998–2008. Emerg Infect Dis. 2013;19(8):1239–1244.
Brown AC, Grass JE, Richardson LC, et al. Antimicrobial resistance in Salmonella that caused foodborne disease outbreaks: United States, 2003–2012. Epidemiol Infect. 2017;145(4):766–774.
Centers for Disease Control and Prevention. Multistate outbreak of E. coli O157:H7 infections linked to romaine lettuce. https://www.cdc.gov/ecoli/2018/o157h7-04-18/index.html. Published June 28, 2018. Accessed August 6, 2020.
Centers for Disease Control and Prevention. Outbreak of E. coli infections linked to romaine lettuce. https://www.cdc.gov/ecoli/2019/o157h7-11-19/index.html. Published January 15, 2020. Accessed August 6, 2020.
Besser JM, Carleton HA, Trees E, et al. Interpretation of whole-genome sequencing for enteric disease surveillance and outbreak investigation. Foodborne Pathog Dis. 2019;16(7):504–512.
Sotir MJ, Ewald G, Kimura AC, et al. Outbreak of Salmonella Wandsworth and Typhimurium infections in infants and toddlers traced to a commercial vegetable-coated snack food. Pediatr Infect Dis J. 2009;28(12):1041–1046.
White A, Cronquist A, Bedrick E, et al. Food source prediction of Shiga toxin-producing Escherichia coli outbreaks using demographic and outbreak characteristics, United States, 1998–2014. Foodborne Pathog Dis. 2016;13(10):527–534.
Shiferaw B, Verrill L, Booth H, et al. Sex-based differences in food consumption: Foodborne Diseases Active Surveillance Network (FoodNet) Population Survey, 2006–2007. Clin Infect Dis. 2012;54(suppl 5):S453–S457.
Ferguson DD, Scheftel J, Cronquist A, et al. Temporally distinct Escherichia coli O157 outbreaks associated with alfalfa sprouts linked to a common seed source—Colorado and Minnesota, 2003. Epidemiol Infect. 2005;133(3):439–447.
Tauxe RV. Emerging foodborne diseases: an evolving public health challenge. Emerg Infect Dis. 1997;3(4):425–434.
Public Health Agency of Canada. Public Health Notice—outbreak of Salmonella infections linked to Celebrate brand frozen classic/classical and egg nog flavoured profiteroles (cream puffs) and mini chocolate eclairs. https://www.canada.ca/en/public-health/services/public-health-notices/2019/outbreak-salmonella.html. Published June 27, 2019. Accessed August 6, 2020.
Mba-Jonas A, Culpepper W, Hill T, et al. A multistate outbreak of human Salmonella Agona infections associated with consumption of fresh, whole papayas imported from Mexico—United States, 2011. Clin Infect Dis. 2018;66(11):1756–1761.
Hedberg C. Guidelines for Foodborne Disease Outbreak Response. 3rd ed. Atlanta, GA: Council to Improve Foodborne Outbreak Response (CIFOR); 2020.
Centers for Disease Control and Prevention. Foodborne disease outbreak investigation and surveillance tools. https://www.cdc.gov/foodsafety/outbreaks/surveillance-reporting/investigation-toolkit.html. Reviewed June 10, 2021. Accessed July 2, 2021.
Meyer SD, Kirk SE, Hedberg CH. Chapter 7.2—Surveillance for foodborne diseases, part 2: investigation of foodborne disease outbreaks. In: M'ikanatha NM, Lynfield R, Van Beneden CA, et al. eds. Infectious Disease Surveillance. 5th ed. West Sussex, UK: Wiley-Blackwell; 2013:120–128.
Chai S, Gu W, O'Connor KA, et al. Incubation periods of enteric illnesses in foodborne outbreaks, United States, 1998-2013. Epidemiol Infect. 2019;147:e285.
Angelo KM, Conrad AR, Saupe A, et al. Multistate outbreak of Listeria monocytogenes infections linked to whole apples used in commercially produced, prepackaged caramel apples: United States, 2014-2015. Epidemiol Infect. 2017;145(5):848–856.
Møller FT, Mølbak K, Ethelberg S. Analysis of consumer food purchase data used for outbreak investigations, a review. Euro Surveill. 2018;23(24):1700503.
Gieraltowski L, Julian E, Pringle J, et al. Nationwide outbreak of Salmonella Montevideo infections associated with contaminated imported black and red pepper: warehouse membership cards provide critical clues to identify the source. Epidemiol Infect. 2013;141(6):1244–1252.
Ickert C, Cheng J, Reimer D, et al. Methods for generating hypotheses in human enteric illness outbreak investigations: a scoping review of the evidence. Epidemiol Infect. 2019;147:e280.
Jervis RH, Booth H, Cronquist AB, et al. Moving away from population-based case-control studies during outbreak investigations. J Food Prot. 2019;82(8):1412–1416.
Keene W. The use of binomial probabilities in outbreak investigations (abstract). Presented at the Annual OutbreakNet Conference, Long Beach, California; September 22, 2011.
McCollum JT, Cronquist AB, Silk BJ, et al. Multistate outbreak of listeriosis associated with cantaloupe. N Engl J Med. 2013;369(10):944–953.
Centers for Disease Control and Prevention. National Listeria Surveillance: Listeria initiative. https://www.cdc.gov/nationalsurveillance/listeria-surveillance.html. Published September 13, 2018. Accessed August 6, 2020.
Jackson BR, Tarr C, Strain E, et al. Implementation of nationwide real-time whole-genome sequencing to enhance listeriosis outbreak detection and investigation. Clin Infect Dis. 2016;63(3):380–386.
Sharapov UM, Wendel AM, Davis JP, et al. Multistate outbreak of Escherichia coli O157:H7 infections associated with consumption of fresh spinach: United States, 2006. J Food Prot. 2016;79(12):2024–2030.
Neil KP, Biggerstaff G, MacDonald JK, et al. A novel vehicle for transmission of Escherichia coli O157:H7 to humans: multistate outbreak of E. coli O157:H7 infections associated with consumption of ready-to-bake commercial prepackaged cookie dough—United States, 2009. Clin Infect Dis. 2012;54(4):511–518.
Miller BD, Rigdon CE, Ball J, et al. Use of traceback methods to confirm the source of a multistate Escherichia coli O157:H7 outbreak due to in-shell hazelnuts. J Food Prot. 2012;75(2):320–327.
Medus C, Meyer S, Smith K, et al. Multistate outbreak of Salmonella infections associated with peanut butter and peanut butter-containing products—United States, 2008–2009. MMWR Morb Mortal Wkly Rep. 2009;58(4):85–90.
Gambino-Shirley KJ, Tesfai A, Schwensohn CA, et al. Multistate outbreak of Salmonella Virchow infections linked to a powdered meal replacement product—United States, 2015–2016. Clin Infect Dis. 2018;67(6):890–896.
Centers for Disease Control and Prevention. Multistate outbreak of Salmonella infections linked to kratom. https://www.cdc.gov/salmonella/kratom-02-18/index.html. 2018. Published February 20, 2018. Accessed September 14, 2020.
Centers for Disease Control and Prevention. Multistate outbreak of Salmonella infections linked to kratom. https://www.cdc.gov/nationalsurveillance/listeria-surveillance.html. Last reviewed September 13, 2018. Accessed July 2, 2021.
Hypothesis testing involves formulating assumptions about population parameters based on sample statistics and rigorously evaluating these assumptions against empirical evidence. This article sheds light on the significance of hypothesis testing and the critical steps involved in the process.
A hypothesis is an assumption or idea, specifically a statistical claim about an unknown population parameter. For example, a judge assumes a person is innocent and verifies this by reviewing evidence and hearing testimony before reaching a verdict.
Hypothesis testing is a statistical method used to make a statistical decision from experimental data. It starts from an assumption about a population parameter and evaluates two mutually exclusive statements about the population to determine which statement is best supported by the sample data.
Hypothesis testing is used to test the validity of a claim or assumption about a population parameter.
Example: You claim that the average height in the class is 30, or that boys are taller than girls. Each of these is an assumption, and we need a statistical way to check whether the data support it.
Hypothesis testing is an important procedure in statistics. It evaluates two mutually exclusive population statements to determine which statement is most supported by sample data. When we say that findings are statistically significant, it is thanks to hypothesis testing.
A one-tailed test focuses on one direction, either greater than or less than a specified value. We use a one-tailed test when there is a clear directional expectation based on prior knowledge or theory. The critical region is located on only one side of the distribution curve. If the test statistic falls into this critical region, the null hypothesis is rejected in favor of the alternative hypothesis.
There are two types of one-tailed test: a left-tailed test, where the alternative hypothesis states that the parameter is less than the specified value, and a right-tailed test, where it states that the parameter is greater than the specified value.
A two-tailed test considers both directions, greater than and less than a specified value. We use a two-tailed test when there is no specific directional expectation and we want to detect any significant difference.
Example: [Tex]H_0: \mu = 50[/Tex] and [Tex]H_1: \mu \neq 50[/Tex]
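The difference between one-tailed and two-tailed tests can be seen by computing both p-values for the same test statistic. The sketch below uses Python's standard-library `statistics.NormalDist` and assumes the statistic follows a standard normal distribution. The value z = 1.8 and the helper name `p_values` are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def p_values(z: float) -> dict:
    """Left-, right- and two-tailed p-values for a z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    left = std_normal.cdf(z)         # P(Z <= z), used for H1: mu < value
    right = 1 - std_normal.cdf(z)    # P(Z >= z), used for H1: mu > value
    two_sided = 2 * min(left, right) # used for H1: mu != value
    return {"left": left, "right": right, "two_sided": two_sided}

z = 1.8       # hypothetical test statistic
alpha = 0.05  # significance level
result = p_values(z)

print(result)
print("right-tailed test rejects H0:", result["right"] < alpha)
print("two-tailed test rejects H0:", result["two_sided"] < alpha)
```

With these illustrative numbers the right-tailed test rejects the null hypothesis while the two-tailed test does not, because the two-tailed test splits the rejection region across both sides of the distribution.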
In hypothesis testing, Type I and Type II errors are two possible errors that researchers can make when drawing conclusions about a population based on a sample of data. These errors are associated with the decisions made regarding the null hypothesis and the alternative hypothesis.
Decision | Null Hypothesis is True | Null Hypothesis is False |
---|---|---|
Fail to Reject Null Hypothesis (Accept) | Correct Decision | Type II Error (False Negative) |
Reject Null Hypothesis | Type I Error (False Positive) | Correct Decision |
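A small simulation can make the two error types in the table concrete. The sketch below is only an illustration under assumed settings: a two-tailed Z-test with known standard deviation, and hypothetical values for the null mean (50), the true mean when the null is false (55), the standard deviation (10), and the sample size (30). The helper names are hypothetical as well.

```python
import random
from statistics import NormalDist, mean

def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-tailed z-test: True if H0 (mean == mu0) is rejected."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_mu, mu0=50.0, sigma=10.0, n=30, trials=2000, seed=0):
    """Fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma):
            rejections += 1
    return rejections / trials

# H0 is true: every rejection is a Type I error (false positive)
type_i_rate = rejection_rate(true_mu=50.0)
# H0 is false: every failure to reject is a Type II error (false negative)
type_ii_rate = 1 - rejection_rate(true_mu=55.0)

print(f"Type I error rate:  {type_i_rate:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f}")
```

Under these assumptions the Type I error rate should land close to the chosen significance level, while the Type II error rate depends on how far the true mean is from the null value and on the sample size.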
Step 1: Define null and alternative hypotheses
State the null hypothesis ( [Tex]H_0 [/Tex] ), representing no effect, and the alternative hypothesis ( [Tex]H_1 [/Tex] ), suggesting an effect or difference.
We first identify the problem about which we want to make an assumption, keeping in mind that the two hypotheses should contradict one another, and assuming normally distributed data.
Select a significance level ( [Tex]\alpha [/Tex] ), typically 0.05, to determine the threshold for rejecting the null hypothesis. It sets how much evidence we require before rejecting the null hypothesis. We usually fix the significance level before running the test. The p-value obtained from the test is then compared against it to decide whether the result is significant.
Gather relevant data through observation or experimentation. Analyze the data using appropriate statistical methods to obtain a test statistic.
The data for the test are evaluated in this step: we compute a score based on the characteristics of the data. The choice of the test statistic depends on the type of hypothesis test being conducted.
There are various hypothesis tests, each appropriate for a different goal. This could be a Z-test , Chi-square , T-test , and so on.
If we have a smaller dataset, a T-test is more appropriate to test our hypothesis.
T-statistic is a measure of the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.
In this stage, we decide whether we should accept the null hypothesis or reject the null hypothesis. There are two ways to make this decision.
Method A: Compare the test statistic with the tabulated critical value. If the test statistic is more extreme than the critical value, reject the null hypothesis; otherwise, do not reject it.
Note: Critical values are predetermined threshold values that are used to make a decision in hypothesis testing. To determine critical values for hypothesis testing, we typically refer to a statistical distribution table , such as the normal distribution or t-distribution table.
Method B: We can also come to a conclusion using the p-value. If the p-value is less than or equal to the significance level, reject the null hypothesis; otherwise, do not reject it.
Note : The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed in the sample, assuming the null hypothesis is true. To determine the p-value for hypothesis testing, we typically refer to a statistical distribution table , such as the normal distribution or t-distribution table.
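Critical values can also be computed instead of being read from a table. The sketch below is a minimal illustration for the standard normal distribution, using `statistics.NormalDist.inv_cdf` from the Python standard library. The helper names `critical_value` and `decide` are hypothetical, and a two-tailed test is assumed in `decide`.

```python
from statistics import NormalDist

def critical_value(alpha: float, two_tailed: bool = True) -> float:
    """Positive critical value of the standard normal distribution."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

def decide(z: float, alpha: float = 0.05):
    """Return the two-tailed decision by critical value and by p-value."""
    reject_by_critical_value = abs(z) > critical_value(alpha)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_by_p_value = p_value < alpha
    return reject_by_critical_value, reject_by_p_value

crit_two = critical_value(0.05, two_tailed=True)   # about 1.96
crit_one = critical_value(0.05, two_tailed=False)  # about 1.645

print(f"two-tailed critical value: {crit_two:.3f}")
print(f"one-tailed critical value: {crit_one:.3f}")
print("z = 2.5:", decide(2.5))
print("z = 1.0:", decide(1.0))
```

Both decision rules are two views of the same comparison, so they lead to the same conclusion for a given test statistic and significance level.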
Finally, we can draw our conclusion using Method A or Method B.
To validate a hypothesis about a population parameter, we use statistical functions. For normally distributed data, the z-score, the p-value, and the level of significance (alpha) together provide the evidence for or against the hypothesis.
Z-test: used when the population mean and standard deviation are known.
[Tex]z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}[/Tex]
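T-test: used when n < 30 and the population standard deviation is unknown, so the sample standard deviation s is used instead.
The t-statistic is given by:
[Tex]t = \frac{\bar{x} - \mu}{s/\sqrt{n}}[/Tex]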
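The t-statistic formula can be checked numerically. This is a minimal sketch, assuming NumPy and SciPy are installed; the sample values and the hypothesized mean `mu_0` are made up for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (n < 30) and hypothesized population mean
sample = np.array([12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 12.2, 11.9])
mu_0 = 12.0

n = sample.size
x_bar = sample.mean()
s = sample.std(ddof=1)  # sample standard deviation (divides by n - 1)

# t = (x_bar - mu) / (s / sqrt(n))
t_manual = (x_bar - mu_0) / (s / np.sqrt(n))

# Same test via SciPy's one-sample t-test
t_scipy, p_value = stats.ttest_1samp(sample, popmean=mu_0)

print("t (manual):", t_manual)
print("t (scipy): ", t_scipy)
print("p-value:   ", p_value)
```

Using `ddof=1` matters here: NumPy's default (`ddof=0`) gives the population standard deviation, which would not match the formula for s.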
Chi-square test for independence: used for categorical (non-normally distributed) data.
[Tex]\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}[/Tex]
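The chi-square formula sums, over every cell of a contingency table, the squared gap between the observed count and the count expected under independence. The sketch below works through a hypothetical 2x2 table (the counts are invented for illustration) and compares the manual result with `scipy.stats.chi2_contingency`, with the continuity correction turned off so that the two agree.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table of observed counts O_ij
observed = np.array([[30, 10],
                     [20, 40]])

# Expected counts under independence: E_ij = row_total_i * col_total_j / grand_total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# chi^2 = sum over cells of (O_ij - E_ij)^2 / E_ij
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# SciPy applies Yates' continuity correction to 2x2 tables by default,
# so it is disabled here to match the plain formula.
chi2_scipy, p_value, dof, expected_scipy = stats.chi2_contingency(
    observed, correction=False
)

print("chi-square (manual):", chi2_manual)
print("chi-square (scipy): ", chi2_scipy)
print("degrees of freedom: ", dof)
print("p-value:            ", p_value)
```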
Let’s examine hypothesis testing using two real-life situations.
Imagine a pharmaceutical company has developed a new drug that they believe can effectively lower blood pressure in patients with hypertension. Before bringing the drug to market, they need to conduct a study to assess its impact on blood pressure.
Let’s set the significance level at 0.05, meaning we reject the null hypothesis if the evidence suggests less than a 5% chance of observing the results due to random variation alone.
Using a paired T-test, analyze the data to obtain a test statistic and a p-value.
The test statistic (here, the T-statistic) is calculated from the differences between blood pressure measurements before and after treatment.
t = m/(s/√n)
where m is the mean of the differences, s is the sample standard deviation of the differences, and n is the number of pairs. Here, m = -3.9, s ≈ 1.37, and n = 10.
Applying the paired t-test formula gives a T-statistic of -9.
With a t-statistic of -9 and degrees of freedom df = 9, the p-value can be found using statistical software or a t-distribution table.
Thus, p-value = 8.538051223166285e-06
Step 5: Result
Conclusion: Since the p-value (8.538051223166285e-06) is less than the significance level (0.05), the researchers reject the null hypothesis. There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.
Let’s implement hypothesis testing in Python, testing whether a new drug affects blood pressure. For this example, we will use a paired T-test from the scipy.stats library.
SciPy is a Python library for scientific and mathematical computing.
We will implement our first real-life problem in Python:
```python
import numpy as np
from scipy import stats

# Data
before_treatment = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
after_treatment = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

# Step 1: Null and Alternate Hypotheses
# Null Hypothesis: The new drug has no effect on blood pressure.
# Alternate Hypothesis: The new drug has an effect on blood pressure.
null_hypothesis = "The new drug has no effect on blood pressure."
alternate_hypothesis = "The new drug has an effect on blood pressure."

# Step 2: Significance Level
alpha = 0.05

# Step 3: Paired T-test
t_statistic, p_value = stats.ttest_rel(after_treatment, before_treatment)

# Step 4: Calculate T-statistic manually
m = np.mean(after_treatment - before_treatment)
s = np.std(after_treatment - before_treatment, ddof=1)  # using ddof=1 for sample standard deviation
n = len(before_treatment)
t_statistic_manual = m / (s / np.sqrt(n))

# Step 5: Decision
if p_value <= alpha:
    decision = "Reject"
else:
    decision = "Fail to reject"

# Conclusion
if decision == "Reject":
    conclusion = "There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different."
else:
    conclusion = "There is insufficient evidence to claim a significant difference in average blood pressure before and after treatment with the new drug."

# Display results
print("T-statistic (from scipy):", t_statistic)
print("P-value (from scipy):", p_value)
print("T-statistic (calculated manually):", t_statistic_manual)
print(f"Decision: {decision} the null hypothesis at alpha={alpha}.")
print("Conclusion:", conclusion)
```
T-statistic (from scipy): -9.0
P-value (from scipy): 8.538051223166285e-06
T-statistic (calculated manually): -9.0
Decision: Reject the null hypothesis at alpha=0.05.
Conclusion: There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.
In the above example, given the T-statistic of approximately -9 and an extremely small p-value, the results indicate a strong case to reject the null hypothesis at a significance level of 0.05.
Data: A sample of 25 individuals is taken, and their cholesterol levels are measured.
Cholesterol Levels (mg/dL): 205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205.
Population Mean (μ): 200 mg/dL
Population Standard Deviation (σ): 5 mg/dL (given for this problem)
As the direction of deviation is not given, we assume a two-tailed test. Based on the normal distribution table (z-table), the critical values for a significance level of 0.05 (two-tailed) are approximately -1.96 and 1.96.
The sample mean of the 25 measurements is 202.04 mg/dL. The test statistic is calculated using the z formula, [Tex]Z = \frac{202.04 - 200}{5 / \sqrt{25}}[/Tex], which gives Z ≈ 2.04.
Step 4: Result
Since the absolute value of the test statistic (2.04) is greater than the critical value (1.96), we reject the null hypothesis and conclude that there is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.
```python
import math

import numpy as np
import scipy.stats as stats

# Given data
sample_data = np.array(
    [205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208,
     200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205])
population_std_dev = 5
population_mean = 200
sample_size = len(sample_data)

# Step 1: Define the Hypotheses
# Null Hypothesis (H0): The average cholesterol level in a population is 200 mg/dL.
# Alternate Hypothesis (H1): The average cholesterol level in a population is different from 200 mg/dL.

# Step 2: Define the Significance Level
alpha = 0.05  # Two-tailed test

# Critical values for a significance level of 0.05 (two-tailed)
critical_value_left = stats.norm.ppf(alpha / 2)
critical_value_right = -critical_value_left

# Step 3: Compute the test statistic
sample_mean = sample_data.mean()
z_score = (sample_mean - population_mean) / \
    (population_std_dev / math.sqrt(sample_size))

# Step 4: Result
# Check if the absolute value of the test statistic is greater than the critical values
if abs(z_score) > max(abs(critical_value_left), abs(critical_value_right)):
    print("Reject the null hypothesis.")
    print("There is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.")
else:
    print("Fail to reject the null hypothesis.")
    print("There is not enough evidence to conclude that the average cholesterol level in the population is different from 200 mg/dL.")
```
Reject the null hypothesis.
There is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.
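The cholesterol example reaches its decision by comparing the test statistic with critical values. The same decision can be reached through the p-value. The sketch below is one way to do that using only the Python standard library; it takes the z-score of about 2.04 from the example as given and relies on the identity P(|Z| ≥ z) = erfc(z/√2) for a standard normal variable.

```python
import math

z_score = 2.04  # test statistic from the cholesterol example (rounded)
alpha = 0.05    # significance level used in the example

# Two-tailed p-value under the standard normal distribution
p_value = math.erfc(abs(z_score) / math.sqrt(2))

decision = "Reject" if p_value <= alpha else "Fail to reject"

print(f"p-value: {p_value:.4f}")
print(f"{decision} the null hypothesis at alpha={alpha}.")
```

The p-value comes out slightly below 0.05, which agrees with the test statistic falling just beyond the critical value of 1.96.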
Hypothesis testing stands as a cornerstone in statistical analysis, enabling data scientists to navigate uncertainties and draw credible inferences from sample data. By systematically defining null and alternative hypotheses, choosing significance levels, and leveraging statistical tests, researchers can assess the validity of their assumptions. The article also elucidates the critical distinction between Type I and Type II errors, providing a comprehensive understanding of the nuanced decision-making process inherent in hypothesis testing. The real-life example of testing a new drug’s effect on blood pressure using a paired T-test showcases the practical application of these principles, underscoring the importance of statistical rigor in data-driven decision-making.
1. What are the three types of hypothesis tests?
There are three types of hypothesis tests: right-tailed, left-tailed, and two-tailed. Right-tailed tests assess if a parameter is greater, left-tailed if lesser. Two-tailed tests check for non-directional differences, greater or lesser.
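The difference between the three kinds of test shows up in how the p-value is computed from the same test statistic. The following is a minimal sketch using the standard library's `statistics.NormalDist`; the function name `z_test_p_value` and the labels `"greater"`, `"less"`, and `"two-sided"` are choices made for this example.

```python
from statistics import NormalDist


def z_test_p_value(z, alternative="two-sided"):
    """p-value of a z statistic for a right-, left-, or two-tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":    # right-tailed: is the parameter greater?
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # left-tailed: is the parameter smaller?
        return std_normal.cdf(z)
    if alternative == "two-sided":  # two-tailed: is it different in either direction?
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


z = 2.04  # illustrative test statistic
p_right = z_test_p_value(z, "greater")
p_left = z_test_p_value(z, "less")
p_two = z_test_p_value(z, "two-sided")

print(f"right-tailed: {p_right:.4f}")
print(f"left-tailed:  {p_left:.4f}")
print(f"two-tailed:   {p_two:.4f}")
```

For a positive statistic, the two-tailed p-value is twice the right-tailed one, so a result can be significant in a one-tailed test and not in a two-tailed test at the same significance level.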
Null Hypothesis ( [Tex]H_0 [/Tex] ): No effect or difference exists.
Alternative Hypothesis ( [Tex]H_1 [/Tex] ): An effect or difference exists.
Significance Level ( [Tex]\alpha [/Tex] ): Risk of rejecting the null hypothesis when it is true (Type I error).
Test Statistic: Numerical value representing the observed evidence against the null hypothesis.
Hypothesis testing in machine learning is a statistical method for evaluating the performance and validity of models. It tests specific hypotheses about model behavior, such as whether features influence predictions or whether a model generalizes well to unseen data.
Pytest is a general-purpose testing framework for Python code, while Hypothesis is a property-based testing framework for Python that generates test cases from specified properties of the code.
Hypothesis generation
Generating hypotheses is an important, but often challenging, step in an outbreak investigation. When generating hypotheses, it is best to keep an open mind and to cast a wide net. A good starting place is to identify exposures that have previously been associated with the pathogen under investigation. This can be done by:
If the case definition for the illnesses under investigation includes laboratory information in the form of Whole Genome Sequencing (WGS) results, consider investigating where and when the sequence has been seen before. Provincial and federal public health laboratories maintain WGS databases that can contain valuable information for outbreak investigation purposes. PulseNet Canada can provide information about how common or rare the serotype or sequence is nationally, where and when it was last seen, and if it has been detected in any food samples in the past. PulseNet Canada will also be able to check the United States’ PulseNet WGS databases for matches. FoodNet Canada can provide information about whether the sequence has previously been seen in farm or retail samples from its sentinel sites.
While it is important to gather such historical information, the most effective way to generate a high-quality hypothesis is to identify common exposures amongst cases. This can be achieved by interviewing cases using a hypothesis generating questionnaire and analysing exposures.
Hypothesis generating questionnaires (or shotgun questionnaires) are intended to obtain detailed information on what a person’s exposures were in the days leading up to their illness. They are typically quite long and ask about many exposures such as travel history, contact with animals, restaurants, events attended, and a comprehensive food history. The time period of interest varies between pathogens, as the exposure period is equal to the maximum incubation period of the pathogen.
When designing a questionnaire, it is important to ensure that the questions are gathering the intended information. Questions should be concise, informal, and specific. Before interviewing cases, questionnaires should be tested to ensure clarity and identify any potential errors.
Once the questionnaire is developed and piloted, it should be administered to cases in a consistent and unbiased manner. Case interviews can be conducted by one or multiple interviewers. A centralized approach allows a single interviewer to standardize interviews, detect patterns, and probe for items of interest. However, a multiple-interviewer approach is more time-efficient and allows for multiple perspectives when it comes time to identify the source.
Although case interviewing is an important outbreak investigation tool, it is not without its challenges. By the time the outbreak team is ready to conduct the interview, it could be weeks to months after the onset of symptoms. It is difficult for people to recall what they ate over a month ago. Sometimes cases might need to be interviewed multiple times as the hypothesis is developed and refined.
Once the interviews are complete, the data can be entered into a database or line list. The frequency of exposures for the cases is then obtained (e.g., % of cases that consumed each food item).
It is tempting to conclude that the most commonly consumed food items are the most likely suspects, but it is possible that these foods are commonly consumed amongst the general population as well. What is needed is a baseline proportion against which to compare the exposure frequencies. Reference population studies, such as the CDC Food Atlas, the Nesbitt Waterloo study and Foodbook (see Tools), can be used for this purpose. These studies provide investigators with the expected food frequencies based on 7-day food histories from thousands of respondents. These data can be used as a point of comparison for questionnaire data to identify exposures, such as food items, with higher than expected frequencies. Statistical tests (e.g., binomial probability tests) can then be used to test whether the proportion of cases exposed is significantly different from the proportion of “controls” (i.e., people included in the population studies) exposed (see Tools).
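The binomial probability comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the case counts and the 30% reference proportion are hypothetical, the function name `binomial_upper_tail` is invented for this example, and the toolkit's own calculation tool may work differently. The idea is to ask how likely it would be to see at least k exposed cases out of n if cases ate the food at the same rate as the reference population.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more exposed
    cases out of n if each case is exposed with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 8 of 10 interviewed cases report eating a food item
# that 30% of respondents in a reference population study reported eating.
cases_exposed = 8
cases_interviewed = 10
expected_proportion = 0.30

p_value = binomial_upper_tail(cases_exposed, cases_interviewed, expected_proportion)

print(f"Observed proportion among cases: {cases_exposed / cases_interviewed:.0%}")
print(f"Probability of this many or more by chance: {p_value:.5f}")
```

A small probability flags the food item for follow-up; it does not by itself implicate the item, for the reasons discussed in the surrounding text.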
There are many limitations to using expected food frequencies, such as some studies not accounting for:
Further, since specific questions differ among surveys, it is often difficult to find the most appropriate comparison group. For example, the CDC Atlas of Exposures differentiates between hamburgers eaten at home or outside the home, while questionnaires used in investigations typically do not. Such differences in food definitions can make it challenging to determine which reference variable is the most appropriate to use as an “expected” level.
It is important to keep in mind that some foods with high expected consumption levels (e.g., chicken) may not flag statistically, but could still be potential sources. Further, there are other common exposures amongst cases that can carry important clues about the source of the outbreak. Cases that report common restaurants, events, or grocery stores can be considered sub-clusters. These sub-clusters should be investigated thoroughly by obtaining menus, receipts, or shopper card information if possible.
Toolkit binomial probability calculation tool for food exposures
Toolkit Outbreak Summaries overview
CDC National Outbreak Reporting System (NORS) Dashboard
Food Consumption Patterns in the Waterloo Region
CDC Food Atlas 2006-2007
FoodNet Canada Reports and Publications
CDC FoodNet Reports
Marler Clark Foodborne Illness Outbreak Database
FDA Foodborne Illness-Causing Organisms Cheat Sheet
CFIA: Canada’s 10 Least Wanted Foodborne Pathogens
Foodbook: Canadian Food Exposure Study to Strengthen Outbreak Response
Toolkit outbreak response database*
*Due to the Government of Canada’s Standard on Web Accessibility, this tool cannot be posted, but it is available upon request. Please contact us at [email protected] to request a copy. Please let us know if you need support or an accessible format.
Write down all the hypotheses and assumptions as a starting point for the project.
Hypothesis generation is a quick exercise that allows the team to reflect on all the already-known assumptions and insights related to user needs and behaviours, share them amongst team members, and derive initial ideas for service experiences or features that could be offered.
Ground the first step of the project on existing knowledge.
Put everything on the table, without hiding or saving ideas for later.
COMMENTS
Describe the definition, properties, and life cycle of a hypothesis. Describe relationships between a hypothesis and a theory, a model, and data. Categorize and explain research questions that provide hints for hypothesis generation. Explain how to visualize data and analysis results.
However, a review of the literature reveals the lack of original studies on hypothesis generation in clinical research. The authors then summarize their first human participant study exploring data-driven hypothesis generation by clinical researchers in a simulated setting.
Formulating Hypotheses for Different Study Designs. Generating a testable working hypothesis is the first step towards conducting original research. Such research may prove or disprove the proposed hypothesis. Case reports, case series, online surveys and other observational studies, clinical trials, and narrative reviews help to generate ...
The paradigm of hypothesis-generating research does not replace or undermine hypothesis-testing modes of research; instead, it complements them and has facilitated discoveries that may not have been possible with hypothesis-testing research. The hypothesis-generating mode of research has been primarily practiced in basic science but has ...
Hypothesis generation is the formation of guesses as to what the segment of code does; this step can also guide a re-segmentation of the code. Finally, verification is the process of examining the code and associated documentation to determine the consistency of the code with the current hypotheses.
A hypothesis (from the Greek, foundation) is a logical construct, interposed between a problem and its solution, which represents a proposed answer to a research question. It gives direction to the investigator's thinking about the problem and, therefore, facilitates a solution. Unlike facts and assumptions (presumed true and, therefore, not ...
A hypothesis is a tentative statement about the relationship between two or more variables. Explore examples and learn how to format your research hypothesis.
Hypothesis Generation is a literature-based discovery approach that utilizes existing literature to automatically generate implicit biomedical associations and provide reasonable predictions for future research. Despite its potential, current hypothesis generation methods face challenges when applied to research on biological mechanisms.
While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a systematic procedure to generate novel hypotheses about human behavior, which uses the capacity of machine learning algorithms to notice patterns people might not.
The hypothesis-generating mode of research has been primarily practiced in basic science but has recently been extended to clinical-translational work as well. Just as in basic science, this approach to research can facilitate insights into human health and disease mechanisms and provide the crucially needed data set of the full spectrum of ...
Hypothesis generation is a key step in data science projects. Here's a case study on hypothesis generation for data science.
Definition: A hypothesis is an educated guess or proposed explanation for a phenomenon, based on some initial observations or data. It is a tentative statement that can be tested and potentially proven or disproven through further investigation and experimentation. Hypotheses are often used in scientific research to guide the design of experiments ...
A hypothesis ( pl.: hypotheses) is a proposed explanation for a phenomenon. For a hypothesis to be a scientific hypothesis, the scientific method requires that one can test it. Scientists generally base scientific hypotheses on previous observations that cannot satisfactorily be explained with the available scientific theories. Even though the words "hypothesis" and "theory" are often used ...
The formulation and testing of a hypothesis is part of the scientific method, the approach scientists use when attempting to understand and test ideas about natural phenomena. The generation of a hypothesis frequently is described as a creative process and is based on existing scientific knowledge, intuition, or experience.
Therefore, the term 'hypothesis-generating' in this study refers to the abductive thinking process of formulating a set of propositions proposed as a tentative causal explanation for an observed ...
Hypothesis-generating method: A data-structuring technique, such as a classification and ordination method which, by grouping and ranking data, suggests possible relationships with other factors (i.e. generates a hypothesis). Appropriate data may then be collected to test the hypothesis statistically.
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.
Explore hypothesis testing, a fundamental method in data analysis. Understand how to use it to draw accurate conclusions and make informed decisions.
Hypothesis testing is an important procedure in statistics. Hypothesis testing evaluates two mutually exclusive population statements to determine which statement is most supported by sample data. When we say that findings are statistically significant, it is thanks to hypothesis testing.
Park et al. highlighted the effectiveness of ionizers equipped to minimize ozone production, focusing on the generation of ions as the main method of achieving bactericidal effects. The health and safety issues stemming from ozone generation by air ionizers degrading indoor air quality are often emphasized. Ozone, as a strong oxidant, can ...