What is a Hypothesis – Types, Examples and Writing Guide

What is a Hypothesis

Definition:

A hypothesis is an educated guess or proposed explanation for a phenomenon, based on initial observations or data. It is a tentative statement that can be tested and potentially supported or refuted through further investigation and experimentation.

Hypotheses are often used in scientific research to guide the design of experiments and the collection and analysis of data. They are an essential element of the scientific method, as they allow researchers to make predictions about the outcomes of their experiments and to test those predictions to determine their accuracy.

Types of Hypothesis

Types of Hypothesis are as follows:

Research Hypothesis

A research hypothesis is a statement that predicts a relationship between variables. It is usually formulated as a specific statement that can be tested through research, and it is often used in scientific research to guide the design of experiments.

Null Hypothesis

The null hypothesis is a statement that assumes there is no significant difference or relationship between variables. It is often used as a starting point for testing the research hypothesis, and if the results of the study reject the null hypothesis, it suggests that there is a significant difference or relationship between variables.

Alternative Hypothesis

An alternative hypothesis is a statement that assumes there is a significant difference or relationship between variables. It is often used as an alternative to the null hypothesis and is tested against the null hypothesis to determine which statement is more accurate.

Directional Hypothesis

A directional hypothesis is a statement that predicts the direction of the relationship between variables. For example, a researcher might predict that increasing the amount of exercise will result in a decrease in body weight.

Non-directional Hypothesis

A non-directional hypothesis is a statement that predicts the relationship between variables but does not specify the direction. For example, a researcher might predict that there is a relationship between the amount of exercise and body weight, but they do not specify whether increasing or decreasing exercise will affect body weight.

Statistical Hypothesis

A statistical hypothesis is a statement that assumes a particular statistical model or distribution for the data. It is often used in statistical analysis to test the significance of a particular result.

Composite Hypothesis

A composite hypothesis is a statement that assumes more than one condition or outcome. It can be divided into several sub-hypotheses, each of which represents a different possible outcome.

Empirical Hypothesis

An empirical hypothesis is a statement that is based on observed phenomena or data. It is often used in scientific research to develop theories or models that explain the observed phenomena.

Simple Hypothesis

A simple hypothesis is a statement that assumes only one outcome or condition. It is often used in scientific research to test a single variable or factor.

Complex Hypothesis

A complex hypothesis is a statement that assumes multiple outcomes or conditions. It is often used in scientific research to test the effects of multiple variables or factors on a particular outcome.

Applications of Hypothesis

Hypotheses are used in various fields to guide research and make predictions about the outcomes of experiments or observations. Here are some examples of how hypotheses are applied in different fields:

  • Science: In scientific research, hypotheses are used to test the validity of theories and models that explain natural phenomena. For example, a hypothesis might be formulated to test the effects of a particular variable on a natural system, such as the effects of climate change on an ecosystem.
  • Medicine: In medical research, hypotheses are used to test the effectiveness of treatments and therapies for specific conditions. For example, a hypothesis might be formulated to test the effects of a new drug on a particular disease.
  • Psychology: In psychology, hypotheses are used to test theories and models of human behavior and cognition. For example, a hypothesis might be formulated to test the effects of a particular stimulus on the brain or behavior.
  • Sociology: In sociology, hypotheses are used to test theories and models of social phenomena, such as the effects of social structures or institutions on human behavior. For example, a hypothesis might be formulated to test the effects of income inequality on crime rates.
  • Business: In business research, hypotheses are used to test the validity of theories and models that explain business phenomena, such as consumer behavior or market trends. For example, a hypothesis might be formulated to test the effects of a new marketing campaign on consumer buying behavior.
  • Engineering: In engineering, hypotheses are used to test the effectiveness of new technologies or designs. For example, a hypothesis might be formulated to test the efficiency of a new solar panel design.

How to write a Hypothesis

Here are the steps to follow when writing a hypothesis:

Identify the Research Question

The first step is to identify the research question that you want to answer through your study. This question should be clear, specific, and focused. It should be something that can be investigated empirically and that has some relevance or significance in the field.

Conduct a Literature Review

Before writing your hypothesis, it’s essential to conduct a thorough literature review to understand what is already known about the topic. This will help you to identify the research gap and formulate a hypothesis that builds on existing knowledge.

Determine the Variables

The next step is to identify the variables involved in the research question. A variable is any characteristic or factor that can vary or change. There are two types of variables: independent and dependent. The independent variable is the one that is manipulated or changed by the researcher, while the dependent variable is the one that is measured or observed as a result of the independent variable.

Formulate the Hypothesis

Based on the research question and the variables involved, you can now formulate your hypothesis. A hypothesis should be a clear and concise statement that predicts the relationship between the variables. It should be testable through empirical research and based on existing theory or evidence.

Write the Null Hypothesis

The null hypothesis is the opposite of the alternative hypothesis, which is the hypothesis that you are testing. The null hypothesis states that there is no significant difference or relationship between the variables. It is important to write the null hypothesis because it allows you to compare your results with what would be expected by chance.

Refine the Hypothesis

After formulating the hypothesis, it’s important to refine it and make it more precise. This may involve clarifying the variables, specifying the direction of the relationship, or making the hypothesis more testable.

Examples of Hypothesis

Here are a few examples of hypotheses in different fields:

  • Psychology: “Increased exposure to violent video games leads to increased aggressive behavior in adolescents.”
  • Biology: “Higher levels of carbon dioxide in the atmosphere will lead to increased plant growth.”
  • Sociology: “Individuals who grow up in households with higher socioeconomic status will have higher levels of education and income as adults.”
  • Education: “Implementing a new teaching method will result in higher student achievement scores.”
  • Marketing: “Customers who receive a personalized email will be more likely to make a purchase than those who receive a generic email.”
  • Physics: “An increase in temperature will cause an increase in the volume of a gas, assuming all other variables remain constant.”
  • Medicine: “Consuming a diet high in saturated fats will increase the risk of developing heart disease.”

Purpose of Hypothesis

The purpose of a hypothesis is to provide a testable explanation for an observed phenomenon or a prediction of a future outcome based on existing knowledge or theories. A hypothesis is an essential part of the scientific method and helps to guide the research process by providing a clear focus for investigation. It enables scientists to design experiments or studies to gather evidence and data that can support or refute the proposed explanation or prediction.

The formulation of a hypothesis is based on existing knowledge, observations, and theories, and it should be specific, testable, and falsifiable. A specific hypothesis helps to define the research question, which is important in the research process as it guides the selection of an appropriate research design and methodology. Testability of the hypothesis means that it can be proven or disproven through empirical data collection and analysis. Falsifiability means that the hypothesis should be formulated in such a way that it can be proven wrong if it is incorrect.

In addition to guiding the research process, the testing of hypotheses can lead to new discoveries and advancements in scientific knowledge. When a hypothesis is supported by the data, it can be used to develop new theories or models to explain the observed phenomenon. When a hypothesis is not supported by the data, it can help to refine existing theories or prompt the development of new hypotheses to explain the phenomenon.

When to use Hypothesis

Here are some common situations in which hypotheses are used:

  • In scientific research, hypotheses are used to guide the design of experiments and to help researchers make predictions about the outcomes of those experiments.
  • In social science research, hypotheses are used to test theories about human behavior, social relationships, and other phenomena.
  • In business, hypotheses can be used to guide decisions about marketing, product development, and other areas. For example, a hypothesis might be that a new product will sell well in a particular market, and this hypothesis can be tested through market research.

Characteristics of Hypothesis

Here are some common characteristics of a hypothesis:

  • Testable: A hypothesis must be able to be tested through observation or experimentation. This means that it must be possible to collect data that will either support or refute the hypothesis.
  • Falsifiable: A hypothesis must be able to be proven false if it is not supported by the data. If a hypothesis cannot be falsified, then it is not a scientific hypothesis.
  • Clear and concise: A hypothesis should be stated in a clear and concise manner so that it can be easily understood and tested.
  • Based on existing knowledge: A hypothesis should be based on existing knowledge and research in the field. It should not be based on personal beliefs or opinions.
  • Specific: A hypothesis should be specific in terms of the variables being tested and the predicted outcome. This will help to ensure that the research is focused and well-designed.
  • Tentative: A hypothesis is a tentative statement or assumption that requires further testing and evidence to be confirmed or refuted. It is not a final conclusion or assertion.
  • Relevant: A hypothesis should be relevant to the research question or problem being studied. It should address a gap in knowledge or provide a new perspective on the issue.

Advantages of Hypothesis

Hypotheses have several advantages in scientific research and experimentation:

  • Guides research: A hypothesis provides a clear and specific direction for research. It helps to focus the research question, select appropriate methods and variables, and interpret the results.
  • Predictive power: A hypothesis makes predictions about the outcome of research, which can be tested through experimentation. This allows researchers to evaluate the validity of the hypothesis and make new discoveries.
  • Facilitates communication: A hypothesis provides a common language and framework for scientists to communicate with one another about their research. This helps to facilitate the exchange of ideas and promotes collaboration.
  • Efficient use of resources: A hypothesis helps researchers to use their time, resources, and funding efficiently by directing them towards specific research questions and methods that are most likely to yield results.
  • Provides a basis for further research: A hypothesis that is supported by data provides a basis for further research and exploration. It can lead to new hypotheses, theories, and discoveries.
  • Increases objectivity: A hypothesis can help to increase objectivity in research by providing a clear and specific framework for testing and interpreting results. This can reduce bias and increase the reliability of research findings.

Limitations of Hypothesis

Some limitations of hypotheses are as follows:

  • Limited to observable phenomena: Hypotheses are limited to observable phenomena and cannot account for unobservable or intangible factors. This means that some research questions may not be amenable to hypothesis testing.
  • May be inaccurate or incomplete: Hypotheses are based on existing knowledge and research, which may be incomplete or inaccurate. This can lead to flawed hypotheses and erroneous conclusions.
  • May be biased: Hypotheses may be biased by the researcher’s own beliefs, values, or assumptions. This can lead to selective interpretation of data and a lack of objectivity in research.
  • Cannot prove causation: A hypothesis can only show a correlation between variables, but it cannot prove causation. This requires further experimentation and analysis.
  • Limited to specific contexts: Hypotheses are limited to specific contexts and may not be generalizable to other situations or populations. This means that results may not be applicable in other contexts or may require further testing.
  • May be affected by chance: Hypotheses may be affected by chance or random variation, which can obscure or distort the true relationship between variables.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer



13 Different Types of Hypothesis


There are 13 different types of hypothesis. These include simple, complex, null, alternative, composite, directional, non-directional, logical, empirical, statistical, associative, causal, and exact vs. inexact hypotheses.

A hypothesis can be categorized into one or more of these types. However, some are mutually exclusive opposites: simple and complex hypotheses are mutually exclusive, as are directional and non-directional, and null and alternative hypotheses.

Below I explain each hypothesis in simple terms for absolute beginners. These definitions may be too simple for some, but they’re designed to be clear introductions to the terms to help people wrap their heads around the concepts early on in their education about research methods.

Types of Hypothesis

Before you Proceed: Dependent vs Independent Variables

A research study and its hypotheses generally examine the relationships between independent and dependent variables – so you need to know these two concepts:

  • The independent variable is the variable that is causing a change.
  • The dependent variable is the variable that is affected by the change. This is the variable being tested.

Read my full article on dependent vs independent variables for more examples.

Example: Eating carrots (independent variable) improves eyesight (dependent variable).

1. Simple Hypothesis

A simple hypothesis is a hypothesis that predicts a correlation between two test variables: an independent and a dependent variable.

This is the easiest and most straightforward type of hypothesis. You simply need to state an expected correlation between the dependent variable and the independent variable.

You do not need to predict causation (see: directional hypothesis). All you would need to do is prove that the two variables are linked.
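To make this concrete, here’s a minimal Python sketch of how you might check a simple hypothesis by estimating the correlation between one independent and one dependent variable. The data are hypothetical, invented purely for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: hours of weekly exercise vs. resting heart rate
exercise = [0, 1, 2, 3, 4, 5, 6, 7]
heart_rate = [78, 76, 74, 71, 70, 68, 66, 63]

r = pearson_r(exercise, heart_rate)
print(round(r, 3))  # strongly negative: more exercise, lower resting heart rate
```

A coefficient near +1 or −1 suggests a strong linear relationship; a value near 0 suggests none. In a real study you would also compute a p-value for r (for example with `scipy.stats.pearsonr`) rather than eyeballing the coefficient.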

Simple Hypothesis Example

For example: “Eating carrots is associated with improved eyesight.” One independent variable (eating carrots) is linked to one dependent variable (eyesight).

2. Complex Hypothesis

A complex hypothesis is a hypothesis that contains multiple variables, making the hypothesis more specific but also harder to prove.

You can have multiple independent and dependent variables in this hypothesis.

Complex Hypothesis Example

For example: “Increased age and increased body weight are associated with higher risks of diabetes and heart disease.”

In this example, we have multiple independent and dependent variables:

  • Independent variables: Age and weight.
  • Dependent variables: diabetes and heart disease.

Because there are multiple variables, this study is a lot more complex than a simple hypothesis. It quickly gets much more difficult to prove these hypotheses. This is why undergraduate and first-time researchers are usually encouraged to use simple hypotheses.

3. Null Hypothesis

A null hypothesis will predict that there will be no significant relationship between the two test variables.

For example, you can say that “The study will show that there is no correlation between marriage and happiness.”

A good way to think about a null hypothesis is to think of it in the same way as “innocent until proven guilty”[1]. Unless you can come up with evidence otherwise, your null hypothesis will stand.

A null hypothesis may also highlight that a correlation will be inconclusive . This means that you can predict that the study will not be able to confirm your results one way or the other. For example, you can say “It is predicted that the study will be unable to confirm a correlation between the two variables due to foreseeable interference by a third variable .”

Beware that an inconclusive null hypothesis may be questioned by your teacher. Why would you conduct a test that you predict will not provide a clear result? Perhaps you should take a closer look at your methodology and re-examine it. Nevertheless, inconclusive null hypotheses can sometimes have merit.

Null Hypothesis Examples

4. Alternative Hypothesis

An alternative hypothesis is a hypothesis that is anything other than the null hypothesis. It will disprove the null hypothesis.

We use the symbol Hₐ or H₁ to denote an alternative hypothesis.

The null and alternative hypotheses are usually used together. We will say the null hypothesis is the case where a relationship between two variables is non-existent. The alternative hypothesis is the case where there is a relationship between those two variables.

The following statement is always true: H₀ ≠ Hₐ.

Let’s take the example of the research question: “Does eating oatmeal before an exam impact test scores?”

We can have two hypotheses here:

  • Null hypothesis (H₀): “Eating oatmeal before an exam does not impact test scores.”
  • Alternative hypothesis (Hₐ): “Eating oatmeal before an exam does impact test scores.”

To support the alternative hypothesis, all we have to do is reject the null hypothesis. We do not need an exact prediction of how much oatmeal will impact the test scores, or even whether the impact is positive or negative. So long as the null hypothesis is rejected, the alternative hypothesis is supported.
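One way to see the null and alternative hypotheses in action is a permutation test, sketched below in plain Python. The exam scores are hypothetical, invented for illustration. Under H₀ the group labels are interchangeable, so we shuffle them many times and ask how often chance alone produces a gap in means as big as the one we observed:

```python
import random

def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.
    H0: the group labels are exchangeable (oatmeal has no effect).
    Returns the estimated p-value."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel scores at random, as H0 allows
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical exam scores: students who ate oatmeal vs. those who did not
oatmeal = [78, 82, 85, 80, 88, 84]
no_oatmeal = [70, 75, 72, 74, 69, 73]

p = permutation_test(oatmeal, no_oatmeal)
print(p)  # a small p-value means we reject H0 in favor of the alternative
```

If the p-value falls below your significance threshold (conventionally 0.05), you reject H₀ and the alternative hypothesis is supported; otherwise H₀ stands, just like “innocent until proven guilty”.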

5. Composite Hypothesis

A composite hypothesis is a hypothesis that does not predict the exact parameters, distribution, or range of the dependent variable.

Often, we would predict an exact outcome. For example: “23-year-old men are on average 189cm tall.” Here, we are giving an exact parameter. So, the hypothesis is not composite.

But often we cannot hypothesize something exactly. We assume that something will happen, but we’re not exactly sure what. In these cases, we might say: “23-year-old men are not on average 189cm tall.”

We haven’t set a distribution range or exact parameters for the average height of 23-year-old men. So, we’ve introduced a composite hypothesis as opposed to an exact hypothesis.

Generally, an alternative hypothesis (discussed above) is composite because it is defined as anything except the null hypothesis. This ‘anything except’ does not define parameters or distribution, and therefore it’s an example of a composite hypothesis.

6. Directional Hypothesis

A directional hypothesis makes a prediction about the positivity or negativity of the effect of an intervention prior to the test being conducted.

Instead of being agnostic about whether the effect will be positive or negative, it nominates the effect’s directionality.

We often call this a one-tailed hypothesis (in contrast to a two-tailed or non-directional hypothesis) because, looking at a distribution graph, we’re hypothesizing that the results will lean toward one particular tail on the graph – either the positive or negative.
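The one-tailed vs. two-tailed distinction can be made concrete with a short Python sketch. The z statistic of 1.8 below is a made-up value for illustration; notice that the same result can be significant at the 0.05 level under a directional (one-tailed) hypothesis yet non-significant under a non-directional (two-tailed) one:

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def z_test_p_values(z):
    """One-tailed (upper) and two-tailed p-values for a z statistic."""
    one_tailed = 1.0 - normal_cdf(z)               # H1: effect is positive
    two_tailed = 2.0 * (1.0 - normal_cdf(abs(z)))  # H1: effect in either direction
    return one_tailed, two_tailed

# Hypothetical z statistic from a study of exercise and body weight
one_t, two_t = z_test_p_values(1.8)
print(round(one_t, 4), round(two_t, 4))  # ~0.0359 vs ~0.0719
```

Here the one-tailed p-value (~0.036) clears the 0.05 bar but the two-tailed one (~0.072) does not, which is why the directionality of a hypothesis must be chosen before the test is run, not after peeking at the results.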

Directional Hypothesis Example

For example: “Increasing the amount of daily exercise will decrease body weight.” The hypothesis names the direction of the effect (a decrease), so it is directional.

7. Non-Directional Hypothesis

A non-directional hypothesis does not specify the predicted direction (e.g. positivity or negativity) of the effect of the independent variable on the dependent variable.

These hypotheses predict an effect, but stop short of saying what that effect will be.

A non-directional hypothesis is similar to composite and alternative hypotheses. All three types of hypothesis tend to make predictions without defining a direction. In a composite hypothesis, a specific prediction is not made (although a general direction may be indicated, so the overlap is not complete). For an alternative hypothesis, you often predict that the effect will be anything but the null hypothesis, which means it could be more or less than H₀ (or in other words, non-directional).

Let’s turn the directional hypothesis above into a non-directional one.

Non-Directional Hypothesis Example

For example: “The amount of daily exercise is related to body weight.” The hypothesis predicts a relationship but does not say whether more exercise will increase or decrease body weight.

8. Logical Hypothesis

A logical hypothesis is a hypothesis that cannot be tested, but has some logical basis underpinning our assumptions.

These are most commonly used in philosophy because philosophical questions are often untestable and therefore we must rely on our logic to formulate logical theories.

Usually, we would want to turn a logical hypothesis into an empirical one through testing if we got the chance. Unfortunately, we don’t always have this opportunity because the test is too complex, expensive, or simply unrealistic.

Here are some examples:

  • Before the 1980s, it was hypothesized that the Titanic came to its resting place at 41° N and 49° W, based on the time the ship sank and the ship’s presumed path across the Atlantic Ocean. However, due to the depth of the ocean, it was impossible to test. Thus, the hypothesis was simply a logical hypothesis.
  • Dinosaurs closely related to alligators probably had green scales because alligators have green scales. However, as those dinosaurs are all extinct, we can only rely on logic and not empirical data.

9. Empirical Hypothesis

An empirical hypothesis is the opposite of a logical hypothesis. It is a hypothesis that is currently being tested using scientific analysis. We can also call this a ‘working hypothesis’.

We can separate research into two types: theoretical and empirical. Theoretical research relies on logic and thought experiments. Empirical research relies on tests that can be verified by observation and measurement.

So, an empirical hypothesis is a hypothesis that can and will be tested.

  • Raising the wage of restaurant servers increases staff retention.
  • Adding 1 lb of corn per day to cows’ diets decreases their lifespan.
  • Mushrooms grow faster at 22 degrees Celsius than at 27 degrees Celsius.

Each of the above hypotheses can be tested, making them empirical rather than just logical (aka theoretical).

10. Statistical Hypothesis

A statistical hypothesis utilizes representative statistical models to draw conclusions about broader populations.

It requires the use of datasets or carefully selected representative samples so that statistical inference can be drawn across a larger dataset.

This type of research is necessary when it is impossible to assess every single possible case. Imagine, for example, if you wanted to determine if men are taller than women. You would be unable to measure the height of every man and woman on the planet. But, by conducting sufficient random samples, you would be able to predict with high probability that the results of your study would remain stable across the whole population.

You would be right in guessing that almost all quantitative research studies conducted in academic settings today involve statistical hypotheses.

Statistical Hypothesis Examples

  • Human Sex Ratio. The most famous statistical hypothesis example is that of John Arbuthnot’s sex at birth case study in 1710. Arbuthnot used birth data to determine with high statistical probability that there are more male births than female births. He called this divine providence, and to this day, his findings remain true: more men are born than women.
  • Lady Tasting Tea. An experiment described by Ronald Fisher in 1935 involved testing a woman who believed she could tell whether milk was added to a cup before or after the tea. Fisher gave her 8 cups in random order, 4 with the milk poured first and 4 with the tea poured first. The lady identified all 8 correctly. Fisher calculated that she had only a 1 in 70 chance of doing so by guessing, which is a statistically significant result.
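Fisher’s arithmetic in the tea-tasting experiment is easy to reproduce. Under the null hypothesis that the lady is merely guessing, every way of picking the 4 “milk-first” cups out of the 8 presented is equally likely:

```python
from math import comb

# 8 cups, 4 with milk poured first. Under H0 (pure guessing), each choice of
# 4 "milk-first" cups is equally likely, so the chance of identifying all 4
# correctly is 1 divided by the number of possible choices, C(8, 4).
total_arrangements = comb(8, 4)
p_all_correct = 1 / total_arrangements

print(total_arrangements)  # 70
print(p_all_correct)       # ~0.0143, below the conventional 0.05 threshold
```

Because 1/70 ≈ 0.014 is well below 0.05, a perfect score lets us reject the null hypothesis that she was guessing.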

11. Associative Hypothesis

An associative hypothesis predicts that two variables are linked but does not explore whether one variable directly impacts upon the other variable.

We commonly refer to this as “ correlation does not mean causation ”. Just because there are a lot of sick people in a hospital, it doesn’t mean that the hospital made the people sick. There is something going on there that’s causing the issue (sick people are flocking to the hospital).

So, in an associative hypothesis, you note correlation between an independent and dependent variable but do not make a prediction about how the two interact. You stop short of saying one thing causes another thing.

Associative Hypothesis Examples

  • Sick people in hospital. You could conduct a study hypothesizing that hospitals have more sick people in them than other institutions in society. However, you don’t hypothesize that the hospitals caused the sickness.
  • Lice make you healthy. In the Middle Ages, it was observed that sick people didn’t tend to have lice in their hair. The inaccurate conclusion was that lice were not only a sign of health, but that they made people healthy. In reality, there was an association here, but not causation: lice are sensitive to body temperature and flee bodies that have fevers.

12. Causal Hypothesis

A causal hypothesis predicts that two variables are not only associated, but that changes in one variable will cause changes in another.

A causal hypothesis is harder to prove than an associative hypothesis because the cause needs to be definitively proven. This will often require repeating tests in controlled environments, with the researchers making manipulations to the independent variable, or the use of control groups and placebo controls.

If we were to take the above example of lice in the hair of sick people, researchers would have to put lice in sick people’s hair and see if it made those people healthier. Researchers would likely observe that the lice would flee the hair, but the sickness would remain, leading to a finding of association but not causation.

Causal Hypothesis Example

For example: “Increasing daily exercise causes a decrease in body weight.” Unlike an associative hypothesis, this statement claims that the change in the independent variable produces the change in the dependent variable.

13. Exact vs. Inexact Hypothesis

For brevity’s sake, I have paired these two hypotheses into one point. In reality, we’ve already seen both of these types of hypothesis at play.

An exact hypothesis (also known as a point hypothesis) specifies a specific prediction whereas an inexact hypothesis assumes a range of possible values without giving an exact outcome. As Helwig [2] argues:

“An “exact” hypothesis specifies the exact value(s) of the parameter(s) of interest, whereas an “inexact” hypothesis specifies a range of possible values for the parameter(s) of interest.”

Generally, a null hypothesis is an exact hypothesis whereas alternative, composite, directional, and non-directional hypotheses are all inexact.

See Next: 15 Hypothesis Examples

This is introductory information that is basic and indeed quite simplified for absolute beginners. It’s worth doing further independent research to get deeper knowledge of research methods and how to conduct an effective research study. And if you’re in education studies, don’t miss out on my list of the best education studies dissertation ideas.

[1] https://jnnp.bmj.com/content/91/6/571.abstract

[2] http://users.stat.umn.edu/~helwig/notes/SignificanceTesting.pdf


Chris Drew (PhD)

Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.



The Craft of Writing a Strong Hypothesis

Deeptanshu D


Writing a hypothesis is one of the essential elements of a scientific research paper. It needs to be to the point, clearly communicating what your research is trying to accomplish. A blurry, drawn-out, or complexly structured hypothesis can confuse your readers. Or worse, the editor and peer reviewers.

A captivating hypothesis is not too intricate. This blog will take you through the process so that, by the end of it, you have a better idea of how to convey your research paper's intent in just one sentence.

What is a Hypothesis?

The first step in your scientific endeavor, a hypothesis is a strong, concise statement that forms the basis of your research. It is not the same as a thesis statement, which is a brief summary of your research paper.

The sole purpose of a hypothesis is to predict your paper's findings, data, and conclusion. It comes from a place of curiosity and intuition. When you write a hypothesis, you're essentially making an educated guess based on prior scientific knowledge and evidence, which is then proven or disproven through the scientific method.

The reason for undertaking research is to observe a specific phenomenon. A hypothesis, therefore, lays out what the said phenomenon is. And it does so through two variables, an independent and dependent variable.

The independent variable is the cause behind the observation, while the dependent variable is the effect of the cause. A good example of this is “mixing red and blue forms purple.” In this hypothesis, mixing red and blue is the independent variable as you're combining the two colors at your own will. The formation of purple is the dependent variable as, in this case, it is conditional to the independent variable.

Different Types of Hypotheses‌

Types of hypotheses

Some would stand by the notion that there are only two types of hypotheses: a null hypothesis and an alternative hypothesis. While there is some truth to that, it is worth distinguishing the most common forms, since these terms come up often, and knowing them keeps you from losing context.

Apart from null and alternative, there are complex, simple, directional, non-directional, statistical, and associative and causal hypotheses. They don't necessarily have to be exclusive, as one hypothesis can tick many boxes, but knowing the distinctions between them will make it easier for you to construct your own.

1. Null hypothesis

A null hypothesis proposes no relationship between two variables. Denoted by H0, it is a negative statement like “Attending physiotherapy sessions does not affect athletes' on-field performance.” Here, the author claims physiotherapy sessions have no effect on on-field performance; any apparent effect is attributed to coincidence.

2. Alternative hypothesis

Considered to be the opposite of a null hypothesis, an alternative hypothesis is denoted as H1 or Ha. It explicitly states that the independent variable affects the dependent variable. A good alternative hypothesis example is “Attending physiotherapy sessions improves athletes' on-field performance.” or “Water evaporates at 100 °C.” The alternative hypothesis further branches into directional and non-directional.

  • Directional hypothesis: A hypothesis that predicts whether the effect will be positive or negative is called a directional hypothesis. It accompanies H1 with either the ‘<' or ‘>' sign.
  • Non-directional hypothesis: A non-directional hypothesis only claims an effect on the dependent variable. It does not clarify whether the result would be positive or negative. The sign for a non-directional hypothesis is ‘≠.'

3. Simple hypothesis

A simple hypothesis is a statement made to reflect the relation between exactly two variables: one independent and one dependent. Consider the example, “Smoking is a prominent cause of lung cancer.” The dependent variable, lung cancer, depends on the independent variable, smoking.

4. Complex hypothesis

In contrast to a simple hypothesis, a complex hypothesis implies a relationship between multiple independent and dependent variables. For instance, “Individuals who eat more fruits tend to have higher immunity, lower cholesterol, and higher metabolism.” The independent variable is eating more fruits, while the dependent variables are higher immunity, lower cholesterol, and higher metabolism.

5. Associative and causal hypothesis

Associative and causal hypotheses are defined not by how many variables they involve but by the relationship between those variables. In an associative hypothesis, changing any one variable, dependent or independent, affects the others. In a causal hypothesis, the independent variable directly affects the dependent variable.

6. Empirical hypothesis

Also referred to as the working hypothesis, an empirical hypothesis is one validated through experiment and observation. This grounding in evidence makes the statement justifiable rather than a wild guess.

Say the hypothesis is “Women who take iron tablets face a lesser risk of anemia than those who take vitamin B12.” This is an example of an empirical hypothesis, where the researcher tests the statement by assessing a group of women who take iron tablets and charting the findings.

7. Statistical hypothesis

The point of a statistical hypothesis is to test an existing claim by studying a population sample. Hypotheses like “44% of the Indian population belongs to the age group of 22-27” leverage statistical evidence to prove or disprove a particular statement.
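As an illustration of testing such a statistical hypothesis, here is a hedged sketch in Python. All the numbers (sample size, counts) are invented, and the test is a plain normal-approximation z-test for a proportion built from the standard library:

```python
import math

# Hypothetical sketch: is the claim "44% of the population is aged 22-27"
# consistent with a sample? All numbers below are invented for illustration.
n = 1000   # hypothetical sample size
k = 500    # hypothetical count aged 22-27 in the sample

p0 = 0.44                           # H0: the true proportion is 0.44
p_hat = k / n                       # observed sample proportion
se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
z = (p_hat - p0) / se               # z statistic

# Two-sided p-value via the normal CDF, written with math.erf:
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the 44% claim is inconsistent with this sample.")
else:
    print("Fail to reject H0.")
```

With these invented numbers, the observed 50% is far enough from the claimed 44% that H0 would be rejected; a sample proportion closer to 0.44 would not be.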

Characteristics of a Good Hypothesis

Writing a hypothesis is essential as it can make or break your research for you. That includes your chances of getting published in a journal. So when you're designing one, keep an eye out for these pointers:

  • A research hypothesis has to be simple yet clear to look justifiable enough.
  • It has to be testable — your research would be rendered pointless if it is too far-fetched or beyond the reach of current technology.
  • It has to be precise about the results — what you are trying to do and achieve through it should come out in your hypothesis.
  • A research hypothesis should be self-explanatory, leaving no doubt in the reader's mind.
  • If you are developing a relational hypothesis, you need to include the variables and establish an appropriate relationship among them.
  • A hypothesis must keep and reflect the scope for further investigations and experiments.

Separating a Hypothesis from a Prediction

Outside of academia, hypothesis and prediction are often used interchangeably. In research writing, this is not only confusing but also incorrect. And although a hypothesis and prediction are guesses at their core, there are many differences between them.

A hypothesis is an educated guess or even a testable prediction validated through research. It aims to analyze the gathered evidence and facts to define a relationship between variables and put forth a logical explanation behind the nature of events.

Predictions, on the other hand, are assumptions or expected outcomes made without supporting evidence. They lean toward speculation regardless of where they originate.

For this reason, a hypothesis holds much more weight than a prediction. It sticks to the scientific method rather than pure guesswork. "Planets revolve around the Sun." is an example of a hypothesis, as it is based on previous knowledge and observed trends. Additionally, we can test it through the scientific method.

Whereas "COVID-19 will be eradicated by 2030." is a prediction. Even though it results from past trends, we can't prove or disprove it. So, the only way this gets validated is to wait and watch if COVID-19 cases end by 2030.

Finally, How to Write a Hypothesis

Quick tips on writing a hypothesis

1.  Be clear about your research question

A hypothesis should instantly address the research question or the problem statement. To do so, you need to ask a question. Understand the constraints of your undertaken research topic and then formulate a simple and topic-centric problem. Only after that can you develop a hypothesis and further test for evidence.

2. Carry out a recce

Once you have your research's foundation laid out, it would be best to conduct preliminary research. Go through previous theories, academic papers, data, and experiments before you start curating your research hypothesis. It will give you an idea of your hypothesis's viability or originality.

Making use of references from relevant research papers helps in drafting a good research hypothesis. SciSpace Discover offers a repository of over 270 million research papers to browse through and gain a deeper understanding of related studies on a particular topic. Additionally, you can use SciSpace Copilot, your AI research assistant, to read any lengthy research paper and get a summarized overview of it. A hypothesis can be formed after evaluating many such summarized papers. Copilot also explains theories and equations, simplifies papers, lets you highlight text or clip math equations and tables, and provides a deeper, clearer understanding of what is being said. This can improve your hypothesis by helping you identify potential research gaps.

3. Create a 3-dimensional hypothesis

Variables are an essential part of any reasonable hypothesis. So, identify your independent and dependent variable(s) and form a correlation between them. The ideal way to do this is to write the hypothetical assumption in the ‘if-then' form. If you use this form, make sure that you state the predefined relationship between the variables.

In another way, you can choose to present your hypothesis as a comparison between two variables. Here, you must specify the difference you expect to observe in the results.

4. Write the first draft

Now that everything is in place, it's time to write your hypothesis. For starters, create the first draft. In this version, write what you expect to find from your research.

Clearly separate your independent and dependent variables and the link between them. Don't fixate on syntax at this stage. The goal is to ensure your hypothesis addresses the issue.

5. Proof your hypothesis

After preparing the first draft of your hypothesis, you need to inspect it thoroughly. It should tick all the boxes, like being concise, straightforward, relevant, and accurate. Your final hypothesis has to be well-structured as well.

Research projects are an exciting and crucial part of being a scholar. And once you have your research question, you need a great hypothesis to begin conducting research. Thus, knowing how to write a hypothesis is very important.

Now that you have a firmer grasp on what a good hypothesis constitutes, the different kinds there are, and what process to follow, you will find it much easier to write your hypothesis, which ultimately helps your research.

Now it's easier than ever to streamline your research workflow with SciSpace Discover . Its integrated, comprehensive end-to-end platform for research allows scholars to easily discover, write and publish their research and fosters collaboration.

It includes everything you need, including a repository of over 270 million research papers across disciplines, SEO-optimized summaries and public profiles to show your expertise and experience.

If you found these tips on writing a research hypothesis useful, head over to our blog on Statistical Hypothesis Testing to learn about the top researchers, papers, and institutions in this domain.

Frequently Asked Questions (FAQs)

1. What is the definition of a hypothesis?

According to the Oxford dictionary, a hypothesis is defined as “An idea or explanation of something that is based on a few known facts, but that has not yet been proved to be true or correct”.

2. What is an example of a hypothesis?

A hypothesis is a statement that proposes a relationship between two or more variables. An example: "If we increase the number of new users who join our platform by 25%, then we will see an increase in revenue."

3. What is an example of a null hypothesis?

A null hypothesis is a statement that there is no relationship between two variables. The null hypothesis is written as H0. The null hypothesis states that there is no effect. For example, if you're studying whether or not a particular type of exercise increases strength, your null hypothesis will be "there is no difference in strength between people who exercise and people who don't."

4. What are the types of research?

  • Fundamental research
  • Applied research
  • Qualitative research
  • Quantitative research
  • Mixed research
  • Exploratory research
  • Longitudinal research
  • Cross-sectional research
  • Field research
  • Laboratory research
  • Fixed research
  • Flexible research
  • Action research
  • Policy research
  • Classification research
  • Comparative research
  • Causal research
  • Inductive research
  • Deductive research

5. How to write a hypothesis?

  • Your hypothesis should be able to predict the relationship and outcome.
  • Avoid wordiness by keeping it simple and brief.
  • Your hypothesis should contain observable and testable outcomes.
  • Your hypothesis should be relevant to the research question.

6. What are the 2 types of hypothesis?

  • Null hypotheses are used to test the claim that "there is no difference between two groups of data".
  • Alternative hypotheses test the claim that "there is a difference between two data groups".

7. Difference between research question and research hypothesis?

A research question is a broad, open-ended question you will try to answer through your research. A hypothesis is a statement based on prior research or theory that you expect your study to support. Example - Research question: What are the factors that influence the adoption of new technology? Research hypothesis: There is a positive relationship between age, education, and income level and the adoption of new technology.

8. What is plural for hypothesis?

The plural of hypothesis is hypotheses. Here's an example of how it would be used in a statement, "Numerous well-considered hypotheses are presented in this part, and they are supported by tables and figures that are well-illustrated."

9. What is the red queen hypothesis?

The red queen hypothesis in evolutionary biology states that species must constantly evolve to avoid extinction because if they don't, they will be outcompeted by other species that are evolving. Leigh Van Valen first proposed it in 1973; since then, it has been tested and substantiated many times.

10. Who is known as the father of null hypothesis?

The father of the null hypothesis is Sir Ronald Fisher. He published a paper in 1925 that introduced the concept of null hypothesis testing, and he was also the first to use the term itself.

11. When to reject null hypothesis?

You need to find a significant difference between your two populations to reject the null hypothesis. You can determine that by running statistical tests such as an independent sample t-test or a dependent sample t-test. You should reject the null hypothesis if the p-value is less than 0.05.
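As a concrete sketch of the procedure described above, the snippet below runs an independent-samples t-test with SciPy on invented strength scores for an exercise group and a control group (the data are illustrative, not from any real study):

```python
from scipy import stats

# Invented strength scores for two independent groups (illustration only)
exercise_group = [52, 55, 58, 60, 57, 54, 59, 61, 56, 58]
control_group = [48, 50, 47, 52, 49, 51, 46, 50, 48, 49]

# Independent-samples t-test: H0 says the group means do not differ
t_stat, p_value = stats.ttest_ind(exercise_group, control_group)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null hypothesis: the groups differ significantly.")
else:
    print("Fail to reject the null hypothesis.")
```

For repeated measures of the same participants, `stats.ttest_rel` (a dependent-samples t-test) would be the matching choice.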



Research Hypothesis In Psychology: Types, & Examples

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


A research hypothesis, in its plural form “hypotheses,” is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method .

Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

Some key points about hypotheses:

  • A hypothesis expresses an expected pattern or relationship. It connects the variables under investigation.
  • It is stated in clear, precise terms before any data collection or analysis occurs. This makes the hypothesis testable.
  • A hypothesis must be falsifiable. It should be possible, even if unlikely in practice, to collect data that disconfirms rather than supports the hypothesis.
  • Hypotheses guide research. Scientists design studies to explicitly evaluate hypotheses about how nature works.
  • For a hypothesis to be valid, it must be testable against empirical evidence. The evidence can then confirm or disprove the testable predictions.
  • Hypotheses are informed by background knowledge and observation, but go beyond what is already known to propose an explanation of how or why something occurs.

Predictions typically arise from a thorough knowledge of the research literature, curiosity about real-world problems or implications, and integrating this to advance theory. They build on existing literature while providing new insight.

Types of Research Hypotheses

Alternative Hypothesis

The research hypothesis is often called the alternative or experimental hypothesis in experimental research.

It typically suggests a potential relationship between two key variables: the independent variable, which the researcher manipulates, and the dependent variable, which is measured based on those changes.

The alternative hypothesis states a relationship exists between the two variables being studied (one variable affects the other).


An experimental hypothesis predicts what change(s) will occur in the dependent variable when the independent variable is manipulated.

It states that the results are not due to chance and are significant in supporting the theory being investigated.

The alternative hypothesis can be directional, indicating a specific direction of the effect, or non-directional, suggesting a difference without specifying its nature. It’s what researchers aim to support or demonstrate through their study.

Null Hypothesis

The null hypothesis states no relationship exists between the two variables being studied (one variable does not affect the other). There will be no changes in the dependent variable due to manipulating the independent variable.

It states results are due to chance and are not significant in supporting the idea being investigated.

The null hypothesis, positing no effect or relationship, is a foundational contrast to the research hypothesis in scientific inquiry. It establishes a baseline for statistical testing, promoting objectivity by initiating research from a neutral stance.

Many statistical methods are tailored to test the null hypothesis, determining the likelihood of observed results if no true effect exists.

This dual-hypothesis approach provides clarity, ensuring that research intentions are explicit, and fosters consistency across scientific studies, enhancing the standardization and interpretability of research outcomes.

Nondirectional Hypothesis

A non-directional hypothesis, also known as a two-tailed hypothesis, predicts that there is a difference or relationship between two variables but does not specify the direction of this relationship.

It merely indicates that a change or effect will occur without predicting which group will have higher or lower values.

For example, “There is a difference in performance between Group A and Group B” is a non-directional hypothesis.

Directional Hypothesis

A directional (one-tailed) hypothesis predicts the nature of the effect of the independent variable on the dependent variable: it predicts in which direction the change will take place (e.g., greater, smaller, less, more).

It specifies whether one variable is greater, lesser, or different from another, rather than just indicating that there’s a difference without specifying its nature.

For example, “Exercise increases weight loss” is a directional hypothesis.
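The directional/non-directional distinction maps onto one-tailed versus two-tailed tests in practice. Below is a hedged sketch with invented scores, using the `alternative` parameter of SciPy's `ttest_ind` to select the tail:

```python
from scipy import stats

# Invented scores for two groups (illustration only)
group_a = [14, 16, 15, 17, 18, 16, 15, 17]
group_b = [12, 13, 14, 12, 13, 15, 12, 14]

# Non-directional hypothesis: "the means differ" (two-tailed test)
_, p_two = stats.ttest_ind(group_a, group_b, alternative="two-sided")

# Directional hypothesis: "group A's mean is greater" (one-tailed test)
_, p_one = stats.ttest_ind(group_a, group_b, alternative="greater")

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

When the observed effect lies in the predicted direction, the one-tailed p-value is half the two-tailed one, which is why a directional hypothesis should only be chosen when prior evidence justifies predicting that direction.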


Falsifiability

The Falsification Principle, proposed by Karl Popper , is a way of demarcating science from non-science. It suggests that for a theory or hypothesis to be considered scientific, it must be testable and refutable.

Falsifiability emphasizes that scientific claims shouldn’t just be confirmable but should also have the potential to be proven wrong.

It means that there should exist some potential evidence or experiment that could prove the proposition false.

However many confirming instances exist for a theory, it only takes one counter observation to falsify it. For example, the hypothesis that “all swans are white,” can be falsified by observing a black swan.

For Popper, science should attempt to disprove a theory rather than attempt to continually provide evidence to support a research hypothesis.

Can a Hypothesis be Proven?

Hypotheses make probabilistic predictions. They state the expected outcome if a particular relationship exists. However, a study result supporting a hypothesis does not definitively prove it is true.

All studies have limitations. There may be unknown confounding factors or issues that limit the certainty of conclusions. Additional studies may yield different results.

In science, hypotheses can realistically only be supported with some degree of confidence, not proven. The process of science is to incrementally accumulate evidence for and against hypothesized relationships in an ongoing pursuit of better models and explanations that best fit the empirical data. But hypotheses remain open to revision and rejection if that is where the evidence leads.
  • Disproving a hypothesis is definitive. Solid disconfirmatory evidence will falsify a hypothesis and require altering or discarding it based on the evidence.
  • However, confirming evidence is always open to revision. Other explanations may account for the same results, and additional or contradictory evidence may emerge over time.

We can never 100% prove the alternative hypothesis. Instead, we see if we can disprove, or reject, the null hypothesis.

If we reject the null hypothesis, this doesn’t mean that our alternative hypothesis is correct but does support the alternative/experimental hypothesis.

Upon analysis of the results, an alternative hypothesis can be rejected or supported, but it can never be proven to be correct. We must avoid any reference to results proving a theory as this implies 100% certainty, and there is always a chance that evidence may exist which could refute a theory.

How to Write a Hypothesis

  • Identify variables . The researcher manipulates the independent variable and the dependent variable is the measured outcome.
  • Operationalize the variables being investigated . Operationalization of a hypothesis refers to the process of making the variables physically measurable or testable, e.g. if you are about to study aggression, you might count the number of punches given by participants.
  • Decide on a direction for your prediction . If there is evidence in the literature to support a specific effect of the independent variable on the dependent variable, write a directional (one-tailed) hypothesis. If there are limited or ambiguous findings in the literature regarding the effect of the independent variable on the dependent variable, write a non-directional (two-tailed) hypothesis.
  • Make it Testable : Ensure your hypothesis can be tested through experimentation or observation. It should be possible to prove it false (principle of falsifiability).
  • Clear & concise language . A strong hypothesis is concise (typically one to two sentences long), and formulated using clear and straightforward language, ensuring it’s easily understood and testable.

Consider a hypothesis many teachers might subscribe to: students work better on Monday morning than on Friday afternoon (IV=Day, DV= Standard of work).

Now, if we decide to study this by giving the same group of students a lesson on a Monday morning and a Friday afternoon and then measuring their immediate recall of the material covered in each session, we would end up with the following:

  • The alternative hypothesis states that students will recall significantly more information on a Monday morning than on a Friday afternoon.
  • The null hypothesis states that there will be no significant difference in the amount recalled on a Monday morning compared to a Friday afternoon. Any difference will be due to chance or confounding factors.
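Because the design above tests the same students on both days, a paired-samples (repeated-measures) test fits it. The sketch below uses invented recall scores purely for illustration:

```python
from scipy import stats

# Invented recall scores for the same ten students (illustration only)
monday_recall = [18, 21, 16, 22, 19, 24, 17, 21, 23, 18]
friday_recall = [14, 19, 14, 17, 16, 20, 15, 18, 19, 16]

# Paired-samples t-test: each student serves as their own control
t_stat, p_value = stats.ttest_rel(monday_recall, friday_recall)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Reject H0: Monday recall differs significantly from Friday.")
else:
    print("Fail to reject H0: any difference may be due to chance.")
```

Pairing by student removes between-person variability in ability, which an independent-samples test would otherwise count against the effect.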

More Examples

  • Memory : Participants exposed to classical music during study sessions will recall more items from a list than those who studied in silence.
  • Social Psychology : Individuals who frequently engage in social media use will report higher levels of perceived social isolation compared to those who use it infrequently.
  • Developmental Psychology : Children who engage in regular imaginative play have better problem-solving skills than those who don’t.
  • Clinical Psychology : Cognitive-behavioral therapy will be more effective in reducing symptoms of anxiety over a 6-month period compared to traditional talk therapy.
  • Cognitive Psychology : Individuals who multitask between various electronic devices will have shorter attention spans on focused tasks than those who single-task.
  • Health Psychology : Patients who practice mindfulness meditation will experience lower levels of chronic pain compared to those who don’t meditate.
  • Organizational Psychology : Employees in open-plan offices will report higher levels of stress than those in private offices.
  • Behavioral Psychology : Rats rewarded with food after pressing a lever will press it more frequently than rats who receive no reward.



How to Write a Great Hypothesis

Hypothesis Format, Examples, and Tips

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Amy Morin, LCSW, is a psychotherapist and international bestselling author. Her books, including "13 Things Mentally Strong People Don't Do," have been translated into more than 40 languages. Her TEDx talk,  "The Secret of Becoming Mentally Strong," is one of the most viewed talks of all time.


Verywell / Alex Dos Diaz


A hypothesis is a tentative statement about the relationship between two or more  variables. It is a specific, testable prediction about what you expect to happen in a study.

For example, a study designed to look at the relationship between sleep deprivation and test performance might have a hypothesis that states: "This study is designed to assess the hypothesis that sleep-deprived people will perform worse on a test than individuals who are not sleep-deprived."

This article explores how a hypothesis is used in psychology research, how to write a good hypothesis, and the different types of hypotheses you might use.

The Hypothesis in the Scientific Method

In the scientific method , whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment. The scientific method involves the following steps:

  • Forming a question
  • Performing background research
  • Creating a hypothesis
  • Designing an experiment
  • Collecting data
  • Analyzing the results
  • Drawing conclusions
  • Communicating the results

The hypothesis is a prediction, but it involves more than a guess. Most of the time, the hypothesis begins with a question which is then explored through background research. It is only at this point that researchers begin to develop a testable hypothesis. Unless you are creating an exploratory study, your hypothesis should always explain what you  expect  to happen.

In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness. In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.

Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore a number of factors to determine which ones might contribute to the ultimate outcome.

In many cases, researchers may find that the results of an experiment  do not  support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.

In many cases, researchers might draw a hypothesis from a specific theory or build on previous research. For example, prior research has shown that stress can impact the immune system. So a researcher might hypothesize: "People with high-stress levels will be more likely to contract a common cold after being exposed to the virus than people who have low-stress levels."

In other instances, researchers might look at commonly held beliefs or folk wisdom. "Birds of a feather flock together" is one example of folk wisdom that a psychologist might try to investigate. The researcher might pose a specific hypothesis that "People tend to select romantic partners who are similar to them in interests and educational level."

Elements of a Good Hypothesis

So how do you write a good hypothesis? When trying to come up with a hypothesis for your research or experiments, ask yourself the following questions:

  • Is your hypothesis based on your research on a topic?
  • Can your hypothesis be tested?
  • Does your hypothesis include independent and dependent variables?

Before you come up with a specific hypothesis, spend some time doing background research. Once you have completed a literature review, start thinking about potential questions you still have. Pay attention to the discussion section in the  journal articles you read . Many authors will suggest questions that still need to be explored.

To form a hypothesis, you should take these steps:

  • Collect as many observations about a topic or problem as you can.
  • Evaluate these observations and look for possible causes of the problem.
  • Create a list of possible explanations that you might want to explore.
  • After you have developed some possible hypotheses, think of ways that you could confirm or disprove each hypothesis through experimentation. This is known as falsifiability.

In the scientific method, falsifiability is an important part of any valid hypothesis. In order to test a claim scientifically, it must be possible that the claim could be proven false.

Students sometimes confuse the idea of falsifiability with the idea that it means that something is false, which is not the case. What falsifiability means is that  if  something was false, then it is possible to demonstrate that it is false.

One of the hallmarks of pseudoscience is that it makes claims that cannot be refuted or proven false.

A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define how the variable will be manipulated and measured in the study.

For example, a researcher might operationally define the variable "test anxiety" as the results of a self-report measure of anxiety experienced during an exam. A "study habits" variable might be defined by the amount of studying that actually occurs, as measured by time.

These precise descriptions are important because many things can be measured in a number of different ways. One of the basic principles of any type of scientific research is that the results must be replicable.   By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.

Some variables are more difficult than others to define. How would you operationally define a variable such as aggression? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others.

In order to measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming other people. In this situation, the researcher might utilize a simulated task to measure aggressiveness.

Hypothesis Checklist

  • Does your hypothesis focus on something that you can actually test?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate the variables?
  • Can your hypothesis be tested without violating ethical standards?

The hypothesis you use will depend on what you are investigating and hoping to find. Some of the main types of hypotheses that you might use include:

  • Simple hypothesis : This type of hypothesis suggests that there is a relationship between one independent variable and one dependent variable.
  • Complex hypothesis : This type of hypothesis suggests a relationship between three or more variables, such as two independent variables and a dependent variable.
  • Null hypothesis : This hypothesis suggests no relationship exists between two or more variables.
  • Alternative hypothesis : This hypothesis states the opposite of the null hypothesis.
  • Statistical hypothesis : This hypothesis uses statistical analysis to evaluate a representative sample of the population and then generalizes the findings to the larger group.
  • Logical hypothesis : This hypothesis assumes a relationship between variables without collecting data or evidence.

A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the  dependent variable  if you change the  independent variable .

The basic format might be: "If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."

A few examples of simple hypotheses:

  • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast."
  • "Students who experience test anxiety before an English exam will get lower scores than students who do not experience test anxiety."
  • "Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone."

Examples of a complex hypothesis include:

  • "People with high-sugar diets and sedentary activity levels are more likely to develop depression."
  • "Younger people who are regularly exposed to green, outdoor areas have better subjective well-being than older adults who have limited exposure to green spaces."

Examples of a null hypothesis include:

  • "Children who receive a new reading intervention will have scores no different from those of students who do not receive the intervention."
  • "There will be no difference in scores on a memory recall task between children and adults."

Examples of an alternative hypothesis:

  • "Children who receive a new reading intervention will perform better than students who did not receive the intervention."
  • "Adults will perform better on a memory task than children." 
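The null-versus-alternative logic above can be sketched in code. The following is a minimal illustration, using entirely made-up memory-task scores (not data from any real study) and a simple permutation test, one common way to test a null hypothesis of "no difference" without distributional assumptions:

```python
import random
import statistics

def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test of the null hypothesis that
    group_a and group_b are drawn from the same distribution."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):  # at least as extreme as observed
            hits += 1
    return hits / n_perm  # estimated p-value

# Hypothetical memory-task scores for adults vs. children
adults = [14, 16, 15, 17, 18, 16, 15, 17]
children = [12, 13, 11, 14, 12, 13, 12, 11]

p_value = permutation_test(adults, children)
print(p_value)  # a small p-value is evidence against the null hypothesis
```

A low p-value would lead us to reject the null hypothesis ("no difference in scores between children and adults") in favor of the alternative ("adults will perform better").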

Collecting Data on Your Hypothesis

Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method depends largely on exactly what they are studying. There are two basic types of research methods: descriptive research and experimental research.

Descriptive Research Methods

Descriptive research such as  case studies ,  naturalistic observations , and surveys are often used when it would be impossible or difficult to  conduct an experiment . These methods are best used to describe different aspects of a behavior or psychological phenomenon.

Once a researcher has collected data using descriptive methods, a correlational study can then be used to look at how the variables are related. This type of research method might be used to investigate a hypothesis that is difficult to test experimentally.

Experimental Research Methods

Experimental methods  are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).

Unlike correlational studies, which can only be used to determine if there is a relationship between two variables, experimental methods can be used to determine the actual nature of the relationship—whether changes in one variable actually  cause  another to change.

A Word From Verywell

The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another. It also helps us develop new hypotheses that can then be tested in the future.

Some examples of how to write a hypothesis include:

  • "Staying up late will lead to worse test performance the next day."
  • "People who consume one apple each day will visit the doctor fewer times each year."
  • "Breaking study sessions up into three 20-minute sessions will lead to better test results than a single 60-minute study session."

The four parts of a hypothesis are:

  • The research question
  • The independent variable (IV)
  • The dependent variable (DV)
  • The proposed relationship between the IV and DV


By Kendra Cherry, MSEd Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Nayturr

8 Different Types of Hypotheses (Plus Essential Facts)


A hypothesis is an idea or premise used as a jumping-off point for further investigation. It's essential to scientific research because it serves as a compass for scientists or researchers in carrying out their experiments or studies.

There are different types of hypotheses, but crafting a good hypothesis can be tricky. A sound hypothesis should be logical, affirmative, clear, precise, and testable, and it should express a cause-and-effect relationship.

Types 

Alternative Hypothesis

Also known as a maintained hypothesis or a research hypothesis, an alternative hypothesis is the exact opposite of a null hypothesis, and it is often used in statistical hypothesis testing. There are four main types of alternative hypothesis:

  • Point alternative hypothesis . This hypothesis occurs when the population distribution in the hypothesis test is fully defined and has no unknown parameters. It usually has no practical interest, but it is considered important in other statistical activities.
  • Non-directional alternative hypothesis. This hypothesis does not specify a region of rejection the way one-tailed or two-tailed directional hypotheses do; it states only that the null hypothesis is untrue.
  • One-tailed directional hypothesis. This hypothesis is only concerned with the region of direction for one tail of a sampling distribution, not both of them.
  • Two-tailed directional hypothesis. This hypothesis is concerned with both regions of rejection of a particular sampling distribution.

Known by the symbol H1, this type of hypothesis proclaims the expected relationship between the variables in the theory.
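The difference between one-tailed and two-tailed testing can be illustrated with a small permutation test (invented data; a sketch, not a full statistical treatment). A one-tailed test counts only permuted differences at least as large in the predicted direction, while a two-tailed test counts extreme differences in either direction:

```python
import random
import statistics

def perm_p(a, b, tail="two", n_perm=10_000, seed=1):
    """Permutation p-value for the difference in means of groups a and b."""
    rng = random.Random(seed)
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if tail == "two":
            hits += abs(diff) >= abs(observed)   # extreme in either direction
        else:
            hits += diff >= observed             # predicted direction only
    return hits / n_perm

treatment = [5.1, 5.4, 4.9, 5.6, 5.3]  # hypothetical scores
control   = [4.8, 5.0, 4.7, 5.1, 4.9]

p_one = perm_p(treatment, control, tail="one")
p_two = perm_p(treatment, control, tail="two")
print(p_one, p_two)  # the one-tailed p-value is no larger than the two-tailed one
```

This is why a one-tailed (directional) test is more powerful when the direction of the effect is predicted correctly, and why that prediction should be justified in advance.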

Associative and Causal Hypothesis

Associative hypotheses simply state that there is a relationship between two variables, whereas causal hypotheses state that any difference in the type or amount of one particular variable is going to directly affect the difference in the type or amount of the next variable in the equation.


These hypotheses are often used in the field of psychology. A causal hypothesis looks at how manipulation affects events in the future, while an associative hypothesis looks at how specific events co-occur.

A good example of its practical use occurs when discussing the psychological aspects of eyewitness testimonies, and they generally affect four areas of this phenomenon: emotion and memory, system variables in the line-up, estimation of the duration of the event, and own-race bias.

Complex Hypothesis

In a complex hypothesis, a relationship exists between the variables. These hypotheses involve more than two variables, such as two or more independent variables and two or more dependent variables, as demonstrated in the following hypotheses:

  • Taking drugs and smoking cigarettes leads to respiratory problems, increased tension, and cancer.
  • The people who are older and living in rural areas are happier than people who are younger and who live in the city or suburbs.
  • If you eat a high-fat diet and a few vegetables, you are more likely to suffer from hypertension and high cholesterol than someone who eats a lot of vegetables and sticks to a low-fat diet.

Directional Hypothesis

A directional hypothesis is one regarding either a positive or negative difference or change in the two variables involved. Typically drawing on accepted theory, the published literature on the topic, and past research, researchers normally develop this type of hypothesis from research questions, and they use statistical methods to check its validity.

Words you often hear in hypotheses that are directional in nature include more, less, increase, decrease, positive, negative, higher, and lower. Directional hypotheses specify the direction or nature of the relationship between two or more independent variables and two or more dependent variables.

Non-Directional Hypothesis

This hypothesis states that there is a distinct relationship between two variables; however, it does not predict the exact nature or direction of that particular relationship.

Null Hypothesis


Indicated by the symbol H0, a null hypothesis predicts that the variables in a certain hypothesis have no relationship to one another, and it is normally subjected to some type of statistical analysis. It essentially states that any effect observed in the data is due to chance rather than a real relationship between the variables being investigated.

A perfect example of this comes when looking at scientific medical studies, where you have both an experimental and control group, and you are hypothesizing that there will be no difference in the results of these two groups.

Simple Hypothesis

This hypothesis consists of two variables: an independent variable, or cause, and a dependent variable, or effect. Simple hypotheses state a relationship between these two variables. For example:

  • The more you chew tobacco, the more likely you are to develop mouth cancer.
  • The more money you make, the less likely you are to be involved in criminal activity.
  • The more educated you are, the more likely you are to have a well-paying job.

Statistical Hypothesis

This is just a hypothesis that is able to be verified through statistics. It can be either logical or illogical, but if you can use statistics to verify it, it is called a statistical hypothesis.

Facts about Hypotheses


Difference Between Simple and Complex Hypotheses

In a simple hypothesis, there is a dependent and an independent variable, as well as a relationship between the two. The independent variable is the cause and comes first when they’re in chronological order, and the dependent variable describes the effect. In a complex hypothesis, the relationship is between two or more independent variables and two or more dependent variables.

Difference Between Non-Directional and Directional Hypotheses

In a directional research hypothesis, the direction of the relationship is predicted. The advantages of this type of hypothesis include one-tailed statistical tests, theoretical propositions that can be tested in a more precise manner, and the fact that the researcher’s expectations are very clear right from the start.

In a non-directional research hypothesis, the relationship between the variables is predicted but not the direction of that relationship. Reasons to use this type of research hypothesis include when your previous research findings contradict one another and when there is no theory on which to base your predictions.

Difference Between a Hypothesis and a Theory

There are many different differences between a theory and a hypothesis, including the following:

  • A hypothesis is a suggestion of what might happen when you test out a theory. It is a prediction of a possible correlation between various phenomena. A theory, on the other hand, has been tested and is well-substantiated. If a hypothesis is repeatedly supported by evidence, it may eventually contribute to a theory.
  • The data for a hypothesis is most often very limited, whereas the data relating to theory has been tested under numerous circumstances.
  • A hypothesis offers a very specific instance; that is, it is limited to just one observation. On the other hand, a theory is more generalized and is put through a multitude of experiments and tests, which can then apply to various specific instances.
  • The purposes of these two items are different as well. A hypothesis starts with a possibility that is uncertain but can be studied further via observations and experiments. A theory is used to explain why large sets of observations are continuously made.
  • Hypotheses are based on various suggestions and possibilities but have uncertain results, while theories have a steady and reliable consensus among scientists and other professionals.
  • Both theories and hypotheses are testable and falsifiable, but unlike theories, hypotheses are neither well-tested nor well-substantiated.

What is the Interaction Effect?

An interaction effect occurs when the effect of one independent variable on the dependent variable depends on the level of another independent variable.

When Writing the Hypothesis, There is a Certain Format to Follow

This includes three aspects:

  • The correlational statement
  • The comparative statement
  • A statistical analysis

How are Hypotheses Used to Test Theories?

  • Test the specific proposition derived from the theory, not the entire theory
  • A theory can never be definitively proved or disproved by a single test; evidence accumulates for or against it

When Formulating a Hypothesis, There are Things to Consider

These include:

  • You have to write it in the present tense
  • It has to be empirically testable
  • You have to write it in a declarative sentence
  • It has to contain all of the variables
  • It must contain three parts: the purpose statement, the problem statement, and the research question
  • It has to contain the population

What is the Best Definition of a Scientific Hypothesis?

It is essentially an educated guess; however, that guess loses its scientific credibility if it is not falsifiable.

How to Use Research Questions

There are two ways to include research questions when testing a theory. The first is in addition to a hypothesis related to the topic’s other areas of interest, and the second is in place of the actual hypothesis, which occurs in some instances.

Tips to Keep in Mind When Developing a Hypothesis

  • Use language that is very precise. Your language should be concise, simple, and clean. This is not a time when you want to be vague, because everything needs to be spelled out in great detail.
  • Be as logical as possible. If you believe in something, you want to prove it, and remaining logical at all times is a great start.
  • Use research and experimentation to determine whether your hypothesis is testable. A hypothesis must be capable of being supported or disproven by evidence, so design your study so that it genuinely tests your prediction, even if the results turn out differently than you expect.

What is the Number-One Purpose of a Scientific Method?

Scientific methods are there to provide a structured way to get the appropriate evidence in order to either refute or prove a scientific hypothesis.

Glossary of Terms Related to Hypotheses


Bivariate Data: This is data that includes two distinct variables, which are random and usually graphed via a scatter plot.

Categorical Data: These data fit into a small number of discrete categories. They are usually either nominal (non-ordered), such as country or eye color; or ordinal (ordered), such as temperature classified as hot or cold.

Correlation: This is a measure of how closely two variables are related to one another. It measures whether a change in one random variable corresponds to a change in the other random variable. For example, the correlation between smoking and getting lung cancer has been widely studied.

Data: These are the results found from conducting a survey or experiment, or even an observation study of some type.

Dependent Event: If the occurrence of one event affects the probability that another event occurs, the two are said to be dependent events.

Distribution: The way the probability of a random variable taking a certain value is described is called its distribution. Possible distribution functions include the cumulative, probability density, or probability mass function.

Element: This refers to an object in a certain set, and that object is an element of that set.

Empirical Probability: This refers to the likelihood of an outcome happening, and it is determined by the repeat performance of a particular experiment.  You can do this by dividing the number of times that event took place by the number of times you conducted the experiment.
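As a quick illustration of that division (using made-up die rolls):

```python
# Empirical probability = (times the event occurred) / (times the experiment ran)
rolls = [3, 6, 2, 6, 1, 4, 6, 5, 2, 6]  # hypothetical results of ten die rolls

p_six = rolls.count(6) / len(rolls)
print(p_six)  # 0.4 — a six came up 4 times in 10 rolls
```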

Equality of Sets: If two sets contain the exact same elements, they are considered equal sets. In order to determine if this is so, it can be advantageous to show that each set is contained in the other set.

Equally Likely Outcomes: Refers to outcomes that have the same probability; for example, a fair coin toss has only two outcomes, each equally likely.

Event: This term refers to the subset of a sample space.

Expected Value: This is the long-run average value of a random quantity, the value you would expect to observe on average if the experiment were repeated many times.
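For example, the expected value of a fair six-sided die can be computed directly (a standard textbook calculation):

```python
from fractions import Fraction

# Each face 1..6 occurs with probability 1/6; the expected value is the
# probability-weighted average of the outcomes.
outcomes = range(1, 7)
expected = sum(Fraction(1, 6) * x for x in outcomes)
print(expected)  # 7/2, i.e. 3.5
```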

Experiment: A scientific process that results in a set of outcomes that is observable. Even selecting a toy from a box of toys can be considered an experiment in this instance.

Experimental Probability: When you estimate how likely something is to occur, this is an experimental probability example. To get this probability, you divide the number of trials that were successful by the total number of trials that were performed.

Finite Sample Space: These sample spaces have a finite number of outcomes that could possibly occur.

Frequency: The frequency is the number of times a certain value occurs when you observe an experiment’s results.

Frequency Distribution: This refers to the data that describes possible groups or values and the frequencies that correspond to those groups or values.

Histogram: A histogram, or frequency histogram, is a bar graph that demonstrates how frequently data points occur.

Independent Event: If one event's outcome has no effect on the probability of another event's outcome, the events are said to be independent.

Infinite Sample Space: This refers to a sample space that consists of outcomes with an infinite number of possibilities.

Mutually Exclusive: Events are mutually exclusive if their outcomes have absolutely nothing in common.

Notations: Notations are operations or quantities described by symbols instead of numbers.

Observational Study: Like the name implies, these are studies that allow you to collect data through basic observation.

Odds: This is a way to express the likelihood that a certain event will happen. If you see odds of m:n, it means it is expected that a certain event will happen m times for every n times it does not happen.
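Converting odds of m:n into a probability is straightforward (a small helper written here for illustration):

```python
def odds_to_probability(m, n):
    """Odds of m:n (m successes for every n failures) as a probability."""
    return m / (m + n)

# Odds of 1:3 mean one success for every three failures
print(odds_to_probability(1, 3))  # 0.25
```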

One-Variable Data: Data that involve observations of a single variable for each item studied.

Outcome: The outcome is simply the result of a particular experiment. If you consider a set of all of the possible outcomes, this is called the sample space.

Probability: A probability is merely the likelihood that a certain event will take place, expressed on a scale from 0 to 1, with 0 meaning the event is impossible and 1 meaning it is certain. Probability can also be expressed as a percentage, from 0% to 100%.

Random Experiment: A random experiment is one whereby the outcome can’t be predicted with any amount of certainty, at least not before the experiment actually takes place.

Random Variable: Random variables take on different numerical values, based on the results of a particular experiment.

Replacement: Replacement is the act of returning or replacing an item back into a sample space, which takes place after an event and allows the item to be chosen more than one time.

Sample Space: This term refers to all of the possible outcomes that could result from a probability experiment.

Set: A collection of objects that is well-defined is called a set.

Simple Event: When an event is a single element of the sample space, it is known as a simple event.

Simulation: A simulation is a type of experiment that mimics a real-life event.

Single-Variable Data: These are data that use only one unknown variable.

Statistics: This is the branch of mathematics that deals with the study of quantitative data. If you analyze certain events that are governed by probability, this is called statistics.

Theoretical Probability: This probability describes the ratio of the number of outcomes in a specific event to the number of outcomes found in the sample space. It is based on the presumption that all outcomes are equally likely.

Union: Usually described by the symbol ∪, or the cup symbol, a union describes the combination of two or more sets and their elements.

Variable: A variable is a quantity that varies and is almost always represented by letters.



Definition of a Hypothesis

What it is and how it's used in sociology


A hypothesis is a prediction of what will be found at the outcome of a research project and is typically focused on the relationship between two different variables studied in the research. It is usually based on both theoretical expectations about how things work and already existing scientific evidence.

Within social science, a hypothesis can take two forms. It can predict that there is no relationship between two variables, in which case it is a null hypothesis . Or, it can predict the existence of a relationship between variables, which is known as an alternative hypothesis.

In either case, the variable that is thought to either affect or not affect the outcome is known as the independent variable, and the variable that is thought to either be affected or not is the dependent variable.

Researchers seek to determine whether or not their hypothesis, or hypotheses if they have more than one, will prove true. Sometimes they do, and sometimes they do not. Either way, the research is considered successful if one can conclude whether or not a hypothesis is true. 

Null Hypothesis

A researcher has a null hypothesis when she or he believes, based on theory and existing scientific evidence, that there will not be a relationship between two variables. For example, when examining what factors influence a person's highest level of education within the U.S., a researcher might expect that place of birth, number of siblings, and religion would not have an impact on the level of education. This would mean the researcher has stated three null hypotheses.

Alternative Hypothesis

Taking the same example, a researcher might expect that the economic class and educational attainment of one's parents, and the race of the person in question, are likely to have an effect on one's educational attainment. Existing evidence and social theories that recognize the connections between wealth and cultural resources, and how race affects access to rights and resources in the U.S., would suggest that both the economic class and the educational attainment of one's parents would have a positive effect on educational attainment. In this case, economic class and educational attainment of one's parents are independent variables, and one's educational attainment is the dependent variable—it is hypothesized to be dependent on the other two.

Conversely, an informed researcher would expect that being a race other than white in the U.S. is likely to have a negative impact on a person's educational attainment. This would be characterized as a negative relationship, wherein being a person of color has a negative effect on one's educational attainment. In reality, this hypothesis proves true, with the exception of Asian Americans , who go to college at a higher rate than whites do. However, Blacks and Hispanics and Latinos are far less likely than whites and Asian Americans to go to college.

Formulating a Hypothesis

Formulating a hypothesis can take place at the very beginning of a research project , or after a bit of research has already been done. Sometimes a researcher knows right from the start which variables she is interested in studying, and she may already have a hunch about their relationships. Other times, a researcher may have an interest in ​a particular topic, trend, or phenomenon, but he may not know enough about it to identify variables or formulate a hypothesis.

Whenever a hypothesis is formulated, the most important thing is to be precise about what one's variables are, what the nature of the relationship between them might be, and how one can go about conducting a study of them.

Updated by Nicki Lisa Cole, Ph.D


Grad Coach

What Is A Research (Scientific) Hypothesis? A plain-language explainer + examples

By:  Derek Jansen (MBA)  | Reviewed By: Dr Eunice Rautenbach | June 2020

If you’re new to the world of research, or it’s your first time writing a dissertation or thesis, you’re probably noticing that the words “research hypothesis” and “scientific hypothesis” are used quite a bit, and you’re wondering what they mean in a research context .

“Hypothesis” is one of those words that people use loosely, thinking they understand what it means. However, it has a very specific meaning within academic research. So, it’s important to understand the exact meaning before you start hypothesizing. 

Research Hypothesis 101

  • What is a hypothesis ?
  • What is a research hypothesis (scientific hypothesis)?
  • Requirements for a research hypothesis
  • Definition of a research hypothesis
  • The null hypothesis

What is a hypothesis?

Let’s start with the general definition of a hypothesis (not a research hypothesis or scientific hypothesis), according to the Cambridge Dictionary:

Hypothesis: an idea or explanation for something that is based on known facts but has not yet been proved.

In other words, it’s a statement that provides an explanation for why or how something works, based on facts (or some reasonable assumptions), but that has not yet been specifically tested . For example, a hypothesis might look something like this:

Hypothesis: sleep impacts academic performance.

This statement predicts that academic performance will be influenced by the amount and/or quality of sleep a student engages in – sounds reasonable, right? It’s based on reasonable assumptions , underpinned by what we currently know about sleep and health (from the existing literature). So, loosely speaking, we could call it a hypothesis, at least by the dictionary definition.

But that’s not good enough…

Unfortunately, that’s not quite sophisticated enough to describe a research hypothesis (also sometimes called a scientific hypothesis), and it wouldn’t be acceptable in a dissertation, thesis or research paper . In the world of academic research, a statement needs a few more criteria to constitute a true research hypothesis .

What is a research hypothesis?

A research hypothesis (also called a scientific hypothesis) is a statement about the expected outcome of a study (for example, a dissertation or thesis). To constitute a quality hypothesis, the statement needs to have three attributes – specificity , clarity and testability .

Let’s take a look at these more closely.


Hypothesis Essential #1: Specificity & Clarity

A good research hypothesis needs to be extremely clear and articulate about both what's being assessed (who or what variables are involved) and the expected outcome (for example, a difference between groups, a relationship between variables, etc.).

Let’s stick with our sleepy students example and look at how this statement could be more specific and clear.

Hypothesis: Students who sleep at least 8 hours per night will, on average, achieve higher grades in standardised tests than students who sleep less than 8 hours a night.

As you can see, the statement is very specific as it identifies the variables involved (sleep hours and test grades), the parties involved (two groups of students), as well as the predicted relationship type (a positive relationship). There’s no ambiguity or uncertainty about who or what is involved in the statement, and the expected outcome is clear.

Contrast that to the original hypothesis we looked at – “Sleep impacts academic performance” – and you can see the difference. “Sleep” and “academic performance” are both comparatively vague , and there’s no indication of what the expected relationship direction is (more sleep or less sleep). As you can see, specificity and clarity are key.

A good research hypothesis needs to be very clear about what’s being assessed and very specific about the expected outcome.

Hypothesis Essential #2: Testability (Provability)

A statement must be testable to qualify as a research hypothesis. In other words, there needs to be a way to prove (or disprove) the statement. If it’s not testable, it’s not a hypothesis – simple as that.

For example, consider the hypothesis we mentioned earlier:

Hypothesis: Students who sleep at least 8 hours per night will, on average, achieve higher grades in standardised tests than students who sleep less than 8 hours a night.  

We could test this statement by undertaking a quantitative study involving two groups of students, one that gets 8 or more hours of sleep per night for a fixed period, and one that gets less. We could then compare the standardised test results for both groups to see if there’s a statistically significant difference. 
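To sketch what that comparison could look like in practice (the group names and scores below are invented purely for illustration), a Welch-style t statistic can be computed from the two samples:

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for comparing the means of two independent groups."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))

# Hypothetical standardised test scores for the two sleep groups
eight_plus_hours = [78, 82, 85, 90, 76, 88, 84]
under_eight_hours = [70, 75, 68, 80, 72, 74, 71]

# A large positive t suggests the well-rested group scored higher on average
t_stat = welch_t(eight_plus_hours, under_eight_hours)
```

In a real study you would compare this statistic against a t distribution to get a p-value; the point here is only that the hypothesis is concrete enough to compute against.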

Again, if you compare this to the original hypothesis we looked at – “Sleep impacts academic performance” – you can see that it would be quite difficult to test that statement, primarily because it isn’t specific enough. How much sleep? For whom? What type of academic performance?

So, remember the mantra – if you can’t test it, it’s not a hypothesis 🙂

A good research hypothesis must be testable. In other words, you must be able to collect observable data in a scientifically rigorous fashion to test it.

Defining A Research Hypothesis

You’re still with us? Great! Let’s recap and pin down a clear definition of a hypothesis.

A research hypothesis (or scientific hypothesis) is a statement about an expected relationship between variables, or explanation of an occurrence, that is clear, specific and testable.

So, when you write up hypotheses for your dissertation or thesis, make sure that they meet all these criteria. If you do, you’ll not only have rock-solid hypotheses but you’ll also ensure a clear focus for your entire research project.

What about the null hypothesis?

You may have also heard the terms null hypothesis, alternative hypothesis, or H-zero thrown around. At a simple level, the null hypothesis is the counter-proposal to the original hypothesis.

For example, if the hypothesis predicts that there is a relationship between two variables (for example, sleep and academic performance), the null hypothesis would predict that there is no relationship between those variables.

At a more technical level, the null hypothesis proposes that no statistical significance exists in a set of given observations and that any differences are due to chance alone.
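One way to make “differences due to chance alone” concrete is a permutation test: if the null hypothesis is true, the group labels are interchangeable, so reshuffling them many times shows how often a difference as large as the observed one arises by chance. A minimal sketch (the data and function name are invented for illustration):

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_iter=5000, seed=0):
    """Approximate two-sided p-value under H0: the group labels are exchangeable."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # pretend the labels carry no information
        diff = statistics.mean(pooled[:len(group_a)]) - statistics.mean(pooled[len(group_a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / n_iter

# Clearly separated (made-up) samples: shuffled differences rarely reach the observed one
p = permutation_p_value([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
```

A small p here means the observed difference would be rare if chance alone were at work.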

And there you have it – hypotheses in a nutshell. 


Null & Alternative Hypotheses | Definitions, Templates & Examples

Published on May 6, 2022 by Shaun Turney . Revised on June 22, 2023.

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test:

  • Null hypothesis (H0): There’s no effect in the population.
  • Alternative hypothesis (Ha or H1): There’s an effect in the population.

Table of contents

  • Answering your research question with hypotheses
  • What is a null hypothesis?
  • What is an alternative hypothesis?
  • Similarities and differences between null and alternative hypotheses
  • How to write null and alternative hypotheses
  • Other interesting articles
  • Frequently asked questions

The null and alternative hypotheses offer competing answers to your research question . When the research question asks “Does the independent variable affect the dependent variable?”:

  • The null hypothesis (H0) answers “No, there’s no effect in the population.”
  • The alternative hypothesis (Ha) answers “Yes, there is an effect in the population.”

The null and alternative are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample . Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample. It’s critical for your research to write strong hypotheses .

You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis. However, the hypotheses can also be phrased in a general way that applies to any test.


The null hypothesis is the claim that there’s no effect in the population.

If the sample provides enough evidence against the claim that there’s no effect in the population (p ≤ α), then we can reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.

Although “fail to reject” may sound awkward, it’s the only wording that statisticians accept. Be careful not to say you “prove” or “accept” the null hypothesis.
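The decision rule just described is mechanical enough to write down directly; the α = 0.05 used below is simply the conventional default, and the function name is our own:

```python
def null_hypothesis_decision(p_value, alpha=0.05):
    """Apply the standard decision rule: reject H0 only when p <= alpha."""
    # Note the wording: we "fail to reject" H0 — we never "accept" or "prove" it.
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(null_hypothesis_decision(0.01))  # reject H0
print(null_hypothesis_decision(0.30))  # fail to reject H0
```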

Null hypotheses often include phrases such as “no effect,” “no difference,” or “no relationship.” When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

You can never know with complete certainty whether there is an effect in the population. Some percentage of the time, your inference about the population will be incorrect. When you incorrectly reject the null hypothesis, it’s called a type I error . When you incorrectly fail to reject it, it’s a type II error.
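A quick way to see type I errors in action is to simulate a world where the null hypothesis is true many times over and count how often it is incorrectly rejected; the long-run rejection rate should sit near the chosen α. The fair-coin z-test and all numbers below are simulation choices of ours, not from the article:

```python
import math
import random

def fair_coin_p_value(heads, n):
    """Two-sided p-value for H0: p = 0.5, via the normal approximation to the binomial."""
    z = (heads - 0.5 * n) / math.sqrt(n * 0.25)
    # Standard normal CDF expressed with math.erf
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

rng = random.Random(42)
n_flips, n_experiments, alpha = 200, 1000, 0.05

# H0 is true in every simulated experiment, so every rejection is a type I error
false_rejections = sum(
    fair_coin_p_value(sum(rng.random() < 0.5 for _ in range(n_flips)), n_flips) <= alpha
    for _ in range(n_experiments)
)
type_i_rate = false_rejections / n_experiments  # should land close to alpha
```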

Examples of null hypotheses

The table below gives examples of research questions and null hypotheses. There’s always more than one way to answer a research question, but these null hypotheses can help you get started.

*Note that some researchers prefer to always write the null hypothesis in terms of “no effect” and “=”. It would be fine to say that daily meditation has no effect on the incidence of depression and p1 = p2.

The alternative hypothesis (Ha) is the other answer to your research question. It claims that there’s an effect in the population.

Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.

The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.

Alternative hypotheses often include phrases such as “an effect,” “a difference,” or “a relationship.” When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes < or >). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.

Examples of alternative hypotheses

The table below gives examples of research questions and alternative hypotheses to help you get started with formulating your own.

Null and alternative hypotheses are similar in some ways:

  • They’re both answers to the research question.
  • They both make claims about the population.
  • They’re both evaluated by statistical tests.

However, there are important differences between the two types of hypotheses, summarized in the following table.


To help you write your hypotheses, you can use the template sentences below. If you know which statistical test you’re going to use, you can use the test-specific template sentences. Otherwise, you can use the general template sentences.

General template sentences

The only things you need to know to use these general template sentences are your dependent and independent variables. To write your research question, null hypothesis, and alternative hypothesis, fill in the following sentences with your variables:

Does independent variable affect dependent variable?

  • Null hypothesis (H0): Independent variable does not affect dependent variable.
  • Alternative hypothesis (Ha): Independent variable affects dependent variable.
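The general template sentences lend themselves to a tiny fill-in helper; the function name and the meditation/depression example variables below are illustrative only:

```python
def general_hypotheses(independent, dependent):
    """Fill the general template sentences with the two variables."""
    return {
        "research question": f"Does {independent} affect {dependent}?",
        "H0": f"{independent} does not affect {dependent}.",
        "Ha": f"{independent} affects {dependent}.",
    }

templates = general_hypotheses("daily meditation", "the incidence of depression")
```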

Test-specific template sentences

Once you know the statistical test you’ll be using, you can write your hypotheses in a more precise and mathematical way specific to the test you chose. The table below provides template sentences for common statistical tests.

Note: The template sentences above assume that you’re performing one-tailed tests . One-tailed tests are appropriate for most studies.

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

The null hypothesis is often abbreviated as H0. When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The alternative hypothesis is often abbreviated as Ha or H1. When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (“ x affects y because …”).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses . In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.


Turney, S. (2023, June 22). Null & Alternative Hypotheses | Definitions, Templates & Examples. Scribbr. Retrieved April 4, 2024, from https://www.scribbr.com/statistics/null-and-alternative-hypotheses/


Enago Academy

How to Develop a Good Research Hypothesis


The story of a research study begins by asking a question. Researchers all around the globe are asking curious questions and formulating research hypotheses. However, whether a research study yields an effective conclusion depends on how well one develops a good research hypothesis. Research hypothesis examples can give researchers an idea of how to write a good research hypothesis.

This blog will help you understand what a research hypothesis is, its characteristics, and how to formulate one.

Table of Contents

What is a Hypothesis?

A hypothesis is an assumption or an idea proposed for the sake of argument so that it can be tested. It is a precise, testable statement of what the researchers predict will be the outcome of the study. A hypothesis usually involves proposing a relationship between two variables: the independent variable (what the researchers change) and the dependent variable (what the researchers measure).

What is a Research Hypothesis?

A research hypothesis is a statement that introduces a research question and proposes an expected result. It is an integral part of the scientific method that forms the basis of scientific experiments. Therefore, you need to be careful and thorough when building your research hypothesis. A minor flaw in the construction of your hypothesis could have an adverse effect on your experiment. In research, there is a convention that the hypothesis is written in two forms: the null hypothesis and the alternative hypothesis (called the experimental hypothesis when the method of investigation is an experiment).

Characteristics of a Good Research Hypothesis

A good hypothesis is specific: it makes a testable prediction about what you expect to happen in a study. You may consider drawing your hypothesis from previously published research, based on theory.

A good research hypothesis involves more effort than just a guess. In particular, your hypothesis may begin with a question that could be further explored through background research.

To help you formulate a promising research hypothesis, you should ask yourself the following questions:

  • Is the language clear and focused?
  • What is the relationship between your hypothesis and your research topic?
  • Is your hypothesis testable? If yes, then how?
  • What are the possible explanations that you might want to explore?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate your variables without hampering the ethical standards?
  • Does your research predict the relationship and outcome?
  • Is your research simple and concise (avoids wordiness)?
  • Is it clear, with no ambiguity or assumptions about the readers’ knowledge?
  • Does your research produce observable and testable results?
  • Is it relevant and specific to the research question or problem?


The questions listed above can be used as a checklist to make sure your hypothesis is based on a solid foundation. Furthermore, it can help you identify weaknesses in your hypothesis and revise it if necessary.


How to Formulate a Research Hypothesis

A testable hypothesis is not a simple statement. It is rather an intricate statement that needs to offer a clear introduction to a scientific experiment, its intentions, and the possible outcomes. However, there are some important things to consider when building a compelling hypothesis.

1. State the problem that you are trying to solve.

Make sure that the hypothesis clearly defines the topic and the focus of the experiment.

2. Try to write the hypothesis as an if-then statement.

Follow this template: If a specific action is taken, then a certain outcome is expected.
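The if-then template can likewise be sketched as a one-line formatter (a hypothetical helper of ours, not part of any library), reusing the sleep example from earlier:

```python
def if_then_statement(action, outcome):
    """Express a hypothesis in the 'If <action>, then <outcome>.' template."""
    return f"If {action}, then {outcome}."

statement = if_then_statement(
    "students sleep at least 8 hours per night",
    "their average test grades will be higher",
)
```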

3. Define the variables

Independent variables are the ones that are manipulated, controlled, or changed. Independent variables are isolated from other factors of the study.

Dependent variables, as the name suggests, are dependent on other factors of the study. They are influenced by a change in the independent variable.

4. Scrutinize the hypothesis

Evaluate assumptions, predictions, and evidence rigorously to refine your understanding.

Types of Research Hypothesis

The types of research hypotheses are stated below:

1. Simple Hypothesis

It predicts the relationship between a single dependent variable and a single independent variable.

2. Complex Hypothesis

It predicts the relationship between two or more independent and dependent variables.

3. Directional Hypothesis

It specifies the expected direction to be followed to determine the relationship between variables and is derived from theory. Furthermore, it implies the researcher’s intellectual commitment to a particular outcome.

4. Non-directional Hypothesis

It does not predict the exact direction or nature of the relationship between the two variables. The non-directional hypothesis is used when there is no theory involved or when findings contradict previous research.

5. Associative and Causal Hypothesis

The associative hypothesis defines interdependency between variables. A change in one variable results in the change of the other variable. On the other hand, the causal hypothesis proposes an effect on the dependent due to manipulation of the independent variable.

6. Null Hypothesis

A null hypothesis states, as a negative statement, that there is no relationship between two variables: there will be no change in the dependent variable due to the manipulation of the independent variable. Furthermore, it states that the results are due to chance and are not significant in terms of supporting the idea being investigated.

7. Alternative Hypothesis

It states that there is a relationship between the two variables of the study and that the results are significant to the research topic. An experimental hypothesis predicts what changes will take place in the dependent variable when the independent variable is manipulated. Also, it states that the results are not due to chance and that they are significant in terms of supporting the theory being investigated.

Research Hypothesis Examples of Independent and Dependent Variables

Research Hypothesis Example 1: A greater number of coal plants in a region (independent variable) increases water pollution (dependent variable). If you change the independent variable (building more coal factories), it will change the dependent variable (the amount of water pollution).

Research Hypothesis Example 2: What is the effect of diet or regular soda (independent variable) on blood sugar levels (dependent variable)? If you change the independent variable (the type of soda you consume), it will change the dependent variable (blood sugar levels).

You should not ignore the importance of the above steps. The validity of your experiment and its results relies on a robust, testable hypothesis. Developing a strong testable hypothesis has a few advantages: it compels us to think intensely and specifically about the outcomes of a study, it enables us to understand the implications of the question and the different variables involved in the study, and it helps us to make precise predictions based on prior research. Hence, forming a hypothesis is of great value to the research.

More importantly, you need to build a robust testable research hypothesis for your scientific experiments. A testable hypothesis is a hypothesis that can be proved or disproved as a result of experimentation.

Importance of a Testable Hypothesis

To devise and perform an experiment using the scientific method, you need to make sure that your hypothesis is testable. To be considered testable, some essential criteria must be met:

  • There must be a possibility to prove that the hypothesis is true.
  • There must be a possibility to prove that the hypothesis is false.
  • The results of the hypothesis must be reproducible.

Without these criteria, the hypothesis and the results will be vague. As a result, the experiment will not prove or disprove anything significant.

What are your experiences with building hypotheses for scientific experiments? What challenges did you face? How did you overcome these challenges? Please share your thoughts with us in the comments section.

Frequently Asked Questions

The steps to write a research hypothesis are:
1. Stating the problem: Ensure that the hypothesis defines the research problem.
2. Writing the hypothesis as an if-then statement: Include the action and the expected outcome of your study by following an ‘if-then’ structure.
3. Defining the variables: Label each variable as dependent or independent based on its dependency on other factors.
4. Scrutinizing the hypothesis: Identify the type of your hypothesis.

Hypothesis testing is a statistical tool used to make inferences about population data and draw conclusions for a particular hypothesis.

Hypothesis in statistics is a formal statement about the nature of a population within a structured framework of a statistical model. It is used to test an existing hypothesis by studying a population.

A research hypothesis is a statement that introduces a research question and proposes an expected result. It forms the basis of scientific experiments.

The different types of hypothesis in research are:
  • Null hypothesis: a negative statement supporting the researcher’s finding that there is no relationship between two variables.
  • Alternative hypothesis: predicts the relationship between the two variables of the study.
  • Directional hypothesis: specifies the expected direction of the relationship between variables.
  • Non-directional hypothesis: does not predict the exact direction or nature of the relationship between the two variables.
  • Simple hypothesis: predicts the relationship between a single dependent variable and a single independent variable.
  • Complex hypothesis: predicts the relationship between two or more independent and dependent variables.
  • Associative and causal hypothesis: an associative hypothesis defines interdependency between variables, while a causal hypothesis proposes an effect on the dependent variable due to manipulation of the independent variable.
  • Empirical hypothesis: can be tested via experiment and observation.
  • Statistical hypothesis: utilizes statistical models to draw conclusions about broader populations.


PrepScholar

What Is a Hypothesis and How Do I Write One?

Think about something strange and unexplainable in your life. Maybe you get a headache right before it rains, or maybe you think your favorite sports team wins when you wear a certain color. If you wanted to see whether these are just coincidences or scientific fact, you would form a hypothesis, then create an experiment to see whether that hypothesis is true or not.

But what is a hypothesis, anyway? If you’re not sure about what a hypothesis is--or how to test for one!--you’re in the right place. This article will teach you everything you need to know about hypotheses, including: 

  • Defining the term “hypothesis” 
  • Providing hypothesis examples 
  • Giving you tips for how to write your own hypothesis

So let’s get started!


What Is a Hypothesis?

Merriam Webster defines a hypothesis as “an assumption or concession made for the sake of argument.” In other words, a hypothesis is an educated guess . Scientists make a reasonable assumption--or a hypothesis--then design an experiment to test whether it’s true or not. Keep in mind that in science, a hypothesis should be testable. You have to be able to design an experiment that tests your hypothesis in order for it to be valid. 

As you could assume from that statement, it’s easy to make a bad hypothesis. But when you’re holding an experiment, it’s even more important that your guesses be good...after all, you’re spending time (and maybe money!) to figure out more about your observation. That’s why we refer to a hypothesis as an educated guess--good hypotheses are based on existing data and research to make them as sound as possible.

Hypotheses are one part of what’s called the scientific method .  Every (good) experiment or study is based in the scientific method. The scientific method gives order and structure to experiments and ensures that interference from scientists or outside influences does not skew the results. It’s important that you understand the concepts of the scientific method before holding your own experiment. Though it may vary among scientists, the scientific method is generally made up of six steps (in order):

  • Observation
  • Asking questions
  • Forming a hypothesis
  • Conducting an experiment
  • Analyzing the data
  • Communicating your results
You’ll notice that the hypothesis comes pretty early on when conducting an experiment. That’s because experiments work best when they’re trying to answer one specific question. And you can’t conduct an experiment until you know what you’re trying to prove!

Independent and Dependent Variables 

After doing your research, you’re ready for another important step in forming your hypothesis: identifying variables. Variables are basically any factor that could influence the outcome of your experiment . Variables have to be measurable and related to the topic being studied.

There are two types of variables: independent variables and dependent variables. Independent variables remain constant. For example, age is an independent variable; it will stay the same, and researchers can look at different ages to see if it has an effect on the dependent variable.

Speaking of dependent variables... dependent variables are subject to the influence of the independent variable , meaning that they are not constant. Let’s say you want to test whether a person’s age affects how much sleep they need. In that case, the independent variable is age (like we mentioned above), and the dependent variable is how much sleep a person gets. 

Variables will be crucial in writing your hypothesis. You need to be able to identify which variable is which, as both the independent and dependent variables will be written into your hypothesis. For instance, in a study about exercise, the independent variable might be the speed at which the respondents walk for thirty minutes, and the dependent variable would be their heart rate. In your study and in your hypothesis, you’re trying to understand the relationship between the two variables.
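To see the two roles in code, here is a small sketch that measures the relationship between a hypothetical independent variable (walking speed) and dependent variable (heart rate) with a Pearson correlation; all of the data values are invented for illustration:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

speed_kmh = [3.0, 4.0, 5.0, 6.0, 7.0]        # independent variable (set by the researcher)
heart_rate_bpm = [95, 104, 112, 121, 135]    # dependent variable (measured outcome)

r = pearson_r(speed_kmh, heart_rate_bpm)  # close to +1: strong positive relationship
```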

Elements of a Good Hypothesis

The best hypotheses start by asking the right questions . For instance, if you’ve observed that the grass is greener when it rains twice a week, you could ask what kind of grass it is, what elevation it’s at, and if the grass across the street responds to rain in the same way. Any of these questions could become the backbone of experiments to test why the grass gets greener when it rains fairly frequently.

As you’re asking more questions about your first observation, make sure you’re also making more observations . If it doesn’t rain for two weeks and the grass still looks green, that’s an important observation that could influence your hypothesis. You'll continue observing all throughout your experiment, but until the hypothesis is finalized, every observation should be noted.

Finally, you should consult secondary research before writing your hypothesis . Secondary research consists of results found and published by other people. You can usually find this information online or at your library. Additionally, make sure the research you find is credible and related to your topic. If you’re studying the correlation between rain and grass growth, it would help you to research rain patterns over the past twenty years for your county, published by a local agricultural association. You should also research the types of grass common in your area, the type of grass in your lawn, and whether anyone else has conducted experiments about your hypothesis. Also be sure you’re checking the quality of your research . Research done by a middle school student about what minerals can be found in rainwater would be less useful than an article published by a local university.


Writing Your Hypothesis

Once you’ve considered all of the factors above, you’re ready to start writing your hypothesis. Hypotheses usually take a certain form when they’re written out in a research report.

When you boil down your hypothesis statement, you are writing down your best guess, not the question at hand . This means that your statement should be written as if it were already fact, even though you are simply testing it.

The reason for this is that, after you have completed your study, you'll either accept or reject your if-then or your null hypothesis. All hypothesis testing examples should be measurable and able to be confirmed or denied. You cannot confirm a question, only a statement! 

In fact, you come up with hypothesis examples all the time! For instance, when you guess on the outcome of a basketball game, you don’t say, “Will the Miami Heat beat the Boston Celtics?” but instead, “I think the Miami Heat will beat the Boston Celtics.” You state it as if it is already true, even if it turns out you’re wrong. You do the same thing when writing your hypothesis.

Additionally, keep in mind that hypotheses can range from very specific to very broad. A hypothesis can be narrow, but if your study involves a broad range of causes and effects, your hypothesis can be broad to match.  


The Two Types of Hypotheses

Now that you understand what goes into a hypothesis, it’s time to look more closely at the two most common types of hypothesis: the if-then hypothesis and the null hypothesis.

#1: If-Then Hypotheses

First of all, if-then hypotheses typically follow this formula:

If ____ happens, then ____ will happen.

The goal of this type of hypothesis is to test the causal relationship between the independent and dependent variable. It’s fairly simple, and each hypothesis can vary in how detailed it can be. We create if-then hypotheses all the time with our daily predictions. Here are some examples of hypotheses that use an if-then structure from daily life: 

  • If I get enough sleep, I’ll be able to get more work done tomorrow.
  • If the bus is on time, I can make it to my friend’s birthday party. 
  • If I study every night this week, I’ll get a better grade on my exam. 

In each of these situations, you’re making a guess on how an independent variable (sleep, time, or studying) will affect a dependent variable (the amount of work you can do, making it to a party on time, or getting better grades). 

You may still be asking, “What is an example of a hypothesis used in scientific research?” Take one of the hypothesis examples from a real-world study on whether using technology before bed affects children’s sleep patterns. The hypothesis reads:

“We hypothesized that increased hours of tablet- and phone-based screen time at bedtime would be inversely correlated with sleep quality and child attention.”

It might not look like it, but this is an if-then statement. The researchers basically said, “If children have more screen usage at bedtime, then their quality of sleep and attention will be worse.” The sleep quality and attention are the dependent variables and the screen usage is the independent variable. (Usually, the independent variable comes after the “if” and the dependent variable comes after the “then,” as it is the independent variable that affects the dependent variable.) This is an excellent example of how flexible hypothesis statements can be, as long as the general idea of “if-then” and the independent and dependent variables are present.

#2: Null Hypotheses

Your if-then hypothesis is not the only one needed to complete a successful experiment, however. You also need a null hypothesis to test it against. In its most basic form, the null hypothesis is the opposite of your if-then hypothesis. When you write your null hypothesis, you are writing a hypothesis that suggests that your guess is not true, and that the independent and dependent variables have no relationship.

One null hypothesis for the cell phone and sleep study from the last section might say: 

“If children have more screen usage at bedtime, their quality of sleep and attention will not be worse.” 

In this case, this is a null hypothesis because it states the opposite of the original hypothesis! 

Conversely, if your if-then hypothesis suggests that your two variables have no relationship, then your null hypothesis would suggest that there is one. So, pretend that there is a study that is asking the question, “Does the number of followers on Instagram influence how long people spend on the app?” The independent variable is the number of followers, and the dependent variable is the time spent. But if you, as the researcher, don’t think there is a relationship between the number of followers and time spent, you might write an if-then hypothesis that reads:

“If people have many followers on Instagram, they will not spend more time on the app than people who have fewer.”

In this case, the if-then suggests there isn’t a relationship between the variables. In that case, one of the null hypothesis examples might say:

“If people have many followers on Instagram, they will spend more time on the app than people who have fewer.”

You then test both the if-then and the null hypothesis to gauge if there is a relationship between the variables, and if so, how much of a relationship. 
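To see what gauging that relationship might look like in practice, here is a minimal Python sketch that computes the Pearson correlation coefficient between follower count and time spent. All of the numbers are invented for illustration; a coefficient near zero is consistent with the null hypothesis of no relationship, while a value near +1 or −1 is evidence against it.

```python
import math

# Hypothetical data, invented for illustration: follower counts and
# average daily minutes spent on the app for eight users.
followers = [120, 450, 900, 1500, 3000, 5200, 8000, 12000]
minutes = [40, 38, 42, 35, 43, 39, 41, 40]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

r = pearson_r(followers, minutes)
print(f"r = {r:.3f}")
```

A real study would use far more than eight participants and would also compute a p-value for r (for example, with `scipy.stats.pearsonr`) before deciding whether to reject the null hypothesis.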


4 Tips to Write the Best Hypothesis

If you’re going to take the time to hold an experiment, whether in school or by yourself, you’re also going to want to take the time to make sure your hypothesis is a good one. The best hypotheses have four major elements in common: plausibility, defined concepts, observability, and general explanation.

#1: Plausibility

At first glance, this quality of a hypothesis might seem obvious. When your hypothesis is plausible, that means it’s possible given what we know about science and general common sense. However, improbable hypotheses are more common than you might think. 

Imagine you’re studying weight gain and television watching habits. If you hypothesize that people who watch more than twenty hours of television a week will gain two hundred pounds or more over the course of a year, this might be improbable (though it’s potentially possible). Consequently, common sense can tell us the results of the study before the study even begins.

Improbable hypotheses generally go against science, as well. Take this hypothesis example: 

“If a person smokes one cigarette a day, then they will have lungs just as healthy as the average person’s.” 

This hypothesis is obviously untrue, as studies have shown again and again that cigarettes negatively affect lung health. You must be careful that your hypotheses do not reflect your own personal opinion more than they do scientifically supported findings. This is why research before writing your hypothesis is essential: it ensures that your hypothesis has not already been disproven.

#2: Defined Concepts

The more advanced you are in your studies, the more likely that the terms you’re using in your hypothesis are specific to a limited set of knowledge. One of the hypothesis testing examples might include the readability of printed text in newspapers, where you might use words like “kerning” and “x-height.” Unless your readers have a background in graphic design, it’s likely that they won’t know what you mean by these terms. Thus, it’s important to either write what they mean in the hypothesis itself or in the report before the hypothesis.

Here’s what we mean. Which of the following sentences makes more sense to the common person?

If the kerning is greater than average, more words will be read per minute.

If the space between letters is greater than average, more words will be read per minute.

For people reading your report that are not experts in typography, simply adding a few more words will be helpful in clarifying exactly what the experiment is all about. It’s always a good idea to make your research and findings as accessible as possible. 


Good hypotheses ensure that you can observe the results. 

#3: Observability

In order to measure the truth or falsity of your hypothesis, you must be able to see your variables and the way they interact. For instance, if your hypothesis is that the flight patterns of satellites affect the strength of certain television signals, yet you don’t have a telescope to view the satellites or a television to monitor the signal strength, you cannot properly observe your hypothesis and thus cannot continue your study.

Some variables may seem easy to observe, but if you do not have a system of measurement in place, you cannot observe your hypothesis properly. Here’s an example: if you’re experimenting on the effect of healthy food on overall happiness, but you don’t have a way to monitor and measure what “overall happiness” means, your results will not reflect the truth. Monitoring how often someone smiles for a whole day is not reasonably observable, but having the participants state how happy they feel on a scale of one to ten is more observable. 

In writing your hypothesis, always keep in mind how you'll execute the experiment.

#4: Generalizability 

Perhaps you’d like to study what color your best friend wears the most often by observing and documenting the colors she wears each day of the week. This might be fun information for her and you to know, but beyond you two, there aren’t many people who could benefit from this experiment. When you start an experiment, you should note how generalizable your findings may be if they are confirmed. Generalizability is basically how common a particular phenomenon is to other people’s everyday life.

If you’re asking a question about the health benefits of eating an apple for one day only, you need to realize that the experiment may be too specific to be helpful. It does not help to explain a phenomenon that many people experience. If you find yourself with too specific of a hypothesis, go back to asking the big question: what is it that you want to know, and what do you think will happen between your two variables?


Hypothesis Testing Examples

We know it can be hard to write a good hypothesis unless you’ve seen some good hypothesis examples. We’ve included four hypothesis examples based on some made-up experiments. Use these as templates or launch pads for coming up with your own hypotheses.

Experiment #1: Students Studying Outside (Writing a Hypothesis)

You are a student at PrepScholar University. When you walk around campus, you notice that, when the temperature is above 60 degrees, more students study in the quad. You want to know when your fellow students are more likely to study outside. With this information, how do you make the best hypothesis possible?

You must remember to make additional observations and do secondary research before writing your hypothesis. In doing so, you notice that no one studies outside when it’s 75 degrees and raining, so this should be included in your experiment. Also, studies done on the topic beforehand suggested that students are more likely to study in temperatures less than 85 degrees. With this in mind, you feel confident that you can identify your variables and write your hypotheses:

If-then: “If the temperature in Fahrenheit is less than 60 degrees, significantly fewer students will study outside.”

Null: “If the temperature in Fahrenheit is less than 60 degrees, the same number of students will study outside as when it is more than 60 degrees.”

These hypotheses are plausible, as the temperatures are reasonably within the bounds of what is possible. The number of people in the quad is also easily observable. It is also not a phenomenon specific to only one person or at one time, but instead can explain a phenomenon for a broader group of people.

To complete this experiment, you pick the month of October to observe the quad. Every day (except on the days when it’s raining) from 3 to 4 PM, when most classes have released for the day, you observe how many people are on the quad. You measure how many people come and how many leave. You also write down the temperature on the hour. 

After writing down all of your observations and putting them on a graph, you find that the most students study on the quad when it is 70 degrees outside, and that the number of students drops sharply once the temperature reaches 60 degrees or below. In this case, your research report would state that you accept, or “fail to reject,” your first hypothesis with your findings.
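If you wanted to put a number on that conclusion, a permutation test is one simple option: under the null hypothesis, temperature makes no difference, so the "cold day" and "warm day" labels are interchangeable. The sketch below uses invented daily counts, not data from the example, and is only meant to show the idea.

```python
import random

random.seed(0)  # reproducible shuffles

# Hypothetical daily counts of students on the quad (invented numbers):
cold = [4, 6, 5, 3, 7, 5, 4]          # days below 60 degrees
warm = [12, 15, 11, 14, 13, 16, 12]   # days at or above 60 degrees

observed_diff = sum(warm) / len(warm) - sum(cold) / len(cold)

# Shuffle the labels many times and count how often chance alone
# produces a warm-minus-cold gap at least as large as the one observed.
pooled = cold + warm
trials, hits = 10_000, 0
for _ in range(trials):
    random.shuffle(pooled)
    shuffled_cold = pooled[: len(cold)]
    shuffled_warm = pooled[len(cold):]
    diff = sum(shuffled_warm) / len(warm) - sum(shuffled_cold) / len(cold)
    if diff >= observed_diff:
        hits += 1

p_value = hits / trials
print(f"observed difference = {observed_diff:.2f}, p ~ {p_value:.4f}")
```

A tiny p-value means a gap this large essentially never appears by chance, so you would reject the null hypothesis, the same conclusion the experiment reaches.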

Experiment #2: The Cupcake Store (Forming a Simple Experiment)

Let’s say that you work at a bakery. You specialize in cupcakes, and you make only two colors of frosting: yellow and purple. You want to know what kind of customers are more likely to buy what kind of cupcake, so you set up an experiment. Your independent variable is the customer’s gender, and the dependent variable is the color of the frosting. What is an example of a hypothesis that might answer the question of this study?

Here’s what your hypotheses might look like: 

If-then: “If customers’ gender is female, then they will buy more yellow cupcakes than purple cupcakes.”

Null: “If customers’ gender is female, then they will be just as likely to buy purple cupcakes as yellow cupcakes.”

This is a pretty simple experiment! It passes the test of plausibility (there could easily be a difference), defined concepts (there’s nothing complicated about cupcakes!), observability (both color and gender can be easily observed), and general explanation (this would potentially help you make better business decisions).


Experiment #3: Backyard Bird Feeders (Integrating Multiple Variables and Rejecting the If-Then Hypothesis)

While watching your backyard bird feeder, you notice that different birds come on the days when you change the type of seed. You want to see more cardinals in your backyard, so you decide to find out what type of food they like best and set up an experiment. 

However, one morning, you notice that, while some cardinals are present, blue jays are eating out of your backyard feeder filled with millet. You decide that, of all of the other birds, you would like to see the blue jays the least. This means you'll have more than one variable in your hypothesis. Your new hypotheses might look like this: 

If-then: “If sunflower seeds are placed in the bird feeders, then more cardinals will come than blue jays. If millet is placed in the bird feeders, then more blue jays will come than cardinals.”

Null: “If either sunflower seeds or millet are placed in the bird feeder, equal numbers of cardinals and blue jays will come.”

Through simple observation, you actually find that cardinals come as often as blue jays when sunflower seeds or millet is in the bird feeder. In this case, you would reject your “if-then” hypothesis and “fail to reject” your null hypothesis . You cannot accept your first hypothesis, because it’s clearly not true. Instead you found that there was actually no relation between your different variables. Consequently, you would need to run more experiments with different variables to see if the new variables impact the results.
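Because both variables here are categorical (seed type and bird species), the standard way to test for a relationship is a chi-square test of independence. Here is a small stdlib-only Python sketch; the visit counts are invented to mirror the "no relationship" outcome described above.

```python
# Hypothetical visit counts (invented):   cardinals  blue jays
observed = [
    [30, 28],  # sunflower seeds
    [27, 31],  # millet
]

def chi_square(table):
    """Chi-square statistic for a two-dimensional contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat

stat = chi_square(observed)
print(f"chi-square = {stat:.3f}")
# With 1 degree of freedom, the 5% critical value is about 3.84;
# a statistic below that means you fail to reject the null hypothesis.
```

In practice you would let a library such as `scipy.stats.chi2_contingency` compute the statistic and its p-value for you.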

Experiment #4: In-Class Survey (Including an Alternative Hypothesis)

You’re about to give a speech in one of your classes about the importance of paying attention. You want to take this opportunity to test a hypothesis you’ve had for a while: 

If-then: If students sit in the first two rows of the classroom, then they will listen better than students who do not.

Null: If students sit in the first two rows of the classroom, then they will not listen better or worse than students who do not.

You give your speech and then ask your teacher if you can hand out a short survey to the class. On the survey, you’ve included questions about some of the topics you talked about. When you get back the results, you’re surprised to see that not only do the students in the first two rows not pay better attention, but they also scored worse than students in other parts of the classroom! Here, both your if-then and your null hypotheses are not representative of your findings. What do you do?

This is when you reject both your if-then and null hypotheses and instead create an alternative hypothesis . This type of hypothesis is used in the rare circumstance that neither of your hypotheses is able to capture your findings . Now you can use what you’ve learned to draft new hypotheses and test again! 

Key Takeaways: Hypothesis Writing

The more comfortable you become with writing hypotheses, the better they will become. The structure of hypotheses is flexible and may need to be changed depending on what topic you are studying. The most important thing to remember is the purpose of your hypothesis and the difference between the if-then and the null . From there, in forming your hypothesis, you should constantly be asking questions, making observations, doing secondary research, and considering your variables. After you have written your hypothesis, be sure to edit it so that it is plausible, clearly defined, observable, and helpful in explaining a general phenomenon.

Writing a hypothesis is something that everyone, from elementary school children competing in a science fair to professional scientists in a lab, needs to know how to do. Hypotheses are vital in experiments and in properly executing the scientific method . When done correctly, hypotheses will set up your studies for success and help you to understand the world a little better, one experiment at a time.



Ashley Sufflé Robinson has a Ph.D. in 19th Century English Literature. As a content writer for PrepScholar, Ashley is passionate about giving college-bound students the in-depth information they need to get into the school of their dreams.




Consider the example of testing a claim about the proportion of healthy and scabbed apples from a cultivar.

In this case, the null hypothesis is stated as the cultivar produces an equal number of healthy and scabbed apples.

Here, the alternative hypothesis can be expressed in three different ways, and based on that, the type of hypothesis test is decided.

One way to state the alternative hypothesis is that the cultivar produces more healthy apples than scabbed apples. In this case, the right-tailed hypothesis test is applicable as the critical region would be at the right tail of the distribution.

When we state that the cultivar produces fewer healthy apples than scabbed apples, the critical region would be at the left tail of the distribution. Here, the left-tailed hypothesis test is applicable.

In case of uncertainty of the direction of the hypothesis, we may state that the cultivar produces an unequal number of healthy and scabbed apples. As the critical region would be at both the tails equally, the two-tailed hypothesis test would be applicable.

Types of Hypothesis Testing

There are three types of hypothesis tests: right-tailed, left-tailed, and two-tailed.

When the null and alternative hypotheses are stated, the null hypothesis is a neutral statement against which the alternative hypothesis is tested, while the alternative hypothesis is a claim with a certain direction. If the null hypothesis claims that p = 0.5, the alternative hypothesis would be an opposing statement, stated as either p > 0.5, p < 0.5, or p ≠ 0.5. In all of these alternative hypothesis statements, the inequality symbol indicates the direction of the hypothesis. Based on that direction, the appropriate type of hypothesis test can be chosen for the given population parameter.

When the alternative hypothesis claims p > 0.5 (notice the ‘greater than’ symbol), the critical region would fall at the right side of the probability distribution curve. In this case, the right-tailed hypothesis test is used.

When the alternative hypothesis claims p < 0.5 (notice the 'less than' symbol), the critical region would fall at the left side of the probability distribution curve. In this case, the left-tailed hypothesis test is used.

In the case of the alternative hypothesis p ≠ 0.5, a definite direction cannot be decided, and therefore the critical region falls in both tails of the probability distribution curve. In this case, the two-tailed test should be used.
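The three test types can be made concrete with an exact binomial test, a natural fit for the apple example, where a proportion is tested against p = 0.5. This is a minimal stdlib-only Python sketch with an invented sample of 100 apples, not code from the source; libraries such as `scipy.stats.binomtest` expose the same choice through an `alternative` parameter.

```python
from math import comb

def binom_pvalue(k, n, p=0.5, alternative="two-sided"):
    """Exact binomial test p-value for the null hypothesis: proportion == p."""
    def pmf(i):
        return comb(n, i) * p**i * (1 - p) ** (n - i)

    if alternative == "greater":   # right-tailed: H1 claims proportion > p
        return sum(pmf(i) for i in range(k, n + 1))
    if alternative == "less":      # left-tailed: H1 claims proportion < p
        return sum(pmf(i) for i in range(0, k + 1))
    # two-tailed: add up every outcome at most as likely as the observed one
    observed = pmf(k)
    return sum(pmf(i) for i in range(n + 1) if pmf(i) <= observed + 1e-12)

# Hypothetical sample (invented): 60 healthy apples out of 100 inspected.
k, n = 60, 100
for alt in ("greater", "less", "two-sided"):
    print(f"{alt:>9}: p = {binom_pvalue(k, n, alternative=alt):.4f}")
```

Notice how the right-tailed and left-tailed p-values sum the upper and lower tails of the same distribution, while the two-tailed p-value draws on both tails.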



A hypothesis is a testable statement that explains something that is happening or observed. It proposes a relationship between the participating variables. In everyday speech, a hypothesis is sometimes loosely called a theory, thesis, guess, assumption, or suggestion. A hypothesis creates a structure that guides the search for knowledge.

In this article, we will learn what a hypothesis is, its characteristics, types, and examples. We will also learn how a hypothesis helps in scientific research.

Table of Contents

  • What is Hypothesis?
  • Characteristics of Hypothesis
  • Sources of Hypothesis
  • Types of Hypothesis
  • Examples of Hypothesis
  • Functions of Hypothesis

What is Hypothesis?

A hypothesis is a proposed idea, supported by only limited initial evidence, that is meant to lead to further study. It is essentially an educated guess or suggested answer to a problem that can be checked through investigation and experiment. In scientific work, hypotheses are used to predict what will happen in tests or observations. They are not certainties but ideas that can be supported or refuted by real-world evidence. A good hypothesis is clear, testable, and falsifiable if the evidence does not support it.

Hypothesis Definition

A hypothesis is a testable, proposed statement offered to explain something that happens or is observed.
  • It is made using what we already know and have observed, and it is the basis for scientific research.
  • A clear hypothesis tells us what we expect to happen in an experiment or study.
  • It is a testable claim that can be shown true or false with real-world evidence and careful checking.
  • It usually takes an “if-then” form, stating the expected cause-and-effect relationship between the variables being studied.

Characteristics of Hypothesis

Here are some key characteristics of a hypothesis:

  • Testable: A hypothesis should be framed so it can be tested through experiment or observation. It should state a clear connection between variables.
  • Specific: It should be focused and precise, addressing a particular aspect of, or relationship between, the variables in a study.
  • Falsifiable: A good hypothesis can be shown to be wrong; there must be some possible evidence or observation that would contradict it.
  • Logical and Rational: It should be grounded in current knowledge or observation, offering a reasonable explanation consistent with what is already known.
  • Predictive: A hypothesis often predicts the outcome of an experiment or observation, giving a guide for what one should see if it is correct.
  • Concise: It should state the proposed relationship or explanation briefly and clearly, without unnecessary complication.
  • Grounded in Research: A hypothesis usually grows out of prior studies, theory, or observation, and reflects a solid understanding of what is already known in the area.
  • Flexible: A hypothesis guides the research, but it may need to be revised as new information emerges.
  • Relevant: It should relate directly to the question or problem being studied, helping to focus the research.
  • Empirical: Hypotheses arise from observation and can be tested with methods based on real-world evidence.

Sources of Hypothesis

Hypotheses can come from different sources depending on what you’re studying and the kind of research. Here are some common sources from which hypotheses may originate:

  • Existing Theories: Hypotheses often grow out of established scientific theories, which may suggest relationships between variables that researchers can investigate further.
  • Observation and Experience: Noticing something unusual, or patterns that repeat in everyday life and in experiments, can prompt a hypothesis, as can personal experience.
  • Previous Research: Building on earlier studies and findings can generate new hypotheses; researchers may try to extend or challenge existing results.
  • Literature Review: Reviewing the published work in a field can suggest hypotheses, for example where gaps or inconsistencies in previous studies stand out.
  • Problem Statement or Research Question: Hypotheses frequently arise from the research question itself; stating clearly what needs to be investigated helps produce hypotheses that address specific parts of the problem.
  • Analogies or Comparisons: Drawing parallels with similar phenomena, or borrowing insights from related fields, can suggest hypotheses in a new context.
  • Hunches and Speculation: Sometimes researchers start from an intuition or conjecture; even without initial evidence, these can be a starting point for deeper investigation.
  • Technology and Innovations: New tools and technologies can inspire hypotheses by making it possible to examine things that were previously hard to study.
  • Personal Interest and Curiosity: A researcher’s own curiosity about and enthusiasm for a topic can lead to hypotheses worth testing.

Here are some common types of hypotheses:

  • Simple Hypothesis
  • Complex Hypothesis
  • Directional Hypothesis
  • Non-directional Hypothesis
  • Null Hypothesis (H0)
  • Alternative Hypothesis (H1 or Ha)
  • Statistical Hypothesis
  • Research Hypothesis
  • Associative Hypothesis
  • Causal Hypothesis

Simple Hypothesis predicts a relationship between two variables. It states that a relationship or difference exists, but does not specify its direction.
Complex Hypothesis predicts what happens when more than two variables are involved. It describes how multiple variables interact and may be linked together.
Directional Hypothesis specifies the direction of the relationship between variables. For example, it predicts that one variable will increase or decrease another.

Non-Directional Hypothesis

Non-Directional Hypothesis states that a relationship exists between variables without specifying which direction it takes.
Null Hypothesis states that there is no relationship or difference between variables. It implies that any observed effects are due to chance or random variation in the data.
Alternative Hypothesis contradicts the null hypothesis, stating that there is a significant relationship or difference between variables. Researchers aim to reject the null hypothesis in favor of the alternative.
Statistical Hypothesis is a statement about a population or its parameters that is used in formal statistical testing; the aim is to gather sample data and test the statement.
Research Hypothesis is derived from the research question and states the expected relationship between variables. It guides the study and determines where to look more closely.
Associative Hypothesis predicts that variables are related or correlated without claiming that one causes the other: when one variable changes, the other tends to change with it.
Causal Hypothesis, unlike associative hypotheses, asserts a cause-and-effect relationship between the variables involved: a change in one variable directly produces a change in another.

Following are the examples of hypotheses based on their types:

Simple Hypothesis Example

  • Studying more improves performance on tests.
  • Greater sun exposure increases vitamin D levels.

Complex Hypothesis Example

  • Income, access to education and access to healthcare together strongly affect life expectancy.
  • A new medicine's effectiveness depends on the dose, the patient's age and their genetics.

Directional Hypothesis Example

  • Drinking more sugary drinks is associated with a higher body mass index (BMI).
  • Higher stress levels reduce productivity at work.

Non-directional Hypothesis Example

  • Caffeine consumption affects sleep quality.
  • Music preferences differ by gender.

Null Hypothesis (H0) Example

  • The average test scores of Group A and Group B do not differ significantly.
  • There is no relationship between using a certain fertilizer and crop growth.
  • The mean IQ score of children in a given school district is 100.
  • The average time to complete a task using Method A is the same as with Method B.

Alternative Hypothesis (Ha) Example

  • Patients on Diet A have significantly different cholesterol levels than those following Diet B.
  • Exposure to a certain type of light changes plant growth compared with normal sunlight.

Research Hypothesis Example

  • Increased enrollment in early childhood education improves later academic performance.
  • Using specific communication styles affects customer engagement in marketing activities.

Causal Hypothesis Example

  • Regular exercise lowers the risk of heart disease.
  • More years of schooling lead to higher earnings.
  • Playing violent video games makes teenagers more likely to act aggressively.
  • Poorer air quality directly impairs respiratory health in urban populations.
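Several of the hypotheses above can be checked with a standard significance test. Below is a minimal sketch on synthetic data (the group names and numbers are made up for illustration) that contrasts a directional and a non-directional alternative for the "studying more improves test scores" example, using a two-sample z-test since the samples are large.

```python
# Illustrative only: synthetic scores for a "low study" and a "high study" group.
# H0: the group means are equal.
# Ha (directional): the high-study mean is larger.
# Ha (non-directional): the means differ in either direction.
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
low_study = [random.gauss(70, 10) for _ in range(500)]
high_study = [random.gauss(74, 10) for _ in range(500)]

diff = mean(high_study) - mean(low_study)
se = (stdev(low_study) ** 2 / 500 + stdev(high_study) ** 2 / 500) ** 0.5
z = diff / se  # large samples, so the normal approximation is reasonable

p_directional = 1 - NormalDist().cdf(z)                 # one-sided p-value
p_nondirectional = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

alpha = 0.05
print(f"z = {z:.2f}")
print("directional test:",
      "reject H0" if p_directional < alpha else "fail to reject H0")
```

Note that the non-directional p-value is twice the directional one (for a positive z), which is why a directional hypothesis, when justified in advance, gives a more powerful test.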

Hypotheses serve several important functions in the scientific research process. Here are the key functions of hypotheses:

  • Guiding Research: Hypotheses give research a clear, precise direction. They act as guides, stating the predicted relationships or outcomes that researchers set out to study.
  • Formulating Research Questions: Hypotheses help translate broad research questions into specific, testable statements, shaping what the study should focus on.
  • Setting Clear Objectives: Hypotheses define the objectives of a study by stating which relationships between variables are to be examined. They set the targets researchers aim to reach with their studies.
  • Testing Predictions: Hypotheses predict what will happen in experiments or observations. By testing systematically, researchers can check whether the observed results match those predictions.
  • Providing Structure: Hypotheses structure the research process by organizing thoughts and ideas. They help researchers reason about relationships between variables and design experiments accordingly.
  • Focusing Investigations: By stating the expected relationships or outcomes explicitly, hypotheses keep researchers focused on the relevant aspects of their research question, making the work more efficient.
  • Facilitating Communication: Clearly stated hypotheses help researchers communicate their plans, methods and expected results to colleagues and wider audiences.
  • Generating Testable Statements: A good hypothesis is testable, meaning it can be examined carefully or checked through experiments. This ensures that hypotheses contribute to the verifiable body of scientific knowledge.
  • Promoting Objectivity: Hypotheses provide a clear rationale that guides the research process while reducing personal bias. They push researchers to rely on facts and data to support or refute their proposed explanations.
  • Driving Scientific Progress: Formulating, testing and revising hypotheses is a cycle. Whether a hypothesis is confirmed or refuted, the knowledge gained advances the field.

How do Hypotheses Help in Scientific Research?

Researchers use hypotheses to set out the ideas that direct how an experiment will proceed. The following steps are involved in the scientific method:

  • Initiating Investigations: Hypotheses are the starting point of scientific research. They arise from observation, existing knowledge or open questions, leading researchers to propose explanations that must then be checked through testing.
  • Formulating Research Questions: Hypotheses usually stem from broader research questions. They help researchers make these questions precise and testable, guiding the study's focus.
  • Setting Clear Objectives: Hypotheses define the goals of a study by stating the expected relationships between variables. They set the targets researchers aim to reach.
  • Designing Experiments and Studies: Hypotheses guide the design of experiments and observational studies. They help researchers decide which factors to measure, which techniques to use and what data to gather.
  • Testing Predictions: Hypotheses predict the outcomes of experiments or observations. By checking these predictions carefully, researchers can see whether the observed results line up with each hypothesis.
  • Analysis and Interpretation of Data: Hypotheses provide a framework for analyzing and interpreting data. Researchers examine their findings and determine whether the evidence supports or contradicts the proposed explanations.
  • Encouraging Objectivity: Hypotheses promote objectivity by requiring researchers to use facts and evidence to support or refute their proposed explanations, reducing the influence of personal preference.
  • Iterative Process: Hypotheses may be supported or refuted, but either way they feed the ongoing process of science. Findings from testing prompt new questions, refined hypotheses and further tests, continuing the cycle of learning.


Hypothesis-Frequently Asked Questions

What is a Hypothesis?

A hypothesis is a tentative explanation or prediction that can be tested through research and experimentation.

What are Components of a Hypothesis?

The components of a hypothesis include the independent variable, the dependent variable, the proposed relationship between them and, optionally, its direction.

What makes a Good Hypothesis?

Testability, falsifiability, clarity, precision and relevance are some qualities that make a good hypothesis.

Can a Hypothesis be Proven True?

You cannot prove conclusively that most hypotheses are true because it’s generally impossible to examine all possible cases for exceptions that would disprove them.

How are Hypotheses Tested?

Hypothesis testing uses sample data to assess the plausibility of a hypothesis.
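As a concrete illustration of assessing a hypothesis with sample data, here is a minimal sketch of an exact two-sided binomial test (the scenario and numbers are made up): testing H0 "the coin is fair" after observing 60 heads in 100 flips.

```python
# Exact binomial test of H0: P(heads) = 0.5, given 60 heads in 100 flips.
from math import comb

n, k = 100, 60

def binom_pmf(i: int) -> float:
    # Probability of exactly i heads in n fair flips.
    return comb(n, i) * 0.5 ** n

# Two-sided p-value: total probability of outcomes at least as far from
# the expected count n/2 as the observed count k.
p_value = sum(binom_pmf(i) for i in range(n + 1)
              if abs(i - n / 2) >= abs(k - n / 2))

print(f"p = {p_value:.4f}")  # ~0.057, so H0 is not rejected at alpha = 0.05
```

A result of 60/100 is thus suggestive but, at the conventional 5% level, not enough to reject fairness.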

Can Hypotheses change during Research?

Yes, you can change or improve your ideas based on new information discovered during the research process.

What is the Role of a Hypothesis in Scientific Research?

Hypotheses are used to support scientific research and bring about advancements in knowledge.


Hypothesis Definition

In statistics, hypothesis testing is used to determine whether the variation between groups of data reflects a true difference or mere chance. Sample data are used to draw inferences about a population parameter under stated assumptions. Hypotheses can be classified into various types. In this article, let us discuss the hypothesis definition, the various types of hypothesis and the significance of hypothesis testing, which are explained in detail.

Hypothesis Definition in Statistics

In statistics, a hypothesis is defined as a formal statement that explains the relationship between two or more variables of the specified population. It helps the researcher translate the given problem into a clear prediction of the study's outcome. It also indicates the type of experimental design and directs the course of the research process.

Types of Hypothesis

The hypothesis can be broadly classified into different types. They are:

Simple Hypothesis

A simple hypothesis proposes a relationship between two variables: one dependent variable and one independent variable.

Complex Hypothesis

A complex hypothesis describes a relationship involving more than two variables, i.e. multiple dependent and/or independent variables.

Null Hypothesis

The null hypothesis states that there is no significant difference between the populations specified in the experiment; any apparent difference is attributed to experimental or sampling error. The null hypothesis is denoted by H0.

Alternative Hypothesis

The alternative hypothesis states that the observations are influenced by some non-random cause, i.e. that a real effect is present. It is denoted by Ha or H1.

Empirical Hypothesis

An empirical hypothesis is formed through experiment and grounded in evidence.

Statistical Hypothesis

A statistical hypothesis is a statement about a population that can be verified statistically using sample data.

Apart from these types of hypothesis, other types include the directional and non-directional hypothesis, the associative hypothesis and the causal hypothesis.

Characteristics of Hypothesis

The important characteristics of the hypothesis are:

  • The hypothesis should be short and precise
  • It should be specific
  • A hypothesis must be related to the existing body of knowledge
  • It should be capable of verification


Beyond Neyman-Pearson: e-values enable hypothesis testing with a data-driven alpha

A standard practice in statistical hypothesis testing is to mention the p-value alongside the accept/reject decision. We show the advantages of mentioning an e-value instead. With p-values, it is not clear how to use an extreme observation (e.g. p ≪ α) for getting better frequentist decisions. With e-values it is straightforward, since they provide Type-I risk control in a generalized Neyman-Pearson setting with the decision task (a general loss function) determined post-hoc, after observation of the data — thereby providing a handle on ‘roving α’s’. When Type-II risks are taken into consideration, the only admissible decision rules in the post-hoc setting turn out to be e-value-based. Similarly, if the loss incurred when specifying a faulty confidence interval is not fixed in advance, standard confidence intervals and distributions may fail whereas e-confidence sets and e-posteriors still provide valid risk guarantees. Sufficiently powerful e-values have by now been developed for a range of classical testing problems. We discuss the main challenges for wider development and deployment.

We perform a null hypothesis test with significance level α and we observe a p-value p ≪ α. Why aren’t we allowed to say “we have rejected the null at level p”? While a continuous source of bewilderment to the applied scientist, professional statisticians understand the reason: to get a Type-I error probability guarantee of α — a cornerstone of the Neyman-Pearson (NP) theory of testing — we must set α in advance. But this immediately raises another question: why should the p-value be mentioned at all in scientific papers, next to the reject/accept decision for the pre-specified α [4, 20]? The prevailing attitude is to accept this standard practice, on the grounds that it “provides more information” — as explicitly stated by, for example, Lehmann [25], one of NP theory’s main contributors. But this is problematic: there is nothing in NP theory to tell us what the decision-theoretic consequences of ‘p ≪ α’ could be, whereas at the same time, the fundamental motivation behind NP theory is decision-theoretic: according to [32], “[all of] mathematical statistics deals with problems relating to performance characteristics of rules of inductive behavior [i.e. decision rules] based on random experiments”. There is no simple way though to translate observation of a p with p ≪ α into better decisions: as is well-known and reviewed below ((4) and (29)), intuitive and common decision-theoretic interpretations of p ≪ α are usually just wrong. We are therefore faced with a standard practice in NP testing that, according to (strict, behaviorist) NP theory, is not part of mathematical statistics!

E as an alternative for P

In our main result, Theorem 1, we show that this issue can be resolved by mentioning e-values rather than p-values next to the accept/reject decision. E-values [13, 52, 42, 56, 37] are a recently popularized alternative for p-values that are related to, but far more general than, likelihood ratios. Importantly, as reviewed in Example 2 below, for any NP test with the accept/reject-decision based on a p-value, the exact same test can be implemented by basing the decision on an e-value. Thus there is no a priori reason why one should accompany the decision of a NP test with a p-value rather than an e-value. But, in contrast to the p-value, the e-value has a clear decision-theoretic justification that remains valid if decision tasks are formulated post-hoc, i.e. after seeing, and in light of, the data. Concretely, after the result of a study has been published, and when new circumstances prevail, one conceivably might contemplate different actions, with different associated losses, than originally planned. For example, a study about vaccine efficacy (ve) in a pandemic may have been set up as a test between null hypothesis ve ≤ 30% and alternative ve ≥ 50% [45]. The original plan was to vaccinate all people above 60 years of age if the null is rejected. But suppose the null actually gets rejected with a very small p-value ≪ α, and at the same time the virus’ reproduction rate may be much higher than anticipated. Based on both the observed data (summarized by p) and the changed circumstances, one might now contemplate a new action, vaccinate everyone over 40, with higher losses if the alternative is false and higher pay-offs if it is true. E-values can be used unproblematically for such a post-hoc formulated decision task; p-values cannot.
A second example is simply the fact that scientific results are published and remain on record so as to be useful for future deployment: a company contemplating to produce medication X may find a publication about the efficacy of X that is, say, 15 years old. Back then, in two independent studies the null (no efficacy) was rejected at the given α = 0.05, but producing X would have been prohibitively expensive so this finding was not acted upon. But recently the company managed a technological breakthrough making production of X much cheaper. Had α been smaller than 0.01, they would now decide to take X into production. But now suppose that in both original studies, p < 0.01 yet α = 0.05. The upshot of Example 1, Proposition 1 and Theorem 1 below is that, if one had observed S⁻¹ < 0.01 for an e-variable S, then acting anyway, despite the changed circumstances, is Type-I risk safe, in the precise sense of (1) below; but doing this based on p < 0.01 is unsafe in the sense that no clear risk (performance) bounds can be given when engaging in such behavior.
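To make the post-hoc threshold property concrete, here is a small simulation sketch (not from the paper; the Gaussian null and alternative are my own illustrative choice). An e-variable built as a likelihood ratio has expectation at most 1 under the null, so by Markov's inequality P0(S ≥ 1/α) ≤ α holds for every α simultaneously, which is what makes thresholds chosen after seeing the data safe.

```python
# Null: Y ~ N(0,1); alternative: Y ~ N(1,1). The likelihood ratio
# S = q1(Y)/q0(Y) = exp(Y - 1/2) satisfies E0[S] = 1, so by Markov's
# inequality P0(S >= 1/alpha) <= alpha for EVERY alpha at once.
import math
import random

random.seed(1)

def e_value(y: float) -> float:
    # Likelihood ratio of N(1,1) against N(0,1).
    return math.exp(y - 0.5)

n = 200_000
s_null = [e_value(random.gauss(0.0, 1.0)) for _ in range(n)]

for alpha in (0.05, 0.01, 0.001):   # alphas chosen "after the fact"
    freq = sum(s >= 1 / alpha for s in s_null) / n
    print(f"alpha={alpha}: P0(S >= 1/alpha) ~ {freq:.5f} (bound {alpha})")
```

Every exceedance frequency stays below its bound, without α having been fixed before the simulation was run — the "roving α" behavior that p-values do not support.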

From Testing to Estimation with Confidence: the e-posterior

[Figure: the BIND assumption (the decision task being chosen independently of the data) underlying p-values and standard cd’s. E-values and e-posteriors lead to decisions that retain Type-I risk safety if BIND is violated.]

Technical Contribution and Contents

To obtain frequentist guarantees without BIND we first need to reformulate NP testing in terms of losses and risks rather than errors and error probabilities, an idea going back to Wald’s seminal 1939 paper introducing statistical decision theory [54]. But while Wald lets go of the Type-I/II error distinction as soon as he allows for more than two actions, we stick with Type-I and Type-II risks (replacing Type-I and Type-II error probabilities, respectively) and show that the e-value is then the natural statistic to base decisions upon, and remains so if the decision task is determined post-hoc. Thus, our GNP (Generalized Neyman-Pearson) Theory follows a path opened up by Wald but apparently not pursued further thereafter. In Section 1 we informally present this reformulation, show how p-based procedures get in trouble if BIND is violated, introduce e-values and explain how, when combined with a maximally compatible decision rule, they guarantee Type-I risk safety even without BIND. Section 2 then formalizes the reasoning and presents our main result, Theorem 1. Among all Type-I risk safe decision rules, we aim only for those that have admissible Type-II risk behavior; we call a rule admissible if there exists no other decision rule that is never worse and sometimes strictly better. Theorem 1, which has the flavour of a complete class theorem [3, 9], shows that, under mild regularity conditions, the set of admissible decision rules is precisely the set of those that are based on some e-variable S via a maximally compatible decision rule. Section 3 extends our findings to confidence intervals and distributions (cd’s). cd’s can be replaced by e-posteriors, a novel notion treated in much more detail in my recent paper [15], which may be viewed as a companion to this one, more oriented towards a Bayesian-inclined readership.

An Important Caveat

Systematic development of e-values has only started very recently (in 2019). While a lot of progress has been made, and by now useful (≈ powerful) e-values are available for a number of practically important parametric and nonparametric testing and estimation problems, there is still an enormously wide range of problems for which p-values — systematically developed since the 1930s — exist yet e-values have not yet been developed. We briefly review initial success stories and current challenges in Section 4, informing the final Section 5, which indicates the way forward and re-interprets our findings as establishing a quasi-conditional paradigm. All longer mathematical derivations and proofs are delegated to the Supporting Information Appendix (SI).

1 Generalized Neyman-Pearson Theory

1.1 Losses instead of Errors

In the basic NP setting, we observe data Y taking values in some set 𝒴, with both the null hypothesis ℋ(0̲) and the alternative ℋ(1̲) being represented as collections of distributions for Y. NP [34] tell us to fix some α and then adopt the decision rule that, among all decision rules with Type-I error bounded by α, minimizes the Type-II error. Following Wald [54], we re-interpret this procedure in terms of a nonnegative loss function L(·, ·), with L(κ, a) denoting the loss made by action a if κ is the true state of nature. We have κ ∈ {0̲, 1̲} and 𝒜 = {0, 1}, with κ = 0̲ representing that the null is correct, κ = 1̲ that the alternative is correct, a = 0 standing for ‘accept’ and a = 1 for ‘reject’ the null. We invariably assume L(0̲, 1) > L(0̲, 0) ≥ 0 and L(1̲, 0) > L(1̲, 1) ≥ 0.
‘Of course’ (as Wald writes) we may want to set L(0̲, 0) = L(1̲, 1) = 0 and we will do this for now, but it is not required for the subsequent developments. In this formulation, the usual α-Type-I error guarantee is replaced by an ℓ-Type-I risk guarantee. Formally, we fix an ℓ in advance of observing the data and we say that decision rule δ (i.e. a test), defined as a function from 𝒴 to 𝒜, is Type-I risk safe if

    sup_{P ∈ ℋ(0̲)} E_{Y∼P}[ L(0̲, δ(Y)) ] ≤ ℓ.    (1)

(1) expresses that, whatever we decide, we want to make sure that our risk (expected loss) under the null is no larger than ℓ. In a standard level-α test, one rejects the null if p(y), the p-value corresponding to data y, satisfies p(y) ≤ α. A corresponding decision rule in terms of loss functions is to reject the null whenever the observed p(y) satisfies

    L(0̲, 1) · p(y) ≤ ℓ.    (3)

We get exactly the same behavior as for the standard level-α test if we set L(0̲, 1) = ℓ/α. For example, for α = 0.05 we can set ℓ = 1 and then L(0̲, 1) := 20; then, just like in NP testing, (3) tells us to pick a δ° which rejects the null if p ≤ 0.05. If p is defined so that δ° is UMP (uniformly most powerful), then combined with any loss function L(1̲, 0) > 0, δ° will also minimize the worst-case Type-II risk (2) among all δ that satisfy Type-I error probability ≤ α: up until now we have merely reformulated standard NP theory.
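This equivalence is easy to check numerically. The sketch below (my own illustration, not from the paper) simulates strict p-values, which are Uniform(0, 1) under the null, and confirms that the rule "reject iff p ≤ α" with loss L(0̲, 1) = ℓ/α has Type-I risk equal to ℓ.

```python
# With L(null, reject) = l/alpha, the Type-I risk of "reject iff p <= alpha"
# is E0[L * 1{p <= alpha}] = (l/alpha) * alpha = l, for a strict p-value
# that is Uniform(0,1) under the null.
import random

random.seed(2)
l, alpha = 1.0, 0.05
L_reject = l / alpha    # = 20, as in the running example

n = 1_000_000
p_vals = [random.random() for _ in range(n)]   # p ~ Uniform(0,1) under H0
risk = sum(L_reject for p in p_vals if p <= alpha) / n

print(f"simulated Type-I risk = {risk:.3f} (target l = {l})")
```

The simulated risk sits at ℓ up to Monte Carlo error, matching the claim that the loss-based rule is a plain restatement of the level-α test.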

Actions of Varying Intensity

But now suppose we have more than two actions available. For example, consider four alternative actions: accept the null (retain the status quo), take mild action (e.g. vaccinate all people over 60), take more drastic action (vaccinate everyone over 40) and extreme action (vaccinate the whole population). We consider this question, too, in terms of Type-I and Type-II risk and confidence — thereby taking a different direction than standard decision theory. For example, our action space could now be 𝒜_b = {0, 1, 2, 3} with loss function L_b(0̲, 0) = 0, L_b(0̲, 1) = 20ℓ, L_b(0̲, 2) = 100ℓ, L_b(0̲, 3) = 500ℓ and L_b(1̲, 3) < L_b(1̲, 2) < L_b(1̲, 1) < L_b(1̲, 0) = ℓ. More generally, as long as Type-I loss is increasing in a and Type-II loss is decreasing, such an extension of the NP setting makes intuitive sense.

In terms of p-values, the straightforward extension of (3) to this multi-action case would be to play action a where a is the largest value such that

    L_b(0̲, a) · p(y) ≤ ℓ.    (4)

But, assuming our p-value is strict so that it has a uniform distribution under the null, this gives a Type-I risk of

    500ℓ · (1/500) + 100ℓ · (1/100 − 1/500) + 20ℓ · (1/20 − 1/100) = 2.6ℓ,

violating the guarantee we aimed to impose and showing that a naive p-value based procedure does not work. The problem gets exacerbated if we allow for more than four actions: in the SI we show that the expected loss of the naive procedure (4) may go to ∞ as we add additional actions with L_b(0̲, a) increasing and L_b(1̲, a) decreasing in a. There we also show that an obvious ‘fix’, namely modifying (4) to make sure that for each action a, L_b(0̲, a) gets multiplied by exactly the probability that action a is taken, does not solve this issue.
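The failure of the naive rule can be reproduced in a few lines. The sketch below encodes one reading of the rule (play the largest a with L_b(0̲, a) · p ≤ ℓ, i.e. p ≤ ℓ/L_b(0̲, a)) with the four-action losses from the running example, and simulates uniform p-values under the null.

```python
# Naive multi-action rule under the null: with p ~ Uniform(0,1), play the
# largest action a whose per-action threshold l / L0[a] still covers p.
# The Type-I risk comes out well above the intended bound l.
import random

random.seed(3)
l = 1.0
L0 = {0: 0.0, 1: 20 * l, 2: 100 * l, 3: 500 * l}   # Type-I losses

def naive_action(p: float) -> int:
    # Largest a in {1,2,3} with p <= l / L0[a]; otherwise accept (a = 0).
    candidates = [a for a in (1, 2, 3) if p <= l / L0[a]]
    return max(candidates, default=0)

n = 1_000_000
risk = sum(L0[naive_action(random.random())] for _ in range(n)) / n
# Exact value: 500*(1/500) + 100*(1/100 - 1/500) + 20*(1/20 - 1/100) = 2.6
print(f"simulated Type-I risk = {risk:.2f} (intended bound l = {l})")
```

The simulated risk concentrates around 2.6ℓ: each per-action threshold is individually level-correct, but letting the data pick the action stacks the thresholds and breaks the overall guarantee.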

Post-Hoc Loss Functions

Allowing more than two actions is really just a warm-up to a further extension which arguably better models what often happens in, for example, medical practice: the post-hoc determination or modification of a decision task, after seeing the data and dependent on the data, such as in the vaccine efficacy example in the introduction. That is, there is really an underlying class (whose definition may be unknowable) of loss functions L_b(·, ·) with associated action spaces 𝒜_b, and the decision-maker (DM) is posed a particular decision task L_b(·, ·) where b, indexing the loss actually used, is really the outcome of a random variable B = b, whose distribution may depend on the data in all kinds of ways. The actual B = b that is presented is thus random and only fixed after the study result has become available; i.e. ‘post-hoc’. Crucially, the process determining the actual value of B is typically murky; nobody knows exactly what loss function would have been considered in what alternative circumstances; DM only knows the loss function finally arrived at.

Again, with p-values, we might be tempted to pick the largest action $a$ such that (4) holds, where now $b$ is really the (observed, known) outcome of a random variable $B$ whose definition is itself unknown. Now, even if for each $b$, $L_b$ allows for only two actions, so that the problem superficially resembles the standard NP setting, using (4) can have disastrous consequences in the post-hoc setting, as the following example shows.

Suppose there are three loss functions $L_b$, for $b\in\mathcal{B}=\{1,2,3\}$, with corresponding actions $\mathcal{A}_b=\{0,b\}$. We set $L_1(\underline{0},1)=20\ell$, $L_2(\underline{0},2)=100\ell$, $L_3(\underline{0},3)=500\ell$, $L_b(\underline{0},0)=0$, $L_b(\underline{1},0):=\ell$ for all $b\in\mathcal{B}$, and $L_b(\underline{1},b)$ strictly decreasing in $b$. This is like the previous example, but rather than always being able to choose one among four actions, the very set of choices that is presented to the DM via setting $B=b$ might depend on the data $Y$ or on external circumstances.
One cannot rule out that this is done in an unfavourable manner: if the data suggest strong evidence, then the policy developers (e.g. a pandemic outbreak management team) might only suggest actions with drastic consequences. Suppose, for example, that if $\textsc{p}>0.02$, the DMs are presented loss $L_1$; if $0.001<\textsc{p}\leq 0.02$ they are presented loss $L_2$; and if $\textsc{p}\leq 0.001$ they are presented loss $L_3$. Using (4), we then get (assuming again uniform $\textsc{p}$) a Type-I risk of

As in (5), the resulting decision rule (4) is not Type-I risk safe, and again, the Type-I risk can even go to infinity with the number of potential actions.
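The failure is easy to verify numerically. The following sketch assumes the natural reading of rule (4), namely: in menu $b$, take the non-null action iff $\textsc{p}\cdot L_b(\underline{0},\cdot)\leq\ell$; the loss values and p-value thresholds are those of the running example.

```python
import numpy as np

rng = np.random.default_rng(0)
ell = 1.0
# Type-I losses L_b(0,b) of the non-null action in each menu b
# (numbers from the running example); action 0 has Type-I loss 0.
loss = {1: 20 * ell, 2: 100 * ell, 3: 500 * ell}

def menu(p):
    # Post-hoc menu choice: stronger evidence leads to a more drastic menu.
    if p > 0.02:
        return 1
    if p > 0.001:
        return 2
    return 3

def naive_rule_loss(p, b):
    # Assumed reading of rule (4): take action b iff p * L_b(0,b) <= ell,
    # in which case the Type-I loss L_b(0,b) is incurred under the null.
    return loss[b] if p * loss[b] <= ell else 0.0

p_vals = rng.uniform(size=1_000_000)  # p-values are uniform under the null
risk = np.mean([naive_rule_loss(p, menu(p)) for p in p_vals])
print(f"Type-I risk of the naive p-based rule: {risk:.2f}, bound was {ell}")
```

Analytically, the risk here equals $20\ell\cdot 0.03 + 100\ell\cdot 0.009 + 500\ell\cdot 0.001 = 2\ell$, twice the imposed bound.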

1.2 E-Values to the Rescue

Reporting evidence as e-values (as defined by (7)) rather than p-values solves both the multiple-action and the post-hoc-loss issue identified above. An e-value is the value of a special type of statistic called an e-variable. An e-variable is any nonnegative random variable $S=S(Y)$ that can be written as a function of the observed $Y$ and that satisfies the inequality:

The e-variable's simplest application is in defining tests: the $S$-based hypothesis test at level $\alpha$ is defined to reject the null iff $S\geq 1/\alpha$. Since for any e-variable $S$ and all $P\in\mathcal{H}(\underline{0})$, Markov's inequality gives $P(S\geq 1/\alpha)\leq\alpha$, such a test has a Type-I error guarantee of $\alpha$, with the advantage that (as shown by [13, 52]), unlike with p-values, the Type-I error guarantee remains valid under optional continuation, i.e. deciding based on a study result whether new studies should be undertaken and, if so, multiplying the corresponding e-values. The term `e-variable' was coined in 2019 [13, 52], but their history is older, as described by [13, 37].
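The Markov argument can be checked by simulation. A minimal sketch, with a hypothetical simple null $P_0=N(0,1)$ and alternative $P_1=N(1,1)$, for which the likelihood ratio $S(y)=p_1(y)/p_0(y)=\exp(y-1/2)$ is an e-variable:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.05

# Likelihood-ratio e-variable for P0 = N(0,1) vs P1 = N(1,1):
# S(y) = p1(y)/p0(y) = exp(y - 1/2), with E_P0[S] = 1.
y = rng.normal(0.0, 1.0, size=1_000_000)   # data drawn under the null
S = np.exp(y - 0.5)

# The S-based level-alpha test rejects iff S >= 1/alpha; by Markov's
# inequality its Type-I error probability is at most alpha.
print(f"E_P0[S]         = {S.mean():.3f}")
print(f"P(S >= 1/alpha) = {np.mean(S >= 1 / alpha):.5f}  (alpha = {alpha})")
```

Note that the realized rejection probability is typically far below $\alpha$; Markov's inequality is loose for this $S$.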

We may now simply pick any e-variable $S$ we like and replace decision rule (4) by the following maximally compatible alternative rule: upon observing data $Y=y$ and loss function indexed by $B=b$ with accompanying maximum imposed risk bound $\ell$, select the largest $a$ for which

where we adopt the (in our setting harmless) convention that, for $u=0$ and $v\geq 0$, $u^{-1}v:=0$ if $v=0$ and $u^{-1}v:=\infty$ if $v>0$ (in Section 5 we discuss where $\ell$ comes from). Theorem 1 below gives conditions under which (8) has a unique solution. For the original NP setting of two actions, (8) is simply the p-value based rule (4) with $\textsc{p}$ replaced by $1/S$, illustrating that large e-values correspond to evidence against the null. But in contrast to the p-value based rule, with the e-based rule, no matter what e-variable $S$ we take (as long as it is itself chosen before data are observed), no matter how many actions $\mathcal{A}$ contains, and no matter the process determining the loss index $B$, we have the Type-I risk guarantee (1) (Theorem 1 below): replacing $\textsc{p}$ by $1/S$ resolves the BIND problem. Of course, this raises the question whether p-values can perhaps be used safely for Type-I risk after all, in a manner different from (4). The only such method we know of is to first convert a p-value into an e-value and then use (8) after all. As discussed in the SI, the e-values resulting from such a conversion are usually suboptimal, so we prefer to design and use e-values directly.
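To illustrate the guarantee, the sketch below replays the adversarial post-hoc example, but with the e-based rule (8): take the non-null action in menu $b$ iff $L_b(\underline{0},b)\leq\ell\, S$. The menu is chosen adversarially as a function of $S$ itself (the specific thresholds are our invention); the Type-I risk nonetheless stays below $\ell$ because $\mathbf{E}_{P_0}[S]\leq 1$.

```python
import numpy as np

rng = np.random.default_rng(2)
ell = 1.0
loss = {1: 20 * ell, 2: 100 * ell, 3: 500 * ell}  # L_b(0,b) for each menu b

# Hypothetical LR e-variable for P0 = N(0,1) vs P1 = N(1,1); data under the null.
y = rng.normal(0.0, 1.0, size=1_000_000)
S = np.exp(y - 0.5)

# Adversarial post-hoc menu: stronger evidence -> more drastic menu
# (thresholds chosen for illustration only).
b = np.select([S < 50, S < 300], [1, 2], default=3)
L_b = np.select([b == 1, b == 2], [loss[1], loss[2]], default=loss[3])

# Rule (8): take the non-null action iff L_b(0,b) <= ell * S.
incurred = np.where(L_b <= ell * S, L_b, 0.0)
print(f"Type-I risk of the e-based rule: {incurred.mean():.4f} (bound {ell})")
```

However the menu thresholds are chosen, the incurred loss is at most $\ell S$ pointwise, so its null expectation cannot exceed $\ell$.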

[The NP and LR E-Variables] As with p-values, many different e-variables can be defined for the same $\mathcal{H}(\underline{0})$. As discussed by [42], an extreme choice is to start with a fixed level $\alpha$ and p-value $\textsc{p}$ and to set $S^{\textsc{np}(\alpha)}:=1/\alpha$ if $\textsc{p}\leq\alpha$ and $S^{\textsc{np}(\alpha)}:=0$ otherwise. Clearly ${\bf E}_{Y\sim P_0}[S^{\textsc{np}(\alpha)}]\leq\alpha\cdot(1/\alpha)=1$, so $S^{\textsc{np}(\alpha)}$ is an e-variable. In the case of a classical, 2-action NP problem as defined underneath (3), the test (8) based on e-variable $S=S^{\textsc{np}(\alpha)}$ will lead to $a=1$ (reject the null) exactly iff the classical NP test based on $\textsc{p}$ does. This shows that any $\textsc{p}$-based NP test can also be arrived at using (8) with a special e-value: nothing is lost by replacing p-values with e-values.
Still, in case there are more than 2 actions and/or post-hoc decisions, while decisions based on $S^{\textsc{np}(\alpha)}$ preserve the $\ell$-Type-I risk guarantee, they may not be very wise in the Type-II risk sense. For example, with the loss function used in (5) and $\alpha=0.05$, even for very small underlying $\textsc{p}$ (i.e. extreme data), we will still choose action $1$, whereas it seems more reasonable to select more extreme actions, minimizing Type-II loss, as the evidence against the null gets stronger. In case $\mathcal{H}(\underline{0})=\{P_0\}$ and $\mathcal{H}(\underline{1})=\{P_1\}$ are simple, this can be achieved by taking $S$ to be a likelihood ratio: assuming $P_j$ has density $p_j$,

which is immediately seen to be an e-variable:
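Concretely, $\mathbf{E}_{P_0}[p_1(Y)/p_0(Y)]=\int p_1(y)\,dy=1$, so the e-variable condition (7) holds with equality. A quick numerical check of this identity, using hypothetical simple Gaussians $P_0=N(0,1)$ and $P_1=N(1,1)$:

```python
import numpy as np

# Hypothetical simple hypotheses: P0 = N(0,1) vs P1 = N(1,1).
def p0(y): return np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)
def p1(y): return np.exp(-(y - 1)**2 / 2) / np.sqrt(2 * np.pi)

# For the likelihood-ratio e-variable S(y) = p1(y)/p0(y):
#   E_P0[S] = integral of (p1/p0) * p0 = integral of p1 = 1,
# i.e. (7) holds with equality (a 'sharp' e-variable).
h = 1e-4
y = np.arange(-12.0, 13.0, h)                # fine grid covering both densities
E0_S = np.sum((p1(y) / p0(y)) * p0(y)) * h   # Riemann sum of p1
print(f"E_P0[S] = {E0_S:.6f}")
```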

2 Mathematical Formalization and Results

2.1 Type-I Risk Safety and Compatibility

Let $\mathcal{H}(\underline{0})$, the null hypothesis, be a set of probability distributions for a random $Y$ taking values in an outcome space $\mathcal{Y}$.

Definition 1

A GNP (Generalized Neyman-Pearson) testing problem relative to $\mathcal{H}(\underline{0})$ is a tuple $(\mathcal{B},\{(\mathcal{A}_b,L_b(\underline{0},\cdot):\mathcal{A}_b\rightarrow\mathbb{R}^+_0):b\in\mathcal{B}\})$ where, for all $b\in\mathcal{B}$, we call $L_b(\underline{0},\cdot)$ the Type-I loss indexed by $b$, with action space $\mathcal{A}_b$.

In Section 3 we extend the definition to uncertainty quantification beyond testing. Relative to any given GNP testing problem, we further define a decision rule to be any collection of functions $\{\delta_b:b\in\mathcal{B}\}$, where $\delta_b(y)$ denotes the $a\in\mathcal{A}_b$ picked when the loss function indexed by $B=b$ (i.e. $L_b$) is presented and $Y=y$ is observed. Let $\delta$ be any decision rule and let $S=S(Y)$ be any e-variable. We call $\delta$ compatible with $S$ if

We now prepare the definition of Type-I risk safety for GNP decision problems. First, we note that in general, the threshold $\ell$ a DM would like to impose on the risk via (8) when confronted with loss function $L_b$ may be an arbitrary positive real. However, using this maximal rule (8), for every observed $Y=y$ and $B=b$, the exact same decision will be taken if we normalize all losses, using $L'_b$ with $L'_b(\underline{0},a)=L_b(\underline{0},a)/\ell$ instead of $L_b$ and $\ell'=1$ instead of $\ell$. Hence, without loss of generality, from now on we simplify the treatment by taking $\ell=1$ (in the SI we discuss in more detail why this is not harmful). With this in mind, consider a concrete setting in which the actual loss function $L_B$ with index $B$ presented to the DM is determined in a data-dependent manner (perhaps by some policy makers, perhaps completely implicitly). Since we do not know the definition of $B$, i.e. how the choice is made, we want to ensure that the analogue of Eq. 1 holds, in the worst case, over all possible choices. Thus, as a first attempt, we may extend Eq. 1 by defining $\delta$ to be Type-I risk safe if

there exists a function $U:\mathcal{Y}\rightarrow\mathbb{R}^+_0$ such that for all $P_0\in\mathcal{H}(\underline{0})$, ${\bf E}_{P_0}[U(Y)]$ is well-defined, and for all $y\in\mathcal{Y}$,

E-Variable Compatibility $\Leftrightarrow$ Type-I Risk Safety

In NP theory, Type-I error guarantees come first: we look for an optimal decision rule among all rules that have the desired Type-I error guarantee. Analogously, here we first restrict our search for `good' decision rules to those that are Type-I risk safe for the given decision problem. How to find these? Realizing that the second equation in (13) expresses that $U$ is an e-variable, and the first equation says that $\delta$ is compatible with this e-variable, we see that the Type-I risk safe decision rules are exactly those that are compatible with an e-variable, thereby explaining the importance of e-variables to generalized NP testing. Formally, we have just proved the following trivial consequence of our definitions:

Proposition 1

Fix an arbitrary GNP testing problem. For every $\delta$ defined relative to this problem:

For every e-variable $S$ for $\mathcal{H}(\underline{0})$: if $\delta$ is compatible with $S$, then $\delta$ is Type-I risk safe.

Suppose that $\delta$ is Type-I risk safe. Let $S=U$ be as in (13) (in standard cases we can simply take $S(y)=\sup_{b\in\mathcal{B}}L_b(\underline{0},\delta_b(y))$). Then $S$ is an e-variable for $\mathcal{H}(\underline{0})$, and $\delta$ is compatible with $S$.
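The construction $S(y)=\sup_{b\in\mathcal{B}}L_b(\underline{0},\delta_b(y))$ can be made concrete with the earlier three-menu example. The sketch below takes the rule compatible with a likelihood-ratio e-variable $S$ (take action $b$ in menu $b$ iff $L_b(\underline{0},b)\leq S$) and checks numerically that the induced statistic has null expectation at most $1$, i.e. is itself an e-variable, as Proposition 1 asserts; the Gaussian hypotheses are our illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
losses = np.array([20.0, 100.0, 500.0])  # L_b(0,b) for the three example menus

# Data under the null P0 = N(0,1); LR e-variable vs P1 = N(1,1).
y = rng.normal(0.0, 1.0, size=1_000_000)
S = np.exp(y - 0.5)

# Compatible rule with ell = 1: in menu b, take action b iff L_b(0,b) <= S(y).
# The induced statistic sup_b L_b(0, delta_b(y)) is the largest Type-I loss
# the rule is willing to incur at y.
taken = losses[None, :] <= S[:, None]
S_induced = np.where(taken, losses[None, :], 0.0).max(axis=1)

print(f"E_P0[sup_b L_b(0, delta_b(Y))] = {S_induced.mean():.4f}  (<= 1)")
```

Since $\sup_b L_b(\underline{0},\delta_b(y))\leq S(y)$ pointwise here, the bound follows directly from $\mathbf{E}_{P_0}[S]\leq 1$.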

2.2 Admissibility

whereas there exist $b\in\mathcal{B}$, $P\in\mathcal{H}(\underline{0})$ such that

If $\mathcal{Y}$ is uncountable, $L_b$, $\delta^\circ$ and $\delta$ could again be picked in highly pathological ways, such that the probabilities above are undefined. This is fully resolved by the generalization of (14) and (15) given in the SI.

Clearly, if both $\delta^\circ$ and $\delta$ are Type-I risk safe and $\delta^\circ$ is Type-II strictly better than $\delta$, we would always prefer playing $\delta^\circ$ over $\delta$. We may say that $\delta$ is inadmissible. Formally, for any decision rule $\delta$, we say that it is admissible if it is Type-I risk safe and no other Type-I risk safe decision rule is Type-II strictly better.

Main Result

This admissibility notion is reminiscent of standard admissibility notions in classical statistical decision theory, and the theorem below is in the spirit of a complete class theorem [3, 9], expressing that in searching for reasonable (i.e., admissible) decision rules in GNP problems we may restrict ourselves to those based on e-variables via maximally compatible decision rules. Formally, we call a decision rule $\delta$ maximally compatible with e-variable $S$ relative to a given GNP testing problem if it is compatible with $S$ and there exists no decision rule $\delta^\circ$ such that $\delta^\circ$ is also compatible with $S$ yet $\delta^\circ$ is Type-II strictly better than $\delta$. We will relate this to the earlier informal definition of maximal compatibility (8) further below.

$\mathbb{R}^+_0$). An example of a GNP testing problem that is rich relative to the e-variable $S^{\textsc{np}(\alpha)}$ of Example 2 is given by $\mathcal{B}=\{\textsc{np}\}\cup\mathcal{B}'$, for arbitrary $\mathcal{B}'$, where $\mathcal{A}_{\textsc{np}}=\{0,1\}$ and $L_{\textsc{np}}(\underline{0},0)=0$, $L_{\textsc{np}}(\underline{0},1)=1/\alpha$ (if $\mathcal{B}'=\emptyset$, this is the classical NP setting of Section 1 again): choose $B=\textsc{np}$, $a=0$ if $S^{\textsc{np}(\alpha)}=0$, and choose $B=\textsc{np}$, $a=1$ if $S^{\textsc{np}(\alpha)}=1/\alpha$.

Consider a GNP testing problem. Then:

If $\delta$ is an admissible decision rule, then there exists an e-variable $S$ such that $\delta$ is a maximally compatible decision rule for $S$.

As a partial converse, suppose that $\delta$ is a maximally compatible decision rule for some e-variable $S$. If (a) all $P\in\mathcal{H}(\underline{0})$ are mutually absolutely continuous (see below), (b) $S$ is sharp relative to the given testing problem, i.e. ${\bf E}_{P_0}[S]=1$ for some $P_0\in\mathcal{H}(\underline{0})$, and (c) the GNP testing problem is rich relative to $S$, then $\delta$ is admissible.

$\mathbb{R}^+_0\cup\{\infty\}$ (in particular, this includes the case that $\mathcal{A}_b$ is finite); (ii) the Type-I loss $L_b(\underline{0},a)$ is monotonically and, on each interval, continuously increasing in $a$; and (iii) all $P\in\mathcal{P}$ are mutually absolutely continuous. The following is easily checked: for an arbitrary e-variable $S$, such simple GNP decision problems must have a maximally compatible $\delta^*$ relative to $S$ that generalizes (8), with our simplification $\ell=1$: $\delta^*$ is the rule which selects, when presented $Y=y$, $B=b$,

$\mathcal{B}\subset\mathbb{R}^+_0$ with, for $b\in\mathcal{B}$, $\mathcal{A}_b=\{0,1\}$ and $L_b(\underline{0},0)=0$, $L_b(\underline{0},1)=b$. Take arbitrary but fixed $0<\alpha<1$. Then the maximally compatible decision rule $\delta^*$ as in (16) relative to e-variable $S^{\textsc{np}(\alpha)}$ is sharp. When presented with loss function $L_b$, this $\delta^*$ always plays $0$ if $b>1/\alpha$. If $b\leq 1/\alpha$, it plays $1$ if $b\leq S^{\textsc{np}(\alpha)}$ (i.e. if $S^{\textsc{np}(\alpha)}=1/\alpha$, i.e. if $\textsc{p}\leq\alpha$) and $0$ otherwise (i.e. if $S^{\textsc{np}(\alpha)}=0$, i.e. if $\textsc{p}>\alpha$). By Part 2 of Theorem 1, this $\delta^*$ is admissible if $\mathcal{B}$ contains $b=1/\alpha$, which ensures richness relative to $S^{\textsc{np}(\alpha)}$.
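This rule is easy to simulate. The sketch below (menu values $b$ chosen by us for illustration) plays $1$ in menu $b$ iff $b\leq S^{\textsc{np}(\alpha)}$ and reports the Type-I risk per menu; the risk stays at or below $1$ for every $b$, and equals $1$ at $b=1/\alpha$, matching sharpness.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = 0.05

# S^np(alpha): the all-or-nothing e-variable built from a p-value.
p = rng.uniform(size=1_000_000)              # p is uniform under the null
S_np = np.where(p <= alpha, 1 / alpha, 0.0)

# Rule (16) specialized to menus with A_b = {0,1}, L_b(0,1) = b:
# play 1 iff b <= S^np(alpha).  Type-I risk per menu b:
risks = {}
for b in (1.0, 10.0, 1 / alpha, 25.0):       # 1/alpha = 20; 25 > 1/alpha
    risks[b] = np.mean(np.where(b <= S_np, b, 0.0))
    print(f"b = {b:4.0f}: Type-I risk = {risks[b]:.3f}")
```

Analytically, for $b\leq 1/\alpha$ the risk is $b\cdot P(\textsc{p}\leq\alpha)=b\alpha$, and for $b>1/\alpha$ the rule never plays $1$, so the risk is $0$.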

$\mathbb{R}^+_0$.

3 Robust Confidence via the E-Posterior

Now let us consider a statistical model $\mathcal{P}$ partitioned according to a parameter of interest $\theta\in\Theta$, with $\phi:\mathcal{P}\rightarrow\Theta$ indicating the parameter corresponding to each $P$; for example, $\theta=\phi(P)$ might be the mean of $P$, or, if $\mathcal{P}=\{P_\theta:\theta\in\Theta\}$ is a parametric model, $\phi$ might simply denote the parameterization function, $\phi(P_\theta)=\theta$. Any collection of p-values $\{\textsc{p}_\theta:\theta\in\Theta\}$, with $\textsc{p}_\theta$ a p-value for the null $\mathcal{H}(\theta):=\{P\in\mathcal{P}:\phi(P)=\theta\}$, can be used to build a valid $(1-\alpha)$-confidence set, by setting $\textsc{cs}_\alpha(Y)=\{\theta:\textsc{p}_\theta(Y)>\alpha\}$ to be the set of $\theta$'s that would not have been rejected at the given level $\alpha$.
For simplicity, we restrict attention to scalar $\Theta\subseteq\mathbb{R}$; then the $\textsc{cs}_\alpha$ will usually be intervals, and indeed this p-value based construction is a standard way to construct such intervals. Analogously [58, 37], any e-collection, i.e. a collection of e-variables $\{S_\theta:\theta\in\Theta\}$ such that $S_\theta$ is an e-variable for the `null' $\mathcal{H}(\theta)$ (by this we mean that $S_\theta$ must satisfy (7), i.e. ${\bf E}_P[S_\theta]\leq 1$ for all $P\in\mathcal{H}(\theta)$), can be used to build an equally valid, usually larger, e-based $(1-\alpha)$-confidence set (again, for scalar $\theta$ this usually becomes an interval), one for each $\alpha$, by setting $\textsc{cs}^*_\alpha(Y)=\{\theta:S_\theta(Y)<1/\alpha\}$ as the set of $\theta$'s that would not have been rejected at level $\alpha$ with an e-value based test. Below we first give a simple example.
We then, in Section 3.1, retrace the steps of Section 1 and Section 2, re-interpreting confidence sets in terms of actions with associated losses and risks. Sections 3.2 and 3.3 show that, once again, if losses are determined post-hoc (BIND is violated), then standard confidence intervals lose their validity, whereas e-based confidence intervals remain Type-I risk safe. Relatedly, without BIND, decisions based on confidence distributions can be unsafe, but those based on the e-posterior (a means of summarizing e-$\textsc{cs}$'s for all $\alpha$'s at once) remain Type-I risk safe.
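The construction $\textsc{cs}^*_\alpha(Y)=\{\theta:S_\theta(Y)<1/\alpha\}$ can be checked by simulation. A minimal sketch, assuming Gaussian data with known unit variance and a fixed-$\lambda$ likelihood-ratio e-collection $S_\theta(Y)=\exp(\lambda\sum_i(Y_i-\theta)-n\lambda^2/2)$ (one convenient choice among many; $\lambda$, $n$ and $\theta$ below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, n, theta_true, lam = 0.05, 50, 0.3, 0.2  # illustrative values

# E-collection for the mean of N(theta, 1) data:
#   S_theta(Y) = exp(lam * sum(Y_i - theta) - n * lam**2 / 2),
# which has E_P[S_theta] = 1 whenever the data mean is theta.
def covered(sample):
    S_at_truth = np.exp(lam * np.sum(sample - theta_true) - n * lam**2 / 2)
    return S_at_truth < 1 / alpha       # true theta stays in cs*_alpha(Y)

runs = [covered(rng.normal(theta_true, 1.0, size=n)) for _ in range(20_000)]
print(f"coverage of cs*_alpha: {np.mean(runs):.4f}  (target >= {1 - alpha})")
```

By Markov's inequality the miscoverage $P(S_{\theta}\geq 1/\alpha)$ is at most $\alpha$ at the true $\theta$, so the simulated coverage should exceed $1-\alpha$, typically by a wide margin.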

$\theta^+>\theta$ to satisfy

$S^+_\theta$ are also uniformly most powerful Bayes factors [23, 15] and hence reasonable e-variables for 1-sided $\textsc{cs}$s. We continue with $S_\theta$ for two-sided $\textsc{cs}$s. As is proved analogously to (10), $S_\theta$ remains an e-variable even if neither the actual sample size $n$ nor the to-be-used significance level $0<\alpha<1$ is equal to the hoped-for $n^*$ and $\alpha^*$; more on this below. In the SI we show that a sufficient condition for $S_\theta(Y)\geq\alpha^{-1}$, i.e. for $\theta\not\in\textsc{cs}^*_\alpha(Y)$, is that

E-Processes

3.1 Reformulating Coverage in Terms of Type-I Risk

We now generalize the definition of a GNP testing problem so that (besides much else) it also allows for estimation with confidence intervals.

Definition 2

Fix a set of distributions $\mathcal{P}$ for $Y$, a set $\Theta$ and a function $\phi:\mathcal{P}\rightarrow\Theta$ mapping $P\in\mathcal{P}$ to property $\phi(P)\in\Theta$ as above. A GNP (Generalized Neyman-Pearson) decision problem relative to $\mathcal{P}$, $\Theta$ and $\phi$ is a tuple $(\mathcal{B},\{(\mathcal{A}_{b},L_{b}:\Theta\times\mathcal{A}_{b}\rightarrow\mathbb{R}^{+}_{0}):b\in\mathcal{B}\})$.

A GNP decision problem is really a set of GNP testing problems, one for each $\theta\in\Theta$: we recover Definition 1 by taking a singleton $\Theta=\{\underline{0}\}$, $\mathcal{P}=\mathcal{H}(\underline{0})$ and $\phi(P)=\underline{0}$ for all $P\in\mathcal{H}(\underline{0})$. For general $\theta\in\Theta$, the $\theta$-testing problem corresponding to the GNP decision problem is the testing problem $(\mathcal{B},\{(\mathcal{A}_{b},L_{b}(\theta,\cdot):\mathcal{A}_{b}\rightarrow\mathbb{R}^{+}_{0}):b\in\mathcal{B}\})$ with null hypothesis $\mathcal{H}(\theta)=\{P:\phi(P)=\theta\}$ and with $L_{b}(\theta,\cdot)$ in the role of $L_{b}(\underline{0},\cdot)$. All definitions for GNP testing problems are now easily extended to GNP decision problems by requiring them to hold for the corresponding $\theta$-testing problem, for all $\theta\in\Theta$. In particular, we say that decision rule $\delta$ is compatible with e-collection $\{S_{\theta}:\theta\in\Theta\}$ if for all $y\in\mathcal{Y}$, $b\in\mathcal{B}$ we have that

The definition of Type-I risk safety is extended analogously from (13): $\delta$ is Type-I risk safe iff there exists an e-collection $\mathrm{S}=\{S_{\theta}:\theta\in\Theta\}$ such that $\delta$ is compatible with $\mathrm{S}$. If the expectation below is well-defined (which it will be in the confidence interval setting), Type-I risk safety is then clearly equivalent to the corresponding generalization of (12):

Admissibility is extended analogously: we call a decision rule $\delta^{\circ}$ Type-II strictly better than $\delta$ if for all $\theta\in\Theta$, the corresponding $\theta$-testing problem satisfies (14) with $\underline{0}$ replaced by $\theta$, whereas there exist $\theta\in\Theta$, $b\in\mathcal{B}$, $P\in\mathcal{H}(\theta)$ such that the corresponding $\theta$-testing problem satisfies (15) with $\underline{0}$ replaced by $\theta$. The definitions of admissibility and maximum compatibility are now based on this extended notion of Type-II strict betterness and are otherwise unchanged; we further extend the notions of sharpness and richness to this generalized setting and provide a generalization of Theorem 1 to full GNP decision problems in the SI Appendix.
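As a sanity check on this compatibility logic, the following sketch (entirely our own illustration; the particular e-variable, the grid of available Type-I losses, and all names are assumptions, not anything from the paper) verifies by Monte Carlo that a rule whose incurred Type-I loss never exceeds an e-variable has Type-I risk at most 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative e-variable: the likelihood ratio of N(1,1) against the
# null N(0,1).  Any likelihood ratio is an e-variable: its expectation
# under the null is exactly 1.
def e_value(y):
    return np.exp(y - 0.5)

# Available Type-I losses of the actions (a toy grid), and a rule that is
# *compatible* with the e-variable: it always picks the most extreme
# action whose Type-I loss does not exceed S(y).
loss_grid = np.array([0.0, 1.0, 5.0, 20.0, 100.0])

def incurred_loss(y):
    s = e_value(y)
    idx = np.searchsorted(loss_grid, s, side="right") - 1
    return loss_grid[idx]

# Monte Carlo check of Type-I risk safety: E_{P0}[loss] <= E_{P0}[S] = 1.
y = rng.normal(0.0, 1.0, size=1_000_000)   # data drawn under the null
risk = incurred_loss(y).mean()
print(f"estimated Type-I risk: {risk:.3f}  (guaranteed bound: 1)")
```

Because the loss is bounded pointwise by the e-value, Markov-style reasoning gives the risk bound no matter how the loss grid was chosen.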

Confidence Intervals as Actions

Thus, we incur a Type-I loss if the parameter $\theta$ of the sampling distribution is not in the interval $[\theta_{L},\theta_{R}]$ we specified, and $b$ determines how bad such a mistake is. This may again be data-dependent: we assume once again that we are presented with $B=b$ via a random and potentially unknowable process, and we want to obtain the Type-I risk guarantee (20), which instantiates to

where $\delta_{b}(Y)=[\theta_{L}(Y,b),\theta_{R}(Y,b)]$. Among all decision rules (i.e. confidence intervals) $\delta$ satisfying (22), we want to find the narrowest ones. Our definition of Type-II strictly better above 'automatically' accounts for this: the extended definition of Type-II betterness implies that $[\theta_{L},\theta_{R}]$ is Type-II strictly better than $[\theta'_{L},\theta'_{R}]$ iff $[\theta_{L},\theta_{R}]$ is a proper subset of $[\theta'_{L},\theta'_{R}]$.

$\theta_{R}\geq\hat{\theta}+A$ where

$\delta_{B}(Y)=[\hat{\theta}-A,\hat{\theta}+A]$ to satisfy this with equality, making the interval as narrow as possible and hence admissible. We are then guaranteed Type-I safety, (22), irrespective of the definition of $B$. In contrast, it is not clear how to construct Type-I safe CIs for data-dependent $B$ without e-values. We might be tempted to do this based on confidence distributions (CDs) [6, 41], which summarize the confidence intervals for each $\alpha$ into a posterior-like quantity, or on objective Bayes posteriors [3], but as we now show, this can have bad results.

3.2 CDs and O'Bayes Posteriors are not valid Post-Hoc

$\theta_{R}=\hat{\theta}(x^{n})+1.96/\sqrt{n}$.

$\delta'_{b}(Y)=[\hat{\theta}-A,\hat{\theta}+A]$, where $A$, depending on $b$, is the smallest number such that

$W^{\circ}(\bar{\theta}\not\in[\hat{\theta}-A,\hat{\theta}+A]\mid Y=x^{n})=1/b$; this $\delta'_{b}(Y)$ is equal to the standard $(1-\alpha)$-confidence interval for $\alpha=1/b$. The intuitive appeal of choosing this $\delta$ is clear: (25) expresses that, as a DM, one can expect the loss given the data to be bounded by $1$; one simply wants to pick the smallest, most informative interval for which this holds true. Yet the real expectation of the loss may very well differ from (25): assuming that $B$ is a fixed function of $Y$, it is given by

$B:=1/(2F_{0}(-Y+\epsilon^{2}/Y))$, where $F_{0}$ is the CDF of a standard normal, then, as demonstrated in the SI, under $\theta^{*}=0$, (26) evaluates to $\infty$, irrespective of the definition of $B(Y)$ for $Y<\epsilon$. In particular we may set $B=1$ for such $Y$, corresponding to the decision problem being 'called off', because the required bound (25) is then achieved trivially by issuing the empty interval.
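The blow-up can be observed directly in simulation. The following self-contained sketch (our own illustration; the choices $n=1$, $\epsilon=0.1$ and all variable names are assumptions) checks that for every $Y\geq\epsilon$ the resulting interval excludes $\theta^{*}=0$, so a loss of $B(Y)$ is realized on every such draw, and the average realized loss far exceeds the believed bound of 1:

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF F_0."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(1)
eps = 0.1
y = rng.normal(0.0, 1.0, size=200_000)     # Y ~ N(theta*, 1), theta* = 0

# Post-hoc importance B = 1/(2 F_0(-Y + eps^2/Y)) for Y >= eps; for
# Y < eps the problem is 'called off' (empty interval, loss 0).
mask = y >= eps
b = 1.0 / (2.0 * np.array([Phi(-v + eps**2 / v) for v in y[mask]]))

# The standard (1 - 1/b)-CI is [Y - z, Y + z] with F_0(-z) = 1/(2b),
# i.e. z = Y - eps^2/Y < Y, so theta* = 0 lies outside *every* interval:
z = y[mask] - eps**2 / y[mask]
assert np.all(y[mask] - z > 0)             # left endpoint always above 0

loss = np.zeros_like(y)
loss[mask] = b                             # realized Type-I loss is b
print("average realized loss:", loss.mean(), " believed bound: 1")
```

The average keeps growing with the number of studies, since the expectation being estimated is infinite.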

Repercussions for Neyman’s Inductive Behavior

This discrepancy between what one believes will happen according to a posterior (risk bounded by 1) and what actually will happen (potentially infinite risk) has repercussions for Neyman's interpretation of statistics as long-run performance guarantees of inductive behavior. To illustrate, imagine a DM who is confronted with such a decision problem many times (each time $j$, the underlying $\theta_{(j)}$ with $Y_{(j)}\sim P_{\theta_{(j)}}$, the sample size $n_{(j)}$ and the importance function $B_{(j)}$ may be different). Then, based on (25), she might think to have, by the law of large numbers, the guarantee that, almost surely,

Unfortunately, however, this statement is likely false if in reality there is a dependence between $B_{(j)}$ and $Y_{(j)}$. In the SI we show that, based on the example above, the average in (27) may in fact a.s. converge to infinity, even though the individual $B_{(j)}$'s look pretty innocuous. A first reaction may be to require the DM to address this problem by modeling the dependency between $B_{(j)}$ and $Y_{(j)}$. But the precise relation may be unknowable, and then it is not clear how to do this. To avoid the issue one may output e-based CIs or, equivalently but perhaps more illuminatingly, CIs based on the e-posterior that we now introduce.

3.3 The E-Posterior remains valid Post-Hoc

Let $\mathrm{S}=\{S_{\theta}:\theta\in\Theta\}$ be an e-collection. Just as it is tempting to interpret a 'system' of confidence intervals, one for each $\alpha$, i.e. a CD, as a type of 'posterior', one can also view the $S_{\theta}$-reciprocal $\bar{P}(\theta\mid Y):=S^{-1}_{\theta}(Y)$ as a type of 'posterior representation of uncertainty' for parameter $\theta$. This idea was conceived independently by [57] and [15], who called $\bar{P}(\theta\mid Y)$ the e-posterior. The crucial difference between e-posteriors and CDs is that the former enable valid inferences under specific post-hoc, data-dependent assessments of Type-I risk, whereas standard CDs can only be validly used as in (24) if BIND holds. We thus recommend e-posteriors, as Cox [6] did CDs, as a summary of estimation uncertainty, but one that is significantly more robust than that provided by CDs.

Using the e-posterior we can re-express compatibility, ( 19 ), as

with conventions about $0\cdot\infty$ as underneath (8), and $\ell=1$. We already know that $\delta$ satisfying (28) are Type-I risk safe irrespective of how $B$ is defined. The rewrite suggests an analogy to the Bayes posterior risk assessment (24): if we replace the objective Bayes/CD-posterior expectation by the e-posterior maximum, we get Type-I risk safety without the BIND assumption.

[15] shows that, for general bounds $\ell$ and with $L_{b}$ replaced by general loss functions without Type-I/II dichotomies, assessment (28) is meaningful and provides a non-Bayesian alternative to Bayes-posterior expected loss assessment. That paper also lists a variety of e-posteriors, including an extension of the one of Example 4 to general exponential families, and points out deeper relations between e-posteriors and Bayesian posteriors. In the present paper, we merely present the e-posterior as a graphical tool that summarizes the e-based confidence intervals as given by (19) and helps to visualize how they relate to standard confidence intervals: see Figure 1.
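To illustrate how an e-posterior yields confidence sets, here is a minimal sketch with a Gaussian mixture e-variable (our own toy construction, not the one of Example 4; the prior variance $\tau^{2}$, the sample size, and all names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, tau2 = 20, 1.0        # sample size and mixture variance: assumed choices

def normal_pdf(x, mean, var):
    return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

def e_posterior(theta, ybar):
    # P-bar(theta | Y) = 1/S_theta(Y) with S_theta = p_W(Y)/p_theta(Y),
    # where W = N(0, tau2) mixes the alternatives and ybar is sufficient.
    return normal_pdf(ybar, theta, 1 / n) / normal_pdf(ybar, 0.0, tau2 + 1 / n)

# e-confidence set at level alpha: {theta : P-bar(theta | Y) > alpha}.
# Markov's inequality applied to S_theta gives miscoverage <= alpha for
# every theta, n and alpha -- even if these are chosen post-hoc.
alpha, theta_true = 0.05, 0.3
ybar = rng.normal(theta_true, np.sqrt(1 / n), size=20_000)
miscoverage = np.mean(e_posterior(theta_true, ybar) <= alpha)
print("empirical miscoverage:", miscoverage, " (guarantee: <= 0.05)")
```

The empirical miscoverage typically lands well below $\alpha$, reflecting the conservativeness of e-based confidence sets relative to exact classical ones.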


4 State of the Art

(Current) Limitations and Challenges

$Y=f_{\theta}(X,Z)+\text{noise}$) that involves a nonlinearity, such as GLMs and Cox proportional hazards, whenever the variable $X$ to be tested (e.g. treatment vs. control) does not satisfy the model-X assumption (conditional distribution of $X$ given $Z$ known). While model-X is automatically satisfied in clinical trials, there are of course many important cases in which it is not. Universal inference [56] provides an alternative generic e-design method that does lead to efficiently calculable e-values in such cases, but in regression problems it is not competitive in terms of power with classical methods for medium- to high-dimensional models [48]; its strength has rather been to provide e-values for complex $\mathcal{H}(\underline{0})$ that have simply eluded classical testing [10].
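For concreteness, the split likelihood ratio at the heart of universal inference can be sketched in a toy Gaussian setting (our own illustration; in practice its value lies in nulls far more complex than this one):

```python
import numpy as np

rng = np.random.default_rng(3)

# Universal inference ("split likelihood ratio") sketch for H0: theta = 0
# in a N(theta, 1) model -- a toy stand-in for the complex nulls above.
def split_lrt_e_value(y):
    d1, d2 = y[: len(y) // 2], y[len(y) // 2 :]   # split the sample
    theta_hat = d1.mean()                         # fit on the first half
    # S = L_{theta_hat}(D2) / sup_{theta in H0} L_theta(D2); here H0 = {0}.
    loglik_alt = -0.5 * np.sum((d2 - theta_hat) ** 2)
    loglik_null = -0.5 * np.sum(d2**2)
    return np.exp(loglik_alt - loglik_null)

# Under the null, E[S] = 1 (it is an e-value), so Markov's inequality
# turns it into a test: reject at level 0.05 when S >= 20.
y_null = rng.normal(0.0, 1.0, size=(10_000, 40))
s = np.array([split_lrt_e_value(row) for row in y_null])
print("mean e-value under H0:", s.mean())
print("P(S >= 20) under H0:", (s >= 20).mean(), " (guarantee: <= 0.05)")
```

The sample splitting is what makes the construction generic: the first half may be fit with any estimator whatsoever without invalidating the e-value property.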

Challenges – II

With GRO-type methods one can obtain performance comparable, in terms of power, to classical approaches. In many (though not all) settings one needs to engage in optional stopping to achieve this. For a broad class of e-values this is no problem ([13] provides a detailed analysis): all coverage and Type-I risk guarantees are retained under such optional stopping. Still, this points toward a second challenge for e-methods, sociological/psychological rather than statistical: they require researchers to think differently, and this is, of course, always difficult to accomplish. In this respect, the tech industry is at the forefront: anytime-valid methods based on e-processes have been adopted by several major tech companies [26].

5 Discussion, Future Work, Conclusion

We provide a few concluding remarks. First, we analyze in what sense we solved the 'roving $\alpha$' issue that motivated this work. Second, we discuss related work. Third, we suggest a 'road ahead' for e-methods.

Roving $\alpha$ Revisited: the Quasi-Conditional Paradigm

Assume we have a prior on $\mathcal{H}(\underline{0})$ and $\mathcal{H}(\underline{1})$ and priors $W_{0}$ and $W_{1}$ on the distributions inside these hypotheses. We can then use Bayes' theorem to calculate the Bayes posterior $P(\mathcal{H}(\underline{0})\mid Y)$ based on data $Y$. Suppose we reject the null if $P(\mathcal{H}(\underline{0})\mid Y)\leq\alpha$. We may then define, for all $y$ for which this holds, i.e. for which we reject the null, the conditional Type-I error probability $\hat{\alpha}$ to be simply equal to this posterior probability, $\hat{\alpha}:=\hat{\alpha}_{|y}:=P(\mathcal{H}(\underline{0})\mid Y=y)$. This implies that, for any fixed $\alpha_{0}\leq\alpha$ and any long sequence of studies, with probability tending to one,

Such a fully conditional statement, with post-hoc determined $\hat{\alpha}_{|Y}$, is only correct if the priors can be fully trusted, i.e. if one accepts a fully subjective Bayesian stance. It would definitely be incorrect if we set $\hat{\alpha}_{|Y}$ either to a p-value or to the reciprocal of an e-value based on $Y$. Still, as we have seen, if we instead use e-values to perform a data-dependent action, which is allowed to get more extreme (higher Type-I loss) as our evidence against the null increases (higher e-value) according to the maximally compatible rule (which in simple cases is given by (8)), then we do get an 'unconditionally' valid bound on Type-I risk. Thus, using e-values, setting a roving $\alpha$ equal to $\hat{\alpha}:=\ell/S(y)$ for the observed $y$ is still incorrect if we interpret it as expressing (29); but it is correct if we interpret it as setting a 'roving bound' of $\ell/\hat{\alpha}$ on the Type-I loss $L_{B}(\underline{0},a)$ we dare to incur: if we make sure to pick $a$ so that $L_{B}(\underline{0},a)\leq\ell/\hat{\alpha}$, then we have compatibility and hence Type-I risk safety, (12). Note that $B$ is allowed to be any function of, hence 'conditional on', the data; but its performance is evaluated 'unconditionally', i.e. by means of (12), which is an unconditional expectation. This quasi-conditional stance, explained further in [15], provides a middle ground between fully Bayesian and traditional Neyman-Pearson-Wald type methods and analysis.
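The fully-Bayesian side of this contrast is easy to simulate. The toy sketch below (the two simple hypotheses, the uniform prior, and every name are our own assumptions) checks that, when the priors are trusted and correct, rejecting whenever the posterior null probability $\hat{\alpha}_{|y}$ is at most $\alpha$ indeed makes the null true in at most a fraction $\alpha$ of rejections:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy fully-Bayesian setting: H(0) = {N(0,1)}, H(1) = {N(2,1)}, prior
# probability 1/2 each (all illustrative assumptions).
N = 100_000
h1 = rng.random(N) < 0.5                        # which hypothesis is true
y = rng.normal(np.where(h1, 2.0, 0.0), 1.0)

# Bayes' theorem: posterior null probability P(H(0) | Y = y).
post_h0 = 1.0 / (1.0 + np.exp(2.0 * y - 2.0))

# Reject when the 'conditional alpha-hat' is at most alpha; among the
# rejections, the null is then true with frequency at most alpha --
# but only because the priors here really are correct.
alpha = 0.05
reject = post_h0 <= alpha
frac_true_null = np.mean(~h1[reject])
print("fraction of rejections with H(0) true:", frac_true_null)
```

Replacing the posterior with a p-value or a reciprocal e-value in this recipe would break the guarantee, which is exactly the point of the quasi-conditional reading above.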

Where does the Type-I risk bound $\ell$ come from?

Whereas $B$ may arbitrarily depend on the data $Y$, the upper bound $\ell$ in (1) has to be set independently of $Y$ after all. It may still vary from decision problem to decision problem, though (in the SI Appendix we explain what this means in terms of Neyman's inductive behavior paradigm, and that setting $\ell=1$, as was done for mathematical convenience in Section 2, is unproblematic). In many practical testing problems, we might expect that for all $b\in\mathcal{B}$, $\mathcal{A}_{b}$ contains a special action $0$, standing for 'do nothing' (keep the status quo), which then has the same Type-II loss under all $b\in\mathcal{B}$, i.e. there is an $\ell'$ such that for all $b\in\mathcal{B}$, $L_{b}(\underline{1},0)=\ell'$. We might then simply set $\ell=\ell'$, making sure that, whatever action we take, we can expect our result (with all costs and benefits incorporated) to be no worse than 'the cost of doing nothing when we really should have done something'.

Related Work: Inferential Models

As we do in Example 4, Martin, Liu and collaborators [28, 27] point out discrepancies between what one would expect to be a valid confidence set according to a fiducial, CD, or Bayesian posterior and what are actually valid confidence sets according to the unknown, true distribution. They propose inferential models (IMs) as a safer alternative. Unlike e-posteriors, the specific IMs proposed by [28] still rely on the BIND assumption and thus will not provide reliable inferences if BIND does not hold. But it may very well be that some other IMs (IMs constitute a family of methods, not a single method) essentially behave like e-posteriors.

The Road Ahead

Future work will include a further investigation of the 'quasi-conditional' idea launched above, as well as of the precise relation to Martin's IMs and other related uses of e-variables, such as [1], who, like us, employ e-values with a Type-I/II-error distinction and more than 2 actions.

Another unresolved fundamental issue is this: most practitioners still interpret p-values in a Fisherian way, as a notion of evidence against $\mathcal{H}(\underline{0})$. Although this interpretation has always been controversial, it is to some extent, and with caveats (such as 'a single isolated small p-value does not give substantial evidence' [29], or 'only work with special, evidential p-values' [12]), adopted by highly accomplished statisticians, including the late Sir David Cox [7, 30]. Even Neyman [33] wrote: 'my own preferred substitute for "do not reject $H$" is "no evidence against $H$ is found"'. In light of the present results, one may ask whether, perhaps, e-values are more suitable than p-values as such a measure. We preliminarily conjecture that they are, and motivate this in the SI, although a proper analysis of such a claim warrants a separate paper, which we hope to provide in the future.

Perhaps more important for practice than all of this, though, in light of Section 4 above, is the further development of practically useful e-variables for standard settings (such as GLMs) in which they are not yet available, as well as of more accompanying software such as [51].

Conclusion: A different kind of Robustness

Standard p-value and CS-based decisions rely on BIND, an assumption that will often be false or unverifiable at the time study results are published. In this paper we showed that e-values provide valid error and risk guarantees without such assumptions, and are therefore robust tools for inference. But whereas 'robustness' usually refers to robust inference in the presence of outliers or misspecification of the model structure or noise process, this is a different, much less studied form of robustness: robustness with respect to the actual decision task that the study results will be used to solve.

Acknowledgements

The author would like to thank an anonymous referee and Dr. W. Koolen, who both independently alerted him to the fact that, without essential loss of generality, one may assume Type-II loss to decrease whenever Type-I loss increases.

  • [1] S. Bates, M. I. Jordan, M. Sklar, and J. Soloff. Principal-agent hypothesis testing. arXiv:2205.06812 , 2022.
  • [2] J. Berger and T. Sellke. Testing a point null hypothesis: The irreconcilability of p values and evidence (with discussion and rejoinder). Journal of the American Statistical Association , 82(397):112–122,135–139, 1987.
  • [3] J.O. Berger. Statistical Decision Theory and Bayesian Analysis . Springer Series in Statistics. Springer-Verlag, New York, 1985.
  • [4] J.O. Berger. Could Fisher, Jeffreys and Neyman have agreed on testing? Statistical Science , 18(1):1–12, 2003.
  • [5] Patrick Billingsley. Probability and Measure . Wiley, third edition, 1995.
  • [6] David R Cox. Some problems connected with statistical inference. Annals of Mathematical Statistics , 29:357–372, 1958.
  • [7] David R. Cox. In gentle praise of significance tests, 2018. Talk given at RSS 2018 keynote conference session.
  • [8] D.R. Cox and D.V. Hinkley. Theoretical Statistics . Chapman and Hall, London, 1974.
  • [9] Haosui Duanmu and Daniel M. Roy. On extended admissible procedures and their nonstandard Bayes risk. Annals of Statistics , 49(4):2053–2078, 2021.
  • [10] Robin Dunn, Aditya Gangrade, Larry Wasserman, and Aaditya Ramdas. Universal inference meets random projections: a scalable test for log-concavity. arXiv:2111.09254 , 2021.
  • [11] A.W.F. Edwards. Likelihood . Cambridge University Press, 1984.
  • [12] Sander Greenland. Divergence vs. decision p-values: A distinction worth making in theory and keeping in practice. Scandinavian Journal of Statistics , 2022.
  • [13] P. Grünwald, Rianne de Heide, and Wouter Koolen. Safe testing. Journal of the Royal Statistical Society, Series B , 2024. to appear, with discussion; also arXiv:1906.07801.
  • [14] Peter Grünwald. Safe probability. Journal of Statistical Planning and Inference , 2018.
  • [15] Peter Grünwald. The E-posterior. Phil. Trans. Roy. Soc. A , 2023.
  • [16] Peter Grünwald, Alexander Henzi, and Tyron Lardy. Anytime-valid tests of conditional independence under model-X. Journal of the American Statistical Association , 2023.
  • [17] J. Hannig, H. Iyer, R.C.S. Lai, and T.C.M. Lee. Generalized fiducial inference: A review and new results. Journal of the American Statistical Association , 111(515):1346–1361, 2016.
  • [18] Yunda Hao, Peter Grünwald, Tyron Lardy, Long Long, and Reuben Adams. E-values for k-sample tests with exponential families. Sankhya A , 2024.
  • [19] Alexander Henzi and Johanna F. Ziegel. Valid sequential inference on probability forecast performance. Biometrika , 2021.
  • [20] R. Hubbard. Alphabet soup: Blurring the distinctions between $p$'s and $\alpha$'s in psychological research. Theory and Psychology , 14(3):295–327, 2004.
  • [21] R. Hubbard and M.J. Bayarri. Confusion over measures of evidence ($p$'s) versus errors ($\alpha$'s) in classical statistical testing. The American Statistician , 57:171–177, 2003.
  • [22] Nikolaos Ignatiadis, Ruodu Wang, and Aaditya Ramdas. E-values as unnormalized weights in multiple testing. arXiv:2204.12447 , 2022.
  • [23] Valen E Johnson. Uniformly most powerful Bayesian tests. Annals of Statistics , 41(4), 2013.
  • [24] E.L. Lehmann. Testing Statistical Hypotheses . Wiley, first edition, 1959.
  • [25] E.L. Lehmann. The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association , 88(424):1242–1249, 1993.
  • [26] Michael Lindon, Dae Woong Ham, Martin Tingley, and Iavor Bojinov. Anytime-valid linear models and regression adjusted causal inference in randomized experiments. arXiv:2210.08589 , 2022.
  • [27] Ryan Martin. Inferential models and the decision-theoretic implications of the validity property. arXiv:2112.13247 , 2021.
  • [28] Ryan Martin and Chuanhai Liu. Inferential models: reasoning with uncertainty . CRC Press, 2015.
  • [29] Deborah G Mayo. Statistical inference as severe testing: How to get beyond the statistics wars . Cambridge University Press, 2018.
  • [30] Deborah G Mayo and David R Cox. Frequentist statistics as a theory of inductive inference. Lecture Notes-Monograph Series 2nd Lehmann Symposium , pages 77–97, 2006.
  • [31] Willie Neiswanger and Aaditya Ramdas. Uncertainty quantification using martingales for misspecified Gaussian processes. In Algorithmic Learning Theory . PMLR, 2021.
  • [32] J. Neyman. First Course in Probability and Statistics. 1950.
  • [33] J. Neyman. Tests of statistical hypotheses and their use in studies of natural phenomena. Communications in Statistics: Theory and Methods , 5(8):737–751, 1976.
  • [34] J. Neyman and E.S. Pearson. On the problem of the most efficient tests of statistical hypotheses. Phil. Trans. R. Soc. Lond. A., 231:289–337, 1933.
  • [35] Muriel Felipe Pérez-Ortiz, Tyron Lardy, Rianne de Heide, and Peter Grünwald. E-statistics, group invariance and anytime valid testing. arXiv:2208.07610 , 2022.
  • [36] Aleksandr Podkopaev, Patrick Bloebaum, Shiva Kasiviswanathan, and Aaditya Ramdas. Sequential kernelized independence testing. In International Conference on Machine Learning , 2023.
  • [37] Aaditya Ramdas, Peter Grünwald, Volodya Vovk, and Glenn Shafer. Game-theoretic statistics and safe anytime-valid inference. Statistical Science, 2023. To appear.
  • [38] Aaditya Ramdas, Johannes Ruf, Martin Larsson, and Wouter Koolen. Admissible anytime-valid sequential inference must rely on nonnegative martingales. arXiv:2009.03167 , 2020.
  • [39] Richard Royall. Statistical evidence: a likelihood paradigm . Chapman and Hall, 1997.
  • [40] Samuel Pawel, Alexander Ly, and Eric-Jan Wagenmakers. Evidential calibration of confidence intervals. The American Statistician, 78(1):47–57, 2024.
  • [41] T. Schweder and N. Hjort. Confidence, Likelihood, Probability: Statistical Inference with Confidence Distributions . Cambridge University Press, 2016.
  • [42] G. Shafer. Testing by betting: A strategy for statistical and scientific communication. Journal of the Royal Statistical Society, Series A , 2021. With Discussion.
  • [43] G. Shafer and V. Vovk. Game-Theoretic Probability: Theory and Applications to Prediction, Science and Finance . Wiley, 2019.
  • [44] Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, and Vladimir Vovk. Test martingales, Bayes factors and p-values. Statistical Science , pages 84–101, 2011.
  • [45] J. ter Schure and P. Grünwald. ALL-IN meta-analysis: breathing life into living systematic reviews. F1000Research , 11(549), 2022.
  • [46] J. ter Schure, M.F. Perez-Ortiz, A. Ly, and P. Grünwald. Safe logrank test: Error control under continuous monitoring with unlimited horizon. arXiv:1906.07801 , 2021.
  • [47] Judith Ter Schure, Alexander Ly, Lisa Belin, Christine S Benn, Marc JM Bonten, Jeffrey D Cirillo, Johanna AA Damen, Inês Fronteira, Kelly D Hendriks, Anna Paula Junqueira-Kipnis, André Kipnis, Odile Launay, Jose Euberto Mendez-Reyes, Mihai G Netea, Sebastian Nielsen, Caryn M Upton, Gerben van den Hoogen, Jesper M Weehuizen, Peter D Grünwald, and CH (Henri) van Werkhoven. Bacillus Calmette-Guérin vaccine to reduce COVID-19 infections and hospitalisations in healthcare workers. medRxiv , pages 2022–12, 2022.
  • [48] Timmy Tse and Anthony C Davison. A note on universal inference. Stat , 11(1):e501, 2022.
  • [49] Rosanne Turner and Peter Grünwald. Anytime-valid confidence intervals for contingency tables and beyond. Statistics and Probability Letters , 2023.
  • [50] Rosanne Turner and Peter Grünwald. Safe sequential testing and effect estimation in stratified count data. In Annual AI and Statistics Conference , 2023.
  • [51] Rosanne Turner, Alexander Ly, Muriel-Felipe Ortiz-Perez, Judith ter Schure, and Peter Grünwald. R-package safestats , 2022. CRAN.
  • [52] Vladimir Vovk and Ruodu Wang. E-values: Calibration, combination, and applications. Annals of Statistics , 2021.
  • [53] Eric-Jan Wagenmakers, Quentin F Gronau, Fabian Dablander, and Alexander Etz. The support interval. Erkenntnis , pages 1–13, 2020.
  • [54] Abraham Wald. Contributions to the theory of statistical estimation and testing hypotheses. Annals of Mathematical Statistics , 10:299–326, 1939.
  • [55] Ruodu Wang and Aaditya Ramdas. False discovery rate control with e-values. Journal of the Royal Statistical Society: Series B , 2022.
  • [56] Larry Wasserman, Aaditya Ramdas, and Sivaraman Balakrishnan. Universal inference. Proceedings of the National Academy of Sciences , 117(29):16880–16890, 2020.
  • [57] Ian Waudby-Smith and Aaditya Ramdas. Estimating means of bounded random variables by betting. Journal of the Royal Statistical Society: Series B , 2024. With discussion.
  • [58] Ziyu Xu, Ruodu Wang, and Aaditya Ramdas. Post-selection inference for e-value based confidence intervals. arXiv:2203.12572 , 2022.
  • [59] C. Yanofsky. It’s chancy, 2019. Blog Post February 5th, 2019.
  • [60] Zhenyuan Zhang, Aaditya Ramdas, and Ruodu Wang. When do exact and powerful p-values and e-values exist? arXiv:2305.16539 , 2023.

Supporting Information Appendix

Appendix A Supporting Information for Section 1 and Section 2.1

Unbounded expected loss based on (4) and an 'improvement' of (4)

$\textsc{p}(y)=2^{-c+1}$, which happens with probability exactly $2^{-c}$, so with probability $\textsc{q}(y)$). Yet still, using (30) leads to unbounded expected loss: in the above sample the expected loss is now $k$ rather than $2k$, still growing linearly in $k$.

Standard conversions of p-values into e-values are sub-optimal

$S^{+}_{\theta}$ for the normal location family as defined underneath (17) with $\alpha^{*}=\alpha$ and $n^{*}=n$ specified correctly, then to get power $0.8$ we need a factor of $\approx 1.75$ more data points than if we use the standard UMP NP test (this follows from the derivation in [13, Appendix B.6]; as explained there, the factor can be significantly reduced by optional stopping). If, instead, we use the p-value corresponding to this UMP test directly and turn it into an e-value by the above calibrator, we need a factor of $\approx 3.0$ more data [13, Section 7]. The reason for this discrepancy is that calibrators work for arbitrary p-values and are thus 'blind' to the underlying sampling model (in this case, normal location). To get high power it is invariably (much) better to use e-values designed directly for the underlying model. Importantly, then, the distinction between e and p is not just a matter of scale, and designing 'good' e-values (with good GRO properties) is a nontrivial task: we cannot simply take any given p-value with good power properties and calibrate it.
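As an aside, the validity of a generic calibrator is easy to check numerically. The sketch below is an illustration only (not the specific calibrator used in [13]): it uses the standard calibrator family $f_{\kappa}(p)=\kappa p^{\kappa-1}$, $\kappa\in(0,1)$, which maps any exact p-value to an e-value since $\int_0^1 \kappa p^{\kappa-1}\,\mathrm{d}p = 1$, regardless of the sampling model.

```python
import numpy as np

rng = np.random.default_rng(0)

def calibrate(p, kappa=0.5):
    # Standard calibrator family f_kappa(p) = kappa * p^(kappa - 1);
    # for any kappa in (0, 1) this turns a p-value into an e-value.
    return kappa * p ** (kappa - 1)

# Under the null hypothesis, an exact p-value is uniform on [0, 1].
p_null = rng.uniform(size=1_000_000)
e_null = calibrate(p_null)

# The null expectation of the calibrated statistic is 1, confirming it
# is a valid e-value -- but the calibrator never sees the model, which
# is why it loses power compared to a model-tailored e-value.
print(np.mean(e_null))  # close to 1 (convergence is slow: the variance is infinite for kappa <= 1/2)
```

The Monte Carlo check only illustrates validity; the power loss quantified in [13] cannot be seen from the null distribution alone.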

Why normalizing $\ell$ to 1 in (12) and (13) is not harmful

This is further discussed, in a wider context, in the Supporting Information for Section  5 further below.

Appendix B Supporting Information for Section 2.2 up to and including Section 3.1

In this section we state and prove Theorem 3, a general result which has Theorem 1 of Section 2 as a special case, extending it to general GNP decision problems (e.g. confidence intervals) as defined in Definition 2, Section 3. In the first subsection below, however, we first state and prove a simpler form of the theorem which works for simple GNP decision problems, as defined in Section 2, simplified even further by requiring countable $\mathcal{Y}$. This allows us to strip away all issues about 'almost surely', measurability, etc. and focus on the key idea, which lies in the fundamental Lemma 1. The proof of the general Theorem 3 is based on essentially the same key insight but requires substantial additional notation and quantifications. We prepare these in Section B.2 below, where we also explain why the probabilities in (14) and (15) (used to define Type-II strictly-better-than and admissibility) as well as the expectation (12) may be undefined in pathological cases, and we extend the definitions of Type-II strictly better and admissibility to ensure that they are always well-defined. We then state and prove Lemma 2, the general version of Lemma 1, in Section B.3, and state and prove the general theorem in Section B.4. Before we do all this, though, we explain, as promised underneath Theorem 1 in the main text, how we can make any GNP testing problem rich by adding a single additional loss function, and why this makes the condition of richness a reasonable one. This is illustrated by Example 6, which indicates why a condition like richness is necessary and thereby gives a (very) high-level intuition for the proof.

Enforcing richness relative to $S$: why richness is a weak condition, and why it is needed

For any sharp e-variable $S$ on which we might want to base our decisions, we can trivially enlarge any given GNP testing problem $(\mathcal{B}^{\prime},\{(\mathcal{A}_{b},L_{b}(\underline{0},\cdot):\mathcal{A}_{b}\rightarrow{\mathbb{R}}^{+}_{0}):b\in\mathcal{B}^{\prime}\})$ by setting $\mathcal{B}:=\mathcal{B}^{\prime}\cup\{\textsc{id}(S)\}$, i.e. adding a loss function indexed by $\textsc{id}(S)$, with action space $\mathcal{A}_{\textsc{id}(S)}$ set equal to the co-domain of $S$, and, for $s\in\mathcal{A}_{\textsc{id}(S)}$, setting $L_{\textsc{id}(S)}(\underline{0},s):=s$. The extended GNP decision problem will then automatically be rich relative to $S$, and Part 2 of Theorem 1 can be applied.
In reality, DM is usually not aware of the full details of the problem anyway, being presented with only one particular loss function $L_{b}$, an element of a set $\{L_{b}:b\in\mathcal{B}^{\prime}\}$ that is itself unknown: DM will only know the definition of the particular function $L_{b}$ that she is presented with. Thus, assuming that the set already contains this additional, special loss $L_{\textsc{id}(S)}$ does not really impose any additional condition on the DM and only serves to make the analysis more robust; it therefore seems a reasonable assumption. It gives an imagined adversary, who chooses $b=B(Y)$ as a function of $Y$, more power, and as illustrated by the example below, without something like this added power the theorem simply cannot hold. As such it is analogous to (but not the same as) allowing an adversary to randomize between actions, as required for the minimax theorem in game theory. To take the analogy to the minimax theorem even further, we note that, just as in that theorem, one side of the proof (Part 1) is in essence trivial, whereas the other side (Part 2) requires a sophisticated argument.

Consider a GNP testing problem and e-variable $S$ defined as follows:

$\mathcal{Y}=\{0,10,20\}$; $\mathcal{H}(\underline{0})=\{P_{0}\}$ with $P_{0}(Y=10)=1/20$ and $P_{0}(Y=20)=1/40$.

$\mathcal{B}=\{b_{1}\}$, $\mathcal{A}_{b_{1}}=\{0,9,19,21\}$, and for all $a\in\mathcal{A}_{b_{1}}$ we set $L_{b_{1}}(\underline{0},a)=a$.

$S(y):=y$.
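As a quick sanity check on this example (a sketch using exact rational arithmetic, not part of the formal argument), one can verify that the $P_{0}$-probabilities sum to one and that $\mathbf{E}_{P_{0}}[S]=1$:

```python
from fractions import Fraction

# P0 as specified in the example: P0(Y=10)=1/20, P0(Y=20)=1/40,
# hence P0(Y=0) = 1 - 1/20 - 1/40 = 37/40.
p0 = {0: Fraction(37, 40), 10: Fraction(1, 20), 20: Fraction(1, 40)}
assert sum(p0.values()) == 1

# S(y) = y; sharpness of S requires E_{P0}[S] to equal exactly 1.
expected_S = sum(prob * y for y, prob in p0.items())
print(expected_S)  # 1
```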

We note that $S$ is a sharp e-variable, but the GNP testing problem is not rich relative to $S$. And indeed, the conclusion of Part 2 of Theorem 1 is violated here:

is seen to be a decision rule that is maximally compatible with $S$, but it is not admissible: the decision rule

${\bf E}_{P_{0}}[L_{b_{1}}(\underline{0},\delta^{\prime}_{b_{1}}(Y))]=9/20+21/40=39/40<1$. So we have a decision rule that is maximally compatible relative to a sharp e-variable yet not admissible; this shows that some additional condition such as richness is necessary. To get an initial idea of why 'richness' does the trick, let us enlarge $\mathcal{B}$ as above to make the resulting GNP testing problem rich relative to $S$. That is, we add the loss function indexed by $b_{2}:=\textsc{id}(S)$, so that $\mathcal{A}_{b_{2}}=\{0,10,20\}$, and for all $a\in\mathcal{A}_{b_{2}}$ we have $L_{b_{2}}(\underline{0},a)=a$.
Decision rule $\delta$ above was maximally compatible in the original problem, and the only way to extend it to the enlarged problem while keeping it maximally compatible is to set $\delta_{b_{2}}(y)=y$ for all $y\in\mathcal{Y}$. But now, in this enlarged problem, $\delta$ has become admissible! Rather than proving this in full generality, we will just show that the decision rule $\delta^{\prime}$ above, which witnessed $\delta$'s inadmissibility in the original problem, no longer witnesses it in the enlarged problem. To see why, note that to witness inadmissibility of $\delta$, the rule $\delta^{\prime}$ must be Type-II strictly better than $\delta$ and at the same time Type-I risk safe. The only way to extend $\delta^{\prime}$ to the enlarged problem while keeping it strictly better than $\delta$ is to set it such that $\delta^{\prime}_{b_{2}}(y)\geq\delta_{b_{2}}(y)$ for all $y\in\mathcal{Y}$.
But then it is no longer Type-I risk safe, so this extended $\delta^{\prime}$ does not show $\delta$ to be inadmissible! To see why the extended $\delta^{\prime}$ is not Type-I risk safe any more, note that, by adding loss $L_{b_{2}}$, we gave the imagined adversary more power: upon observing $y=10$, the adversary can now choose $B=b_{2}$, and upon observing $y=20$, she can choose $B=b_{1}$. Then ${\bf E}_{P_{0}}[L_{B}(\underline{0},\delta^{\prime}_{B}(Y))]\geq(1/20)\cdot 10+(1/40)\cdot 21=41/40>1$, so $\delta^{\prime}$ is not Type-I risk safe. Lemma 1 and its generalization Lemma 2 further below formalize this idea and are the key to proving Part 2 of Theorem 1 in the main text and its generalization, Theorem 3, below.
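Both expected-loss computations in this example can be verified mechanically; the sketch below uses exact rational arithmetic (note that the adversary's lower bound works out to $41/40$):

```python
from fractions import Fraction

# P0 from the example; P0(Y=0) = 37/40 by complementation.
p0 = {0: Fraction(37, 40), 10: Fraction(1, 20), 20: Fraction(1, 40)}

# delta' in the original problem maps observations to actions in
# A_{b1} = {0, 9, 19, 21}: on y=10 it plays 9, on y=20 it plays 21.
delta_prime_b1 = {0: 0, 10: 9, 20: 21}
risk_b1 = sum(p0[y] * delta_prime_b1[y] for y in p0)
assert risk_b1 == Fraction(39, 40)  # < 1: Type-I risk safe in the original problem

# In the enlarged problem the adversary chooses B(10) = b2 (loss >= 10)
# and B(20) = b1 (loss 21), giving expected loss at least:
adversary_risk = p0[10] * 10 + p0[20] * 21
assert adversary_risk == Fraction(41, 40)  # > 1: no longer Type-I risk safe
```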

B.1 Theorem 1 for countable $\mathcal{Y}$ and $\mathcal{H}(\underline{0})$ with full support

Throughout this subsection we assume that we deal with a GNP testing problem that is simple in the sense of Section 2, with countable $\mathcal{Y}$ and $\mathcal{H}(\underline{0})$ with full support. 'Full support' simply means that for all $y\in\mathcal{Y}$ and all $P\in\mathcal{H}(\underline{0})$, we have $P(Y=y)>0$. Because the testing problem is simple, we can define maximum compatibility in terms of (16).

So, fix any simple GNP testing problem of this type. For any given random variables $U=u(Y)$, $V=v(Y)$, we write $U\leq V$ as an abbreviation of: for all $y\in\mathcal{Y}$, $u(y)\leq v(y)$; similarly for $U<V$ and $U=V$.

Key Concept and Lemma: Equalizing Maximal Compatibility

For any function $B:\mathcal{Y}\rightarrow\mathcal{B}$, we say that decision rule $\delta^{\circ}$ is equalizing-maximally compatible relative to e-variable $S$ when restricted to $B$ if

The following lemma is the key insight needed to prove Theorem 2 below, and hence the special case of Theorem 1 in the main text with a simple GNP testing problem and countable $\mathcal{Y}$. It will later be generalized to Lemma 2, which plays the same key role for the general result, Theorem 3.

Fix any simple GNP testing problem as above. Suppose that $S$ is a sharp e-variable and $\delta^{\circ}$ is a Type-I risk safe decision rule such that there exists a function $B:\mathcal{Y}\rightarrow\mathcal{B}$ for which $\delta^{\circ}$ is equalizing-maximally compatible relative to $S$ when restricted to $B$, as above. Then $\delta^{\circ}$ is fully compatible with $S$, i.e. for all $b\in\mathcal{B}$ we have: $L_{b}(\underline{0},\delta^{\circ}_{b})\leq S$.

To understand the lemma, suppose we are given some e-variable $S$ and some Type-I risk safe $\delta$. Then $\delta$ need not be compatible with $S$ (it must be compatible with some $S^{\prime}$, but not necessarily with this $S$). The lemma says that if $\delta$ is nevertheless in some specific sense 'partially' compatible with $S$, namely for a specific $B$, and $S$ is sharp, then it must be fully compatible with $S$ after all. Now, in the case where the GNP testing problem has been made rich relative to $S$ by adding the special loss function $\textsc{id}(S)$ above, we would typically apply this lemma with $B=\textsc{id}(S)$, i.e. $B$ a constant, independent of $Y$; but the lemma works even if $B$ varies with $Y$. The surprising thing here is that compatibility relative to $B$ (which may even be the constant $\textsc{id}(S)$) has repercussions for the behavior of $\delta^{\circ}_{b^{\prime}}$ for all $b^{\prime}\in\mathcal{B}$.

The lemma immediately leads to the following corollary:

Corollary 1

Fix any simple GNP testing problem as above. If a decision rule $\delta^{*}$ is maximally compatible relative to a sharp e-variable $S$ (i.e. (16) holds) and equalizing-maximally compatible relative to the same $S$ when restricted to some function $B:\mathcal{Y}\rightarrow\mathcal{B}$, then any $\delta^{\circ}$ that is equalizing-maximally compatible relative to $S$ when restricted to $B$ and Type-I risk safe must also be fully compatible with $S$ and hence, since $\delta^{*}$ is maximally compatible, satisfy $\delta^{\circ}_{b}\leq\delta^{*}_{b}$ for all $b\in\mathcal{B}$.

Proof: [of Lemma 1] By Proposition 1, there must be some e-variable $S^{\prime}$ such that $\delta^{\circ}$ is compatible with $S^{\prime}$, i.e.

By equalizing-maximal compatibility relative to S 𝑆 S italic_S when restricted to B 𝐵 B italic_B , transitivity, and weakening ( 32 ), we must therefore also have

so that $S\leq S^{\prime}$. Suppose, by way of contradiction, that, even stronger, there is a $y\in\mathcal{Y}$ such that $S(y)<S^{\prime}(y)$. We know that for some $P_{0}\in\mathcal{H}(\underline{0})$, ${\bf E}_{P_{0}}[S]=1$. But then, since $P_{0}$ has full support, ${\bf E}_{P_{0}}[S^{\prime}]>1$, so $S^{\prime}$ is not an e-variable and we have arrived at a contradiction. Since we already established $S\leq S^{\prime}$, it follows that $S=S^{\prime}$. But then using the inequality in (32) shows that for all $b\in\mathcal{B}$ we have $L_{b}(\underline{0},\delta_{b}^{\circ})\leq S$, and the lemma is proved. $\Box$
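Spelled out, the contradiction step runs as follows: if $S\leq S^{\prime}$ and $S(y^{*})<S^{\prime}(y^{*})$ for some $y^{*}\in\mathcal{Y}$, then

\[
{\bf E}_{P_{0}}[S^{\prime}]
= \sum_{y\in\mathcal{Y}} P_{0}(y)\,S^{\prime}(y)
\geq \sum_{y\in\mathcal{Y}} P_{0}(y)\,S(y) + P_{0}(y^{*})\bigl(S^{\prime}(y^{*})-S(y^{*})\bigr)
= 1 + P_{0}(y^{*})\bigl(S^{\prime}(y^{*})-S(y^{*})\bigr) > 1,
\]

where the final strict inequality uses the full-support assumption $P_{0}(y^{*})>0$.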

Armed with this result, we can now state and prove a restricted version of Theorem  1 .

Consider any simple GNP testing problem as above, with countable $\mathcal{Y}$ and all $P\in\mathcal{H}(\underline{0})$ having full support on $\mathcal{Y}$.

Suppose that decision rule $\delta$ is admissible. Then there exists an e-variable $S$ such that $\delta$ is maximally compatible with $S$.

Suppose that $S$ is a sharp e-variable and that $\delta^{*}$ is maximally compatible relative to $S$ (such a $\delta^{*}$ exists because we assume the GNP testing problem is simple); assume further that the GNP testing problem is rich relative to $S$. Then $\delta^{*}$ is admissible.

Proof :  [of Theorem  2 ]

Part 1. Suppose that $\delta$ is admissible. Then $\delta$ is by definition Type-I risk safe. By Proposition 1 there must be an e-variable $S$ such that $\delta$ is compatible with $S$. Since $\delta$ is admissible, every $\delta^{\prime}$ that is strictly better is not Type-I risk safe, hence cannot be compatible with any e-variable; in particular, $\delta^{\prime}$ is not compatible with $S$. Hence there exists no $\delta^{\prime}$ that is strictly better than $\delta$ and compatible with $S$. It follows that $\delta$ is maximally compatible with $S$.

Part 2. Let $\delta^{*}$ be maximally compatible and let $\delta^{\circ}$ be another Type-I risk safe decision rule. We will show that $\delta^{\circ}$ cannot be Type-II strictly better than $\delta^{*}$; this implies the result.

By our $S$-relative richness assumption and the construction of $\delta^{*}$, we know that there exists a function $B:\mathcal{Y}\rightarrow\mathcal{B}$ with:

We may assume that $\delta^{\circ}$ satisfies $\delta_{B}^{*}\leq\delta_{B}^{\circ}$, for otherwise we already know that $\delta^{\circ}$ is not Type-II strictly better. Now suppose, by way of contradiction, that for some $y\in\mathcal{Y}$ we have $\delta_{B(y)}^{*}(y)<\delta_{B(y)}^{\circ}(y)$. We then have, for the $P_{0}\in\mathcal{H}(\underline{0})$ with ${\bf E}_{P_{0}}[S]=1$ (which exists by sharpness), that

contradicting our assumption that $\delta^{\circ}$ is Type-I risk safe. We may thus assume $\delta_{B}^{\circ}=\delta_{B}^{*}$.

The corollary of Lemma 1 above now implies that $\delta_{b}^{\circ}\leq\delta^{*}_{b}$ for all $b\in\mathcal{B}$ (i.e. not just for $B$), hence $\delta^{\circ}$ is not Type-II strictly better than $\delta^{*}$; the theorem is proved. $\Box$

B.2 Preparing the General Proofs of Theorems 1 and 3

Almost sure inequality.

Fix any GNP decision problem with parameter set $\Theta$, as in the general Definition 2. In particular, in some applications we may have $\Theta=\{\underline{0}\}$; we then really deal with a GNP testing problem, and the notation $\leq_{\theta}$ that we will now define can in such cases be replaced by $\leq_{\underline{0}}$.

For any two functions $U,V:\mathcal{Y}\rightarrow\mathbb{R}^{+}_{0}$ we define

$U(Y) \leq_{\theta} V(Y)$   (34)

to mean that for every $P\in\mathcal{H}(\theta)$, every $\epsilon>0$ and every measurable set $\mathcal{E}\subset\mathcal{Y}$ such that for all $y\in\mathcal{E}$, $U(Y)>V(Y)+\epsilon$, we have $P(\mathcal{E})=0$. Similarly,

$U(Y) <_{\theta} V(Y)$   (35)

is defined to mean that $U(Y)\leq_{\theta}V(Y)$ and there exist $P\in\mathcal{H}(\theta)$, $\epsilon>0$ and a measurable set $\mathcal{E}\subset\mathcal{Y}$ such that for all $y\in\mathcal{E}$, $U(Y)\leq V(Y)-\epsilon$, and we have $P(\mathcal{E})>0$. Note that statements (34) and (35) are well-defined even if $U$ or $V$ are not measurable, so that $U(Y)$ or $V(Y)$ are not random variables. Nevertheless, we shall abuse notation by abbreviating $U(Y)$ to $U$ and $V(Y)$ to $V$, just as we do for random variables. Note that, if the events inside the probabilities below are measurable after all, then we have

We also write $U>_{\theta}V$ if it is not the case that $U\leq_{\theta}V$; we write $U\geq_{\theta}V$ if it is not the case that $U<_{\theta}V$; and $U=_{\theta}V$ if $U\leq_{\theta}V$ and $U\geq_{\theta}V$. If $V$ and $U$ are measurable, then clearly the corresponding analogues to (36) hold as well, e.g.

It is easily checked that $=_{\theta}$ establishes an equivalence relation on functions of $Y$, and, relative to this relation, $\leq_{\theta}$ is a partial order and $<_{\theta}$ is the corresponding strict order, i.e. $U<_{\theta}V$ iff $U\leq_{\theta}V$ and not $U=_{\theta}V$. We shall freely use standard properties of this partial order (such as transitivity) below.
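When $\mathcal{Y}$ is countable, measurability is automatic and the $\epsilon$ in the definitions above can be dispensed with (intersect over $\epsilon=1/n$). The following toy sketch, with hypothetical data not taken from the paper, illustrates how $\leq_{\theta}$ and $<_{\theta}$ then reduce to elementary checks over the null hypothesis:

```python
# Toy illustration of <=_theta and <_theta on a FINITE sample space, where
# every event is measurable. H_theta is a list of pmfs (dicts y -> prob);
# U, V are functions of y given as dicts. Hypothetical example data.

def leq_theta(U, V, H_theta):
    # U <=_theta V: under every P in H(theta), the event {U > V} has probability 0.
    # (On a finite space the epsilon of the general definition is not needed.)
    return all(sum(p for y, p in P.items() if U[y] > V[y]) == 0 for P in H_theta)

def lt_theta(U, V, H_theta):
    # U <_theta V: U <=_theta V, and some P in H(theta) gives {U < V} positive mass.
    return leq_theta(U, V, H_theta) and any(
        sum(p for y, p in P.items() if U[y] < V[y]) > 0 for P in H_theta)

P1 = {0: 0.5, 1: 0.5, 2: 0.0}   # a null distribution putting no mass on y = 2
H0 = [P1]
U = {0: 1.0, 1: 1.0, 2: 5.0}
V = {0: 1.0, 1: 2.0, 2: 0.0}    # U exceeds V only on the P1-null set {2}
print(leq_theta(U, V, H0))      # U > V only where the null probability is 0
print(lt_theta(U, V, H0))       # U < V on {1}, which has positive mass
```

Note how $U\leq_{\theta}V$ holds even though $U(2)>V(2)$ pointwise: the orderings only see what happens up to null sets of $\mathcal{H}(\theta)$.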

Moreover, for each function $B:\mathcal{Y}\rightarrow\mathcal{B}$ we introduce the following additional notation, where, in line with the above, we abbreviate $B(Y)$ to $B$ and $\delta_{B(Y)}(Y)$ to $\delta_{B}$:

thus avoiding the cumbersome expression on the right whenever we can. Analogously,

and correspondingly with $\geq_{\textsc{L}}$ and $>_{\textsc{L}}$. Finally, $\delta_{B}^{\circ}=_{\textsc{L}}\delta_{B}$ is defined to be equivalent to '$\delta_{B}^{\circ}\geq_{\textsc{L}}\delta_{B}$ and not $\delta_{B}^{\circ}>_{\textsc{L}}\delta_{B}$'; similarly for '$\delta^{\circ}=_{\textsc{L}}\delta$'.

Generalized Admissibility, Maximal Compatibility

We can now generalize the main text's definitions of Type-II strictly-better-than and admissibility to general GNP decision problems: formally, we say that decision rule $\delta$ is Type-II strictly better than $\delta^{\circ}$ simply if we have

As before, a decision rule $\delta^{\circ}$ is admissible if it is Type-I risk safe (according to the definition underneath (19) in the main text) and there is no other Type-I risk safe decision rule that is Type-II strictly better than $\delta^{\circ}$.

We also extend the definition of a maximally compatible decision rule in the same way: formally, a maximally compatible decision rule relative to a given GNP decision problem and e-variable $S$ is any compatible decision rule $\delta^{\circ}$ such that no other decision rule $\delta$ is both compatible with $S$ and Type-II strictly better than $\delta^{\circ}$, with the extended definition of Type-II strictly better given above.

We see that in the case of a GNP testing problem, whenever the events $L_{b}(\underline{0},\delta^{\circ}_{b})>L_{b}(\underline{0},\delta_{b})$ are measurable for all $b\in\mathcal{B}$ (in particular, whenever $\mathcal{Y}$ is countable), the probabilities in (14) and (15) are well-defined, and the definition of strictly-better-than then coincides with the one given in the main text. But it remains valid in case we pick pathological $\mathcal{B}$, $\{L_{b}:b\in\mathcal{B}\}$ for which the events above are nonmeasurable; this cannot be ruled out, since we made no restrictions on the functions $L_{b}$, $\delta^{\circ}$ and $\delta$. Similarly, by replacing the definition (12) in the main text by (13), we also ensure that Type-I risk safety is well-defined irrespective of whether $\sup_{b\in\mathcal{B}}L_{b}(\underline{0},\delta_{b}(Y))$ is measurable (we could also have avoided such measurability issues using inner and outer measure [5, Section 1.3], but as this does not simplify the treatment, we decided against it).

In the same way, for a general GNP decision problem, the definition of strictly-better-than given here generalizes the one given in the main text below (20) to the case where the events involved may be nonmeasurable. As a consequence, the definitions of admissibility for GNP testing and decision problems, and the definition of maximal compatibility relative to $S$ for GNP testing problems given in the main text, are all generalized by the definitions given here, based on the generalized notion of strictly-better-than, and are valid irrespective of the measurability of the functions and events involved.

Crucially for the proof of Lemma 2 below, the ordering relation $\leq_{\underline{0}}$ is strong enough to imply inequality in expectation:

Proposition 2

Consider a GNP testing problem with null hypothesis $\mathcal{H}(\underline{0})$ such that all $P\in\mathcal{H}(\underline{0})$ are mutually absolutely continuous. Let $S=S(Y)$ and $S'=S'(Y)$ be nonnegative random variables such that for all $P\in\mathcal{H}(\underline{0})$, $\mathbf{E}_{P}[S]$ is finite. Suppose $S\leq_{\underline{0}}S'$. Then (a) for all $P\in\mathcal{H}(\underline{0})$ we have $\mathbf{E}_{P}[S]\leq\mathbf{E}_{P}[S']$. Further, suppose $S<_{\underline{0}}S'$. Then (b) for every $P\in\mathcal{H}(\underline{0})$ we have $\mathbf{E}_{P}[S]<\mathbf{E}_{P}[S']$.

Proof. For part (a): by definition of $\leq_{\underline{0}}$, for every $P\in\mathcal{H}(\underline{0})$, every $\epsilon>0$ and every measurable $\mathcal{E}$ such that for all $y\in\mathcal{E}$, $S(y)>S'(y)+\epsilon$, we have

and the result follows.

For part (b): by definition of $<_{\underline{0}}$, for every $P\in\mathcal{H}(\underline{0})$, every $\epsilon>0$ and every measurable $\mathcal{E}$ such that for all $y\in\mathcal{E}$, $S(y)>S'(y)+\epsilon$, we have $P(\bar{\mathcal{E}})=1$ (with $\bar{\cdot}$ denoting complement), whereas there exist $\delta>0$, $Q\in\mathcal{H}(\underline{0})$ and measurable $\mathcal{F}$ such that $S(y)<S'(y)-\delta$ on $\mathcal{F}$ and $Q(\mathcal{F})>0$. By mutual absolute continuity, we have $P(\mathcal{F})>0$ as well, and therefore:

Since this holds for fixed $\delta>0$ and for every $\epsilon>0$, the result follows. $\Box$
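Proposition 2 can also be illustrated numerically: on a finite space with mutually absolutely continuous null distributions (same support), a strict ordering $S<_{\underline{0}}S'$ forces a strict gap in expectation under every null distribution, not just one. A toy sketch with hypothetical data:

```python
# Numerical illustration of Proposition 2 on a finite space: here S < S' on a
# set of positive probability and S <= S' everywhere, and the two nulls share
# the same support (mutual absolute continuity), so E_P[S] < E_P[S'] for
# EVERY null P. Hypothetical toy data, not from the paper.

Y = [0, 1, 2]
H0 = [{0: 0.2, 1: 0.3, 2: 0.5},   # two mutually absolutely continuous
      {0: 0.4, 1: 0.4, 2: 0.2}]   # null pmfs with full support on Y

S  = {0: 0.5, 1: 1.0, 2: 1.2}
S2 = {0: 0.9, 1: 1.0, 2: 1.2}     # S': strictly larger than S on {0} only

def expect(f, P):
    # expectation of f(Y) under pmf P
    return sum(P[y] * f[y] for y in Y)

for P in H0:
    print(expect(S, P) < expect(S2, P))   # strict inequality for every null P
```

Dropping mutual absolute continuity breaks part (b): if some null $P$ put probability $0$ on the set where $S<S'$, that $P$ would give $\mathbf{E}_{P}[S]=\mathbf{E}_{P}[S']$.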

B.3 General Form of Equalizing Maximal Compatibility Lemma

Consider any GNP testing problem and fix any function $B:\mathcal{Y}\rightarrow\mathcal{B}$. Generalizing (31), we say that decision rule $\delta^{\circ}$ is a.s. equalizing-maximally compatible relative to e-variable $S$ when restricted to $B$ if

where, here and below, 'a.s.' stands for 'almost surely'. The following lemma, generalizing Lemma 1, is the key insight needed to prove Theorem 3 below, and hence its simplification, Theorem 1 in the main text.

Fix any GNP testing problem. Suppose that $S$ is a sharp e-variable and $\delta^{\circ}$ is a Type-I risk safe decision rule such that there exists a function $B:\mathcal{Y}\rightarrow\mathcal{B}$ for which $\delta^{\circ}$ is a.s. equalizing-maximally compatible relative to $S$ when restricted to $B$, as in (38). Then $\delta^{\circ}$ is a.s. fully compatible with $S$, i.e. for all $b\in\mathcal{B}$, $L_{b}(\underline{0},\delta^{\circ}_{b})\leq_{\underline{0}}S$.

Just as in the simplified case with countable $\mathcal{Y}$, this immediately leads to a relevant corollary:

Corollary 2

Fix any GNP testing problem. If a decision rule $\delta^{*}$ is maximally compatible relative to a sharp e-variable $S$ and a.s. equalizing-maximally compatible when restricted to some function $B:\mathcal{Y}\rightarrow\mathcal{B}$, then any $\delta^{\circ}$ that is a.s. equalizing-maximally compatible relative to $S$ when restricted to $B$ and Type-I risk safe must also be a.s. fully compatible with $S$, i.e. for all $b\in\mathcal{B}$, $L_{b}(\underline{0},\delta^{\circ}_{b})\leq_{\underline{0}}L_{b}(\underline{0},\delta_{b}^{*})$.

By a.s. equalizing-maximal compatibility relative to $S$ when restricted to $B$, transitivity, and weakening (39), we must therefore also have

so that $S\leq_{\underline{0}}S'$. Suppose, by means of contradiction, that, even more strongly, $S<_{\underline{0}}S'$. We know that for some $P_{0}\in\mathcal{H}(\underline{0})$, $\mathbf{E}_{P_{0}}[S]=1$. But then Proposition 2 gives that $\mathbf{E}_{P_{0}}[S']>1$, so $S'$ is not an e-variable and we have arrived at a contradiction. Since we already established $S\leq_{\underline{0}}S'$, it follows that $S=_{\underline{0}}S'$. But then the inequality in (39) gives, for all $b\in\mathcal{B}$, $L_{b}(\underline{0},\delta^{\circ}_{b})\leq_{\underline{0}}S$, and the lemma is proved. $\Box$

B.4 Extension of Theorem 1 to General GNP Decision Problems

First, we extend the definitions of richness and sharpness from GNP testing problems to decision problems in the obvious manner, by inserting 'for all' quantifiers: we say that a GNP decision problem is rich relative to an e-collection $\{S_{\theta}:\theta\in\Theta\}$ if for all $\theta\in\Theta$, the corresponding $\theta$-testing problem (as defined in the main text underneath Definition 2) is rich relative to $S_{\theta}$. Relative to a given GNP decision problem, we say that the e-collection $\{S_{\theta}:\theta\in\Theta\}$ is sharp if for all $\theta\in\Theta$, $S_{\theta}$ is sharp relative to the corresponding $\theta$-testing problem.
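On a countable sample space these 'for all' quantifiers over $\theta$ are straightforward to check directly. The following toy sketch, with hypothetical names and data not taken from the paper, illustrates verifying that a collection is an e-collection and that it is sharp:

```python
# Toy sketch: check that each S_theta is an e-variable (E_P[S_theta] <= 1
# for every null P in H(theta)) and that the e-collection is sharp (for
# every theta, SOME null P attains E_P[S_theta] = 1 exactly).
# Hypothetical example data on Y = {0, 1}, not from the paper.

H = {  # theta -> list of null pmfs on Y
    "t1": [{0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}],
}
S = {"t1": {0: 1.5, 1: 0.5}}   # candidate e-variable for theta = "t1"

def expect(f, P):
    return sum(P[y] * f[y] for y in P)

def is_e_collection(S, H):
    return all(expect(S[t], P) <= 1 for t in H for P in H[t])

def is_sharp(S, H):
    return all(any(abs(expect(S[t], P) - 1) < 1e-12 for P in H[t]) for t in H)

print(is_e_collection(S, H), is_sharp(S, H))
```

Here the first null pmf attains $\mathbf{E}_{P}[S_{\theta}]=1$ exactly, which is what sharpness demands for each $\theta$.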

Consider any GNP decision problem.

Suppose that decision rule $\delta$ is admissible. Then there exists an e-collection $\textsc{S}=\{S_{\theta}:\theta\in\Theta\}$ such that $\delta$ is maximally compatible with $\textsc{S}$.

Suppose that $\delta^{*}$ is a maximally compatible decision rule relative to some e-collection $\textsc{S}=\{S_{\theta}:\theta\in\Theta\}$. If (a) all $P\in\mathcal{P}$ are mutually absolutely continuous, (b) $\textsc{S}$ is sharp relative to the given GNP decision problem, and (c) the GNP decision problem is rich relative to $\textsc{S}$, then $\delta^{*}$ is admissible.

Proof [of Theorem 3]:

Part 1. Suppose that $\delta$ is admissible. Then $\delta$ is by definition Type-I risk safe. But then, by the definition in the main text underneath (19), there must be an e-collection $\textsc{S}=\{S_{\theta}:\theta\in\Theta\}$ such that $\delta$ is compatible with $\textsc{S}$. Since $\delta$ is admissible, every $\delta'$ with $\delta'>_{\textsc{L}}\delta$ (i.e. every $\delta'$ that is strictly better than $\delta$) fails to be Type-I risk safe. But then such a $\delta'$ cannot be compatible with any e-collection; in particular, it cannot be compatible with $\textsc{S}$. Hence there exists no $\delta'$ that is strictly better than $\delta$ and also compatible with $\textsc{S}$; hence $\delta$ is maximally compatible.

Part 2. Let $\delta^{*}$ be maximally compatible relative to $\textsc{S}$ and let $\delta^{\circ}$ be another Type-I risk safe decision rule. We will show that $\delta^{\circ}$ cannot be Type-II strictly better than $\delta^{*}$; this implies the result.

By our relative richness assumption and the construction of $\delta^{*}$, we know that for all $\theta\in\Theta$ there exists a function $B:\mathcal{Y}\rightarrow\mathcal{B}$ with:

We may assume that $\delta^{\circ}$ satisfies $\delta_{B}^{\circ}\geq_{\textsc{L}}\delta_{B}^{*}$, since otherwise we already know that it is not Type-II strictly better. So, in particular, $L_{B}(\theta,\delta_{B}^{\circ})\geq_{\theta}L_{B}(\theta,\delta_{B}^{*})$. Now suppose, by means of contradiction, that $L_{B}(\theta,\delta_{B}^{\circ})>_{\theta}L_{B}(\theta,\delta_{B}^{*})$ for some $\theta\in\Theta$. Since $\delta^{\circ}$ is Type-I risk safe, it must by definition also be compatible with an e-collection $\textsc{S}'=\{S'_{\theta}:\theta\in\Theta\}$, so we then also have $S'_{\theta}>_{\theta}L_{B}(\theta,\delta_{B}^{*})$. By Proposition 2, using the assumption of mutual absolute continuity, we then have for the $P\in\mathcal{H}(\theta)$ with $\mathbf{E}_{P}[S_{\theta}]=1$ (which must exist by sharpness) that

so $S'_{\theta}$ is not an e-variable and hence $\textsc{S}'$ is not an e-collection, contradicting our assumptions (we note that all quantities inside the equation must be measurable, because $S_{\theta}$ and $S'_{\theta}$ are both e-variables, and hence measurable by definition). We may thus assume $L_{B}(\theta,\delta_{B}^{\circ})=_{\theta}L_{B}(\theta,\delta_{B}^{*})$.

The corollary of Lemma 2 above, applied to the corresponding $\theta$-GNP testing problem, now implies that for all $b\in\mathcal{B}$ (hence not just for $B$!) we have $L_{b}(\theta,\delta^{\circ}_{b})\leq_{\theta}L_{b}(\theta,\delta^{*}_{b})$. Since we can make this argument for all $\theta\in\Theta$, it follows that for all $b\in\mathcal{B}$, $\delta_{b}^{\circ}\leq_{\textsc{L}}\delta_{b}^{*}$. Therefore $\delta^{\circ}$ is not Type-II strictly better than $\delta^{*}$; the theorem is proved. $\Box$

Appendix C Supporting Information for Section 3.3.2

Proof of the claim underneath (26).

Fix some $\epsilon>0$. For simplicity we fix $\theta^{*}=0$ and $n=1$ (so that $Y=X_{1}=\hat{\theta}(X_{1})$); extension of the following argument to general sampling distributions $\theta^{*}$ and $n>1$ is straightforward (for $\theta^{*}\neq 0$, use $Y^{\prime}=Y-\theta^{*}$; for $n>1$, simply adjust the variance).

$1/\alpha = 1/(2F_{0}(-y+g_{0}(y)))$

If data are actually sampled from $\theta^{*}=0$, then the expected loss we actually make can be calculated in steps as follows:

$\exp(-yg_{0}(y)) = (y+\exp(2)-2)^{-2}$. In the table in the main text we took the former choice to see how a typical sample of the $B$'s, and corresponding $\alpha$'s and CIs, might look.
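Solving the displayed relation for $g_{0}$ gives $g_{0}(y) = 2\log(y+e^{2}-2)/y$, so the corresponding $\alpha = 2F_{0}(-y+g_{0}(y))$ can be computed directly. A minimal numerical sketch (assuming, for illustration only, that $F_{0}$ is the standard normal CDF; the variable names are ours):

```python
from math import exp, log
from statistics import NormalDist

F0 = NormalDist(0.0, 1.0).cdf  # assumption: standard normal null CDF


def g0(y: float) -> float:
    # From exp(-y * g0(y)) = (y + e^2 - 2)^(-2): g0(y) = 2*log(y + e^2 - 2)/y.
    return 2.0 * log(y + exp(2.0) - 2.0) / y


def alpha_of(y: float) -> float:
    # alpha = 2*F0(-y + g0(y)), i.e. B = 1/alpha as in the preceding display.
    return 2.0 * F0(-y + g0(y))


for y in (1.0, 2.0, 4.0):
    print(f"y = {y}: alpha = {alpha_of(y):.5f}")
```

Note that $g_{0}(2)=2$ exactly, so $\alpha=1$ at $y=2$, and $\alpha$ decreases below 1 for larger $y$.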

Proof that the average in (27) may converge to infinity with probability 1

$B_{(j)} := 1/(2F_{0}(-Y_{(j)}+\epsilon^{2}/Y_{(j)}))$ as above if $Y_{(j)}\geq\epsilon$, and $B_{(j)}=1$ otherwise. Suppose that the $Y_{(j)}$ are all independently sampled from the same $\theta^{*}=0$. Here is a sample (generated i.i.d. by R) of 20 corresponding $B_{(j)}$ (recall that for each $j$, the corresponding produced interval $\delta_{B_{(j)}}(Y_{(j)})$ is equal to the standard $(1-\alpha_{(j)})$-confidence interval, with $\alpha_{(j)} = 1/B_{(j)}$):

While the sequence looks rather innocuous, using (26), with $A$ in $\hat{\theta}\pm A$ chosen by (25), we see that the limit in (27) will go a.s. to $\infty$ rather than to $1$. The example was deliberately designed to give an extreme discrepancy; in more realistic examples the difference will presumably not be infinite, but without knowing the dependency between $Y$ and $B$ there is no way to assess it.
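The sample referred to above was generated in R; a Python reconstruction of the same recipe (not the original script; we assume $F_{0}$ is the standard normal CDF and pick $\epsilon=0.1$ for illustration) looks like:

```python
import random
from statistics import NormalDist

F0 = NormalDist(0.0, 1.0).cdf  # assumption: standard normal null CDF
eps = 0.1                      # assumption: an illustrative choice of epsilon
rng = random.Random(0)         # seeded so the sample is reproducible


def b_of(y: float) -> float:
    # B_(j) = 1/(2*F0(-Y_(j) + eps^2 / Y_(j))) if Y_(j) >= eps, else B_(j) = 1.
    if y >= eps:
        return 1.0 / (2.0 * F0(-y + eps ** 2 / y))
    return 1.0


ys = [rng.gauss(0.0, 1.0) for _ in range(20)]   # Y_(j) i.i.d. under theta* = 0
bs = [b_of(y) for y in ys]
alphas = [1.0 / b for b in bs]  # each delta_B(j)(Y_(j)) is the (1 - alpha_(j))-CI
print([round(b, 2) for b in bs])
```

Since $-Y_{(j)}+\epsilon^{2}/Y_{(j)}\leq 0$ whenever $Y_{(j)}\geq\epsilon$, every $B_{(j)}\geq 1$, so each $\alpha_{(j)}\in(0,1]$ is a legitimate confidence level.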

Proof for Example 4

$\theta^{-}<\theta<\theta^{+}$ be

(note that $\log(2/\alpha^{*})$ in (43) in the main text has been replaced by $\log(1/\alpha^{*})$ here).

$S^{+}_{\theta} := \frac{p_{\theta^{+}}(y)}{p_{\theta}(y)}$ (respectively, $S^{-}_{\theta} := \frac{p_{\theta^{-}}(y)}{p_{\theta}(y)}$). Straightforward rewriting now gives:

$S^{+}_{\theta}$ is strictly increasing in $\hat{\theta}$, so it is $\geq 1/\alpha$ iff $\hat{\theta}\geq\theta_{R}$, where $\theta_{R}$ is the solution to

Straightforward calculation shows that this is the case iff $\theta_{R}-\theta$ is equal to

An analogous calculation gives that $S^{-}_{\theta}$ is decreasing in $\hat{\theta}$ and $\geq 1/\alpha$ iff $\hat{\theta}\leq\theta_{L}$, with $\theta-\theta_{L}$ equal to (45).
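These monotonicity facts are easy to verify numerically; a sketch in a Gaussian location model (our illustrative instantiation, with $\hat{\theta}=y$ for $n=1$ and illustrative values of $\theta^{-},\theta,\theta^{+}$):

```python
from statistics import NormalDist

# Illustrative Gaussian location model p_theta = N(theta, 1), theta- < theta < theta+.
theta_minus, theta, theta_plus = -1.0, 0.0, 1.0


def s_plus(y: float) -> float:
    # S+_theta = p_{theta+}(y) / p_theta(y)
    return NormalDist(theta_plus, 1.0).pdf(y) / NormalDist(theta, 1.0).pdf(y)


def s_minus(y: float) -> float:
    # S-_theta = p_{theta-}(y) / p_theta(y)
    return NormalDist(theta_minus, 1.0).pdf(y) / NormalDist(theta, 1.0).pdf(y)


ys = [i / 10.0 for i in range(-30, 31)]
sp = [s_plus(y) for y in ys]
sm = [s_minus(y) for y in ys]
# S+ is strictly increasing and S- strictly decreasing in theta-hat (= y here).
assert all(a < b for a, b in zip(sp, sp[1:]))
assert all(a > b for a, b in zip(sm, sm[1:]))
print("S+ increasing, S- decreasing: checks pass")
```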

With $S_{\theta} = (1/2)S^{+}_{\theta} + (1/2)S^{-}_{\theta}$, a sufficient condition for $S_{\theta}\geq 1/\alpha$ is then that

$S^{+}_{\theta}$, $S^{-}_{\theta}$, $S_{\theta}$ and $c$ are now defined as in the main text, and a sufficient condition for $S_{\theta}\geq 1/\alpha$ is that

(with $S^{+}_{\theta}$ and $S^{-}_{\theta}$ interchanged), we find that (46) is quite tight in practice.

Appendix D Supporting Information for Section 5

On the Type-I risk upper bound $\ell$.

Here we discuss why normalizing $\ell$ to 1 in (12) and (13) is not harmful, and what we mean when we say (as we did in the discussion in Section 5) that '$\ell$ can be chosen differently from problem to problem, but it needs to be chosen independently of the data observed in that problem'.

Thus, your e-value based statistical hypothesis tests have a Neymanian inductive-behavior interpretation: as long as the bounds $\ell_{(j)}$ themselves do not depend on the data $Y_{(j)}$, then in the long run, among all tests in which the imposed bound was within $\delta$ of $\ell^{*}$, you will achieve an average loss that is also within $\delta$ of $\ell^{*}$. In particular, the normalization to $\ell_{(j)}=1$ in the definitions in Section 2 does not affect this guarantee.
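The guarantee above ultimately rests on Markov's inequality: any e-variable $S$ (so $E_{0}[S]\leq 1$ under the null) satisfies $P_{0}(S\geq 1/\alpha)\leq\alpha$. A minimal simulation with an illustrative likelihood-ratio e-variable of our own choosing (null $N(0,1)$ vs. alternative $N(1,1)$, not an example from the paper):

```python
import random
from math import exp

# Illustrative likelihood-ratio e-variable for null N(0,1) vs alternative N(1,1):
# S(y) = exp(y - 1/2), which satisfies E_0[S] = 1 under the null.
rng = random.Random(1)
alpha = 0.05
n_tests = 100_000

rejections = 0
for _ in range(n_tests):
    y = rng.gauss(0.0, 1.0)  # data generated under the null
    s = exp(y - 0.5)         # the e-value for this test
    if s >= 1.0 / alpha:     # reject when S >= 1/alpha
        rejections += 1

freq = rejections / n_tests
print(f"false-rejection frequency: {freq:.4f} (Markov bound: {alpha})")
assert freq <= alpha
```

The observed false-rejection frequency stays (far) below $\alpha$; the Markov bound holds regardless of how the bounds $\ell_{(j)}$ vary across problems, as long as they are data-independent.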

D.1 Evidential Interpretation of E-Values

Most practitioners still interpret p-values in a Fisherian way, as a notion of evidence against the null. Although this interpretation has always been highly controversial, it is to some extent, and with caveats (such as 'a single isolated small p-value does not give substantial evidence' [29], or 'only work with special, evidential p-values' [12]), adopted by highly accomplished statisticians, including the late Sir David Cox [7, 30]. Even Neyman [33] has written: 'my own preferred substitute for "do not reject $H$" is "no evidence against $H$ is found"'. In light of the results of this paper, one may ask whether e-values are perhaps more suitable than p-values as such a measure of evidence. Although a proper analysis of such a claim warrants (at the very least) a separate paper, we briefly make the case here. At first sight this question may seem orthogonal to the Neymanian 'inductive behavior' stance adopted in this paper: as has often been noted [2, 21, 4, 20], Fisher's and Neyman's interpretations of testing seem incompatible. Nevertheless (echoing a point made by error statisticians [29] and likelihoodists [39] alike), for any notion of 'evidence the data provide about a hypothesis $\mathcal{H}(\underline{0})$' to be meaningful at all, there have to be circumstances, perhaps idealized, in which additional knowledge k is available and, together with k, the evidence can be operationalized into reliable decisions (for, if there were no such circumstances, obtaining 'high evidence' for or against a claim could never have any empirical meaning whatsoever). For the likelihoodists' notion of evidence [39, 11], i.e. a likelihood ratio between simple $\mathcal{H}(\underline{0})$ and $\mathcal{H}(\underline{1})$, this additional knowledge k would be a trustworthy prior probability on $\{\mathcal{H}(\underline{0}),\mathcal{H}(\underline{1})\}$; once this is supplied, a DM can use Bayes' theorem to come up with a posterior, which can then lead to optimal decisions against arbitrary loss functions. For the notion of evidence against $\mathcal{H}(\underline{0})$ as a p-value, this k would comprise a guarantee that a specific, a priori fixed and known sampling plan would have been followed (otherwise the p-value would be undefined), an a priori specified $\alpha$, and knowledge that the decision would be of the simple form 'accept'/'reject'. This k, however, is additional knowledge of a very specific kind (essentially, what we called the BIND assumption). In other situations it is not at all clear how to operationalize evidence-by-p-value into decisions. Now, if we accept e-values as evidence against the null, the set of circumstances under which we can operationalize the evidence is much wider, as shown in this paper. Having thus direct empirical content in a wider variety of situations, $e$ would seem preferable over p (a).

Note that I am not saying that evidence should invariably be a 'stepping-stone' towards a decision (thanks to a referee for prompting this important clarification); evidence seems a more general notion than that. I am only saying that if there are broad sets of circumstances in which it is a stepping stone, this may be a good rather than a bad thing.

Add to this: (b) if $\mathcal{H}(\underline{0})$ and $\mathcal{H}(\underline{1})$ are simple, the e-value coincides with the likelihood ratio, i.e. the main competing notion of evidence; (c) if $\mathcal{H}(\underline{0})$ is simple yet $\mathcal{H}(\underline{1})$ is not, a special type of e-value coincides with a recently proposed Bayesian notion of evidence (the support interval [40, 53]); (d) unlike Bayesian methods, e-values can be constructed even if no clear alternative can be formulated and if the setting is highly nonparametric; and (e) in contrast to p-values, e-values remain meaningful if some details of the sampling plan are unknown or unknowable and if information from several interdependent studies is combined [13, 37]. In fact, this may be the most important observation: if a scientific study is performed and, because it seemed promising, a second study was performed, then we would like to report the evidence against the null provided by both studies taken together. While for e-values this is no problem (we can multiply the e-values of the individual studies), it is next to impossible to calculate a valid p-value for the two studies taken together; this is the main point of [13]. The fact that p-values cannot be calculated in such a standard scenario would seem to make them unsuitable as a notion of evidence. If we take (a)–(e) together, though, the case for e-values as evidence seems strong.
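The multiplication rule mentioned under (e) is easy to illustrate: if $S_{1}$ and $S_{2}$ are independent e-values, then $E_{0}[S_{1}S_{2}] = E_{0}[S_{1}]E_{0}[S_{2}] \leq 1$, so the product is again an e-value and Markov's inequality still controls false rejections. A sketch, using an illustrative likelihood-ratio e-variable $S(y)=\exp(y-1/2)$ for the null $N(0,1)$ (our choice, for illustration only):

```python
import random
from math import exp

# Two independent studies, each reporting the illustrative e-value
# S(y) = exp(y - 1/2) with y ~ N(0,1) under the null, so E_0[S] = 1 and,
# by independence, E_0[S1 * S2] = 1: the product is again an e-value.
rng = random.Random(2)
alpha = 0.05
n_pairs = 100_000

rejections = 0
for _ in range(n_pairs):
    s1 = exp(rng.gauss(0.0, 1.0) - 0.5)
    s2 = exp(rng.gauss(0.0, 1.0) - 0.5)
    if s1 * s2 >= 1.0 / alpha:  # Markov's inequality applies to the product
        rejections += 1

freq = rejections / n_pairs
print(f"combined false-rejection frequency: {freq:.4f} (bound: {alpha})")
assert freq <= alpha
```

No analogous generic rule exists for combining the two studies' p-values, which is the point made in [13].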

A similar comment pertains to Mayo's error-statistics philosophy with its concept of severe testing [30, 29, 59]: currently, Mayo's notion of severity is, at least in simple cases, indirectly based on p-values [29, page 144]. In light of the above, it might be preferable to use e-values instead.
