
1 Hypothesis Testing

Biology is a science, but what exactly is science? What does the study of biology share with other scientific disciplines?  Science  (from the Latin scientia, meaning “knowledge”) can be defined as knowledge about the natural world.

Biologists study the living world by posing questions about it and seeking science-based responses. This approach is common to other sciences as well and is often referred to as the scientific method. The scientific process was used even in ancient times, but it was first documented by England’s Sir Francis Bacon (1561–1626) ( Figure 1 ), who set up inductive methods for scientific inquiry. The scientific method is not used exclusively by biologists; it can be applied to almost any question as a logical problem-solving method.

[Figure 1: Portrait of Sir Francis Bacon.]

The scientific process typically starts with an observation (often a problem to be solved) that leads to a question. Science is very good at answering questions about observations of the natural world, but it is very bad at answering purely moral questions, aesthetic questions, personal opinions, or what can generally be categorized as spiritual questions. Science cannot investigate these areas because they are outside the realm of material phenomena (the phenomena of matter and energy) and cannot be observed and measured.



Questions that science can answer:

  • What is the optimum temperature for the growth of E. coli bacteria?
  • Do birds prefer bird feeders of a specific color?
  • What is the cause of this disease?
  • How effective is this drug in treating this disease?

Questions that science cannot answer:

  • How tall is Santa Claus?
  • Do angels exist?
  • Which is better: classical music or rock and roll?
  • What are the ethical implications of human cloning?

Let’s think about a simple problem that starts with an observation, and apply the scientific method to solve the problem. Imagine that one morning you wake up and flip the switch to turn on your bedside lamp, but the light won’t turn on. That is an observation that also describes a problem: the light won’t turn on. Of course, you would next ask the question: “Why won’t the light turn on?”

A hypothesis is a suggested explanation that can be tested. A hypothesis is NOT the question you are trying to answer – it is what you think the answer to the question will be and why. Several hypotheses may be proposed as answers to one question. For example, one hypothesis about the question “Why won’t the light turn on?” is “The light won’t turn on because the bulb is burned out.” There are also other possible answers to the question, and therefore other hypotheses may be proposed: a second hypothesis is “The light won’t turn on because the lamp is unplugged,” and a third is “The light won’t turn on because the power is out.” A hypothesis should be based on credible background information. A hypothesis is NOT just a guess (not even an educated one), although it can be based on your prior experience (such as in the example where the light won’t turn on). In general, hypotheses in biology should be based on a credible, referenced source of information.

A hypothesis must be testable to ensure that it is valid. For example, a hypothesis that depends on what a dog thinks is not testable, because we can’t tell what a dog thinks. It should also be  falsifiable,  meaning that it can be disproven by experimental results. An example of an unfalsifiable hypothesis is “Red is a better color than blue.” There is no experiment that might show this statement to be false. To test a hypothesis, a researcher will conduct one or more experiments designed to eliminate one or more of the hypotheses. This is important: a hypothesis can be disproven, or eliminated, but it can never be proven.  If an experiment fails to disprove a hypothesis, then that explanation (the hypothesis) is supported as the answer to the question. However, that doesn’t mean that later on, we won’t find a better explanation or design a better experiment that will disprove the first hypothesis and lead to a better one.

A variable is any part of the experiment that can vary or change during the experiment. Typically, an experiment only tests one variable and all the other conditions in the experiment are held constant.

  • The variable that is being changed or tested is known as the  independent variable .
  • The  dependent variable  is the thing (or things) that you are measuring as the outcome of your experiment.
  • A  constant  is a condition that is the same between all of the tested groups.
  • A confounding variable is a condition that is not held constant and that could affect the experimental results.

Let’s start with the first hypothesis given above for the light bulb experiment: the bulb is burned out. When testing this hypothesis, the independent variable (the thing that you are testing) would be changing the light bulb and the dependent variable is whether or not the light turns on.

  • HINT: You should be able to put your identified independent and dependent variables into the phrase “dependent depends on independent”. If you say “whether or not the light turns on depends on changing the light bulb” this makes sense and describes this experiment. In contrast, if you say “changing the light bulb depends on whether or not the light turns on” it doesn’t make sense.

It would be important to hold all the other aspects of the environment constant, for example not messing with the lamp cord or trying to turn the lamp on using a different light switch. If the entire house had lost power during the experiment because a car hit the power pole, that would be a confounding variable.

You may have learned that a hypothesis can be phrased as an “if…then…” statement. Simple hypotheses can be phrased that way (but they must always also include a “because”), while more complicated hypotheses may require several sentences. It is also very easy to get confused when trying to force your hypothesis into this format. Don’t worry about phrasing hypotheses as “if…then” statements – that is almost never done in experiments outside a classroom.

The results  of your experiment are the data that you collect as the outcome.  In the light experiment, your results are either that the light turns on or the light doesn’t turn on. Based on your results, you can make a conclusion. Your conclusion  uses the results to answer your original question.

[Figure: a flow chart illustrating a simplified version of the scientific process.]

We can fit the experiment with the light that won’t turn on into the figure above:

  • Observation: the light won’t turn on.
  • Question: why won’t the light turn on?
  • Hypothesis: the lightbulb is burned out.
  • Prediction: if I change the lightbulb (independent variable), then the light will turn on (dependent variable).
  • Experiment: change the lightbulb while leaving all other variables the same.
  • Analyze the results: the light didn’t turn on.
  • Conclusion: The lightbulb isn’t burned out. The results do not support the hypothesis: time to develop a new one!
  • Hypothesis 2: the lamp is unplugged.
  • Prediction 2: if I plug in the lamp, then the light will turn on.
  • Experiment: plug in the lamp.
  • Analyze the results: the light turned on!
  • Conclusion: The light wouldn’t turn on because the lamp was unplugged. The results support the hypothesis; it’s time to move on to the next experiment!

In practice, the scientific method is not as rigid and structured as it might at first appear. Sometimes an experiment leads to conclusions that favor a change in approach; often, an experiment brings entirely new scientific questions to the puzzle. Many times, science does not operate in a linear fashion; instead, scientists continually draw inferences and make generalizations, finding patterns as their research proceeds. Scientific reasoning is more complex than the scientific method alone suggests.

[Figure: a more complex flow chart illustrating how the scientific method usually happens in practice.]

Control Groups

Another important aspect of designing an experiment is the presence of one or more control groups. A control group allows you to make a comparison that is important for interpreting your results. Control groups are samples that help you to determine that differences between your experimental groups are due to your treatment rather than to a different variable – they eliminate alternate explanations for your results (including experimental error and experimenter bias). They increase reliability, often through the comparison of control measurements and measurements of the experimental groups. Often, the control group is a sample that is not treated with the independent variable but is otherwise treated the same way as your experimental sample. Therefore, if the results of the experimental group differ from those of the control group, the difference must be due to the change in the independent variable rather than to some outside factor. It is common in complex experiments (such as those published in scientific journals) to have more control groups than experimental groups.

Question: Which fertilizer will produce the greatest number of tomatoes when applied to the plants?

Hypothesis : If I apply different brands of fertilizer to tomato plants, the most tomatoes will be produced from plants watered with Brand A because Brand A advertises that it produces twice as many tomatoes as other leading brands.

Experiment:  Purchase 10 tomato plants of the same type from the same nursery. Pick plants that are similar in size and age. Divide the plants into two groups of 5. Apply Brand A to the first group and Brand B to the second group according to the instructions on the packages. After 10 weeks, count the number of tomatoes on each plant.

Independent Variable:  Brand of fertilizer.

Dependent Variable : Number of tomatoes.

  • The number of tomatoes produced depends on the brand of fertilizer applied to the plants.

Constants:  amount of water, type of soil, size of pot, amount of light, type of tomato plant, length of time plants were grown.

Confounding variables : any of the above that are not held constant, plant health, diseases present in the soil or plant before it was purchased.

Results:  Tomatoes fertilized with Brand A  produced an average of 20 tomatoes per plant, while tomatoes fertilized with Brand B produced an average of 10 tomatoes per plant.

You’d want to use Brand A next time you grow tomatoes, right? But what if I told you that plants grown without fertilizer produced an average of 30 tomatoes per plant! Now what will you use on your tomatoes?

[Bar graph: number of tomatoes produced from plants watered with different fertilizers. Brand A = 20. Brand B = 10. Control = 30.]

Results including control group : Tomatoes which received no fertilizer produced more tomatoes than either brand of fertilizer.

Conclusion:  Although Brand A fertilizer produced more tomatoes than Brand B, neither fertilizer should be used because plants grown without fertilizer produced the most tomatoes!

More examples of control groups (suppose you are testing spinach for Salmonella contamination by swabbing it onto a nutrient plate, testing a new drug against a placebo, asking whether birds prefer bird feeders of a specific color, or testing how pH affects the function of an enzyme):

  • You observe growth. Does this mean that your spinach is really contaminated? Consider an alternate explanation for growth: the swab, the water, or the plate is contaminated with bacteria. You could use a control group to determine which explanation is true. If you wipe a wet, unused swab on a nutrient plate, do bacteria grow?
  • You don’t observe growth.  Does this mean that your spinach is really safe? Consider an alternate explanation for no growth: Salmonella isn’t able to grow on the type of nutrient you used in your plates. You could use a control group to determine which explanation is true. If you wipe a known sample of Salmonella bacteria on the plate, do bacteria grow?
  • You see a reduction in disease symptoms: you might expect a reduction in disease symptoms purely because the person knows they are taking a drug and therefore believes they should be getting better. If the group treated with the real drug does not show a greater reduction in disease symptoms than the placebo group, the drug doesn’t really work. The placebo group sets a baseline against which the experimental group (treated with the drug) can be compared.
  • You don’t see a reduction in disease symptoms: your drug doesn’t work. You don’t need an additional control group for comparison.
  • You would want a “placebo feeder”. This would be the same type of feeder, but with no food in it. Birds might visit a feeder just because they are interested in it; an empty feeder would give a baseline level for bird visits.
  • You would want a control group where you knew the enzyme would function. This would be a tube where you did not change the pH. You need this control group so you know your enzyme is working: if you didn’t see a reaction in any of the tubes with the pH adjusted, you wouldn’t know if it was because the enzyme wasn’t working at all or because the enzyme just didn’t work at any of your tested pH values.
  • You would also want a control group where you knew the enzyme would not function (no enzyme added). You need the negative control group so you can ensure that there is no reaction taking place in the absence of enzyme: if the reaction proceeds without the enzyme, your results are meaningless.

Text adapted from: OpenStax , Biology. OpenStax CNX. May 27, 2016  http://cnx.org/contents/[email protected]:RD6ERYiU@5/The-Process-of-Science .

MHCC Biology 112: Biology for Health Professions Copyright © 2019 by Lisa Bartee is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.


6   Testing

[Chapter opening figure: an xkcd cartoon on multiple testing.]

Hypothesis testing is one of the workhorses of science. It is how we can draw conclusions or make decisions based on finite samples of data. For instance, new treatments for a disease are usually approved on the basis of clinical trials that aim to decide whether the treatment has better efficacy compared to the other available options, and an acceptable trade-off of side effects. Such trials are expensive and can take a long time. Therefore, the number of patients we can enroll is limited, and we need to base our inference on a limited sample of observed patient responses. The data are noisy, since a patient’s response depends not only on the treatment, but on many other factors outside of our control. The sample size needs to be large enough to enable us to make a reliable conclusion. On the other hand, it also must not be too large, so that we do not waste precious resources or time, e.g., by making drugs more expensive than necessary, or by denying patients that would benefit from the new drug access to it. The machinery of hypothesis testing was developed largely with such applications in mind, although today it is used much more widely.

In biological data analysis (and in many other fields 1 ) we see hypothesis testing applied to screen thousands or millions of possible hypotheses to find the ones that are worth following up. For instance, researchers screen genetic variants for associations with a phenotype, or gene expression levels for associations with disease. Here, “worthwhile” is often interpreted as “statistically significant”, although the two concepts are clearly not the same. It is probably fair to say that statistical significance is a necessary condition for making a data-driven decision to find something interesting, but it’s clearly not sufficient. In any case, such large-scale association screening is closely related to multiple hypothesis testing.

1  Detecting credit card fraud, email spam detection, ...

6.1 Goals for this Chapter

In this chapter we will:

Familiarize ourselves with the statistical machinery of hypothesis testing, its vocabulary, its purpose, and its strengths and limitations.

Understand what multiple testing means.

See that multiple testing is not a problem – but rather, an opportunity, as it overcomes many of the limitations of single testing.

Understand the false discovery rate.

Learn how to make diagnostic plots.

Use hypothesis weighting to increase the power of our analyses.

6.1.1 Drinking from the firehose


If statistical testing—decision making with uncertainty—seems a hard task when making a single decision, then brace yourself: in genomics, or more generally with “big data”, we need to accomplish it not once, but thousands or millions of times. In Chapter 2 , we saw the example of epitope detection and the challenges from considering not only one, but several positions. Similarly, in whole genome sequencing, we scan every position in the genome for a difference between the DNA library at hand and a reference (or another library): that’s on the order of six billion tests if we are looking at human data! In genetic or chemical compound screening, we test each of the reagents for an effect in the assay, compared to a control: that’s again tens of thousands, if not millions of tests. In Chapter 8 , we will analyse RNA-Seq data for differential expression by applying a hypothesis test to each of the thousands of genes assayed.

6.1.2 Testing versus classification

Suppose we measured the expression level of a marker gene to decide whether some cells we are studying are from cell type A or B. First, let’s consider that we have no prior assumption, and it’s equally important to us to get the assignment right no matter whether the true cell type is A or B. This is a classification task. We’ll cover classification in Chapter 12 . In this chapter, we consider the asymmetric case: based on what we already know (we could call this our prior knowledge), we lean towards conservatively calling any cell A, unless there is strong enough evidence for the alternative. Or maybe class B is interesting, rare, and/or worthwhile studying further, whereas A is a “catch-all” class for all the boring rest. In such cases, the machinery of hypothesis testing is for us.

Formally, there are many similarities between hypothesis testing and classification. In both cases, we aim to use data to choose between several possible decisions. It is even possible to think of hypothesis testing as a special case of classification. However, these two approaches are geared towards different objectives and underlying assumptions, and when you encounter a statistical decision problem, it is good to keep that in mind in your choice of methodology.

6.1.3 False discovery rate versus p-value: which is more intuitive?

[Figures 6.2–6.4: the distributions of a score \(x\) in two classes, blue and red, with a vertical black bar marking the decision threshold.]

Hypothesis testing has traditionally been taught with p-values first—introducing them as the primal, basic concept. Multiple testing and false discovery rates are then presented as derived, additional ideas. There are good mathematical and practical reasons for doing so, and the rest of this chapter follows this tradition. However, in this prefacing section we would like to point out that it can be more intuitive and more pedagogical to revert the order, and learn about false discovery rates first and think of p-values as an imperfect proxy.

Consider Figure  6.2 , which represents a binary decision problem. Let’s say we call a discovery whenever the summary statistic \(x\) is particularly small, i.e., when it falls to the left of the vertical black bar 2 . Then the false discovery rate 3 (FDR) is simply the fraction of false discoveries among all discoveries, i.e.:

2  This is “without loss of generality”: we could also flip the \(x\) -axis and call something with a high score a discovery.

3  This is a rather informal definition. For more precise definitions, see for instance ( Storey 2003 ; Efron 2010 ) and Section 6.10 .

\[ \text{FDR}=\frac{\text{area shaded in light blue}}{\text{sum of the areas left of the vertical bar (light blue + strong red)}}. \tag{6.1}\]

The FDR depends not only on the position of the decision threshold (the vertical bar), but also on the shape and location of the two distributions, and on their relative sizes. In Figures 6.2 and 6.3 , the overall blue area is twice as big as the overall red area, reflecting the fact that the blue class is (in this example) twice as prevalent (or: a priori, twice as likely) as the red class.

Note that this definition does not require the concept or even the calculation of a p-value. It works for any arbitrarily defined score \(x\) . However, it requires knowledge of three things:

1. the distribution of \(x\) in the blue class (the blue curve),

2. the distribution of \(x\) in the red class (the red curve),

3. the relative sizes of the blue and the red classes.

If we know these, then we are basically done at this point; or we can move on to supervised classification in Chapter 12 , which deals with the extension of Figure  6.2 to multivariate \(x\) .

Very often, however, we do not know all of these, and this is the realm of hypothesis testing. In particular, suppose that one of the two classes (say, the blue one) is easier than the other, and we can figure out its distribution, either from first principles or simulations. We use that fact to transform our score \(x\) to a standardized range between 0 and 1 (see Figures 6.2 — 6.4 ), which we call the p-value . We give the class a fancier name: null hypothesis . This addresses Point 1 in the above list. We do not insist on knowing Point 2 (and we give another fancy name, alternative hypothesis , to the red class). As for Point 3, we can use the conservative upper limit that the null hypothesis is far more prevalent (or: likely) than the alternative and do our calculations under the condition that the null hypothesis is true. This is the traditional approach to hypothesis testing.

Thus, instead of basing our decision-making on the intuitive FDR ( Equation  6.1 ), we base it on the

\[ \text{p-value}=\frac{\text{area shaded in light blue}}{\text{overall blue area}}. \tag{6.2}\]

In other words, the p-value is the precise and often relatively easy-to-compute answer to a rather convoluted question (and perhaps the wrong question). The FDR answers the right question, but requires a lot more input, which we often do not have.

6.1.4 The multiple testing opportunity

Here is the good news about multiple testing: even if we do not know Items 2 and 3 from the list above explicitly for our tests (and perhaps even if we are unsure about Point 1 ( Efron 2010 ) ), we may be able to infer this information from the multiplicity—and thus convert p-values into estimates of the FDR!

Thus, multiple testing tends to make our inference better, and our task simpler. Since we have so much data, we do not only have to rely on abstract assumptions. We can check empirically whether the requirements of the tests are actually met by the data. All this can be incredibly helpful, and we get it because of the multiplicity. So we should think about multiple testing not as a “problem” or a “burden”, but as an opportunity!

6.2 An example: coin tossing

So now let’s dive into hypothesis testing, starting with single testing. To really understand the mechanics, we use one of the simplest possible examples: suppose we are flipping a coin to see if it is fair 4 . We flip the coin 100 times and each time record whether it came up heads or tails. So, we have a record that could look something like this:

4  We don’t look at coin tossing because it’s inherently important, but because it is an easy “model system” (just as we use model systems in biology): everything can be calculated easily, and you do not need a lot of domain knowledge to understand what coin tossing is. All the important concepts come up, and we can apply them, only with more additional details, to other applications.

which we can simulate in R. Let’s assume we are flipping a biased coin, so we set probHead different from 1/2:
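A minimal sketch of such a simulation (the name probHead follows the text, which in Section 6.6 recalls the value 0.6; the names numFlips and coinFlips and the seed are our own choices):

```r
set.seed(0xdada)                  # arbitrary seed, for reproducibility
numFlips = 100
probHead = 0.6
# simulate 100 tosses of a coin that lands heads with probability 0.6
coinFlips = sample(c("H", "T"), size = numFlips,
                   replace = TRUE, prob = c(probHead, 1 - probHead))
head(coinFlips)
```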

Now, if the coin were fair, we would expect half of the time to get heads. Let’s see.
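Continuing the sketch, we tally the outcomes and store the observed number of heads as numHeads, the name the text uses below (whether you reproduce the text’s value of 59 depends on the seed):

```r
table(coinFlips)
numHeads = sum(coinFlips == "H")   # our observed test statistic
numHeads
```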

So that is different from 50/50. Suppose we showed the data to a friend without telling them whether the coin is fair, and their prior assumption, i.e., their null hypothesis, is that coins are, by and large, fair. Would the data be strong enough to make them conclude that this coin isn’t fair? They know that random sampling differences are to be expected. To decide, let’s look at the sampling distribution of our test statistic – the total number of heads seen in 100 coin tosses – for a fair coin 5 . As we saw in Chapter 1 , the number \(k\) of heads in \(n\) independent tosses of a coin is

5  We haven’t really defined what we mean by fair – a reasonable definition would be that heads and tails are equally likely, and that the outcome of each coin toss does not depend on the previous ones. For more complex applications, nailing down the most suitable null hypothesis can take some thought.

\[ P(K=k\,|\,n, p) = \left(\begin{array}{c}n\\k\end{array}\right) p^k\;(1-p)^{n-k}, \tag{6.3}\]

where \(p\) is the probability of heads (0.5 if we assume a fair coin). We read the left-hand side of the above equation as “the probability that the observed value for \(K\) is \(k\) , given the values of \(n\) and \(p\) ”. Statisticians like to distinguish between all the possible values of a statistic and the one that was observed 6 , and we use the upper case \(K\) for the possible values (so \(K\) can be anything between 0 and 100), and the lower case \(k\) for the observed value.

6  In other words, \(K\) is the abstract random variable in our probabilistic model, whereas \(k\) is its realization, that is, a specific data point.

We plot Equation  6.3 in Figure  6.5 ; for good measure, we also mark the observed value numHeads with a vertical blue line.
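A sketch of one way to draw that plot with ggplot2 (the name binomDensity is our own and is reused in the rejection-region sketch below):

```r
library("tibble")
library("ggplot2")
binomDensity = tibble(k = 0:numFlips,
                      p = dbinom(0:numFlips, size = numFlips, prob = 0.5))
ggplot(binomDensity) +
  geom_col(aes(x = k, y = p)) +
  geom_vline(xintercept = numHeads, col = "blue")
```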

[Figure 6.5: the binomial probabilities of Equation 6.3 for a fair coin, with the observed value numHeads marked by a vertical blue line.]

Suppose we didn’t know about Equation  6.3 . We can still use Monte Carlo simulation to give us something to compare with:
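A sketch of such a simulation (10,000 replications is an arbitrary choice):

```r
numSimulations = 10000
# repeatedly toss a *fair* coin 100 times and count the heads
simResults = replicate(numSimulations, {
  flips = sample(c("H", "T"), size = numFlips, replace = TRUE)
  sum(flips == "H")
})
ggplot(tibble(k = simResults), aes(x = k)) +
  geom_histogram(binwidth = 1, center = 50) +
  geom_vline(xintercept = numHeads, col = "blue")
```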

[Figure: the simulated sampling distribution of the number of heads for a fair coin.]

As expected, the most likely number of heads is 50, that is, half the number of coin flips. But we see that other numbers near 50 are also quite likely. How do we quantify whether the observed value, 59, is among those values that we are likely to see from a fair coin, or whether its deviation from the expected value is already large enough for us to conclude with enough confidence that the coin is biased? We divide the set of all possible \(k\) (0 to 100) into two complementary subsets, the rejection region and the region of no rejection. Our choice here 7 is to fill up the rejection region with as many \(k\) as possible while keeping their total probability, assuming the null hypothesis, below some threshold \(\alpha\) (say, 0.05).

7  More on this in Section 6.3.1 .

In the code below, we use the function arrange from the dplyr package to sort the p-values from lowest to highest, then pass the result to mutate , which adds another dataframe column reject that is defined by computing the cumulative sum ( cumsum ) of the p-values and thresholding it against alpha . The logical vector reject therefore marks with TRUE a set of \(k\) values whose total probability is less than alpha . These are marked in Figure  6.7 , and we can see that our rejection region is not contiguous – it comprises both the very large and the very small values of \(k\) .
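A sketch of that code, consistent with the description (the column name reject follows the text; the plotting details are our own):

```r
library("dplyr")
alpha = 0.05
binomDensity = arrange(binomDensity, p) |>
  mutate(reject = (cumsum(p) <= alpha))
ggplot(binomDensity) +
  geom_col(aes(x = k, y = p, fill = reject)) +
  scale_fill_manual(values = c(`TRUE` = "red", `FALSE` = "darkgrey")) +
  geom_vline(xintercept = numHeads, col = "blue") +
  theme(legend.position = "none")
```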

[Figure 6.7: the binomial probabilities, with the rejection region shown in red; the observed value lies in the grey region of no rejection.]

The explicit summation over the probabilities is clumsy; we did it here for pedagogic value. For one-dimensional distributions, R provides not only functions for the densities (e.g., dbinom ) but also for the cumulative distribution functions ( pbinom ), which are more precise and faster than cumsum over the probabilities. These should be used in practice.

Do the computations for the rejection region and produce a plot like Figure  6.7 without using dbinom and cumsum , using pbinom instead.
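One possible solution sketch, using equal tail probabilities; note that this criterion differs slightly from the fill-up-smallest-probabilities rule used above, so the resulting region need not match Figure 6.7 exactly:

```r
alpha = 0.05
# left part: all k with P(K <= k) <= alpha/2
cdf   = pbinom(0:numFlips, numFlips, 0.5)
kLeft = max(which(cdf <= alpha / 2)) - 1     # index i corresponds to k = i - 1
# right part: all k with P(K >= k) <= alpha/2
sf     = pbinom((0:numFlips) - 1, numFlips, 0.5, lower.tail = FALSE)
kRight = min(which(sf <= alpha / 2)) - 1
c(kLeft, kRight)   # rejection region: {0, ..., kLeft} and {kRight, ..., 100}
```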

We see in Figure  6.7 that the observed value, 59, lies in the grey shaded area, so we would not reject the null hypothesis of a fair coin from these data at a significance level of \(\alpha=0.05\) .

Question 6.1 Does the fact that we don’t reject the null hypothesis mean that the coin is fair?

Question 6.2 Would we have a better chance of detecting that the coin is not fair if we did more coin tosses? How many?

Question 6.3 If we repeated the whole procedure and again tossed the coin 100 times, might we then reject the null hypothesis?

Question 6.4 The rejection region in Figure  6.7 is asymmetric – its left part ends with \(k=40\) , while its right part starts with \(k=61\) . Why is that? Which other ways of defining the rejection region might be useful?

We have just gone through the steps of a binomial test. In fact, this is such a frequent activity in R that it has been wrapped into a single function, and we can compare its output to our results.
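That function is binom.test; a call along the following lines should reproduce our analysis:

```r
binom.test(x = numHeads, n = numFlips, p = 0.5)
```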

6.3 The five steps of hypothesis testing

Let’s summarise the general principles of hypothesis testing:

1. Decide on the effect that you are interested in, design a suitable experiment or study, and pick a data summary function and test statistic .

2. Set up a null hypothesis , which is a simple, computationally tractable model of reality that lets you compute the null distribution , i.e., the possible outcomes of the test statistic and their probabilities under the assumption that the null hypothesis is true.

3. Decide on the rejection region , i.e., a subset of possible outcomes whose total probability is small 8 .

4. Do the experiment and collect the data 9 ; compute the test statistic.

5. Make a decision: reject the null hypothesis 10 if the test statistic is in the rejection region.

8  More on this in Section 6.3.1 .

9  Or if someone else has already done it, download their data.

10  That is, conclude that it is unlikely to be true.

Note how in this idealized workflow, we make all the important decisions in Steps 1–3 before we have even seen the data. As we already alluded to in the Introduction (Figures 1 and 2 ), this is often not realistic. We will also come back to this question in Section 6.6 .

There was also idealization in our null hypothesis that we used in the example above: we postulated that a fair coin should have a probability of exactly 0.5 (not, say, 0.500001) and that there should be absolutely no dependence between tosses. We did not worry about any possible effects of air drag, elasticity of the material on which the coin falls, and so on. This gave us the advantage that the null hypothesis was computationally tractable, namely, with the binomial distribution. Here, these idealizations may not seem very controversial, but in other situations the trade-off between how tractable and how realistic a null hypothesis is can be more substantial. The problem is that if a null hypothesis is too idealized to start with, rejecting it is not all that interesting. The result may be misleading, and certainly we are wasting our time.

The test statistic in our example was the total number of heads. Suppose we observed 50 tails in a row, and then 50 heads in a row. Our test statistic ignores the order of the outcomes, and we would conclude that this is a perfectly fair coin. However, if we used a different test statistic, say, the number of times we see two tails in a row, we might notice that there is something funny about this coin.

Question 6.5 What is the null distribution of this different test statistic?

Question 6.6 Would a test based on that statistic be generally preferable?

No, while it has more power to detect such correlations between coin tosses, it has less power to detect bias in the outcome.

What we have just done is look at two different classes of alternative hypotheses . The first class of alternatives was that subsequent coin tosses are still independent of each other, but that the probability of heads differed from 0.5. The second one was that the overall probability of heads may still be 0.5, but that subsequent coin tosses were correlated.

Question 6.7 Recall the concept of sufficient statistics from Chapter 1 . Is the total number of heads a sufficient statistic for the binomial distribution? Why might it be a good test statistic for our first class of alternatives, but not for the second?

So let’s remember that we typically have multiple possible choices of test statistic (in principle it could be any numerical summary of the data). Making the right choice is important for getting a test with good power 11 . What the right choice is will depend on what kind of alternatives we expect. This is not always easy to know in advance.

11  See Sections 1.4.1 and 6.4 .

Once we have chosen the test statistic we need to compute its null distribution. You can do this either with pencil and paper or by computer simulations. A pencil and paper solution is parametric and leads to a closed form mathematical expression (like Equation  6.3 ), which has the advantage that it holds for a range of model parameters of the null hypothesis (such as \(n\) , \(p\) ). It can also be quickly computed for any specific set of parameters. But it is not always as easy as in the coin tossing example. Sometimes a pencil and paper solution is impossibly difficult to compute. At other times, it may require simplifying assumptions. An example is a null distribution for the \(t\) -statistic (which we will see later in this chapter). We can compute this if we assume that the data are independent and normally distributed: the result is called the \(t\) -distribution. Such modelling assumptions may be more or less realistic. Simulating the null distribution offers a potentially more accurate, more realistic and perhaps even more intuitive approach. The drawback of simulating is that it can take a rather long time, and we need extra work to get a systematic understanding of how varying parameters influence the result. Generally, it is more elegant to use the parametric theory when it applies 12 . When you are in doubt, simulate – or do both.

12  The assumptions don’t need to be exactly true – it is sufficient that the theory’s predictions are an acceptable approximation of the truth.

6.3.1 The rejection region

How to choose the right rejection region for your test? First, what should its size be? That is your choice of the significance level or false positive rate \(\alpha\) , which is the total probability of the test statistic falling into this region even if the null hypothesis is true 13 .

13  Some people at some point in time for a particular set of questions colluded on \(\alpha=0.05\) as being “small”. But there is nothing special about this number, and in any particular case the best choice for a decision threshold may very much depend on context ( Wasserstein and Lazar 2016 ; Altman and Krzywinski 2017 ) .

Given the size, the next question is about its shape. For any given size, there are usually multiple possible shapes. It makes sense to require that the probability of the test statistic falling into the rejection region is as large as possible if the alternative hypothesis is true. In other words, we want our test to have high power , or true positive rate.

The criterion that we used in the code for computing the rejection region for Figure  6.7 was to make the region contain as many \(k\) as possible. That is because in the absence of any information about the alternative distribution, one \(k\) is as good as any other, and we maximize their total number.

A consequence of this is that in Figure  6.7 the rejection region is split between the two tails of the distribution. This is because we anticipate that unfair coins could have a bias either towards heads or towards tails; we don’t know. If we did know, we would instead concentrate our rejection region all on the appropriate side, e.g., the right tail if we think the bias would be towards heads. Such choices are also referred to as two-sided and one-sided tests. More generally, if we have assumptions about the alternative distribution, this can influence our choice of the shape of the rejection region.

6.4 Types of error

Having set out the mechanics of testing, we can assess how well we are doing. Table  6.1 compares reality (whether or not the null hypothesis is in fact true) with our decision whether or not to reject the null hypothesis after we have seen the data.

Table 6.1:

| Test vs reality | Null hypothesis is true | … is false |
|---|---|---|
| Rejected | Type I error (false positive) | True positive |
| Not rejected | True negative | Type II error (false negative) |

It is always possible to reduce one of the two error types at the cost of increasing the other one. The real challenge is to find an acceptable trade-off between both of them. This is exemplified in Figure  6.2 . We can always decrease the false positive rate (FPR) by shifting the threshold to the right. We can become more “conservative”. But this happens at the price of higher false negative rate (FNR). Analogously, we can decrease the FNR by shifting the threshold to the left. But then again, this happens at the price of higher FPR. A bit on terminology: the FPR is the same as the probability \(\alpha\) that we mentioned above. \(1 - \alpha\) is also called the specificity of a test. The FNR is sometimes also called \(\beta\) , and \(1 - \beta\) the power , sensitivity or true positive rate of a test.

Question 6.8 At the end of Section 6.3 , we learned about one- and two-sided tests. Why does this distinction exist? Why don’t we always just use the two-sided test, which is sensitive to a larger class of alternatives?

6.5 The t-test

Many experimental measurements are reported as rational numbers, and the simplest comparison we can make is between two groups, say, cells treated with a substance compared to cells that are not. The basic test for such situations is the \(t\) -test. The test statistic is defined as

\[ t = c \; \frac{m_1-m_2}{s}, \tag{6.4}\]

where \(m_1\) and \(m_2\) are the mean of the values in the two groups, \(s\) is the pooled standard deviation and \(c\) is a constant that depends on the sample sizes, i.e., the numbers of observations \(n_1\) and \(n_2\) in the two groups. In formulas 14 ,

14  Everyone should try to remember Equation  6.4 , whereas many people get by with looking up Equation  6.5 when they need it.

\[ \begin{align} m_g &= \frac{1}{n_g} \sum_{i=1}^{n_g} x_{g, i} \quad\quad\quad g=1,2\\ s^2 &= \frac{1}{n_1+n_2-2} \left( \sum_{i=1}^{n_1} \left(x_{1,i} - m_1\right)^2 + \sum_{j=1}^{n_2} \left(x_{2,j} - m_2\right)^2 \right)\\ c &= \sqrt{\frac{n_1n_2}{n_1+n_2}} \end{align} \tag{6.5}\]

where \(x_{g, i}\) is the \(i^{\text{th}}\) data point in the \(g^{\text{th}}\) group. Let’s try this out with the PlantGrowth data from R’s datasets package.
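A sketch of such a test (we compare the ctrl and trt2 groups, an assumption consistent with Questions 6.9–6.11 below; the result is stored as tt for reuse in the permutation sketch further down):

```r
data("PlantGrowth")
tt = with(PlantGrowth,
          t.test(weight[group == "ctrl"],
                 weight[group == "trt2"],
                 var.equal = TRUE))
tt
```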


Question 6.9 What do you get from the comparison with trt1 ? What for trt1 versus trt2 ?

Question 6.10 What is the significance of the var.equal = TRUE in the above call to t.test ?

We’ll get back to this in Section 6.5 .

Question 6.11 Rewrite the above call to t.test using the formula interface, i.e., by using the notation weight \(\sim\) group .

To compute the p-value, the t.test function uses the asymptotic theory for the \(t\) -statistic Equation  6.4 ; this theory states that under the null hypothesis of equal means in both groups, the statistic follows a known, mathematical distribution, the so-called \(t\) -distribution with \(n_1+n_2-2\) degrees of freedom. The theory uses additional technical assumptions, namely that the data are independent and come from a normal distribution with the same standard deviation. We could be worried about these assumptions. Clearly they do not hold: weights are always positive, while the normal distribution extends over the whole real axis. The question is whether this deviation from the theoretical assumption makes a real difference. We can use a permutation test to figure this out (we will discuss the idea behind permutation tests in a bit more detail in Section 6.5.1 ).
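A sketch of such a permutation test (10,000 random permutations; the name abs_t_null is our own):

```r
pg = subset(PlantGrowth, group %in% c("ctrl", "trt2"))
abs_t_null = replicate(10000, {
  shuffled = sample(pg$group)      # randomly permute the group labels
  abs(t.test(pg$weight[shuffled == "ctrl"],
             pg$weight[shuffled == "trt2"],
             var.equal = TRUE)$statistic)
})
# permutation p-value: fraction of permutations at least as extreme as observed
mean(abs(tt$statistic) <= abs_t_null)
```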

[Figure: the permutation null distribution of the absolute value of the \(t\)-statistic.]

Question 6.12 Why did we use the absolute value function ( abs ) in the above code?

Plot the (parametric) \(t\) -distribution with the appropriate degrees of freedom.

The \(t\) -test comes in multiple flavors, all of which can be chosen through parameters of the t.test function. What we did above is called a two-sided two-sample unpaired test with equal variance. Two-sided refers to the fact that we were open to reject the null hypothesis if the weight of the treated plants was either larger or smaller than that of the untreated ones.

Two-sample 15 indicates that we compared the means of two groups to each other; another option is to compare the mean of one group against a given, fixed number.

15  It can be confusing that the term sample has a different meaning in statistics than in biology. In biology, a sample is a single specimen on which an assay is performed; in statistics, it is a set of measurements, e.g., the \(n_1\) -tuple \(\left(x_{1,1},...,x_{1,n_1}\right)\) in Equation  6.5 , which can comprise several biological samples. In contexts where this double meaning might create confusion, we refer to the data from a single biological sample as an observation .

Unpaired means that there was no direct 1:1 mapping between the measurements in the two groups. If, on the other hand, the data had been measured on the same plants before and after treatment, then a paired test would be more appropriate, as it looks at the change of weight within each plant, rather than their absolute weights.

Equal variance refers to the way the statistic Equation  6.4 is calculated. That expression is most appropriate if the variances within each group are about the same. If they are very different, an alternative form (Welch’s \(t\) -test) and associated asymptotic theory exist.

The independence assumption . Now let’s try something peculiar: duplicate the data.
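A sketch (simply stacking two copies of the dataset):

```r
with(rbind(PlantGrowth, PlantGrowth),
     t.test(weight[group == "ctrl"],
            weight[group == "trt2"],
            var.equal = TRUE))
```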

Note how the estimates of the group means (and thus, of the difference) are unchanged, but the p-value is now much smaller! We can conclude two things from this:

The power of the \(t\) -test depends on the sample size. Even if the underlying biological differences are the same, a dataset with more observations tends to give more significant results 16 .

The assumption of independence between the measurements is really important. Blatant duplication of the same data is an extreme form of dependence, but to some extent the same thing happens if you mix up different levels of replication. For instance, suppose you had data from 8 plants, but measured the same thing twice on each plant (technical replicates), then pretending that these are now 16 independent measurements is wrong.

16  You can also see this from the way the numbers \(n_1\) and \(n_2\) appear in Equation  6.5 .

6.5.1 Permutation tests

What happened above when we contrasted the outcome of the parametric \(t\) -test with that of the permutation test applied to the \(t\) -statistic? It’s important to realize that these are two different tests, and the similarity of their outcomes is desirable, but coincidental. In the parametric test, the null distribution of the \(t\) -statistic follows from the assumed null distribution of the data, a multivariate normal distribution with unit covariance in the \((n_1+n_2)\) -dimensional space \(\mathbb{R}^{n_1+n_2}\) , and is continuous: the \(t\) -distribution. In contrast, the permutation distribution of our test statistic is discrete, as it is obtained from the finite set of \((n_1+n_2)!\) permutations 17 of the observation labels, from a single instance of the data (the \(n_1+n_2\) observations). All we assume here is that under the null hypothesis, the variables \(X_{1,1},...,X_{1,n_1},X_{2,1},...,X_{2,n_2}\) are exchangeable. Logically, this assumption is implied by that of the parametric test, but is weaker. The permutation test employs the \(t\) -statistic, but not the \(t\) -distribution (nor the normal distribution). The fact that the two tests gave us a very similar result is a consequence of the Central Limit Theorem.

17  Or a random subset, in case we want to save computation time.

6.6 P-value hacking

Let’s go back to the coin tossing example. We did not reject the null hypothesis (that the coin is fair) at a level of 5%—even though we “knew” that it is unfair. After all, probHead was chosen as 0.6 in Section 6.2 . Let’s suppose we now start looking at different test statistics. Perhaps the number of consecutive series of 3 or more heads. Or the number of heads in the first 50 coin flips. And so on. At some point we will find a test that happens to result in a small p-value, even if just by chance (after all, the probability for the p-value to be less than 0.05 under the null hypothesis—fair coin—is one in twenty). We just did what is called p-value hacking 18 ( Head et al. 2015 ) . You see what the problem is: in our zeal to prove our point we tortured the data until some statistic did what we wanted. A related tactic is hypothesis switching or HARKing – hypothesizing after the results are known: we have a dataset, maybe we have invested a lot of time and money into assembling it, so we need results. We come up with lots of different null hypotheses and test statistics, test them, and iterate, until we can report something.

18   http://fivethirtyeight.com/features/science-isnt-broken

These tactics violate the rules of hypothesis testing, as described in Section 6.3 , where we laid out one sequential procedure of choosing the hypothesis and the test, and then collecting the data. But, as we saw in Chapter 2 , such tactics can be tempting in reality. With biological data, we tend to have so many different choices for “normalising” the data, transforming the data, trying to adjust for batch effects, removing outliers, …. The topic is complex and open-ended. Wasserstein and Lazar ( 2016 ) give a readable short summary of the problems with how p-values are used in science, and of some of the misconceptions. They also highlight how p-values can be fruitfully used. The essential message is: be completely transparent about your data, what analyses were tried, and how they were done. Provide the analysis code. Only with such contextual information can a p-value be useful.

Avoid fallacy . Keep in mind that our statistical test is never attempting to prove that our null hypothesis is true – we are simply saying whether or not there is evidence for it to be false. If a high p-value were indicative of the truth of the null hypothesis, we could formulate a completely crazy null hypothesis, do an utterly irrelevant experiment, collect a small amount of inconclusive data, find a p-value that would just be a random number between 0 and 1 (and so with some high probability above our threshold \(\alpha\) ) and, whoosh, our hypothesis would be demonstrated!

6.7 Multiple testing

Question 6.13 Look up xkcd cartoon 882 . Why didn’t the newspaper report the results for the other colors?

The quandary illustrated in the cartoon occurs with high-throughput data in biology. And with force! You will be dealing not only with 20 colors of jellybeans, but, say, with 20,000 genes that were tested for differential expression between two conditions, or with 6 billion positions in the genome where a DNA mutation might have happened. So how do we deal with this? Let’s look again at our table relating statistical test results with reality ( Table  6.1 ), this time framing everything in terms of many hypotheses.

Table 6.2:

| Test vs reality | Null hypothesis is true | … is false | Total |
|---|---|---|---|
| Rejected | \(V\) | \(S\) | \(R\) |
| Not rejected | \(U\) | \(T\) | \(m-R\) |
| Total | \(m_0\) | \(m-m_0\) | \(m\) |

\(m\) : total number of tests (and null hypotheses)

\(m_0\) : number of true null hypotheses

\(m-m_0\) : number of false null hypotheses

\(V\) : number of false positives (a measure of type I error)

\(T\) : number of false negatives (a measure of type II error)

\(S\) , \(U\) : number of true positives and true negatives

\(R\) : number of rejections

In the rest of this chapter, we look at different ways of taking care of the type I and II errors.

6.8 The family wise error rate

The family wise error rate (FWER) is the probability that \(V>0\) , i.e., that we make one or more false positive errors. We can compute it as the complement of making no false positive errors at all 19 .

19  Assuming independence.

\[ \begin{align} P(V>0) &= 1 - P(\text{no rejection of any of $m_0$ nulls}) \\ &= 1 - (1 - \alpha)^{m_0} \to 1 \quad\text{as } m_0\to\infty. \end{align} \tag{6.6}\]

For any fixed \(\alpha\) , this probability is appreciable as soon as \(m_0\) is on the order of \(1/\alpha\) , and it tends towards 1 as \(m_0\) becomes larger. This relationship can have serious consequences for experiments like DNA matching, where a large database of potential matches is searched. For example, if there is a one in a million chance that the DNA profiles of two people match by random error, and your DNA is tested against a database of 800,000 profiles, then the probability of a random hit with the database (i.e., without you being in it) is:
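A quick computation:

```r
1 - (1 - 1/1e6)^8e5
## [1] 0.5506712
```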

That’s pretty high. And once the database contains a few million profiles more, a false hit is virtually unavoidable.

Question 6.14 Prove that the probability Equation  6.6 does indeed become very close to 1 when \(m_0\) is large.

6.8.1 Bonferroni method

How are we to choose the per-hypothesis \(\alpha\) if we want FWER control? The above computations suggest that the product of \(\alpha\) with \(m_0\) may be a reasonable ballpark estimate. Usually we don’t know \(m_0\) , but we know \(m\) , which is an upper limit for \(m_0\) , since \(m_0\le m\) . The Bonferroni method is simply that if we want FWER control at level \(\alpha_{\text{FWER}}\) , we should choose the per hypothesis threshold \(\alpha = \alpha_{\text{FWER}}/m\) . Let’s check this out on an example.
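A sketch of such a check (m = 10000 tests is our assumption; it is consistent with the intersection point quoted below, since 0.05/10000 = 5e-06):

```r
m = 10000
ggplot(tibble(alpha = seq(0, 7e-6, length.out = 100),
              fwer  = 1 - (1 - alpha)^m),
       aes(x = alpha, y = fwer)) +
  geom_line() +
  xlab(expression(alpha)) +
  ylab("probability of one or more false rejections") +
  geom_hline(yintercept = 0.05, col = "red")
```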

[Figure 6.10: the FWER as a function of the per-hypothesis threshold \(\alpha\), with a horizontal red line at 0.05.]

In Figure  6.10 , the black line intersects the red line (which corresponds to a value of 0.05) at \(\alpha=5.13\times 10^{-6}\) , which is just a little bit more than the value of \(0.05/m\) implied by the Bonferroni method.

Question 6.15 Why are the two values not exactly the same?

A potential drawback of this method, however, is that if \(m_0\) is large, the rejection threshold is very small. This means that the individual tests need to be very powerful if we want to have any chance of detecting something. Often FWER control is too stringent, and would lead to an ineffective use of the time and money that was spent to generate and assemble the data. We will now see that there are more nuanced methods of controlling our type I error.

6.9 The false discovery rate

Let’s look at some data. We load up the RNA-Seq dataset airway , which contains gene expression measurements (gene-level counts) of four primary human airway smooth muscle cell lines with and without treatment with dexamethasone, a synthetic glucocorticoid. We’ll use the DESeq2 method that we’ll discuss in more detail in Chapter 8 . For now it suffices to say that it performs a test for differential expression for each gene. Conceptually, the tested null hypothesis is similar to that of the \(t\) -test, although the details are slightly more involved since we are dealing with count data.
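A sketch of the preparatory code (the design formula ~ cell + dex matches the airway metadata; the name awde follows the text):

```r
library("DESeq2")
library("airway")
data("airway")
aw   = DESeqDataSet(se = airway, design = ~ cell + dex)
aw   = DESeq(aw)
awde = as.data.frame(results(aw)) |> dplyr::filter(!is.na(pvalue))
head(awde)
```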

Have a look at the content of awde .

(Optional) Consult the DESeq2 vignette and/or Chapter 8 for more information on what the above code chunk does.

6.9.1 The p-value histogram

The p-value histogram is an important sanity check for any analysis that involves multiple tests. It is a mixture composed of two components:

null: the p-values resulting from the tests for which the null hypothesis is true.

alt: the p-values resulting from the tests for which the null hypothesis is not true.

The relative size of these two components depends on the fraction of true nulls and true alternatives (i.e., on \(m_0\) and \(m\) ), and it can often be visually estimated from the histogram. If our analysis has high statistical power, then the second component (“alt”) consists of mostly small p-values, i.e., appears as a peak near 0 in the histogram; if the power is not high for some of the alternatives, we expect that this peak extends towards the right, i.e., has a “shoulder”. For the “null” component, we expect (by definition of the p-value for continuous data and test statistics) a uniform distribution in \([0,1]\) . Let’s plot the histogram of p-values for the airway data.
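For instance:

```r
ggplot(awde, aes(x = pvalue)) +
  geom_histogram(binwidth = 0.025, boundary = 0)
```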

[Figure 6.11: histogram of the p-values in awde.]

In Figure  6.11 we see the expected mixture. We also see that the null component is not exactly flat (uniform): this is because the data are counts. While these appear quasi-continuous when high, for the tests with low counts the discreteness of the data and the resulting p-values shows up in the spikes towards the right of the histogram.

Now suppose we reject all tests with a p-value less than \(\alpha\) . We can visually determine an estimate of the false discovery proportion with a plot such as in Figure  6.12 , generated by the following code.
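A sketch of that code (the estimate pi0 of the null component’s height doubles the fraction of p-values above 0.5, on the assumption that nearly all of those come from true nulls):

```r
alpha = binw = 0.025
pi0 = 2 * mean(awde$pvalue > 0.5)
ggplot(awde, aes(x = pvalue)) +
  geom_histogram(binwidth = binw, boundary = 0) +
  geom_hline(yintercept = pi0 * binw * nrow(awde), col = "blue") +
  geom_vline(xintercept = alpha, col = "red")
```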

[Figure 6.12: histogram of the p-values, with a vertical red line at \(\alpha\) and a horizontal blue line at the estimated height of the null component.]

We see that there are 4772 p-values in the first bin \([0,\alpha]\) , among which we expect around 945 to be nulls (as indicated by the blue line). Thus we can estimate the fraction of false rejections as
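Continuing the sketch; with the numbers quoted above, this ratio comes out to roughly 945/4772, i.e., about 0.2:

```r
pi0 * alpha / mean(awde$pvalue <= alpha)
```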

The false discovery rate (FDR) is defined as

\[ \text{FDR} = \text{E}\!\left [\frac{V}{\max(R, 1)}\right ], \tag{6.7}\]

where \(R\) and \(V\) are as in Table  6.2 . The expression in the denominator makes sure that the FDR is well-defined even if \(R=0\) (in that case, \(V=0\) by implication). Note that the FDR becomes identical to the FWER if all null hypotheses are true, i.e., if \(V=R\) . \(\text{E[ ]}\) stands for the expected value . That means that the FDR is not a quantity associated with a specific outcome of \(V\) and \(R\) for one particular experiment. Rather, given our choice of tests and associated rejection rules for them, it is the average 20 proportion of type I errors out of the rejections made, where the average is taken (at least conceptually) over many replicate instances of the experiment.

20  Since the FDR is an expectation value, it does not provide worst case control: in any single experiment, the so-called false discovery proportion (FDP), that is the realized value \(v/r\) (without the \(\text{E[ ]}\) ), could be much higher or lower.

6.9.2 The Benjamini-Hochberg algorithm for controlling the FDR

There is a more elegant alternative to the “visual FDR” method of the last section. The procedure, introduced by Benjamini and Hochberg ( 1995 ) , has these steps:

First, order the p-values in increasing order, \(p_{(1)} \le \ldots \le p_{(m)}\) .

Then, for some choice of \(\varphi\) (our target FDR), find the largest value of \(k\) that satisfies \(p_{(k)} \leq \varphi \, k / m\) .

Finally, reject the hypotheses \(1, ..., k\) .

We can see how this procedure works when applied to our RNA-Seq p-values through a simple graphical illustration:
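A sketch of such an illustration (\(\varphi = 0.10\) is an assumed target FDR; the name kmax follows Question 6.16 below, and the cutoff of 7000 ranks is purely for display):

```r
phi  = 0.10
m    = nrow(awde)
awde = mutate(awde, rank = rank(pvalue))
ggplot(dplyr::filter(awde, rank <= 7000),
       aes(x = rank, y = pvalue)) +
  geom_line() +
  geom_abline(slope = phi / m, col = "red")
kmax = with(arrange(awde, rank),
            max(which(pvalue <= phi * rank / m)))
kmax
```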

[Figure 6.13: the sorted p-values versus their rank, with the line of slope \(\varphi/m\) in red.]

The method finds the rightmost point where the black line (our p-values) and the red line (slope \(\varphi / m\) ) intersect. Then it rejects all tests to the left.

Question 6.16 Compare the value of kmax with the number 4772 from above ( Figure  6.12 ). Why are they different?

Question 6.17 Look at the code associated with the option method="BH" of the p.adjust function that comes with R. How does it compare to what we did above?

Question 6.18 Schweder and Spjøtvoll plot : check out Figures 1–3 in Schweder and Spjøtvoll ( 1982 ) . Make a similar plot for the data in awde . How does it relate to Figures 6.13 and 6.12 ?

Thirteen years before Benjamini and Hochberg ( 1995 ) , Schweder and Spjøtvoll ( 1982 ) suggested a diagnostic plot of the observed \(p\) -values that permits estimation of the fraction of true null hypotheses. For a series of hypothesis tests \(H_1, ..., H_m\) with \(p\) -values \(p_i\) , they suggested plotting

\[ \left( 1-p_i, N(p_i) \right) \mbox{ for } i \in 1, ..., m, \tag{6.8}\]

where \(N(p)\) is the number of \(p\) -values greater than \(p\) . An application of this diagnostic plot to awde$pvalue is shown in Figure  6.14 . When all null hypotheses are true, each of the \(p\) -values is uniformly distributed in \([0,1]\) . Consequently, the empirical cumulative distribution of the sample \((p_1, ..., p_m)\) is expected to be close to the line \(F(t)=t\) . By symmetry, the same applies to \((1 - p_1, ..., 1 - p_m)\) . When (without loss of generality) the first \(m_0\) null hypotheses are true and the other \(m-m_0\) are false, the empirical cumulative distribution of \((1-p_1, ..., 1-p_{m_0})\) is again expected to be close to the line \(F_0(t)=t\) . The empirical cumulative distribution of \((1-p_{m_0+1}, ..., 1-p_{m})\) , on the other hand, is expected to be close to a function \(F_1(t)\) which stays below \(F_0\) but shows a steep increase towards 1 as \(t\) approaches \(1\) . In practice, we do not know which of the null hypotheses are true, so we only observe a mixture whose empirical cumulative distribution is expected to be close to

\[ F(t) = \frac{m_0}{m} F_0(t) + \frac{m-m_0}{m} F_1(t). \tag{6.9}\]

Such a situation is shown in Figure 6.14. If \(F_1(t)/F_0(t)\) is small for small \(t\) (i.e., the tests have reasonable power), then the mixture fraction \(\frac{m_0}{m}\) can be estimated by fitting a line to the left-hand portion of the plot and then noting its height on the right. Such a fit is shown by the red line. Here, we focus on those tests for which the count data are not all very small numbers (baseMean>=1), since for these the p-value null distribution is sufficiently close to uniform (i.e., does not show the discreteness mentioned above), but you could try making the same plot on all of the genes.
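As a sketch of how such a plot can be produced in R (assuming, as above, that awde is the table of per-gene results with columns pvalue and baseMean; the name awdef for the filtered table is taken from the text below):

    library("dplyr")
    # keep the tests whose p-value null distribution is close to uniform
    awdef <- filter(awde, baseMean >= 1)
    pv <- awdef$pvalue
    m  <- length(pv)
    # N(p): for each p, the number of p-values greater than p
    N  <- m - rank(pv, ties.method = "max")
    plot(1 - pv, N, pch = ".", xlab = "1 - p", ylab = "N(p)")

The fraction \(m_0/m\) can then be read off from a straight line fitted to the left-hand part of this graph.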

[Figure 6.14: Schweder and Spjøtvoll plot for the awde p-values, with a straight line (red) fitted to the left-hand portion of the graph.]

There are 22853 rows in awdef; thus, according to this simple estimate, there are 22853 − 17302 = 5551 alternative hypotheses.

6.10 The local FDR

[Figure 6.15: The multiple testing opportunity.]

While the xkcd cartoon in the chapter’s opening figure ends with a rather sinister interpretation of the multiple testing problem as a way to accumulate errors, Figure  6.15 highlights the multiple testing opportunity: when we do many tests, we can use the multiplicity to increase our understanding beyond what’s possible with a single test.


Let’s get back to the histogram in Figure  6.12 . Conceptually, we can think of it in terms of the so-called two-groups model ( Efron 2010 ) :

\[ f(p)= \pi_0 + (1-\pi_0) f_{\text{alt}}(p), \tag{6.10}\]

Here, \(f(p)\) is the density of the distribution (what the histogram would look like with an infinite amount of data and infinitely small bins), \(\pi_0\) is a number between 0 and 1 that represents the size of the uniform component, and \(f_{\text{alt}}\) is the alternative component. This is a mixture model, as we already saw in Chapter 4 . The mixture densities and the marginal density \(f(p)\) are visualized in the upper panel of Figure  6.16 : the blue areas together correspond to the graph of \(f_{\text{alt}}(p)\) , the grey areas to that of \(f_{\text{null}}(p) = \pi_0\) . If we now consider one particular cutoff \(p\) (say, \(p=0.1\) as in Figure  6.16 ), then we can compute the probability that a hypothesis that we reject at this cutoff is a false positive, as follows. We decompose the value of \(f\) at the cutoff (red line) into the contribution from the nulls (light red, \(\pi_0\) ) and from the alternatives (darker red, \((1-\pi_0) f_{\text{alt}}(p)\) ). The local false discovery rate is then

\[ \text{fdr}(p) = \frac{\pi_0}{f(p)}. \tag{6.11}\]

By definition this quantity is between 0 and 1. Note how the \(\text{fdr}\) in Figure  6.16 is a monotonically increasing function of \(p\) , and this goes with our intuition that the fdr should be lowest for the smallest \(p\) and then gradually get larger, until it reaches 1 at the very right end. We can make a similar decomposition not only for the red line, but also for the area under the curve. This is

\[ F(p) = \int_0^p f(t)\,dt, \tag{6.12}\]

and the ratio of the dark grey area (that is, \(\pi_0\) times \(p\) ) to the overall area \(F(p)\) is the tail area false discovery rate (Fdr 21 ),

21  The convention is to use the lower case abbreviation fdr for the local, and the abbreviation Fdr for the tail-area false discovery rate in the context of the two-groups model Equation  6.10 . The abbreviation FDR is used for the original definition Equation  6.7 , which is a bit more general, namely, it does not depend on the modelling assumptions of Equation  6.10 .

\[ \text{Fdr}(p) = \frac{\pi_0\,p}{F(p)}. \tag{6.13}\]

We’ll use the data version of \(F\) for diagnostics in Figure  6.20 .
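A hedged sketch of that data version: the empirical cumulative distribution function of the p-values stands in for \(F\), and an estimate of \(\pi_0\) (called pi0 here; for instance fdrtool's eta0, see below) has to be plugged in. The function name Fdr_hat is ours, for illustration:

    # Empirical tail-area FDR at cutoff p, given p-values pv and an
    # estimate pi0 of the null fraction (both assumed to be available):
    Fdr_hat <- function(p, pv, pi0) pi0 * p / ecdf(pv)(p)
    # e.g. Fdr_hat(0.1, awde$pvalue, pi0)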

The packages qvalue and fdrtool offer facilities to fit these models to data.

In fdrtool , what we called \(\pi_0\) above is called eta0 :
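In sketch form, such a fit looks as follows (we assume the p-values are in awde$pvalue, as before; fdrtool also produces diagnostic plots by default):

    library("fdrtool")
    ft <- fdrtool(awde$pvalue, statistic = "pvalue")
    ft$param[, "eta0"]   # the estimate of what we called pi0 above
    # ft$lfdr contains the estimated local false discovery rates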

Question 6.19 What do the plots that are produced by the above call to fdrtool show?

Explore the other elements of the list ft .

Question 6.20 What does the empirical in empirical Bayes methods stand for?

6.10.1 Local versus total

The FDR (or the Fdr) is a set property. It is a single number that applies to a whole set of rejections made in the course of a multiple testing analysis. In contrast, the fdr is a local property. It applies to an individual hypothesis. Recall Figure  6.16 , where the fdr was computed for each point along the \(x\) -axis of the density plot, whereas the Fdr depends on the areas to the left of the red line.

Question 6.21 Check out the concepts of total cost and marginal cost in economics. Can you see an analogy with Fdr and fdr?

For a production process that produces a set of \(m\) products, the total cost is the sum of all the costs involved. The average cost of a product is a hypothetical quantity, computed as the total cost divided by \(m\). The marginal cost is the cost of making one additional product, and is often very different from the average cost. For instance, learning to play a single Beethoven sonata on the piano may take an uninitiated person a substantial amount of time, but then playing it once more requires comparatively little additional effort: the marginal costs are much less than the fixed (and thus the total) costs. An example of marginal costs that are higher than the average costs is running: putting on your shoes and going out for a 10km run may be quite tolerable (perhaps even fun) to most people, whereas each additional 10km adds disproportionate discomfort.

6.10.2 Terminology

Historically, the terms multiple testing correction and adjusted p-value have been used for the process and its output. In the context of false discovery rates, these terms are unhelpful, if not confusing. We advocate avoiding them. They imply that we start out with a set of p-values \((p_1,...,p_m)\), apply some canonical procedure, and obtain a set of “corrected” or “adjusted” p-values \((p_1^{\text{adj}},...,p_m^{\text{adj}})\). However, the output of the Benjamini-Hochberg method is not p-values, and neither are the FDR, Fdr or fdr. Remember that FDR and Fdr are set properties, and associating them with an individual test makes as much sense as confusing average and marginal costs. Fdr and fdr also depend on a substantial amount of modelling assumptions. In the next section, you will also see that the method of Benjamini-Hochberg is not the only game in town, and that there are important and useful extensions, which further displace any putative direct correspondence between the set of hypotheses and p-values that are input into a multiple testing procedure, and its outputs.

6.11 Independent hypothesis weighting

The Benjamini-Hochberg method and the two-groups model, as we have seen them so far, implicitly assume exchangeability of the hypotheses: all we use are the p-values. Beyond these, we do not take into account any additional information. This is not always optimal, and here we'll study ways to improve on this.

Let’s look at an example. Intuitively, the signal-to-noise ratio for genes with larger numbers of reads mapped to them should be better than for genes with few reads, and that should affect the power of our tests. We look at the mean of normalized counts across observations. In the DESeq2 package this quantity is called the baseMean .

Next we produce the histogram of this quantity across genes, and plot it against the p-values (Figures 6.17 and 6.18 ).
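A sketch of the two plots (assuming the results table awde with columns baseMean and pvalue, as above):

    library("ggplot2")
    # histogram of the asinh-transformed baseMean
    ggplot(awde, aes(x = asinh(baseMean))) +
      geom_histogram(bins = 60)
    # p-values against the rank of baseMean
    ggplot(awde, aes(x = rank(baseMean), y = -log10(pvalue))) +
      geom_point(size = 0.1, alpha = 0.3)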

[Figures 6.17 and 6.18: Histogram of \(\text{asinh}\)-transformed baseMean values, and scatterplot of \(-\log_{10}\) p-values against the rank of baseMean.]

Question 6.22 Why did we use the \(\text{asinh}\) transformation for the histogram? How does it look with no transformation, the logarithm, the shifted logarithm, i.e., \(\log(x+\text{const.})\)?

Question 6.23 In the scatterplot, why did we use \(-\log_{10}\) for the p-values? Why the rank transformation for the baseMean ?

For convenience, we discretize baseMean into a factor variable stratum, which corresponds to six equal-sized groups.
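A minimal sketch of this discretization (quantile breakpoints make the six groups equal-sized):

    library("dplyr")
    awde <- mutate(awde, stratum = cut(baseMean,
      breaks = quantile(baseMean, probs = seq(0, 1, length.out = 7)),
      include.lowest = TRUE))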

In Figures 6.19 and 6.20 we see the histograms of p-values and the ECDFs stratified by stratum .

[Figures 6.19 and 6.20: Histograms of the p-values and their ECDFs, stratified by stratum.]

If we were to fit the two-groups model to these strata separately, we would get quite different estimates for \(\pi_0\) and \(f_{\text{alt}}\). For the most lowly expressed genes, the power of the DESeq2 test is low, and the p-values essentially all come from the null component. As we go higher in average expression, the height of the small-p-value peak in the histograms increases, reflecting the increasing power of the test.

Can we use that to improve our handling of the multiple testing? It turns out that this is possible. One approach is independent hypothesis weighting (IHW) ( Ignatiadis et al. 2016 ; Ignatiadis and Huber 2021 ) 22 .

22  There are a number of other approaches, see e.g., a benchmark study by Korthauer et al. ( 2019 ) or the citations in the paper by Ignatiadis and Huber ( 2021 ) .

Let’s compare this to what we get from the ordinary (unweighted) Benjamini-Hochberg method:
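In sketch form, the comparison looks as follows (the IHW package provides the ihw function and the rejections accessor; column names as in awde above):

    library("IHW")
    # weighted multiple testing, with baseMean as the covariate
    ihw_res <- ihw(pvalue ~ baseMean, data = awde, alpha = 0.1)
    rejections(ihw_res)
    # ordinary Benjamini-Hochberg at the same nominal FDR
    sum(p.adjust(awde$pvalue, method = "BH") < 0.1, na.rm = TRUE)
    # plot(ihw_res) displays the learned weights (cf. Figure 6.21)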

With hypothesis weighting, we get more rejections. For these data, the difference is notable though not spectacular; this is because their signal-to-noise ratio is already quite high. In other situations, where there is less power to begin with (e.g., where there are fewer replicates, the data are more noisy, or the effect of the treatment is less drastic), the difference from using IHW can be more pronounced.

We can have a look at the weights determined by the ihw function ( Figure  6.21 ).

[Figure 6.21: The weights determined by the ihw function, as a function of baseMean.]

Intuitively, what happens here is that IHW chooses to put more weight on the hypothesis strata with higher baseMean, and low weight on those with very low counts. The Benjamini-Hochberg method has a certain type-I error budget, and rather than spreading it equally among all hypotheses, here we take it away from those strata that have little chance of a small fdr anyway, and “invest” it in strata where many hypotheses can be rejected at small fdr.

Question 6.24 Why does Figure  6.21 show 5 curves, rather than only one?

Such possibilities for stratification by an additional summary statistic besides the p-value—in our case, the baseMean —exist in many multiple testing situations. Informally, we need such a so-called covariate to be

  • statistically independent from our p-values under the null, but
  • informative of the prior probability \(\pi_0\) and/or the power of the test (the shape of the alternative density, \(f_{\text{alt}}\)) in the two-groups model.

These requirements can be assessed through diagnostic plots as in Figures 6.17–6.20.

6.12 Summary of this chapter

We have explored the concepts behind single hypothesis testing and then moved on to multiple testing. We have seen how some of the limitations of interpreting a single p-value from a single test can be overcome once we are able to consider a whole distribution of outcomes from many tests. We have also seen that there are often additional summary statistics of our data besides the p-values. We called them informative covariates, and we saw how we can use them to weight the p-values and overall get more (or better) discoveries.

The usage of hypothesis testing in the multiple testing scenario is quite different from that in the single test case: for the latter, the hypothesis test might literally be the final result, the culmination of a long and expensive data acquisition campaign (ideally, with a prespecified hypothesis and data analysis plan). In the multiple testing case, its outcome will often just be an intermediate step: a subset of most worthwhile hypotheses selected by screening a large initial set. This subset is then followed up by more careful analyses.

We have seen the concept of the false discovery rate (FDR). It is important to keep in mind that this is an average property, for the subset of hypotheses that were selected. Like other averages, it does not say anything about the individual hypotheses. Then there is the concept of the local false discovery rate (fdr), which indeed does apply to an individual hypothesis. The local false discovery rate is, however, quite unrelated to the p-value, as the two-groups model showed us. Much of the confusion and frustration about p-values seems to come from the fact that people would like to use them for purposes that the fdr is made for. It is perhaps a historical aberration that so much of applied science focuses on p-values and not on the local false discovery rate. On the other hand, there are also practical reasons, since a p-value is readily computed, whereas an fdr is difficult to estimate or control from data without making strong modelling assumptions.

We saw the importance of diagnostic plots, in particular, to always look at the p-value histograms when encountering a multiple testing analysis.

6.13 Further reading

A comprehensive textbook treatment of multiple testing is given by Efron (2010).

Outcome switching in clinical trials: http://compare-trials.org

For hypothesis weighting: the IHW vignette, the IHW paper (Ignatiadis et al. 2016) and the references therein.

6.14 Exercises

Exercise 6.1  

Identify an application from your scientific field of expertise that relies on multiple testing. Find an exemplary dataset and plot the histogram of p-values. Are the hypotheses all exchangeable, or is there one or more informative covariates? Plot the stratified histograms.

Exercise 6.2  

Why do mathematical statisticians focus so much on the null hypothesis of a test, compared to the alternative hypothesis?

Exercise 6.3  

How can we ever prove that the null hypothesis is true? Or that the alternative is true?

Exercise 6.4  

Make a less extreme example of correlated test statistics than the data duplication at the end of Section 6.5 . Simulate data with true null hypotheses only, and let the data morph from having completely independent replicates (columns) to highly correlated as a function of some continuous-valued control parameter. Check type-I error control (e.g., with the p-value histogram) as a function of this control parameter.

Exercise 6.5  

Find an example in the published literature that looks as if p-value hacking, outcome switching, or HARKing played a role.

Exercise 6.6  

The FDR is an expectation value, i.e., it is used if we want to control the average behavior of a procedure. Are there methods for worst case control?

Exercise 6.7  

What is the memory and time complexity of the Benjamini-Hochberg algorithm? How about the IHW method? Can you express them as polynomial functions of the number of tests \(m\)? Hint: Simulate data with increasing numbers of hypothesis tests, measure time and memory consumption with functions such as pryr::object_size or microbenchmark from the eponymous package, and plot these against \(m\) in a double-logarithmic plot.


An Introduction to Statistics: Understanding Hypothesis Testing and Statistical Errors

Priya Ranganathan and C. S. Pramesh

1 Department of Anesthesiology, Critical Care and Pain, Tata Memorial Hospital, Mumbai, Maharashtra, India

2 Department of Surgical Oncology, Tata Memorial Centre, Mumbai, Maharashtra, India

The second article in this series on biostatistics covers the concepts of sample, population, research hypotheses and statistical errors.

How to cite this article

Ranganathan P, Pramesh CS. An Introduction to Statistics: Understanding Hypothesis Testing and Statistical Errors. Indian J Crit Care Med 2019;23(Suppl 3):S230–S231.

Two papers quoted in this issue of the Indian Journal of Critical Care Medicine report the results of studies that aim to prove that a new intervention is better than (superior to) an existing treatment. In the ABLE study, the investigators wanted to show that transfusion of fresh red blood cells would be superior to standard-issue red cells in reducing 90-day mortality in ICU patients. 1 The PROPPR study was designed to prove that transfusion of a lower ratio of plasma and platelets to red cells would be superior to a higher ratio in decreasing 24-hour and 30-day mortality in critically ill patients. 2 These studies are known as superiority studies (as opposed to noninferiority or equivalence studies, which will be discussed in a subsequent article).

SAMPLE VERSUS POPULATION

A sample represents a group of participants selected from the entire population. Since studies cannot be carried out on entire populations, researchers choose samples, which are representative of the population. This is similar to walking into a grocery store and examining a few grains of rice or wheat before purchasing an entire bag; we assume that the few grains that we select (the sample) are representative of the entire sack of grains (the population).

The results of the study are then extrapolated to generate inferences about the population. We do this using a process known as hypothesis testing. This means that the results of the study may not always be identical to the results we would expect to find in the population; i.e., there is the possibility that the study results may be erroneous.

HYPOTHESIS TESTING

A clinical trial begins with an assumption or belief, and then proceeds to either prove or disprove this assumption. In statistical terms, this belief or assumption is known as a hypothesis. Counterintuitively, what the researcher believes in (or is trying to prove) is called the “alternate” hypothesis, and the opposite is called the “null” hypothesis; every study has a null hypothesis and an alternate hypothesis. For superiority studies, the alternate hypothesis states that one treatment (usually the new or experimental treatment) is superior to the other; the null hypothesis states that there is no difference between the treatments (the treatments are equal). For example, in the ABLE study, we start by stating the null hypothesis: there is no difference in mortality between groups receiving fresh RBCs and standard-issue RBCs. We then state the alternate hypothesis: there is a difference between groups receiving fresh RBCs and standard-issue RBCs. It is important to note that we have stated that the groups are different, without specifying which group will be better than the other. This is known as a two-tailed hypothesis, and it allows us to test for superiority on either side (using a two-sided test). This is because, when we start a study, we are not 100% certain that the new treatment can only be better than the standard treatment; it could be worse, and if it is so, the study should pick that up as well. One-tailed hypotheses and one-sided statistical testing are used for noninferiority studies, which will be discussed in a subsequent paper in this series.

STATISTICAL ERRORS

There are two possibilities to consider when interpreting the results of a superiority study. The first possibility is that there is truly no difference between the treatments, but the study finds that they are different. This is called a type I error, false-positive error, or alpha error. This means falsely rejecting the null hypothesis.

The second possibility is that there is a difference between the treatments and the study does not pick up this difference. This is called a type II error, false-negative error, or beta error. This means falsely accepting the null hypothesis.

The power of the study is the ability to detect a difference between groups and is the converse of the beta error; i.e., power = 1-beta error. Alpha and beta errors are finalized when the protocol is written and form the basis for sample size calculation for the study. In an ideal world, we would not like any error in the results of our study; however, we would need to do the study in the entire population (infinite sample size) to be able to get a 0% alpha and beta error. These two errors enable us to do studies with realistic sample sizes, with the compromise that there is a small possibility that the results may not always reflect the truth. The basis for this will be discussed in a subsequent paper in this series dealing with sample size calculation.

Conventionally, the type I or alpha error is set at 5%. This means that, at the end of the study, if there is a difference between groups, we want to be 95% certain that this is a true difference, and we allow only a 5% probability that this difference has occurred by chance (false positive). The type II or beta error is usually set between 10% and 20%; therefore, the power of the study is 90% or 80%. This means that if there is a difference between groups, we want to be 80% (or 90%) certain that the study will detect that difference. For example, in the ABLE study, the sample size was calculated with a type I error of 5% (two-sided) and a power of 90% (type II error of 10%). 1

Table 1 gives a summary of the two types of statistical errors, with an example.

Table 1: Statistical errors

(a) Types of statistical errors

  • Null hypothesis is actually true
    - Study concludes the null hypothesis is true: correct result!
    - Study concludes the null hypothesis is false: type I error (falsely rejecting the null hypothesis)
  • Null hypothesis is actually false
    - Study concludes the null hypothesis is true: type II error (falsely accepting the null hypothesis)
    - Study concludes the null hypothesis is false: correct result!

(b) Possible statistical errors in the ABLE trial

  • Truth: there is no difference in mortality between groups receiving fresh RBCs and standard-issue RBCs
    - Study finds no difference: correct result!
    - Study finds a difference: type I error (falsely rejecting the null hypothesis)
  • Truth: there is a difference in mortality between groups receiving fresh RBCs and standard-issue RBCs
    - Study finds no difference: type II error (falsely accepting the null hypothesis)
    - Study finds a difference: correct result!

In the next article in this series, we will look at the meaning and interpretation of the ‘p’ value and confidence intervals for hypothesis testing.

Source of support: Nil

Conflict of interest: None


Genetics and Statistical Analysis


Once you have performed an experiment, how can you tell if your results are significant? For example, say that you are performing a genetic cross in which you know the genotypes of the parents. In this situation, you might hypothesize that the cross will result in a certain ratio of phenotypes in the offspring . But what if your observed results do not exactly match your expectations? How can you tell whether this deviation was due to chance? The key to answering these questions is the use of statistics , which allows you to determine whether your data are consistent with your hypothesis.

Forming and Testing a Hypothesis

The first thing any scientist does before performing an experiment is to form a hypothesis about the experiment's outcome. This often takes the form of a null hypothesis , which is a statistical hypothesis that states there will be no difference between observed and expected data. The null hypothesis is proposed by a scientist before completing an experiment, and it can be either supported by data or disproved in favor of an alternate hypothesis.

Let's consider some examples of the use of the null hypothesis in a genetics experiment. Remember that Mendelian inheritance deals with traits that show discontinuous variation, which means that the phenotypes fall into distinct categories. As a consequence, in a Mendelian genetic cross, the null hypothesis is usually an extrinsic hypothesis ; in other words, the expected proportions can be predicted and calculated before the experiment starts. Then an experiment can be designed to determine whether the data confirm or reject the hypothesis. On the other hand, in another experiment, you might hypothesize that two genes are linked. This is called an intrinsic hypothesis , which is a hypothesis in which the expected proportions are calculated after the experiment is done using some information from the experimental data (McDonald, 2008).

How Math Merged with Biology

But how did mathematics and genetics come to be linked through the use of hypotheses and statistical analysis? The key figure in this process was Karl Pearson, a turn-of-the-century mathematician who was fascinated with biology. When asked what his first memory was, Pearson responded by saying, "Well, I do not know how old I was, but I was sitting in a high chair and I was sucking my thumb. Someone told me to stop sucking it and said that if I did so, the thumb would wither away. I put my two thumbs together and looked at them a long time. ‘They look alike to me,' I said to myself, ‘I can't see that the thumb I suck is any smaller than the other. I wonder if she could be lying to me'" (Walker, 1958). As this anecdote illustrates, Pearson was perhaps born to be a scientist. He was a sharp observer and intent on interpreting his own data. During his career, Pearson developed statistical theories and applied them to the exploration of biological data. His innovations were not well received, however, and he faced an arduous struggle in convincing other scientists to accept the idea that mathematics should be applied to biology. For instance, during Pearson's time, the Royal Society, which is the United Kingdom's academy of science, would accept papers that concerned either mathematics or biology, but it refused to accept papers that concerned both subjects (Walker, 1958). In response, Pearson, along with Francis Galton and W. F. R. Weldon, founded a new journal called Biometrika in 1901 to promote the statistical analysis of data on heredity. Pearson's persistence paid off. Today, statistical tests are essential for examining biological data.

Pearson's Chi-Square Test for Goodness-of-Fit

One of Pearson's most significant achievements occurred in 1900, when he developed a statistical test called Pearson's chi-square (Χ 2 ) test, also known as the chi-square test for goodness-of-fit (Pearson, 1900). Pearson's chi-square test is used to examine the role of chance in producing deviations between observed and expected values. The test depends on an extrinsic hypothesis, because it requires theoretical expected values to be calculated. The test indicates the probability that chance alone produced the deviation between the expected and the observed values (Pierce, 2005). When the probability calculated from Pearson's chi-square test is high, it is assumed that chance alone produced the difference. Conversely, when the probability is low, it is assumed that a significant factor other than chance produced the deviation.

In 1912, J. Arthur Harris applied Pearson's chi-square test to examine Mendelian ratios (Harris, 1912). It is important to note that when Gregor Mendel studied inheritance, he did not use statistics, and neither did Bateson, Saunders, Punnett, and Morgan during their experiments that discovered genetic linkage . Thus, until Pearson's statistical tests were applied to biological data, scientists judged the goodness of fit between theoretical and observed experimental results simply by inspecting the data and drawing conclusions (Harris, 1912). Although this method can work perfectly if one's data exactly matches one's predictions, scientific experiments often have variability associated with them, and this makes statistical tests very useful.

The chi-square value is calculated using the following formula:

\[ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \]

Using this formula, the difference between the observed and expected frequencies is calculated for each experimental outcome category. The difference is then squared and divided by the expected frequency . Finally, the chi-square values for each outcome are summed together, as represented by the summation sign (Σ).

Pearson's chi-square test works well with genetic data as long as there are enough expected values in each group. In the case of small samples (less than 10 in any category) that have 1 degree of freedom, the test is not reliable. (Degrees of freedom, or df, will be explained in full later in this article.) However, in such cases, the test can be corrected by using the Yates correction for continuity, which reduces the absolute value of each difference between observed and expected frequencies by 0.5 before squaring. Additionally, it is important to remember that the chi-square test can only be applied to numbers of progeny , not to proportions or percentages.

Now that you know the rules for using the test, it's time to consider an example of how to calculate Pearson's chi-square. Recall that when Mendel crossed his pea plants, he learned that tall (T) was dominant to short (t). You want to confirm that this is correct, so you start by formulating the following null hypothesis: In a cross between two heterozygote (Tt) plants, the offspring should occur in a 3:1 ratio of tall plants to short plants. Next, you cross the plants, and after the cross, you measure the characteristics of 400 offspring. You note that there are 305 tall pea plants and 95 short pea plants; these are your observed values. Meanwhile, you expect that there will be 300 tall plants and 100 short plants from the Mendelian ratio.

You are now ready to perform statistical analysis of your results, but first, you have to choose a critical value at which to reject your null hypothesis. You opt for a critical value probability of 0.01 (1%) that the deviation between the observed and expected values is due to chance. This means that if the probability is less than 0.01, then the deviation is significant and not due to chance, and you will reject your null hypothesis. However, if the probability is greater than 0.01, then the deviation is not significant and you will not reject the null hypothesis.

So, should you reject your null hypothesis or not? Here's a summary of your observed and expected data:

Expected: 300 tall plants, 100 short plants
Observed: 305 tall plants, 95 short plants

Now, let's calculate Pearson's chi-square:

  • For tall plants: Χ 2 = (305 - 300) 2 / 300 = 0.08
  • For short plants: Χ 2 = (95 - 100) 2 / 100 = 0.25
  • The sum of the two categories is 0.08 + 0.25 = 0.33
  • Therefore, the overall Pearson's chi-square for the experiment is Χ 2 = 0.33

Next, you determine the probability that is associated with your calculated chi-square value. To do this, you compare your calculated chi-square value with theoretical values in a chi-square table that has the same number of degrees of freedom. Degrees of freedom represent the number of ways in which the observed outcome categories are free to vary. For Pearson's chi-square test, the degrees of freedom are equal to n - 1, where n represents the number of different expected phenotypes (Pierce, 2005). In your experiment, there are two expected outcome phenotypes (tall and short), so n = 2 categories, and the degrees of freedom equal 2 - 1 = 1. Thus, with your calculated chi-square value (0.33) and the associated degrees of freedom (1), you can determine the probability by using a chi-square table (Table 1).

Table 1: Chi-Square Table

df      0.995   0.99    0.975   0.95    0.90    0.10    0.05    0.025   0.01    0.005
1       ---     ---     0.001   0.004   0.016   2.706   3.841   5.024   6.635   7.879
2       0.010   0.020   0.051   0.103   0.211   4.605   5.991   7.378   9.210   10.597
3       0.072   0.115   0.216   0.352   0.584   6.251   7.815   9.348   11.345  12.838
4       0.207   0.297   0.484   0.711   1.064   7.779   9.488   11.143  13.277  14.860
5       0.412   0.554   0.831   1.145   1.610   9.236   11.070  12.833  15.086  16.750
6       0.676   0.872   1.237   1.635   2.204   10.645  12.592  14.449  16.812  18.548
7       0.989   1.239   1.690   2.167   2.833   12.017  14.067  16.013  18.475  20.278
8       1.344   1.646   2.180   2.733   3.490   13.362  15.507  17.535  20.090  21.955
9       1.735   2.088   2.700   3.325   4.168   14.684  16.919  19.023  21.666  23.589
10      2.156   2.558   3.247   3.940   4.865   15.987  18.307  20.483  23.209  25.188
11      2.603   3.053   3.816   4.575   5.578   17.275  19.675  21.920  24.725  26.757
12      3.074   3.571   4.404   5.226   6.304   18.549  21.026  23.337  26.217  28.300
13      3.565   4.107   5.009   5.892   7.042   19.812  22.362  24.736  27.688  29.819
14      4.075   4.660   5.629   6.571   7.790   21.064  23.685  26.119  29.141  31.319
15      4.601   5.229   6.262   7.261   8.547   22.307  24.996  27.488  30.578  32.801
16      5.142   5.812   6.908   7.962   9.312   23.542  26.296  28.845  32.000  34.267
17      5.697   6.408   7.564   8.672   10.085  24.769  27.587  30.191  33.409  35.718
18      6.265   7.015   8.231   9.390   10.865  25.989  28.869  31.526  34.805  37.156
19      6.844   7.633   8.907   10.117  11.651  27.204  30.144  32.852  36.191  38.582
20      7.434   8.260   9.591   10.851  12.443  28.412  31.410  34.170  37.566  39.997
21      8.034   8.897   10.283  11.591  13.240  29.615  32.671  35.479  38.932  41.401
22      8.643   9.542   10.982  12.338  14.041  30.813  33.924  36.781  40.289  42.796
23      9.260   10.196  11.689  13.091  14.848  32.007  35.172  38.076  41.638  44.181
24      9.886   10.856  12.401  13.848  15.659  33.196  36.415  39.364  42.980  45.559
25      10.520  11.524  13.120  14.611  16.473  34.382  37.652  40.646  44.314  46.928
26      11.160  12.198  13.844  15.379  17.292  35.563  38.885  41.923  45.642  48.290
27      11.808  12.879  14.573  16.151  18.114  36.741  40.113  43.195  46.963  49.645
28      12.461  13.565  15.308  16.928  18.939  37.916  41.337  44.461  48.278  50.993
29      13.121  14.256  16.047  17.708  19.768  39.087  42.557  45.722  49.588  52.336
30      13.787  14.953  16.791  18.493  20.599  40.256  43.773  46.979  50.892  53.672
40      20.707  22.164  24.433  26.509  29.051  51.805  55.758  59.342  63.691  66.766
50      27.991  29.707  32.357  34.764  37.689  63.167  67.505  71.420  76.154  79.490
60      35.534  37.485  40.482  43.188  46.459  74.397  79.082  83.298  88.379  91.952
70      43.275  45.442  48.758  51.739  55.329  85.527  90.531  95.023  100.425 104.215
80      51.172  53.540  57.153  60.391  64.278  96.578  101.879 106.629 112.329 116.321
90      59.196  61.754  65.647  69.126  73.291  107.565 113.145 118.136 124.116 128.299
100     67.328  70.065  74.222  77.929  82.358  118.498 124.342 129.561 135.807 140.169

(Table adapted from Jones, 2008)

Note that the chi-square table is organized with degrees of freedom (df) in the left column and probabilities (P) at the top. The chi-square values associated with the probabilities are in the center of the table. To determine the probability, first locate the row for the degrees of freedom for your experiment, then determine where the calculated chi-square value would be placed among the theoretical values in the corresponding row.

At the beginning of your experiment, you decided that if the probability was less than 0.01, you would reject your null hypothesis because the deviation would be significant and not due to chance. Now, looking at the row that corresponds to 1 degree of freedom, you see that your calculated chi-square value of 0.33 falls between 0.016, which is associated with a probability of 0.9, and 2.706, which is associated with a probability of 0.10. Therefore, there is between a 10% and 90% probability that the deviation you observed between your expected and the observed numbers of tall and short plants is due to chance. In other words, the probability associated with your chi-square value is much greater than the critical value of 0.01. This means that we will not reject our null hypothesis, and the deviation between the observed and expected results is not significant.
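As a quick check, the same computation can be done with R's built-in chi-square test (a sketch, using the observed counts and the 3:1 expected proportions from the example above):

    # observed counts and Mendelian expected proportions
    res <- chisq.test(c(tall = 305, short = 95), p = c(3/4, 1/4))
    res$statistic   # X-squared = 0.33 on 1 degree of freedom
    res$p.value     # about 0.56, far above the 0.01 cutoff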

Level of Significance

Determining whether to accept or reject a hypothesis is decided by the experimenter, who is the person who chooses the "level of significance" or confidence. Scientists commonly use the 0.05, 0.01, or 0.001 probability levels as cut-off values. For instance, in the example experiment, you used the 0.01 probability. Thus, P ≥ 0.01 can be interpreted to mean that chance likely caused the deviation between the observed and the expected values (i.e. there is a greater than 1% probability that chance explains the data). If instead we had observed that P ≤ 0.01, this would mean that there is less than a 1% probability that our data can be explained by chance. There is a significant difference between our expected and observed results, so the deviation must be caused by something other than chance.

References and Recommended Reading

Harris, J. A. A simple test of the goodness of fit of Mendelian ratios. American Naturalist 46 , 741–745 (1912)

Jones, J. "Table: Chi-Square Probabilities." http://people.richland.edu/james/lecture/m170/tbl-chi.html (2008) (accessed July 7, 2008)

McDonald, J. H. Chi-square test for goodness-of-fit. From The Handbook of Biological Statistics . http://udel.edu/~mcdonald/statchigof.html (2008) (accessed June 9, 2008)

Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine 50, 157–175 (1900)

Pierce, B. Genetics: A Conceptual Approach (New York, Freeman, 2005)

Walker, H. M. The contributions of Karl Pearson. Journal of the American Statistical Association 53 , 11–22 (1958)



Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H0) and alternate hypothesis (Ha or H1).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Other interesting articles
  • Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H0: Men are, on average, not taller than women.
  • Ha: Men are, on average, taller than women.


For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

In the height example, you would compare the two groups with a t test, which gives you:

  • an estimate of the difference in average height between the two groups.
  • a p-value showing how likely you are to see this difference if the null hypothesis of no difference is true.
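As an illustration, such a test can be run in R. This is a sketch with simulated heights, since the original dataset is not shown:

    set.seed(1)
    men   <- rnorm(100, mean = 178, sd = 7)   # simulated heights in cm
    women <- rnorm(100, mean = 165, sd = 7)
    # one-tailed two-sample t test of Ha: men are taller than women
    t.test(men, women, alternative = "greater")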

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article


Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved August 12, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/



Hypothesis Examples


A hypothesis is a prediction of the outcome of a test. It forms the basis for designing an experiment in the scientific method . A good hypothesis is testable, meaning it makes a prediction you can check with observation or experimentation. Here are different hypothesis examples.

Null Hypothesis Examples

The null hypothesis (H 0 ) is also known as the zero-difference or no-difference hypothesis. It predicts that changing one variable ( independent variable ) will have no effect on the variable being measured ( dependent variable ). Here are null hypothesis examples:

  • Plant growth is unaffected by temperature.
  • Increasing the temperature has no effect on the solubility of salt.
  • Incidence of skin cancer is unrelated to ultraviolet light exposure.
  • All brands of light bulb last equally long.
  • Cats have no preference for the color of cat food.
  • All daisies have the same number of petals.

Sometimes a null hypothesis is stated when there is a suspected relationship between two variables. For example, if you think plant growth is affected by temperature, you state the null hypothesis: “Plant growth is not affected by temperature.” Why do you do this, rather than say “If you change temperature, plant growth will be affected”? The answer is that it is easier to apply a statistical test that shows, with a high level of confidence, whether a null hypothesis is correct or incorrect.

Research Hypothesis Examples

A research hypothesis (H 1 ) is a type of hypothesis used to design an experiment. This type of hypothesis is often written as an if-then statement because it is easy to identify the independent and dependent variables and to see how one affects the other. If-then statements explore cause and effect. In other cases, the hypothesis shows a correlation between two variables. Here are some research hypothesis examples:

  • If you leave the lights on, then it takes longer for people to fall asleep.
  • If you refrigerate apples, they last longer before going bad.
  • If you keep the curtains closed, then you need less electricity to heat or cool the house (the electric bill is lower).
  • If you leave a bucket of water uncovered, then it evaporates more quickly.
  • Goldfish lose their color if they are not exposed to light.
  • Workers who take vacations are more productive than those who never take time off.

Is It Okay to Disprove a Hypothesis?

Yes! You may even choose to write your hypothesis in such a way that it can be disproved because it’s easier to prove a statement is wrong than to prove it is right. In other cases, if your prediction is incorrect, that doesn’t mean the science is bad. Revising a hypothesis is common. It demonstrates you learned something you did not know before you conducted the experiment.



Module 1: Introduction to Biology

Experiments and Hypotheses

Learning Outcomes

  • Form a hypothesis and use it to design a scientific experiment

Now we’ll focus on the methods of scientific inquiry. Science often involves making observations and developing hypotheses. Experiments and further observations are often used to test the hypotheses.

A scientific experiment is a carefully organized procedure in which the scientist intervenes in a system to change something, then observes the result of the change. Scientific inquiry often involves doing experiments, though not always. For example, a scientist studying the mating behaviors of ladybugs might begin with detailed observations of ladybugs mating in their natural habitats. While this research may not be experimental, it is scientific: it involves careful and verifiable observation of the natural world. The same scientist might then treat some of the ladybugs with a hormone hypothesized to trigger mating and observe whether these ladybugs mated sooner or more often than untreated ones. This would qualify as an experiment because the scientist is now making a change in the system and observing the effects.

Forming a Hypothesis

When conducting scientific experiments, researchers develop hypotheses to guide experimental design. A hypothesis is a suggested explanation that is both testable and falsifiable. You must be able to test your hypothesis through observations and research, and it must be possible to prove your hypothesis false.

For example, Michael observes that maple trees lose their leaves in the fall. He might then propose a possible explanation for this observation: “cold weather causes maple trees to lose their leaves in the fall.” This statement is testable. He could grow maple trees in a warm enclosed environment such as a greenhouse and see if their leaves still dropped in the fall. The hypothesis is also falsifiable. If the leaves still dropped in the warm environment, then clearly temperature was not the main factor in causing maple leaves to drop in autumn.

In the Try It below, you can practice recognizing scientific hypotheses. As you consider each statement, try to think as a scientist would: can I test this hypothesis with observations or experiments? Is the statement falsifiable? If the answer to either of these questions is “no,” the statement is not a valid scientific hypothesis.

Practice Questions

Determine whether each of the following statements is a scientific hypothesis.

Air pollution from automobile exhaust can trigger symptoms in people with asthma.

  • No. This statement is not testable or falsifiable.
  • No. This statement is not testable.
  • No. This statement is not falsifiable.
  • Yes. This statement is testable and falsifiable.

Natural disasters, such as tornadoes, are punishments for bad thoughts and behaviors.

a: No. This statement is not testable or falsifiable. “Bad thoughts and behaviors” are excessively vague and subjective variables that would be impossible to measure or agree upon in a reliable way. The statement might be “falsifiable” if you came up with a counterexample: a “wicked” place that was not punished by a natural disaster. But some would question whether the people in that place were really wicked, and others would continue to predict that a natural disaster was bound to strike that place at some point. There is no reason to suspect that people’s immoral behavior affects the weather unless you bring up the intervention of a supernatural being, making this idea even harder to test.

Testing a Vaccine

Let’s examine the scientific process by discussing an actual scientific experiment conducted by researchers at the University of Washington. These researchers investigated whether a vaccine may reduce the incidence of the human papillomavirus (HPV). The experimental process and results were published in an article titled, “ A controlled trial of a human papillomavirus type 16 vaccine .”

Preliminary observations made by the researchers who conducted the HPV experiment are listed below:

  • Human papillomavirus (HPV) is the most common sexually transmitted virus in the United States.
  • There are about 40 different types of HPV. A significant number of people that have HPV are unaware of it because many of these viruses cause no symptoms.
  • Some types of HPV can cause cervical cancer.
  • About 4,000 women a year die of cervical cancer in the United States.

Practice Question

Researchers have developed a potential vaccine against HPV and want to test it. What is the first testable hypothesis that the researchers should study?

  • HPV causes cervical cancer.
  • People should not have unprotected sex with many partners.
  • People who get the vaccine will not get HPV.
  • The HPV vaccine will protect people against cancer.

Experimental Design

You’ve successfully identified a hypothesis for the University of Washington’s study on HPV: People who get the HPV vaccine will not get HPV.

The next step is to design an experiment that will test this hypothesis. There are several important factors to consider when designing a scientific experiment. First, scientific experiments must have an experimental group. This is the group that receives the experimental treatment necessary to address the hypothesis.

The experimental group receives the vaccine, but how can we know if the vaccine made a difference? Many things may change HPV infection rates in a group of people over time. To clearly show that the vaccine was effective in helping the experimental group, we need to include in our study an otherwise similar control group that does not get the treatment. We can then compare the two groups and determine if the vaccine made a difference. The control group shows us what happens in the absence of the factor under study.

However, the control group cannot get “nothing.” Instead, the control group often receives a placebo. A placebo is a procedure that has no expected therapeutic effect—such as giving a person a sugar pill or a shot containing only plain saline solution with no drug. Scientific studies have shown that the “placebo effect” can alter experimental results because when individuals are told that they are or are not being treated, this knowledge can alter their actions or their emotions, which can then alter the results of the experiment.

Moreover, if the doctor knows which group a patient is in, this can also influence the results of the experiment. Without saying so directly, the doctor may show—through body language or other subtle cues—their views about whether the patient is likely to get well. These errors can then alter the patient’s experience and change the results of the experiment. Therefore, many clinical studies are “double blind.” In these studies, neither the doctor nor the patient knows which group the patient is in until all experimental results have been collected.

Both placebo treatments and double-blind procedures are designed to prevent bias. Bias is any systematic error that makes a particular experimental outcome more or less likely. Errors can happen in any experiment: people make mistakes in measurement, instruments fail, computer glitches can alter data. But most such errors are random and don’t favor one outcome over another. Patients’ belief in a treatment can make it more likely to appear to “work.” Placebos and double-blind procedures are used to level the playing field so that both groups of study subjects are treated equally and share similar beliefs about their treatment.

The scientists who are researching the effectiveness of the HPV vaccine will test their hypothesis by separating 2,392 young women into two groups: the control group and the experimental group. Answer the following questions about these two groups.

  • a: This group is given a placebo.
  • b: This group is deliberately infected with HPV.
  • c: This group is given nothing.
  • d: This group is given the HPV vaccine.
  • Control group, answer a: This group is given a placebo. A placebo will be a shot, just like the HPV vaccine, but it will have no active ingredient. Receiving such a shot may change people’s thinking or behavior, but it will not stimulate the subjects’ immune systems in the way predicted for the vaccine itself.
  • Experimental group, answer d: This group is given the HPV vaccine. The experimental group will receive the HPV vaccine, and researchers will then be able to see whether it works by comparing this group with the control group.

Experimental Variables

A variable is a characteristic of a subject (in this case, of a person in the study) that can vary over time or among individuals. Sometimes a variable takes the form of a category, such as male or female; often a variable can be measured precisely, such as body height. Ideally, only one variable is different between the control group and the experimental group in a scientific experiment. Otherwise, the researchers will not be able to determine which variable caused any differences seen in the results. For example, imagine that the people in the control group were, on average, much more sexually active than the people in the experimental group. If, at the end of the experiment, the control group had a higher rate of HPV infection, could you confidently determine why? Maybe the experimental subjects were protected by the vaccine, but maybe they were protected by their low level of sexual contact.

To avoid this situation, experimenters make sure that their subject groups are as similar as possible in all variables except for the variable that is being tested in the experiment. This variable, or factor, will be deliberately changed in the experimental group. The one variable that is different between the two groups is called the independent variable. An independent variable is known or hypothesized to cause some outcome. Imagine an educational researcher investigating the effectiveness of a new teaching strategy in a classroom. The experimental group receives the new teaching strategy, while the control group receives the traditional strategy. It is the teaching strategy that is the independent variable in this scenario. In an experiment, the independent variable is the variable that the scientist deliberately changes or imposes on the subjects.

Dependent variables are known or hypothesized consequences; they are the effects that result from changes or differences in an independent variable. In an experiment, the dependent variables are those that the scientist measures before, during, and particularly at the end of the experiment to see if they have changed as expected. The dependent variable must be stated so that it is clear how it will be observed or measured. Rather than comparing “learning” among students (a vague concept that is difficult to measure), an educational researcher might choose to compare test scores, which are specific and easy to measure.

In any real-world example, many, many variables MIGHT affect the outcome of an experiment, yet only one or a few independent variables can be tested. Other variables must be kept as similar as possible between the study groups and are called control variables . For our educational research example, if the control group consisted only of people between the ages of 18 and 20 and the experimental group contained people between the ages of 30 and 35, we would not know if it was the teaching strategy or the students’ ages that played a larger role in the results. To avoid this problem, a good study will be set up so that each group contains students with a similar age profile. In a well-designed educational research study, student age will be a controlled variable, along with other possibly important factors like gender, past educational achievement, and pre-existing knowledge of the subject area.

What is the independent variable in this experiment?

  • Sex (all of the subjects will be female)
  • Presence or absence of the HPV vaccine
  • Presence or absence of HPV (the virus)

List three control variables other than age.

What is the dependent variable in this experiment?

  • Sex (male or female)
  • Rates of HPV infection
  • Age (years)
  • Revision and adaptation. Authored by : Shelli Carter and Lumen Learning. Provided by : Lumen Learning. License : CC BY-NC-SA: Attribution-NonCommercial-ShareAlike
  • Scientific Inquiry. Provided by : Open Learning Initiative. Located at : https://oli.cmu.edu/jcourse/workbook/activity/page?context=434a5c2680020ca6017c03488572e0f8 . Project : Introduction to Biology (Open + Free). License : CC BY-NC-SA: Attribution-NonCommercial-ShareAlike


S.3.3 Hypothesis Testing Examples

  • Example: Right-Tailed Test
  • Example: Left-Tailed Test
  • Example: Two-Tailed Test

Brinell Hardness Scores

An engineer measured the Brinell hardness of 25 pieces of ductile iron that were subcritically annealed. The resulting data were:

Brinell Hardness of 25 Pieces of Ductile Iron
170 167 174 179 179 187 179 183 179
156 163 156 187 156 167 156 174 170
183 179 174 179 170 159 187    

The engineer hypothesized that the mean Brinell hardness of all such ductile iron pieces is greater than 170. Therefore, he was interested in testing the hypotheses:

H0: μ = 170 vs. HA: μ > 170

The engineer entered his data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. He obtained the following output:

Descriptive Statistics

N Mean StDev SE Mean 95% Lower Bound
25 172.52 10.31 2.06 168.99

$\mu$: mean of Brinell hardness

Null hypothesis         H₀: $\mu$ = 170
Alternative hypothesis  H₁: $\mu$ > 170

T-Value P-Value
1.22 0.117

The output tells us that the average Brinell hardness of the n = 25 pieces of ductile iron was 172.52 with a standard deviation of 10.31. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 10.31 by the square root of n = 25, is 2.06). The test statistic t * is 1.22, and the P -value is 0.117.

If the engineer set his significance level α at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t * were greater than 1.7109 (determined using statistical software or a t -table):

t distribution graph for df = 24 and a right tailed test of .05 significance level

Since the engineer's test statistic, t * = 1.22, is not greater than 1.7109, the engineer fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.

If the engineer used the P-value approach to conduct his hypothesis test, he would determine the area under a \(t_{n-1} = t_{24}\) curve and to the right of the test statistic t* = 1.22:

t distribution graph of right tailed test showing the p-value of 0.117 for a t-value of 1.22

In the output above, Minitab reports that the P -value is 0.117. Since the P -value, 0.117, is greater than \(\alpha\) = 0.05, the engineer fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.

Note that the engineer obtains the same scientific conclusion regardless of the approach used. This will always be the case.
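For readers who want to check these numbers outside Minitab, here is a minimal sketch of the same right-tailed test in Python; the use of scipy.stats is an assumption (any package with a one-sample t-test would do).

```python
from scipy import stats

# Brinell hardness of the 25 ductile iron pieces (from the table above)
hardness = [170, 167, 174, 179, 179, 187, 179, 183, 179,
            156, 163, 156, 187, 156, 167, 156, 174, 170,
            183, 179, 174, 179, 170, 159, 187]

# One-sample t-test of H0: mu = 170 vs. HA: mu > 170
t_stat, p_value = stats.ttest_1samp(hardness, popmean=170, alternative='greater')
print(f"t* = {t_stat:.2f}, P-value = {p_value:.3f}")  # t* = 1.22, P-value = 0.117

# Critical value approach: right-tail cutoff for alpha = 0.05 with df = 24
print(f"critical value = {stats.t.ppf(0.95, df=24):.4f}")  # about 1.7109
```

Changing the alternative argument to "less" or "two-sided" gives the left-tailed and two-tailed versions used in the next two examples.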

Height of Sunflowers

A biologist was interested in determining whether treating sunflower seedlings with an extract from Vinca minor roots resulted in a lower average seedling height than the standard height of 15.7 cm. The biologist treated a random sample of n = 33 seedlings with the extract and subsequently obtained the following heights:

Heights of 33 Sunflower Seedlings
11.5 11.8 15.7 16.1 14.1 10.5 9.3 15.0 11.1
15.2 19.0 12.8 12.4 19.2 13.5 12.2 13.3  
16.5 13.5 14.4 16.7 10.9 13.0 10.3 15.8  
15.1 17.1 13.3 12.4 8.5 14.3 12.9 13.5  

The biologist's hypotheses are:

H0: μ = 15.7 vs. HA: μ < 15.7

The biologist entered her data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. She obtained the following output:

N Mean StDev SE Mean 95% Upper Bound
33 13.664 2.544 0.443 14.414

$\mu$: mean of Height

Null hypothesis         H₀: $\mu$ = 15.7
Alternative hypothesis  H₁: $\mu$ < 15.7

T-Value P-Value
-4.60 0.000

The output tells us that the average height of the n = 33 sunflower seedlings was 13.664 with a standard deviation of 2.544. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 2.544 by the square root of n = 33, is 0.443). The test statistic t* is -4.60, and the P-value, reported to three decimal places, is 0.000.

Minitab Note. Minitab will always report P -values to only 3 decimal places. If Minitab reports the P -value as 0.000, it really means that the P -value is 0.000....something. Throughout this course (and your future research!), when you see that Minitab reports the P -value as 0.000, you should report the P -value as being "< 0.001."

If the biologist set her significance level \(\alpha\) at 0.05 and used the critical value approach to conduct her hypothesis test, she would reject the null hypothesis if her test statistic t * were less than -1.6939 (determined using statistical software or a t -table):

Since the biologist's test statistic, t * = -4.60, is less than -1.6939, the biologist rejects the null hypothesis. That is, the test statistic falls in the "critical region." There is sufficient evidence, at the α = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.

If the biologist used the P-value approach to conduct her hypothesis test, she would determine the area under a \(t_{n-1} = t_{32}\) curve and to the left of the test statistic t* = -4.60:

t-distribution for left tailed test with significance level of 0.05 shown in left tail

In the output above, Minitab reports that the P -value is 0.000, which we take to mean < 0.001. Since the P -value is less than 0.001, it is clearly less than \(\alpha\) = 0.05, and the biologist rejects the null hypothesis. There is sufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.

t-distribution graph for left tailed test with a t-value of -4.60 and left tail area of 0.000

Note again that the biologist obtains the same scientific conclusion regardless of the approach used. This will always be the case.
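When only the summary statistics are available rather than the raw data, the same t* and P-value can be reproduced directly from the formula \(t = (\bar{x} - \mu)/(s/\sqrt{n})\). A minimal sketch in Python (scipy assumed):

```python
from math import sqrt
from scipy import stats

n, xbar, s = 33, 13.664, 2.544      # summary statistics from the output above
mu0 = 15.7                          # hypothesized mean height (cm)

se = s / sqrt(n)                    # standard error of the mean, about 0.443
t_stat = (xbar - mu0) / se          # about -4.60
p_value = stats.t.cdf(t_stat, df=n - 1)  # left-tail area under the t(32) curve

print(f"t* = {t_stat:.2f}, P-value = {p_value:.6f}")  # P-value well below 0.001
```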

Gum Thickness

A manufacturer claims that the thickness of the spearmint gum it produces is 7.5 one-hundredths of an inch. A quality control specialist regularly checks this claim. On one production run, he took a random sample of n = 10 pieces of gum and measured their thickness. He obtained:

Thicknesses of 10 Pieces of Gum
7.65 7.60 7.65 7.70 7.55
7.55 7.40 7.40 7.50 7.50

The quality control specialist's hypotheses are:

H0: μ = 7.5 vs. HA: μ ≠ 7.5

The quality control specialist entered his data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. He obtained the following output:

N Mean StDev SE Mean 95% CI for $\mu$
10 7.550 0.1027 0.0325 (7.4765, 7.6235)

$\mu$: mean of Thickness

Null hypothesis         H₀: $\mu$ = 7.5
Alternative hypothesis  H₁: $\mu \ne$ 7.5

T-Value P-Value
1.54 0.158

The output tells us that the average thickness of the n = 10 pieces of gum was 7.55 one-hundredths of an inch with a standard deviation of 0.1027. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 0.1027 by the square root of n = 10, is 0.0325). The test statistic t* is 1.54, and the P-value is 0.158.

If the quality control specialist set his significance level \(\alpha\) at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t * were less than -2.2616 or greater than 2.2616 (determined using statistical software or a t -table):

t-distribution graph of two tails with a significance level of .05 and t values of -2.2616 and 2.2616

Since the quality control specialist's test statistic, t * = 1.54, is not less than -2.2616 nor greater than 2.2616, the quality control specialist fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all of the manufacturer's spearmint gum differs from 7.5 one-hundredths of an inch.

If the quality control specialist used the P-value approach to conduct his hypothesis test, he would determine the area under a \(t_{n-1} = t_{9}\) curve, to the right of 1.54 and to the left of -1.54:

t-distribution graph for a two tailed test with t values of -1.54 and 1.54, the corresponding p-values are 0.0789732 on both tails

In the output above, Minitab reports that the P -value is 0.158. Since the P -value, 0.158, is greater than \(\alpha\) = 0.05, the quality control specialist fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all pieces of spearmint gum differs from 7.5 one-hundredths of an inch.

Note that the quality control specialist obtains the same scientific conclusion regardless of the approach used. This will always be the case.
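A two-sided test at \(\alpha\) = 0.05 is also equivalent to checking whether the hypothesized value lies inside the 95% confidence interval; the interval Minitab reports, (7.4765, 7.6235), contains 7.5, which is another way of seeing why the null hypothesis is not rejected. A sketch of both computations in Python (scipy assumed):

```python
from scipy import stats

thickness = [7.65, 7.60, 7.65, 7.70, 7.55,
             7.55, 7.40, 7.40, 7.50, 7.50]

# Two-sided one-sample t-test of H0: mu = 7.5 vs. HA: mu != 7.5
result = stats.ttest_1samp(thickness, popmean=7.5)
print(f"t* = {result.statistic:.2f}, P-value = {result.pvalue:.3f}")  # 1.54, 0.158

# 95% confidence interval for mu: xbar +/- t(0.025, 9) * s / sqrt(n)
n = len(thickness)
xbar = sum(thickness) / n
se = stats.tstd(thickness) / n ** 0.5
margin = stats.t.ppf(0.975, df=n - 1) * se
print(f"95% CI: ({xbar - margin:.4f}, {xbar + margin:.4f})")  # (7.4765, 7.6235)
```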

In our review of hypothesis tests, we have focused on just one particular hypothesis test, namely that concerning the population mean \(\mu\). The important thing to recognize is that the topics discussed here — the general idea of hypothesis tests, errors in hypothesis testing, the critical value approach, and the P -value approach — generally extend to all of the hypothesis tests you will encounter.


Chapter 3: Hypothesis Testing

The previous two chapters introduced methods for organizing and summarizing sample data, and using sample statistics to estimate population parameters. This chapter introduces the next major topic of inferential statistics: hypothesis testing.

A hypothesis is a statement or claim about a property of a population.

The Fundamentals of Hypothesis Testing

When conducting scientific research, typically there is some known information, perhaps from some past work or from a long accepted idea. We want to test whether this claim is believable. This is the basic idea behind a hypothesis test:

  • State what we think is true.
  • Quantify how confident we are about our claim.
  • Use sample statistics to make inferences about population parameters.

For example, past research tells us that the average life span for a hummingbird is about four years. You have been studying the hummingbirds in the southeastern United States and find a sample mean lifespan of 4.8 years. Should you reject the known or accepted information in favor of your results? How confident are you in your estimate? At what point would you say that there is enough evidence to reject the known information and support your alternative claim? How far from the known mean of four years can the sample mean be before we reject the idea that the average lifespan of a hummingbird is four years?

Hypothesis testing is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of a population.

A hypothesis is a claim or statement about a characteristic of a population of interest to us. A hypothesis test is a way for us to use our sample statistics to test a specific claim.

The population mean weight is known to be 157 lb. We want to test the claim that the mean weight has increased.

Two years ago, the proportion of infected plants was 37%. We believe that a treatment has helped, and we want to test the claim that there has been a reduction in the proportion of infected plants.

Components of a Formal Hypothesis Test

The null hypothesis is a statement about the value of a population parameter, such as the population mean (µ) or the population proportion ( p ). It contains the condition of equality and is denoted as H 0 (H-naught).

H 0 : µ = 157 or H 0 : p = 0.37

The alternative hypothesis is the claim to be tested, the opposite of the null hypothesis. It contains the value of the parameter that we consider plausible and is denoted as H 1 .

H 1 : µ > 157 or H 1 : p ≠ 0.37

The test statistic is a value computed from the sample data that is used in making a decision about the rejection of the null hypothesis. The test statistic converts the sample mean ( x̄ ) or sample proportion ( p̂ ) to a Z- or t-score under the assumption that the null hypothesis is true . It is used to decide whether the difference between the sample statistic and the hypothesized claim is significant.

The p-value is the area under the curve to the left or right of the test statistic. It is compared to the level of significance ( α ).

The critical value is the value that defines the rejection zone (the test statistic values that would lead to rejection of the null hypothesis). It is defined by the level of significance.

The level of significance ( α ) is the probability that the test statistic will fall into the critical region when the null hypothesis is true. This level is set by the researcher.

The conclusion is the final decision of the hypothesis test. The conclusion must always be clearly stated, communicating the decision based on the components of the test. It is important to realize that we never prove or accept the null hypothesis. We are merely saying that the sample evidence is not strong enough to warrant the rejection of the null hypothesis. The conclusion is made up of two parts:

1) Reject or fail to reject the null hypothesis, and 2) there is or is not enough evidence to support the alternative claim.

Option 1) Reject the null hypothesis (H 0 ). This means that you have enough statistical evidence to support the alternative claim (H 1 ).

Option 2) Fail to reject the null hypothesis (H 0 ). This means that you do NOT have enough evidence to support the alternative claim (H 1 ).

Another way to think about hypothesis testing is to compare it to the US justice system. A defendant is innocent until proven guilty (Null hypothesis—innocent). The prosecuting attorney tries to prove that the defendant is guilty (Alternative hypothesis—guilty). There are two possible conclusions that the jury can reach. First, the defendant is guilty (Reject the null hypothesis). Second, the defendant is not guilty (Fail to reject the null hypothesis). This is NOT the same thing as saying the defendant is innocent! In the first case, the prosecutor had enough evidence to reject the null hypothesis (innocent) and support the alternative claim (guilty). In the second case, the prosecutor did NOT have enough evidence to reject the null hypothesis (innocent) and support the alternative claim of guilty.

The Null and Alternative Hypotheses

There are three different pairs of null and alternative hypotheses:

\(H_0: \mu = c\) vs. \(H_1: \mu \ne c\) (two-sided)

\(H_0: \mu = c\) vs. \(H_1: \mu > c\) (right-sided)

\(H_0: \mu = c\) vs. \(H_1: \mu < c\) (left-sided)

where c is some known value.

A Two-sided Test

This tests whether the population parameter is equal to, versus not equal to, some specific value.

H o : μ = 12 vs. H 1 : μ ≠ 12

The critical region is divided equally into the two tails and the critical values are ± values that define the rejection zones.


A forester studying diameter growth of red pine believes that the mean diameter growth will be different if a fertilization treatment is applied to the stand.

  • H o : μ = 1.2 in./ year
  • H 1 : μ ≠ 1.2 in./ year

This is a two-sided question, as the forester doesn’t state whether population mean diameter growth will increase or decrease.

A Right-sided Test

This tests whether the population parameter is equal to, versus greater than, some specific value.

H o : μ = 12 vs. H 1 : μ > 12

The critical region is in the right tail and the critical value is a positive value that defines the rejection zone.


A biologist believes that there has been an increase in the mean number of lakes infected with milfoil, an invasive species, since the last study five years ago.

  • H o : μ = 15 lakes
  • H 1 : μ >15 lakes

This is a right-sided question, as the biologist believes that there has been an increase in population mean number of infected lakes.

A Left-sided Test

This tests whether the population parameter is equal to, versus less than, some specific value.

H o : μ = 12 vs. H 1 : μ < 12

The critical region is in the left tail and the critical value is a negative value that defines the rejection zone.


A scientist’s research indicates that there has been a change in the proportion of people who support certain environmental policies. He wants to test the claim that there has been a reduction in the proportion of people who support these policies.

  • H o : p = 0.57
  • H 1 : p < 0.57

This is a left-sided question, as the scientist believes that there has been a reduction in the true population proportion.

Statistically Significant

When the observed results (the sample statistics) are unlikely (a low probability) under the assumption that the null hypothesis is true, we say that the result is statistically significant, and we reject the null hypothesis. This result depends on the level of significance, the sample statistic, sample size, and whether it is a one- or two-sided alternative hypothesis.

Types of Errors

When testing, we arrive at a conclusion of rejecting the null hypothesis or failing to reject the null hypothesis. Such conclusions are sometimes correct and sometimes incorrect (even when we have followed all the correct procedures). We use incomplete sample data to reach a conclusion and there is always the possibility of reaching the wrong conclusion. There are four possible conclusions to reach from hypothesis testing. Of the four possible outcomes, two are correct and two are NOT correct.

                      H0 is true          H0 is false
Reject H0             Type I error (α)    Correct decision
Fail to reject H0     Correct decision    Type II error (β)

A Type I error is when we reject the null hypothesis when it is true. The symbol α (alpha) is used to represent Type I errors. This is the same alpha we use as the level of significance. By setting alpha as low as reasonably possible, we try to control the Type I error through the level of significance.

A Type II error is when we fail to reject the null hypothesis when it is false. The symbol β (beta) is used to represent Type II errors.

In general, Type I errors are considered more serious. One step in the hypothesis test procedure involves selecting the significance level ( α ), which is the probability of rejecting the null hypothesis when it is correct. So the researcher can select the level of significance that minimizes Type I errors. However, there is a mathematical relationship between α, β , and n (sample size).

  • As α increases, β decreases
  • As α decreases, β increases
  • As sample size increases (n), both α and β decrease

The natural inclination is to select the smallest possible value for α, thinking to minimize the possibility of causing a Type I error. Unfortunately, this forces an increase in Type II errors. By making the rejection zone too small, you may fail to reject the null hypothesis, when, in fact, it is false. Typically, we select the best sample size and level of significance, automatically setting β .


Power of the Test

A Type II error ( β ) is the probability of failing to reject a false null hypothesis. It follows that 1- β is the probability of rejecting a false null hypothesis. This probability is identified as the power of the test, and is often used to gauge the test’s effectiveness in recognizing that a null hypothesis is false.

The probability that a fixed-level α significance test will reject H 0 when a particular alternative value of the parameter is true is called the power of the test.

Power is also directly linked to sample size. For example, suppose the null hypothesis is that the mean fish weight is 8.7 lb. Given sample data, a level of significance of 5%, and an alternative weight of 9.2 lb., we can compute the power of the test to reject μ = 8.7 lb. If we have a small sample size, the power will be low. However, increasing the sample size will increase the power of the test. Increasing the level of significance will also increase power. A 5% test of significance will have a greater chance of rejecting the null hypothesis than a 1% test because the strength of evidence required for the rejection is less. Decreasing the standard deviation has the same effect as increasing the sample size: there is more information about μ .
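To make the power calculation concrete, here is a hedged sketch for the fish-weight example in Python (scipy assumed). The text does not give σ or n, so the values below (σ = 1.5 lb. and n = 30) are illustrative assumptions, chosen only to show the mechanics of a right-tailed z-test.

```python
from math import sqrt
from scipy.stats import norm

mu0, mu_alt = 8.7, 9.2  # null and alternative mean fish weight (lb.)
sigma = 1.5             # ASSUMED population standard deviation (not from the text)
alpha = 0.05

def power(n):
    se = sigma / sqrt(n)
    cutoff = mu0 + norm.ppf(1 - alpha) * se  # reject H0 if the sample mean exceeds this
    return norm.sf((cutoff - mu_alt) / se)   # P(reject H0 | mu = mu_alt) = 1 - beta

print(f"power at n = 30: {power(30):.2f}")   # about 0.57
print(f"power at n = 60: {power(60):.2f}")   # about 0.83; a larger sample raises power
```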

Hypothesis Test about the Population Mean ( μ ) when the Population Standard Deviation ( σ ) is Known

We are going to examine two equivalent ways to perform a hypothesis test: the classical approach and the p-value approach. The classical approach is based on standard deviations. This method compares the test statistic (Z-score) to a critical value (Z-score) from the standard normal table. If the test statistic falls in the rejection zone, you reject the null hypothesis. The p-value approach is based on area under the normal curve. This method compares the area associated with the test statistic to alpha ( α ), the level of significance (which is also area under the normal curve). If the p-value is less than alpha, you would reject the null hypothesis.

As a past student poetically said: If the p-value is a wee value, Reject Ho

Both methods must have:

  • Data from a random sample.
  • Verification of the assumption of normality.
  • A null and alternative hypothesis.
  • A criterion that determines if we reject or fail to reject the null hypothesis.
  • A conclusion that answers the question.

There are four steps required for a hypothesis test:

  • State the null and alternative hypotheses.
  • State the level of significance and the critical value.
  • Compute the test statistic.
  • State a conclusion.

The Classical Method for Testing a Claim about the Population Mean ( μ ) when the Population Standard Deviation ( σ ) is Known

A forester studying diameter growth of red pine believes that the mean diameter growth will be different from the known mean growth of 1.35 inches/year if a fertilization treatment is applied to the stand. He conducts his experiment, collects data from a sample of 32 plots, and gets a sample mean diameter growth of 1.6 in./year. The population standard deviation for this stand is known to be 0.46 in./year. Does he have enough evidence to support his claim?

Step 1) State the null and alternative hypotheses.

  • H o : μ = 1.35 in./year
  • H 1 : μ ≠ 1.35 in./year

Step 2) State the level of significance and the critical value.

  • We will choose a level of significance of 5% ( α = 0.05).
  • For a two-sided question, we need two critical values: \(-Z_{\alpha/2}\) and \(+Z_{\alpha/2}\).
  • The level of significance is divided by 2 (since we are only testing “not equal”). We must have two rejection zones that can deal with either a greater than or less than outcome (to the right (+) or to the left (-)).
  • We need to find the Z-score associated with the area of 0.025. The red areas are equal to α /2 = 0.05/2 = 0.025 or 2.5% of the area under the normal curve.
  • Go into the body of values and find the negative Z-score associated with the area 0.025.


  • The negative critical value is -1.96. Since the curve is symmetric, we know that the positive critical value is 1.96.
  • ±1.96 are the critical values. These values set up the rejection zone. If the test statistic falls within these red rejection zones, we reject the null hypothesis.

Step 3) Compute the test statistic.

  • The test statistic is the number of standard deviations the sample mean is from the known mean. It is also a Z-score, just like the critical value.

\(Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\)

  • For this problem, the test statistic is

\(Z = \dfrac{1.6 - 1.35}{0.46/\sqrt{32}} = 3.07\)

Step 4) State a conclusion.

  • Compare the test statistic to the critical value. If the test statistic falls into the rejection zones, reject the null hypothesis. In other words, if the test statistic is greater than +1.96 or less than -1.96, reject the null hypothesis.


In this problem, the test statistic falls in the red rejection zone. The test statistic of 3.07 is greater than the critical value of 1.96. We will reject the null hypothesis. We have enough evidence to support the claim that the mean diameter growth is different from (not equal to) 1.35 in./year.
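Because σ is known, this z-test is simple enough to verify directly; a minimal sketch in Python (scipy assumed, used here only for the normal quantile):

```python
from math import sqrt
from scipy.stats import norm

mu0, xbar = 1.35, 1.6  # hypothesized and sample mean diameter growth (in./year)
sigma, n = 0.46, 32    # known population standard deviation and sample size
alpha = 0.05

z_stat = (xbar - mu0) / (sigma / sqrt(n))  # about 3.07
z_crit = norm.ppf(1 - alpha / 2)           # about 1.96 for a two-sided test

print(f"z* = {z_stat:.2f}, critical values = ±{z_crit:.2f}")
print("reject H0" if abs(z_stat) > z_crit else "fail to reject H0")  # reject H0
```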

A researcher believes that there has been an increase in the average farm size in his state since the last study five years ago. The previous study reported a mean size of 450 acres with a population standard deviation ( σ ) of 167 acres. He samples 45 farms and gets a sample mean of 485.8 acres. Is there enough information to support his claim?

  • H o : μ = 450 acres
  • H 1 : μ >450 acres
  • For a one-sided question, we need a one-sided positive critical value Z α .
  • The level of significance is all in the right side (the rejection zone is just on the right side).
  • We need to find the Z-score associated with the 5% area in the right tail.


  • Go into the body of values in the standard normal table and find the Z-score that separates the lower 95% from the upper 5%.
  • The critical value is 1.645. This value sets up the rejection zone.

\(Z = \dfrac{485.8 - 450}{167/\sqrt{45}} = 1.44\)

  • Compare the test statistic to the critical value.


  • The test statistic does not fall in the rejection zone. It is less than the critical value.

We fail to reject the null hypothesis. We do not have enough evidence to support the claim that the mean farm size has increased from 450 acres.

A researcher believes that there has been a reduction in the mean number of hours that college students spend preparing for final exams. A national study stated that students at a 4-year college spend an average of 23 hours preparing for 5 final exams each semester with a population standard deviation of 7.3 hours. The researcher sampled 227 students and found a sample mean study time of 19.6 hours. Does this indicate that the average study time for final exams has decreased? Use a 1% level of significance to test this claim.

  • H o : μ = 23 hours
  • H 1 : μ < 23 hours
  • This is a left-sided test so alpha (0.01) is all in the left tail.


  • Go into the body of values in the standard normal table and find the Z-score that defines the lower 1% of the area.
  • The critical value is -2.33. This value sets up the rejection zone.

\(Z = \dfrac{19.6 - 23}{7.3/\sqrt{227}} = -7.02\)

  • The test statistic falls in the rejection zone. The test statistic of -7.02 is less than the critical value of -2.33.

We reject the null hypothesis. We have sufficient evidence to support the claim that the mean final exam study time has decreased below 23 hours.

Testing a Hypothesis using P-values

The p-value is the probability of observing our sample mean given that the null hypothesis is true. It is the area under the curve to the left or right of the test statistic. If the probability of observing such a sample mean is very small (less than the level of significance), we would reject the null hypothesis. Computations for the p-value depend on whether it is a one- or two-sided test.

Steps for a hypothesis test using p-values:

  • State the level of significance.
  • Compute the test statistic and find the area associated with it (this is the p-value).
  • Compare the p-value to alpha ( α ) and state a conclusion.

Instead of comparing Z-score test statistic to Z-score critical value, as in the classical method, we compare area of the test statistic to area of the level of significance.

The Decision Rule: If the p-value is less than alpha, we reject the null hypothesis

Computing P-values

If it is a two-sided test (the alternative claim is ≠), the p-value is equal to two times the probability of the absolute value of the test statistic. If the test is a left-sided test (the alternative claim is “<”), then the p-value is equal to the area to the left of the test statistic. If the test is a right-sided test (the alternative claim is “>”), then the p-value is equal to the area to the right of the test statistic.
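These three rules translate directly into one-line computations. A minimal sketch in Python (scipy assumed), checked below against the z-statistics from Examples 6, 7, and 8:

```python
from scipy.stats import norm

def p_value(z, tail):
    """P-value for a z test statistic; tail is 'two', 'left', or 'right'."""
    if tail == 'two':
        return 2 * norm.sf(abs(z))  # twice the area beyond |z*|
    if tail == 'left':
        return norm.cdf(z)          # area to the left of z*
    return norm.sf(z)               # area to the right of z*

print(f"{p_value(3.07, 'two'):.4f}")    # about 0.0021; the rounded table gives 0.0022
print(f"{p_value(1.44, 'right'):.4f}")  # about 0.0749
print(f"{p_value(-7.02, 'left'):.1e}")  # about 1e-12, far below the table's 0.0002
```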

Let’s look at Example 6 again.

A forester studying diameter growth of red pine believes that the mean diameter growth will be different from the known mean growth of 1.35 in./year if a fertilization treatment is applied to the stand. He conducts his experiment, collects data from a sample of 32 plots, and gets a sample mean diameter growth of 1.6 in./year. The population standard deviation for this stand is known to be 0.46 in./year. Does he have enough evidence to support his claim?

Steps 1 and 2) The hypotheses and the level of significance (α = 0.05) are the same as before.

Step 3) Compute the test statistic and find its p-value.

  • For this problem, the test statistic is:

\(Z = \dfrac{1.6 - 1.35}{0.46/\sqrt{32}} = 3.07\)

The p-value is two times the area of the absolute value of the test statistic (because the alternative claim is “not equal”).


  • Look up the area for the Z-score 3.07 in the standard normal table. The area (probability) is equal to 1 – 0.9989 = 0.0011.
  • Multiply this by 2 to get the p-value = 2 * 0.0011 = 0.0022.

Step 4) Compare the p-value to alpha and state a conclusion.

  • Use the Decision Rule (if the p-value is less than α , reject H 0 ).
  • In this problem, the p-value (0.0022) is less than alpha (0.05).
  • We reject the H 0 . We have enough evidence to support the claim that the mean diameter growth is different from 1.35 inches/year.

Let’s look at Example 7 again.

\(Z = \dfrac{485.8 - 450}{167/\sqrt{45}} = 1.44\)

The p-value is the area to the right of the Z-score 1.44 (the hatched area).

  • This is equal to 1 – 0.9251 = 0.0749.
  • The p-value is 0.0749.


  • Use the Decision Rule.
  • In this problem, the p-value (0.0749) is greater than alpha (0.05), so we Fail to Reject the H 0 .
  • The area of the test statistic is greater than the area of alpha ( α ).

We fail to reject the null hypothesis. We do not have enough evidence to support the claim that the mean farm size has increased.

Let’s look at Example 8 again.

  • H 0 : μ = 23 hours

\(Z = \dfrac{19.6 - 23}{7.3/\sqrt{227}} = -7.02\)

The p-value is the area to the left of the test statistic (the little black area to the left of -7.02). The Z-score of -7.02 is not on the standard normal table. The smallest probability on the table is 0.0002. We know that the area for the Z-score -7.02 is smaller than this area (probability). Therefore, the p-value is <0.0002.


  • In this problem, the p-value (p<0.0002) is less than alpha (0.01), so we Reject the H 0 .
  • The area of the test statistic is much less than the area of alpha ( α ).

We reject the null hypothesis. We have enough evidence to support the claim that the mean final exam study time has decreased below 23 hours.

Both the classical method and p-value method for testing a hypothesis will arrive at the same conclusion. In the classical method, the critical Z-score is the number on the z-axis that defines the level of significance ( α ). The test statistic converts the sample mean to units of standard deviation (a Z-score). If the test statistic falls in the rejection zone defined by the critical value, we will reject the null hypothesis. In this approach, two Z-scores, which are numbers on the z-axis, are compared. In the p-value approach, the p-value is the area associated with the test statistic. In this method, we compare α (which is also area under the curve) to the p-value. If the p-value is less than α , we reject the null hypothesis. The p-value is the probability of observing such a sample mean when the null hypothesis is true. If the probability is too small (less than the level of significance), then we believe we have enough statistical evidence to reject the null hypothesis and support the alternative claim.

Software Solutions

(referring to Ex. 8)


One-Sample Z

Test of mu = 23 vs. < 23
The assumed standard deviation = 7.3
99% Upper
N Mean SE Mean Bound Z P
227 19.600 0.485 20.727 -7.02 0.000

Excel does not offer 1-sample hypothesis testing.

Hypothesis Test about the Population Mean ( μ ) when the Population Standard Deviation ( σ ) is Unknown

Frequently, the population standard deviation (σ) is not known. We can estimate the population standard deviation (σ) with the sample standard deviation (s). However, the test statistic will no longer follow the standard normal distribution. We must rely on the student’s t-distribution with n-1 degrees of freedom. Because we use the sample standard deviation (s), the test statistic will change from a Z-score to a t-score.

\(t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}\)

The steps for a hypothesis test are the same as those we covered in Section 2.

Just as with the hypothesis test from the previous section, the data for this test must be from a random sample and requires either that the population from which the sample was drawn be normal or that the sample size is sufficiently large (n≥30). A t-test is robust, so small departures from normality will not adversely affect the results of the test. That being said, if the sample size is smaller than 30, it is always good to verify the assumption of normality through a normal probability plot.

We will still have the same three pairs of null and alternative hypotheses and we can still use either the classical approach or the p-value approach.

\(H_0: \mu = c\) vs. \(H_1: \mu \ne c\), \(H_1: \mu > c\), or \(H_1: \mu < c\)

Selecting the correct critical value from the student’s t-distribution table depends on three factors: the type of test (one-sided or two-sided alternative hypothesis), the sample size, and the level of significance.

For a two-sided test (“not equal” alternative hypothesis), the critical value (t α /2 ), is determined by alpha ( α ), the level of significance, divided by two, to deal with the possibility that the result could be less than OR greater than the known value.

  • If your level of significance was 0.05, you would use the 0.025 column to find the correct critical value (0.05/2 = 0.025).
  • If your level of significance was 0.01, you would use the 0.005 column to find the correct critical value (0.01/2 = 0.005).

For a one-sided test (“a less than” or “greater than” alternative hypothesis), the critical value (t α ) , is determined by alpha ( α ), the level of significance, being all in the one side.

  • If your level of significance was 0.05, you would use the 0.05 column to find the correct critical value for either a left- or right-sided question. If you are asking a “less than” (left-sided) question, your critical value will be negative. If you are asking a “greater than” (right-sided) question, your critical value will be positive.

Find the critical value you would use to test the claim that μ ≠ 112 with a sample size of 18 and a 5% level of significance.

In this case, the critical value (t α /2 ) would be 2.110. This is a two-sided question (≠) so you would divide alpha by 2 (0.05/2 = 0.025) and go down the 0.025 column to 17 degrees of freedom.

What would the critical value be if you wanted to test that μ < 112 for the same data?

In this case, the critical value would be -1.740. This is a one-sided question (<) so alpha is not divided (0.05/1 = 0.05). You would go down the 0.05 column with 17 degrees of freedom to find 1.740; the critical value is negative because this is a left-sided test.

In 2005, the mean pH level of rain in a county in northern New York was 5.41. A biologist believes that the rain acidity has changed. He takes a random sample of 11 rain dates in 2010 and obtains the following data. Use a 1% level of significance to test his claim.

4.70, 5.63, 5.02, 5.78, 4.99, 5.91, 5.76, 5.54, 5.25, 5.18, 5.01

The sample size is small and we don’t know anything about the distribution of the population, so we examine a normal probability plot. The distribution looks normal so we will continue with our test.


The sample mean is 5.343 with a sample standard deviation of 0.397.

  • H o : μ = 5.41
  • H 1 : μ ≠ 5.41
  • This is a two-sided question so alpha is divided by two.


  • t α /2 is found by going down the 0.005 column with 10 degrees of freedom (n - 1 = 11 - 1 = 10).
  • t α /2 = ±3.169.
  • The test statistic is a t-score.

\(t = \dfrac{5.343 - 5.41}{0.397/\sqrt{11}} = -0.56\)

  • The test statistic does not fall in the rejection zone.

We will fail to reject the null hypothesis. We do not have enough evidence to support the claim that the mean rain pH has changed.
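Both the normality check and the test itself can be reproduced in a few lines of Python (scipy assumed): scipy.stats.probplot generates the points of a normal probability plot, and ttest_1samp runs the two-sided test.

```python
from scipy import stats

ph = [4.70, 5.63, 5.02, 5.78, 4.99, 5.91,
      5.76, 5.54, 5.25, 5.18, 5.01]

# Normal probability plot data; points falling near a straight line support normality
(osm, osr), (slope, intercept, r) = stats.probplot(ph)
print(f"probability-plot correlation: r = {r:.3f}")  # r close to 1 supports normality

# Two-sided test of H0: mu = 5.41 vs. H1: mu != 5.41
t_stat, p = stats.ttest_1samp(ph, popmean=5.41)
print(f"t* = {t_stat:.2f}, P-value = {p:.3f}")  # t* is about -0.56; P-value far above 0.01
```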

A One-sided Test

Cadmium, a heavy metal, is toxic to animals. Mushrooms, however, are able to absorb and accumulate cadmium at high concentrations. The government has set safety limits for cadmium in dry vegetables at 0.5 ppm. Biologists believe that the mean level of cadmium in mushrooms growing near strip mines is greater than the recommended limit of 0.5 ppm, negatively impacting the animals that live in this ecosystem. A random sample of 51 mushrooms gave a sample mean of 0.59 ppm with a sample standard deviation of 0.29 ppm. Use a 5% level of significance to test the claim that the mean cadmium level is greater than the acceptable limit of 0.5 ppm.

The sample size is greater than 30, so the distribution of the sample mean is approximately normal.

  • H o : μ = 0.5 ppm
  • H 1 : μ > 0.5 ppm
  • This is a right-sided question so alpha is all in the right tail.


  • t α is found by going down the 0.05 column with 50 degrees of freedom.
  • t α = 1.676

\(t = \dfrac{0.59 - 0.5}{0.29/\sqrt{51}} = 2.22\)

Step 4) State a Conclusion.


The test statistic falls in the rejection zone. We will reject the null hypothesis. We have enough evidence to support the claim that the mean cadmium level is greater than the acceptable safe limit.

BUT, what happens if the significance level changes to 1%?

The critical value is now found by going down the 0.01 column with 50 degrees of freedom. The critical value is 2.403. The test statistic is now LESS THAN the critical value. The test statistic does not fall in the rejection zone. The conclusion will change. We do NOT have enough evidence to support the claim that the mean cadmium level is greater than the acceptable safe limit of 0.5 ppm.

The level of significance is the probability that you, as the researcher, set to decide if there is enough statistical evidence to support the alternative claim. It should be set before the experiment begins.

P-value Approach

We can also use the p-value approach for a hypothesis test about the mean when the population standard deviation ( σ ) is unknown. However, when using a student’s t-table, we can only estimate the range of the p-value, not a specific value as when using the standard normal table. The student’s t-table has area (probability) across the top row in the table, with t-scores in the body of the table.

  • To find the p-value (the area associated with the test statistic), you would go to the row with the number of degrees of freedom.
  • Go across that row until you find the two values that your test statistic is between, then go up those columns to find the estimated range for the p-value.

Estimating P-value from a Student’s T-table

Area in one tail:  0.05    0.025   0.02    0.01    0.005
df = 3:            2.353   3.182   3.482   4.541   5.841

If your test statistic is 3.789 with 3 degrees of freedom, you would go across the 3 df row. The value 3.789 falls between the values 3.482 and 4.541 in that row. Therefore, the p-value is between 0.02 and 0.01. The p-value will be greater than 0.01 but less than 0.02 (0.01<p<0.02).

If your level of significance is 5%, you would reject the null hypothesis as the p-value (0.01-0.02) is less than alpha ( α ) of 0.05.

If your level of significance is 1%, you would fail to reject the null hypothesis as the p-value (0.01-0.02) is greater than alpha ( α ) of 0.01.

Software packages typically output p-values. It is easy to use the Decision Rule to answer your research question by the p-value method.
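For example, the exact P-value that the table above could only bracket between 0.01 and 0.02 takes one line in Python (scipy assumed):

```python
from scipy.stats import t

# Right-tail area beyond t* = 3.789 with 3 degrees of freedom
print(f"P-value = {t.sf(3.789, df=3):.4f}")  # about 0.016, inside the 0.01-0.02 range
```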

(referring to Ex. 12)


One-Sample T

Test of mu = 0.5 vs. > 0.5

95% Lower
N Mean StDev SE Mean Bound T P
51 0.5900 0.2900 0.0406 0.5219 2.22 0.016

Additional example: www.youtube.com/watch?v=WwdSjO4VUsg .

Hypothesis Test for a Population Proportion ( p )

Frequently, the parameter we are testing is the population proportion.

  • We are studying the proportion of trees with cavities for wildlife habitat.
  • We need to know if the proportion of people who support green building materials has changed.
  • Has the proportion of wolves that died last year in Yellowstone increased from the year before?

Recall that the best point estimate of p , the population proportion, is given by

\(\hat{p} = \dfrac{x}{n}\), where x is the number of individuals in the sample with the characteristic of interest and n is the sample size,

and the sampling distribution of \(\hat{p}\) is approximately normal when np(1 - p) ≥ 10. We can use both the classical approach and the p-value approach for testing.

The steps for a hypothesis test are the same that we covered in Section 2.

The test statistic follows the standard normal distribution. Notice that the standard error (the denominator) uses p instead of p̂ , which was used when constructing a confidence interval about the population proportion. In a hypothesis test, the null hypothesis is assumed to be true, so the known proportion is used.

\(Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}\)

  • The critical value comes from the standard normal table, just as in Section 2. We will still use the same three pairs of null and alternative hypotheses as we used in the previous sections, but the parameter is now p instead of μ :

\(H_0: p = c\) vs. \(H_1: p \ne c\), \(H_1: p > c\), or \(H_1: p < c\)

  • For a two-sided test, alpha will be divided by 2, giving a \(\pm Z_{\alpha/2}\) critical value.
  • For a left-sided test, alpha will be all in the left tail, giving a \(-Z_{\alpha}\) critical value.
  • For a right-sided test, alpha will be all in the right tail, giving a \(+Z_{\alpha}\) critical value.

A botanist has produced a new variety of hybrid soy plant that is better able to withstand drought than other varieties. The botanist knows the seed germination for the parent plants is 75%, but does not know the seed germination for the new hybrid. He tests the claim that it is different from the parent plants. To test this claim, 450 seeds from the hybrid plant are tested and 321 have germinated. Use a 5% level of significance to test this claim that the germination rate is different from 75%.

  • H o : p = 0.75
  • H 1 : p ≠ 0.75

This is a two-sided question so alpha is divided by 2.

  • Alpha is 0.05 so the critical values are ± Z α /2 = ± Z .025 .
  • Look on the negative side of the standard normal table, in the body of values for 0.025.
  • The critical values are ± 1.96.

\(\hat{p} = \dfrac{321}{450} = 0.713\)

\(Z = \dfrac{0.713 - 0.75}{\sqrt{\dfrac{0.75(1 - 0.75)}{450}}} = -1.81\)

The test statistic does not fall in the rejection zone. We fail to reject the null hypothesis. We do not have enough evidence to support the claim that the germination rate of the hybrid plant is different from the parent plants.

Let’s answer this question using the p-value approach. Remember, for a two-sided alternative hypothesis (“not equal”), the p-value is two times the area of the test statistic. The test statistic is -1.81 and we want to find the area to the left of -1.81 from the standard normal table.

  • On the negative page, find the Z-score -1.81. Find the area associated with this Z-score.
  • The area = 0.0351.
  • This is a two-sided test so multiply the area times 2 to get the p-value = 0.0351 x 2 = 0.0702.

Now compare the p-value to alpha. The Decision Rule states that if the p-value is less than alpha, reject the H 0 . In this case, the p-value (0.0702) is greater than alpha (0.05) so we will fail to reject H 0 . We do not have enough evidence to support the claim that the germination rate of the hybrid plant is different from the parent plants.

You are a biologist studying the wildlife habitat in the Monongahela National Forest. Cavities in older trees provide excellent habitat for a variety of birds and small mammals. A study five years ago stated that 32% of the trees in this forest had suitable cavities for this type of wildlife. You believe that the proportion of cavity trees has increased. You sample 196 trees and find that 79 trees have cavities. Does this evidence support your claim that there has been an increase in the proportion of cavity trees?

Use a 10% level of significance to test this claim.

  • H o : p = 0.32
  • H 1 : p > 0.32

This is a one-sided question so alpha is divided by 1.

  • Alpha is 0.10 so the critical value is Z α = Z .10
  • Look on the positive side of the standard normal table, in the body of values for 0.90.
  • The critical value is 1.28.


  • The test statistic is the number of standard deviations the sample proportion is from the known proportion. It is also a Z-score, just like the critical value.

\(\hat{p} = \dfrac{79}{196} = 0.403\)

\(Z = \dfrac{0.403 - 0.32}{\sqrt{\dfrac{0.32(1 - 0.32)}{196}}} = 2.49\)

The test statistic is larger than the critical value (it falls in the rejection zone). We will reject the null hypothesis. We have enough evidence to support the claim that there has been an increase in the proportion of cavity trees.

Now use the p-value approach to answer the question. This is a right-sided question (“greater than”), so the p-value is equal to the area to the right of the test statistic. Go to the positive side of the standard normal table and find the area associated with the Z-score of 2.49. The area is 0.9936. Remember that this table is cumulative from the left. To find the area to the right of 2.49, we subtract from one.

p-value = (1 – 0.9936) = 0.0064

The p-value is less than the level of significance (0.10), so we reject the null hypothesis. We have enough evidence to support the claim that the proportion of cavity trees has increased.
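There is no one-proportion z-test built into scipy under that name, so an honest sketch computes it directly from the formula, using the null proportion in the standard error just as the text does (Python, scipy assumed):

```python
from math import sqrt
from scipy.stats import norm

x, n, p0 = 79, 196, 0.32  # cavity trees observed, sample size, null proportion
p_hat = x / n             # about 0.4031

z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # null p0 in the standard error
p_value = norm.sf(z_stat)                        # right-tail area

print(f"z* = {z_stat:.2f}, P-value = {p_value:.3f}")  # z* = 2.49, P-value = 0.006
```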

(referring to Ex. 15)

Test and CI for One Proportion

Test of p = 0.32 vs. p > 0.32

90% Lower

Sample X N Sample p Bound Z-Value p-Value
1 79 196 0.403061 0.358160 2.49 0.006
Using the normal approximation.

Hypothesis Test about a Variance

When people think of statistical inference, they usually think of inferences involving population means or proportions. However, the particular population parameter needed to answer an experimenter’s practical questions varies from one situation to another, and sometimes a population’s variability is more important than its mean. For example, product quality is often defined in terms of low variability.

Sample variance S 2 can be used for inferences concerning a population variance σ 2 . For a random sample of n measurements drawn from a normal population with mean μ and variance σ 2 , the value S 2 provides a point estimate for σ 2 . In addition, the quantity ( n – 1) S 2 / σ 2 follows a Chi-square ( χ 2 ) distribution, with df = n – 1.

The properties of Chi-square ( χ 2 ) distribution are:

  • Unlike Z and t distributions, the values in a chi-square distribution are all positive.
  • The chi-square distribution is asymmetric, unlike the Z and t distributions.
  • There are many chi-square distributions. We obtain a particular one by specifying the degrees of freedom (df = n – 1) associated with the sample variances S 2 .


One-sample χ² test for testing the hypotheses:

\(H_0: \sigma^2 = \sigma_0^2\)

Alternative hypothesis:

\(H_1: \sigma^2 \ne \sigma_0^2\), \(H_1: \sigma^2 > \sigma_0^2\), or \(H_1: \sigma^2 < \sigma_0^2\)

where the χ 2 critical value in the rejection region is based on degrees of freedom df = n – 1 and a specified significance level of α .

\(\chi^2 = \dfrac{(n-1)S^2}{\sigma_0^2}\)

As with previous sections, if the test statistic falls in the rejection zone set by the critical value, you will reject the null hypothesis.

A forester wants to control a dense understory of striped maple that is interfering with desirable hardwood regeneration by using a mist blower to apply an herbicide treatment. She wants to make sure the treatment has a consistent application rate, in other words, low variability, not exceeding a standard deviation of 0.25 gal./acre (a variance of 0.06 gal.²). She collects sample data (n = 11) on this type of mist blower and gets a sample variance of 0.064 gal.². Using a 5% level of significance, test the claim that the variance is significantly greater than 0.06 gal.².

H 0 : σ 2 = 0.06

H 1 : σ 2 >0.06

The critical value is 18.307. Any test statistic greater than this value will cause you to reject the null hypothesis.

The test statistic is

\(\chi^2 = \dfrac{(11-1)(0.064)}{0.06} = 10.67\)

We fail to reject the null hypothesis. The forester does NOT have enough evidence to support the claim that the variance is greater than 0.06 gal.².

You can also estimate the p-value using the same method as for the student t-table. Go across the row for 10 degrees of freedom until you find the two values that your test statistic falls between: 4.865 and 15.987. Now go up those two columns to the top row to estimate the p-value; it lies between 0.10 and 0.90. Both bounds are greater than the level of significance (0.05), so we fail to reject the null hypothesis.
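A sketch of the same variance test in Python (scipy assumed), which also produces the exact P-value that the chi-square table could only bracket:

```python
from scipy.stats import chi2

n, s2, var0 = 11, 0.064, 0.06  # sample size, sample variance, hypothesized variance
df = n - 1

chi_stat = df * s2 / var0        # (n - 1) * s^2 / sigma0^2, about 10.67
chi_crit = chi2.ppf(0.95, df)    # about 18.307 for alpha = 0.05
p_value = chi2.sf(chi_stat, df)  # right-tail area

print(f"chi2* = {chi_stat:.2f}, critical value = {chi_crit:.3f}, P-value = {p_value:.3f}")
# chi2* = 10.67, critical value = 18.307, P-value = 0.384: fail to reject H0
```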

(referring to Ex. 16)


Test and CI for One Variance

Method

Null hypothesis Sigma-squared = 0.06
Alternative hypothesis Sigma-squared > 0.06

The chi-square method is only for the normal distribution.

Test

Method Statistic DF P-Value
Chi-Square 10.67 10 0.384

Excel does not offer 1-sample χ 2 testing.

Putting it all Together Using the Classical Method

To test a claim about μ when σ is known.

  • Write the null and alternative hypotheses.
  • State the level of significance and get the critical value from the standard normal table.

\(Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\)

  • Compare the test statistic to the critical value (Z-score) and write the conclusion.

To Test a Claim about μ When σ is Unknown

  • Write the null and alternative hypotheses.
  • State the level of significance and get the critical value from the student’s t-table with n-1 degrees of freedom.

\(t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}\)

  • Compare the test statistic to the critical value (t-score) and write the conclusion.

To Test a Claim about p

  • Write the null and alternative hypotheses.
  • State the level of significance and get the critical value from the standard normal distribution.

\(Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}\)

  • Compare the test statistic to the critical value (Z-score) and write the conclusion.

To Test a Claim about Variance

  • Write the null and alternative hypotheses.
  • State the level of significance and get the critical value from the chi-square table using n-1 degrees of freedom.

\(\chi^2 = \dfrac{(n-1)S^2}{\sigma_0^2}\)

  • Compare the test statistic to the critical value and write the conclusion.

Natural Resources Biometrics Copyright © 2014 by Diane Kiernan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Biology and the scientific method review


  • Biology: The study of living things
  • Observation: Noticing and describing events in an orderly way
  • Hypothesis: A scientific explanation that can be tested through experimentation or observation
  • Controlled experiment: An experiment in which only one variable is changed
  • Independent variable: The variable that is deliberately changed in an experiment
  • Dependent variable: The variable that is observed and changes in response to the independent variable
  • Control group: Baseline group that does not have changes in the independent variable
  • Scientific theory: A well-tested and widely accepted explanation for a phenomenon
  • Research bias: Process during which the researcher influences the results, either knowingly or unknowingly
  • Placebo: A substance that has no therapeutic effect, often used as a control in experiments
  • Double-blind study: Study in which neither the participants nor the researchers know who is receiving a particular treatment

The nature of biology

Properties of life.

  • Organization: Living things are highly organized (meaning they contain specialized, coordinated parts) and are made up of one or more cells.
  • Metabolism: Living things must use energy and consume nutrients to carry out the chemical reactions that sustain life. The sum total of the biochemical reactions occurring in an organism is called its metabolism.
  • Homeostasis: Living organisms regulate their internal environment to maintain the relatively narrow range of conditions needed for cell function.
  • Growth: Living organisms undergo regulated growth. Individual cells become larger in size, and multicellular organisms accumulate many cells through cell division.
  • Reproduction: Living organisms can reproduce themselves to create new organisms.
  • Response: Living organisms respond to stimuli or changes in their environment.
  • Evolution: Populations of living organisms can undergo evolution, meaning that the genetic makeup of a population may change over time.

Scientific methodology

A classic scientific-method example is a toaster that fails to toast. Good experimental design then reduces errors and bias in several ways:

  • Having a large sample size in the experiment: This helps to account for any small differences among the test subjects that may provide unexpected results.
  • Repeating experimental trials multiple times: Errors may result from slight differences in test subjects, or mistakes in methodology or data collection. Repeating trials helps reduce those effects.
  • Including all data points: Sometimes it is tempting to throw away data points that are inconsistent with the proposed hypothesis. However, this makes for an inaccurate study! All data points need to be included, whether they support the hypothesis or not.
  • Using placebos, when appropriate: Placebos prevent the test subjects from knowing whether they received a real therapeutic substance. This helps researchers determine whether a substance has a true effect.
  • Implementing double-blind studies, when appropriate: Double-blind studies prevent researchers from knowing the status of a particular participant. This helps eliminate observer bias.

Communicating findings

Things to remember.

  • A hypothesis is not necessarily the right explanation. Instead, it is a possible explanation that can be tested to see if it is likely correct, or if a new hypothesis needs to be made.
  • Not all explanations can be considered a hypothesis. A hypothesis must be testable and falsifiable in order to be valid. For example, “The universe is beautiful” is not a good hypothesis, because there is no experiment that could test this statement and show it to be false.
  • In most cases, the scientific method is an iterative process. In other words, it's a cycle rather than a straight line. The result of one experiment often becomes feedback that raises questions for more experimentation.
  • Scientists use the word "theory" in a very different way than non-scientists. When many people say "I have a theory," they really mean "I have a guess." Scientific theories, on the other hand, are well-tested and highly reliable scientific explanations of natural phenomena. They unify many repeated observations and data collected from lots of experiments.


Biology Hypothesis

Delve into the fascinating world of biology with this guide to crafting impeccable hypothesis statements. As the foundation of any impactful biological research, a well-formed hypothesis paves the way for groundbreaking discoveries and insights. Whether you’re examining cellular behavior or large-scale ecosystems, mastering the art of the hypothesis statement is crucial. The examples and writing advice below are tailored for budding biologists.

What is a good hypothesis in biology?

A good hypothesis in biology is a statement that offers a tentative explanation for a biological phenomenon, based on prior knowledge or observation. It should be:

  • Testable: The hypothesis should be measurable and can be proven false through experiments or observations.
  • Clear: It should be stated clearly and without ambiguity.
  • Based on Knowledge: A solid hypothesis often stems from existing knowledge or literature in the field.
  • Specific: It should clearly define the variables being tested and the expected outcomes.
  • Falsifiable: It’s essential that a hypothesis can be disproven. This means there should be a possible result that could indicate the hypothesis is incorrect.

What is an example of a hypothesis statement in biology?

Example: “If a plant is given a higher concentration of carbon dioxide, then it will undergo photosynthesis at an increased rate compared to a plant given a standard concentration of carbon dioxide.”

In this example:

  • The independent variable (what’s being changed) is the concentration of carbon dioxide.
  • The dependent variable (what’s being measured) is the rate of photosynthesis. The statement proposes a cause-and-effect relationship that can be tested through experimentation, as the sketch below illustrates.
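For instance, here is a minimal sketch of how such a test might look in Python (SciPy assumed); the photosynthesis measurements are entirely hypothetical, chosen only for illustration.

from scipy import stats

# Hypothetical photosynthesis rates (e.g., umol CO2 per m^2 per s)
elevated_co2 = [8.1, 7.9, 8.4, 8.6, 8.2, 8.5]  # higher CO2 concentration
standard_co2 = [7.2, 7.5, 7.1, 7.4, 7.3, 7.0]  # standard CO2 concentration

# One-sided two-sample t-test: is the elevated-CO2 mean greater?
t_stat, p_value = stats.ttest_ind(elevated_co2, standard_co2, alternative="greater")
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
# A p-value below the chosen significance level would support the hypothesis.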

100 Biology Thesis Statement Examples


Biology, as the study of life and living organisms, is vast and diverse. Crafting a good thesis statement in this field requires a clear understanding of the topic at hand, capturing the essence of the research aim. From genetics to ecology, from cell biology to animal behavior, the following examples will give you a comprehensive idea about forming succinct biology thesis statements.

1. Genetics: Understanding the role of the BRCA1 gene in breast cancer susceptibility can lead to targeted treatments.

2. Evolution: The finch populations of the Galápagos Islands provide evidence of natural selection through beak variations in response to food availability.

3. Cell Biology: Mitochondrial dysfunction is a central factor in the onset of age-related neurodegenerative diseases.

4. Ecology: Deforestation in the Amazon directly impacts global carbon dioxide levels, influencing climate change.

5. Human Anatomy: Regular exercise enhances cardiovascular health by improving heart muscle function and reducing arterial plaque.

6. Marine Biology: Coral bleaching events in the Great Barrier Reef correlate strongly with rising sea temperatures.

7. Zoology: Migration patterns of Monarch butterflies are influenced by seasonal changes and available food sources.

8. Botany: The symbiotic relationship between mycorrhizal fungi and plant roots enhances nutrient absorption in poor soil conditions.

9. Microbiology: The overuse of antibiotics in healthcare has accelerated the evolution of antibiotic-resistant bacterial strains.

10. Physiology: High altitude adaptation in certain human populations has led to increased hemoglobin production.

11. Immunology: The role of T-cells in the human immune response is critical in developing effective vaccines against viral diseases.

12. Behavioral Biology: Birdsong variations in sparrows can be attributed to both genetic factors and environmental influences.

13. Developmental Biology: The presence of certain hormones during fetal development dictates the differentiation of sex organs in mammals.

14. Conservation Biology: The rapid decline of bee populations worldwide is directly linked to the use of certain pesticides in agriculture.

15. Molecular Biology: The CRISPR-Cas9 system has revolutionized gene editing techniques, offering potential cures for genetic diseases.

16. Virology: The mutation rate of the influenza virus necessitates annual updates in vaccine formulations.

17. Neurobiology: Neural plasticity in the adult brain can be enhanced through consistent learning and cognitive challenges.

18. Ethology: Elephant herds exhibit complex social structures and matriarchal leadership.

19. Biotechnology: Genetically modified crops can improve yield and resistance but also pose ecological challenges.

20. Environmental Biology: Industrial pollution in freshwater systems disrupts aquatic life and can lead to loss of biodiversity.

21. Neurodegenerative Diseases: Amyloid-beta protein accumulation in the brain is a key marker for Alzheimer’s disease progression.

22. Endocrinology: The disruption of thyroid hormone balance leads to metabolic disorders and weight fluctuations.

23. Bioinformatics: Machine learning algorithms can predict protein structures with high accuracy, advancing drug design.

24. Plant Physiology: The stomatal closure mechanism in plants helps prevent water loss and maintain turgor pressure.

25. Parasitology: The lifecycle of the malaria parasite involves complex interactions between humans and mosquitoes.

26. Molecular Genetics: Epigenetic modifications play a crucial role in gene expression regulation and cell differentiation.

27. Evolutionary Psychology: Human preference for symmetrical faces is a result of evolutionarily advantageous traits.

28. Ecosystem Dynamics: The reintroduction of apex predators in ecosystems restores ecological balance and biodiversity.

29. Epigenetics: Maternal dietary choices during pregnancy can influence the epigenetic profiles of offspring.

30. Biochemistry: Enzyme kinetics in metabolic pathways reveal insights into cellular energy production.

31. Bioluminescence: The role of bioluminescence in deep-sea organisms serves as camouflage and communication.

32. Genetics of Disease: Mutations in the CFTR gene cause cystic fibrosis, leading to severe respiratory and digestive issues.

33. Reproductive Biology: The influence of pheromones on mate selection is a critical aspect of reproductive success in many species.

34. Plant-Microbe Interactions: Rhizobium bacteria facilitate nitrogen fixation in leguminous plants, benefiting both organisms.

35. Comparative Anatomy: Homologous structures in different species provide evidence of shared evolutionary ancestry.

36. Stem Cell Research: Induced pluripotent stem cells hold immense potential for regenerative medicine and disease modeling.

37. Bioethics: Balancing the use of genetic modification in humans with ethical considerations is a complex challenge.

38. Molecular Evolution: The study of orthologous and paralogous genes offers insights into evolutionary relationships.

39. Bioenergetics: ATP synthesis through oxidative phosphorylation is a fundamental process driving cellular energy production.

40. Population Genetics: The Hardy-Weinberg equilibrium model helps predict allele frequencies in populations over time.

41. Animal Communication: The complex vocalizations of whales serve both social bonding and long-distance communication purposes.

42. Biogeography: The distribution of marsupials in Australia and their absence elsewhere highlights the impact of geographical isolation on evolution.

43. Aquatic Ecology: The phenomenon of eutrophication in lakes is driven by excessive nutrient runoff and results in harmful algal blooms.

44. Insect Behavior: The waggle dance of honeybees conveys precise information about the location of food sources to other members of the hive.

45. Microbial Ecology: The gut microbiome’s composition influences host health, metabolism, and immune system development.

46. Evolution of Sex: The Red Queen hypothesis explains the evolution of sexual reproduction as a defense against rapidly evolving parasites.

47. Immunotherapy: Manipulating the immune response to target cancer cells shows promise as an effective cancer treatment strategy.

48. Epigenetic Inheritance: Epigenetic modifications can be passed down through generations, impacting traits and disease susceptibility.

49. Comparative Genomics: Comparing the genomes of different species sheds light on genetic adaptations and evolutionary divergence.

50. Neurotransmission: The dopamine reward pathway in the brain is implicated in addiction and motivation-related behaviors.

51. Microbial Biotechnology: Genetically engineered bacteria can produce valuable compounds like insulin, revolutionizing pharmaceutical production.

52. Bioinformatics: DNA sequence analysis reveals evolutionary relationships between species and uncovers hidden genetic information.

53. Animal Migration: The navigational abilities of migratory birds are influenced by magnetic fields and celestial cues.

54. Human Evolution: The discovery of ancient hominin fossils provides insights into the evolutionary timeline of our species.

55. Cancer Genetics: Mutations in tumor suppressor genes contribute to the uncontrolled growth and division of cancer cells.

56. Aquatic Biomes: Coral reefs, rainforests of the sea, host incredible biodiversity and face threats from climate change and pollution.

57. Genomic Medicine: Personalized treatments based on an individual’s genetic makeup hold promise for more effective healthcare.

58. Molecular Pharmacology: Understanding receptor-ligand interactions aids in the development of targeted drugs for specific diseases.

59. Biodiversity Conservation: Preserving habitat diversity is crucial to maintaining ecosystems and preventing species extinction.

60. Evolutionary Developmental Biology: Comparing embryonic development across species reveals shared genetic pathways and evolutionary constraints.

61. Plant Reproductive Strategies: Understanding the trade-offs between asexual and sexual reproduction in plants sheds light on their evolutionary success.

62. Parasite-Host Interactions: The coevolution of parasites and their hosts drives adaptations and counter-adaptations over time.

63. Genomic Diversity: Exploring genetic variations within populations helps uncover disease susceptibilities and evolutionary history.

64. Ecological Succession: Studying the process of ecosystem recovery after disturbances provides insights into resilience and stability.

65. Conservation Genetics: Genetic diversity assessment aids in formulating effective conservation strategies for endangered species.

66. Neuroplasticity and Learning: Investigating how the brain adapts through synaptic changes improves our understanding of memory and learning.

67. Synthetic Biology: Designing and engineering biological systems offers innovative solutions for medical, environmental, and industrial challenges.

68. Ethnobotany: Documenting the traditional uses of plants by indigenous communities informs both conservation and pharmaceutical research.

69. Ecological Niche Theory: Exploring how species adapt to specific ecological niches enhances our grasp of biodiversity patterns.

70. Ecosystem Services: Quantifying the benefits provided by ecosystems, like pollination and carbon sequestration, supports conservation efforts.

71. Fungal Biology: Investigating mycorrhizal relationships between fungi and plants illuminates nutrient exchange mechanisms.

72. Molecular Clock Hypothesis: Genetic mutations accumulate over time, providing a method to estimate evolutionary divergence dates.

73. Developmental Disorders: Unraveling the genetic and environmental factors contributing to developmental disorders informs therapeutic approaches.

74. Epigenetics and Disease: Epigenetic modifications contribute to the development of diseases like cancer, diabetes, and neurodegenerative disorders.

75. Animal Cognition: Studying cognitive abilities in animals unveils their problem-solving skills, social dynamics, and sensory perceptions.

76. Microbiota-Brain Axis: The gut-brain connection suggests a bidirectional communication pathway influencing mental health and behavior.

77. Neurological Disorders: Neurodegenerative diseases like Parkinson’s and Alzheimer’s have genetic and environmental components that drive their progression.

78. Plant Defense Mechanisms: Investigating how plants ward off pests and pathogens informs sustainable agricultural practices.

79. Conservation Genomics: Genetic data aids in identifying distinct populations and prioritizing conservation efforts for at-risk species.

80. Reproductive Strategies: Comparing reproductive methods in different species provides insights into evolutionary trade-offs and reproductive success.

81. Epigenetics in Aging: Exploring epigenetic changes in the aging process offers insights into longevity and age-related diseases.

82. Antimicrobial Resistance: Understanding the genetic mechanisms behind bacterial resistance to antibiotics informs strategies to combat the global health threat.

83. Plant-Animal Interactions: Investigating mutualistic relationships between plants and pollinators showcases the delicate balance of ecosystems.

84. Adaptations to Extreme Environments: Studying extremophiles reveals the remarkable ways organisms thrive in extreme conditions like deep-sea hydrothermal vents.

85. Genetic Disorders: Genetic mutations underlie numerous disorders like cystic fibrosis, sickle cell anemia, and muscular dystrophy.

86. Conservation Behavior: Analyzing the behavioral ecology of endangered species informs habitat preservation and restoration efforts.

87. Neuroplasticity in Rehabilitation: Harnessing the brain’s ability to rewire itself offers promising avenues for post-injury or post-stroke rehabilitation.

88. Disease Vectors: Understanding how mosquitoes transmit diseases like malaria and Zika virus is critical for disease prevention strategies.

89. Biochemical Pathways: Mapping metabolic pathways in cells provides insights into disease development and potential therapeutic targets.

90. Invasive Species Impact: Examining the effects of invasive species on native ecosystems guides management strategies to mitigate their impact.

91. Molecular Immunology: Studying the intricate immune response mechanisms aids in the development of vaccines and immunotherapies.

92. Plant-Microbe Symbiosis: Investigating how plants form partnerships with beneficial microbes enhances crop productivity and sustainability.

93. Cancer Immunotherapy: Harnessing the immune system to target and eliminate cancer cells offers new avenues for cancer treatment.

94. Evolution of Flight: Analyzing the adaptations leading to the development of flight in birds and insects sheds light on evolutionary innovation.

95. Genomic Diversity in Human Populations: Exploring genetic variations among different human populations informs ancestry, migration, and susceptibility to diseases.

96. Hormonal Regulation: Understanding the role of hormones in growth, reproduction, and homeostasis provides insights into physiological processes.

97. Conservation Genetics in Plant Conservation: Genetic diversity assessment helps guide efforts to conserve rare and endangered plant species.

98. Neuronal Communication: Investigating neurotransmitter systems and synaptic transmission enhances our comprehension of brain function.

99. Microbial Biogeography: Mapping the distribution of microorganisms across ecosystems aids in understanding their ecological roles and interactions.

100. Gene Therapy: Developing methods to replace or repair defective genes offers potential treatments for genetic disorders.

Scientific Hypothesis Statement Examples

This section offers diverse examples of scientific hypothesis statements that cover a range of biological topics. Each example briefly describes the subject matter and the potential implications of the hypothesis.

  • Genetic Mutations and Disease: Certain genetic mutations lead to increased susceptibility to autoimmune disorders, providing insights into potential treatment strategies.
  • Microplastics in Aquatic Ecosystems: Elevated microplastic levels disrupt aquatic food chains, affecting biodiversity and human health through bioaccumulation.
  • Bacterial Quorum Sensing: Inhibition of quorum sensing in pathogenic bacteria demonstrates a potential avenue for novel antimicrobial therapies.
  • Climate Change and Phenology: Rising temperatures alter flowering times in plants, impacting pollinator interactions and ecosystem dynamics.
  • Neuroplasticity and Learning: The brain’s adaptability facilitates learning through synaptic modifications, elucidating educational strategies for improved cognition.
  • CRISPR-Cas9 in Agriculture: CRISPR-engineered crops with enhanced pest resistance showcase a sustainable approach to improving agricultural productivity.
  • Invasive Species Impact on Predators: The introduction of invasive prey disrupts predator-prey relationships, triggering cascading effects in terrestrial ecosystems.
  • Microbial Contributions to Soil Health: Beneficial soil microbes enhance nutrient availability and plant growth, promoting sustainable agriculture practices.
  • Marine Protected Areas: Examining the effectiveness of marine protected areas reveals their role in preserving biodiversity and restoring marine ecosystems.
  • Epigenetic Regulation of Cancer: Epigenetic modifications play a pivotal role in cancer development, highlighting potential therapeutic targets for precision medicine.

Testable Hypothesis Statement Examples in Biology

Testability is a critical aspect of a hypothesis. These examples are formulated in a way that allows them to be tested through experiments or observations. They focus on cause-and-effect relationships that can be verified or refuted.

  • Impact of Light Intensity on Plant Growth: Increasing light intensity accelerates photosynthesis rates and enhances overall plant growth.
  • Effect of Temperature on Enzyme Activity: Higher temperatures accelerate enzyme activity up to an optimal point, beyond which denaturation occurs.
  • Microbial Diversity in Soil pH Gradients: Soil pH influences microbial composition, with acidic soils favoring certain bacterial taxa over others.
  • Predation Impact on Prey Behavior: The presence of predators induces changes in prey behavior, resulting in altered foraging strategies and vigilance levels.
  • Chemical Communication in Marine Organisms: Investigating chemical cues reveals the role of allelopathy in competition among marine organisms.
  • Social Hierarchy in Animal Groups: Observing animal groups establishes a correlation between social rank and access to resources within the group.
  • Effect of Habitat Fragmentation on Pollinator Diversity: Fragmented habitats reduce pollinator species richness, affecting plant reproductive success.
  • Dietary Effects on Gut Microbiota Composition: Dietary shifts influence gut microbiota diversity and metabolic functions, impacting host health.
  • Hybridization Impact on Plant Fitness: Hybrid plants exhibit varied fitness levels depending on the combination of parent species.
  • Human Impact on Coral Bleaching: Analyzing coral reefs under different anthropogenic stresses identifies the main factors driving coral bleaching events.

Scientific Investigation Hypothesis Statement Examples in Biology

This section emphasizes hypotheses that are part of broader scientific investigations. They involve studying complex interactions or phenomena and often contribute to our understanding of larger biological systems.

  • Genomic Variation in Human Disease Susceptibility: Genetic analysis identifies variations associated with increased risk of common diseases, aiding personalized medicine.
  • Behavioral Responses to Temperature Shifts in Insects: Investigating insect responses to temperature fluctuations reveals adaptation strategies to climate change.
  • Endocrine Disruptors and Amphibian Development: Experimental exposure to endocrine disruptors elucidates their role in amphibian developmental abnormalities.
  • Microbial Succession in Decomposition: Tracking microbial communities during decomposition uncovers the succession patterns of different decomposer species.
  • Gene Expression Patterns in Stress Response: Studying gene expression profiles unveils the molecular mechanisms underlying stress responses in plants.
  • Effect of Urbanization on Bird Song Patterns: Urban noise pollution influences bird song frequency and complexity, impacting communication and mate attraction.
  • Nutrient Availability and Algal Blooms: Investigating nutrient loading in aquatic systems sheds light on factors triggering harmful algal blooms.
  • Host-Parasite Coevolution: Analyzing genetic changes in hosts and parasites over time uncovers coevolutionary arms races and adaptation.
  • Ecosystem Productivity and Biodiversity: Linking ecosystem productivity to biodiversity patterns reveals the role of species interactions in ecosystem stability.
  • Habitat Preference of Invasive Species: Studying the habitat selection of invasive species identifies factors promoting their establishment and spread.

Hypothesis Statement Examples in Biology Research

These examples are tailored for hypothesis-driven research studies. They highlight hypotheses that drive focused research questions, often leading to specific experimental designs and data collection methods.

  • Microbial Community Structure in Human Gut: Investigating microbial diversity and composition unveils the role of gut microbiota in human health.
  • Plant-Pollinator Mutualisms: Hypothesizing reciprocal benefits in plant-pollinator interactions highlights the role of coevolution in shaping ecosystems.
  • Chemical Defense Mechanisms in Insects: Predicting the correlation between insect feeding behavior and chemical defenses explores natural selection pressures.
  • Evolutionary Significance of Mimicry: Examining mimicry in organisms demonstrates its adaptive value in predator-prey relationships and survival.
  • Neurological Basis of Mate Choice: Proposing neural mechanisms underlying mate choice behaviors uncovers the role of sensory cues in reproductive success.
  • Mycorrhizal Symbiosis Impact on Plant Growth: Investigating mycorrhizal colonization effects on plant biomass addresses nutrient exchange dynamics.
  • Social Learning in Primates: Formulating a hypothesis on primate social learning explores the transmission of knowledge and cultural behaviors.
  • Effect of Pollution on Fish Behavior: Anticipating altered behaviors due to pollution exposure highlights ecological consequences on aquatic ecosystems.
  • Coevolution of Flowers and Pollinators: Hypothesizing mutual adaptations between flowers and pollinators reveals intricate ecological relationships.
  • Genetic Basis of Disease Resistance in Plants: Identifying genetic markers associated with disease resistance enhances crop breeding programs.

Prediction Hypothesis Statement Examples in Biology

Predictive hypotheses involve making educated guesses about how variables might interact or behave under specific conditions. These examples showcase hypotheses that anticipate outcomes based on existing knowledge.

  • Pesticide Impact on Insect Abundance: Predicting decreased insect populations due to pesticide application underscores ecological ramifications.
  • Climate Change and Migratory Bird Patterns: Anticipating shifts in migratory routes of birds due to climate change informs conservation strategies.
  • Ocean Acidification Effect on Coral Calcification: Predicting reduced coral calcification rates due to ocean acidification unveils threats to coral reefs.
  • Disease Spread in Crowded Bird Roosts: Predicting accelerated disease transmission in densely populated bird roosts highlights disease ecology dynamics.
  • Eutrophication Impact on Freshwater Biodiversity: Anticipating decreased freshwater biodiversity due to eutrophication emphasizes conservation efforts.
  • Herbivore Impact on Plant Species Diversity: Predicting reduced plant diversity in areas with high herbivore pressure elucidates ecosystem dynamics.
  • Predator-Prey Population Cycles: Predicting cyclical fluctuations in predator and prey populations showcases the role of trophic interactions.
  • Climate Change and Plant Phenology: Anticipating earlier flowering times due to climate change demonstrates the influence of temperature on plant life cycles.
  • Antibiotic Resistance in Bacterial Communities: Predicting increased antibiotic resistance due to overuse forewarns the need for responsible antibiotic use.
  • Human Impact on Avian Nesting Success: Predicting decreased avian nesting success due to habitat fragmentation highlights conservation priorities.

How to Write a Biology Hypothesis – Step by Step Guide

A hypothesis in biology is a critical component of scientific research that proposes an explanation for a specific biological phenomenon. Writing a well-formulated hypothesis sets the foundation for conducting experiments, making observations, and drawing meaningful conclusions. Follow this step-by-step guide to create a strong biology hypothesis:

1. Identify the Phenomenon: Clearly define the biological phenomenon you intend to study. This could be a question, a pattern, an observation, or a problem in the field of biology.

2. Conduct Background Research: Before formulating a hypothesis, gather relevant information from scientific literature. Understand the existing knowledge about the topic to ensure your hypothesis builds upon previous research.

3. State the Independent and Dependent Variables: Identify the variables involved in the phenomenon. The independent variable is what you manipulate or change, while the dependent variable is what you measure as a result of the changes.

4. Formulate a Testable Question: Based on your background research, create a specific and testable question that addresses the relationship between the variables. This question will guide the formulation of your hypothesis.

5. Craft the Hypothesis: A hypothesis should be a clear and concise statement that predicts the outcome of your experiment or observation. It should propose a cause-and-effect relationship between the independent and dependent variables.

6. Use the “If-Then” Structure: Formulate your hypothesis using the “if-then” structure. The “if” part states the independent variable and the condition you’re manipulating, while the “then” part predicts the outcome for the dependent variable.

7. Make it Falsifiable: A good hypothesis should be testable and capable of being proven false. There should be a way to gather data that either supports or contradicts the hypothesis.

8. Be Specific and Precise: Avoid vague language and ensure that your hypothesis is specific and precise. Clearly define the variables and the expected relationship between them.

9. Revise and Refine: Once you’ve formulated your hypothesis, review it to ensure it accurately reflects your research question and variables. Revise as needed to make it more concise and focused.

10. Seek Feedback: Share your hypothesis with peers, mentors, or colleagues to get feedback. Constructive input can help you refine your hypothesis further.

Tips for Writing a Biology Hypothesis Statement

Writing a biology hypothesis statement requires precision and clarity to ensure that your research is well-structured and testable. Here are some valuable tips to help you create effective and scientifically sound hypothesis statements:

1. Be Clear and Concise: Your hypothesis statement should convey your idea succinctly. Avoid unnecessary jargon or complex language that might confuse your audience.

2. Address Cause and Effect: A hypothesis suggests a cause-and-effect relationship between variables. Clearly state how changes in the independent variable are expected to affect the dependent variable.

3. Use Specific Language: Define your variables precisely. Use specific terms to describe the independent and dependent variables, as well as any conditions or measurements.

4. Follow the “If-Then” Structure: Use the classic “if-then” structure to frame your hypothesis. State the independent variable (if) and the expected outcome (then). This format clarifies the relationship you’re investigating.

5. Make it Testable: Your hypothesis must be capable of being tested through experimentation or observation. Ensure that there is a measurable and observable way to determine if it’s true or false.

6. Avoid Ambiguity: Eliminate vague terms that can be interpreted in multiple ways. Be precise in your language to avoid confusion.

7. Base it on Existing Knowledge: Ground your hypothesis in prior research or existing scientific theories. It should build upon established knowledge and contribute new insights.

8. Predict a Direction: Your hypothesis should predict a specific outcome. Whether you anticipate an increase, decrease, or a difference, your hypothesis should make a clear prediction.

9. Be Focused: Keep your hypothesis statement focused on one specific idea or relationship. Avoid trying to address too many variables or concepts in a single statement.

10. Consider Alternative Explanations: Acknowledge alternative explanations for your observations or outcomes. This demonstrates critical thinking and a thorough understanding of your field.

11. Avoid Value Judgments: Refrain from including value judgments or opinions in your hypothesis. Stick to objective and measurable factors.

12. Be Realistic: Ensure that your hypothesis is plausible and feasible. It should align with what is known about the topic and be achievable within the scope of your research.

13. Refine and Revise: Draft multiple versions of your hypothesis statement and refine them. Discuss and seek feedback from mentors, peers, or advisors to enhance its clarity and precision.

14. Align with Research Goals: Your hypothesis should align with the overall goals of your research project. Make sure it addresses the specific question or problem you’re investigating.

15. Be Open to Revision: As you conduct research and gather data, be open to revising your hypothesis if the evidence suggests a different outcome than initially predicted.

Remember, a well-crafted biology hypothesis statement serves as the foundation of your research and guides your experimental design and data analysis. It’s essential to invest time and effort in formulating a clear, focused, and testable hypothesis that contributes to the advancement of scientific knowledge.
