Chapter 10: Hypothesis Testing with Z

Setting up the hypotheses.

When setting up hypotheses for a z test, the parameter of interest is a population mean that we compare to a sample mean (in the previous chapter's examples, the null value used was 0). With z, the null hypothesis is often a value other than 0. For example, if we are working with mothers in the U.S. whose children are at risk of low birth weight, we can use 7.47 pounds, the average birth weight in the U.S., as our null value and test for differences against it. For now, we will focus on testing the value of a single mean against what we expect from the population.

Using birthweight as an example, our null hypothesis takes the form: H 0 : μ = 7.47. Notice that we are testing the value of μ, the population parameter, NOT the sample statistic X̄ (or M). We are referring to the data in raw form right now (we have not standardized it using z yet). Again, using inferential statistics, we are interested in understanding the population by drawing on our sample observations. For the research question, we have a mean value from the sample to use; it is observed and is compared against the set point stated in the null hypothesis.

As mentioned earlier, the alternative hypothesis is simply the reverse of the null hypothesis, and there are three options, depending on where we expect the difference to lie. We will set the criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative.

If we expect our obtained sample mean to be above or below the null hypothesis value (we know which direction), we set a directional hypothesis. Our alternative hypothesis takes its form from the research question itself. In our example with birthweight, this could be presented as H A : μ > 7.47 or H A : μ < 7.47.

Note that we should only use a directional hypothesis if we have a good reason, based on prior observations or research, to suspect a particular direction. When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative hypothesis. In our birthweight example, this could be set as H A : μ ≠ 7.47.

In working with data for this course, we will need to set a critical value of the test statistic based on alpha (α), using the test statistic tables in the back of the book. This critical value defines the critical rejection region for the chosen α.

Determining Critical Value from α

We set alpha (α) before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use.

When a research hypothesis predicts an effect but does not predict a direction for the effect, it is called a non-directional hypothesis . To test the significance of a non-directional hypothesis, we have to consider the possibility that the sample could be extreme at either tail of the comparison distribution. We call this a two-tailed test .

Figure 1. A two-tailed test for a non-directional hypothesis for z; area C is the critical rejection region.

When a research hypothesis predicts a direction for the effect, it is called a directional hypothesis . To test the significance of a directional hypothesis, we have to consider the possibility that the sample could be extreme at one-tail of the comparison distribution. We call this a one-tailed test .

Figure 2. A one-tailed test for a directional hypothesis (predicting an increase) for z; area C is the critical rejection region.

Determining Cutoff Scores with Two-Tailed Tests

Typically we specify an α level before analyzing the data. If the data analysis results in a probability value below the α level, then the null hypothesis is rejected; if it is not, then the null hypothesis is not rejected. In other words, if our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis; if not, we fail to reject the null (we never “accept” the null). According to this perspective, if a result is significant, then it does not matter how significant it is. Moreover, if it is not significant, then it does not matter how close to being significant it is. Therefore, if the 0.05 level is being used, then probability values of 0.049 and 0.001 are treated identically. Similarly, probability values of 0.06 and 0.34 are treated identically. Later we will discuss effect size, which helps address this limitation of null hypothesis significance testing (NHST).

When setting the probability value, there is a special complication in a two-tailed test. We have to divide the significance percentage between the two tails. For example, with a 5% significance level, we reject the null hypothesis only if the sample is so extreme that it is in either the top 2.5% or the bottom 2.5% of the comparison distribution. This keeps the overall level of significance at a total of 5%. A one-tailed test also has a single extreme cutoff value, but only one side of the distribution is considered, so the entire 5% falls in that one tail.

Figure 3. Critical value differences in one- and two-tailed tests.

Let's review the set critical values for z.

We discussed z-scores and probability in chapter 8.  If we revisit the z-score for 5% and 1%, we can identify the critical regions for the critical rejection areas from the unit standard normal table.

  • A two-tailed test at the 5% level has a critical boundary Z score of +1.96 and -1.96
  • A one-tailed test at the 5% level has a critical boundary Z score of +1.64 or -1.64
  • A two-tailed test at the 1% level has a critical boundary Z score of +2.58 and -2.58
  • A one-tailed test at the 1% level has a critical boundary Z score of +2.33 or -2.33.
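
These cutoff values come straight from the standard normal distribution. As a quick check, here is a minimal sketch in R (the language used in a later chapter of this reader); qnorm returns the same boundaries:

```r
# Critical z boundaries for the usual significance levels
qnorm(1 - 0.05/2)   # two-tailed, 5% level: 1.96
qnorm(1 - 0.05)     # one-tailed, 5% level: 1.64 (1.6449)
qnorm(1 - 0.01/2)   # two-tailed, 1% level: 2.58 (2.5758)
qnorm(1 - 0.01)     # one-tailed, 1% level: 2.33 (2.3263)
```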

Review: Critical values, p-values, and significance level

There are two criteria we use to assess whether our data meet the thresholds established by our chosen significance level, and they both have to do with our discussions of probability and distributions. Recall that probability refers to the likelihood of an event, given some situation or set of conditions. In hypothesis testing, that situation is the assumption that the null hypothesis value is the correct value, or that there is no effect. The value laid out in H 0 is our condition under which we interpret our results. To reject this assumption, and thereby reject the null hypothesis, we need results that would be very unlikely if the null was true.

Now recall that values of z which fall in the tails of the standard normal distribution represent unlikely values. That is, the proportion of the area under the curve as or more extreme than z is very small as we get into the tails of the distribution. Our significance level corresponds to the area under the tail that is exactly equal to α: if we use our normal criterion of α = .05, then 5% of the area under the curve becomes what we call the rejection region (also called the critical region) of the distribution. This is illustrated in Figure 4.

Figure 4: The rejection region for a one-tailed test

The shaded rejection region takes up 5% of the area under the curve. Any result which falls in that region is sufficient evidence to reject the null hypothesis.

The rejection region is bounded by a specific z-value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value, z crit (“z-crit”) or z* (hence the other name “critical region”). Finding the critical value works exactly the same as finding the z-score corresponding to any area under the curve like we did in Unit 1. If we go to the normal table, we will find that the z-score corresponding to 5% of the area under the curve is equal to 1.645 (z = 1.64 corresponds to a tail area of 0.0505 and z = 1.65 corresponds to 0.0495, so .05 is exactly in between them) if we go to the right and -1.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing then shading the distribution is helpful for keeping directionality straight.

Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For α = .05, this means 2.5% of the area is in each tail, which, based on the z-table, corresponds to critical values of z* = ±1.96. This is shown in Figure 5.

Figure 5: Two-tailed rejection region

Thus, any z-score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the rejection region. When we use z-scores in this way, the obtained value of z (sometimes called z-obtained) is something known as a test statistic, which is simply an inferential statistic used to test a null hypothesis.

Calculate the test statistic: Z

Now that we understand setting up the hypotheses and determining the decision criteria, let's examine hypothesis testing with z! The next step is to carry out the study and get the actual results for our sample. Central to the hypothesis test is the comparison of the population and sample means. To make our calculation and determine where the sample falls in the hypothesized distribution, we calculate z for the sample data.

Make a decision

To decide whether to reject the null hypothesis, we compare our sample's z score to the z score that marks our critical boundary. If our sample z score falls inside the rejection region of the comparison distribution (that is, it is more extreme than the critical boundary), we reject the null hypothesis.

The formula for our z- statistic has not changed:

z = (X̄ − μ) / (σ/√n)

To formally test our hypothesis, we compare our obtained z-statistic to our critical z-value. If z obt > z crit , that means it falls in the rejection region (to see why, draw a line for z = 2.5 on Figure 1 or Figure 2) and so we reject H 0 . If z obt < z crit , we fail to reject. Remember that as z gets larger, the corresponding area under the curve beyond z gets smaller. Thus, the proportion, or p-value, will be smaller than the area for α, and if the area is smaller, the probability gets smaller. Specifically, the probability of obtaining that result, or a more extreme result, under the condition that the null hypothesis is true gets smaller.

Conversely, if we fail to reject, we know that the proportion will be larger than α because the z-statistic will not be as far into the tail. This is illustrated for a one- tailed test in Figure 6.

Figure 6. Relation between α, z obt , and p

When the null hypothesis is rejected, the effect is said to be statistically significant . Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.

Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation.

Review: Steps of the Hypothesis Testing Process

The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remainder of the textbook and course, and though the hypotheses and statistics we use will change, this process will not.

Step 1: State the Hypotheses

Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated mathematically as they were presented above AND in words, explaining in normal English what each one means in terms of the research question.

Step 2: Find the Critical Values

Next, we formally lay out the criteria we will use to test our hypotheses. There are two pieces of information that inform our critical values: α, which determines how much of the area under the curve composes our rejection region, and the directionality of the test, which determines where the region will be.

Step 3: Compute the Test Statistic

Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic, in this case z . This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same.

Step 4: Make the Decision

Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.

Example: Movie Popcorn

Let's see how hypothesis testing works in action by working through an example. Say that a movie theater owner likes to keep a very close eye on how much popcorn goes into each bag sold, so he knows that the average bag has 8 cups of popcorn and that this varies a little bit, about half a cup. That is, the known population mean is μ = 8.00 and the known population standard deviation is σ = 0.50. The owner wants to make sure that the newest employee is filling bags correctly, so over the course of a week he randomly assesses 25 bags filled by the employee to test for a difference (n = 25). He doesn't want bags overfilled or underfilled, so he looks for differences in both directions. This scenario has all of the information we need to begin our hypothesis testing procedure.

The owner is looking for a difference in the mean cups of popcorn per bag compared to the population mean of 8. We will need both a null and an alternative hypothesis written both mathematically and in words. We'll always start with the null hypothesis:

H 0 : There is no difference in the cups of popcorn bags from this employee
H 0 : μ = 8.00

Notice that we phrase the hypothesis in terms of the population parameter μ, which in this case would be the true average cups of bags filled by the new employee.

Our assumption of no difference, the null hypothesis, is that this mean is exactly the same as the known population mean value we want it to match, 8.00. Now let's do the alternative:

H A : There is a difference in the cups of popcorn bags from this employee
H A : μ ≠ 8.00

In this case, we don’t know if the bags will be too full or not full enough, so we do a two-tailed alternative hypothesis that there is a difference.

Our critical values are based on two things: the directionality of the test and the level of significance. We decided in step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that α = 0.05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed z-test at α = 0.05 are z* = ±1.96. These will be the criteria we use to test our hypothesis. We can now draw out our distribution so we can visualize the rejection region and make sure it makes sense.

Figure 7: Rejection region for z* = ±1.96

Step 3: Calculate the Test Statistic

Now we come to our formal calculations. Let's say that the owner collects data and finds that the average cups of this employee's popcorn bags is X̄ = 7.75 cups. We can now plug this value, along with the values presented in the original problem, into our equation for z:

z = (X̄ − μ) / (σ/√n) = (7.75 − 8.00) / (0.50/√25) = −0.25 / 0.10 = −2.50

So our test statistic is z = -2.50, which we can draw onto our rejection region distribution:

Figure 8: Test statistic location

Looking at Figure 8, we can see that our obtained z-statistic falls in the rejection region. We can also directly compare it to our critical value: in absolute value, 2.50 > 1.96, so we reject the null hypothesis. We can now write our conclusion:

When we write our conclusion, we write out the words to communicate what the result actually means, but we also include the sample mean we calculated (the exact location doesn't matter, just somewhere that flows naturally and makes sense), the z-statistic, and the p-value. We don't know the exact p-value, but we do know that because we rejected the null, it must be less than α. For example: "Based on a sample of 25 bags, the average amount of popcorn in this employee's bags (X̄ = 7.75 cups) is significantly different from the expected 8.00 cups, z = −2.50, p < .05."
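
As a quick numerical check of this example, here is a minimal sketch in R (the language used in a later chapter of this reader), using only the values given in the problem:

```r
# Movie popcorn example: two-tailed z-test at alpha = 0.05
mu <- 8.00      # known population mean (cups per bag)
sigma <- 0.50   # known population standard deviation
n <- 25         # bags sampled
xbar <- 7.75    # observed sample mean

z <- (xbar - mu) / (sigma / sqrt(n))   # -0.25 / 0.10 = -2.50
abs(z) > qnorm(1 - 0.05/2)             # TRUE: |z| exceeds 1.96, so we reject H0
```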

Effect Size

When we reject the null hypothesis, we are stating that the difference we found was statistically significant, but we have mentioned several times that this tells us nothing about practical significance. To get an idea of the actual size of what we found, we can compute a new statistic called an effect size. Effect sizes give us an idea of how large, important, or meaningful a statistically significant effect is.

For mean differences like we calculated here, our effect size is Cohen’s d :

d = (X̄ − μ) / σ

Effect sizes are incredibly useful and provide important information and clarification that overcomes some of the weaknesses of hypothesis testing. Whenever you find a significant result, you should always calculate an effect size.

Table 1. Interpretation of Cohen's d (Cohen's conventions)
  • d ≈ 0.2: small effect
  • d ≈ 0.5: medium effect
  • d ≈ 0.8 or larger: large effect
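
For the popcorn example above, a short sketch of the effect size calculation in R (values taken from the problem):

```r
# Cohen's d for the popcorn example: mean difference in standard deviation units
d <- (7.75 - 8.00) / 0.50   # -0.5
abs(d)                      # 0.5, a medium effect by Cohen's conventions
```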

Example: Office Temperature

Let's do another example to solidify our understanding. Let's say that the office building you work in is supposed to be kept at 74 degrees Fahrenheit but is allowed to vary by 1 degree in either direction. You suspect that, as a cost-saving measure, the temperature was secretly set higher. You set up a formal way to test your hypothesis.

You start by laying out the null hypothesis:

H 0 : There is no difference in the average building temperature
H 0 : μ = 74

Next you state the alternative hypothesis. You have reason to suspect a specific direction of change, so you make a one-tailed test:

H A : The average building temperature is higher than claimed
H A : μ > 74

Now that you have everything set up, you spend one week collecting temperature data (five readings).

You calculate the average of these scores to be 𝑋̅ = 76.6 degrees. You use this to calculate the test statistic, using μ = 74 (the supposed average temperature), σ = 1.00 (how much the temperature should vary), and n = 5 (how many data points you collected):

z = (76.60 − 74.00) / (1.00/√5) = 2.60 / 0.45 = 5.78

This value falls so far into the tail that it cannot even be plotted on the distribution!

Figure 9: Obtained z-statistic

You compare your obtained z-statistic, z = 5.77, to the critical value, z* = 1.645, and find that z > z*. Therefore you reject the null hypothesis, concluding: Based on 5 observations, the average temperature (𝑋̅ = 76.6 degrees) is statistically significantly higher than it is supposed to be, z = 5.77, p < .05.
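
A quick check of this calculation in R; the small differences from the 5.77 and 5.78 reported above come from rounding the standard error (1/√5 ≈ 0.45) in the hand calculation:

```r
# Office temperature example: one-tailed z-test at alpha = 0.05
mu <- 74       # stated building temperature
sigma <- 1.00  # allowed variation in degrees
n <- 5         # number of readings collected
xbar <- 76.6   # observed average temperature

z <- (xbar - mu) / (sigma / sqrt(n))   # about 5.81
z > qnorm(1 - 0.05)                    # TRUE: z exceeds 1.645, so we reject H0
```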

d = (76.60 − 74.00) / 1.00 = 2.60

The effect size you calculate is definitely large, meaning someone has some explaining to do!

Example: Different Significance Level

First, let’s take a look at an example phrased in generic terms, rather than in the context of a specific research question, to see the individual pieces one more time. This time, however, we will use a stricter significance level, α = 0.01, to test the hypothesis.

We will use 60 as an arbitrary null hypothesis value:
H 0 : The average score does not differ from the population
H 0 : μ = 60

We will assume a two-tailed test:
H A : The average score does differ
H A : μ ≠ 60

We have seen the critical values for z-tests at the α = 0.05 level of significance several times. To find the values for α = 0.01, we will go to the standard normal table and find the z-score cutting off 0.005 (0.01 divided by 2 for a two-tailed test) of the area in the tail, which is z crit * = ±2.575. Notice that this cutoff is much higher than it was for α = 0.05. This is because we need much less of the area in the tail, so we need to go very far out to find the cutoff. As a result, this will require a much larger effect or much larger sample size in order to reject the null hypothesis.

We can now calculate our test statistic. The average of 10 scores is M = 60.40 with a µ = 60. We will use σ = 10 as our known population standard deviation. From this information, we calculate our z-statistic as:

z = (M − μ) / (σ/√n) = (60.40 − 60.00) / (10/√10) = 0.40 / 3.16 = 0.13

Our obtained z-statistic, z = 0.13, is very small. It is much less than our critical value of 2.575. Thus, this time, we fail to reject the null hypothesis. Our conclusion would look something like:

"Based on a sample of 10 scores (M = 60.40), we fail to find a statistically significant difference from the population value of 60, z = 0.13, p > 0.01."

Notice two things about the end of the conclusion. First, we wrote that p is greater than instead of p is less than, like we did in the previous two examples. This is because we failed to reject the null hypothesis. We don't know exactly what the p-value is, but we know it must be larger than the α level we used to test our hypothesis. Second, we used 0.01 instead of the usual 0.05, because this time we tested at a different level. The number you compare to the p-value should always be the significance level you test at. Because we did not detect a statistically significant effect, we do not need to calculate an effect size. Note: some statisticians suggest always calculating an effect size because of the possibility of a Type II error. Although the result was not significant, calculating d = (60.4 − 60)/10 = 0.04 suggests essentially no effect (and thus little concern about a Type II error).
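
Putting this last example together as a quick check in R, using the values given above:

```r
# Two-tailed z-test at the stricter alpha = 0.01 level
mu <- 60; sigma <- 10; n <- 10; xbar <- 60.40

z <- (xbar - mu) / (sigma / sqrt(n))   # about 0.13
abs(z) > qnorm(1 - 0.01/2)             # FALSE: 0.13 does not exceed 2.576, so we fail to reject H0
```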

Review Considerations in Hypothesis Testing

Errors in hypothesis testing.

Keep in mind that rejecting the null hypothesis is not an all-or-nothing decision. The Type I error rate is affected by the α level: the lower the α level the lower the Type I error rate. It might seem that α is the probability of a Type I error. However, this is not correct. Instead, α is the probability of a Type I error given that the null hypothesis is true. If the null hypothesis is false, then it is impossible to make a Type I error. The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error.

Statistical Power

The statistical power of a research design is the probability of rejecting the null hypothesis given the sample size and expected relationship strength. Statistical power is the complement of the probability of committing a Type II error. Clearly, researchers should be interested in the power of their research designs if they want to avoid making Type II errors. In particular, they should make sure their research design has adequate power before collecting data. A common guideline is that a power of .80 is adequate. This means that there is an 80% chance of rejecting the null hypothesis for the expected relationship strength.

Given that statistical power depends primarily on relationship strength and sample size, there are essentially two steps you can take to increase statistical power: increase the strength of the relationship or increase the sample size. Increasing the strength of the relationship can sometimes be accomplished by using a stronger manipulation or by more carefully controlling extraneous variables to reduce the amount of noise in the data (e.g., by using a within-subjects design rather than a between-subjects design). The usual strategy, however, is to increase the sample size. For any expected relationship strength, there will always be some sample large enough to achieve adequate power.
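
To make the relationship between power, effect size, and sample size concrete, here is a small sketch in R for a one-tailed, one-sample z-test. The effect size of d = 0.5 and the sample sizes used are illustrative assumptions, not values from the text:

```r
# Power of a one-tailed, one-sample z-test: P(reject H0) when the true effect size is d
power_z <- function(d, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha)        # critical value for the one-tailed test
  1 - pnorm(z_crit - d * sqrt(n))   # probability the obtained z lands in the rejection region
}

power_z(d = 0.5, n = 25)   # about 0.80: a medium effect with n = 25 reaches the usual benchmark
power_z(d = 0.5, n = 10)   # about 0.47: the same effect with a smaller sample is underpowered
```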

Inferential statistics uses data from a sample of individuals to reach conclusions about the whole population. The degree to which our inferences are valid depends upon how we selected the sample (sampling technique) and the characteristics (parameters) of population data. Statistical analyses assume that sample(s) and population(s) meet certain conditions called statistical assumptions.

It is easy to check assumptions when using statistical software and it is important as a researcher to check for violations; if violations of statistical assumptions are not appropriately addressed then results may be interpreted incorrectly.

Learning Objectives

Having read the chapter, students should be able to:

  • Conduct a hypothesis test using the z statistic, locate the critical region, and make a statistical decision.
  • Explain the purpose of measuring effect size and power, and be able to compute Cohen’s d.

Exercises – Ch. 10

  • List the main steps for hypothesis testing with the z-statistic. When and why do you calculate an effect size?
  • For each of the following, decide whether you would reject or fail to reject the null hypothesis:
      ◦ z = 1.99, two-tailed test at α = 0.05
      ◦ z = 1.99, two-tailed test at α = 0.01
      ◦ z = 1.99, one-tailed test at α = 0.05
  • You are part of a trivia team and have tracked your team’s performance since you started playing, so you know that your scores are normally distributed with μ = 78 and σ = 12. Recently, a new person joined the team, and you think the scores have gotten better. Use hypothesis testing to see if the average score has improved based on the following 8 weeks’ worth of score data: 82, 74, 62, 68, 79, 94, 90, 81, 80.
  • A study examines self-esteem and depression in teenagers. A sample of 25 teens with low self-esteem is given the Beck Depression Inventory. The average score for the group is 20.9. For the general population, the average score is 18.3 with σ = 12. Use a two-tailed test with α = 0.05 to examine whether teenagers with low self-esteem show significant differences in depression.
  • You get hired as a server at a local restaurant, and the manager tells you that servers’ tips are $42 on average but vary about $12 (μ = 42, σ = 12). You decide to track your tips to see if you make a different amount, but because this is your first job as a server, you don’t know if you will make more or less in tips. After working 16 shifts, you find that your average nightly amount is $44.50 from tips. Test for a difference between this value and the population mean at the α = 0.05 level of significance.

Answers to Odd- Numbered Exercises – Ch. 10

1. List hypotheses. Determine the critical region. Calculate z. Compare z to the critical region. Draw a conclusion. We calculate an effect size when we find a statistically significant result to see if our result is practically meaningful or important.

5. Step 1: H 0 : μ = 42 “My average tips do not differ from other servers'”, H A : μ ≠ 42 “My average tips do differ from other servers'”

Introduction to Statistics for Psychology Copyright © 2021 by Alisa Beyer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Data analysis: hypothesis testing

5.1 Acceptance and rejection regions

The z-score table you created in Activity 5 represents the area under the normal distribution bell curve left of z (as shown in Figure 17).

Figure 17. A symmetrical bell-shaped curve with the area to the left of z shaded orange.

The entries in this table can be used to determine whether to accept or reject the null hypothesis.

Suppose a marketing team at a company wishes to test the hypothesis that a new ad campaign will lead to a significant increase in sales. The team could use a one-tailed test with the reject region in the upper (right) tail and an alpha level of 1%.

Using the table created in Activity 5, the team can identify the range of z-scores that correspond to this test. They can then calculate the test statistic based on the data collected from the sales during and after the ad campaign. If the calculated test statistic falls within the rejection region identified by the table, the team can reject the null hypothesis and conclude that the ad campaign has had a significant impact on sales. This information can be used by the marketing team to justify the investment in the ad campaign and to make informed decisions about future marketing strategies.

In the context of the marketing team's hypothesis testing, the reject region for the one-tailed test with an alpha level of 1% corresponds to the range of z-scores that fall within the top 1% of the normal distribution. Conversely, the acceptable range refers to the range of z-scores that corresponds to the remaining 99% of the distribution to the left of z . Using the table created in Activity 5, the marketing team can identify the specific range of z-scores that correspond to the acceptable range and the reject region. Based on this table, the z-score of 2.33 corresponds to the upper limit of the acceptable range, as the area to the left of z = 2.33 represents approximately 99% of the area under the curve.

Therefore, if the team obtains a calculated z-score that is greater than 2.33, they can reject the null hypothesis and conclude that the new ad campaign has had a significant impact on sales. This information can help the marketing team make data-driven decisions about future campaigns and allocate resources effectively to maximise sales and profits. Figure 18 below illustrates this.

Figure 18. A bell-shaped curve showing the z-score axis, with the rejection region of the null hypothesis marked for z = 2.33 and α = 0.01.

Other than creating a z-score table, you can calculate the region to the left of z by using the Excel formula NORM.S.DIST(z, cumulative). For example, you can calculate the region left of z when z = 2.33 by simply entering 2.33 as the z-score and setting cumulative to ‘TRUE’ in this Excel formula.

You should get a value reading of 0.9901, which is exactly what you found in the z-score table in row 2.3 and column 0.03.
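
If you prefer R (used in a later chapter of this reader) to Excel, pnorm gives the same cumulative area:

```r
# Area under the standard normal curve to the left of z = 2.33
pnorm(2.33)   # 0.9901, matching the z-score table and the Excel result
```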

Here is another question. If you want to test hypotheses using the two-tailed test with the alpha level equal to 5%, how can you determine the region of z-scores for which you reject the null hypothesis?

The two-tailed test requires you to divide the levels of alpha by 2.

Therefore, the area in each tail for the two-tailed test = 0.05/2 = 0.0250

As the z-score table shows the area to the left of the value of z, a two-tailed test requires you to identify two entries. The area of one entry covers 0.975 (97.5%) of the area (where 0.025 of the area is outside the value of z on the right tail), and the area of another entry covers 0.025 of the area on the left tail.

Using the z-score table, you can determine the z-score = 1.96 or -1.96. Therefore, you will reject the null hypothesis for an obtained z-score > 1.96 or z-score < -1.96.

Introduction to Statistics and Data Analysis

6 Hypothesis testing: the z-test

We’ve all had the experience of standing at a crosswalk waiting staring at a pedestrian traffic light showing the little red man. You’re waiting for the little green man so you can cross. After a little while you’re still waiting and there aren’t any cars around. You might think ‘this light is really taking a long time’, but you continue waiting. Minutes pass and there’s still no little green man. At some point you come to the conclusion that the light is broken and you’ll never see that little green man. You cross on the little red man when it’s clear.

You may not have known this but you just conducted a hypothesis test. When you arrived at the crosswalk, you assumed that the light was functioning properly, although you will always entertain the possibility that it's broken. In terms of hypothesis testing, your 'null hypothesis' is that the light is working and your 'alternative hypothesis' is that it's broken. As time passes, it seems less and less likely that the light is working properly. Eventually, the probability of the light working given how long you've been waiting becomes so low that you reject the null hypothesis in favor of the alternative hypothesis.

This sort of reasoning is the backbone of hypothesis testing and inferential statistics. It’s also the point in the course where we turn the corner from descriptive statistics to inferential statistics. Rather than describing our data in terms of means and plots, we will now start using our data to make inferences, or generalizations, about the population that our samples are drawn from. In this course we’ll focus on standard hypothesis testing where we set up a null hypothesis and determine the probability of our observed data under the assumption that the null hypothesis is true (the much maligned p-value). If this probability is small enough, then we conclude that our data suggests that the null hypothesis is false, so we reject it.

In this chapter, we’ll introduce hypothesis testing with examples from a ‘z-test’, when we’re comparing a single mean to what we’d expect from a population with known mean and standard deviation. In this case, we can convert our observed mean into a z-score for the standard normal distribution. Hence the name z-test.

It's time to introduce the hypothesis test flow chart. It's pretty self-explanatory, even if you're not familiar with all of these hypothesis tests. The z-test is (1) based on means, (2) with only one mean, and (3) where we know \(\sigma\), the standard deviation of the population. Following those three branches is how you find the z-test in the flow chart.

6.1 Women’s height example

Let's work with the example from the end of the last chapter, where we started with the fact that the heights of US women have a mean of 63 inches and a standard deviation of 2.5 inches. We calculated that the average height of the 122 women in Psych 315 is 64.7 inches. We then used the central limit theorem and calculated that the probability of a random sample of 122 heights from this population having a mean of 64.7 or greater is 2.4868996 × 10^-14. This is a very, very small number.

Here’s how we do it using R:
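
(The original code block was not preserved in this copy; the following is a sketch using the numbers given above.)

```r
# Probability of a sample mean of 64.7 or greater, if heights really come from
# a population with mu = 63 and sigma = 2.5 (n = 122)
mu <- 63
sigma <- 2.5
n <- 122
xbar <- 64.7

sem <- sigma / sqrt(n)                  # standard error of the mean
1 - pnorm(xbar, mean = mu, sd = sem)    # a vanishingly small probability, on the order of 1e-14
```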

Let’s think of our sample as a random sample of UW psychology students, which is a reasonable assumption since all psychology students have to take a statistics class. What does this sample say about the psychology students that are women at UW compared to the US population? It could be that these psychology students at UW have the same mean and standard deviation as the US population, but our sample just happens to have an unusual number of tall women, but we calculated that the probability of this happening is really low. Instead, it makes more sense that the population that we’re drawing from has a mean that’s greater than the US population mean. Notice that we’re making a conclusion about the whole population of women psychology students based on our one sample.

Using the terminology of hypothesis testing, we first assumed the null hypothesis that UW women psych students have the same mean (and standard deviation) as the US population. The null hypothesis is written as:

\[ H_{0}: \mu = 63 \] In this example, our alternative hypothesis is that the mean of our population is larger than the mean of null hypothesis population. We write this as:

\[ H_{A}: \mu > 63 \]

Next, after obtaining a random sample and calculating the mean, we calculate the probability of drawing a mean this large (or larger) from the null hypothesis distribution.

If this probability is low enough, we reject the null hypothesis in favor of the alternative hypothesis. When our probability allows us to reject the null hypothesis, we say that our observed results are ‘statistically significant’.

In statistics terms, we never say we 'accept the alternative hypothesis' as true. All we can say is that we don't think the null hypothesis is true. I know it's subtle, but in science we can never prove that a hypothesis is true or not. There's always the possibility that we just happened to grab an unusual sample from the null hypothesis distribution.

6.2 The hated p<.05

The probability that we obtain our observed mean or greater given that the null hypothesis is true is called the p-value. How improbable is improbable enough to reject the null hypothesis? The p-value for our example above on women’s heights is astronomically low, so it’s clear that we should reject \(H_{0}\) .

The p-value that’s on the border of rejection is called the alpha ( \(\alpha\) ) value. We reject \(H_{0}\) when our p-value is less than \(\alpha\) .

You probably know that the most common value of alpha is \(\alpha = .05\) .

The first publication of this value dates back to Sir Ronald Fisher, in his seminal 1925 book Statistical Methods for Research Workers where he states:

“It is convenient to take this point as a limit in judging whether a deviation is considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant.” (p. 47)

If you read the chapter on the normal distribution, then you should know that 95% of the area under the normal distribution lies within \(\pm\) two standard deviations of the mean. So the probability of obtaining a sample that exceeds two standard deviations from the mean (in either direction) is .05.

6.3 IQ example

Let’s do an example using IQ scores. IQ scores are normalized to have a mean of 100 and a standard deviation of 15 points. Because they’re normalized, they are a rare example of a population which has a known mean and standard deviation. In the next chapter we’ll discuss the t-test, which is used in the more common situation when we don’t know the population standard deviation.

Suppose you have the suspicion that graduate students have higher IQ's than the general population. You have enough time to go and measure the IQ's of 25 randomly sampled grad students and obtain a mean of 105. Is this difference between our observed mean and 100 statistically significant using an alpha value of \(\alpha = 0.05\)?

Here the null hypothesis is:

\[ H_{0}: \mu = 100\]

And the alternative hypothesis is:

\[ H_{A}: \mu > 100 \]

We know that the parameters for the null hypothesis are:

\[ \mu = 100 \] and \[ \sigma = 15 \]

From this, we can calculate the probability of observing our mean of 105 or higher using the central limit theorem and what we know about the normal distribution:

\[ \sigma_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3 \] From this, we can calculate the probability of our observed mean using R’s ‘pnorm’ function. Here’s how to do the whole thing in R.
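
(The original code block was not preserved in this copy; this sketch reproduces the calculation with the numbers given above.)

```r
# IQ example: P(sample mean >= 105) when mu = 100, sigma = 15, n = 25
mu <- 100
sigma <- 15
n <- 25
xbar <- 105

sem <- sigma / sqrt(n)                  # 15 / 5 = 3
1 - pnorm(xbar, mean = mu, sd = sem)    # about 0.0478
```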

Since our p-value of 0.0478 is (just barely) less than our chosen value of \(\alpha = 0.05\) as our criterion, we reject \(H_{0}\) for this (contrived) example and conclude that our observed mean of 105 is significantly greater than 100, so our study suggests that the average graduate student has a higher IQ than the overall population.

You should feel uncomfortable making such a hard, binary decision for such a borderline case. After all, if we had chosen our second favorite value of alpha, \(\alpha = .01\) , we would have failed to reject \(H_{0}\) . This discomfort is a primary reason why statisticians are moving away from this discrete decision making process. Later on we’ll discuss where things are going, including reporting effect sizes, and using confidence intervals.

6.4 Alpha values vs. critical values

Using R’s qnorm function, we can find the z-score for which only 5% of the area lies above:
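
(The original code block was not preserved; a minimal equivalent is:)

```r
# z-score with only 5% of the area above it
qnorm(1 - 0.05)   # 1.644854
```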

So the probability of a randomly sampled z-score exceeding 1.644854 is less than 5%. It follows that if we convert our observed mean into z-score values, we will reject \(H_{0}\) if and only if our z-score is greater than 1.644854. This value is called the ‘critical value’ because it lies on the boundary between rejecting and failing to reject \(H_{0}\) .

In our last example, the z-score for our observed mean is:

\[ z = \frac{X-\mu}{\frac{\sigma}{\sqrt{n}}} = \frac{105 - 100}{3} = 1.67 \] Our z-score is just barely greater than the critical value of 1.644854, which makes sense because our p-value is just barely less than 0.05.

Sometimes you'll see textbooks compare critical values to observed scores for the decision making process in hypothesis testing. This dates back to days when computers were less available and we had to rely on tables instead. There wasn't enough space in a book to hold complete tables, which prohibited looking up a p-value for any observed value. Instead, only critical values for specific values of alpha were included. If you look at really old papers, you'll see statistics reported as \(p<.05\) or \(p<.01\) instead of actual p-values for this reason.

It may help to visualize the relationship between p-values, alpha values and critical values like this:

The red shaded region is the upper 5% of the standard normal distribution which starts at the critical value of z=1.644854. This is sometimes called the 'rejection region'. The blue vertical line is drawn at our observed value of z=1.67. You can see that the blue line falls just inside the rejection region, so we Reject \(H_{0}\)!

6.5 One vs. two-tailed tests

Recall that our alternative hypothesis was to reject if our mean IQ was significantly greater than the null hypothesis mean: \(H_{A}: \mu > 100\) . This implies that the situation where \(\mu < 100\) is never even in consideration, which is weird. In science, we’re trying to understand the true state of the world. Although we have a hunch that grad student IQ’s are higher than average, there is always the possibility that they are lower than average. If our sample came up with an IQ well below 100, we’d simply fail to reject \(H_{0}\) and move on. This feels like throwing out important information.

The test we just ran is called a ‘one-tailed’ test because we only reject \(H_{0}\) if our results fall in one of the two tails of the population distribution.

Instead, it might make more sense to reject \(H_{0}\) if we get either an unusually large or small score. This means we need two critical values - one above and one below zero. At first thought you might think we just duplicate our critical value from a one-tailed test to the other side. But that will double the area of the rejection region. That's not a good thing because if \(H_{0}\) is true, there's actually a \(2\alpha\) probability that we'll draw a score in the rejection region.

Instead, we divide the area into two tails, each containing an area of \(\frac{\alpha}{2}\) . For \(\alpha\) = 0.05, we can find the critical value of z with qnorm:
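
(Again, the original code block was not preserved; a minimal equivalent is:)

```r
# Two-tailed critical value: alpha/2 = 0.025 in each tail
qnorm(1 - 0.05/2)   # 1.959964, about 1.96
```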

So with a two-tailed test at \(\alpha = 0.05\) we reject \(H_{0}\) if our observed z-score is either above z = 1.96 or less than -1.96. This is that value around 2 that Sir Ronald Fisher was talking about!

Here’s what the critical regions and observed value of z looks like for our example with a two-tailed test:

z score reject null hypothesis

You can see that splitting the area of \(\alpha = 0.05\) into two halves increased the critical value in the positive direction from 1.64 to 1.96, making it harder to reject \(H_{0}\) . For our example, this changes our decision: our observed value of z = 1.67 no longer falls into the rejection region, so now we fail to reject \(H_{0}\) .

If we now fail to reject \(H_{0}\), what about the p-value? Remember, for a one-tailed test, p = \(\alpha\) if our observed z-score lands right on the critical value of z. The same is true for a two-tailed test. But the critical z-score moved out so that the area above it is \(\frac{\alpha}{2}\). So for a two-tailed test, in order to have a p-value of \(\alpha\) when our z-score lands right on the critical value, we need to double the p-value that we'd get for a one-tailed test.

For our example, the p-value for the one tailed test was \(p=0.0478\) . So if we use a two-tailed test, our p-value is \((2)(0.0478) = 0.0956\) . This value is greater than \(\alpha\) = 0.05, which makes sense because we just showed above that we fail to reject \(H_{0}\) with a two tailed test.

Which is the right test, one-tailed or two-tailed? Ideally, as scientists, we should be agnostic about the results of our experiment. But in reality, we all know that the results are more interesting if they are statistically significant. So you can imagine that for this example, given a choice between one and two-tailed, we’d choose a one-tailed test so that we can reject \(H_{0}\) .

There are two problems with this. First, we should never adjust our choice of hypothesis test after we observe the data. That would be an example of ‘p-hacking’, a topic we’ll discuss later. Second, most statisticians these days strongly recommend against one-tailed tests. The only reason for a one-tailed test is if there is no logical or physical possibility for a population mean to fall below the null hypothesis mean.

Hypothesis Testing with Z-Test: Significance Level and Rejection Region

If you want to understand why hypothesis testing works, you should first have an idea about the significance level and the reject region . We assume you already know what a hypothesis is , so let’s jump right into the action.

What Is the Significance Level?

First, we must define the term significance level .

Normally, we aim to reject the null if it is false.

However, as with any test, there is a small chance that we could get it wrong and reject a null hypothesis that is true.

How Is the Significance Level Denoted?

The significance level is denoted by α and is the probability of rejecting the null hypothesis , if it is true.

So, the probability of making this error.

Typical values for α are 0.01, 0.05 and 0.1. It is a value that we select based on the certainty we need. In most cases, the choice of α is determined by the context we are operating in, but 0.05 is the most commonly used value.

A Case in Point

Say, we need to test if a machine is working properly. We would expect the test to make little or no mistakes. As we want to be very precise, we should pick a low significance level such as 0.01.

The famous Coca Cola glass bottle is 12 ounces. If the machine pours 12.1 ounces, some of the liquid would be spilled, and the label would be damaged as well. So, in certain situations, we need to be as accurate as possible.

Higher Degree of Error

However, if we are analyzing humans or companies, we would expect more random or at least uncertain behavior. Hence, a higher degree of error.

For instance, if we want to predict how much Coca Cola its consumers drink on average, the difference between 12 ounces and 12.1 ounces will not be that crucial. So, we can choose a higher significance level like 0.05 or 0.1.

Hypothesis Testing: Performing a Z-Test

Now that we have an idea about the significance level , let’s get to the mechanics of hypothesis testing.

Imagine you are consulting a university and want to carry out an analysis on how students are performing on average.

The university dean believes that on average students have a GPA of 70%. Being the data-driven researcher that you are, you can’t simply agree with his opinion, so you start testing.

The null hypothesis is: The population mean grade is 70%.

This is a hypothesized value.

The alternative hypothesis is: The population mean grade is not 70%. You can see how both of them are denoted, below.

Visualizing the Grades

Assuming that the population of grades is normally distributed, all grades received by students should look in the following way.

That is the true population mean .

Performing a Z-test

Now, a test we would normally perform is the Z-test . The formula is:

Z equals the sample mean , minus the hypothesized mean , divided by the standard error .

The idea is the following.

We are standardizing or scaling the sample mean we got. (You can quickly obtain it with our Mean, Median, Mode calculator .) If the sample mean is close enough to the hypothesized mean , then Z will be close to 0. Otherwise, it will be far away from it. Naturally, if the sample mean is exactly equal to the hypothesized mean , Z will be 0.

In all these cases, we would accept the null hypothesis .
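
To make the scaling concrete, here is a minimal sketch of this Z statistic in R (the language used in the z-test chapter earlier in this reader); the GPA numbers passed in are hypothetical, chosen only to illustrate the behaviour:

```r
# Z statistic as described above: scaled distance between the sample mean and the hypothesized mean
z_stat <- function(xbar, mu0, sigma, n) {
  (xbar - mu0) / (sigma / sqrt(n))   # sample mean minus hypothesized mean, over the standard error
}

z_stat(xbar = 0.70, mu0 = 0.70, sigma = 0.10, n = 50)   # sample mean equals the hypothesized 70%: z = 0
z_stat(xbar = 0.74, mu0 = 0.70, sigma = 0.10, n = 50)   # sample mean far from 70%: z is about 2.83
```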

What Is the Rejection Region?

The question here is the following:

How big should Z be for us to reject the null hypothesis ?

Well, there is a cut-off line. Since we are conducting a two-sided or a two-tailed test, there are two cut-off lines, one on each side.

When we calculate Z , we will get a value. If this value falls into the middle part, then we cannot reject the null. If it falls outside, in the shaded region, then we reject the null hypothesis .

That is why the shaded part is called: rejection region , as you can see below.

What Does the Rejection Region Depend on?

The area that is cut-off actually depends on the significance level .

Say the level of significance , α , is 0.05. Then we have α divided by 2, or 0.025 on the left side and 0.025 on the right side.

Now these are values we can check from the z-table . When α is 0.025, Z is 1.96. So, 1.96 on the right side and minus 1.96 on the left side.

Therefore, if the value we get for Z from the test is lower than minus 1.96, or higher than 1.96, we will reject the null hypothesis . Otherwise, we will accept it.

That’s more or less how hypothesis testing works.

We scale the sample mean with respect to the hypothesized value. If Z is close to 0, then we cannot reject the null. If it is far away from 0, then we reject the null hypothesis .

Example of One Tailed Test

What about one-sided tests? We have those too!

Let’s consider the following situation.

Paul says data scientists earn more than $125,000. So, H 0 is: μ 0 is bigger than $125,000.

The alternative is that μ 0 is lower or equal to 125,000.

Using the same significance level , this time, the whole rejection region is on the left. So, the rejection region has an area of α . Looking at the z-table, that corresponds to a Z -score of 1.645. Since it is on the left, it is with a minus sign.

Accept or Reject

Now, when calculating our test statistic Z , if we get a value lower than -1.645, we would reject the null hypothesis . We do that because we have statistical evidence that the data scientist salary is less than $125,000. Otherwise, we would accept it.

Another One-Tailed Test

To exhaust all possibilities, let’s explore another one-tailed test.

Say the university dean told you that the average GPA students get is lower than 70%. In that case, the null hypothesis is:

μ 0 is lower than 70%.

While the alternative is:

μ 0 is bigger than or equal to 70%.

In this situation, the rejection region is on the right side. So, if the test statistic is bigger than the cut-off z-score, we would reject the null, otherwise, we wouldn’t.

Importance of the Significance Level and the Rejection Region

To sum up, the significance level and the reject region are quite crucial in the process of hypothesis testing. The level of significance controls how demanding the test is: we (the researchers) choose it depending on how big of a difference a possible error could make. On the other hand, the reject region helps us decide whether or not to reject the null hypothesis. After reading this and putting both of them into use, you will realize how convenient they make your work.

8.3: Hypothesis Test for One Mean

Rachel Webb, Portland State University

There are three methods used to test hypotheses:

The Traditional Method (Critical Value Method)

There are five steps in hypothesis testing when using the traditional method:

  1. Identify the claim and formulate the hypotheses.
  2. Compute the test statistic.
  3. Compute the critical value(s) and state the rejection rule (the rule by which you will reject the null hypothesis H 0 ).
  4. Make the decision to reject or not reject the null hypothesis by comparing the test statistic to the critical value(s). Reject H 0 when the test statistic is in the critical tail(s).
  5. Summarize the results and address the claim using context and units from the research question.

Steps 2 and 3 do not have to be done in that order, so make sure you know the difference between the critical value, which comes from the stated significance level \(\alpha\), and the test statistic, which is calculated from the sample data.

Note: The test statistic and the critical value(s) come from the same distribution and will usually have the same letter such as z, t, or F. The critical value(s) will have a subscript with the lower tail area \((z_{\alpha}, z_{1–\alpha}, z_{\alpha / 2})\) or an asterisk next to it (z*) to distinguish it from the test statistic.

You can find the critical value(s) or test statistic in any order, but make sure you know the difference when you compare the two. The critical value is found from α and is the start of the shaded area called the critical region (also called rejection region or area). The test statistic is computed using sample data and may or may not be in the critical region.

The critical value(s) is set before you begin (a priori) by the level of significance you are using for your test. This critical value(s) defines the shaded area known as the rejection region. The test statistic is the z-score we find using the sample data, which is then compared to the shaded tail(s). When the test statistic is in the shaded rejection region, you reject the null hypothesis. When your test statistic is not in the shaded rejection region, you fail to reject the null hypothesis. Depending on whether your claim is in the null or the alternative, the sample data may or may not support your claim.

The P-value Method

With the advent of computers and graphing calculators, most modern statistics and research methods utilize this method.

There are five steps in hypothesis testing when using the p-value method:

  1. Identify the claim and formulate the hypotheses.
  2. Compute the test statistic.
  3. Compute the p-value.
  4. Make the decision to reject or not reject the null hypothesis by comparing the p-value with \(\alpha\). Reject H 0 when the p-value ≤ \(\alpha\).
  5. Summarize the results and address the claim.

The ideas below review the process of evaluating hypothesis tests with p-values:

  • The null hypothesis represents a skeptic’s position or a position of no difference. We reject this position only if the evidence strongly favors the alternative hypothesis.
  • A small p-value means that if the null hypothesis is true, there is a low probability of seeing a point estimate at least as extreme as the one we saw. We interpret this as strong evidence in favor of the alternative hypothesis.
  • The p-value is constructed in such a way that we can directly compare it to the significance level (\(\alpha\)) to determine whether to reject H 0 . We reject the null hypothesis if the p-value is smaller than the significance level, \(\alpha\), which is usually 0.05. Otherwise, we fail to reject H 0 .
  • We should always state the conclusion of the hypothesis test in plain language, using context and units, so non-statisticians can also understand the results.

The Confidence Interval Method (results are in the same units as the data)

There are four steps in hypothesis testing when using the confidence interval method:

  1. Identify the claim and formulate the hypotheses.
  2. Compute the confidence interval.
  3. Make the decision to reject or not reject the null hypothesis by comparing the hypothesized value from H 0 with the confidence interval. Reject H 0 when the hypothesized value found in H 0 is outside the bounds of the confidence interval. We will only be doing a two-tailed version of this.
  4. Summarize the results and address the claim.

For all three methods, Step 1 is the most important step. If you do not correctly set up your hypotheses, the remaining steps will be incorrect.

The decision and summary would be the same no matter which method you use. Figure 8-12 is a flow chart that may help with starting your summaries, but make sure you finish the sentence with context and units from the question.


Figure 8-12

The hypothesis-testing framework is a very general tool, and we often use it without a second thought. If a person makes a somewhat unbelievable claim, we are initially skeptical. However, if there is sufficient evidence that supports the claim, we set aside our skepticism and reject the null hypothesis in favor of the alternative.

8.3.1 Z-Test

When the population standard deviation is known and stated in the problem, we will use the z-test .

The z-test is a statistical test for the mean of a population. It can be used when σ is known. The population should be approximately normally distributed when n < 30.

When using this model, the test statistic is \(Z=\frac{\bar{x}-\mu_{0}}{\left(\frac{\sigma}{\sqrt{n}}\right)}\) where µ 0 is the test value from the H 0 .
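As a quick cross-check of this formula, the test statistic can be computed from summary statistics in a few lines of Python (a minimal sketch; the helper name z_statistic is ours and is not part of the text):

from math import sqrt

def z_statistic(x_bar, mu_0, sigma, n):
    # z = (sample mean - hypothesized mean) / (sigma / sqrt(n))
    return (x_bar - mu_0) / (sigma / sqrt(n))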

M&Ms candies advertise a mean weight of 0.8535 grams. A sample of 50 M&M candies is randomly selected from a bag of M&Ms, and the sample mean is found to be \(\overline{ x }\) = 0.8472 grams. The standard deviation of the weights of all M&Ms is (somehow) known to be σ = 0.06 grams. A skeptical M&M consumer claims that the mean weight is less than what is advertised. Test this claim using the traditional method of hypothesis testing. Use a 5% level of significance.

By letting \(\alpha\) = 0.05, we are allowing a 5% chance that the null hypothesis (average weight that is at least 0.8535 grams) is rejected when in actuality it is true.

1. Identify the Claim: The claim is “M&Ms candies have a mean weight that is less than 0.8535 grams.” This translates mathematically to µ < 0.8535 grams. Therefore, the null and alternative hypotheses are:

H0: µ = 0.8535

H1: µ < 0.8535 (claim)

This is a left-tailed test since the alternative hypothesis has a “less than” sign.

We are performing a test about a population mean. We can use the z-test because we were given a population standard deviation σ (not a sample standard deviation s). In practice, σ is rarely known and usually comes from a similar study or previous year’s data.

2. Find the Critical Value: The critical value for a left-tailed test with a level of significance \(\alpha\) = 0.05 is found in a way similar to finding the critical values from confidence intervals. Because we are using the z-test, we must find the critical value \(z_{\alpha}\) from the z (standard normal) distribution.

This is a left-tailed test since the sign in the alternative hypothesis is < (most of the time a left-tailed test will have a negative z-score test statistic).


Figure 8-13

First draw your curve and shade the appropriate tail with the area \(\alpha\) = 0.05. Usually the technology you are using only asks for the area in the left tail, which in this case is \(\alpha\) = 0.05. For the TI calculators, under the DISTR menu use invNorm(0.05,0,1) = –1.645. See Figure 8-13.

For Excel use =NORM.S.INV(0.05).
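In Python, a comparable call (a sketch using SciPy, which the text itself does not use) is:

from scipy.stats import norm

z_critical = norm.ppf(0.05)    # inverse standard normal for a left-tail area of 0.05
print(round(z_critical, 3))    # -1.645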

3. Find the Test Statistic: The formula for the test statistic is the z-score that we used back in the Central Limit Theorem section \(z=\frac{\bar{x}-\mu_{0}}{\left(\frac{\sigma}{\sqrt{n}}\right)}=\frac{0.8472-0.8535}{\left(\frac{0.06}{\sqrt{50}}\right)}=-0.7425\).

4. Make the Decision: Figure 8-14 shows both the critical value and the test statistic. There are only two possible correct answers for the decision step.

i. Reject H 0

ii. Fail to reject H 0


Figure 8-14

To make the decision whether to “Do not reject H 0 ” or “Reject H 0 ” using the traditional method, we must compare the test statistic z = –0.7425 with the critical value z α = –1.645.

When the test statistic is in the shaded tail, called the rejection area, then we would reject H 0 , if not then we fail to reject H 0 . Since the test statistic z ≈ –0.7425 is in the unshaded region, the decision is: Do not reject H 0 .

5. Summarize the Results: At the 5% level of significance, there is not enough evidence to support the claim that the mean weight is less than 0.8535 grams.
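If you would rather verify Example 8-5 in Python than on a calculator, the short sketch below (SciPy assumed; the variable names are ours) reproduces the critical value, test statistic, and decision:

from math import sqrt
from scipy.stats import norm

x_bar, mu_0, sigma, n, alpha = 0.8472, 0.8535, 0.06, 50, 0.05

z_critical = norm.ppf(alpha)              # about -1.645 (left-tailed test)
z = (x_bar - mu_0) / (sigma / sqrt(n))    # about -0.7425
p_value = norm.cdf(z)                     # about 0.229, the left-tail area

# Reject H0 only when the test statistic falls in the shaded left tail.
print("Reject H0" if z < z_critical else "Do not reject H0")    # Do not reject H0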

Example 8-5 used the traditional critical value method. With the advent of computers, this method is becoming outdated, and the p-value and confidence interval methods are becoming more popular.

Most statistical software packages will give a p-value and confidence interval but not the critical value.

TI-84: Press the [STAT] key, go to the [TESTS] menu, arrow down to the [Z-Test] option and press the [ENTER] key. Arrow over to the [Stats] menu and press the [ENTER] key. Then type in value for the hypothesized mean (µ 0 ), standard deviation, sample mean, sample size, arrow over to the \(\neq\), <, > sign that is in the alternative hypothesis statement then press the [ENTER] key, arrow down to [Calculate] and press the [ENTER] key. Alternatively (If you have raw data in a list) Select the [Data] menu and press the [ENTER] key. Then type in the value for the hypothesized mean (µ 0 ), type in your list name (TI-84 L 1 is above the 1 key).


The calculator returns the alternative hypothesis (check and make sure you selected the correct sign), the test statistic, p-value, sample mean, and sample size.

TI-89: Go in to the Stat/List Editor App. Select [F6] Tests. Select the first option Z-Test. Select Data if you have raw data in a list, select Stats if you have the summarized statistics given to you in the problem. If you have data, press [2nd] Var-Link, the go down to list1 in the main folder to select the list name. If you have statistics then enter the values. Leave Freq:1 alone, arrow over to the \(\neq\), <, > sign that is in the alternative hypothesis statement then press the [ENTER]key, arrow down to [Calculate] and press the [ENTER] key. The calculator returns the test statistic and the p-value.


What is the p-value?

The p-value is the probability of observing an effect at least as extreme as the one in your sample data, assuming that the null hypothesis is true. The p-value is calculated based on the assumptions that the null hypothesis is true for the population and that the difference in the sample is caused entirely by random chance.

Recall the example at the beginning of the chapter.

Suppose a manufacturer of a new laptop battery claims the mean life of the battery is 900 days with a standard deviation of 40 days. You are the buyer of this battery and you think this claim is inflated. You would like to test your belief because without a good reason you cannot get out of your contract. You take a random sample of 35 batteries and find that the mean battery life is 890 days. Test the claim using the p-value method. Let \(\alpha\) = 0.05.

We had the following hypotheses:

H 0 : μ = 900, since the manufacturer says the mean life of a battery is 900 days.

H 1 : μ < 900, since you believe the mean life of the battery is less than 900 days.

The test statistic was found to be: \(Z=\frac{\bar{x}-\mu_{0}}{\left(\frac{\sigma}{\sqrt{n}}\right)}=\frac{890-900}{\left(\frac{40}{\sqrt{35}}\right)}=-1.479\).

The p-value is P(\(\overline{ x }\) < 890 | H 0 is true) = P(\(\overline{ x }\)< 890 | μ = 900) = P(Z < –1.479).

On the TI Calculator use normalcdf(-1E99,890,900,40/\(\sqrt{35}\)) \(\approx\) 0.0696. See Figure 8-15.


Figure 8-15

Alternatively, in Excel use =NORM.DIST(890,900,40/SQRT(35),TRUE) \(\approx\) 0.0696.
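The same left-tail probability can be found in Python (a SciPy-based sketch, not part of the original example):

from math import sqrt
from scipy.stats import norm

# P(x-bar < 890) when mu = 900 and the standard error is 40/sqrt(35)
p_value = norm.cdf(890, loc=900, scale=40 / sqrt(35))
print(round(p_value, 4))    # approximately 0.0696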


The TI calculators will easily find the p-value for you.


Now compare the p-value = 0.0696 to \(\alpha\) = 0.05. Make the decision to reject or not reject the null hypothesis by comparing the p-value with \(\alpha\). Reject H 0 when the p-value ≤ α, and do not reject H 0 when the p-value > \(\alpha\). The p-value for this example is larger than alpha (0.0696 > 0.05), therefore the decision is to not reject H 0 .

Since we fail to reject the null, there is not enough evidence to indicate that the mean life of the battery is less than 900 days.

8.3.2 T-Test

When the population standard deviation is unknown, we will use the t-test .

The t-test is a statistical test for the mean of a population. It will be used when σ is unknown. The population should be approximately normally distributed when n < 30.

When using this model, the test statistic is \(t=\frac{\bar{x}-\mu_{0}}{\left(\frac{s}{\sqrt{n}}\right)}\) where µ 0 is the test value from the H 0 . The degrees of freedom are df = n – 1.
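The t statistic is computed just like the z statistic, but with the sample standard deviation s in place of σ. A minimal Python sketch (the helper name t_statistic is ours, not from the text):

from math import sqrt

def t_statistic(x_bar, mu_0, s, n):
    # t = (sample mean - hypothesized mean) / (s / sqrt(n)), with df = n - 1
    return (x_bar - mu_0) / (s / sqrt(n))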

The z and t-tests are easy to mix up. Sometimes a standard deviation will be stated in the problem without specifying if it is a population’s standard deviation σ or the sample standard deviation s. If the standard deviation is in the same sentence that describes the sample or only raw data is given then this would be s. When you only have sample data, use the t-test.

Figure 8-16 is a flow chart to remind you when to use z versus t.


Figure 8-16

Use Figure 8-17 as a guide in setting up your hypotheses. The two-tailed test will always have a not equal ≠ sign in H 1 and both tails shaded. The right-tailed test will always have the greater than > sign in H 1 and the right tail shaded. The left-tailed test will always have a less than < sign in H 1 and the left tail shaded.


Figure 8-17

The label on a particular brand of cream of mushroom soup states that (on average) there is 870 mg of sodium per serving. A nutritionist would like to test whether the average is actually more than the stated value. To test this, 13 servings of this soup were randomly selected and the amount of sodium was measured. The sample mean was found to be 882.4 mg and the sample standard deviation was 24.3 mg. Assume that the amount of sodium per serving is normally distributed. Test this claim using the traditional method of hypothesis testing. Use the \(\alpha\) = 0.05 level of significance.

Step 1: State the hypotheses and identify the claim: The statement “the average is more (>) than 870” must be in the alternative hypothesis. Therefore, the null and alternative hypotheses are:

H 0 : µ = 870

H 1 : µ > 870 (claim)

This is a right-tailed test with the claim in the alternative hypothesis.

Step 2: Compute the test statistic: We are using the t-test because we are performing a test about a population mean. We must use the t-test (instead of the z-test) because the population standard deviation σ is unknown. (Note: be sure that you know why we are using the t-test instead of the z-test in general.)

The formula for the test statistic is \(t=\frac{\bar{x}-\mu_{0}}{\left(\frac{S}{\sqrt{n}}\right)}=\frac{882.4-870}{\left(\frac{24.3}{\sqrt{13}}\right)}=1.8399\).

Note: If you were given raw data use 1-var Stats on your calculator to find the sample mean, sample size and sample standard deviation.

Step 3: Compute the critical value(s): The critical value for a right-tailed test with a level of significance \(\alpha\) = 0.05 is found in a way similar to finding the critical values from confidence intervals.

Since we are using the t-test, we must find the critical value t 1–\(\alpha\) from a t-distribution with the degrees of freedom, df = n – 1 = 13 –1 = 12. Use the DISTR menu invT option. Note that if you have an older TI-84 or a TI-83 calculator you need to have the invT program installed or use Excel.
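If you do not have invT available, a Python alternative (a SciPy sketch, not mentioned in the text) gives the same critical value:

from scipy.stats import t

t_critical = t.ppf(0.95, df=12)    # right-tail area of 0.05 with df = 12
print(round(t_critical, 3))        # approximately 1.782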

Draw and label the t-distribution curve with the critical value as in Figure 8-18.


Figure 8-18

The critical value is t 1–\(\alpha\) = 1.782 and the rejection rule becomes: Reject H 0 if the test statistic t ≥ t 1–\(\alpha\) = 1.782.

Step 4: State the decision. Decision: Since the test statistic t =1.8399 is in the critical region, we should Reject H 0 .

Step 5: State the summary. Summary: At the 5% significance level, we have sufficient evidence to say that the average amount of sodium per serving of cream of mushroom soup exceeds the stated 870 mg amount.

Example 8-7 Continued:

Use the prior example, but this time use the p-value method . Again, let the significance level be \(\alpha\) = 0.05.

Step 1 : The hypotheses remain the same: H 0 : µ = 870 and H 1 : µ > 870 (claim).

Step 2: The test statistic remains the same, \(t=\frac{\bar{x}-\mu_{0}}{\left(\frac{S}{\sqrt{n}}\right)}=\frac{882.4-870}{\left(\frac{24.3}{\sqrt{13}}\right)}=1.8399\).

Step 3: Compute the p-value.

For a right-tailed test, the p-value is found by finding the area to the right of the test statistic t = 1.8399 under a t-distribution with 12 degrees of freedom. See Figure 8-19.


Figure 8-19

Note that exact p-values for a t-test can only be found using a computer or calculator. For the TI calculators this is in the DISTR menu. Use tcdf(lower,upper, df ).

For this example, we would have p-value = tcdf(1.8399,∞,12) = 0.0453.
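The same right-tail area can be computed in Python (a SciPy sketch, not part of the original):

from scipy.stats import t

p_value = t.sf(1.8399, df=12)    # area to the right of the test statistic
print(round(p_value, 4))         # approximately 0.0453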

Step 4: State the decision. The rejection rule: reject the null hypothesis if the p-value ≤ \(\alpha\). Decision: Since the p-value = 0.0453 is less than \(\alpha\) = 0.05, we Reject H 0 . This agrees with the decision from the traditional method. (These two methods should always agree!)

Step 5: State the summary. The summary remains the same as in the previous method. At the 5% significance level, we have sufficient evidence to say that the average amount of sodium per serving of cream of mushroom soup exceeds the stated 870 mg amount.

We can use technology to get the test statistic and p-value.

TI-84: If you have raw data, enter the data into a list before you go to the test menu. Press the [STAT] key, arrow over to the [TESTS] menu, arrow down to the [2:T-Test] option and press the [ENTER] key. Arrow over to the [Stats] menu and press the [ENTER] key. Then type in the hypothesized mean (µ 0 ), sample or population standard deviation, sample mean, sample size, arrow over to the \(\neq\), <, > sign that is the same as the problem’s alternative hypothesis statement then press the [ENTER] key, arrow down to [Calculate] and press the [ENTER] key. The calculator returns the t-test statistic and p-value.


Alternatively (If you have raw data in list one) Arrow over to the [Data] menu and press the [ENTER] key. Then type in the hypothesized mean (µ 0 ), L 1 , leave Freq:1 alone, arrow over to the \(\neq\), <, > sign that is the same in the problem’s alternative hypothesis statement then press the [ENTER] key, arrow down to [Calculate] and press the [ENTER] key. The calculator returns the t-test statistic and the p-value.

TI-89: Go to the [Apps] Stat/List Editor, then press [2 nd ] then F6 [Tests], then select 2: T-Test. Choose the input method, data is when you have entered data into a list previously or stats when you are given the mean and standard deviation already. Then type in the hypothesized mean (μ 0 ), sample standard deviation, sample mean, sample size (or list name (list1), and Freq: 1), arrow over to the \(\neq\), <, > and select the sign that is the same as the problem’s alternative hypothesis statement then press the [ENTER] key to calculate. The calculator returns the t-test statistic and p-value.


The weight of the world's smallest mammal, the bumblebee bat (also known as Kitti's hog-nosed bat or Craseonycteris thonglongyai ), is approximately normally distributed with a mean of 1.9 grams. Such bats are roughly the size of a large bumblebee. A chiropterologist believes that the Kitti's hog-nosed bats in a new geographical region under study have a different average weight than 1.9 grams. A sample of 10 bats weighed in grams in the new region are shown below. Use the confidence interval method to test the claim that the mean weight for all bumblebee bats is not 1.9 g using a 10% level of significance.

[Data table: the weights (in grams) of the 10 sampled bats, giving \(\overline{ x }\) = 1.985 and s = 0.2352.]

Step 1: State the hypotheses and identify the claim. The key phrase is “mean weight not equal to 1.9 g.” In mathematical notation, this is μ ≠ 1.9. The not equal ≠ symbol is only allowed in the alternative hypothesis so the hypotheses would be:

H 0 : μ = 1.9

H 1 : μ ≠ 1.9

Step 2: Compute the confidence interval. First, find the t critical value using df = n – 1 = 9 and 90% confidence. In Excel, t \(\alpha\) /2 = T.INV.2T(0.1,9) = 1.833113.

Then use technology to find the sample mean and sample standard deviation and substitute in your numbers to the formula.

\(\begin{aligned} &\bar{x} \pm t_{\alpha / 2}\left(\frac{s}{\sqrt{n}}\right) \\ &\Rightarrow 1.985 \pm 1.833113\left(\frac{0.235242}{\sqrt{10}}\right) \\ &\Rightarrow 1.985 \pm 1.833113(0.07439) \\ &\Rightarrow 1.985 \pm 0.136365 \\ &\Rightarrow(1.8486,2.1214) \end{aligned}\)

The answer can be given as an inequality 1.8486 < µ < 2.1214

or in interval notation (1.8486, 2.1214).
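The interval can also be reproduced from the summary statistics in Python (a sketch assuming SciPy; the variable names are ours):

from math import sqrt
from scipy.stats import t

x_bar, s, n, alpha = 1.985, 0.235242, 10, 0.10

t_crit = t.ppf(1 - alpha / 2, df=n - 1)    # approximately 1.8331
margin = t_crit * s / sqrt(n)              # approximately 0.1364
print(x_bar - margin, x_bar + margin)      # approximately 1.8486 and 2.1214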

Step 3: Make the decision: The rejection rule is to reject H0 when the hypothesized value found in H 0 is outside the bounds of the confidence interval. The null hypothesis was μ = 1.9 g. Since 1.9 is between the lower and upper boundary of the confidence interval 1.8486 < µ < 2.1214 then we would not reject H 0 .

The sampling distribution, assuming the null hypothesis is true, will have a mean of μ = 1.9 and a standard error of \(\frac{0.2352}{\sqrt{10}}=0.07439\). When we calculated the confidence interval using the sample mean of 1.985 the confidence interval captured the hypothesized mean of 1.9. See Figure 8-20.


Figure 8-20

Step 4: State the summary: At the 10% significance level, there is not enough evidence to support the claim that the population mean weight for bumblebee bats in the new geographical region is different from 1.9 g.

This interval can also be computed using a TI calculator or Excel.

TI-84: Enter the data in a list, choose Tests > TInterval. Select and highlight Data, change the list and confidence level to match the question. Choose Calculate.


Excel: Select Data Analysis > Descriptive Statistics: Note, you will need to change the cell reference numbers to where you copy and paste your data, only check the label box if you selected the label in the input range, and change the confidence level to 1 – \(\alpha\).


Below is the Excel output. Excel only calculates the descriptive statistics with the margin of error.


Use Excel to find each piece of the interval \(\bar{x} \pm t_{\alpha / 2}\left(\frac{s}{\sqrt{n}}\right)\).

Excel \(t_{\alpha / 2}\) = T.INV.2T(0.1,9) = 1.8311.

\(\begin{aligned} &\bar{x} \pm t_{\alpha / 2}\left(\frac{s}{\sqrt{n}}\right) \\ &\Rightarrow 1.985 \pm 1.8311\left(\frac{0.2352}{\sqrt{10}}\right) \\ &\Rightarrow 1.985 \pm 1.8311(0.07439) \end{aligned}\)

Can you find the mean and standard error \(\frac{s}{\sqrt{n}}=0.07439\) in the Excel output?

\(\Rightarrow 1.985 \pm 0.136365\)

Can you find the margin of error \(t_{\frac{\alpha}{2}}\left(\frac{s}{\sqrt{n}}\right)=0.136365\) in the Excel output?

Subtract and add the margin of error from the sample mean to get each confidence interval boundary (1.8486, 2.1214).

If we have raw data, Excel will do both the traditional and p-value method.

Example 8-8 Continued:

Step 1: State the hypotheses. The hypotheses are: H 0 : μ = 1.9 and H 1 : μ ≠ 1.9.

Step 2: Compute the test statistic, \(t=\frac{\bar{x}-\mu_{0}}{\left(\frac{s}{\sqrt{n}}\right)}=\frac{1.985-1.9}{\left(\frac{.235242}{\sqrt{10}}\right)}=1.1426\)

Verify using Excel. Excel does not have a one-sample t-test, but it does have a two-sample t-test that can be used with a dummy column of zeros as the second sample to get the results for just one sample. Copy the data into cell A1. In column B, next to the data, type in a dummy column of zeros and label it Dummy. (We frequently use placeholders in statistics called dummy variables.)


Select the Data Analysis tool and then select t-Test: Paired Two Sample for Means, then select OK.


For the Variable 1 Range select the data in cells A1:A11, including the label. For the Variable 2 Range select the dummy column of zeros in cells B1:B11, including the label. Change the hypothesized mean to 1.9. Check the Labels box and change the alpha value to 0.10, then select OK.


Excel provides the following output:


Step 3: Compute the p-value. Since the alternative hypothesis has a ≠ symbol, use the two-tailed p-value from the Excel output, p-value = 0.2826.

Step 4: Make the decision. For the p-value method we would compare the two-tailed p-value = 0.2826 to \(\alpha\) = 0.10. The rule is to reject H 0 if the p-value ≤ \(\alpha\). In this case the p-value > \(\alpha\), therefore we do not reject H 0 . Again, the same decision as the confidence interval method.

For the critical value method, we would compare the test statistic t = 1.142625 with the critical values for a two-tailed test \(t_{\frac{\alpha}{2}}\) = ±1.833113. Since the test statistic is between –1.8331 and 1.8331 we would not reject H 0 , which is the same decision as the p-value method and the confidence interval method.

Step 5: State the summary. There is not enough evidence to support the claim that the population mean weight for all bumblebee bats is not equal to 1.9 g.
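As a cross-check on the Excel output (a Python sketch using SciPy; not part of the original workflow), the two-tailed p-value and critical values can be recovered from the test statistic alone:

from scipy.stats import t

t_stat, df, alpha = 1.1426, 9, 0.10

p_two_tailed = 2 * t.sf(abs(t_stat), df=df)    # approximately 0.2826
t_crit = t.ppf(1 - alpha / 2, df=df)           # approximately 1.8331

print(p_two_tailed > alpha)            # True, so do not reject H0
print(-t_crit < t_stat < t_crit)       # True, the same conclusion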

One-Tailed Versus Two-Tailed Tests

Most software packages do not ask which tailed test you are performing. Make sure you look at the sign in the alternative hypothesis and determine which p-value to use. The difference is just what part of the picture you are looking at. In Excel, the critical value shown is for a one-tailed test and does not specify left or right tail. The critical value in the output will always be positive; it is up to you to know whether the critical value should be negative or positive. For example, Figures 8-21, 8-22, and 8-23 use df = 9 and \(\alpha\) = 0.10 to show all three tests, comparing either the test statistic with the critical value or the p-value with \(\alpha\).

Two-Tailed Test

The test statistic can be negative or positive depending on what side of the distribution it falls; however, the p-value is a probability and will always be a positive number between 0 and 1. See Figure 8-21.


Figure 8-21

Right-Tailed Test

If we happened to do a right-tailed test with df = 9 and \(\alpha\) = 0.10, the critical value t 1-\(\alpha\) = 1.383 will be in the right tail and usually the test statistic will be a positive number. See Figure 8-22.


Figure 8-22

Left-Tailed Test

If we happened to do a left-tailed test with df = 9 and \(\alpha\) = 0.10, the critical value t \(\alpha\) = –1.383 will be in the left tail and usually the test statistic will be a negative number. See Figure 8-23.


Figure 8-23


10.1 - z-test: when population variance is known.

Let's start by acknowledging that it is completely unrealistic to think that we'd find ourselves in the situation of knowing the population variance, but not the population mean. Therefore, the hypothesis testing method that we learn on this page has limited practical use. We study it only because we'll use it later to learn about the "power" of a hypothesis test (by learning how to calculate Type II error rates). As usual, let's start with an example.

Example 10-1 Section  


Boys of a certain age are known to have a mean weight of \(\mu=85\) pounds. A complaint is made that the boys living in a municipal children's home are underfed. As one bit of evidence, \(n=25\) boys (of the same age) are weighed and found to have a mean weight of \(\bar{x}\) = 80.94 pounds. It is known that the population standard deviation \(\sigma\) is 11.6 pounds (the unrealistic part of this example!). Based on the available data, what should be concluded concerning the complaint?

The null hypothesis is \(H_0:\mu=85\), and the alternative hypothesis is \(H_A:\mu<85\). In general, we know that if the weights are normally distributed, then:

\(Z=\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}\)

follows the standard normal \(N(0,1)\) distribution. It is actually a bit irrelevant here whether or not the weights are normally distributed, because the sample size \(n=25\) is large enough for the Central Limit Theorem to apply. In that case, we know that \(Z\), as defined above, follows at least approximately the standard normal distribution. At any rate, it seems reasonable to use the test statistic:

\(Z=\dfrac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\)

for testing the null hypothesis

\(H_0:\mu=\mu_0\)

against any of the possible alternative hypotheses \(H_A:\mu \neq \mu_0\), \(H_A:\mu<\mu_0\), and \(H_A:\mu>\mu_0\).

For the example in hand, the value of the test statistic is:

\(Z=\dfrac{80.94-85}{11.6/\sqrt{25}}=-1.75\)

The critical region approach tells us to reject the null hypothesis at the \(\alpha=0.05\) level if \(Z<-1.645\). We reject the null hypothesis because \(Z=-1.75<-1.645\), and therefore the test statistic falls in the rejection region.

As always, we draw the same conclusion by using the \(p\)-value approach. Recall that the \(p\)-value approach tells us to reject the null hypothesis at the \(\alpha=0.05\) level if the \(p\)-value \(\le \alpha=0.05\). In this case, the \(p\)-value is \(P(Z<-1.75)=0.0401\).

As expected, we reject the null hypothesis because the \(p\)-value \(=0.0401<\alpha=0.05\).
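The test statistic and p-value above are easy to confirm with a short Python sketch (SciPy assumed; this is our own check, not the Minitab output mentioned below):

from math import sqrt
from scipy.stats import norm

x_bar, mu_0, sigma, n = 80.94, 85, 11.6, 25

z = (x_bar - mu_0) / (sigma / sqrt(n))    # -1.75
p_value = norm.cdf(z)                     # approximately 0.0401
print(round(z, 2), round(p_value, 4))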

By the way, we'll learn how to ask Minitab to conduct the \(Z\)-test for a mean \(\mu\) in a bit; its output reports the same test statistic and \(p\)-value found above.


Different tests are used in statistics to compare distinct samples or groups and make conclusions about populations. These tests, also referred to as statistical tests, concentrate on examining the probability or possibility of acquiring the observed data under particular premises or hypotheses. They offer a framework for evaluating the evidence for or against a given hypothesis. 

A statistical test starts with the formulation of a null hypothesis (H0) and an alternative hypothesis (Ha). The alternative hypothesis proposes a particular link or effect, whereas the null hypothesis reflects the default assumption and often states no effect or no difference.

The p-value indicates the likelihood of observing the data or more extreme results assuming the null hypothesis is true. Researchers compare the calculated p-value to a predetermined significance level, often denoted as α, to make a decision regarding the null hypothesis. If the p-value is smaller than α, the results are considered statistically significant, leading to the rejection of the null hypothesis in favor of the alternative hypothesis.

The p-value is calculated using a variety of statistical tests, including the Z-test, T-test , Chi-squared test , ANOVA , and F-test , among others. In this article, we will focus on explaining the Z-test.

What is Z-Test?

Z-test is a statistical test that is used to determine whether the mean of a sample is significantly different from a known population mean when the population standard deviation is known. It is particularly useful when the sample size is large (>30).

A Z-test can also be described as a statistical method in which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. It is used to determine whether two sample means are approximately the same or different when their variance is known and the sample size is large (n ≥ 30).

The Z-test compares the difference between the sample mean and the population mean by considering the standard deviation of the sampling distribution. The resulting Z-score represents the number of standard deviations that the sample mean deviates from the population mean. This Z-score is also known as the Z-statistic, and can be formulated as:

\(\text{Z-Score} = \dfrac{\bar{x}-\mu}{\sigma}\)

The average family annual income in India is 200k, with a standard deviation of 5k, and the average family annual income in Delhi is 300k.

Then the Z-score for Delhi will be:

\(\begin{aligned} \text{Z-Score}&=\frac{\bar{x}-\mu}{\sigma} \\&=\frac{300-200}{5} \\&=20 \end{aligned}\)

This indicates that the average family’s annual income in Delhi is 20 standard deviations above the mean of the population (India).

When to Use Z-test:

  • The sample size should be greater than 30. Otherwise, we should use the t-test.
  • Samples should be drawn at random from the population.
  • The standard deviation of the population should be known.
  • Samples that are drawn from the population should be independent of each other.
  • The data should be normally distributed; however, for a large sample size, the sampling distribution is assumed to be approximately normal because of the central limit theorem.

Hypothesis Testing

A hypothesis is an educated guess/claim about a particular property of an object. Hypothesis testing is a way to validate the claim of an experiment.

  • Null Hypothesis: The null hypothesis is a statement that the value of a population parameter (such as proportion, mean, or standard deviation) is equal to some claimed value. We either reject or fail to reject the null hypothesis. The null hypothesis is denoted by H 0 .
  • Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has a value that is different from the claimed value. It is denoted by H A .

Level of significance: the degree of significance at which we reject or fail to reject the null hypothesis. Since 100% certainty is not possible when accepting or rejecting a hypothesis, we select a level of significance. It is denoted by alpha (α).

Steps to perform Z-test:

  • First, identify the null and alternate hypotheses.
  • Determine the level of significance (∝).
  • Find the critical value of z from the z-table for the chosen level of significance, and compute the z-test statistic:

\(Z = \dfrac{\overline{X}- \mu}{\sigma /\sqrt{n}}\)

where \(\overline{X}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation, and n is the sample size.
  • Now compare the test statistic with the critical value and decide whether to reject or not reject the null hypothesis.

Type of Z-test

  • Left-tailed Test: In this test, our region of rejection is located at the extreme left of the distribution. Here the alternative hypothesis is that the population mean is less than the claimed value (so the null hypothesis is that it is greater than or equal to the claimed value).

  • Right-tailed Test: In this test, our region of rejection is located at the extreme right of the distribution. Here the alternative hypothesis is that the population mean is greater than the claimed value (so the null hypothesis is that it is less than or equal to the claimed value).

  • Two-tailed test: In this test, our region of rejection is located at both extremes of the distribution. Here the null hypothesis is that the population mean is equal to the claimed value, and the alternative is that it is not equal.

Below is an example of performing the z-test:

Example One-Tailed Test:

A school claims that the students who study there are more intelligent than average. The IQ scores of 50 students are calculated, and the sample average turns out to be 110. The mean of the population IQ is 100 and the standard deviation is 15. State whether the claim of the principal is right or not at a 5% significance level.

\(H_0 : \mu = 100\) and \(H_A : \mu > 100\) (right-tailed test).

The test statistic is \(Z = \dfrac{110-100}{15/\sqrt{50}} \approx 4.71\).

  • Now, we look up the z-table. For \(\alpha = 0.05\), the critical z-score for a right-tailed test is 1.645.
  • Here 4.71 > 1.645, so we reject the null hypothesis.
  • If the z-test statistic had been less than the critical z-score, we would not have rejected the null hypothesis.

Code Implementations
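A minimal Python implementation of the one-tailed IQ example above (a sketch assuming SciPy; the variable names are ours, not from the article):

from math import sqrt
from scipy.stats import norm

x_bar, mu_0, sigma, n, alpha = 110, 100, 15, 50, 0.05

z = (x_bar - mu_0) / (sigma / sqrt(n))    # approximately 4.71
z_critical = norm.ppf(1 - alpha)          # approximately 1.645 for a right-tailed test

print("Reject H0" if z > z_critical else "Fail to reject H0")    # Reject H0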

Two-sample z-test

In this test, we are given two normally distributed and independent populations, and we draw samples at random from both. Here, we consider \(\mu_1\) and \(\mu_2\) to be the population means, and \(\overline{X_1}\) and \(\overline{X_2}\) to be the observed sample means. Our null hypothesis could be:

\(H_{0} : \mu_{1} -\mu_{2} = 0\)

and the alternative hypothesis

\(H_{1} :  \mu_{1} - \mu_{2} \ne 0\)

and the formula for calculating the z-test statistic is:

\(Z = \dfrac{\left ( \overline{X_{1}} - \overline{X_{2}} \right ) - \left ( \mu_{1} - \mu_{2} \right )}{\sqrt{\dfrac{\sigma_{1}^2}{n_{1}} + \dfrac{\sigma_{2}^2}{n_{2}}}}\)

There are two groups of students preparing for a competition: Group A and Group B. Group A has studied offline classes, while Group B has studied online classes. After the examination, the score of each student comes. Now we want to determine whether the online or offline classes are better.

Group A: Sample size = 50, Sample mean = 75, Sample standard deviation = 10

Group B: Sample size = 60, Sample mean = 80, Sample standard deviation = 12

Assuming a 5% significance level, perform a two-sample z-test to determine if there is a significant difference between the online and offline classes.

Step 1: Null & Alternate Hypothesis

\(H_0: \mu_1 -\mu_2 = 0\) and \(H_1: \mu_1 -\mu_2 \ne 0\)

Step 2: Significance Level

\(\alpha = 0.05\)

Step 3: Z-Score

\(\begin{aligned} \text{Z-score} &= \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1 -\mu_2)} {\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}  \\ &= \frac{(75-80)-0} {\sqrt{\frac{10^2}{50}+\frac{12^2}{60}}} \\ &= \frac{-5} {\sqrt{2+2.4}} \\ &= \frac{-5} {2.0976} \\&=-2.384 \end{aligned}\)

Step 4: Find the critical Z-score in the Z-table for \(\alpha/2\) = 0.025

  •  Critical Z-Score = 1.96

Step 5: Compare the absolute value of the Z-score with the critical Z-score (see the Python sketch below).

  • |Z-score| = 2.384 > 1.96 = critical Z-score
  • Reject the null hypothesis. There is a significant difference between the online and offline classes.
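A minimal Python sketch of this two-sample computation (SciPy assumed; the variable names are ours):

from math import sqrt
from scipy.stats import norm

n1, x1, s1 = 50, 75, 10    # Group A (offline classes)
n2, x2, s2 = 60, 80, 12    # Group B (online classes)
alpha = 0.05

z = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)    # approximately -2.384
z_critical = norm.ppf(1 - alpha / 2)             # approximately 1.96

print("Reject H0" if abs(z) > z_critical else "Fail to reject H0")    # Reject H0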

Type I error and Type II error:

  • Type I error: A Type I error occurs when we reject the null hypothesis even though it is true. The probability of this error is denoted by alpha.
  • Type II error: A Type II error occurs when we fail to reject the null hypothesis even though it is false. The probability of this error is denoted by beta.


Hypothesis Testing (cont...)

The Null and Alternative Hypothesis

In order to undertake hypothesis testing you need to express your research hypothesis as a null and alternative hypothesis. The null hypothesis and alternative hypothesis are statements regarding the differences or effects that occur in the population. You will use your sample to test which statement (i.e., the null hypothesis or alternative hypothesis) is most likely (although technically, you test the evidence against the null hypothesis). So, with respect to our teaching example, the null and alternative hypothesis will reflect statements about all statistics students on graduate management courses.

The null hypothesis is essentially the "devil's advocate" position. That is, it assumes that whatever you are trying to prove did not happen (hint: it usually states that something equals zero). For example, the two different teaching methods did not result in different exam performances (i.e., zero difference). Another example might be that there is no relationship between anxiety and athletic performance (i.e., the slope is zero). The alternative hypothesis states the opposite and is usually the hypothesis you are trying to prove (e.g., the two different teaching methods did result in different exam performances). Initially, you can state these hypotheses in more general terms (e.g., using terms like "effect", "relationship", etc.), as shown below for the teaching methods example:

Null hypothesis (H 0 ): The two teaching methods have no effect on students' exam performance.

Alternative hypothesis (H A ): The two teaching methods have an effect on students' exam performance.

How you want to "summarize" the exam performances will determine how you write a more specific null and alternative hypothesis. For example, you could compare the mean exam performance of each group (i.e., the "seminar" group and the "lectures-only" group). This is what we will demonstrate here, but other options include comparing the distributions , medians , amongst other things. As such, we can state:

Null hypothesis (H 0 ): The mean exam mark for the "seminar" group and the "lectures-only" group is equal in the population.

Alternative hypothesis (H A ): The mean exam mark for the "seminar" group and the "lectures-only" group is not equal in the population.

Now that you have identified the null and alternative hypotheses, you need to find evidence and develop a strategy for declaring your "support" for either the null or alternative hypothesis. We can do this using some statistical theory and some arbitrary cut-off points. Both these issues are dealt with next.

Significance levels

The level of statistical significance is often expressed as the so-called p -value . Depending on the statistical test you have chosen, you will calculate a probability (i.e., the p -value) of observing your sample results (or more extreme) given that the null hypothesis is true . Another way of phrasing this is to consider the probability that a difference in a mean score (or other statistic) could have arisen based on the assumption that there really is no difference. Let us consider this statement with respect to our example where we are interested in the difference in mean exam performance between two different teaching methods. If there really is no difference between the two teaching methods in the population (i.e., given that the null hypothesis is true), how likely would it be to see a difference in the mean exam performance between the two teaching methods as large as (or larger than) that which has been observed in your sample?

So, you might get a p -value such as 0.03 (i.e., p = .03). This means that there is a 3% chance of finding a difference as large as (or larger than) the one in your study given that the null hypothesis is true. However, you want to know whether this is "statistically significant". Typically, if there was a 5% or less chance (5 times in 100 or less) that the difference in the mean exam performance between the two teaching methods (or whatever statistic you are using) is as different as observed given the null hypothesis is true, you would reject the null hypothesis and accept the alternative hypothesis. Alternately, if the chance was greater than 5% (5 times in 100 or more), you would fail to reject the null hypothesis and would not accept the alternative hypothesis. As such, in this example where p = .03, we would reject the null hypothesis and accept the alternative hypothesis. We reject it because, at a significance level of 0.03 (i.e., less than a 5% chance), a difference this large would occur too rarely by chance alone for us to believe that chance, rather than the two teaching methods, produced the observed effect on exam performance.

Whilst there is relatively little justification why a significance level of 0.05 is used rather than 0.01 or 0.10, for example, it is widely used in academic research. However, if you want to be particularly confident in your results, you can set a more stringent level of 0.01 (a 1% chance or less; 1 in 100 chance or less).


One- and two-tailed predictions

When considering whether we reject the null hypothesis and accept the alternative hypothesis, we need to consider the direction of the alternative hypothesis statement. For example, the alternative hypothesis that was stated earlier is:

Alternative hypothesis (H A ): Undertaking seminar classes, in addition to lectures, has a positive effect on students' exam performance.

The alternative hypothesis tells us two things. First, what predictions did we make about the effect of the independent variable(s) on the dependent variable(s)? Second, what was the predicted direction of this effect? Let's use our example to highlight these two points.

Sarah predicted that her teaching method (independent variable: teaching method), whereby she not only required her students to attend lectures, but also seminars, would have a positive effect on (that is, increase) students' performance (dependent variable: exam marks). If an alternative hypothesis has a direction (and this is how you want to test it), the hypothesis is one-tailed. That is, it predicts the direction of the effect. If the alternative hypothesis had stated that the effect was expected to be negative, this would also be a one-tailed hypothesis.

Alternatively, a two-tailed prediction means that we do not make a choice over the direction that the effect of the experiment takes. Rather, it simply implies that the effect could be negative or positive. If Sarah had made a two-tailed prediction, the alternative hypothesis might have been:

Alternative hypothesis (H A ): Undertaking seminar classes, in addition to lectures, has an effect on students' exam performance.

In other words, we simply take out the word "positive", which implies the direction of our effect. In our example, making a two-tailed prediction may seem strange. After all, it would be logical to expect that "extra" tuition (going to seminar classes as well as lectures) would either have a positive effect on students' performance or no effect at all, but certainly not a negative effect. However, this is just our opinion (and hope) and certainly does not mean that we will get the effect we expect. Generally speaking, making a one-tailed prediction (i.e., and testing for it this way) is frowned upon as it usually reflects the hope of a researcher rather than any certainty that it will happen. Notable exceptions to this rule are when there is only one possible way in which a change could occur. This can happen, for example, when biological activity/presence is measured. That is, a protein might be "dormant" and the stimulus you are using can only possibly "wake it up" (i.e., it cannot possibly reduce the activity of a "dormant" protein). In addition, for some statistical tests, one-tailed tests are not possible.

Rejecting or failing to reject the null hypothesis

Let's return finally to the question of whether we reject or fail to reject the null hypothesis.

If our statistical analysis shows that the significance level is below the cut-off value we have set (e.g., either 0.05 or 0.01), we reject the null hypothesis and accept the alternative hypothesis. Alternatively, if the significance level is above the cut-off value, we fail to reject the null hypothesis and cannot accept the alternative hypothesis. You should note that you cannot accept the null hypothesis, but only find evidence against it.

What is a z-score? What is a p-value?

Most statistical tests begin by identifying a null hypothesis. The null hypothesis for the pattern analysis tools ( Analyzing Patterns toolset and Mapping Clusters toolset ) is Complete Spatial Randomness (CSR), either of the features themselves or of the values associated with those features. The z-scores and p-values returned by the pattern analysis tools tell you whether you can reject that null hypothesis or not. Often, you will run one of the pattern analysis tools, hoping that the z-score and p-value will indicate that you can reject the null hypothesis, because it would indicate that rather than a random pattern, your features (or the values associated with your features) exhibit statistically significant clustering or dispersion. Whenever you see spatial structure such as clustering in the landscape (or in your spatial data), you are seeing evidence of some underlying spatial processes at work, and as a geographer or GIS analyst, this is often what you are most interested in.

The p-value is a probability. For the pattern analysis tools, it is the probability that the observed spatial pattern was created by some random process. When the p-value is very small, it means it is very unlikely (small probability) that the observed spatial pattern is the result of random processes, so you can reject the null hypothesis. You might ask: How small is small enough? Good question. See the table and discussion below.

Z-scores are standard deviations. If, for example, a tool returns a z-score of +2.5, you would say that the result is 2.5 standard deviations. Both z-scores and p-values are associated with the standard normal distribution as shown below.

Standard Normal Distribution

Very high or very low (negative) z-scores, associated with very small p-values, are found in the tails of the normal distribution. When you run a feature pattern analysis tool and it yields small p-values and either a very high or a very low z-score, this indicates it is unlikely that the observed spatial pattern reflects the theoretical random pattern represented by your null hypothesis (CSR).

To reject the null hypothesis, you must make a subjective judgment regarding the degree of risk you are willing to accept for being wrong (for falsely rejecting the null hypothesis). Consequently, before you run the spatial statistic, you select a confidence level. Typical confidence levels are 90, 95, or 99 percent. A confidence level of 99 percent would be the most conservative in this case, indicating that you are unwilling to reject the null hypothesis unless the probability that the pattern was created by random chance is really small (less than a 1 percent probability).

Confidence Levels

z-score (standard deviations) | p-value (probability) | Confidence level
< –1.65 or > +1.65            | < 0.10                | 90%
< –1.96 or > +1.96            | < 0.05                | 95%
< –2.58 or > +2.58            | < 0.01                | 99%

Tools that allow you to apply the False Discovery Rate (FDR) will use corrected critical p-values. Those critical values will be the same or smaller than those shown in the table above.

Consider an example. The critical z-score values when using a 95 percent confidence level are -1.96 and +1.96 standard deviations. The uncorrected p-value associated with a 95 percent confidence level is 0.05. If your z-score is between -1.96 and +1.96, your uncorrected p-value will be larger than 0.05, and you cannot reject your null hypothesis because the pattern exhibited could very likely be the result of random spatial processes. If the z-score falls outside that range (for example, -2.5 or +5.4 standard deviations), the observed spatial pattern is probably too unusual to be the result of random chance, and the p-value will be small to reflect this. In this case, it is possible to reject the null hypothesis and proceed with figuring out what might be causing the statistically significant spatial structure in your data.

A key idea here is that the values in the middle of the normal distribution (z-scores like 0.19 or -1.2, for example), represent the expected outcome. When the absolute value of the z-score is large and the probabilities are small (in the tails of the normal distribution), however, you are seeing something unusual and generally very interesting. For the Hot Spot Analysis tool, for example, unusual means either a statistically significant hot spot or a statistically significant cold spot.
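The relationship between z-scores and p-values described here is easy to verify with a short Python sketch (SciPy assumed; the z-scores used are the ones mentioned in this section):

from scipy.stats import norm

# Middle-of-the-distribution z-scores give large two-tailed p-values;
# extreme z-scores give very small ones.
for z in (0.19, -1.2, -2.5, 5.4):
    p_two_tailed = 2 * norm.sf(abs(z))
    print(z, p_two_tailed)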

FDR Correction

The local spatial pattern analysis tools including Hot Spot Analysis and Cluster and Outlier Analysis Anselin Local Moran's I provide an optional Boolean parameter Apply False Discovery Rate (FDR) Correction . When this parameter is checked, the False Discovery Rate (FDR) procedure will potentially reduce the critical p-value thresholds shown in the table above in order to account for multiple testing and spatial dependency. The reduction, if any, is a function of the number of input features and the neighborhood structure employed.

Local spatial pattern analysis tools work by considering each feature within the context of neighboring features and determining if the local pattern (a target feature and its neighbors) is statistically different from the global pattern (all features in the dataset). The z-score and p-value results associated with each feature determines if the difference is statistically significant or not. This analytical approach creates issues with both multiple testing and dependency.

Multiple Testing —With a confidence level of 95 percent, probability theory tells us that there are 5 out of 100 chances that a spatial pattern could appear structured (clustered or dispersed, for example) and could be associated with a statistically significant p-value, when in fact the underlying spatial processes promoting the pattern are truly random. We would falsely reject the CSR null hypothesis in these cases because of the statistically significant p-values. Five chances out of 100 seems quite conservative until you consider that local spatial statistics perform a test for every feature in the dataset. If there are 10,000 features, for example, we might expect as many as 500 false results.

Spatial Dependency —Features near each other tend to be similar; more often than not, spatial data exhibits this type of dependency. Nonetheless, many statistical tests require features to be independent. For local pattern analysis tools, independence matters because spatial dependency can artificially inflate statistical significance. Spatial dependency is exacerbated with local pattern analysis tools because each feature is evaluated within the context of its neighbors, and features that are near each other will likely share many of the same neighbors. This overlap accentuates spatial dependency.

There are at least three approaches for dealing with the multiple testing and spatial dependency issues. The first approach is to ignore the problem on the basis that the individual test performed for each feature in the dataset should be considered in isolation. With this approach, however, it is very likely that some statistically significant results will be incorrect (appear to be statistically significant when in fact the underlying spatial processes are random). The second approach is to apply a classical multiple testing procedure such as the Bonferroni or Sidak corrections. These methods are typically too conservative, however; while they will greatly reduce the number of false positives, they will also miss statistically significant results when they do exist. A third approach is to apply the FDR correction, which estimates the number of false positives for a given confidence level and adjusts the critical p-value accordingly. For this method, statistically significant p-values are ranked from smallest (strongest) to largest (weakest), and, based on the false positive estimate, the weakest are removed from this list. The remaining features with statistically significant p-values are identified by the Gi_Bin or COType fields in the output feature class. While not perfect, empirical tests show this method performs much better than assuming that each local test is performed in isolation, or than applying the traditional, overly conservative, multiple testing methods. The additional resources section provides more information about the FDR correction.
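To make the FDR idea concrete, here is a minimal Python sketch of the classic Benjamini–Hochberg procedure; it is a generic illustration with hypothetical p-values, not the exact correction implemented by the spatial statistics tools.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask marking which p-values stay significant after FDR correction."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # rank p-values from smallest to largest
    thresholds = alpha * np.arange(1, m + 1) / m   # BH threshold for each rank
    passed = p[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()            # largest rank satisfying the condition
        significant[order[:k + 1]] = True          # keep everything up to that rank
    return significant

# Hypothetical local p-values for five features
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.30], alpha=0.05))
```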

  • The Null Hypothesis and Spatial Statistics

Several statistics in the Spatial Statistics toolbox are inferential spatial pattern analysis techniques including Spatial Autocorrelation (Global Moran's I) , Cluster and Outlier Analysis (Anselin Local Moran's I) , and Hot Spot Analysis (Getis-Ord Gi*) . Inferential statistics are grounded in probability theory. Probability is a measure of chance, and underlying all statistical tests (either directly or indirectly) are probability calculations that assess the role of chance on the outcome of your analysis. Typically, with traditional (nonspatial) statistics, you work with a random sample and try to determine the probability that your sample data is a good representation (is reflective) of the population at large. As an example, you might ask "What are the chances that the results from my exit poll (showing candidate A will beat candidate B by a slim margin) will reflect final election results?" But with many spatial statistics, including the spatial autocorrelation type statistics listed above, very often you are dealing with all available data for the study area (all crimes, all disease cases, attributes for every census block, and so on). When you compute a statistic for the entire population, you no longer have an estimate at all. You have a fact. Consequently, it makes no sense to talk about likelihood or probabilities anymore. So how can the spatial pattern analysis tools, often applied to all data in the study area, legitimately report probabilities? The answer is that they can do this by postulating, via the null hypothesis, that the data is, in fact, part of some larger population. Consider this in more detail.

The Randomization Null Hypothesis —Where appropriate, the tools in the Spatial Statistics toolbox use the randomization null hypothesis as the basis for statistical significance testing. The randomization null hypothesis postulates that the observed spatial pattern of your data represents one of many (n!) possible spatial arrangements. If you could pick up your data values and throw them down onto the features in your study area, you would have one possible spatial arrangement of those values. (Note that picking up your data values and throwing them down arbitrarily is an example of a random spatial process). The randomization null hypothesis states that if you could do this exercise (pick them up, throw them down) infinite times, most of the time you would produce a pattern that would not be markedly different from the observed pattern (your real data). Once in a while you might accidentally throw all the highest values into the same corner of your study area, but the probability of doing that is small. The randomization null hypothesis states that your data is one of many, many, many possible versions of complete spatial randomness. The data values are fixed; only their spatial arrangement could vary.
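The "pick them up, throw them down" exercise can be mimicked directly with a permutation test. The Python sketch below shuffles the observed values across a fixed set of features many times and asks how often the shuffled arrangements are at least as extreme as the observed one; the spatial weights matrix, the data values, and the choice of global Moran's I as the statistic are illustrative assumptions, not the toolbox's implementation.

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I for values on features whose pairwise spatial weights are W."""
    x = values - values.mean()
    n = len(values)
    return (n / W.sum()) * (W * np.outer(x, x)).sum() / (x ** 2).sum()

def randomization_p_value(values, W, n_perm=999, seed=0):
    """Pseudo p-value under the randomization null: values are fixed, only their arrangement varies."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, W)
    perms = np.array([morans_i(rng.permutation(values), W) for _ in range(n_perm)])
    # two-sided: how often does a random arrangement deviate from the permutation mean
    # at least as much as the observed arrangement does?
    extreme = np.abs(perms - perms.mean()) >= np.abs(observed - perms.mean())
    return observed, (extreme.sum() + 1) / (n_perm + 1)

# Hypothetical 2x2 grid of features with rook adjacency and made-up attribute values
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
values = np.array([10.0, 12.0, 2.0, 3.0])
print(randomization_p_value(values, W))
```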

The Normalization Null Hypothesis —A common alternative null hypothesis, not implemented for the Spatial Statistics toolbox, is the normalization null hypothesis. The normalization null hypothesis postulates that the observed values are derived from an infinitely large, normally distributed population of values through some random sampling process. With a different sample you would get different values, but you would still expect those values to be representative of the larger distribution. The normalization null hypothesis states that the values represent one of many possible samples of values. If you could fit your observed data to a normal curve and randomly select values from that distribution to toss onto your study area, most of the time you would produce a pattern and distribution of values that would not be markedly different from the observed pattern/distribution (your real data). The normalization null hypothesis states that your data and their arrangement are one of many, many, many possible random samples. Neither the data values nor their spatial arrangement are fixed. The normalization null hypothesis is only appropriate when the data values are normally distributed.

  • Additional Resources
  • Ebdon, David. Statistics in Geography. Blackwell, 1985.
  • Mitchell, Andy. The ESRI Guide to GIS Analysis, Volume 2. ESRI Press, 2005.
  • Goodchild, M. F. Spatial Autocorrelation. CATMOG 47, Geo Books, 1986.
  • Caldas de Castro, Marcia, and Burton H. Singer. "Controlling the False Discovery Rate: A New Application to Account for Multiple and Dependent Tests in Local Statistics of Spatial Association." Geographical Analysis 38, pp. 180-208, 2006.

Related topics

  • High/Low Clustering (Getis-Ord General G)
  • Spatial Autocorrelation (Global Moran's I)
  • Cluster and Outlier Analysis (Anselin Local Moran's I)
  • Hot Spot Analysis (Getis-Ord Gi*)
  • Ordinary Least Squares (OLS)
  • Optimized Hot Spot Analysis
  • Emerging Hot Spot Analysis


p-value Calculator

  • What is the p-value?
  • How do I calculate the p-value from a test statistic?
  • How do I interpret the p-value?
  • How do I use the p-value calculator to find the p-value from a test statistic?
  • How do I find the p-value from a z-score? From a t-score?
  • How do I find the p-value from a chi-square score (χ² score) or an F-score?

Welcome to our p-value calculator! You will never again have to wonder how to find the p-value, as here you can determine the one-sided and two-sided p-values from test statistics, following all the most popular distributions: normal, t-Student, chi-squared, and Snedecor's F.

P-values appear all over science, yet many people find the concept a bit intimidating. Don't worry – in this article, we will explain not only what the p-value is but also how to interpret p-values correctly . Have you ever been curious about how to calculate the p-value by hand? We provide you with all the necessary formulae as well!

🙋 If you want to revise some basics from statistics, our normal distribution calculator is an excellent place to start.

Formally, the p-value is the probability that the test statistic will produce values at least as extreme as the value it produced for your sample. It is crucial to remember that this probability is calculated under the assumption that the null hypothesis H 0 is true!

More intuitively, p-value answers the question:

Assuming that I live in a world where the null hypothesis holds, how probable is it that, for another sample, the test I'm performing will generate a value at least as extreme as the one I observed for the sample I already have?

It is the alternative hypothesis that determines what "extreme" actually means , so the p-value depends on the alternative hypothesis that you state: left-tailed, right-tailed, or two-tailed. In the formulas below, S stands for a test statistic, x for the value it produced for a given sample, and Pr(event | H 0 ) is the probability of an event, calculated under the assumption that H 0 is true:

Left-tailed test: p-value = Pr(S ≤ x | H 0 )

Right-tailed test: p-value = Pr(S ≥ x | H 0 )

Two-tailed test:

p-value = 2 × min{Pr(S ≤ x | H 0 ), Pr(S ≥ x | H 0 )}

(By min{a,b} , we denote the smaller number out of a and b .)

If the distribution of the test statistic under H 0 is symmetric about 0 , then: p-value = 2 × Pr(S ≥ |x| | H 0 )

or, equivalently: p-value = 2 × Pr(S ≤ -|x| | H 0 )

As a picture is worth a thousand words, let us illustrate these definitions. Here, we use the fact that the probability can be neatly depicted as the area under the density curve for a given distribution. We give two sets of pictures: one for a symmetric distribution and the other for a skewed (non-symmetric) distribution.

  • Symmetric case: normal distribution:

p-values for symmetric distribution — left-tailed, right-tailed, and two-tailed tests.

  • Non-symmetric case: chi-squared distribution:

p-values for non-symmetric distribution — left-tailed, right-tailed, and two-tailed tests.

In the last picture (two-tailed p-value for skewed distribution), the area of the left-hand side is equal to the area of the right-hand side.

To determine the p-value, you need to know the distribution of your test statistic under the assumption that the null hypothesis is true. Then, with the help of the cumulative distribution function (cdf) of this distribution, we can express the probability of the test statistic being at least as extreme as its value x for the sample:

Left-tailed test:

p-value = cdf(x) .

Right-tailed test:

p-value = 1 - cdf(x) .

Two-tailed test:

p-value = 2 × min{cdf(x), 1 - cdf(x)}.

If the distribution of the test statistic under H 0 is symmetric about 0, then a two-sided p-value can be simplified to p-value = 2 × cdf(-|x|), or, equivalently, to p-value = 2 - 2 × cdf(|x|).

The probability distributions that are most widespread in hypothesis testing tend to have complicated cdf formulae, and finding the p-value by hand may not be possible. You'll likely need to resort to a computer or to a statistical table, where people have gathered approximate cdf values.
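For instance, here is a minimal Python sketch of these cdf-based formulas using scipy; the choice of the standard normal distribution as the null distribution and the value of the statistic are illustrative assumptions.

```python
from scipy import stats

def p_value(x, cdf, tail="two-tailed"):
    """p-value for a test statistic x, given the cdf of its distribution under the null."""
    if tail == "left-tailed":
        return cdf(x)
    if tail == "right-tailed":
        return 1 - cdf(x)
    # two-tailed: double the smaller of the two tail areas
    return 2 * min(cdf(x), 1 - cdf(x))

# Illustrative example: a statistic of x = 1.34 under a standard normal null
for tail in ("left-tailed", "right-tailed", "two-tailed"):
    print(tail, round(p_value(1.34, stats.norm.cdf, tail), 5))
```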

Well, you now know how to calculate the p-value, but… why do you need to calculate this number in the first place? In hypothesis testing, the p-value approach is an alternative to the critical value approach . Recall that the latter requires researchers to pre-set the significance level, α, which is the probability of rejecting the null hypothesis when it is true (so of type I error ). Once you have your p-value, you just need to compare it with any given α to quickly decide whether or not to reject the null hypothesis at that significance level, α. For details, check the next section, where we explain how to interpret p-values.

As we have mentioned above, the p-value answers the question: assuming the null hypothesis holds, how probable is it that another sample would produce a test statistic at least as extreme as the one observed?

What does that mean for you? Well, you've got two options:

  • A high p-value means that your data is highly compatible with the null hypothesis; and
  • A small p-value provides evidence against the null hypothesis , as it means that your result would be very improbable if the null hypothesis were true.

However, it may happen that the null hypothesis is true, but your sample is highly unusual! For example, imagine we studied the effect of a new drug and got a p-value of 0.03. This means that in 3% of similar studies, random chance alone would still be able to produce the value of the test statistic that we obtained, or a value even more extreme, even if the drug had no effect at all!

The question "what is the p-value" can also be answered as follows: the p-value is the smallest level of significance at which the null hypothesis would be rejected. So, if you now want to make a decision on the null hypothesis at some significance level α, just compare your p-value with α:

  • If p-value ≤ α, then you reject the null hypothesis and accept the alternative hypothesis; and
  • If p-value > α, then you don't have enough evidence to reject the null hypothesis (a minimal sketch of this comparison follows below).
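Here is that comparison as a short Python sketch, assuming a hypothetical p-value and two significance levels:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the significance level alpha and report the decision."""
    if p_value <= alpha:
        return f"p = {p_value} <= alpha = {alpha}: reject the null hypothesis"
    return f"p = {p_value} > alpha = {alpha}: fail to reject the null hypothesis"

# Hypothetical p-value of 0.03 at two different significance levels
print(decide(0.03, alpha=0.05))   # rejected at 0.05
print(decide(0.03, alpha=0.01))   # not rejected at 0.01
```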

Obviously, the fate of the null hypothesis depends on α. For instance, if the p-value was 0.03, we would reject the null hypothesis at a significance level of 0.05, but not at a level of 0.01. That's why the significance level should be stated in advance and not adapted conveniently after the p-value has been established! A significance level of 0.05 is the most common value, but there's nothing magical about it, and too strong a faith in the 0.05 threshold can be misleading. It's always best to report the p-value and allow the reader to draw their own conclusions.

Also, bear in mind that subject-area expertise (and common sense) is crucial. Otherwise, by mindlessly applying statistical principles, you can easily arrive at statistically significant results even though the conclusion is 100% untrue.

As our p-value calculator is here at your service, you no longer need to wonder how to find the p-value from all those complicated test statistics! Here are the steps you need to follow:

  1. Pick the alternative hypothesis: two-tailed, right-tailed, or left-tailed.
  2. Tell us the distribution of your test statistic under the null hypothesis: is it N(0,1), t-Student, chi-squared, or Snedecor's F? If you are unsure, check the sections below, as they are devoted to these distributions.
  3. If needed, specify the degrees of freedom of the test statistic's distribution.
  4. Enter the value of the test statistic computed for your data sample.
  5. Our calculator determines the p-value from the test statistic and provides the decision to be made about the null hypothesis. The standard significance level is 0.05 by default.
  6. Go to the advanced mode if you need to increase the precision with which the calculations are performed or change the significance level.

In terms of the cumulative distribution function (cdf) of the standard normal distribution, which is traditionally denoted by Φ , the p-value is given by the following formulae:

Left-tailed z-test:

p-value = Φ(Z score )

Right-tailed z-test:

p-value = 1 - Φ(Z score )

Two-tailed z-test:

p-value = 2 × Φ(−|Z score |)

p-value = 2 - 2 × Φ(|Z score |)

🙋 To learn more about Z-tests, head to Omni's Z-test calculator .

We use the Z-score if the test statistic approximately follows the standard normal distribution N(0,1) . Thanks to the central limit theorem, you can count on the approximation if you have a large sample (say at least 50 data points) and treat your distribution as normal.

A Z-test most often refers to testing the population mean, or the difference between two population means (in particular, between two proportions). You can also find Z-tests in maximum likelihood estimations.
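As an illustration, here is a minimal Python sketch of these z formulas using scipy's standard normal cdf; the z-score is a hypothetical value.

```python
from scipy import stats

def p_value_from_z(z, tail="two-tailed"):
    """p-value for a z-score whose null distribution is the standard normal N(0,1)."""
    if tail == "left-tailed":
        return stats.norm.cdf(z)          # Φ(z)
    if tail == "right-tailed":
        return stats.norm.sf(z)           # 1 - Φ(z)
    return 2 * stats.norm.sf(abs(z))      # 2 × (1 - Φ(|z|)) = 2 × Φ(-|z|)

# Hypothetical z-score of -1.96: the two-tailed p-value is roughly 0.05
print(round(p_value_from_z(-1.96), 4))
```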

The p-value from the t-score is given by the following formulae, in which cdf t,d stands for the cumulative distribution function of the t-Student distribution with d degrees of freedom:

Left-tailed t-test:

p-value = cdf t,d (t score )

Right-tailed t-test:

p-value = 1 - cdf t,d (t score )

Two-tailed t-test:

p-value = 2 × cdf t,d (−|t score |)

p-value = 2 - 2 × cdf t,d (|t score |)

Use the t-score option if your test statistic follows the t-Student distribution. This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails; the exact shape depends on a parameter called the degrees of freedom. If the number of degrees of freedom is large (>30), which generally happens for large samples, the t-Student distribution is practically indistinguishable from the normal distribution N(0,1).

The most common t-tests are those for population means with an unknown population standard deviation, or for the difference between means of two populations , with either equal or unequal yet unknown population standard deviations. There's also a t-test for paired (dependent) samples .
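For example, here is a minimal Python sketch of the two-tailed t formula, under the assumption of a one-sample t-test with hypothetical numbers; scipy's t distribution supplies cdf t,d.

```python
from scipy import stats

def two_tailed_p_from_t(t_score, df):
    """Two-tailed p-value: 2 × cdf_t,d(-|t|), i.e. both tails of the t distribution."""
    return 2 * stats.t.cdf(-abs(t_score), df)

# Hypothetical one-sample t-test: t = 2.1 with 14 degrees of freedom (n = 15)
print(round(two_tailed_p_from_t(2.1, df=14), 4))
```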

🙋 To get more insights into t-statistics, we recommend using our t-test calculator .

Use the χ²-score option when performing a test in which the test statistic follows the χ²-distribution .

This distribution arises, for example, when you take the sum of squares of independent variables, each following the standard normal distribution N(0,1). Remember to check the number of degrees of freedom of the χ²-distribution of your test statistic!

How to find the p-value from chi-square-score ? You can do it with the help of the following formulae, in which cdf χ²,d denotes the cumulative distribution function of the χ²-distribution with d degrees of freedom:

Left-tailed χ²-test:

p-value = cdf χ²,d (χ² score )

Right-tailed χ²-test:

p-value = 1 - cdf χ²,d (χ² score )

Remember that χ²-tests for goodness-of-fit and independence are right-tailed tests! (see below)

Two-tailed χ²-test:

p-value = 2 × min{cdf χ²,d (χ² score ), 1 - cdf χ²,d (χ² score )}

(By min{a,b} , we denote the smaller of the numbers a and b .)

The most popular tests which lead to a χ²-score are the following:

Testing whether the variance of normally distributed data has some pre-determined value. In this case, the test statistic has the χ²-distribution with n - 1 degrees of freedom, where n is the sample size. This can be a one-tailed or two-tailed test .

Goodness-of-fit test checks whether the empirical (sample) distribution agrees with some expected probability distribution. In this case, the test statistic follows the χ²-distribution with k - 1 degrees of freedom, where k is the number of classes into which the sample is divided. This is a right-tailed test .

Independence test is used to determine if there is a statistically significant relationship between two variables. In this case, its test statistic is based on the contingency table and follows the χ²-distribution with (r - 1)(c - 1) degrees of freedom, where r is the number of rows, and c is the number of columns in this contingency table. This also is a right-tailed test .
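To make the right-tailed formula concrete, here is a minimal Python sketch for a goodness-of-fit style χ² statistic; the statistic value and the degrees of freedom are hypothetical.

```python
from scipy import stats

def right_tailed_p_from_chi2(chi2_score, df):
    """Right-tailed p-value: 1 - cdf_χ²,d(χ²), the area above the observed statistic."""
    return stats.chi2.sf(chi2_score, df)   # sf = 1 - cdf

# Hypothetical goodness-of-fit test: χ² = 11.07 with k - 1 = 5 degrees of freedom
print(round(right_tailed_p_from_chi2(11.07, df=5), 4))
```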

Finally, the F-score option should be used when you perform a test in which the test statistic follows the F-distribution , also known as the Fisher–Snedecor distribution. The exact shape of an F-distribution depends on two degrees of freedom .

To see where those degrees of freedom come from, consider the independent random variables X and Y , which both follow the χ²-distributions with d 1 and d 2 degrees of freedom, respectively. In that case, the ratio (X/d 1 )/(Y/d 2 ) follows the F-distribution, with (d 1 , d 2 ) -degrees of freedom. For this reason, the two parameters d 1 and d 2 are also called the numerator and denominator degrees of freedom .

The p-value from F-score is given by the following formulae, where we let cdf F,d1,d2 denote the cumulative distribution function of the F-distribution, with (d 1 , d 2 ) -degrees of freedom:

Left-tailed F-test:

p-value = cdf F,d1,d2 (F score )

Right-tailed F-test:

p-value = 1 - cdf F,d1,d2 (F score )

Two-tailed F-test:

p-value = 2 × min{cdf F,d1,d2 (F score ), 1 - cdf F,d1,d2 (F score )}

Below we list the most important tests that produce F-scores. All of them are right-tailed tests .

A test for the equality of variances in two normally distributed populations . Its test statistic follows the F-distribution with (n - 1, m - 1) -degrees of freedom, where n and m are the respective sample sizes.

ANOVA is used to test the equality of means in three or more groups that come from normally distributed populations with equal variances. We arrive at the F-distribution with (k - 1, n - k) -degrees of freedom, where k is the number of groups, and n is the total sample size (in all groups together).

A test for overall significance of regression analysis . The test statistic has an F-distribution with (k - 1, n - k) -degrees of freedom, where n is the sample size, and k is the number of variables (including the intercept).

Once the presence of a linear relationship has been established in your data sample with the above test, you can calculate the coefficient of determination, R 2, which indicates the strength of this relationship. You can do it by hand or use our coefficient of determination calculator.

A test to compare two nested regression models . The test statistic follows the F-distribution with (k 2 - k 1 , n - k 2 ) -degrees of freedom, where k 1 and k 2 are the numbers of variables in the smaller and bigger models, respectively, and n is the sample size.

You may notice that the F-test of an overall significance is a particular form of the F-test for comparing two nested models: it tests whether our model does significantly better than the model with no predictors (i.e., the intercept-only model).
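As an illustration, here is a minimal Python sketch of the right-tailed F formula using scipy; the F value and the (k - 1, n - k) degrees of freedom come from a hypothetical one-way ANOVA.

```python
from scipy import stats

def right_tailed_p_from_f(f_score, df_num, df_den):
    """Right-tailed p-value: 1 - cdf_F,d1,d2(F), the area above the observed statistic."""
    return stats.f.sf(f_score, df_num, df_den)   # sf = 1 - cdf

# Hypothetical ANOVA: F = 3.4 with k - 1 = 2 and n - k = 27 degrees of freedom
print(round(right_tailed_p_from_f(3.4, df_num=2, df_den=27), 4))
```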

Can p-value be negative?

No, the p-value cannot be negative. This is because probabilities cannot be negative, and the p-value is the probability of the test statistic satisfying certain conditions.

What does a high p-value mean?

A high p-value means that under the null hypothesis, there's a high probability that for another sample, the test statistic will generate a value at least as extreme as the one observed in the sample you already have. A high p-value doesn't allow you to reject the null hypothesis.

What does a low p-value mean?

A low p-value means that under the null hypothesis, there's little probability that for another sample, the test statistic will generate a value at least as extreme as the one observed for the sample you already have. A low p-value is evidence in favor of the alternative hypothesis – it allows you to reject the null hypothesis.


Decision Rule Calculator

A decision rule calculator takes a one-tailed or two-tailed hypothesis, a significance level, and a Z-statistic or t-statistic, and reports whether to reject or fail to reject the null hypothesis.

Explanation: the p-value for a Z-statistic of 1.34 in a two-tailed test is 0.18025. Since this p-value is greater than 0.05, we fail to reject the null hypothesis.

