Normal Distribution Hypothesis Test
Hypothesis tests for the normal distribution can be conducted in a very similar way to those for the binomial distribution, except this time we switch our test statistic. These tests are useful because, again, they help us test claims about normally distributed quantities.

How do we carry out a hypothesis test for the normal distribution?

When we carry out a hypothesis test for the mean of a normal distribution, we consider the mean of a sample taken from the population.

So, for a random sample of size \(n\) taken from the random variable \(X \sim N(\mu, \sigma^2)\), the sample mean \(\bar{X}\) is distributed as \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\).

Let's look at an example.

The weight of crisps in each packet is normally distributed with a standard deviation of 2.5g.

The crisp company claims that the crisp packets have a mean weight of 28g. There were numerous complaints that each crisp packet weighs less than this. Therefore, a trading inspector investigated and found that, in a sample of 50 crisp packets, the mean weight was 27.2g.

Using a 5% significance level and stating your hypotheses clearly, test whether or not the evidence upholds the complaints.
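One way to sketch the working is in code. A minimal Python example of this one-tailed test, assuming SciPy is available (the variable names are ours, not part of the original question):

```python
from math import sqrt
from scipy.stats import norm

mu_0 = 28     # claimed mean weight (g) under H0
sigma = 2.5   # known standard deviation (g)
n = 50        # sample size
x_bar = 27.2  # observed sample mean (g)
alpha = 0.05  # significance level

# Under H0 the sample mean is N(mu_0, sigma^2 / n)
se = sigma / sqrt(n)

# One-tailed test (H1: mu < 28), so we want P(X-bar <= 27.2)
p_value = norm.cdf(x_bar, loc=mu_0, scale=se)

print(f"P(X-bar <= {x_bar}) = {p_value:.4f}")   # ~0.0118
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Since \(0.0118 < 0.05\), we reject \(H_0\): the evidence supports the complaints that the mean weight is below 28g.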

This is an example of a one-tailed test. Let's look at an example of a two-tailed test.

A machine produces circular discs with a radius R, where R is normally distributed with a mean of 2cm and a standard deviation of 0.3cm.

The machine is serviced and after the service, a random sample of 40 discs is taken to see if the mean has changed from 2cm. The radius is still normally distributed with a standard deviation of 0.3 cm.

The mean is found to be 1.9cm.

Has the mean changed? Test this to a 5% significance level.
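A similar sketch for this two-tailed test (again assuming SciPy; the only change is that the 5% significance level is split across both tails):

```python
from math import sqrt
from scipy.stats import norm

mu_0 = 2.0   # mean radius (cm) under H0
sigma = 0.3  # known standard deviation (cm)
n = 40       # sample size
x_bar = 1.9  # observed sample mean (cm)
alpha = 0.05

se = sigma / sqrt(n)

# Two-tailed test (H1: mu != 2); the observed mean is below mu_0,
# so compute the lower-tail probability and compare it with alpha / 2
p_lower = norm.cdf(x_bar, loc=mu_0, scale=se)

print(f"P(X-bar <= {x_bar}) = {p_lower:.4f}")   # ~0.0175
print("Reject H0" if p_lower < alpha / 2 else "Do not reject H0")
```

Since \(0.0175 < 0.025\), we reject \(H_0\) at the 5% level: there is evidence that the mean radius has changed.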

Step 5 may be confusing – do we carry out the calculation with \(P(\bar{X} \leq \bar{x})\) or \(P(\bar{X} \geq \bar{x})\)? As a general rule of thumb, if the observed sample mean is below the hypothesized mean, we use \(P(\bar{X} \leq \bar{x})\); if it is above the hypothesized mean, we use \(P(\bar{X} \geq \bar{x})\).

How about finding critical values and critical regions?

This is the same idea as for the binomial distribution. However, for the normal distribution, a calculator can make our lives easier.

The distributions menu has an option called inverse normal.

Here, we enter the significance level (Area), the mean (\(\mu\) ) and the standard deviation (\(\sigma\) ).

The calculator will give us an answer. Let's have a look at an example below.

Wheels are made to measure for a bike. The diameter of the wheel is normally distributed with a mean of 40cm and a standard deviation of 5cm. Some people think that their wheels are too small. Find the critical value of this to a 5% significance level.

In our calculator, in the inverse normal function, we need to enter Area = 0.05, \(\mu = 40\) and \(\sigma = 5\).

If we perform the inverse normal function we get 31.775732 .

So that is our critical value and our critical region is \(X \leq 31.775732\) .
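On a computer, the calculator's inverse normal function corresponds to the normal quantile (percent point) function. A minimal SciPy sketch of the wheel example:

```python
from scipy.stats import norm

mu = 40       # claimed mean diameter (cm)
sigma = 5     # standard deviation (cm)
alpha = 0.05  # significance level, the area in the lower tail

# Inverse normal: the value below which 5% of the distribution lies
critical_value = norm.ppf(alpha, loc=mu, scale=sigma)
print(critical_value)   # ~31.78
```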

Let's look at an example with two tails.

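For a two-tailed test, the significance level is split equally between the two tails, so each critical value is found with half the area. As an illustration only, the sketch below reuses the wheel measurements above (mean 40cm, standard deviation 5cm) with a 5% significance level:

```python
from scipy.stats import norm

mu = 40      # mean diameter (cm), reused from the wheel example
sigma = 5    # standard deviation (cm)
alpha = 0.05

# Split the significance level across both tails
lower = norm.ppf(alpha / 2, loc=mu, scale=sigma)
upper = norm.ppf(1 - alpha / 2, loc=mu, scale=sigma)

print(f"Critical region: X <= {lower:.2f} or X >= {upper:.2f}")
# Critical region: X <= 30.20 or X >= 49.80
```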

Hypothesis Test for Normal Distribution - Key takeaways

  • When we carry out a hypothesis test for a normal distribution, we are trying to see whether the mean is different from the mean stated in the null hypothesis.
  • We use the sample mean which is \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\) .
  • In two-tailed tests we divide the significance level by two and test on both tails.
  • When finding critical values we use the calculator inverse normal function entering the area as the significance level.
  • For two-tailed tests we need to find two critical values on either end of the distribution.

Frequently Asked Questions about Normal Distribution Hypothesis Test

How do you test a hypothesis for a normal distribution?

You compare the sample mean with the mean stated in the null hypothesis, using the fact that \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\), and check whether the resulting tail probability is below the significance level.

Is hypothesis testing only for a normal distribution?

No, pretty much any distribution can be used when testing a hypothesis. The two distributions that you learn at A-Level are Normal and Binomial.

What statistical hypothesis can be tested about the mean of a normal distribution?

We test whether or not the data support the claim that the mean is lower or higher than a stated value.

What is our test statistic with the normal distribution?

The sample mean \(\bar{X}\).

What calculator tool do we use to work backwards with the normal distribution?

The Inverse Normal

A coach thinks his athletes will achieve less than 12 seconds in their 100 metre race. His assistant thinks they won't be this fast. If this claim were tested, would this be a one-tailed or two-tailed test?

One-tailed test

How do we find the critical region of a normal distribution?

By using the calculator inverse normal setting.



Data analysis: hypothesis testing


4.1 The normal distribution

Here, you will look at the concept of normal distribution and the bell-shaped curve. The peak point (the top of the bell) represents the most probable occurrences, while other possible occurrences are distributed symmetrically around the peak point, creating a downward-sloping curve on either side of the peak point.

Cartoon of a bell-shaped curve: the x-axis is titled ‘How high the hill is’ and the y-axis is titled ‘Number of hills’. The top of the curve is labelled ‘Average hill’, and a point on the lower right tail is labelled ‘Big hill’.

In order to test hypotheses, you need to calculate the test statistic and compare it with the value in the bell curve. This will be done by using the concept of ‘normal distribution’.

A normal distribution is a probability distribution that is symmetric about the mean, indicating that data near the mean are more likely to occur than data far from it. In graph form, a normal distribution appears as a bell curve. The values on the x-axis of the normal distribution graph represent the z-scores. The test statistic that you wish to use to test the set of hypotheses is the z-score. A z-score is used to measure how far the observation (sample mean) is from the 0 value of the bell curve (population mean). In statistics, this distance is measured in standard deviations. Therefore, when the z-score is equal to 2, the observation is 2 standard deviations away from the value 0 in the normal distribution curve.
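In symbols, a standard form of the z-score used as the test statistic for a sample mean is

\[z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}},\]

assuming the population standard deviation \(\sigma\) and the sample size \(n\) are known.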

A symmetrical, bell-shaped curve labelled ‘Normal distribution’, with its peak at 0 on the x-axis.


How to Do Hypothesis Testing with Normal Distribution

Hypothesis tests compare a result against something you already believe is true. Let \(X_1, X_2, \ldots, X_n\) be \(n\) independent random variables with equal expected value \(\mu\) and standard deviation \(\sigma\). Let \(\bar{X}\) be the mean of these \(n\) random variables, so

\[\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.\]

The stochastic variable \(\bar{X}\) has expected value \(\mu\) and standard deviation \(\frac{\sigma}{\sqrt{n}}\). You want to perform a hypothesis test on this expected value. You have a null hypothesis \(H_0: \mu = \mu_0\) and three possible alternative hypotheses: \(H_a: \mu < \mu_0\), \(H_a: \mu > \mu_0\), or \(H_a: \mu \neq \mu_0\). The first two alternative hypotheses belong to what you call a one-sided test, while the latter is two-sided.

In hypothesis testing, you calculate using the alternative hypothesis in order to say something about the null hypothesis.

Hypothesis Testing (Normal Distribution)

Note! For two-sided testing, multiply the p-value by 2 before comparing it with the level of significance.

As the production manager at the new soft drink factory, you are worried that the machines don't fill the bottles to their proper capacity. Each bottle should be filled with 0.5 L of soda, but random samples show that 48 soda bottles have an average of 0.48 L, with an empirical standard deviation of 0.1. You are wondering if you need to recalibrate the machines so that they become more precise.

This is a classic case of hypothesis testing by normal distribution. You now follow the instructions above and select a 10% level of significance, since it is only a quantity of soda and not a case of life and death.

The alternative hypothesis in this case is that the bottles do not contain 0.5 L and that the machines are not precise enough. This thus becomes a two-sided hypothesis test, and you must therefore remember to multiply the p-value by 2 before deciding whether it falls in the critical region. This is because the normal distribution is symmetric, so \(P(X \geq k) = P(X \leq -k)\): it is just as likely to observe an extreme high value as an equally extreme low value. The calculation is sketched below:
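A minimal Python sketch of this calculation, using the sample numbers above and treating the empirical standard deviation as if it were the known \(\sigma\) (SciPy assumed; variable names are ours):

```python
from math import sqrt
from scipy.stats import norm

mu_0 = 0.5      # claimed fill volume (L) under H0
x_bar = 0.48    # observed sample mean (L)
s = 0.1         # empirical standard deviation (L)
n = 48          # number of bottles sampled
alpha = 0.10    # chosen level of significance

z = (x_bar - mu_0) / (s / sqrt(n))   # ~ -1.39
p_one_sided = norm.cdf(z)            # lower-tail probability, ~0.083
p_value = 2 * p_one_sided            # two-sided test, ~0.166

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Keep H0")   # Keep H0
```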

so \(H_0\) must be kept, and the machines are deemed to be fine as is.

Had the p-value been less than the level of significance, that would have meant rejecting \(H_0\); in other words, the recalibration represented by the alternative hypothesis would have been better for the business.


An Introduction to Bayesian Thinking

Chapter 5 Hypothesis Testing with Normal Populations

In Section 3.5 , we described how the Bayes factors can be used for hypothesis testing. Now we will use the Bayes factors to compare normal means, i.e., test whether the mean of a population is zero or compare the means of two groups of normally-distributed populations. We divide this mission into three cases: known variance for a single population, unknown variance for a single population using paired data, and unknown variance using two independent groups.

Also note that some of the examples in this section use an updated version of the bayes_inference function. If your local output is different from what is seen in this chapter, or the provided code fails to run for you, please make sure that you have the most recent version of the package.

5.1 Bayes Factors for Testing a Normal Mean: variance known

Now we show how to obtain Bayes factors for testing hypothesis about a normal mean, where the variance is known . To start, let’s consider a random sample of observations from a normal population with mean \(\mu\) and pre-specified variance \(\sigma^2\) . We consider testing whether the population mean \(\mu\) is equal to \(m_0\) or not.

Therefore, we can formulate the data and hypotheses as below:

Data \[Y_1, \cdots, Y_n \mathrel{\mathop{\sim}\limits^{\rm iid}}\textsf{Normal}(\mu, \sigma^2)\]

  • \(H_1: \mu = m_0\)
  • \(H_2: \mu \neq m_0\)

We also need to specify priors for \(\mu\) under both hypotheses. Under \(H_1\) , we assume that \(\mu\) is exactly \(m_0\) , so this occurs with probability 1 under \(H_1\) . Now under \(H_2\) , \(\mu\) is unspecified, so we describe our prior uncertainty with the conjugate normal distribution centered at \(m_0\) and with a variance \(\sigma^2/\mathbf{n_0}\) . This is centered at the hypothesized value \(m_0\) , and it seems that the mean is equally likely to be larger or smaller than \(m_0\) , so a dividing factor \(n_0\) is given to the variance. The hyper parameter \(n_0\) controls the precision of the prior as before.

In mathematical terms, the priors are:

  • \(H_1: \mu = m_0 \text{ with probability 1}\)
  • \(H_2: \mu \sim \textsf{Normal}(m_0, \sigma^2/\mathbf{n_0})\)

Bayes Factor

Now the Bayes factor for comparing \(H_1\) to \(H_2\) is the ratio of the distribution of the data under the assumption that \(\mu = m_0\) to the distribution of the data under \(H_2\).

\[\begin{aligned} \textit{BF}[H_1 : H_2] &= \frac{p(\text{data}\mid \mu = m_0, \sigma^2 )} {\int p(\text{data}\mid \mu, \sigma^2) p(\mu \mid m_0, \mathbf{n_0}, \sigma^2)\, d \mu} \\ \textit{BF}[H_1 : H_2] &=\left(\frac{n + \mathbf{n_0}}{\mathbf{n_0}} \right)^{1/2} \exp\left\{-\frac 1 2 \frac{n }{n + \mathbf{n_0}} Z^2 \right\} \\ Z &= \frac{(\bar{Y} - m_0)}{\sigma/\sqrt{n}} \end{aligned}\]

The term in the denominator requires integration to account for the uncertainty in \(\mu\) under \(H_2\). It can be shown that the Bayes factor is a function of the observed sample size \(n\), the prior sample size \(n_0\) and a \(Z\) score.

Let's explore how the hyperparameter \(n_0\) influences the Bayes factor in Equation (5.1). For illustration we will use a sample size of 100. Recall that for estimation, we interpreted \(n_0\) as a prior sample size and considered the limiting case where \(n_0\) goes to zero as a non-informative or reference prior.

\[\begin{equation} \textsf{BF}[H_1 : H_2] = \left(\frac{n + \mathbf{n_0}}{\mathbf{n_0}}\right)^{1/2} \exp\left\{-\frac{1}{2} \frac{n }{n + \mathbf{n_0}} Z^2 \right\} \tag{5.1} \end{equation}\]
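Equation (5.1) is easy to evaluate directly. A short Python sketch (the helper function and the illustrative values of \(Z\) and \(n_0\) are ours) that mirrors the behavior plotted in Figure 5.1 below:

```python
import numpy as np

def bf_h1_h2(n, n0, z):
    """Bayes factor BF[H1 : H2] for a normal mean with known variance, Equation (5.1)."""
    return np.sqrt((n + n0) / n0) * np.exp(-0.5 * (n / (n + n0)) * z ** 2)

n = 100
for z in (1.0, 1.96, 3.0):
    for n0 in (0.01, 1.0, 10.0, 100.0):
        print(f"Z = {z:.2f}, n0 = {n0:>6}: BF[H1:H2] = {bf_h1_h2(n, n0, z):.3f}")
```

Very small values of \(n_0\) inflate the leading factor and push the Bayes factor towards \(H_1\) even for large \(Z\); this is the behavior discussed below.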

Figure 5.1 shows the Bayes factor for comparing \(H_1\) to \(H_2\) on the y-axis as \(n_0\) changes on the x-axis. The different lines correspond to different values of the \(Z\) score or how many standard errors \(\bar{y}\) is from the hypothesized mean. As expected, larger values of the \(Z\) score favor \(H_2\) .

Figure 5.1: Vague prior for mu: n=100

But as \(n_0\) becomes smaller and approaches 0, the first term in the Bayes factor goes to infinity, while the exponential term involving the data goes to a constant and is ignored. In the limit as \(n_0 \rightarrow 0\) under this noninformative prior, the Bayes factor paradoxically ends up favoring \(H_1\) regardless of the value of \(\bar{y}\) .

The takeaway from this is that we cannot use improper priors with \(n_0 = 0\) if we are going to test the hypothesis that \(\mu = m_0\). Similarly, vague priors that use a small value of \(n_0\) are not recommended due to the sensitivity of the results to the choice of an arbitrarily small value of \(n_0\).

This problem with vague priors, where the Bayes factor favors the null model \(H_1\) even when the data are far away from the value under the null, is known as Bartlett's paradox or the Jeffreys-Lindley paradox.

Now, one way to understand the effect of the prior is through the standardized effect size

\[\delta = \frac{\mu - m_0}{\sigma}.\]

The prior on the standardized effect size under \(H_2\) is

\[\delta \mid H_2 \sim \textsf{Normal}(0, \frac{1}{\mathbf{n_0}})\]

This allows us to think about a standardized effect independent of the units of the problem. One default choice is the unit information prior, where the prior sample size \(n_0\) is 1, leading to a standard normal prior for the standardized effect size. This is depicted by the blue normal density in Figure 5.2. It suggests that we expect the mean to be within \(\pm 1.96\) standard deviations of the hypothesized mean with probability 0.95. (Note that we can say this only in a Bayesian setting.)

In many fields we expect that the effect will be small relative to \(\sigma\). If we do not expect to see large effects, then we may want to use a more informative prior on the effect size, such as the orange density with \(n_0 = 4\). With \(n_0 = 4\) we expect the mean to be within \(\pm 1/\sqrt{n_0}\), or half a standard deviation, of the hypothesized mean.

Figure 5.2: Prior on standard effect size

Example 1.1 To illustrate, we give an example from parapsychological research. The case involved testing a subject's claim to affect a series of randomly generated 0's and 1's by means of extra sensory perception (ESP). The random sequence of 0's and 1's is generated by a machine with the probability of generating a 1 being 0.5. The subject claims that his ESP would make the sample mean differ significantly from 0.5.

Therefore, we are testing \(H_1: \mu = 0.5\) versus \(H_2: \mu \neq 0.5\). Let's use a prior that reflects the fact that we do not expect a large effect, which leads to the following choice of \(n_0\): if we want a 95% chance that the standardized effect lies between \((-0.03/\sigma, 0.03/\sigma)\), then \(1.96/\sqrt{n_0} = 0.03/\sigma\), giving \(n_0 = (1.96\sigma/0.03)^2 = 32.7^2\).

Figure 5.3 shows our informative prior in blue, while the unit information prior is in orange. On this scale, the unit information prior is almost uniform over the range that we are interested in.

Figure 5.3: Prior effect in the extra sensory perception test

A very large data set with over 104 million trials was collected to test this hypothesis, so we use a normal distribution to approximate the distribution of the sample mean.

  • Sample size: \(n = 1.0449 \times 10^8\)
  • Sample mean: \(\bar{y} = 0.500177\) , standard deviation \(\sigma = 0.5\)
  • \(Z\) -score: 3.61

Now, using our informative prior and the data, the Bayes factor for \(H_1\) to \(H_2\) is 0.46, implying evidence against the hypothesis \(H_1\) that \(\mu = 0.5\).

  • Informative \(\textit{BF}[H_1:H_2] = 0.46\)
  • \(\textit{BF}[H_2:H_1] = 1/\textit{BF}[H_1:H_2] = 2.19\)
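Plugging the ESP numbers into Equation (5.1) reproduces this value; a hedged check in Python, where the prior sample size follows from \(n_0 = (1.96\sigma/0.03)^2\) as derived above:

```python
import numpy as np

def bf_h1_h2(n, n0, z):
    """Bayes factor BF[H1 : H2] for a normal mean with known variance, Equation (5.1)."""
    return np.sqrt((n + n0) / n0) * np.exp(-0.5 * (n / (n + n0)) * z ** 2)

n = 1.0449e8                       # number of trials
sigma = 0.5
n0 = (1.96 * sigma / 0.03) ** 2    # informative prior sample size, roughly 32.7^2
z = 3.61                           # Z-score reported above

bf_12 = bf_h1_h2(n, n0, z)
print(f"BF[H1:H2] = {bf_12:.2f}")      # ~0.46
print(f"BF[H2:H1] = {1 / bf_12:.2f}")  # ~2.2
```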

Now, this can be inverted to provide the evidence in favor of \(H_2\). The evidence suggests that the hypothesis that the machine operates with a probability other than 0.5 is 2.19 times more likely than the hypothesis that the probability is 0.5. Based on the interpretation of Bayes factors from Table 3.5, this is in the range of “not worth more than a bare mention”.

To recap, we present expressions for calculating Bayes factors for a normal model with a specified variance. We show that the improper reference priors for \(\mu\) when \(n_0 = 0\) , or vague priors where \(n_0\) is arbitrarily small, lead to Bayes factors that favor the null hypothesis regardless of the data, and thus should not be used for hypothesis testing.

Bayes factors with normal priors can be sensitive to the choice of \(n_0\). While the default value of \(n_0 = 1\) is reasonable in many cases, it may be too non-informative if one expects smaller effects. Wherever possible, think about how large an effect you expect and use that information to help select \(n_0\).

All the ESP examples suggest weak evidence in favor of the machine generating random 0's and 1's with a probability that is different from 0.5. Note that ESP is not the only explanation: a deviation from 0.5 can also occur if the random number generator is biased. Bias in a stream of pseudorandom numbers has huge implications for the numerous fields that depend on simulation. If the context had been about detecting a small bias in random numbers, what prior would you use, and how would it change the outcome? You can experiment with this in R or other software packages that generate random Bernoulli trials.

Next, we will look at Bayes factors in normal models with unknown variances using the Cauchy prior so that results are less sensitive to the choice of \(n_0\) .

5.2 Comparing Two Paired Means using Bayes Factors

We previously learned that we can use a paired t-test to compare means from two paired samples. In this section, we will show how Bayes factors can be expressed as a function of the t-statistic for comparing the means, and how they provide posterior probabilities of the hypotheses that the means are equal or different.

Example 5.1 Trace metals in drinking water affect the flavor, and unusually high concentrations can pose a health hazard. Ten pairs of data were taken measuring the zinc concentration in bottom and surface water at ten randomly sampled locations, as listed in Table 5.1 .

Water samples collected at the same location, on the surface and the bottom, cannot be assumed to be independent of each other. However, it may be reasonable to assume that the differences in concentration between the bottom and the surface at randomly sampled locations are independent of each other.

To start modeling, we will treat the ten differences as a random sample from a normal population where the parameter of interest is the difference between the average zinc concentration at the bottom and the average zinc concentration at the surface, or the mean difference, \(\mu\).

In mathematical terms, we have

  • Random sample of \(n= 10\) differences \(Y_1, \ldots, Y_n\)
  • Normal population with mean \(\mu \equiv \mu_B - \mu_S\)

In this case, we have no information about the variability in the data, and we will treat the variance, \(\sigma^2\) , as unknown.

The hypothesis that the mean concentration at the surface and the bottom are the same is equivalent to saying \(\mu = 0\). The second hypothesis is that there is a difference between the mean bottom and surface concentrations, or equivalently that the mean difference \(\mu \neq 0\).

In other words, we are going to compare the following hypotheses:

  • \(H_1: \mu_B = \mu_S \Leftrightarrow \mu = 0\)
  • \(H_2: \mu_B \neq \mu_S \Leftrightarrow \mu \neq 0\)

The Bayes factor is the ratio between the distributions of the data under each hypothesis, which does not depend on any unknown parameters.

\[\textit{BF}[H_1 : H_2] = \frac{p(\text{data}\mid H_1)} {p(\text{data}\mid H_2)}\]

To obtain the Bayes factor, we need to use integration over the prior distributions under each hypothesis to obtain those distributions of the data.

\[p(\text{data}\mid H_2) = \iint p(\text{data}\mid \mu, \sigma^2)\, p(\mu \mid \sigma^2)\, p(\sigma^2 \mid H_2)\, d \mu \, d\sigma^2\]

This requires specifying the following priors:

  • \(\mu \mid \sigma^2, H_2 \sim \textsf{Normal}(0, \sigma^2/n_0)\)
  • \(p(\sigma^2) \propto 1/\sigma^2\) for both \(H_1\) and \(H_2\)

\(\mu\) is exactly zero under the hypothesis \(H_1\). For \(\mu\) under \(H_2\), we start with the same conjugate normal prior as we used in Section 5.1 – testing the normal mean with known variance. Since \(\sigma^2\) is unknown here, we place the prior on \(\mu \mid \sigma^2\) rather than on \(\mu\) itself.

The \(\sigma^2\) appears in both the numerator and denominator of the Bayes factor. For the default or reference case, we use the Jeffreys prior (a.k.a. reference prior) on \(\sigma^2\). As long as we have more than two observations, this (improper) prior will lead to a proper posterior.

After integration and rearranging, one can derive a simple expression for the Bayes factor:

\[\textit{BF}[H_1 : H_2] = \left(\frac{n + n_0}{n_0} \right)^{1/2} \left( \frac{ t^2 \frac{n_0}{n + n_0} + \nu } { t^2 + \nu} \right)^{\frac{\nu + 1}{2}}\]

This is a function of the t-statistic

\[t = \frac{|\bar{Y}|}{s/\sqrt{n}},\]

where \(s\) is the sample standard deviation and the degrees of freedom \(\nu = n-1\) (sample size minus one).
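This expression is straightforward to evaluate once the t-statistic is known. A small Python sketch (the function is ours, and the numerical values are placeholders chosen for illustration, not the zinc data from Table 5.1):

```python
import numpy as np

def bf_h1_h2_paired(t, n, n0):
    """Bayes factor BF[H1 : H2] as a function of the t-statistic,
    the sample size n and the prior sample size n0, with nu = n - 1."""
    nu = n - 1
    return (np.sqrt((n + n0) / n0)
            * ((t ** 2 * n0 / (n + n0) + nu) / (t ** 2 + nu)) ** ((nu + 1) / 2))

# Placeholder values, for illustration only
print(bf_h1_h2_paired(t=2.5, n=10, n0=1.0))   # ~0.32, evidence against H1
print(bf_h1_h2_paired(t=2.5, n=10, n0=1e-6))  # ~226, a vague prior blows up in favor of H1
```

The second call shows the behavior discussed next: as \(n_0\) shrinks towards zero, the Bayes factor grows without bound in favor of \(H_1\).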

As we saw in the case of Bayes factors with known variance, we cannot use an improper prior on \(\mu\), because when \(n_0 \to 0\), \(\textit{BF}[H_1:H_2] \to \infty\), favoring \(H_1\) regardless of the magnitude of the t-statistic. Arbitrarily vague (small) choices of \(n_0\) likewise lead to arbitrarily large Bayes factors in favor of \(H_1\). This is another example of Bartlett's or the Jeffreys-Lindley paradox.

Sir Harold Jeffreys discovered another paradox with testing using the conjugate normal prior, known as the information paradox. His thought experiment assumed that the sample size \(n\) and the prior sample size \(n_0\) are fixed. He then considered what would happen to the Bayes factor as the sample mean moved further and further away from the hypothesized mean, measured in terms of standard errors with the t-statistic, i.e., \(|t| \to \infty\). As the t-statistic, or the information about the mean, moved further and further from zero, the Bayes factor goes to a constant depending on \(n, n_0\) rather than providing overwhelming support for \(H_2\).

The bounded Bayes factor is

\[\textit{BF}[H_1 : H_2] \to \left( \frac{n_0}{n_0 + n} \right)^{\frac{n - 1}{2}}\]

Jeffreys wanted a prior with \(\textit{BF}[H_1 : H_2] \to 0\) (or equivalently, \(\textit{BF}[H_2 : H_1] \to \infty\)) as the information from the t-statistic grows, indicating that the sample mean is far from the hypothesized mean and the evidence should favor \(H_2\).

The paradox is that the information in the t-statistic favors \(H_2\), yet the Bayes factor does not; Jeffreys showed that no normal prior could resolve this paradox.

But a Cauchy prior on \(\mu\) would resolve it. With a Cauchy prior, \(\textit{BF}[H_2 : H_1]\) goes to infinity as the sample mean moves further away from the hypothesized mean. Recall that the Cauchy prior is written as \(\textsf{C}(0, r^2 \sigma^2)\). While Jeffreys used a default of \(r = 1\), smaller values of \(r\) can be used if smaller effects are expected.

The combination of the Jeffreys prior on \(\sigma^2\) and this Cauchy prior on \(\mu\) under \(H_2\) is sometimes referred to as the Jeffreys-Zellner-Siow prior.

However, there is no closed form expression for the Bayes factor under the Cauchy prior. To obtain the Bayes factor, we must use numerical integration or simulation methods.

We will use the function from the package to test whether the mean difference is zero in Example 5.1 (zinc), using the JZS (Jeffreys-Zellner-Siow) prior.


With equal prior probabilities on the two hypotheses, the Bayes factor equals the posterior odds. From the output, we see that the hypothesis \(H_2\), that the mean difference is different from 0, is almost 51 times more likely than the hypothesis \(H_1\) that the average concentration is the same at the surface and the bottom.

To sum up, we have used the Cauchy prior as a default prior for testing hypotheses about a normal mean when the variance is unknown. This does require numerical integration, but it is available in the function from the package. If you expect that the effect sizes will be small, smaller values of \(r\) are recommended.

It is often important to quantify the magnitude of the difference in addition to testing. The Cauchy Prior provides a default prior for both testing and inference; it avoids problems that arise with choosing a value of \(n_0\) (prior sample size) in both cases. In the next section, we will illustrate using the Cauchy prior for comparing two means from independent normal samples.

5.3 Comparing Independent Means: Hypothesis Testing

In the previous section, we described Bayes factors for testing whether the mean difference of paired samples was zero. In this section, we will consider a slightly different problem – we have two independent samples, and we would like to test the hypothesis that the means are different or equal.

Example 5.2 We illustrate the testing of independent groups with data from a 2004 survey of birth records from North Carolina, which are available in the package.

The variable of interest is the weight gain of mothers during pregnancy. We have two groups defined by a categorical variable with two levels: younger mom and older mom.

Question of interest : Do the data provide convincing evidence of a difference between the average weight gain of older moms and the average weight gain of younger moms?

We will view the data as a random sample from two populations, older and younger moms. The two groups are modeled as:

\[\begin{equation} \begin{aligned} Y_{O,i} & \mathrel{\mathop{\sim}\limits^{\rm iid}} \textsf{N}(\mu + \alpha/2, \sigma^2) \\ Y_{Y,i} & \mathrel{\mathop{\sim}\limits^{\rm iid}} \textsf{N}(\mu - \alpha/2, \sigma^2) \end{aligned} \tag{5.2} \end{equation}\]

The model for weight gain for older moms uses the subscript \(O\), and it assumes that the observations are independent and identically distributed, with mean \(\mu+\alpha/2\) and variance \(\sigma^2\).

For the younger women, the observations with the subscript \(Y\) are independent and identically distributed with a mean \(\mu-\alpha/2\) and variance \(\sigma^2\) .

Using this representation of the means in the two groups, the difference in means simplifies to \(\alpha\) – the parameter of interest.

\[(\mu + \alpha/2) - (\mu - \alpha/2) = \alpha\]

You may ask, “Why don’t we set the average weight gain of older women to \(\mu+\alpha\) , and the average weight gain of younger women to \(\mu\) ?” We need the parameter \(\alpha\) to be present in both \(Y_{O,i}\) (the group of older women) and \(Y_{Y,i}\) (the group of younger women).

We have the following competing hypotheses:

  • \(H_1: \alpha = 0 \Leftrightarrow\) The means are not different.
  • \(H_2: \alpha \neq 0 \Leftrightarrow\) The means are different.

In this representation, \(\mu\) represents the overall average weight gain for all women. (Does the model in Equation (5.2) make more sense now?) To test the hypothesis, we need to specify prior distributions for \(\alpha\) under \(H_2\) (c.f. \(\alpha = 0\) under \(H_1\)) and priors for \(\mu,\sigma^2\) under both hypotheses.

Recall that the Bayes factor is the ratio of the distribution of the data under the two hypotheses.

\[\begin{aligned} \textit{BF}[H_1 : H_2] &= \frac{p(\text{data}\mid H_1)} {p(\text{data}\mid H_2)} \\ &= \frac{\iint p(\text{data}\mid \alpha = 0,\mu, \sigma^2 )p(\mu, \sigma^2 \mid H_1) \, d\mu \,d\sigma^2} {\int \iint p(\text{data}\mid \alpha, \mu, \sigma^2) p(\alpha \mid \sigma^2) p(\mu, \sigma^2 \mid H_2) \, d \mu \, d\sigma^2 \, d \alpha} \end{aligned}\]

As before, we need to average over the uncertainty in the parameters to obtain the unconditional distribution of the data. And, as in the test about a single mean, we cannot use improper or non-informative priors on \(\alpha\) for testing.

Under \(H_2\), we use the Cauchy prior for \(\alpha\), or equivalently, the Cauchy prior on the standardized effect \(\delta\) with scale \(r\):

\[\delta = \alpha/\sigma \sim \textsf{C}(0, r^2)\]

Now, under both \(H_1\) and \(H_2\), we use the Jeffreys reference prior on \(\mu\) and \(\sigma^2\):

\[p(\mu, \sigma^2) \propto 1/\sigma^2\]

While this is an improper prior on \(\mu\), it does not suffer from the Bartlett-Lindley-Jeffreys paradox, because \(\mu\) is a common parameter in the models under \(H_1\) and \(H_2\). This is another example of the Jeffreys-Zellner-Siow prior.

As in the single mean case, we will need numerical algorithms to obtain the Bayes factor. The following output illustrates the test, using the bayes_inference function from the package.


We see that the Bayes factor for \(H_1\) to \(H_2\) is about 5.7, with positive support for \(H_1\) that there is no difference in average weight gain between younger and older women. Using equal prior probabilities, the probability that there is a difference in average weight gain between the two groups is about 0.15 given the data. Based on the interpretation of Bayes factors from Table 3.5 , this is in the range of “positive” (between 3 and 20).
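Under equal prior probabilities, converting a Bayes factor into a posterior probability is a one-line calculation; a quick sketch using the value reported above:

```python
bf_12 = 5.7             # BF[H1 : H2] reported by the output above
p_h2 = 1 / (1 + bf_12)  # posterior probability of H2 under equal prior odds
print(round(p_h2, 2))   # ~0.15
```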

To recap, we have illustrated testing hypotheses about population means with two independent samples, using a Cauchy prior on the difference in the means. One assumption that we have made is that the variances are equal in both groups. The case where the variances are unequal is referred to as the Behrens-Fisher problem, and this is beyond the scope of this course. In the next section, we will look at another example to put everything together with testing and discuss summarizing results.

5.4 Inference after Testing

In this section, we will work through another example for comparing two means using both hypothesis tests and interval estimates, with an informative prior. We will also illustrate how to adjust the credible interval after testing.

Example 5.3 We will use the North Carolina survey data to examine the relationship between infant birth weight and whether the mother smoked during pregnancy. The response variable is the birth weight of the baby in pounds. A categorical variable records the status of the mother as a smoker or non-smoker.

We would like to answer two questions:

Is there a difference in average birth weight between the two groups?

If there is a difference, how large is the effect?

As before, we need to specify models for the data and priors. We treat the data as a random sample for the two populations, smokers and non-smokers.

The birth weights of babies born to non-smokers, designated by a subgroup \(N\) , are assumed to be independent and identically distributed from a normal distribution with mean \(\mu + \alpha/2\) , as in Section 5.3 .

\[Y_{N,i} \mathrel{\mathop{\sim}\limits^{\rm iid}}\textsf{Normal}(\mu + \alpha/2, \sigma^2)\]

While the birth weights of the babies born to smokers, designated by the subgroup \(S\) , are also assumed to have a normal distribution, but with mean \(\mu - \alpha/2\) .

\[Y_{S,i} \mathrel{\mathop{\sim}\limits^{\rm iid}}\textsf{Normal}(\mu - \alpha/2, \sigma^2)\]

The difference in the average birth weights is the parameter \(\alpha\) , because

\[(\mu + \alpha/2) - (\mu - \alpha/2) = \alpha.\]

The hypotheses that we will test are \(H_1: \alpha = 0\) versus \(H_2: \alpha \ne 0\) .

We will still use the Jeffreys-Zellner-Siow Cauchy prior. However, since we may expect the standardized effect size to not be as strong, we will use a scale of \(r = 0.5\) rather than 1.

Therefore, under \(H_2\) , we have \[\delta = \alpha/\sigma \sim \textsf{C}(0, r^2), \text{ with } r = 0.5.\]

Under both \(H_1\) and \(H_2\) , we will use the reference priors on \(\mu\) and \(\sigma^2\) :

\[\begin{aligned} p(\mu) &\propto 1 \\ p(\sigma^2) &\propto 1/\sigma^2 \end{aligned}\]

The input to the bayes_inference function is similar, but now we will specify that \(r = 0.5\).


We see that the Bayes factor is 1.44, which weakly favors there being a difference in average birth weights for babies whose mothers are smokers versus mothers who did not smoke. Converting this to a probability, we find that there is about a 60% chance that the average birth weights are different.

While looking at evidence of there being a difference is useful, Bayes factors and posterior probabilities do not convey any information about the magnitude of the effect. Reporting a credible interval or the complete posterior distribution is more relevant for quantifying the magnitude of the effect.

Using the function, we can generate samples from the posterior distribution under \(H_2\) using the option.

The 2.5 and 97.5 percentiles for the difference in the means provide a 95% credible interval of 0.023 to 0.57 pounds for the difference in average birth weight. The MCMC output shows not only summaries about the difference in the mean \(\alpha\) , but the other parameters in the model.

In particular, the Cauchy prior arises by placing a gamma prior on \(n_0\) in the conjugate normal prior. This provides quantiles for \(n_0\) after updating with the current data.

The row labeled effect size is the standardized effect size \(\delta\) , indicating that the effects are indeed small relative to the noise in the data.

Figure 5.4: Estimates of effect under H2

Figure 5.4 shows the posterior density for the difference in means, with the 95% credible interval indicated by the shaded area. Under \(H_2\) , there is a 95% chance that the average birth weight of babies born to non-smokers is 0.023 to 0.57 pounds higher than that of babies born to smokers.

The previous statement assumes that \(H_2\) is true and is a conditional probability statement. In mathematical terms, the statement is equivalent to

\[P(0.023 < \alpha < 0.57 \mid \text{data}, H_2) = 0.95\]

However, we still have quite a bit of uncertainty based on the current data, because given the data, the probability of \(H_2\) being true is 0.59.

\[P(H_2 \mid \text{data}) = 0.59\]

Using the law of total probability, we can compute the probability that \(\alpha\) is between 0.023 and 0.57 as below:

\[\begin{aligned} & P(0.023 < \alpha < 0.57 \mid \text{data}) \\ = & P(0.023 < \alpha < 0.57 \mid \text{data}, H_1)P(H_1 \mid \text{data}) + P(0.023 < \alpha < 0.57 \mid \text{data}, H_2)P(H_2 \mid \text{data}) \\ = & I( 0 \text{ in CI }) P(H_1 \mid \text{data}) + 0.95 \times P(H_2 \mid \text{data}) \\ = & 0 \times 0.41 + 0.95 \times 0.59 = 0.5605 \end{aligned}\]

Finally, we get that the probability that \(\alpha\) is in the interval, given the data and averaging over both hypotheses, is roughly 0.56. The unconditional statement is that the average birth weight of babies born to non-smokers is 0.023 to 0.57 pounds higher than that of babies born to smokers, with probability 0.56. This adjustment reflects the posterior uncertainty about how likely \(H_2\) is.

To recap, we have illustrated testing, followed by reporting credible intervals, and using a Cauchy prior distribution that assumed smaller standardized effects. After testing, it is common to report credible intervals conditional on \(H_2\) . We also have shown how to adjust the probability of the interval to reflect our posterior uncertainty about \(H_2\) . In the next chapter, we will turn to regression models to incorporate continuous explanatory variables.

8.1.2 - Hypothesis Testing

A hypothesis test for a proportion is used when you are comparing one group to a known or hypothesized population proportion value. In other words, you have one sample with one categorical variable. The hypothesized value of the population proportion is symbolized by \(p_0\) because this is the value in the null hypothesis (\(H_0\)).

If \(np_0 \ge 10\) and \(n(1-p_0) \ge 10\) then the distribution of sample proportions is approximately normal and can be estimated using the normal distribution. That sampling distribution will have a mean of \(p_0\) and a standard deviation (i.e., standard error) of \(\sqrt{\frac{p_0 (1-p_0)}{n}}\)

Recall that the standard normal distribution is also known as the z distribution. Thus, this is known as a "single sample proportion z test" or "one sample proportion z test." 

If \(np_0 < 10\) or \(n(1-p_0) < 10\) then the distribution of sample proportions follows a binomial distribution. We will not be conducting this test by hand in this course; however, you will learn how it can be conducted in Minitab using the exact method.

8.1.2.1 - Normal Approximation Method Formulas

Here we will be using the five step hypothesis testing procedure to compare the proportion in one random sample to a specified population proportion using the normal approximation method.

In order to use the normal approximation method, the assumption is that both \(n p_0 \geq 10\) and \(n (1-p_0) \geq 10\). Recall that \(p_0\) is the population proportion in the null hypothesis.

Where \(p_0\) is the hypothesized population proportion that you are comparing your sample to.

When using the normal approximation method we will be using a z test statistic. The z test statistic tells us how far our sample proportion is from the hypothesized population proportion in standard error units. Note that this formula follows the basic structure of a test statistic that you learned in the last lesson:

\(test\;statistic=\dfrac{sample\;statistic-null\;parameter}{standard\;error}\)

\(z=\dfrac{\widehat{p}- p_0 }{\sqrt{\frac{p_0 (1- p_0)}{n}}}\)

where \(\widehat{p}\) = sample proportion, \(p_{0}\) = hypothesized population proportion, and \(n\) = sample size.

Given that the null hypothesis is true, the p value is the probability that a randomly selected sample of n would have a sample proportion as different, or more different, than the one in our sample, in the direction of the alternative hypothesis. We can find the p value by mapping the test statistic from step 2 onto the z distribution. 

Note that p-values are also symbolized by \(p\). Do not confuse this with the population proportion which shares the same symbol.

We can look up the \(p\)-value using Minitab by constructing the sampling distribution. Because we are using the normal approximation here, we have a \(z\) test statistic that we can map onto the \(z\) distribution. Recall, the \(z\) distribution is a normal distribution with a mean of 0 and a standard deviation of 1. If we are conducting a one-tailed (i.e., right- or left-tailed) test, we look up the area of the sampling distribution that is beyond our test statistic. If we are conducting a two-tailed (i.e., non-directional) test there is one additional step: we need to multiply the area by two to take into account the possibility of being in the right or the left tail.

We can decide between the null and alternative hypotheses by examining our p-value. If \(p \leq \alpha\) reject the null hypothesis. If \(p>\alpha\) fail to reject the null hypothesis. Unless stated otherwise, assume that \(\alpha=.05\).

When we reject the null hypothesis our results are said to be statistically significant.

Based on our decision in step 4, we will write a sentence or two concerning our decision in relation to the original research question.
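The five steps translate directly into code. A minimal Python sketch of the normal approximation method (the function and the sample numbers at the bottom are ours, not Minitab output; SciPy assumed):

```python
from math import sqrt
from scipy.stats import norm

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """One sample proportion z test using the normal approximation method."""
    # Step 1 (assumption check): n*p0 >= 10 and n*(1 - p0) >= 10
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("Normal approximation not appropriate; use the exact method.")
    p_hat = successes / n
    # Step 2: z test statistic
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Step 3: p-value from the z distribution
    if alternative == "greater":
        p_value = 1 - norm.cdf(z)
    elif alternative == "less":
        p_value = norm.cdf(z)
    else:  # two-tailed: double the one-tailed area
        p_value = 2 * (1 - norm.cdf(abs(z)))
    return z, p_value

# Hypothetical data: 120 successes out of 200, testing p0 = 0.50, two-tailed
z, p = one_proportion_z_test(120, 200, 0.50)
print(f"z = {z:.2f}, p-value = {p:.4f}")   # z = 2.83, p-value ~ 0.0047
```

Steps 4 and 5, the decision and the conclusion, then follow by comparing the returned p-value with \(\alpha\).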

8.1.2.1.1 - Video Example: Male Babies

8.1.2.1.2 - Example: Handedness

Research Question: Are more than 80% of Americans right handed?

In a sample of 100 Americans, 87 were right handed.

\(np_0 = 100(0.80)=80\)

\(n(1-p_0) = 100 (1-0.80) = 20\)

Both \(np_0\) and \(n(1-p_0)\) are at least 10 so we can use the normal approximation method. 

This is a right-tailed test because we want to know if the proportion is greater than 0.80.

\(H_{0}\colon p=0.80\) \(H_{a}\colon p>0.80\)

\(z=\dfrac{\widehat{p}- p_0 }{\sqrt{\frac{p_0 (1- p_0)}{n}}}\)

\(\widehat{p}=\dfrac{87}{100}=0.87\), \(p_{0}=0.80\), \(n=100\)

\(z= \dfrac{\widehat{p}- p_0 }{\sqrt{\frac{p_0 (1- p_0)}{n}}}= \dfrac{0.87-0.80}{\sqrt{\frac{0.80 (1-0.80)}{100}}}=1.75\)

Our \(z\) test statistic is 1.75.

This is a right-tailed test so we need to find the area to the right of the test statistic, \(z=1.75\), on the z distribution.

Using Minitab , we find the probability \(P(z\geq1.75)=0.0400592\) which may be rounded to \(p\; value=0.0401\).

Distribution plot of Density vs X - Normal, Mean=0, StDev=1

\(p\leq .05\), therefore our decision is to reject the null hypothesis

Yes, there is statistical evidence to state that more than 80% of all Americans are right handed.

8.1.2.1.3 - Example: Ice Cream

Research Question : Is the percentage of Creamery customers who prefer chocolate ice cream over vanilla less than 80%?

In a sample of 50 customers 60% preferred chocolate over vanilla.

\(np_0 = 50(0.80) = 40\)

\(n(1-p_0)=50(1-0.80) = 10\)

Both \(np_0\) and \(n(1-p_0)\) are at least 10. We can use the normal approximation method.

This is a left-tailed test because we want to know if the proportion is less than 0.80.

\(H_{0}\colon p=0.80\) \(H_{a}\colon p<0.80\)

\(\widehat{p}=0.60\), \(p_{0}=0.80\), \(n=50\)

\(z= \dfrac{\widehat{p}- p_0 }{\sqrt{\frac{p_0 (1- p_0)}{n}}}= \dfrac{0.60-0.80}{\sqrt{\frac{0.80 (1-0.80)}{50}}}=-3.536\)

Our \(z\) test statistic is -3.536.

This is a left-tailed test so we need to find the area to the left of our test statistic, \(z=-3.536\).

Distribution Plot of Density vs X - Normal, Mean=0, StDev=1

From the Minitab output above, the p-value is 0.0002031

\(p \leq.05\), therefore our decision is to reject the null hypothesis.

Yes, there is evidence that the percentage of all Creamery customers who prefer chocolate ice cream over vanilla is less than 80%.

8.1.2.1.4 - Example: Overweight Citizens

According to the Center for Disease Control (CDC), the percent of adults 20 years of age and over in the United States who are overweight is 69.0% (see  http://www.cdc.gov/nchs/fastats/obesity-overweight.htm ). One city’s council wants to know if the proportion of overweight citizens in their city is different from this known national proportion. They take a random sample of 150 adults 20 years of age or older in their city and find that 98 are classified as overweight. Let’s use the five step hypothesis testing procedure to determine if there is evidence that the proportion in this city is different from the known national proportion.

\(np_0 =150 (0.690)=103.5 \)

\(n (1-p_0) =150 (1-0.690)=46.5\)

Both \(n p_0\) and \(n (1-p_0)\) are at least 10, this assumption has been met.

Research question: Is this city’s proportion of overweight individuals different from 0.690?

This is a non-directional test because our question states that we are looking for a difference as opposed to a specific direction. This will be a two-tailed test.

\(H_{0}\colon p=0.690\) \(H_{a}\colon p\neq 0.690\)

\(\widehat{p}=\dfrac{98}{150}=.653\)

\( z =\dfrac{0.653- 0.690 }{\sqrt{\frac{0.690 (1- 0.690)}{150}}} = -0.980 \)

Our test statistic is \(z=-0.980\)

This is a non-directional (i.e., two-tailed) test, so we need to find the area under the z distribution that is more extreme than \(z=-0.980\).

In Minitab, we find the proportion of a normal curve beyond \(\pm0.980\):

Distribution Plot of Density vs X - Normal, Mean=0, StDev=1

\(p-value=0.163543+0.163543=0.327086\)

\(p>\alpha\), therefore we fail to reject the null hypothesis

There is not sufficient evidence to state that the proportion of citizens of this city who are overweight is different from the national proportion of 0.690.

8.1.2.2 - Minitab: Hypothesis Tests for One Proportion

A hypothesis test for one proportion can be conducted in Minitab. This can be done using raw data or summarized data.

  • If you have a data file with every individual's observation, then you have  raw data .
  • If you do not have each individual observation, but rather have the sample size and number of successes in the sample, then you have summarized data.

The next two pages will show you how to use Minitab to conduct this analysis using either raw data or summarized data .

Note that the default method for constructing the sampling distribution in Minitab is to use the exact method.  If \(np_0 \geq 10\) and \(n(1-p_0) \geq 10\) then you will need to change this to the normal approximation method.  This must be done manually.  Minitab will use the method that you select, it will not check assumptions for you!

8.1.2.2.1 - Minitab: 1 Proportion z Test, Raw Data

If you have data in a Minitab worksheet, then you have what we call "raw data."  This is in contrast to "summarized data" which you'll see on the next page.

In order to use the normal approximation method both \(np_0 \geq 10\) and \(n(1-p_0) \geq 10\). Before we can conduct our hypothesis test we must check this assumption to determine if the normal approximation method or exact method should be used. This must be checked manually.  Minitab will not check assumptions for you.

In the example below, we want to know if there is evidence that the proportion of students who are male is different from 0.50.

\(n=226\) and \(p_0=0.50\)

\(np_0 = 226(0.50)=113\) and \(n(1-p_0) = 226(1-0.50)=113\)

Both \(np_0 \geq 10\) and \(n(1-p_0) \geq 10\) so we can use the normal approximation method. 

Minitab ®  – Conducting a One Sample Proportion z Test: Raw Data

Research question:  Is the proportion of students who are male different from 0.50?

  • class_survey.mpx
  • In Minitab, select Stat > Basic Statistics > 1 Proportion
  • Select One or more samples, each in a column from the dropdown
  • Double-click the variable  Biological Sex  to insert it into the box
  • Check the box next to  Perform hypothesis test and enter  0.50  in the  Hypothesized proportion  box
  • Select Options
  • Use the default  Alternative hypothesis  setting of  Proportion ≠ hypothesized proportion value 
  • Use the default  Confidence level  of 95
  • Select  Normal approximation method
  • Click OK and OK

The result should be the following output:

Event: Biological Sex = Male
p: proportion where Biological Sex = Male
Normal approximation is used for this analysis.

Summary of Results

We could summarize these results using the five-step hypothesis testing procedure:

\(np_0 = 226(0.50)=113\) and \(n(1-p_0) = 226(1-0.50)=113\) therefore the normal approximation method will be used.

 \(H_0\colon p = 0.50\)

 \(H_a\colon p \ne 0.50\)

From the Minitab output, \(z\) = -1.86

From the Minitab output, \(p\) = 0.0625

\(p > \alpha\), fail to reject the null hypothesis

There is NOT enough evidence that the proportion of all students in the population who are male is different from 0.50.

8.1.2.2.2 - Minitab: 1 Sample Proportion z test, Summary Data

Example: overweight.

The following example uses a scenario in which we want to know if the proportion of college women who think they are overweight is less than 40%. We collect data from a random sample of 129 college women and 37 said that they think they are overweight.

First, we should check assumptions to determine if the normal approximation method or exact method should be used:

\(np_0=129(0.40)=51.6\) and \(n(1-p_0)=129(1-0.40)=77.4\) both values are at least 10 so we can use the normal approximation method.

Minitab ®  – Performing a One Proportion z Test with Summarized Data

To perform a one sample proportion  z  test with summarized data in Minitab:

  • Select Summarized data from the dropdown
  • For number of events, add 37 and for number of trials add 129.
  • Check the box next to  Perform hypothesis test and enter  0.40  in the  Hypothesized proportion  box
  • Use the default  Alternative hypothesis  setting of  Proportion < hypothesized proportion value 

Event: Event proportion
Normal approximation is used for this analysis.

\(H_0\colon p = 0.40\)

\(H_a\colon p < 0.40\)

From output, \(z\) = -2.62

From output, \(p\) = 0.004

\(p \leq \alpha\), reject the null hypothesis

There is evidence that the proportion of women in the population who think they are overweight is less than 40%.

8.1.2.2.2.1 - Minitab Example: Normal Approx. Method

Example: gym membership.

Research question:  Are less than 50% of all individuals with a membership at one gym female?

A simple random sample of 60 individuals with a membership at one gym was collected. Each individual's biological sex was recorded. There were 24 females. 

First we have to check the assumptions:

\(np_0 = 60(0.50) = 30\)

\(n(1-p_0) = 60(1-0.50) = 30\)

The assumptions are met to use the normal approximation method.

  • For number of events, add 24 and for number of trials add 60.

\(np_0=60(0.50)=30\) and \(n(1-p_0)=60(1-0.50)=30\) both values are at least 10 so we can use the normal approximation method.

\(H_0\colon p = 0.50\)

\(H_a\colon p < 0.50\)

From output, \(z\) = -1.55

From output, \(p\) = 0.061

\(p > \alpha\), fail to reject the null hypothesis

There is not enough evidence to support the alternative that the proportion of women memberships at this gym is less than 50%.


9.E: Hypothesis Testing with One Sample (Exercises)


These are homework exercises to accompany the Textmap created for "Introductory Statistics" by OpenStax.

9.1: Introduction

9.2: Null and Alternative Hypotheses

Some of the following statements refer to the null hypothesis, some to the alternate hypothesis.

State the null hypothesis, \(H_{0}\), and the alternative hypothesis, \(H_{a}\), in terms of the appropriate parameter \((\mu \text{ or } p)\).

  • The mean number of years Americans work before retiring is 34.
  • At most 60% of Americans vote in presidential elections.
  • The mean starting salary for San Jose State University graduates is at least $100,000 per year.
  • Twenty-nine percent of high school seniors get drunk each month.
  • Fewer than 5% of adults ride the bus to work in Los Angeles.
  • The mean number of cars a person owns in her lifetime is not more than ten.
  • About half of Americans prefer to live away from cities, given the choice.
  • Europeans have a mean paid vacation each year of six weeks.
  • The chance of developing breast cancer is under 11% for women.
  • Private universities' mean tuition cost is more than $20,000 per year.
  • \(H_{0}: \mu = 34; H_{a}: \mu \neq 34\)
  • \(H_{0}: p \leq 0.60; H_{a}: p > 0.60\)
  • \(H_{0}: \mu \geq 100,000; H_{a}: \mu < 100,000\)
  • \(H_{0}: p = 0.29; H_{a}: p \neq 0.29\)
  • \(H_{0}: p = 0.05; H_{a}: p < 0.05\)
  • \(H_{0}: \mu \leq 10; H_{a}: \mu > 10\)
  • \(H_{0}: p = 0.50; H_{a}: p \neq 0.50\)
  • \(H_{0}: \mu = 6; H_{a}: \mu \neq 6\)
  • \(H_{0}: p ≥ 0.11; H_{a}: p < 0.11\)
  • \(H_{0}: \mu \leq 20,000; H_{a}: \mu > 20,000\)

Over the past few decades, public health officials have examined the link between weight concerns and teen girls' smoking. Researchers surveyed a group of 273 randomly selected teen girls living in Massachusetts (between 12 and 15 years old). After four years the girls were surveyed again. Sixty-three said they smoked to stay thin. Is there good evidence that more than thirty percent of the teen girls smoke to stay thin? The alternative hypothesis is:

  • \(p < 0.30\)
  • \(p \leq 0.30\)
  • \(p \geq 0.30\)
  • \(p > 0.30\)

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening night midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 attended the midnight showing. An appropriate alternative hypothesis is:

  • \(p = 0.20\)
  • \(p > 0.20\)
  • \(p < 0.20\)
  • \(p \leq 0.20\)

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test. The null and alternative hypotheses are:

  • \(H_{0}: \bar{x} = 4.5, H_{a}: \bar{x} > 4.5\)
  • \(H_{0}: \mu \geq 4.5, H_{a}: \mu < 4.5\)
  • \(H_{0}: \mu = 4.75, H_{a}: \mu > 4.75\)
  • \(H_{0}: \mu = 4.5, H_{a}: \mu > 4.5\)

9.3: Outcomes and the Type I and Type II Errors

State the Type I and Type II errors in complete sentences given the following statements.

  • The mean number of cars a person owns in his or her lifetime is not more than ten.
  • Private universities mean tuition cost is more than $20,000 per year.
  • Type I error: We conclude that the mean is not 34 years, when it really is 34 years. Type II error: We conclude that the mean is 34 years, when in fact it really is not 34 years.
  • Type I error: We conclude that more than 60% of Americans vote in presidential elections, when the actual percentage is at most 60%.Type II error: We conclude that at most 60% of Americans vote in presidential elections when, in fact, more than 60% do.
  • Type I error: We conclude that the mean starting salary is less than $100,000, when it really is at least $100,000. Type II error: We conclude that the mean starting salary is at least $100,000 when, in fact, it is less than $100,000.
  • Type I error: We conclude that the proportion of high school seniors who get drunk each month is not 29%, when it really is 29%. Type II error: We conclude that the proportion of high school seniors who get drunk each month is 29% when, in fact, it is not 29%.
  • Type I error: We conclude that fewer than 5% of adults ride the bus to work in Los Angeles, when the percentage that do is really 5% or more. Type II error: We conclude that 5% or more adults ride the bus to work in Los Angeles when, in fact, fewer than 5% do.
  • Type I error: We conclude that the mean number of cars a person owns in his or her lifetime is more than 10, when in reality it is not more than 10. Type II error: We conclude that the mean number of cars a person owns in his or her lifetime is not more than 10 when, in fact, it is more than 10.
  • Type I error: We conclude that the proportion of Americans who prefer to live away from cities is not about half, though the actual proportion is about half. Type II error: We conclude that the proportion of Americans who prefer to live away from cities is half when, in fact, it is not half.
  • Type I error: We conclude that the duration of paid vacations each year for Europeans is not six weeks, when in fact it is six weeks. Type II error: We conclude that the duration of paid vacations each year for Europeans is six weeks when, in fact, it is not.
  • Type I error: We conclude that the proportion is less than 11%, when it is really at least 11%. Type II error: We conclude that the proportion of women who develop breast cancer is at least 11%, when in fact it is less than 11%.
  • Type I error: We conclude that the average tuition cost at private universities is more than $20,000, though in reality it is at most $20,000. Type II error: We conclude that the average tuition cost at private universities is at most $20,000 when, in fact, it is more than $20,000.

For statements a-j in Exercise 9.109, answer the following in complete sentences.

  • State a consequence of committing a Type I error.
  • State a consequence of committing a Type II error.

When a new drug is created, the pharmaceutical company must subject it to testing before receiving the necessary permission from the Food and Drug Administration (FDA) to market the drug. Suppose the null hypothesis is “the drug is unsafe.” What is the Type II Error?

  • To conclude the drug is safe when in, fact, it is unsafe.
  • Not to conclude the drug is safe when, in fact, it is safe.
  • To conclude the drug is safe when, in fact, it is safe.
  • Not to conclude the drug is unsafe when, in fact, it is unsafe.

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 of them attended the midnight showing. The Type I error is to conclude that the percent of EVC students who attended is ________.

  • at least 20%, when in fact, it is less than 20%.
  • 20%, when in fact, it is 20%.
  • less than 20%, when in fact, it is at least 20%.
  • less than 20%, when in fact, it is less than 20%.

It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than seven hours of sleep per night, on average. A survey of 22 LTCC Intermediate Algebra students generated a mean of 7.24 hours with a standard deviation of 1.93 hours. At a level of significance of 5%, do LTCC Intermediate Algebra students get less than seven hours of sleep per night, on average?

The Type II error is not to reject that the mean number of hours of sleep LTCC students get per night is at least seven when, in fact, the mean number of hours

  • is more than seven hours.
  • is at most seven hours.
  • is at least seven hours.
  • is less than seven hours.

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test. The Type I error is:

  • to conclude that the current mean hours per week is higher than 4.5, when in fact, it is higher
  • to conclude that the current mean hours per week is higher than 4.5, when in fact, it is the same
  • to conclude that the mean hours per week currently is 4.5, when in fact, it is higher
  • to conclude that the mean hours per week currently is no higher than 4.5, when in fact, it is not higher

9.4: Distribution Needed for Hypothesis Testing

It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than seven hours of sleep per night, on average. A survey of 22 LTCC Intermediate Algebra students generated a mean of 7.24 hours with a standard deviation of 1.93 hours. At a level of significance of 5%, do LTCC Intermediate Algebra students get less than seven hours of sleep per night, on average? The distribution to be used for this test is \(\bar{X} \sim\) ________________

  • \(N\left(7.24, \frac{1.93}{\sqrt{22}}\right)\)
  • \(N\left(7.24, 1.93\right)\)

9.5: Rare Events, the Sample, Decision and Conclusion

The National Institute of Mental Health published an article stating that in any one-year period, approximately 9.5 percent of American adults suffer from depression or a depressive illness. Suppose that in a survey of 100 people in a certain town, seven of them suffered from depression or a depressive illness. Conduct a hypothesis test to determine if the true proportion of people in that town suffering from depression or a depressive illness is lower than the percent in the general adult American population.

  • Is this a test of one mean or proportion?
  • State the null and alternative hypotheses. \(H_{0}\) : ____________________ \(H_{a}\) : ____________________
  • Is this a right-tailed, left-tailed, or two-tailed test?
  • What symbol represents the random variable for this test?
  • In words, define the random variable for this test.
  • \(x =\) ________________
  • \(n =\) ________________
  • \(p′ =\) _____________
  • Calculate \(\sigma_{x} =\) __________. Show the formula set-up.
  • State the distribution to use for the hypothesis test.
  • Find the \(p\text{-value}\).
  • Reason for the decision:
  • Conclusion (write out in a complete sentence):

9.6: Additional Information and Full Hypothesis Test Examples

For each of the word problems, use a solution sheet to do the hypothesis test. The solution sheet is found in [link]. Please feel free to make copies of the solution sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files.

If you are using a Student's \(t\) - distribution for one of the following homework problems, you may assume that the underlying population is normally distributed. (In general, you must first prove that assumption, however.)

A particular brand of tires claims that its deluxe tire averages at least 50,000 miles before it needs to be replaced. From past studies of this tire, the standard deviation is known to be 8,000. A survey of owners of that tire design is conducted. From the 28 tires surveyed, the mean lifespan was 46,500 miles with a standard deviation of 9,800 miles. Using \(\alpha = 0.05\), is the data highly inconsistent with the claim?

  • \(H_{0}: \mu \geq 50,000\)
  • \(H_{a}: \mu < 50,000\)
  • Let \(\bar{X} =\) the average lifespan of a brand of tires.
  • normal distribution
  • \(z = -2.315\)
  • \(p\text{-value} = 0.0103\)
  • Check student’s solution.
  • alpha: 0.05
  • Decision: Reject the null hypothesis.
  • Reason for decision: The \(p\text{-value}\) is less than 0.05.
  • Conclusion: There is sufficient evidence to conclude that the mean lifespan of the tires is less than 50,000 miles.
  • \((43,537, 49,463)\)
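
Since \(\sigma\) is known in this problem, the answer uses a z test. As a quick check, here is a hedged Python sketch (scipy assumed) that reproduces the quoted test statistic and p-value from the summary numbers in the problem.

```python
# A quick check of the tire example (Python + scipy assumed): one-sample z test
# with known sigma, H0: mu >= 50,000 vs Ha: mu < 50,000.
from math import sqrt
from scipy.stats import norm

n, xbar = 28, 46_500          # sample size and sample mean (miles)
mu0, sigma = 50_000, 8_000    # hypothesized mean and known standard deviation

z = (xbar - mu0) / (sigma / sqrt(n))   # test statistic
p_value = norm.cdf(z)                  # left-tailed p-value

print(f"z = {z:.3f}, p-value = {p_value:.4f}")   # z = -2.315, p-value = 0.0103
```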

From generation to generation, the mean age when smokers first start to smoke varies. However, the standard deviation of that age remains constant at around 2.1 years. A survey of 40 smokers of this generation was done to see if the mean starting age is at least 19. The sample mean was 18.1 with a sample standard deviation of 1.3. Do the data support the claim at the 5% level?

The cost of a daily newspaper varies from city to city. However, the variation among prices remains steady with a standard deviation of 20¢. A study was done to test the claim that the mean cost of a daily newspaper is $1.00. Twelve costs yield a mean cost of 95¢ with a standard deviation of 18¢. Do the data support the claim at the 1% level?

  • \(H_{0}: \mu = $1.00\)
  • \(H_{a}: \mu \neq $1.00\)
  • Let \(\bar{X} =\) the average cost of a daily newspaper.
  • \(z = –0.866\)
  • \(p\text{-value} = 0.3865\)
  • \(\alpha: 0.01\)
  • Decision: Do not reject the null hypothesis.
  • Reason for decision: The \(p\text{-value}\) is greater than 0.01.
  • Conclusion: There is sufficient evidence to support the claim that the mean cost of daily papers is $1. The mean cost could be $1.
  • \(($0.84, $1.06)\)

An article in the San Jose Mercury News stated that students in the California state university system take 4.5 years, on average, to finish their undergraduate degrees. Suppose you believe that the mean time is longer. You conduct a survey of 49 students and obtain a sample mean of 5.1 with a sample standard deviation of 1.2. Do the data support your claim at the 1% level?

The mean number of sick days an employee takes per year is believed to be about ten. Members of a personnel department do not believe this figure. They randomly survey eight employees. The number of sick days they took for the past year are as follows: 12; 4; 15; 3; 11; 8; 6; 8. Let \(x =\) the number of sick days they took for the past year. Should the personnel team believe that the mean number is ten?

  • \(H_{0}: \mu = 10\)
  • \(H_{a}: \mu \neq 10\)
  • Let \(\bar{X} =\) the mean number of sick days an employee takes per year.
  • Student’s t -distribution
  • \(t = –1.12\)
  • \(p\text{-value} = 0.300\)
  • \(\alpha: 0.05\)
  • Reason for decision: The \(p\text{-value}\) is greater than 0.05.
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the mean number of sick days is not ten.
  • \((4.9443, 11.806)\)
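
Because the data here are raw values and \(\sigma\) is unknown, the answer uses a Student's t test. A short Python sketch (scipy assumed) reproduces the quoted statistic and two-sided p-value from the listed counts.

```python
# One-sample t test on the raw sick-day counts (Python + scipy assumed),
# H0: mu = 10 vs Ha: mu != 10 (two-sided).
import numpy as np
from scipy.stats import ttest_1samp

days = np.array([12, 4, 15, 3, 11, 8, 6, 8])

res = ttest_1samp(days, popmean=10)   # two-sided by default

print(f"t = {res.statistic:.2f}, p-value = {res.pvalue:.3f}")   # t = -1.12, p = 0.300
```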

In 1955, Life Magazine reported that the 25-year-old mother of three worked, on average, an 80-hour week. Recently, many groups have been studying whether or not the women's movement has, in fact, resulted in an increase in the average work week for women (combining employment and at-home work). Suppose a study was done to determine if the mean work week has increased. 81 women were surveyed with the following results. The sample mean was 83; the sample standard deviation was ten. Does it appear that the mean work week has increased for women at the 5% level?

Your statistics instructor claims that 60 percent of the students who take her Elementary Statistics class go through life feeling more enriched. For some reason that she can't quite figure out, most people don't believe her. You decide to check this out on your own. You randomly survey 64 of her past Elementary Statistics students and find that 34 feel more enriched as a result of her class. Now, what do you think?

  • \(H_{0}: p \geq 0.6\)
  • \(H_{a}: p < 0.6\)
  • Let \(P′ =\) the proportion of students who feel more enriched as a result of taking Elementary Statistics.
  • normal for a single proportion
  • \(p\text{-value} = 0.1308\)
  • Conclusion: There is insufficient evidence to conclude that less than 60 percent of her students feel more enriched.

The “plus-4s” confidence interval is \((0.411, 0.648)\)

A Nissan Motor Corporation advertisement read, “The average man’s I.Q. is 107. The average brown trout’s I.Q. is 4. So why can’t man catch brown trout?” Suppose you believe that the brown trout’s mean I.Q. is greater than four. You catch 12 brown trout. A fish psychologist determines the I.Q.s as follows: 5; 4; 7; 3; 6; 4; 5; 3; 6; 3; 8; 5. Conduct a hypothesis test of your belief.

Refer to Exercise 9.119. Conduct a hypothesis test to see if your decision and conclusion would change if your belief were that the brown trout’s mean I.Q. is not four.

  • \(H_{0}: \mu = 4\)
  • \(H_{a}: \mu \neq 4\)
  • Let \(\bar{X} =\) the average I.Q. of a set of brown trout.
  • two-tailed Student's t-test
  • \(t = 1.95\)
  • \(p\text{-value} = 0.076\)
  • Reason for decision: The \(p\text{-value}\) is greater than 0.05
  • Conclusion: There is insufficient evidence to conclude that the average IQ of brown trout is not four.
  • \((3.8865,5.9468)\)

According to an article in Newsweek, the natural ratio of girls to boys is 100:105. In China, the birth ratio is 100:114 (46.7% girls). Suppose you don’t believe the reported figures of the percent of girls born in China. You conduct a study. In this study, you count the number of girls and boys born in 150 randomly chosen recent births. There are 60 girls and 90 boys born of the 150. Based on your study, do you believe that the percent of girls born in China is 46.7?

A poll done for Newsweek found that 13% of Americans have seen or sensed the presence of an angel. A contingent doubts that the percent is really that high. It conducts its own survey. Out of 76 Americans surveyed, only two had seen or sensed the presence of an angel. As a result of the contingent’s survey, would you agree with the Newsweek poll? In complete sentences, also give three reasons why the two polls might give different results.

  • \(H_{0}: p \geq 0.13\)
  • \(H_{a}: p < 0.13\)
  • Let \(P′ =\) the proportion of Americans who have seen or sensed angels
  • –2.688
  • \(p\text{-value} = 0.0036\)
  • Reason for decision: The \(p\text{-value}\) is less than 0.05.
  • Conclusion: There is sufficient evidence to conclude that the percentage of Americans who have seen or sensed an angel is less than 13%.

The “plus-4s” confidence interval is \((0.0022, 0.0978)\).

The mean work week for engineers in a start-up company is believed to be about 60 hours. A newly hired engineer hopes that it’s shorter. She asks ten engineering friends in start-ups for the lengths of their mean work weeks. Based on the results that follow, should she count on the mean work week to be shorter than 60 hours?

Data (length of mean work week): 70; 45; 55; 60; 65; 55; 55; 60; 50; 55.

Use the “Lap time” data for Lap 4 (see [link]) to test the claim that Terri finishes Lap 4, on average, in less than 129 seconds. Use all twenty races given.

  • \(H_{0}: \mu \geq 129\)
  • \(H_{a}: \mu < 129\)
  • Let \(\bar{X} =\) the average time in seconds that Terri finishes Lap 4.
  • Student's t -distribution
  • \(t = 1.209\)
  • Conclusion: There is insufficient evidence to conclude that Terri’s mean lap time is less than 129 seconds.
  • \((128.63, 130.37)\)

Use the “Initial Public Offering” data (see [link]) to test the claim that the mean offer price was $18 per share. Do not use all the data. Use your random number generator to randomly survey 15 prices.

The following questions were written by past students. They are excellent problems!

"Asian Family Reunion," by Chau Nguyen

Every two years it comes around.

We all get together from different towns.

In my honest opinion,

It's not a typical family reunion.

Not forty, or fifty, or sixty,

But how about seventy companions!

The kids would play, scream, and shout

One minute they're happy, another they'll pout.

The teenagers would look, stare, and compare

From how they look to what they wear.

The men would chat about their business

That they make more, but never less.

Money is always their subject

And there's always talk of more new projects.

The women get tired from all of the chats

They head to the kitchen to set out the mats.

Some would sit and some would stand

Eating and talking with plates in their hands.

Then come the games and the songs

And suddenly, everyone gets along!

With all that laughter, it's sad to say

That it always ends in the same old way.

They hug and kiss and say "good-bye"

And then they all begin to cry!

I say that 60 percent shed their tears

But my mom counted 35 people this year.

She said that boys and men will always have their pride,

So we won't ever see them cry.

I myself don't think she's correct,

So could you please try this problem to see if you object?

  • \(H_{0}: p = 0.60\)
  • \(H_{a}: p < 0.60\)
  • Let \(P′ =\) the proportion of family members who shed tears at a reunion.
  • –1.71
  • Reason for decision: \(p\text{-value} < \alpha\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the proportion of family members who shed tears at a reunion is less than 0.60. However, the test is weak because the \(p\text{-value}\) and alpha are quite close, so other tests should be done.
  • We are 95% confident that between 38.29% and 61.71% of family members will shed tears at a family reunion. \((0.3829, 0.6171)\). The“plus-4s” confidence interval (see chapter 8) is \((0.3861, 0.6139)\)

Note that here the “large-sample” \(\text{1-PropZTest}\) provides the approximate \(p\text{-value}\) of 0.0438. Whenever a \(p\text{-value}\) based on a normal approximation is close to the level of significance, the exact \(p\text{-value}\) based on binomial probabilities should be calculated whenever possible. This is beyond the scope of this course.
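
For readers who want to see the comparison anyway, the sketch below (Python with scipy, an assumption of this note rather than part of the course) computes both the normal-approximation p-value and the exact binomial p-value. The counts n = 70 and x = 35 are inferred from the poem (seventy companions, thirty-five observed criers) and are consistent with the quoted z of about -1.71.

```python
# Normal-approximation vs exact binomial p-value for the reunion problem
# (Python + scipy assumed). The counts n = 70 and x = 35 are read off the poem
# and reproduce the quoted z of about -1.71.
from math import sqrt
from scipy.stats import norm, binomtest

x, n, p0 = 35, 70, 0.60

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
approx_p = norm.cdf(z)                                    # large-sample p-value, about 0.044

exact_p = binomtest(x, n, p0, alternative='less').pvalue  # exact binomial p-value

print(f"z = {z:.2f}")
print(f"approximate p-value = {approx_p:.4f}")
print(f"exact p-value       = {exact_p:.4f}")
```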

"The Problem with Angels," by Cyndy Dowling

Although this problem is wholly mine,

The catalyst came from the magazine, Time.

On the magazine cover I did find

The realm of angels tickling my mind.

Inside, 69% I found to be

In angels, Americans do believe.

Then, it was time to rise to the task,

Ninety-five high school and college students I did ask.

Viewing all as one group,

Random sampling to get the scoop.

So, I asked each to be true,

"Do you believe in angels?" Tell me, do!

Hypothesizing at the start,

Totally believing in my heart

That the proportion who said yes

Would be equal on this test.

Lo and behold, seventy-three did arrive,

Out of the sample of ninety-five.

Now your job has just begun,

Solve this problem and have some fun.

"Blowing Bubbles," by Sondra Prull

Studying stats just made me tense,

I had to find some sane defense.

Some light and lifting simple play

To float my math anxiety away.

Blowing bubbles lifts me high

Takes my troubles to the sky.

POIK! They're gone, with all my stress

Bubble therapy is the best.

The label said each time I blew

The average number of bubbles would be at least 22.

I blew and blew and this I found

From 64 blows, they all are round!

But the number of bubbles in 64 blows

Varied widely, this I know.

20 per blow became the mean

They deviated by 6, and not 16.

From counting bubbles, I sure did relax

But now I give to you your task.

Was 22 a reasonable guess?

Find the answer and pass this test!

  • \(H_{0}: \mu \geq 22\)
  • \(H_{a}: \mu < 22\)
  • Let \(\bar{X} =\) the mean number of bubbles per blow.
  • –2.667
  • \(p\text{-value} = 0.00486\)
  • Conclusion: There is sufficient evidence to conclude that the mean number of bubbles per blow is less than 22.
  • \((18.501, 21.499)\)

"Dalmatian Darnation," by Kathy Sparling

A greedy dog breeder named Spreckles

Bred puppies with numerous freckles

The Dalmatians he sought

Possessed spot upon spot

The more spots, he thought, the more shekels.

His competitors did not agree

That freckles would increase the fee.

They said, “Spots are quite nice

But they don't affect price;

One should breed for improved pedigree.”

The breeders decided to prove

This strategy was a wrong move.

Breeding only for spots

Would wreak havoc, they thought.

His theory they want to disprove.

They proposed a contest to Spreckles

Comparing dog prices to freckles.

In records they looked up

One hundred one pups:

Dalmatians that fetched the most shekels.

They asked Mr. Spreckles to name

An average spot count he'd claim

To bring in big bucks.

Said Spreckles, “Well, shucks,

It's for one hundred one that I aim.”

Said an amateur statistician

Who wanted to help with this mission.

“Twenty-one for the sample

Standard deviation's ample:

They examined one hundred and one

Dalmatians that fetched a good sum.

They counted each spot,

Mark, freckle and dot

And tallied up every one.

Instead of one hundred one spots

They averaged ninety six dots

Can they muzzle Spreckles’

Obsession with freckles

Based on all the dog data they've got?

"Macaroni and Cheese, please!!" by Nedda Misherghi and Rachelle Hall

As a poor starving student I don't have much money to spend for even the bare necessities. So my favorite and main staple food is macaroni and cheese. It's high in taste and low in cost and nutritional value.

One day, as I sat down to determine the meaning of life, I got a serious craving for this, oh, so important, food of my life. So I went down the street to Greatway to get a box of macaroni and cheese, but it was SO expensive! $2.02 !!! Can you believe it? It made me stop and think. The world is changing fast. I had thought that the mean cost of a box (the normal size, not some super-gigantic-family-value-pack) was at most $1, but now I wasn't so sure. However, I was determined to find out. I went to 53 of the closest grocery stores and surveyed the prices of macaroni and cheese. Here are the data I wrote in my notebook:

Price per box of Mac and Cheese:

  • 5 stores @ $2.02
  • 15 stores @ $0.25
  • 3 stores @ $1.29
  • 6 stores @ $0.35
  • 4 stores @ $2.27
  • 7 stores @ $1.50
  • 5 stores @ $1.89
  • 8 stores @ $0.75

I could see that the cost varied but I had to sit down to figure out whether or not I was right. If it does turn out that this mouth-watering dish is at most $1, then I'll throw a big cheesy party in our next statistics lab, with enough macaroni and cheese for just me. (After all, as a poor starving student I can't be expected to feed our class of animals!)

  • \(H_{0}: \mu \leq 1\)
  • \(H_{a}: \mu > 1\)
  • Let \(\bar{X} =\) the mean cost in dollars of macaroni and cheese in a certain town.
  • Student's \(t\)-distribution
  • \(t = 0.340\)
  • \(p\text{-value} = 0.36756\)
  • Conclusion: The mean cost could be $1, or less. At the 5% significance level, there is insufficient evidence to conclude that the mean price of a box of macaroni and cheese is more than $1.
  • \((0.8291, 1.241)\)

"William Shakespeare: The Tragedy of Hamlet, Prince of Denmark," by Jacqueline Ghodsi

THE CHARACTERS (in order of appearance):

  • HAMLET, Prince of Denmark and student of Statistics
  • POLONIUS, Hamlet’s tutor
  • HORATIO, friend to Hamlet and fellow student

Scene: The great library of the castle, in which Hamlet does his lessons

(The day is fair, but the face of Hamlet is clouded. He paces the large room. His tutor, Polonius, is reprimanding Hamlet regarding the latter’s recent experience. Horatio is seated at the large table at right stage.)

POLONIUS: My Lord, how cans’t thou admit that thou hast seen a ghost! It is but a figment of your imagination!

HAMLET: I beg to differ; I know of a certainty that five-and-seventy in one hundred of us, condemned to the whips and scorns of time as we are, have gazed upon a spirit of health, or goblin damn’d, be their intents wicked or charitable.

POLONIUS: If thou doest insist upon thy wretched vision then let me invest your time; be true to thy work and speak to me through the reason of the null and alternate hypotheses. (He turns to Horatio.) Did not Hamlet himself say, “What piece of work is man, how noble in reason, how infinite in faculties?” Then let not this foolishness persist. Go, Horatio, make a survey of three-and-sixty and discover what the true proportion be. For my part, I will never succumb to this fantasy, but deem man to be devoid of all reason should thy proposal of at least five-and-seventy in one hundred hold true.

HORATIO (to Hamlet): What should we do, my Lord?

HAMLET: Go to thy purpose, Horatio.

HORATIO: To what end, my Lord?

HAMLET: That you must teach me. But let me conjure you by the rights of our fellowship, by the consonance of our youth, but the obligation of our ever-preserved love, be even and direct with me, whether I am right or no.

(Horatio exits, followed by Polonius, leaving Hamlet to ponder alone.)

(The next day, Hamlet awaits anxiously the presence of his friend, Horatio. Polonius enters and places some books upon the table just a moment before Horatio enters.)

POLONIUS: So, Horatio, what is it thou didst reveal through thy deliberations?

HORATIO: In a random survey, for which purpose thou thyself sent me forth, I did discover that one-and-forty believe fervently that the spirits of the dead walk with us. Before my God, I might not this believe, without the sensible and true avouch of mine own eyes.

POLONIUS: Give thine own thoughts no tongue, Horatio. (Polonius turns to Hamlet.) But look to’t I charge you, my Lord. Come Horatio, let us go together, for this is not our test. (Horatio and Polonius leave together.)

HAMLET: To reject, or not reject, that is the question: whether ‘tis nobler in the mind to suffer the slings and arrows of outrageous statistics, or to take arms against a sea of data, and, by opposing, end them. (Hamlet resignedly attends to his task.)

(Curtain falls)

"Untitled," by Stephen Chen

I've often wondered how software is released and sold to the public. Ironically, I work for a company that sells products with known problems. Unfortunately, most of the problems are difficult to create, which makes them difficult to fix. I usually use the test program X, which tests the product, to try to create a specific problem. When the test program is run to make an error occur, the likelihood of generating an error is 1%.

So, armed with this knowledge, I wrote a new test program Y that will generate the same error that test program X creates, but more often. To find out if my test program is better than the original, so that I can convince the management that I'm right, I ran my test program to find out how often I can generate the same error. When I ran my test program 50 times, I generated the error twice. While this may not seem much better, I think that I can convince the management to use my test program instead of the original test program. Am I right?

  • \(H_{0}: p = 0.01\)
  • \(H_{a}: p > 0.01\)
  • Let \(P′ =\) the proportion of errors generated
  • Normal for a single proportion
  • Decision: Reject the null hypothesis
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the proportion of errors generated is more than 0.01.

The“plus-4s” confidence interval is \((0.004, 0.144)\).

"Japanese Girls’ Names"

by Kumi Furuichi

It used to be very typical for Japanese girls’ names to end with “ko.” (The trend might have started around my grandmothers’ generation and its peak might have been around my mother’s generation.) “Ko” means “child” in Chinese characters. Parents would name their daughters with “ko” attaching to other Chinese characters which have meanings that they want their daughters to become, such as Sachiko—happy child, Yoshiko—a good child, Yasuko—a healthy child, and so on.

However, I noticed recently that only two out of nine of my Japanese girlfriends at this school have names which end with “ko.” More and more, parents seem to have become creative, modernized, and, sometimes, westernized in naming their children.

I have a feeling that, while 70 percent or more of my mother’s generation would have names with “ko” at the end, the proportion has dropped among my peers. I wrote down all my Japanese friends’, ex-classmates’, co-workers’, and acquaintances’ names that I could remember. Following are the names. (Some are repeats.) Test to see if the proportion has dropped for this generation.

Ai, Akemi, Akiko, Ayumi, Chiaki, Chie, Eiko, Eri, Eriko, Fumiko, Harumi, Hitomi, Hiroko, Hiroko, Hidemi, Hisako, Hinako, Izumi, Izumi, Junko, Junko, Kana, Kanako, Kanayo, Kayo, Kayoko, Kazumi, Keiko, Keiko, Kei, Kumi, Kumiko, Kyoko, Kyoko, Madoka, Maho, Mai, Maiko, Maki, Miki, Miki, Mikiko, Mina, Minako, Miyako, Momoko, Nana, Naoko, Naoko, Naoko, Noriko, Rieko, Rika, Rika, Rumiko, Rei, Reiko, Reiko, Sachiko, Sachiko, Sachiyo, Saki, Sayaka, Sayoko, Sayuri, Seiko, Shiho, Shizuka, Sumiko, Takako, Takako, Tomoe, Tomoe, Tomoko, Touko, Yasuko, Yasuko, Yasuyo, Yoko, Yoko, Yoko, Yoshiko, Yoshiko, Yoshiko, Yuka, Yuki, Yuki, Yukiko, Yuko, Yuko.

"Phillip’s Wish," by Suzanne Osorio

My nephew likes to play

Chasing the girls makes his day.

He asked his mother

If it is okay

To get his ear pierced.

She said, “No way!”

To poke a hole through your ear,

Is not what I want for you, dear.

He argued his point quite well,

Says even my macho pal, Mel,

Has gotten this done.

It’s all just for fun.

C’mon please, mom, please, what the hell.

Again Phillip complained to his mother,

Saying half his friends (including their brothers)

Are piercing their ears

And they have no fears

He wants to be like the others.

She said, “I think it’s much less.

We must do a hypothesis test.

And if you are right,

I won’t put up a fight.

But, if not, then my case will rest.”

We proceeded to call fifty guys

To see whose prediction would fly.

Nineteen of the fifty

Said piercing was nifty

And earrings they’d occasionally buy.

Then there’s the other thirty-one,

Who said they’d never have this done.

So now this poem’s finished.

Will his hopes be diminished,

Or will my nephew have his fun?

  • \(H_{0}: p = 0.50\)
  • \(H_{a}: p < 0.50\)
  • Let \(P′ =\) the proportion of friends that has a pierced ear.
  • –1.70
  • \(p\text{-value} = 0.0448\)
  • Reason for decision: The \(p\text{-value}\) is less than 0.05. (However, they are very close.)
  • Conclusion: There is sufficient evidence to support the claim that less than 50% of his friends have pierced ears.
  • Confidence Interval: \((0.245, 0.515)\): The “plus-4s” confidence interval is \((0.259, 0.519)\).

"The Craven," by Mark Salangsang

Once upon a morning dreary

In stats class I was weak and weary.

Pondering over last night’s homework

Whose answers were now on the board

This I did and nothing more.

While I nodded nearly napping

Suddenly, there came a tapping.

As someone gently rapping,

Rapping my head as I snore.

Quoth the teacher, “Sleep no more.”

“In every class you fall asleep,”

The teacher said, his voice was deep.

“So a tally I’ve begun to keep

Of every class you nap and snore.

The percentage being forty-four.”

“My dear teacher I must confess,

While sleeping is what I do best.

The percentage, I think, must be less,

A percentage less than forty-four.”

This I said and nothing more.

“We’ll see,” he said and walked away,

And fifty classes from that day

He counted till the month of May

The classes in which I napped and snored.

The number he found was twenty-four.

At a significance level of 0.05,

Please tell me am I still alive?

Or did my grade just take a dive

Plunging down beneath the floor?

Upon thee I hereby implore.

Toastmasters International cites a report by Gallop Poll that 40% of Americans fear public speaking. A student believes that less than 40% of students at her school fear public speaking. She randomly surveys 361 schoolmates and finds that 135 report they fear public speaking. Conduct a hypothesis test to determine if the percent at her school is less than 40%.

  • \(H_{0}: p = 0.40\)
  • \(H_{a}: p < 0.40\)
  • Let \(P′ =\) the proportion of schoolmates who fear public speaking.
  • –1.01
  • \(p\text{-value} = 0.1563\)
  • Conclusion: There is insufficient evidence to support the claim that less than 40% of students at the school fear public speaking.
  • Confidence Interval: \((0.3241, 0.4240)\): The “plus-4s” confidence interval is \((0.3257, 0.4250)\).

Sixty-eight percent of online courses taught at community colleges nationwide were taught by full-time faculty. To test if 68% also represents California’s percent for full-time faculty teaching the online classes, Long Beach City College (LBCC) in California, was randomly selected for comparison. In the same year, 34 of the 44 online courses LBCC offered were taught by full-time faculty. Conduct a hypothesis test to determine if 68% represents California. NOTE: For more accurate results, use more California community colleges and this past year's data.

According to an article in Bloomberg Businessweek , New York City's most recent adult smoking rate is 14%. Suppose that a survey is conducted to determine this year’s rate. Nine out of 70 randomly chosen N.Y. City residents reply that they smoke. Conduct a hypothesis test to determine if the rate is still 14% or if it has decreased.

  • \(H_{0}: p = 0.14\)
  • \(H_{a}: p < 0.14\)
  • Let \(P′ =\) the proportion of NYC residents that smoke.
  • –0.2756
  • \(p\text{-value} = 0.3914\)
  • At the 5% significance level, there is insufficient evidence to conclude that the proportion of NYC residents who smoke is less than 0.14.
  • Confidence Interval: \((0.0502, 0.2070)\): The “plus-4s” confidence interval (see chapter 8) is \((0.0676, 0.2297)\).

The mean age of De Anza College students in a previous term was 26.6 years old. An instructor thinks the mean age for online students is older than 26.6. She randomly surveys 56 online students and finds that the sample mean is 29.4 with a standard deviation of 2.1. Conduct a hypothesis test.

Registered nurses earned an average annual salary of $69,110. For that same year, a survey was conducted of 41 California registered nurses to determine if the annual salary is higher than $69,110 for California nurses. The sample average was $71,121 with a sample standard deviation of $7,489. Conduct a hypothesis test.

  • \(H_{0}: \mu = 69,110\)
  • \(H_{a}: \mu > 69,110\)
  • Let \(\bar{X} =\) the mean salary in dollars for California registered nurses.
  • \(t = 1.719\)
  • \(p\text{-value}: 0.0466\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the mean salary of California registered nurses exceeds $69,110.
  • \(($68,757, $73,485)\)

La Leche League International reports that the mean age of weaning a child from breastfeeding is age four to five worldwide. In America, most nursing mothers wean their children much earlier. Suppose a random survey is conducted of 21 U.S. mothers who recently weaned their children. The mean weaning age was nine months (3/4 year) with a standard deviation of 4 months. Conduct a hypothesis test to determine if the mean weaning age in the U.S. is less than four years old.

Over the past few decades, public health officials have examined the link between weight concerns and teen girls' smoking. Researchers surveyed a group of 273 randomly selected teen girls living in Massachusetts (between 12 and 15 years old). After four years the girls were surveyed again. Sixty-three said they smoked to stay thin. Is there good evidence that more than thirty percent of the teen girls smoke to stay thin?

After conducting the test, your decision and conclusion are

  • Reject \(H_{0}\): There is sufficient evidence to conclude that more than 30% of teen girls smoke to stay thin.
  • Do not reject \(H_{0}\): There is not sufficient evidence to conclude that less than 30% of teen girls smoke to stay thin.
  • Do not reject \(H_{0}\): There is not sufficient evidence to conclude that more than 30% of teen girls smoke to stay thin.
  • Reject \(H_{0}\): There is sufficient evidence to conclude that less than 30% of teen girls smoke to stay thin.

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening night midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 of them attended the midnight showing.

At a 1% level of significance, an appropriate conclusion is:

  • There is insufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is less than 20%.
  • There is sufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is more than 20%.
  • There is sufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is less than 20%.
  • There is insufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is at least 20%.

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test.

At a significance level of \(\alpha = 0.05\), what is the correct conclusion?

  • There is enough evidence to conclude that the mean number of hours is more than 4.75
  • There is enough evidence to conclude that the mean number of hours is more than 4.5
  • There is not enough evidence to conclude that the mean number of hours is more than 4.5
  • There is not enough evidence to conclude that the mean number of hours is more than 4.75

Instructions: For the following ten exercises, answer each question.

State the null and alternate hypothesis.

State the \(p\text{-value}\).

State \(\alpha\).

What is your decision?

Write a conclusion.

Answer any other questions asked in the problem.

According to the Center for Disease Control website, in 2011 at least 18% of high school students have smoked a cigarette. An Introduction to Statistics class in Davies County, KY conducted a hypothesis test at the local high school (a medium-sized school of approximately 1,200 students with a small-city demographic) to determine if the local high school’s percentage was lower. One hundred fifty students were chosen at random and surveyed. Of the 150 students surveyed, 82 have smoked. Using a significance level of 0.05 and appropriate statistical evidence, conduct a hypothesis test and state the conclusions.

A recent survey in the N.Y. Times Almanac indicated that 48.8% of families own stock. A broker wanted to determine if this survey could be valid. He surveyed a random sample of 250 families and found that 142 owned some type of stock. At the 0.05 significance level, can the survey be considered to be accurate?

  • \(H_{0}: p = 0.488\) \(H_{a}: p \neq 0.488\)
  • \(p\text{-value} = 0.0114\)
  • \(\alpha = 0.05\)
  • Reject the null hypothesis.
  • At the 5% level of significance, there is enough evidence to conclude that the proportion of families that own stock is different from 48.8%.
  • The survey does not appear to be accurate.

Driver error can be listed as the cause of approximately 54% of all fatal auto accidents, according to the American Automobile Association. Thirty randomly selected fatal accidents are examined, and it is determined that 14 were caused by driver error. Using \(\alpha = 0.05\), is the AAA proportion accurate?

The US Department of Energy reported that 51.7% of homes were heated by natural gas. A random sample of 221 homes in Kentucky found that 115 were heated by natural gas. Does the evidence support the claim for Kentucky at the \(\alpha = 0.05\) level? Are the results applicable across the country? Why?

  • \(H_{0}: p = 0.517\) \(H_{a}: p \neq 0.517\)
  • \(p\text{-value} = 0.9203\).
  • \(\alpha = 0.05\).
  • Do not reject the null hypothesis.
  • At the 5% significance level, there is not enough evidence to conclude that the proportion of homes in Kentucky that are heated by natural gas is different from 0.517.
  • However, we cannot generalize this result to the entire nation. First, the sample’s population is only the state of Kentucky. Second, it is reasonable to assume that homes in the extreme north and south will have extreme high usage and low usage, respectively. We would need to expand our sample base to include these possibilities if we wanted to generalize this claim to the entire nation.

For Americans using library services, the American Library Association claims that at most 67% of patrons borrow books. The library director in Owensboro, Kentucky feels this is not true, so she asked a local college statistics class to conduct a survey. The class randomly selected 100 patrons and found that 82 borrowed books. Did the class demonstrate that the percentage was higher in Owensboro, KY? Use an \(\alpha = 0.01\) level of significance. What is the possible proportion of patrons that do borrow books from the Owensboro Library?

The Weather Underground reported that the mean amount of summer rainfall for the northeastern US is at least 11.52 inches. Ten cities in the northeast are randomly selected and the mean rainfall amount is calculated to be 7.42 inches with a standard deviation of 1.3 inches. At the \(\alpha = 0.05\) level, can it be concluded that the mean rainfall was below the reported average? What if \(\alpha = 0.01\)? Assume the amount of summer rainfall follows a normal distribution.

  • \(H_{0}: \mu \geq 11.52\) \(H_{a}: \mu < 11.52\)
  • \(p\text{-value} = 0.000002\) which is almost 0.
  • At the 5% significance level, there is enough evidence to conclude that the mean amount of summer rain in the northeastern US is less than 11.52 inches, on average.
  • We would make the same conclusion if alpha was 1% because the \(p\text{-value}\) is almost 0.

A survey in the N.Y. Times Almanac finds the mean commute time (one way) is 25.4 minutes for the 15 largest US cities. The Austin, TX chamber of commerce feels that Austin’s commute time is less and wants to publicize this fact. The mean for 25 randomly selected commuters is 22.1 minutes with a standard deviation of 5.3 minutes. At the \(\alpha = 0.10\) level, is the Austin, TX commute significantly less than the mean commute time for the 15 largest US cities?

A report by the Gallup Poll found that a woman visits her doctor, on average, at most 5.8 times each year. A random sample of 20 women results in these yearly visit totals:

3; 2; 1; 3; 7; 2; 9; 4; 6; 6; 8; 0; 5; 6; 4; 2; 1; 3; 4; 1

At the \(\alpha = 0.05\) level can it be concluded that the sample mean is higher than 5.8 visits per year?

  • \(H_{0}: \mu \leq 5.8\) \(H_{a}: \mu > 5.8\)
  • \(p\text{-value} = 0.9987\)
  • At the 5% level of significance, there is not enough evidence to conclude that a woman visits her doctor, on average, more than 5.8 times a year.

According to the N.Y. Times Almanac the mean family size in the U.S. is 3.18. A sample of a college math class resulted in the following family sizes:

5; 4; 5; 4; 4; 3; 6; 4; 3; 3; 5; 5; 6; 3; 3; 2; 7; 4; 5; 2; 2; 2; 3; 2

At \(\alpha = 0.05\) level, is the class’ mean family size greater than the national average? Does the Almanac result remain valid? Why?

The student academic group on a college campus claims that freshman students study at least 2.5 hours per day, on average. One Introduction to Statistics class was skeptical. The class took a random sample of 30 freshman students and found a mean study time of 137 minutes with a standard deviation of 45 minutes. At the \(\alpha = 0.01\) level, is the student academic group’s claim correct?

  • \(H_{0}: \mu \geq 150\) \(H_{a}: \mu < 150\)
  • \(p\text{-value} = 0.0622\)
  • \(\alpha = 0.01\)
  • At the 1% significance level, there is not enough evidence to conclude that freshmen students study less than 2.5 hours per day, on average.
  • The student academic group’s claim appears to be correct.

9.7: Hypothesis Testing of a Single Mean and Single Proportion


4. Tests in the Two-Sample Normal Model

In this section, we will study hypothesis tests in the two-sample normal model and in the bivariate normal model. This section parallels the section on Estimation in the Two Sample Normal Model in the chapter on Interval Estimation .

The Two-Sample Normal Model

Suppose that \(\bs{X} = (X_1, X_2, \ldots, X_m)\) is a random sample of size \(m\) from the normal distribution with mean \(\mu\) and standard deviation \(\sigma\), and that \(\bs{Y} = (Y_1, Y_2, \ldots, Y_n)\) is a random sample of size \(n\) from the normal distribution with mean \(\nu\) and standard deviation \(\tau\). Moreover, suppose that the samples \(\bs{X}\) and \(\bs{Y}\) are independent.

This type of situation arises frequently when the random variables represent a measurement of interest for the objects of the population, and the samples correspond to two different treatments. For example, we might be interested in the blood pressure of a certain population of patients. The \(\bs{X}\) vector records the blood pressures of a control sample, while the \(\bs{Y}\) vector records the blood pressures of the sample receiving a new drug. Similarly, we might be interested in the yield of an acre of corn. The \(\bs{X}\) vector records the yields of a sample receiving one type of fertilizer, while the \(\bs{Y}\) vector records the yields of a sample receiving a different type of fertilizer.

Usually our interest is in a comparison of the parameters (either the mean or variance) for the two sampling distributions. In this section we will construct tests for the difference of the means and the ratio of the variances. As with previous estimation problems we have studied, the procedures vary depending on what parameters are known or unknown. Also as before, key elements in the construction of the tests are the sample means and sample variances and the special properties of these statistics when the sampling distribution is normal.

We will use the following notation for the sample mean and sample variance of a generic sample \(\bs{U} = (U_1, U_2, \ldots, U_k)\): \[ M(\bs{U}) = \frac{1}{k} \sum_{i=1}^k U_i, \quad S^2(\bs{U}) = \frac{1}{k - 1} \sum_{i=1}^k [U_i - M(\bs{U})]^2 \]

Tests of the Difference in the Means with Known Standard Deviations

Our first discussion concerns tests for the difference in the means \(\nu - \mu\) under the assumption that the standard deviations \(\sigma\) and \(\tau\) are known. This is often, but not always, an unrealistic assumption. In some statistical problems, the variances are stable, and are at least approximately known, while the means may be different because of different treatments. Also this is a good place to start because the analysis is fairly easy.

For a conjectured difference of the means \( \delta \in \R \), define the test statistic \[ Z = \frac{[M(\bs{Y}) - M(\bs{X})] - \delta}{\sqrt{\sigma^2 / m + \tau^2 / n}} \]

  • If \( \nu - \mu = \delta \) then \( Z \) has the standard normal distribution.
  • If \( \nu - \mu \ne \delta \) then \(Z\) has the normal distribution with mean \([(\nu - \mu) - \delta] \big/ {\sqrt{\sigma^2 / m + \tau^2 / n}}\) and variance 1.

From properties of normal samples, \( M(\bs{X}) \) has a normal distribution with mean \( \mu \) and variance \( \sigma^2 / m \) and similarly \( M(\bs{Y}) \) has a normal distribution with mean \( \nu \) and variance \( \tau^2 / n \). Since the samples are independent, \( M(\bs{X}) \) and \( M(\bs{Y}) \) are independent, so \( M(\bs{Y}) - M(\bs{X}) \) has a normal distribution with mean \( \nu - \mu \) and variance \( \sigma^2 / m + \tau^2 / n \). The final result then follows since \( Z \) is a linear function of \( M(\bs{Y}) - M(\bs{X}) \).

Of course (b) actually subsumes (a), but we separate them because the two cases play an important role in the hypothesis tests. In part (b), the non-zero mean can be viewed as a non-centrality parameter.

As usual, for \(p \in (0, 1)\), let \(z(p)\) denote the quantile of order \(p\) for the standard normal distribution. For selected values of \(p\), \(z(p)\) can be obtained from the quantile app or from most statistical software packages. Recall also by symmetry that \(z(1 - p) = -z(p)\).

For every \( \alpha \in (0, 1) \), the following tests have significance level \(\alpha\):

  • Reject \(H_0: \nu - \mu = \delta\) versus \(H_1: \nu - \mu \ne \delta\) if and only if \(Z \lt -z(1 - \alpha / 2)\) or \(Z \gt z(1 - \alpha / 2)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \gt \delta + z(1 - \alpha / 2) \sqrt{\sigma^2 / m + \tau^2 / n} \) or \( M(\bs{Y}) - M(\bs{X}) \lt \delta - z(1 - \alpha / 2) \sqrt{\sigma^2 / m + \tau^2 / n} \).
  • Reject \(H_0: \nu - \mu \ge \delta\) versus \(H_1: \nu - \mu \lt \delta\) if and only if \(Z \lt -z(1 - \alpha)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \lt \delta - z(1 - \alpha) \sqrt{\sigma^2 / m + \tau^2 / n} \).
  • Reject \(H_0: \nu - \mu \le \delta\) versus \(H_1: \nu - \mu \gt \delta\) if and only if \(Z \gt z( 1 - \alpha)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \gt \delta + z(1 - \alpha) \sqrt{\sigma^2 / m + \tau^2 / n} \).

This follows the same logic that we have seen before. In part (a), \( H_0 \) is a simple hypothesis, and under this hypothesis \( Z \) has the standard normal distribution. Thus, if \( H_0 \) is true then the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. In parts (b) and (c), \( H_0 \) specifies a range of values of \( \nu - \mu \), and under \( H_0 \), \( Z \) has a nonstandard normal distribution, as noted above. But the largest type 1 error probability is \( \alpha \) and occurs when \( \nu - \mu = \delta \). The decision rules in terms of \( M(\bs{Y}) - M(\bs{X}) \) are equivalent to those in terms of \( Z \) by simple algebra.

For each of the tests above, we fail to reject \(H_0\) at significance level \(\alpha\) if and only if \(\delta\) is in the corresponding \(1 - \alpha\) level confidence interval.

  • \( [M(\bs{Y}) - M(\bs{X})] - z(1 - \alpha / 2) \sqrt{\sigma^2 / m + \tau^2 / n} \le \delta \le [M(\bs{Y}) - M(\bs{X})] + z(1 - \alpha / 2) \sqrt{\sigma^2 / m + \tau^2 / n} \)
  • \( \delta \le [M(\bs{Y}) - M(\bs{X})] + z(1 - \alpha) \sqrt{\sigma^2 / m + \tau^2 / n} \)
  • \( \delta \ge [M(\bs{Y}) - M(\bs{X})] - z(1 - \alpha) \sqrt{\sigma^2 / m + \tau^2 / n} \)

These results follow from the tests above. In each case, we start with the inequality that corresponds to not rejecting the null hypothesis and solve for \( \delta \).
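
As an illustration, here is a minimal Python sketch (scipy assumed) of the two-sided test in part (a). The sample sizes, means, and known standard deviations below are hypothetical values chosen only to show the computation.

```python
# A minimal sketch of the two-sided, known-standard-deviation test (Python + scipy assumed).
# All summary values below are hypothetical and only illustrate the computation.
from math import sqrt
from scipy.stats import norm

m, mean_x, sigma = 30, 120.0, 15.0   # X sample: size, mean, known standard deviation
n, mean_y, tau   = 35, 127.0, 12.0   # Y sample: size, mean, known standard deviation
delta, alpha = 0.0, 0.05             # conjectured difference and significance level

z = ((mean_y - mean_x) - delta) / sqrt(sigma**2 / m + tau**2 / n)
crit = norm.ppf(1 - alpha / 2)       # quantile z(1 - alpha/2)

reject = abs(z) > crit
p_value = 2 * norm.sf(abs(z))        # two-sided p-value

print(f"Z = {z:.3f}, critical value = {crit:.3f}, reject H0: {reject}, p-value = {p_value:.4f}")
```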

Tests of the Difference of the Means with Unknown Standard Deviations

Next we will construct tests for the difference in the means \(\nu - \mu\) under the more realistic assumption that the standard deviations \(\sigma\) and \(\tau\) are unknown. In this case, it is more difficult to find a suitable test statistic, but we can do the analysis in the special case that the standard deviations are the same. Thus, we will assume that \(\sigma = \tau\), and the common value \(\sigma\) is unknown. This assumption is reasonable if there is an inherent variability in the measurement variables that does not change even when different treatments are applied to the objects in the population.

Recall that the pooled estimate of the common variance \(\sigma^2\) is the weighted average of the sample variances, with the degrees of freedom as the weight factors: \[ S^2(\bs{X}, \bs{Y}) = \frac{(m - 1) S^2(\bs{X}) + (n - 1) S^2(\bs{Y})}{m + n - 2} \] The statistic \( S^2(\bs{X}, \bs{Y}) \) is an unbiased and consistent estimator of the common variance \( \sigma^2 \).

For a conjectured \( \delta \in \R \), define the test statistic \[ T = \frac{[M(\bs{Y}) - M(\bs{X})] - \delta}{S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n}} \]

  • If \( \nu - \mu = \delta \) then \( T \) has the \(t\) distribution with \(m + n - 2\) degrees of freedom,
  • If \( \nu - \mu \ne \delta \) then \( T \) has a non-central \( t \) distribution with \( m + n - 2 \) degrees of freedom and non-centrality parameter \[ \frac{(\nu - \mu) - \delta}{\sigma \sqrt{1/m + 1 /n}} \]

Part (b) actually subsumes part (a), since the ordinary \( t \) distribution is a special case of the non-central \( t \) distribution, with non-centrality parameter 0. With some basic algebra, we can write \( T \) in the form \[ T = \frac{Z + a}{\sqrt{V \big/ (m + n - 2)}}\] where \( Z \) is the standard score of \( M(\bs{Y}) - M(\bs{X}) \), \( a \) is the non-centrality parameter given in the theorem, and \( V = \frac{m + n - 2}{\sigma^2} S^2(\bs{X}, \bs{Y}) \). So \( Z \) has the standard normal distribution, \( V \) has the chi-square distribution with \( m + n - 2 \) degrees of freedom, and \( Z \) and \( V \) are independent. Thus by definition, \( T \) has the non-central \( t \) distribution with \( m + n - 2 \) degrees of freedom and non-centrality parameter \( a \).

As usual, for \(k \gt 0\) and \(p \in (0, 1)\), let \(t_k(p)\) denote the quantile of order \(p\) for the \(t\) distribution with \(k\) degrees of freedom. For selected values of \(k\) and \(p\), values of \(t_k(p)\) can be computed from most statistical software packages. Recall also that, by symmetry, \(t_k(1 - p) = -t_k(p)\).

The following tests have significance level \(\alpha\):

  • Reject \(H_0: \nu - \mu = \delta\) versus \(H_1: \nu - \mu \ne \delta\) if and only if \(T \lt -t_{m + n - 2}(1 - \alpha / 2)\) or \(T \gt t_{m + n - 2}(1 - \alpha / 2)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \gt \delta + t_{m+n-2}(1 - \alpha / 2) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \) or \( M(\bs{Y}) - M(\bs{X}) \lt \delta - t_{m+n-2}(1 - \alpha / 2) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)
  • Reject \(H_0: \nu - \mu \ge \delta\) versus \(H_1: \nu - \mu \lt \delta\) if and only if \(T \le -t_{m + n - 2}(1 - \alpha)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \lt \delta - t_{m+n-2}(1 - \alpha) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)
  • Reject \(H_0: \nu - \mu \le \delta\) versus \(H_1: \nu - \mu \gt \delta\) if and only if \(T \ge t_{m + n - 2}(1 - \alpha)\) if and only if \( M(\bs{Y}) - M(\bs{X}) \gt \delta + t_{m+n-2}(1 - \alpha) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)

This follows the same logic that we have seen before. In part (a), \( H_0 \) is a simple hypothesis, and under this hypothesis \( T \) has the \( t \) distribution with \( m + n - 2 \) degrees of freedom. Thus, if \( H_0 \) is true then the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. In parts (b) and (c), \( H_0 \) specifies a range of values of \( \nu - \mu \), and under \( H_0 \), \( T \) has a non-central \( t \) distribution, by the result above. But the largest type 1 error probability is \( \alpha \) and occurs when \( \nu - \mu = \delta \). The decision rules in terms of \( M(\bs{Y}) - M(\bs{X}) \) are equivalent to those in terms of \( T \) by simple algebra.

For each of the tests above, we fail to reject \(H_0\) at significance level \(\alpha\) if and only if \(\delta\) is in the corresponding \(1 - \alpha\) level confidence interval.

  • \( [M(\bs{Y}) - M(\bs{X})] - t_{m+n-2}(1 - \alpha / 2) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \le \delta \le [M(\bs{Y}) - M(\bs{X})] + t_{m+n-2}(1 - \alpha / 2) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)
  • \( \delta \le [M(\bs{Y}) - M(\bs{X})] + t_{m+n-2}(1 - \alpha) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)
  • \( \delta \ge [M(\bs{Y}) - M(\bs{X})] - t_{m+n-2}(1 - \alpha) S(\bs{X}, \bs{Y}) \sqrt{1 / m + 1 / n} \)
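Here is a minimal Python sketch of the pooled two-sample \(t\) test just described, together with a cross-check against SciPy's built-in version; the samples are simulated and purely illustrative.

```python
import numpy as np
from scipy import stats

# Simulated samples with a common (unknown) standard deviation.
rng = np.random.default_rng(1)
x = rng.normal(10.0, 2.5, size=12)     # sample X of size m
y = rng.normal(12.0, 2.5, size=15)     # sample Y of size n

m, n = len(x), len(y)
delta = 0.0                            # conjectured value of nu - mu
alpha = 0.05

# Pooled estimate of the common variance.
s2_pooled = ((m - 1) * x.var(ddof=1) + (n - 1) * y.var(ddof=1)) / (m + n - 2)
se = np.sqrt(s2_pooled * (1 / m + 1 / n))
t_stat = (y.mean() - x.mean() - delta) / se

t_crit = stats.t.ppf(1 - alpha / 2, df=m + n - 2)
print(f"T = {t_stat:.3f}, critical value = {t_crit:.3f}, reject H0: {abs(t_stat) > t_crit}")

# Cross-check with SciPy's pooled two-sample t test (equal_var=True by default).
print(stats.ttest_ind(y, x))
```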

Tests of the Ratio of the Variances

Next we will construct tests for the ratio of the distribution variances \(\tau^2 / \sigma^2\). The basic assumption is that the variances \(\sigma^2\) and \(\tau^2\), and of course the means \(\mu\) and \(\nu\), are unknown.

For a conjectured \( \rho \in (0, \infty) \), define the test statistic \[ F = \frac{S^2(\bs{X})}{S^2(\bs{Y})} \rho \]

  • If \( \tau^2 / \sigma^2 = \rho \) then \( F \) has the \(F\) distribution with \(m - 1\) degrees of freedom in the numerator and \(n - 1\) degrees of freedom in the denominator.
  • If \( \tau^2 / \sigma^2 \ne \rho \) then \( F \) has a scaled \( F \) distribution with \( m - 1 \) degrees of freedom in the numerator, \( n - 1 \) degrees of freedom in the denominator, and scale factor \( \rho \frac{\sigma^2}{\tau^2} \).

Part (b) actually subsumes part (a) when \( \rho = \tau^2 / \sigma^2 \), so we will just prove (b). Note that \[ F = \left(\frac{S^2(\bs{X}) \big/ \sigma^2}{S^2(\bs{Y}) \big/ \tau^2}\right) \rho \frac{\sigma^2}{\tau^2} \] But \( S^2(\bs{X}) \big/ \sigma^2 \) has the chi-square distribution with \( m - 1 \) degrees of freedom, \( S^2(\bs{Y}) \big/ \tau^2 \) has the chi-square distribution with \( n - 1 \) degrees of freedom, and the variables are independent. Hence the ratio has the \( F \) distribution with \( m - 1 \) degrees of freedom in the numerator and \( n - 1 \) degrees of freedom in the denominator.

The following tests have significance level \( \alpha \):

  • Reject \(H_0: \tau^2 / \sigma^2 = \rho\) versus \(H_1: \tau^2 / \sigma^2 \ne \rho\) if and only if \(F \gt f_{m-1, n-1}(1 - \alpha / 2)\) or \(F \lt f_{m-1, n-1}(\alpha / 2 )\).
  • Reject \(H_0: \tau^2 / \sigma^2 \le \rho\) versus \(H_1: \tau^2 / \sigma^2 \gt \rho\) if and only if \(F \lt f_{m-1, n-1}(\alpha)\).
  • Reject \(H_0: \tau^2 / \sigma^2 \ge \rho\) versus \(H_1: \tau^2 / \sigma^2 \lt \rho\) if and only if \(F \gt f_{m-1, n-1}(1 - \alpha)\).

The proof is the usual argument. In part (a), \( H_0 \) is a simple hypothesis, and under this hypothesis \( F \) has the \( F \) distribution with \( m - 1 \) degrees of freedom in the numerator and \( n - 1 \) degrees of freedom in the denominator. Thus, if \( H_0 \) is true then the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. In parts (b) and (c), \( H_0 \) specifies a range of values of \( \tau^2 / \sigma^2 \), and under \( H_0 \), \( F \) has a scaled \( F \) distribution by the result above. But the largest type 1 error probability is \( \alpha \) and occurs when \( \tau^2 / \sigma^2 = \rho \).

For each of the tests above, we fail to reject \(H_0\) at significance level \(\alpha\) if and only if \(\rho\) is in the corresponding \(1 - \alpha\) level confidence interval.

  • \( \frac{S^2(\bs{Y})}{S^2(\bs{X})} f_{m-1,n-1}(\alpha / 2) \le \rho \le \frac{S^2(\bs{Y})}{S^2(\bs{X})} f_{m-1,n-1}(1 - \alpha / 2) \)
  • \( \rho \ge \frac{S^2(\bs{Y})}{S^2(\bs{X})} f_{m-1,n-1}(\alpha) \)
  • \( \rho \le \frac{S^2(\bs{Y})}{S^2(\bs{X})} f_{m-1,n-1}(1 - \alpha) \)

These results follow from the tests above. In each case, we start with the inequality that corresponds to not rejecting the null hypothesis and solve for \( \rho \).
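A minimal Python sketch of the variance-ratio test follows, using simulated data and a conjectured ratio \(\rho = 1\); SciPy's `f.ppf` plays the role of the quantile function \(f_{m-1,n-1}(p)\).

```python
import numpy as np
from scipy import stats

# Simulated samples; we test a conjectured variance ratio rho = tau^2 / sigma^2.
rng = np.random.default_rng(2)
x = rng.normal(0.0, 2.0, size=25)      # sample X of size m
y = rng.normal(0.0, 3.0, size=30)      # sample Y of size n

m, n = len(x), len(y)
rho = 1.0                              # conjectured ratio (equal variances)
alpha = 0.10

# Test statistic F = (S^2(X) / S^2(Y)) * rho.
F = (x.var(ddof=1) / y.var(ddof=1)) * rho

f_lo = stats.f.ppf(alpha / 2, dfn=m - 1, dfd=n - 1)
f_hi = stats.f.ppf(1 - alpha / 2, dfn=m - 1, dfd=n - 1)
print(f"F = {F:.3f}, critical values = ({f_lo:.3f}, {f_hi:.3f}), reject H0: {F < f_lo or F > f_hi}")

# Equivalent two-sided confidence interval for tau^2 / sigma^2.
ratio = y.var(ddof=1) / x.var(ddof=1)
print(f"{1 - alpha:.0%} CI: ({ratio * f_lo:.3f}, {ratio * f_hi:.3f})")
```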

Tests in the Bivariate Normal Model

In this subsection, we consider a model that is superficially similar to the two-sample normal model, but is actually much simpler. Suppose that \[ ((X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)) \] is a random sample of size \(n\) from the bivariate normal distribution of \((X, Y)\) with \(\E(X) = \mu\), \(\E(Y) = \nu\), \(\var(X) = \sigma^2\), \(\var(Y) = \tau^2\), and \(\cov(X, Y) = \delta\).

Thus, instead of a pair of samples, we have a sample of pairs. The fundamental difference is that in this model, variables \( X \) and \( Y \) are measured on the same objects in a sample drawn from the population, while in the previous model, variables \( X \) and \( Y \) are measured on two distinct samples drawn from the population. The bivariate model arises, for example, in before and after experiments, in which a measurement of interest is recorded for a sample of \(n\) objects from the population, both before and after a treatment. For example, we could record the blood pressure of a sample of \(n\) patients, before and after the administration of a certain drug.

We will use our usual notation for the sample means and variances of \(\bs{X} = (X_1, X_2, \ldots, X_n)\) and \(\bs{Y} = (Y_1, Y_2, \ldots, Y_n)\). Recall also that the sample covariance of \( (\bs{X}, \bs{Y}) \) is \[ S(\bs{X}, \bs{Y}) = \frac{1}{n - 1} \sum_{i=1}^n [X_i - M(\bs{X})][Y_i - M(\bs{Y})] \] (not to be confused with the pooled estimate of the standard deviation defined earlier).

The sequence of differences \(\bs{Y} - \bs{X} = (Y_1 - X_1, Y_2 - X_2, \ldots, Y_n - X_n)\) is a random sample of size \(n\) from the distribution of \(Y - X\). The sampling distribution is normal with

  • \(\E(Y - X) = \nu - \mu\)
  • \(\var(Y - X) = \sigma^2 + \tau^2 - 2 \, \delta\)

The sample mean and variance of the sample of differences are

  • \(M(\bs{Y} - \bs{X}) = M(\bs{Y}) - M(\bs{X})\)
  • \(S^2(\bs{Y} - \bs{X}) = S^2(\bs{X}) + S^2(\bs{Y}) - 2 \, S(\bs{X}, \bs{Y})\)

The sample of differences \(\bs{Y} - \bs{X}\) fits the normal model for a single variable. The methods for tests in the one-sample normal model can therefore be used to perform tests for the distribution mean \(\nu - \mu \) and the distribution variance \(\sigma^2 + \tau^2 - 2 \delta\).
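A minimal Python sketch of this reduction to a one-sample problem, using made-up before/after data; SciPy's paired \(t\) test (`ttest_rel`) is shown as a cross-check.

```python
import numpy as np
from scipy import stats

# Made-up before/after measurements on the same n subjects.
rng = np.random.default_rng(3)
before = rng.normal(120.0, 10.0, size=20)
after = before - rng.normal(5.0, 8.0, size=20)   # correlated with 'before'

d = after - before                     # sample of differences Y - X
n = len(d)
alpha = 0.05

# One-sample t test on the differences, H0: nu - mu = 0.
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
print(f"T = {t_stat:.3f}, critical value = {t_crit:.3f}, reject H0: {abs(t_stat) > t_crit}")

# SciPy's paired t test gives the same statistic.
print(stats.ttest_rel(after, before))
```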

Computational Exercises

A new drug is being developed to reduce a certain blood chemical. A sample of 36 patients are given a placebo while a sample of 49 patients are given the drug. The statistics (in mg) are \(m_1 = 87\), \(s_1 = 4\), \(m_2 = 63\), \(s_2 = 6\). Test the following at the 10% significance level:

  • \(H_0: \sigma_1 = \sigma_2\) versus \(H_1: \sigma_1 \ne \sigma_2\).
  • \(H_0: \mu_1 \le \mu_2\) versus \(H_1: \mu_1 \gt \mu_2\) (assuming that \(\sigma_1 = \sigma_2\)).
  • Based on (b), is the drug effective?

Answers:

  • Test statistic 0.4, critical values 0.585, 1.667. Reject \(H_0\).
  • Test statistic about 20.8, critical value 1.292. Reject \(H_0\).
  • Probably, since \(H_0\) was rejected in part (b).
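The answers can be checked numerically from the summary statistics. The following sketch (assuming SciPy is available) recomputes the \(F\) statistic and two-sided critical values for part (a) and the pooled \(t\) statistic and one-sided critical value for part (b).

```python
import numpy as np
from scipy import stats

# Summary statistics from the exercise (group 1 = placebo, group 2 = drug).
m, n = 36, 49
m1, s1 = 87.0, 4.0
m2, s2 = 63.0, 6.0
alpha = 0.10

# (a) F test of H0: sigma1 = sigma2, two-sided at the 10% level.
F = s1**2 / s2**2
f_lo = stats.f.ppf(alpha / 2, m - 1, n - 1)
f_hi = stats.f.ppf(1 - alpha / 2, m - 1, n - 1)
print(f"(a) F = {F:.3f}, critical values = ({f_lo:.3f}, {f_hi:.3f})")

# (b) Pooled t test of H0: mu1 <= mu2 versus H1: mu1 > mu2.
s2_pooled = ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)
t_stat = (m1 - m2) / np.sqrt(s2_pooled * (1 / m + 1 / n))
t_crit = stats.t.ppf(1 - alpha, df=m + n - 2)
print(f"(b) t = {t_stat:.1f}, critical value = {t_crit:.3f}")
```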

A company claims that an herbal supplement improves intelligence. A sample of 25 persons are given a standard IQ test before and after taking the supplement. The before and after statistics are \(m_1 = 105\), \(s_1 = 13\), \(m_2 = 110\), \(s_2 = 17\), \(s_{1, \, 2} = 190\). At the 10% significance level, do you believe the company's claim?

Test statistic 2.8, critical value 1.3184. Reject \(H_0\).
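A quick numerical check of this answer from the summary statistics, using the identity \(S^2(\bs{Y} - \bs{X}) = S^2(\bs{X}) + S^2(\bs{Y}) - 2 S(\bs{X}, \bs{Y})\); SciPy is assumed to be available.

```python
import numpy as np
from scipy import stats

# Summary statistics: before (1) and after (2) IQ scores for n = 25 subjects.
n = 25
m1, s1 = 105.0, 13.0
m2, s2 = 110.0, 17.0
s12 = 190.0                            # sample covariance of before and after
alpha = 0.10

# Variance of the differences: S^2(Y - X) = S^2(X) + S^2(Y) - 2 S(X, Y).
s2_diff = s1**2 + s2**2 - 2 * s12
t_stat = (m2 - m1) / np.sqrt(s2_diff / n)
t_crit = stats.t.ppf(1 - alpha, df=n - 1)
print(f"t = {t_stat:.2f}, critical value = {t_crit:.4f}")   # roughly 2.83 and 1.318
```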

In Fisher's iris data, consider the petal length variable for the samples of Versicolor and Virginica irises. Test the following at the 10% significance level:

  • \(H_0: \sigma_1 = \sigma_2\) versus \(H_1: \sigma_1 \ne \sigma_2\).
  • \(H_0: \mu_1 \ge \mu_2\) versus \(H_1: \mu_1 \lt \mu_2\) (assuming that \(\sigma_1 = \sigma_2\)).

Answers:

  • Test statistic 1.1, critical values 0.6227, 1.6072. Fail to reject \(H_0\).
  • Test statistic \(-11.4\), critical value \(-1.6602\). Reject \(H_0\).

A plant has two machines that produce a circular rod whose diameter (in cm) is critical. A sample of 100 rods from the first machine has mean 10.3 and standard deviation 1.2. A sample of 100 rods from the second machine has mean 9.8 and standard deviation 1.6. Test the following hypotheses at the 10% level.

  • \(H_0: \sigma_1 = \sigma_2\) versus \(H_1: \sigma_1 \ne \sigma_2\).
  • \(H_0: \mu_1 = \mu_2\) versus \(H_1: \mu_1 \ne \mu_2\) (assuming that \(\sigma_1 = \sigma_2\)).

Answers:

  • Test statistic 0.56, critical values 0.7175, 1.3942. Reject \(H_0\).
  • Test statistic 2.5, critical values \(\pm 1.645\). Reject \(H_0\).

9.3 Probability Distribution Needed for Hypothesis Testing

Earlier in the course, we discussed sampling distributions. Particular distributions are associated with various types of hypothesis testing.

The following summarizes the main hypothesis tests and the corresponding probability distributions used to conduct them (based on the assumptions described below):

  • Single population mean, known population standard deviation: normal distribution (\(z\)-test).
  • Single population mean, unknown population standard deviation: Student's \(t\)-distribution (\(t\)-test).
  • Single population proportion: normal distribution (\(z\)-test), provided \(np > 5\) and \(nq > 5\).

Assumptions

When you perform a hypothesis test of a single population mean μ using a normal distribution (often called a z-test), you take a simple random sample from the population. The population you are testing is normally distributed, or your sample size is sufficiently large. You know the value of the population standard deviation, which, in reality, is rarely known.

When you perform a hypothesis test of a single population mean μ using a Student's t-distribution (often called a t-test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed. You use the sample standard deviation to approximate the population standard deviation. (Note that if the sample size is sufficiently large, a t-test will work even if the population is not approximately normally distributed.)

When you perform a hypothesis test of a single population proportion p, you take a simple random sample from the population. You must meet the conditions for a binomial distribution: there are a certain number n of independent trials, the outcomes of any trial are success or failure, and each trial has the same probability of a success p. The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (\(np > 5\) and \(nq > 5\)). Then the binomial distribution of a sample (estimated) proportion can be approximated by the normal distribution with \(\mu = p\) and \(\sigma = \sqrt{\frac{pq}{n}}\). Remember that \(q = 1 - p\).
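As an illustration of these conditions, here is a minimal Python sketch of a one-proportion \(z\)-test; the counts \(n\), \(x\), and the conjectured \(p_0\) are made up for the example.

```python
import numpy as np
from scipy.stats import norm

# Made-up data: x successes out of n trials, testing H0: p = p0.
n, x = 200, 124
p0 = 0.55
alpha = 0.05

p_hat = x / n
# Check the normal-approximation conditions n*p0 > 5 and n*q0 > 5.
assert n * p0 > 5 and n * (1 - p0) > 5

se = np.sqrt(p0 * (1 - p0) / n)        # sigma = sqrt(p*q/n) under H0
z = (p_hat - p0) / se
p_value = 2 * norm.sf(abs(z))          # two-sided p value
print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.3f}")
```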



Choosing the Right Statistical Test | Types & Examples

Published on January 28, 2020 by Rebecca Bevans. Revised on June 22, 2023.

Statistical tests are used in hypothesis testing. They can be used to:

  • determine whether a predictor variable has a statistically significant relationship with an outcome variable.
  • estimate the difference between two or more groups.

Statistical tests assume a null hypothesis of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis.

If you already know what types of variables you’re dealing with, you can use the flowchart to choose the right statistical test for your data.

Statistical tests flowchart


Statistical tests work by calculating a test statistic – a number that describes how much the relationship between variables in your test differs from the null hypothesis of no relationship.

It then calculates a p value (probability value). The p value estimates how likely it is that you would see the difference described by the test statistic if the null hypothesis of no relationship were true.

If the value of the test statistic is more extreme than the critical value calculated under the null hypothesis, then you can infer a statistically significant relationship between the predictor and outcome variables.

If the value of the test statistic is less extreme than that critical value, then you can infer no statistically significant relationship between the predictor and outcome variables.
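To make this concrete, here is a minimal Python sketch (with simulated data) that computes a test statistic and its p value for a two-group comparison; the numbers are purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(5.0, 1.0, size=30)
group_b = rng.normal(5.6, 1.0, size=30)

# The test statistic measures how far the observed difference is from the
# "no difference" value of 0; the p value is the probability of seeing a
# statistic at least this extreme if the null hypothesis were true.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```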


You can perform statistical tests on data that have been collected in a statistically valid manner – either through an experiment, or through observations made using probability sampling methods.

For a statistical test to be valid, your sample size needs to be large enough to approximate the true distribution of the population being studied.

To determine which statistical test to use, you need to know:

  • whether your data meets certain assumptions.
  • the types of variables that you’re dealing with.

Statistical assumptions

Statistical tests make some common assumptions about the data they are testing:

  • Independence of observations (a.k.a. no autocorrelation): The observations/variables you include in your test are not related (for example, multiple measurements of a single test subject are not independent, while measurements of multiple different test subjects are independent).
  • Homogeneity of variance: the variance within each group being compared is similar among all groups. If one group has much more variation than others, it will limit the test's effectiveness.
  • Normality of data: the data follows a normal distribution (a.k.a. a bell curve). This assumption applies only to quantitative data.

If your data do not meet the assumptions of normality or homogeneity of variance, you may be able to perform a nonparametric statistical test, which allows you to make comparisons without any assumptions about the data distribution.

If your data do not meet the assumption of independence of observations, you may be able to use a test that accounts for structure in your data (repeated-measures tests or tests that include blocking variables).
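For example, when the normality assumption is doubtful, a rank-based test such as the Mann-Whitney U test can be used in place of a two-sample t test; the sketch below uses simulated skewed data and SciPy's implementation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.exponential(1.0, size=25)    # skewed, clearly non-normal data
group_b = rng.exponential(1.5, size=25)

# Rank-based alternative to the two-sample t test: no normality assumption.
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```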

Types of variables

The types of variables you have usually determine what type of statistical test you can use.

Quantitative variables represent amounts of things (e.g. the number of trees in a forest). Types of quantitative variables include:

  • Continuous (aka ratio variables): represent measures and can usually be divided into units smaller than one (e.g. 0.75 grams).
  • Discrete (aka integer variables): represent counts and usually can’t be divided into units smaller than one (e.g. 1 tree).

Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include:

  • Ordinal: represent data with an order (e.g. rankings).
  • Nominal: represent group names (e.g. brands or species names).
  • Binary: represent data with a yes/no or 1/0 outcome (e.g. win or lose).

Choose the test that fits the types of predictor and outcome variables you have collected (if you are doing an experiment, these are the independent and dependent variables).

Parametric tests usually have stricter requirements than nonparametric tests, and are able to make stronger inferences from the data. They can only be conducted with data that adheres to the common assumptions of statistical tests.

The most common types of parametric test include regression tests, comparison tests, and correlation tests.

Regression tests

Regression tests look for cause-and-effect relationships. They can be used to estimate the effect of one or more continuous variables on another variable.

Comparison tests

Comparison tests look for differences among group means. They can be used to test the effect of a categorical variable on the mean value of some other characteristic.

T-tests are used when comparing the means of precisely two groups (e.g., the average heights of men and women). ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults).
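A minimal sketch of both situations with simulated height data (the numbers are made up): a t test for two groups and a one-way ANOVA for three.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
children = rng.normal(140.0, 8.0, size=40)   # heights in cm, made up
teens = rng.normal(165.0, 9.0, size=40)
adults = rng.normal(172.0, 9.0, size=40)

# Two groups -> t test; three or more groups -> one-way ANOVA.
t_stat, p_t = stats.ttest_ind(teens, adults)
f_stat, p_f = stats.f_oneway(children, teens, adults)
print(f"t test: t = {t_stat:.2f}, p = {p_t:.4f}")
print(f"ANOVA:  F = {f_stat:.2f}, p = {p_f:.4g}")
```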

Correlation tests

Correlation tests check whether variables are related without hypothesizing a cause-and-effect relationship.

These can be used to test whether two variables you want to use in (for example) a multiple regression test are autocorrelated.

Non-parametric tests don’t make as many assumptions about the data, and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren’t as strong as with parametric tests.


The flowchart below helps you choose among parametric tests; for nonparametric alternatives, see the discussion above.

Choosing the right statistical test


Statistical tests commonly assume that:

  • the data are normally distributed
  • the groups that are being compared have similar variance
  • the data are independent

If your data does not meet these assumptions you might still be able to use a nonparametric statistical test, which has fewer requirements but also makes weaker inferences.

A test statistic is a number calculated by a statistical test. It describes how far your observed data is from the null hypothesis of no relationship between variables or no difference among sample groups.

The test statistic tells you how different two or more groups are from the overall population mean, or how different a linear slope is from the slope predicted by a null hypothesis. Different test statistics are used in different statistical tests.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis .

When the p value falls below the chosen alpha value, then we say the result of the test is statistically significant.

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results.

Discrete and continuous variables are two types of quantitative variables:

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).



10.4: Distribution Needed for Hypothesis Testing


Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. Perform tests of a population mean using a normal distribution or a Student's \(t\)-distribution. (Remember, use a Student's \(t\)-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.) We perform tests of a population proportion using a normal distribution (usually when the sample size \(n\) is large).

If you are testing a single population mean, the distribution for the test is for means:

\[\bar{X} \sim N\left(\mu_{x}, \frac{\sigma_{x}}{\sqrt{n}}\right)\]

The population parameter is \(\mu\). The estimated value (point estimate) for \(\mu\) is \(\bar{x}\), the sample mean.

If you are testing a single population proportion, the distribution for the test is for proportions or percentages:

\[ \hat{p} \sim N\left(p, \sqrt{\frac{pq}{n}}\right)\]

The population parameter is \(p\). The estimated value (point estimate) for \(p\) is \( \hat{p} \), where \( \hat{p} = \frac{x}{n}\), \(x\) is the number of successes, and \(n\) is the sample size.
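A minimal Python sketch of both cases, with made-up numbers: a one-sample \(t\) test for a mean (population standard deviation unknown) and a one-proportion \(z\) test.

```python
import numpy as np
from scipy import stats

# Single population mean, sigma unknown -> Student's t-distribution.
rng = np.random.default_rng(5)
sample = rng.normal(10.2, 1.5, size=30)
mu0 = 10.0
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(len(sample)))
p_t = 2 * stats.t.sf(abs(t_stat), df=len(sample) - 1)

# Single population proportion -> normal distribution (np > 5 and nq > 5).
n, x, p0 = 150, 96, 0.60
z = (x / n - p0) / np.sqrt(p0 * (1 - p0) / n)
p_z = 2 * stats.norm.sf(abs(z))

print(f"mean test:       t = {t_stat:.2f}, p = {p_t:.3f}")
print(f"proportion test: z = {z:.2f}, p = {p_z:.3f}")
```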


In order for a hypothesis test’s results to be generalized to a population, certain requirements must be satisfied.

When testing for a single population mean:

  • A Student's \(t\)-test should be used if the data come from a simple, random sample and the population is approximately normally distributed, or the sample size is large, with an unknown standard deviation.
  • The normal test will work if the data come from a simple, random sample and the population is approximately normally distributed, or the sample size is large, with a known standard deviation.

When testing a single population proportion, use a normal test for a single population proportion if the data come from a simple, random sample, fill the requirements for a binomial distribution, and the mean number of successes and the mean number of failures satisfy the conditions \(np > 5\) and \(nq > 5\), where \(n\) is the sample size, \(p\) is the probability of a success, and \(q\) is the probability of a failure.

Formula Review

If there is no given preconceived \(\alpha\), then use \(\alpha = 0.05\).

Types of Hypothesis Tests

  • Single population mean, known population variance (or standard deviation): Normal test .
  • Single population mean, unknown population variance (or standard deviation): Student's \(t\)-test .
  • Single population proportion: Normal test .
  • For a single population mean, we may use a normal distribution with the following mean and standard deviation. Means: \(\mu = \mu_{\bar{x}}\) and \(\sigma_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}}\)
  • For a single population proportion, we may use a normal distribution with the following mean and standard deviation. Proportions: \(\mu = p\) and \(\sigma = \sqrt{\frac{pq}{n}}\).
The Student's \(t\)-distribution used for a \(t\)-test has the following properties (a quick numerical check follows this list):

  • It is continuous and can assume any real value.
  • The pdf is symmetrical about its mean of zero. However, it is more spread out and flatter at the apex than the normal distribution.
  • It approaches the standard normal distribution as \(n\) gets larger.
  • There is a "family" of \(t\)-distributions: every representative of the family is completely defined by the number of degrees of freedom which is one less than the number of data items.
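A quick numerical check of the last two properties, assuming SciPy is available:

```python
from scipy import stats

# The upper 5% point of the t-distribution approaches the standard normal
# value (about 1.645) as the degrees of freedom grow.
for df in (5, 15, 30, 100, 1000):
    print(df, round(stats.t.ppf(0.95, df), 4))
print("normal", round(stats.norm.ppf(0.95), 4))
```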

Contributors

Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0 license. Download for free at http://cnx.org/contents/[email protected] .
