Hypothesis Testing for a Proportion Calculator


Z-test: One Population Proportion

Instructions: This calculator conducts a Z-test for one population proportion (p). Please select the null and alternative hypotheses, type the hypothesized population proportion \(p_0\), the significance level \(\alpha\), the sample proportion or number of favorable cases, and the sample size, and the results of the z-test for one proportion will be displayed for you:


Z-Test for One Population Proportion

More about the z-test for one population proportion so you can better interpret the results obtained by this solver: A z-test for one proportion is a hypothesis test used to assess claims about the population proportion (p) for a certain population attribute (for example, the proportion of males, or the proportion of people who are underage). The test has two non-overlapping hypotheses, the null and the alternative hypothesis. The null hypothesis is a statement about the population proportion that corresponds to the assumption of no effect, and the alternative hypothesis is the complementary hypothesis to the null hypothesis. The main properties of a one sample z-test for one population proportion are:

  • Depending on our knowledge about the "no effect" situation, the z-test can be two-tailed, left-tailed or right-tailed
  • The main principle of hypothesis testing is that the null hypothesis is rejected if the test statistic obtained is sufficiently unlikely under the assumption that the null hypothesis is true
  • The sampling distribution used to construct the test statistic is approximately normal
  • The p-value is the probability of obtaining sample results as extreme or more extreme than the sample results obtained, under the assumption that the null hypothesis is true
  • In a hypothesis test there are two types of errors. A Type I error occurs when we reject a true null hypothesis, and a Type II error occurs when we fail to reject a false null hypothesis

The formula for the z-statistic is

\[ z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1-p_0)}{n}}} \]

where \(\hat{p}\) is the sample proportion, \(p_0\) is the hypothesized population proportion, and \(n\) is the sample size.

The null hypothesis is rejected when the z-statistic lies on the rejection region, which is determined by the significance level (\(\alpha\)) and the type of tail (two-tailed, left-tailed or right-tailed).

This one proportion z test calculator allows you to compute the critical values and p-values for this one sample proportion test, which will help you decide whether or not the sample data provides enough evidence to reject the null hypothesis. If instead you want to compare two sample proportions, you can use this z-test for two proportions calculator, which will help you assess whether the two sample proportions differ significantly.
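For readers who want to script this, here is a minimal Python sketch of the same one-proportion z-test, assuming the scipy library; this is an illustrative sketch, not this calculator's actual code:

```python
# One-proportion z-test (illustrative sketch).
from math import sqrt
from scipy.stats import norm

def one_proportion_z_test(x, n, p0, alpha=0.05, tail="two"):
    """x successes in n trials, hypothesized proportion p0."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)          # standard error under H0
    z = (p_hat - p0) / se                 # test statistic
    if tail == "right":
        p_value = 1 - norm.cdf(z)
    elif tail == "left":
        p_value = norm.cdf(z)
    else:                                 # two-tailed
        p_value = 2 * (1 - norm.cdf(abs(z)))
    return z, p_value, p_value <= alpha   # True means "reject H0"
```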


Z Score Calculator for 2 Population Proportions

This is a simple z score calculator that calculates the value of z (and associated p value) for two population proportions.

Further Information

The z score test for two population proportions is used when you want to know whether two populations or groups (e.g., males and females; theists and atheists) differ significantly on some single (categorical) characteristic - for example, whether they are vegetarians.

Requirements

  • A random sample of each of the population groups to be compared.
  • Categorical data

Null Hypothesis

H0: p1 - p2 = 0, where p1 is the proportion from the first population and p2 the proportion from the second.

As above, the null hypothesis tends to be that there is no difference between the two population proportions; or, more formally, that the difference is zero (so, for example, that there is no difference between the proportion of males who are vegetarian and the proportion of females who are vegetarian). 
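A minimal Python sketch of this two-proportion z-test, assuming the usual pooled estimate of the common proportion under H0 (an illustration, not this calculator's own code):

```python
# Two-proportion z-test for H0: p1 - p2 = 0 (illustrative sketch).
from math import sqrt
from scipy.stats import norm

def two_proportion_z_test(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)        # pooled proportion under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))  # two-tailed p-value
    return z, p_value
```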



8.8 Hypothesis Tests for a Population Proportion

Learning Objectives

  • Conduct and interpret hypothesis tests for a population proportion.

Some notes about conducting a hypothesis test:

  • The null hypothesis [latex]H_0[/latex] is always an “equal to.”  The null hypothesis is the original claim about the population parameter.
  • The alternative hypothesis [latex]H_a[/latex] is a “less than,” “greater than,” or “not equal to.”  The form of the alternative hypothesis depends on the context of the question.
  • If the alternative hypothesis is a “less than”, then the test is left-tail.  The p -value is the area in the left-tail of the distribution.
  • If the alternative hypothesis is a “greater than”, then the test is right-tail.  The p -value is the area in the right-tail of the distribution.
  • If the alternative hypothesis is a “not equal to”, then the test is two-tail.  The p -value is the sum of the area in the two-tails of the distribution.  Each tail represents exactly half of the p -value.
  • Think about the meaning of the p -value.  A data analyst (and anyone else) should have more confidence in the decision to reject the null hypothesis with a smaller p -value (for example, 0.001 as opposed to 0.04), even when both are below a significance level of 0.05.  Similarly, a data analyst should have more confidence in the decision not to reject the null hypothesis with a large p -value such as 0.4 than with a p -value of 0.056 (a significance level of 0.05 is less than either number).  In this way the p -value makes the data analyst use judgment rather than mindlessly applying rules.
  • The significance level must be identified before collecting the sample data and conducting the test.  Generally, the significance level will be included in the question.  If no significance level is given, a common standard is to use a significance level of 5%.

Suppose the hypotheses for a hypothesis test are:

[latex]\begin{eqnarray*} H_0: & & p=20 \% \\ H_a: & & p \gt 20\% \end{eqnarray*}[/latex]

Because the alternative hypothesis is a [latex]\gt[/latex], this is a right-tail test.  The p -value is the area in the right-tail of the distribution.

[Figure: normal distribution curve of a single population proportion centred at 0.2, with the p-value equal to the shaded area in the right tail.]

[latex]\begin{eqnarray*} H_0: & & p=50 \% \\ H_a: & & p \neq  50\% \end{eqnarray*}[/latex]

Because the alternative hypothesis is a [latex]\neq[/latex], this is a two-tail test.  The p -value is the sum of the areas in the two tails of the distribution.  Each tail contains exactly half of the p -value.

[Figure: normal distribution curve of a single population proportion centred at 0.5, with ½(p-value) equal to the shaded area in each of the left and right tails.]

[latex]\begin{eqnarray*} H_0: & & p=10\% \\ H_a: & & p \lt  10\% \end{eqnarray*}[/latex]

Because the alternative hypothesis is a [latex]\lt[/latex], this is a left-tail test.  The p -value is the area in the left-tail of the distribution.

Steps to Conduct a Hypothesis Test for a Population Proportion

  • Write down the null and alternative hypotheses in terms of the population proportion [latex]p[/latex].  Include appropriate units with the values of the proportion.
  • Use the form of the alternative hypothesis to determine if the test is left-tailed, right-tailed, or two-tailed.
  • Collect the sample information for the test and identify the significance level.
  • If [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex], use the normal distribution with [latex]\displaystyle{z=\frac{\hat{p}-p}{\sqrt{\frac{p \times (1-p)}{n}}}}[/latex].
  • If one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex], use a binomial distribution.
  • If the p -value is less than or equal to the significance level, the results of the sample data are significant.  There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
  • If the p -value is greater than the significance level, the results of the sample data are not significant.  There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.
  • Write down a concluding sentence specific to the context of the question.
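The choice between the normal approximation and the binomial distribution in these steps can be written as a small helper; the sketch below is an assumed illustration, not part of the text:

```python
# Pick the sampling distribution per the rule above:
# normal when n*p >= 5 and n*(1-p) >= 5, binomial otherwise.
def choose_distribution(n, p0):
    if n * p0 >= 5 and n * (1 - p0) >= 5:
        return "normal"
    return "binomial"
```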

USING EXCEL TO CALCULATE THE P -VALUE FOR A HYPOTHESIS TEST ON A POPULATION PROPORTION

The p -value for a hypothesis test on a population proportion is the area in the tail(s) of the distribution of the sample proportion.  If both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex], use the normal distribution to find the p -value.  If at least one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex], use the binomial distribution to find the p -value.

If both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex]:

  • For x , enter the value for [latex]\hat{p}[/latex].
  • For [latex]\mu[/latex] , enter the mean of the sample proportions [latex]p[/latex].  Note:  Because the test is run assuming the null hypothesis is true, the value for [latex]p[/latex] is the claim from the null hypothesis.
  • For [latex]\sigma[/latex] , enter the standard error of the proportions [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex].
  • For the logic operator , enter true .  Note:  Because we are calculating the area under the curve, we always enter true for the logic operator.
  • Use the appropriate technique with the norm.dist function to find the area in the left-tail or the area in the right-tail.

If at least one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex]:

  • The p -value is found using the binomial distribution.
  • For x , enter the number of successes.  Note:  For a left-tailed test, the p -value is an at most probability, so enter the observed number of successes; for a right-tailed test, the p -value is an at least probability, so use 1-binom.dist with the number of successes minus one.
  • For n , enter the sample size.
  • For p , enter the value of the population proportion [latex]p[/latex] from the null hypothesis.
  • For the logic operator , enter true .  Note:  Because we are calculating a cumulative (at most) probability, the logic operator is always true.
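For readers working outside Excel, the same recipes map onto Python's scipy roughly as follows (an assumed mapping, illustrated with numbers taken from the examples below):

```python
from math import sqrt
from scipy.stats import norm, binom

# norm.dist(p_hat, p, sqrt(p*(1-p)/n), true)  ->  left-tail area
left_tail = norm.cdf(0.87, loc=0.92, scale=sqrt(0.92 * 0.08 / 200))  # ~0.0046

# binom.dist(x, n, p, true)  ->  P(at most x successes)
at_most = binom.cdf(45, 50, 0.93)                                    # ~0.2710

# 1 - binom.dist(x-1, n, p, true)  ->  P(at least x successes)
at_least = 1 - binom.cdf(7, 80, 0.04)                                # ~0.0147
```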

Marketers believe that 92% of adults own a cell phone.  A cell phone manufacturer believes that number is actually lower.  In a sample of 200 adults, 87% own a cell phone.  At the 1% significance level, determine if the proportion of adults that own a cell phone is lower than the marketers’ claim.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p=92\% \mbox{ of adults own a cell phone} \\ H_a: & & p \lt 92\% \mbox{ of adults own a cell phone} \end{eqnarray*}[/latex]

From the question, we have [latex]n=200[/latex], [latex]\hat{p}=0.87[/latex], and [latex]\alpha=0.01[/latex].

To determine the distribution, we check [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex].  For the value of [latex]p[/latex], we use the claim from the null hypothesis ([latex]p=0.92[/latex]).

[latex]\begin{eqnarray*} n \times p & = & 200 \times 0.92=184 \geq 5 \\ n \times (1-p) & = & 200 \times (1-0.92)=16 \geq 5\end{eqnarray*}[/latex]

Because both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p)  \geq 5[/latex] we use a normal distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\lt[/latex], the p -value is the area in the left tail of the distribution.

[Figure: normal distribution curve with the area to the left of a vertical line below the centre shaded; the p-value equals this shaded area.]

norm.dist(0.87, 0.92, sqrt(0.92*(1-0.92)/200), true) = 0.0046

So the p -value[latex]=0.0046[/latex].

Conclusion:

Because p -value[latex]=0.0046 \lt 0.01=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 1% significance level there is enough evidence to suggest that the proportion of adults who own a cell phone is lower than 92%.

  • The null hypothesis [latex]p=92\%[/latex] is the claim that 92% of adults own a cell phone.
  • The alternative hypothesis [latex]p \lt 92\%[/latex] is the claim that less than 92% of adults own a cell phone.
  • The function is norm.dist because we are finding the area in the left tail of a normal distribution.
  • Field 1 is the value of [latex]\hat{p}[/latex].
  • Field 2 is the value of [latex]p[/latex] from the null hypothesis.  Remember, we run the test assuming the null hypothesis is true, so that means we assume [latex]p=0.92[/latex].
  • Field 3 is the standard deviation for the sample proportions [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex].
  • The p -value of 0.0046 tells us that under the assumption that 92% of adults own a cell phone (the null hypothesis), there is only a 0.46% chance that the proportion of adults who own a cell phone in a sample of 200 is 87% or less.  This is a small probability, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the proportion of adults who own a cell phone is most likely less than 92%.
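The p-value in this example can be reproduced outside Excel; here is a Python sketch (not the textbook's own code):

```python
# Left-tail area under the normal approximation for the cell phone example.
from math import sqrt
from scipy.stats import norm

n, p0, p_hat = 200, 0.92, 0.87
se = sqrt(p0 * (1 - p0) / n)                 # standard error under H0
p_value = norm.cdf(p_hat, loc=p0, scale=se)
print(round(p_value, 4))                     # 0.0046
```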

A consumer group claims that the proportion of households that have at least three cell phones is 30%.  A cell phone company has reason to believe that the proportion of households with at least three cell phones is much higher.  Before they start a big advertising campaign based on the proportion of households that have at least three cell phones, they want to test their claim.  Their marketing people survey 150 households with the result that 54 of the households have at least three cell phones.  At the 1% significance level, determine if the proportion of households that have at least three cell phones is more than 30%.

[latex]\begin{eqnarray*} H_0: & & p=30\% \mbox{ of households have at least 3 cell phones} \\ H_a: & & p \gt 30\% \mbox{ of households have at least 3 cell phones} \end{eqnarray*}[/latex]

From the question, we have [latex]n=150[/latex], [latex]\displaystyle{\hat{p}=\frac{54}{150}=0.36}[/latex], and [latex]\alpha=0.01[/latex].

To determine the distribution, we check [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex].  For the value of [latex]p[/latex], we use the claim from the null hypothesis ([latex]p=0.3[/latex]).

[latex]\begin{eqnarray*} n \times p & = & 150 \times 0.3=45 \geq 5 \\ n \times (1-p) & = & 150 \times (1-0.3)=105 \geq 5\end{eqnarray*}[/latex]

Because both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p)  \geq  5[/latex] we use a normal distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\gt[/latex], the p -value is the area in the right tail of the distribution.

[Figure: normal distribution curve with the area to the right of a vertical line above the centre shaded; the p-value equals this shaded area.]

1-norm.dist(0.36, 0.3, sqrt(0.3*(1-0.3)/150), true) = 0.0544

So the p -value[latex]=0.0544[/latex].

Because p -value[latex]=0.0544 \gt 0.01=\alpha[/latex], we do not reject the null hypothesis.  At the 1% significance level there is not enough evidence to suggest that the proportion of households with at least three cell phones is more than 30%.

  • The null hypothesis [latex]p=30\%[/latex] is the claim that 30% of households have at least three cell phones.
  • The alternative hypothesis [latex]p \gt 30\%[/latex] is the claim that more than 30% of households have at least three cell phones.
  • The function is 1-norm.dist because we are finding the area in the right tail of a normal distribution.
  • Field 2 is the value of [latex]p[/latex] from the null hypothesis.  Remember, we run the test assuming the null hypothesis is true, so that means we assume [latex]p=0.3[/latex].
  • The p -value of 0.0544 tells us that under the assumption that 30% of households have at least three cell phones (the null hypothesis), there is a 5.44% chance that the proportion of households with at least three cell phones in a sample of 150 is 36% or more.  Compared to the 1% significance level, this is a large probability, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the claim that 30% of households have at least three cell phones is most likely correct.

A teacher believes that 70% of students in the class will want to go on a field trip to the local zoo.  The students in the class believe the proportion is much higher and ask the teacher to verify her claim.  The teacher samples 50 students and 39 reply that they would want to go to the zoo.  At the 5% significance level, determine if the proportion of students who want to go on the field trip is higher than 70%.

[latex]\begin{eqnarray*} H_0: & & p = 70\% \mbox{ of students want to go on the field trip}  \\ H_a: & & p \gt 70\% \mbox{ of students want to go on the field trip}   \end{eqnarray*}[/latex]

From the question, we have [latex]n=50[/latex], [latex]\displaystyle{\hat{p}=\frac{39}{50}=0.78}[/latex], and [latex]\alpha=0.05[/latex].

[latex]\begin{eqnarray*} n \times p & = & 50 \times 0.7=35 \geq 5 \\ n \times (1-p) & = & 50 \times (1-0.7)=15 \geq 5\end{eqnarray*}[/latex]

Because both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p)  \geq 5[/latex] we use a normal distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\gt[/latex], the p -value is the area in the right tail of the distribution.

1-norm.dist(0.78, 0.7, sqrt(0.7*(1-0.7)/50), true) = 0.1085

So the p -value[latex]=0.1085[/latex].

Because p -value[latex]=0.1085 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is not enough evidence to suggest that the proportion of students who want to go on the field trip is higher than 70%.

  • The null hypothesis [latex]p=70\%[/latex] is the claim that 70% of the students want to go on the field trip.
  • The alternative hypothesis [latex]p \gt 70\%[/latex] is the claim that more than 70% of students want to go on the field trip.
  • The p -value of 0.1085 tells us that under the assumption that 70% of students want to go on the field trip (the null hypothesis), there is a 10.85% chance that the proportion of students who want to go on the field trip in a sample of 50 students is 78% or more.  Compared to the 5% significance level, this is a large probability, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the teacher’s claim that 70% of students want to go on the field trip is most likely correct.

Joan believes that 50% of first-time brides in the United States are younger than their grooms.  She performs a hypothesis test to determine if the percentage is the same or different from 50%.  Joan samples 100 first-time brides and 56 reply that they are younger than their grooms.  Use a 5% significance level.

[latex]\begin{eqnarray*} H_0: & & p=50\% \mbox{ of first-time brides are younger than the groom} \\ H_a: & & p \neq 50\% \mbox{ of first-time brides are younger than the groom} \end{eqnarray*}[/latex]

From the question, we have [latex]n=100[/latex], [latex]\displaystyle{\hat{p}=\frac{56}{100}=0.56}[/latex], and [latex]\alpha=0.05[/latex].

To determine the distribution, we check [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex].  For the value of [latex]p[/latex], we use the claim from the null hypothesis ([latex]p=0.5[/latex]).

[latex]\begin{eqnarray*} n \times p & = & 100 \times 0.5=50 \geq 5 \\ n \times (1-p) & = & 100 \times (1-0.5)=50 \geq 5\end{eqnarray*}[/latex]

Because both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p)  \geq 5[/latex] we use a normal distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\neq[/latex], the p -value is the sum of area in the tails of the distribution.

[Figure: normal distribution curve with the areas beyond vertical lines in both tails shaded; each shaded tail is one half of the p-value, and the p-value is their sum.]

Because there is only one sample, we only have information relating to one of the two tails, either the left or the right.  We need to know which tail the sample relates to because that determines how we calculate the area of that tail using the normal distribution.  In this case, the sample proportion [latex]\hat{p}=0.56[/latex] is greater than the value of the population proportion in the null hypothesis [latex]p=0.5[/latex] ([latex]\hat{p}=0.56>0.5=p[/latex]), so the sample information relates to the right tail of the normal distribution.  This means that we will calculate the area in the right tail using 1-norm.dist .  However, this is a two-tailed test where the p -value is the sum of the area in the two tails, and the area in the right tail is only one half of the p -value.  The area in the left tail equals the area in the right tail, and the p -value is the sum of these two areas.

1-norm.dist(0.56, 0.5, sqrt(0.5*(1-0.5)/100), true) = 0.1151

So the area in the right tail is 0.1151 and  [latex]\frac{1}{2}[/latex]( p -value)[latex]=0.1151[/latex].  This is also the area in the left tail, so

p -value[latex]=0.1151+0.1151=0.2302[/latex]

Because p -value[latex]=0.2302 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is not enough evidence to suggest that the proportion of first-time brides that are younger than the groom is different from 50%.

  • The null hypothesis [latex]p=50\%[/latex] is the claim that the proportion of first-time brides that are younger than the groom is 50%.
  • The alternative hypothesis [latex]p \neq 50\%[/latex] is the claim that the proportion of first-time brides that are younger than the groom is different from 50%.
  • When [latex]\hat{p} \lt p[/latex], the sample information relates to the left tail: use norm.dist([latex]\hat{p}[/latex],[latex]p[/latex],[latex]\mbox{sqrt}(p*(1-p)/n)[/latex],true) to find the area in the left tail.  The area in the right tail equals the area in the left tail, so the p -value is found by adding the output of this function to itself.
  • When [latex]\hat{p} \gt p[/latex], as in this example, the sample information relates to the right tail: use 1-norm.dist([latex]\hat{p}[/latex],[latex]p[/latex],[latex]\mbox{sqrt}(p*(1-p)/n)[/latex],true) to find the area in the right tail.  The area in the left tail equals the area in the right tail, so the p -value is found by adding the output of this function to itself.
  • The p -value of 0.2302 is a large probability compared to the 5% significance level, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the claim that 50% of first-time brides are younger than the groom is most likely correct.
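A Python sketch reproducing the two-tailed p-value in this example (not the textbook's own code):

```python
# Double the right-tail area because p_hat = 0.56 lies above p = 0.5.
from math import sqrt
from scipy.stats import norm

n, p0, p_hat = 100, 0.5, 0.56
se = sqrt(p0 * (1 - p0) / n)
tail = 1 - norm.cdf(p_hat, loc=p0, scale=se)   # right-tail area ~ 0.1151
p_value = 2 * tail                             # ~ 0.2302 (text doubles the rounded 0.1151)
```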

Watch this video: Hypothesis Testing for Proportions: z -test by ExcelIsFun [7:27] 

An online retailer believes that 93% of the visitors to its website will make a purchase.   A researcher in the marketing department thinks the actual percent is lower than claimed.  The researcher examines a sample of 50 visits to the website and finds that 45 of the visits resulted in a purchase.  At the 1% significance level, determine if the proportion of visits to the website that result in a purchase is lower than claimed.

[latex]\begin{eqnarray*} H_0: & & p=93\% \mbox{ of visitors make a purchase} \\ H_a: & & p \lt 93\% \mbox{ of visitors make a purchase} \end{eqnarray*}[/latex]

From the question, we have [latex]n=50[/latex], [latex]x=45[/latex], and [latex]\alpha=0.01[/latex].

To determine the distribution, we check [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex].  For the value of [latex]p[/latex], we use the claim from the null hypothesis ([latex]p=0.93[/latex]).

[latex]\begin{eqnarray*} n \times p & = & 50 \times 0.93=46.5 \geq 5 \\ n \times (1-p) & = & 50 \times (1-0.93)=3.5 \lt 5\end{eqnarray*}[/latex]

Because [latex]n \times (1-p)  \lt 5[/latex] we use a binomial distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\lt[/latex], the p -value is the probability of getting at most 45 successes in 50 trials.

binom.dist(45, 50, 0.93, true) = 0.2710

So the p -value[latex]=0.2710[/latex].

Because p -value[latex]=0.2710 \gt 0.01=\alpha[/latex], we do not reject the null hypothesis.  At the 1% significance level there is not enough evidence to suggest that the proportion of visitors who make a purchase is lower than 93%.

  • The null hypothesis [latex]p=93\%[/latex] is the claim that 93% of visitors to the website make a purchase.
  • The alternative hypothesis [latex]p \lt 93\%[/latex] is the claim that less than 93% of visitors to the website make a purchase.
  • The function is binom.dist because we are finding the probability of at most 45 successes.
  • Field 1 is the number of successes [latex]x[/latex].
  • Field 2 is the sample size [latex]n[/latex].
  • Field 3 is the probability of success [latex]p[/latex].  This is the claim about the population proportion made in the null hypothesis, so that means we assume [latex]p=0.93[/latex].
  • The p -value of 0.2710 tells us that under the assumption that 93% of visitors make a purchase (the null hypothesis), there is a 27.10% chance that the number of visitors in a sample of 50 who make a purchase is 45 or less.  This is a large probability compared to the significance level, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the proportion of visitors to the website who make a purchase is most likely 93%.
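A Python sketch reproducing this exact binomial p-value (not the textbook's own code):

```python
# P(at most 45 purchases in 50 visits) when p = 0.93 under H0.
from scipy.stats import binom

p_value = binom.cdf(45, 50, 0.93)
print(round(p_value, 4))   # 0.2710
```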

A drug company claims that only 4% of people who take their new drug experience any side effects from the drug.  A researcher believes that the percent is higher than drug company’s claim.  The researcher takes a sample of 80 people who take the drug and finds that 10% of the people in the sample experience side effects from the drug.  At the 5% significance level, determine if the proportion of people who experience side effects from taking the drug is higher than claimed.

[latex]\begin{eqnarray*} H_0: & & p=4\% \mbox{ of people experience side effects} \\ H_a: & & p \gt 4\% \mbox{ of people experience side effects} \end{eqnarray*}[/latex]

From the question, we have [latex]n=80[/latex], [latex]\hat{p}=0.1[/latex], and [latex]\alpha=0.05[/latex].

To determine the distribution, we check [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex].  For the value of [latex]p[/latex], we use the claim from the null hypothesis ([latex]p=0.04[/latex]).

[latex]\begin{eqnarray*} n \times p & = & 80 \times 0.04=3.2 \lt 5\end{eqnarray*}[/latex]

Because [latex]n \times p  \lt 5[/latex] we use a binomial distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\gt[/latex], the p -value is the probability of getting at least 8 successes in 80 trials.  (Note:  In the sample of size 80, 10% have the characteristic of interest, so this means that [latex]80 \times 0.1=8[/latex] people in the sample have the characteristic of interest.)

1-binom.dist(7, 80, 0.04, true) = 0.0147

So the p -value[latex]=0.0147[/latex].

Because p -value[latex]=0.0147 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that the proportion of people who experience side effects from taking the drug is higher than 4%.

  • The null hypothesis [latex]p=4\%[/latex] is the claim that 4% of the people experience side effects from taking the drug.
  • The alternative hypothesis [latex]p \gt 4\%[/latex] is the claim that more than 4% of the people experience side effects from taking the drug.
  • The function is 1-binom.dist because we are finding the probability of at least 8 successes.
  • Field 1 is [latex]x-1[/latex] where [latex]x[/latex] is the number of successes.  In this case, we are using the complement rule to change the probability of at least 8 successes into 1 minus the probability of at most 7 successes.
  • Field 3 is the probability of success [latex]p[/latex].  This is the claim about the population proportion made in the null hypothesis, so that means we assume [latex]p=0.04[/latex].
  • The p -value of 0.0147 tells us that under the assumption that 4% of people experience side effects (the null hypothesis), there is a 1.47% chance that the number of people in a sample of 80 who experience side effects is 8 or more.  This is a small probability compared to the significance level, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the proportion of people who experience side effects is most likely greater than 4%.
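A Python sketch reproducing this right-tailed binomial p-value via the complement rule (not the textbook's own code):

```python
# P(at least 8 with side effects in 80) = 1 - P(at most 7), with p = 0.04.
from scipy.stats import binom

p_value = 1 - binom.cdf(7, 80, 0.04)
print(round(p_value, 4))   # 0.0147
```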

Concept Review

The hypothesis test for a population proportion is a well-established process:

  • Find the p -value (the area in the corresponding tail) for the test using the appropriate distribution (normal or binomial).
  • Compare the p -value to the significance level and state the outcome of the test.

Attribution

“9.6 Hypothesis Testing of a Single Mean and Single Proportion” in Introductory Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Hypothesis Testing Calculator


The first step in hypothesis testing is to calculate the test statistic. The formula for the test statistic depends on whether the population standard deviation (σ) is known or unknown. If σ is known, our hypothesis test is known as a z test and we use the z distribution. If σ is unknown, our hypothesis test is known as a t test and we use the t distribution. Use of the t distribution relies on the degrees of freedom, which is equal to the sample size minus one. Furthermore, if the population standard deviation σ is unknown, the sample standard deviation s is used instead. To switch from σ known to σ unknown, click on $\boxed{\sigma}$ and select $\boxed{s}$ in the Hypothesis Testing Calculator.

Test Statistic ($\sigma$ Known): $ z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} $

Test Statistic ($\sigma$ Unknown): $ t = \dfrac{\bar{x}-\mu_0}{s/\sqrt{n}} $
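A Python sketch of these two test statistics (an assumed helper, not the calculator's code):

```python
# z statistic when sigma is known, t statistic (df = n - 1) otherwise.
from math import sqrt

def test_statistic(x_bar, mu0, n, sigma=None, s=None):
    if sigma is not None:
        return (x_bar - mu0) / (sigma / sqrt(n))   # z
    return (x_bar - mu0) / (s / sqrt(n))           # t
```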

Next, the test statistic is used to conduct the test using either the p-value approach or critical value approach. The particular steps taken in each approach largely depend on the form of the hypothesis test: lower tail, upper tail or two-tailed. The form can easily be identified by looking at the alternative hypothesis (H a ). If there is a less-than sign in the alternative hypothesis then it is a lower tail test, a greater-than sign indicates an upper tail test, and a not-equal sign indicates a two-tailed test. To switch from a lower tail test to an upper tail or two-tailed test, click on $\boxed{\geq}$ and select $\boxed{\leq}$ or $\boxed{=}$, respectively.

Lower Tail Test: $H_0 \colon \mu \geq \mu_0$, $H_a \colon \mu \lt \mu_0$

Upper Tail Test: $H_0 \colon \mu \leq \mu_0$, $H_a \colon \mu \gt \mu_0$

Two-Tailed Test: $H_0 \colon \mu = \mu_0$, $H_a \colon \mu \neq \mu_0$

In the p-value approach, the test statistic is used to calculate a p-value. If the test is a lower tail test, the p-value is the probability of getting a value for the test statistic at least as small as the value from the sample. If the test is an upper tail test, the p-value is the probability of getting a value for the test statistic at least as large as the value from the sample. In a two-tailed test, the p-value is the probability of getting a value for the test statistic at least as unlikely as the value from the sample.

To test the hypothesis in the p-value approach, compare the p-value to the level of significance. If the p-value is less than or equal to the level of significance, reject the null hypothesis. If the p-value is greater than the level of significance, do not reject the null hypothesis. This method remains unchanged regardless of whether it's a lower tail, upper tail or two-tailed test. To change the level of significance, click on $\boxed{.05}$. Note that if the test statistic is given, you can calculate the p-value from the test statistic by clicking on the switch symbol twice.

In the critical value approach, the level of significance ($\alpha$) is used to calculate the critical value. In a lower tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the lower tail of the sampling distribution of the test statistic. In an upper tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the upper tail of the sampling distribution of the test statistic. In a two-tailed test, the critical values are the values of the test statistic providing areas of $\alpha / 2$ in the lower and upper tail of the sampling distribution of the test statistic.

To test the hypothesis in the critical value approach, compare the critical value to the test statistic. Unlike the p-value approach, the method we use to decide whether to reject the null hypothesis depends on the form of the hypothesis test. In a lower tail test, if the test statistic is less than or equal to the critical value, reject the null hypothesis. In an upper tail test, if the test statistic is greater than or equal to the critical value, reject the null hypothesis. In a two-tailed test, if the test statistic is less than or equal the lower critical value or greater than or equal to the upper critical value, reject the null hypothesis.

Lower Tail Test: If $z \leq -z_\alpha$ (or $t \leq -t_\alpha$), reject $H_0$.

Upper Tail Test: If $z \geq z_\alpha$ (or $t \geq t_\alpha$), reject $H_0$.

Two-Tailed Test: If $z \leq -z_{\alpha/2}$ or $z \geq z_{\alpha/2}$ (similarly for $t$), reject $H_0$.

When conducting a hypothesis test, there is always a chance that you come to the wrong conclusion. There are two types of errors you can make: Type I Error and Type II Error. A Type I Error is committed if you reject the null hypothesis when the null hypothesis is true. Ideally, we'd like to accept the null hypothesis when the null hypothesis is true. A Type II Error is committed if you accept the null hypothesis when the alternative hypothesis is true. Ideally, we'd like to reject the null hypothesis when the alternative hypothesis is true.

If $H_0$ is true: accepting $H_0$ is the correct decision, and rejecting $H_0$ is a Type I Error.

If $H_a$ is true: accepting $H_0$ is a Type II Error, and rejecting $H_0$ is the correct decision.

Hypothesis testing is closely related to the statistical area of confidence intervals. If the hypothesized value of the population mean is outside of the confidence interval, we can reject the null hypothesis. Confidence intervals can be found using the Confidence Interval Calculator . The calculator on this page does hypothesis tests for one population mean. Sometimes we're interested in hypothesis tests about two population means. These can be solved using the Two Population Calculator . The probability of a Type II Error can be calculated by clicking on the link at the bottom of the page.

This calculator runs a one sample proportion test for a given sample data set and specified null and alternative hypotheses. In the fields below enter the sample size \(n\) and the number of scores with the trait of interest, \(f\).

Enter a value for the null hypothesis. This value should indicate the absence of an effect in your data. It must be between the values 0 and 1. Indicate whether your alternative hypothesis involves one-tail or two-tails. If it is a one-tailed test, then you need to indicate whether it is a positive (right tail) test or a negative (left tail) test.

Enter an \(\alpha\) value for the hypothesis test. This is the Type I error rate for your hypothesis test. It also determines the confidence level \(100 \times (1-\alpha)\) for a confidence interval. The confidence interval is based on the normal distribution, which is an approximation.

Press the Run Test button and a table summarizing the computations and conclusions will appear below.

For each run, the test summary reports the null and alternative hypotheses, the Type I error rate \(\alpha\), the sample size \(n\), the sample proportion \(p\), the sample standard error \(s_p\), the test statistic \(z\), the \(p\) value, the decision, and a confidence interval together with its critical value \(z_{cv}\) and standard error.


Hypothesis Test for a Proportion

This lesson explains how to conduct a hypothesis test of a proportion, when the following conditions are met:

  • The sampling method is simple random sampling .
  • Each sample point can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
  • The sample includes at least 10 successes and 10 failures.
  • The population size is at least 20 times as big as the sample size.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis . The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use the one-sample z-test to determine whether the hypothesized population proportion differs significantly from the observed sample proportion.

Analyze Sample Data

Using sample data, find the test statistic and its associated P-Value.

σ = sqrt[ P * ( 1 - P ) / n ]

z = (p - P) / σ

  • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a z-score, use the Normal Distribution Calculator to assess the probability associated with the z-score. (See sample problems at the end of this lesson for examples of how this is done.)

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level , and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

In this section, two hypothesis testing examples illustrate how to conduct a hypothesis test of a proportion. The first problem involves a two-tailed test; the second problem, a one-tailed test.


Problem 1: Two-Tailed Test

The CEO of a large electric utility claims that 80 percent of his 1,000,000 customers are very satisfied with the service they receive. To test this claim, the local newspaper surveyed 100 customers, using simple random sampling. Among the sampled customers, 73 percent say they are very satisfied. Based on these findings, can we reject the CEO's hypothesis that 80% of the customers are very satisfied? Use a 0.05 level of significance.

Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: P = 0.80

Alternative hypothesis: P ≠ 0.80

  • Formulate an analysis plan . For this analysis, the significance level is 0.05. The test method, shown in the next section, is a one-sample z-test .

  • Analyze sample data . Using sample data, we calculate the standard deviation (σ) and compute the z-score test statistic (z).

σ = sqrt [(0.8 * 0.2) / 100]

σ = sqrt(0.0016) = 0.04

z = (p - P) / σ = (.73 - .80)/0.04 = -1.75

where P is the hypothesized value of population proportion in the null hypothesis, p is the sample proportion, and n is the sample size.

Since we have a two-tailed test , the P-value is the probability that the z-score is less than -1.75 or greater than 1.75. We use the Normal Distribution Calculator to find P(z < -1.75) = 0.04. Since the standard normal distribution is symmetric with a mean of zero, we know that P(z > 1.75) = 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.

  • Interpret results . Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the sample included at least 10 successes and 10 failures, and the population size was at least 20 times the sample size.
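A Python sketch verifying Problem 1 with the formulas above (not Stat Trek's code):

```python
# Two-tailed test of H0: P = 0.80 with p = 0.73 and n = 100.
from math import sqrt
from scipy.stats import norm

P0, p, n = 0.80, 0.73, 100
sigma = sqrt(P0 * (1 - P0) / n)   # 0.04
z = (p - P0) / sigma              # -1.75
p_value = 2 * norm.cdf(z)         # ~ 0.08
```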

Problem 2: One-Tailed Test

Suppose the previous example is stated a little bit differently. Suppose the CEO claims that at least 80 percent of the company's 1,000,000 customers are very satisfied. Again, 100 customers are surveyed using simple random sampling. The result: 73 percent are very satisfied. Based on these results, should we accept or reject the CEO's hypothesis? Assume a significance level of 0.05.

Null hypothesis: P >= 0.80

Alternative hypothesis: P < 0.80

σ = sqrt[ P * ( 1 - P ) / n ] = sqrt [(0.8 * 0.2) / 100] = 0.04

z = (p - P) / σ = (.73 - .80)/0.04 = -1.75

Since this is a lower-tailed test, the P-value is P(z < -1.75) = 0.04.

  • Interpret results . Since the P-value (0.04) is less than the significance level (0.05), we cannot accept the null hypothesis.

Statistics - Hypothesis Testing a Proportion

A population proportion is the share of a population that belongs to a particular category .

Hypothesis tests are used to check a claim about the size of that population proportion.

Hypothesis Testing a Proportion

The following steps are used for a hypothesis test:

  • Check the conditions
  • Define the claims
  • Decide the significance level
  • Calculate the test statistic
  • Conclude

For example:

  • Population : Nobel Prize winners
  • Category : Born in the United States of America

And we want to check the claim:

" More than 20% of Nobel Prize winners were born in the US"

By taking a sample of 40 randomly selected Nobel Prize winners we could find that:

10 out of 40 Nobel Prize winners in the sample were born in the US

The sample proportion is then: \(\displaystyle \frac{10}{40} = 0.25\), or 25%.

From this sample data we check the claim with the steps below.

1. Checking the Conditions

The conditions for calculating a hypothesis test for a proportion are:

  • The sample is randomly selected
  • There are only two options:
      • Being in the category
      • Not being in the category
  • The sample needs at least:
      • 5 members in the category
      • 5 members not in the category

In our example, we randomly selected 40 people, 10 of whom were born in the US.

The rest were not born in the US, so there are 30 in the other category.

The conditions are fulfilled in this case.

Note: It is possible to do a hypothesis test without having 5 of each category. But special adjustments need to be made.

2. Defining the Claims

We need to define a null hypothesis (\(H_{0}\)) and an alternative hypothesis (\(H_{1}\)) based on the claim we are checking.

The claim was:

" More than 20% of Nobel Prize winners were born in the US"

In this case, the parameter is the proportion of Nobel Prize winners born in the US (\(p\)).

The null and alternative hypothesis are then:

Null hypothesis : 20% of Nobel Prize winners were born in the US.

Alternative hypothesis : More than 20% of Nobel Prize winners were born in the US.

Which can be expressed with symbols as:

\(H_{0}\): \(p = 0.20 \)

\(H_{1}\): \(p > 0.20 \)

This is a 'right-tailed' test, because the alternative hypothesis claims that the proportion is greater than in the null hypothesis.

If the data supports the alternative hypothesis, we reject the null hypothesis and accept the alternative hypothesis.


3. Deciding the Significance Level

The significance level (\(\alpha\)) is the uncertainty we accept when rejecting the null hypothesis in a hypothesis test.

The significance level is a percentage probability of accidentally making the wrong conclusion.

Typical significance levels are:

  • \(\alpha = 0.1\) (10%)
  • \(\alpha = 0.05\) (5%)
  • \(\alpha = 0.01\) (1%)

A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis.

There is no "correct" significance level - it only states the uncertainty of the conclusion.

Note: A 5% significance level means that when we reject a null hypothesis:

We expect to reject a true null hypothesis 5 out of 100 times.

4. Calculating the Test Statistic

The test statistic is used to decide the outcome of the hypothesis test.

The test statistic is a standardized value calculated from the sample.

The formula for the test statistic (TS) of a population proportion is:

\(\displaystyle \frac{\hat{p} - p}{\sqrt{p(1-p)}} \cdot \sqrt{n} \)

\(\hat{p}-p\) is the difference between the sample proportion (\(\hat{p}\)) and the claimed population proportion (\(p\)).

\(n\) is the sample size.

In our example:

The claimed (\(H_{0}\)) population proportion (\(p\)) was \( 0.20 \)

The sample size (\(n\)) was \(40\)

So the test statistic (TS) is then:

\(\displaystyle \frac{0.25-0.20}{\sqrt{0.2(1-0.2)}} \cdot \sqrt{40} = \frac{0.05}{\sqrt{0.2(0.8)}} \cdot \sqrt{40} = \frac{0.05}{\sqrt{0.16}} \cdot \sqrt{40} \approx \frac{0.05}{0.4} \cdot 6.325 = \underline{0.791}\)

You can also calculate the test statistic using programming language functions:

With Python use the scipy and math libraries to calculate the test statistic for a proportion.

With R use the built-in prop.test() function to calculate the test statistic for a proportion.
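In Python, a minimal version of this calculation might look like the following sketch:

```python
# Test statistic for the Nobel Prize example: p_hat = 0.25, p = 0.20, n = 40.
from math import sqrt

p_hat, p, n = 0.25, 0.20, 40
ts = (p_hat - p) / sqrt(p * (1 - p)) * sqrt(n)
print(round(ts, 3))   # 0.791
```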

5. Concluding

There are two main approaches for making the conclusion of a hypothesis test:

  • The critical value approach compares the test statistic with the critical value of the significance level.
  • The P-value approach compares the P-value of the test statistic with the significance level.

Note: The two approaches are only different in how they present the conclusion.

The Critical Value Approach

For the critical value approach we need to find the critical value (CV) of the significance level (\(\alpha\)).

For a population proportion test, the critical value (CV) is a Z-value from a standard normal distribution .

This critical Z-value (CV) defines the rejection region for the test.

The rejection region is an area of probability in the tails of the standard normal distribution.

Because the claim is that the population proportion is more than 20%, the rejection region is in the right tail:

Choosing a significance level (\(\alpha\)) of 0.05, or 5%, we can find the critical Z-value from a Z-table , or with a programming language function:

Note: The functions find the Z-value for an area from the left side.

To find the Z-value for a right tail we need to use the function on the area to the left of the tail (1-0.05 = 0.95).

With Python use the Scipy Stats library norm.ppf() function to find the Z-value for an \(\alpha\) = 0.05 in the right tail.

With R use the built-in qnorm() function to find the Z-value for an \(\alpha\) = 0.05 in the right tail.
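A minimal sketch of the Python version:

```python
# Critical value for a right tail at alpha = 0.05: area to the left is 0.95.
from scipy.stats import norm

print(norm.ppf(1 - 0.05))   # ~ 1.6449
```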

Using either method we can find that the critical Z-value is \(\approx \underline{1.6449}\)

For a right tailed test we need to check if the test statistic (TS) is bigger than the critical value (CV).

If the test statistic is bigger than the critical value, the test statistic is in the rejection region .

When the test statistic is in the rejection region, we reject the null hypothesis (\(H_{0}\)).

Here, the test statistic (TS) was \(\approx \underline{0.791}\) and the critical value was \(\approx \underline{1.6449}\)


Since the test statistic was smaller than the critical value we do not reject the null hypothesis.

This means that the sample data does not support the alternative hypothesis.

And we can summarize the conclusion stating:

The sample data does not support the claim that "more than 20% of Nobel Prize winners were born in the US" at a 5% significance level .

The P-Value Approach

For the P-value approach we need to find the P-value of the test statistic (TS).

If the P-value is smaller than the significance level (\(\alpha\)), we reject the null hypothesis (\(H_{0}\)).

The test statistic was found to be \( \approx \underline{0.791} \)

For a population proportion test, the test statistic is a Z-Value from a standard normal distribution .

Because this is a right tailed test, we need to find the P-value of a Z-value bigger than 0.791.

We can find the P-value using a Z-table , or with a programming language function:

Note: The functions find the P-value (area) to the left side of Z-value.

To find the P-value for a right tail we need to subtract the left area from the total area: 1 - the output of the function.

With Python use the Scipy Stats library norm.cdf() function to find the P-value of a Z-value bigger than 0.791:

With R use the built-in pnorm() function to find the P-value of a Z-value bigger than 0.791:
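A minimal sketch of the Python version:

```python
# Right-tail P-value of the test statistic 0.791.
from scipy.stats import norm

print(1 - norm.cdf(0.791))   # ~ 0.2145
```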

Using either method we can find that the P-value is \(\approx \underline{0.2145}\)

This tells us that the significance level (\(\alpha\)) would need to be bigger than 0.2145, or 21.45%, to reject the null hypothesis.

This P-value is bigger than any of the common significance levels (10%, 5%, 1%).

So the null hypothesis is kept at all of these significance levels.

The sample data does not support the claim that "more than 20% of Nobel Prize winners were born in the US" at a 10%, 5%, or 1% significance level .

Note: It may still be true that the real population proportion is more than 20%.

But there was not strong enough evidence to support it with this sample.

Calculating a P-Value for a Hypothesis Test with Programming

Many programming languages can calculate the P-value to decide the outcome of a hypothesis test.

Using software and programming to calculate statistics is more common for bigger sets of data, as calculating manually becomes difficult.

The P-value calculated here will tell us the lowest possible significance level where the null-hypothesis can be rejected.

With Python use the scipy and math libraries to calculate the P-value for a right tailed hypothesis test for a proportion.

Here, the sample size is 40, the occurrences are 10, and the test is for a proportion bigger than 0.20.

With R use the built-in prop.test() function to find the P-value for a right tailed hypothesis test for a proportion.

Note: The conf.level in the R code is the reverse of the significance level.

Here, the significance level is 0.05, or 5%, so the conf.level is 1-0.05 = 0.95, or 95%.
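A minimal end-to-end sketch of the Python version, using the normal approximation described above:

```python
# Right-tailed P-value for p > 0.20 with 10 occurrences in a sample of 40.
from math import sqrt
from scipy.stats import norm

x, n, p = 10, 40, 0.20
p_hat = x / n
ts = (p_hat - p) / sqrt(p * (1 - p) / n)
print(1 - norm.cdf(ts))   # ~ 0.2145
```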

Left-Tailed and Two-Tailed Tests

This was an example of a right tailed test, where the alternative hypothesis claimed that parameter is bigger than the null hypothesis claim.

You can check out an equivalent step-by-step guide for other types here:

  • Left-Tailed Test
  • Two-Tailed Test


Section 10.2: Hypothesis Tests for a Population Proportion


By the end of this lesson, you will be able to...

  • explain the logic of hypothesis testing
  • test hypotheses about a population proportion
  • test hypotheses about a population proportion using the binomial probability distribution


The Logic of Hypothesis Testing

Once we have our null and alternative hypotheses chosen, and our sample data collected, how do we choose whether or not to reject the null hypothesis? In a nutshell, it's this:

If the observed results are unlikely assuming that the null hypothesis is true, we say the result is statistically significant , and we reject the null hypothesis. In other words, the observed results are so unusual, that our original assumption in the null hypothesis must not have been correct.

There are generally three different methods for testing hypotheses:

  • the classical approach
  • P -values
  • confidence intervals

Because P -values are so much more widely used, we will be focusing on this method. You will be required to include P -values for your homework and exams. We will also frequently look at both P -values and confidence intervals to make sure the two methods align.

In general, we define the P -value this way:

The P -value is the probability of observing a sample statistic as extreme or more extreme than the one observed in the sample assuming that the null hypothesis is true.

The Sample Proportion

In Section 8.2, we learned about the distribution of the sample proportion, so let's do a quick review of that now.

We also learned some information about how the sample proportion is distributed:

Sampling Distribution of the Sample Proportion

For a random sample of size n such that n≤0.05N (in other words, the sample is less than 5% of the population), the sampling distribution of the sample proportion is approximately normal, with mean p and standard deviation sqrt( p(1-p)/n ).

So what we do is create a test statistic based on our sample, and then use a table or technology to find the probability of what we observed. Here are the details.

Testing Claims Regarding the Population Proportion Using P -Values

In this first section, we assume we are testing some claim about the population proportion. As usual, the following two conditions must be true:

  • np(1-p)≥10, and
  • n≤0.05N (the sample is no more than 5% of the population).

Step 1 : State the null and alternative hypotheses.

Two-tailed test: H 0 : p = p 0 ,  H 1 : p ≠ p 0

Left-tailed test: H 0 : p = p 0 ,  H 1 : p < p 0

Right-tailed test: H 0 : p = p 0 ,  H 1 : p > p 0

Step 2: Decide on a level of significance, α, depending on the seriousness of making a Type I error. (α will often be given as part of a test or homework question, but this will not be the case in the outside world.)

Step 3: Compute the test statistic

z₀ = (p̂ - p₀) / √(p₀(1-p₀)/n)

Step 4: Determine the P-value.

Step 5: Reject the null hypothesis if the P-value is less than the level of significance, α.

Step 6: State the conclusion.
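
To make Steps 3-5 concrete, here is a minimal R sketch; the function name one_prop_ztest is ours, not a built-in, and it assumes the conditions above have already been checked:

one_prop_ztest <- function(x, n, p0, alternative = "two.sided") {
  phat <- x / n
  z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)      # Step 3: test statistic
  p.value <- switch(alternative,                  # Step 4: P-value
                    less      = pnorm(z),
                    greater   = 1 - pnorm(z),
                    two.sided = 2 * pnorm(-abs(z)))
  list(z = z, p.value = p.value)                  # Step 5: reject H0 if p.value < alpha
}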

Calculating P-Values

Right-Tailed Tests

In a right-tailed test, the P-value = P(Z > z₀).

Left-Tailed Tests

In a left-tailed test, the P-value = P(Z < z₀).

Two-Tailed Tests

In a two-tailed test, the P-value = 2P(Z > |z₀|).

It may seem odd to multiply the probability by two, since "or more extreme" seems to imply the area in the tail only. The reason we multiply by two is that even though the result was on one side, we didn't know, before collecting the data, on which side it would be.

The Strength of the Evidence

Since the P-value represents the probability of observing our result or more extreme, the smaller the P-value, the more unusual our observation was. Another way to look at it is this:

The smaller the P-value, the stronger the evidence supporting the alternative hypothesis. We can use the following guideline:

  • P-value < 0.01: very strong evidence supporting the alternative hypothesis
  • 0.01 ≤ P-value < 0.05: strong evidence supporting the alternative hypothesis
  • 0.05 ≤ P-value < 0.1: some evidence supporting the alternative hypothesis
  • P-value ≥ 0.1: weak to no evidence supporting the alternative hypothesis

These values are not hard lines, of course, but they can give us a general idea of the strength of the evidence.

But wait! There is an important caveat here, which was mentioned earlier in the section about The Controversy Regarding Hypothesis Testing. The problem is that it's relatively easy to get a small P-value - just collect a really large sample! So the chart above really comes with the caveat "assuming equal sample sizes in comparable studies, ..."

This isn't something every statistics text or every instructor will mention, but it's important.

According to the Elgin Community College website , approximately 56% of ECC students are female. Suppose we wonder if the same proportion is true for math courses. If we collect a sample of 200 ECC students enrolled in math courses and find that 105 of them are female, do we have enough evidence at the 10% level of significance to say that the proportion of math students who are female is different from the general population?

Note: Be sure to check that the conditions for performing the hypothesis test are met.

Before we begin, we need to make sure that our sample is less than 5% of the population, and that np₀(1-p₀) ≥ 10.

Since there are roughly 16,000 students at ECC (source: www.elgin.edu), our sample of 200 is clearly less than 5% of the population. Also, np₀(1-p₀) = 200(0.56)(1-0.56) = 49.28 > 10.

Step 1: H₀: p = 0.56, H₁: p ≠ 0.56

Step 2: α = 0.1

Step 3: z₀ = (0.525 - 0.56) / √(0.56(1-0.56)/200) ≈ -1.00

Step 4: P-value = 2•P(Z < -1.00) ≈ 0.3187 (Note that this is a two-tailed test.)

Step 5: Since P-value > α, we do not reject H₀.

Step 6: There is not enough evidence at the 10% level of significance to support the claim that the proportion of students in math courses who are female is different from the general population.
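
If you prefer to verify the numbers with software instead of a table, the same two-tailed P-value falls out of base R's pnorm():

z <- (105/200 - 0.56) / sqrt(0.56 * (1 - 0.56) / 200)   # about -1.00
2 * pnorm(z)                                            # about 0.3187, matching Step 4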

Hypothesis Testing Regarding p Using StatCrunch

In StatCrunch, choose the one-sample proportion test, enter the number of successes and the number of observations (or select the data column)*, specify the null proportion p₀ and the form of H₁, then click Compute.

* To get the counts, first create a frequency table. If you have a grouping variable, use a contingency table.

Consider the excerpt shown below (also used in Example 1 , in Section 9.3) from a poll conducted by Pew Research:

Stem cell, marijuana proposals lead in Mich. poll A recent poll shows voter support leading opposition for ballot proposals to loosen Michigan's restrictions on embryonic stem cell research and allow medical use of marijuana. The EPIC-MRA poll conducted for The Detroit News and television stations WXYZ, WILX, WOOD and WJRT found 50 percent of likely Michigan voters support the stem cell proposal , 32 percent against and 18 percent undecided. The telephone poll of 602 likely Michigan voters was conducted Sept. 22 through Wednesday. It has a margin of sampling error of plus or minus 4 percentage points. (Source: Associated Press )

Suppose we wonder if the percent of Elgin Community College students who support stem cell research is different from this. If 61 of 100 randomly selected ECC students support stem cell research, is there enough evidence at the 5% level of significance to support our claim?

Since there are roughly 16,000 students at ECC (source: www.elgin.edu), our sample of 100 is clearly less than 5% of the population. Also, np₀(1-p₀) = 100(0.50)(1-0.50) = 25 > 10.

Step 1: H₀: p = 0.5, H₁: p ≠ 0.5

Step 2: α = 0.05

Step 3: We'll use StatCrunch.

Step 4: Using StatCrunch, z₀ = (0.61 - 0.50) / √(0.50(1-0.50)/100) = 2.2, so the P-value = 2•P(Z > 2.2) ≈ 0.0278.

Step 5: Since P-value < α, we reject H₀.

Step 6: Based on this sample, there is enough evidence at the 5% level of significance to support the claim that the proportion of ECC students who support stem cell research is different from the Michigan poll.
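
As a quick cross-check, R's prop.test() with correct = FALSE reports the square of the z-statistic (X-squared = 4.84, i.e. z = 2.2) and the same two-tailed P-value:

prop.test(x = 61, n = 100, p = 0.50, alternative = "two.sided", correct = FALSE)
# p-value is about 0.028, which is less than 0.05, so H0 is rejected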

One question you might have is, "What do we do if the conditions for the hypothesis test about p aren't met?" Great question!

A Binomial Refresher

The binomial probability distribution function.

The probability of obtaining x successes in n independent trials of a binomial experiment, where the probability of success is p, is given by

P(x) = nCx · p^x · (1-p)^(n-x), where x = 0, 1, 2, ... , n

Using Technology to Calculate Binomial Probabilities

Here's a quick overview of the formulas for finding binomial probabilities in StatCrunch.

In StatCrunch, open the binomial distribution calculator, then enter n, p, the appropriate equality/inequality, and x. For example, with n = 4 and p = 0.25, the calculator gives P(X ≥ 3).
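
If you are working in R instead of StatCrunch, the built-in binomial functions give the same probabilities (dbinom() for a single value, pbinom() for cumulative probabilities):

dbinom(3, size = 4, prob = 0.25) + dbinom(4, size = 4, prob = 0.25)  # P(X = 3) + P(X = 4)
1 - pbinom(2, size = 4, prob = 0.25)                                 # same thing: P(X >= 3), about 0.0508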

Hypothesis Testing Using the Binomial Distribution

Traditionally, about 70% of students in a particular Statistics course at ECC are successful. If only 15 students in a class of 28 randomly selected students are successful, is there enough evidence at the 5% level of significance to say that students of that particular instructor are successful at a rate less than 70%?

Step 1: H₀: p = 0.7, H₁: p < 0.7

Step 2: α = 0.05

Step 4: If we let X = the number of students who were successful, X follows the binomial distribution. For this example, n = 28 and p = 0.70, and we want P(X ≤ 15). StatCrunch gives a P-value just under 0.05.

Step 5: Since P-value < α (though it's very close), we reject H₀.

Step 6: Based on this sample, there is enough evidence at the 5% level of significance to support the claim that the proportion of students who are successful in this professor's classes is less than 70%. (Keep in mind that this assumes the students were randomly assigned to that class, which is never the case in reality!)
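
For reference, the exact binomial P-value in this example is a one-liner in R as well (pbinom() returns P(X <= x)):

pbinom(15, size = 28, prob = 0.70)   # P(X <= 15), just under 0.05 as noted in Step 5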


Difference in Proportions Hypothesis Test Calculator

Use the calculator below to analyze the results of a difference in two proportions hypothesis test. Enter your sample proportions, sample sizes, hypothesized difference in proportions, test type, and significance level to calculate your results.

You will find a description of how to conduct a difference in proportion hypothesis test below the calculator.


Conducting a hypothesis test for the difference in proportions.

When two populations have rates you want to compare, you can do so by analyzing the difference between their proportions.

A hypothesis test for the difference in sample proportions can help you make inferences about the relationships between two population proportions.

Note: Difference in proportions hypothesis tests are commonly used in “A/B Tests” in which a researcher compares one rate to another. For example, a digital marketer might use an A/B test to compare a conversion rate from one web advertisement to another version of the same advertisement.

Testing for a Difference in Proportions

For the results of a hypothesis test to be valid, you should follow these steps:

  • Check Your Conditions
  • State Your Hypothesis
  • Determine Your Analysis Plan
  • Analyze Your Sample
  • Interpret Your Results

To use the testing procedure described below, you should check the following conditions:

  • Independent Samples – Your samples should be independent of each other.
  • Binary Outcomes - When conducting a hypothesis test for the difference in two proportions, each sample point from each sample should consist of only one of two outcomes. We often label one outcome a “success” and the other a “failure,” but it does not matter which of the two outcomes gets which label.
  • Success-Failure Rate - Each sample size should be large enough that you see at least 10 “successes” and 10 “failures” in each sample. For example, if one of your sample proportions has a 20% or 0.2 “success” rate, then you would need a sample size of at least 50 [10 = 50 × 20%] to meet this condition. This condition helps ensure that the sampling distributions from which you collect your samples reasonably follow the Normal Distribution.
  • Simple Random Sampling - You should collect your samples with simple random sampling. This type of sampling requires that every occurrence of a category or event in a population has an equal chance of being selected when taking a sample.
  • Sample-to-Population Ratio - For each sample, the population should be much larger than the sample you collect. As a rule-of-thumb, a sample size should represent no more than 5% of its population.

You must state a null hypothesis and an alternative hypothesis to conduct a hypothesis test.

The null hypothesis is a skeptical claim that you would like to test.

The alternative hypothesis represents the alternative claim to the null hypothesis.

Your null hypothesis and alternative hypothesis should be stated in one of three mutually exclusive ways listed in the table below.

  • H₀: P₁ - P₂ = D versus Hₐ: P₁ - P₂ ≠ D (two tails) - Tests whether the sample proportions come from populations with a difference in proportions equal to D. If D = 0, then tests if the samples come from populations that are different from each other.
  • H₀: P₁ - P₂ ≤ D versus Hₐ: P₁ - P₂ > D (one tail) - Tests whether sample one comes from a population with a proportion that is greater than sample two's population proportion by a difference of D. If D = 0, then tests if sample one comes from a population with a proportion greater than sample two's population proportion.
  • H₀: P₁ - P₂ ≥ D versus Hₐ: P₁ - P₂ < D (one tail) - Tests whether sample one comes from a population with a proportion that is less than sample two's population proportion by a difference of D. If D = 0, then tests if sample one comes from a population with a proportion less than sample two's population proportion.

D is the hypothesized difference between the populations' proportions that you would like to test.

Before conducting a hypothesis test, you must determine a reasonable significance level, α, or the probability of rejecting the null hypothesis assuming it is true. The lower your significance level, the more confident you can be of the conclusion of your hypothesis test. Common significance levels are 10%, 5%, and 1%.

To evaluate your hypothesis test at the significance level that you set, consider if you are conducting a one or two tail test:

  • Two-tail tests divide the rejection region, or critical region, evenly above and below the null distribution, i.e. to the tails of the null sampling distribution. For example, in a two-tail test with a 5% significance level, your rejection region would be the upper and lower 2.5% of the null distribution. An alternative hypothesis of P₁ - P₂ ≠ D requires a two-tail test.
  • One-tail tests place the rejection region entirely on one side of the distribution, i.e. to the right or left tail of the null distribution. For example, in a one-tail test evaluating if the actual population proportion difference is above D with a 5% significance level, your rejection region would be the upper 5% of the null distribution. P₁ - P₂ < D and P₁ - P₂ > D alternative hypotheses require one-tail tests.

The graphical results section of the calculator above shades rejection regions blue.

After checking your conditions, stating your hypothesis, determining your significance level, and collecting your sample, you are ready to analyze your hypothesis.

Under the null hypothesis, the difference in sample proportions follows the Normal Distribution with the following parameters (i.e. numbers that define the distribution):

  • The Difference in the Population Proportions, D - The true difference in the proportions is unknown, but we use the hypothesized difference in the proportions, D, from the null hypothesis in the calculations.
  • The Standard Error, SE - The standard error of the difference in the sample proportions can be computed as follows: SE = (p₁(1 - p₁)/n₁ + p₂(1 - p₂)/n₂)^(1/2), where n₁ and n₂ are the sample sizes. It defines how differences in sample proportions are expected to vary around the null difference in proportions sampling distribution given the sample sizes and under the assumption that the null hypothesis is true.

In a difference in proportions hypothesis test, we calculate the probability that we would observe the difference in sample proportions (p₁ - p₂), assuming the null hypothesis is true, also known as the p-value. If the p-value is less than the significance level, then we can reject the null hypothesis.

You can determine a precise p-value using the calculator above, but we can find an estimate of the p-value manually by calculating the z-score as follows: z = (p₁ - p₂ - D) / SE

The z-score is a test statistic that tells us how far our observation is from the difference in proportions given by the null hypothesis under the null distribution. Using any z-score table, we can look up the probability of observing the results under the null distribution. You will need to look up the z-score for the type of test you are conducting, i.e. one or two tail. A hypothesis test for the difference in two proportions is sometimes known as a two proportion z-test because of the use of a z-score in analyzing results.
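
A minimal R sketch of this calculation, using the unpooled standard error defined above (the function name diff_prop_ztest is ours, and the sketch assumes a two-tail test):

diff_prop_ztest <- function(p1, p2, n1, n2, D = 0) {
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # standard error, SE
  z  <- (p1 - p2 - D) / se                              # z-score
  p.value <- 2 * pnorm(-abs(z))                         # two-tail p-value
  list(z = z, p.value = p.value)
}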

The conclusion of a hypothesis test for the difference in proportions is always either:

  • Reject the null hypothesis
  • Do not reject the null hypothesis

If you reject the null hypothesis, you cannot say that your sample difference in proportions is the true difference between the populations. If you do not reject the null hypothesis, you cannot say that the null hypothesis is true.

A hypothesis test is simply a way to look at evidence and conclude if it provides sufficient evidence to reject the null hypothesis.

Example: A/B Test (Hypothesis Test for the Difference in Two Proportions)

Let’s say you are in charge of email marketing for a clothing brand. Your goal is to sell clothes online, and to sell clothes online, you have to get your email recipients to open your emails.

As part of a new email campaign, you have written two versions of an email subject line: an A version and a B version. But you do not know which one will be more effective.

So, you decide to run an “A/B Test” of your subject lines using a difference in proportions hypothesis test to analyze your results. Your goal is to see if either subject line will have a higher "open" rate.

Your email database consists of 100,000 contacts, and you decide to run the test on 5,000 of them with 50% of the sample group receiving subject line A and 50% receiving subject line B. Let’s go through the steps you would take to run the test.

  • Check the conditions - Your test consists of binary outcomes (i.e. open and no open), your sample sizes are large enough to meet the success-failure condition but not too large to violate the sample-to-population ratio condition, and you collect your samples using simple random sampling .
  • State Your Hypothesis - Your null hypothesis is that the email subject lines are the same (i.e. P₁ - P₂ = 0) and your alternative hypothesis is that they are not the same (i.e. P₁ - P₂ ≠ 0).
  • Determine Your Analysis Plan - You believe that a 5% significance level is reasonable. As your test is two-tail test, you will evaluate if the difference in open rates between the samples would occur at the upper or lower 2.5% [2.5% = 5%/2] of the null distribution.
  • Analyze Your Sample - After collecting your samples (which you do after steps 1-3), you find that subject line A had a sample open rate, p₁, of 20%. Subject line B has a sample open rate, p₂, of 17%. Using the calculator above, you find that a difference in sample proportions of 3% [3% = 20% - 17%] would result in a z-score of 2.73 under the null distribution, which translates to a p-value of 0.63%.
  • Interpret Your Results - Since your p-value of 0.63% is less than the significance level of 5%, you have sufficient evidence to reject the null hypothesis.

In this example, you found that you can reject your original claim that the subject lines have the same performance. The test does not guarantee that your subject line A has a higher open rate than subject line B, but it does give you strong reason to favor subject line A.
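
Plugging the A/B test numbers into the diff_prop_ztest sketch from earlier (2,500 recipients per subject line) reproduces the result:

diff_prop_ztest(p1 = 0.20, p2 = 0.17, n1 = 2500, n2 = 2500)
# z is about 2.73 and the p-value about 0.0063 (0.63%), matching the example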


Test Statistic Calculator

Choose the method, enter the values into the test statistic calculator, and click on the “Calculate” button to calculate the statistical value for hypothesis evaluation.


This test statistic calculator helps to find the test statistic value for hypothesis testing. The calculated test value shows whether there's enough evidence to reject a null hypothesis. The calculator handles one population mean, a comparison of two means, a single population proportion, and two population proportions.

Our tool is highly useful in various fields like research, experimentation, quality control, and data analysis.

What is Test Statistics?

A test statistic is a numerical value obtained from the sample data set. It summarizes the differences between what you observe within your sample and what would be expected if a hypothesis were true. 

The test statistic also shows how closely your data match the distribution expected under the hypothesis being tested.

How to Calculate a Test Statistic Value?

  • Collect the sample data from the population
  • Use the data to find the standard deviation
  • Calculate the sample mean (x̄) from the data
  • Determine the hypothesized value and the sample size
  • Use the suitable test statistic formula and get the result

Test Statistic For One Population Mean:

The test statistic for a single population mean is used when the variable is numeric and involves one population or group.

\(z = \dfrac{\bar{x} - \mu_{0}}{\sigma / \sqrt{n}}\)

  • x̄ = Mean of your sample data
  • µ 0 = Hypothesized population mean that you are comparing to your sample mean
  • σ = Population standard deviation
  • n = number of observations (sample size) in your data set

Suppose we want to test if the average height of adult males in a city is 70 inches. We take a sample of 25 adult males and find the sample mean height to be 71 inches with a sample standard deviation of 3 inches. We use a significance level of 0.05.

t = (71 - 70) / (3 / √25) = 1 / 0.6 ≈ 1.67
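
A quick R check of this example, adding the decision step (qt() gives the two-sided critical value for df = n - 1 = 24):

(71 - 70) / (3 / sqrt(25))   # t is about 1.67
qt(0.975, df = 24)           # critical value is about 2.06, so H0 is not rejected at the 0.05 level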

Test Statistic Comparing Two Population Means:

This test is applied when the numeric value is compared across the various populations or groups. To compute the resulting t statistic, two distinct random samples must be chosen, one from each population.

\(\frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\)

  • x̄ and ȳ = means of the two samples

Suppose we want to test if there is a difference in average test scores between two schools. We take a sample of 30 students from school A with an average score of 85 and a standard deviation of 5, and a sample of 35 students from school B with an average score of 82 and a standard deviation of 6.

t = (85 - 82) / √(5²/30 + 6²/35)

t = 3 / √(0.833 + 1.029)

t = 3 / √1.862

t = 3 / 1.365 ≈ 2.20

Test Statistic For a Single Population Proportion:

This test is used to determine if a single population's proportion differs from a specified standard. The calculator constrains the hypothesized proportion P₀ to lie between 0 and 1, because proportions represent parts of a whole and cannot logically exceed the total or be negative.

\(\frac{\hat{p}-p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}}\)

  • P̂ = Sample proportion
  • P₀ = Hypothesized population proportion

Suppose we want to test if the proportion of left-handed people in a population is 10%. We take a sample of 100 people and find that 8 are left-handed. We use a significance level of 0.05.

z = (P̂ - P₀) / √(P₀(1 - P₀)/n)

z = (0.08 - 0.10) / √(0.10 × (1 - 0.10) / 100)

z = -0.02 / √0.0009

z = -0.02 / 0.03

z ≈ -0.67

Test Statistic For Two Population Proportion:

This test identifies the difference in proportions between two independent groups to assess their significance.

\(\frac{\hat{p}_{1}-\hat{p}_{2}}{\sqrt{\hat{p}(1-\hat{p})(\frac{1}{n_{1}}+\frac{1}{n_{2}})}}\)

  • P̂₁ and P̂₂ = Sample proportions for the two groups
  • P̂ = Pooled proportion, (x₁ + x₂) / (n₁ + n₂)

Suppose we want to test if the proportion of smokers is different between two cities. We take a sample of 150 people from City A and find that 30 are smokers, and a sample of 200 people from City B and find that 50 are smokers.

  • P̂ 1 = 30 / 150 = 0.20
  • P̂ 2 = 50 / 200 = 0.25
  • P̂ = (30 + 50) / (150 + 200) = 80/350 ≈ 0.229

Calculation:

z = (0.20 - 0.25) / √(0.229 × (1 - 0.229) × (1/150 + 1/200))

z = -0.05 / √(0.229 × 0.771 × 0.0117)

z = -0.05 / √0.00206

z = -0.05 / 0.0454

z ≈ -1.10
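
The arithmetic is easy to verify in R:

p_pool <- (30 + 50) / (150 + 200)                     # about 0.229
se <- sqrt(p_pool * (1 - p_pool) * (1/150 + 1/200))   # about 0.0454
(0.20 - 0.25) / se                                    # z is about -1.10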


This calculator runs a two sample independent proportions test for given sample data and specified null and alternative hypotheses. Enter the data in the fields below. For each sample enter the total number of scores (\(n_1\) and \(n_2\)) and the number of scores that have the trait of interest (\(f_1\) and \(f_2\)).

Enter a value for the null hypothesis. This value should indicate the absence of an effect in your data. It must be between the values 0 and 1. Indicate whether your alternative hypothesis involves one-tail or two-tails. If it is a one-tailed test, then you need to indicate whether it is a positive (right tail) test or a negative (left tail) test.

Enter an \(\alpha\) value for the hypothesis test. This is the Type I error rate for your hypothesis test. It also determines the confidence level \(100 \times (1-\alpha)\) for a confidence interval. The confidence interval is based on the normal distribution, which is an approximation.

Press the Run Test button and a table summarizing the computations and conclusions will appear below.

Specify hypotheses:
\(H_0: P_1 - P_2=\)
\(H_a:\)
\(\alpha=\)
Test summary
Null hypothesis \(H_0: P_1 - P_2=\)
Alternative hypothesis \(H_a: P_1 - P_2 \)
Type I error rate \(\alpha=\)
Sample size for group 1 \(n_1=\)
Sample size for group 2 \(n_2=\)
Sample proportion for group 1 \(p_1=\)
Sample proportion for group 2 \(p_2=\)
Pooled proportion \(p=\)
Standard error \(s_{p_1 - p_2}=\)
Test statistic \(z=\)
\(p\) value \(p=\)
Decision
Confidence interval critical value \(z_{cv}=\)
Confidence interval CI =
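
As a rough R sketch of what the summary table reports, assuming a pooled proportion for the test statistic and an unpooled normal-approximation confidence interval (the calculator's exact conventions may differ slightly):

two_prop_summary <- function(f1, n1, f2, n2, null_diff = 0, alpha = 0.05) {
  p1 <- f1 / n1                                  # sample proportion, group 1
  p2 <- f2 / n2                                  # sample proportion, group 2
  p  <- (f1 + f2) / (n1 + n2)                    # pooled proportion
  se <- sqrt(p * (1 - p) * (1/n1 + 1/n2))        # standard error under H0
  z  <- (p1 - p2 - null_diff) / se               # test statistic
  p.value <- 2 * pnorm(-abs(z))                  # two-tailed p value
  z.cv <- qnorm(1 - alpha / 2)                   # confidence interval critical value
  ci <- (p1 - p2) + c(-1, 1) * z.cv *
    sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  list(p1 = p1, p2 = p2, pooled = p, z = z, p.value = p.value, ci = ci)
}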

2   Confidence Intervals and Sample Size

Using the result of confidence intervals from the last lesson, this lesson starts with a discussion on selecting sample size for estimating the population mean as well as the population total by a confidence interval with a specified margin of error and specified level of confidence. In the second section, the confidence interval for estimating a population proportion is discussed. In the last section, sample sizes needed for estimating a population proportion are discussed. Both the educated guess and conservative methods are introduced.

Lesson 2: Ch. 4.1-4.2, 5.1-5.3, of Sampling by Steven Thompson, 3rd Edition.

Upon completion of this lesson, you should be able to:

  • Calculate the sample size needed for estimating population mean and population total,
  • Compute the confidence interval for population proportion,
  • Given a desired level of confidence for estimating a population proportion, determine the sample size required using both the educated guess method and conservative method, and critically evaluate the advantages and disadvantages of both approaches,
  • Determine the necessary sample size to estimate the population proportion, and
  • Choose between using the educated guess method and the conservative method.

2.1 Sample Size for Estimating Population Mean and Total

How large is a sample size that is large enough for estimating the population mean?

If \(\hat{\theta}\) is an unbiased, normally distributed estimator of \(\theta\) :

\[\dfrac{\hat{\theta}-\theta}{\sqrt{\operatorname{Var}(\hat{\theta})}} \sim N(0,1)\]

\[ P\left(\dfrac{|\hat{\theta}-\theta|}{\sqrt{\operatorname{Var}(\hat{\theta})}} > z_{\alpha/2} \right)=\alpha\]

\[P\left(|\hat{\theta}-\theta|>z_{\alpha/2} \cdot \sqrt{\operatorname{Var}(\hat{\theta})} \right)= \alpha\]

Note! Because we know that \(\hat{\theta}\) is normal, we can thus use the \(z\) distribution.

And, if we specify this \(\alpha\) we can then try to find out the sample size large enough to achieve the goal of your experiment.

So, we need to ask, “What is the goal of your experiment?” This is perhaps the most important question asked as a part of your experiment.

Example: What if we were interested in estimating the average weight of Penn State male students? How many samples should we plan on taking? We want to estimate this mean. What do we need to consider?

  • The variability of the data and the measure that you are estimating is your first concern. This directly affects how many samples you will need.
  • The second thing that you need to think about is the type of conclusion that you would like to report. That is, you need to specify the \(1 - \alpha\) value that you are happy with.
  • How accurate (precision) do you want this estimate to be? You thus need to specify the margin of error.

Now, if we specify \(1-\alpha\) , the margin of error \(d\) (also can be viewed as the half-width of the \((1 - \alpha)\) 100% CI), we can solve for the sample size such that the CI has the specified margin of error.

For estimating the population mean , the equation becomes:

\[P\left(|\bar{x}-\mu|>z_{\alpha/2} \cdot \sqrt{\dfrac{N-n}{N}\cdot \dfrac{\sigma^2}{n}}\right)=\alpha\]

\[z_{\alpha/2}\sqrt{\dfrac{N-n}{N}\cdot \dfrac{\sigma^2}{n}}=d\]

\[n=\dfrac{1}{\dfrac{ d^2}{ z^2_{\alpha/2}\cdot \sigma^2}+\dfrac{1}{N}}\]

Can we now use this formula to estimate the sample size?

What is the weak point of this formula? The weak point is the estimate of the population variance used. We do not know what this is!

Similarly, for estimating the population total \(\tau\) , here is the formula:

\[P\left(|\hat{\tau}-\tau|>z_{\alpha/2} \cdot \sqrt{N(N-n)\dfrac{\sigma^2}{n}} \right)=\alpha\]

\[z_{\alpha/2}\sqrt{N(N-n)\dfrac{\sigma^2}{n}}=d\]

\[n=\dfrac{1}{\dfrac{ d^2}{ N^2 \cdot z^2_{\alpha/2}\cdot \sigma^2}+\dfrac{1}{N}}\]

Example 2.1 (Beetles: Sample Size) What sample size is needed to estimate the population total, \(\tau\) , to within \(d = 1000\) with a 95% CI?

Now, let’s begin plugging what we know into the formula. We know \(N = 100\) , \(\alpha = 0.05\) . Do we know \(\sigma^2\) ? No, but we can estimate \(\sigma^2\) by \(s^2 = 1932.657\) .

How many should we sample? Let’s calculate this out:

\[n=\dfrac{1}{\dfrac{ (1000)^2}{ (100)^2 \cdot (1.96)^2 \cdot 1932.657}+\dfrac{1}{100}}=42.610\]

We will always round this up, therefore, we will sample 43 of the 100 plots.

Note! If we ignore the finite population correction adjustment then,

\[\begin{align} n &= \dfrac{N^2 \cdot z^2_{\alpha/2} \cdot \sigma^2}{d^2} \\ &= \dfrac{(100)^2 \cdot (1.96)^2 \cdot 1932.657}{(1000)^2} \\ &= 74.245 \end{align}\]

which rounds up to 75. This value is much larger than 43.

What is the major point that was just illustrated in the previous example?

In this first example, \(N = 100\) is not very large compared to \(n\) , so one should not ignore the finite population adjustment!

But wait a minute, should you have any cause for concern about the answer, \(n = 43\) , that we obtained?

What about the value that we used for \(\sigma^2\) , (1932.657)?

Note! \(\sigma^2\) is not 1932.657. 1932.657 is the sample variance. Using \(z\) in the formula may be too aggressive. Sometimes people use \(t\) iteratively.

Let’s take a look at this more iterative method:

\[n=\dfrac{1}{\dfrac{d^2}{N^2\cdot t^2 \cdot s^2}+\dfrac{1}{N}}\]

Complication: \(t\) values depend on \(n\) . First we will use \(n = 43\) , and the \(t\) for \(df = 42\) is 2.0181.

\[n=\dfrac{1}{\dfrac{(1000)^2}{(100)^2\cdot (2.0181)^2 \cdot 1932.657}+\dfrac{1}{100}}=44.044\]

Round up to 45, \(t\) for 44 \(df\) is 2.0154.

\[n=\dfrac{1}{\dfrac{(1000)^2}{(100)^2\cdot (2.0154)^2 \cdot 1932.657}+\dfrac{1}{100}}=43.978\]

Here, we get \(n = 44\) . So, we see that the conservative answer is to take \(n = 45\) .

Consequently, our final answer will be to take 45 samples.
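
The whole beetle calculation fits in a few lines of R; qnorm() and qt() supply the z and t multipliers used above:

N <- 100; d <- 1000; s2 <- 1932.657
z <- qnorm(0.975)                                # 1.96
1 / (d^2 / (N^2 * z^2 * s2) + 1/N)               # about 42.6, round up to 43 (with FPC)
N^2 * z^2 * s2 / d^2                             # about 74.2, round up to 75 (FPC ignored)
# Iterating with t in place of z:
1 / (d^2 / (N^2 * qt(0.975, 42)^2 * s2) + 1/N)   # about 44.0, round up to 45
1 / (d^2 / (N^2 * qt(0.975, 44)^2 * s2) + 1/N)   # about 43.98, so take n = 45 to be safe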

In the beetle example, there are data to estimate \(\sigma^2\) . What can one do if there is no pilot data? How can we get a rough idea about what \(\sigma\) is? How is this possible? How do we do this?

Example 2.2 (Average Weight Gain of Pigs) A farm has 1000 young pigs with an initial weight of about 50 lbs. They put them on a new diet for 3 weeks and want to know how many pigs to sample so that they can estimate the average weight gain. They want the answer to be within 2 lbs with 90% confidence.

There is no pilot data here. We don’t have the time to select some pigs in order to get an estimate for \(\sigma\) , the standard deviation of the weight gain.

Question: How do we get a rough estimate of \(\sigma\) ?

What would be a reasonable measure that would help this farmer to give him some guidance on how to estimate the standard deviation of the weight gain?

One thing we can do is rely on the information that we already have, i.e., find some historical data that exists on this topic. But what if this historical data does not exist?

Could we find a rough estimate for \(\sigma\) ?

For certain variables, we can make reasonable guesses for an estimate of \(\sigma\) . Here is a formula for this rough estimate:

\[\sigma \approx \frac{\text{Range}}{4}\]

The range is relatively easy to have some idea about. This is an important point. Even though perhaps none of us has raised pigs we can still come up with a sensible guess. So, for this case, we will make a sensible guess of the range of weight gain and intuitively estimate this to be from a minimum of 10 lbs to a maximum of 50 lbs within this 3-week period.

\(\sigma\) can now be roughly estimated to be:

\[\dfrac{\text{Range}}{4}=\dfrac{50-10}{4}=10 \text{ lbs}\]

Now we can use the formula for estimating the mean, \(\mu\) . Then,

\[\begin{align} n &= \dfrac{1}{\dfrac{ d^2}{ z^2_{\alpha/2}\cdot \sigma^2}+\dfrac{1}{N}} \\ &= \dfrac{1}{\dfrac{ 2^2}{ (1.645)^2 \cdot (10)^2}+\dfrac{1}{1000}} \\ &= 63.36 \end{align}\]

Round up to 64.

We will need to sample 64 pigs in order to estimate the average weight gain in 3 weeks to within 2 lbs with a 90% confidence interval.
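
The same arithmetic in R (qnorm(0.95) is the 1.645 multiplier for 90% confidence):

sigma <- (50 - 10) / 4                   # range/4 guess: 10 lbs
z <- qnorm(0.95)                         # about 1.645
1 / (2^2 / (z^2 * sigma^2) + 1/1000)     # about 63.4, round up to 64 pigs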

2.2 Confidence Intervals for Population Proportion

Estimating proportions.

Estimating population proportions can be seen as a particular case of estimating the population mean. Many things that belong to the problems associated with the mean problem can be borrowed and used when working with proportions…

We want to estimate the proportion of units in the population having some attribute. For example, a question might be, “What would be the proportion of Penn State students who are smokers?” Another example is, “What would be the proportion of people preferring a type of presentation?”

The Gallup Poll: Most are based on telephone interviews, with a significant portion based on interviews conducted in person during home visits. Usually, the sample size is at least 1000, sometimes even 1500.

Here are a number of interesting websites associated with estimating proportions:

  • Gallup Poll
  • President Bush’s Final Approval Rating (From CBS News)

Let’s see in what ways the proportion problem is related to the mean problem…

Do you approve of President Bush’s job performance?

\[y_{i} = \begin{cases} 0 & \text{no} \\ 1 & \text{yes} \end{cases}\]

The population unit: \(1, 2, \dots, N\) .

The variable of interest: \(y_1,y_2, \dots,y_{N}\) .

Population proportion: \(p=\dfrac{1}{N} \sum\limits_{i=1}^N y_i\) , which is the population mean, \(\mu\) .

If we take a simple random sample of size \(n\) , then

\[\hat{p}= \sum\limits_{i=1}^n \dfrac{y_i}{n}=\bar{y}\]

This specific definition of \(y_i\) makes it have a variance that is related to its mean.

To find the finite population variance for \(y_1,y_2, \dots,y_{N}\) , we know that the population mean is:

\[\mu=\dfrac{1}{N} \sum\limits_{i=1}^N y_i =p\]

By definition the variance is then:

\[\begin{align} \sigma^2 &= \dfrac{\sum\limits_{i=1}^{N}(y_i-p)^2}{N-1} \\ &= \dfrac{\sum\limits_{i=1}^{N}(y_i^2-2py_i+p^2)}{N-1} \\ &= \dfrac{\sum\limits_{i=1}^{N}y_i^2-2p\sum\limits_{i=1}^N y_i+Np^2}{N-1} \end{align}\]

Then, since \(y^2_i = y_i\) :

\[\begin{align} &= \dfrac{\sum\limits_{i=1}^{N}y_i-2p\sum\limits_{i=1}^N y_i+Np^2}{N-1} \\ &= \dfrac{Np-2p(Np)+Np^2}{N-1} \\ \sigma^2 &= \dfrac{Np-Np^2}{N-1}=\dfrac{Np(1-p)}{N-1} \end{align}\]

Theoretically, this is the variance.

How will we estimate this? We can estimate this by:

\[\hat{\sigma}^2=s^2=\dfrac{n}{n-1}\hat{p}\cdot (1-\hat{p})\]

What we want is to see how \(\hat{p}\) behaves, therefore, we want to know its distribution. First, we find its mean, then its variance.

Since \(\hat{p}\) is \(\bar{y}\) , we can get \(E(\hat{p})=\mu=p\) . Then, we proceed to find its variance.

\[\begin{align} \operatorname{Var}(\hat{p}) &= \left(1-\dfrac{n}{N}\right)\cdot \dfrac{\sigma^2}{n} \\ &= \left(\dfrac{N-n}{N}\right)\cdot \dfrac{N \cdot p \cdot (1-p)}{(N-1)\cdot n} \\ &= \left(\dfrac{N-n}{N-1}\right)\cdot \dfrac{p \cdot (1-p)}{n} \\ \end{align}\]

How will we estimate the variance of \(\hat{p}\) ? There are many answers for how to do this. One method would be to use maximum likelihood, another would be to find the unbiased estimator.

An unbiased estimator of the variance is:

\[\hat{\operatorname{Var}}(\hat{p})=\left(\dfrac{N-n}{N}\right) \cdot \dfrac{\hat{p} \cdot (1-\hat{p})}{n-1}\]

This is one reasonable answer for determining an estimate of the variance. The answer will not be very different from what one would get using other methods.

What about confidence intervals? For this, we need to know the distribution of \(\hat{p}\) . When the sample size is large we know that \(\hat{p}\) has a normal distribution by the Central Limit Theorem. Therefore, we can use the \(t\) interval:

\[\text{Answer: } \hat{p} \pm t_{\alpha/2} \sqrt{\hat{\operatorname{Var}}(\hat{p})}\]

How large is large enough?

\[\text{Answer: if } n \cdot \hat{p}\geq 5, n \cdot (1-\hat{p})\geq 5.\]

We have fairly precise criteria here for whether or not to use \(t\) when constructing the confidence interval.

Example 2.3 (Presidential Approval Rating) Let’s revisit the previous example about President Bush’s final approval rating.

From CBS News (Jan 21, 2009) from the web site: President Bush’s Final Approval Rating

President Bush’s final approval rating is 22%!

If you read the website you can learn a lot about the specifics of this poll. The poll was conducted by telephone interview with 1,112 adults nationwide.

After looking at this statistic, provide a 95% CI for the true proportion. The 22% is a sample proportion - what is the true population proportion?

\[\hat{\operatorname{Var}}(\hat{p})=\left(\dfrac{N-n}{N}\right) \cdot \dfrac{\hat{p}\cdot (1-\hat{p})}{n-1}=1\cdot \dfrac{0.22 \times 0.78}{1112-1}=0.0001545\]

And a 95% confidence interval for \(p\) is:

\[0.22 \pm 1.96 \sqrt{0.0001545}\]

\[=0.22 \pm 0.0244\]
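
In R the interval takes two lines (the finite population correction is essentially 1 here, since 1,112 adults is a tiny fraction of the population):

v <- 0.22 * (1 - 0.22) / (1112 - 1)   # estimated variance, about 0.0001545
0.22 + c(-1, 1) * 1.96 * sqrt(v)      # 0.22 plus or minus 0.0244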

2.3 Sample Size Needed for Estimating Proportion

Using the formula to find the sample size for estimating the mean, we have:

\[n=\dfrac{1}{\dfrac{d^2}{z^2_{\alpha/2}\cdot \sigma^2}+\dfrac{1}{N}}\]

Now, \(\sigma^2=\dfrac{N}{N-1}\cdot p \cdot (1-p)\) substitutes in and we get:

\[n=\dfrac{N \cdot p \cdot (1-p)}{(N-1)\dfrac{d^2}{z^2_{\alpha/2}}+p\cdot(1-p)}\]

When the finite population correction can be ignored, the formula is:

\[n\approx \dfrac{z^2_{\alpha/2}\cdot p \cdot (1-p)}{d^2}\]

Now, for finding sample sizes for proportion, in addition to using an educated guess to estimate \(p\) , we can also find a conservative sample size that can guarantee the margin of error is short enough at a specified \(\alpha\) .

Educated guess (estimate \(p\) by \(\hat{p}\) ):

\[n=\dfrac{N\cdot\hat{p}\cdot(1-\hat{p})}{(N-1)\dfrac{d^2}{z^2_{\alpha/2}}+\hat{p}\cdot(1-\hat{p})}\]

Note, \(\hat{p}\) may be different from the true proportion. The sample size may not be large enough for some cases, (i.e., the margin of error is not as small as specified).

Conservative sample size:

Since \(p(1 - p)\) attains maximum at \(p = 1/2\) , a conservative estimate for sample size is:

\[n=\dfrac{N\cdot 1/4}{(N-1)\dfrac{d^2}{z^2_{\alpha/2}}+1/4}\]

Example 2.4 (Presidential Approval Rating: Sample Size) To estimate the next president’s final approval rating, how many people should be sampled so that the margin of error is 3%, (a popular choice), with 95% confidence?

Use an educated guess: Bush’s = 0.22

Since \(N\) is very large compared to \(n\) , finite population correction is not needed.

\[\begin{align} n &=\dfrac{\hat{p}\cdot(1-\hat{p})\cdot z^2_{\alpha/2}}{d^2}\\ &=\dfrac{0.22\cdot0.78\cdot1.96^2}{0.03^2}\\ &=732.47\\ \end{align}\]

round up to 733.

Use a conservative approach.

\[\begin{align} n &=\dfrac{0.5\cdot0.5\cdot1.96^2}{0.03^2}\\ &=1067.11\\ \end{align}\]

round up to 1068.
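
Both sample sizes are one-liners in R:

z <- qnorm(0.975)             # 1.96
0.22 * 0.78 * z^2 / 0.03^2    # educated guess: about 732.5, round up to 733
0.25 * z^2 / 0.03^2           # conservative: about 1067.1, round up to 1068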

How do we choose between the educated guess or the conservative approach?

One should weigh the cost of sampling extra units against the cost of setting up the sampling process a second time. If the set-up cost of repeating the sampling procedure (which may be needed if an educated guess turns out to be too small) is high compared to the cost of sampling extra units, then one will prefer the conservative approach.

Find the proportion of CD players in this shipment that have a lifetime longer than 2000 hours. The proportion from the last shipment was 0.9. It is not costly to set up the testing procedure again if needed whereas the sampling cost of each unit is expensive. We want to estimate the proportion to be within 0.01 with 95% confidence. Would you use the educated guess or the conservative approach?

We should use an educated guess because it is not costly to set up the testing procedure again. On the other hand, the cost of the sampling of extra units is high due to the nature of the test.

Get a ship out to the Bering Sea to sample the proportion of fish that have mercury levels within a specified level. Last year the proportion was 0.9. We want to estimate the proportion to be within 0.01 with 95% confidence. Would you use the educated guess or the conservative approach?

We should use a conservative approach because it is too expensive to send a ship out again if needed.

Exact intervals for population proportions

Since \(Y_i\) is defined as 1 or 0 depending on whether the unit has the attribute or not and the sampling is without replacement, one can see that to be exact, \(\sum Y_i\) has a hypergeometric distribution.

Using this property, one can obtain an exact confidence interval for \(p\). When the total number of successes and the total number of failures are both large (larger than 5), we can use the \(t\)-interval. (We can use the \(z\)-interval if \(n > 50\).)

Sample size for estimating several proportions simultaneously

It is good to know that there is a solution to the following scenario:

There are a few (maybe unknown) classes and one wants to collect enough samples so that the proportion in each class can be estimated within a certain prescribed precision. (Details not needed, if interested, read Ch. 5.4 and the reference cited there.)

  • Open access
  • Published: 17 August 2024

Prediction-error signals in anterior cingulate cortex drive task-switching

  • Nicholas Cole 1 ,
  • Matthew Harvey 1 ,
  • Dylan Myers-Joseph 1 ,
  • Aditya Gilra   ORCID: orcid.org/0000-0002-8628-1864 2 , 3 &
  • Adil G. Khan   ORCID: orcid.org/0000-0002-6215-000X 1  

Nature Communications volume  15 , Article number:  7088 ( 2024 ) Cite this article


  • Cognitive control
  • Neural circuits

Task-switching is a fundamental cognitive ability that allows animals to update their knowledge of current rules or contexts. Detecting discrepancies between predicted and observed events is essential for this process. However, little is known about how the brain computes cognitive prediction-errors and whether neural prediction-error signals are causally related to task-switching behaviours. Here we trained mice to use a prediction-error to switch, in a single trial, between responding to the same stimuli using two distinct rules. Optogenetic silencing and un-silencing, together with widefield and two-photon calcium imaging revealed that the anterior cingulate cortex (ACC) was specifically required for this rapid task-switching, but only when it exhibited neural prediction-error signals. These prediction-error signals were projection-target dependent and were larger preceding successful behavioural transitions. An all-optical approach revealed a disinhibitory interneuron circuit required for successful prediction-error computation. These results reveal a circuit mechanism for computing prediction-errors and transitioning between distinct cognitive states.


Introduction

Animals need to rapidly update their behaviour to survive in a changing environment, and such behavioural flexibility is studied experimentally using task-switching paradigms. Task-switching involves shifting between distinct cognitive rules or contexts, allowing flexible behaviour adapted to changing environmental demands 1 . The framework of predictive processing 2 , 3 , 4 provides a simple yet powerful way of describing flexible task-switching behaviour using three stages. First, animals maintain a previously learned prediction of what they expect to happen in the world at any given instant, i.e., a ‘model of the world’. Second, they may detect discrepancies between predicted and observed events, a ‘prediction error’. Third, the prediction error guides the updating of their model of the world and their ongoing behaviour. While this account has widespread support across humans, monkeys and rodents 2 , 5 , 6 , there is no clear link between the neural representations of prediction errors and their causal requirement in updating mental rules and subsequent behaviour.

There is evidence for cognitive prediction-error signalling in the prefrontal cortex (PFC) in tasks across humans, monkeys, and rodents 7 , 8 , 9 , 10 , 11 , 12 , 13 . The ACC in particular plays a central role in detecting the need for updating behavioural rules, and implementing the updated rules 7 , 8 , 9 , 10 , 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 . However, it is unclear how prediction-error signals are distributed across the cortex, how these signals are organised with respect to the projection targets of cortical neurons, and to what extent the amplitude of prediction-error responses is important in subsequent behaviour. Crucially, it is unclear whether prediction-error signals have any causal influence on the subsequent updating of an animal’s behavioural strategy. Finally, the inhibitory circuit basis for computing cognitive prediction errors is largely unknown.

Determining the neural basis of cognitive transitions is challenging because it requires experimental control over an animal’s internal model of the world. Animals need to demonstrably hold one rule in mind and transition to another distinct rule on noticing a violation of a prediction based on the current rule. Although animals can be trained on tasks which change or reverse rules in a block-wise manner 12 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , these transitions typically involve several tens to hundreds of trials of intermediate performance, making it difficult to directly relate neural activity to an update of cognitive rules. One-shot block transitions provide a significant advantage in this regard. In one-shot block transitions 34 , 35 , a single error leads to a complete and persistent switch between distinct cognitive states. Thus, repeated one-shot block transitions within a session would allow directly relating neural prediction-error responses with the updating of mental rules.

The identification of temporally well-defined cognitive prediction errors would allow addressing a key question: What neural circuit compares predictions and observations to compute prediction errors? Computing prediction errors requires inhibition 2 , and distinct inhibitory cell classes with their specific connectivity patterns provide the basis for current models of prediction error circuits 36 . In particular, vasoactive intestinal peptide (VIP) expressing interneurons are crucial in shaping cortical activity 37 , 38 , 39 and provide unique computational opportunities through a disinhibitory motif involving somatostatin (SOM) expressing interneurons 40 . While VIP interneurons have been hypothesized to be a key player in prediction error computation 36 , directly testing this theory is challenging, as it would require measuring neural prediction errors while the activity of VIP interneurons is perturbed in animals performing a cognitive task. Importantly, any neural perturbation should not lead to a change in the task-switching behaviour itself, as this would confound any interpretation of the neural circuit consequences of the perturbation.

Here we studied the neural basis of cognitive prediction errors during task-switching in the mouse neocortex. We trained mice to perform largely one-shot block transitions triggered by cognitive prediction errors. The prediction error was an absence of an expected stimulus, allowing us to better isolate cognitive factors from reward and stimulus evoked factors 41 . Using behavioural modelling, widefield calcium imaging, and optogenetic silencing, we generated a cortex-wide map of prediction-error signals and established a specific role for the ACC in task-switching behaviour. Using two-photon imaging of the ACC we identified a population of prediction-error neurons and, crucially, established that the duration of the prediction-error signal corresponded with the requirement of the ACC in task-switching at timescales within and across trials. We finally used all-optical methods to bi-directionally modulate VIP interneuron activity and established that VIP interneurons play a specific and causal role in the computation of prediction errors in the ACC. These results provide direct evidence for a causal role of prediction-error signals in the ACC in task switching, and identify a key inhibitory interneuron class required for cognitive prediction error computation.

We trained head-fixed mice to switch between blocks of discriminating two visual stimuli or discriminating two olfactory stimuli (Fig.  1A, B ). During olfactory discrimination, a random subset (70%) of odour stimuli were preceded by the same visual stimuli, now irrelevant to the task. Mice repeatedly switched between attending to and accurately discriminating the visual stimuli (rule 1) and ignoring the same visual stimuli while accurately discriminating the odour stimuli (rule 2). Mice performed up to 15 behavioural switches, or block transitions per session (Fig.  1C , range 6 to 15, median 14 transitions across 13 sessions, 10 mice, one session per mouse shown. Number of trials per block, median ± IQR, 33 ± 1 and 35 ± 8 trials in visual and odour blocks respectively). Visual and odour stimuli were never presented simultaneously (Fig.  1B , Supplementary Fig.  1 ).

Figure 1

A Experimental setup. B Task schematic. Mice switched between blocks of discriminating two visual stimuli in the visual blocks or discriminating two olfactory stimuli while ignoring the same visual stimuli in the odour blocks. C Behavioural discrimination performance (behavioural d′) across blocks ( N   = 10 mice). Shades of grey indicate individual mice. Odour discrimination performance is shown in brown circles. Mice performed up to 15 block transitions in a session. D Schematic of a transition from an odour block to a visual block, showing the last trial of the odour block and first two trials of the visual block, indicating the moment of odour prediction error. Labels A-E refer to timepoints in E . During one-shot block transitions as in E , a complete cognitive rule update occurs in the time indicated. E Example behaviour from an odour to visual block transition (lick raster) showing stimuli, lick, and reward times. In the odour block the mouse ignored both visual gratings while accurately discriminating odour stimuli, but switched rules after a single error trial and started accurately discriminating the same visual stimuli. F Histogram showing the number of trials required to switch between the two blocks, with one-shot transitions indicated by an arrowhead. N  = 95 transitions, 17 sessions, 14 mice. G Schematic of reinforcement learning (RL) models. Top, basic RL, bottom, RL with belief state. In the RL model with belief state, Q -values were only updated while the agent learnt the task, following which only context belief was updated to switch between blocks. H Average probability of licking the reward spout in response to Visual Stimulus 1 (rewarded only in visual blocks) and Visual Stimulus 2 (unrewarded in both blocks), aligned to the block transitions. Top, data from average of 95 odour to visual block transitions. Shading indicates SEM. Grey bar indicates the 10-trial duration where mice performed at 100% accuracy with no licks to either irrelevant visual stimulus, which was a condition for triggering a block transition. First 3 trials of each block were forced to be Visual Stimulus 1 (to assess the animal’s belief of which block it was in), resulting in a gap in the red curve depicting Visual Stimulus 2. Middle, basic RL model fit to data. Bottom, RL model with belief state fit to data. Source data are provided as a Source Data file, for this and subsequent figures.

One-shot cognitive task-switching behaviour

Our aim was to repeatedly capture the transition between two distinct and accurately applied task rules within a well-defined time period. Furthermore, to better separate cognitive processes from stimulus- or reward-evoked activity during the transition, we needed the block transitions to be inferred without any explicit stimulus or reward signal. Our task satisfied these requirements on the transitions from odour to visual blocks: mice noticed the absence of an expected odour stimulus (an odour prediction error) to switch their behaviour, and accurately responded to the now-relevant visual stimuli in subsequent trials (Fig.  1D, E ). Thus, we focused on the transition from odour to visual blocks, triggered by the omission of an expected odour stimulus (but see below for visual to odour block transitions).

Animals typically require many trials to switch between blocks of distinct rules, transitioning through periods of intermediate performance 12 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 42 . This makes it difficult to precisely identify the behavioural and neural processes underlying a rule-switch. Optimal ‘one-shot’ transitions offer a significant advantage, since they involve animals switching between two accurately applied rules after a single error 34 , 35 . Most odour to visual block transitions in our task were indeed one-shot (Fig.  1D–F , 63.2% one-shot transitions, see Supplementary Movie  1 for an example). We used strict criteria to ensure that the one-shot block transitions captured a complete mental transition between two demonstrably applied rules (see methods). A low probability of licking in response to irrelevant visual stimuli in the odour block led to a corresponding proportion of zero-shot block switches, or ‘fluke transitions’ (Fig.  1F , zero trial bin) which were removed from further analysis. The remaining transitions were dominated by one-shot transitions (Fig.  1F arrowhead) with a smaller fraction requiring two or more error trials.

To understand the processes underlying this rapid task-switching, we first fitted the behaviour to a basic reinforcement learning (RL) model (tabular SARSA, Fig.  1G , top) in which an agent continually updated its estimated value of licking in response to stimuli in each trial (the Q-value). This model was unable to reproduce the rapid block transitions observed in the data (Fig.  1H , top), and the best fit of learning and exploration rates produced slower block transitions (Fig.  1H , middle, see methods). We next built an RL model with a context belief state, hierarchically controlling subsequent choices (Fig.  1G , bottom). In this model the agent updated its belief about which context (or block) it was in by a context error signal. This model learnt two Q -value-tables for the distinct contexts, capturing the two rules of the task. Once the learning was complete, the agent was able to rapidly switch between these two Q-value-tables on accumulating a large enough context-error signal. This belief state model reproduced the rapid block transitions (Fig.  1H , bottom). Once the model had learnt the task, freezing all learning, i.e., Q-value updates, did not affect the block transitions (see methods), demonstrating that switching between blocks did not involve any further learning. Instead, the agent inferred the block transitions using the context error. These results suggest that rapid task-switching behaviour involves transitioning between abstract representations of rules or contexts, driven by prediction errors.

Map of prediction error signals

Which brain regions signal the prediction error at block transitions? We measured neural activity across the entire dorsal cortical surface using widefield calcium imaging (Fig.  2A ) to identify the cortical regions which represent cognitive prediction-error signals at block transitions. Our task design provides a well-defined moment of prediction error at the odour to visual block transition: when, following the visual stimulus, an expected odour does not arrive (Fig.  1D ). Any neural activity specific to this prediction error would be (1) absent in the final trials of the odour block when the odour was expected and actually delivered, (2) present when the odour was expected but absent (the prediction error), and (3) absent once again when the odour was no longer expected later in the visual block (schematic of the three conditions shown in Fig.  2B ).

Figure 2

A Schematic of widefield calcium imaging. B Schematic of behaviour during a one-shot transition from an odour to visual block, indicating the three trial types used to identify prediction-error pixels. The first trial of the visual block contains an odour prediction error, due to the absence of an expected odour stimulus, which mice can use to infer the change in block type. Prediction-error pixels were defined as pixels with significantly different activity between the odour prediction-error period and both the odour period and no odour predicted period. C Mean activity (columns 1-3) and significant prediction-error pixels (column 4) across 6 timepoints, aligned to expected/actual odour onset ( N  = 11 sessions, 4 mice). Significance column shows pixels with activity significantly different between odour prediction error and odour trials, and between odour prediction error and no odour predicted trials, with the t-statistic values from the comparison of column 1 and 2 displayed in colour code. D Pixels with significant prediction-error signalling, averaged across 1.2–2.7 s relative to the expected/actual odour onset. E Mean activity profiles from the secondary motor cortex (top), primary visual cortex (middle) and olfactory bulb (bottom, ROIs shown in insets). Shading indicates SEM. Black line indicates periods during which the odour prediction-error response was significantly different from both the odour and no odour conditions (two-sided paired t -test, P  < 0.01 for more than 500 ms, N  = 4 mice).

We aligned the haemodynamic-corrected and movement-corrected calcium activity to these three temporal epochs: odour onset, odour prediction error, and no odour predicted (Fig. 2B). The activity in these epochs evolved across the cortex over time, with higher activity apparent in the odour prediction-error condition (Fig. 2C, first three columns). We identified pixels at each time point which satisfied the three criteria described above (see methods) and mapped their activity (Fig. 2C, final column; colour map represents the t-statistic of the comparison between the first and second columns). This revealed the precise spatio-temporal evolution of the prediction-error signal across dorsal cortex, which spread from anterior secondary motor regions to posterior parietal areas (average map from 1.2 to 2.7 s shown in Fig. 2D). In particular, we found a strong prediction-error signal in areas dorsal to prefrontal cortex. Although the visual cortex and olfactory bulb showed robust stimulus-evoked responses to visual and odour stimuli respectively, they did not contain detectable prediction-error signals (Fig. 2D, averaged activity PSTHs shown in Fig. 2E). Together these results suggest that prefrontal cortical areas, not primary sensory areas, are involved in detecting the cognitive prediction error relevant for the rapid block transitions in this task.
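As a sketch of this selection procedure, the snippet below shows how prediction-error pixels could be identified from the two paired comparisons described above. The array names and shapes are illustrative assumptions, not the authors' analysis code.

```python
import numpy as np
from scipy import stats

# Hypothetical inputs: per-condition arrays of shape (n_sessions, n_pixels)
# holding mean activity at one timepoint in each of the three epochs.
def prediction_error_pixels(pe, odour, no_odour, alpha=0.01):
    """A pixel counts as prediction-error signalling if its activity in the
    odour prediction-error epoch differs significantly from BOTH the
    odour-delivered and the no-odour-predicted epochs (paired t-tests)."""
    t1, p1 = stats.ttest_rel(pe, odour, axis=0)
    t2, p2 = stats.ttest_rel(pe, no_odour, axis=0)
    mask = (p1 < alpha) & (p2 < alpha)
    return mask, t1   # t1 supplies the colour code for the map
```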

ACC is required for rapid block transitions

We next asked which specific region within prefrontal cortex was required for driving the task-switching behaviour. A number of studies have linked ACC activity to surprising events 9,42, errors 10,23 or negative feedback 43,44, which promote updating of behavioural policies 16,21, or promote control 15. In addition, the prelimbic (PL) cortex has been implicated in task-switching behaviour 45. If prediction-error signalling in the ACC or PL is required for rapid task-switching, silencing these regions should disrupt the behaviour specifically at block transitions. To test this, we optogenetically silenced the ACC or PL bilaterally for the entire behavioural session in a group of mice (Fig. 3A). ACC silencing caused a strong deficit in switching from odour to visual blocks, when the mice needed to detect the absence of an expected odour to switch rules. Strikingly, with the ACC silenced, mice repeatedly ignored the rewarded visual stimulus in anticipation of an odour stimulus, reflecting the continued application of the odour block rule (Fig. 3B). As a result, ACC silencing increased the number of trials taken to switch (Fig. 3C, D) and reduced the proportion of one-shot odour to visual block transitions (Fig. 3D, E). This was also the case when silencing the ACC using bilateral infusions of the GABA-A receptor agonist muscimol (Supplementary Fig. 2C). The belief-state RL model could fit the ACC silencing data when the prediction-error signal was scaled to 0.22 of its original amplitude, suggesting that the task-switching deficit could be explained by a partial reduction in the ACC prediction-error signal (Supplementary Fig. 2D). These results demonstrate that the ACC is required for processing the omission of an expected event, a prediction error, during task-switching.

Figure 3

A Schematic of bilateral optic fibre implantation targeting both ACC and PL in each mouse. B Example lick raster showing stimuli, lick and reward times during a transition from an odour to visual block during ACC silencing, leading to impaired switching behaviour. C Average number of trials required in a session to switch from an odour to visual block, N = 30 sessions, 8 mice, 3–4 sessions per silencing condition per mouse; shades of grey indicate individual mice here and below. Median ± interquartile range (IQR) for control 2.83 ± 2.83 trials, continuous ACC silencing 5.4 ± 4.34 trials, continuous PL silencing 2.75 ± 3.37 trials. Two-sided Wilcoxon signed-rank test comparing control and continuous ACC silencing P = 0.0006, control and continuous PL silencing P = 0.78, continuous ACC silencing and continuous PL silencing P = 0.02. D Histogram of number of trials taken to switch from odour to visual blocks during control and continuous ACC silencing sessions. N = 136 and 131 control and ACC silencing transitions respectively. One-shot transitions indicated by arrowhead. E Proportion of transitions from odour to visual blocks that were one-shot. Grey circles represent means for individual mice (N = 8), lines represent means ± STD across mice. Mean ± STD for control 0.39 ± 0.27, continuous ACC silencing 0.19 ± 0.14, continuous PL silencing 0.36 ± 0.28. Two-sided Chi-squared test of proportions comparing control and continuous ACC silencing P = 0.001, control and continuous PL silencing P = 0.65, continuous ACC silencing and continuous PL silencing P = 0.005. F Steady-state stimulus discrimination performance after each successful block transition. Behavioural d-primes to visual stimuli in a visual block: control, continuous ACC silencing and continuous PL silencing sessions, 2.64 ± 0.55, 2.44 ± 0.38 and 2.46 ± 0.55, respectively; visual stimuli in an odour block 0.44 ± 0.5, 0.54 ± 1.19, and 0.54 ± 0.89, respectively; and odour stimuli 2.64 ± 0.63, 2.87 ± 1.0, and 2.79 ± 0.57, respectively. Two-sided Wilcoxon signed-rank tests as indicated, all Ps > 0.05. Each datapoint represents averages for a single session, recorded from 8 mice performing 3–4 sessions each. For all boxplots the centre mark indicates the median, the upper and lower bounds indicate the 75th and 25th percentiles respectively, and the whiskers indicate the most extreme datapoints not considered outliers.

Interestingly, silencing PL in the same mice did not affect switching behaviour (Fig.  3C ). Although we observed a cognitive prediction-error signal across a substantial portion of frontal cortex in our widefield experiment (Fig.  2D ), the absence of any behavioural effect of silencing PL, adjacent to the ACC, reveals a striking specificity in the role of the ACC. These results indicate that the ACC has a specific role in rapidly updating behavioural rules or contexts within seconds, and this role is not widely shared with other prefrontal areas.

While the ACC is required for rapidly switching between task rules, does its role extend to applying these rules once the animal has switched blocks? We found that even during continuous ACC silencing, once the mice did spontaneously switch from ignoring to discriminating the visual stimuli, their subsequent accuracy of visual discrimination was only slightly lower than in unsilenced sessions (Fig. 3F). This suggests that the ACC plays a critical role in transitioning between task rules, and less so in maintaining them.

In addition to rapid task-switching, the ACC has been implicated in guiding slower learning processes 46,47,48. We optogenetically silenced the ACC in a subset of mice as they first learned the switching task and found no difference relative to controls in either the rate of learning a novel stimulus-reward association or the rate of learning to ignore irrelevant stimuli 49 (Supplementary Fig. 2G). Thus, in this paradigm, the role of the ACC is specific to rapid task-switching and does not extend to slower learning.

Overall, these results established that the ACC is essential for rapid task-switching driven by a cognitive prediction error. However, ACC projections become dispensable when task demands are reduced in an attentional task 50. We therefore asked whether the same was true in task-switching: could an animal overcome ACC inhibition if the block transition was marked by an event more salient than the omission of an expected odour? The opposite direction of block transition in our task, from visual to odour blocks, was marked by the unexpected arrival of an odour, a more salient prediction error. Most visual to odour block transitions (60%) were also one-shot (Supplementary Fig. 3A, B) and were captured well by the belief-state RL model (Supplementary Fig. 3C). Interestingly, silencing the ACC had no effect on switching behaviour in these visual to odour block transitions (Supplementary Fig. 3D–F). Thus, when task demands are reduced by a highly salient event marking block transitions, the ACC may become dispensable for task-switching, similar to attentional tasks. Our study, however, focused on the odour to visual block transitions, which relied on the ACC.

Prediction-error signals in ACC neurons

Since block transitions were typically complete within a single trial, the ACC activity required for these transitions must occur within the few seconds between the prediction error (the absent predicted odour) and the next visual stimulus (Fig. 1D). We therefore asked which neural signals in the ACC during this period may account for its role in rapid block transitions. We recorded the activity of populations of ACC neurons during the behaviour using chronic two-photon calcium imaging through a microprism (Fig. 4A, Supplementary Fig. 4A). As expected, we found that individual ACC neurons responded to many task-related variables. Subsets of neurons responded to visual and odour stimuli, locomotion onsets, reward delivery and licking (Fig. 4B, C). A binary classifier was able to accurately decode the identity of visual and odour stimuli after stimulus onsets from ACC neural population activity, and was further able to decode the block type both before and after stimulus onsets (Fig. 4D). Thus, the ACC contained diverse signals relevant to the ongoing task.
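The decoding analysis can be sketched as follows; the classifier type, cross-validation scheme and variable names here are illustrative assumptions rather than the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical inputs: X is population activity (n_trials, n_neurons) in one
# time bin; y gives the binary labels (stimulus identity or block type).
def decoding_accuracy(X, y, n_folds=5):
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=n_folds).mean()

# Applying this bin-by-bin around stimulus onset yields a decoding time
# course like that shown in Fig. 4D.
```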

Figure 4

A Top, schematic depicting two-photon imaging of neurons expressing GCaMP7f in the ACC through a microprism implant. Bottom, example ACC imaging site. B Responses from 5 example neurons to task stimuli and running onsets. Shading here and below indicates SEM. C Mean responses of all ACC neurons (N = 6878 neurons) aligned to task stimuli and behaviour. Activity was aligned −1 s to 3 s around task stimulus or behaviour onset and the mean baseline (−0.5 s to 0 s) subtracted. Each condition is sorted by the averaged activity from 0 to 1 s. D Time course of decoding accuracy of a binary classifier using neuronal activity of simultaneously imaged neurons from the ACC, mean of 9 sessions from 8 mice. Grey, decoding the two visual stimuli in the visual block; brown, decoding the two odour stimuli in the odour block; purple, decoding block type from activity aligned to the non-rewarded visual stimulus onset in the two blocks. E Schematic of behaviour during a one-shot transition from an odour to visual block, indicating the three trial types used to identify prediction-error neurons. F Example prediction-error neuron with a significantly larger response to the odour prediction-error condition (red), compared to the actual delivery of the odour (green) or trials where odours were neither predicted nor delivered (blue). Data are presented as mean responses ± SEM. G Average response of all prediction-error neurons with a positive response to the odour prediction-error condition, in the three conditions as described in F (N = 168 neurons, two-sided Wilcoxon signed-rank test between the prediction-error and odour conditions averaged 0 to 1.5 s, ***P = 1.69 × 10⁻²⁸). Shading indicates SEM. H Proportions of neurons with significantly different activity (averaged 0 to 1.5 s) between the three trial types described in E. Type A and Type B neurons were significantly different only when comparing odour prediction error to odour (N = 2007 neurons, 29%) or to no-odour-predicted conditions (N = 567 neurons, 8%) respectively. Prediction-error neurons were defined as neurons significantly different in both comparisons (N = 616 neurons, 9%, total 6878 neurons, 10 mice). I Left, schematic of two-photon calcium imaging from primary visual cortex (V1). Right, same comparisons as in H with neurons recorded from V1 (N = 18 Type A, 1%, 41 Type B, 2%, and 35 prediction-error neurons, 1.6%, total 2138 neurons, 4 mice). J Top, schematic of retrograde labelling and imaging strategy. Bottom, example image of retrogradely labelled striatal-projecting (CTB-Alexa647 labelled) and non-striatal-projecting neurons in the ACC. K Percentages of recorded neurons significantly responsive to each of 9 task events. L Proportions of prediction-error neurons among striatal-projecting and non-striatal-projecting neurons in the ACC. Striatal-projecting: N = 124 Type A (29%), 30 Type B (7%), 21 prediction-error neurons (5%), total 421 neurons. Non-striatal-projecting: N = 1384 Type A (28%), 392 Type B (8%), 484 prediction-error neurons (10%), total 4888 neurons, 8 mice. M Bootstrapped distribution of the proportion of prediction-error neurons among non-striatal-projecting neurons. The observed proportion of striatal-projecting prediction-error neurons is indicated as a vertical line and lies outside the 99% confidence interval.

To identify neurons representing prediction errors, we asked if a neuron showed responses which were (1) absent in the final trials of the odour block, (2) present during the odour prediction error, and (3) absent once again when the odour was no longer expected later in the visual block (Fig. 4E, the same criteria as used in the widefield data). We found neurons in the ACC that satisfied these criteria, which we termed ‘prediction-error neurons’. Figure 4F shows an example prediction-error neuron which was suppressed in trials when the expected odour did arrive, responded strongly during the odour prediction error (when an odour was expected but did not arrive), and was not responsive later when an odour was no longer expected (see also Supplementary Fig. 4B). Importantly, the prediction-error signal is a response to the non-occurrence of an expected stimulus, not to a stimulus onset or offset, ensuring that the response originates from a violated stimulus expectation.

The average response of all prediction-error neurons positively activated by the prediction error is shown in Fig. 4G. Overall, 9% of all recorded neurons showed prediction-error responses (Fig. 4H). This group contained neurons with a positively activated response to the prediction error (Fig. 4F, G, 27% of prediction-error neurons), a similar number of neurons with an inhibited response to the prediction error (30% of prediction-error neurons, Supplementary Fig. 5A), and other combinations of activity profiles. Response inhibition by the arrival of the expected odour occurred for both rewarded and non-rewarded odours, suggesting that these neurons did not reflect a simple negative reward signal (Supplementary Fig. 5B). Crucially, changes in running and licking could not account for the identification of prediction-error neurons (Supplementary Fig. 5C, D).

To investigate whether prediction-error neurons were found widely across the brain, we conducted the same experiment using two-photon calcium imaging in the primary visual cortex (V1) (Fig. 4I, left). Consistent with the widefield imaging results, we found a near absence of prediction-error neurons in V1 (Fig. 4I, right). This corroboration also confirmed that our identification of prediction-error neurons in the ACC was statistically reliable, rather than a by-product of the selection criteria.

We also studied the opposite direction of block transition, from visual to odour blocks, which was marked by the unexpected arrival of an odour stimulus. We again found prediction-error neurons in the ACC which responded differently to the unexpected odour, compared to the same odour when expected, or to no odour (Supplementary Fig. 6A, B; similar proportions of prediction-error neurons were obtained when controlling for running and licking behaviours, data not shown). Crucially, however, we obtained a similar proportion of these neurons in V1 (Supplementary Fig. 6C). Thus, prediction-error neurons signalling the unexpected appearance of an odour are found both within and outside prefrontal cortex, consistent with the finding that the ACC is not necessary for this direction of transition (Supplementary Fig. 3D–F). Interestingly, neurons which represented both directions of prediction error were present at a higher proportion than expected by chance (Supplementary Fig. 6D), suggesting the presence of generalist, in addition to specialist, prediction-error neurons in the ACC.

Striatal-projecting ACC neurons exclude prediction errors

Prediction-error neurons may broadcast their activity widely across the brain, or they may be enriched in or excluded from populations defined by their projection target. We distinguished between these two scenarios in the same mice by selectively labelling the sub-population of ACC neurons which projected to the striatum, a major projection target of the PFC 51. CTB-Alexa647 injections in the striatum identified striatal-projecting neurons in the ACC (Fig. 4J), while non-retrolabelled neurons were enriched for neurons not projecting to the striatum. Although striatal-projecting neurons had largely overlapping response properties with non-striatal-projecting neurons (Fig. 4K, Supplementary Fig. 5E, F), they contained a significantly smaller proportion of prediction-error neurons (Fig. 4L, Chi-squared test of proportion P = 0.0015, striatal-projecting prediction-error neurons = 21/410, 5%, non-striatal-projecting prediction-error neurons = 463/4888, 9%). This lower proportion of striatal-projecting prediction-error neurons fell outside the 99% confidence interval of the non-striatal-projecting proportion (Fig. 4M, bootstrap test). Thus, prediction-error responses in the ACC are not indiscriminately broadcast, but are significantly excluded from a major projection target.
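A minimal sketch of the bootstrap comparison in Fig. 4M is shown below, assuming a boolean array marking which non-striatal-projecting neurons met the prediction-error criteria; the resampling details are our assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(is_pe_nonstriatal, n_striatal, n_boot=10_000):
    """Resample the non-striatal-projecting population at the striatal
    sample size to build a distribution of prediction-error proportions
    expected under indiscriminate broadcasting."""
    props = np.array([
        rng.choice(is_pe_nonstriatal, size=n_striatal, replace=True).mean()
        for _ in range(n_boot)
    ])
    return np.percentile(props, [0.5, 99.5])   # 99% confidence interval

# If the observed striatal-projecting proportion falls below this interval,
# prediction-error neurons are under-represented in that projection.
```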

Prediction-error signals in the ACC coincide with effective silencing and un-silencing epochs

Having identified prediction-error neurons in the ACC, we next asked whether the duration of the neural prediction-error signal corresponded to the duration of the causal involvement of the ACC in the task. We first asked how sustained the prediction-error response was within an individual trial. We focused on all ACC neurons with a positive prediction-error response and found that their response peaked soon after the time of expected odour onset, but remained significantly higher than both its baseline and the actual odour response until the beginning of the next trial (Fig. 5A, Wilcoxon signed-rank test, prediction-error response vs baseline aligned to prediction error and next stimulus P = 2.98 × 10⁻¹⁵ and P = 3.34 × 10⁻⁸ respectively; prediction-error response vs odour response P = 1.69 × 10⁻²⁸ and P = 5.45 × 10⁻¹⁵ respectively). Indeed, this corresponded to the duration over which the ACC was required in the task, since silencing the ACC either during the ITI or during the peri-stimulus period on each trial (Fig. 5B) caused significant deficits in switching behaviour (Fig. 5C), although each to a smaller degree than continuous silencing (Fig. 3C). Thus, ACC activity is required not only around the moment of expectation violation, when the largest prediction-error responses are present, but remains important until the start of the next trial as the animal updates its belief about the current rule.

Figure 5

A Mean response of all positively responding prediction-error neurons (N = 168 neurons) aligned to the odour prediction-error event, and to the next visual stimulus onset (grey shading). Response shown to the odour prediction-error event (red) and actual odour delivery (green). Two-sided Wilcoxon signed-rank test between the two conditions averaged 0 to 1.5 s from the prediction-error event and −0.5 to 0 s from the next stimulus onset, ***P = 1.69 × 10⁻²⁸ and P = 5.45 × 10⁻¹⁵ respectively. B Schematics depicting inter-trial-interval (ITI) and peri-stimulus silencing epochs. C Number of trials required in a session to switch from an odour to visual block, median ± IQR here and below, control 2.83 ± 2.83 trials, ITI ACC silencing 3.8 ± 3.79 trials, and peri-stimulus ACC silencing 4.0 ± 6.98 trials. Two-sided Wilcoxon signed-rank tests comparing control and ITI sessions P = 0.02, control and peri-stimulus sessions P = 0.009, ITI and peri-stimulus sessions P = 0.39. Each datapoint represents averages from a single session, recorded from 8 mice performing 3–4 sessions each. D Mean odour prediction-error response relative to pre-event baseline aligned to the same temporal epoch over consecutive trials (T is the first prediction-error event in an odour to visual block transition), shown up to 3 trials following the first prediction-error event (two-sided Wilcoxon signed-rank test, average −0.5 to 0 s compared to 0 to 1.5 s, ***P = 2.98 × 10⁻¹⁵). Each datapoint represents the mean response across all positively responding prediction-error neurons, with error bars indicating SEM. Top, example neuron responses showing the rapid decay of the prediction-error response over trials, mean responses ± SEM. E Schematic depicting silencing during the entire session except for one trial at each odour to visual block transition. F Number of trials required in a session to switch from an odour to visual block, control 2.83 ± 2.25 trials, continuous ACC silencing 5.4 ± 4.34 trials, one-trial un-silencing 1.0 ± 0.82 trials; two-sided Wilcoxon signed-rank tests comparing control and continuous ACC silencing sessions P = 0.0006, control and one-trial un-silencing P = 0.0084, continuous ACC silencing and one-trial un-silencing P = 0.0016. Each datapoint represents averages from a single session, recorded from 8 mice performing 2–3 sessions each. For all boxplots the centre mark indicates the median, the upper and lower bounds indicate the 75th and 25th percentiles respectively, and the whiskers indicate the most extreme datapoints not considered outliers.

Is the ACC required over multiple trials as a block transition occurs? To address this question, we first asked how sustained the prediction-error response in the ACC was across consecutive trials. We found that the prediction-error response was present only on the first prediction-error trial, and rapidly decayed to non-significant amplitudes on subsequent trials (Fig. 5D). If the ACC indeed enables block transitions through prediction-error signalling, the rapid decay of the prediction-error signal over trials would predict that the ACC is only required during the prediction-error trial, and not earlier or later. To directly test this prediction, we continuously silenced the ACC for the entire session and un-silenced it only on the first trial of a visual block, when the prediction error occurred (Fig. 5E). The resulting behaviour showed no deficits in task-switching, and mice rapidly switched between blocks (Fig. 5F, bottom, Supplementary Fig. 2E). Interestingly, behavioural switching in the un-silencing condition was significantly faster than in controls, possibly due to enhanced ACC activity from the removal of inhibition (see also Fig. 6). Critically, after switching to the new block, the mice performed highly accurate discrimination for the rest of the block despite the ACC being continuously silenced (Supplementary Fig. 2F). The rapid task-switching could not be accounted for by a startle response to the light offset, since the mice did not lick in response to the light offset or to the visual stimulus immediately following it (middle row in Fig. 5E), but instead licked in response to the subsequent visual stimulus, after the odour prediction error. Thus, the ACC is specifically required only for processing the prediction error and is thereafter no longer required for accurate task performance. These results provide direct evidence for prediction-error signals in the ACC driving task-switching 2,4, and are consistent with a prominent theory suggesting that the PFC largely signals the non-occurrence of expected outcomes 52.

Figure 6

A Schematic showing odour to visual block transitions and the prediction-error response preceding two scenarios: top, where the mouse executes a one-shot block transition, or switch, and bottom, where the block transition is slower than one-shot. Red and yellow responses are prediction-error responses from an example neuron, green shows the response to delivery of odour; shading indicates SEM here and below. B Left, odour prediction-error response of two example neurons preceding a one-shot (red) or slower transition (yellow). Right, mean responses across all positively responding prediction-error neurons (N = 146 neurons) preceding one-shot and slower transitions (two-sided Wilcoxon signed-rank test between the two conditions averaged 0 to 1.5 s from the prediction-error event and −0.5 to 0 s from the next stimulus onset, ***P = 0.0003, *P = 0.019). Shading indicates SEM. C Prediction-error signal amplitude from the belief-state RL model, preceding one-shot transitions and slower transitions, ***P < 10⁻⁸, N = 70 transitions. For the boxplots the centre mark indicates the median, the upper and lower bounds indicate the 75th and 25th percentiles respectively, and the whiskers indicate the most extreme datapoints not considered outliers.

Larger prediction-error signals in ACC precede successful one-shot block transitions

Finally, to determine whether prediction-error neurons in the ACC actively contribute to the behavioural transitions across blocks, we asked if the amplitude of prediction-error responses at block transitions was related to the subsequent success in behavioural switching. We divided the odour to visual block transitions into one-shot and slower-than-one-shot block transitions (Fig. 6A; 51% one-shot and 49% slower transitions). Prediction-error neurons with a positive response were more active preceding one-shot block transitions than preceding slower transitions (Fig. 6B), both when aligned to the prediction-error event (Wilcoxon signed-rank test, P = 0.0003) and when aligned to the next stimulus onset (P = 0.019, N = 146 neurons). This result also held when combining all prediction-error neurons (P = 8.57 × 10⁻⁹, N = 616 neurons), and could not be accounted for by differences in running speed across the two conditions (Supplementary Fig. 7A). The pupil diameter also did not show significant differences between one-shot and slower block transitions, and largely reflected running speed changes (Supplementary Fig. 7B).
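The comparison in Fig. 6B can be sketched as below, assuming per-neuron mean responses (averaged 0 to 1.5 s after the prediction-error event) computed separately over one-shot and slower transitions; names are illustrative.

```python
from scipy.stats import wilcoxon

def compare_pe_amplitudes(resp_one_shot, resp_slower):
    """Paired two-sided test across neurons: are prediction-error responses
    larger preceding one-shot than preceding slower transitions?"""
    stat, p = wilcoxon(resp_one_shot, resp_slower, alternative='two-sided')
    return stat, p
```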

Since our belief-state RL model of the behaviour included an explicit context-prediction-error signal, we asked if this signal also predicted one-shot block transitions. The model revealed a similar pattern, in which the amplitude of the noisy prediction-error signal was predictive of future one-shot transitions (Fig. 6C, Wilcoxon rank-sum test, P < 10⁻⁸, N = 70 transitions). These results support the claim that prediction-error neurons in the ACC play a key role in driving rapid task-switching behaviour, particularly when they produce larger prediction-error signals.

VIP interneurons contribute to prediction-error computation in the ACC

Local inhibitory circuits are crucial in shaping cortical activity 53, and are believed to be necessary for computing prediction errors 2. These circuits need to compare predictions and observations, with a mismatch between the two leading to prediction-error signals. While a diversity of inhibitory circuits may compute prediction errors 36, VIP interneuron disinhibition 37,38,39 is hypothesized to be key in this process 36 (Fig. 7A). Indeed, VIP-driven disinhibition interacts with a perceptual prediction error during visuo-motor coupling mismatch in V1 54. We asked whether VIP interneurons played a role in producing cognitive prediction-error responses in the ACC in our task. We employed an all-optical approach, photoactivating or photoinhibiting VIP interneurons while simultaneously measuring the activity of VIP and non-VIP neurons in the same brain region with in-vivo two-photon calcium imaging (Fig. 7A, B). Importantly, the VIP perturbations were in one hemisphere only, and did not lead to any changes in the task-switching behaviour itself (Wilcoxon signed-rank tests comparing behavioural d′ for the relevant visual stimuli, irrelevant visual stimuli, and odours between control and ITI VIP activation or inhibition sessions, all Ps > 0.05; similar Wilcoxon signed-rank tests comparing the number of trials taken to switch between blocks, all Ps > 0.05).

Figure 7

A Hypothesis from theoretical work for a cortical circuit to compute prediction errors. The prediction-error neuron will respond (with positive or negative responses) when predictions and observations do not cancel each other out. Part of this hypothesis is tested by activating and inhibiting VIP interneurons during task-switching. B Left, example site with neurons expressing GCaMP7f in green and VIP interneurons expressing the excitatory opsin ChrimsonR in red. Right, schematic of our all-optical approach: simultaneous two-photon imaging and optogenetic activation or inhibition of VIP interneurons in the ACC. C Top, peri-stimulus time histograms of mean activity of all non-VIP neurons (N = 2467 neurons, 8 mice) aligned to light onset, at 5 different light powers, for 1.5 s. Shading and error bars indicate mean ± SEM across all neurons from all 8 mice here and below. Bottom, average activity of all neurons across the light presentation window (0–1.5 s). Inset: image of an example VIP interneuron (arrowhead) and its response to increasing light powers, demonstrating photoactivation. D Schematic of the inter-trial-interval (ITI) activation epoch. E Proportions of cells showing prediction-error responses and other cell classes as in Fig. 4H. Left, control and right, ITI VIP activation sessions from the same sites, performed on subsequent days. F Top, average activity in each trial type of positively responding prediction-error cells identified in all three sessions (N = 38). Shading indicates SEM. Bottom, mean response (0 to 1.5 s) of these cells to odour prediction error (left, red) and odour (right, green). Two-sided Wilcoxon rank-sum tests comparing control vs ITI sessions, prediction-error responses P = 1.1 × 10⁻⁴, odour responses P = 0.93, with whiskers indicating SEM. G Top, schematic of the peri-stimulus activation/inhibition epoch. Bottom, time course of decoding accuracy of a binary classifier using neuronal activity during peri-stimulus VIP activation sessions (N = 4603 neurons, 8 mice), decoding task stimuli as in Fig. 4D. Shading indicates SD across mice. H Proportions of cells showing prediction-error responses and other cell classes as in Fig. 4H. Left, control and right, ITI VIP inhibition sessions from the same sites, performed on subsequent days. I Time course of decoding accuracy with a binary classifier using activity during peri-stimulus VIP inhibition sessions (N = 888 neurons, 3 mice). Shading indicates SD across mice. J Proportions of cells showing prediction-error responses and other cell classes as in Fig. 4H. Left, no-light and right, ITI light-only control sessions from the same sites, performed on subsequent days.

On photoactivating VIP cells in the ACC of passive mice with increasing light powers (Fig. 7C bottom, inset), we observed monotonically increasing population activity in non-VIP cells (Fig. 7C). Since non-VIP cells in cortex are predominantly excitatory pyramidal neurons, this experiment demonstrated the effective disinhibition induced by our all-optical approach. We next photoactivated VIP cells in the ACC as mice performed the task-switching behaviour. VIP cells were photoactivated on each trial only during the inter-trial-interval (ITI) period, which encompassed the prediction-error event at block transitions (Fig. 7D). We compared the proportion of prediction-error neurons in the VIP photoactivation sessions to that in control sessions and found that VIP activation during the ITI period strongly reduced the percentage of prediction-error neurons from 10% to 2% (Fig. 7E, Chi-squared test of proportion P < 0.0001). To determine how VIP photoactivation affected the response of ACC neurons to prediction errors, we measured the average response amplitude to the odour prediction-error event, and to the odour stimulus, in positively responding prediction-error neurons (Fig. 7F). We found that VIP photoactivation led to a significant reduction only in the response to the odour prediction error, demonstrating that VIP photoactivation limits the prediction-error response amplitude (Fig. 7F bottom). Critically, we confirmed that our VIP activation did not lead to widespread deficits in ACC responses, since the perturbation did not affect representations of stimuli and block rules. A binary classifier as used in Fig. 4D was able to accurately decode the identity of visual and odour stimuli after stimulus onsets, and the block type before and after stimulus onsets, both when VIP cells were photoactivated during the visual stimulus onsets (Fig. 7G) and during the ITI period (data not shown). Thus, inducing disinhibition through VIP interneuron activation specifically and strongly disrupted prediction-error signalling in the ACC.

Finally, to test whether VIP interneurons are necessary for generating prediction-error responses in the ACC, we optogenetically silenced them during the task. VIP inhibition during the ITI period strongly reduced the percentage of prediction-error neurons from 5% to 0.6% (Fig. 7H, Chi-squared test of proportion P < 0.0001). Again, a binary classifier was able to accurately decode visual and odour stimuli and block type, both when VIP cells were photoinhibited during the visual stimulus onsets (Fig. 7I) and during the ITI period (data not shown). Crucially, light-only control mice which did not express any opsin showed no change in the proportion of prediction-error neurons detected on presentation of the same optogenetic light (Fig. 7J, 3.6% to 4.3%, Chi-squared test of proportion P = 0.46). Together, these results demonstrate that bidirectional perturbation of VIP interneurons strongly and specifically disrupts prediction-error computations in the ACC. Future work may test the role of other cell classes such as PV interneurons, and the sources of the prediction and observation signals in this computation (Fig. 7A).
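The chi-squared test of proportions used throughout these comparisons can be sketched as follows; the counts passed in are placeholders for whichever two conditions are being compared.

```python
import numpy as np
from scipy.stats import chi2_contingency

def proportion_test(n_pe_a, n_total_a, n_pe_b, n_total_b):
    """2x2 contingency test on counts of prediction-error vs other neurons
    in two conditions (e.g. control vs ITI VIP activation sessions)."""
    table = np.array([[n_pe_a, n_total_a - n_pe_a],
                      [n_pe_b, n_total_b - n_pe_b]])
    chi2, p, dof, expected = chi2_contingency(table)
    return chi2, p
```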

Discussion

In this study, we demonstrate that mice can perform rapid and repeated block transitions in a cross-modal task-switching paradigm. A theoretical RL model suggested that this behaviour was driven by a cognitive prediction-error signal, and in agreement with this prediction, we identified a neural signal representing a cognitive prediction error in the ACC. Silencing and un-silencing the ACC in precise time windows demonstrated that this neural prediction-error signal was causally required for the mouse to update its behaviour between blocks. The amplitude of the prediction-error signal preceding a trial was related to whether or not a mouse would correctly switch its behaviour in the subsequent trial. Finally, VIP interneurons were identified as a key inhibitory cell class in the computation of cognitive prediction errors. Taken together, our results provide causal and mechanistic evidence for a longstanding idea: that the ACC computes cognitive prediction errors to guide flexible behaviour.

By training mice to perform repeated one-shot block transitions, we encapsulated a complete cognitive switch in a well-defined time window of a few seconds. Such rapid and complete cognitive transitions are akin to ‘Aha! moments’, where some information or insight allows an abrupt updating of one’s internal model of the world 55. These rapid cognitive transitions are difficult to study experimentally: one-shot transitions have primarily been observed in primates 34,35, and previous studies in rodents required tens to hundreds of trials to complete a behavioural transition 12,26,27,28,29,30,31,32,42. The periods of intermediate accuracy which accompany these slower transitions prevent unambiguously relating a neural signal to its behavioural consequence. In our task, a well-defined prediction error leads to a demonstrable update in the animal’s cognitive model within the next few seconds. Moreover, studying the moment of cognitive transition requires multiple block transitions in a single session for sufficient statistical power. In addition, being certain of an animal’s internal model of the world requires the animal to perform at high accuracy before the block transition, typically resulting in longer and therefore fewer blocks. Our task provided an unprecedented number of repeated, highly rapid block transitions with mice switching between high-accuracy behaviours, which was crucial for conclusively assigning a behaviourally relevant role to the neural prediction-error signal.

Cognitive prediction errors are distinct from prediction-error signals in other domains, such as reward prediction errors in dopaminergic neurons, sensory prediction errors in sensory cortex, or sensorimotor prediction errors in the cerebellum 2 , 56 , 57 , 58 , 59 . Cognitive prediction errors enable flexible cognition, where one needs to maintain an abstract cognitive context or model of the world in mind, which is updated when a prediction from the cognitive context or model is violated. This is a distinct problem to solve compared to other prediction-error computations and involves different brain regions 9 . Critically, the prediction-error responses we found in this study did not resemble classical negative reward signals such as those found in dopaminergic neurons. Instead, this signal resembles an outcome prediction error 52 which relates to errors in predicting outcomes based on the currently believed rules of the task.

The computation of a prediction-error signal in our task requires comparing an internally generated prediction signal with an external odour signal. The origin of the odour prediction signal is currently unclear, and may reach the ACC from other prefrontal areas 11 , 60 , 61 , 62 . However, the ACC itself robustly represents the ongoing task rules as well as external stimuli, and thus may compute the prediction error autonomously.

An important goal in understanding the neural circuit basis of cognition is to identify the circuit which compares predictions with observations. While this circuit has been studied in other contexts, such as for the dopaminergic reward-prediction system 57 or the visuomotor mismatch system 56 , it is poorly understood for cognitive rule-updating. In this study, we took advantage of a temporally well-defined cognitive prediction-error signal to take the first steps in uncovering the circuit involved. We found VIP-driven disinhibition to be key, and expect future studies to reveal further details about the role of other inhibitory cell classes in this circuit.

The recurrently connected nature of cortical circuits may suggest that perturbation of any cell class will invariably lead to a disruption of processing in that region. However, VIP modulation can be orthogonal to cognitive modulations in other cortical regions 63 , and importantly, here we show that VIP perturbations in the ACC did not affect stimulus and context representations, while disrupting prediction-error signals.

Other cortical association areas may play a role in task-switching, such as orbitofrontal cortex 12,27 and posterior parietal cortex 64, as well as subcortical areas including higher-order thalamic nuclei 65. Future work on the role of these and other regions would need to establish both the nature of the prediction-error signal and its requirement in behaviour. Although we have ruled out a role for PL in our task, earlier studies have shown that PL circuits are involved in flexible behaviour 45,66. However, these studies investigated slower forms of behavioural transition than our largely one-shot transitions, possibly accounting for the difference with our results. Indeed, there are multiple types of surprise signals which may be processed differentially by different brain regions 9,67.

Where is the prediction-error signal sent to drive the behavioural changes? We have found that the striatum is an unlikely candidate, despite being a major projection target of the ACC, since the prediction-error signal is under-represented in ACC neurons projecting to the striatum. Other likely brain regions may include the locus coeruleus, which has been implicated in updating current strategies through norepinephrine signalling 17,18,68,69, or the dorsal raphe nucleus, which promotes behavioural flexibility through serotonin and glutamate signalling 70,71.

Disruption of cognitive flexibility in humans may lead to excessive transitioning between attentional states in ADHD, or excessive persistence in one or a few repetitive behaviours in ASD 72 . However, the neural circuit basis of these conditions is poorly understood. Our results provide a crucial insight regarding the role of the ACC in transitioning between, rather than maintaining, cognitive states. Thus, atypical ACC activity patterns may contribute to excessive or insufficient cognitive transitions in humans, and a more detailed understanding of how ACC circuits produce and transmit prediction errors may provide insights to better understand these conditions.

Methods

All experimental procedures were carried out in accordance with the institutional animal welfare guidelines and licensed by the UK Home Office.

Animals and surgical procedures

For all surgeries, mice were anaesthetised using isoflurane, at 4% concentration for induction and at 0.5–1% for maintenance. Additional drugs were used to provide analgesia (Metacam, 5 mg/kg) and anti-inflammatory effects (dexamethasone, 3.8 mg/kg), and to reduce mucus secretions (atropine, 0.08 mg/kg). Eye-cream (Maxitrol) was applied to the eyes to prevent drying, and body temperature was maintained at 37 °C using a heating mat and rectal temperature probe (Harvard Apparatus). Injections of antibiotic (Betamox, 120 mg/kg) and analgesic (methadone hydrochloride, 10 mg/kg) were given before the withdrawal of anaesthesia, and further analgesia was given daily for 1–2 days during the animal's recovery.

For the two-photon imaging experiments, 10 VIP-Cre mice (C57BL/6-VIPtm1(cre)Zjh, The Jackson Laboratory, strain #010908, P42–49, 6 males and 4 females) were used. An additional 4 mice of the same genotype (1 male, 3 females) were used together with the imaging mice for behavioural analysis. A circular piece of scalp was removed and the underlying skull was cleaned. Small holes were drilled in the skull above injection sites, located using stereotaxic coordinates. Injections of a mixture of viruses expressing GCaMP7f (pGP-AAV9-syn-jGCaMP7f-WPRE, Addgene) and Cre-dependent ChrimsonR (pAAV5-syn-FLEX-rc[ChrimsonR-tdTomato], Addgene) were made in the anterior cingulate cortex (ACC, +0.9 mm AP, −1.3 rising to −0.8 mm DV) of either the left or right hemisphere (±0.55 mm ML), using glass pipettes and a pressure micro-injection system (Picospritzer III, Parker). An injection of Cholera Toxin Subunit B (recombinant) conjugated to Alexa Fluor 647 (ThermoFisher) was made in the striatum of the same hemisphere (+1.2 mm AP, ±1.5 mm ML, −2.5 rising to −2.0 mm DV) to retrogradely label cells projecting to the striatum. 8 out of the 10 mice had z-stacks in the ACC of sufficient quality to visualise CTB-Alexa647 and were used to identify striatal-projecting ACC neurons. 8 out of the 10 mice exhibited sufficiently consistent behaviour to study the effects of VIP photoactivation. For the VIP silencing experiments (Fig. 7H), 3 VIP-Cre mice were injected with a mixture of viruses expressing GCaMP7f (pGP-AAV9-syn-jGCaMP7f-WPRE, Addgene) and Cre-dependent ArchT (pAAV-FLEX-ArchT-tdTomato (AAV5), Addgene). For the light-only control experiments (Fig. 7J), an additional 3 VIP-Cre mice were injected with a mixture of viruses expressing GCaMP7f and Cre-dependent tdTomato (pAAV-FLEX-tdTomato (AAV1), Addgene).

A circular craniotomy (diameter = 3 mm) was made above the ACC imaging site, with its centre 300 µm posterior to the ACC injections. A 1.5 × 1.5 × 1.5 mm right-angled microprism with a reflective hypotenuse (Tower Optical), fixed to a glass coverslip using ultraviolet light-cured glue (Thorlabs), was slowly lowered into the craniotomy, with the vertical face closest to the injection site. The glass coverslip was fixed in place using cyanoacrylate glue (Loctite) and a custom machined aluminium head-plate was cemented onto the skull using dental cement (C&B Superbond). Imaging and behavioural training started approximately three weeks after surgery.

For the widefield imaging experiments, four male wildtype mice (C57BL/6;129-Nrxn1tm1Sud background; all mice used were wildtype C57BL/6) aged 21 to 27 weeks were used. At least four weeks before surgery, mice were given an intravenous injection of AAV PHP.eB GCaMP7f (Zurich vector core). For the surgery, the skin overlying the skull was removed and the edges of the skin were secured with tissue adhesive (Vetbond, 3M). The connective tissue overlying the skull was removed and a layer of transparent dental cement (C&B Superbond) was applied to cover all exposed skull and to secure a custom aluminium headplate. Following this, five layers of 2.5 µl of cyanoacrylate glue (Zap-A-Gap CA+, Pacer Technology) were thinly applied onto the cement to increase its transparency. After at least 5 days of recovery from surgery, mice began habituation and behavioural training.

For the optogenetic silencing experiments, 8 transgenic mice (P42–49, 4 males and 4 females) expressing Channelrhodopsin-2 in parvalbumin-expressing interneurons were used, obtained by crossing FLEX-ChR2 mice (Gt(ROSA)26Sortm32(CAG-COP4*H134R/EYFP)Hze) and PV-Cre mice (Pvalbtm1(cre)Arbr, The Jackson Laboratory, strain #204109 and #017320 respectively). Two small holes were drilled above the ACC and prelimbic cortex (PL) of each hemisphere. Dual-core cannulae with bilateral optical fibres (Thorlabs), each with a diameter of 200 µm and 0.39 NA, cut to a length of <3 mm, were implanted in the ACC (+0.9 mm AP, −1.2 mm DV, ±0.35 mm ML) and PL (+2.6 mm AP, −1.25 mm DV, ±0.35 mm ML), and the stainless steel ferrules were bonded to the skull using dental cement (C&B Superbond), along with a custom machined head-plate. PL implants were inserted at a 25° angle (relative to vertical) through holes drilled 0.8 mm anterior to PL. We performed light-only control experiments by comparing 3 PV-Cre mice (1 male and 2 females) with 3 PV-Cre × FLEX-ChR2 mice (1 male and 2 females), which were implanted with optical fibre cannulae above the ACC only. After at least 5 days of recovery from surgery, mice began habituation and behavioural training. Mice were housed in a reversed-light-cycle cabinet illuminated between 7 pm and 7 am, maintained at a temperature of 22 °C and 56% humidity.

Two-photon imaging was performed using a custom-built resonant scanning two-photon microscope (Cosys) and a Chameleon Vision S laser (Coherent) at 930 nm, using a 16×, 0.8 NA objective (Nikon). Images were acquired using a 12 kHz resonant scanner (Cambridge Technology) and an FPGA module (PXIe-7965R FlexRIO, National Instruments). Two-photon calcium imaging of GCaMP7f-labelled neurons in the ACC was performed across 40 training sessions and 48 full task-switching sessions in these 10 mice. The microprism depth, injection coordinates and cell morphology indicated that the imaging sites were largely located in layer 5. Multi-plane imaging was performed using a piezoelectric objective scanner (Physik Instrumente). Depending on the depth of GCaMP7f expression, each imaging volume consisted of either 6 or 8 imaging planes, 40 µm apart, giving an effective imaging rate of 6.4 or 4.8 Hz per volume respectively.

Mice were trained first in the visual discrimination task, then had at least 3 training sessions in the visual-odour block switching task. Once mice had learned the switching task, at least 3 recordings of task performance were made per mouse. Before each recording session the same imaging site was found by matching anatomical landmarks.

After all in-vivo imaging data had been collected, a final high-quality image stack was acquired under anaesthesia. Subcutaneous injections of ketamine (100 mg/kg) and xylazine (16 mg/kg) were used to induce anaesthesia, with further injections of ketamine to maintain anaesthesia if necessary. Eye-cream (Maxitrol) was used to prevent drying, and body temperature was maintained using a heating pad.

Widefield imaging was performed on a custom-built inverted tandem-lens macroscope (Cosys), with two photographic lenses (AF-S NIKKOR 85 mm f/1.8 G and AF NIKKOR 50 mm f/1.4D). The brain was illuminated with interleaved collimated blue (470 nm, Thorlabs M470L4) and violet light (405 nm, Thorlabs M405L4) at an irradiance of ~0.03 mW/mm². Images were recorded with a CMOS camera (Point Grey Research Grasshopper3) at a frame rate of 54 Hz. LEDs and camera frame acquisition were triggered using a digital microprocessor (Teensy 3.2).

Widefield data was pre-processed using the methods described in ref. 73. The widefield video underwent motion correction, and the brain images were aligned within and across mice by manual rigid alignment to a number of anatomical landmarks. The video data was compressed and denoised by performing SVD on the matrix of pixels × time and retaining the top 500 components. The ΔF/F was computed for each pixel by taking the difference between F and F0 and dividing by F0, where F0 was the mean value across the entire session. Traces were filtered with a 0.0033 Hz high-pass second-order Butterworth filter, and an additional 7 Hz low-pass filter was applied to the violet illumination trace. To correct for haemodynamic artefacts, a scaled version of the violet illumination trace was subtracted from the blue illumination trace for each pixel. This scaling factor was found by regressing the violet trace onto the blue trace. To account for overt movement-related brain activity, we fit a ridge regression model to the data, predicting brain activity from a number of movement regressors. These included a binarized lick trace with lags up to 500 ms, as well as instantaneous running speed and average face motion energy. Running speed and face motion energy were divided by twice their standard deviation to ensure all regressors had approximately the same scale and were penalised equally by ridge regularisation. Ridge penalties were selected using fivefold cross-validation from 36 values spaced logarithmically between 10⁻² and 10⁵, selecting the ridge penalty which resulted in the lowest cross-validated mean squared error. Penalties were selected independently for each pixel. We then subtracted this predicted activity, and all subsequent analysis was performed on the model residuals.
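Two of the per-pixel steps above can be sketched as follows; variable names are illustrative, and the full pipeline (motion correction, SVD denoising, filtering) follows ref. 73 and is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def hemodynamic_correct(blue, violet):
    """Subtract a scaled violet (calcium-independent) trace from the blue
    trace of a pixel; the scale is the least-squares regression slope."""
    scale = np.dot(violet, blue) / np.dot(violet, violet)
    return blue - scale * violet

def remove_movement(activity, movement_regressors):
    """Regress out movement (binarized licks at lags, running speed, face
    motion energy) with a ridge penalty chosen by fivefold cross-validation
    from 36 log-spaced values, then keep the residuals for analysis."""
    alphas = np.logspace(-2, 5, 36)
    model = RidgeCV(alphas=alphas, cv=5).fit(movement_regressors, activity)
    return activity - model.predict(movement_regressors)
```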

Behavioural training

The behavioural apparatus and training were similar to those in previous studies 74,75. Mice were trained on a visual discrimination task for up to two weeks, until discrimination performance reached threshold, before being trained on the switching task (see below). Mice had free access to water but were food restricted to maintain at least 85% of their free-feeding body weight (typically 85–90%, 2–3 g of standard food pellets per animal per day). A reward delivery spout was positioned near the snout of the mouse, and licks to this spout were detected using a piezo disc sensor and custom electronics. The reward was a 10% solution of soy milk powder (SMA Wysoy) delivered by opening a pinch valve (NResearch) controlled through custom electronics. The mouse's running speed on the cylinder was measured using an incremental rotary encoder (Kübler). Two luminance-corrected monitors (luminance meter LS-100, Konica Minolta), positioned at 45° angles and 25 cm distance from the mouse, delivered the visual stimuli.

Animals were habituated to handling and gentle restraint over two to three days, before they were head-fixed and trained to run on a polystyrene cylinder (20 cm diameter) for one to four days. This period was also used to find suitable imaging sites. After the habituation phase, mice performed one behaviour session in which the movement of the gratings was linked to the mouse's movement on the wheel. Subsequently, mice were trained to self-initiate trials by sustained running on the wheel for at least 2.8 s plus an added random duration drawn from an exponential distribution with mean 0.4 s (trial structure and all timings shown in Supplementary Fig. 1A). At this point one of two drifting sinusoidal visual gratings was randomly presented, drifting in the opposite direction to the direction of running, with a fixed spatial and temporal frequency of 0.1 cycles per degree and 2 Hz respectively. The rewarded and unrewarded gratings were oriented ±20° relative to vertical, symmetrically on both screens. When the rewarded grating was displayed, the mouse could trigger the delivery of a reward, a drop of soya milk, by licking the spout during the 'reward period', lasting from 1.0 s after the appearance of the grating until its disappearance, a maximum of 0.8 s into the reward period. This was recorded as a 'hit'. In some sessions the reward period started at 1.5 s and lasted up to 1.53 s; no difference in behaviour was observed with this minor change in timings. The visual stimulus stayed on for an additional 0.8 s after reward onset while the mouse consumed the reward. If the mouse did not lick during this period, the trial was recorded as a 'miss', and a drop of soy milk was delivered shortly before the disappearance of the grating. When the unrewarded grating was presented, one or more licks at any time until the stimulus disappearance was recorded as a 'false alarm', triggering a time-out period of 4 s in which the unrewarded grating remained on screen, and any further licks restarted the time-out. During early training the probability of unrewarded trials was occasionally increased transiently, up to 0.8, to discourage erroneous licking. All mice learned the visual discrimination task in 5–10 days, with post-learning defined as three consecutive days of discrimination with a behavioural d-prime score of 2 or above. Behavioural d-prime was calculated as bd′ = Φ⁻¹(H) − Φ⁻¹(F), where Φ⁻¹ is the normal inverse cumulative distribution function, H is the hit rate, and F is the false alarm rate.
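As a worked example of the formula above, with illustrative hit and false-alarm rates:

```python
from scipy.stats import norm

# bd' = Phi^-1(H) - Phi^-1(F); the rates below are hypothetical examples.
hit_rate, false_alarm_rate = 0.9, 0.1
bd_prime = norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)   # ~2.56
```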

Once mice had learned the visual discrimination task, they were trained in odour discrimination. After the same randomised period of sustained running, one of two odour stimuli was presented to the mouse via polyethylene tubing positioned above the snout. Odours were delivered through a custom-built flow-dilution olfactometer calibrated with a mini PID (Aurora) at 10–20% saturated vapour concentration of two solutions: 10% soy milk (rewarded odour) and 10% soy milk with 0.1% p-Cymene (unrewarded odour). The odour task structure was identical to the visual task.

Once animals were discriminating the odours accurately (typically after 30–40 trials), they were trained to switch between blocks of the olfactory and visual discrimination tasks. Mice typically learned to switch successfully in 1–3 days. In the olfactory blocks, 70% of odour stimuli were preceded by one of the same two visual gratings featured in the visual discrimination task (fixed duration of 1.8 s, with an identical onset delay distribution as in the visual block). In this case neither grating was rewarded or punished, and mice learned to ignore these irrelevant gratings while accurately discriminating the odours, which were presented after the irrelevant visual grating (delay between visual grating offset and odour onset 1.8 s; in some two-photon imaging sessions this delay was 1.8 s plus an added random duration drawn from an exponential distribution with mean 0.2 s). In initial switching training sessions, a reward was delivered at the end of a rewarded grating in a visual block if the mouse had failed to lick, giving a clear indication that the grating was now relevant. By the end of early training, and for all data in this paper except Supplementary Fig. 2F, this feature was removed, requiring mice to switch between blocks by noticing unexpected stimuli alone. Block switches occurred automatically when a mouse had demonstrated >80% discrimination performance on the relevant stimuli (visual gratings in the visual block, odours in the odour block) over the last 30 trials of a block, as sketched below. Additionally, in odour blocks mice were required to have successfully ignored all irrelevant visual gratings over the previous 10 trials before a block transition was triggered. Blocks typically contained 30 to 40 trials. Mice were deemed to have learned the switching task when they could complete sessions at these parameters with at least 3 repeats of each block type.
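The block-switch criterion can be expressed as the minimal sketch below; the function and variable names are our assumptions, not the task-control code.

```python
import numpy as np

def should_switch(correct_relevant, ignored_irrelevant, in_odour_block):
    """correct_relevant: booleans, one per relevant-stimulus trial in the
    current block; ignored_irrelevant: booleans, one per irrelevant grating.
    Switch when discrimination exceeds 80% over the last 30 trials and, in
    odour blocks, all of the last 10 irrelevant gratings were ignored."""
    if len(correct_relevant) < 30:
        return False
    ok_accuracy = np.mean(correct_relevant[-30:]) > 0.8
    ok_ignore = (not in_odour_block) or all(ignored_irrelevant[-10:])
    return bool(ok_accuracy and ok_ignore)
```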

In order to compare the speed of behavioural switching between blocks, we applied a transition period immediately after a block transition, during which visual stimuli were selected according to the following rules. For odour-to-visual block transitions, in the first trials of a visual block only the rewarded grating (visual stimulus 1) was shown. When a mouse responded correctly by licking to these grating stimuli on three consecutive trials, this transition period ended and the block continued with the normal 50% probabilities of visual grating identities. For visual-to-odour block transitions, we applied two variations of the transition period. In the first variation, the first irrelevant visual stimulus was the otherwise unrewarded visual stimulus 2, ensuring that the block transition was indicated by the unexpected appearance of an odour, rather than by a reward prediction error. The subsequent irrelevant stimuli were visual stimulus 1, and the transition period ended when a mouse responded correctly by not licking to these irrelevant visual stimuli on three consecutive trials. Odour stimulus selection itself was kept random. In the second variation, even the first irrelevant visual stimulus was visual stimulus 1, and the subsequent rules were the same. We confirmed that mice switched equally fast in both variants of the task (Wilcoxon rank sum test, P > 0.05). These transition periods were used in all behaviour sessions except the muscimol silencing experiments (Supplementary Fig. 2C) and light-only controls (Supplementary Fig. 2B), in which either visual stimulus was presented from the start of the block with 50% probability.

Reinforcement learning model

We modelled the experimental protocol of stimuli, and of rewards contingent on mouse actions, as a reinforcement learning environment. The environment was written in Python with the OpenAI Gym interface for ease of use with other agents. Code for, and experimental data used to train, the models are available at https://github.com/adityagilra/BeliefStateRL . The environment had 5 states (two for visual cues, two for olfactory cues, and an end-of-trial cue) and two possible actions that the agent could perform, lick and no-lick. Each trial comprised a number of steps.

In a visual block, each trial had 2 steps. In step 1, a needless lick (to the previous end-of-trial cue) was punished (−1, assuming an internal cost), and one of the 2 visual stimuli was shown. In step 2, a lick led to reward (+1, corresponding to a soy milk drop) if visual cue 1 was presented, or punishment (−1, corresponding to the experimental timeout) if visual cue 2 was presented, and the end of trial was indicated. In an olfactory block, a trial had 2 time steps in 30% of the trials, corresponding to trials without irrelevant visual stimuli, and 3 time steps in 70% of the trials, corresponding to trials with irrelevant visual stimuli. In the 2-time-step case, in step 1, a needless lick was punished (−1), and one of the 2 odour stimuli was given; in step 2, a lick led to reward (+1) if odour 1 was delivered, or punishment (−1) if odour 2 was delivered in step 1, and the end of trial was indicated. In the 3-time-step case, in step 1, a needless lick was punished (−1), and one of the 2 visual stimuli was shown; in step 2, a needless lick was punished (−1), and one of the 2 odour stimuli was delivered; in step 3, a lick led to reward (+1) if odour 1 was delivered, or punishment (−1) if odour 2 was delivered in step 2, and the end of trial was indicated. No lick always led to 0 reward. Overall, a correct response for a trial in either block was defined as a lick for cue 1 or a no-lick for cue 2 in the final reward step, while requiring no-lick in all other time steps. Block switches occurred using the same transition rules as in the experiment, described above.
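
As a sketch of these reward rules (ours, simplified to a single function; the authors' actual OpenAI Gym environment is in the BeliefStateRL repository linked above), the reward for one step of a visual-block trial could be written as:

    def visual_block_reward(step, cue, action):
        # cue: 'visual1' (rewarded) or 'visual2' (unrewarded);
        # action: 'lick' or 'no-lick'.
        if action == 'no-lick':
            return 0                    # no lick always leads to 0 reward
        if step == 1:
            return -1                   # needless lick: assumed internal cost
        # step 2: lick to cue 1 is rewarded, lick to cue 2 is punished
        return 1 if cue == 'visual1' else -1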

Basic RL agent (tabular SARSA algorithm): Each of the 2 visual and 2 olfactory cues, as well as the end-of-trial cue, was considered a state. The possible actions were lick and no-lick. A Q-value table Q(s, a) was constructed with entries for each combination of 4 states (leaving out end-of-trial) and 2 actions, denoted by s and a, initialized to zeros. The Q-value Q(s, a) represented the expected total reward until the end of the trial, given the cue s and taking action a at the current step. The Q-value for the end-of-trial cue was set to 0. As each cue s was encountered, the agent responded with an action a according to an ϵ-greedy policy, i.e., a random action was taken with probability ϵ, otherwise the action that yielded the maximum Q-value for the current cue s was taken. The Q-table was updated as per the SARSA (State-Action-Reward-State-Action) algorithm using the temporal difference (TD) error

\(\delta = r + Q(s^{\prime},a^{\prime}) - Q(s,a)\)

multiplied by a learning rate α, i.e., \(Q(s,a) \leftarrow Q(s,a) + \alpha \delta\).

Cues \(s^{\prime},s\) and actions \(a^{\prime},a\) correspond to the current and previous time steps respectively, and r is the reward received at the current step.
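
A minimal tabular SARSA update implementing the rule above (a sketch under our own variable names, not the repository code):

    import numpy as np

    n_states, n_actions = 4, 2    # 4 cues (end-of-trial excluded) x {lick, no-lick}
    Q = np.zeros((n_states, n_actions))
    alpha = 0.1                   # learning rate
    epsilon = 0.1                 # exploration rate (illustrative value)
    rng = np.random.default_rng(0)

    def choose_action(s):
        # epsilon-greedy policy over the Q-table.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    def sarsa_update(s, a, r, s_next=None, a_next=None):
        # TD error: delta = r + Q(s', a') - Q(s, a). The end-of-trial cue
        # has Q = 0, so at the final step of a trial the bootstrap term is 0.
        q_next = 0.0 if s_next is None else Q[s_next, a_next]
        delta = r + q_next - Q[s, a]
        Q[s, a] += alpha * delta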

Belief State RL agent: Two Q-value tables were constructed and initialized to zero, corresponding to the visual and olfactory blocks, each of size 4 states by 2 actions. The agent also had a belief b about being in a visual block v versus an olfactory block o, which was represented as a discrete probability distribution \(b\equiv (p(v),\, p(o))\). At every step of the trial, the agent assumed that the current block was either visual or olfactory depending on which probability, p(v) or p(o), was higher, and took an action according to an ϵ-greedy policy based on the Q-table corresponding to the assumed block. This Q-table was updated similarly to the basic agent, using the TD error δ multiplied by a learning rate α.

At the end of every trial, a block mismatch signal \(\chi \equiv d-b\) was computed as the difference between the detected block d (represented as (1,0) or (0,1) for a visual or olfactory block respectively, depending on whether the cue just before the end of the trial was a visual or an odour cue) and the agent's belief b. A noise-corrupted version of this block mismatch signal \(\chi ^{\prime}=\chi (1+\beta \xi )\) was computed, where ξ was a Gaussian random variable with mean 0 and variance 1, and β was a noise factor parameter. The agent's belief was updated as

\(b^{\prime} = b + \zeta \chi^{\prime}\)

where \(\chi ^{\prime}\) was multiplied by a belief switching rate ζ; each component of \(b^{\prime}\) was then clipped to be greater than 0, and the clipped \(b^{\prime}\) was normalized to yield a probability distribution b, which served as the belief for the next trial. After training, updating of Q-value tables was turned off, and only belief updates were carried out.
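
A sketch of this per-trial belief update (our own names; the update equation above was reconstructed from the surrounding text, so treat this as illustrative rather than the repository code):

    import numpy as np

    rng = np.random.default_rng(0)

    def update_belief(b, d, zeta, beta):
        # b: current belief (p(visual), p(olfactory)) as a numpy array;
        # d: detected block, np.array([1, 0]) or np.array([0, 1]);
        # zeta: belief switching rate; beta: noise factor.
        chi = d - b                                    # block mismatch signal
        chi_noisy = chi * (1 + beta * rng.standard_normal())
        b_new = b + zeta * chi_noisy                   # belief update
        b_new = np.clip(b_new, 1e-12, None)            # keep components > 0
        return b_new / b_new.sum()                     # renormalise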

Model fitting: Each model was simulated with learning via SARSA for N/2 time steps. After this training, we obtained simulated behaviour data for a further N/2 time steps, for fitting to experimental data of trained mice. For Basic RL model fitting, exploration and updating of Q-value tables via SARSA, parametrized by ϵ and α, were kept the same as during training. For Belief-state RL model fitting, Q-values were no longer updated after training (i.e., α = 0); only the belief state was updated, and exploration was kept on. We confirmed that keeping Q-value updating on after learning, at α = 0.1 as during training, did not have a noticeable effect on the results.

We minimized the root mean squared error (RMSE) between the experimental and simulated p(lick|cue). For all fits, we performed a global grid search within reasonable parameter ranges, followed by a local minimization starting from the best parameter sets obtained from the grid search. The fits were performed using 5-fold cross validation: for each fold, we fit the parameters of the model to 4/5 of the data and tested on the held-out 1/5 of the data. The RMSE mean for each fold, i.e., for each training and test split, was calculated across 5 seeds (2 seeds with the first variation and 3 seeds with the second variation, in a similar ratio to the number of experimental sessions on the two variations of the block transitions described above). To select between the models, the RMSE mean ± SD across these 5 folds, computed on the above RMSE mean across seeds, was reported and compared as below.
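
The fitting procedure can be sketched as follows (schematic only; `simulate_rmse` is a hypothetical stand-in for running an agent with the given parameters and computing the RMSE against the experimental p(lick|cue) on the selected conditions):

    import itertools
    import numpy as np
    from sklearn.model_selection import KFold

    def grid_search_cv(param_grid, condition_indices, simulate_rmse):
        # param_grid: dict mapping parameter name -> candidate values.
        # For each fold, pick the grid point minimizing training RMSE,
        # then evaluate it on the held-out fifth of the data.
        test_rmses = []
        for train, test in KFold(n_splits=5).split(condition_indices):
            best = min(itertools.product(*param_grid.values()),
                       key=lambda p: simulate_rmse(p, train))
            test_rmses.append(simulate_rmse(best, test))
        return np.mean(test_rmses), np.std(test_rmses)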

For the default basic RL model, we fit 2 parameters: the exploration rate ϵ and the learning rate α. This model was unable to reproduce the rapid block transitions observed in the data, since both the exploration rate and the learning rate needed to be very high to rapidly explore actions and learn a different reward structure after a transition, but a high exploration rate was inconsistent with the steady-state experimental data. For the default belief-state RL model, we fit 4 parameters: the belief switching rate ζ, the noise factor β for the prediction-error signal, the exploration rate ϵ, and a different exploration rate \(\epsilon^{\prime}\) to account for enhanced licking to visual cue 2 in the olfactory block. The learning rate α was fixed at 0.1 during training, but was set to zero for the fitted data, as the belief switching rate played a much stronger role in rapid switching between blocks. The RMSE mean ± SD on training and test splits was 0.157438 ± 0.004906 and 0.193520 ± 0.010869 for the default basic RL agent with 2 parameters, and 0.087012 ± 0.005060 and 0.137714 ± 0.014658 for the default belief-state RL agent with 4 parameters, leading us to choose the Belief-state RL model over the Basic one.

We also fitted a Basic RL model with 3 parameters: exploration rate ϵ, learning rate α, and an independent exploration rate \(\epsilon ^{\prime}\) on receiving visual cue 2, which yielded RMSE mean ± SD on 5-fold training and test splits of 0.161927 ± 0.006923 and 0.197955 ± 0.011329. Further, we fitted a Belief-state RL model with 3 parameters: belief switching rate ζ, noise factor β, and a common exploration rate ϵ, which yielded RMSEs of 0.095403 ± 0.006537 and 0.143631 ± 0.015731. This shows that \(\epsilon ^{\prime}\), the enhanced exploration rate to visual cue 2, does not play a major role in selecting between the Basic and Belief-state RL models.

Simulated \(p\left({lick}|{cue}\right)\) using the best-fit parameters (on the full dataset) for each model, for one seed, are shown in Fig. 1G and Supplementary Fig. 3C middle and bottom. The prediction-error signal for one-shot versus slower switches made by the agent, shown in Fig. 6C, was computed as the sum of the absolute values of the two components of the prediction-error signal χ at the end of the first trial following the block transition. For each of the two types of transitions, 70 transitions were chosen randomly from the simulated data, similar to the number of transitions in the experimental data, for plotting and significance testing.

For fitting the behaviour during ACC silencing, which was a separate experimental dataset, we first fitted the default 4 parameters of the belief-state RL model to the behaviour data without ACC silencing. Then, keeping these parameters fixed, we fitted the behaviour data with ACC silenced (Supplementary Figs. 2D and 3F), using two parameters: factors on the prediction-error signal for odour and visual trials. These factors signified how much the prediction-error signal was reduced by silencing the ACC.

Fitted parameters are shown in Table  1 . Since these fits were not for model selection, all of the data was fit, minimizing mean RMSE across 5 seeds (both variations of the task included as described above).

Imaging data analysis

Image stacks were corrected for motion, and regions of interest (ROIs) were selected for each active neuron in each session using Suite2p 76 . Each site yielded between 129 and 925 neurons, median = 499 neurons. Raw fluorescence time series F(t) were obtained for each neuron by averaging across pixels within each ROI. Baseline fluorescence F0(t) was computed by smoothing F(t) (causal moving average of 0.75 s) and determining for each time point the minimum value in the preceding 60 s time window. The change in fluorescence relative to baseline, ΔF/F, was computed by taking the difference between F and F0, and dividing by F0.
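
This baseline computation can be sketched in Python as follows (assuming, for illustration, a 30 Hz frame rate; the variable names and the pandas dependency are our own choices, not the authors' pipeline):

    import numpy as np
    import pandas as pd

    def compute_dff(F, fps=30.0):
        F = pd.Series(F)
        # Causal (trailing-window) moving average over 0.75 s.
        smoothed = F.rolling(int(0.75 * fps), min_periods=1).mean()
        # Baseline F0: minimum of the smoothed trace over the preceding 60 s.
        F0 = smoothed.rolling(int(60 * fps), min_periods=1).min()
        # dF/F = (F - F0) / F0.
        return ((F - F0) / F0).to_numpy()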

To identify prediction-error neurons, we selected neurons which responded significantly differently to the odour prediction-error event when compared to both the expected arrival of the odour and when no odour was expected (Fig. 2B). We defined three epochs, each lasting 1.5 s, and measured the average neural activity in these epochs: 1) odour prediction-error trials, starting 2.0 s after the offset of the first visual stimulus following a switch from an odour block to a visual block, provided the mouse did not already lick to the preceding visual stimulus (2.0 s was the average delay in the imaging sessions from the visual stimulus offset to the odour stimulus onset); 2) stable odour block trials from the end of the preceding odour block (when an odour is expected and received), aligned to the onset of the odour stimuli, following a correctly ignored visual grating; 3) no-odour-expected trials, when no odour is expected following a visual stimulus, starting 2.0 s after the offset of an unrewarded visual stimulus; these trials were taken from the subsequent visual block, up to 10 trials before the end of the block. In epochs 2 and 3, we averaged up to 7 trials of each condition for each block transition (median 7 trials). We compared the neural activity in the different epochs with a Wilcoxon rank-sum test, with the number of samples equal to the number of block transitions. Prediction-error neurons were defined as neurons with significantly different activity in odour prediction-error trials when compared to both of the other conditions. We repeated the analysis without averaging multiple trials in epochs 2 and 3 and still obtained a significantly larger fraction of prediction-error neurons in the ACC compared to V1 (Chi-squared test of proportions, P < 0.0001). Similar criteria were used for the visual-to-odour block transitions. Positively and negatively responsive prediction-error neurons were those in which the response to the odour prediction error was the largest or smallest of the three conditions, respectively. Two other combinations were observed: first, with the odour condition significantly higher and the no-odour condition significantly lower compared to the prediction-error condition (99 neurons), and second, the reverse of this (163 neurons). To test whether activity in prediction-error neurons could predict subsequent switching, average activity between 0–1.5 s in odour prediction-error trials was compared between one-shot switches (in which the mouse responded correctly to the subsequent visual grating) and slower switches (in which the mouse continued to miss at least the next visual grating) using a Wilcoxon signed-rank test.
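
The per-neuron selection criterion can be sketched as follows (our own names; the significance threshold of 0.05 is an assumption, as the exact alpha is not stated here):

    from scipy.stats import ranksums

    def is_prediction_error_neuron(pe, odour_expected, no_odour, alpha=0.05):
        # pe, odour_expected, no_odour: per-block-transition mean activity
        # in epochs 1-3 for one neuron. The neuron must differ significantly
        # from BOTH control conditions.
        return (ranksums(pe, odour_expected).pvalue < alpha and
                ranksums(pe, no_odour).pvalue < alpha)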

Decoding analysis (Figs. 4D, 7G, I) was performed by training a binary logistic regression classifier to decode the stimulus or block identity from the vector of ΔF/F values at each frame. Decoding performance was assessed using stratified 5-fold cross validation and taking the mean accuracy across the 5 test sets. Stimulus classes were evenly balanced by randomly subsampling the larger class. We applied an L2 regularisation penalty to reduce overfitting.
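
A minimal version of this decoder using scikit-learn (a sketch with our own names; hyperparameters beyond those stated above are illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    def frame_decoding_accuracy(X, y, seed=0):
        # X: trials x neurons (dF/F at one frame); y: binary labels
        # (stimulus or block identity). Balance classes by randomly
        # subsampling the larger class.
        rng = np.random.default_rng(seed)
        idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
        n = min(len(idx0), len(idx1))
        keep = np.concatenate([rng.choice(idx0, n, replace=False),
                               rng.choice(idx1, n, replace=False)])
        clf = LogisticRegression(penalty='l2')     # L2-regularised
        return cross_val_score(clf, X[keep], y[keep],
                               cv=StratifiedKFold(n_splits=5)).mean()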

For identification of striatal-projecting neurons, a brief dual-channel recording of the imaging planes was acquired before each imaging session at an excitation wavelength of 830 nm. Following segmentation, imaged neurons co-expressing Alexa-647 were identified using this recording, and confirmed using a detailed anaesthetised dual-channel z-stack taken at the end of all imaging sessions. To calculate confidence intervals for the percentage of prediction-error neurons in the non-striatal-projecting ROIs (Fig. 4M), a percentile bootstrap method was used, resampling with replacement, 10,000 times, a number of ROIs equivalent to the size of the striatal-projecting ROI population. The proportion of prediction-error neurons was then calculated for each resample, and the 0.5th and 99.5th percentiles of this distribution of proportions were taken to obtain the 99% confidence interval.
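
This percentile bootstrap can be sketched as follows (our own names):

    import numpy as np

    def bootstrap_ci_99(is_pe_neuron, n_matched, n_boot=10_000, seed=0):
        # is_pe_neuron: boolean array over non-striatal-projecting ROIs
        # (True = prediction-error neuron); n_matched: size of the
        # striatal-projecting ROI population.
        rng = np.random.default_rng(seed)
        proportions = np.array([
            rng.choice(is_pe_neuron, size=n_matched, replace=True).mean()
            for _ in range(n_boot)])
        # 0.5th and 99.5th percentiles give the 99% confidence interval.
        return np.percentile(proportions, [0.5, 99.5])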

To assess the proportion of neural activity attributable to overt behaviour recorded during our task (Supplementary Fig. 5C, D, E), a linear model was fit using ridge regression to predict neural activity. The model was constructed by combining multiple sets of variables into a design matrix, to capture signal modulation by the following task or behavioural events: 2 visual stimuli, 2 odour stimuli, reward delivery, licks, running speed, block type, and an interaction term for visual stimuli and block type. Each stimulus/event variable was structured to capture a time-varying event kernel: variables consisted of a vector of the relevant stimulus/event, plus copies of this vector, each shifted in time by one frame, spanning specific durations. For sensory stimuli, the time-shifted copies ranged up to 2 s after the original. For motor events (running and licking), the time-shifted copies spanned the frames from 0.5 s before until 2 s after the original. The model was fit with 5-fold cross validation, and the coefficient of determination (R²) was calculated from the predictions of the model on held-out data not used during training. We then assessed the predictive power of the behavioural model variables by comparing the R² value for the full model to that of a model without the running and licking predictors.
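
Building the time-shifted copies of an event vector for this design matrix can be sketched as follows (our own names; note that np.roll wraps at the array edges, which a real implementation would replace with zero-padding):

    import numpy as np

    def event_kernel_block(event, frames_before, frames_after):
        # Returns a (time x lags) block of time-shifted copies of `event`,
        # one column per one-frame shift, so that ridge regression can fit
        # a time-varying kernel around each event.
        shifts = range(-frames_before, frames_after + 1)
        return np.stack([np.roll(event, k) for k in shifts], axis=1)

    # e.g., at an assumed 30 Hz, licking spans 0.5 s before to 2 s after:
    # lick_block = event_kernel_block(lick_vector, frames_before=15,
    #                                 frames_after=60)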

Optogenetic activation and inhibition of VIP interneurons during ACC imaging

To perturb VIP neurons expressing Chrimson, ArchT, or tdTomato concurrently with two-photon imaging, a 639 nm laser (OBIS, Coherent) was used to deliver light via a 200 µm diameter, 0.39 NA optic fibre (Thorlabs) positioned around 3 mm from the posterior edge of the microprism, at a 30° angle relative to the coverslip. The laser and stimulus monitors were blanked during the linear phase of the resonant scanner to allow quasi-simultaneous two-photon imaging and optogenetic activation. The effective maximum output power from the optic fibre was 5.3 mW. During an optogenetic calibration session in the dark, 5 laser powers (20%, 40%, 60%, 80%, and 100% of maximum) were pseudorandomly applied to the coverslip for 1.5 s, with a 5 s interval between each. During switching sessions, two laser epochs were used, at 5.3 mW power only. In the peri-stimulus epoch, the laser began 0.1 s before the visual stimulus and continued until the end of the visual stimulus. In the inter-trial-interval epoch, the laser began at the offset of each visual stimulus and continued until the onset of the next visual stimulus.

Pupil tracking

Eye recordings were acquired using a monochrome USB 2.0 video camera (The Imaging Source) with a 50 mm, 2/3" format, 5-megapixel lens (Computar), set to acquire at 320x240 (Y800) resolution and 30 frames per second. Frames were triggered using an Arduino Uno microcontroller board (Arduino) to ensure a constant acquisition rate. Pupil data were extracted using DeepLabCut 77 for 2D marker tracking, with markers set to track the vertical and horizontal boundaries of the eye and pupil throughout the recording. Frames that coincided with blinks were removed based on changes in the vertical size of the eye, and pupil width was calculated from the remaining frames.

Optogenetic silencing of ACC activity

Once mice had learned the full switching task, optogenetic silencing of ACC neurons was performed by connecting the optic fibre cannulae to a blue LED (470 nm, Thorlabs), and delivering light while the mouse performed the task. Before implantation, each optic fibre was confirmed to have an effective power output of >1 mW after cutting. Light was delivered either throughout the session (pulsed at 40 Hz 78 ), 0.5 s before each visual stimulus continuing to the end of the stimulus (‘peri-stimulus’), or from the end of each visual stimulus until the beginning of the next visual stimulus (‘inter-trial-interval’, ITI). These three epochs were used to silence both ACC and PL on different days, creating 6 silencing conditions. These conditions were pseudorandomly chosen across consecutive days, with the order different between mice but the time between repeated conditions kept constant, and with control no-light sessions interspersed every third session. For one-trial un-silencing, the light was continuously pulsed throughout the session as above, but this was paused at the end of the last trial in the odour block, and resumed at the start of the second trial of the visual block. In the peri-stimulus and ITI epochs, light power was ramped down for the final 200 ms of each pulse. For ACC silencing during learning, the silencing group included all 8 optogenetic mice, and the non-silenced controls were the 10 imaging mice. Light-only controls were performed in PV-Cre mice not expressing channelrhodopsin ( N  = 3 mice). No differences were found in these mice in the number of trials to transition between blocks with and without the light stimulation ( P  > 0.05, Wilcoxon rank sum test).

Pharmacological silencing of ACC activity

For the muscimol silencing experiments, 4 male wildtype C57Bl/6j mice were implanted with bilateral infusion cannulae (−1.2 mm DV, +0.7 mm AP, ±0.5 mm ML). After training in the switching task, mice were infused bilaterally with 300 nl muscimol (Sigma, 1 µg/µl) or saline at a rate of 0.25 µl/min, using a 1 µl syringe (Hamilton) and syringe pump (World Precision Instruments SP100IZ), 30 min before the start of a switching session. We waited 5 min after the syringe pump had finished, to allow full infusion of the drug, before disconnecting the cannulae.

Silencing data analysis

Behavioural d-primes in each silencing condition were calculated from performance in stable periods of blocks, outside of transition periods 74, 75. Switching speed was defined as the number of trials that elapsed before a mouse correctly responded to three rewarded visual gratings in a row, either by licking to the grating after a switch to a visual block or by ignoring the grating after a switch to an odour block. A 'fluke' switch was defined as a switch in which the mouse correctly licked to the very first rewarded grating in a new visual block, before any evidence of the switch had been received. These were interpreted as exploratory licks, visible in the histograms in Fig. 1F and Supplementary Fig. 3E at trial 0, and all such transitions were excluded from the analysis of switching speeds.
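
One possible implementation of this switching-speed measure (a sketch; the exact counting convention is our own assumption):

    def switching_speed(correct_rewarded_grating_trials):
        # Booleans for each rewarded-grating trial after a block switch:
        # True if the mouse responded correctly (lick in a visual block,
        # no-lick to the irrelevant grating in an odour block).
        streak = 0
        for n_trials, correct in enumerate(correct_rewarded_grating_trials, 1):
            streak = streak + 1 if correct else 0
            if streak == 3:
                return n_trials - 3    # trials elapsed before the streak
        return None                    # criterion not reached in this block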

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this study are available at https://figshare.com/projects/Prediction-error_signals_in_anterior_cingulate_cortex_drive_task-switching/211438 . Source data are provided with this paper.

Code availability

The code generated in this study is available at https://github.com/adityagilra/BeliefStateRL with DOI release https://doi.org/10.5281/zenodo.12636612 .

Monsell, S. Task switching. Trends Cogn. Sci. 7 , 134–140 (2003).

Keller, G. B. & Mrsic-Flogel, T. D. Predictive processing: a canonical cortical computation. Neuron 100 , 424–435 (2018).

Clark, A. Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behav. Brain Sci. 36 , 181–204 (2013).

Bastos, A. M. et al. Canonical microcircuits for predictive coding. Neuron 76 , 695–711 (2012).

de Lange, F. P., Heilbron, M. & Kok, P. How do expectations shape perception? Trends Cogn. Sci. 22 , 764–779 (2018).

Meirhaeghe, N., Sohn, H. & Jazayeri, M. A precise and adaptive neural mechanism for predictive temporal processing in the frontal cortex. Neuron 109 , 2995–3011.e5 (2021).

Sarafyazd, M. & Jazayeri, M. Hierarchical reasoning by neural circuits in the frontal cortex. Science 364 , eaav8911 (2019).

Shima, K. & Tanji, J. Role for cingulate motor area cells in voluntary movement selection based on reward. Science 282 , 1335–1338 (1998).

O’Reilly, J. X. et al. Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proc. Natl Acad. Sci. USA 110 , E3660–E3669 (2013).

Narayanan, N. S., Cavanagh, J. F., Frank, M. J. & Laubach, M. Common medial frontal mechanisms of adaptive control in humans and rodents. Nat. Neurosci. 16 , 1888–1895 (2013).

Bartolo, R. & Averbeck, B. B. Prefrontal cortex predicts state switches during reversal learning. Neuron 106 , 1044–1054.e4 (2020).

Banerjee, A. et al. Value-guided remapping of sensory cortex by lateral orbitofrontal cortex. Nature 585 , 245–250 (2020).

Klein-Flügge, M. C., Bongioanni, A. & Rushworth, M. F. S. Medial and orbital frontal cortex in decision-making and flexible behavior. Neuron https://doi.org/10.1016/j.neuron.2022.05.022 (2022).

Heilbronner, S. R. & Hayden, B. Y. Dorsal anterior cingulate cortex: a bottom-up view. Annu. Rev. Neurosci. 39 , 149–170 (2016).

Shenhav, A., Cohen, J. D. & Botvinick, M. M. Dorsal anterior cingulate cortex and the value of control. Nat. Neurosci. 19 , 1286–1291 (2016).

Kolling, N. et al. Value, search, persistence and model updating in anterior cingulate cortex. Nat. Neurosci. 19 , 1280–1285 (2016).

Tervo, D. G. R. et al. Behavioral variability through stochastic choice and its gating by anterior cingulate cortex. Cell 159 , 21–32 (2014).

Tervo, D. G. R. et al. The anterior cingulate cortex directs exploration of alternative strategies. Neuron 109 , 1876–1887.e6 (2021).

Williams, Z. M., Bush, G., Rauch, S. L., Cosgrove, G. R. & Eskandar, E. N. Human anterior cingulate neurons and the integration of monetary reward with motor responses. Nat. Neurosci. 7 , 1370–1375 (2004).

Kennerley, S. W., Behrens, T. E. J. & Wallis, J. D. Double dissociation of value computations in orbitofrontal and anterior cingulate neurons. Nat. Neurosci. 14 , 1581–1589 (2011).

Hayden, B. Y., Pearson, J. M. & Platt, M. L. Neuronal basis of sequential foraging decisions in a patchy environment. Nat. Neurosci. 14 , 933–939 (2011).

Kolling, N., Behrens, T. E. J., Mars, R. B. & Rushworth, M. F. S. Neural mechanisms of foraging. Science 336 , 95–98 (2012).

Ito, S., Stuphorn, V., Brown, J. W. & Schall, J. D. Performance monitoring by the anterior cingulate cortex during saccade countermanding. Science 302 , 120–122 (2003).

Quilodran, R., Rothé, M. & Procyk, E. Behavioral shifts and action valuation in the anterior cingulate cortex. Neuron 57 , 314–325 (2008).

Hyman, J. M., Holroyd, C. B. & Seamans, J. K. A novel neural prediction error found in anterior cingulate cortex ensembles. Neuron 95 , 447–456.e3 (2017).

Jaramillo, S. & Zador, A. M. Mice and rats achieve similar levels of performance in an adaptive decision-making task. Front. Syst. Neurosci . 8 , 173 (2014).

Liu, Y., Xin, Y. & Xu, N.-L. A cortical circuit mechanism for structural knowledge-based flexible sensorimotor decision-making. Neuron 109 , 2009–2024.e6 (2021).

Reinert, S., Hübener, M., Bonhoeffer, T. & Goltstein, P. M. Mouse prefrontal cortex represents learned rules for categorization. Nature https://doi.org/10.1038/s41586-021-03452-z (2021).

Wang, T.-Y., Liu, J. & Yao, H. Control of adaptive action selection by secondary motor cortex during flexible visual categorization. eLife 9 , e54474 (2020).

Siniscalchi, M. J., Phoumthipphavong, V., Ali, F., Lozano, M. & Kwan, A. C. Fast and slow transitions in frontal ensemble activity during flexible sensorimotor behavior. Nat. Neurosci. 19 , 1234–1242 (2016).

Rodgers, C. C. & DeWeese, M. R. Neural correlates of task switching in prefrontal cortex and primary auditory cortex in a novel stimulus selection task for rodents. Neuron 82 , 1157–1170 (2014).

Karlsson, M. P., Tervo, D. G. R. & Karpova, A. Y. Network resets in medial prefrontal cortex mark the onset of behavioral uncertainty. Science 338 , 135–139 (2012).

Spellman, T., Svei, M., Kaminsky, J., Manzano-Nieves, G. & Liston, C. Prefrontal deep projection neurons enable cognitive flexibility via persistent feedback monitoring. Cell 184 , 2750–2766.e17 (2021).

Harlow, H. F. The formation of learning sets. Psychol. Rev. 56 , 51–65 (1949).

Achterberg, J. et al. A one-shot shift from explore to exploit in monkey prefrontal cortex. J. Neurosci. 42 , 276–287 (2022).

Hertäg, L. & Clopath, C. Prediction-error neurons in circuits with multiple neuron types: Formation, refinement, and functional implications. Proc. Natl. Acad. Sci. USA 119 , e2115699119 (2022).

Pi, H.-J. et al. Cortical interneurons that specialize in disinhibitory control. Nature 503 , 521–524 (2013).

Pfeffer, C. K., Xue, M., He, M., Huang, Z. J. & Scanziani, M. Inhibition of inhibition in visual cortex: the logic of connections between molecularly distinct interneurons. Nat. Neurosci. 16 , 1068–1076 (2013).

Lee, S., Kruglikov, I., Huang, Z. J., Fishell, G. & Rudy, B. A disinhibitory circuit mediates motor integration in the somatosensory cortex. Nat. Neurosci. 16 , 1662–1670 (2013).

Letzkus, J. J., Wolff, S. B. E. & Lüthi, A. Disinhibition, a circuit mechanism for associative learning and memory. Neuron 88 , 264–276 (2015).

Braga, A. & Schönwiesner, M. Neural substrates and models of omission responses and predictive processes. Front. Neural Circuits 16 , 799581 (2022).

Akam, T. et al. The anterior cingulate cortex predicts future states to mediate model-based action selection. Neuron 109 , 149–163.e7 (2021).

Rushworth, M. F. S. & Behrens, T. E. J. Choice, uncertainty and value in prefrontal and cingulate cortex. Nat. Neurosci. 11 , 389–397 (2008).

Warren, C. M., Hyman, J. M., Seamans, J. K. & Holroyd, C. B. Feedback-related negativity observed in rodent anterior cingulate cortex. J. Physiol. Paris 109 , 87–94 (2015).

Bissonette, G. B., Powell, E. M. & Roesch, M. R. Neural structures underlying set-shifting: roles of medial prefrontal cortex and anterior cingulate cortex. Behav. Brain Res. 250 , 91–101 (2013).

Botvinick, M. M. Conflict monitoring and decision making: reconciling two perspectives on anterior cingulate function. Cogn. Affect Behav. Neurosci. 7 , 356–366 (2007).

Passingham, R. E. et al. Attention to action. Philos. Trans. R. Soc. Lond. B Biol. Sci. 351 , 1473–1479 (1996).

Procyk, E., Tanaka, Y. L. & Joseph, J. P. Anterior cingulate activity during routine and non-routine sequential behaviors in macaques. Nat. Neurosci. 3 , 502–508 (2000).

Hadland, K. A., Rushworth, M. F. S., Gaffan, D. & Passingham, R. E. The anterior cingulate and reward-guided selection of actions. J. Neurophysiol. 89 , 1161–1164 (2003).

Norman, K. J. et al. Frontal-sensory cortical projections become dispensable for attentional performance upon a reduction of task demand in mice. Front Neurosci. 15 , 775256 (2022).

Zhang, S. et al. Organization of long-range inputs and outputs of frontal cortex for top-down control. Nat. Neurosci. 19 , 1733–1742 (2016).

Alexander, W. H. & Brown, J. W. Medial prefrontal cortex as an action-outcome predictor. Nat. Neurosci. 14 , 1338–1344 (2011).

Kuchibhotla, K. V. et al. Parallel processing by cortical inhibition enables context-dependent behavior. Nat. Neurosci. 20 , 62–71 (2017).

Attinger, A., Wang, B. & Keller, G. B. Visuomotor coupling shapes the functional development of mouse visual cortex. Cell 169 , 1291–1302.e14 (2017).

Kounios, J. & Beeman, M. The Aha! moment: the cognitive neuroscience of insight. Curr. Dir. Psychol. Sci. 18 , 210–216 (2009).

Leinweber, M., Ward, D. R., Sobczak, J. M., Attinger, A. & Keller, G. B. A sensorimotor circuit in mouse cortex for visual flow predictions. Neuron 95 , 1420–1432.e5 (2017).

Watabe-Uchida, M., Eshel, N. & Uchida, N. Neural circuitry of reward prediction error. Annu. Rev. Neurosci. 40 , 373–394 (2017).

Rabinovich, R. J., Kato, D. D. & Bruno, R. M. Learning enhances encoding of time and temporal surprise in mouse primary sensory cortex. Nat. Commun. 13 , 5504 (2022).

Sokolov, A. A., Miall, R. C. & Ivry, R. B. The cerebellum: adaptive prediction for movement and cognition. Trends Cogn. Sci. 21 , 313–332 (2017).

Wilson, R. C., Takahashi, Y. K., Schoenbaum, G. & Niv, Y. Orbitofrontal cortex as a cognitive map of task space. Neuron 81 , 267–279 (2014).

Durstewitz, D., Vittoz, N. M., Floresco, S. B. & Seamans, J. K. Abrupt transitions between prefrontal neural ensemble states accompany behavioral transitions during rule learning. Neuron 66 , 438–448 (2010).

Vertechi, P. et al. Inference-based decisions in a hidden state foraging task: differential contributions of prefrontal cortical areas. Neuron https://doi.org/10.1016/j.neuron.2020.01.017 (2020).

Myers-Joseph, D., Wilmes, K. A., Fernandez-Otero, M., Clopath, C. & Khan, A. G. Disinhibition by VIP interneurons is orthogonal to cross-modal attentional modulation in primary visual cortex. Neuron 112 , 628–645.e7 (2024).

Fox, M. T., Barense, M. D. & Baxter, M. G. Perceptual attentional set-shifting is impaired in rats with neurotoxic lesions of posterior parietal cortex. J. Neurosci. 23 , 676–681 (2003).

Marton, T., Seifikar, H., Luongo, F. J., Lee, A. T. & Sohal, V. S. Roles of prefrontal cortex and mediodorsal thalamus in task engagement and behavioral flexibility. J. Neurosci . https://doi.org/10.1523/JNEUROSCI.1728-17.2018 (2018).

Cho, K. K. A., Shi, J., Phensy, A. J., Turner, M. L. & Sohal, V. S. Long-range inhibition synchronizes and updates prefrontal task activity. Nature 617 , 548–554 (2023).

Modirshanechi, A., Brea, J. & Gerstner, W. A taxonomy of surprise definitions. J. Math. Psychol. 110 , 102712 (2022).

Aston-Jones, G. & Cohen, J. D. An integrative theory of locus coeruleus-norepinephrine function: adaptive gain and optimal performance. Annu. Rev. Neurosci. 28 , 403–450 (2005).

Jordan, R. & Keller, G. B. The locus coeruleus broadcasts prediction errors across the cortex to promote sensorimotor plasticity. Elife 12 , RP85111 (2023).

Clarke, H. F., Dalley, J. W., Crofts, H. S., Robbins, T. W. & Roberts, A. C. Cognitive inflexibility after prefrontal serotonin depletion. Science 304 , 878–880 (2004).

Ahmadlou, M. et al. A subcortical switchboard for exploratory, exploitatory, and disengaged states. 2023.12.20.572654 Preprint at https://doi.org/10.1101/2023.12.20.572654 (2023).

Uddin, L. Q. Cognitive and behavioural flexibility: neural mechanisms and clinical considerations. Nat. Rev. Neurosci. 1–13 https://doi.org/10.1038/s41583-021-00428-w (2021).

Couto, J. et al. Chronic, cortex-wide imaging of specific cell populations during behavior. Nat. Protoc. 16 , 3241–3263 (2021).

Poort, J. et al. Learning enhances sensory and multiple non-sensory representations in primary visual cortex. Neuron 86 , 1478–1490 (2015).

Poort, J. et al. Learning and attention increase visual response selectivity through distinct mechanisms. Neuron 110 , 686–697.e6 (2022).

Pachitariu, M. et al. Suite2p: beyond 10,000 neurons with standard two-photon microscopy. 061507 Preprint at https://doi.org/10.1101/061507 (2017).

Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21 , 1281–1289 (2018).

Li, N. et al. Spatiotemporal constraints on optogenetic inactivation in cortical circuits. Elife 8 , e48622 (2019).

Acknowledgements

We thank John Duncan, Apoorva Bhandari, Dimitar Kostadinov and Sonja Hofer for discussions and comments on the manuscript, and Thomas Mrsic-Flogel, Petr Znamenskiy, Anil Seth, Jasper Poort and Juan Burrone for discussions of the results in this manuscript. We thank Marian Fernandez-Otero for technical assistance and Robert Taylor for assistance in pilot experiments. This work was supported by the Wellcome Trust (AGK, 206222/Z/17/Z), the BBSRC (AGK BB/S015809/1), start-up funds from the CDN, King’s College London (AGK), and the MRC CNDD PhD programme (MH).

Author information

Authors and affiliations

Centre for Developmental Neurobiology, King’s College London, London, UK

Nicholas Cole, Matthew Harvey, Dylan Myers-Joseph & Adil G. Khan

Machine Learning Group, Centrum Wiskunde & Informatica, Amsterdam, the Netherlands

Aditya Gilra

Department of Computer Science, The University of Sheffield, Sheffield, UK

Contributions

N.C. and A.G.K. designed the experiments. N.C. performed the experiments and analysed the data. M.H. performed the widefield imaging and contributed to imaging data analysis. D.M.-J. performed the V1 recordings and contributed to imaging and behavioural data analysis. A.G. built and analysed the RL model. A.G.K. and N.C. wrote the paper, with contributions from A.G., M.H. and D.M.-J. All authors contributed to discussions and commented on the manuscript.

Corresponding author

Correspondence to Adil G. Khan .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

  • Peer review file
  • Description of additional supplementary files
  • Supplementary movie 1
  • Reporting summary
  • Source data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article

Cole, N., Harvey, M., Myers-Joseph, D. et al. Prediction-error signals in anterior cingulate cortex drive task-switching. Nat Commun 15 , 7088 (2024). https://doi.org/10.1038/s41467-024-51368-9

Received: 30 April 2024

Accepted: 05 August 2024

Published: 17 August 2024

DOI: https://doi.org/10.1038/s41467-024-51368-9
