Chi-Square (Χ²) Test & How To Calculate Formula Equation

Benjamin Frimodig

Science Expert

B.A., History and Science, Harvard University

Ben Frimodig is a 2021 graduate of Harvard College, where he studied the History of Science.


Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Chi-square (χ2) is used to test hypotheses about the distribution of observations into categories with no inherent ranking.

What Is a Chi-Square Statistic?

The Chi-square test (pronounced Kai) looks at the pattern of observations and will tell us if certain combinations of the categories occur more frequently than we would expect by chance, given the total number of times each category occurred.

It looks for an association between the variables. We cannot use a correlation coefficient to look for the patterns in this data because the categories often do not form a continuum.

There are three main types of Chi-square tests: the test of goodness of fit, the test of independence, and the test for homogeneity. All three tests rely on the same formula to compute a test statistic.

These tests function by deciphering relationships between observed sets of data and theoretical or “expected” sets of data that align with the null hypothesis.

What is a Contingency Table?

Contingency tables (also known as two-way tables) are grids in which Chi-square data is organized and displayed. They provide a basic picture of the interrelation between two variables and can help find interactions between them.

In contingency tables, one variable and each of its categories are listed vertically, and the other variable and each of its categories are listed horizontally.

Additionally, including column and row totals, also known as “marginal frequencies,” will help facilitate the Chi-square testing process.

In order for the Chi-square test to be considered trustworthy, each cell of your expected contingency table must have a value of at least five.

Each Chi-square test will have one contingency table representing observed counts (see Fig. 1) and one contingency table representing expected counts (see Fig. 2).


Figure 1. Observed table (which contains the observed counts).

To obtain the expected frequencies for any cell in any cross-tabulation in which the two variables are assumed independent, multiply the row and column totals for that cell and divide the product by the total number of cases in the table.


Figure 2. Expected table (what we expect the two-way table to look like if the two categorical variables are independent).

To decide if our calculated value for χ2 is significant, we also need to work out the degrees of freedom for our contingency table using the following formula: df= (rows – 1) x (columns – 1).
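To make these two calculations concrete, here is a minimal sketch in Python (assuming NumPy is available); the observed counts are hypothetical.

```python
import numpy as np

# Hypothetical observed contingency table (2 rows x 2 columns).
observed = np.array([[20, 30],
                     [25, 25]])

row_totals = observed.sum(axis=1)   # marginal frequencies of the rows
col_totals = observed.sum(axis=0)   # marginal frequencies of the columns
n = observed.sum()                  # total number of cases

# Expected count for each cell: (row total x column total) / n.
expected = np.outer(row_totals, col_totals) / n

df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print(expected)       # [[22.5 27.5], [22.5 27.5]]
print("df =", df)     # 1
```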

Formula Calculation

χ² = Σ (O – E)² / E

Calculate the chi-square statistic (χ2) by completing the following steps:

  • Calculate the expected frequencies and the observed frequencies.
  • For each observed number in the table, subtract the corresponding expected number (O – E).
  • Square the difference (O – E)².
  • Divide the squares obtained for each cell in the table by the expected number for that cell: (O – E)² / E.
  • Sum all the values for (O – E)² / E. This is the chi-square statistic.
  • Calculate the degrees of freedom for the contingency table using the following formula: df = (rows – 1) x (columns – 1).

Once we have calculated the degrees of freedom (df) and the chi-squared value (χ2), we can use the χ2 table (often at the back of a statistics book) to check if our value for χ2 is higher than the critical value given in the table. If it is, then our result is significant at the level given.
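If you have SciPy available rather than a printed table, the same lookup can be done in code; this sketch reuses the χ²(4, N = 101) = 54.50 example reported later in this article.

```python
from scipy.stats import chi2

alpha = 0.05
df = 4                      # (rows - 1) x (columns - 1)
critical = chi2.ppf(1 - alpha, df)

chi_sq = 54.50              # the calculated chi-square statistic
print(critical)             # ~9.488 for df = 4
print(chi_sq > critical)    # True -> significant at the .05 level
```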

Interpretation

The chi-square statistic tells you how much difference exists between the observed count in each table cell to the counts you would expect if there were no relationship at all in the population.

Small Chi-Square Statistic: If the chi-square statistic is small and the p-value is large (usually greater than 0.05), this often indicates that the observed frequencies in the sample are close to what would be expected under the null hypothesis.

The null hypothesis usually states no association between the variables being studied or that the observed distribution fits the expected distribution.

In theory, if the observed and expected values were equal (no difference), then the chi-square statistic would be zero — but this is unlikely to happen in real life.

Large Chi-Square Statistic: If the chi-square statistic is large and the p-value is small (usually less than 0.05), then the conclusion is often that the data does not fit the model well, i.e., the observed and expected values are significantly different. This often leads to the rejection of the null hypothesis.

How to Report

To report a chi-square output in an APA-style results section, always rely on the following template:

χ²(degrees of freedom, N = sample size) = chi-square statistic value, p = p value.

(SPSS output for the chi-square test of independence in the example below.)

In the case of the above example, the results would be written as follows:

A chi-square test of independence showed that there was a significant association between gender and post-graduation education plans, χ2 (4, N = 101) = 54.50, p < .001.

APA Style Rules

  • Do not use a zero before a decimal when the statistic cannot be greater than 1 (proportion, correlation, level of statistical significance).
  • Report exact p values to two or three decimals (e.g., p = .006, p = .03).
  • However, report p values less than .001 as “ p < .001.”
  • Put a space before and after a mathematical operator (e.g., minus, plus, greater than, less than, equals sign).
  • Do not repeat statistics in both the text and a table or figure.

p-value Interpretation

You test whether a given χ² is statistically significant by checking it against a table of chi-square distributions, according to the number of degrees of freedom for your sample (for a goodness-of-fit test, the number of categories minus 1; for a contingency table, (rows – 1) x (columns – 1), as above). The chi-square test assumes that you have at least 5 expected observations per category.

If you are using SPSS, the output will include an exact p-value.

For a chi-square test, a p-value that is less than or equal to the .05 significance level indicates that the observed values are different to the expected values.

Thus, low p-values (p< .05) indicate a likely difference between the theoretical population and the collected sample. You can conclude that a relationship exists between the categorical variables.

Remember that p -values do not indicate the odds that the null hypothesis is true but rather provide the probability that one would obtain the sample distribution observed (or a more extreme distribution) if the null hypothesis was true.

The null hypothesis can never be accepted with a stated level of confidence. Therefore, depending on the calculated p-value, the conclusion must be either to fail to reject the null hypothesis or to reject it in favor of the alternative hypothesis.

The six steps below show you how to analyze your data using a chi-square goodness-of-fit test in SPSS (Steps 1-3 cover the hypothesis of equal expected proportions; Steps 4-6 let you specify your own expected counts).

Step 1 : Analyze > Nonparametric Tests > Legacy Dialogs > Chi-square… on the top menu as shown below:

Step 2 : Move the variable indicating categories into the “Test Variable List:” box.

Step 3 : If you want to test the hypothesis that all categories are equally likely, click “OK.” Otherwise, continue with Steps 4-6 to specify your own expected counts.

Step 4 : Specify the expected count for each category by first clicking the “Values” button under “Expected Values.”

Step 5 : Then, in the box to the right of “Values,” enter the expected count for category one and click the “Add” button. Now enter the expected count for category two and click “Add.” Continue in this way until all expected counts have been entered.

Step 6 : Then click “OK.”
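The same goodness-of-fit analysis can be reproduced outside SPSS; here is a sketch using SciPy, with hypothetical counts, where `f_exp` plays the role of the expected values entered in Steps 4-6.

```python
from scipy.stats import chisquare

observed = [30, 25, 45]     # hypothetical category counts

# Equivalent of Step 3: all categories assumed equally likely.
stat_equal, p_equal = chisquare(observed)

# Equivalent of Steps 4-6: user-specified expected counts
# (they must sum to the same total as the observed counts).
stat_custom, p_custom = chisquare(observed, f_exp=[25, 25, 50])

print(stat_equal, p_equal)
print(stat_custom, p_custom)
```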

The five steps below show you how to analyze your data using a chi-square test of independence in SPSS Statistics.

Step 1 : Open the Crosstabs dialog (Analyze > Descriptive Statistics > Crosstabs).

Step 2 : Select the variables you want to compare using the chi-square test. Click one variable in the left window and then click the arrow at the top to move the variable. Select the row variable and the column variable.

Step 3 : Click Statistics (a new pop-up window will appear). Check Chi-square, then click Continue.

Step 4 : (Optional) Check the box for Display clustered bar charts.

Step 5 : Click OK.
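The Crosstabs procedure has a close equivalent in Python; the sketch below, assuming pandas and SciPy, builds the contingency table from hypothetical raw responses and runs the test in one call.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical raw survey responses (toy-sized; a real analysis needs
# expected counts of at least 5 per cell).
data = pd.DataFrame({
    "gender": ["M", "M", "F", "F", "M", "F", "F", "M"],
    "plan":   ["grad", "work", "grad", "grad", "work", "work", "grad", "grad"],
})

table = pd.crosstab(data["gender"], data["plan"])   # the contingency table
chi2_stat, p, dof, expected = chi2_contingency(table)
print(table)
print(chi2_stat, p, dof)
```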

Goodness-of-Fit Test

The Chi-square goodness of fit test is used to compare a randomly collected sample containing a single, categorical variable to a larger population.

This test is most commonly used to compare a random sample to the population from which it was potentially collected.

The test begins with the creation of a null and alternative hypothesis. In this case, the hypotheses are as follows:

Null Hypothesis (Ho) : The null hypothesis (Ho) is that the observed frequencies are the same (except for chance variation) as the expected frequencies. The collected data is consistent with the population distribution.

Alternative Hypothesis (Ha) : The collected data is not consistent with the population distribution.

The next step is to create a contingency table that represents how the data would be distributed if the null hypothesis were exactly correct.

The sample’s overall deviation from this theoretical/expected data will allow us to draw a conclusion, with a more severe deviation resulting in smaller p-values.

Test for Independence

The Chi-square test for independence looks for an association between two categorical variables within the same population.

Unlike the goodness of fit test, the test for independence does not compare a single observed variable to a theoretical population but rather two variables within a sample set to one another.

The hypotheses for a Chi-square test of independence are as follows:

Null Hypothesis (Ho) : There is no association between the two categorical variables in the population of interest.

Alternative Hypothesis (Ha) : There is an association between the two categorical variables in the population of interest.

The next step is to create a contingency table of expected values that reflects how a data set that perfectly aligns the null hypothesis would appear.

The simplest way to do this is to calculate the marginal frequencies of each row and column; the expected frequency of each cell is equal to the product of the row and column marginal totals corresponding to that cell, divided by the total sample size.

Test for Homogeneity

The Chi-square test for homogeneity is organized and executed exactly the same as the test for independence.

The main difference to remember between the two is that the test for independence looks for an association between two categorical variables within the same population, while the test for homogeneity determines if the distribution of a variable is the same in each of several populations (thus allocating population itself as the second categorical variable).

Null Hypothesis (Ho) : There is no difference in the distribution of a categorical variable for several populations or treatments.

Alternative Hypothesis (Ha) : There is a difference in the distribution of a categorical variable for several populations or treatments.

The difference between these two tests can be a bit tricky to determine, especially in the practical applications of a Chi-square test. A reliable rule of thumb is to determine how the data was collected.

If the data consists of only one random sample with the observations classified according to two categorical variables, it is a test for independence. If the data consists of more than one independent random sample, it is a test for homogeneity.

What is the chi-square test?

The Chi-square test is a non-parametric statistical test used to determine if there’s a significant association between two or more categorical variables in a sample.

It works by comparing the observed frequencies in each category of a cross-tabulation with the frequencies expected under the null hypothesis, which assumes there is no relationship between the variables.

This test is often used in fields like biology, marketing, sociology, and psychology for hypothesis testing.

What does chi-square tell you?

The Chi-square test informs whether there is a significant association between two categorical variables. If the calculated Chi-square value is above the critical value from the Chi-square distribution, it suggests a significant relationship between the variables, and the null hypothesis of no association is rejected.

How to calculate chi-square?

To calculate the Chi-square statistic, follow these steps:

1. Create a contingency table of observed frequencies for each category.

2. Calculate expected frequencies for each category under the null hypothesis.

3. Compute the Chi-square statistic using the formula: Χ² = Σ [ (O_i – E_i)² / E_i ], where O_i is the observed frequency and E_i is the expected frequency.

4. Compare the calculated statistic with the critical value from the Chi-square distribution to draw a conclusion.
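As a quick illustration of step 3, here is a minimal from-scratch computation in Python; the observed and expected frequencies are hypothetical.

```python
# Hypothetical observed and expected frequencies for three categories.
observed = [50, 30, 20]
expected = [40, 40, 20]

# Sum of (O - E)^2 / E over all categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)   # 100/40 + 100/40 + 0 = 5.0
```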


Mastering the Chi-Square Test: A Comprehensive Guide

The Chi-Square Test is a statistical method used to determine if there’s a significant association between two categorical variables in a sample data set. It checks the independence of these variables, making it a robust and flexible tool for data analysis.

Introduction to Chi-Square Test

The  Chi-Square Test  of Independence is an important tool in the statistician’s arsenal. Its primary function is determining whether a significant association exists between two categorical variables in a sample data set. Essentially, it’s a test of independence, gauging if variations in one variable can impact another.

This comprehensive guide gives you a deeper understanding of the Chi-Square Test, its mechanics, importance, and correct implementation.

  • The Chi-Square Test assesses the association between two categorical variables.
  • The Chi-Square Test requires the data to be a random sample.
  • The Chi-Square Test is designed for categorical or nominal variables.
  • Observations in the Chi-Square Test must fall into mutually exclusive and exhaustive categories.
  • The Chi-Square Test can’t establish causality, only an association between variables.


Case Study: Chi-Square Test in Real-World Scenario

Let’s delve into a real-world scenario to illustrate the application of the  Chi-Square Test . Picture this: you’re the lead data analyst for a burgeoning shoe company. The company has an array of products but wants to enhance its marketing strategy by understanding if there’s an association between gender (Male, Female) and product preference (Sneakers, Loafers).

To start, you collect data from a random sample of customers, using a survey to identify their gender and their preferred shoe type. This data then gets organized into a contingency table , with gender across the top and shoe type down the side.

Next, you apply the Chi-Square Test to this data. The null hypothesis (H0) is that gender and shoe preference are independent. In contrast, the alternative hypothesis (H1) proposes that these variables are associated. After calculating the expected frequencies and the Chi-Square statistic, you compare this statistic with the critical value from the Chi-Square distribution.

Suppose the Chi-Square statistic is higher than the critical value in our scenario, leading to the rejection of the null hypothesis. This result indicates a significant association between gender and shoe preference. With this insight, the shoe company has valuable information for targeted marketing campaigns.

For instance, if the data shows that females prefer sneakers over loafers, the company might emphasize its sneaker line in marketing materials directed toward women. Conversely, if men show a higher preference for loafers, the company can highlight these products in campaigns targeting men.

This case study exemplifies the power of the Chi-Square Test. It’s a simple and effective tool that can drive strategic decisions in various real-world contexts, from marketing to medical research.
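Since the case study reports no actual counts, here is a sketch of how the analysis might look in Python with SciPy, using made-up survey numbers.

```python
from scipy.stats import chi2_contingency

# Hypothetical survey counts:  Sneakers  Loafers
observed = [[105, 45],   # Female
            [40,  60]]   # Male

chi2_stat, p, dof, expected = chi2_contingency(observed)
if p < 0.05:
    print("Reject H0: gender and shoe preference appear associated.")
else:
    print("Fail to reject H0: no evidence of an association.")
```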

The Mathematics Behind Chi-Square Test

At the heart of the Chi-Square Test lies the calculation of the discrepancy between observed data and the data expected under the assumption of variable independence. This discrepancy, termed the Chi-Square statistic, is calculated as the sum of squared differences between observed (O) and expected (E) frequencies, normalized by the expected frequencies in each category.

In mathematical terms, the Chi-Square statistic (χ²) can be represented as follows: χ² = Σ [ (Oᵢ – Eᵢ)² / Eᵢ ] , where the summation (Σ) is carried over all categories.

This formula quantifies the discrepancy between our observations and what we would expect if the null hypothesis of independence were true. We can decide on the variables’ independence by comparing the calculated Chi-Square statistic to a critical value from the Chi-Square distribution. If the computed χ² is greater than the critical value, we reject the null hypothesis, indicating a significant association between the variables.

Step-by-Step Guide to Perform Chi-Square Test

To effectively execute a  Chi-Square Test , follow these methodical steps:

State the Hypotheses:  The null hypothesis (H0) posits no association between the variables — i.e., independent — while the alternative hypothesis (H1) posits an association between the variables.

Construct a Contingency Table:  Create a matrix to present your observations, with one variable defining the rows and the other defining the columns. Each table cell shows the frequency of observations corresponding to a particular combination of variable categories.

Calculate the Expected Values:  For each cell in the contingency table, calculate the expected frequency assuming that H0 is true. This is calculated by multiplying the row total and the column total for that cell and dividing by the total number of observations.

Compute the Chi-Square Statistic:  Apply the formula χ² = Σ [ (Oᵢ – Eᵢ)² / Eᵢ ] to compute the Chi-Square statistic.

Compare Your Test Statistic:  Evaluate your test statistic against a Chi-Square distribution to find the p-value, which will indicate the statistical significance of your test. If the p-value is less than your chosen significance level (usually 0.05), you reject H0.

Interpretation of the results should always be in the context of your research question and hypothesis. This includes considering practical significance — not just statistical significance — and ensuring your findings align with the broader theoretical understanding of the topic.


Assumptions, Limitations, and Misconceptions

The  Chi-Square Test , a vital tool in statistical analysis, comes with certain assumptions and distinct limitations. Firstly, it presumes that the data used are a  random sample  from a larger population and that the variables under investigation are nominal or categorical. Each observation must fall into one unique category or cell in the analysis, meaning observations are mutually  exclusive  and  exhaustive .

The Chi-Square Test has limitations when deployed with small sample sizes. The  expected frequency  of any cell in the contingency table should ideally be 5 or more. If it falls short, this can cause distortions in the test findings, potentially triggering a Type I or Type II error.

Misuse and misconceptions about this test often center on its application and interpretability. A common error is using it for continuous or ordinal data without appropriate categorization, leading to misleading results. Also, a significant result from a Chi-Square Test indicates an association between variables, but it doesn’t establish causality. This is a frequent misconception — interpreting the association as proof of causality — while the test doesn’t offer information about whether changes in one variable cause changes in another.

Moreover, a significant Chi-Square test alone is not enough to comprehensively understand the relationship between variables. To get a more nuanced interpretation, it’s crucial to accompany the test with a measure of effect size, such as Cramér’s V or the Phi coefficient for a 2×2 contingency table. These measures provide information about the strength of the association, adding another dimension to the interpretation of results. This is essential because statistically significant results do not necessarily imply a practically significant effect. An effect size measure is critical in large sample sizes, where even minor deviations from independence might result in a significant Chi-Square test.
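Here is a sketch of Cramér’s V as such an effect-size companion, assuming NumPy and SciPy and reusing the hypothetical shoe-company counts from the earlier sketch.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[105, 45],
                     [40,  60]])
chi2_stat, _, _, _ = chi2_contingency(observed, correction=False)

n = observed.sum()
k = min(observed.shape) - 1             # min(rows, columns) - 1
cramers_v = np.sqrt(chi2_stat / (n * k))
print(cramers_v)                        # 0 = no association, 1 = perfect association
```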

Conclusion and Further Reading

Mastering the  Chi-Square Test  is vital in any data analyst’s or statistician’s journey. Its wide range of applications and robustness make it a tool you’ll turn to repeatedly.

For further learning, statistical textbooks and online courses can provide more in-depth knowledge and practice. Don’t hesitate to delve deeper and keep exploring the fascinating world of data analysis.

  • Effect Size for Chi-Square Tests
  • Assumptions for the Chi-Square Test
  • Assumptions for Chi-Square Test (Story)
  • Chi Square Test – an overview (External Link)
  • Understanding the Null Hypothesis in Chi-Square
  • What is the Difference Between the T-Test vs. Chi-Square Test?
  • How to Report Chi-Square Test Results in APA Style: A Step-By-Step Guide

Frequently Asked Questions (FAQ)

It’s a statistical test used to determine if there’s a significant association between two categorical variables.

The test is suitable for categorical or nominal variables.

No, the test can only indicate an association, not a causal relationship.

The test assumes that the data is a random sample and that observations are mutually exclusive and exhaustive.

It measures the discrepancy between observed and expected data, calculated by χ² = Σ [ (Oᵢ – Eᵢ)² / Eᵢ ].

The result is generally considered statistically significant if the p-value is less than 0.05.

Misuse can lead to misleading results, making it crucial to use it with categorical data only.

Small sample sizes can lead to wrong results, especially when expected cell frequencies are less than 5.

Low expected cell frequencies can lead to Type I or Type II errors.

Results should be interpreted in context, considering the statistical significance and the broader understanding of the topic.



chi-squared test



chi-squared test , a hypothesis-testing method in which observed frequencies are compared with expected frequencies for experimental outcomes.

In hypothesis testing, data from a sample are used to draw conclusions about a population parameter or a population probability distribution. First, a tentative assumption is made about the parameter or distribution. This assumption is called the null hypothesis and is denoted by H0. An alternative hypothesis (denoted Ha), which is the opposite of what is stated in the null hypothesis, is then defined. The hypothesis-testing procedure involves using sample data to determine whether H0 can be rejected. If H0 is rejected, the statistical conclusion is that the alternative hypothesis Ha is true.

The chi-squared test is such a hypothesis test. First, one selects a significance level α, which is often chosen to be 0.05. The p-value is a measure of how likely the sample results are, assuming the null hypothesis is true; the smaller the p-value, the less likely the sample results. If the p-value is less than α, the null hypothesis can be rejected; otherwise, the null hypothesis cannot be rejected.

One then calculates the chi-squared value. The formula for the chi-squared test is χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ, where χ² represents the chi-squared value, Oᵢ represents the observed value, Eᵢ represents the expected value (that is, the value expected from the null hypothesis), and the symbol Σ represents the summation of values for all i. One then looks up in a table the chi-squared value that corresponds to the chosen significance level and the number of degrees of freedom of the data (that is, the number of categories of the data minus one). If that value from the table is less than the chi-squared value calculated from the data, one can reject the null hypothesis.

The two most common chi-squared tests are the one-variable goodness of fit test and the two-variable test of independence. The one-variable goodness of fit test determines if one variable value is likely or not likely to be within a given distribution. For example, suppose a study was being conducted to measure the volume of soda in cans being filled with soda at a bottling and distribution centre. A one-variable goodness of fit test might be used to determine the likelihood that a randomly selected can of soda has a volume within a fixed volume range--this range refers to all acceptable volumes of soda in cans filled at the centre.

The two-variable test of independence determines whether two variables could be related. For example, a two-variable test of independence could be used to test whether there is a correlation between the types of books people choose to read and the season of the year when they make their choices.


Hypothesis Testing - Chi Squared Test


Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health


This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific tests considered here are called chi-square tests and are appropriate when the outcome is discrete (dichotomous, ordinal or categorical). For example, in some clinical trials the outcome is a classification such as hypertensive, pre-hypertensive or normotensive. We could use the same classification in an observational study such as the Framingham Heart Study to compare men and women in terms of their blood pressure status - again using the classification of hypertensive, pre-hypertensive or normotensive status.  

The technique to analyze a discrete outcome uses what is called a chi-square test. Specifically, the test statistic follows a chi-square probability distribution. We will consider chi-square tests here with one, two and more than two independent comparison groups.

After completing this module, the student will be able to:

  • Perform chi-square tests by hand
  • Appropriately interpret results of chi-square tests
  • Identify the appropriate hypothesis testing procedure based on type of outcome variable and number of samples


8. The Chi squared tests

Table 8.1

One variable (goodness of fit test):

  • Purpose of test: Decide if one variable is likely to come from a given distribution or not.
  • Example: Decide if bags of candy have the same number of pieces of each flavor or not.
  • Hypotheses in example: H0: the proportions of flavors of candy are the same; Ha: the proportions of flavors are not the same.
  • Distribution used in test: Chi-Square.
  • Degrees of freedom: number of categories minus 1.

Two variables (test of independence):

  • Purpose of test: Decide if two variables might be related or not.
  • Example: Decide if moviegoers' decision to buy snacks is related to the type of movie they plan to watch.
  • Hypotheses in example: H0: the proportion of people who buy snacks is independent of the movie type; Ha: the proportion of people who buy snacks is different for different types of movies.
  • Distribution used in test: Chi-Square.
  • Degrees of freedom: (number of categories for first variable minus 1) multiplied by (number of categories for second variable minus 1).

How to perform a Chi-square test

For both the Chi-square goodness of fit test and the Chi-square test of independence, you perform the same analysis steps, listed below. Visit the pages for each type of test to see these steps in action.

  • Define your null and alternative hypotheses before collecting your data.
  • Decide on the alpha value. This involves deciding the risk you are willing to take of drawing the wrong conclusion. For example, suppose you set α=0.05 when testing for independence. Here, you have decided on a 5% risk of concluding the two variables are independent when in reality they are not.
  • Check the data for errors.
  • Check the assumptions for the test. (Visit the pages for each test type for more detail on assumptions.)
  • Perform the test and draw your conclusion.

Both Chi-square tests in the table above involve calculating a test statistic. The basic idea behind the tests is that you compare the actual data values with what would be expected if the null hypothesis is true. The test statistic involves finding the squared difference between actual and expected data values, and dividing that difference by the expected data values. You do this for each data point and add up the values.

Then, you compare the test statistic to a theoretical value from the Chi-square distribution . The theoretical value depends on both the alpha value and the degrees of freedom for your data. Visit the pages for each test type for detailed examples.
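The whole procedure fits in a short function. The following is a from-scratch sketch, assuming NumPy and SciPy, of the statistic, degrees of freedom, and p-value for a test of independence; the counts are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def chi_square_independence(observed, alpha=0.05):
    observed = np.asarray(observed, dtype=float)
    # Expected counts under H0: (row total x column total) / grand total.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    stat = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    p_value = chi2.sf(stat, dof)    # upper tail of the chi-square distribution
    return stat, dof, p_value, p_value < alpha  # last item: reject H0?

print(chi_square_independence([[20, 30], [25, 25]]))
```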



A Complete Guide to Chi-Square Test


A statistical technique called the chi-squared test (represented symbolically as χ²) is employed to examine discrepancies between observed and expected data distributions. Known also as Pearson's chi-squared test, it was developed in 1900 by Karl Pearson for the analysis of categorical data and distribution. Assuming the null hypothesis is correct, this test determines the probability that the observed frequencies in a sample match the predicted frequencies. The null hypothesis, which essentially suggests that any observed differences are the result of random chance, is a statement that there is no substantial difference between the observed and predicted frequencies. Usually, the chi-squared statistic is constructed as the sum of the squared differences between the observed and predicted frequencies, each normalized by the expected frequency. This test offers a means to test theories about the links between categorical variables by determining whether the observed deviations are statistically significant or can be attributed to chance.

The world is constantly curious about the Chi-Square test's application in machine learning and how it makes a difference. Feature selection is a critical topic in machine learning, as you will often have many candidate features and must choose the best ones to build the model. By examining the relationship between each feature and the target, the chi-square test aids in solving feature selection problems. In this tutorial, you will learn about the chi-square test and its application.
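As a concrete illustration of that use, here is a minimal feature-selection sketch with scikit-learn (an assumption; the tutorial itself shows no code), using made-up count features and labels.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical non-negative count features (sklearn's chi2 requires
# non-negative values); rows are samples, columns are features.
X = np.array([[5, 1, 20],
              [8, 0, 25],
              [1, 7, 22],
              [0, 9, 21]])
y = np.array([0, 0, 1, 1])              # class labels

scores, p_values = chi2(X, y)           # per-feature chi-square scores
selector = SelectKBest(chi2, k=2).fit(X, y)
X_reduced = selector.transform(X)       # keeps the 2 highest-scoring features

print(scores, p_values)
print(X_reduced.shape)                  # (4, 2)
```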

What Is a Chi-Square Test?

The Chi-Square test is a statistical procedure for determining the difference between observed and expected data. This test can also be used to determine whether it correlates to the categorical variables in our data. It helps to find out whether a difference between two categorical variables is due to chance or a relationship between them.

A chi-square test is a statistical test that is used to compare observed and expected results. The goal of this test is to identify whether a disparity between actual and predicted data is due to chance or to a link between the variables under consideration. As a result, the chi-square test is an ideal choice for aiding in our understanding and interpretation of the connection between our two categorical variables.

A chi-square test or comparable nonparametric test is required to test a hypothesis regarding the distribution of a categorical variable. Categorical variables, which indicate categories such as animals or countries, can be nominal or ordinal. They cannot have a normal distribution since they can only have a few particular values.

For example, a meal delivery firm in India wants to investigate the link between gender, geography, and people's food preferences.

It is used to determine whether a difference between two categorical variables is:

  • the result of chance, or
  • the result of a relationship between them.


Chi-Square Test Formula

χ²c = Σ (O – E)² / E, where:

c = Degrees of freedom

O = Observed Value

E = Expected Value

The degrees of freedom in a  statistical calculation represent the number of variables that can vary in a calculation. The degrees of freedom can be calculated to ensure that chi-square tests are statistically valid. These tests are frequently used to compare observed data with data that would be expected to be obtained if a particular hypothesis were true.

The Observed values are those you gather yourselves.

The expected values are the frequencies expected, based on the null hypothesis. 

Fundamentals of Hypothesis Testing

Hypothesis testing is a technique for interpreting and drawing inferences about a population based on sample data. It aids in determining which sample data best support mutually exclusive population claims.

Null Hypothesis (H0) - The Null Hypothesis is the assumption that the event will not occur. A null hypothesis has no bearing on the study's outcome unless it is rejected.

H0 is the symbol for it, and it is pronounced H-naught.

Alternate Hypothesis(H1 or Ha) - The Alternate Hypothesis is the logical opposite of the null hypothesis. The acceptance of the alternative hypothesis follows the rejection of the null hypothesis. H1 is the symbol for it.

Types of Chi-Square Tests

There are two main types of Chi-Square tests, namely:

  • Independence
  • Goodness-of-Fit

Independence

The Chi-Square Test of Independence is an inferential statistical test which examines whether two sets of variables are likely to be related to each other or not. This test is used when we have counts of values for two nominal or categorical variables, and it is considered a non-parametric test. A relatively large sample size and independence of observations are the required criteria for conducting this test.

In a movie theatre, suppose we made a list of movie genres. Let us consider this as the first variable. The second variable is whether or not the people who came to watch those genres of movies have bought snacks at the theatre. Here the null hypothesis is that the genre of the film and whether people bought snacks or not are unrelated. If this is true, the movie genres don't impact snack sales.


Goodness-Of-Fit

In statistical hypothesis testing, the Chi-Square Goodness-of-Fit test determines whether a variable is likely to come from a given distribution or not. We must have a set of data values and an idea of the distribution of this data. We can use this test when we have value counts for categorical variables. This test demonstrates a way of deciding if the data values have a "good enough" fit for our idea or if they are a representative sample of the entire population.

Suppose we have bags of balls with five different colours in each bag. The given condition is that the bag should contain an equal number of balls of each colour. The idea we would like to test here is that the proportions of the five colours of balls in each bag are equal.
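A sketch of that test with SciPy, using hypothetical counts for the five colours; under the null hypothesis, each colour is expected one fifth of the time.

```python
from scipy.stats import chisquare

# Hypothetical counts of the five colours in a sampled bag of 100 balls.
observed = [22, 18, 25, 15, 20]

# chisquare defaults to equal expected frequencies (20 each here).
stat, p = chisquare(observed)
print(stat, p)
```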

What Are Categorical Variables?

Categorical variables belong to a subset of variables that can be divided into discrete categories. Names or labels are the most common categories. These variables are also known as qualitative variables because they depict the variable's quality or characteristics.

Categorical variables can be divided into two categories:

  • Nominal Variable: A nominal variable's categories have no natural ordering. Example: Gender, Blood groups
  • Ordinal Variable: A variable whose categories can be ordered is an ordinal variable. Customer satisfaction (Excellent, Very Good, Good, Average, Bad, and so on) is an example.

How to Perform a Chi-Square Test?

Let's say you want to know if gender has anything to do with political party preference. You poll 440 voters in a simple random sample to find out which political party they prefer. The results of the survey are shown in the table below:

(Observed counts of gender and political party preference for the 440 surveyed voters.)

To see if gender is linked to political party preference, perform a Chi-Square test of independence using the steps below.

Step 1: Define the Hypothesis

H0: There is no link between gender and political party preference.

H1: There is a link between gender and political party preference.

Step 2: Calculate the Expected Values

Now you will calculate the expected frequency.

Expected frequency E = (row total × column total) / grand total

For example, the expected value for Male Republicans is: 

(Expected value for Male Republicans = (Male row total × Republican column total) / 440.)

Similarly, you can calculate the expected value for each of the cells.

(Table of expected counts for each gender/party cell.)

Step 3: Calculate (O – E)² / E for Each Cell in the Table

Now you will calculate (O – E)² / E for each cell in the table.

(Table of (O – E)² / E values for each cell.)

Step 4: Calculate the Test Statistic X²

X² is the sum of all the values in the last table:

X² = 0.743 + 2.05 + 2.33 + 3.33 + 0.384 + 1 ≈ 9.83

Before you can conclude, you must first determine the critical statistic, which requires determining our degrees of freedom. The degrees of freedom in this case are equal to the table's number of columns minus one multiplied by the table's number of rows minus one, or (r-1) (c-1). We have (3-1)(2-1) = 2.

Finally, you compare the obtained statistic to the critical statistic found in the chi-square table. As you can see, for an alpha level of 0.05 and two degrees of freedom, the critical statistic is 5.991, which is less than the obtained statistic of 9.83. You can reject the null hypothesis because the obtained statistic is higher than the critical statistic.

This means you have sufficient evidence to say that there is an association between gender and political party preference.
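A quick check of the worked example's arithmetic and critical value, assuming SciPy:

```python
from scipy.stats import chi2

terms = [0.743, 2.05, 2.33, 3.33, 0.384, 1]
stat = sum(terms)                        # ~9.84 (9.83 in the text, from rounding)
critical = chi2.ppf(0.95, df=2)          # 5.991, matching the chi-square table
print(stat, critical, stat > critical)   # True -> reject H0
```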


Chi-Square Practice Problems

1. Voting Patterns

A researcher wants to know if voting preferences (party A, party B, or party C) and gender (male, female) are related. Apply a chi-square test to the following set of data:

  • Male: Party A - 30, Party B - 20, Party C - 50
  • Female: Party A - 40, Party B - 30, Party C - 30

To determine if gender influences voting preferences, run a chi-square test of independence.

2. State of Health

In a sample population, a medical study examines the association between smoking status (smoker, non-smoker) and the occurrence of lung disease (yes, no). The information is as follows:

  • Smoker: Yes - 90, No - 60
  • Non-smoker: Yes - 30, No - 120 

To find out if smoking status is related to the incidence of lung disease, do a chi-square test.

3. Consumer Preferences

Customers are surveyed by a company to determine whether their age group (under 20, 20-40, over 40) and their preferred product category (food, apparel, or electronics) are related. The information gathered is:

  • Under 20: Electronic - 50, Clothing - 30, Food - 20
  • 20-40: Electronic - 60, Clothing - 70, Food - 50
  • Over 40: Electronic - 30, Clothing - 40, Food - 80

Use a chi-square test to investigate the connection between product preference and age group.

4. Academic Performance

An educational researcher looks at the relationship between students' success on standardized tests (pass, fail) and whether or not they participate in after-school programs. The information is as follows:

  • Yes: Pass - 80, Fail - 20
  • No: Pass - 50, Fail - 50

Use a chi-square test to determine if involvement in after-school programs and test scores are connected.

5. Genetic Inheritance

A geneticist investigates how a particular trait is inherited in plants and seeks to ascertain whether the expression of a trait (trait present, trait absent) and the existence of a genetic marker (marker present, marker absent) are significantly correlated. The information gathered is:

  • Marker Present: Trait Present - 70, Trait Absent - 30
  • Marker Absent: Trait Present - 40, Trait Absent - 60

Do a chi-square test to determine if there is a correlation between the trait's expression and the genetic marker.

How to Solve Chi-Square Problems?

1. State the Hypotheses

  • Null hypothesis (H0): There is no association between the variables
  • Alternative hypothesis (H1): There is an association between the variables.

2. Calculate the Expected Frequencies

  • Use the formula: E = (Row Total × Column Total) / Grand Total

3. Compute the Chi-Square Statistic

  • Use the formula: χ² = Σ (O − E)² / E, where O is the observed frequency and E is the expected frequency.

4. Determine the Degrees of Freedom (df)

  • Use the formula: df = (number of rows − 1) × (number of columns − 1)

5. Find the Critical Value and Compare

  • Use the chi-square distribution table to find the critical value for the given df and significance level (usually 0.05).
  • Compare the chi-square statistic to the critical value to decide whether to reject the null hypothesis.
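As a worked illustration of these five steps, here is a sketch, assuming SciPy, that solves Practice Problem 5 using the genetic-marker counts given above.

```python
from scipy.stats import chi2_contingency

#                Trait Present  Trait Absent
observed = [[70, 30],    # Marker Present
            [40, 60]]    # Marker Absent

# Computes the expected counts, statistic, df, and p-value in one call.
# (For 2x2 tables it applies Yates' continuity correction by default.)
chi2_stat, p, dof, expected = chi2_contingency(observed)

print(expected)             # [[55. 45.], [55. 45.]]
print(chi2_stat, dof, p)    # compare p to 0.05 to decide
```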

These practice problems help you understand how chi-square analysis tests hypotheses and explores relationships between categorical variables in various fields.

When to Use a Chi-Square Test?

A Chi-Square Test is used to examine whether the observed results are in line with the expected values. When the data to be analysed come from a random sample, and when the variable in question is categorical, the Chi-Square test is the most appropriate choice. A categorical variable consists of selections such as breeds of dogs, types of cars, genres of movies, educational attainment, male vs. female, etc. Survey responses and questionnaires are the primary sources of these types of data, and the Chi-square test is most commonly used for analysing them. This type of analysis is helpful for researchers who are studying survey response data. The research can range from customer and marketing research to political science and economics.


Chi-Square Distribution 

Chi-square distributions (X²) are a type of continuous probability distribution. They're commonly utilized in hypothesis testing, such as the chi-square goodness of fit and independence tests. The parameter k, which represents the degrees of freedom, determines the shape of a chi-square distribution.

A chi-square distribution is followed by very few real-world observations. The objective of chi-square distributions is to test hypotheses, not to describe real-world distributions. In contrast, most other commonly used distributions, such as normal and Poisson distributions, may explain important things like baby birth weights or illness cases per year.

Because of its close resemblance to the conventional normal distribution, chi-square distributions are excellent for hypothesis testing. Many essential statistical tests rely on the conventional normal distribution.

In statistical analysis, the Chi-Square distribution is used in many hypothesis tests and is determined by the parameter k, the degrees of freedom. It belongs to the family of continuous probability distributions. The sum of the squares of k independent standard normal random variables follows a Chi-Squared distribution. Pearson's Chi-Square test formula is:

χ² = Σ (O – E)² / E

Where χ² is the Chi-Square test statistic

Σ is the summation over all observations

O is the observed results

E is the expected results

The shape of the distribution graph changes as the value of k, i.e., the degrees of freedom, increases.

When k is 1 or 2, the Chi-square distribution curve is shaped like a backwards 'J'. It means there is a high chance that X² is close to zero.


When k is greater than 2, the shape of the distribution curve looks like a hump, with a low probability that X² is very near 0 or very far from 0. The distribution extends much further on the right-hand side than on the left. The most probable value of X² is k − 2.

When k is greater than about 90, the Chi-square distribution is closely approximated by a normal distribution.
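These shape facts are easy to probe numerically; a short sketch with SciPy's chi2 distribution object:

```python
from scipy.stats import chi2

for k in (1, 2, 5, 90):
    dist = chi2(k)
    # Mean = k and variance = 2k, consistent with the properties listed below.
    print(k, dist.mean(), dist.var())
```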

What is the P-Value in a Chi-Square Test?

The P-Value in a Chi-Square test is a statistical measure that helps to assess the importance of your test results.

Here P denotes probability; the Chi-Square test is used to calculate the p-value. Different p-values lead to different conclusions about the hypotheses:

  • P ≤ 0.05: the null hypothesis is rejected.
  • P > 0.05: the null hypothesis is not rejected.

The concepts of probability and statistics are entangled with Chi-Square Test. Probability is the estimation of something that is most likely to happen. Simply put, it is the possibility of an event or outcome of the sample. Probability can understandably represent bulky or complicated data. And statistics involves collecting and organising, analysing, interpreting and presenting the data. 

Finding P-Value

When you run a Chi-square test, you'll get a test statistic called X². You have two options for determining whether this test statistic is statistically significant at some alpha level:

  • Compare the test statistic X² to a critical value from the Chi-square distribution table.
  • Compare the p-value of the test statistic X² to a chosen alpha level.

Test statistics are calculated by taking into account the sampling distribution of the test statistic under the null hypothesis, the sample data, and the approach which is chosen for performing the test. 

The p-value will be as mentioned in the following cases.

  • A lower-tailed test: p-value = P(TS ≤ ts | H0 is true) = cdf(ts)
  • An upper-tailed test: p-value = P(TS ≥ ts | H0 is true) = 1 − cdf(ts)
  • A two-sided test (assuming the distribution of TS under H0 is symmetric about 0): p-value = 2 · P(TS ≥ |ts| | H0 is true) = 2 · (1 − cdf(|ts|))

where P denotes probability, TS is the test statistic, ts is the observed value of the test statistic computed from your sample, and cdf() is the cumulative distribution function of the distribution of TS under H0.
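For a chi-square test the relevant case is the upper tail. A minimal sketch with SciPy, reusing the earlier example values (X² = 9.83, df = 2):

```python
from scipy.stats import chi2

ts, dof = 9.83, 2
p_value = 1 - chi2.cdf(ts, dof)   # equivalently: chi2.sf(ts, dof)
print(p_value)                    # ~0.0073 -> significant at alpha = 0.05
```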

Properties of Chi-Square Test 

  • The variance is equal to twice the number of degrees of freedom.
  • The mean is equal to the number of degrees of freedom.
  • As the degrees of freedom increase, the Chi-Square distribution curve approaches a normal distribution.

Limitations of Chi-Square Test

There are two limitations to using the chi-square test that you should be aware of. 

  • The chi-square test, for starters, is extremely sensitive to sample size. Even insignificant relationships can appear statistically significant when a large enough sample is used. Keep in mind that "statistically significant" does not always imply "meaningful" when using the chi-square test.
  • Be mindful that the chi-square can only determine whether two variables are related. It does not necessarily follow that one variable has a causal relationship with the other. It would require a more detailed analysis to establish causality.


In this tutorial titled 'The Complete Guide to Chi-square test', you explored the concept of the Chi-square distribution and how to find the related values. You also took a look at how the critical value and the chi-square value are related to each other.

If you want to gain more insight and get a work-ready understanding in statistical concepts and learn how to use them to get into a career in Data Analytics , our Post Graduate Program in Data Analytics in partnership with Purdue University should be your next stop. A comprehensive program with training from top practitioners and in collaboration with IBM, this will be all that you need to kickstart your career in the field. 

Was this tutorial on the Chi-square test useful to you? Do you have any doubts or questions for us? Mention them in this article's comments section, and we'll have our experts answer them for you at the earliest!

1. What is the chi-square test used for? 

The chi-square test is a statistical method used to determine if there is a significant association between two categorical variables. It helps researchers understand whether the observed distribution of data differs from the expected distribution, allowing them to assess whether any relationship exists between the variables being studied.

2. What is the chi-square test and its types? 

The chi-square test is a statistical test used to analyze categorical data and assess the independence or association between variables. There are two main types of chi-square tests:

a) Chi-square test of independence: This test determines whether there is a significant association between two categorical variables.

b) Chi-square goodness-of-fit test: This test compares the observed data to the expected data to assess how well the observed data fit the expected distribution.

3. What is the difference between t-test and chi-square? 

The t-test and the chi-square test are two different statistical tests used for different types of data. The t-test is used to compare the means of two groups and is suitable for continuous numerical data. On the other hand, the chi-square test is used to examine the association between two categorical variables. It is applicable to discrete, categorical data. So, the choice between the t-test and chi-square test depends on the nature of the data being analyzed.


About the Author

Avijeet Biswal

Avijeet is a Senior Research Analyst at Simplilearn. Passionate about Data Analytics, Machine Learning, and Deep Learning, Avijeet is also interested in politics, cricket, and football.



8.1 - The Chi-Square Test of Independence

How do we test the independence of two categorical variables? It will be done using the Chi-Square Test of Independence.

As with all prior statistical tests we need to define null and alternative hypotheses. Also, as we have learned, the null hypothesis is what is assumed to be true until we have evidence to go against it. In this lesson, we are interested in researching if two categorical variables are related or associated (i.e., dependent). Therefore, until we have evidence to suggest that they are, we must assume that they are not. This is the motivation behind the hypothesis for the Chi-Square Test of Independence:

  • \(H_0\): In the population, the two categorical variables are independent.
  • \(H_a\): In the population, the two categorical variables are dependent.

Note! There are several ways to phrase these hypotheses. Instead of using the words "independent" and "dependent" one could say "there is no relationship between the two categorical variables" versus "there is a relationship between the two categorical variables." Or "there is no association between the two categorical variables" versus "there is an association between the two variables." The important part is that the null hypothesis refers to the two categorical variables not being related while the alternative is trying to show that they are related.

Once we have gathered our data, we summarize the data in the two-way contingency table. This table represents the observed counts and is called the Observed Counts Table or simply the Observed Table. The contingency table on the introduction page to this lesson represented the observed counts of the party affiliation and opinion for those surveyed.

The question becomes, "How would this table look if the two variables were not related?" That is, under the null hypothesis that the two variables are independent, what would we expect our data to look like?

Consider the following table:

  Success Failure Total
Group 1 A B A+B
Group 2 C D C+D
Total A+C B+D A+B+C+D

The total count is \(A+B+C+D\). Let's focus on one cell, say Group 1 and Success with observed count A. If we go back to our probability lesson, let \(G_1\) denote the event 'Group 1' and \(S\) denote the event 'Success.' Then,

\(P(G_1)=\dfrac{A+B}{A+B+C+D}\) and \(P(S)=\dfrac{A+C}{A+B+C+D}\).

Recall that if two events are independent, then their intersection is the product of their respective probabilities. In other words, if \(G_1\) and \(S\) are independent, then...

\begin{align} P(G_1\cap S)&=P(G_1)P(S)\\&=\left(\dfrac{A+B}{A+B+C+D}\right)\left(\dfrac{A+C}{A+B+C+D}\right)\\[10pt] &=\dfrac{(A+B)(A+C)}{(A+B+C+D)^2}\end{align}

If we considered counts instead of probabilities, then we get the count by multiplying the probability by the total count. In other words...

\begin{align} \text{Expected count for cell with A} &=P(G_1)P(S)\times(\text{total count}) \\ &= \left(\dfrac{(A+B)(A+C)}{(A+B+C+D)^2}\right)(A+B+C+D)\\[10pt]&=\mathbf{\dfrac{(A+B)(A+C)}{A+B+C+D}} \end{align}

This is the count we would expect to see if the two variables were independent (i.e. assuming the null hypothesis is true).

The expected count for each cell under the null hypothesis is:

\(E=\dfrac{\text{(row total)}(\text{column total})}{\text{total sample size}}\)
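This formula is a one-liner in R. A minimal sketch, using the observed table from Example 8-1 below: the outer product of the row totals and column totals, divided by the grand total, gives every expected count at once.

# Expected counts under independence: (row total)(column total) / grand total.
obs <- matrix(c(138, 64, 83, 67, 64, 84), nrow = 2)  # observed counts, filled column by column
exp_counts <- outer(rowSums(obs), colSums(obs)) / sum(obs)
exp_counts  # 115.14  85.5  84.36 / 86.86  64.5  63.64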

Example 8-1: Political Affiliation and Opinion

To demonstrate, we will use the Party Affiliation and Opinion on Tax Reform example.

Observed Table:

  favor indifferent opposed total
democrat 138 83 64 285
republican 64 67 84 215
total 202 150 148 500

Find the expected counts for all of the cells.

We need to find what is called the Expected Counts Table or simply the Expected Table. This table displays what the counts would be for our sample data if there were no association between the variables.

Calculating Expected Counts from Observed Counts

  favor indifferent opposed total
democrat \(\frac{285(202)}{500}=115.14\) \(\frac{285(150)}{500}=85.5\) \(\frac{285(148)}{500}=84.36\) 285
republican \(\frac{215(202)}{500}=86.86\) \(\frac{215(150)}{500}=64.5\) \(\frac{215(148)}{500}=63.64\) 215
total 202 150 148 500

Chi-Square Test Statistic

To better understand what these expected counts represent, first recall that the expected counts table is designed to reflect what the sample data counts would be if the two variables were independent. Taking what we know of independent events, we would be saying that the sample counts should show similarity in opinions of tax reform between democrats and republicans. If you find the proportion of each cell by taking a cell's expected count divided by its row total, you will discover that in the expected table each opinion proportion is the same for democrats and republicans. That is, from the expected counts, 0.404 of the democrats and 0.404 of the republicans favor the bill; 0.3 of the democrats and 0.3 of the republicans are indifferent; and 0.296 of the democrats and 0.296 of the republicans are opposed.

The statistical question becomes, "Are the observed counts so different from the expected counts that we can conclude a relationship exists between the two variables?" To conduct this test we compute a Chi-Square test statistic where we compare each cell's observed count to its respective expected count.

In a summary table, we have \(r\times c=rc\) cells. Let \(O_1, O_2, …, O_{rc}\) denote the observed counts for each cell and \(E_1, E_2, …, E_{rc}\) denote the respective expected counts for each cell.

The Chi-Square test statistic is calculated as follows:

\(\chi^{2*}=\frac{(O_1-E_1)^2}{E_1}+\frac{(O_2-E_2)^2}{E_2}+...+\frac{(O_{rc}-E_{rc})^2}{E_{rc}}=\overset{rc}{ \underset{i=1}{\sum}}\frac{(O_i-E_i)^2}{E_i}\)

Under the null hypothesis and certain conditions (discussed below), the test statistic follows a Chi-Square distribution with degrees of freedom equal to \((r-1)(c-1)\), where \(r\) is the number of rows and \(c\) is the number of columns. We omit the mathematical details of why this test statistic is used and why it follows a Chi-Square distribution.

As we have done with other statistical tests, we make our decision by either comparing the value of the test statistic to a critical value (rejection region approach) or by finding the probability of getting this test statistic value or one more extreme (p-value approach).

The critical value for our Chi-Square test is \(\chi^2_{\alpha}\) with degrees of freedom \((r - 1)(c - 1)\), while the p-value is found by \(P(\chi^2>\chi^{2*})\) with degrees of freedom \((r - 1)(c - 1)\).

Example 8-1 Cont'd: Chi-Square

Let's apply the Chi-Square Test of Independence to our example, where we have a random sample of 500 U.S. adults who are questioned regarding their political affiliation and opinion on a tax reform bill. We will test if political affiliation and opinion on a tax reform bill are dependent at a 5% level of significance. Calculate the test statistic.

  • Using Minitab

The contingency table (political_affiliation.csv) is given below. Each cell contains the observed count and the expected count in parentheses. For example, there were 138 democrats who favored the tax bill. The expected count under the null hypothesis is 115.14. Therefore, the cell is displayed as 138 (115.14).

  favor indifferent opposed total
democrat 138 (115.14) 83 (85.5) 64 (84.36) 285
republican 64 (86.86) 67 (64.50) 84 (63.64) 215
total 202 150 148 500

Calculating the test statistic by hand:

\begin{multline} \chi^{2*}=\dfrac{(138−115.14)^2}{115.14}+\dfrac{(83−85.50)^2}{85.50}+\dfrac{(64−84.36)^2}{84.36}+\\ \dfrac{(64−86.86)^2}{86.86}+\dfrac{(67−64.50)^2}{64.50}+\dfrac{(84−63.64)^2}{63.64}=22.152\end{multline}

...with degrees of freedom equal to \((2 - 1)(3 - 1) = 2\).
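If you use R rather than Minitab, the whole test is a single call to chisq.test. A minimal sketch with the observed table above (no continuity correction is involved for a table larger than 2 x 2):

# Chi-square test of independence for the party affiliation example.
obs <- matrix(c(138, 64, 83, 67, 64, 84), nrow = 2,
              dimnames = list(c("democrat", "republican"),
                              c("favor", "indifferent", "opposed")))
chisq.test(obs)  # X-squared = 22.152, df = 2, p-value < 0.001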

  Minitab: Chi-Square Test of Independence

To perform the Chi-Square test in Minitab...

  • Choose Stat  >  Tables  >  Chi-Square Test for Association
  • If you have summarized data (i.e., observed counts), choose 'Summarized data in a two-way table' from the drop-down box, then select and enter the columns that contain the observed counts. If you have the raw data, choose 'Raw data (categorical variables)' instead. Note that if using the raw data, your data will need to consist of two columns: one with the explanatory variable data (which goes in the 'Rows' field) and one with the response variable data (which goes in the 'Columns' field).
  • Labeling (Optional) When using the summarized data you can label the rows and columns if you have the variable labels in columns of the worksheet. For example, if we have a column with the two political party affiliations and a column with the three opinion choices we could use these columns to label the output.
  • Click the Statistics tab. Keep the four boxes that are already checked, and also check the box for 'Each cell's contribution to the chi-square.' Click OK.

Note! If you have the observed counts in a table, you can copy/paste them into Minitab. For instance, you can copy the entire observed counts table (excluding the totals!) for our example and paste these into Minitab starting with the first empty cell of a column.

The following is the Minitab output for this example.

Cell Contents: Count, Expected count, Contribution to Chi-Square

              favor      indifferent   opposed    All
democrat      138        83            64         285
              115.14     85.50         84.36
              4.5387     0.0731        4.9138
republican    64         67            84         215
              86.86      64.50         63.64
              6.0163     0.0969        6.5137
All           202        150           148        500

Pearson Chi-Sq = 4.5387 + 0.0731 + 4.9138 + 6.0163 + 0.0969 + 6.5137 = 22.152, DF = 2, P-Value = 0.000


The Chi-Square test statistic is 22.152, calculated by summing all of the individual cells' Chi-Square contributions:

\(4.5387 + 0.0731 + 4.9138 + 6.0163 + 0.0969 + 6.5137 = 22.152\)

The p-value is found by \(P(\chi^2>22.152)\) with degrees of freedom \((2-1)(3-1) = 2\).

Minitab calculates this p-value to be less than 0.001 and reports it as 0.000. Since this p-value is less than our alpha of 0.05, we reject the null hypothesis that political affiliation and opinion on a tax reform bill are independent. We conclude that there is evidence that the two variables are dependent (i.e., that there is an association between the two variables).

Conditions for Using the Chi-Square Test

Exercise caution when there are small expected counts. Minitab will give a count of the number of cells that have expected frequencies less than five. Some statisticians hesitate to use the Chi-Square test if more than 20% of the cells have expected frequencies below five, especially if the p-value is small and these cells give a large contribution to the total Chi-Square value.
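In R, this condition is easy to check because chisq.test returns the expected counts it used. A minimal sketch, re-entering the Example 8-1 observed table:

# Inspect the expected counts to verify the 'at least 5 per cell' condition.
obs <- matrix(c(138, 64, 83, 67, 64, 84), nrow = 2)  # Example 8-1 observed counts
expected <- chisq.test(obs)$expected
expected           # all six expected counts are well above 5 here
any(expected < 5)  # TRUE would signal that the chi-square approximation is questionable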

Example 8-2: Tire Quality

The operations manager of a company that manufactures tires wants to determine whether there are any differences in the quality of work among the three daily shifts. She randomly selects 496 tires and carefully inspects them. Each tire is either classified as perfect, satisfactory, or defective, and the shift that produced it is also recorded. The two categorical variables of interest are the shift and condition of the tire produced. The data (shift_quality.txt) can be summarized by the accompanying two-way table. Does the data provide sufficient evidence at the 5% significance level to infer that there are differences in quality among the three shifts?

          Perfect   Satisfactory   Defective   Total
Shift 1   106       124            1           231
Shift 2   67        85             1           153
Shift 3   37        72             3           112
Total     210       281            5           496

Chi-Square Test (expected counts are printed below observed counts)

          Perfect   Satisfactory   Defective   Total
Shift 1   106       124            1           231
          97.80     130.87         2.33
Shift 2   67        85             1           153
          64.78     86.68          1.54
Shift 3   37        72             3           112
          47.42     63.45          1.13
Total     210       281            5           496

Chi-Sq = 8.647, DF = 4, P-Value = 0.071

Note that there are 3 cells with expected counts less than 5.0.

In the above example, we don't have a significant result at the 5% significance level, since the p-value (0.071) is greater than 0.05. Even if we did have a significant result, we still could not trust it, because 3 cells (33.3% of the cells) have expected counts below 5.0.
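Running this table through R reproduces both the result and the small-counts warning. A minimal sketch, with labels taken from the table above:

# Tire quality example: note R's warning about small expected counts.
tires <- matrix(c(106, 67, 37, 124, 85, 72, 1, 1, 3), nrow = 3,
                dimnames = list(c("Shift 1", "Shift 2", "Shift 3"),
                                c("Perfect", "Satisfactory", "Defective")))
chisq.test(tires)  # X-squared = 8.647, df = 4, p-value = 0.071,
                   # plus a warning that the approximation may be incorrect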

Sometimes researchers will categorize quantitative data (e.g., take height measurements and categorize as 'below average,' 'average,' and 'above average.') Doing so results in a loss of information - one cannot do the reverse of taking the categories and reproducing the raw quantitative measurements. Instead of categorizing, the data should be analyzed using quantitative methods.

Try it!

A food services manager for a baseball park wants to know if there is a relationship between gender (male or female) and the preferred condiment on a hot dog. The following table summarizes the results. Test the hypothesis with a significance level of 10%.

    Condiment
Gender   Ketchup Mustard Relish Total
Male 15 23 10 48
Female 25 19 8 52
Total 40 42 18 100

The hypotheses are:

  • \(H_0\): Gender and condiments are independent
  • \(H_a\): Gender and condiments are not independent

We need the expected counts table:

    Condiment
Gender   Ketchup Mustard Relish Total
Male 15 (19.2) 23 (20.16) 10 (8.64) 48
Female 25 (20.8) 19 (21.84) 8 (9.36) 52
Total 40 42 18 100

None of the expected counts in the table are less than 5. Therefore, we can proceed with the Chi-Square test.

The test statistic is:

\(\chi^{2*}=\frac{(15-19.2)^2}{19.2}+\frac{(23-20.16)^2}{20.16}+...+\frac{(8-9.36)^2}{9.36}=2.95\)

The p-value is found by \(P(\chi^2>\chi^{2*})=P(\chi^2>2.95)\) with (3-1)(2-1)=2 degrees of freedom. Using a table or software, we find the p-value to be 0.2288.

With a p-value greater than 10%, we can conclude that there is not enough evidence in the data to suggest that gender and preferred condiment are related.
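You can verify this result in R. A minimal sketch, entering the observed table above:

# Gender vs. preferred condiment example.
condiments <- matrix(c(15, 25, 23, 19, 10, 8), nrow = 2,
                     dimnames = list(c("Male", "Female"),
                                     c("Ketchup", "Mustard", "Relish")))
chisq.test(condiments)  # X-squared is about 2.95, df = 2, p-value about 0.229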


Statistics By Jim

Making statistics intuitive

How the Chi-Squared Test of Independence Works

By Jim Frost

Chi-squared tests of independence determine whether a relationship exists between two categorical variables . Do the values of one categorical variable depend on the value of the other categorical variable? If the two variables are independent, knowing the value of one variable provides no information about the value of the other variable.

I’ve previously written about Pearson’s chi-square test of independence using a fun Star Trek example. Are the uniform colors related to the chances of dying? You can test the notion that the infamous red shirts have a higher likelihood of dying. In that post, I focused on the purpose of the test, applied it to this example, and interpreted the results.

In this post, I’ll take a bit of a different approach. I’ll show you the nuts and bolts of how to calculate the expected values, chi-square value, and degrees of freedom. Then you’ll learn how to use the chi-squared distribution in conjunction with the degrees of freedom to calculate the p-value.

I’ve used the same approach to explain how:

  • t-Tests work .
  • F-tests work in one-way ANOVA .

Of course, you’ll usually just let your statistical software perform all calculations. However, understanding the underlying methodology helps you fully comprehend the analysis.

Chi-Squared Example Dataset

For the Star Trek example, uniform color and status are the two categorical variables. The contingency table below shows the combination of variable values, frequencies, and percentages.

                Blue     Gold     Red      Total
Dead            7        9        24       40
Alive           129      46       215      390
Total           136      55       239      N = 430
Fatality rate   5.15%    16.36%   10.04%

Red shirts on Star Trek.

However, our fatality rates are not equal. Gold has the highest fatality rate at 16.36%, while Blue has the lowest at 5.15%. Red is in the middle at 10.04%. Does this inequality in our sample suggest that the fatality rates are different in the population? Does a relationship exist between uniform color and fatalities?

Thanks to random sampling error, our sample’s fatality rates don’t exactly equal the population’s rates. If the population rates are equal, we’d likely still see differences in our sample. So, the question becomes, after factoring in sampling error, are the fatality rates in our sample different enough to conclude that they’re different in the population? In other words, we want to be confident that the observed differences represent a relationship in the population rather than merely random fluctuations in the sample. That’s where Pearson’s chi-squared test for independence comes in!

Hypotheses for Our Test

The two hypotheses for the chi-squared test of independence are the following:

  • Null : The variables are independent. No relationship exists.
  • Alternative : A relationship between the variables exists.

Related posts : Hypothesis Testing Overview and Guide to Data Types

Calculating the Expected Frequencies for the Chi-squared Test of Independence

The chi-squared test of independence compares our sample data in the contingency table to the distribution of values we’d expect if the null hypothesis is correct. Let’s construct the contingency table we’d expect to see if the null hypothesis is true for our population.

For chi-squared tests, the term “expected frequencies” refers to the values we’d expect to see if the null hypothesis is true. To calculate the expected frequency for a specific combination of categorical variables (e.g., blue shirts who died), multiply the column total (Blue) by the row total (Dead), and divide by the sample size.

Expected value for one table cell = (row total × column total) / sample size

To calculate the expected frequency for the Dead/Blue cell in our dataset, do the following:

  • Find the row total for Dead (40)
  • Find the column total for Blue (136)
  • Multiply those two values and divide by the sample size (430)

40 * 136 / 430 = 12.65

If the null hypothesis is true, we’d expect to see 12.65 fatalities for wearers of the Blue uniforms in our sample. Of course, we can’t have a fraction of a death, but that doesn’t affect the results.

Contingency Table with the Expected Values

I’ll calculate the expected values for all six cells that represent the combinations of the three uniform colors and two statuses. I’ll also include the observed values in our sample. Expected values are in parentheses.

                         Blue           Gold          Red            Total
Dead                     7 (12.65)      9 (5.12)      24 (22.23)     40
Alive                    129 (123.35)   46 (49.88)    215 (216.77)   390
Expected fatality rate   9.3%           9.3%          9.3%

In this table, notice how the column percentages for the expected dead are all 9.3%. This equality occurs when the null hypothesis is valid, which is the condition that the expected values represent.

Using this table, we can also compare the values we observe in our sample to the frequencies we’d expect if the null hypothesis that the variables are not related is correct.

For example, the observed frequency for Blue/Dead is less than the expected value (7 < 12.65). In our sample, deaths of those in blue uniforms occurred less frequently than we’d expect if the variables are independent. On the other hand, the observed frequency for Gold/Dead is greater than the expected value (9 > 5.12). Meanwhile, the observed frequency for Red/Dead approximately equals the expected value. This interpretation matches what we concluded by assessing the column percentages in the first contingency table.

Pearson’s chi-squared test works by mathematically comparing observed frequencies to the expected values and boiling all those differences down into one number. Let’s see how it does that!

Related post : Using Contingency Tables to Calculate Probabilities

Calculating the Chi-Squared Statistic

Most hypothesis tests calculate a test statistic. For example, t-tests use t-values and F-tests use F-values as their test statistics. These statistical tests compare your observed sample data to what you would expect if the null hypothesis is true. The calculations reduce your sample data down to one value that represents how different your data are from the null. Learn more about Test Statistics .

For chi-squared tests, the test statistic is, unsurprisingly, chi-squared, or χ².

The chi-squared calculations involve a familiar concept in statistics—the sum of the squared differences between the observed and expected values. This concept is similar to how regression models assess goodness-of-fit using the sum of the squared differences.

Here’s the formula for chi-squared.

\(\chi^2=\sum\dfrac{(O-E)^2}{E}\)

Let’s walk through it!

To calculate the chi-squared statistic, take the difference between a pair of observed (O) and expected values (E), square the difference, and divide that squared difference by the expected value. Repeat this process for all cells in your contingency table and sum those values. The resulting value is χ 2 . We’ll calculate it for our example data shortly!

Important Considerations about the Chi-Squared Statistic

Notice several important considerations about chi-squared values:

Zero represents the null hypothesis. If all your observed frequencies equal the expected frequencies exactly, the chi-squared value for each cell equals zero, and the overall chi-squared statistic equals zero. Zero indicates your sample data exactly match what you’d expect if the null hypothesis is correct.

Squaring the differences ensures both that cell values must be non-negative and that larger differences are weighted more than smaller differences. A cell can never subtract from the chi-squared value.

Larger values represent a greater difference between your sample data and the null hypothesis. Chi-squared tests are one-tailed tests rather than the more familiar two-tailed tests. The test determines whether the entire set of differences exceeds a significance threshold. If your χ² passes the limit, your results are statistically significant! You can reject the null hypothesis and conclude that the variables are dependent–a relationship exists.

Related post : One-tailed and Two-tailed Hypothesis Tests

Calculating Chi-Squared for our Example Data

Let’s calculate the chi-squared statistic for our example data! To do that, I’ll rearrange the contingency table, making it easier to illustrate how to calculate the sum of the squared differences.

Worksheet that shows chi-squared calculations for our example data.

The first two columns indicate the combination of categorical variable values. The next two are the observed and expected values that we calculated before. The last column is the squared difference divided by the expected value for each row. The bottom line sums those values.

Our chi-squared test statistic is 6.17. Ok, great. What does that mean? Larger values indicate a more substantial divergence between our observed data and the null hypothesis. However, the number by itself is not useful because we don’t know if it’s unusually large. We need to place it into a broader context to determine whether it is an extreme value.
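As a cross-check, here is the same calculation in R. A minimal sketch; the labels follow the contingency table above, and the small difference from 6.17 is just rounding in the hand-calculated expected values:

# Star Trek example: compute chi-squared from observed and expected counts.
O <- matrix(c(7, 129, 9, 46, 24, 215), nrow = 2,
            dimnames = list(c("Dead", "Alive"), c("Blue", "Gold", "Red")))
E <- outer(rowSums(O), colSums(O)) / sum(O)  # expected counts under independence
sum((O - E)^2 / E)  # about 6.19, matching the article's 6.17 up to rounding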

Using the Chi-Squared Distribution to Test Hypotheses

One chi-squared test produces a single chi-squared value. However, imagine performing the following process.

First, assume the null hypothesis is valid for the population. At the population level, there is no relationship between the two categorical variables. Now, we’ll repeat our study many times by drawing many random samples from this population using the same design and sample size. Next, we perform the chi-squared test of independence on all the samples and plot the distribution of the chi-squared values. This distribution is known as a sampling distribution, which is a type of probability distribution.

If we follow this procedure, we create a graph that displays the distribution of chi-squared values for a population where the null hypothesis is true. We use sampling distributions to calculate probabilities for how unlikely our sample statistic is if the null hypothesis is correct. Chi-squared tests use the chi-square distribution.

Fortunately, we don’t need to collect many random samples to create this graph! Statisticians understand the properties of chi-squared distributions so we can estimate the sampling distribution using the details of our design.

Our goal is to determine whether our sample chi-squared value is so rare that it justifies rejecting the null hypothesis for the entire population. The chi-squared distribution provides the context for making that determination. We’ll calculate the probability of obtaining a chi-squared value that is at least as high as the value that our study found (6.17).

This probability has a name—the P-value!  A low probability indicates that our sample data are unlikely when the null hypothesis is true.

Alternatively, you can use a chi-square table to determine whether our study’s chi-square test statistic exceeds the critical value .

Related posts : Sampling Distributions , Understanding Probability Distributions and Interpreting P-values

Graphing the Chi-Squared Test Results for Our Example

For chi-squared tests, the degrees of freedom define the shape of the chi-squared distribution for a design. Chi-square tests use this distribution to calculate p-values. The graph below displays several chi-square distributions with differing degrees of freedom.

Graph that displays chi-square distributions for different degrees of freedom.

For a table with r rows and c columns, the method for calculating degrees of freedom for a chi-square test is (r-1) (c-1). For our example, we have two rows and three columns: (2-1) * (3-1) = 2 df.

Read my post about degrees of freedom to learn about this concept along with a more intuitive way of understanding degrees of freedom in chi-squared tests of independence.

Below is the chi-squared distribution for our study’s design.

Chi-squared distribution for our example analysis.

The distribution curve displays the likelihood of chi-squared values for a population where there is no relationship between uniform color and status at the population level. I shaded the region that corresponds to chi-square values greater than or equal to our study’s value (6.17). When the null hypothesis is correct, chi-square values fall in this area approximately 4.6% of the time, which is the p-value (0.046). With a significance level of 0.05, our sample data are unusual enough to reject the null hypothesis.
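Both the p-value approach and the critical-value approach are one-liners in R (a sketch of the two approaches described above):

# P-value approach: probability of a chi-squared value >= 6.17 with df = 2.
pchisq(6.17, df = 2, lower.tail = FALSE)  # about 0.046

# Critical-value approach: the cutoff for alpha = 0.05 with df = 2.
qchisq(0.95, df = 2)  # about 5.99; since 6.17 > 5.99, we reject the null hypothesis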

The sample evidence suggests that a relationship between the variables exists in the population. While this test doesn’t indicate red shirts have a higher chance of dying, there is something else going on with red shirts. Read my other post about chi-squared to learn about that!

Related Reading

When you have smaller sample sizes, you might need to use Fisher’s exact test instead of the chi-square version. To learn more, read my post, Fisher’s Exact Test: Using and Interpreting .

Learn more about How to Find the P Value .

You can also read about the chi-square goodness of fit test , which assesses the distribution of outcomes for a categorical or discrete variable.

Pearson’s chi-squared test for independence doesn’t tell you the effect size. To understand the strength of the relationship, you’d need to use something like Cramér’s V, which is a measure of association like Pearson’s correlation —except for categorical variables. That’s the topic of a future post!


Reader Interactions


November 15, 2021 at 1:56 pm

Jim – I want to start by saying that I love your site. It has helped me out greatly during many occasions. In this particular example I am interested in understanding the logic around the math for the expected values. For example, can you explain how I should interpret scaling the total number dead by the total number blue?

From there I get that we divide by the total number of people to get the number of blue deaths expected within the group of 430 people. Is this a formula that is well known for contingency tables or did you apply that strictly for this scenario?

Hopefully this question made sense?

Either way, thanks for the contributing to the community!


November 16, 2021 at 11:48 am

I’m so glad to hear that my site has been helpful!

I’m not 100% sure what you’re asking, so I’m not sure if I’m answering your question. To start, the formulas are the standard ones for the chi-squared test of independence, which you use in conjunction with contingency tables. You’d use the same methods and formulas for other datasets.

The portion you’re asking about is how to calculate the expected number for blue deaths if there is no association between uniform color and deaths (i.e., the null hypothesis of the test is true). So, the interpretation of the value is: If there is no relationship between uniform color and deaths, we’d expect 12.6 fatalities among those wearing blue uniforms. The test as a whole compares these expected values (for all table cells) to the observed values to determine whether the data support rejecting the null hypothesis and concluding that there is a relationship between the variables.


April 22, 2021 at 7:38 am

I teach AP Stat and am planning on using your example. However, in checking conditions I would like to be able to give background on the origin of the data. I went to your link and found that this data was collected for the TV episodes. Are those the episodes just for the original series?

April 23, 2021 at 11:21 pm

That’s great you’re teaching an AP Stats class! 🙂

Yes, the data I use are from the original TV series that aired from 1966-69.


July 5, 2020 at 12:34 pm

Thank you for your gracious reply. I’m especially happy because it meant that I actually understood! You’ve done a great service with this blog; I plan to return regularly! Thank you.

July 5, 2020 at 5:43 pm

I was thinking exactly that after fixing the comment. It would make a perfect comprehension test. Read this article and find the two incorrect letters! You passed! 🙂

July 4, 2020 at 9:13 am

I very much appreciate your clear explanations. I’m a “50 something” trying to finish a PhD in Library Science and my brain needs the help!

One question, please?

You write above:

Larger values represent a greater difference between your sample data and the null hypothesis. Chi-squared tests are one-tailed tests rather than the more familiar two-tailed tests. The test determines whether the entire set of differences exceeds a significance threshold. If your χ2 passes the limit, your results are statistically significant! You can reject the null hypothesis and conclude that the variables are independent.

I thought that rejecting the null hypothesis allowed you to conclude the opposite. If the null hypothesis is

Null: The variables are independent. No relationship exists.

Then rejecting the Null hypothesis means rejecting that the variables are independent, not concluding that the variables are independent.

This is, please, an honest question (not being “that guy”; I’m not smart enough!).

Again, thank you for your work!! I’m going to check to see if you cover Kendall’s W, as it’s central to a paper I’m reading!

July 4, 2020 at 3:08 pm

First, I definitely welcome all questions! And, especially in this case, because you caught a typo! You’re correct about what rejecting the null hypothesis means for this test. I’ve updated the text to say “and conclude that the variables are dependent.” I double-checked elsewhere through the article, and all the other text about the conclusions based on significance is correct. Just a brain malfunction on my part! I’m grateful you caught that, as that little slip changes the entire meaning!

Alas, I don’t cover Kendall’s W–at least not yet. I plan to add that down the road.


April 28, 2020 at 7:28 pm

Thanks Jim. Your explanations are so effective, yet easy to understand!


April 26, 2020 at 8:54 pm

Thank you Jim. Great post and reply. I have a question which is an extension of Michael’s question.

In general, it seems like one could build any test statistic. Find the distribution of your statistic under the null (say using bootstrap), and that will give you a p-value for your dataset.

Are chi-squared, t, or F-statistics special in some way? Or do we continue to use them simply because people have used them historically?

April 27, 2020 at 12:31 am

Originally, hypothesis tests that used these distributions were easier to calculate. You could calculate the test statistic using a simple formula and then look it up in a table. Later, it got even easier when the computer could both calculate the test statistic and tell you its p-value. It’s really the ease of calculation that made them special along with the theories behind them.

Now, we have such powerful computers that they can easily construct very large sets of bootstrap samples. That would’ve been difficult earlier. So, a large part of the answer is that bootstrapping really wasn’t feasible earlier and so the use of the chi-squared, t, and F distributions became the norm. The historically accepted standards.

It’s possible that over time bootstrap methods will be used more. I haven’t done extensive research into how efficient they are compared to using the various distributions, but what I have done indicates they are at least roughly on par. If you haven’t, I’d suggest reading my post about bootstrapping for more information.

Thanks for asking the great question!


January 31, 2020 at 1:29 am

Nice explanation


January 30, 2020 at 4:21 am

This has started my year, so far so good, Thank you Jim.


January 29, 2020 at 1:32 am

great lesson thanks


January 28, 2020 at 9:24 pm

Thankyou Jim, I will read and calc this lesson today, at 3 o’clock Brasilia time.


January 28, 2020 at 4:40 am

Thank You Sir


January 27, 2020 at 8:49 am

Great post, thanks for writing it. I am looking forward to the Cramer’s V post!

As a person just starting to dive into statistics, I am curious why we so often square the differences to make calculations. It seems squaring a difference will put too much weight on large differences. For example, in the chi-square test, what if we used the absolute value of observed and expected differences? Just something I have been wondering about.

January 28, 2020 at 11:43 pm

Hi Michael,

There’s several ways of looking at your question. In some cases, if you just want to know how far observations are from the mean for a dataset, you would be justified using the mean absolute deviation rather than the standard deviation, which incorporates squared deviations but then takes the square root.

However, in other cases, the squared deviations are built into the underlying analysis. Such as in linear regression where it penalizes larger errors which helps force them to be smaller. Otherwise, the regression line would not “consider” larger errors to be much worse than smaller errors. Here’s an article about it in the regression context .

Or, if you’re working with the normal distribution and using it to calculate probabilities or whatnot, that distribution has the mean and standard deviation as parameters. And the standard deviation incorporates squared differences. You could not work with the normal distribution using mean absolute deviations (MAD).

In a similar vein for chi-squared tests, you have to realize that the chi-squared distribution is based on squared differences. So, if you wanted to do a similar analysis but with the mean absolute deviation (MAD), you’d have to devise an entirely new test statistic and sampling distribution for it! You couldn’t just use the chi-squared distribution because that is specifically for these differences that use squaring. Same thing for F-tests which use ratios of variances, and variances are of course based on squared differences. Again, to use MAD for something like ANOVA, you’d need to come up with a new test statistic and sampling distribution!

But, the general reason is that squaring does weight large differences more heavily, and that fits with the rationale that, given a distribution of values, outlier values should be weighted more because they are relatively unlikely to occur, so when they do it’s noteworthy. It makes those large differences between the expected and the observed more “odd.” And, some analyses use an underlying sampling distribution that is based on a test statistic calculated using squared differences in some fashion.


January 27, 2020 at 2:08 am

Thank you Jim.


January 27, 2020 at 1:12 am

Great lesson Jim! You’re putting it a very simple ways for non-statisticians. Thanks for sharing the knowledge!


January 26, 2020 at 8:32 pm

Thanks for sharing, Jim!



Chi-Square (Χ²) Table | Examples & Downloadable Table

Published on May 31, 2022 by Shaun Turney. Revised on June 21, 2023.

The chi-square (Χ²) distribution table is a reference table that lists chi-square critical values. A chi-square critical value is a threshold for statistical significance for certain hypothesis tests and defines confidence intervals for certain parameters.

Chi-square critical values are calculated from chi-square distributions . They’re difficult to calculate by hand, which is why most people use a reference table or statistical software instead.

Download chi-square table (PDF)

Table of contents

  • When to use a chi-square distribution table
  • Chi-square distribution table (right-tail probabilities)
  • How to use the table
  • Left-tailed and two-tailed probabilities
  • Practice questions
  • Other interesting articles
  • Frequently asked questions about chi-square tables

You will need a chi-square critical value if you want to:

  • Calculate a confidence interval for a population variance or standard deviation
  • Test whether the variance or standard deviation of a population is equal to a certain value (test of a single variance)
  • Test whether the frequency distribution of a categorical variable is different from your expectations ( chi-square goodness of fit test )
  • Test whether two categorical variables are related to each other ( chi-square test of independence )
  • Test whether the proportions of two closely related variables are equal ( McNemar’s test )


Use the table below to find the chi-square critical value for your chi-square test or confidence interval or download the chi-square distribution table (PDF) .

The table provides the right-tail probabilities. If you need the left-tail probabilities, you’ll need to make a small additional calculation .

Chi-square distribution table

To find the chi-square critical value for your hypothesis test or confidence interval, follow the three steps below.

The team wants to use a chi-square goodness of fit test to test the null hypothesis ( H 0 ) that the four entrances are used equally often by the population.

Step 1: Calculate the degrees of freedom

There isn’t just one chi-square distribution —there are many, and their shapes differ depending on a parameter called “degrees of freedom” (also referred to as df or k ). Each row of the chi-square distribution table represents a chi-square distribution with a different df.

You need to use the distribution with the correct df for your test or confidence interval. The table below gives equations to calculate df for several common procedures:

Test or procedure                                        Degrees of freedom (df) equation
Test of a single variance                                df = sample size − 1
Confidence interval for variance or standard deviation   df = sample size − 1
Chi-square goodness of fit test                          df = number of groups − 1
Chi-square test of independence                          df = (number of variable 1 groups − 1) × (number of variable 2 groups − 1)
McNemar’s test                                           df = 1

Step 2: Choose a significance level

The columns of the chi-square distribution table indicate the significance level of the critical value. By convention, the significance level (α) is almost always .05, so the column for .05 is highlighted in the table.

In rare situations, you may want to increase α to decrease your Type II error rate or decrease α to decrease your Type I error rate.

To calculate a confidence interval, choose the significance level based on your desired confidence level :

α = 1 − confidence level

The most common confidence level is 95% (.95), which corresponds to α = .05.

Step 3: Find the critical value in the table

You now have the two numbers you need to find your critical value in the chi-square distribution table:

  • The degrees of freedom ( df ) are listed along the left-hand side of the table. Find the table row corresponding to the degrees of freedom you calculated.
  • The significance levels (α) are listed along the top of the table. Find the column corresponding to your chosen significance level.
  • The table cell where the row and column meet is your critical value.

The security team can now compare this chi-square critical value to the Pearson’s chi-square they calculated for their sample. If the sample’s chi-square is larger than the critical value, they can reject the null hypothesis.

Chi-square table critical value

The table provided here gives the right-tail probabilities. You should use this table for most chi-square tests, including the chi-square goodness of fit test and the chi-square test of independence, and McNemar’s test.

If you want to perform a two-tailed or left-tailed test, you’ll need to make a small additional calculation.

Left-tailed tests

The most common left-tailed test is the test of a single variance when determining whether a population’s variance or standard deviation is less than a certain value.

To find the critical value for a left-tailed probability in the table above, simply use the table column for 1 − α.

You pride yourself on making every cookie the same size, so you decide to randomly sample 25 of your cookies to see if their standard deviation is less than 0.2 inches.

This is a left-tailed test because you want to know if the standard deviation is less than a certain value. You look up the left-tailed probability in the right-tailed table by subtracting your significance level from one: 1 − α = 1 − .05 = .95.

The critical value for df = 25 − 1 = 24 and α = .95 is 13.848.
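Equivalently, in R (a sketch using the qchisq function shown in the FAQ below; qchisq takes a lower-tail probability by default, which is exactly the left-tailed probability we want):

# Left-tailed critical value for df = 24 and alpha = .05.
qchisq(0.05, df = 24)  # 13.848, matching the 1 − α = .95 column of the table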

Two-tailed tests

The most common two-tailed test is the test of a single variance when determining whether a population’s variance or standard deviation is equal to a certain value.

To find the critical values for a two-tailed test, use the table columns for \(\dfrac{\alpha}{2}\) and \(1-\dfrac{\alpha}{2}\).

They find in a medical textbook that the standard deviation of head diameter of six-month-old babies is 1 inch, but they want to confirm this number themselves. They randomly sample 20 six-month-old babies and measure their heads.

This is a two-tailed test because they want to know if the standard deviation is equal to a certain value. They should look up the two critical values in the columns for:

\(\dfrac{\alpha}{2}=\dfrac{.05}{2}=.025\) and \(1-\dfrac{\alpha}{2}=1-.025=.975\)

The critical value for df = 20 − 1 = 19 and α = .025 is 32.852. The critical value for df = 19 and α = .975 is 8.907.
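The same two critical values in R (a sketch; again, qchisq takes a lower-tail probability by default):

# Two-tailed critical values for df = 19 and alpha = .05.
qchisq(0.975, df = 19)  # upper critical value, about 32.852
qchisq(0.025, df = 19)  # lower critical value, about 8.907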


If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Student’s t table
  • Student’s t distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

You can use the qchisq() function to find a chi-square critical value in R.

For example, to calculate the chi-square critical value for a test with df = 22 and α = .05:

qchisq(p = .05, df = 22, lower.tail = FALSE)

You can use the CHISQ.INV.RT() function to find a chi-square critical value in Excel.

For example, to calculate the chi-square critical value for a test with df = 22 and α = .05, click any blank cell and type:

=CHISQ.INV.RT(0.05,22)

A chi-square distribution is a continuous probability distribution . The shape of a chi-square distribution depends on its degrees of freedom , k . The mean of a chi-square distribution is equal to its degrees of freedom ( k ) and the variance is 2 k . The range is 0 to ∞.


