You can clearly see some overlap in the body fat measurements for the men and women in our sample, but also some differences. Just by looking at the data, it's hard to draw any solid conclusions about whether the underlying populations of men and women at the gym have the same mean body fat. That is the value of statistical tests – they provide a common, statistically valid way to make decisions, so that everyone makes the same decision on the same set of data values.
Let’s start by answering: Is the two-sample t -test an appropriate method to evaluate the difference in body fat between men and women?
Before jumping into analysis, we should always take a quick look at the data. The figure below shows histograms and summary statistics for the men and women.
The two histograms are on the same scale. From a quick look, we can see that there are no very unusual points, or outliers. The data look roughly bell-shaped, so our initial idea of a normal distribution seems reasonable.
Examining the summary statistics, we see that the standard deviations are similar. This supports the idea of equal variances. We can also check this using a test for variances.
Based on these observations, the two-sample t -test appears to be an appropriate method to test for a difference in means.
For each group, we need the average, standard deviation and sample size. These are shown in the table below.
Group | n | Average | Standard deviation |
---|---|---|---|
Women | 10 | 22.29 | 5.32 |
Men | 13 | 14.95 | 6.84 |
Without doing any testing, we can see that the averages for men and women in our samples are not the same. But how different are they? Are the averages “close enough” for us to conclude that mean body fat is the same for the larger population of men and women at the gym? Or are the averages too different for us to make this conclusion?
We'll further explain the principles underlying the two sample t -test in the statistical details section below, but let's first proceed through the steps from beginning to end. We start by calculating our test statistic. This calculation begins with finding the difference between the two averages:
$ 22.29 - 14.95 = 7.34 $
This difference in our samples estimates the difference between the population means for the two groups.
Next, we calculate the pooled standard deviation. This builds a combined estimate of the overall standard deviation. The estimate adjusts for different group sizes. First, we calculate the pooled variance:
$ s_p^2 = \frac{((n_1 - 1)s_1^2) + ((n_2 - 1)s_2^2)} {n_1 + n_2 - 2} $
$ s_p^2 = \frac{((10 - 1)5.32^2) + ((13 - 1)6.84^2)}{(10 + 13 - 2)} $
$ = \frac{(9\times28.30) + (12\times46.82)}{21} $
$ = \frac{(254.7 + 561.85)}{21} $
$ =\frac{816.55}{21} = 38.88 $
Next, we take the square root of the pooled variance to get the pooled standard deviation. This is:
$ \sqrt{38.88} = 6.24 $
We now have all the pieces for our test statistic. We have the difference of the averages, the pooled standard deviation and the sample sizes. We calculate our test statistic as follows:
$ t = \frac{\text{difference of group averages}}{\text{standard error of difference}} = \frac{7.34}{(6.24\times \sqrt{(1/10 + 1/13)})} = \frac{7.34}{2.62} = 2.80 $
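As a quick check, the whole calculation can be reproduced in a few lines of Python. This is an illustrative sketch working only from the summary statistics above; small differences from the hand calculation come from rounding the intermediate values.

```python
import math

# Summary statistics from the table above: (n, mean, standard deviation)
n1, mean1, sd1 = 10, 22.29, 5.32   # women
n2, mean2, sd2 = 13, 14.95, 6.84   # men

diff = mean1 - mean2                                            # difference of averages, 7.34
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)   # pooled variance
sp = math.sqrt(sp2)                                             # pooled standard deviation
se = sp * math.sqrt(1 / n1 + 1 / n2)                            # standard error of the difference
t = diff / se

print(round(diff, 2), round(sp, 2), round(t, 2))
```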
To evaluate the difference between the means in order to make a decision about our gym programs, we compare the test statistic to a theoretical value from the t-distribution. This activity involves four steps: stating the hypotheses, deciding on the risk (α) we are willing to accept, finding the theoretical t value for that α and our degrees of freedom, and comparing it to our test statistic.
Let’s look at the body fat data and the two-sample t -test using statistical terms.
Our null hypothesis is that the underlying population means are the same. The null hypothesis is written as:
$ H_0: \mu_1 = \mu_2 $
The alternative hypothesis is that the means are not equal. This is written as:
$ H_1: \mu_1 \neq \mu_2 $
We calculate the average for each group, and then calculate the difference between the two averages. This is written as:
$\overline{x_1} - \overline{x_2} $
We calculate the pooled standard deviation. This assumes that the underlying population variances are equal. The pooled variance formula is written as:
The formula shows the sample size for the first group as $n_1$ and the second group as $n_2$. The standard deviations for the two groups are $s_1$ and $s_2$. This estimate allows the two groups to have different numbers of observations. The pooled standard deviation is the square root of the pooled variance and is written as $s_p$.
What if your sample sizes for the two groups are the same? In this situation, the pooled estimate of variance is simply the average of the variances for the two groups:
$ s_p^2 = \frac{(s_1^2 + s_2^2)}{2} $
The test statistic is calculated as:
$ t = \frac{(\overline{x_1} -\overline{x_2})}{s_p\sqrt{1/n_1 + 1/n_2}} $
The numerator of the test statistic is the difference between the two group averages. It estimates the difference between the two unknown population means. The denominator is an estimate of the standard error of the difference between the two unknown population means.
Technical Detail: For a single mean, the standard error is $ s/\sqrt{n} $ . The formula above extends this idea to two groups that use a pooled estimate for s (standard deviation), and that can have different group sizes.
We then compare the test statistic to a t value with our chosen alpha value and the degrees of freedom for our data. Using the body fat data as an example, we set α = 0.05. The degrees of freedom ( df ) are based on the group sizes and are calculated as:
$ df = n_1 + n_2 - 2 = 10 + 13 - 2 = 21 $
The formula shows the sample size for the first group as $n_1$ and the second group as $n_2$. Statisticians write the t value with α = 0.05 and 21 degrees of freedom as:
$ t_{0.05,21} $
The t value with α = 0.05 and 21 degrees of freedom is 2.080. There are two possible results from our comparison: if our test statistic is larger in magnitude than 2.080, we reject the null hypothesis of equal means; if it is smaller, we cannot reject the null hypothesis of equal means.
When the variances for the two groups are not equal, we cannot use the pooled estimate of standard deviation. Instead, we take the standard error for each group separately. The test statistic is:
$ t = \frac{ (\overline{x_1} - \overline{x_2})}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} $
The numerator of the test statistic is the same. It is the difference between the averages of the two groups. The denominator is an estimate of the overall standard error of the difference between means. It is based on the separate standard error for each group.
The degrees of freedom calculation for the t value is more complex with unequal variances than equal variances and is usually left up to statistical software packages. The key point to remember is that if you cannot use the pooled estimate of standard deviation, then you cannot use the simple formula for the degrees of freedom.
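To see what software does here, the unequal-variance version of the test can be run directly from the same summary statistics. The sketch below uses scipy's `ttest_ind_from_stats` with `equal_var=False`, which applies the Satterthwaite degrees-of-freedom adjustment internally.

```python
from scipy import stats

# Same summary statistics as before: (n, mean, standard deviation)
n1, mean1, sd1 = 10, 22.29, 5.32   # women
n2, mean2, sd2 = 13, 14.95, 6.84   # men

# equal_var=False requests the Welch (unequal variances) test
res = stats.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2,
                                 equal_var=False)
print(res.statistic, res.pvalue)
```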
The normality assumption is more important when the two groups have small sample sizes than for larger sample sizes.
Normal distributions are symmetric, which means they are “even” on both sides of the center. Normal distributions do not have extreme values, or outliers. You can check these two features of a normal distribution with graphs. Earlier, we decided that the body fat data was “close enough” to normal to go ahead with the assumption of normality. The figure below shows a normal quantile plot for men and women, and supports our decision.
You can also perform a formal test for normality using software. The figure above shows results of testing for normality with JMP software. We test each group separately. Both the test for men and the test for women show that we cannot reject the hypothesis of a normal distribution. We can go ahead with the assumption that the body fat data for men and for women are normally distributed.
Testing for unequal variances is complex. We won’t show the calculations in detail, but will show the results from JMP software. The figure below shows results of a test for unequal variances for the body fat data.
Without diving into details of the different types of tests for unequal variances, we will use the F test. Before testing, we decide to accept a 10% risk of concluding the variances are unequal when they are in fact equal. This means we have set α = 0.10.
Like most statistical software, JMP shows the p -value for a test. This is the likelihood of finding a more extreme value for the test statistic than the one observed. It’s difficult to calculate by hand. For the figure above, with the F test statistic of 1.654, the p- value is 0.4561. This is larger than our α value: 0.4561 > 0.10. We fail to reject the hypothesis of equal variances. In practical terms, we can go ahead with the two-sample t -test with the assumption of equal variances for the two groups.
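The F ratio itself is easy to reproduce. The sketch below assumes the common convention of dividing the larger sample variance by the smaller and doubling the upper-tail probability for a two-sided p-value; software packages may use slightly different conventions.

```python
from scipy import stats

# Sample variances from the summary statistics (sd squared)
var_men, n_men = 6.84**2, 13       # larger variance goes in the numerator
var_women, n_women = 5.32**2, 10

F = var_men / var_women                         # variance ratio, about 1.65
p = 2 * stats.f.sf(F, n_men - 1, n_women - 1)   # two-sided p-value
print(round(F, 3), round(p, 4))
```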
Using a visual, you can check to see if your test statistic is a more extreme value in the distribution. The figure below shows a t- distribution with 21 degrees of freedom.
Since our test is two-sided and we have set α = .05, the figure shows that the value of 2.080 “cuts off” 2.5% of the data in each of the two tails. Only 5% of the data overall is further out in the tails than 2.080. Because our test statistic of 2.80 is beyond the cut-off point, we reject the null hypothesis of equal means.
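The cut-off point itself can be looked up with software rather than a printed table; for example, in Python:

```python
from scipy import stats

alpha, df = 0.05, 21
# Upper 2.5% point of the t-distribution: the two-sided critical value
t_crit = stats.t.ppf(1 - alpha / 2, df)
print(round(t_crit, 3))
```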
The figure below shows results for the two-sample t -test for the body fat data from JMP software.
The results for the two-sample t-test that assumes equal variances are the same as our calculations earlier. The test statistic is 2.79996. The software shows results for a two-sided test and for one-sided tests. The two-sided test is what we want (Prob > |t|). Our null hypothesis is that the mean body fat for men and women is equal. Our alternative hypothesis is that the mean body fat is not equal. The one-sided tests are for one-sided alternative hypotheses – for example, an alternative hypothesis that mean body fat for men is less than that for women.
The software shows a p-value of 0.0107. Before running the test, we decided on a 5% risk (α = 0.05) of concluding that mean body fat differs for men and women when it does not; it is important to make this decision before doing the statistical test. Since 0.0107 < 0.05, we reject the hypothesis of equal mean body fat for the two groups and conclude that we have evidence that body fat differs between men and women in the population.
The figure also shows the results for the t- test that does not assume equal variances. This test does not use the pooled estimate of the standard deviation. As was mentioned above, this test also has a complex formula for degrees of freedom. You can see that the degrees of freedom are 20.9888. The software shows a p- value of 0.0086. Again, with our decision of a 5% risk, we can reject the null hypothesis of equal mean body fat for men and women.
If you have more than two independent groups, you cannot use the two-sample t- test. You should use a multiple comparison method. ANOVA, or analysis of variance, is one such method. Other multiple comparison methods include the Tukey-Kramer test of all pairwise differences, analysis of means (ANOM) to compare group means to the overall mean or Dunnett’s test to compare each group mean to a control mean.
If your sample size is very small, it might be hard to test for normality. In this situation, you might need to use your understanding of the measurements. For example, for the body fat data, the trainer knows that the underlying distribution of body fat is normally distributed. Even for a very small sample, the trainer would likely go ahead with the t -test and assume normality.
What if you know the underlying measurements are not normally distributed? Or what if your sample size is large and the test for normality is rejected? In this situation, you can use nonparametric analyses. These types of analyses do not depend on an assumption that the data values are from a specific distribution. For the two-sample t -test, the Wilcoxon rank sum test is a nonparametric test that could be used.
Published on January 31, 2020 by Rebecca Bevans . Revised on June 22, 2023.
A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.
A t test can only be used when comparing the means of two groups (a.k.a. pairwise comparison). If you want to compare more than two groups, or if you want to do multiple pairwise comparisons, use an ANOVA test or a post-hoc test.
The t test is a parametric test of difference, meaning that it makes the same assumptions about your data as other parametric tests. The t test assumes your data:
If your data do not fit these assumptions, you can try a nonparametric alternative to the t test, such as the Wilcoxon rank-sum (Mann-Whitney U) test for two independent samples.
When choosing a t test, you will need to consider two things: whether the groups being compared come from a single population or two different populations, and whether you want to test the difference in a specific direction.
The t test estimates the true difference between two group means using the ratio of the difference in group means over the pooled standard error of both groups. You can calculate it manually using a formula, or use statistical analysis software.
The formula for the two-sample t test (a.k.a. the Student’s t-test) is shown below.
In this formula, t is the t value, x̄₁ and x̄₂ are the means of the two groups being compared, s² is the pooled variance of the two groups, and n₁ and n₂ are the number of observations in each group.
A larger t value shows that the difference between group means is large relative to the pooled standard error, indicating stronger evidence of a difference between the groups.
You can compare your calculated t value against the values in a critical value chart (e.g., Student’s t table) to determine whether your t value is greater than what would be expected by chance. If so, you can reject the null hypothesis and conclude that the two groups are in fact different.
Most statistical software (R, SPSS, etc.) includes a t test function. This built-in function will take your raw data and calculate the t value. It will then compare it to the critical value, and calculate a p -value . This way you can quickly see whether your groups are statistically different.
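As a rough analogue of such built-in functions, the sketch below uses Python's scipy instead of R or SPSS; the data values are made up purely for illustration.

```python
from scipy import stats

# Hypothetical measurements for two independent groups
group_a = [4.2, 4.6, 5.0, 4.8, 4.4]
group_b = [5.5, 5.1, 5.9, 5.3, 5.7]

# Student's two-sample t test (equal variances assumed by default)
res = stats.ttest_ind(group_a, group_b)
print(res.statistic, res.pvalue)
```

With these made-up values the groups differ clearly, so the p-value comes out well below 0.05.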
In your comparison of flower petal lengths, you decide to perform your t test using R. The code looks like this:
If you perform the t test for your flower hypothesis in R, you will receive the following output:
The output provides the calculated t value, the degrees of freedom, the p-value, a confidence interval for the difference in group means, and the estimated mean of each group.
When reporting your t test results, the most important values to include are the t value , the p value , and the degrees of freedom for the test. These will communicate to your audience whether the difference between the two groups is statistically significant (a.k.a. that it is unlikely to have happened by chance).
You can also include the summary statistics for the groups being compared, namely the mean and standard deviation . In R, the code for calculating the mean and the standard deviation from the data looks like this:
flower.data %>% group_by(Species) %>% summarize(mean_length = mean(Petal.Length), sd_length = sd(Petal.Length))
In our example, you would report the results like this:
If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.
A t-test is a statistical test that compares the means of two samples . It is used in hypothesis testing , with a null hypothesis that the difference in group means is zero and an alternate hypothesis that the difference in group means is different from zero.
A t-test measures the difference in group means divided by the pooled standard error of the two group means.
In this way, it calculates a number (the t-value) illustrating the magnitude of the difference between the two group means being compared, and estimates the likelihood that this difference exists purely by chance (p-value).
Your choice of t-test depends on whether you are studying one group or two groups, and whether you care about the direction of the difference in group means.
If you are studying one group, use a paired t-test to compare the group mean over time or after an intervention, or use a one-sample t-test to compare the group mean to a standard value. If you are studying two groups, use a two-sample t-test .
If you want to know only whether a difference exists, use a two-tailed test. If you want to know whether one group mean is greater or less than the other, use a one-tailed test (left-tailed or right-tailed, depending on the direction).
A one-sample t-test is used to compare a single population to a standard value (for example, to determine whether the average lifespan of a specific town is different from the country average).
A paired t-test is used to compare a single population before and after some experimental intervention or at two different points in time (for example, measuring student performance on a test before and after being taught the material).
A t-test should not be used to measure differences among more than two groups, because the error structure for a t-test will underestimate the actual error when many groups are being compared.
If you want to compare the means of several groups at once, it’s best to use another statistical test such as ANOVA or a post-hoc test.
Bevans, R. (2023, June 22). An Introduction to t Tests | Definitions, Formula and Examples. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/statistics/t-test/
The two-sample t-test is also called the independent-samples t-test or unpaired-samples t-test. This tutorial explains what the two-sample t-test is, gives its formula, and works through examples.
The two-sample t-test is used to test whether the means of two different groups of people or objects are significantly different.
As the name suggests, the two-sample t-test involves two separate, independent samples.
Suppose you test how men and women differ in their attitudes toward drinking coffee. Here, the two samples of men and women are independent of each other.
Attitudes Men vs. Attitudes Women
As another example, a restaurant chain has 20 stores in California and another 20 stores in New York state. They want to compare whether the stores in California perform differently from the stores in New York.
Sales California vs. Sales New York
The null hypothesis is that the underlying population means of two groups are the same:
\( H_0: \mu_1 = \mu_2 \)
The alternative hypothesis is that the means are not equal:
\( H_1: \mu_1 \neq \mu_2 \)
\( \bar{x_1} \) and \( \bar{x_2} \) are the sample means, and \( s_1^2 \) and \( s_2^2 \) are the sample variances.
The following are formulas for two sample t-test, including situations of equal variances and unequal variances.
When we assume the variances of group 1 and group 2 are equal:
\[ t=\frac{\bar{x_1}-\bar{x_2}}{ \sqrt{s_p^2(1/n_1+1/n_2)}} \]
\( s_p^2=\frac{((n_1-1)s_1^2)+((n_2-1)s_2^2)}{n_1+n_2-2} \)
When the variances of the two groups are equal, the degrees of freedom for the test statistic are \( df = n_1 + n_2 - 2 \).
When we assume the variances of group 1 and group 2 are unequal:
\[ t=\frac{\bar{x_1}-\bar{x_2}}{ \sqrt{(s_1^2/n_1+s_2^2/n_2)}} \]
When the variances of the two groups are unequal, the degrees of freedom for the test statistic are \( df=\frac{(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2})^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}} \).
When \( n_1 = n_2 \)
Note that, when \( n_1 = n_2 = n \), the equal-variance formula and the unequal-variance formula are the same and will generate exactly the same result for the t statistic. That is,
\[ t=\frac{\bar{x_1}-\bar{x_2}}{ \sqrt{ \frac{s_1^2+s_2^2}{n}}} \]
However, when \( n_1 = n_2 = n \), the degrees of freedom for unequal variances are still not \( n_1+n_2-2 \). Instead, they are \( df=\frac{(n-1)(s_1^2+s_2^2)^2}{(s_1^2)^2+(s_2^2)^2} \).
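A quick numeric check of this equal-sample-size claim, using made-up data and only the standard library:

```python
import math

# Made-up data for two groups of equal size n
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]
n = len(a)

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

# Equal-variance t statistic: pooled variance is the average of the two
sp2 = (var(a) + var(b)) / 2
t_pooled = (mean(a) - mean(b)) / math.sqrt(sp2 * (2 / n))

# Unequal-variance t statistic: separate standard errors
t_unpooled = (mean(a) - mean(b)) / math.sqrt(var(a) / n + var(b) / n)

print(t_pooled, t_unpooled)   # identical when n1 == n2
```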
Suppose you want to test whether women and men differ in their attitudes toward a brand, and the attitude is measured on a 7-point scale (1= Not like at all, 7 = Like it a lot).
The following is the hypothetical data, one column for men’s attitudes and another one for women’s attitudes toward the brand.
Men’s Attitudes | Women’s Attitudes |
---|---|
4 | 4 |
6 | 3 |
7 | 4 |
7 | 5 |
6 | 2 |
7 | 1 |
The following is the means for men’s and women’s attitudes.
\( \bar{x_1}=\bar{x}_{men}=\frac{4+6+7+7+6+7}{6}=6.17 \) \( \bar{x_2}=\bar{x}_{women}=\frac{4+3+4+5+2+1}{6}=3.17 \)
After knowing the means, we can calculate the sample variances for men and women respectively.
\( s_1^2=s_{men}^2=\frac{(4-6.17)^2+(6-6.17)^2+(7-6.17)^2+(7-6.17)^2+(6-6.17)^2+(7-6.17)^2}{6-1}=1.37 \)
\( s_2^2=s_{women}^2=\frac{(4-3.17)^2+(3-3.17)^2+(4-3.17)^2+(5-3.17)^2+(2-3.17)^2+(1-3.17)^2}{6-1}=2.17 \)
Then, we can calculate the pooled variance.
\( s_p^2=\frac{((n_1-1)s_1^2)+((n_2-1)s_2^2)}{n_1+n_2-2}=\frac{(6-1)1.37+(6-1)2.17}{6+6-2}=1.77 \)
Then, we can get the t statistic.
\( t=\frac{\bar{x_1}-\bar{x_2}}{\sqrt{s_p^2(1/n_1+1/n_2)}}=\frac{6.17-3.17}{\sqrt{1.77(1/6+1/6)}}=3.90935 \)
Assuming equal variances, the degrees of freedom are 10. The critical t-value for α = 0.05 and df = 10 is 2.228. Since 3.909 > 2.228, we can reject the null hypothesis. That is, we can conclude that men and women significantly differ in their attitudes. Further, the means suggest that men have significantly more favorable attitudes toward the brand than women.
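This worked example is easy to verify with software. The sketch below (assuming scipy as the library) reproduces the t statistic and the critical value from the raw attitude data above:

```python
from scipy import stats

# Raw attitude scores from the table above
men = [4, 6, 7, 7, 6, 7]
women = [4, 3, 4, 5, 2, 1]

res = stats.ttest_ind(men, women, equal_var=True)       # pooled-variance t test
t_crit = stats.t.ppf(0.975, len(men) + len(women) - 2)  # two-sided critical value, df = 10

print(round(res.statistic, 3), round(t_crit, 3), res.pvalue < 0.05)
```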
What is the difference between an independent and a paired sample t-test?
Introduction.
The independent t-test, also called the two sample t-test, independent-samples t-test or student's t-test, is an inferential statistical test that determines whether there is a statistically significant difference between the means in two unrelated groups.
The null hypothesis for the independent t-test is that the population means from the two unrelated groups are equal:
H₀: μ₁ = μ₂
In most cases, we are looking to see if we can show that we can reject the null hypothesis and accept the alternative hypothesis, which is that the population means are not equal:
Hₐ: μ₁ ≠ μ₂
To do this, we need to set a significance level (also called alpha) that allows us to decide whether to reject the null hypothesis. Most commonly, this value is set at 0.05.
In order to run an independent t-test, you need the following: one continuous dependent variable, and one categorical independent variable that divides the cases into two unrelated groups.
Unrelated groups, also called unpaired groups or independent groups, are groups in which the cases (e.g., participants) in each group are different. Often we are investigating differences in individuals, which means that when comparing two groups, an individual in one group cannot also be a member of the other group and vice versa. An example would be gender - an individual would have to be classified as either male or female – not both.
The independent t-test requires that the dependent variable is approximately normally distributed within each group.
Note: Technically, it is the residuals that need to be normally distributed, but for an independent t-test, both will give you the same result.
You can test for this using a number of different tests, but the Shapiro-Wilk test of normality or a graphical method, such as a Q-Q plot, are very common. You can run these tests using SPSS Statistics, the procedure for which can be found in our Testing for Normality guide. However, the t-test is described as a robust test with respect to the assumption of normality. This means that some deviation away from normality does not have a large influence on Type I error rates. The exception to this is if the ratio of the largest to smallest group size is greater than 1.5.
If you find that either one or both of your group's data is not approximately normally distributed and groups sizes differ greatly, you have two options: (1) transform your data so that the data becomes normally distributed (to do this in SPSS Statistics see our guide on Transforming Data ), or (2) run the Mann-Whitney U test which is a non-parametric test that does not require the assumption of normality (to run this test in SPSS Statistics see our guide on the Mann-Whitney U Test ).
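The decision flow just described can be sketched outside SPSS Statistics as well. The Python sketch below uses scipy's `shapiro` and `mannwhitneyu` as stand-ins for the SPSS procedures; the data values are illustrative only.

```python
from scipy import stats

# Illustrative measurements for two independent groups
group1 = [5.9, 6.2, 6.0, 6.4, 5.8, 6.1, 6.3]
group2 = [5.5, 5.8, 5.6, 5.9, 5.4, 5.7, 6.0]

# Shapiro-Wilk normality test, run on each group separately
w1, p1 = stats.shapiro(group1)
w2, p2 = stats.shapiro(group2)

if min(p1, p2) > 0.05:
    # No evidence against normality: proceed with the independent t-test
    res = stats.ttest_ind(group1, group2)
else:
    # Normality rejected: fall back to the Mann-Whitney U test
    res = stats.mannwhitneyu(group1, group2)

print(res.pvalue)
```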
The independent t-test assumes the variances of the two groups you are measuring are equal in the population. If your variances are unequal, this can affect the Type I error rate. The assumption of homogeneity of variance can be tested using Levene's Test of Equality of Variances, which is produced in SPSS Statistics when running the independent t-test procedure. If you have run Levene's Test of Equality of Variances in SPSS Statistics, you will get a result similar to that below:
This test for homogeneity of variance provides an F -statistic and a significance value ( p -value). We are primarily concerned with the significance value – if it is greater than 0.05 (i.e., p > .05), our group variances can be treated as equal. However, if p < 0.05, we have unequal variances and we have violated the assumption of homogeneity of variances.
If Levene's Test for Equality of Variances is statistically significant, which indicates that the group variances are unequal in the population, you can correct for this violation by not using the pooled estimate for the error term for the t-statistic, and instead adjusting the degrees of freedom using the Welch-Satterthwaite method. In reality, you may never have heard of these adjustments because SPSS Statistics hides this information and simply labels the two options as "Equal variances assumed" and "Equal variances not assumed" without explicitly stating the underlying corrections. However, you can see the evidence of these tests as below:
From the result of Levene's Test for Equality of Variances, we can reject the null hypothesis that there is no difference in the variances between the groups and accept the alternative hypothesis that there is a statistically significant difference in the variances between groups. The effect of not being able to assume equal variances is evident in the final column of the above figure, where we see a reduction in the value of the t-statistic and a large reduction in the degrees of freedom (df). This has the effect of increasing the p-value above the critical significance level of 0.05. In this case, we therefore fail to reject the null hypothesis of equal means and conclude that there is no statistically significant difference between the group means. This would not have been our conclusion had we not tested for homogeneity of variances.
When reporting the result of an independent t-test, you need to include the t -statistic value, the degrees of freedom (df) and the significance value of the test ( p -value). The format of the test result is: t (df) = t -statistic, p = significance value. Therefore, for the example above, you could report the result as t (7.001) = 2.233, p = 0.061.
In order to provide enough information for readers to fully understand the results when you have run an independent t-test, you should include the result of normality tests, Levene's Equality of Variances test, the two group means and standard deviations, the actual t-test result and the direction of the difference (if any). In addition, you might also wish to include the difference between the groups along with a 95% confidence interval. For example:
Inspection of Q-Q Plots revealed that cholesterol concentration was normally distributed for both groups and that there was homogeneity of variance as assessed by Levene's Test for Equality of Variances. Therefore, an independent t-test was run on the data with a 95% confidence interval (CI) for the mean difference. It was found that after the two interventions, cholesterol concentrations in the dietary group (6.15 ± 0.52 mmol/L) were significantly higher than the exercise group (5.80 ± 0.38 mmol/L) ( t (38) = 2.470, p = 0.018) with a difference of 0.35 (95% CI, 0.06 to 0.64) mmol/L.
To know how to run an independent t-test in SPSS Statistics, see our SPSS Statistics Independent-Samples T-Test guide. Alternatively, you can carry out an independent-samples t-test using Excel, R and RStudio .
Welcome to our t-test calculator! Here you can not only easily perform one-sample t-tests , but also two-sample t-tests , as well as paired t-tests .
Do you prefer to find the p-value from t-test, or would you rather find the t-test critical values? Well, this t-test calculator can do both! 😊
What does a t-test tell you? Take a look at the text below, where we explain what actually gets tested when various types of t-tests are performed. Also, we explain when to use t-tests (in particular, whether to use the z-test vs. t-test) and what assumptions your data should satisfy for the results of a t-test to be valid. If you've ever wanted to know how to do a t-test by hand, we provide the necessary t-test formula, as well as tell you how to determine the number of degrees of freedom in a t-test.
A t-test is one of the most popular statistical tests for location , i.e., it deals with the population(s) mean value(s).
There are different types of t-tests that you can perform:
In the next section , we explain when to use which. Remember that a t-test can only be used for one or two groups . If you need to compare three (or more) means, use the analysis of variance ( ANOVA ) method.
The t-test is a parametric test, meaning that your data has to fulfill some assumptions :
If your sample doesn't fit these assumptions, you can resort to nonparametric alternatives. Visit our Mann–Whitney U test calculator or the Wilcoxon rank-sum test calculator to learn more. Other possibilities include the Wilcoxon signed-rank test or the sign test.
Your choice of t-test depends on whether you are studying one group or two groups:
One sample t-test
Choose the one-sample t-test to check if the mean of a population is equal to some pre-set hypothesized value .
The average volume of a drink sold in 0.33 l cans — is it really equal to 330 ml?
The average weight of people from a specific city — is it different from the national average?
Choose the two-sample t-test to check if the difference between the means of two populations is equal to some pre-determined value when the two samples have been chosen independently of each other.
In particular, you can use this test to check whether the two groups are different from one another .
The average difference in weight gain in two groups of people: one group was on a high-carb diet and the other on a high-fat diet.
The average difference in the results of a math test from students at two different universities.
This test is sometimes referred to as an independent samples t-test , or an unpaired samples t-test .
A paired t-test is used to investigate the change in the mean of a population before and after some experimental intervention , based on a paired sample, i.e., when each subject has been measured twice: before and after treatment.
In particular, you can use this test to check whether, on average, the treatment has had any effect on the population .
The change in student test performance before and after taking a course.
The change in blood pressure in patients before and after administering some drug.
So, you've decided which t-test to perform. These next steps will tell you how to calculate the p-value from t-test or its critical values, and then which decision to make about the null hypothesis.
Decide on the alternative hypothesis :
Use a two-tailed t-test if you only care whether the population's mean (or, in the case of two populations, the difference between the populations' means) agrees or disagrees with the pre-set value.
Use a one-tailed t-test if you want to test whether this mean (or difference in means) is greater/less than the pre-set value.
Compute your T-score value :
Formulas for the test statistic in t-tests include the sample size , as well as its mean and standard deviation . The exact formula depends on the t-test type — check the sections dedicated to each particular test for more details.
Determine the degrees of freedom for the t-test:
The degrees of freedom are the number of observations in a sample that are free to vary as we estimate statistical parameters. In the simplest case, the number of degrees of freedom equals your sample size minus the number of parameters you need to estimate . Again, the exact formula depends on the t-test you want to perform — check the sections below for details.
The degrees of freedom are essential, as they determine the distribution followed by your T-score (under the null hypothesis). If there are d degrees of freedom, then the distribution of the test statistics is the t-Student distribution with d degrees of freedom . This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails . If the number of degrees of freedom is large (>30), which generically happens for large samples, the t-Student distribution is practically indistinguishable from N(0,1).
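The convergence of the t-Student distribution to N(0,1) can be checked numerically. A minimal sketch using SciPy (the cutoff 1.96 is just an illustrative value):

```python
# Compare the Student's t cdf to the standard normal cdf as the
# degrees of freedom grow; they become practically indistinguishable.
from scipy import stats

for d in (5, 30, 300):
    # Probability of observing a value below 1.96 under each distribution
    t_cdf = stats.t.cdf(1.96, df=d)
    z_cdf = stats.norm.cdf(1.96)
    print(f"df={d:>3}: t cdf={t_cdf:.4f}, normal cdf={z_cdf:.4f}")
```

For small df the heavier tails of the t-distribution are clearly visible; by df = 300 the two cdf values agree to three decimal places.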
💡 The t-Student distribution owes its name to William Sealy Gosset, who, in 1908, published his paper on the t-test under the pseudonym "Student". Gosset worked at the famous Guinness Brewery in Dublin, Ireland, and devised the t-test as an economical way to monitor the quality of beer. Cheers! 🍺🍺🍺
Recall that the p-value is the probability (calculated under the assumption that the null hypothesis is true) that the test statistic will produce values at least as extreme as the T-score produced for your sample. As probabilities correspond to areas under the density function, the p-value can be pictured as the area under the t-distribution's density curve beyond your T-score, in the direction(s) determined by the alternative hypothesis.
The following formulae say how to calculate the p-value from a t-test. By cdf_{t,d} we denote the cumulative distribution function of the t-Student distribution with d degrees of freedom:
p-value from left-tailed t-test:
p-value = cdf_{t,d}(t_score)
p-value from right-tailed t-test:
p-value = 1 − cdf_{t,d}(t_score)
p-value from two-tailed t-test:
p-value = 2 × cdf_{t,d}(−|t_score|)
or, equivalently: p-value = 2 − 2 × cdf_{t,d}(|t_score|)
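These three formulas can be evaluated directly with SciPy's t-distribution cdf. A sketch with a hypothetical T-score and degrees of freedom:

```python
# p-values from a T-score via the t-distribution cdf, for all three tails.
from scipy import stats

t_score, d = 2.1, 15  # hypothetical T-score and degrees of freedom

p_left = stats.t.cdf(t_score, df=d)             # left-tailed
p_right = 1 - stats.t.cdf(t_score, df=d)        # right-tailed
p_two = 2 * stats.t.cdf(-abs(t_score), df=d)    # two-tailed
print(p_left, p_right, p_two)
```

By the symmetry of the t-distribution, the two-tailed p-value is twice the smaller one-tailed p-value.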
However, the cdf of the t-distribution is given by a somewhat complicated formula. To find the p-value by hand, you would need to resort to statistical tables, where approximate cdf values are collected, or to specialized statistical software. Fortunately, our t-test calculator determines the p-value from t-test for you in the blink of an eye!
Recall that in the critical values approach to hypothesis testing, you need to set a significance level, α, before computing the critical values, which in turn give rise to critical regions (a.k.a. rejection regions).
Formulas for critical values employ the quantile function of the t-distribution, i.e., the inverse of the cdf:
Critical value for left-tailed t-test: cdf_{t,d}⁻¹(α)
critical region:
(−∞, cdf_{t,d}⁻¹(α)]
Critical value for right-tailed t-test: cdf_{t,d}⁻¹(1 − α)
critical region:
[cdf_{t,d}⁻¹(1 − α), ∞)
Critical values for two-tailed t-test: ±cdf_{t,d}⁻¹(1 − α/2)
critical region:
(−∞, −cdf_{t,d}⁻¹(1 − α/2)] ∪ [cdf_{t,d}⁻¹(1 − α/2), ∞)
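In SciPy, the quantile function (inverse cdf) is `stats.t.ppf`. A sketch of the critical-value formulas, with a hypothetical α and df:

```python
# Critical values of the t-distribution via the quantile function (inverse cdf).
from scipy import stats

alpha, d = 0.05, 10  # hypothetical significance level and degrees of freedom

crit_left = stats.t.ppf(alpha, df=d)          # left-tailed: reject if T <= this
crit_right = stats.t.ppf(1 - alpha, df=d)     # right-tailed: reject if T >= this
crit_two = stats.t.ppf(1 - alpha / 2, df=d)   # two-tailed: reject if |T| >= this
print(crit_left, crit_right, crit_two)
```

By symmetry, the left-tailed critical value is just the negative of the right-tailed one.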
To decide the fate of the null hypothesis, just check if your T-score lies within the critical region:
If your T-score belongs to the critical region , reject the null hypothesis and accept the alternative hypothesis.
If your T-score is outside the critical region , then you don't have enough evidence to reject the null hypothesis.
Choose the type of t-test you wish to perform:
A one-sample t-test (to test the mean of a single group against a hypothesized mean);
A two-sample t-test (to compare the means for two groups); or
A paired t-test (to check how the mean from the same group changes after some intervention).
Two-tailed;
Left-tailed; or
Right-tailed.
This t-test calculator allows you to use either the p-value approach or the critical regions approach to hypothesis testing!
Enter your T-score and the number of degrees of freedom . If you don't know them, provide some data about your sample(s): sample size, mean, and standard deviation, and our t-test calculator will compute the T-score and degrees of freedom for you .
Once all the parameters are present, the p-value, or critical region, will immediately appear underneath the t-test calculator, along with an interpretation!
The null hypothesis is that the population mean is equal to some value μ₀.
The alternative hypothesis is that the population mean is:
One-sample t-test formula :
Number of degrees of freedom in t-test (one-sample) = n − 1.
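A minimal sketch of the one-sample T-score, t = (x̄ − μ₀) / (s / √n), cross-checked against SciPy's implementation. The can-volume numbers are made up for illustration:

```python
# One-sample t-test computed by hand and verified against scipy.stats.
import math
from scipy import stats

sample = [332, 329, 331, 327, 330, 328, 331, 326]  # hypothetical volumes (ml)
mu0 = 330  # hypothesized population mean

n = len(sample)
mean = sum(sample) / n
# Sample standard deviation (with n - 1 in the denominator)
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
t = (mean - mu0) / (s / math.sqrt(n))

t_scipy, p = stats.ttest_1samp(sample, mu0)  # same statistic, plus a p-value
print(t, t_scipy, p)
```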
The null hypothesis is that the actual difference between these groups' means, μ₁ and μ₂, is equal to some pre-set value, Δ.
The alternative hypothesis is that the difference μ₁ − μ₂ is:
In particular, if this pre-determined difference is zero (Δ = 0):
The null hypothesis is that the population means are equal.
The alternate hypothesis is that the population means are:
Formally, to perform a t-test, we should additionally assume that the variances of the two populations are equal (this assumption is called the homogeneity of variance ).
There is a version of a t-test that can be applied without the assumption of homogeneity of variance: it is called a Welch's t-test . For your convenience, we describe both versions.
Use this test if you know that the two populations' variances are the same (or very similar).
Two-sample t-test formula (with equal variances) :
where s_p is the so-called pooled standard deviation, which we compute as:
Number of degrees of freedom in t-test (two samples, equal variances) = n₁ + n₂ − 2.
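A sketch of the equal-variance two-sample t-test with the pooled standard deviation s_p, checked against SciPy (the data values are hypothetical):

```python
# Pooled two-sample t-test computed by hand and verified against scipy.stats.
import math
from statistics import mean, stdev
from scipy import stats

x = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5]  # hypothetical group 1
y = [12.9, 13.5, 12.2, 13.8, 12.6, 13.1]  # hypothetical group 2

n1, n2 = len(x), len(y)
s1, s2 = stdev(x), stdev(y)
# Pooled standard deviation
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t = (mean(x) - mean(y)) / (sp * math.sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2

t_scipy, p = stats.ttest_ind(x, y)  # equal variances assumed by default
print(t, t_scipy, df)
```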
Use this test if the variances of your populations are different.
Two-sample Welch's t-test formula if variances are unequal:
The number of degrees of freedom in Welch's t-test (two-sample t-test with unequal variances) is difficult to compute exactly. We can approximate it with the help of the following Satterthwaite formula:
Alternatively, you can take the smaller of n₁ − 1 and n₂ − 1 as a conservative estimate for the number of degrees of freedom.
🔎 The Satterthwaite formula for the degrees of freedom can be rewritten as a scaled weighted harmonic mean of the degrees of freedom of the respective samples, n₁ − 1 and n₂ − 1, with weights proportional to the squares of the corresponding standard errors.
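A sketch of Welch's t-test with the Satterthwaite degrees-of-freedom approximation, verified against SciPy's `equal_var=False` mode (the data are hypothetical):

```python
# Welch's t-test with Satterthwaite df, verified against scipy.stats.
import math
from statistics import mean, stdev
from scipy import stats

x = [20.1, 22.4, 19.8, 23.0, 21.2]
y = [18.0, 25.5, 16.2, 27.1, 19.9, 24.3]

n1, n2 = len(x), len(y)
v1, v2 = stdev(x)**2 / n1, stdev(y)**2 / n2  # squared standard errors

t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
# Satterthwaite approximation for the effective degrees of freedom
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

t_scipy, p = stats.ttest_ind(x, y, equal_var=False)  # Welch's t-test
print(t, t_scipy, df)
```

Note that the approximate df always lies between the conservative estimate min(n₁ − 1, n₂ − 1) and the pooled value n₁ + n₂ − 2.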
As we commonly perform a paired t-test when we have data about the same subjects measured twice (before and after some treatment), let us adopt the convention of referring to the samples as the pre-group and post-group.
The null hypothesis is that the true difference between the means of the pre- and post-populations is equal to some pre-set value, Δ.
The alternative hypothesis is that the actual difference between these means is:
Typically, this pre-determined difference is zero. We can then reformulate the hypotheses as follows:
The null hypothesis is that the pre- and post-means are the same, i.e., the treatment has no impact on the population .
The alternative hypothesis:
Paired t-test formula
In fact, a paired t-test is technically the same as a one-sample t-test! Let us see why it is so. Let x₁, …, xₙ be the pre observations and y₁, …, yₙ the respective post observations. That is, xᵢ and yᵢ are the before and after measurements of the i-th subject.
For each subject, compute the difference, dᵢ := xᵢ − yᵢ. All that happens next is just a one-sample t-test performed on the sample of differences d₁, …, dₙ. Take a look at the formula for the T-score:
Δ — Mean difference postulated in the null hypothesis;
n — Size of the sample of differences, i.e., the number of pairs;
x̄ — Mean of the sample of differences; and
s — Standard deviation of the sample of differences.
Number of degrees of freedom in t-test (paired): n − 1.
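The equivalence is easy to demonstrate numerically: a paired t-test on (before, after) gives exactly the same statistic and p-value as a one-sample t-test on the differences. The scores below are made up for illustration:

```python
# A paired t-test equals a one-sample t-test on per-subject differences.
from scipy import stats

before = [72, 85, 68, 90, 77, 81]  # hypothetical pre-treatment scores
after = [78, 88, 70, 95, 80, 85]   # hypothetical post-treatment scores

diffs = [b - a for b, a in zip(before, after)]  # d_i := x_i - y_i

t_paired, p_paired = stats.ttest_rel(before, after)
t_onesamp, p_onesamp = stats.ttest_1samp(diffs, 0)  # test against Δ = 0
print(t_paired, t_onesamp)
```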
We use a Z-test when we want to test the population mean of a normally distributed dataset, which has a known population variance . If the number of degrees of freedom is large, then the t-Student distribution is very close to N(0,1).
Hence, if there are many data points (at least 30), you may swap a t-test for a Z-test, and the results will be almost identical. However, for small samples with unknown variance, remember to use the t-test because, in such cases, the t-Student distribution differs significantly from the N(0,1)!
🙋 Have you concluded you need to perform the z-test? Head straight to our z-test calculator !
A t-test is a widely used statistical test that analyzes the means of one or two groups of data. For instance, a t-test is performed on medical data to determine whether a new drug really helps.
Different types of t-tests are: the one-sample t-test, the two-sample (independent samples) t-test, and the paired t-test.
To find the t-value, divide the difference between the sample estimate and the null-hypothesis value by its standard error; the exact formula depends on the test type.
For the one-sample test of the population mean, μ, based on one independent sample:
Null hypothesis H₀: μ = μ₀.
Alternative hypothesis H₁: μ differs from μ₀ (two-tailed), or is less/greater (one-tailed).
Significance level α: the probability that we reject a true H₀ (type I error).
Degrees of freedom: calculated as sample size minus one.
In statistics, t-tests are a type of hypothesis test that allows you to compare means. They are called t-tests because each t-test boils your sample data down to one number, the t-value. If you understand how t-tests calculate t-values, you’re well on your way to understanding how these tests work.
In this series of posts, I'm focusing on concepts rather than equations to show how t-tests work. However, this post includes two simple equations that I’ll work through using the analogy of a signal-to-noise ratio.
Minitab Statistical Software offers the 1-sample t-test, paired t-test, and the 2-sample t-test. Let's look at how each of these t-tests reduce your sample data down to the t-value.
Understanding this process is crucial to understanding how t-tests work. I'll show you the formula first, and then I’ll explain how it works.
Please notice that the formula is a ratio: t = (x̄ − μ₀) / (s / √n), the difference between the sample mean and the null hypothesis value divided by the standard error of the mean. A common analogy is that the t-value is the signal-to-noise ratio.
The numerator is the signal. You simply take the sample mean and subtract the null hypothesis value. If your sample mean is 10 and the null hypothesis is 6, the difference, or signal, is 4.
If there is no difference between the sample mean and null value, the signal in the numerator, as well as the value of the entire ratio, equals zero. For instance, if your sample mean is 6 and the null value is 6, the difference is zero.
As the difference between the sample mean and the null hypothesis mean increases in either the positive or negative direction, the strength of the signal increases.
The denominator is the noise. The equation in the denominator is a measure of variability known as the standard error of the mean . This statistic indicates how accurately your sample estimates the mean of the population. A larger number indicates that your sample estimate is less precise because it has more random error.
This random error is the “noise.” When there is more noise, you expect to see larger differences between the sample mean and the null hypothesis value even when the null hypothesis is true . We include the noise factor in the denominator because we must determine whether the signal is large enough to stand out from it.
Both the signal and noise values are in the units of your data. If your signal is 6 and the noise is 2, your t-value is 3. This t-value indicates that the difference is 3 times the size of the standard error. However, if there is a difference of the same size but your data have more variability (6), your t-value is only 1. The signal is at the same scale as the noise.
In this manner, t-values allow you to see how distinguishable your signal is from the noise. Relatively large signals and low levels of noise produce larger t-values. If the signal does not stand out from the noise, it’s likely that the observed difference between the sample estimate and the null hypothesis value is due to random error in the sample rather than a true difference at the population level.
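The arithmetic in the paragraphs above can be sketched in a few lines (the numbers are the invented examples from the text, not real data):

```python
# Toy illustration of the t-value as a signal-to-noise ratio for a
# 1-sample t-test: t = (sample mean - null value) / standard error.
def t_value(sample_mean, null_value, standard_error):
    return (sample_mean - null_value) / standard_error

print(t_value(10, 6, 2))  # signal 4, noise 2 -> t = 2.0
print(t_value(6, 6, 2))   # sample mean equals null value -> t = 0.0
print(t_value(12, 6, 6))  # signal 6, but noise also 6 -> t = 1.0
```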
Many people are confused about when to use a paired t-test and how it works. I’ll let you in on a little secret. The paired t-test and the 1-sample t-test are actually the same test in disguise! As we saw above, a 1-sample t-test compares one sample mean to a null hypothesis value. A paired t-test simply calculates the difference between paired observations (e.g., before and after) and then performs a 1-sample t-test on the differences.
You can test this with this data set to see how all of the results are identical, including the mean difference, t-value, p-value, and confidence interval of the difference.
Understanding that the paired t-test simply performs a 1-sample t-test on the paired differences can really help you understand how the paired t-test works and when to use it. You just need to figure out whether it makes sense to calculate the difference between each pair of observations.
For example, let’s assume that “before” and “after” represent test scores, and there was an intervention in between them. If the before and after scores in each row of the example worksheet represent the same subject, it makes sense to calculate the difference between the scores in this fashion—the paired t-test is appropriate. However, if the scores in each row are for different subjects, it doesn’t make sense to calculate the difference. In this case, you’d need to use another test, such as the 2-sample t-test, which I discuss below.
Using the paired t-test simply saves you the step of having to calculate the differences before performing the t-test. You just need to be sure that the paired differences make sense!
When it is appropriate to use a paired t-test, it can be more powerful than a 2-sample t-test. For more information, go to Overview for paired t .
The 2-sample t-test takes your sample data from two groups and boils it down to the t-value. The process is very similar to the 1-sample t-test, and you can still use the analogy of the signal-to-noise ratio. Unlike the paired t-test, the 2-sample t-test requires independent groups for each sample.
The formula is again a ratio: the difference between the two sample means divided by the standard error of that difference. Some discussion follows.
For the 2-sample t-test, the numerator is again the signal, which is the difference between the means of the two samples. For example, if the mean of group 1 is 10, and the mean of group 2 is 4, the difference is 6.
The default null hypothesis for a 2-sample t-test is that the two groups are equal. You can see in the equation that when the two groups are equal, the difference (and the entire ratio) also equals zero. As the difference between the two groups grows in either a positive or negative direction, the signal becomes stronger.
In a 2-sample t-test, the denominator is still the noise, but Minitab can use two different values. You can either assume that the variability in both groups is equal or not equal, and Minitab uses the corresponding estimate of the variability. Either way, the principle remains the same: you are comparing your signal to the noise to see how much the signal stands out.
Just like with the 1-sample t-test, for any given difference in the numerator, as you increase the noise value in the denominator, the t-value becomes smaller. To determine that the groups are different, you need a t-value that is large.
Each type of t-test uses a procedure to boil all of your sample data down to one value, the t-value. The calculations compare your sample mean(s) to the null hypothesis and incorporate both the sample size and the variability in the data. A t-value of 0 indicates that the sample results exactly equal the null hypothesis. In statistics, we call the difference between the sample estimate and the null hypothesis the effect size. As this difference increases, the absolute value of the t-value increases.
That’s all nice, but what does a t-value of, say, 2 really mean? From the discussion above, we know that a t-value of 2 indicates that the observed difference is twice the size of the variability in your data. However, we use t-tests to evaluate hypotheses rather than just figuring out the signal-to-noise ratio. We want to determine whether the effect size is statistically significant.
To see how we get from t-values to assessing hypotheses and determining statistical significance, read the other post in this series, Understanding t-Tests: t-values and t-distributions .
Last Updated: October 9, 2020
Did you know that the two sample t test is used to calculate the difference between population means?
Jenn, Founder Calcworkshop ® , 15+ Years Experience (Licensed & Certified Teacher)
It’s true!
Now, there are 3 ways to calculate the difference between means, as listed below:
Let’s find out more!
So how do we compare the mean of some quantitative variables for two different populations?
If our parameters of interest are the population means , then the best approach is to take random samples from both populations and compare their sample means as noted on the Engineering Statistics Handbook .
In other words , we analyze the difference between two sample means to understand the average difference between the two populations. And as always, the larger the sample size the more accurate our inferences will be.
Just like we saw with one-sample means , we will either employ a z-test or t-test depending on whether or not the population standard deviation is known or unknown .
However, there is a component we must consider if we have independent random samples where the population standard deviations are unknown: do we pool our variances?
When we found the difference of population proportions, we automatically pooled our variances. However, with the difference of population means, we will have to check. We do this by finding an F-statistic .
If this F-statistic is less than or equal to the critical number, then we will pool our variances. Otherwise, we will not pool.
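A sketch of this pooling check, with hypothetical data: compute the ratio of sample variances (larger over smaller) and compare it to an F critical value.

```python
# F-statistic check for whether to pool variances in a two-sample t-test.
from statistics import variance
from scipy import stats

x = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]  # hypothetical sample 1
y = [4.7, 5.5, 4.4, 5.8, 4.6, 5.1]  # hypothetical sample 2

# Put the larger variance in the numerator so that F >= 1
v1, v2 = variance(x), variance(y)
f_stat = max(v1, v2) / min(v1, v2)

alpha = 0.05
df1 = df2 = len(x) - 1
f_crit = stats.f.ppf(1 - alpha, df1, df2)  # upper critical value of F

pool = f_stat <= f_crit  # pool variances only if F is not "too large"
print(f_stat, f_crit, "pool" if pool else "do not pool")
```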
Please note that it is infrequent to have two independent samples with equal, or almost equal, variances; therefore, the formula for un-pooled variances is more readily accepted for most high school statistics courses.
But it is an important skill to learn and understand, so we will be working through several examples of when we need to pool variances and when we do not.
For example, imagine the college provost at one school said their students study more, on average than those at the neighboring school.
However, the provost at the nearby school believed the study time was the same and wants to clear up the controversy.
So, independent random samples were taken from both schools, with the results stated below. And at a 5% significance level, the following significance test is conducted.
Two Sample T Test Pooled Example
Notice that we pooled our variances because our F-statistic yielded a value less than our critical value. The interpretation of our results is as follows:
But what do we do if the samples we wish to compare are not independent but come from the same subjects?
Meaning, the difference between means is due to the varying conditions applied to the same experimental units, not to differences between the experimental units in the study.
When this happens, we have what is called a Matched Pairs T Test .
The great thing about a paired t test is that it becomes a one-sample t-test on the differences.
And then we will calculate the sample mean and sample standard deviation of these difference values; the standard deviation divided by the square root of the number of pairs gives the standard error used in the test.
Matched Pairs T Test Formula
What is important to remember with any of these tests, whether it be a z-test or a two-sample t-test, our conclusions will be the same as a one-sample test.
For example, once we find out the test statistic, we then determine our p-value, and if our p-value is less than or equal to our significance level, we will reject our null hypothesis.
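The decision rule above can be sketched with the study-time scenario from earlier (the hours below are invented for illustration, not the provosts' actual data):

```python
# Two-sample t-test decision rule: reject H0 when p-value <= alpha.
from scipy import stats

school_a = [3.1, 2.8, 3.5, 3.0, 2.9, 3.3, 3.2]  # hypothetical study hours
school_b = [2.5, 2.9, 2.4, 2.8, 2.6, 2.7, 2.5]

t_stat, p_value = stats.ttest_ind(school_a, school_b)  # pooled by default

alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(t_stat, p_value, decision)
```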
One Sample Flow Chart
Two Sample Flow Chart
As the flow chart demonstrates above, our first step is to decide what type of test we are conducting. Is the standard deviation known? Do we have a one sample test or a two sample test or is it matched-pair?
Then, once we have identified the test we are using, our procedure is as follows:
Together, we will work through various examples of all different hypothesis tests for the difference in population means, so we become comfortable with each formula and know why and how to use them effectively.
In this topic: Step 1: Determine a confidence interval for the difference in population means. Step 2: Determine whether the difference is statistically significant. Step 3: Check your data for problems.
First, consider the difference in the sample means and then examine the confidence interval.
The difference is an estimate of the difference in the population means. Because the difference is based on sample data and not on the entire population, it is unlikely that the sample difference equals the population difference. To better estimate the population difference, use the confidence interval for the difference.
The confidence interval provides a range of likely values for the difference between two population means. For example, a 95% confidence level indicates that if you take 100 random samples from the population, you could expect approximately 95 of the samples to produce intervals that contain the population difference. The confidence interval helps you assess the practical significance of your results. Use your specialized knowledge to determine whether the confidence interval includes values that have practical significance for your situation. If the interval is too wide to be useful, consider increasing your sample size. For more information, go to Ways to get a more precise confidence interval .
Difference | 95% CI for Difference
---|---
21.00 | (14.22, 27.78)
In these results, the estimate of the population difference in means in hospital ratings is 21. You can be 95% confident that the population mean for the difference is between 14.22 and 27.78.
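A generic sketch of how such a confidence interval is computed from summary statistics, assuming the pooled (equal-variance) two-sample setup. The summary values below are hypothetical, not the hospital data in the table:

```python
# 95% CI for the difference in two population means from summary statistics.
import math
from scipy import stats

n1, mean1, s1 = 17, 80.0, 9.0   # hypothetical group 1 summary
n2, mean2, s2 = 17, 59.0, 10.3  # hypothetical group 2 summary

diff = mean1 - mean2
df = n1 + n2 - 2
# Pooled standard deviation and standard error of the difference
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
se = sp * math.sqrt(1 / n1 + 1 / n2)

t_crit = stats.t.ppf(0.975, df)  # two-sided 95% critical value
ci = (diff - t_crit * se, diff + t_crit * se)
print(diff, ci)
```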
Null hypothesis | H₀: μ₁ − µ₂ = 0
Alternative hypothesis | H₁: μ₁ − µ₂ ≠ 0
T-Value | DF | P-Value
---|---|---
6.31 | 32 | 0.000
In these results, the null hypothesis states that the difference in the mean rating between two hospitals is 0. Because the p-value (reported as 0.000) is less than the significance level of 0.05, the decision is to reject the null hypothesis and conclude that the ratings of the hospitals are different.
Problems with your data, such as skewness and outliers, can adversely affect your results. Use the graphs to look for skewness (by examining the spread of each sample) and to identify potential outliers.
When data are skewed, the majority of the data are located on the high or low side of the graph. Often, skewness is easiest to detect with a histogram or boxplot.
The boxplot with right-skewed data shows wait times. Most of the wait times are relatively short, and only a few wait times are long. The boxplot with left-skewed data shows failure time data. A few items fail immediately, and many more items fail later.
Data that are severely skewed can affect the validity of the p-value if your samples are small (either sample is less than 15 values). If your data are severely skewed and you have a small sample, consider increasing your sample size.
In these boxplots, the data for Hospital B appear to be severely skewed.
Outliers, which are data values that are far away from other data values, can strongly affect the results of your analysis. Often, outliers are easiest to identify on a boxplot.
On a boxplot, asterisks (*) denote outliers.
Try to identify the cause of any outliers. Correct any data-entry errors or measurement errors. Consider removing data values for abnormal, one-time events (also called special causes). Then, repeat the analysis. For more information, go to Identifying outliers.
In these boxplots, the data for Hospital B have 2 outliers.
Two-sample t -test
h = ttest2( x , y ) returns a test decision for the null hypothesis that the data in vectors x and y comes from independent random samples from normal distributions with equal means and equal but unknown variances, using the two-sample t -test . The alternative hypothesis is that the data in x and y comes from populations with unequal means. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, and 0 otherwise.
h = ttest2( x , y , Name,Value ) returns a test decision for the two-sample t -test with additional options specified by one or more name-value pair arguments. For example, you can change the significance level or conduct the test without assuming equal variances.
[ h , p ] = ttest2( ___ ) also returns the p -value, p , of the test, using any of the input arguments in the previous syntaxes.
[ h , p , ci , stats ] = ttest2( ___ ) also returns the confidence interval on the difference of the population means, ci , and the structure stats containing information about the test statistic.
collapse all
Load the data set. Create vectors containing the first and second columns of the data matrix to represent students’ grades on two exams.
Test the null hypothesis that the two data samples are from populations with equal means.
The returned value of h = 0 indicates that ttest2 does not reject the null hypothesis at the default 5% significance level.
Test the null hypothesis that the two data vectors are from populations with equal means, without assuming that the populations also have equal variances.
The returned value of h = 0 indicates that ttest2 does not reject the null hypothesis at the default 5% significance level even if equal variances are not assumed.
Load the sample data. Create a categorical vector to label the vehicle mileage data according to the vehicle year.
Create box plots of the mileage data for each decade.
Create vectors from the mileage data for each decade. Use a left-tailed, two-sample t -test to test the null hypothesis that the data comes from populations with equal means. Use the alternative hypothesis that the population mean for the mileage of cars made in the 1970s is less than the population mean for the mileage of cars made in the 1980s.
The returned value of h = 1 indicates that ttest2 rejects the null hypothesis at the default significance level of 5%, in favor of the alternative hypothesis that the population mean for the mileage of cars made in the 1970s is less than the population mean for the mileage of cars made in the 1980s.
Plot the corresponding Student's t -distribution, the returned t -statistic, and the critical t -value. Calculate the critical t -value at the default confidence level of 95% by using tinv .
The orange dot represents the t -statistic and is located to the left of the dashed black line that represents the critical t -value.
x — Sample data: vector | matrix | multidimensional array.
Sample data, specified as a vector, matrix, or multidimensional array. ttest2 treats NaN values as missing data and ignores them.
If x and y are specified as vectors, they do not need to be the same length.
If x and y are specified as matrices, they must have the same number of columns. ttest2 performs a separate t -test along each column and returns a vector of results.
If x and y are specified as multidimensional arrays , they must have the same size along all but the first nonsingleton dimension .
Data Types: single | double
ttest2 works along the first nonsingleton dimension.
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN , where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Tail','right','Alpha',0.01,'Vartype','unequal' specifies a right-tailed test at the 1% significance level, and does not assume that x and y have equal population variances.
Significance level of the hypothesis test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1).
Example: 'Alpha',0.01
Dimension of the input matrix along which to test the means, specified as the comma-separated pair consisting of 'Dim' and a positive integer value. For example, specifying 'Dim',1 tests the column means, while 'Dim',2 tests the row means.
Example: 'Dim',2
Type of alternative hypothesis to evaluate, specified as the comma-separated pair consisting of 'Tail' and one of:
'both' — Test against the alternative hypothesis that the population means are not equal.
'right' — Test against the alternative hypothesis that the population mean of x is greater than the population mean of y .
'left' — Test against the alternative hypothesis that the population mean of x is less than the population mean of y .
ttest2 tests the null hypothesis that the population means are equal against the specified alternative hypothesis.
Example: 'Tail','right'
Variance type, specified as the comma-separated pair consisting of 'Vartype' and one of the following.
'equal' | Conduct the test using the assumption that x and y are from normal distributions with unknown but equal variances. |
'unequal' | Conduct the test using the assumption that x and y are from normal distributions with unknown and unequal variances. This is called the Behrens-Fisher problem. ttest2 uses Satterthwaite’s approximation for the effective degrees of freedom. |
Vartype must be a single variance type, even when x is a matrix or a multidimensional array.
Example: 'Vartype','unequal'
h — Hypothesis test result: 1 | 0.
Hypothesis test result, returned as 1 or 0 .
If h = 1 , this indicates the rejection of the null hypothesis at the Alpha significance level.
If h = 0 , this indicates a failure to reject the null hypothesis at the Alpha significance level.
p -value of the test, returned as a scalar value in the range [0,1]. p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. Small values of p cast doubt on the validity of the null hypothesis.
Confidence interval for the difference in population means of x and y , returned as a two-element vector containing the lower and upper boundaries of the 100 × (1 – Alpha )% confidence interval.
Test statistics for the two-sample t -test, returned as a structure containing the following:
tstat — Value of the test statistic.
df — Degrees of freedom of the test.
sd — Pooled estimate of the population standard deviation (for the equal variance case) or a vector containing the unpooled estimates of the population standard deviations (for the unequal variance case).
The two-sample t -test is a parametric test that compares the location parameter of two independent data samples.
The test statistic is
t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}},
where \bar{x} and \bar{y} are the sample means, s_x and s_y are the sample standard deviations, and n and m are the sample sizes.
In the case where it is assumed that the two data samples are from populations with equal variances, the test statistic under the null hypothesis has Student's t distribution with n + m – 2 degrees of freedom, and the sample standard deviations are replaced by the pooled standard deviation
s = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}}.
In the case where it is not assumed that the two data samples are from populations with equal variances, the test statistic under the null hypothesis has an approximate Student's t distribution with a number of degrees of freedom given by Satterthwaite's approximation. This test is sometimes called Welch’s t -test.
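Both variants are easy to reproduce outside MATLAB. The sketch below is an analogous computation in Python using SciPy's `ttest_ind_from_stats` (an assumption of this example, not part of ttest2's interface); the summary values are the body-fat figures from the worked example earlier on this page (women: n = 10, mean 22.29, sd 5.32; men: n = 13, mean 14.95, sd 6.84).

```python
# Equal-variance and Welch two-sample t-tests from summary statistics.
# A Python/SciPy sketch analogous to ttest2; summary values come from the
# body-fat worked example earlier on this page.
from scipy import stats

n, xbar, sx = 10, 22.29, 5.32   # women
m, ybar, sy = 13, 14.95, 6.84   # men

# Pooled (equal-variance) case: Student's t with n + m - 2 df.
t_eq, p_eq = stats.ttest_ind_from_stats(xbar, sx, n, ybar, sy, m,
                                        equal_var=True)

# Welch (unequal-variance) case: Satterthwaite approximate df.
t_uneq, p_uneq = stats.ttest_ind_from_stats(xbar, sx, n, ybar, sy, m,
                                            equal_var=False)

print(round(t_eq, 2))   # ~2.80, matching the worked example's test statistic
```

Note that the two versions give different test statistics here because the Welch denominator does not pool the two sample variances.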
A multidimensional array has more than two dimensions. For example, if x is a 1-by-3-by-4 array, then x is a three-dimensional array.
The first nonsingleton dimension is the first dimension of an array whose size is not equal to 1. For example, if x is a 1-by-2-by-3-by-4 array, then the second dimension is the first nonsingleton dimension of x .
Use sampsizepwr to calculate:
The sample size that corresponds to specified power and parameter values;
The power achieved for a particular sample size, given the true parameter value;
The parameter value detectable with the specified sample size and power.
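sampsizepwr itself is MATLAB-specific, but the power calculation it performs for a t-test can be sketched with the noncentral t distribution. The Python function below is an illustration under stated assumptions (two-sided one-sample t-test; the effect size and alpha are arbitrary example values, not sampsizepwr defaults).

```python
# Approximate power of a two-sided one-sample t-test via the noncentral
# t distribution. Effect size and alpha are arbitrary example values.
import numpy as np
from scipy import stats

def t_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample t-test.

    effect_size -- (mu - mu0) / sigma, the standardized true effect
    n           -- sample size
    """
    df = n - 1
    nc = effect_size * np.sqrt(n)              # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)    # two-sided critical value
    # Probability the test statistic lands in either rejection region
    # when the true effect is nonzero.
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

print(t_test_power(0.5, 20))   # moderate effect, small sample
print(t_test_power(0.5, 50))   # same effect, larger sample -> more power
```

Holding the effect size fixed, increasing n raises the noncentrality parameter and therefore the power, which is the trade-off sampsizepwr lets you solve in any direction.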
GPU Arrays — Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox) .
Introduced before R2006a
See Also: ttest | ztest | sampsizepwr
Statistics By Jim
Making statistics intuitive
By Jim Frost 12 Comments
T-tests are statistical hypothesis tests that you use to analyze one or two sample means. Depending on the t-test that you use, you can compare a sample mean to a hypothesized value, the means of two independent samples, or the difference between paired samples. In this post, I show you how t-tests use t-values and t-distributions to calculate probabilities and test hypotheses.
As usual, I’ll provide clear explanations of t-values and t-distributions using concepts and graphs rather than formulas! If you need a primer on the basics, read my hypothesis testing overview .
The term “t-test” refers to the fact that these hypothesis tests use t-values to evaluate your sample data. T-values are a type of test statistic. Hypothesis tests use the test statistic that is calculated from your sample to compare your sample to the null hypothesis. If the test statistic is extreme enough, this indicates that your data are so incompatible with the null hypothesis that you can reject the null. Learn more about Test Statistics .
Don’t worry. I find these technical definitions of statistical terms are easier to explain with graphs, and we’ll get to that!
When you analyze your data with any t-test, the procedure reduces your entire sample to a single value, the t-value. These calculations factor in your sample size and the variation in your data. Then, the t-test compares your sample mean(s) to the null hypothesis condition.
Read the companion post where I explain how t-tests calculate t-values .
The tricky thing about t-values is that they are a unitless statistic, which makes them difficult to interpret on their own. Imagine that we performed a t-test, and it produced a t-value of 2. What does this t-value mean exactly? We know that the sample mean doesn’t equal the null hypothesis value because this t-value doesn’t equal zero. However, we don’t know how exceptional our value is if the null hypothesis is correct.
To be able to interpret individual t-values, we have to place them in a larger context. T-distributions provide this broader context so we can determine the unusualness of an individual t-value.
A single t-test produces a single t-value. Now, imagine the following process. First, let’s assume that the null hypothesis is true for the population. Now, suppose we repeat our study many times by drawing many random samples of the same size from this population. Next, we perform t-tests on all of the samples and plot the distribution of the t-values. This distribution is known as a sampling distribution, which is a type of probability distribution.
Related posts : Sampling Distributions and Understanding Probability Distributions
If we follow this procedure, we produce a graph that displays the distribution of t-values that we obtain from a population where the null hypothesis is true. We use sampling distributions to calculate probabilities for how unusual our sample statistic is if the null hypothesis is true.
Luckily, we don’t need to go through the hassle of collecting numerous random samples to create this graph! Statisticians understand the properties of t-distributions so we can estimate the sampling distribution using the t-distribution and our sample size.
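That said, the thought experiment is easy to run as an actual simulation. The sketch below (Python, with arbitrary choices of population and number of draws) samples repeatedly from a population where the null is true, computes a one-sample t-value for each draw, and checks how often |t| exceeds 2:

```python
# Simulate the sampling distribution of the one-sample t statistic under
# a true null hypothesis. The population, sample size, and draw count are
# arbitrary illustration choices.
import numpy as np

rng = np.random.default_rng(seed=1)
n, reps, mu0 = 21, 10_000, 0.0   # sample size 21 -> 20 degrees of freedom

samples = rng.normal(loc=mu0, scale=1.0, size=(reps, n))
means = samples.mean(axis=1)
sds = samples.std(axis=1, ddof=1)
t_values = (means - mu0) / (sds / np.sqrt(n))   # one-sample t statistic

# Under the null, |t| > 2 should occur roughly 6% of the time for 20 df.
frac_extreme = np.mean(np.abs(t_values) > 2)
print(frac_extreme)
```

A histogram of `t_values` traces out the bell-shaped t-distribution that the next sections discuss.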
The degrees of freedom (DF) for the statistical design define the t-distribution for a particular study. The DF are closely related to the sample size. For t-tests, there is a different t-distribution for each sample size.
Related posts : Degrees of Freedom in Statistics and T Distribution: Definition and Uses .
T-distributions assume that the null hypothesis is correct for the population from which you draw your random samples. To evaluate how compatible your sample data are with the null hypothesis, place your study’s t-value in the t-distribution and determine how unusual it is.
The sampling distribution below displays a t-distribution with 20 degrees of freedom, which equates to a sample size of 21 for a 1-sample t-test. The t-distribution centers on zero because it assumes that the null hypothesis is true. When the null is true, your study is most likely to obtain a t-value near zero and less likely to produce t-values further from zero in either direction.
On the graph, I’ve displayed the t-value of 2 from our hypothetical study to see how our sample data compares to the null hypothesis. Under the assumption that the null is true, the t-distribution indicates that our t-value is not the most likely value. However, there still appears to be a realistic chance of observing t-values from -2 to +2.
We know that our t-value of 2 is rare when the null hypothesis is true. How rare is it exactly? Our final goal is to evaluate whether our sample t-value is so rare that it justifies rejecting the null hypothesis for the entire population based on our sample data. To proceed, we need to quantify the probability of observing our t-value.
Related post : What are Critical Values?
Hypothesis tests work by taking the observed test statistic from a sample and using the sampling distribution to calculate the probability of obtaining that test statistic if the null hypothesis is correct. In the context of how t-tests work, you assess the likelihood of a t-value using the t-distribution. If a t-value is sufficiently improbable when the null hypothesis is true, you can reject the null hypothesis.
I have two crucial points to explain before we calculate the probability linked to our t-value of 2.
Because I’m showing the results of a two-tailed test, we’ll use the t-values of +2 and -2. Two-tailed tests allow you to assess whether the sample mean is greater than or less than the target value in a 1-sample t-test. A one-tailed hypothesis test can only determine statistical significance for one or the other.
Additionally, it is possible to calculate a probability only for a range of t-values. On a probability distribution plot, probabilities are represented by the shaded area under a distribution curve. Without a range of values, there is no area under the curve and, hence, no probability.
Related posts : One-Tailed and Two-Tailed Tests Explained and T-Distribution Table of Critical Values
Considering these points, the graph below finds the probability associated with t-values less than -2 and greater than +2 using the area under the curve. This graph is specific to our t-test design (1-sample t-test with N = 21).
The probability distribution plot indicates that each of the two shaded regions has a probability of 0.02963—for a total of 0.05926. This graph shows that t-values fall within these areas almost 6% of the time when the null hypothesis is true.
There is a chance that you’ve heard of this type of probability before—it’s the P value! While the likelihood of t-values falling within these regions seems small, it’s not quite unlikely enough to justify rejecting the null under the standard significance level of 0.05.
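If you have SciPy handy, this two-tailed probability can be reproduced directly from the t-distribution (a one-line check under the same design, not part of the original post):

```python
# Two-tailed probability of |t| >= 2 for a t-distribution with 20 df,
# matching the shaded-area calculation described above.
from scipy import stats

p_two_tailed = 2 * stats.t.sf(2, df=20)   # sf = 1 - cdf (upper tail)
print(round(p_two_tailed, 5))
```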
Learn how to interpret the P value correctly and avoid a common mistake!
Related posts : How to Find the P value: Process and Calculations and Types of Errors in Hypothesis Testing
The sample size for a t-test determines the degrees of freedom (DF) for that test, which specifies the t-distribution. The overall effect is that as the sample size decreases, the tails of the t-distribution become thicker. Thicker tails indicate that t-values are more likely to be far from zero even when the null hypothesis is correct. The changing shapes are how t-distributions factor in the greater uncertainty when you have a smaller sample.
You can see this effect in the probability distribution plot below that displays t-distributions for 5 and 30 DF.
Sample means from smaller samples tend to be less precise. In other words, with a smaller sample, it’s less surprising to have an extreme t-value, which affects the probabilities and p-values. A t-value of 2 has a P value of 10.2% and 5.4% for 5 and 30 DF, respectively. Use larger samples!
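The effect of degrees of freedom on these probabilities is easy to verify with the same kind of SciPy sketch, again using a t-value of 2:

```python
# P values for t = 2 at different degrees of freedom: smaller samples
# (fewer df) have thicker tails, so the same t-value is less surprising.
from scipy import stats

p_df5 = 2 * stats.t.sf(2, df=5)     # ~0.102
p_df30 = 2 * stats.t.sf(2, df=30)   # ~0.055
print(p_df5, p_df30)
```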
Click here for step-by-step instructions for how to do t-tests in Excel !
If you like this approach and want to learn about other hypothesis tests, read my posts about:
To see an alternative to traditional hypothesis testing that does not use probability distributions and test statistics, learn about bootstrapping in statistics !
May 25, 2021 at 10:42 pm
what statistical tools, is recommended for measuring the level of satisfaction
May 26, 2021 at 3:55 pm
Hi McKienze,
The correct analysis depends on the nature of the data you have and what you want to learn. You don’t provide enough information to be able to answer the question. However, read my hypothesis testing overview to learn about the options.
August 23, 2020 at 1:33 am
Hi Jim, I want to ask about standardizing data before the t test.. For example I have USD prices of a big Mac across the world and this varies by quite a bit. Doing the t-test here would be misleading since some countries would have a higher mean… Should the approach be standardizing all the usd values? Or perhaps even local values?
August 24, 2020 at 12:37 am
Yes, that makes complete sense. I don’t know what method is best. If you can find a common scale to use for all prices, I’d do that. You’re basically using a data transformation before analysis, which is totally acceptable when you have a good reason.
April 3, 2020 at 4:47 am
Hey Jim. Your blog is one of the only few ones where everything is explained in a simple and well structured manner, in a way that both an absolute beginner and a geek can equally benefit from your writing. Both this article as well as your article on one tailed and two tailed hypothesis tests have been super helpful. Thank you for this post
March 6, 2020 at 11:04 am
Thank you, Jim, for sharing your knowledge with us.
I have a 2 part question. I am testing the difference in walking distance within a busy environment compared with a simple environment. I am also testing walking time within the 2 environments. I am using the same individuals for both scenarios. I was planning to do a paired ttest for distance difference between busy and simple environments and a 2nd paired ttest for time difference between the environments.
My question(s) for you is: 1. Do you feel that a paired ttest is the best choice for these? 2. Do you feel that, because there are 2 tests, I should do a bonferroni correction or do you believe that because the data is completely different (distance as opposed to time), it is okay not to do a multiple comparison test?
August 13, 2019 at 12:43 pm
thank you very eye opening on the use of two or one tailed test
April 19, 2019 at 3:49 pm
Hi Mr. Frost,
Thanks for the breakdown. I have a question … if I wanted to run a test to show that the medical professionals could use more training with data set consisting of questions which in your opinion would be my best route?
January 14, 2019 at 2:22 pm
Hello Jim, I find this statement in this excellent write-up contradictory: "This graph shows that t-values fall within these areas almost 6% of the time when the null hypothesis is true." I mean, if this is true, the t-value = 0 hypothesis is rejected.
January 14, 2019 at 2:51 pm
I can see how that statement sounds contradictory, but I can assure that it is quite accurate. It’s often forgotten but the underlying assumption for the calculations surrounding hypothesis testing, significance levels, and p-values is that the null hypothesis is true.
So, the probabilities shown in the graph that you refer to are based on the assumption that the null hypothesis is true. Further, t-values for this study design have a 6% chance of falling in those critical areas assuming the null is true (a false positive).
Significance levels are defined as the maximum acceptable probability of a false positive. Usually, we set that as 5%. In the example, there’s a large probability of a false positive (6%), so we fail to reject the null hypothesis. In other words, we fail to reject the null because false positives will happen too frequently–where the significance level defines the cutoff point for too frequently.
Keep in mind that when you have statistically significant results, you’re really saying that the results you obtained are improbable enough assuming that the null is true that you can reject the notion that the null is true. But, the math and probabilities are all based on the assumption that the null is true because you need to determine how unlikely your results are under the null hypothesis.
Even the p-value is defined in terms of assuming the null hypothesis is true. You can read about that in my post about interpreting p-values correctly .
I hope this clarifies things!
November 9, 2018 at 2:36 am
Jim … I was involved in a free SAT/ACT tutoring program that I need to analyze for effectiveness.
I have pre test scores of a number of students and the post test scores after they were tutored (treatment ).
Glenn dowell
November 9, 2018 at 9:05 am
It sounds like you need to perform a paired t-test.