
Statistics By Jim

Making statistics intuitive

Correlational Study Overview & Examples

By Jim Frost 2 Comments

What is a Correlational Study?

A correlational study is a non-experimental design that evaluates only the correlation between variables. The researchers record measurements but do not control or manipulate the variables. Correlational research is a form of observational study.

A correlation indicates that as the value of one variable increases, the other tends to change in a specific direction:

  • Positive correlation : Two variables increase or decrease together (as height increases, weight tends to increase).
  • Negative correlation : As one variable increases, the other tends to decrease (as school absences increase, grades tend to fall).
  • No correlation : No relationship exists between the two variables. As one increases, the other does not change in a specific direction (as absences increase, height doesn’t tend to increase or decrease).

Correlational study results showing a positive trend.

For example, researchers conducting correlational research explored the relationship between social media usage and levels of anxiety in young adults. Participants reported their demographic information and daily time on various social media platforms and completed a standardized anxiety assessment tool.

The correlational study looked for relationships between social media usage and anxiety. Is increased social media usage associated with higher anxiety? Is it worse for particular demographics?

Learn more about Interpreting Correlation.

Using Correlational Research

Correlational research design is crucial in various disciplines, notably psychology and medicine. This type of design is generally cheaper, easier, and quicker to conduct than an experiment because the researchers don’t control any variables or conditions. Consequently, these studies often serve as an initial assessment, especially when random assignment and controlling variables for a true experiment are not feasible or unethical.

However, an unfortunate aspect of a correlational study is its limitation in establishing causation. While these studies can reveal connections between variables, they cannot prove that altering one variable will cause changes in another. Hence, correlational research can determine whether relationships exist but cannot confirm causality.

Remember, correlation doesn’t necessarily imply causation!

Correlational Study vs Experiment

The difference between the two designs is simple.

In a correlational study, the researchers don’t systematically control any variables. They’re simply observing events and do not want to influence outcomes.

In an experiment, researchers manipulate variables and explicitly hope to affect the outcomes. For example, they might control the treatment condition by giving a medication or placebo to each subject. They also randomly assign subjects to the control and treatment groups, which helps establish causality.

Learn more about Randomized Controlled Trials (RCTs), which statisticians consider to be true experiments.

Types of Correlation Studies and Examples

Researchers divide these studies into three broad types.

Secondary Data Sources

One approach to correlational research is to utilize pre-existing data, which may include official records, public polls, or data from earlier studies. This method can be cost-effective and time-efficient because other researchers have already gathered the data. These existing data sources can provide large sample sizes and longitudinal data, thereby showing relationship trends.

However, it also comes with potential drawbacks. The data may be incomplete or irrelevant to the new research question. Additionally, as a researcher, you won’t have control over the original data collection methods, potentially impacting the data’s reliability and validity.

Using existing data makes this approach a retrospective study.

Surveys in Correlation Research

Surveys are a great way to collect data for correlational studies while using a consistent instrument across all respondents. You can use various formats, such as in-person, online, and by phone. And you can ask the questions necessary to obtain the particular variables you need for your project. In short, it’s easy to customize surveys to match your study’s requirements.

However, you’ll need to carefully word all the questions to be clear and not introduce bias in the results. This process can take multiple iterations and pilot studies to produce the finished survey.

For example, you can use a survey to find correlations between various demographic variables and political opinions.

Naturalistic Observation

Naturalistic observation is a method of collecting field data for a correlational study. Researchers observe and measure variables in a natural environment. The process can include counting events, categorizing behavior, and describing outcomes without interfering with the activities.

For example, researchers might observe and record children’s behavior after watching television. Does a relationship exist between the type of television program and behaviors?

Naturalistic observations occur in a prospective study.

Analyzing Data from a Correlational Study

Statistical analysis of correlational research frequently involves correlation and regression analysis.

A correlation coefficient describes the strength and direction of the relationship between two variables with a single number.

Regression analysis can evaluate how multiple variables relate to a single outcome. For example, in the social media correlational study example, how do the demographic variables and daily social media usage collectively correlate with anxiety?
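As a rough sketch of that idea, the snippet below fits an ordinary least squares model with two hypothetical predictors (daily social media hours and age) against an invented anxiety score by solving the normal equations directly. All variable names and numbers are made up for illustration; a real analysis would use a statistics package.

```python
# A rough ordinary least squares sketch with invented data; a real
# analysis would use a dedicated statistics package.

def solve(A, b):
    """Gaussian elimination for a small linear system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[pivot] = M[pivot], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

hours = [1, 2, 3, 4, 5, 6]
age = [18, 22, 19, 25, 21, 24]
anxiety = [30, 38, 45, 50, 58, 66]

# Design matrix with an intercept column, then the normal equations (X'X) b = X'y
X = [[1.0, h, a] for h, a in zip(hours, age)]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * y for row, y in zip(X, anxiety)) for i in range(3)]
intercept, b_hours, b_age = solve(XtX, Xty)
print(b_hours)  # positive: more daily hours is associated with higher anxiety
```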



Hypothesis Test for Correlation
Let's look at the hypothesis test for correlation, including the hypothesis test for correlation coefficient, the hypothesis test for negative correlation and the null hypothesis for correlation test.



What is the hypothesis test for correlation coefficient?

When given a sample of bivariate data (data which include two variables), it is possible to calculate how linearly correlated the data are, using a correlation coefficient.

The product moment correlation coefficient (PMCC) describes the extent to which one variable correlates with another; in other words, it measures the strength of the correlation between two variables. The PMCC for a sample of data is denoted by r, while the PMCC for a population is denoted by ρ.

The PMCC is limited to values between -1 and 1, inclusive.

If r = 1 , there is a perfect positive linear correlation. All points lie on a straight line with a positive gradient, and the higher one of the variables is, the higher the other.

If r = 0 , there is no linear correlation between the variables.

If r = - 1 , there is a perfect negative linear correlation. All points lie on a straight line with a negative gradient, and the higher one of the variables is, the lower the other.

Correlation is not equivalent to causation, but a PMCC close to 1 or -1 can indicate that there is a higher likelihood that two variables are related.

[Figure: scatter graphs showing positive, negative, and no correlation in bivariate data.]

The PMCC can be calculated with a graphics calculator by finding the regression line of y on x (the calculator computes r automatically), or by using the formula r = Sxy / √(Sxx Syy), which is given in the formula booklet. The closer r is to 1 or -1, the stronger the correlation between the variables, and hence the more closely associated the variables are.

You need to be able to carry out hypothesis tests on a sample of bivariate data to determine whether a linear relationship exists for the entire population. By calculating the PMCC and comparing it to a critical value, it is possible to determine the likelihood of a linear relationship existing.
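The formula r = Sxy / √(Sxx Syy) translates directly into code; the bivariate sample below is invented for illustration.

```python
from math import sqrt

def pmcc(x, y):
    """r = Sxy / sqrt(Sxx * Syy), using the summary-statistics forms."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sxy / sqrt(sxx * syy)

# Invented bivariate sample
x = [2, 4, 5, 7, 9]
y = [3, 5, 4, 8, 10]
print(pmcc(x, y))  # close to 1, so a strong positive linear correlation
```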

What is the hypothesis test for negative correlation?

To conduct a hypothesis test, a number of keywords must be understood:

Null hypothesis ( H 0 ) : the hypothesis assumed to be correct until proven otherwise

Alternative hypothesis ( H 1 ) : the conclusion made if H 0 is rejected.

Hypothesis test: a mathematical procedure to examine a value of a population parameter proposed by the null hypothesis compared to the alternative hypothesis.

Test statistic: a value calculated from the sample and compared against cumulative probability tables or the normal distribution as the last part of the significance test.

Critical region: the range of values that lead to the rejection of the null hypothesis.

Significance level: the actual significance level is the probability of rejecting H 0 when it is in fact true.

The null hypothesis is also known as the 'working hypothesis'. It is what we assume to be true for the purpose of the test, or until proven otherwise.

The alternative hypothesis is what is concluded if the null hypothesis is rejected. It also determines whether the test is one-tailed or two-tailed.

A one-tailed test allows for the possibility of an effect in one direction, while a two-tailed test allows for an effect in either direction, in other words, both the positive and the negative direction.

Method: a series of steps must be followed to determine whether a linear relationship exists between two variables.

1. Write down the null and alternative hypotheses (H 0 and H 1). The null hypothesis is always ρ = 0, while the alternative hypothesis depends on what is asked in the question. Both hypotheses must be stated in symbols only (not in words).

2 . Using a calculator, work out the value of the PMCC of the sample data, r .

3 . Use the significance level and sample size to figure out the critical value. This can be found in the PMCC table in the formula booklet.

4. Take the absolute value of the PMCC, r, and compare it to the critical value. If the absolute value is greater than the critical value, the null hypothesis should be rejected. Otherwise, the null hypothesis should be accepted.

5. Write a full conclusion in the context of the question. The conclusion should be stated in full: both in statistical language and in words reflecting the context of the question.

A negative correlation means that as one variable increases, the other tends to decrease, whereas a positive correlation means that the two variables increase or decrease together.

How to interpret results based on the null hypothesis

From the observed results (test statistic), a decision must be made, determining whether to reject the null hypothesis or not.

[Figure: critical regions for one-tailed and two-tailed tests at the 5% significance level.]

Both the one-tailed and two-tailed tests are shown at the 5% level of significance. However, in the two-tailed test the 5% is split between the positive and negative sides, while in the one-tailed test it lies solely on the positive side.

Under the null hypothesis, the result could lie anywhere on the graph. If the observed result lies in the shaded area, the test statistic is significant at 5%; in other words, we reject H 0. Note that H 0 could actually be true and still be rejected: the significance level, 5%, is the probability that H 0 is rejected even though it is true, in other words, the probability that H 0 is incorrectly rejected. When H 0 is rejected, H 1 (the alternative hypothesis) is used to write the conclusion.

We can define the null and alternative hypotheses for one-tailed and two-tailed tests:

For a one-tailed test:

  • H 0 : ρ = 0, H 1 : ρ > 0, or
  • H 0 : ρ = 0, H 1 : ρ < 0

For a two-tailed test:

  • H 0 : ρ = 0, H 1 : ρ ≠ 0

Let us look at an example of testing for correlation.

12 students sat two biology tests: one was theoretical and the other was practical. The results are shown in the table.

a) Find the product moment correlation coefficient for this data, to 3 significant figures.

b) A teacher claims that students who do well in the theoretical test tend to do well in the practical test. Test this claim at the 0.05 level of significance, clearly stating your hypotheses.

a) Using a calculator, we find the PMCC (enter the data into two lists and calculate the regression line; the PMCC will appear): r = 0.935 to 3 significant figures.

b) We are testing for a positive correlation, since the claim is that a higher score in the theoretical test is associated with a higher score in the practical test. We will now use the five steps we previously looked at.

1. State the null and alternative hypotheses. H 0 : ρ = 0 and H 1 : ρ > 0

2. Calculate the PMCC. From part a), r = 0.935

3. Figure out the critical value from the sample size and significance level. The sample size, n , is 12. The significance level is 5%. The hypothesis is one-tailed since we are only testing for positive correlation. Using the table from the formula booklet, the critical value is shown to be cv = 0.4973

4. The absolute value of the PMCC is 0.935, which is larger than 0.4973. Since the PMCC is larger than the critical value at the 5% level of significance, we can reach a conclusion.

5. Since the PMCC is larger than the critical value, we choose to reject the null hypothesis. We can conclude that there is significant evidence to support the claim that students who do well in the theoretical biology test also tend to do well in the practical biology test.

Let us look at a second example.

A tetrahedral die (four faces) is rolled 40 times and 6 'ones' are observed. Is there any evidence at the 10% level that the probability of a score of 1 is less than a quarter?

The expected mean is 40 × 1/4 = 10. The question asks whether the observed result (test statistic) of 6 is unusually low.

We now follow the same series of steps.

1. State the null and alternative hypotheses, where p is the probability of rolling a 'one': H 0 : p = 1/4 and H 1 : p < 1/4.

2. We cannot calculate a PMCC, since we are only given data for the frequency of 'ones'; this is a test on a proportion rather than a correlation.

3. A one-tailed test is required (p < 1/4) at the 10% significance level. Let X be the number of 'ones', so X ~ B(40, 0.25), and use the cumulative binomial tables. The observed value is X = 6, which gives P(X ≤ 6 'ones' in 40 rolls) = 0.0962.

4. Since 0.0962, or 9.62%, is less than 10%, the observed result lies in the critical region.

5. We reject H 0 and accept the alternative hypothesis. We conclude that there is evidence that the probability of rolling a 'one' is less than 1/4.
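The cumulative binomial probability used in step 3 can be checked directly:

```python
from math import comb

# P(X <= 6) for X ~ B(40, 0.25): probability of 6 or fewer 'ones' in 40 rolls
p = sum(comb(40, k) * 0.25**k * 0.75**(40 - k) for k in range(7))
print(round(p, 4))  # about 0.0962, which is below the 10% significance level
```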

Hypothesis Test for Correlation - Key takeaways

  • The Product Moment Correlation Coefficient (PMCC), or r , is a measure of how strongly related two variables are. It ranges between -1 and 1, indicating the strength of a correlation.
  • The closer r is to 1 or -1 the stronger the (positive or negative) correlation between two variables.
  • The null hypothesis is the hypothesis that is assumed to be correct until proven otherwise. It states that there is no correlation between the variables.
  • The alternative hypothesis is that which is accepted when the null hypothesis is rejected. It can be either one-tailed (looking at one outcome) or two-tailed (looking at both outcomes – positive and negative).
  • If the significance level is 5%, this means that there is a 5% chance that the null hypothesis is incorrectly rejected.

Image source (one-tailed test): https://en.wikipedia.org/w/index.php?curid=35569621


Frequently Asked Questions about Hypothesis Test for Correlation

Is the Pearson correlation a hypothesis test?

Yes. The Pearson correlation produces a PMCC, or r value, which indicates the strength of the relationship between two variables, and this r value can be compared against a critical value to test for significance.

Can we test a hypothesis with correlation?

Yes. Correlation is not equivalent to causation, however we can test hypotheses to determine whether a correlation (or association) exists between two variables.

How do you set up the hypothesis test for correlation?

You need a null hypothesis (ρ = 0) and an alternative hypothesis. The PMCC, or r value, must be calculated from the sample data. Based on the significance level and sample size, the critical value can be found in a table of values in the formula booklet. Finally, the r value and critical value are compared to determine which hypothesis is accepted.


13.2 Testing the Significance of the Correlation Coefficient

The correlation coefficient, r , tells us about the strength and direction of the linear relationship between X 1 and X 2 .

The sample data are used to compute r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between X 1 and X 2 because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between X 1 and X 2 .

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant".

Performing the Hypothesis Test

  • Null Hypothesis: H 0 : ρ = 0
  • Alternate Hypothesis: H a : ρ ≠ 0
  • Null Hypothesis H 0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between X 1 and X 2 in the population.
  • Alternate Hypothesis H a : The population correlation coefficient is significantly different from zero. There is a significant linear relationship (correlation) between X 1 and X 2 in the population.

Drawing a Conclusion

There are two methods of making the decision concerning the hypothesis. The test statistic to test this hypothesis is:

t = r √(n - 2) / √(1 - r²)

where an equivalent form is t = r / √((1 - r²) / (n - 2)), n is the sample size, and the degrees of freedom are n - 2. This is a t-statistic and operates in the same way as other t-tests. Calculate the t-value and compare it with the critical value from the t-table at the appropriate degrees of freedom and the level of confidence you wish to maintain. If the calculated value is in the tail, then we cannot accept the null hypothesis that there is no linear relationship between these two variables. If the calculated t-value is NOT in the tail, then we cannot reject the null hypothesis that there is no linear relationship between the two variables.
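As a sketch, the test statistic above can be computed as follows; the r and n values are hypothetical.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical sample: r = 0.45 from n = 20 observations
t = t_statistic(0.45, 20)
print(t)  # compare against the t-table critical value at 18 degrees of freedom
```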

A quick shorthand way to test correlations is the relationship between the sample size and the correlation. If

|r| ≥ 2 / √n

then the correlation between the two variables demonstrates that a linear relationship exists and is statistically significant at approximately the 0.05 level of significance. As the formula indicates, there is an inverse relationship between the sample size and the required correlation for significance of a linear relationship. With only 10 observations, the required correlation for significance is 0.6325; for 30 observations the required correlation decreases to 0.3651; and at 100 observations the required level is only 0.2000.
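The shorthand rule is easy to verify numerically:

```python
from math import sqrt

# Rule of thumb: significance at roughly the 0.05 level when |r| >= 2 / sqrt(n)
for n in (10, 30, 100):
    print(n, round(2 / sqrt(n), 4))
```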

Correlations may be helpful in visualizing the data, but are not appropriately used to "explain" a relationship between two variables. Perhaps no single statistic is more misused than the correlation coefficient. Citing correlations between health conditions and everything from place of residence to eye color has the effect of implying a cause and effect relationship. This simply cannot be accomplished with a correlation coefficient. The correlation coefficient is, of course, innocent of this misinterpretation. It is the duty of the analyst to use a statistic that is designed to test for cause and effect relationships and report only those results if they are intending to make such a claim. The problem is that passing this more rigorous test is difficult, so lazy and/or unscrupulous "researchers" fall back on correlations when they cannot make their case legitimately.


Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction
  • Authors: Alexander Holmes, Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Introductory Business Statistics 2e
  • Publication date: Dec 13, 2023
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction
  • Section URL: https://openstax.org/books/introductory-business-statistics-2e/pages/13-2-testing-the-significance-of-the-correlation-coefficient

© Jul 18, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.


How to Write a Hypothesis for Correlation

A hypothesis for correlation predicts a statistically significant relationship.

A hypothesis is a testable statement about how something works in the natural world. While some hypotheses predict a causal relationship between two variables, other hypotheses predict a correlation between them. According to the Research Methods Knowledge Base, a correlation is a single number that describes the relationship between two variables. If you do not predict a causal relationship or cannot measure one objectively, state clearly in your hypothesis that you are merely predicting a correlation.

Research the topic in depth before forming a hypothesis. Without adequate knowledge about the subject matter, you will not be able to decide whether to write a hypothesis for correlation or causation. Read the findings of similar experiments before writing your own hypothesis.

Identify the independent variable and dependent variable. Your hypothesis will be concerned with what happens to the dependent variable when a change is made in the independent variable. In a correlation, the two variables undergo changes at the same time in a significant number of cases. However, this does not mean that the change in the independent variable causes the change in the dependent variable.

Construct an experiment to test your hypothesis. In a correlative experiment, you must be able to measure the exact relationship between two variables. This means you will need to find out how often a change occurs in both variables in terms of a specific percentage.

Establish the requirements of the experiment with regard to statistical significance. State exactly how strong the correlation must be, and at what significance level, before you will count the result as meaningful. These thresholds vary considerably by field. A highly technical scientific study, for instance, might demand significance at the 98 percent confidence level, while a sociological study might accept 90 percent confidence. Look at other studies in your particular field to determine the conventional requirements for statistical significance.

State the null hypothesis. The null hypothesis specifies an exact value, typically zero, that implies there is no correlation between the two variables. If your results fail to reach the required significance threshold, you cannot reject the null hypothesis, and the variables are not shown to correlate.

Record and summarize the results of your experiment. State whether or not the experiment met the minimum requirements of your hypothesis in terms of both percentage and significance.
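The procedure above can be sketched in Python with scipy. The study-hour and exam-score values below are hypothetical, invented purely for illustration:

```python
from scipy import stats

# Hypothetical data: weekly study hours and exam scores for eight students
study_hours = [2, 4, 5, 7, 8, 10, 11, 13]
exam_scores = [55, 60, 62, 70, 71, 80, 83, 90]

# H0: rho = 0 (no correlation); Ha: rho != 0
r, p_value = stats.pearsonr(study_hours, exam_scores)

alpha = 0.05  # significance threshold chosen before the experiment
if p_value < alpha:
    print(f"r = {r:.3f}, p = {p_value:.4f}: reject H0; the correlation is significant")
else:
    print(f"r = {r:.3f}, p = {p_value:.4f}: fail to reject H0")
```

The key point the article makes carries over: a significant result here supports a correlation, not a causal claim.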


  • University of New England; Steps in Hypothesis Testing for Correlation; 2000
  • Research Methods Knowledge Base; Correlation; William M.K. Trochim; 2006
  • Science Buddies; Hypothesis

About the Author

Brian Gabriel has been a writer and blogger since 2009, contributing to various online publications. He earned his Bachelor of Arts in history from Whitworth University.


Module 12: Linear Regression and Correlation

Hypothesis Test for Correlation

Learning Outcomes

  • Conduct a linear regression t-test using p-values and critical values and interpret the conclusion in context

The correlation coefficient,  r , tells us about the strength and direction of the linear relationship between x and y . However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n , together.

We perform a hypothesis test of the “ significance of the correlation coefficient ” to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute  r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we only have sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is ρ , the Greek letter “rho.”
  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient  ρ is “close to zero” or “significantly different from zero.” We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is “significant.”

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between x and y . We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is “not significant.”

  • Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero.”
  • What the conclusion means: There is not a significant linear relationship between x and y . Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.
  • If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
  • If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.

Performing the Hypothesis Test

  • Null Hypothesis: H 0 : ρ = 0
  • Alternate Hypothesis: H a : ρ ≠ 0

What the Hypotheses Mean in Words

  • Null Hypothesis H 0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
  • Alternate Hypothesis H a : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

Drawing a Conclusion

There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the p -value
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%,  α = 0.05

Using the  p -value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook).

Method 1: Using a p -value to make a decision

Using the TI-83, 83+, 84, 84+ Calculator

To calculate the  p -value using LinRegTTEST:

  • On the LinRegTTEST input screen, on the line prompt for β or ρ , highlight “≠ 0”
  • The output screen shows the p-value on the line that reads “p =”.
  • (Most computer statistical software can calculate the  p -value).

If the p -value is less than the significance level ( α = 0.05)

  • Decision: Reject the null hypothesis.
  • Conclusion: “There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.”

If the p -value is NOT less than the significance level ( α = 0.05)

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero.”

Calculation Notes:

  • You will use technology to calculate the p -value. The following describes the calculations to compute the test statistic and the p -value:
  • The p -value is calculated using a t -distribution with n – 2 degrees of freedom.
  • The formula for the test statistic is [latex]\displaystyle{t}=\dfrac{{{r}\sqrt{{{n}-{2}}}}}{\sqrt{{{1}-{r}^{{2}}}}}[/latex]. The value of the test statistic, t , is shown in the computer or calculator output along with the p -value. The test statistic t has the same sign as the correlation coefficient r .
  • The p -value is the combined area in both tails.

Recall: ORDER OF OPERATIONS

  • parentheses: [latex]( \ )[/latex]
  • exponents: [latex]x^2[/latex]
  • multiplication or division: [latex]\times \ \mathrm{or} \ \div[/latex]
  • addition or subtraction: [latex]+ \ \mathrm{or} \ -[/latex]

1st find the numerator:

Step 1: Find [latex]n-2[/latex], and then take the square root.

Step 2: Multiply the value in Step 1 by [latex]r[/latex].

2nd find the denominator: 

Step 3: Find the square of [latex]r[/latex], which is [latex]r[/latex] multiplied by [latex]r[/latex].

Step 4: Subtract this value from 1, [latex]1 -r^2[/latex].

Step 5: Find the square root of Step 4.

3rd take the numerator and divide by the denominator.
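The steps above can be sketched in Python, with scipy's t distribution supplying the combined two-tailed area (the values r = 0.5 and n = 20 below are illustrative, not from the textbook):

```python
import math
from scipy import stats

def correlation_t_test(r, n):
    """t statistic and two-tailed p-value for H0: rho = 0, with df = n - 2."""
    numerator = r * math.sqrt(n - 2)       # Steps 1 and 2
    denominator = math.sqrt(1 - r ** 2)    # Steps 3, 4, and 5
    t = numerator / denominator
    p = 2 * stats.t.sf(abs(t), df=n - 2)   # combined area in both tails
    return t, p

t, p = correlation_t_test(r=0.5, n=20)
print(f"t = {t:.3f}, p = {p:.4f}")  # t is about 2.449
```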

An alternative way to calculate the  p -value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.

THIRD-EXAM vs FINAL-EXAM EXAMPLE:  p- value method

  • Consider the  third exam/final exam example (example 2).
  • The line of best fit is: [latex]\hat{y}[/latex] = -173.51 + 4.83 x  with  r  = 0.6631 and there are  n  = 11 data points.
  • Can the regression line be used for prediction?  Given a third exam score ( x  value), can we use the line to predict the final exam score (predicted  y  value)?
  • H 0 :  ρ  = 0
  • H a :  ρ  ≠ 0
  • The  p -value is 0.026 (from LinRegTTest on your calculator or from computer software).
  • The  p -value, 0.026, is less than the significance level of  α  = 0.05.
  • Decision: Reject the Null Hypothesis  H 0
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score ( x ) and the final exam score ( y ) because the correlation coefficient is significantly different from zero.

Because  r  is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.
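As a check, the example's numbers can be reproduced in Python without a TI calculator:

```python
import math
from scipy import stats

# Reproduce the third-exam example: r = 0.6631, n = 11
r, n = 0.6631, 11
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p = 2 * stats.t.sf(abs(t), df=n - 2)
print(f"t = {t:.3f}, p = {p:.3f}")  # p = 0.026, matching LinRegTTest
```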

Method 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of r is significant or not. Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.

Suppose you computed  r = 0.801 using n = 10 data points. df = n – 2 = 10 – 2 = 8. The critical values associated with df = 8 are -0.632 and + 0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.

r is not significant between -0.632 and +0.632. r = 0.801 > +0.632. Therefore, r is significant.
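Where do the table's critical values come from? Solving the test statistic formula for r at the chosen cutoff gives r = t / √(t² + df). As a sketch (computed with scipy, not taken from the textbook's table):

```python
import math
from scipy import stats

def critical_r(n, alpha=0.05):
    """Critical value of r obtained by inverting the t test at n - 2 df."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical t
    return t_crit / math.sqrt(t_crit ** 2 + df)

print(round(critical_r(10), 3))  # 0.632, matching the table entry for df = 8
print(round(critical_r(11), 3))  # 0.602, matching df = 9
```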

For a given line of best fit, you computed that  r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

If the scatter plot looks linear then, yes, the line can be used for prediction, because  r > the positive critical value.

Suppose you computed  r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.

Horizontal number line with values of -0.624, -0.532, and 0.532.

r = –0.624 < –0.532. Therefore, r is significant.

For a given line of best fit, you compute that  r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction, because  r < the positive critical value.

Suppose you computed  r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.

Horizontal number line with values –0.811, 0.776, and 0.811.

–0.811 <  r = 0.776 < 0.811. Therefore, r is not significant.

For a given line of best fit, you compute that  r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?

Yes, the line can be used for prediction, because  r < the negative critical value.

THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the  third exam/final exam example  again. The line of best fit is: [latex]\hat{y}[/latex] = –173.51+4.83 x  with  r  = 0.6631 and there are  n  = 11 data points. Can the regression line be used for prediction?  Given a third-exam score ( x  value), can we use the line to predict the final exam score (predicted  y  value)?

  • Use the “95% Critical Value” table for  r  with  df  =  n  – 2 = 11 – 2 = 9.
  • The critical values are –0.602 and +0.602
  • Since 0.6631 > 0.602,  r  is significant.

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if  r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

  • r = –0.567 and the sample size, n , is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
  • r = 0.708 and the sample size, n , is nine. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
  • r = 0.134 and the sample size, n , is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
  • r = 0 and the sample size, n , is five. No matter what the dfs are, r = 0 is between the two critical values so r is not significant.

For a given line of best fit, you compute that  r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction no matter what the sample size is.

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between  x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

  • There is a linear relationship in the population that models the average value of y for varying values of x . In other words, the expected value of y for each particular value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population).
  • The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
  • The standard deviations of the population y values about the line are equal for each value of x . In other words, each of these normal distributions of y  values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.

The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line — they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.

The  y values for each x value are normally distributed about the line with the same standard deviation. For each x value, the mean of the y values lies on the regression line. More y values lie near the line than are scattered further away from the line.

  • Provided by : Lumen Learning. License : CC BY: Attribution
  • Testing the Significance of the Correlation Coefficient. Provided by : OpenStax. Located at : https://openstax.org/books/introductory-statistics/pages/12-4-testing-the-significance-of-the-correlation-coefficient . License : CC BY: Attribution . License Terms : Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction
  • Introductory Statistics. Authored by : Barbara Illowsky, Susan Dean. Provided by : OpenStax. Located at : https://openstax.org/books/introductory-statistics/pages/1-introduction . License : CC BY: Attribution . License Terms : Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction



Correlation Analysis – Types, Methods and Examples

Correlation Analysis

Correlation analysis is a statistical method used to evaluate the strength and direction of the relationship between two or more variables . The correlation coefficient ranges from -1 to 1.

  • A correlation coefficient of 1 indicates a perfect positive correlation. This means that as one variable increases, the other variable also increases.
  • A correlation coefficient of -1 indicates a perfect negative correlation. This means that as one variable increases, the other variable decreases.
  • A correlation coefficient of 0 means that there’s no linear relationship between the two variables.

Correlation Analysis Methodology

Conducting a correlation analysis involves a series of steps, as described below:

  • Define the Problem : Identify the variables that you think might be related. The variables must be measurable on an interval or ratio scale. For example, if you’re interested in studying the relationship between the amount of time spent studying and exam scores, these would be your two variables.
  • Data Collection : Collect data on the variables of interest. The data could be collected through various means such as surveys , observations , or experiments. It’s crucial to ensure that the data collected is accurate and reliable.
  • Data Inspection : Check the data for any errors or anomalies such as outliers or missing values. Outliers can greatly affect the correlation coefficient, so it’s crucial to handle them appropriately.
  • Choose the Appropriate Correlation Method : Select the correlation method that’s most appropriate for your data. If your data meets the assumptions for Pearson’s correlation (interval or ratio level, linear relationship, variables are normally distributed), use that. If your data is ordinal or doesn’t meet the assumptions for Pearson’s correlation, consider using Spearman’s rank correlation or Kendall’s Tau.
  • Compute the Correlation Coefficient : Once you’ve selected the appropriate method, compute the correlation coefficient. This can be done using statistical software such as R, Python, or SPSS, or manually using the formulas.
  • Interpret the Results : Interpret the correlation coefficient you obtained. If the correlation is close to 1 or -1, the variables are strongly correlated. If the correlation is close to 0, the variables have little to no linear relationship. Also consider the sign of the correlation coefficient: a positive sign indicates a positive relationship (as one variable increases, so does the other), while a negative sign indicates a negative relationship (as one variable increases, the other decreases).
  • Check the Significance : It’s also important to test the statistical significance of the correlation. This typically involves performing a t-test. A small p-value (commonly less than 0.05) suggests that the observed correlation is statistically significant and not due to random chance.
  • Report the Results : The final step is to report your findings. This should include the correlation coefficient, the significance level, and a discussion of what these findings mean in the context of your research question.
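Steps 4 through 7 can be sketched in Python with scipy. The data below are hypothetical, and the strength cutoffs of 0.3 and 0.7 are common rules of thumb rather than fixed standards:

```python
from scipy import stats

# Hypothetical data: weekly study time (hours) and exam scores
study_time = [5, 9, 3, 12, 8, 14, 6, 10, 4, 11]
exam_score = [62, 75, 55, 85, 70, 90, 64, 78, 58, 82]

r, p = stats.pearsonr(study_time, exam_score)    # Step 5: compute the coefficient
direction = "positive" if r > 0 else "negative"  # Step 6: interpret sign
strength = "strong" if abs(r) >= 0.7 else "moderate" if abs(r) >= 0.3 else "weak"
significant = p < 0.05                           # Step 7: check significance

print(f"r = {r:.2f} ({strength} {direction}), p = {p:.4f}, significant: {significant}")
```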

Types of Correlation Analysis

Types of Correlation Analysis are as follows:

Pearson Correlation

This is the most common type of correlation analysis. Pearson correlation measures the linear relationship between two continuous variables. It assumes that the variables are normally distributed and have equal variances. The correlation coefficient (r) ranges from -1 to +1, with -1 indicating a perfect negative linear relationship, +1 indicating a perfect positive linear relationship, and 0 indicating no linear relationship.

Spearman Rank Correlation

Spearman’s rank correlation is a non-parametric measure that assesses how well the relationship between two variables can be described using a monotonic function. In other words, it evaluates the degree to which, as one variable increases, the other variable tends to increase, without requiring that increase to be consistent.

Kendall’s Tau

Kendall’s Tau is another non-parametric correlation measure used to detect the strength of dependence between two variables. Kendall’s Tau is often used for variables measured on an ordinal scale (i.e., where values can be ranked).

Point-Biserial Correlation

This is used when you have one dichotomous and one continuous variable, and you want to test for correlations. It’s a special case of the Pearson correlation.

Phi Coefficient

This is used when both variables are dichotomous or binary (having two categories). It’s a measure of association for two binary variables.

Canonical Correlation

This measures the correlation between two multi-dimensional variables. Each variable is a combination of data sets, and the method finds the linear combination that maximizes the correlation between them.

Partial and Semi-Partial (Part) Correlations

These are used when the researcher wants to understand the relationship between two variables while controlling for the effect of one or more additional variables.

Cross-Correlation

Used mostly in time series data to measure the similarity of two series as a function of the displacement of one relative to the other.

Autocorrelation

This is the correlation of a signal with a delayed copy of itself as a function of delay. This is often used in time series analysis to help understand the trend in the data over time.
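The difference between the first three measures shows up clearly on a monotonic but non-linear relationship. A short sketch using scipy on toy data:

```python
import numpy as np
from scipy import stats

# A monotonic but non-linear relationship: y grows as the cube of x (toy data)
x = np.arange(1, 11, dtype=float)
y = x ** 3

pearson_r, _ = stats.pearsonr(x, y)     # linear association: below 1 here
spearman_r, _ = stats.spearmanr(x, y)   # monotonic association: exactly 1
kendall_t, _ = stats.kendalltau(x, y)   # rank concordance: exactly 1

print(f"Pearson {pearson_r:.3f}, Spearman {spearman_r:.3f}, Kendall {kendall_t:.3f}")
```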

Correlation Analysis Formulas

There are several formulas for correlation analysis, each corresponding to a different type of correlation. Here are some of the most commonly used ones:

Pearson’s Correlation Coefficient (r)

Pearson’s correlation coefficient measures the linear relationship between two variables. The formula is:

   r = Σ[(xi – Xmean)(yi – Ymean)] / sqrt[(Σ(xi – Xmean)²)(Σ(yi – Ymean)²)]

  • xi and yi are the values of X and Y variables.
  • Xmean and Ymean are the mean values of X and Y.
  • Σ denotes the sum of the values.
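A direct, if naive, implementation of this formula in plain Python (the sample values are invented for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson's r computed term by term from the formula above."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - x_mean) ** 2 for xi in x)
                    * sum((yi - y_mean) ** 2 for yi in y))
    return num / den

print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```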

Spearman’s Rank Correlation Coefficient (rs)

Spearman’s correlation coefficient measures the monotonic relationship between two variables. The formula is:

   rs = 1 – (6Σd² / n(n² – 1))

  • d is the difference between the ranks of corresponding variables.
  • n is the number of observations.
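This formula can be sketched directly in Python; note that the ranking shortcut below is only valid when there are no tied values:

```python
def ranks(values):
    """Rank from smallest (1) to largest; valid only when there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman's rs from the rank-difference formula above (no ties)."""
    n = len(x)
    d_sq = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * d_sq) / (n * (n ** 2 - 1))

print(spearman_rs([35, 23, 47, 17, 10], [30, 33, 45, 23, 8]))  # 0.9
```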

Kendall’s Tau (τ)

Kendall’s Tau is a measure of rank correlation. The formula is:

   τ = (nc – nd) / [0.5 × n(n – 1)]

  • nc is the number of concordant pairs.
  • nd is the number of discordant pairs.
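Counting concordant and discordant pairs directly gives a minimal sketch of this formula (tied pairs are not handled here):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau from concordant (nc) and discordant (nd) pair counts."""
    nc = nd = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if (xi - xj) * (yi - yj) > 0:
            nc += 1   # pair ordered the same way in both variables
        else:
            nd += 1   # pair ordered oppositely (ties not handled)
    n = len(x)
    return (nc - nd) / (0.5 * n * (n - 1))

print(kendall_tau([1, 2, 3, 4, 5], [3, 1, 4, 2, 5]))  # 0.4
```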

Point-Biserial Correlation

The point-biserial correlation is a special case of Pearson's correlation, and so it uses the same formula as Pearson's correlation.

Phi Coefficient

The phi coefficient is a measure of association for two binary variables. It is equivalent to Pearson's correlation in this specific case.

Partial Correlation

The formula for partial correlation is more complex and depends on the Pearson’s correlation coefficients between the variables.

For partial correlation between X and Y given Z:

  rp(xy.z) = (rxy – rxz * ryz) / sqrt[(1 – rxz^2)(1 – ryz^2)]

  • rxy, rxz, ryz are the Pearson’s correlation coefficients.
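The formula translates directly into code; the three coefficients below are hypothetical values chosen for illustration:

```python
import math

def partial_corr(rxy, rxz, ryz):
    """Partial correlation of X and Y controlling for Z, from the formula above."""
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical coefficients: the X-Y association weakens once Z is controlled for
print(round(partial_corr(0.8, 0.5, 0.5), 4))  # (0.8 - 0.25) / 0.75 = 0.7333
```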

Correlation Analysis Examples

Here are a few examples of how correlation analysis could be applied in different contexts:

  • Education : A researcher might want to determine if there’s a relationship between the amount of time students spend studying each week and their exam scores. The two variables would be “study time” and “exam scores”. If a positive correlation is found, it means that students who study more tend to score higher on exams.
  • Healthcare : A healthcare researcher might be interested in understanding the relationship between age and cholesterol levels. If a positive correlation is found, it could mean that as people age, their cholesterol levels tend to increase.
  • Economics : An economist may want to investigate if there’s a correlation between the unemployment rate and the rate of crime in a given city. If a positive correlation is found, it could suggest that as the unemployment rate increases, the crime rate also tends to increase.
  • Marketing : A marketing analyst might want to analyze the correlation between advertising expenditure and sales revenue. A positive correlation would suggest that higher advertising spending is associated with higher sales revenue.
  • Environmental Science : A scientist might be interested in whether there’s a relationship between the amount of CO2 emissions and average temperature increase. A positive correlation would indicate that higher CO2 emissions are associated with higher average temperatures.

Importance of Correlation Analysis

Correlation analysis plays a crucial role in many fields of study for several reasons:

  • Understanding Relationships : Correlation analysis provides a statistical measure of the relationship between two or more variables. It helps in understanding how one variable may change in relation to another.
  • Predicting Trends : When variables are correlated, changes in one can predict changes in another. This is particularly useful in fields like finance, weather forecasting, and technology, where forecasting trends is vital.
  • Data Reduction : If two variables are highly correlated, they are conveying similar information, and you may decide to use only one of them in your analysis, reducing the dimensionality of your data.
  • Testing Hypotheses : Correlation analysis can be used to test hypotheses about relationships between variables. For example, a researcher might want to test whether there’s a significant positive correlation between physical exercise and mental health.
  • Determining Factors : It can help identify factors that are associated with certain behaviors or outcomes. For example, public health researchers might analyze correlations to identify risk factors for diseases.
  • Model Building : Correlation is a fundamental concept in building multivariate statistical models, including regression models and structural equation models. These models often require an understanding of the inter-relationships (correlations) among multiple variables.
  • Validity and Reliability Analysis : In psychometrics, correlation analysis is used to assess the validity and reliability of measurement instruments such as tests or surveys.

Applications of Correlation Analysis

Correlation analysis is used in many fields to understand and quantify the relationship between variables. Here are some of its key applications:

  • Finance : In finance, correlation analysis is used to understand the relationship between different investment types or the risk and return of a portfolio. For example, if two stocks are positively correlated, they tend to move together; if they’re negatively correlated, they move in opposite directions.
  • Economics : Economists use correlation analysis to understand the relationship between various economic indicators, such as GDP and unemployment rate, inflation rate and interest rates, or income and consumption patterns.
  • Marketing : Correlation analysis can help marketers understand the relationship between advertising spend and sales, or the relationship between price changes and demand.
  • Psychology : In psychology, correlation analysis can be used to understand the relationship between different psychological variables, such as the correlation between stress levels and sleep quality, or between self-esteem and academic performance.
  • Medicine : In healthcare, correlation analysis can be used to understand the relationships between various health outcomes and potential predictors. For example, researchers might investigate the correlation between physical activity levels and heart disease, or between smoking and lung cancer.
  • Environmental Science : Correlation analysis can be used to investigate the relationships between different environmental factors, such as the correlation between CO2 levels and average global temperature, or between pesticide use and biodiversity.
  • Social Sciences : In fields like sociology and political science, correlation analysis can be used to investigate relationships between different social and political phenomena, such as the correlation between education levels and political participation, or between income inequality and social unrest.

Advantages and Disadvantages of Correlation Analysis

Advantages:

  • Provides a statistical measure of the relationship between variables.
  • Useful for prediction if variables are known to have a correlation.
  • Can help in hypothesis testing about the relationships between variables.
  • Can help in data reduction by identifying closely related variables.
  • A fundamental concept in building multivariate statistical models.
  • Helps in validity and reliability analysis in psychometrics.

Disadvantages:

  • Cannot establish causality, only association.
  • Can be misleading if important variables are left out (omitted variable bias).
  • Outliers can greatly affect the correlation coefficient.
  • Assumes a linear relationship (for Pearson correlation), which may not always hold.
  • May not capture complex relationships (e.g., quadratic or cyclical relationships).
  • Can be affected by the range of observed values (restriction of range).

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


Correlation in Psychology: Meaning, Types, Examples & coefficient

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


Correlation means association – more precisely, it measures the extent to which two variables are related. There are three possible results of a correlational study: a positive correlation, a negative correlation, and no correlation.
  • A positive correlation is a relationship between two variables in which both variables move in the same direction. Therefore, one variable increases as the other variable increases, or one variable decreases while the other decreases. An example of a positive correlation would be height and weight. Taller people tend to be heavier.

[Figure: scatter plot showing a positive correlation]

  • A negative correlation is a relationship between two variables in which an increase in one variable is associated with a decrease in the other. An example of a negative correlation would be the height above sea level and temperature. As you climb the mountain (increase in height), it gets colder (decrease in temperature).

[Figure: scatter plot showing a negative correlation]

  • A zero correlation exists when there is no relationship between two variables. For example, there is no relationship between the amount of tea drunk and the level of intelligence.

[Figure: scatter plot showing zero correlation]

Scatter Plots

A correlation can be expressed visually. This is done by drawing a scatter plot (also known as a scattergram, scatter graph, scatter chart, or scatter diagram).

A scatter plot is a graphical display that shows the relationships or associations between two numerical variables (or co-variables), which are represented as points (or dots) for each pair of scores.

A scatter plot indicates the strength and direction of the correlation between the co-variables.


When you draw a scatter plot, it doesn’t matter which variable goes on the x-axis and which goes on the y-axis.

Remember, in correlations, we always deal with paired scores, so the values of the two variables taken together will be used to make the diagram.

Decide which variable goes on each axis and then simply put a cross at the point where the two values coincide.

Uses of Correlations

  • Prediction: if there is a relationship between two variables, we can make predictions about one from the other.
  • Validity: concurrent validity (correlation between a new measure and an established measure) and predictive validity (correlation between a measure and a later outcome).
  • Reliability: test-retest reliability (are measures consistent over time?) and inter-rater reliability (are observers consistent with each other?).
  • Theory verification: testing whether a relationship predicted by a theory actually holds.

Correlation Coefficients

Instead of drawing a scatter plot, a correlation can be expressed numerically as a coefficient, ranging from -1 to +1. When working with continuous variables, the correlation coefficient to use is Pearson’s r.

Correlation Coefficient Interpretation

The correlation coefficient ( r ) indicates the extent to which the pairs of numbers for these two variables lie on a straight line. Values over zero indicate a positive correlation, while values under zero indicate a negative correlation.

A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down. A correlation of +1 indicates a perfect positive correlation, meaning that as one variable goes up, the other goes up.

There is no rule for determining what correlation size is considered strong, moderate, or weak. The interpretation of the coefficient depends on the topic of study.

When studying things that are difficult to measure, such as psychological constructs, we should expect lower correlation coefficients, and we rarely see correlations above 0.6. For this kind of data, we generally consider correlations above 0.4 to be relatively strong, correlations between 0.2 and 0.4 to be moderate, and those below 0.2 to be weak.

When studying things that are more easily countable, such as demographic data, we expect higher correlations. Here we generally consider correlations above 0.75 to be relatively strong, correlations between 0.45 and 0.75 to be moderate, and those below 0.45 to be weak.

Correlation vs. Causation

Causation means that one variable (often called the predictor variable or independent variable) causes the other (often called the outcome variable or dependent variable).

Experiments can be conducted to establish causation. An experiment isolates and manipulates the independent variable to observe its effect on the dependent variable, and controls the environment so that extraneous variables can be eliminated.

A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. A correlation only shows if there is a relationship between variables.

[Figure: graph illustrating correlation versus causation]

While variables are sometimes correlated because one does cause the other, it could also be that some other factor, a confounding variable , is actually causing the systematic movement in our variables of interest.

Correlation does not prove causation, as a third variable may be involved. For example, being a patient in a hospital is correlated with dying, but this does not mean that one event causes the other; a third variable (the illness that led to hospitalization) drives both.

“Correlation is not causation” means that just because two variables are related it does not necessarily mean that one causes the other.

A correlational study identifies variables and looks for a relationship between them, whereas an experiment tests the effect that an independent variable has upon a dependent variable.

This means that an experiment can establish cause and effect (causation), but a correlation can only establish a relationship, as an extraneous variable that is not known about may be involved.

Strengths

1. Correlation allows the researcher to investigate naturally occurring variables that may be unethical or impractical to test experimentally. For example, it would be unethical to conduct an experiment on whether smoking causes lung cancer.

2. Correlation allows the researcher to clearly and easily see whether there is a relationship between variables, which can then be displayed in graphical form.

Limitations

1. Correlation is not, and cannot be taken to imply, causation. Even if there is a very strong association between two variables, we cannot assume that one causes the other.

For example, suppose we found a positive correlation between watching violence on T.V. and violent behavior in adolescence.

It could be that the cause of both these is a third (extraneous) variable – for example, growing up in a violent home – and that both the watching of T.V. and the violent behavior is the outcome of this.

2. Correlation does not allow us to go beyond the given data. For example, suppose it was found that there was an association between time spent on homework (1/2 hour to 3 hours) and the number of G.C.S.E. passes (1 to 6).

It would not be legitimate to infer from this that spending 6 hours on homework would likely generate 12 G.C.S.E. passes.

How do you know if a study is correlational?

A study is considered correlational if it examines the relationship between two or more variables without manipulating them. In other words, the study does not involve the manipulation of an independent variable to see how it affects a dependent variable.

One way to identify a correlational study is to look for language that suggests a relationship between variables rather than cause and effect.

For example, the study may use phrases like “associated with,” “related to,” or “predicts” when describing the variables being studied.

Another way to identify a correlational study is to look for information about how the variables were measured. Correlational studies typically involve measuring variables using self-report surveys, questionnaires, or other measures of naturally occurring behavior.

Finally, a correlational study may include statistical analyses such as correlation coefficients or regression analyses to examine the strength and direction of the relationship between variables.

Why is a correlational study used?

Correlational studies are particularly useful when it is not possible or ethical to manipulate one of the variables.

For example, it would not be ethical to manipulate someone’s age or gender. However, researchers may still want to understand how these variables relate to outcomes such as health or behavior.

Additionally, correlational studies can be used to generate hypotheses and guide further research.

If a correlational study finds a significant relationship between two variables, this can suggest a possible causal relationship that can be further explored in future research.

What is the goal of correlational research?

The ultimate goal of correlational research is to increase our understanding of how different variables are related and to identify patterns in those relationships.

This information can then be used to generate hypotheses and guide further research aimed at establishing causality.



Correlation Hypothesis



Understanding the relationships between variables is pivotal in research. Correlation hypotheses delve into the degree of association between two or more variables. In this guide, explore an array of correlation hypothesis examples, followed by a step-by-step tutorial on crafting these hypothesis statements effectively, with valuable tips tailored to unravel the intricate world of correlations.

What is Correlation Hypothesis?

A correlation hypothesis is a statement that predicts a specific relationship between two or more variables based on the assumption that changes in one variable are associated with changes in another variable. It suggests that there is a correlation or statistical relationship between the variables, meaning that when one variable changes, the other variable is likely to change in a consistent manner.

What is an example of a Correlation Hypothesis Statement?

Example: “If the amount of exercise increases, then the level of physical fitness will also increase.”

In this example, the correlation hypothesis suggests that there is a positive correlation between the amount of exercise a person engages in and their level of physical fitness. As exercise increases, the hypothesis predicts that physical fitness will increase as well. This hypothesis can be tested by collecting data on exercise levels and physical fitness levels and analyzing the relationship between the two variables using statistical methods.

100 Correlation Hypothesis Statement Examples


Discover the intriguing world of correlation through a collection of examples that illustrate how variables can be linked in research. Explore diverse scenarios where changes in one variable may correspond to changes in another, forming the basis of correlation hypotheses. These real-world instances shed light on the essence of correlation analysis and its role in uncovering connections between different aspects of data.

  • Study Hours and Exam Scores : If students study more hours per week, then their exam scores will show a positive correlation, indicating that increased study time might lead to better performance.
  • Income and Education : If the level of education increases, then income levels will also rise, demonstrating a positive correlation between education attainment and earning potential.
  • Social Media Usage and Well-being : If individuals spend more time on social media platforms, then their self-reported well-being might exhibit a negative correlation, suggesting that excessive use could impact mental health.
  • Temperature and Ice Cream Sales : If temperatures rise, then the sales of ice cream might increase, displaying a positive correlation due to the weather’s influence on consumer behavior.
  • Physical Activity and Heart Rate : If the intensity of physical activity rises, then heart rate might increase, signifying a positive correlation between exercise intensity and heart rate.
  • Age and Reaction Time : If age increases, then reaction time might show a positive correlation, indicating that as people age, their reaction times might slow down.
  • Smoking and Lung Capacity : If the number of cigarettes smoked daily increases, then lung capacity might decrease, suggesting a negative correlation between smoking and respiratory health.
  • Stress and Sleep Quality : If stress levels elevate, then sleep quality might decline, reflecting a negative correlation between psychological stress and restorative sleep.
  • Rainfall and Crop Yield : If the amount of rainfall decreases, then crop yield might also decrease, illustrating a negative correlation between precipitation and agricultural productivity.
  • Screen Time and Academic Performance : If screen time usage increases among students, then academic performance might show a negative correlation, suggesting that excessive screen time could be detrimental to studies.
  • Exercise and Body Weight : If individuals engage in regular exercise, then their body weight might exhibit a negative correlation, implying that physical activity can contribute to weight management.
  • Income and Crime Rates : If income levels decrease in a neighborhood, then crime rates might show a positive correlation, indicating a potential link between socio-economic factors and crime.
  • Social Support and Mental Health : If the level of social support increases, then individuals’ mental health scores may exhibit a positive correlation, highlighting the potential positive impact of strong social networks on psychological well-being.
  • Study Time and GPA : If students spend more time studying, then their Grade Point Average (GPA) might display a positive correlation, suggesting that increased study efforts may lead to higher academic achievement.
  • Parental Involvement and Academic Success : If parents are more involved in their child’s education, then the child’s academic success may show a positive correlation, emphasizing the role of parental support in shaping student outcomes.
  • Alcohol Consumption and Reaction Time : If alcohol consumption increases, then reaction time might slow down, indicating a negative correlation between alcohol intake and cognitive performance.
  • Social Media Engagement and Loneliness : If time spent on social media platforms increases, then feelings of loneliness might show a positive correlation, suggesting a potential connection between excessive online interaction and emotional well-being.
  • Temperature and Insect Activity : If temperatures rise, then the activity of certain insects might increase, demonstrating a potential positive correlation between temperature and insect behavior.
  • Education Level and Voting Participation : If education levels rise, then voter participation rates may also increase, showcasing a positive correlation between education and civic engagement.
  • Work Commute Time and Job Satisfaction : If work commute time decreases, then job satisfaction might show a positive correlation, indicating that shorter commutes could contribute to higher job satisfaction.
  • Sleep Duration and Cognitive Performance : If sleep duration increases, then cognitive performance scores might also rise, suggesting a potential positive correlation between adequate sleep and cognitive functioning.
  • Healthcare Access and Mortality Rate : If access to healthcare services improves, then the mortality rate might decrease, highlighting a potential negative correlation between healthcare accessibility and mortality.
  • Exercise and Blood Pressure : If individuals engage in regular exercise, then their blood pressure levels might exhibit a negative correlation, indicating that physical activity can contribute to maintaining healthy blood pressure.
  • Social Media Use and Academic Distraction : If students spend more time on social media during study sessions, then their academic focus might show a negative correlation, suggesting that excessive online engagement can hinder concentration.
  • Age and Technological Adaptation : If age increases, then the speed of adapting to new technologies might exhibit a negative correlation, suggesting that younger individuals tend to adapt more quickly.
  • Temperature and Plant Growth : If temperatures rise, then the rate of plant growth might increase, indicating a potential positive correlation between temperature and biological processes.
  • Music Exposure and Mood : If individuals listen to upbeat music, then their reported mood might show a positive correlation, suggesting that music can influence emotional states.
  • Income and Healthcare Utilization : If income levels increase, then the frequency of healthcare utilization might decrease, suggesting a potential negative correlation between income and healthcare needs.
  • Distance and Communication Frequency : If physical distance between individuals increases, then their communication frequency might show a negative correlation, indicating that proximity tends to facilitate communication.
  • Study Group Attendance and Exam Scores : If students regularly attend study groups, then their exam scores might exhibit a positive correlation, suggesting that collaborative study efforts could enhance performance.
  • Temperature and Disease Transmission : If temperatures rise, then the transmission of certain diseases might increase, pointing to a potential positive correlation between temperature and disease spread.
  • Interest Rates and Consumer Spending : If interest rates decrease, then consumer spending might show a positive correlation, suggesting that lower interest rates encourage increased economic activity.
  • Digital Device Use and Eye Strain : If individuals spend more time on digital devices, then the occurrence of eye strain might show a positive correlation, suggesting that prolonged screen time can impact eye health.
  • Parental Education and Children’s Educational Attainment : If parents have higher levels of education, then their children’s educational attainment might display a positive correlation, highlighting the intergenerational impact of education.
  • Social Interaction and Happiness : If individuals engage in frequent social interactions, then their reported happiness levels might show a positive correlation, indicating that social connections contribute to well-being.
  • Temperature and Energy Consumption : If temperatures decrease, then energy consumption for heating might increase, suggesting a potential positive correlation between temperature and energy usage.
  • Physical Activity and Stress Reduction : If individuals engage in regular physical activity, then their reported stress levels might display a negative correlation, indicating that exercise can help alleviate stress.
  • Diet Quality and Chronic Diseases : If diet quality improves, then the prevalence of chronic diseases might decrease, suggesting a potential negative correlation between healthy eating habits and disease risk.
  • Social Media Use and Body Image Dissatisfaction : If time spent on social media increases, then feelings of body image dissatisfaction might show a positive correlation, suggesting that online platforms can influence self-perception.
  • Income and Access to Quality Education : If household income increases, then access to quality education for children might improve, suggesting a potential positive correlation between financial resources and educational opportunities.
  • Workplace Diversity and Innovation : If workplace diversity increases, then the rate of innovation might show a positive correlation, indicating that diverse teams often generate more creative solutions.
  • Physical Activity and Bone Density : If individuals engage in weight-bearing exercises, then their bone density might exhibit a positive correlation, suggesting that exercise contributes to bone health.
  • Screen Time and Attention Span : If screen time increases, then attention span might show a negative correlation, indicating that excessive screen exposure can impact sustained focus.
  • Social Support and Resilience : If individuals have strong social support networks, then their resilience levels might display a positive correlation, suggesting that social connections contribute to coping abilities.
  • Weather Conditions and Mood : If sunny weather persists, then individuals’ reported mood might exhibit a positive correlation, reflecting the potential impact of weather on emotional states.
  • Nutrition Education and Healthy Eating : If individuals receive nutrition education, then their consumption of fruits and vegetables might show a positive correlation, suggesting that knowledge influences dietary choices.
  • Physical Activity and Cognitive Aging : If adults engage in regular physical activity, then their cognitive decline with aging might show a slower rate, indicating a potential negative correlation between exercise and cognitive aging.
  • Air Quality and Respiratory Illnesses : If air quality deteriorates, then the incidence of respiratory illnesses might increase, suggesting a potential positive correlation between air pollutants and health impacts.
  • Reading Habits and Vocabulary Growth : If individuals read regularly, then their vocabulary size might exhibit a positive correlation, suggesting that reading contributes to language development.
  • Sleep Quality and Stress Levels : If sleep quality improves, then reported stress levels might display a negative correlation, indicating that sleep can impact psychological well-being.
  • Social Media Engagement and Academic Performance : If students spend more time on social media, then their academic performance might exhibit a negative correlation, suggesting that excessive online engagement can impact studies.
  • Exercise and Blood Sugar Levels : If individuals engage in regular exercise, then their blood sugar levels might display a negative correlation, indicating that physical activity can influence glucose regulation.
  • Screen Time and Sleep Duration : If screen time before bedtime increases, then sleep duration might show a negative correlation, suggesting that screen exposure can affect sleep patterns.
  • Environmental Pollution and Health Outcomes : If exposure to environmental pollutants increases, then the occurrence of health issues might show a positive correlation, suggesting that pollution can impact well-being.
  • Time Management and Academic Achievement : If students improve time management skills, then their academic achievement might exhibit a positive correlation, indicating that effective planning contributes to success.
  • Physical Fitness and Heart Health : If individuals improve their physical fitness, then their heart health indicators might display a positive correlation, indicating that exercise benefits cardiovascular well-being.
  • Weather Conditions and Outdoor Activities : If weather is sunny, then outdoor activities might show a positive correlation, suggesting that favorable weather encourages outdoor engagement.
  • Media Exposure and Body Image Perception : If exposure to media images increases, then body image dissatisfaction might show a positive correlation, indicating media’s potential influence on self-perception.
  • Community Engagement and Civic Participation : If individuals engage in community activities, then their civic participation might exhibit a positive correlation, indicating an active citizenry.
  • Social Media Use and Productivity : If individuals spend more time on social media, then their productivity levels might exhibit a negative correlation, suggesting that online distractions can affect work efficiency.
  • Income and Stress Levels : If income levels increase, then reported stress levels might exhibit a negative correlation, suggesting that financial stability can impact psychological well-being.
  • Social Media Use and Interpersonal Skills : If individuals spend more time on social media, then their interpersonal skills might show a negative correlation, indicating potential effects on face-to-face interactions.
  • Parental Involvement and Academic Motivation : If parents are more involved in their child’s education, then the child’s academic motivation may exhibit a positive correlation, highlighting the role of parental support.
  • Technology Use and Sleep Quality : If screen time increases before bedtime, then sleep quality might show a negative correlation, suggesting that technology use can impact sleep.
  • Outdoor Activity and Mood Enhancement : If individuals engage in outdoor activities, then their reported mood might display a positive correlation, suggesting the potential emotional benefits of nature exposure.
  • Income Inequality and Social Mobility : If income inequality increases, then social mobility might exhibit a negative correlation, suggesting that higher inequality can hinder upward mobility.
  • Vegetable Consumption and Heart Health : If individuals increase their vegetable consumption, then heart health indicators might show a positive correlation, indicating the potential benefits of a nutritious diet.
  • Online Learning and Academic Achievement : If students engage in online learning, then their academic achievement might display a positive correlation, highlighting the effectiveness of digital education.
  • Emotional Intelligence and Workplace Performance : If emotional intelligence improves, then workplace performance might exhibit a positive correlation, indicating the relevance of emotional skills.
  • Community Engagement and Mental Well-being : If individuals engage in community activities, then their reported mental well-being might show a positive correlation, emphasizing social connections’ impact.
  • Rainfall and Agriculture Productivity : If rainfall levels increase, then agricultural productivity might exhibit a positive correlation, indicating the importance of water for crops.
  • Social Media Use and Body Posture : If screen time increases, then poor body posture might show a positive correlation, suggesting that screen use can influence physical habits.
  • Marital Satisfaction and Relationship Length : If marital satisfaction decreases, then relationship length might show a negative correlation, indicating potential challenges over time.
  • Exercise and Anxiety Levels : If individuals engage in regular exercise, then reported anxiety levels might exhibit a negative correlation, indicating the potential benefits of physical activity on mental health.
  • Music Listening and Concentration : If individuals listen to instrumental music, then their concentration levels might display a positive correlation, suggesting music’s impact on focus.
  • Internet Usage and Attention Deficits : If screen time increases, then attention deficits might show a positive correlation, implying that excessive internet use can affect concentration.
  • Financial Literacy and Debt Levels : If financial literacy improves, then personal debt levels might exhibit a negative correlation, suggesting better financial decision-making.
  • Time Spent Outdoors and Vitamin D Levels : If time spent outdoors increases, then vitamin D levels might show a positive correlation, indicating sun exposure’s role in vitamin synthesis.
  • Family Meal Frequency and Nutrition : If families eat meals together frequently, then nutrition quality might display a positive correlation, emphasizing family dining’s impact on health.
  • Temperature and Allergy Symptoms : If temperatures rise, then allergy symptoms might increase, suggesting a potential positive correlation between temperature and allergen exposure.
  • Social Media Use and Academic Distraction : If students spend more time on social media, then their academic focus might exhibit a negative correlation, indicating that online engagement can hinder studies.
  • Financial Stress and Health Outcomes : If financial stress increases, then the occurrence of health issues might show a positive correlation, suggesting potential health impacts of economic strain.
  • Study Hours and Test Anxiety : If students study more hours, then test anxiety might show a negative correlation, suggesting that increased preparation can reduce anxiety.
  • Music Tempo and Exercise Intensity : If music tempo increases, then exercise intensity might display a positive correlation, indicating music’s potential to influence workout vigor.
  • Green Space Accessibility and Stress Reduction : If access to green spaces improves, then reported stress levels might exhibit a negative correlation, highlighting nature’s stress-reducing effects.
  • Parenting Style and Child Behavior : If authoritative parenting increases, then positive child behaviors might display a positive correlation, suggesting parenting’s influence on behavior.
  • Sleep Quality and Productivity : If sleep quality improves, then work productivity might show a positive correlation, emphasizing the connection between rest and efficiency.
  • Media Consumption and Political Beliefs : If media consumption increases, then alignment with specific political beliefs might exhibit a positive correlation, suggesting media’s influence on ideology.
  • Workplace Satisfaction and Employee Retention : If workplace satisfaction increases, then employee retention rates might show a positive correlation, indicating the link between job satisfaction and tenure.
  • Digital Device Use and Eye Discomfort : If screen time increases, then reported eye discomfort might show a positive correlation, indicating potential impacts of screen exposure.
  • Age and Adaptability to Technology : If age increases, then adaptability to new technologies might exhibit a negative correlation, indicating generational differences in tech adoption.
  • Physical Activity and Mental Health : If individuals engage in regular physical activity, then reported mental health scores might exhibit a positive correlation, showcasing exercise’s impact.
  • Video Gaming and Attention Span : If time spent on video games increases, then attention span might display a negative correlation, indicating potential effects on focus.
  • Social Media Use and Empathy Levels : If social media use increases, then reported empathy levels might show a negative correlation, suggesting possible effects on emotional understanding.
  • Reading Habits and Creativity : If individuals read diverse genres, then their creative thinking might exhibit a positive correlation, emphasizing reading’s cognitive benefits.
  • Weather Conditions and Outdoor Exercise : If weather is pleasant, then outdoor exercise might show a positive correlation, suggesting weather’s influence on physical activity.
  • Parental Involvement and Bullying Prevention : If parents are actively involved, then instances of bullying might exhibit a negative correlation, emphasizing parental impact on behavior.
  • Digital Device Use and Sleep Disruption : If screen time before bedtime increases, then sleep disruption might show a positive correlation, indicating technology’s influence on sleep.
  • Friendship Quality and Psychological Well-being : If friendship quality increases, then reported psychological well-being might show a positive correlation, highlighting social support’s impact.
  • Income and Environmental Consciousness : If income levels increase, then environmental consciousness might also rise, indicating potential links between affluence and sustainability awareness.

Correlational Hypothesis Interpretation Statement Examples

Explore the art of interpreting correlation hypotheses with these illustrative examples. Understand the implications of positive, negative, and zero correlations, and learn how to deduce meaningful insights from data relationships.

  • Relationship Between Exercise and Mood : A positive correlation between exercise frequency and mood scores suggests that increased physical activity might contribute to enhanced emotional well-being.
  • Association Between Screen Time and Sleep Quality : A negative correlation between screen time before bedtime and sleep quality indicates that higher screen exposure could lead to poorer sleep outcomes.
  • Connection Between Study Hours and Exam Performance : A positive correlation between study hours and exam scores implies that increased study time might correspond to better academic results.
  • Link Between Stress Levels and Meditation Practice : A negative correlation between stress levels and meditation frequency suggests that engaging in meditation could be associated with lower perceived stress.
  • Relationship Between Social Media Use and Loneliness : A positive correlation between social media engagement and feelings of loneliness implies that excessive online interaction might contribute to increased loneliness.
  • Association Between Income and Happiness : A positive correlation between income and self-reported happiness indicates that higher income levels might be linked to greater subjective well-being.
  • Connection Between Parental Involvement and Academic Performance : A positive correlation between parental involvement and students’ grades suggests that active parental engagement might contribute to better academic outcomes.
  • Link Between Time Management and Stress Levels : A negative correlation between effective time management and reported stress levels implies that better time management skills could lead to lower stress.
  • Relationship Between Outdoor Activities and Vitamin D Levels : A positive correlation between time spent outdoors and vitamin D levels suggests that increased outdoor engagement might be associated with higher vitamin D concentrations.
  • Association Between Water Consumption and Skin Hydration : A positive correlation between water intake and skin hydration indicates that higher fluid consumption might lead to improved skin moisture levels.

Alternative Correlational Hypothesis Statement Examples

Explore alternative scenarios and potential correlations in these examples. Learn to articulate different hypotheses that could explain data relationships beyond the conventional assumptions.

  • Alternative to Exercise and Mood : An alternative hypothesis could suggest a non-linear relationship between exercise and mood, indicating that moderate exercise might have the most positive impact on emotional well-being.
  • Alternative to Screen Time and Sleep Quality : An alternative hypothesis might propose that screen time has a curvilinear relationship with sleep quality, suggesting that moderate screen exposure leads to optimal sleep outcomes.
  • Alternative to Study Hours and Exam Performance : An alternative hypothesis could propose that there’s an interaction effect between study hours and study method, influencing the relationship between study time and exam scores.
  • Alternative to Stress Levels and Meditation Practice : An alternative hypothesis might consider that the relationship between stress levels and meditation practice is moderated by personality traits, resulting in varying effects.
  • Alternative to Social Media Use and Loneliness : An alternative hypothesis could posit that the relationship between social media use and loneliness depends on the quality of online interactions and content consumption.
  • Alternative to Income and Happiness : An alternative hypothesis might propose that the relationship between income and happiness differs based on cultural factors, leading to varying happiness levels at different income ranges.
  • Alternative to Parental Involvement and Academic Performance : An alternative hypothesis could suggest that the relationship between parental involvement and academic performance varies based on students’ learning styles and preferences.
  • Alternative to Time Management and Stress Levels : An alternative hypothesis might explore the possibility of a curvilinear relationship between time management and stress levels, indicating that extreme time management efforts might elevate stress.
  • Alternative to Outdoor Activities and Vitamin D Levels : An alternative hypothesis could consider that the relationship between outdoor activities and vitamin D levels is moderated by sunscreen usage, influencing vitamin synthesis.
  • Alternative to Water Consumption and Skin Hydration : An alternative hypothesis might propose that the relationship between water consumption and skin hydration is mediated by dietary factors, influencing fluid retention and skin health.

Correlational Hypothesis Pearson Interpretation Statement Examples

Discover how the Pearson correlation coefficient enhances your understanding of data relationships with these examples. Learn to interpret correlation strength and direction using this valuable statistical measure.

  • Strong Positive Correlation : A Pearson correlation coefficient of +0.85 between study time and exam scores indicates a strong positive relationship, suggesting that increased study time is strongly associated with higher grades.
  • Moderate Negative Correlation : A Pearson correlation coefficient of -0.45 between screen time and sleep quality reflects a moderate negative correlation, implying that higher screen exposure is moderately linked to poorer sleep outcomes.
  • Weak Positive Correlation : A Pearson correlation coefficient of +0.25 between social media use and loneliness suggests a weak positive correlation, indicating that increased online engagement is weakly related to higher loneliness.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.75 between stress levels and meditation practice indicates a strong negative relationship, implying that engaging in meditation is strongly associated with lower stress.
  • Moderate Positive Correlation : A Pearson correlation coefficient of +0.60 between income and happiness signifies a moderate positive correlation, suggesting that higher income is moderately linked to greater happiness.
  • Weak Negative Correlation : A Pearson correlation coefficient of -0.30 between parental involvement and academic performance represents a weak negative correlation, indicating that higher parental involvement is weakly associated with lower academic performance.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.80 between time management and stress levels reveals a strong negative relationship, suggesting that effective time management is strongly linked to lower stress.
  • Weak Negative Correlation : A Pearson correlation coefficient of -0.20 between outdoor activities and vitamin D levels signifies a weak negative correlation, implying that higher outdoor engagement is weakly related to lower vitamin D levels.
  • Moderate Positive Correlation : A Pearson correlation coefficient of +0.50 between water consumption and skin hydration denotes a moderate positive correlation, suggesting that increased fluid intake is moderately linked to better skin hydration.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.70 between screen time and attention span indicates a strong negative relationship, implying that higher screen exposure is strongly associated with shorter attention spans.
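Coefficients like those above can be computed directly from data. Below is a minimal Python sketch of Pearson's r from scratch; the study-hours and exam-score values are hypothetical, invented purely to illustrate the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square roots of the sums of squared deviations
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (ss_x * ss_y)

# Hypothetical data: weekly study hours and exam scores for five students
hours = [2, 4, 6, 8, 10]
scores = [55, 60, 70, 75, 85]
r = pearson_r(hours, scores)  # close to +1, i.e., a strong positive correlation
```

A coefficient near +1 or -1 indicates a strong linear relationship; values near 0 indicate a weak or absent one.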

Correlational Hypothesis Statement Examples in Psychology

Explore how correlation hypotheses apply to psychological research with these examples. Understand how psychologists investigate relationships between variables to gain insights into human behavior.

  • Sleep Patterns and Cognitive Performance : There is a positive correlation between consistent sleep patterns and cognitive performance, suggesting that individuals with regular sleep schedules exhibit better cognitive functioning.
  • Anxiety Levels and Social Media Use : There is a positive correlation between anxiety levels and excessive social media use, indicating that individuals who spend more time on social media might experience higher anxiety.
  • Self-Esteem and Body Image Satisfaction : There is a positive correlation between self-esteem and body image satisfaction, implying that individuals with higher self-esteem tend to be more satisfied with their physical appearance.
  • Parenting Styles and Child Aggression : There is a negative correlation between authoritative parenting styles and child aggression, suggesting that children raised by authoritative parents might exhibit lower levels of aggression.
  • Emotional Intelligence and Conflict Resolution : There is a positive correlation between emotional intelligence and effective conflict resolution, indicating that individuals with higher emotional intelligence tend to resolve conflicts more successfully.
  • Personality Traits and Career Satisfaction : There is a positive correlation between certain personality traits (e.g., extraversion, openness) and career satisfaction, suggesting that individuals with specific traits experience higher job contentment.
  • Stress Levels and Coping Mechanisms : There is a negative correlation between stress levels and adaptive coping mechanisms, indicating that individuals with lower stress levels are more likely to employ effective coping strategies.
  • Attachment Styles and Romantic Relationship Quality : There is a positive correlation between secure attachment styles and higher romantic relationship quality, suggesting that individuals with secure attachments tend to have healthier relationships.
  • Social Support and Mental Health : There is a negative correlation between perceived social support and mental health issues, indicating that individuals with strong social support networks tend to experience fewer mental health challenges.
  • Motivation and Academic Achievement : There is a positive correlation between intrinsic motivation and academic achievement, implying that students who are internally motivated tend to perform better academically.

Does Correlational Research Have Hypothesis?

Correlational research involves examining the relationship between two or more variables to determine whether they are related and how they change together. While correlational studies do not establish causation, they still utilize hypotheses to formulate expectations about the relationships between variables. These hypotheses predict the presence, direction, and strength of correlations. However, in correlational research, the focus is on measuring and analyzing the degree of association rather than establishing cause-and-effect relationships.

How Do You Write a Null-Hypothesis for a Correlational Study?

The null hypothesis in a correlational study states that there is no significant correlation between the variables being studied. It assumes that any observed correlation is due to chance and lacks meaningful association. When writing a null hypothesis for a correlational study, follow these steps:

  • Identify the Variables: Clearly define the variables you are studying and their relationship (e.g., “There is no significant correlation between X and Y”).
  • Specify the Population: Indicate the population from which the data is drawn (e.g., “In the population of [target population]…”).
  • Include the Direction of Correlation: If relevant, specify the direction of correlation (positive, negative, or zero) that you are testing (e.g., “…there is no significant positive/negative correlation…”).
  • State the Hypothesis: Write the null hypothesis as a clear statement that there is no significant correlation between the variables (e.g., “…there is no significant correlation between X and Y”).

What Is Correlation Hypothesis Formula?

The correlation hypothesis is often expressed in the form of a statement that predicts the presence and nature of a relationship between two variables. It typically follows the “If-Then” structure, indicating the expected change in one variable based on changes in another. The correlation hypothesis formula can be written as:

“If [Variable X] changes, then [Variable Y] will also change [in a specified direction] because [rationale for the expected correlation].”

For example, “If the amount of exercise increases, then mood scores will improve because physical activity has been linked to better emotional well-being.”

What Is a Correlational Hypothesis in Research Methodology?

A correlational hypothesis in research methodology is a testable hypothesis statement that predicts the presence and nature of a relationship between two or more variables. It forms the basis for conducting a correlational study, where the goal is to measure and analyze the degree of association between variables. Correlational hypotheses are essential in guiding the research process, collecting relevant data, and assessing whether the observed correlations are statistically significant.

How Do You Write a Hypothesis for Correlation? – A Step by Step Guide

Writing a hypothesis for correlation involves crafting a clear and testable statement about the expected relationship between variables. Here’s a step-by-step guide:

  • Identify Variables : Clearly define the variables you are studying and their nature (e.g., “There is a relationship between X and Y…”).
  • Specify Direction : Indicate the expected direction of correlation (positive, negative, or zero) based on your understanding of the variables and existing literature.
  • Formulate the If-Then Statement : Write an “If-Then” statement that predicts the change in one variable based on changes in the other variable (e.g., “If [Variable X] changes, then [Variable Y] will also change [in a specified direction]…”).
  • Provide Rationale : Explain why you expect the correlation to exist, referencing existing theories, research, or logical reasoning.
  • Quantitative Prediction (Optional) : If applicable, provide a quantitative prediction about the strength of the correlation (e.g., “…for every one unit increase in [Variable X], [Variable Y] is predicted to increase by [numerical value].”).
  • Specify Population : Indicate the population to which your hypothesis applies (e.g., “In a sample of [target population]…”).

Tips for Writing Correlational Hypothesis

  • Base on Existing Knowledge : Ground your hypothesis in existing literature, theories, or empirical evidence to ensure it’s well-informed.
  • Be Specific : Clearly define the variables and direction of correlation you’re predicting to avoid ambiguity.
  • Avoid Causation Claims : Remember that correlational hypotheses do not imply causation. Focus on predicting relationships, not causes.
  • Use Clear Language : Write in clear and concise language, avoiding jargon that may confuse readers.
  • Consider Alternative Explanations : Acknowledge potential confounding variables or alternative explanations that could affect the observed correlation.
  • Be Open to Results : Correlation results can be unexpected. Be prepared to interpret findings even if they don’t align with your initial hypothesis.
  • Test Statistically : Once you collect data, use appropriate statistical tests to determine if the observed correlation is statistically significant.
  • Revise as Needed : If your findings don’t support your hypothesis, revise it based on the data and insights gained.

Crafting a well-structured correlational hypothesis is crucial for guiding your research, conducting meaningful analysis, and contributing to the understanding of relationships between variables.

10.1 - Setting the Hypotheses: Examples

A significance test examines whether the null hypothesis provides a plausible explanation of the data. The null hypothesis itself does not involve the data. It is a statement about a parameter (a numerical characteristic of the population). These population values might be proportions or means or differences between means or proportions or correlations or odds ratios or any other numerical summary of the population. The alternative hypothesis is typically the research hypothesis of interest. Here are some examples.

Example 10.2: Hypotheses with One Sample of One Categorical Variable Section  

About 10% of the human population is left-handed. Suppose a researcher at Penn State speculates that students in the College of Arts and Architecture are more likely to be left-handed than people found in the general population. We only have one sample since we will be comparing a population proportion based on a sample value to a known population value.

  • Research Question : Are artists more likely to be left-handed than people found in the general population?
  • Response Variable : Classification of the student as either right-handed or left-handed

State Null and Alternative Hypotheses

  • Null Hypothesis : Students in the College of Arts and Architecture are no more likely to be left-handed than people in the general population (population percent of left-handed students in the College of Arts and Architecture = 10% or p = .10).
  • Alternative Hypothesis : Students in the College of Arts and Architecture are more likely to be left-handed than people in the general population (population percent of left-handed students in the College of Arts and Architecture > 10% or p > .10). This is a one-sided alternative hypothesis.
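This one-sided, one-proportion hypothesis could be tested with a z-test. The sketch below uses invented sample counts (18 left-handers out of 120 students) purely to illustrate the calculation.

```python
import math

def one_prop_ztest(successes, n, p0):
    """One-sided z-test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    # The standard error uses the null value p0, as is standard for this test
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Upper-tail p-value from the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical sample: 18 of 120 surveyed students are left-handed (15%)
z, p_value = one_prop_ztest(18, 120, 0.10)
```

For these invented counts the sample proportion of 15% gives z of roughly 1.83 and a one-sided p-value of roughly 0.03, which would be significant at the 5% level.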

Example 10.3: Hypotheses with One Sample of One Measurement Variable Section  


A generic brand of the anti-histamine Diphenhydramine markets a capsule with a 50 milligram dose. The manufacturer is worried that the machine that fills the capsules has come out of calibration and is no longer creating capsules with the appropriate dosage.

  • Research Question : Does the data suggest that the population mean dosage of this brand is different than 50 mg?
  • Response Variable : dosage of the active ingredient found by a chemical assay.
  • Null Hypothesis : On the average, the dosage sold under this brand is 50 mg (population mean dosage = 50 mg).
  • Alternative Hypothesis : On the average, the dosage sold under this brand is not 50 mg (population mean dosage ≠ 50 mg). This is a two-sided alternative hypothesis.
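A one-sample t-test addresses this two-sided hypothesis. The sketch below computes the t statistic for ten hypothetical assay measurements; in practice the statistic is compared against a t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t statistic for H0: population mean = mu0."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / math.sqrt(n))

# Hypothetical assay results (mg of active ingredient) for ten capsules
doses = [49.2, 50.1, 48.8, 49.5, 50.3, 49.0, 48.7, 49.6, 49.9, 49.1]
t = one_sample_t(doses, 50.0)
# Compare |t| against the two-sided 5% critical value for 9 df (about 2.262)
```

Because |t| exceeds 2.262 for these invented data, the null hypothesis that the mean dose is 50 mg would be rejected.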

Example 10.4: Hypotheses with Two Samples of One Categorical Variable Section  


Many people are starting to prefer vegetarian meals on a regular basis. Specifically, a researcher believes that females are more likely than males to eat vegetarian meals on a regular basis.

  • Research Question : Does the data suggest that females are more likely than males to eat vegetarian meals on a regular basis?
  • Response Variable : Classification of whether or not a person eats vegetarian meals on a regular basis
  • Explanatory (Grouping) Variable: Sex
  • Null Hypothesis : There is no sex effect regarding those who eat vegetarian meals on a regular basis (population percent of females who eat vegetarian meals on a regular basis = population percent of males who eat vegetarian meals on a regular basis or p females = p males ).
  • Alternative Hypothesis : Females are more likely than males to eat vegetarian meals on a regular basis (population percent of females who eat vegetarian meals on a regular basis > population percent of males who eat vegetarian meals on a regular basis or p females > p males ). This is a one-sided alternative hypothesis.
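This hypothesis calls for a two-proportion z-test with a pooled estimate under the null. All of the counts below are hypothetical.

```python
import math

def two_prop_ztest(x1, n1, x2, n2):
    """One-sided z-test of H0: p1 = p2 against Ha: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Under the null hypothesis the two groups share a common proportion
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail p-value
    return z, p_value

# Hypothetical survey: 90 of 200 females vs. 60 of 200 males
# report eating vegetarian meals on a regular basis
z, p_value = two_prop_ztest(90, 200, 60, 200)
```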

Example 10.5: Hypotheses with Two Samples of One Measurement Variable Section  


Obesity is a major health problem today. Research is starting to show that people may be able to lose more weight on a low carbohydrate diet than on a low fat diet.

  • Research Question : Does the data suggest that, on the average, people are able to lose more weight on a low carbohydrate diet than on a low fat diet?
  • Response Variable : Weight loss (pounds)
  • Explanatory (Grouping) Variable : Type of diet
  • Null Hypothesis : There is no difference in the mean amount of weight loss when comparing a low carbohydrate diet with a low fat diet (population mean weight loss on a low carbohydrate diet = population mean weight loss on a low fat diet).
  • Alternative Hypothesis : The mean weight loss should be greater for those on a low carbohydrate diet when compared with those on a low fat diet (population mean weight loss on a low carbohydrate diet > population mean weight loss on a low fat diet). This is a one-sided alternative hypothesis.
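This one-sided comparison of two means is typically handled with a two-sample t-test. The sketch below computes Welch's t statistic (which does not assume equal variances) on invented weight-loss data.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic for comparing two independent sample means."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical weight losses (pounds) for two diet groups
low_carb = [12, 9, 15, 11, 8, 14, 10, 13]
low_fat = [7, 5, 9, 6, 8, 4, 7, 6]
t = welch_t(low_carb, low_fat)  # a positive t favors the low-carbohydrate group
```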

Example 10.6: Hypotheses about the relationship between Two Categorical Variables Section  

  • Research Question : Do the odds of having a stroke increase if you inhale second hand smoke ? A case-control study of non-smoking stroke patients and controls of the same age and occupation are asked if someone in their household smokes.
  • Variables : There are two different categorical variables (Stroke patient vs control and whether the subject lives in the same household as a smoker). Living with a smoker (or not) is the natural explanatory variable and having a stroke (or not) is the natural response variable in this situation.
  • Null Hypothesis : There is no relationship between whether or not a person has a stroke and whether or not a person lives with a smoker (odds ratio between stroke and second-hand smoke situation is = 1).
  • Alternative Hypothesis : There is a relationship between whether or not a person has a stroke and whether or not a person lives with a smoker (odds ratio between stroke and second-hand smoke situation is > 1). This is a one-tailed alternative.

This research question might also be addressed like example 10.4 by making the hypotheses about comparing the proportion of stroke patients that live with smokers to the proportion of controls that live with smokers.
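For a case-control design like this, the sample odds ratio comes straight from the 2x2 table, and a confidence interval for it is usually built on the log scale. All of the counts below are hypothetical.

```python
import math

# Hypothetical 2x2 case-control table:
#                    lives with smoker   no smoker in household
# stroke patients           40                  60
# controls                  25                  75
a, b = 40, 60   # cases: exposed, unexposed
c, d = 25, 75   # controls: exposed, unexposed

odds_ratio = (a * d) / (b * c)
# Approximate 95% CI via the standard error of log(OR)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
```

A confidence interval that excludes 1 is evidence against the null hypothesis that the odds ratio equals 1.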

Example 10.7: Hypotheses about the relationship between Two Measurement Variables Section  

  • Research Question : A financial analyst believes there might be a positive association between the change in a stock's price and the amount of the stock purchased by non-management employees the previous day (stock trading by management being under "insider-trading" regulatory restrictions).
  • Variables : Daily price change information (the response variable) and previous day stock purchases by non-management employees (explanatory variable). These are two different measurement variables.
  • Null Hypothesis : The correlation between the daily stock price change (\$) and the daily stock purchases by non-management employees (\$) = 0.
  • Alternative Hypothesis : The correlation between the daily stock price change (\$) and the daily stock purchases by non-management employees (\$) > 0. This is a one-sided alternative hypothesis.

Example 10.8: Hypotheses about comparing the relationship between Two Measurement Variables in Two Samples Section  


  • Research Question : Is there a linear relationship between the amount of the bill (\$) at a restaurant and the tip (\$) that was left? Is the strength of this association different for family restaurants than for fine dining restaurants?
  • Variables : There are two different measurement variables. The size of the tip would depend on the size of the bill so the amount of the bill would be the explanatory variable and the size of the tip would be the response variable.
  • Null Hypothesis : The correlation between the amount of the bill (\$) at a restaurant and the tip (\$) that was left is the same at family restaurants as it is at fine dining restaurants.
  • Alternative Hypothesis : The correlation between the amount of the bill (\$) at a restaurant and the tip (\$) that was left is different at family restaurants than it is at fine dining restaurants. This is a two-sided alternative hypothesis.
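Comparing the strength of a correlation across two independent samples is commonly done with Fisher's r-to-z transformation. The sample correlations and sizes below are hypothetical.

```python
import math

def fisher_z_compare(r1, n1, r2, n2):
    """z statistic for H0: the two population correlations are equal."""
    # Fisher's transformation makes sample correlations approximately normal
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# Hypothetical results: family restaurants r = 0.60 (n = 50),
# fine dining restaurants r = 0.85 (n = 50)
z = fisher_z_compare(0.60, 50, 0.85, 50)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
```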



Correlation vs. Causation | Difference, Designs & Examples

Published on July 12, 2021 by Pritha Bhandari . Revised on June 22, 2023.

Correlation means there is a statistical association between variables. Causation means that a change in one variable causes a change in another variable.

In research, you might have come across the phrase “correlation doesn’t imply causation.” Correlation and causation are two related ideas, but understanding their differences will help you critically evaluate sources and interpret scientific research.

Table of contents

  • What’s the difference?
  • Why doesn’t correlation mean causation?
  • Correlational research
  • Third variable problem
  • Regression to the mean
  • Spurious correlations
  • Directionality problem
  • Causal research
  • Other interesting articles
  • Frequently asked questions about correlation and causation

Correlation describes an association between types of variables : when one variable changes, so does the other. A correlation is a statistical indicator of the relationship between variables. These variables change together: they covary. But this covariation isn’t necessarily due to a direct or indirect causal link.

Causation means that changes in one variable bring about changes in the other; there is a cause-and-effect relationship between variables. The two variables are correlated with each other, and there is also a causal link between them.


There are two main reasons why correlation isn’t causation. These problems are important to identify for drawing sound scientific conclusions from research.

The third variable problem means that a confounding variable affects both variables to make them seem causally related when they are not. For example, ice cream sales and violent crime rates are closely correlated, but they are not causally linked with each other. Instead, hot temperatures, a third variable, affects both variables separately. Failing to account for third variables can lead research biases to creep into your work.

The directionality problem occurs when two variables correlate and might actually have a causal relationship, but it’s impossible to conclude which variable causes changes in the other. For example, vitamin D levels are correlated with depression, but it’s not clear whether low vitamin D causes depression, or whether depression causes reduced vitamin D intake.

You’ll need to use an appropriate research design to distinguish between correlational and causal relationships:

  • Correlational research designs can only demonstrate correlational links between variables.
  • Experimental designs can test causation.

In a correlational research design, you collect data on your variables without manipulating them.

Correlational research is usually high in external validity , so you can generalize your findings to real life settings. But these studies are low in internal validity , which makes it difficult to causally connect changes in one variable to changes in the other.

These research designs are commonly used when it’s unethical, too costly, or too difficult to perform controlled experiments. They are also used to study relationships that aren’t expected to be causal.

Without controlled experiments, it’s hard to say whether it was the variable you’re interested in that caused changes in another variable. Extraneous variables are any third variable or omitted variable other than your variables of interest that could affect your results.

Limited control in correlational research means that extraneous or confounding variables serve as alternative explanations for the results. Confounding variables can make it seem as though a correlational relationship is causal when it isn’t.

When two variables are correlated, all you can say is that changes in one variable occur alongside changes in the other.

Regression to the mean (RTM) is observed when values that are extremely high or extremely low on a first measurement move closer to the average on a second measurement. Particularly in research that intentionally focuses on the most extreme cases or events, RTM should always be considered as a possible cause of an observed change.

Consider the “Sports Illustrated cover jinx”: players or teams featured on the cover of Sports Illustrated have earned their place by performing exceptionally well. But athletic success is a mix of skill and luck, and even the best players don’t always win.

Chances are that good luck will not continue indefinitely, and neither can exceptional success.

A spurious correlation is when two variables appear to be related through hidden third variables or simply by coincidence.

The “Theory of the Stork” draws a simple causal link between two correlated variables, stork populations and human birth rates, to argue that storks physically deliver babies. This satirical study shows why you can’t conclude causation from correlational research alone.

When you analyze correlations in a large dataset with many variables, the chances of finding at least one statistically significant result are high. In this case, you’re more likely to make a type I error . This means erroneously concluding there is a true correlation between variables in the population based on skewed sample data.

To demonstrate causation, you need to show a directional relationship with no alternative explanations. This relationship can be unidirectional, with one variable impacting the other, or bidirectional, where both variables impact each other.

For example, consider the relationship between physical activity and self-esteem:

  • Physical activity may affect self-esteem
  • Self-esteem may affect physical activity
  • Physical activity and self-esteem may both affect each other

A correlational design won’t be able to distinguish between any of these possibilities, but an experimental design can test each possible direction, one at a time.

In correlational research, the directionality of a relationship is unclear because the researcher has limited control. You risk concluding reverse causality, that is, getting the direction of the relationship wrong.

Causal links between variables can only be truly demonstrated with controlled experiments. Experiments test formal predictions, called hypotheses, to establish causality in one direction at a time.

Experiments are high in internal validity, so cause-and-effect relationships can be demonstrated with reasonable confidence.

You can establish directionality in one direction because you manipulate an independent variable before measuring the change in a dependent variable.

In a controlled experiment, you can also eliminate the influence of third variables by using random assignment and control groups.

Random assignment helps distribute participant characteristics evenly between groups so that they’re similar and comparable. A control group lets you compare the experimental manipulation to a similar treatment or no treatment (or a placebo, to control for the placebo effect).
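
Random assignment's balancing effect applies even to characteristics you never measured. In the sketch below (plain Python; the age distribution is hypothetical), shuffling 200 participants into two groups leaves the groups with nearly identical average ages, so age can't easily masquerade as a treatment effect.

```python
import random

random.seed(3)
# Hypothetical participant pool with a would-be confounder: age.
ages = [random.gauss(40, 10) for _ in range(200)]

random.shuffle(ages)                       # random assignment
treatment, control = ages[:100], ages[100:]

mean_t = sum(treatment) / len(treatment)
mean_c = sum(control) / len(control)
print(f"treatment mean age: {mean_t:.1f}")
print(f"control mean age:   {mean_c:.1f}")
print(f"difference:         {abs(mean_t - mean_c):.1f}")
```

Any single randomization can still produce a small imbalance by chance, but randomization guarantees the groups are equal in expectation, on measured and unmeasured variables alike.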

A correlation reflects the strength and/or direction of the association between two or more variables.

  • A positive correlation means that both variables change in the same direction.
  • A negative correlation means that the variables change in opposite directions.
  • A zero correlation means there’s no relationship between the variables.
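
The three cases above can be checked with a small Pearson correlation helper. The sketch below is plain Python, and the study-hours and absences data are made up for illustration.

```python
def pearson(x, y):
    """Sample Pearson correlation coefficient r (between -1 and 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: score rises with study hours (r near +1);
# grade falls as absences rise (r near -1).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 71, 75, 80]
absences = [0, 1, 2, 3, 4, 5, 6, 7]
grade = [95, 92, 88, 85, 80, 78, 74, 70]

print(f"hours vs. score:    r = {pearson(hours, score):+.2f}")
print(f"absences vs. grade: r = {pearson(absences, grade):+.2f}")
```

A zero correlation would show up as r hovering near 0, with the points forming no consistent upward or downward trend.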

Correlation describes an association between variables : when one variable changes, so does the other. A correlation is a statistical indicator of the relationship between variables.

Causation means that changes in one variable bring about changes in the other (i.e., there is a cause-and-effect relationship between the variables). The two variables are correlated with each other, and there’s also a causal link between them.

While causation and correlation can exist simultaneously, correlation does not imply causation. In other words, correlation is simply a relationship in which A relates to B, but A doesn’t necessarily cause B to happen (or vice versa). Mistaking correlation for causation is a common error and can lead to the false cause fallacy.

The third variable and directionality problems are two main reasons why correlation isn’t causation.

The third variable problem means that a confounding variable affects both variables to make them seem causally related when they are not.

The directionality problem is when two variables correlate and might actually have a causal relationship, but it’s impossible to conclude which variable causes changes in the other.

Controlled experiments establish causality, whereas correlational studies only show associations between variables.

  • In an experimental design , you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
  • In a correlational design , you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.

In general, correlational research is high in external validity while experimental research is high in internal validity.

Bhandari, P. (2023, June 22). Correlation vs. Causation | Difference, Designs & Examples. Scribbr. Retrieved September 16, 2024, from https://www.scribbr.com/methodology/correlation-vs-causation/
