
How to Write a Hypothesis for Correlation

A hypothesis for correlation predicts a statistically significant relationship.

A hypothesis is a testable statement about how something works in the natural world. While some hypotheses predict a causal relationship between two variables, other hypotheses predict a correlation between them. According to the Research Methods Knowledge Base, a correlation is a single number that describes the relationship between two variables. If you do not predict a causal relationship or cannot measure one objectively, state clearly in your hypothesis that you are merely predicting a correlation.

Research the topic in depth before forming a hypothesis. Without adequate knowledge about the subject matter, you will not be able to decide whether to write a hypothesis for correlation or causation. Read the findings of similar experiments before writing your own hypothesis.

Identify the independent variable and dependent variable. Your hypothesis will be concerned with what happens to the dependent variable when a change is made in the independent variable. In a correlation, the two variables undergo changes at the same time in a significant number of cases. However, this does not mean that the change in the independent variable causes the change in the dependent variable.

Construct an experiment to test your hypothesis. In a correlational experiment, you must be able to measure the exact relationship between the two variables. This means you will need to measure how often changes in the two variables coincide, expressed as a specific percentage.

Establish the experiment's requirements for statistical significance. State exactly how often the variables must correlate to reach a high enough level of statistical significance. This number varies considerably by field: in a highly technical scientific study, the variables may need to correlate 98 percent of the time, while in a sociological study, 90 percent correlation may suffice. Look at other studies in your particular field to determine the requirements for statistical significance.

State the null hypothesis. The null hypothesis gives an exact value that implies there is no correlation between the two variables. If the results show a percentage equal to or lower than the value of the null hypothesis, then the variables are not proven to correlate.

Record and summarize the results of your experiment. State whether or not the experiment met the minimum requirements of your hypothesis in terms of both percentage and significance.
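The procedure above can be sketched in code. Below is a minimal Python sketch with invented data (hours studied vs. test score) and an invented pre-registered strength requirement; the Pearson r computation is standard, but all of the numbers are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired observations: hours studied and test score.
hours  = [1, 2, 2, 3, 4, 5, 5, 6]
scores = [52, 55, 60, 61, 66, 70, 74, 78]

# Hypothetical pre-registered requirement: |r| must reach at least 0.7.
REQUIRED_STRENGTH = 0.7
r = pearson_r(hours, scores)          # about 0.98 for this invented data
meets_requirement = abs(r) >= REQUIRED_STRENGTH
```

The key point is that the strength requirement is fixed before the data are examined, then the observed r is summarized against it.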


References

  • University of New England. "Steps in Hypothesis Testing for Correlation." 2000.
  • Trochim, William M.K. "Correlation." Research Methods Knowledge Base, 2006.
  • Science Buddies. "Hypothesis."

About the Author

Brian Gabriel has been a writer and blogger since 2009, contributing to various online publications. He earned his Bachelor of Arts in history from Whitworth University.


Correlation in Psychology: Meaning, Types, Examples & coefficient

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

Correlation means association – more precisely, it measures the extent to which two variables are related. There are three possible results of a correlational study: a positive correlation, a negative correlation, and no correlation.
  • A positive correlation is a relationship between two variables in which both variables move in the same direction. Therefore, one variable increases as the other variable increases, or one variable decreases while the other decreases. An example of a positive correlation would be height and weight. Taller people tend to be heavier.

  • A negative correlation is a relationship between two variables in which an increase in one variable is associated with a decrease in the other. An example of a negative correlation would be the height above sea level and temperature. As you climb the mountain (increase in height), it gets colder (decrease in temperature).

  • A zero correlation exists when there is no relationship between two variables. For example, there is no relationship between the amount of tea drunk and the level of intelligence.

Scatter Plots

A correlation can be expressed visually. This is done by drawing a scatter plot (also known as a scattergram, scatter graph, scatter chart, or scatter diagram).

A scatter plot is a graphical display that shows the relationships or associations between two numerical variables (or co-variables), which are represented as points (or dots) for each pair of scores.

A scatter plot indicates the strength and direction of the correlation between the co-variables.

When you draw a scatter plot, it doesn’t matter which variable goes on the x-axis and which goes on the y-axis.

Remember, in correlations, we always deal with paired scores, so the values of the two variables taken together will be used to make the diagram.

Decide which variable goes on each axis and then simply put a cross at the point where the two values coincide.
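In practice you would draw the scatter plot with plotting software, but the pairing logic can be shown in a few lines. The sketch below places one cross per pair of scores on a plain-text grid; the function name and grid size are my own inventions for illustration.

```python
def ascii_scatter(xs, ys, width=21, height=11):
    """Place one cross per (x, y) pair of scores on a character grid,
    scaling each axis to the range of its variable."""
    grid = [[" "] * width for _ in range(height)]
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    for x, y in zip(xs, ys):
        col = round((x - xmin) / (xmax - xmin) * (width - 1))
        row = round((y - ymin) / (ymax - ymin) * (height - 1))
        grid[height - 1 - row][col] = "x"   # larger y values plot higher up
    return "\n".join("".join(r) for r in grid)

# Hypothetical paired scores, e.g. height (m) and weight (kg):
print(ascii_scatter([1.4, 1.5, 1.5, 1.6, 1.7], [40, 45, 48, 52, 60]))
```

Each cross sits where one participant's two values coincide, exactly as described above.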

Uses of Correlations

  • If there is a relationship between two variables, we can make predictions about one from another.
  • Concurrent validity (correlation between a new measure and an established measure).

Reliability

  • Test-retest reliability (are measures consistent?).
  • Inter-rater reliability (are observers consistent?).

Theory verification

  • Predictive validity.

Correlation Coefficients

Instead of drawing a scatter plot, a correlation can be expressed numerically as a coefficient, ranging from -1 to +1. When working with continuous variables, the correlation coefficient to use is Pearson’s r.

Correlation Coefficient Interpretation

The correlation coefficient (r) indicates the extent to which the pairs of numbers for these two variables lie on a straight line. Values over zero indicate a positive correlation, while values under zero indicate a negative correlation.

A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down. A correlation of +1 indicates a perfect positive correlation, meaning that as one variable goes up, the other goes up.

There is no rule for determining what correlation size is considered strong, moderate, or weak. The interpretation of the coefficient depends on the topic of study.

When studying things that are difficult to measure, we should expect lower correlation coefficients (e.g., above 0.4 counts as relatively strong). When studying things that are easier to measure, such as socioeconomic status, we expect higher correlations (e.g., above 0.75 counts as relatively strong).

In studies of hard-to-measure constructs, we rarely see correlations above 0.6. For this kind of data, we generally consider correlations above 0.4 to be relatively strong; correlations between 0.2 and 0.4 are moderate, and those below 0.2 are considered weak.

When we are studying things that are more easily countable, we expect higher correlations. For example, with demographic data, we generally consider correlations above 0.75 to be relatively strong; correlations between 0.45 and 0.75 are moderate, and those below 0.45 are considered weak.
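These rules of thumb can be collected into a small helper. In the sketch below, the labels "behavioral" and "demographic" are my own shorthand for the two rubrics above; the cut-offs come from the text and are conventions, not rules.

```python
def describe_strength(r, domain="behavioral"):
    """Label |r| using the rough cut-offs quoted above.
    'behavioral'  -> hard-to-measure topics (0.4 strong, 0.2 moderate);
    'demographic' -> easily counted data (0.75 strong, 0.45 moderate).
    These thresholds are conventions that vary by field, not rules."""
    strong, moderate = {"behavioral": (0.4, 0.2),
                        "demographic": (0.75, 0.45)}[domain]
    size = abs(r)           # strength ignores the sign (direction)
    if size >= strong:
        return "strong"
    if size >= moderate:
        return "moderate"
    return "weak"
```

Note that the same coefficient earns different labels in different fields: r = 0.5 is "strong" under the behavioral rubric but only "moderate" under the demographic one.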

Correlation vs. Causation

Causation means that one variable (often called the predictor variable or independent variable) causes the other (often called the outcome variable or dependent variable).

Experiments can be conducted to establish causation. An experiment isolates and manipulates the independent variable to observe its effect on the dependent variable and controls the environment in order that extraneous variables may be eliminated.

A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. A correlation only shows if there is a relationship between variables.

While variables are sometimes correlated because one does cause the other, it could also be that some other factor, a confounding variable, is actually causing the systematic movement in our variables of interest.

Correlation alone does not establish causation, as a third variable may be involved. For example, being a patient in a hospital is correlated with dying, but this does not mean that one event causes the other; a third variable (such as diet or level of exercise) might be involved.

“Correlation is not causation” means that just because two variables are related it does not necessarily mean that one causes the other.

A correlational study identifies variables and looks for a relationship between them, whereas an experiment tests the effect that an independent variable has upon a dependent variable.

This means that an experiment can identify cause and effect (causation), but a correlation can only identify a relationship, as an extraneous variable that is not known about may be involved.

Strengths

1. Correlation allows the researcher to investigate naturally occurring variables that may be unethical or impractical to test experimentally. For example, it would be unethical to conduct an experiment on whether smoking causes lung cancer.

2. Correlation allows the researcher to clearly and easily see if there is a relationship between variables. This can then be displayed in a graphical form.

Limitations

1. Correlation is not and cannot be taken to imply causation. Even if there is a very strong association between two variables, we cannot assume that one causes the other.

For example, suppose we found a positive correlation between watching violence on T.V. and violent behavior in adolescence.

It could be that the cause of both these is a third (extraneous) variable – for example, growing up in a violent home – and that both the watching of T.V. and the violent behavior is the outcome of this.

2. Correlation does not allow us to go beyond the given data. For example, suppose it was found that there was an association between time spent on homework (1/2 hour to 3 hours) and the number of G.C.S.E. passes (1 to 6).

It would not be legitimate to infer from this that spending 6 hours on homework would likely generate 12 G.C.S.E. passes.

How do you know if a study is correlational?

A study is considered correlational if it examines the relationship between two or more variables without manipulating them. In other words, the study does not involve the manipulation of an independent variable to see how it affects a dependent variable.

One way to identify a correlational study is to look for language that suggests a relationship between variables rather than cause and effect.

For example, the study may use phrases like “associated with,” “related to,” or “predicts” when describing the variables being studied.

Another way to identify a correlational study is to look for information about how the variables were measured. Correlational studies typically involve measuring variables using self-report surveys, questionnaires, or other measures of naturally occurring behavior.

Finally, a correlational study may include statistical analyses such as correlation coefficients or regression analyses to examine the strength and direction of the relationship between variables.

Why is a correlational study used?

Correlational studies are particularly useful when it is not possible or ethical to manipulate one of the variables.

For example, it would not be ethical to manipulate someone’s age or gender. However, researchers may still want to understand how these variables relate to outcomes such as health or behavior.

Additionally, correlational studies can be used to generate hypotheses and guide further research.

If a correlational study finds a significant relationship between two variables, this can suggest a possible causal relationship that can be further explored in future research.

What is the goal of correlational research?

The ultimate goal of correlational research is to increase our understanding of how different variables are related and to identify patterns in those relationships.

This information can then be used to generate hypotheses and guide further research aimed at establishing causality.


Statistics By Jim

Making statistics intuitive

Interpreting Correlation Coefficients

By Jim Frost

What are Correlation Coefficients?

Correlation coefficients measure the strength of the relationship between two variables. A correlation between variables indicates that as one variable changes in value, the other variable tends to change in a specific direction.  Understanding that relationship is useful because we can use the value of one variable to predict the value of the other variable. For example, height and weight are correlated—as height increases, weight also tends to increase. Consequently, if we observe an individual who is unusually tall, we can predict that his weight is also above the average.

In statistics, correlation coefficients are a quantitative assessment that measures both the direction and the strength of this tendency to vary together. There are different types of correlation coefficients that you can use for different kinds of data. In this post, I cover the most common type of correlation—Pearson’s correlation coefficient.

Before we get into the numbers, let’s graph some data first so we can understand the concept behind what we are measuring.

Graph Your Data to Find Correlations

Scatterplots are a great way to check quickly for correlation between pairs of continuous data. The scatterplot below displays the height and weight of pre-teenage girls. Each dot on the graph represents an individual girl and her combination of height and weight. These data are actual data that I collected during an experiment.

This scatterplot displays a positive correlation between height and weight.

At a glance, you can see that there is a correlation between height and weight. As height increases, weight also tends to increase. However, it’s not a perfect relationship. If you look at a specific height, say 1.5 meters, you can see that there is a range of weights associated with it. You can also find short people who weigh more than taller people. However, the general tendency that height and weight increase together is unquestionably present—a correlation exists.

Pearson’s correlation coefficient takes all of the data points on this graph and represents them as a single number. In this case, the statistical output below indicates that the Pearson’s correlation coefficient is 0.694.

Statistical output that displays Pearson's correlation coefficient and p-value.

What do the Pearson correlation coefficient and p-value mean? We’ll interpret the output soon. First, let’s look at a range of possible correlation coefficients so we can understand how our height and weight example fits in.

Related posts: Using Excel to Calculate Correlation and Guide to Scatterplots

How to Interpret Pearson Correlation Coefficients

Pearson’s correlation coefficient is represented by the Greek letter rho (ρ) for the population parameter and r for a sample statistic. This correlation coefficient is a single number that measures both the strength and direction of the linear relationship between two continuous variables. Values can range from -1 to +1.

The greater the absolute value of the Pearson correlation coefficient, the stronger the relationship.

  • The extreme values of -1 and 1 indicate a perfectly linear relationship where a change in one variable is accompanied by a perfectly consistent change in the other. For these relationships, all of the data points fall on a line. In practice, you won’t see either type of perfect relationship.
  • A coefficient of zero represents no linear relationship. As one variable increases, there is no tendency in the other variable to either increase or decrease.
  • When the value is in-between 0 and +1/-1, there is a relationship, but the points don’t all fall on a line. As r approaches -1 or 1, the strength of the relationship increases and the data points tend to fall closer to a line.
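You can confirm the extreme values numerically: when every point falls exactly on a line, r computed directly from its definition comes out at +1 or -1 (up to floating-point rounding). A quick Python check with invented data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
r_up   = pearson_r(xs, [2 * x + 1 for x in xs])   # every point on a rising line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # every point on a falling line
print(r_up, r_down)  # approximately 1.0 and -1.0
```

The slope and intercept do not matter; only the fact that the points fall perfectly on a line does.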

The sign of the Pearson correlation coefficient represents the direction of the relationship.

  • Positive coefficients indicate that when the value of one variable increases, the value of the other variable also tends to increase. Positive relationships produce an upward slope on a scatterplot.
  • Negative coefficients represent cases when the value of one variable increases, the value of the other variable tends to decrease. Negative relationships produce a downward slope.

Statisticians consider Pearson’s correlation coefficients to be a standardized effect size because they indicate the strength of the relationship between variables using unitless values that fall within a standardized range of -1 to +1. Effect sizes help you understand how important the findings are in a practical sense. To learn more about unstandardized and standardized effect sizes, read my post about Effect Sizes in Statistics.

Learn how to calculate correlation in my post, Correlation Coefficient Formula Walkthrough .

Covariance is an unstandardized form of correlation. Learn about it in my posts:

  • Covariance: Definition, Formula & Example
  • Covariances vs Correlation: Understanding the Differences

Examples of Positive and Negative Correlation Coefficients

A positive correlation example is the relationship between the speed of a wind turbine and the amount of energy it produces. As the turbine speed increases, electricity production also increases.

A negative correlation example is the relationship between outdoor temperature and heating costs. As the temperature increases, heating costs decrease.

Graphs for Different Correlation Coefficients

Graphs always help bring concepts to life. The scatterplots below represent a spectrum of different Pearson correlation coefficients. I’ve held the horizontal and vertical scales of the scatterplots constant to allow for valid comparisons between them.

This scatterplot displays a perfect positive correlation of +1.

Discussion about the Scatterplots

For the scatterplots above, I created one positive correlation between the variables and one negative relationship between the variables. Then, I varied only the amount of dispersion between the data points and the line that defines the relationship. That process illustrates how correlation measures the strength of the relationship. The stronger the relationship, the closer the data points fall to the line. I didn’t include plots for weaker correlation coefficients that are closer to zero than 0.6 and -0.6 because they start to look like blobs of dots and it’s hard to see the relationship.

A common misinterpretation is assuming that negative Pearson correlation coefficients indicate that there is no relationship. After all, a negative correlation sounds suspiciously like no relationship. However, the scatterplots for the negative correlations display real relationships. For negative correlation coefficients, high values of one variable are associated with low values of another variable. For example, there is a negative correlation coefficient for school absences and grades. As the number of absences increases, the grades decrease.

Earlier I mentioned how crucial it is to graph your data to understand them better. However, a quantitative measurement of the relationship does have an advantage. Graphs are a great way to visualize the data, but the scaling can exaggerate or weaken the appearance of a correlation. Additionally, the automatic scaling in most statistical software tends to make all data look similar.

Fortunately, Pearson’s correlation coefficients are unaffected by scaling issues. Consequently, a statistical assessment is better for determining the precise strength of the relationship.

Graphs and the relevant statistical measures often work better in tandem.

Pearson’s Correlation Coefficients Measure Linear Relationship

Pearson’s correlation coefficients measure only linear relationships. Consequently, if your data contain a curvilinear relationship, the Pearson correlation coefficient will not detect it. For example, the correlation for the data in the scatterplot below is zero. However, there is a relationship between the two variables—it’s just not linear.

Scatterplot displays a curvilinear relationship that has a Pearson's correlation coefficient of 0.

This example illustrates another reason to graph your data! Just because the coefficient is near zero, it doesn’t necessarily indicate that there is no relationship.
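You can verify this with a symmetric curvilinear dataset. For y = x² over x-values centered on zero, the positive and negative products cancel, and Pearson's r comes out exactly zero even though y is completely determined by x:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # a perfect, but curvilinear, relationship
r = pearson_r(xs, ys)
print(r)                    # 0.0 -- the linear measure misses it entirely
```

A scatterplot of these five points shows the U-shape immediately, which is exactly why graphing the data matters.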

Spearman’s correlation is a nonparametric alternative to Pearson’s correlation coefficient. Use Spearman’s correlation for nonlinear, monotonic relationships and for ordinal data. For more information, read my post Spearman’s Correlation Explained!

Hypothesis Test for Correlation Coefficients

Correlation coefficients have a hypothesis test. As with any hypothesis test, this test takes sample data and evaluates two mutually exclusive statements about the population from which the sample was drawn. For Pearson correlations, the two hypotheses are the following:

  • Null hypothesis: There is no linear relationship between the two variables. ρ = 0.
  • Alternative hypothesis: There is a linear relationship between the two variables. ρ ≠ 0.

Correlation coefficients that equal zero indicate no linear relationship exists. If your p-value is less than your significance level, the sample contains sufficient evidence to reject the null hypothesis and conclude that the Pearson correlation coefficient does not equal zero. In other words, the sample data support the notion that the relationship exists in the population.
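Statistical software reports this p-value directly (scipy.stats.pearsonr, for example, returns r and the two-sided p-value together). As a standard-library-only sketch, the test statistic t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution with n - 2 degrees of freedom; the sample values below (r = 0.5, n = 30) and the critical value 2.048 are illustrative assumptions, the latter taken from a standard t table.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical sample: r = 0.5 observed across n = 30 pairs.
t = correlation_t_stat(0.5, 30)   # about 3.06

# Two-tailed critical value for 28 df at alpha = 0.05 (from a t table).
T_CRIT = 2.048
reject_null = abs(t) > T_CRIT     # True: evidence of a linear relationship
```

Since |t| exceeds the critical value, this hypothetical sample would lead us to reject the null hypothesis of no linear relationship.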

Related post: Overview of Hypothesis Tests

Interpreting our Height and Weight Correlation Example

Now that we have seen a range of positive and negative relationships, let’s see how our Pearson correlation coefficient of 0.694 fits in. We know that it’s a positive relationship. As height increases, weight tends to increase. Regarding the strength of the relationship, the graph shows that it’s not a very strong relationship where the data points tightly hug a line. However, it’s not an entirely amorphous blob with a very low correlation. It’s somewhere in between. That description matches our moderate correlation coefficient of 0.694.

For the hypothesis test, our p-value equals 0.000. This p-value is less than any reasonable significance level. Consequently, we can reject the null hypothesis and conclude that the relationship is statistically significant. The sample data support the notion that the relationship between height and weight exists in the population of preteen girls.

Correlation Does Not Imply Causation

I’m sure you’ve heard this expression before, and it is a crucial warning. Correlation between two variables indicates that changes in one variable are associated with changes in the other variable. However, correlation does not mean that the changes in one variable actually cause the changes in the other variable.

Sometimes it is clear that there is a causal relationship. For the height and weight data, it makes sense that adding more vertical structure to a body causes the total mass to increase. Or, increasing the wattage of lightbulbs causes the light output to increase.

However, in other cases, a causal relationship is not possible. For example, ice cream sales and shark attacks have a positive correlation coefficient. Clearly, selling more ice cream does not cause shark attacks (or vice versa). Instead, a third variable, outdoor temperatures, causes changes in the other two variables. Higher temperatures increase both sales of ice cream and the number of swimmers in the ocean, which creates the apparent relationship between ice cream sales and shark attacks.

Beware of spurious correlations!
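The ice cream and shark example is easy to simulate. In the sketch below, all of the data are invented: a temperature series drives both outcomes, the two outcomes never influence each other, and yet their correlation comes out strongly positive.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented daily highs; temperature drives BOTH series below, and the
# sin/cos terms stand in for unrelated day-to-day noise.
temps     = list(range(10, 35))                                   # degrees C
ice_cream = [2.0 * t + 3 * math.sin(i) for i, t in enumerate(temps)]
sharks    = [0.5 * t + 1 * math.cos(i) for i, t in enumerate(temps)]

r = pearson_r(ice_cream, sharks)   # strongly positive, despite no causal link
```

The high r here reflects only the shared driver (temperature), which is precisely what makes the correlation spurious as evidence of causation.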

In statistics, you typically need to perform a randomized, controlled experiment to determine that a relationship is causal rather than merely correlational. Conversely, correlational studies find relationships quickly and easily, but they are not suitable for establishing causality.

Learn more about Correlation vs. Causation: Understanding the Differences.

Related posts: Using Random Assignment in Experiments and Observational Studies

How Strong of a Correlation is Considered Good?

What is a good correlation? How high should correlation coefficients be? These are commonly asked questions. I have seen several schemes that attempt to classify correlations as strong, medium, and weak.

However, there is only one correct answer. A Pearson correlation coefficient should accurately reflect the strength of the relationship. Take a look at the correlation between the height and weight data, 0.694. It’s not a very strong relationship, but it accurately represents our data. An accurate representation is the best-case scenario for using a statistic to describe an entire dataset.

The strength of any relationship naturally depends on the specific pair of variables. Some research questions involve weaker relationships than other subject areas. Case in point, humans are hard to predict. Studies that assess relationships involving human behavior tend to have correlation coefficients weaker than +/- 0.6.

However, if you analyze two variables in a physical process and have very precise measurements, you might expect correlations near +1 or -1. There is no one-size-fits-all answer for how strong a relationship should be. The correct values for correlation coefficients depend on your study area.

Taking Correlation to the Next Level with Regression Analysis

Wouldn’t it be nice if instead of just describing the strength of the relationship between height and weight, we could define the relationship itself using an equation? Regression analysis does just that. That analysis finds the line and corresponding equation that provides the best fit to our dataset. We can use that equation to understand how much weight increases with each additional unit of height and to make predictions for specific heights. Read my post where I talk about the regression model for the height and weight data .

Regression analysis allows us to expand on correlation in other ways. If we have more variables that explain changes in weight, we can include them in the model and potentially improve our predictions. And, if the relationship is curved, we can still fit a regression model to the data.

Additionally, a form of the Pearson correlation coefficient shows up in regression analysis. R-squared is a primary measure of how well a regression model fits the data. This statistic represents the percentage of variation in one variable that other variables explain. For a pair of variables, R-squared is simply the square of the Pearson’s correlation coefficient. For example, squaring the height-weight correlation coefficient of 0.694 produces an R-squared of 0.482, or 48.2%. In other words, height explains about half the variability of weight in preteen girls.
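As a quick check of that arithmetic:

```python
r = 0.694                    # height-weight correlation from this post
r_squared = r ** 2           # R-squared for the simple regression of weight on height
print(round(r_squared, 3))   # 0.482: height explains about 48.2% of weight's variance
```

This shortcut only holds for a single predictor; with more variables in the model, R-squared is no longer the square of any one pairwise correlation.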

If you’re learning about statistics and like the approach I use in my blog, check out my Introduction to Statistics book! It’s available at Amazon and other retailers.



Reader Interactions


May 7, 2024 at 9:18 am

Is there any benefit to doing both a correlation and a regression test? I don’t think there is – I believe that a regression output will give you the same information a correlation output would plus more. Please could you let me know if that is correct or am I missing something?


May 7, 2024 at 2:08 pm

Hi Charlotte,

In general, you are correct for simple regression, where you have one independent variable and the dependent variable. The R-squared for that model is literally the square of the Pearson’s correlation (r) for those two variables. As you mention, regression gives you additional output along with the strength of the relationship.

But there are a few caveats.

Regression is much more flexible than correlation because it allows you to add other variables, fit curvature and include interaction effects. For example, regression allows you to fit curvature between the two variables using polynomials. So, there are cases where using Pearson’s correlation is inappropriate because the data violate some of the assumptions but regression analysis can handle those data acceptably.

But what you say is correct when you’re looking at a straight line relationship between a pair of variables. In that specific case, simple regression and Pearson’s correlation provide consistent information with regression providing more details.


March 12, 2024 at 4:11 am

Hi If you are finding the trend between one type of quantitative discrete data and one type of qualitative ordinal data, what correlation test do you use?


September 9, 2023 at 4:46 am

It could be that the sharks are using ice cream as bait. Maybe the sharks are smarter than we think… Seriously, the ice cream as a cause is not likely, but sometimes a perfectly sensible hypothesis with lots of data behind it can be just plain wrong.

September 9, 2023 at 11:43 pm

It can be wrong in a causal sense, but if ice cream sales have a non-causal correlation with the number of shark attacks, they can still help you make predictions. Now, if you thought limiting ice cream sales would reduce shark attacks, that’s not going to work!


June 9, 2023 at 1:56 am

What is to be done when two positive items show a negative correlation? E.g., an increase in house help decreases the number of interruptions at work. It’s confusing as both are positively worded questions.

June 10, 2023 at 1:09 am

It’s possibly the result of other variables, known as confounding variables (or confounders), that you might not even have recorded. For example, there might be some other variable that correlates with both “house help” and “interruptions at work” and explains the unexpected negative correlation. Suppose a “home activities” variable has a positive correlation with “house help” but a negative correlation with “interruptions.” Given that chain of correlations, it wouldn’t be surprising to see a negative correlation between “house help” and “interruptions.”

It goes to show that you need to understand the larger context when analyzing data. Technically, this phenomenon is known as omitted variable bias . Your model (pairwise correlation) omits an important variable (a confounder) which is biasing the results. Click the link to learn more.

The answer is to identify and record the confounding variables and include them in your model, likely a regression model or partial correlation.
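As a sketch of the partial correlation idea, you can remove a recorded confounder’s linear effect from both variables and then correlate the residuals. The variable names and data below are entirely hypothetical:

```python
import numpy as np

# Synthetic illustration: a confounder z drives x and y in opposite directions.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                    # confounder, e.g. "home activities"
x = z + rng.normal(scale=0.5, size=n)     # e.g. "house help", driven partly by z
y = -z + rng.normal(scale=0.5, size=n)    # e.g. "interruptions", driven oppositely by z

def residuals(v, w):
    """Residuals of v after regressing it on w (removes w's linear effect)."""
    slope, intercept = np.polyfit(w, v, 1)
    return v - (intercept + slope * w)

raw_r = np.corrcoef(x, y)[0, 1]                              # distorted by z
partial_r = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
print(f"raw r = {raw_r:.2f}, partial r = {partial_r:.2f}")
```

The raw correlation is strongly negative purely because of the confounder, while the partial correlation, with the confounder’s effect removed, is near zero.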


May 8, 2023 at 12:58 pm

What if my pearson’s r is 0.187 and p-value is 0.001 do i reject the null hypothesis?

May 8, 2023 at 2:56 pm

Yes! That p-value is below any reasonable significance level. Hence, you can reject the null hypothesis. However, be aware that while the correlation is statistically significant, it is so weak that it probably isn’t practically significant in the real world. In other words, it probably exists in the population you’re assessing but it is too weak to be noticeable/meaningful.
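To illustrate with synthetic data how a large sample can make a weak correlation highly significant (SciPy’s `pearsonr` returns both the coefficient and the p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 1000                                  # a large sample detects even weak correlations
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(size=n)          # weak underlying relationship

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.2g}")        # weak r, but p well below 0.05
```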

November 30, 2022 at 4:53 am

Thank you, Jim. I really appreciate your help. I will read your post about statistical v practical significance – that sounds really useful. I love how you explain things in such an accessible way.

I have one more question that I was hoping you would be able to help me with, please?

If I have done a correlation test and I have found an extremely weak negative relationship (e.g., -.02) that is not statistically significant, would this mean that although there is a very weak negative correlation between the variables in the sample data, it is unlikely to exist in the population? Therefore, I would fail to reject the null hypothesis that the correlation in the population equals zero.

Thank you again for your help and for this wonderful blog.

December 1, 2022 at 1:57 am

You’re very welcome!

In the case where the correlation is not significant, it indicates that you have insufficient evidence to conclude that it does not equal zero. That’s a mouthful, but there’s a reason for the convoluted wording. Insignificant results don’t prove that there is no effect; they just indicate that your test didn’t detect an effect in the population. It could be that the effect doesn’t exist in the population OR it could be that your sample size was too small or there’s too much variability in the data.

In short, we say that you failed to reject the null hypothesis.

Basically, you can’t prove a negative (no effect). All you can say is that your study didn’t detect an effect. In this case, it didn’t detect a non-zero correlation.

You can read more about the reason behind the wording failing to reject the null hypothesis and what it means precisely.

November 29, 2022 at 12:39 pm

Thank you for this webpage. It is great. I have a question, which I was hoping you’d be able to help me with please.

I have carried out a correlation test, and from my understanding a null hypothesis would be that there is no relationship between the two variables (the variables are independent – there is no correlation).

The p value is statistically significant (.000), and the Pearson correlation result is -.036.

My understanding is that if there is a statistically significant relationship then I would reject the null hypothesis (which suggests there is no relationship between the two variables). My issue is then whether -.036 suggests a very weak relationship or no relationship at all given how close to 0 it is. If it is the latter, would I then say I have failed to reject the null hypothesis even though there is a statistically significant relationship? Or would I say that I have rejected the null hypothesis because there is a statistically significant relationship, but the correlation is very weak.

Any help would be appreciated. Kind regards.

November 29, 2022 at 4:10 pm

What you’re seeing is the difference between statistical significance and practical significance. Yes, your results are statistically significant. You can reject the null hypothesis that rho (the correlation in the population) equals zero. Your data provide enough evidence to conclude that the negative correlation exists in the population (not just your sample).

However, as you say, it’s an extremely weak relationship. Even though it’s not zero, it is essentially zero in a practical sense. Statistically significant results don’t automatically mean that the effect size (the correlation in this case) is meaningful in the real world. When a test has very high statistical power (e.g., sometimes due to a very large sample size), it can detect trivial effects. Those effects are real but they’re small in size.

I write more about this in my post about statistical vs. practical significance . But, in a nutshell, your correlation coefficient is statistically significant, but it is not a meaningful effect in the real world.


September 28, 2022 at 10:44 am

I have a simple question, only to frame how to use correlation. Imagine a trial with plants, testing different phosphate (Pi) concentrations (like 8) and its effect on plant growth (assessed as mean plant size per Pi concentration, from enough replicates and data validity to perform classical parametric statistics).

In case A, I have a strong (positive) and significant Pearson correlation between these two parameters, and in particular, the 8 average size values show statistically significant differences (ANOVA) between all the Pi concentrations tested.

In case B, I have the same strong (positive) significant Pearson correlation, but there is no statistically significant difference in terms of size between any of the Pi concentrations tested.

My guess is that it may be possible to interpret case A as Pi being correlated with plant growth; but in case B, no interpretation can be provided given that no significant difference is seen between Pi concentrations on plant size, even if a correlation is obtained. Is this right? But in this case, if I have 3 out of the 8 Pi concentrations with significant differences on plant size, should I perform correlation only between the significant Pi groups or could I still take all 8 Pi groups to make interpretations? Thanks in advance!

September 29, 2022 at 7:02 pm

I don’t fully understand your trial. You say that you have a continuous measure of Pi concentration and then average plant sizes. Pearson correlations work with two continuous measures–not a group average. So, you’d need to correlate the Pi concentration with plant size, not average plant size. Or perhaps I’m misunderstanding your description. Please clarify your process. Thanks!

In a more general sense, you have to remember that statistical significance doesn’t necessarily indicate there is a real-world, practical significance to your results. That’s possibly what you’re finding in case B. Although again it’s hard to say if you’re applying correlation to averages.

Statistical significance just indicates that you have reason to believe that a relationship/effect exists in the population. It doesn’t necessarily mean that the effect is large enough to be practically meaningful. For more information, read my post about Practical vs. Statistical Significance .


August 16, 2022 at 11:16 am

This was very educative and easy to follow through for a statistics noob such as me. Thanks! I like your books. Which one is most suited for a beginner level of knowledge?

August 17, 2022 at 12:20 am

My Introduction to Statistics book is the best to get started with for beginners. Click the link to see a post where I discuss it and included a full table of contents.

After reading that, you’d be ready to read both of my other books: Hypothesis Testing and Regression Analysis.


May 16, 2022 at 2:45 pm

Jim, Nassim Taleb makes the point on YouTube (search for Taleb and correlation) that an r = 0.10 is much closer to zero than to r = 0.20, implying that the distribution function for r is very dependent on the r in the population and the sample size, and that the scale of -1.0 to +1.0 is not a scale separated by equal units. He then warns of significance tests because r is a random variable and subject to sampling fluctuations, and r = .25 could easily be zero due to sampling error (especially for small sample sizes). Can you please discuss if the scale of r = -1.0 to 1.0 is set in equidistant units, or units that only superficially look like they are equidistant?

May 16, 2022 at 6:41 pm

I did a quick search and found a video where he’s talking about using correlation in the financial and investment areas. He seems to be saying that correlation is not the correct tool for that context. I can’t talk to that point because I’m not familiar with the context.

However, yes, I can help you out with most of the other points!

I’ll start with the fact that the scale of -1 to +1 is, in some ways, not consistent. To start, correlation coefficients are a standardized effect. As such, they are unitless. You can’t link them to anything real, but they help you compare between disparate types of studies. In other words, they excel at providing a standard basis of comparison between studies. However, they’re not as good for knowing what the statistic actually means, except for a few specific values, -1, +1, and 0. And perhaps that’s why Taleb isn’t fond of them. (At 20 minutes, I didn’t watch the entire video.)

However, we can convert r to R-squared and it becomes more meaningful. R-squared tells us how much of the variance the relationship accounts for. And, as the name implies, you simply square r to get R-squared. It’s in R-squared where you see that the difference between an r of 0.1 and 0.2 differs from, say, 0.8 and 0.9. When you go from 0.1 to 0.2, R-squared increases from 0.01 to 0.04, an increase of 3 percentage points. And note that at those correlations, we’re only explaining between 1–4% of the variance. Virtually nothing! Now, if we look at going from an r of 0.8 to 0.9, R-squared increases from 0.64 to 0.81, or 17 percentage points. So, we have the same size increase in r (0.1) in both cases, but R-squared increases by 3 points in one case and 17 points in the other. Also, notice how at an r of 0.5, you’re only accounting for 25% of the variance. That’s not very much. You need an r of 0.707 to explain half the variance (50%). Another way to think of it is that the range of r [0, 0.707] accounts for half the variance while r [0.707, 1] accounts for the other half.

I agree with the point that r = 0.1 is virtually nothing. In fact, you need an r of 0.316 to explain even a tenth (10%) of the variability. I also agree that fixed differences in r (e.g., 0.1) indicate different changes in the strength of the relationship, as I illustrate above. I think those points are valid.
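The arithmetic above is easy to verify directly:

```python
# Equal steps in r are unequal steps in R-squared.
for r1, r2 in [(0.1, 0.2), (0.8, 0.9)]:
    print(f"r {r1} -> {r2}: R-squared {r1**2:.2f} -> {r2**2:.2f} "
          f"(gain {r2**2 - r1**2:.2f})")

# The r needed to explain half the variance:
print(f"r for 50% of variance: {0.5 ** 0.5:.3f}")   # about 0.707
```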

Below, I include a graph showing r vs. R-squared and the curved line indicates that the relationship between the two statistics changes (the inconsistency you mention). If the relationship was consistent, it would be a straight line. For me, R-squared is the better statistic, particularly in conjunction with regression analysis, which provides more information about the nature of the relationships. Of course, the negative range of r produces the mirror graph but the same ideas apply.

Graph displaying the relationship between r and R-squared.

I think correlation coefficients (r) have some other shortcomings. They describe the strength of the relationship but not the actual relationship. And they don’t account for other variables. Regression analysis handles those aspects, and I generally prefer that methodology. For me, simple correlation just doesn’t provide enough information by itself in most cases. You also typically don’t get residual plots, which you’d need to check that you’re satisfying the assumptions (Pearson’s correlation (r) is essentially a linear model).

The sample r does depend on the relationship in the population. But that’s true for all sample statistics–as I write in my post, Sample Statistics Are Always Wrong to Some Extent! I don’t think it’s any worse for correlation than other types of sample statistics. As you increase your sample size, the estimate’s precision will increase (i.e., the error bars become smaller).

I think significance tests are valid for correlation. Yes, it’s subject to sampling fluctuations ( sampling error ) but so are all sample based statistics. Hypothesis testing is designed to factor that in. In fact, significance testing specifically helps you distinguish between cases where the sample r = 0.25 might represent 0 in the population vs. cases where that is unlikely. That’s the very intention of significance testing, so I strongly disagree with that point!


April 9, 2022 at 2:20 am

Thank you for the fast response!! I have also read the Spearman’s Rho article (very insightful). My scatterplot suggests that there is no correlation (completely random distribution). However, I would still like to test the correlation, but in the Spearman’s Rho article you mentioned that if there is no correlation, both the Spearman’s rho value and Pearson’s correlation value would be close to zero. Is it also possible that one value is positive and one is negative? My results right now are R2 Linear = 0.003, Pearson correlation = .058, and Spearman’s correlation coefficient = -0.19. Should I base the rejection of either of my hypotheses on Spearman’s value or Pearson’s value?

Thank you so much!!!

April 9, 2022 at 10:42 pm

I’m glad that it was helpful! It’s definitely possible for correlations to switch directions like that. That’s especially true because both correlations are barely different from zero. So, it wouldn’t take much to cause them to be on opposite sides of zero. The R-squared is telling you that the Pearson’s correlation explains hardly any of the variability.


April 8, 2022 at 7:05 pm

Thank you for this post!! I was wondering, I did a scatterplot which gave me an R2 value of 0.003. The fit line showed a really weak positive correlation which I wanted to test with Spearman’s rho. However, this value is showing a negative value (negative relationship). Do you maybe know why it is showing different correlations since I am using the exact same values?

April 8, 2022 at 7:51 pm

The R-squared value and slope you’re seeing are related to Pearson’s correlation, which differs from Spearman’s rho. They’re different statistical measures using different methods, so it’s not surprising that their values can differ. For more information, read my post about Spearman’s Rho.
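A quick sketch of the difference using SciPy’s implementations on made-up data: Spearman’s rho works on ranks while Pearson’s r works on the raw values, so for near-zero relationships the two can even land on opposite sides of zero.

```python
import numpy as np
from scipy import stats

# Two unrelated variables, so both measures should be near zero
# but they will generally not be identical.
rng = np.random.default_rng(7)
x = rng.normal(size=30)
y = rng.normal(size=30)

pearson_r, _ = stats.pearsonr(x, y)       # based on the raw values
spearman_rho, _ = stats.spearmanr(x, y)   # based on the ranks
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```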


April 6, 2022 at 3:37 am

Hi Jim, I had a question. It’s kinda complicated but I try my best to explain it well.

I ran a correlation test between objective social isolation (OSI) and subjective social isolation (SSI). To measure OSI, I used an instrument called LSNS-6, while I used the R-UCLA Loneliness Scale to measure SSI. Here is the scoring guide for the instruments: a higher score on LSNS-6 = low objective social isolation; a higher score on the R-UCLA Loneliness Scale = high subjective social isolation.

After I run the correlation test, I found the value was r= -.437.

My question is, does the value represent a correlation between the variables (meaning when someone is objectively isolated, they are less likely to be subjectively isolated and vice versa) OR a correlation between the scores of the instruments used (meaning when someone scores higher on LSNS-6, they will have a lower score on the R-UCLA Loneliness Scale and vice versa)? I had confusion due to the scoring guide. I hope you can help me.

Thank you Jim!

April 8, 2022 at 8:17 pm

This specific correlation is a bit tricky because, based on what you wrote, the LSNS-6 is inverted. High LSNS-6 scores correspond to low objective social isolation. Let’s work through this example.

The negative correlation (-0.437) indicates that high LSNS-6 scores tend to correlate with low R-UCLA scores. Now, if we “translate” the instrument measures into what the scores mean as constructs, low objective social isolation tends to correspond to low subjective social isolation.

In other words, there is a negative correlation between the instrument scores. However, there is a positive correlation between the concepts of objective social isolation and subjective isolation, which makes theoretical sense.

The reason why the instrument scores have a negative correlation while the constructs have a positive correlation goes back to the fact that high LSNS-6 scores relate to low objective isolation.

I hope that helps!


April 2, 2022 at 7:16 am

Thanks so much for the highly helpful statistical resources on this website. I am a bit confused about an analysis I carried out. My scatter plot shows a kind of negative relationship between two variables, but my Pearson’s correlation coefficient results seem to say something different: r = -0.198 and a p-value of 0.082. I would appreciate clarification on this.

April 4, 2022 at 3:56 pm

I’m not sure what is surprising you? Can you be more specific?

It sounds like your scatterplot displays a negative relationship and your correlation coefficient is also negative, which sounds consistent. It’s a fairly weak correlation. The p-value indicates that your data don’t provide quite enough evidence to conclude that the correlation you see in the sample via the scatterplot and correlation coefficient also exists in the population. It might just be sampling error.


January 14, 2022 at 8:31 am

Hi Jim, Andrew here.

I am using a Pearson test for two variables: LifeSatisfaction and JobSatisfaction. I have gotten a p-value of 0.000 whilst my r-value is 0.338. Can you explain to me what relation this is? Am I right in thinking that is strong significance with a weak correlation? And that there is no significant correlation between the two?

January 14, 2022 at 4:59 pm

What you’re running into is the difference between statistical significance and practical significance in the real world. A statistically significant result, such as your correlation, suggests that the relationship you observe in your sample also exists in the population as a whole. However, statistical significance says nothing about how important that relationship is in a practical sense.

Your correlation results suggest that a positive correlation exists between life satisfaction and job satisfaction amongst the population from which you drew your sample. However, the fairly weak correlation of 0.338 might not be of practical significance. People with satisfying jobs might be a little happier but perhaps not to a noticeable degree.

So, for your correlation, statistical significance–yes! Practical significance–maybe not.

For more information, read my post about statistical significance vs. practical significance where I go into it in more detail.


January 7, 2022 at 7:07 pm

Thank you, Jim, will do.


January 7, 2022 at 5:07 pm

Hello Jim, I just came across this website. I have a query.

I wrote the following for a report: Table 5 shows the associations between all the domains. The correlation coefficients between the environment and the economy, social, and culture domains are rs=0.335 (weak), rs=0.427 (low) and rs=0.374 (weak), respectively. The correlation coefficients between the economy and the social and culture domains are rs=0.224 and rs=0.157, respectively, and are negligible. The correlation coefficient (rs=0.451) between the social and the culture domains is low, positive, and significant. These weak to low correlation coefficient values imply that changes in one domain are not correlated strongly with changes in the related domain.

The comment I received was: Correlation studies are meant to see relationships- not influence- even if there is a positive correlation between x and y, one can never conclude if x or y is the reason for such correlation. It can never determine which variables have the most influence. Thus the caution and need to re-word for some of the lines above. A correlation study also does not take into account any extraneous variables that might influence the correlation outcome.

I am not sure how I should reword? I have checked several sources and their interpretations are similar to mine, Please advise. Thank you

January 7, 2022 at 9:25 pm

Personally, I think your wording is fine. Appropriately, you don’t suggest that correlation implies causation. You state that there is correlation. So, I’m not sure why the reviewer has an issue with it.

Perhaps the reviewer wants an explicit statement to that effect? “As with all correlation studies, these correlations do not necessarily represent causal relationships.”

The second portion of the review comment about extraneous variables is, in my opinion, more relevant. Pairwise correlations don’t control for the effects of other variables. Omitted variable bias can affect these pairs. I write about this in a post about omitted variable bias . These biases can exaggerate or minimize the apparent strength of pairwise correlations.

You can avoid that problem by using partial correlations or multiple regression analysis. Although, it’s not necessarily a problem. It’s just a possibility.

January 5, 2022 at 8:52 pm

Is it possible to compare two correlation coefficients? For example, let’s say that I have three data points (A, B, and C) for each of 75 subjects. If I run a Pearson’s on the A&B survey points and receive a result of .006, while the Pearson’s on the A&C survey points is .215…although both are not significant, can I say that there is a stronger correlation between A&C than between A&B? thank you!

January 6, 2022 at 8:31 pm

I am not aware of a test that will assess whether the difference between two correlation coefficients is statistically significant. I know you can do that with regression coefficients, so you might want to determine whether you can use that approach. Click the link to learn more.

However, I can guess that your two coefficients probably are not significantly different and thus you can’t say one is higher. Each of your hypothesis tests is assessing whether one of the coefficients is significantly different from zero. In both cases (0.006 and 0.215), neither is significantly different from zero. Because both of your coefficients are on the same side of zero (positive), the distance between them is even smaller than your larger coefficient’s (0.215) distance from zero. Hence, that difference probably is also not statistically significant. However, one muddling issue is that with the two datasets combined you have a larger total sample size than either alone, which might allow a supposed combined test to determine that the smaller difference is significant. But that’s uncertain and probably unlikely.

There’s a more fundamental issue to consider beyond statistical significance . . . practical significance. The correlation of 0.006 is so small it might as well be zero. The other is 0.215 (which according to the hypothesis test, also might as well be zero). However, in practical terms, a correlation of 0.215 is also a very weak correlation. So, even if its hypothesis test said it was statistically significant from zero, it’s a puny correlation that doesn’t provide much predictive power at all. So, you’re looking at the difference between two practically insignificant correlations. Even if the larger sample size for a combined test did indicate the difference is statistically significant, that difference (0.215 – 0.006 = 0.209) almost certainly is not practically significant in a real-world sense.

But, if you really want to know the statistical answer, look into the regression method.

May 16, 2022 at 2:57 pm

Jim – here is a YT video purporting to demonstrate how to compare correlation coefficients for statistical significance. I’m not a statistician and cannot vouch for the contents. https://www.youtube.com/watch?v=ipqUoAN2m4g

May 16, 2022 at 7:22 pm

That seems like a very non-standard approach in the YT video. And, with a sample size of 200 (100 males, 100 females), even very small effect sizes should be significant. So, I have some doubts about that process, but I haven’t dug into it. It might be totally valid, but it seems inefficient in terms of statistical power for the sample size.

Here’s how I would’ve done that analysis. Instead of correlation, I’d use regression with an interaction effect. I’d want to model the relationship between the amount of time studying for a test and the scores. Suppose I also gather 100 males and 100 females and want to see if the relationship between time studying and test scores differs between genders. In regression, that’s an interaction effect. It’s the same question the YT video assesses, but using a different approach that provides a whole lot more answers.

To see that approach in action, read my post about Comparing Regression Lines Using Hypothesis Tests . In that post, I refer to comparing the relationships between two conditions, A and B. You can equate those two conditions to gender (male and female). And I look at the relationship between Input and Output, which you can equate to Time Studying and Test Score, respectively. While reading that post, notice how much more information you obtain using that approach than just the two correlation coefficients and whether they’re significantly different.

That’s what I mean by generally preferring regression analysis over simple correlation.
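As a sketch of that interaction approach with plain NumPy rather than dedicated regression software (the data are synthetic), the interaction coefficient estimates how much the slope differs between the two groups:

```python
import numpy as np

# Synthetic data: scores rise with study hours, and group 1's slope
# is 2 points/hour steeper than group 0's (the interaction effect).
rng = np.random.default_rng(1)
n = 100
hours = rng.uniform(0, 10, size=2 * n)
group = np.repeat([0, 1], n)                       # 0 = condition A, 1 = condition B
score = 50 + 3 * hours + 2 * hours * group + rng.normal(scale=5, size=2 * n)

# Design matrix: intercept, hours, group, hours*group (the interaction term)
X = np.column_stack([np.ones(2 * n), hours, group, hours * group])
coefs, *_ = np.linalg.lstsq(X, score, rcond=None)
print(f"slope for group 0: {coefs[1]:.2f}, extra slope for group 1: {coefs[3]:.2f}")
```

A significant interaction coefficient would indicate that the relationship between study time and scores differs between the groups, which is the same question the two-correlation comparison asks.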


December 9, 2021 at 7:33 pm

Hi Jim, thank you very much for this explanation. I’m working on an article and I want to calculate the sample size in order to critique the sample size used. Is it possible to deduce the p-value from the graph and then apply the rule to deduce N?

December 12, 2021 at 11:57 pm

Unfortunately, I don’t speak French. However, I used Google Translate and I think I understand your question.

No, you can’t calculate the p-value by looking at a graph. You need the actual data values to do that. However, there is another approach you can use to determine whether they have a reasonable sample size.

You can use power and sample size software (such as the free G*Power ) to determine a good sample size. Keep in mind that the sample size you need depends on the strength of the correlation in the population. If the population has a correlation of 0.3, then you’ll need 67 data points to obtain a statistical power of 0.8. However, if the population correlation is higher, the required sample size declines while maintaining the statistical power of 0.8. For instance, for population correlations of 0.5 and 0.8, you’ll only need sample sizes of 23 and 8, respectively.
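Those figures can be approximated in code with the Fisher z normal approximation (a one-tailed test, which matches the numbers above; this approximation can differ from G*Power’s exact calculation by a point or two):

```python
import math
from scipy.stats import norm

def n_for_correlation(rho, alpha=0.05, power=0.8):
    """Approximate sample size for detecting a population correlation rho
    with a one-tailed test, via the Fisher z normal approximation."""
    z_alpha = norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    c = 0.5 * math.log((1 + rho) / (1 - rho))   # Fisher z transform of rho
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)

for rho in (0.3, 0.5, 0.8):
    print(rho, n_for_correlation(rho))
```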

Using this approach, you’ll at least be able to determine whether they’re using a reasonable sample size given the size of correlation that they report even though you won’t know the p-value.

Hopefully, they reported the sample size, but, if not, you can just count the number of dots on the scatterplot.


November 19, 2021 at 4:47 pm

Hi Jim. How do I interpret r(12) = -.792, p < .001 for the Pearson correlation coefficient?


October 26, 2021 at 4:53 am

Hi, if the correlation between the two independent constructs/variables and the dependent variable/construct is medium or large, what must the manager do to improve the two independent constructs/variables?


October 7, 2021 at 1:12 am

Hi Jim, First of all thank you, this is an excellent resource and has really helped clarify some queries I had. I have run a Pearson’s r test on some stats software to analyse the relationship between increasing age and need for friendship. The result is r = 0.052 and p = 0.381. Am I right in assuming there is a very slight positive correlation between the variables but one that is not statistically significant, so the null hypothesis cannot be rejected? Kind regards

October 7, 2021 at 11:26 pm

Hi Victoria,

That correlation is so close to 0 that it essentially means there is no relationship between your two variables. In fact, it’s so close to zero that calling it a very slight positive correlation might be exaggerating a bit.

As for the p-value, you’re correct. It’s testing the null hypothesis that the correlation equals zero. Because your p-value is greater than any reasonable significance level, you fail to reject the null. Your data provide insufficient evidence to conclude that the correlation doesn’t equal zero (no effect).

If you haven’t already, you should graph your data in a scatterplot. Perhaps there’s a U-shaped relationship that Pearson’s won’t detect?
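A tiny example of why that check matters: a perfect U-shaped relationship produces a Pearson’s r of essentially zero.

```python
import numpy as np

# A perfectly U-shaped relationship that Pearson's r completely misses.
x = np.linspace(-3, 3, 61)
y = x ** 2                      # strong relationship, but not linear

r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))              # essentially 0 despite the perfect relationship
```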


July 21, 2021 at 11:23 pm

No Jim, I mean to ask: let's assume the correlation between variable x and y is 0.91. How do we interpret the remaining 0.09, assuming a correlation of 1 is a strong positive linear correlation?

Is this because of diversification, correlation residual or any error term?

July 21, 2021 at 11:29 pm

Oh, ok. Basically, you’re asking why it’s not a perfect correlation of 1? What explains that difference of 0.09 between the observed correlation and 1? There are several reasons. The typical reason is that most relationships aren’t perfect. There’s usually a certain amount of inherent uncertainty between two variables. It’s the nature of the relationship. Occasionally, you might find very near perfect correlations for relationships governed by physical laws.

If you were to have a pair of variables that should have a perfect correlation for theoretical reasons, you might still observe an imperfect correlation thanks to measurement error.

July 20, 2021 at 12:49 pm

If two variables have a correlation of 0.91, what is 0.09 in the equation?

July 21, 2021 at 10:59 pm

I’d need more information/context to be able to answer that question. Is it a regression coefficient?


June 30, 2021 at 4:21 pm

You are a great resource. Thank you for being so responsive. I’m sure I’ll be bugging you some more in the future.

June 30, 2021 at 12:48 pm

Jim, using Excel, I just calculated that the correlation between two variables (A and B) is .57, which I believe you would consider to be “moderate.” My question is, how can I translate that correlation into a statement that predicts what would happen to B if A goes up by 1 point. Thanks in advance for your help and most especially for your clarity.

June 30, 2021 at 2:59 pm

Hi Gerry, to get that type of information, you'll need to use regression analysis. Read my post about using Excel to perform regression for details. For your example, be sure to use A as the independent variable and B as the dependent variable. Then look at the regression coefficient for A to get your answer!


May 24, 2021 at 11:51 pm

Hey Man, I’m taking my stats final this week and I’m so glad I found you! Thank you for saving random college kids like me!


May 19, 2021 at 8:38 am

Hi, I am Nasib Zaman. The Spearman correlation between high temperature and COVID-19 cases was significant (r = 0.393). The correlation between UV index and COVID-19 cases was also significant (r = 0.386). Is it true?

May 20, 2021 at 1:31 am

Both suggest that as temperature and UV increase, the number of COVID cases increases, although those are weak correlations. I don't know whether that's true or not. You'd have to assess the validity of the data to make that determination. Additionally, there might be confounding variables at play, which could bias the correlations. I have no way of knowing.


April 12, 2021 at 1:49 pm

I am using Pearson's correlation coefficient to express the strength of the relationship between my two variables on happiness. Would this be an appropriate use?

Pearson correlation matrix (N = 1297 for every cell; Sig. 1-tailed = 0.00 for every pair):

                          Happiness    Diet    RelationshipSatisfaction
Happiness                   1.000      .310             .416
Diet                         .310     1.000             .193
RelationshipSatisfaction     .416      .193            1.000

If so, would I be right to say that because the coefficient was r = .193, it suggests that there is not too strong a relationship between the two independent variables? Can I use anything else to indicate significance levels?


March 29, 2021 at 3:12 am

I just want to say that your posts are great, but the QA section in the comments is even greater!

Congrats, Jim.

March 29, 2021 at 2:57 pm

Thanks so much!! 🙂

And, I’m really glad you enjoy the QA in the comments. I always request readers to post their questions in the comments section of the relevant post so the answers benefit everyone!


March 24, 2021 at 1:16 am

Thank you very much. This question was troubling me since last some days , thanks for helping.

Have a nice day…

March 24, 2021 at 1:34 am

You’re very welcome, Ronak! I’m glad to help!


March 22, 2021 at 12:56 pm

Nalin here. I found your article to be very clarifying conceptually. I had a doubt.

So there is this dataset I have been working on, and I calculated the Pearson correlation coefficient between the target variable and the predictor variables. I found that none of the predictor variables had a correlation above 0.1 or below -0.1 with the target variable, hence indicating that no linear relationship exists between them.

How can I verify whether any non-linear relationships exist between these pairs of variables? Will a scatterplot confirm my claims?

March 23, 2021 at 3:09 pm

Yes, graphing the data in a scatterplot is always a good idea. While you might not have a linear relationship, you could have a curvilinear relationship. A scatterplot would reveal that.

One other thing to watch out for is omitted variable bias. When you perform correlation on a pair of variables, you're not factoring in other relevant variables that can be confounding the results. To see what I mean, read my post about omitted variable bias. In it, I start with a correlation that appears to be zero even though there actually is a relationship. After I accounted for another variable, there was a significant relationship between the original pair of variables! Just another thing to watch out for that isn't obvious!

March 20, 2021 at 3:23 am

Yes, I am also doing well…

I am having some subsequent queries…

By overall trend, you mean that the correlation coefficient will capture how y is changing with respect to x (meaning y increases or decreases as x increases or decreases). Am I interpreting that correctly?

[Image: scatterplot of the data (red) with a linear fit line (green)]

March 22, 2021 at 12:25 am

This is something that should be clear from examining the scatterplot. Will a straight line fit the dots? Do the dots fall randomly about a straight line, or are there patterns? If a straight line fits the data, Pearson's correlation is valid. However, if it does not, then Pearson's is not valid. Graphing is the best way to make the determination.

Thanks for the image.

March 23, 2021 at 3:41 pm

Hi again Ronak!

On your graph, the data points are the red line (actually lots and lots of data points, not really a line!). And the green line is the linear fit. You don't usually think of Pearson's correlation as modeling the data, but it uses a linear fit. So, the green line is how Pearson's correlation models your data. You can see that the model doesn't fit the data adequately. There are systematic (i.e., non-random) departures from the data points. Right there you know that Pearson's correlation is invalid for these data.

Your data have an upward trend. That is, as X increases, Y also increases. And Pearson's partially captures that trend, hence the positive slope for the green line and the positive correlation you calculated. But it's not perfect. You need a better model! In terms of correlation, the graph displays a monotonic relationship, and Spearman's correlation would be a good candidate. Or, you could use regression analysis and include a polynomial to model the curvature. Either of these methods will produce a better fit and more accurate results!

March 18, 2021 at 11:01 am

I am Ronak from India. How are you? I hope corona has not troubled you much. You have simplified the concept very well, and you are doing an amazing job. I have one doubt and want to clarify it.

Question: whenever we talk about the correlation coefficient, we talk in terms of a linear relationship, but I have calculated the correlation coefficient for the relationship Y vs X^3.

X variable: 1 to 10,000; Y = X^3

The correlation coefficient comes out to around 0.9165. It is strange that even though the relationship is not linear, it still gives me a very high correlation coefficient.

March 19, 2021 at 3:53 pm

I’m doing well here. Just hunkering down like everyone else! I hope you’re doing well too! 🙂

For your data, I'd recommend graphing them in a scatterplot and fitting a linear trend line. You can do that in Excel. If your data follow an S-shaped cubic relationship, it is still possible to get a relatively strong correlation. You'll be able to see how that happens in the scatterplot with the trend line. There's an overall trend to the data that your line follows, but it doesn't hug the curves. However, if you fit a model with a cubic term to fit the curves, you'll get a better model.

So, let’s switch from a correlation to R-squared. Your correlation of 0.9165 corresponds to an R-squared of 0.84. I’m literally squaring your correlation coefficient to get the R-squared value. Now, fit a regression model with the quadratic and cubic terms to fit your data. You’ll find that your R-squared for this model is higher than for the linear model.

In short, the linear correlation is capturing the overall trend in the data but doesn’t fit the data points as well as the model designed for curvilinear data. Your correlation seems good but it doesn’t fully fit the data.


March 11, 2021 at 10:56 am

Hi Jim, does partial correlation always require continuous (scale) variables? Is it possible to include other types of variables (such as nominal or ordinal)? Regards, Jagar

March 16, 2021 at 12:30 am

Pearson correlations are for continuous data that follow a linear relationship. If you have ordinal data or continuous data that follow a monotonic relationship, you can use Spearman’s correlation.

There are correlations specifically for nominal data. I need to write a blog post about those!


March 10, 2021 at 11:45 am

If the correlation coefficient is 0.153, what type of correlation is it?

February 14, 2021 at 1:49 pm


February 12, 2021 at 8:09 pm

If my r value when finding the correlation between two things is -0.0258, would that be a weak negative correlation or something else?

February 14, 2021 at 12:08 am

Hi Dez, your correlation coefficient is essentially zero, which indicates no relationship between the variables. As one variable increases, there is no tendency for the other variable to either increase or decrease. There's just no relationship between them according to your data.


January 9, 2021 at 12:10 pm

My correlation coefficients between my independent variables (anger, anxiety, happiness, satisfaction) and a dependent variable (entrepreneurial decision-making behavior) are 0.401, 0.303, 0.369, and 0.384, respectively.

What does this mean? How do I interpret and explain this? What's the relationship?

January 10, 2021 at 1:33 am

It means that separately each independent variable (IV) has a positive correlation with the dependent variable (DV). As each IV increases, the DV tends to increase. However, these are fairly weak correlations. Additionally, these correlations don't control for confounding variables. You should perform a regression analysis because you have your IVs and DV. Your model will tell you how much variability the IVs account for in the DV collectively. And it will control for the other variables in the model, which can help reduce omitted variable bias.

The information in this post should help you interpret your correlation coefficients. Just read through it carefully.


January 4, 2021 at 6:20 am

Hello there, If one were to find out the correlation between the average grade and a variable, could this coefficient be used? Thanks!

January 4, 2021 at 4:03 pm

If you mean something like an average grade per student and the other variable is something like the number of hours each student studies, yes, that’s fine. You just need to be sure that the average grade applies to one person and that the other variable applies to the same person. You can’t use a class average and then the other variable is for individuals.


December 27, 2020 at 8:27 am

I'm helping a friend working on a paper and don't have the variables. The question centers around the nature of Criterion Referenced Tests, in general, i.e., correlations of CRT vs. Norm Referenced Tests. As you know, Norm Referenced compares students to each other across a wide population. In this paper, the student is creating a teacher-made CRT. It is measuring proficiency of students of more similar abilities and a smaller population against criteria, not against each other. I suspect, in general, the CRT doesn't distinguish as well between students with similar abilities and knowledge. Therefore, the reliability coefficients, in general, are less reliable. How does this affect high or low correlations?

December 26, 2020 at 9:40 pm

Is a high or low correlation on a CRT proficiency test good or bad?

December 27, 2020 at 1:30 am

Hi Raymond, I’d have to know more about the variables to have an idea about what the correlation means.


December 8, 2020 at 11:02 pm

I have zero statistics experience but I want to spice up a paper that I’m writing with some quants. And so learned the basics about Pearson correlation on SPSS and I plugged in my data. Now, here’s where it gets “interesting.” Two sets of numbers show up: One on the Pearson Correlation row and below that is the Sig. (2-tailed) row.

I’m too embarrassed to ask folks around me (because I should already know this!). So, let me ask you: which of the row of numbers should I use in my analysis about the correlations between two variables? For example, my independent variable correlates with the dependent variable at -.002 on the first (Pearson Correlation) row. But below that is the Sig. (2-tailed) .995. What does that mean? And is it necessary to have both numbers?

I would really appreciate your response … and will acknowledge you (if the paper gets published).

Many thanks from an old-school qualitative researcher struggling in the times of quants! 🙂

December 9, 2020 at 12:32 am

The one you want to use for a measure of association is the Pearson Correlation. The other value is the p-value. The p-value is for a hypothesis test that determines whether your correlation value is significantly different from zero (no correlation).

If we take your -0.002 correlation and its p-value (0.995), we'd interpret that as meaning that your sample contains insufficient evidence to conclude that the population correlation is not zero. Given how close the correlation is to zero, that's not surprising! Zero correlation indicates there is no tendency for one variable to either increase or decrease as the other variable increases. In other words, there is no relationship between them.


November 24, 2020 at 7:55 am

Thank you for the good explanation. I am looking for the source or an article that states that most correlations regarding human behaviour are around .6. What source did you use?

Kind regards, Amy


November 13, 2020 at 5:27 am

This is an informative article and I agree with most of what is said, but this particular sentence might be misleading to readers: “R-squared is a primary measure of how well a regression model fits the data.”. R-squared is in fact based on the assumption that the regression model fits the data to a reasonable extent therefore it cannot also simultaneously be a measure of the goodness of said fit.

The rest of the claims regarding R-squared I completely agree with.

Cheers, Georgi

November 13, 2020 at 2:48 pm

Yes, I make that exact point repeatedly throughout multiple blog posts, particularly my post about R-squared .

Additionally, R-squared is a goodness-of-fit measure, so it is not misleading to say that it measures how well the model fits the data. Yes, it is not a 100% informative measure by itself. You’d also need to assess residual plots in conjunction with the R-squared. Again, that’s a point that I make repeatedly.

I don’t mind disagreements, but I do ask that before disagreeing, you read what I write about a topic to understand what I’m saying. In this case, you would’ve found in my various topics about R-squared and residual plots that we’re saying the same thing.


November 7, 2020 at 12:31 pm

Thank you very much!

November 6, 2020 at 7:34 pm

Hi Jim, I have a question for you – and thank you in advance for responding to it 🙂

Set A has a correlation coefficient of .25 and Set B has a correlation of .9. Which set has the steeper trend line: A or B?

November 6, 2020 at 8:41 pm

Set B has a stronger relationship. However, that’s not quite equivalent to saying it has a steeper trend line. It means the data points fall closer to the line.

If you look at the examples in this post, you’ll notice that all the positive correlations have roughly equal slopes despite having different correlations. Instead, you see the points moving closer to the line as the strength of the relationship increases. The only exception is that a correlation of zero has a slope of zero.

The point being that you can’t tell from the correlation alone which trend line is steeper. However, the relationship in Set B is much stronger than the relationship in Set A.


October 19, 2020 at 6:33 am

Thank you 😊. Now I understand.

October 11, 2020 at 4:49 am

hi, I’m a little confused.

What does it indicate if there is a positive correlation but a negative coefficient in the multiple regression output? In this situation, how do I interpret it? Is the relationship negative or positive?

October 13, 2020 at 1:32 pm

This is likely a case of omitted variable bias. A pairwise correlation involves just two variables. Multiple regression analysis involves three variables at a minimum (2 IVs and a DV). Correlation doesn't control for other variables, while regression analysis controls for the other variables in the model. That can explain the different relationships. Omitted variable bias occurs under specific conditions. Click the link to read about when it occurs. I include an example where I first look at a pair of variables and then three variables and show how that changes the results, similar to your example.


September 30, 2020 at 4:26 pm

Hi Jim, I have 4 objectives in my research, and when I did the correlation between the first one and the others, the results were: ob1 with ob2 is 0.87, ob1 with ob3 is 0.84, and ob1 with ob4 is 0.83. My question is: what does that mean, and can I compute the correlation coefficients among all of them at one time?


September 28, 2020 at 4:06 pm

Which best describes the correlation coefficient for r=.08?

September 30, 2020 at 4:29 pm

Hi Jolette,

I’d say that is an extremely weak correlation. I’d want to see its p-value. If it’s not significant, then you can’t conclude that the correlation is different from zero (no correlation). Is there something else particular you want to know about it?


September 15, 2020 at 11:50 am

Correlation result between Vul and FCV

t = 3.4535, df = 306, p-value = 0.0006314
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.08373962 0.29897226
sample estimates:
      cor
0.1936854

What does this mean?

September 17, 2020 at 2:53 am

Hi Lakshmi,

It means that your correlation coefficient is ~0.19. That's the sample estimate. However, because you're working with a sample, there's always sampling error, and so the population correlation is probably not exactly equal to the sample value. The confidence interval indicates that you can be 95% confident that the true population correlation falls between ~0.08 and ~0.30. The p-value is less than any common significance level. Consequently, you can reject the null hypothesis that the population correlation equals zero and conclude that it does not equal zero. In other words, the correlation you see in the sample is likely to exist in the population.

A correlation of 0.19 is a fairly weak relationship. However, even though it is weak, you have enough evidence to conclude that it exists in the population.
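The output above can be reconstructed by hand, which makes each piece concrete. This sketch (standard library only) recovers the t statistic from r and the degrees of freedom, and the 95% confidence interval via the Fisher z transformation, matching the reported values:

```python
import math
from statistics import NormalDist

r, df = 0.1936854, 306          # values from the output above
n = df + 2                      # df = n - 2 for a correlation test, so n = 308

# t statistic for H0: population correlation = 0
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# 95% CI via Fisher z: transform r, add the margin on the z scale, transform back
z = math.atanh(r)
se = 1 / math.sqrt(n - 3)
margin = NormalDist().inv_cdf(0.975) * se
lo, hi = math.tanh(z - margin), math.tanh(z + margin)

print(round(t, 4))                  # ~3.4535
print(round(lo, 4), round(hi, 4))   # ~0.0837 0.2990
```

Working backward like this also confirms the sample size: df = 306 means n = 308 pairs of observations.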


September 1, 2020 at 8:16 am

Hi Jim, thank you for your support. I have a question. When testing validity with Pearson correlation, the r-table value is determined by df = N - 2. The item is valid if the calculated Pearson correlation is greater than the r-table value (Pearson correlation > r table), and invalid if it is less (Pearson correlation < r table). I got this information from an SPSS tutorial video about Pearson correlation.

But I didn't find it in other literature. Can you recommend some literature that covers this, or clarify how to check validity with Pearson correlation?


August 31, 2020 at 3:21 am

Hi Jim, I am Zia from Pakistan. I want to find the correlation of two factors. I got 144.6 and 66.93. Is that a positive relation?

August 31, 2020 at 12:39 pm

Hi Zia, I’m sorry but I’m not clear about what you’re asking. Correlation coefficients range between -1 and +1, so those two values are not correlation coefficients. Are they regression coefficients?


August 16, 2020 at 6:47 am

Warmest greetings.

My name is Norshidah Nordin and I am very grateful if you could provide me some answers to the following questions.

1) Can I use two different sets of samples (e.g., students' academic performance (CGPA) as one variable and teachers' self-efficacy as the other) to run a Pearson correlation analysis? If yes, could you elaborate on this aspect.

2) What is the minimum sample size to use in multiple regression analysis?

August 17, 2020 at 9:06 pm

Hi Norshidah,

For correlations, you need to have multiple measurements on the same item or person. In your scenario, it sounds like you’re taking different measurements on different people. Pearson’s correlation would not be appropriate.

The minimum sample size for multiple regression depends on the number of terms you need to include in your model. Read my post about overfitting regression models , which occurs when you have too few observations for the number of model terms.

I hope this helps!


July 29, 2020 at 5:27 pm

Greetings sir, a question: can you do an accurate regression with a Pearson's correlation coefficient of 0.10? Why or why not?

July 31, 2020 at 5:33 pm

Hi Monique,

It is possible. First, you should determine whether that correlation is statistically significant. You're seeing a correlation in your sample, but you want to be confident that it also exists in the larger population you're studying. There's a possibility that the correlation only exists in your sample by random chance and does not exist in the population, particularly with such a low coefficient. So, check the p-value for the coefficient. If it's significant, you have reason to proceed with the regression analysis. Additionally, graph your data. Pearson's is only for linear relationships. Perhaps your coefficient is low because the relationship is curved?

You can fit the regression model to your data. A correlation of 0.10 equates to an R-squared of only 0.01, which is very low. Perhaps adding more independent variables will increase the R-squared. Even if the R-squared stays very low, if your independent variable is significant, you're still learning something from your regression model. To understand what you can learn in this situation, read my post about regression models with significant variables and low R-squared values.

So, it is possible to do a valid regression and learn useful information even when the correlation is so low. But, you need to check for significance along the way.


July 8, 2020 at 4:55 am

Hello Jim, first and foremost thank you for giving us comprehensive information regarding this! It totally helped me. But I have a question; my Pearson results show that there's a moderate positive relationship between my variables, which are Parasocial Interaction and the fans' purchase intention.

But the thing is, if I look at the answers, the majority of my participants answered Neutral regarding purchase intention.

What does this mean? Could you help me figure this out? Thank you in advance! I'm a student from Malaysia currently doing my thesis.

July 8, 2020 at 4:00 pm

Hi Titania,

Have you graphed your data using a scatterplot? I'd highly recommend that because I think it will probably clarify what your data are telling you. Also, are both of your variables continuous variables? I wonder whether purchase intention is ordinal, given that one of the values is Neutral. If that's the case, you'd need to use Spearman's Rank Correlation rather than Pearson's.


June 18, 2020 at 8:57 am

Hello Jim! I have a question. I calculated a correlation coefficient between the scale variables and got 0.36, which is relatively weak since it gives about 0.13 if squared. What does the interpretation of correlation depend on? The sample taken, the type of data measurement, or something else?

I hope you got my question. Thank you for your help!!

June 18, 2020 at 5:06 pm

I’m not clear what you’re asking exactly. Please clarify. The correlation measures the strength of the relationship between the two continuous variables, as I explain in this article.

Yes, that is a weak relationship. If you're going to include this in a regression analysis, you might want to read my article about interpreting low R-squared values.

I’m not sure what you mean by scale variables. However, if these are Likert scale items, you’ll need to use Spearman’s correlation instead of Pearson’s correlation.


May 26, 2020 at 12:08 am

Hi Jim, I am very new to statistics and data analysis. I am doing a quantitative study and my sample size is 200 participants. So far I have only obtained 50 complete responses. Using G*Power for a simple linear regression with a medium effect size, an alpha of .05, and a power level of .80, can I do a data analysis with this small sample?

May 26, 2020 at 3:52 am

Please repost your question in the comments section of the appropriate article. It has nothing to do with correlation coefficients. Use the search bar part way down in the right column and search for power. I have a post about power analysis that is a good fit.


May 24, 2020 at 9:02 pm

Thank you Mr.Jim, it was a great answer for me!😉 Take good care~

May 24, 2020 at 9:46 am

I am a student from Malaysia.

I have a question for Mr. Jim: how do I determine the validity (the accurate figure) of the data for analysis purposes based on the table of Pearson's correlation coefficients? Is there a method for this?

For example, since the coefficient between one independent variable and the other variable is below 0.7, the data are valid for analysis purposes.

However, I have read in the table that there is a figure which is more than 0.7. I am not sure about that.

Hope to hear from Mr. Jim soon. Thank you.

May 24, 2020 at 4:20 pm

Hi, I hope you’re doing well!

There is no single correlation coefficient value that determines whether it is valid to study. It partly depends on your subject area. A low-noise physical process might often have a correlation in the very high 0.9s, and 0.8 would be considered unacceptable. However, in a study of human behavior, it's normal and acceptable to have much lower correlations. For example, a correlation of 0.5 might be considered very good. Of course, I'm writing the positive values, but the same applies to negative correlations too.

It also depends on the purpose of your study. If you're doing something practical, such as describing the relationship between material composition and strength, there might be very specific requirements about how strong that relationship must be for it to be useful. It's based on real-world practicalities. On the other hand, if you're just studying something for the sake of science and expanding knowledge, lower correlations might still be interesting.

So, there's no single answer. It depends on the subject area you are studying and the purpose of your study.


February 17, 2020 at 3:49 pm

HI Jim, what could be the implication of my result if I obtained a weak relationship between industry experience and instructional effectiveness? thanks in advance

February 20, 2020 at 11:29 am

The best way to think of it is to look at the graphs in this article and compare the higher correlation graphs to the lower correlation graphs. In the higher correlation graphs, if you know the value of one variable, you have a more precise prediction of the value of the other variable. Look along the x-axis and pick a value. In the higher correlation graphs, the range of y-values that correspond to your x-value is narrower. That range is relatively wide for lower correlations.

For your example, I’ll assume there is a positive correlation. As industry experience increases, instructional effectiveness also increases. However, because that relationship is weak, the range of instructional effectiveness for any given value of industry experience is relatively wide.


November 25, 2019 at 9:05 pm

If the correlation between X and Y is 0.8, what is the correlation of -X and -Y?

November 26, 2019 at 4:59 pm

If you take all the values of X and multiply them by -1 and do the same for Y, your correlation would still be 0.8.


November 7, 2019 at 3:51 am

This is very helpful, thank you Jim!


November 6, 2019 at 3:16 am

Hi, my data are continuous – the variables are individual share volatility and oil prices – and they were non-normal. I used Kendall's Tau and did not rank the data or alter them in any way. Can my results be trusted?

November 6, 2019 at 3:32 pm

Hi Lorraine,

Kendall’s Tau is a correlation coefficient for ranked data. Even though you might not have ranked your data, your statistical software must have created the ranks behind the scenes.

Typically, you’ll use Pearson’s correlation when you have continuous data that have a straight line relationship. If your data are ordinal, ranked, or do not have a straight line relationship, using something other than Pearson’s correlation is necessary.

You mention that your data are nonnormal. Technically, you want to graph your data and look at the shape of the relationship rather than assessing the distribution for each variable, although nonnormality can make a linear relationship less likely. So, graph your data on a scatterplot and see what it looks like. If it is close to a straight line, you should probably use Pearson's correlation. If it's not a straight line relationship, you might need to use something like Kendall's Tau or Spearman's rho coefficient, both of which are based on ranked data. While Spearman's rho is more commonly used, Kendall's Tau has preferable statistical properties.


October 24, 2019 at 11:56 pm

Hi, Jim. If correlations between continuous variables can be measured using Pearson’s, how is correlation between categorical variables measured? Thank you.

October 25, 2019 at 2:38 pm

There are several possible methods, although unlike with continuous data, there doesn’t seem to be a consensus best approach.

But, first off, if you want to determine whether the relationship between categorical variables is statistically significant, use the chi-square test of independence. This test determines whether the relationship between categorical variables is significant, but it does not tell you the degree of correlation.

For the correlation values themselves, there are different methods, such as Goodman and Kruskal’s lambda, Cramér’s V (or phi) for categorical variables with more than 2 levels, and the Phi coefficient for binary data. There are several others that are available as well. Offhand I don’t know the relative pros and cons of each methodology. Perhaps that would be a good post for the future!
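For a concrete sketch of the 2x2 case (with made-up counts): the chi-square statistic comes from comparing observed to expected cell counts, and the phi coefficient, which equals Cramér's V for a 2x2 table, is sqrt(chi2 / n). The p-value step is omitted here because it needs a chi-square CDF; scipy.stats.chi2_contingency does all of this in one call if SciPy is available.

```python
import math

# Made-up 2x2 contingency table: rows = group, columns = outcome
table = [[30, 10],
         [10, 30]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected

# Cramér's V; for a 2x2 table, min(rows, cols) - 1 = 1, so V equals phi
k = min(len(table), len(table[0])) - 1
v = math.sqrt(chi2 / (n * k))
print(chi2, v)  # 20.0 and 0.5 for this table
```

Like a correlation coefficient's magnitude, V runs from 0 (no association) to 1 (perfect association), but unlike Pearson's r it carries no sign, since categories have no direction.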


August 29, 2019 at 7:31 pm

Thanks, great explanations.


April 25, 2019 at 11:58 am

In a multi-variable regression model, is there a method for determining whether two predictor variables are correlated in their impact on the outcome variable?

If so, then how is this type of scenario determined, and handled?

Thanks, Curt

April 25, 2019 at 1:27 pm

When predictors are correlated, it’s known as multicollinearity. This condition reduces the precision of the coefficient estimates. I’ve written a post about it: Multicollinearity: Detection, Problems, and Solutions. That post should answer all your questions!


February 3, 2019 at 6:45 am

Hi Jim: Great explanations. One quick thing, because the probability distribution is asymptotic, there is no p=.000. The probability can never be zero. I see students reporting that or p<.000 all of the time. The actual number may be p <.00000001, so setting a level of p < .001 is usually the best thing to do and seems like journal editors want that when reporting data. Your thoughts?

February 4, 2019 at 12:25 am

Hi Susan, yes, you’re correct about that. You can’t have a p-value that equals zero. Sometimes software will round down when it’s a very small value. The underlying issue is that no matter how large the difference between your sample value and the null hypothesis value, there is a non-zero probability that you’d obtain the observed results when the null is true.
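To illustrate (a sketch assuming Python with SciPy; the t-value is arbitrary), even an extreme test statistic yields a small but strictly positive p-value that display rounding can show as .000:

```python
from scipy import stats

# Two-tailed p-value for an extreme test statistic (illustrative values)
p = 2 * stats.t.sf(9.0, df=30)

print(f"rounded to 3 decimals: {p:.3f}")  # displays 0.000
print(f"actual value:          {p:.2e}")  # tiny, but never zero
```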


January 9, 2019 at 6:41 pm

Sir, you are great. Such a nice share.


November 21, 2018 at 11:17 am

Awesome stuff, really helpful


November 9, 2018 at 11:48 am

What do you do when you can’t perform randomized controlled experiments, like in the cases of social science or societal wide health issues? Apropos to gun violence in America, there appears to be correlation between the availability of guns in a society and the number of gun deaths in a society, where as the number of guns in the society goes up the number of gun deaths go up. This is true of individual states in the US where gun availability differs, and also in countries where gun availability differs. But, when/how can you come to a determination that lowering the number of guns available in a society could reasonably be said to lower the number of gun deaths in that society.

November 9, 2018 at 12:20 pm

Hi Patrick,

It is difficult to prove causality using observational studies rather than randomized experiments.

In my mind, the following approach can help when you’re trying to use observational studies to show that A causes B.

In an observational study, you need to worry about confounding variables because the study is not randomized. These confounding variables can provide alternative explanations for the effects/correlations. If you can include all confounding variables in the analysis, it makes the case stronger because it helps rule out other causes. You must also show that A precedes B. Further, it helps if you can demonstrate the mechanism by which A causes B. That mechanism requires subject-area knowledge beyond just a statistical test.

Those are some ideas that come to mind after brief reflection. There might well be more and, of course, there will be variations based on the study area.


September 19, 2018 at 4:55 am

Thank you so much, I am learning a lot of thing from you!

Please, keep doing this great job!

Best regards

September 19, 2018 at 11:45 pm

You bet, Patrik!

September 18, 2018 at 6:04 am

Another question is: should I consider transforming my variables before using Pearson’s correlation if they do not follow a normal distribution, or if the two variables do not have a clear linear relationship? What is the implication of that transformation? How do I interpret the relationship if I used transformed variables (let’s say log)?

September 18, 2018 at 4:44 pm

Because the data need to follow the bivariate normal distribution to use the hypothesis test, I’d assume the transformation process would be more complex than transforming each variable individually. However, I’m not sure about this.

However, if you just want to make a straight line for the correlation to assess, I’d be careful about that too. The correlation of the transformed data would not apply to the untransformed data. One solution would be to use Spearman’s rank-order correlation. Another would be to use regression analysis. In regression analysis, you can fit curves, use transformations, etc., and the assumption that the residuals follow a normal distribution (along with some other assumptions) is easy to check.

If you’re not sure that your data fit the assumptions for Pearson’s correlation, consider using regression instead. There are more tools there for you to use.

September 18, 2018 at 5:36 am

Hi Jim, I am always here following your posts.

I would like it if you could clarify something for me, please! What are the assumptions for Pearson’s correlation that must hold true in order to apply the correlation coefficient?

I have read something on the internet, but there is a lot of confusion. Some people say that the dependent variable (if there is one) must be normally distributed; others say that both (dependent and independent) must follow a normal distribution. Therefore, I don’t know which one I should follow. I would appreciate your kind contribution a lot. This is something that I am using for my paper.

Thank you in advance!

September 18, 2018 at 4:34 pm

I’m so glad to see that you’re here reading and learning!

This issue turns out to be a bit complicated!

The assumption is actually that the two variables follow a bivariate normal distribution. I won’t go into that here in much detail, but a bivariate normal distribution is more complex than just each variable following a normal distribution. In a nutshell, if you plot data that follow a bivariate normal distribution on a scatterplot, it’ll appear as an elliptical shape.

In terms of the correlation coefficient, it simply describes the relationship between the data. It is what it is, and the data don’t need to follow a bivariate normal distribution as long as you are assessing a linear relationship.

On the other hand, the hypothesis test of Pearson’s correlation coefficient does assume that the data follow a bivariate normal distribution. If you want to test whether the coefficient equals zero, then you need to satisfy this assumption. However, one thing I’m not sure about is whether the test is robust to departures from normality. For example, a 1-sample t-test assumes normality, but with a large enough sample size you don’t need to satisfy this assumption. I’m not sure if a similar sample size requirement applies to this particular test.

I hope this clarifies this issue a bit!


August 29, 2018 at 8:04 am

Hello, thanks for the good explanation. Do variables have to be normally distributed to be analyzed in a Pearson’s correlation? Thanks, Moritz

August 30, 2018 at 1:41 pm

No, the variables do not need to follow a normal distribution to use Pearson’s correlation. However, you do need to graph the data on a scatterplot to be sure that the relationship between the variables is linear rather than curved. For curved relationships, consider using Spearman’s rank correlation.


June 1, 2018 at 9:08 am

Pearson’s correlation measures only linear relationships. But regression can be performed with nonlinear functions, and the software will calculate a value of R^2. What is the meaning of an R^2 value when it accompanies a nonlinear regression?

June 1, 2018 at 9:49 am

Hi Jerry, you raise an important point. R^2 is actually not a valid measure in nonlinear models. To read about why, read my post about R-squared in nonlinear models. In that post, I write about why it’s problematic that many statistical software packages do calculate R-squared values for nonlinear regression. Instead, you should use a different goodness-of-fit measure, such as the standard error of the regression.


May 30, 2018 at 11:59 pm

Hi, fantastic blog, very helpful. I was hoping I could ask a question? You talk about correlation coefficients but I was wondering if you have a section that talks about the slope of an association? For example, am I right in thinking that the slope is equal to the standardized coefficient from a regression?

I refer to the paper of Cameron et al., (The Aging of Elastic and Muscular Arteries. Diabetes Care 26:2133–2138, 2003) where in table 3 they report a correlation and a slope. Is the correlation the r value and the slope the beta value?

Many thanks, Matt

May 31, 2018 at 12:13 pm

Thanks and I’m glad you found the blog to be helpful!

Typically, you’d use regression analysis to obtain the slope and correlation to obtain the correlation coefficient. These statistics represent fairly different types of information. The correlation coefficient (r) is more closely related to R^2 in simple regression analysis because both statistics measure how close the data points fall to a line. Not surprisingly, if you square r, you obtain R^2.

However, you can use r to calculate the slope coefficient. To do that, you’ll need some other information: the standard deviation of the X variable and the standard deviation of the Y variable.

The formula for the slope in simple regression = r(standard deviation of Y/standard deviation of X).
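A minimal check of that identity (assuming Python with SciPy; the data are simulated for illustration):

```python
import numpy as np
from scipy import stats

# Simulated linear relationship (illustrative data)
rng = np.random.default_rng(0)
x = rng.normal(10, 2, size=50)
y = 3 * x + rng.normal(0, 4, size=50)

result = stats.linregress(x, y)
r = result.rvalue

# slope = r * (standard deviation of Y / standard deviation of X)
slope_from_r = r * (np.std(y, ddof=1) / np.std(x, ddof=1))

print(f"regression slope: {result.slope:.4f}")
print(f"r * (sy/sx):      {slope_from_r:.4f}")  # identical
```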

For more information, read my post about slope coefficients and their p-values in regression analysis. I think that will answer a lot of your questions.


April 12, 2018 at 5:19 am

Nice post ! About pitfalls regarding correlation’s interpretation, here’s a funny database:

http://www.tylervigen.com/spurious-correlations

And a nice and poetic illustration of the concept of correlation:

https://www.youtube.com/watch?v=VFjaBh12C6s&t=0s&index=4&list=PLCkLQOAPOtT1xqDNK8m6IC1bgYCxGZJb_

Have a nice day

April 12, 2018 at 1:57 pm

Thanks for sharing those links! It’s always fun finding strange correlations like that.

The link for spurious correlations illustrates an important point. Many of those funny correlations are for time series data where both variables have a long-term trend. If you have two variables that you measure over time and they both have long term trends, those two variables will have a strong correlation even if there is no real connection between them!


April 3, 2018 at 7:05 pm

“In statistics, you typically need to perform a randomized, controlled experiment to determine that a relationship is causal rather than merely correlation.”

Would you please provide an example where you can reasonably conclude that x causes y? And how do you know there isn’t a z that you didn’t control for?

April 3, 2018 at 11:00 pm

That’s a great question. The trick is that when you perform an experiment, you should randomly assign subjects to treatment and control groups. This process randomly distributes any other characteristics that are related to the outcome variable (y). Suppose there is a z that is correlated to the outcome. That z gets randomly distributed between the treatment and control groups. The end result is that z should exist in all groups in roughly equal amounts. This equal distribution should occur even if you don’t know what z is. And, that’s the beautiful thing about random assignment. You don’t need to know everything that can affect the outcome, but random assignment still takes care of it all.

Consequently, if there is a relationship between a treatment and the outcome, you can be pretty certain that the treatment causes the changes in the outcome because all other correlation-only relationships should’ve been randomized away.

I’ll be writing about random assignment in the near future. And, I’ve written about the effectiveness of flu shots, which is based on randomized controlled trials.
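The balancing effect of random assignment can be shown with a small simulation (assuming Python with NumPy; all numbers are illustrative): an unmeasured confounder z ends up with nearly equal means in both groups even though z was never used in the assignment.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
z = rng.normal(50, 10, size=n)   # unmeasured confounder (illustrative)

# Randomly assign each subject to treatment or control
treatment = rng.permutation(np.repeat([True, False], n // 2))

mean_treat = z[treatment].mean()
mean_control = z[~treatment].mean()

# The group means of z are nearly equal, even though z was never measured
print(f"mean z, treatment: {mean_treat:.2f}")
print(f"mean z, control:   {mean_control:.2f}")
```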


5.2 - Writing Hypotheses

The first step in conducting a hypothesis test is to write the hypothesis statements that are going to be tested. For each test you will have a null hypothesis (\(H_0\)) and an alternative hypothesis (\(H_a\)).

When writing hypotheses there are three things that we need to know: (1) the parameter that we are testing, (2) the direction of the test (non-directional, right-tailed, or left-tailed), and (3) the value of the hypothesized parameter.

  • At this point we can write hypotheses for a single mean (\(\mu\)), paired means(\(\mu_d\)), a single proportion (\(p\)), the difference between two independent means (\(\mu_1-\mu_2\)), the difference between two proportions (\(p_1-p_2\)), a simple linear regression slope (\(\beta\)), and a correlation (\(\rho\)). 
  • The research question will give us the information necessary to determine if the test is two-tailed (e.g., "different from," "not equal to"), right-tailed (e.g., "greater than," "more than"), or left-tailed (e.g., "less than," "fewer than").
  • The research question will also give us the hypothesized parameter value. This is the number that goes in the hypothesis statements (i.e., \(\mu_0\) and \(p_0\)). For the difference between two groups, regression, and correlation, this value is typically 0.

Hypotheses are always written in terms of population parameters (e.g., \(p\) and \(\mu\)).  The tables below display all of the possible hypotheses for the parameters that we have learned thus far. Note that the null hypothesis always includes the equality (i.e., =).

One Group Mean
Research Question Is the population mean different from \( \mu_{0} \)? Is the population mean greater than \(\mu_{0}\)? Is the population mean less than \(\mu_{0}\)?
Null Hypothesis, \(H_{0}\) \(\mu=\mu_{0} \) \(\mu=\mu_{0} \) \(\mu=\mu_{0} \)
Alternative Hypothesis, \(H_{a}\) \(\mu\neq \mu_{0} \) \(\mu> \mu_{0} \) \(\mu<\mu_{0} \)
Type of Hypothesis Test Two-tailed, non-directional Right-tailed, directional Left-tailed, directional
Paired Means
Research Question Is there a difference in the population? Is there a mean increase in the population? Is there a mean decrease in the population?
Null Hypothesis, \(H_{0}\) \(\mu_d=0 \) \(\mu_d =0 \) \(\mu_d=0 \)
Alternative Hypothesis, \(H_{a}\) \(\mu_d \neq 0 \) \(\mu_d> 0 \) \(\mu_d<0 \)
Type of Hypothesis Test Two-tailed, non-directional Right-tailed, directional Left-tailed, directional
One Group Proportion
Research Question Is the population proportion different from \(p_0\)? Is the population proportion greater than \(p_0\)? Is the population proportion less than \(p_0\)?
Null Hypothesis, \(H_{0}\) \(p=p_0\) \(p= p_0\) \(p= p_0\)
Alternative Hypothesis, \(H_{a}\) \(p\neq p_0\) \(p> p_0\) \(p< p_0\)
Type of Hypothesis Test Two-tailed, non-directional Right-tailed, directional Left-tailed, directional
Difference between Two Independent Means
Research Question Are the population means different? Is the population mean in group 1 greater than the population mean in group 2? Is the population mean in group 1 less than the population mean in group 2?
Null Hypothesis, \(H_{0}\) \(\mu_1=\mu_2\) \(\mu_1 = \mu_2 \) \(\mu_1 = \mu_2 \)
Alternative Hypothesis, \(H_{a}\) \(\mu_1 \ne \mu_2 \) \(\mu_1 \gt \mu_2 \) \(\mu_1 \lt \mu_2\)
Type of Hypothesis Test Two-tailed, non-directional Right-tailed, directional Left-tailed, directional
Difference between Two Proportions
Research Question Are the population proportions different? Is the population proportion in group 1 greater than the population proportion in group 2? Is the population proportion in group 1 less than the population proportion in group 2?
Null Hypothesis, \(H_{0}\) \(p_1 = p_2 \) \(p_1 = p_2 \) \(p_1 = p_2 \)
Alternative Hypothesis, \(H_{a}\) \(p_1 \ne p_2\) \(p_1 \gt p_2 \) \(p_1 \lt p_2\)
Type of Hypothesis Test Two-tailed, non-directional Right-tailed, directional Left-tailed, directional
Simple Linear Regression: Slope
Research Question Is the slope in the population different from 0? Is the slope in the population positive? Is the slope in the population negative?
Null Hypothesis, \(H_{0}\) \(\beta =0\) \(\beta= 0\) \(\beta = 0\)
Alternative Hypothesis, \(H_{a}\) \(\beta\neq 0\) \(\beta> 0\) \(\beta< 0\)
Type of Hypothesis Test Two-tailed, non-directional Right-tailed, directional Left-tailed, directional
Correlation (Pearson's \(r\))
Research Question Is the correlation in the population different from 0? Is the correlation in the population positive? Is the correlation in the population negative?
Null Hypothesis, \(H_{0}\) \(\rho=0\) \(\rho= 0\) \(\rho = 0\)
Alternative Hypothesis, \(H_{a}\) \(\rho \neq 0\) \(\rho > 0\) \(\rho< 0\)
Type of Hypothesis Test Two-tailed, non-directional Right-tailed, directional Left-tailed, directional

Module 12: Linear Regression and Correlation

Hypothesis Test for Correlation

Learning Outcomes

  • Conduct a linear regression t-test using p-values and critical values and interpret the conclusion in context

The correlation coefficient,  r , tells us about the strength and direction of the linear relationship between x and y . However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n , together.

We perform a hypothesis test of the “ significance of the correlation coefficient ” to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute  r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we only have sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is ρ , the Greek letter “rho.”
  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient  ρ is “close to zero” or “significantly different from zero.” We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is “significant.”

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between x and y . We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is “not significant.”

  • Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero.”
  • What the conclusion means: There is not a significant linear relationship between x and y . Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.
  • If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
  • If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.

Performing the Hypothesis Test

  • Null Hypothesis: H 0 : ρ = 0
  • Alternate Hypothesis: H a : ρ ≠ 0

What the Hypotheses Mean in Words

  • Null Hypothesis H 0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
  • Alternate Hypothesis H a : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

Drawing a Conclusion

There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the p -value
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05.

Using the  p -value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook).

Method 1: Using a p -value to make a decision

Using the TI-83, 83+, 84, 84+ calculator

To calculate the  p -value using LinRegTTEST:

  • On the LinRegTTEST input screen, on the line prompt for β or ρ , highlight “≠ 0”
  • The output screen shows the p-value on the line that reads “p =”.
  • (Most computer statistical software can calculate the  p -value).

If the p -value is less than the significance level ( α = 0.05)

  • Decision: Reject the null hypothesis.
  • Conclusion: “There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.”

If the p -value is NOT less than the significance level ( α = 0.05)

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero.”

Calculation Notes:

  • You will use technology to calculate the p -value. The following describes the calculations to compute the test statistics and the p -value:
  • The p -value is calculated using a t -distribution with n – 2 degrees of freedom.
  • The formula for the test statistic is [latex]\displaystyle{t}=\dfrac{{{r}\sqrt{{{n}-{2}}}}}{\sqrt{{{1}-{r}^{{2}}}}}[/latex]. The value of the test statistic, t , is shown in the computer or calculator output along with the p -value. The test statistic t has the same sign as the correlation coefficient r .
  • The p -value is the combined area in both tails.
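The bullets above can be sketched directly in code (assuming Python with SciPy; the values of r and n are arbitrary):

```python
import math
from scipy import stats

# Illustrative sample correlation and sample size
r, n = 0.50, 30
df = n - 2

# t = r * sqrt(n - 2) / sqrt(1 - r^2)
t = r * math.sqrt(df) / math.sqrt(1 - r**2)

# Two-tailed p-value: the combined area in both tails of the t-distribution
p = 2 * stats.t.sf(abs(t), df)

print(f"t = {t:.3f}, p = {p:.4f}")
```

Note that t carries the same sign as r, which is why the absolute value is taken before computing the two-tailed area.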

Recall: ORDER OF OPERATIONS

parentheses | exponents | multiplication or division | addition or subtraction
[latex]( \ )[/latex] | [latex]x^2[/latex] | [latex]\times \ \mathrm{or} \ \div[/latex] | [latex]+ \ \mathrm{or} \ -[/latex]

1st find the numerator:

Step 1: Find [latex]n-2[/latex], and then take the square root.

Step 2: Multiply the value in Step 1 by [latex]r[/latex].

2nd find the denominator: 

Step 3: Find the square of [latex]r[/latex], which is [latex]r[/latex] multiplied by [latex]r[/latex].

Step 4: Subtract this value from 1, [latex]1 -r^2[/latex].

Step 5: Find the square root of Step 4.

3rd take the numerator and divide by the denominator.

An alternative way to calculate the  p -value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.

THIRD-EXAM vs FINAL-EXAM EXAMPLE:  p- value method

  • Consider the  third exam/final exam example (example 2).
  • The line of best fit is: [latex]\hat{y}[/latex] = -173.51 + 4.83 x  with  r  = 0.6631 and there are  n  = 11 data points.
  • Can the regression line be used for prediction?  Given a third exam score ( x  value), can we use the line to predict the final exam score (predicted  y  value)?
  • H 0 :  ρ  = 0
  • H a :  ρ  ≠ 0
  • The  p -value is 0.026 (from LinRegTTest on your calculator or from computer software).
  • The  p -value, 0.026, is less than the significance level of  α  = 0.05.
  • Decision: Reject the Null Hypothesis  H 0
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score ( x ) and the final exam score ( y ) because the correlation coefficient is significantly different from zero.

Because  r  is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.
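The same p-value can be reproduced without a calculator. A sketch assuming Python with SciPy, plugging r = 0.6631 and n = 11 into the test-statistic formula:

```python
import math
from scipy import stats

r, n = 0.6631, 11
df = n - 2

# t = r * sqrt(n - 2) / sqrt(1 - r^2)
t = r * math.sqrt(df) / math.sqrt(1 - r**2)

# Two-tailed p-value (same as the calculator command 2*tcdf(abs(t), 10^99, n-2))
p = 2 * stats.t.sf(abs(t), df)

print(f"t = {t:.4f}")
print(f"p = {p:.3f}")  # 0.026, matching LinRegTTest
```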

Method 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of r is significant or not. Compare  r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If  r is significant, then you may want to use the line for prediction.

Suppose you computed  r = 0.801 using n = 10 data points. df = n – 2 = 10 – 2 = 8. The critical values associated with df = 8 are -0.632 and + 0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.

r is not significant between -0.632 and +0.632. r = 0.801 > +0.632. Therefore, r is significant.
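The critical values in that table can themselves be reproduced from the t-distribution: a sample r is significant exactly when its t statistic exceeds the t critical value, which solves to r_crit = t_crit / sqrt(t_crit² + df). A sketch assuming Python with SciPy:

```python
import math
from scipy import stats

def critical_r(n, alpha=0.05):
    """Critical value of the sample correlation coefficient (95% by default)."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return t_crit / math.sqrt(t_crit**2 + df)

print(f"n = 10: +/-{critical_r(10):.3f}")  # 0.632, as used above
print(f"n = 14: +/-{critical_r(14):.3f}")  # 0.532
```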

For a given line of best fit, you computed that  r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

If the scatter plot looks linear then, yes, the line can be used for prediction, because  r > the positive critical value.

Suppose you computed  r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.

Horizontal number line with values of -0.624, -0.532, and 0.532.

r = –0.624 < –0.532. Therefore, r is significant.

For a given line of best fit, you compute that  r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction, because  r < the positive critical value.

Suppose you computed  r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.

Horizontal number line with values -0.811, 0.776, and 0.811.

–0.811 <  r = 0.776 < 0.811. Therefore, r is not significant.

For a given line of best fit, you compute that  r = –0.7204 using n = 8 data points, and the critical value is = 0.707. Can the line be used for prediction? Why or why not?

Yes, the line can be used for prediction, because  r < the negative critical value.

THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the  third exam/final exam example  again. The line of best fit is: [latex]\hat{y}[/latex] = –173.51+4.83 x  with  r  = 0.6631 and there are  n  = 11 data points. Can the regression line be used for prediction?  Given a third-exam score ( x  value), can we use the line to predict the final exam score (predicted  y  value)?

  • Use the “95% Critical Value” table for  r  with  df  =  n  – 2 = 11 – 2 = 9.
  • The critical values are –0.602 and +0.602
  • Since 0.6631 > 0.602,  r  is significant.

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if  r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

  • r = –0.567 and the sample size, n , is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
  • r = 0.708 and the sample size, n , is nine. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
  • r = 0.134 and the sample size, n , is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
  • r = 0 and the sample size, n , is five. No matter what the dfs are, r = 0 is between the two critical values so r is not significant.

For a given line of best fit, you compute that  r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction no matter what the sample size is.

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between  x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

  • There is a linear relationship in the population that models the average value of y for varying values of x . In other words, the expected value of y for each particular value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population).
  • The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
  • The standard deviations of the population y values about the line are equal for each value of x . In other words, each of these normal distributions of y  values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.

The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line — they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.

The  y values for each x value are normally distributed about the line with the same standard deviation. For each x value, the mean of the y values lies on the regression line. More y values lie near the line than are scattered further away from the line.

  • Provided by : Lumen Learning. License : CC BY: Attribution
  • Testing the Significance of the Correlation Coefficient. Provided by : OpenStax. Located at : https://openstax.org/books/introductory-statistics/pages/12-4-testing-the-significance-of-the-correlation-coefficient . License : CC BY: Attribution . License Terms : Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction
  • Introductory Statistics. Authored by : Barbara Illowsky, Susan Dean. Provided by : OpenStax. Located at : https://openstax.org/books/introductory-statistics/pages/1-introduction . License : CC BY: Attribution . License Terms : Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction


Let's look at the hypothesis test for correlation, including the hypothesis test for correlation coefficient, the hypothesis test for negative correlation and the null hypothesis for correlation test.

Hypothesis Test for Correlation


What is the hypothesis test for correlation coefficient?

When given a sample of bivariate data (data which include two variables), it is possible to calculate how linearly correlated the data are, using a correlation coefficient.

The product moment correlation coefficient (PMCC) describes the extent to which one variable correlates with another: in other words, the strength of the linear correlation between two variables. The PMCC for a sample of data is denoted by r, while the PMCC for a population is denoted by ρ.

The PMCC is limited to values between -1 and 1 (inclusive).

If r = 1 , there is a perfect positive linear correlation. All points lie on a straight line with a positive gradient, and the higher one of the variables is, the higher the other.

If r = 0 , there is no linear correlation between the variables.

If r = - 1 , there is a perfect negative linear correlation. All points lie on a straight line with a negative gradient, and the higher one of the variables is, the lower the other.

Correlation is not equivalent to causation, but a PMCC close to 1 or -1 can indicate that there is a higher likelihood that two variables are related.


The PMCC can be calculated with a graphics calculator by finding the regression line of y on x (the value of r is computed automatically), or by using the formula r = Sxy / √(Sxx · Syy), which is given in the formula booklet. The closer r is to 1 or -1, the stronger the correlation between the variables, and hence the more closely associated the variables are. You need to be able to carry out hypothesis tests on a sample of bivariate data to determine whether a linear relationship can be established for the entire population. By calculating the PMCC and comparing it to a critical value, it is possible to determine the likelihood of a linear relationship existing.

What is the hypothesis test for negative correlation?

To conduct a hypothesis test, a number of keywords must be understood:

Null hypothesis (H0): the hypothesis assumed to be correct until proven otherwise.

Alternative hypothesis (H1): the conclusion made if H0 is rejected.

Hypothesis test: a mathematical procedure to examine a value of a population parameter proposed by the null hypothesis compared to the alternative hypothesis.

Test statistic: a value calculated from the sample and tested using cumulative probability tables or the normal distribution as the last part of the significance test.

Critical region: the range of values that lead to the rejection of the null hypothesis.

Significance level: the probability of rejecting H0 when it is in fact true.

The null hypothesis is also known as the 'working hypothesis'. It is what we assume to be true for the purpose of the test, or until proven otherwise.

The alternative hypothesis is what is concluded if the null hypothesis is rejected. It also determines whether the test is one-tailed or two-tailed.

A one-tailed test allows for the possibility of an effect in one direction, while a two-tailed test allows for the possibility of an effect in both directions, positive and negative.

Method: a series of steps must be followed to determine whether a linear relationship exists between two variables.

1. Write down the null and alternative hypotheses (H0 and H1). The null hypothesis is always H0: ρ = 0, while the alternative hypothesis depends on what is asked in the question. Both hypotheses must be stated in symbols only (not in words).

2 . Using a calculator, work out the value of the PMCC of the sample data, r .

3 . Use the significance level and sample size to figure out the critical value. This can be found in the PMCC table in the formula booklet.

4. Take the absolute value of the PMCC, r, and compare it to the critical value. If |r| is greater than the critical value, the null hypothesis should be rejected. Otherwise, the null hypothesis should be accepted.

5. Write a full conclusion in the context of the question. The conclusion should be stated in full: both in statistical language and in words reflecting the context of the question. A negative correlation signifies that as one variable increases, the other tends to decrease, whereas a positive correlation signifies that the two variables tend to increase or decrease together.
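The five steps above can be sketched in code. This is a minimal illustration: the sample data are invented, and the critical value is an assumed table entry (in practice you look it up in the PMCC table for your sample size and significance level).

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Step 1: H0: rho = 0, H1: rho > 0 (one-tailed, testing for positive correlation)
# Step 2: compute r from the sample (data invented for this sketch)
xs = [12, 15, 17, 20, 23, 26]
ys = [30, 33, 38, 41, 49, 52]
r = pmcc(xs, ys)

# Step 3: critical value (assumed table entry for n = 6, one-tailed, 5%)
critical_value = 0.7293

# Step 4: compare |r| with the critical value
reject_h0 = abs(r) > critical_value

# Step 5: state the conclusion in context
print(f"r = {r:.3f}; reject H0: {reject_h0}")
```

Here |r| exceeds the critical value, so the sketch would reject H0 and conclude there is evidence of positive correlation.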

How to interpret results based on the null hypothesis

From the observed results (test statistic), a decision must be made, determining whether to reject the null hypothesis or not.


Both the one-tailed and two-tailed tests are shown at the 5% level of significance. However, the 5% is split between the positive and negative tails in the two-tailed test, and placed entirely in one tail in the one-tailed test.

Under the null hypothesis, the result could lie anywhere on the graph. If the observed result lies in the shaded area, the test statistic is significant at 5%; in other words, we reject H0. H0 could in fact be true and still be rejected, so the significance level, 5%, is the probability that H0 is rejected even though it is true: the probability that H0 is incorrectly rejected. When H0 is rejected, H1 (the alternative hypothesis) is used to write the conclusion.

We can define the null and alternative hypotheses for one-tailed and two-tailed tests:

For a one-tailed test:

  • H0: ρ = 0, H1: ρ > 0, or
  • H0: ρ = 0, H1: ρ < 0

For a two-tailed test:

  • H0: ρ = 0, H1: ρ ≠ 0

Let us look at an example of testing for correlation.

12 students sat two biology tests: one was theoretical and the other was practical. The results are shown in the table.

a) Find the product moment correlation coefficient for this data, to 3 significant figures.

b) A teacher claims that students who do well in the theoretical test tend to do well in the practical test. Test this claim at the 0.05 level of significance, clearly stating your hypotheses.

a) Using a calculator, we find the PMCC (enter the data into two lists and calculate the regression line; the PMCC will appear): r = 0.935 to 3 significant figures.

b) We are testing for a positive correlation, since the claim is that a higher score in the theoretical test is associated with a higher score in the practical test. We will now use the five steps we previously looked at.

1. State the null and alternative hypotheses. H0: ρ = 0 and H1: ρ > 0

2. Calculate the PMCC. From part a), r = 0.935

3. Figure out the critical value from the sample size and significance level. The sample size, n , is 12. The significance level is 5%. The hypothesis is one-tailed since we are only testing for positive correlation. Using the table from the formula booklet, the critical value is shown to be cv = 0.4973

4. The absolute value of the PMCC is 0.935, which is larger than 0.4973. Since the PMCC is larger than the critical value at the 5% level of significance, we can reach a conclusion.

5. Since the PMCC is larger than the critical value, we choose to reject the null hypothesis. We can conclude that there is significant evidence to support the claim that students who do well in the theoretical biology test also tend to do well in the practical biology test.

Let us look at a second example.

A tetrahedral die (four faces) is rolled 40 times and 6 'ones' are observed. Is there any evidence at the 10% level that the probability of a score of 1 is less than a quarter?

The expected mean is 40 × 1/4 = 10. The question asks whether the observed result (the test statistic, 6) is unusually low.

We now follow the same series of steps.

1. State the null and alternative hypotheses. H0: p = 0.25 and H1: p < 0.25, where p is the probability of rolling a 'one'.

2. We cannot calculate a PMCC here, since we are only given the frequency of 'ones'; this is a test on a single proportion, so the binomial distribution is used instead.

3. A one-tailed test is required (p < 0.25) at the 10% significance level. We model the number of 'ones', X, with a binomial distribution, X ~ B(40, 0.25), and use the cumulative binomial tables. The observed value is X = 6, which gives P(X ≤ 6 'ones' in 40 rolls) = 0.0962.

4. Since 0.0962, or 9.62%, is less than 10%, the observed result lies in the critical region.

5. We reject H0 and accept the alternative hypothesis. We conclude that there is evidence that the probability of rolling a 'one' is less than 1/4.
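The probability in step 3 can be checked with a short script: a sketch of the cumulative binomial probability P(X ≤ 6) for X ~ B(40, 0.25), built from the binomial probability mass function.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summed from the binomial pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# P(X <= 6 'ones' in 40 rolls of a fair tetrahedral die)
prob = binom_cdf(6, 40, 0.25)
print(round(prob, 4))  # 0.0962, matching the worked example
```

Since 0.0962 < 0.10, the observed count falls in the critical region, agreeing with the conclusion above.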

Hypothesis Test for Correlation - Key takeaways

  • The Product Moment Correlation Coefficient (PMCC), or r, is a measure of how strongly related two variables are. It ranges between -1 and 1, indicating the strength of a correlation.
  • The closer r is to 1 or -1, the stronger the (positive or negative) correlation between two variables.
  • The null hypothesis is the hypothesis that is assumed to be correct until proven otherwise. It states that there is no correlation between the variables.
  • The alternative hypothesis is the one accepted when the null hypothesis is rejected. It can be either one-tailed (looking at one outcome) or two-tailed (looking at both outcomes, positive and negative).
  • If the significance level is 5%, this means that there is a 5% chance that the null hypothesis is incorrectly rejected.

Images One-tailed test: https://en.wikipedia.org/w/index.php?curid=35569621


Frequently Asked Questions about Hypothesis Test for Correlation

Is the Pearson correlation a hypothesis test?

Yes, it can be used as one. The Pearson correlation produces a PMCC, or r value, which indicates the strength of the linear relationship between two variables; this value can then be tested for significance against a critical value.

Can we test a hypothesis with correlation?

Yes. Correlation is not equivalent to causation, however we can test hypotheses to determine whether a correlation (or association) exists between two variables.

How do you set up the hypothesis test for correlation?

You need a null hypothesis (ρ = 0) and an alternative hypothesis. The PMCC, or r value, must be calculated from the sample data. Based on the significance level and sample size, the critical value can be looked up in the table of values in the formula booklet. Finally, the r value and the critical value can be compared to determine which hypothesis is accepted.

Correlation Analysis – Types, Methods and Examples

Correlation Analysis

Correlation analysis is a statistical method used to evaluate the strength and direction of the relationship between two or more variables . The correlation coefficient ranges from -1 to 1.

  • A correlation coefficient of 1 indicates a perfect positive correlation. This means that as one variable increases, the other variable also increases.
  • A correlation coefficient of -1 indicates a perfect negative correlation. This means that as one variable increases, the other variable decreases.
  • A correlation coefficient of 0 means that there’s no linear relationship between the two variables.

Correlation Analysis Methodology

Conducting a correlation analysis involves a series of steps, as described below:

  • Define the Problem : Identify the variables that you think might be related. The variables must be measurable on an interval or ratio scale. For example, if you’re interested in studying the relationship between the amount of time spent studying and exam scores, these would be your two variables.
  • Data Collection : Collect data on the variables of interest. The data could be collected through various means such as surveys , observations , or experiments. It’s crucial to ensure that the data collected is accurate and reliable.
  • Data Inspection : Check the data for any errors or anomalies such as outliers or missing values. Outliers can greatly affect the correlation coefficient, so it’s crucial to handle them appropriately.
  • Choose the Appropriate Correlation Method : Select the correlation method that’s most appropriate for your data. If your data meets the assumptions for Pearson’s correlation (interval or ratio level, linear relationship, variables are normally distributed), use that. If your data is ordinal or doesn’t meet the assumptions for Pearson’s correlation, consider using Spearman’s rank correlation or Kendall’s Tau.
  • Compute the Correlation Coefficient : Once you’ve selected the appropriate method, compute the correlation coefficient. This can be done using statistical software such as R, Python, or SPSS, or manually using the formulas.
  • Interpret the Results : Interpret the correlation coefficient you obtained. If the correlation is close to 1 or -1, the variables are strongly correlated. If the correlation is close to 0, the variables have little to no linear relationship. Also consider the sign of the correlation coefficient: a positive sign indicates a positive relationship (as one variable increases, so does the other), while a negative sign indicates a negative relationship (as one variable increases, the other decreases).
  • Check the Significance : It’s also important to test the statistical significance of the correlation. This typically involves performing a t-test. A small p-value (commonly less than 0.05) suggests that the observed correlation is statistically significant and not due to random chance.
  • Report the Results : The final step is to report your findings. This should include the correlation coefficient, the significance level, and a discussion of what these findings mean in the context of your research question.
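The significance check in the steps above typically uses the t statistic t = r√(n − 2) / √(1 − r²), compared against a t distribution with n − 2 degrees of freedom. A minimal sketch (the r and n values are invented for illustration):

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# Hypothetical example: r = 0.5 observed in a sample of n = 27 pairs
t = correlation_t_statistic(0.5, 27)
print(round(t, 3))  # 2.887

# The p-value would come from the t distribution with n - 2 = 25 degrees
# of freedom; a |t| this large exceeds the two-tailed 5% critical value
# (roughly 2.06), so the correlation would be judged significant.
```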

Types of Correlation Analysis

Types of Correlation Analysis are as follows:

Pearson Correlation

This is the most common type of correlation analysis. Pearson correlation measures the linear relationship between two continuous variables. It assumes that the variables are normally distributed and have equal variances. The correlation coefficient (r) ranges from -1 to +1, with -1 indicating a perfect negative linear relationship, +1 indicating a perfect positive linear relationship, and 0 indicating no linear relationship.

Spearman Rank Correlation

Spearman’s rank correlation is a non-parametric measure that assesses how well the relationship between two variables can be described using a monotonic function. In other words, it evaluates the degree to which, as one variable increases, the other variable tends to increase, without requiring that increase to be consistent.

Kendall’s Tau

Kendall’s Tau is another non-parametric correlation measure used to detect the strength of dependence between two variables. Kendall’s Tau is often used for variables measured on an ordinal scale (i.e., where values can be ranked).

Point-Biserial Correlation

This is used when you have one dichotomous and one continuous variable, and you want to test for correlations. It’s a special case of the Pearson correlation.

Phi Coefficient

This is used when both variables are dichotomous or binary (having two categories). It’s a measure of association for two binary variables.

Canonical Correlation

This measures the correlation between two multi-dimensional variables. Each variable is a combination of data sets, and the method finds the linear combination that maximizes the correlation between them.

Partial and Semi-Partial (Part) Correlations

These are used when the researcher wants to understand the relationship between two variables while controlling for the effect of one or more additional variables.

Cross-Correlation

Used mostly in time series data to measure the similarity of two series as a function of the displacement of one relative to the other.

Autocorrelation

This is the correlation of a signal with a delayed copy of itself as a function of delay. This is often used in time series analysis to help understand the trend in the data over time.

Correlation Analysis Formulas

There are several formulas for correlation analysis, each corresponding to a different type of correlation. Here are some of the most commonly used ones:

Pearson’s Correlation Coefficient (r)

Pearson’s correlation coefficient measures the linear relationship between two variables. The formula is:

   r = Σ[(xi – Xmean)(yi – Ymean)] / sqrt[(Σ(xi – Xmean)²)(Σ(yi – Ymean)²)]

  • xi and yi are the values of X and Y variables.
  • Xmean and Ymean are the mean values of X and Y.
  • Σ denotes the sum of the values.
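As an illustrative sketch, the formula above translates directly into code (the function and variable names here are mine, not from the text):

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: the covariance term over the product of spread terms."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_mean) ** 2 for x in xs)
                    * sum((y - y_mean) ** 2 for y in ys))
    return num / den

print(pearson_r([1, 2, 3], [3, 1, 2]))  # -0.5
```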

Spearman’s Rank Correlation Coefficient (rs)

Spearman’s correlation coefficient measures the monotonic relationship between two variables. The formula is:

   rs = 1 – (6Σd² / n(n² – 1))

  • d is the difference between the ranks of corresponding variables.
  • n is the number of observations.
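A sketch of the rank-based computation (it assumes no tied values, since the d² formula above is exact only without ties; names are mine):

```python
def spearman_rs(xs, ys):
    """Spearman's rs = 1 - 6*sum(d^2) / (n*(n^2 - 1)), assuming no ties."""
    n = len(xs)

    def ranks(values):
        # Rank 1 = smallest value; assumes all values are distinct.
        order = sorted(range(n), key=lambda i: values[i])
        r = [0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

print(round(spearman_rs([1, 2, 3, 4], [2, 1, 4, 3]), 4))  # 0.6
```

With ties present, the usual approach is instead to assign average ranks and apply Pearson's formula to the ranks.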

Kendall’s Tau (τ)

Kendall’s Tau is a measure of rank correlation. The formula is:

   τ = (nc – nd) / [n(n – 1)/2]

  • nc is the number of concordant pairs.
  • nd is the number of discordant pairs.
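A sketch that counts concordant and discordant pairs directly (assuming no ties; names are mine):

```python
def kendall_tau(xs, ys):
    """tau = (nc - nd) / (n*(n-1)/2), counting all pairs; assumes no ties."""
    n = len(xs)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:      # both variables move the same way: concordant
                nc += 1
            elif s < 0:    # they move in opposite ways: discordant
                nd += 1
    return (nc - nd) / (0.5 * n * (n - 1))

print(round(kendall_tau([1, 2, 3], [1, 3, 2]), 4))  # 0.3333
```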

Point-Biserial Correlation

The point-biserial correlation is a special case of Pearson's correlation, and so it uses the same formula as Pearson's correlation.

Phi Coefficient

The phi coefficient is a measure of association for two binary variables. It's equivalent to Pearson's correlation in this specific case.

Partial Correlation

The formula for partial correlation is more complex and depends on the Pearson’s correlation coefficients between the variables.

For partial correlation between X and Y given Z:

  rp(xy.z) = (rxy – rxz * ryz) / sqrt[(1 – rxz^2)(1 – ryz^2)]

  • rxy, rxz, ryz are the Pearson’s correlation coefficients.
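The partial-correlation formula above in code, with hypothetical pairwise coefficient values:

```python
import math

def partial_correlation(rxy, rxz, ryz):
    """Correlation between X and Y, controlling for Z."""
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))

# Hypothetical pairwise Pearson coefficients chosen for illustration
print(round(partial_correlation(0.5, 0.5, 0.5), 4))  # 0.3333
```

Controlling for Z here weakens the apparent X–Y relationship (from 0.5 to about 0.33), because part of that correlation was carried by their shared correlation with Z.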

Correlation Analysis Examples

Here are a few examples of how correlation analysis could be applied in different contexts:

  • Education : A researcher might want to determine if there’s a relationship between the amount of time students spend studying each week and their exam scores. The two variables would be “study time” and “exam scores”. If a positive correlation is found, it means that students who study more tend to score higher on exams.
  • Healthcare : A healthcare researcher might be interested in understanding the relationship between age and cholesterol levels. If a positive correlation is found, it could mean that as people age, their cholesterol levels tend to increase.
  • Economics : An economist may want to investigate if there’s a correlation between the unemployment rate and the rate of crime in a given city. If a positive correlation is found, it could suggest that as the unemployment rate increases, the crime rate also tends to increase.
  • Marketing : A marketing analyst might want to analyze the correlation between advertising expenditure and sales revenue. A positive correlation would suggest that higher advertising spending is associated with higher sales revenue.
  • Environmental Science : A scientist might be interested in whether there’s a relationship between the amount of CO2 emissions and average temperature increase. A positive correlation would indicate that higher CO2 emissions are associated with higher average temperatures.

Importance of Correlation Analysis

Correlation analysis plays a crucial role in many fields of study for several reasons:

  • Understanding Relationships : Correlation analysis provides a statistical measure of the relationship between two or more variables. It helps in understanding how one variable may change in relation to another.
  • Predicting Trends : When variables are correlated, changes in one can predict changes in another. This is particularly useful in fields like finance, weather forecasting, and technology, where forecasting trends is vital.
  • Data Reduction : If two variables are highly correlated, they are conveying similar information, and you may decide to use only one of them in your analysis, reducing the dimensionality of your data.
  • Testing Hypotheses : Correlation analysis can be used to test hypotheses about relationships between variables. For example, a researcher might want to test whether there’s a significant positive correlation between physical exercise and mental health.
  • Determining Factors : It can help identify factors that are associated with certain behaviors or outcomes. For example, public health researchers might analyze correlations to identify risk factors for diseases.
  • Model Building : Correlation is a fundamental concept in building multivariate statistical models, including regression models and structural equation models. These models often require an understanding of the inter-relationships (correlations) among multiple variables.
  • Validity and Reliability Analysis : In psychometrics, correlation analysis is used to assess the validity and reliability of measurement instruments such as tests or surveys.

Applications of Correlation Analysis

Correlation analysis is used in many fields to understand and quantify the relationship between variables. Here are some of its key applications:

  • Finance : In finance, correlation analysis is used to understand the relationship between different investment types or the risk and return of a portfolio. For example, if two stocks are positively correlated, they tend to move together; if they’re negatively correlated, they move in opposite directions.
  • Economics : Economists use correlation analysis to understand the relationship between various economic indicators, such as GDP and unemployment rate, inflation rate and interest rates, or income and consumption patterns.
  • Marketing : Correlation analysis can help marketers understand the relationship between advertising spend and sales, or the relationship between price changes and demand.
  • Psychology : In psychology, correlation analysis can be used to understand the relationship between different psychological variables, such as the correlation between stress levels and sleep quality, or between self-esteem and academic performance.
  • Medicine : In healthcare, correlation analysis can be used to understand the relationships between various health outcomes and potential predictors. For example, researchers might investigate the correlation between physical activity levels and heart disease, or between smoking and lung cancer.
  • Environmental Science : Correlation analysis can be used to investigate the relationships between different environmental factors, such as the correlation between CO2 levels and average global temperature, or between pesticide use and biodiversity.
  • Social Sciences : In fields like sociology and political science, correlation analysis can be used to investigate relationships between different social and political phenomena, such as the correlation between education levels and political participation, or between income inequality and social unrest.

Advantages and Disadvantages of Correlation Analysis

| Advantages | Disadvantages |
|---|---|
| Provides a statistical measure of the relationship between variables. | Cannot establish causality, only association. |
| Useful for prediction if variables are known to have a correlation. | Can be misleading if important variables are left out (omitted variable bias). |
| Can help in hypothesis testing about the relationships between variables. | Outliers can greatly affect the correlation coefficient. |
| Can help in data reduction by identifying closely related variables. | Assumes a linear relationship in Pearson correlation, which may not always hold. |
| Fundamental concept in building multivariate statistical models. | May not capture complex relationships (e.g., quadratic or cyclical relationships). |
| Helps in validity and reliability analysis in psychometrics. | Correlation can be affected by the range of observed values (restriction of range). |

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer

Correlation Studies in Psychology Research

Determining the relationship between two or more variables.


A correlational study is a type of research design that looks at the relationships between two or more variables. Correlational studies are non-experimental, which means that the experimenter does not manipulate or control any of the variables.

A correlation refers to a relationship between two variables. Correlations can be strong or weak and positive or negative. Sometimes, there is no correlation.

There are three possible outcomes of a correlation study: a positive correlation, a negative correlation, or no correlation. Researchers can present the results using a numerical value called the correlation coefficient, a measure of the correlation strength. It can range from –1.00 (negative) to +1.00 (positive). A correlation coefficient of 0 indicates no correlation.

  • Positive correlations : Both variables increase or decrease at the same time. A correlation coefficient close to +1.00 indicates a strong positive correlation.
  • Negative correlations : As the amount of one variable increases, the other decreases (and vice versa). A correlation coefficient close to -1.00 indicates a strong negative correlation.
  • No correlation : There is no relationship between the two variables. A correlation coefficient of 0 indicates no correlation.
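The three outcomes above can be illustrated by computing Pearson's r directly. The sketch below is illustrative only: the data and the `pearson_r` helper are invented for the example, assuming two equal-length paired samples.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]        # hypothetical weekly study hours
scores = [52, 58, 63, 70, 75]  # rises with hours -> strong positive r
errors = [9, 8, 6, 4, 2]       # falls with hours -> strong negative r

print(round(pearson_r(hours, scores), 3))  # close to +1.00
print(round(pearson_r(hours, errors), 3))  # close to -1.00
```

A coefficient near zero for some third pairing would indicate no linear relationship between the samples.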

Characteristics of a Correlational Study

Correlational studies are often used in psychology, as well as other fields like medicine. Correlational research is a preliminary way to gather information about a topic. The method is also useful if researchers are unable to perform an experiment.

Researchers use correlations to see if a relationship between two or more variables exists, but the variables themselves are not under the control of the researchers.

While correlational research can demonstrate a relationship between variables, it cannot prove that changing one variable will change another. In other words, correlational studies cannot prove cause-and-effect relationships.

When you encounter research that refers to a "link" or an "association" between two things, it is most likely describing a correlational study.

Types of Correlational Research

There are three types of correlational research: naturalistic observation, the survey method, and archival research. Each type has its own purpose, as well as its pros and cons.

Naturalistic Observation

The naturalistic observation method involves observing and recording variables of interest in a natural setting without interference or manipulation.  

Pros:

  • Can inspire ideas for further research
  • An option when a lab experiment is not available
  • Variables are viewed in their natural setting

Cons:

  • Can be time-consuming and expensive
  • Extraneous variables can't be controlled
  • No scientific control of variables
  • Subjects might behave differently if aware of being observed

This method is well-suited to studies where researchers want to see how variables behave in their natural setting or state.   Inspiration can then be drawn from the observations to inform future avenues of research.

In some cases, it might be the only method available to researchers; for example, if lab experimentation is precluded by access, resources, or ethics. It might be preferable to not conducting research at all, but the method can be costly and usually takes a lot of time.

Naturalistic observation presents several challenges for researchers. For one, it does not allow them to control or influence the variables in any way, nor can they change any possible external variables.

Nor does simply watching the variables guarantee that researchers will get reliable data, or that the information they gather will be free from bias.

For example, study subjects might act differently if they know that they are being watched. The researchers might not be aware that the behavior that they are observing is not necessarily the subject's natural state (i.e., how they would act if they did not know they were being watched).

Researchers also need to be aware of their biases, which can affect the observation and interpretation of a subject's behavior.  

The Survey Method

Surveys and questionnaires are some of the most common methods used for psychological research. The survey method involves having a random sample of participants complete a survey, test, or questionnaire related to the variables of interest. Random sampling is vital to the generalizability of a survey's results.

Pros:

  • Cheap, easy, and fast
  • Can collect large amounts of data in a short amount of time

Cons:

  • Results can be affected by poor survey questions
  • Results can be affected by an unrepresentative sample
  • Outcomes can be affected by participants

If researchers need to gather a large amount of data in a short period of time, a survey is likely to be the fastest, easiest, and cheapest option.  

It's also a flexible method because it lets researchers create data-gathering tools that will help ensure they get the information they need (survey responses) from all the sources they want to use (a random sample of participants taking the survey).

Survey data might be cost-efficient and easy to get, but it has its downsides. For one, the data is not always reliable—particularly if the survey questions are poorly written or the overall design or delivery is weak. Data is also affected by specific faults, such as unrepresentative or underrepresented samples.

The use of surveys relies on participants to provide useful data. Researchers need to be aware of the specific factors related to the people taking the survey that will affect its outcome.

For example, some people might struggle to understand the questions. A person might answer a particular way to try to please the researchers or to try to control how the researchers perceive them (such as trying to make themselves "look better").

Sometimes, respondents might not even realize that their answers are incorrect or misleading because of mistaken memories .

Archival Research

Many areas of psychological research benefit from analyzing studies that were conducted long ago by other researchers, as well as reviewing historical records and case studies.

For example, in a study known as "The Irritable Heart," researchers used digitized records containing information on American Civil War veterans to learn more about post-traumatic stress disorder (PTSD).

Pros:

  • Large amount of data
  • Can be less expensive
  • Researchers cannot change participant behavior

Cons:

  • Can be unreliable
  • Information might be missing
  • No control over data collection methods

Using records, databases, and libraries that are publicly accessible or accessible through their institution can help researchers who might not have a lot of money to support their research efforts.

Free and low-cost resources are available to researchers at all levels through academic institutions, museums, and data repositories around the world.

Another potential benefit is that these sources often provide an enormous amount of data that was collected over a very long period of time, which can give researchers a way to view trends, relationships, and outcomes related to their research.

While the inability to change variables can be a disadvantage of some methods, it can be a benefit of archival research. That said, using historical records or information that was collected a long time ago also presents challenges. For one, important information might be missing or incomplete and some aspects of older studies might not be useful to researchers in a modern context.

A primary issue with archival research is reliability. When reviewing old research, little information might be available about who conducted the research, how a study was designed, who participated in the research, as well as how data was collected and interpreted.

Researchers can also be presented with ethical quandaries—for example, should modern researchers use data from studies that were conducted unethically or with questionable ethics?

Potential Pitfalls

You've probably heard the phrase, "correlation does not equal causation." This means that while correlational research can suggest that there is a relationship between two variables, it cannot prove that one variable will change another.

For example, researchers might perform a correlational study that suggests there is a relationship between academic success and a person's self-esteem. However, the study cannot show that academic success changes a person's self-esteem.

To determine why the relationship exists, researchers would need to consider and experiment with other variables, such as the subject's social relationships, cognitive abilities, personality, and socioeconomic status.

The difference between a correlational study and an experimental study involves the manipulation of variables. Researchers do not manipulate variables in a correlational study, but they do control and systematically vary the independent variables in an experimental study. Correlational studies allow researchers to detect the presence and strength of a relationship between variables, while experimental studies allow researchers to look for cause and effect relationships.

If the study involves the systematic manipulation of the levels of a variable, it is an experimental study. If researchers are measuring what is already present without actually changing the variables, then it is a correlational study.

The variables in a correlational study are what the researcher measures. Once measured, researchers can then use statistical analysis to determine the existence, strength, and direction of the relationship. However, while correlational studies can say that variable X and variable Y have a relationship, it does not mean that X causes Y.

The goal of correlational research is often to look for relationships, describe these relationships, and then make predictions. Such research can also often serve as a jumping off point for future experimental research. 

Heath W. Psychology Research Methods . Cambridge University Press; 2018:134-156.

Schneider FW. Applied Social Psychology . 2nd ed. SAGE; 2012:50-53.

Curtis EA, Comiskey C, Dempsey O. Importance and use of correlational research .  Nurse Researcher . 2016;23(6):20-25. doi:10.7748/nr.2016.e1382

Carpenter S. Visualizing Psychology . 3rd ed. John Wiley & Sons; 2012:14-30.

Pizarro J, Silver RC, Prause J. Physical and mental health costs of traumatic war experiences among civil war veterans .  Arch Gen Psychiatry . 2006;63(2):193. doi:10.1001/archpsyc.63.2.193

Post SG. The echo of Nuremberg: Nazi data and ethics .  J Med Ethics . 1991;17(1):42-44. doi:10.1136/jme.17.1.42

Lau F. Chapter 12 Methods for Correlational Studies . In: Lau F, Kuziemsky C, eds. Handbook of eHealth Evaluation: An Evidence-based Approach . University of Victoria.

Akoglu H. User's guide to correlation coefficients .  Turk J Emerg Med . 2018;18(3):91-93. doi:10.1016/j.tjem.2018.08.001

Price PC. Research Methods in Psychology . California State University.

By Kendra Cherry, MSEd Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

13.2 Testing the Significance of the Correlation Coefficient

The correlation coefficient, r, tells us about the strength and direction of the linear relationship between X₁ and X₂.

The sample data are used to compute r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between X₁ and X₂ because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between X₁ and X₂. If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant."

Performing the Hypothesis Test

  • Null hypothesis: H₀: ρ = 0
  • Alternate hypothesis: Hₐ: ρ ≠ 0
  • Null hypothesis H₀: The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between X₁ and X₂ in the population.
  • Alternate hypothesis Hₐ: The population correlation coefficient IS significantly different from zero. There IS a significant linear relationship (correlation) between X₁ and X₂ in the population.

Drawing a Conclusion

There are two methods of making the decision concerning the hypothesis. The test statistic to test this hypothesis is:

t = r√(n − 2) / √(1 − r²), or equivalently t = r / √((1 − r²) / (n − 2))

where the second formula is an equivalent form of the test statistic, n is the sample size, and the degrees of freedom are n − 2. This is a t-statistic and operates in the same way as other t-tests. Calculate the t-value and compare it with the critical value from the t-table at the appropriate degrees of freedom and the level of confidence you wish to maintain. If the calculated value is in the tail, then we cannot accept the null hypothesis that there is no linear relationship between these two independent random variables. If the calculated t-value is NOT in the tail, then we cannot reject the null hypothesis that there is no linear relationship between the two variables.
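This calculation can be sketched in a few lines; the helper name and the sample numbers below are invented for illustration, assuming a sample correlation r computed from n paired observations.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for testing H0: rho = 0; degrees of freedom are n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample: r = 0.6325 from n = 10 observations (df = 8)
t = correlation_t_stat(0.6325, 10)
print(round(t, 2))  # about 2.31, just past the two-tailed 0.05 critical value 2.306 at df = 8
```

Because 2.31 lies in the tail beyond 2.306, this sample correlation would be judged significant at the 0.05 level.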

A quick shorthand way to test correlations is the relationship between the sample size and the correlation. If

|r| ≥ 2/√n

then the correlation between the two variables demonstrates that a linear relationship exists and is statistically significant at approximately the 0.05 level of significance. As the formula indicates, there is an inverse relationship between the sample size and the required correlation for significance of a linear relationship. With only 10 observations, the required correlation for significance is 0.6325; for 30 observations the required correlation for significance decreases to 0.3651; and at 100 observations the required level is only 0.2000.
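The three thresholds quoted above follow directly from the 2/√n shorthand; a quick check (the function name is my own):

```python
import math

def required_r(n):
    """Approximate |r| needed for significance at the 0.05 level: 2 / sqrt(n)."""
    return 2 / math.sqrt(n)

for n in (10, 30, 100):
    print(n, round(required_r(n), 4))  # 0.6325, 0.3651, 0.2
```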

Correlations may be helpful in visualizing the data, but they are not appropriately used to "explain" a relationship between two variables. Perhaps no single statistic is more misused than the correlation coefficient. Citing correlations between health conditions and everything from place of residence to eye color has the effect of implying a cause-and-effect relationship. This simply cannot be accomplished with a correlation coefficient. The correlation coefficient is, of course, innocent of this misinterpretation. It is the duty of the analyst to use a statistic designed to test for cause-and-effect relationships, and to report only those results when intending to make such a claim. The problem is that passing this more rigorous test is difficult, so lazy and/or unscrupulous "researchers" fall back on correlations when they cannot make their case legitimately.

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction
  • Authors: Alexander Holmes, Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Introductory Business Statistics 2e
  • Publication date: Dec 13, 2023
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction
  • Section URL: https://openstax.org/books/introductory-business-statistics-2e/pages/13-2-testing-the-significance-of-the-correlation-coefficient

© Jul 18, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

Examples

Correlation Hypothesis


Understanding the relationships between variables is pivotal in research. Correlation hypotheses address the degree of association between two or more variables. In this guide, explore an array of correlation hypothesis examples, followed by a step-by-step tutorial on crafting these hypothesis statements effectively, along with tips tailored to unraveling the intricate world of correlations.

What is Correlation Hypothesis?

A correlation hypothesis is a statement that predicts a specific relationship between two or more variables based on the assumption that changes in one variable are associated with changes in another variable. It suggests that there is a correlation or statistical relationship between the variables, meaning that when one variable changes, the other variable is likely to change in a consistent manner.

What is an example of a Correlation Hypothesis Statement?

Example: “If the amount of exercise increases, then the level of physical fitness will also increase.”

In this example, the correlation hypothesis suggests that there is a positive correlation between the amount of exercise a person engages in and their level of physical fitness. As exercise increases, the hypothesis predicts that physical fitness will increase as well. This hypothesis can be tested by collecting data on exercise levels and physical fitness levels and analyzing the relationship between the two variables using statistical methods.
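As a sketch of that test, the exercise-and-fitness hypothesis could be checked as follows. The paired data are invented for illustration, and the significance screen uses the approximate |r| ≥ 2/√n shorthand rather than a full t-test.

```python
import math

# Hypothetical paired observations: weekly exercise hours and a fitness score
exercise = [0, 2, 3, 5, 6, 8, 9, 11, 12, 14]
fitness = [48, 50, 55, 60, 59, 67, 70, 74, 78, 83]

n = len(exercise)
mean_x, mean_y = sum(exercise) / n, sum(fitness) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(exercise, fitness))
r = cov / math.sqrt(sum((a - mean_x) ** 2 for a in exercise) *
                    sum((b - mean_y) ** 2 for b in fitness))

# Rough significance screen at about the 0.05 level: |r| >= 2 / sqrt(n)
significant = abs(r) >= 2 / math.sqrt(n)
print(round(r, 3), significant)  # strong positive r, consistent with the hypothesis
```

A strong, significant positive r would support the hypothesis; it still would not show that exercise causes the fitness gains.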

100 Correlation Hypothesis Statement Examples


Discover the intriguing world of correlation through a collection of examples that illustrate how variables can be linked in research. Explore diverse scenarios where changes in one variable may correspond to changes in another, forming the basis of correlation hypotheses. These real-world instances shed light on the essence of correlation analysis and its role in uncovering connections between different aspects of data.

  • Study Hours and Exam Scores : If students study more hours per week, then their exam scores will show a positive correlation, indicating that increased study time might lead to better performance.
  • Income and Education : If the level of education increases, then income levels will also rise, demonstrating a positive correlation between education attainment and earning potential.
  • Social Media Usage and Well-being : If individuals spend more time on social media platforms, then their self-reported well-being might exhibit a negative correlation, suggesting that excessive use could impact mental health.
  • Temperature and Ice Cream Sales : If temperatures rise, then the sales of ice cream might increase, displaying a positive correlation due to the weather’s influence on consumer behavior.
  • Physical Activity and Heart Rate : If the intensity of physical activity rises, then heart rate might increase, signifying a positive correlation between exercise intensity and heart rate.
  • Age and Reaction Time : If age increases, then reaction time might show a positive correlation, indicating that as people age, their reaction times might slow down.
  • Smoking and Lung Capacity : If the number of cigarettes smoked daily increases, then lung capacity might decrease, suggesting a negative correlation between smoking and respiratory health.
  • Stress and Sleep Quality : If stress levels elevate, then sleep quality might decline, reflecting a negative correlation between psychological stress and restorative sleep.
  • Rainfall and Crop Yield : If the amount of rainfall decreases, then crop yield might also decrease, illustrating a negative correlation between precipitation and agricultural productivity.
  • Screen Time and Academic Performance : If screen time usage increases among students, then academic performance might show a negative correlation, suggesting that excessive screen time could be detrimental to studies.
  • Exercise and Body Weight : If individuals engage in regular exercise, then their body weight might exhibit a negative correlation, implying that physical activity can contribute to weight management.
  • Income and Crime Rates : If income levels decrease in a neighborhood, then crime rates might show a positive correlation, indicating a potential link between socio-economic factors and crime.
  • Social Support and Mental Health : If the level of social support increases, then individuals’ mental health scores may exhibit a positive correlation, highlighting the potential positive impact of strong social networks on psychological well-being.
  • Study Time and GPA : If students spend more time studying, then their Grade Point Average (GPA) might display a positive correlation, suggesting that increased study efforts may lead to higher academic achievement.
  • Parental Involvement and Academic Success : If parents are more involved in their child’s education, then the child’s academic success may show a positive correlation, emphasizing the role of parental support in shaping student outcomes.
  • Alcohol Consumption and Reaction Time : If alcohol consumption increases, then reaction time might slow down, indicating a negative correlation between alcohol intake and cognitive performance.
  • Social Media Engagement and Loneliness : If time spent on social media platforms increases, then feelings of loneliness might show a positive correlation, suggesting a potential connection between excessive online interaction and emotional well-being.
  • Temperature and Insect Activity : If temperatures rise, then the activity of certain insects might increase, demonstrating a potential positive correlation between temperature and insect behavior.
  • Education Level and Voting Participation : If education levels rise, then voter participation rates may also increase, showcasing a positive correlation between education and civic engagement.
  • Work Commute Time and Job Satisfaction : If work commute time decreases, then job satisfaction might show a positive correlation, indicating that shorter commutes could contribute to higher job satisfaction.
  • Sleep Duration and Cognitive Performance : If sleep duration increases, then cognitive performance scores might also rise, suggesting a potential positive correlation between adequate sleep and cognitive functioning.
  • Healthcare Access and Mortality Rate : If access to healthcare services improves, then the mortality rate might decrease, highlighting a potential negative correlation between healthcare accessibility and mortality.
  • Exercise and Blood Pressure : If individuals engage in regular exercise, then their blood pressure levels might exhibit a negative correlation, indicating that physical activity can contribute to maintaining healthy blood pressure.
  • Social Media Use and Academic Distraction : If students spend more time on social media during study sessions, then their academic focus might show a negative correlation, suggesting that excessive online engagement can hinder concentration.
  • Age and Technological Adaptation : If age increases, then the speed of adapting to new technologies might exhibit a negative correlation, suggesting that younger individuals tend to adapt more quickly.
  • Temperature and Plant Growth : If temperatures rise, then the rate of plant growth might increase, indicating a potential positive correlation between temperature and biological processes.
  • Music Exposure and Mood : If individuals listen to upbeat music, then their reported mood might show a positive correlation, suggesting that music can influence emotional states.
  • Income and Healthcare Utilization : If income levels increase, then the frequency of healthcare utilization might decrease, suggesting a potential negative correlation between income and healthcare needs.
  • Distance and Communication Frequency : If physical distance between individuals increases, then their communication frequency might show a negative correlation, indicating that proximity tends to facilitate communication.
  • Study Group Attendance and Exam Scores : If students regularly attend study groups, then their exam scores might exhibit a positive correlation, suggesting that collaborative study efforts could enhance performance.
  • Temperature and Disease Transmission : If temperatures rise, then the transmission of certain diseases might increase, pointing to a potential positive correlation between temperature and disease spread.
  • Interest Rates and Consumer Spending : If interest rates decrease, then consumer spending might show a positive correlation, suggesting that lower interest rates encourage increased economic activity.
  • Digital Device Use and Eye Strain : If individuals spend more time on digital devices, then the occurrence of eye strain might show a positive correlation, suggesting that prolonged screen time can impact eye health.
  • Parental Education and Children’s Educational Attainment : If parents have higher levels of education, then their children’s educational attainment might display a positive correlation, highlighting the intergenerational impact of education.
  • Social Interaction and Happiness : If individuals engage in frequent social interactions, then their reported happiness levels might show a positive correlation, indicating that social connections contribute to well-being.
  • Temperature and Energy Consumption : If temperatures decrease, then energy consumption for heating might increase, suggesting a potential positive correlation between temperature and energy usage.
  • Physical Activity and Stress Reduction : If individuals engage in regular physical activity, then their reported stress levels might display a negative correlation, indicating that exercise can help alleviate stress.
  • Diet Quality and Chronic Diseases : If diet quality improves, then the prevalence of chronic diseases might decrease, suggesting a potential negative correlation between healthy eating habits and disease risk.
  • Social Media Use and Body Image Dissatisfaction : If time spent on social media increases, then feelings of body image dissatisfaction might show a positive correlation, suggesting that online platforms can influence self-perception.
  • Income and Access to Quality Education : If household income increases, then access to quality education for children might improve, suggesting a potential positive correlation between financial resources and educational opportunities.
  • Workplace Diversity and Innovation : If workplace diversity increases, then the rate of innovation might show a positive correlation, indicating that diverse teams often generate more creative solutions.
  • Physical Activity and Bone Density : If individuals engage in weight-bearing exercises, then their bone density might exhibit a positive correlation, suggesting that exercise contributes to bone health.
  • Screen Time and Attention Span : If screen time increases, then attention span might show a negative correlation, indicating that excessive screen exposure can impact sustained focus.
  • Social Support and Resilience : If individuals have strong social support networks, then their resilience levels might display a positive correlation, suggesting that social connections contribute to coping abilities.
  • Weather Conditions and Mood : If sunny weather persists, then individuals’ reported mood might exhibit a positive correlation, reflecting the potential impact of weather on emotional states.
  • Nutrition Education and Healthy Eating : If individuals receive nutrition education, then their consumption of fruits and vegetables might show a positive correlation, suggesting that knowledge influences dietary choices.
  • Physical Activity and Cognitive Aging : If adults engage in regular physical activity, then their cognitive decline with aging might show a slower rate, indicating a potential negative correlation between exercise and cognitive aging.
  • Air Quality and Respiratory Illnesses : If air quality deteriorates, then the incidence of respiratory illnesses might increase, suggesting a potential positive correlation between air pollutants and health impacts.
  • Reading Habits and Vocabulary Growth : If individuals read regularly, then their vocabulary size might exhibit a positive correlation, suggesting that reading contributes to language development.
  • Sleep Quality and Stress Levels : If sleep quality improves, then reported stress levels might display a negative correlation, indicating that sleep can impact psychological well-being.
  • Social Media Engagement and Academic Performance : If students spend more time on social media, then their academic performance might exhibit a negative correlation, suggesting that excessive online engagement can impact studies.
  • Exercise and Blood Sugar Levels : If individuals engage in regular exercise, then their blood sugar levels might display a negative correlation, indicating that physical activity can influence glucose regulation.
  • Screen Time and Sleep Duration : If screen time before bedtime increases, then sleep duration might show a negative correlation, suggesting that screen exposure can affect sleep patterns.
  • Environmental Pollution and Health Outcomes : If exposure to environmental pollutants increases, then the occurrence of health issues might show a positive correlation, suggesting that pollution can impact well-being.
  • Time Management and Academic Achievement : If students improve time management skills, then their academic achievement might exhibit a positive correlation, indicating that effective planning contributes to success.
  • Physical Fitness and Heart Health : If individuals improve their physical fitness, then their heart health indicators might display a positive correlation, indicating that exercise benefits cardiovascular well-being.
  • Weather Conditions and Outdoor Activities : If weather is sunny, then outdoor activities might show a positive correlation, suggesting that favorable weather encourages outdoor engagement.
  • Media Exposure and Body Image Perception : If exposure to media images increases, then body image dissatisfaction might show a positive correlation, indicating media’s potential influence on self-perception.
  • Community Engagement and Civic Participation : If individuals engage in community activities, then their civic participation might exhibit a positive correlation, indicating an active citizenry.
  • Social Media Use and Productivity : If individuals spend more time on social media, then their productivity levels might exhibit a negative correlation, suggesting that online distractions can affect work efficiency.
  • Income and Stress Levels : If income levels increase, then reported stress levels might exhibit a negative correlation, suggesting that financial stability can impact psychological well-being.
  • Social Media Use and Interpersonal Skills : If individuals spend more time on social media, then their interpersonal skills might show a negative correlation, indicating potential effects on face-to-face interactions.
  • Parental Involvement and Academic Motivation : If parents are more involved in their child’s education, then the child’s academic motivation may exhibit a positive correlation, highlighting the role of parental support.
  • Technology Use and Sleep Quality : If screen time increases before bedtime, then sleep quality might show a negative correlation, suggesting that technology use can impact sleep.
  • Outdoor Activity and Mood Enhancement : If individuals engage in outdoor activities, then their reported mood might display a positive correlation, suggesting the potential emotional benefits of nature exposure.
  • Income Inequality and Social Mobility : If income inequality increases, then social mobility might exhibit a negative correlation, suggesting that higher inequality can hinder upward mobility.
  • Vegetable Consumption and Heart Health : If individuals increase their vegetable consumption, then heart health indicators might show a positive correlation, indicating the potential benefits of a nutritious diet.
  • Online Learning and Academic Achievement : If students engage in online learning, then their academic achievement might display a positive correlation, highlighting the effectiveness of digital education.
  • Emotional Intelligence and Workplace Performance : If emotional intelligence improves, then workplace performance might exhibit a positive correlation, indicating the relevance of emotional skills.
  • Community Engagement and Mental Well-being : If individuals engage in community activities, then their reported mental well-being might show a positive correlation, emphasizing social connections’ impact.
  • Rainfall and Agriculture Productivity : If rainfall levels increase, then agricultural productivity might exhibit a positive correlation, indicating the importance of water for crops.
  • Social Media Use and Body Posture : If screen time increases, then poor body posture might show a positive correlation, suggesting that screen use can influence physical habits.
  • Marital Satisfaction and Relationship Length : If marital satisfaction decreases, then relationship length might show a negative correlation, indicating potential challenges over time.
  • Exercise and Anxiety Levels : If individuals engage in regular exercise, then reported anxiety levels might exhibit a negative correlation, indicating the potential benefits of physical activity on mental health.
  • Music Listening and Concentration : If individuals listen to instrumental music, then their concentration levels might display a positive correlation, suggesting music’s impact on focus.
  • Internet Usage and Attention Deficits : If screen time increases, then attention deficits might show a positive correlation, implying that excessive internet use can affect concentration.
  • Financial Literacy and Debt Levels : If financial literacy improves, then personal debt levels might exhibit a negative correlation, suggesting better financial decision-making.
  • Time Spent Outdoors and Vitamin D Levels : If time spent outdoors increases, then vitamin D levels might show a positive correlation, indicating sun exposure’s role in vitamin synthesis.
  • Family Meal Frequency and Nutrition : If families eat meals together frequently, then nutrition quality might display a positive correlation, emphasizing family dining’s impact on health.
  • Temperature and Allergy Symptoms : If temperatures rise, then allergy symptoms might increase, suggesting a potential positive correlation between temperature and allergen exposure.
  • Social Media Use and Academic Distraction : If students spend more time on social media, then their academic focus might exhibit a negative correlation, indicating that online engagement can hinder studies.
  • Financial Stress and Health Outcomes : If financial stress increases, then the occurrence of health issues might show a positive correlation, suggesting potential health impacts of economic strain.
  • Study Hours and Test Anxiety : If students study more hours, then test anxiety might show a negative correlation, suggesting that increased preparation can reduce anxiety.
  • Music Tempo and Exercise Intensity : If music tempo increases, then exercise intensity might display a positive correlation, indicating music’s potential to influence workout vigor.
  • Green Space Accessibility and Stress Reduction : If access to green spaces improves, then reported stress levels might exhibit a negative correlation, highlighting nature’s stress-reducing effects.
  • Parenting Style and Child Behavior : If authoritative parenting increases, then positive child behaviors might display a positive correlation, suggesting parenting’s influence on behavior.
  • Sleep Quality and Productivity : If sleep quality improves, then work productivity might show a positive correlation, emphasizing the connection between rest and efficiency.
  • Media Consumption and Political Beliefs : If media consumption increases, then alignment with specific political beliefs might exhibit a positive correlation, suggesting media’s influence on ideology.
  • Workplace Satisfaction and Employee Retention : If workplace satisfaction increases, then employee retention rates might show a positive correlation, indicating the link between job satisfaction and tenure.
  • Digital Device Use and Eye Discomfort : If screen time increases, then reported eye discomfort might show a positive correlation, indicating potential impacts of screen exposure.
  • Age and Adaptability to Technology : If age increases, then adaptability to new technologies might exhibit a negative correlation, indicating generational differences in tech adoption.
  • Physical Activity and Mental Health : If individuals engage in regular physical activity, then reported mental health scores might exhibit a positive correlation, showcasing exercise’s impact.
  • Video Gaming and Attention Span : If time spent on video games increases, then attention span might display a negative correlation, indicating potential effects on focus.
  • Social Media Use and Empathy Levels : If social media use increases, then reported empathy levels might show a negative correlation, suggesting possible effects on emotional understanding.
  • Reading Habits and Creativity : If individuals read diverse genres, then their creative thinking might exhibit a positive correlation, emphasizing reading’s cognitive benefits.
  • Weather Conditions and Outdoor Exercise : If weather is pleasant, then outdoor exercise might show a positive correlation, suggesting weather’s influence on physical activity.
  • Parental Involvement and Bullying Prevention : If parents are actively involved, then instances of bullying might exhibit a negative correlation, emphasizing parental impact on behavior.
  • Digital Device Use and Sleep Disruption : If screen time before bedtime increases, then sleep disruption might show a positive correlation, indicating technology’s influence on sleep.
  • Friendship Quality and Psychological Well-being : If friendship quality increases, then reported psychological well-being might show a positive correlation, highlighting social support’s impact.
  • Income and Environmental Consciousness : If income levels increase, then environmental consciousness might also rise, indicating potential links between affluence and sustainability awareness.

Correlational Hypothesis Interpretation Statement Examples

Explore the art of interpreting correlation hypotheses with these illustrative examples. Understand the implications of positive, negative, and zero correlations, and learn how to deduce meaningful insights from data relationships.

  • Relationship Between Exercise and Mood : A positive correlation between exercise frequency and mood scores suggests that increased physical activity might contribute to enhanced emotional well-being.
  • Association Between Screen Time and Sleep Quality : A negative correlation between screen time before bedtime and sleep quality indicates that higher screen exposure could lead to poorer sleep outcomes.
  • Connection Between Study Hours and Exam Performance : A positive correlation between study hours and exam scores implies that increased study time might correspond to better academic results.
  • Link Between Stress Levels and Meditation Practice : A negative correlation between stress levels and meditation frequency suggests that engaging in meditation could be associated with lower perceived stress.
  • Relationship Between Social Media Use and Loneliness : A positive correlation between social media engagement and feelings of loneliness implies that excessive online interaction might contribute to increased loneliness.
  • Association Between Income and Happiness : A positive correlation between income and self-reported happiness indicates that higher income levels might be linked to greater subjective well-being.
  • Connection Between Parental Involvement and Academic Performance : A positive correlation between parental involvement and students’ grades suggests that active parental engagement might contribute to better academic outcomes.
  • Link Between Time Management and Stress Levels : A negative correlation between effective time management and reported stress levels implies that better time management skills could lead to lower stress.
  • Relationship Between Outdoor Activities and Vitamin D Levels : A positive correlation between time spent outdoors and vitamin D levels suggests that increased outdoor engagement might be associated with higher vitamin D concentrations.
  • Association Between Water Consumption and Skin Hydration : A positive correlation between water intake and skin hydration indicates that higher fluid consumption might lead to improved skin moisture levels.

Alternative Correlational Hypothesis Statement Examples

Explore alternative scenarios and potential correlations in these examples. Learn to articulate different hypotheses that could explain data relationships beyond the conventional assumptions.

  • Alternative to Exercise and Mood : An alternative hypothesis could suggest a non-linear relationship between exercise and mood, indicating that moderate exercise might have the most positive impact on emotional well-being.
  • Alternative to Screen Time and Sleep Quality : An alternative hypothesis might propose that screen time has a curvilinear relationship with sleep quality, suggesting that moderate screen exposure leads to optimal sleep outcomes.
  • Alternative to Study Hours and Exam Performance : An alternative hypothesis could propose that there’s an interaction effect between study hours and study method, influencing the relationship between study time and exam scores.
  • Alternative to Stress Levels and Meditation Practice : An alternative hypothesis might consider that the relationship between stress levels and meditation practice is moderated by personality traits, resulting in varying effects.
  • Alternative to Social Media Use and Loneliness : An alternative hypothesis could posit that the relationship between social media use and loneliness depends on the quality of online interactions and content consumption.
  • Alternative to Income and Happiness : An alternative hypothesis might propose that the relationship between income and happiness differs based on cultural factors, leading to varying happiness levels at different income ranges.
  • Alternative to Parental Involvement and Academic Performance : An alternative hypothesis could suggest that the relationship between parental involvement and academic performance varies based on students’ learning styles and preferences.
  • Alternative to Time Management and Stress Levels : An alternative hypothesis might explore the possibility of a curvilinear relationship between time management and stress levels, indicating that extreme time management efforts might elevate stress.
  • Alternative to Outdoor Activities and Vitamin D Levels : An alternative hypothesis could consider that the relationship between outdoor activities and vitamin D levels is moderated by sunscreen usage, influencing vitamin synthesis.
  • Alternative to Water Consumption and Skin Hydration : An alternative hypothesis might propose that the relationship between water consumption and skin hydration is mediated by dietary factors, influencing fluid retention and skin health.

Correlational Hypothesis Pearson Interpretation Statement Examples

Discover how the Pearson correlation coefficient enhances your understanding of data relationships with these examples. Learn to interpret correlation strength and direction using this valuable statistical measure.

  • Strong Positive Correlation : A Pearson correlation coefficient of +0.85 between study time and exam scores indicates a strong positive relationship, suggesting that increased study time is strongly associated with higher grades.
  • Moderate Negative Correlation : A Pearson correlation coefficient of -0.45 between screen time and sleep quality reflects a moderate negative correlation, implying that higher screen exposure is moderately linked to poorer sleep outcomes.
  • Weak Positive Correlation : A Pearson correlation coefficient of +0.25 between social media use and loneliness suggests a weak positive correlation, indicating that increased online engagement is weakly related to higher loneliness.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.75 between stress levels and meditation practice indicates a strong negative relationship, implying that engaging in meditation is strongly associated with lower stress.
  • Moderate Positive Correlation : A Pearson correlation coefficient of +0.60 between income and happiness signifies a moderate positive correlation, suggesting that higher income is moderately linked to greater happiness.
  • Weak Negative Correlation : A Pearson correlation coefficient of -0.30 between parental involvement and academic performance represents a weak negative correlation, indicating that higher parental involvement is weakly associated with lower academic performance.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.80 between time management and stress levels reveals a strong negative relationship, suggesting that effective time management is strongly linked to lower stress.
  • Weak Negative Correlation : A Pearson correlation coefficient of -0.20 between outdoor activities and vitamin D levels signifies a weak negative correlation, implying that higher outdoor engagement is weakly related to lower vitamin D levels.
  • Moderate Positive Correlation : A Pearson correlation coefficient of +0.50 between water consumption and skin hydration denotes a moderate positive correlation, suggesting that increased fluid intake is moderately linked to better skin hydration.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.70 between screen time and attention span indicates a strong negative relationship, implying that higher screen exposure is strongly associated with shorter attention spans.
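The coefficients above can be computed and labelled in a short Python sketch using NumPy. The study-hours data are invented for illustration, and the cut-offs of 0.4 and 0.7 for "moderate" and "strong" are common rules of thumb (consistent with the examples in this list), not universal standards.

```python
import numpy as np

def describe_pearson(x, y):
    # Pearson r is the off-diagonal entry of the 2x2 correlation matrix
    r = float(np.corrcoef(x, y)[0, 1])
    direction = "positive" if r > 0 else "negative" if r < 0 else "zero"
    # rule-of-thumb cut-offs; other texts draw the lines slightly differently
    strength = ("strong" if abs(r) >= 0.7 else
                "moderate" if abs(r) >= 0.4 else "weak")
    return r, f"{strength} {direction}"

# hypothetical data: study hours vs exam scores
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 60, 58, 66, 70, 74, 79]
r, label = describe_pearson(hours, scores)
```

Running this on the hypothetical data yields a coefficient above 0.9, which the function labels "strong positive", matching the first interpretation in the list above.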

Correlational Hypothesis Statement Examples in Psychology

Explore how correlation hypotheses apply to psychological research with these examples. Understand how psychologists investigate relationships between variables to gain insights into human behavior.

  • Sleep Patterns and Cognitive Performance : There is a positive correlation between consistent sleep patterns and cognitive performance, suggesting that individuals with regular sleep schedules exhibit better cognitive functioning.
  • Anxiety Levels and Social Media Use : There is a positive correlation between anxiety levels and excessive social media use, indicating that individuals who spend more time on social media might experience higher anxiety.
  • Self-Esteem and Body Image Satisfaction : There is a positive correlation between self-esteem and body image satisfaction, implying that individuals with higher self-esteem tend to be more satisfied with their physical appearance.
  • Parenting Styles and Child Aggression : There is a negative correlation between authoritative parenting styles and child aggression, suggesting that children raised by authoritative parents might exhibit lower levels of aggression.
  • Emotional Intelligence and Conflict Resolution : There is a positive correlation between emotional intelligence and effective conflict resolution, indicating that individuals with higher emotional intelligence tend to resolve conflicts more successfully.
  • Personality Traits and Career Satisfaction : There is a positive correlation between certain personality traits (e.g., extraversion, openness) and career satisfaction, suggesting that individuals with specific traits experience higher job contentment.
  • Stress Levels and Coping Mechanisms : There is a negative correlation between stress levels and adaptive coping mechanisms, indicating that individuals with lower stress levels are more likely to employ effective coping strategies.
  • Attachment Styles and Romantic Relationship Quality : There is a positive correlation between secure attachment styles and higher romantic relationship quality, suggesting that individuals with secure attachments tend to have healthier relationships.
  • Social Support and Mental Health : There is a negative correlation between perceived social support and mental health issues, indicating that individuals with strong social support networks tend to experience fewer mental health challenges.
  • Motivation and Academic Achievement : There is a positive correlation between intrinsic motivation and academic achievement, implying that students who are internally motivated tend to perform better academically.

Does Correlational Research Have Hypothesis?

Correlational research involves examining the relationship between two or more variables to determine whether they are related and how they change together. While correlational studies do not establish causation, they still utilize hypotheses to formulate expectations about the relationships between variables. These good hypotheses predict the presence, direction, and strength of correlations. However, in correlational research, the focus is on measuring and analyzing the degree of association rather than establishing cause-and-effect relationships.

How Do You Write a Null-Hypothesis for a Correlational Study?

The null hypothesis in a correlational study states that there is no significant correlation between the variables being studied. It assumes that any observed correlation is due to chance and lacks meaningful association. When writing a null hypothesis for a correlational study, follow these steps:

  • Identify the Variables: Clearly define the variables you are studying and their relationship (e.g., “There is no significant correlation between X and Y”).
  • Specify the Population: Indicate the population from which the data is drawn (e.g., “In the population of [target population]…”).
  • Include the Direction of Correlation: If relevant, specify the direction of correlation (positive, negative, or zero) that you are testing (e.g., “…there is no significant positive/negative correlation…”).
  • State the Hypothesis: Write the null hypothesis as a clear statement that there is no significant correlation between the variables (e.g., “…there is no significant correlation between X and Y”).

What Is Correlation Hypothesis Formula?

The correlation hypothesis is often expressed in the form of a statement that predicts the presence and nature of a relationship between two variables. It typically follows the “If-Then” structure, indicating the expected change in one variable based on changes in another. The correlation hypothesis formula can be written as:

“If [Variable X] changes, then [Variable Y] will also change [in a specified direction] because [rationale for the expected correlation].”

For example, “If the amount of exercise increases, then mood scores will improve because physical activity has been linked to better emotional well-being.”

What Is a Correlational Hypothesis in Research Methodology?

A correlational hypothesis in research methodology is a testable hypothesis statement that predicts the presence and nature of a relationship between two or more variables. It forms the basis for conducting a correlational study, where the goal is to measure and analyze the degree of association between variables. Correlational hypotheses are essential in guiding the research process, collecting relevant data, and assessing whether the observed correlations are statistically significant.

How Do You Write a Hypothesis for Correlation? – A Step by Step Guide

Writing a hypothesis for correlation involves crafting a clear and testable statement about the expected relationship between variables. Here’s a step-by-step guide:

  • Identify Variables : Clearly define the variables you are studying and their nature (e.g., “There is a relationship between X and Y…”).
  • Specify Direction : Indicate the expected direction of correlation (positive, negative, or zero) based on your understanding of the variables and existing literature.
  • Formulate the If-Then Statement : Write an “If-Then” statement that predicts the change in one variable based on changes in the other variable (e.g., “If [Variable X] changes, then [Variable Y] will also change [in a specified direction]…”).
  • Provide Rationale : Explain why you expect the correlation to exist, referencing existing theories, research, or logical reasoning.
  • Quantitative Prediction (Optional) : If applicable, provide a quantitative prediction about the strength of the correlation (e.g., “…for every one unit increase in [Variable X], [Variable Y] is predicted to increase by [numerical value].”).
  • Specify Population : Indicate the population to which your hypothesis applies (e.g., “In a sample of [target population]…”).

Tips for Writing Correlational Hypothesis

  • Base on Existing Knowledge : Ground your hypothesis in existing literature, theories, or empirical evidence to ensure it’s well-informed.
  • Be Specific : Clearly define the variables and direction of correlation you’re predicting to avoid ambiguity.
  • Avoid Causation Claims : Remember that correlational hypotheses do not imply causation. Focus on predicting relationships, not causes.
  • Use Clear Language : Write in clear and concise language, avoiding jargon that may confuse readers.
  • Consider Alternative Explanations : Acknowledge potential confounding variables or alternative explanations that could affect the observed correlation.
  • Be Open to Results : Correlation results can be unexpected. Be prepared to interpret findings even if they don’t align with your initial hypothesis.
  • Test Statistically : Once you collect data, use appropriate statistical tests to determine if the observed correlation is statistically significant.
  • Revise as Needed : If your findings don’t support your hypothesis, revise it based on the data and insights gained.

Crafting a well-structured correlational hypothesis is crucial for guiding your research, conducting meaningful analysis, and contributing to the understanding of relationships between variables.

Hypothesis Testing for Correlation (AQA A Level Maths: Statistics)

Revision note.

Amber

Hypothesis Testing for Correlation

You should be familiar with using a hypothesis test to determine bias within probability problems. It is also possible to use a hypothesis test to determine whether a given product moment correlation coefficient, calculated from a sample, could be representative of the same relationship existing within the whole population. For full information on hypothesis testing, see the revision notes from section 5.1.1 Hypothesis Testing.

Why use a hypothesis test?

  • Calculating the product moment correlation coefficient (PMCC) for a whole population would involve having data on each individual within the whole population
  • It is very rare that a statistician would have the time or resources to collect all of that data
  • The PMCC for a sample taken from the population is denoted r
  • A hypothesis test is conducted using the value of r to determine whether the population can be said to have positive, negative or zero correlation

How is a hypothesis test for correlation carried out?

  • Most of the time the hypothesis test will be carried out by using a critical value
  • You won't be expected to calculate p-values but you might be given a p-value
  • The hypothesis test could either be a one-tailed test or a two-tailed test
  • You will be given the critical value in the question
  • If r is in the critical region the null hypothesis should be rejected and the alternative hypothesis accepted
  • If r is not in the critical region the null hypothesis should be accepted and the alternative hypothesis rejected

Or: compare the p-value with the significance level

  • If the p-value is less than the significance level, the test is significant and the null hypothesis should be rejected
  • If the p-value is greater than the significance level, the null hypothesis should be accepted and the alternative hypothesis rejected
  • Use the wording in the question to help you write your conclusion
  • If rejecting the null hypothesis, your conclusion should state that there is evidence to accept the context of the alternative hypothesis, at the level of significance of the test only
  • If accepting the null hypothesis, your conclusion should state that there is not enough evidence to accept the context of the alternative hypothesis, at the level of significance of the test only
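The two equivalent decision rules can be sketched as plain Python functions; the critical value or p-value itself would come from tables or software, as the notes say, and the function names here are illustrative.

```python
def decide_by_critical_value(r, critical, one_tailed_positive=True):
    # Reject H0 if the sample PMCC r falls in the critical region.
    # For a two-tailed test, compare |r| with the critical value.
    in_region = r > critical if one_tailed_positive else abs(r) > critical
    return "reject H0" if in_region else "accept H0"

def decide_by_p_value(p, alpha):
    # Reject H0 if the p-value is below the significance level.
    return "reject H0" if p < alpha else "accept H0"
```

For example, a sample value of r = 0.6 against a one-tailed critical value of 0.5494 leads to rejecting the null hypothesis, and a p-value of 0.03 against a 5% significance level does the same.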

Worked example

A student believes that there is a positive correlation between the number of hours spent studying for a test and the percentage scored on it.

The student takes a random sample of 10 of his friends and records the amount of revision they did and percentage they score in the test.

Given that the critical value for this test is 0.5494, carry out a hypothesis test at the 5% level of significance to test whether the student’s claim is justified.


  • Make sure you read the question carefully to determine whether the test you are carrying out is one-tailed or two-tailed, and use the level of significance accordingly. Be careful when comparing negative values of r with a negative critical value: it is easy to make an error with negative numbers in an exam situation.
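The decision step of the worked example can be reproduced as a sketch in Python. The revision-hours and score data below are hypothetical (the question's actual data table is not reproduced here); only the critical value of 0.5494 is taken from the question.

```python
import numpy as np

# hypothetical sample of 10 friends (the question's real table is not shown)
hours = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]
scores = [35, 40, 38, 50, 55, 60, 58, 65, 72, 80]

# H0: rho = 0, H1: rho > 0, one-tailed at the 5% level
r = float(np.corrcoef(hours, scores)[0, 1])
critical = 0.5494  # given in the question

if r > critical:
    conclusion = ("Reject H0: there is evidence at the 5% level of a positive "
                  "correlation between revision time and test score.")
else:
    conclusion = ("Accept H0: there is not enough evidence of a positive "
                  "correlation at the 5% level.")
```

With this hypothetical data the sample PMCC comfortably exceeds the critical value, so the null hypothesis is rejected and the conclusion is worded in the context of the question, as the tip above advises.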


Author: Amber

Amber gained a first class degree in Mathematics & Meteorology from the University of Reading before training to become a teacher. She is passionate about teaching, having spent 8 years teaching GCSE and A Level Mathematics both in the UK and internationally. Amber loves creating bright and informative resources to help students reach their potential.

Population, sample and hypothesis testing

What is a hypothesis?

A hypothesis is an assumption that is neither proven nor disproven. In the research process, a hypothesis is made at the very beginning and the goal is to either reject or not reject it. In order to reject or not reject a hypothesis, data are needed, e.g. from an experiment or a survey, which are then evaluated using a hypothesis test .

Usually, hypotheses are formulated starting from a literature review. Based on the literature review, you can then justify why you formulated the hypothesis in this way.

An example of a hypothesis could be: "Men earn more than women in the same job in Austria."


To test this hypothesis, you need data, e.g. from a survey, and a suitable hypothesis test such as the t-test or correlation analysis. Don't worry, DATAtab will help you choose the right hypothesis test.

How do I formulate a hypothesis?

In order to formulate a hypothesis, a research question must first be defined. A precisely formulated hypothesis about the population can then be derived from the research question, e.g. men earn more than women in the same job in Austria.


Hypotheses are not simple statements; they are formulated in such a way that they can be tested with collected data in the course of the research process.

To test a hypothesis, it is necessary to define exactly which variables are involved and how the variables are related. Hypotheses, then, are assumptions about the cause-and-effect relationships or the associations between variables.

What is a variable?

A variable is a property of an object or event that can take on different values. For example, eye color is a variable: it is a property of the object "eye" and can take different values (blue, brown, ...).

If you are researching in the social sciences, your variables may be:

  • Attitude towards environmental protection

If you are researching in the medical field, your variables may be:

  • Body weight
  • Smoking status

What is the null and alternative hypothesis?

There are always two hypotheses that are exactly opposite to each other, or that claim the opposite. These opposite hypotheses are called the null and alternative hypothesis and are abbreviated H0 and H1.

Null hypothesis H0:

The null hypothesis assumes that there is no difference between two or more groups with respect to a characteristic.

The salary of men and women does not differ in Austria.

Alternative hypothesis H1:

Alternative hypotheses, on the other hand, assume that there is a difference between two or more groups.

The salary of men and women differs in Austria.

The hypothesis that you want to test, or that you have derived from the theory, usually states that there is an effect, e.g. gender has an effect on salary. This hypothesis is called the alternative hypothesis.

The null hypothesis usually states that there is no effect, e.g. gender has no effect on salary. In a hypothesis test, only the null hypothesis can be tested; the goal is to find out whether the null hypothesis is rejected or not.

Types of hypotheses

What types of hypotheses are there? The most common distinction is between difference and correlation hypotheses, as well as directional and non-directional hypotheses.

Difference and correlation hypotheses

Difference hypotheses are used when different groups are to be distinguished, e.g., the group of men and the group of women. Correlation hypotheses are used when the relationship or correlation between variables is to be tested, e.g., the relationship between age and height.

Difference hypotheses

Difference hypotheses test whether there is a difference between two or more groups.


Examples of difference hypotheses are:

  • The "group" of men earns more than the "group" of women.
  • Smokers have a higher risk of heart attack than non-smokers.
  • There is a difference between Germany, Austria and France in terms of hours worked per week.

Thus, one variable is always a categorical variable, e.g., gender (male, female), smoking status (smoker, nonsmoker), or country (Germany, Austria, and France); the other variable is at least ordinally scaled, e.g., salary, percent risk of heart attack, or hours worked per week.

Correlation hypotheses

Correlation hypotheses test correlations between two variables, for example, height and body weight.

Correlation hypotheses are, for example:

  • The taller a person is, the heavier they are.
  • The more horsepower a car has, the higher its fuel consumption.
  • The better the math grade, the higher the future salary.

As can be seen from the examples, correlation hypotheses often take the form "The more..., the higher/lower...". Thus, at least two ordinally scaled variables are being examined.
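A minimal sketch of how a correlation hypothesis such as "the taller, the heavier" could be tested, using Pearson's correlation coefficient; the height and weight values below are invented for illustration:

```python
# Sketch: testing the correlation hypothesis "the taller a person is,
# the heavier they are" with Pearson's r. All data are invented.
from scipy import stats

height_cm = [160, 165, 170, 172, 175, 180, 185, 190]
weight_kg = [55, 60, 64, 68, 70, 75, 82, 88]

r, p_value = stats.pearsonr(height_cm, weight_kg)

# r close to +1 supports a positive correlation hypothesis;
# the p-value refers to H0: "no correlation in the population".
print(f"r = {r:.3f}, p = {p_value:.4f}")
```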

Directional and non-directional hypotheses

Hypotheses are divided into directional and non-directional or one-sided and two-sided hypotheses. If the hypothesis contains words like "better than" or "worse than", the hypothesis is usually directional.

In the case of a non-directional hypothesis, one often finds building blocks such as "there is a difference between" in the formulation, but it is not stated in which direction the difference lies.

  • With a non-directional hypothesis , the only thing of interest is whether there is a difference in a value between the groups under consideration.
  • In a directional hypothesis , what is of interest is whether one group has a higher or lower value than the other.

Non-directional hypotheses

Non-directional hypotheses test whether there is a relationship or a difference, and it does not matter in which direction the relationship or difference goes. In the case of a difference hypothesis, this means there is a difference between two groups, but it does not say whether one of the groups has a higher value.

  • There is a difference between the salary of men and women (but it is not said who earns more!).
  • There is a difference in the risk of heart attack between smokers and non-smokers (but it is not said who has the higher risk!).

In regard to a correlation hypothesis, this means there is a relationship or correlation between two variables, but it is not said whether this relationship is positive or negative.

  • There is a correlation between height and weight.
  • There is a correlation between horsepower and fuel consumption in cars.

In both cases it is not said whether this correlation is positive or negative!

Directional hypotheses

Directional hypotheses additionally indicate the direction of the relationship or the difference. In the case of a difference hypothesis, a statement is made as to which group has the higher or lower value.

  • Men earn more than women.

In the case of a correlation hypothesis, a statement is made as to whether the correlation is positive or negative.

  • The taller a person is, the heavier they are.
  • The more horsepower a car has, the higher its fuel consumption.

The p-value for directional hypotheses

Statistical software usually calculates the non-directional (two-sided) test and outputs the corresponding p-value.

To obtain the p-value for the directional hypothesis, it must first be checked whether the effect is in the hypothesized direction. Then the p-value is divided by two. This is because the significance level is no longer split between two sides, but placed entirely on one side. More about this in the tutorial about the p-value.
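The halving rule above can be sketched in a few lines; the helper function and the numbers are illustrative, not part of any particular software package:

```python
# Sketch: converting a two-sided p-value into a one-sided one.
# The function name and the numeric values are illustrative only.
def one_sided_p(t_stat, p_two_sided, expected_sign=+1):
    """Halve the two-sided p-value if the effect points in the
    hypothesized direction; if the effect points the wrong way,
    the one-sided p-value is 1 - p/2 under the usual convention."""
    if (t_stat > 0) == (expected_sign > 0):
        return p_two_sided / 2
    return 1 - p_two_sided / 2

# Example: the two-sided test gave t = 2.1, p = 0.041, and we
# hypothesized a positive effect (e.g. "men earn more").
print(one_sided_p(2.1, 0.041))   # half of 0.041: significant one-sided
print(one_sided_p(-2.1, 0.041))  # wrong direction: far from significant
```

Many libraries can also compute the one-sided test directly, e.g. scipy's `ttest_ind(..., alternative="greater")`.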

If you select a directional alternative hypothesis in DATAtab for the calculated hypothesis test, the conversion is done automatically and you only need to read off the result.

Step-by-step instructions for testing hypotheses

  • Literature research
  • Formulation of the hypothesis
  • Define scale level
  • Determine significance level
  • Determination of hypothesis type
  • Which hypothesis test is suitable for the scale level and hypothesis type?

Next tutorial about hypothesis testing

The next tutorial is about hypothesis testing. You will learn what hypothesis tests are, how to find the right one and how to interpret it.


Cite DATAtab: DATAtab Team (2024). DATAtab: Online Statistics Calculator. DATAtab e.U. Graz, Austria. URL https://datatab.net

How to Write a Strong Hypothesis | Steps & Examples

Published on May 6, 2022 by Shona McCombes . Revised on November 20, 2023.

A hypothesis is a statement that can be tested by scientific research. If you want to test a relationship between two or more variables, you need to write hypotheses before you start your experiment or data collection .

Example: Hypothesis

Daily apple consumption leads to fewer doctor’s visits.

Table of contents

  • What is a hypothesis?
  • Developing a hypothesis (with example)
  • Hypothesis examples
  • Other interesting articles
  • Frequently asked questions about writing hypotheses

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess – it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Variables in hypotheses

Hypotheses propose a relationship between two or more types of variables .

  • An independent variable is something the researcher changes or controls.
  • A dependent variable is something the researcher observes and measures.

If there are any control variables, extraneous variables, or confounding variables, be sure to jot those down as you go to minimize the chances that research bias will affect your results.

For example, in the hypothesis "more exposure to the sun leads to higher levels of happiness", the independent variable is exposure to the sun (the assumed cause), and the dependent variable is the level of happiness (the assumed effect).

Step 1. Ask a question

Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project.

Step 2. Do some preliminary research

Your initial answer to the question should be based on what is already known about the topic. Look for theories and previous studies to help you form educated assumptions about what your research will find.

At this stage, you might construct a conceptual framework to ensure that you’re embarking on a relevant topic . This can also help you identify which variables you will study and what you think the relationships are between them. Sometimes, you’ll have to operationalize more complex constructs.

Step 3. Formulate your hypothesis

Now you should have some idea of what you expect to find. Write your initial answer to the question in a clear, concise sentence.

Step 4. Refine your hypothesis

You need to make sure your hypothesis is specific and testable. There are various ways of phrasing a hypothesis, but all the terms you use should have clear definitions, and the hypothesis should contain:

  • The relevant variables
  • The specific group being studied
  • The predicted outcome of the experiment or analysis

Step 5. Phrase your hypothesis in three ways

To identify the variables, you can write a simple prediction in if…then form. The first part of the sentence states the independent variable and the second part states the dependent variable.

In academic research, hypotheses are more commonly phrased in terms of correlations or effects, where you directly state the predicted relationship between variables.

If you are comparing two groups, the hypothesis can state what difference you expect to find between them.

Step 6. Write a null hypothesis

If your research involves statistical hypothesis testing, you will also have to write a null hypothesis. The null hypothesis is the default position that there is no association between the variables. The null hypothesis is written as H0, while the alternative hypothesis is H1 or Ha.

  • H0: The number of lectures attended by first-year students has no effect on their final exam scores.
  • H1: The number of lectures attended by first-year students has a positive effect on their final exam scores.
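As a hedged sketch, this H0/H1 pair could be tested with a simple linear regression, where H0 corresponds to a slope of zero; the attendance and score data below are invented for illustration:

```python
# Sketch: testing H0 "attendance has no effect on exam scores"
# via simple linear regression. All data are invented.
from scipy import stats

lectures = [2, 5, 8, 10, 12, 15, 18, 20]
exam_score = [48, 55, 60, 62, 70, 74, 80, 85]

result = stats.linregress(lectures, exam_score)

# H0: slope = 0 (no effect). A small p-value together with a
# positive slope supports the directional H1.
print(f"slope = {result.slope:.2f}, p = {result.pvalue:.4f}")
```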
  • Research question: What are the health benefits of eating an apple a day?
    Hypothesis: Increasing apple consumption in over-60s will result in decreasing frequency of doctor's visits.
    Null hypothesis: Increasing apple consumption in over-60s will have no effect on frequency of doctor's visits.
  • Research question: Which airlines have the most delays?
    Hypothesis: Low-cost airlines are more likely to have delays than premium airlines.
    Null hypothesis: Low-cost and premium airlines are equally likely to have delays.
  • Research question: Can flexible work arrangements improve job satisfaction?
    Hypothesis: Employees who have flexible working hours will report greater job satisfaction than employees who work fixed hours.
    Null hypothesis: There is no relationship between working hour flexibility and job satisfaction.
  • Research question: How effective is high school sex education at reducing teen pregnancies?
    Hypothesis: Teenagers who received sex education lessons throughout high school will have lower rates of unplanned pregnancy than teenagers who did not receive any sex education.
    Null hypothesis: High school sex education has no effect on teen pregnancy rates.
  • Research question: What effect does daily use of social media have on the attention span of under-16s?
    Hypothesis: There is a negative correlation between time spent on social media and attention span in under-16s.
    Null hypothesis: There is no relationship between social media use and attention span in under-16s.

If you want to know more about the research process , methodology , research bias , or statistics , make sure to check out some of our other articles with explanations and examples.

  • Sampling methods
  • Simple random sampling
  • Stratified sampling
  • Cluster sampling
  • Likert scales
  • Reproducibility

Statistics

  • Null hypothesis
  • Statistical power
  • Probability distribution
  • Effect size
  • Poisson distribution

Research bias

  • Optimism bias
  • Cognitive bias
  • Implicit bias
  • Hawthorne effect
  • Anchoring bias
  • Explicit bias


Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Cite this Scribbr article

McCombes, S. (2023, November 20). How to Write a Strong Hypothesis | Steps & Examples. Scribbr. Retrieved August 12, 2024, from https://www.scribbr.com/methodology/hypothesis/


Correlational Research: What it is with Examples

Use correlational research method to conduct a correlational study and measure the statistical relationship between two variables. Learn more.

Our minds can do some brilliant things. For example, it can memorize the jingle of a pizza truck. The louder the jingle, the closer the pizza truck is to us. Who taught us that? Nobody! We relied on our understanding and came to a conclusion. We don’t stop there, do we? If there are multiple pizza trucks in the area and each one has a different jingle, we would memorize it all and relate the jingle to its pizza truck.

This is precisely what correlational research is: establishing a relationship between two variables, "jingle" and "distance of the truck" in this particular example. A correlational study looks for variables that seem to interact with each other, so that when you see one variable changing, you have a fair idea of how the other variable will change.

What is Correlational research?

Correlational research is a type of non-experimental research method in which a researcher measures two variables and assesses the statistical relationship between them, with no influence from any extraneous variable. In statistical analysis it is essential to distinguish between categorical data, which involves distinct categories or labels, and numerical data, which consists of measurable quantities.

Correlational Research Example

The correlation coefficient is a statistical measure of the strength of the relationship between two variables, with a value between -1 and +1. When the correlation coefficient is close to +1, there is a strong positive correlation between the two variables. When it is close to -1, there is a strong negative correlation. When the value is close to zero, there is no relationship between the two variables.

Let us take an example to understand correlational research.

Consider, hypothetically, a researcher studying the correlation between cancer and marriage. In this study there are two variables: disease status and marital status. Suppose marriage has a negative association with cancer, meaning that married people are less likely to develop cancer.

However, this doesn't necessarily mean that marriage directly prevents cancer. In correlational research, it is not possible to establish what causes what. It is also a misconception that a correlational study must involve two quantitative variables; in reality, two variables are measured but neither is manipulated, and this holds whether the variables are quantitative or categorical.

Types of correlational research

Mainly three types of correlational research have been identified:

1. Positive correlation: A positive relationship between two variables is when an increase in one variable leads to a rise in the other variable. A decrease in one variable will see a reduction in the other variable. For example, the amount of money a person has might positively correlate with the number of cars the person owns.

2. Negative correlation: A negative correlation is quite literally the opposite of a positive relationship. If there is an increase in one variable, the second variable will show a decrease, and vice versa.

For example, education level might negatively correlate with the crime rate: if a country's education level improves, its crime rate may fall. Please note that this doesn't mean that a lack of education leads to crime. It only means that a lack of education and crime are believed to have a common cause: poverty.

3. No correlation: In this third type there is no correlation between the two variables. A change in one variable may not see any corresponding change in the other variable. For example, being a millionaire and happiness are not correlated: an increase in money doesn't lead to an increase in happiness.
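The three types above can be illustrated with synthetic data; the generated values are arbitrary and only serve to show the sign of the coefficient:

```python
# Sketch: positive, negative, and no correlation, illustrated
# with synthetic (invented) data and numpy's correlation matrix.
import numpy as np

rng = np.random.default_rng(42)
x = np.arange(50, dtype=float)

positive = x + rng.normal(0, 5, 50)    # rises with x
negative = -x + rng.normal(0, 5, 50)   # falls as x rises
unrelated = rng.normal(0, 5, 50)       # ignores x entirely

for name, y in [("positive", positive), ("negative", negative),
                ("none", unrelated)]:
    r = np.corrcoef(x, y)[0, 1]        # coefficient in [-1, +1]
    print(f"{name:>8}: r = {r:+.2f}")
```

The printed r values come out near +1, near -1, and near 0 respectively, matching the three cases.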

Characteristics of correlational research

Correlational research has three main characteristics. They are: 

  • Non-experimental: The correlational study is non-experimental. Researchers do not manipulate variables to confirm or refute a hypothesis; they only measure and observe the relationship between the variables, without altering them or subjecting them to external conditioning.
  • Backward-looking: Correlational research only looks back at historical data and observes events in the past. Researchers use it to measure and spot historical patterns between two variables. A correlational study may show a positive relationship between two variables, but this can change in the future.
  • Dynamic: The patterns between two variables from correlational research are never constant and are always changing. Two variables with a negative correlation in the past can have a positive correlation in the future due to various factors.

Data collection

The distinctive feature of correlational research is that the researcher can’t manipulate either of the variables involved. It doesn’t matter how or where the variables are measured. A researcher could observe participants in a closed environment or a public setting.

Researchers use two data collection methods to collect information in correlational research.

01. Naturalistic observation

Naturalistic observation is a way of collecting data in which people's behavior is observed in the natural environment in which they typically exist. This method is a type of field research. It could mean a researcher observing people in a grocery store, at the cinema, at a playground, or in similar places.

Researchers involved in this type of data collection make their observations as unobtrusively as possible, so that the participants are not aware they are being observed; otherwise they might deviate from their natural behavior.

Ethically, this method is acceptable if the participants remain anonymous and the study is conducted in a public setting, a place where people would not normally expect complete privacy. In the grocery store example, people can be observed while picking items from the aisle and putting them in their shopping bags. This is ethically acceptable, which is why most researchers choose public settings for recording their observations. This data collection method can be both qualitative and quantitative.

02. Archival data

Another approach to correlational data is the use of archival data, i.e., information that has been previously collected through similar kinds of research. Archival data is usually made available through primary research.

In contrast to naturalistic observation, the information collected through archival data can be pretty straightforward. For example, counting the number of people named Richard in the various states of America based on social security records is a relatively simple task.

Use the correlational research method to conduct a correlational study and measure the statistical relationship between two variables.
